# Introduction

Accurate, error-free diagnosis and treatment of patients has become a major concern in medical services. Quality service in the health care field implies diagnosing patients correctly and administering treatments that are effective [11]. Hospitals can also minimize the cost of clinical tests by employing appropriate computer-based information and/or decision support systems. Most hospitals today use some sort of hospital information system to manage their healthcare or patient data [10]. These systems generate huge amounts of data in the form of numbers, text, charts and images.

Data mining is the automated search for hidden patterns in large data sets [2]. It is an iterative process whose progress is defined by discovery, through either automatic or manual methods, and it encompasses a wide range of methodologies and techniques [8] that can be applied to many kinds of problems. Classification is a data mining (machine learning) technique used to predict group membership for data instances [4]; it generates rules that partition the data into disjoint groups. The goal of classification is to assign a class to previously unseen records as accurately as possible. In the classification process, a training set is analyzed by a classification algorithm, and the resulting classifier, or learner model, is represented in the form of classification rules [9].

There are various classification methods, including decision tree induction, Bayesian networks, the k-nearest neighbor classifier, case-based reasoning, genetic algorithms and fuzzy logic techniques. Systems that construct classifiers are among the most commonly used tools in data mining. Such systems take as input a collection of cases, each belonging to one of a small number of classes and described by its values for a fixed set of attributes, and output a classifier that can accurately predict the class to which a new case belongs [7].
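As a rough illustration of a system that takes a collection of labeled cases and returns a classifier, the following minimal sketch builds a k-nearest neighbor classifier, one of the methods listed above. The attribute names, the toy cases and the value of `k` are hypothetical and are not taken from the heart disease dataset.

```python
import math
from collections import Counter

def knn_classifier(cases, k=3):
    """Build a classifier from (attribute_vector, class_label) training cases."""
    def predict(x):
        # Keep the k training cases closest to x in Euclidean distance.
        nearest = sorted(cases, key=lambda c: math.dist(c[0], x))[:k]
        # Predict the majority class among those neighbors.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
    return predict

# Hypothetical toy cases: (age, resting blood pressure) -> class label.
cases = [
    ((45, 120), "absent"), ((50, 125), "absent"), ((48, 118), "absent"),
    ((63, 150), "present"), ((67, 160), "present"), ((60, 155), "present"),
]
classify = knn_classifier(cases, k=3)
print(classify((65, 152)))
```

The returned `classify` function plays the role of the classifier: it is built once from the training cases and can then be applied to any new case described by the same attributes.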

Our goal is to use the publicly available heart disease dataset and the PART and K-Means data mining algorithms to predict heart disease, analyze the results, and use the rules generated by these algorithms for further predictions. The rest of this paper is organized as follows. Section II provides a review of the literature. The problem definition is given in Section III. Our proposed approach is discussed in Section IV. The experimental results are given in Section V. Finally, Section VI gives the conclusion and future work.


# II. Related Works

A classification rule, or classifier, is a function that can be evaluated for any possible input value and yields a classification for the given data. In binary classification, the elements that are not correctly classified are called false positives and false negatives [12]. Some classification rules are static functions. There are various classification rule algorithms, such as OneR, Ridor and Conjunctive Rule. Classification rules can be extracted in two ways: the direct method, in which the rules are extracted from the data [5], and the indirect method, in which the rules are extracted from other classification models. Classification rules are also known as if-then rules.
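To make the idea of if-then rules and of false positives and false negatives concrete, here is a minimal sketch of an ordered rule list applied to a few records. The rules, thresholds, attribute names and records are hypothetical illustrations, not rules produced by any of the algorithms named above.

```python
def rule_classifier(record):
    """Ordered if-then rules; the first rule that fires decides the class."""
    if record["chest_pain"] == "typical" and record["age"] > 55:
        return 1  # predicted: heart disease
    if record["max_heart_rate"] < 120:
        return 1
    return 0      # default rule: no heart disease

def count_errors(records, labels):
    """Count false positives and false negatives of the rule list."""
    false_pos = false_neg = 0
    for record, actual in zip(records, labels):
        predicted = rule_classifier(record)
        if predicted == 1 and actual == 0:
            false_pos += 1
        elif predicted == 0 and actual == 1:
            false_neg += 1
    return false_pos, false_neg

# Hypothetical records with their actual classes.
records = [
    {"age": 60, "chest_pain": "typical", "max_heart_rate": 150},
    {"age": 40, "chest_pain": "atypical", "max_heart_rate": 110},
    {"age": 50, "chest_pain": "atypical", "max_heart_rate": 160},
    {"age": 58, "chest_pain": "typical", "max_heart_rate": 140},
]
labels = [1, 0, 1, 1]
print(count_errors(records, labels))
```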

In [1], the authors proposed an enhanced K-Means clustering algorithm for predicting coronary heart disease. Two strategies are used to enhance the K-Means clustering algorithm. First, a weighted ranking algorithm is proposed to overcome the problem of random selection of the initial centroids. Second, attribute weights reflecting the physicians' concerns are taken into account in both the ranking and the K-Means algorithm, instead of assigning unit weight to all attributes. The heart dataset was collected from the UCI machine learning repository, and 35 conditions are used to assign weights to the attributes. The present paper describes the rule-based classification algorithm PART and the Simple K-Means clustering algorithm, and reviews the role of these two algorithms in various contexts.


# III. Problem Definition

Given a dataset D, a set of classes C, and a set of classification rules R over D produced by the algorithms K-Means, PART, and PART based on K-Means, find the best algorithm according to selected performance factors.

# IV. Proposed System

This section gives a brief description of the two data mining algorithms used in the proposed system.


# a) K-Means Clustering Algorithm

Clustering medical data into small, meaningful groups can aid the discovery of patterns by supporting the extraction of relevant features from each cluster, thereby introducing structure into the data and supporting the application of conventional data mining techniques. K-means is the simplest and most commonly used clustering algorithm, and it behaves well in many applications [3,6]. Its simplicity is due to the use of squared error as the stopping criterion, which tends to work well with isolated and compact clusters. Its time complexity depends on the number of data points to be clustered and the number of iterations. The K-means algorithm uses the Euclidean distance and is initialized from some random or approximate solution.

K-means groups the data according to their individual values into k distinct clusters. Data assigned to the same cluster have similar feature values. K, a positive integer representing the number of clusters, must be specified in advance. The steps involved in the k-means algorithm are as follows:

1. K points are placed into the space represented by the data to be clustered. These points represent the initial cluster centroids.
2. Each data item is assigned to the group with the nearest centroid.
3. Once all the data have been assigned, the positions of the K centroids are recalculated.
4. Steps 2 and 3 are repeated until the centroids no longer move.

This results in a separation of the data into groups from which the metric to be minimized can be calculated. The preprocessed heart disease data is grouped using the K-means algorithm with the given K values. Clustering is a type of multivariate statistical analysis, also known as cluster analysis, unsupervised classification analysis, or numerical taxonomy. K-Means clustering produces a fixed number of separate, flat (non-hierarchical) clusters.
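The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Simple K-Means implementation used in the experiments: initial centroids are drawn at random from the data, the distance is Euclidean, and the function name, the `seed` and `max_iter` parameters and the toy points are hypothetical.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick k data points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Because the result depends on the initial centroids, repeated runs with different seeds can give different clusterings on less clearly separated data; the weighted ranking strategy of [1] addresses exactly this sensitivity.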


# b) Classification Rule Based PART Algorithm

Classification is the process of finding a model that determines the class of unknown objects. It maps each data item into one of several predefined classes. A classification model generates a set of rules based on the features of the data in the training dataset. These rules can then be used to classify future, unknown data items. Classification is one of the most important data mining techniques. Medical diagnosis is an important application of classification: for example, new patients can be diagnosed from their symptoms using classification rules about diseases derived from known cases.

PART stands for Projective Adaptive Resonance Theory. The inputs to the PART algorithm are the vigilance and distance parameters [13].


# i. Initialization

Number m of nodes in the F1 layer := number of dimensions in the input vector. Number n of nodes in the F2 layer := expected maximum number of clusters that can be formed at each clustering level. Initialize the parameters L, ρo, ρh, σ, α, θ, and e.


# V. Experimental Results

The two algorithms described above are combined and applied to the Heart Disease dataset. The dataset was collected from the UCI Repository at www.ucirepository.com. It contains 303 instances and 14 selected attributes. Initially, some records in the dataset had missing values in certain fields. These were identified and replaced with the most appropriate values using the ReplaceMissingValues filter in Weka 3.7. This process is known as data preprocessing. After preprocessing, the clustering and classification techniques Simple K-Means and PART were applied.
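Weka's ReplaceMissingValues filter fills missing numeric values with the attribute mean and missing nominal values with the attribute mode. The following sketch mimics that behavior in plain Python; it is an illustration only and does not call Weka. The `"?"` missing-value marker, the function name and the toy rows are assumptions, and every column is assumed to contain at least one observed value.

```python
from collections import Counter
from statistics import mean

def replace_missing(rows, missing="?"):
    """Fill missing entries column by column: mean for numeric, mode for nominal."""
    fills = []
    for column in zip(*rows):
        present = [v for v in column if v != missing]
        if all(isinstance(v, (int, float)) for v in present):
            fills.append(mean(present))                           # numeric attribute
        else:
            fills.append(Counter(present).most_common(1)[0][0])   # nominal attribute
    return [
        [fills[j] if v == missing else v for j, v in enumerate(row)]
        for row in rows
    ]

# Hypothetical rows: (age, chest pain type, ST depression).
rows = [
    [63, "typical", 1.0],
    [67, "?", "?"],
    ["?", "typical", 3.0],
    [41, "atypical", 2.0],
]
for row in replace_missing(rows):
    print(row)
```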

To measure the stability of the performance of the proposed model, the data is divided into training and testing data using 10-fold cross-validation. A confusion matrix shows how many instances have been assigned to each class. Our experiment has two classes or clusters, and therefore a 2x2 confusion matrix. The entries of this matrix are used to compute the performance measures. The following charts and figures are based on the combination of the two algorithms, K-Means and PART, on the heart disease dataset.
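As a small worked example, the sketch below splits 303 instances into 10 folds and computes accuracy from a 2x2 confusion matrix. The matrix entries (131, 34, 28, 110) are the values reported for PART in Table 2; the fold construction is a simplified assumption that does not reproduce the shuffling or stratification Weka applies, and the function names are hypothetical.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k non-overlapping folds of n instances."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def accuracy(matrix):
    """Percentage of instances on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total

# Confusion matrix entries reported for PART in Table 2.
part_matrix = [[131, 34],
               [28, 110]]
print(round(accuracy(part_matrix), 3))

# Each of the 303 instances is used for testing exactly once.
fold_sizes = [len(test) for _, test in k_fold_indices(303, k=10)]
print(fold_sizes)
```

The diagonal holds the 241 correctly classified instances out of 303, which agrees with the PART accuracy listed in Table 4.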

We evaluate the performance of the Simple K-Means clustering algorithm using the classes-to-clusters evaluation mode with the prediction attribute nom. Tables 1, 2 and 3 show the confusion matrices of Simple K-Means, PART, and PART via Simple K-Means (classification via clustering), respectively, and Table 4 shows the accuracy of the algorithms. The results show that 169 (56%) records are grouped into cluster 0 and 134 (44%) into cluster 1. Cluster 1 contains those who have heart disease and cluster 0 those who do not.
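The idea behind a classes-to-clusters evaluation can be sketched as follows: the class attribute is ignored during clustering, each cluster is afterwards labeled with a class, and the agreement between cluster labels and actual classes is measured. The sketch assumes a simple majority-class labeling per cluster, which is a simplification and not necessarily the exact mapping Weka computes; the function name and the toy assignments are hypothetical.

```python
from collections import Counter

def classes_to_clusters_accuracy(cluster_ids, class_labels):
    """Label each cluster with its majority class and return the accuracy in %."""
    counts_per_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        counts_per_cluster.setdefault(cluster, Counter())[label] += 1
    # Instances that carry the majority class of their cluster count as correct.
    correct = sum(c.most_common(1)[0][1] for c in counts_per_cluster.values())
    return 100.0 * correct / len(class_labels)

# Hypothetical cluster assignments and actual classes for seven records.
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
class_labels = [0, 0, 1, 1, 1, 1, 0]
print(classes_to_clusters_accuracy(cluster_ids, class_labels))
```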


Table 5 gives the number of rules created by the PART algorithm without K-Means and by PART based on K-Means. Figures 2 and 3 illustrate the rules generated by PART and by PART with cluster-relevant data, where class value 0 and cluster value 1 denote heart disease.

![Fig. 1 : Algorithm for PART](image-2.png "Fig. 1 :")

![](image-3.png "")

![Fig. 2 : Generated Rules by PART](image-4.png "Fig. 2 :")

![Fig. 4 : Threshold Curve of PART for Class 1](image-5.png "Fig. 4 :")

![Fig. 6 : Threshold Curve of PART for Class value cluster 0](image-6.png "Fig. 6 :")

![Figure 9 : Cost Curve for Class value Cluster1](image-7.png "Figure 9 :")

# Table 1 : Confusion Matrix of K-Means

# Table 2 : Confusion Matrix of PART

| Predicted Class | Actual Class 1 | Actual Class 0 |
|---|---|---|
| a=1 | 131 | 34 |
| b=0 | 28 | 110 |

# Table 3 : Confusion Matrix of PART via Simple K-Means

| Predicted Class | Actual Class b | Actual Class a |
|---|---|---|
| b=cluster 1 | 125 | 9 |
| a=cluster 0 | 12 | 157 |

# Table 4 : Accuracy of Algorithms

| Classification Techniques | Time (seconds) | Accuracy % |
|---|---|---|
| Simple K-Means | 0.02 | 80.858 |
| PART | 0.06 | 79.538 |
| PART via K-Means | 0.02 | 93.0693 |

# Table 5 : Number of Rules

| Classification Techniques | No. of Rules |
|---|---|
| PART | 26 |
| PART via Simple K-Means Clustering | 11 |

# VI. Conclusion and Future Work

In our future work, we plan to design and develop an efficient heart attack prediction system with Patient Prescription Support using web mining and data warehouse techniques. New algorithms and techniques are to be developed that overcome the drawbacks of the existing system. In the future, privacy-preserving techniques could be introduced for rule generation in the classification technique. We intend to improve the performance of these basic classification techniques by creating a meta-model that will be used to predict cardiovascular disease in patients.

Volume XIV Issue I Version I
			© 2014 Global Journals Inc. (US)
		
		
# References

[1] R. Sumathi and E. Kirubakaran, "Enhanced Weighted K-Means Clustering Based Risk Level Prediction for Coronary Heart Disease," European Journal of Scientific Research, ISSN 1450-216X, 71(4), 2012. Euro Journals Publishing, Inc.

[2] Jeffrey W. Seifert, "Data Mining: An Overview," CRS Report for Congress.

[3] X. Wu, "Top 10 algorithms in data mining analysis," Knowl. Inf. Syst., 2007.

[4] Jens Hühn and Eyke Hüllermeier, "FURIA: An Algorithm for Unordered Fuzzy Rule Induction," Philipps-Universität Marburg, Department of Mathematics and Computer Science.

[5] Jianyu Yang, "Classification by Association Rules: The Importance of Minimal Rule Sets," Rutgers, the State University of New Jersey, New Brunswick, NJ 08903 USA.

[6] M. Bramer, Principles of Data Mining, Springer, 2007.

[7] Sasaki Minoru and Kita Kenji, "Rule-based Text Categorization Using Hierarchical Categories," Faculty of Engineering, Tokushima University, Tokushima, Japan.

[8] Murlikrishna Vishwanathan and Geoffrey I. Webb, "Classification Learning Using All Rules," Proceedings of the Tenth European Conference on Machine Learning (ECML '98), Springer.

[9] Srivatsan Laxman and P. S. Sastry, "A survey of temporal data mining," 31, 2006, India.

[10] Herbert Diamond, Michael P. Johnson, Rema Padman, and Kai Zheng, "Clinical Reminder System: A Relational Database Application for Evidence-Based Medicine Practice," INFORMS Spring National Conference, Salt Lake City, Utah, April 26, 2004.

[11] Sellappan Palaniappan and Rafiah Awang, "Web-Based Heart Disease Decision Support System using Data Mining Classification Modeling Techniques," Proceedings, 2007.

[12] Veronica S. Moertini, "Towards the use of C4.5 algorithm for classifying banking dataset," Jurusan Ilmu Komputer, Fakultas Matematika dan Ilmu Pengetahuan Alam, Universitas Katolik Parahyangan, Bandung.

[13] Yongqiang Cao and Jianhong Wu, "Projective ART for clustering data sets in high dimensional spaces," Neural Networks, 15, 2002, Elsevier Science Ltd.