

Multidimensional Association Rule Based Data Mining Technique for Cattle Health Monitoring Using Wireless Sensor Network

Ankit R. Bhavsar
PhD Scholar, PACIFIC University, Pacific Hills, Udaipur; Assistant Professor, GLSIC, Ellisbridge, Ahmedabad, Gujarat, India - 380006. ankit@glsica.org

Harshal A. Arolkar
Associate Professor, GLSICT, Ellisbridge, Ahmedabad, Gujarat, India – 380006. harshal@glsict.org

Abstract – Wireless Sensor Networks (WSNs) produce a large amount of data during their operational lifetime. This data is often bulky and sometimes arrives in an unknown format. Hence data storage and the choice of an appropriate mining technique become critical issues in this domain. Recent innovations in data mining techniques have received attention for extracting knowledge from WSN data. In this paper, we propose a Multidimensional Association Rule based data mining technique for a cattle health monitoring system based on WSN. We also give an overview of data mining concepts along with selected mining techniques used for WSN data, and discuss the different diseases found in cattle and their symptoms. Finally, we propose a rule-based mining technique for identifying a disease from its symptoms.

Keywords – Association, Cattle, Data Mining, Health, Mining, Multidimensional, Sensor, Wireless Sensor Network.

I. INTRODUCTION

Wireless Sensor Networks (WSNs) are today widely used for monitoring physical phenomena in the environment. The data gathered using a WSN is bulky, heterogeneous and distributed, and it is gathered over a long period of time. The cumulative collection over the years becomes very large. Such massive data gathering therefore needs an appropriate technique to convert the data into productive information, and this is where data mining becomes useful. Data mining is the process of analyzing data from various perspectives and summarizing it into useful information. Continuous innovations in computer processing power, disk storage, and statistical tools are dramatically increasing the accuracy of analysis while driving down its cost. Various data mining techniques are available that can be used in an application to create knowledge-based data. In this paper we suggest a data mining technique that identifies a disease from its symptoms, for cattle health data collected using a WSN.

Section II of the paper introduces the data mining concept, Section III describes different data mining techniques, Section IV presents related work, Section V identifies cattle diseases and their symptoms, and Section VI presents the proposed data mining technique for cattle health, followed by the conclusion in Section VII.

II. DATA MINING

Data mining is the process of exploring and analyzing large quantities of data in order to discover meaningful patterns and rules [3, 12]. It is a way of learning from the past to take better decisions for the future. Data mining has two basic flavors: (1) directed data mining and (2) undirected data mining. Directed data mining is used to achieve some predefined target. Undirected data mining is used to identify or investigate patterns without a specific target. In data mining, data comes in various forms, in many formats and from multiple systems. The data mining process is generally divided into four stages: (1) identifying the problem, (2) transforming data into information, (3) taking action and (4) measuring the outcome.

The data mining discipline has a wide and diverse set of applications in both the business and science sectors. Some application areas of data mining are given in [4, 5, 6, 9, 15, 21]. In general, data mining is used in medical treatment and DNA analysis, financial and banking data analysis, the retail industry, telecommunication and many other areas.

978-93-80544-12-0/14/$31.00 ©2014 IEEE

III. TECHNIQUES IN DATA MINING

Several data mining techniques are available for data analysis. Some of these techniques are described here [4, 5, 17].

A. Association Rule Mining

Association rule mining, or market basket analysis, is the process of analyzing a large database of supermarket transactions with the aim of finding association rules. It involves searching for interesting customer habits by looking at associations. Association rule mining has many applications other than market basket analysis, including marketing, customer segmentation, medicine, electronic commerce, bioinformatics and finance. Association rules can be classified in the following ways:

• Boolean Association Rule Mining
• Quantitative Association Rule Mining
• Single Dimensional Association Rule Mining
• Multidimensional Association Rule Mining
• Single-level Association Rule Mining
• Multilevel Association Rule Mining

The Apriori algorithm is commonly used for association rule mining, but it becomes inefficient for applications where the data is very large. The Apriori-TID, DHP (Direct Hashing and Pruning) and FP-Growth (Frequent Pattern Growth) algorithms are also used for association rule mining.
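To make the level-wise idea behind Apriori concrete, the following is a minimal sketch rather than the implementation used in any cited work. The function name `apriori`, the absolute `min_support` count, and the toy `records` of symptom codes are all assumptions made for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate itemset
        # and keep only those meeting the support threshold.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    current = count({frozenset([i]) for i in items})
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy records of observed symptom codes (not real data).
records = [
    {"CS1", "CS4", "CS14"},
    {"CS1", "CS4"},
    {"CS1", "CS6", "CS7"},
    {"CS1", "CS4", "CS14"},
]
frequent_sets = apriori(records, min_support=2)
```

Each pass rescans all transactions to count its candidates, which gives a rough sense of why the algorithm becomes costly on very large data.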

B. Classification

Classification is an important data mining technique that has its origin in machine learning. Classification is the separation of objects into classes. It is appropriate when the data is known to have a small number of classes and sample data is available. Classification is probably the most widely used data mining technique, and one of its most widely used methods is the decision tree. The decision tree technique is popular because it generates easily understandable rules for classifying data.

C. Cluster Analysis

Cluster analysis is similar to classification, with one difference: cluster analysis is useful when the classes in the data are not known and sample data is not available. The main aim of cluster analysis is to find groups that are very different from each other in a collection of data. Partition methods, hierarchical methods, density-based methods, grid-based methods and model-based methods are the methods used for cluster analysis. One of the most widely used cluster analysis techniques is the K-means algorithm, which is a partition method.

D. Web Data Mining

The World Wide Web has become an extremely valuable resource for a large number of people all around the world. Over the last decade, the web revolution has had a profound impact on the way people search for and find information. Millions of web pages are added every day and millions of others are modified or deleted. Web pages are written in a variety of languages and provide information in a variety of media such as text, animation, images, photos and sound. Search engines are huge databases of web pages combined with software packages for indexing and retrieving the pages, which enable users to find information of interest. Search engine databases are updated automatically by web crawlers.
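As an illustration of the partition-based K-means algorithm mentioned under Cluster Analysis above, here is a small sketch in plain Python. The function name `kmeans`, the choice of the first k points as initial centroids, and the sample `readings` (body temperature in degrees Celsius, pulse rate) are assumptions for illustration and are not taken from the paper.

```python
def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (centroids, clusters)."""
    # Assumption of this sketch: the first k points seed the centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(v) / len(cluster) for v in zip(*cluster))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sensor readings: (body temperature, pulse rate).
readings = [
    (38.5, 60.0), (41.0, 95.0), (38.7, 62.0),
    (40.8, 98.0), (38.6, 58.0), (41.2, 92.0),
]
centroids, clusters = kmeans(readings, k=2)
```

No class labels are supplied here; the two groups emerge from the readings alone, which is the distinction from classification drawn in the text.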

IV. RELATED WORK

Different data mining techniques have been used in various fields of WSN. This section presents some of the data mining techniques that have been used in different WSN applications.

Mohamed Watfa et al. discussed a new approach to data mining for WSN data in [14]. They presented a new distributed algorithm for query processing in Wireless Sensor Networks based on the Energy Efficient Approach (EEIA). The main goal of the approach is to reduce power consumption by reducing the number of query-related messages in the whole network.

Manisha Rajpoot et al. analyzed the framework of association rule mining for sensor data [11]. They analyzed three mining techniques, namely Positional Lexicographic Tree (PLT), Sensor Pattern Tree (SP-Tree) and Frequent Pattern Tree (FP-Tree), for mining sensor data, and presented a comparative performance analysis of these techniques based on experimentation. The experimental results demonstrate that the CPU time consumed by PLT is more than that of SP-Tree and less than that of FP-Tree. Overall they show that FP-Tree initially consumes high CPU time at low support values and that SP-Tree consumes considerably less CPU time than PLT.

Milica Knezevic et al. gave an overview of selected algorithms for mining in Wireless Sensor Networks and discussed the benefits of integrating agent systems with data mining algorithms in [13]. They proposed a classification of the existing distributed data mining algorithms for WSNs and the possible integration of agent systems with each class of algorithms.

Khushboo Sharma et al. used the Nearest Neighbor Classification technique to classify Wireless Sensor Network data [8]. The primary advantage of this technique is its high classification accuracy. The classification results demonstrate both classification accuracy and classification efficiency. On the basis of experimental investigation they obtained a classification success rate of up to 92.3%.

Xu Cheng et al. proposed a hierarchical distributed classification approach in [18]. In this approach, local classifiers are built by individual sensors and merged along the routing path. The classifiers are iteratively enhanced by combining strategically generated pseudo data and new local data. They demonstrate high classification accuracy with very low storage and communication overhead, and they address the critical issue of heterogeneous data distribution among the sensors.

S. Bandyopadhyay et al. describe a technique for clustering distributed data in sensor network environments in [16]. The proposed technique is based on the principle of the K-means algorithm. Experimental results demonstrate the effectiveness of the K-means clustering algorithm both when the full data is uniformly distributed over the nodes and when it is non-uniformly distributed.

M. Halatchev et al. proposed a new technique called Window Association Rule Mining (WARM) in [10]. In a WSN, a significant amount of the data sent from sensors to processing points may be corrupted or lost; WARM deals with this type of problem.

K. K. Loo et al. proposed Interval-List-Based (ILB), an online mining algorithm for discovering frequent sensor value sets [7]. They compared the performance of ILB against an application of Lossy Counting (LC) using a weighted transformation method. Experimental results demonstrate that ILB outperforms LC significantly for large sensor networks.

2014 International Conference on Computing for Sustainable Global Development (INDIACom)

V. CATTLE HEALTH MONITORING SYSTEM USING WSN

In [1], we proposed a health monitoring and reporting system that uses a WSN architecture. Using this architecture we intended to monitor the health and environmental conditions of animals located in rural areas of the State of Gujarat. In [2] we proposed a distributed data storage model for WSN based cattle health monitoring. We divided this model into two levels, namely a local level and a central level. The main aim of storing data locally is to get a quick response to any query raised by the user. The second level, where the data is centralized, is used for long-term decision making, planning and policy for cattle health monitoring. Once the data is available, it should be analyzed properly.

The animal owner plays an important role in this system. The owner should develop a keen eye for spotting ill animals, and should notice any change in the behavior or appearance of an animal. Based on the symptoms, diseases can be identified as contagious or non-contagious [19], [20]. In cattle, each disease has a set of symptoms, and the set of symptoms represents the disease. It is possible for one symptom to be part of more than one disease. To identify a disease at the time of illness, we should find the set of symptoms. Let us call this set the “Symptom Set”. To classify diseases as contagious or non-contagious we first need to identify them. Assume that we use a coded system to identify the contagious diseases, allocating a code CDi (Contagious Disease i) to each contagious disease. Table I shows some of the diseases with their codes.

TABLE I: Codes of Contagious Diseases

Code    Contagious Disease
CD1     Anthrax
CD2     Black Quarter
CD3     Haemorrhagic Septicaemia
CD4     Mastitis
CD5     Cow Pox
CD6     Foot-and-Mouth Disease
CD7     Rinderpest
CD8     Brucellosis
CD9     Tuberculosis

As with the diseases, we have also coded the symptoms. Table II shows some of the symptoms with their codes CSi (Contagious Symptom i).

TABLE II: Codes of Symptoms

Code    Symptom of Contagious Disease
CS1     High body temperature (fever)
CS2     Pulse rate affected
CS3     Respiration affected
CS4     Extreme pain
CS5     Red eyes
CS6     Swelling (enlargement)
CS7     Damaged udder tissue
CS8     Saliva hanging from mouth
CS9     Hemorrhage (blood loss)
CS10    Icterus (yellowish pigmentation of the skin)
CS11    Discharge from eyes and nostrils
CS12    Foul odor from mouth (smell from mouth)
CS13    Bloody mucoid
CS14    Shivering
CS15    Breathing is difficult
CS16    Dung is stained with blood
CS17    Bloody discharge from mouth
CS18    Off feed
CS19    Unable to stand (deadly lame)
CS20    Blood-tinged discharge from the vagina
CS21    Pain in throat
CS22    Suffocating cough

In cattle, each disease is bound to a set of symptoms. Figure 1 shows the relation between symptoms and the associated diseases. Each column represents a symptom with its code CSi and each row represents a disease with its code CDi. The value 1 at the intersection of a row and a column shows that the symptom is present in the disease, while 0 indicates that the symptom is not present in the disease. For example, the disease Anthrax (CD1) has symptoms such as Fever (CS1), Extreme Pain (CS4), Red Eyes (CS5), Shivering (CS14), Breathing Difficulty (CS15), Dung Stained with Blood (CS16) and Bloody Discharge from Mouth (CS17).
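The 0/1 chart described above can be sketched as a simple data structure. This is only an illustration: the names `SYMPTOM_CODES`, `DISEASE_SYMPTOMS`, `binary_row` and `chart` are hypothetical, and only the Anthrax (CD1) row is filled in because it is the only row spelled out in the text; the remaining rows would have to be taken from Figure 1.

```python
# Column order of the chart: symptom codes CS1..CS22 from Table II.
SYMPTOM_CODES = [f"CS{i}" for i in range(1, 23)]

# Symptom set per disease. Only Anthrax (CD1) is listed in the text;
# other diseases are deliberately left out rather than guessed.
DISEASE_SYMPTOMS = {
    "CD1": {"CS1", "CS4", "CS5", "CS14", "CS15", "CS16", "CS17"},
}


def binary_row(symptoms):
    """Return the 0/1 chart row for one disease, in CS1..CS22 order."""
    return [1 if code in symptoms else 0 for code in SYMPTOM_CODES]


# One row per disease code, as in the symptom versus disease chart.
chart = {disease: binary_row(s) for disease, s in DISEASE_SYMPTOMS.items()}
```

Storing the symptom sets and deriving the binary rows from them keeps the two views consistent when a disease's symptom list changes.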

VI. PROPOSED DATA MINING TECHNIQUE

In animal health monitoring using WSN, massive amounts of data are continuously being collected and stored. We propose Multidimensional Association Rule Mining for such a system. The discovery of interesting association relationships in a huge amount of data can help in making decisions, spotting new diseases and forming policy for monitoring cattle health. Multidimensional association rule mining involves more than one dimension. Methods for mining such rules can be organized according to how they treat quantitative attributes. For multidimensional association rule mining we need to create rules that are based on sets of disease symptoms. The syntax of a rule is as follows:

{Set of Symptoms} → {CDi}

Here CDi is the code of the contagious disease, and Set of Symptoms represents the set of different symptoms. We represent each rule with the code Ri. The syntax of a multidimensional association rule can then be written as:

Ri : (Symptom1 = Yes/No) ^ (Symptom2 = Yes/No) ^ ... → CDi

or

Ri : {CSx, CSy, CSz, ...} → CDi

To illustrate the concept, we have created four sample rules to identify the disease. The symptoms that may occur in the animal are shown in Figure 2.
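The two equivalent rule notations above can be generated from a single representation. The sketch below is an assumption about how one might encode a rule, not the authors' implementation; the class `Rule`, its method names, and the placeholder rule `R0` are hypothetical, and `->` stands in for the implication arrow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # rule code, e.g. "R1"
    symptoms: frozenset  # antecedent: set of symptom codes CSx
    disease: str         # consequent: disease code CDi

    def _ordered(self):
        # Sort codes numerically so that CS4 comes before CS14.
        return sorted(self.symptoms, key=lambda code: int(code[2:]))

    def as_set_form(self):
        """Render as 'Ri: {CSx, CSy, ...} -> CDi'."""
        return f"{self.name}: {{{', '.join(self._ordered())}}} -> {self.disease}"

    def as_predicate_form(self):
        """Render as 'Ri: (CSx=Yes) ^ (CSy=Yes) ^ ... -> CDi'."""
        terms = " ^ ".join(f"({code}=Yes)" for code in self._ordered())
        return f"{self.name}: {terms} -> {self.disease}"


# Placeholder rule used only to show the formatting; it is not a rule
# proposed in the paper and carries no veterinary meaning.
example = Rule("R0", frozenset({"CS4", "CS1"}), "CD1")
set_form = example.as_set_form()
predicate_form = example.as_predicate_form()
```

For the placeholder, `set_form` is `R0: {CS1, CS4} -> CD1` and `predicate_form` is `R0: (CS1=Yes) ^ (CS4=Yes) -> CD1`.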


Fig. 1. Symptom v/s Disease Chart

Fig. 2. Identify Disease Code based on Symptoms

Let X = {CS1, CS2, CS3, CS4, ..., CS21, CS22} be a set of binary attributes called symptoms and Y = {CD1, CD2, CD3, ..., CD9} be a set of diseases. Each disease in Y has a set of symptoms, which is a subset of the attributes in X. A rule is defined as an implication of the form Ri: Xi → Yi, where Xi ⊆ X and Yi ⊆ Y. An example rule to identify the disease CD1 is {CS1, CS4, CS14, CS15, CS16, CS17} → {CD1}. It indicates that if the symptoms seen in cattle are CS1, CS4, CS14, CS15, CS16 and CS17, then the disease should be CD1. Thus,

R1: {CS1, CS4, CS14, CS15, CS16, CS17} → {CD1}

It can also be represented as:

R1: (CS1=Yes) ^ (CS4=Yes) ^ (CS14=Yes) ^ (CS15=Yes) ^ (CS16=Yes) ^ (CS17=Yes) → (CD1)

Similarly,

R2: {CS1, CS6, CS7} → {CD4}
R3: {CS1, CS20} → {CD8}
R4: {CS19, CS22} → {CD9}

Thus rules R2, R3 and R4 are used to identify the diseases Mastitis (CD4), Brucellosis (CD8) and Tuberculosis (CD9) respectively.
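A minimal sketch of how the four sample rules could be applied to an observed symptom set is given below. The matching criterion is an assumption: a rule is taken to fire when every symptom in its antecedent has been observed, which the paper does not state explicitly. The names `RULES`, `DISEASE_NAMES` and `identify` are hypothetical.

```python
# Sample rules R1..R4 from the text: antecedent symptom set -> disease code.
RULES = {
    "R1": ({"CS1", "CS4", "CS14", "CS15", "CS16", "CS17"}, "CD1"),
    "R2": ({"CS1", "CS6", "CS7"}, "CD4"),
    "R3": ({"CS1", "CS20"}, "CD8"),
    "R4": ({"CS19", "CS22"}, "CD9"),
}

DISEASE_NAMES = {
    "CD1": "Anthrax",
    "CD4": "Mastitis",
    "CD8": "Brucellosis",
    "CD9": "Tuberculosis",
}


def identify(observed):
    """Return (rule, disease code) pairs for every rule that fires.

    Assumption: a rule fires when all of its symptoms are observed;
    additional observed symptoms do not block a match.
    """
    observed = set(observed)
    return [
        (name, disease)
        for name, (symptoms, disease) in RULES.items()
        if symptoms <= observed
    ]


# Fever, swelling, damaged udder tissue and off feed: only R2 fires.
matches = identify({"CS1", "CS6", "CS7", "CS18"})
names = [DISEASE_NAMES[disease] for _, disease in matches]
```

Because a symptom such as CS1 appears in several rules, more than one rule can fire for the same animal; the function returns all matches instead of picking one.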

VII. CONCLUSION

In this paper, we have presented the data mining concept along with various data mining techniques for WSN. We have also discussed various cattle diseases and their symptoms. We have investigated and suggested a data mining technique for identifying cattle disease. Implementation of this technique will help the user to identify cattle disease and take appropriate action. We believe that the work presented here will help and inspire researchers to design more robust and efficient data mining techniques to identify disease from cattle health data gathered using WSN.

REFERENCES

[1] Ankit Bhavsar, Harshal Arolkar, “Wireless Sensor Networks: A Possible Solution for Animal Health Issues in Rural Area of Gujarat”, IJECBS, 2012, Vol 2(2), ISSN 2230-8849.

[2] Ankit Bhavsar, Disha Shah, Harshal Arolkar, “Distributed Data Storage Model for Cattle Health Monitoring Using WSN”, ACSIJ, 2013, Vol 2, Issue 2, ISSN 2322-5157.

[3] Daniel T. Larose, “Discovering Knowledge in Data: An Introduction to Data Mining”, Wiley, 2005, ISBN 0-471-66657-2.

[4] G. K. Gupta, “Introduction to Data Mining with Case Studies”, Prentice-Hall of India Private Limited, 2006, ISBN 81-203-3053-6.

[5] Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, 2006, ISBN-13 978-1-55860-901-3.

[6] Jyoti Soni, Ujam Ansari, Dipesh Sharma, Sunia Soni, “Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction”, International Journal of Computer Applications, 2011, Vol 17, No 8, pp. 43-48, ISSN 0975-8887.

[7] K. K. Loo, I. Tong, B. Kao, D. Cheung, “Online Algorithms for Mining Inter-Stream Associations from Large Sensor Networks”, Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2005, pp. 291-302.

[8] Khushboo Sharma, Manisha Rajpoot, Lokesh Kumar Sharma, “Nearest Neighbour Classification for Wireless Sensor Network Data”, International Journal of Computer Trends and Technology, 2011, Vol 2, Issue 2, pp. 41-43, ISSN 2231-2803.

[9] Krzysztof J. Cios, G. William Moore, “Uniqueness of Medical Data Mining”, Artificial Intelligence in Medicine, Elsevier, 2002, Vol 26, ISSN 0933-3657.


[10] M. Halatchev, L. Gruenwald, “Estimating Missing Values in Related Sensor Data Streams”, 11th International Conference on Management of Data, 2005, pp. 83-94.

[11] Manisha Rajpoot, Lokesh Kumar Sharma, “Comparative Study of Association Rule Mining for Sensor Data”, International Journal of Computer Applications, 2011, Vol 19, No 1, pp. 34-36, ISSN 0975-8887.

[12] Michael J. A. Berry, Gordon S. Linoff, “Data Mining Techniques”, Wiley, 2003, ISBN 81-265-0517-6.

[13] Milica Knezevic, Nenad Mitic, Zoran Ognjanovic, Veljko Milutinovic, “Agent Based Data Mining in Wireless Sensor Networks: A Survey”, 2nd International Conference on Information Society Technology, 2012.

[14] Mohamed Watfa, William Dahar, Hishma Al Azar, “A Sensor Network Data Aggregation Technique”, International Journal of Computer Theory and Engineering, 2009, Vol 1, No 7, pp. 19-26, ISSN 1793-8201.

[15] Rafael S. Parpinelli, Heitor S. Lopes, Alex A. Freitas, “An Ant Colony Based System for Data Mining: Application to Medical Data”, 2009.

[16] S. Bandyopadhyay, C. Giannella, U. Maulik, H. Kargupta, K. Liu, S. Datta, “Clustering Distributed Data Streams in Peer-to-Peer Environments”, Information Sciences, 2006, Vol 176, Issue 14, pp. 1952-1985, ISSN 0020-0255.

[17] Xindong Wu, Vipin Kumar, J. Ross Quinlan, “Top 10 Algorithms in Data Mining”, Springer, 2007, ISBN 978-0387359755.

[18] Xu Cheng, J. Xu, J. Pei, J. Liu, “Hierarchical Distributed Data Classification in Wireless Sensor Networks”, Computer Communications, 2010, Vol 33, Issue 12, pp. 1404-1413, ISSN 0140-3664.

Web References

[19] “Cattle Diseases, Animal Husbandry- Cattle: CAS – 6”, http://ebookbrowse.com/ca/cattle-disease.

[20] “Physical Examination of a Dairy Cow”, www.mosesorganic.org.

[21] “Data Mining in Health Care: Current Application and Issues”, http://book.download4.org/DATA-MINING-IN-HEALTHCARE-CURRENT-APPLICATIONS-AND-ISSUES-pdf-e311.pdf
