Data mining applications in agriculture (iasri.res.in/ebook/ebadat/6-other useful techniques/17-data...)

DATA MINING APPLICATIONS IN AGRICULTURE

Prof. Navneet Goyal, Department of Computer Science & Information Systems, BITS, Pilani.

2/23/2007 Dr. Navneet Goyal, BITS, Pilani

Agricultural Applications

- Mushroom Grading
- Apple Pest Management (PICO)
- Apple Proliferation Disease
- Soil Salinity
- Integrated Production in Agriculture
- Pesticide Abuse
- Precision Agriculture
- Drought Risk Management

Mushroom Grading

- Classification problem
- Develop a classification system for quality grading of mushrooms that achieves an accuracy similar to that of human inspectors
- Details can be found in Kusabs et al., 1998.

Mushroom Grading: Building the Model

- Data preprocessing
  - Cleansing of raw data
  - Construction of the test data set in collaboration with agricultural researchers
  - Dataset contains descriptions of 282 mushrooms
  - Objective & subjective measures

Mushroom Grading: Building the Model

- Objective measures
  - Weight
  - Firmness
  - Percentage of cap opening
- Subjective measures: Likert-scale estimates of the degree of dirt, stalk damage bruising, shrivel, bacterial blotch and P. gingeri

Mushroom Grading: Building the Model

- Three inspectors independently graded the mushrooms using the three broad commercial grades (1st, 2nd and 3rd grade).
- Digital images were captured for the 282 mushrooms.
- 60 image-based attributes: frequency bin values (0–4) from the analysis of Red, Green and Blue (R, G, B) and Hue, Saturation and Value (H, S, V) histograms for top (t) and bottom (b) images of the sample mushrooms.
- Dimensionality reduction: many of the 68 attributes were eliminated.
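The count of 60 image attributes is consistent with 2 views (t, b) × 6 channels (R, G, B, H, S, V) × 5 bins (0–4). The Python sketch below shows one way such attributes could be computed. It assumes pixel values already normalized to [0, 1], equal-width bins and relative frequencies, none of which the slides specify; the attribute names (e.g. `Rt0`) are hypothetical.

```python
import colorsys

N_BINS = 5  # bins indexed 0-4; equal-width binning is an assumption
CHANNELS = ("R", "G", "B", "H", "S", "V")


def bin_index(value, n_bins=N_BINS):
    """Map a channel value in [0, 1] to an equal-width bin index 0..n_bins-1."""
    return min(int(value * n_bins), n_bins - 1)


def channel_histograms(pixels):
    """Relative-frequency histograms of R, G, B, H, S, V for one image.

    `pixels` is a list of (r, g, b) tuples with components in [0, 1].
    """
    counts = {name: [0] * N_BINS for name in CHANNELS}
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)  # returns values in [0, 1]
        for name, value in zip(CHANNELS, (r, g, b, h, s, v)):
            counts[name][bin_index(value)] += 1
    total = len(pixels)
    return {name: [c / total for c in bins] for name, bins in counts.items()}


def image_attributes(top_pixels, bottom_pixels):
    """Flatten top (t) and bottom (b) histograms into 2 x 6 x 5 = 60 attributes."""
    attrs = {}
    for view, pixels in (("t", top_pixels), ("b", bottom_pixels)):
        for name, freqs in channel_histograms(pixels).items():
            for i, freq in enumerate(freqs):
                attrs[f"{name}{view}{i}"] = freq  # hypothetical name, e.g. "Rt0"
    return attrs


# Toy pixels for illustration only (not data from the study).
top = [(0.9, 0.85, 0.8), (0.7, 0.6, 0.5)]
bottom = [(0.4, 0.35, 0.3), (0.4, 0.35, 0.3)]
attrs = image_attributes(top, bottom)
print(len(attrs))  # 60
```

Each channel's five frequencies sum to 1, so the attributes describe colour distribution independently of image size.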

Mushroom Grading

Figure taken from “Developing innovative applications in agriculture using data mining” by Sally Jo Cunningham and Geoffrey Holmes

Mushroom Grading

- A separate model was developed for each of the three inspectors.
- The models suggested that each inspector used a different combination of attributes when assigning grades to mushrooms.
- All predictive models used attributes from both top and bottom images.
- Only inspector 2 used weight for the classification.

Mushroom Grading

- Subjective measurements did not increase the accuracy of any of the prediction models, so they were removed by the wrapper technique.
- Each final model used between 4 and 7 attributes.
- The average accuracy of these models was comparable with that of the human experts.
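A wrapper method scores candidate attribute subsets by training and evaluating the learning algorithm itself, rather than by a separate filter statistic. The slides do not say which learner or search strategy was used, so the Python sketch below is only an illustration: it assumes greedy forward selection, a 1-nearest-neighbour classifier as a stand-in learner, and leave-one-out accuracy as the score. The toy attributes `weight` and `dirt` are hypothetical.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `features`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = float("inf"), None
        for j, other in enumerate(rows):
            if i == j:
                continue  # hold out the record being classified
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_wrapper(rows, labels, candidates):
    """Greedily add the attribute that most improves accuracy; stop when none does."""
    selected, best_acc = [], 0.0
    while True:
        best_feature = None
        for f in candidates:
            if f in selected:
                continue
            acc = loo_accuracy(rows, labels, selected + [f])
            if acc > best_acc:
                best_acc, best_feature = acc, f
        if best_feature is None:
            return selected, best_acc
        selected.append(best_feature)


# Toy data: "weight" separates the two grades, "dirt" does not.
rows = [
    {"weight": 10, "dirt": 3},
    {"weight": 11, "dirt": 1},
    {"weight": 20, "dirt": 3},
    {"weight": 21, "dirt": 1},
]
labels = [1, 1, 2, 2]
print(forward_wrapper(rows, labels, ["dirt", "weight"]))  # (['weight'], 1.0)
```

Here `dirt` plays the role of an attribute that adds no accuracy and is therefore never selected, which mirrors the effect the slide describes for the subjective measures.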

Mushroom Grading

- Results indicate that visual attributes, which can be extracted from digital images, are sufficient for mushroom grading.
- Subjective attributes, commonly believed to play a crucial role in grading, are apparently irrelevant to the task.
- A surprising bit of ‘mined’ information

Mushroom Grading

- The mined information echoes the conclusions of a classic ML paper (Michalski & Chilausky, 1980).
- That paper induced a set of rules for diagnosing soybean disease.
- The rules were strikingly dissimilar to expert opinions on the correct diagnosis procedure.
- But the rules were so accurate that one expert adopted the ‘discovered’ rules in place of his own.

Mushroom Grading

- Mined information can provide insights into the domain being studied that may run counter to the received wisdom of a field.
- Locating these surprising or unusual portions of the model can be the focus of a data mining analysis, so that the results can be applied back in the domain from which the data was drawn.
- In this case, the results indicate that the subjective attributes for mushroom grading may not be useful in practice, so perhaps they need not be measured or recorded.
- The classification model may prove useful in developing more objective standards for quality grading and market pricing of mushrooms.

Pesticide Abuse*

- Recent studies by agricultural researchers in Pakistan have shown that attempts to maximize crop yield through pro-pesticide state policies have led to dangerously high pesticide usage.
- These studies have reported a negative correlation between pesticide usage and crop yield in Pakistan.
- Excessive use (or abuse) of pesticides is harming farmers, with adverse financial, environmental and social impacts.

* Based on the paper “Learning Dynamics of Pesticide Abuse through Data Mining” by Ahsan Abdullah, Stephen Brobst, Ijaz Pervaiz, The Australasian Workshop on Data Mining and Web Intelligence (AWDM&WI2004), Dunedin, New Zealand. Conferences in Research and Practice in Information Technology, Vol. 32. Editors: James Hogan, Paul Montague, Martin Purvis and Chris Steketee.

Pesticide Abuse

- Data mining of integrated agricultural data, including pest scouting, pesticide usage and meteorological recordings, is useful for optimizing (and reducing) pesticide usage.
- This data is clustered using the Recursive Noise Removal (RNR) heuristic of Abdullah and Brobst (2003).
- The clusters reveal interesting patterns of farmer practices along with pesticide usage dynamics, and hence help identify the reasons for this pesticide abuse.

Pesticide Abuse

Data Collection

- Pest scouting data (past pest situation)
- Pesticide usage data
- Farmer demographics

- Pest scouting is a systematic field sampling process that provides field-specific information on pest pressure and crop injury.
- This data was obtained from the Directorate of Pest Warning and Quality Control of Pesticides (DPWQCP), Government of Punjab.
- Since 1984 the directorate has been collecting and recording pest scouting data on a weekly basis, mostly from 1,800 random locations.
- The province of Punjab was selected for this study because it is a major producer of cotton (Federal Bureau of Statistics 2002).

Data Collection

- Pest scouting data by itself is a “gold mine” of data. Coupling it with pesticide usage and meteorological data can provide excellent insight into the dynamics of past situations and their outcomes.
- In Pakistan, pest scouting data had never been digitized, so until now it was impossible for any researcher to use it for in-depth analysis.
- As a pilot project, we implemented a data warehouse using two years of pest scouting, pesticide usage and meteorological data, comprising 200 typed sheets, with each record consisting of 40 attributes.

Data Collection

- The data warehouse was implemented after digitization, cleansing and integration of data generated by multiple disparate sources.
- In the first phase of implementation we covered only district Multan, one of the thirty-four districts of the Province of Punjab.
- District Multan is the hub of cotton production and cotton-related activities in the Province (Federal Bureau of Statistics 2002).
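At its simplest, integrating the sources means joining records on a shared key. The Python sketch below attaches daily weather to pest scouting records by date. The field names (`date`, `jassid`, `t_min`, `t_max`, `humidity`) and the toy values are hypothetical, keying on date alone is an assumption, and the actual warehouse schema is not described in the slides.

```python
from datetime import date


def integrate(scouting, weather):
    """Inner-join pest scouting records with daily weather on the date field.

    scouting: list of dicts, each with a "date" key plus pest observations.
    weather:  dict mapping a date to {"t_min": ..., "t_max": ..., "humidity": ...}.
    Records with no weather entry for their date are dropped.
    """
    merged = []
    for record in scouting:
        wx = weather.get(record["date"])
        if wx is None:
            continue
        merged.append({**record, **wx})  # record fields first, then weather fields
    return merged


# Toy values for illustration only.
scouting = [
    {"date": date(2001, 7, 2), "jassid": 1.2},
    {"date": date(2001, 7, 9), "jassid": 0.4},
]
weather = {date(2001, 7, 2): {"t_min": 27.0, "t_max": 38.0, "humidity": 72.0}}
print(integrate(scouting, weather))
```

An inner join silently drops unmatched records; during cleansing it is usually worth counting or logging them rather than losing them unnoticed.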

Objective

- Finding the conditions under which pesticide usage will be optimal.
- A typical question from a cotton grower would be “Which pesticide should be used, and when?”
- These questions were modeled by looking for patterns and relationships between pest populations and meteorological data elements, and by trying to find (if possible) the temperature and humidity thresholds at which the population of a certain pest booms (or declines).

Approach

- 20 records were chosen at random for the year 2001, noting the populations of cotton pests such as Jassid (Amrasca), Thrips (Thrips tabaci) and Spotted Boll Worm (Earias vittella).
- For each record, the min and max temperature and % humidity were retrieved from the daily weather database.
- This gave a 20×21 table, from which a 20×20 similarity matrix was built by calculating pairwise Pearson’s correlation.
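The similarity step can be sketched in Python as below, assuming each record is treated as a vector of its column values and Pearson's correlation is computed between every pair of records (rows). The toy vectors are hypothetical, and whether the original columns were normalized before correlating is not stated in the slides.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # undefined for a constant vector; 0.0 is an arbitrary choice here
    return cov / (sd_x * sd_y)


def similarity_matrix(records):
    """n x n matrix of pairwise Pearson correlations between records (rows)."""
    return [[pearson(r, s) for s in records] for r in records]


# Toy records (hypothetical columns): 3 records here; the study used 20.
records = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 2.0, 1.0],
]
for row in similarity_matrix(records):
    print([round(v, 2) for v in row])
```

In the toy data the second record is a multiple of the first, so they correlate perfectly, while the third is perfectly anti-correlated with both. The matrix is symmetric with ones on the diagonal, which is the input a clustering heuristic would then work from.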

Findings

Figure: Clusters identified by RNR (C1, C2)

Findings

Findings

- If Temp > 29 AND Humidity > 70%, then high pest incidence.
- If Temp < 27 AND Humidity < 67%, then low pest incidence.
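The two rules can be written as a small Python function and used to count the records that match either one. The slides do not say whether "Temp" is the minimum, maximum or mean temperature, nor its unit; the field names and toy records below are hypothetical.

```python
def pest_incidence(temp, humidity):
    """Apply the two threshold rules; return "high", "low", or None if neither fires."""
    if temp > 29 and humidity > 70:
        return "high"
    if temp < 27 and humidity < 67:
        return "low"
    return None


# Toy records for illustration only (not data from the study).
records = [
    {"temp": 31.0, "humidity": 75.0},
    {"temp": 25.5, "humidity": 60.0},
    {"temp": 28.0, "humidity": 68.0},
]
matching = [r for r in records if pest_incidence(r["temp"], r["humidity"]) is not None]
print(len(matching), "of", len(records), "records match a rule")  # 2 of 3 records match a rule
```

Records that fall between the thresholds match neither rule, so a rule check like this retrieves only a subset of the data.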

Findings

- Checking these rules against the data (325 matching records retrieved out of 2,000+) shows very promising results, as shown in the figure.
- This experiment makes a credible case that common farmer questions can be modeled through this data mining technique, and that answers can be given, based on evidence present in the data, before a pest attack occurs.