weka presentation

Download Weka presentation

If you can't read please download the document

Upload: saeed-iqbal

Post on 17-May-2015

5.274 views

Category:

Technology


6 download

TRANSCRIPT

  • 1. An Introduction to WEKASaeed Iqbal

2. ContentWhat is WEKA?Data set in WEKAThe Explorer:Preprocess dataClassificationClusteringAssociation RulesAttribute SelectionData VisualizationReferences and Resources2 01/07/13 3. What is WEKA?Waikato Environment for Knowledge AnalysisIts a data mining/machine learning tool developed by Department of Computer Science, University of Waikato, New Zealand.Weka is a collection of machine learning algorithms for data mining tasks.Weka is open source software issued under the GNU General Public License.301/07/13 4. Download and Install WEKAWebsite: http://www.cs.waikato.ac.nz/~ml/weka/index.htmlSupport multiple platforms (written in java): Windows, Mac OS X and LinuxWeka Manual:http://transact.dl.sourceforge.net/sourcefor ge/weka/WekaManual-3.6.0.pdf401/07/13 5. Main Features49 data preprocessing tools76 classification/regression algorithms8 clustering algorithms3 algorithms for finding association rules15 attribute/subset evaluators + 10 search algorithmsfor feature selection501/07/13 6. Main GUI Three graphical user interfacesThe Explorer (exploratory data analysis)The Experimenter (experimental environment)The KnowledgeFlow (new process model inspired interface)Simple CLI (Command prompt) Offers some functionality not available via the GUI601/07/13 7. Datasets in WekaEach entry in a dataset is an instance of the java class:weka.core.InstanceEach instance consists of a number of attributesNominal: one of a predefined list of values e.g. red, green, blueNumeric: A real or integer numberString: Enclosed in double quotesDateRelational 8. ARFF FilesWeka wants its input data in ARFF format.A dataset has to start with a declaration of its name: @relation name@attribute attribute_name specification If an attribute is nominal, specification contains a list of the possibleattribute values in curly brackets: @attribute nominal_attribute {first_value, second_value,third_value} If an attribute is numeric, specification is replaced by the keywordnumeric: (Integer values are treated as real numbers in WEKA.) @attribute numeric_attribute numericAfter the attribute declarations, the actual data is introduced bya tag: @data 9. ARFF File@relation weather@attribute outlook { sunny, overcast, rainy }@attribute temperature numeric@attribute humidity numeric@attribute windy { TRUE, FALSE }@attribute play { yes, no }@datasunny, 85, 85, FALSE, nosunny, 80, 90, TRUE, noovercast, 83, 86, FALSE, yesrainy, 70, 96, FALSE, yesrainy, 68, 80, FALSE, yesrainy, 65, 70, TRUE, noovercast, 64, 65, TRUE, yessunny, 72, 95, FALSE, nosunny, 69, 70, FALSE, yesrainy, 75, 80, FALSE, yessunny, 75, 70, TRUE, yesovercast, 72, 90, TRUE, yesovercast, 81, 75, FALSE, yesrainy, 71, 91, TRUE, no 10. WEKA: ExplorerPreprocess: Choose and modify the data being acted on.Classify: Train and test learning schemes that classify or perform regression.Cluster: Learn clusters for the data.Associate: Learn association rules for the data.Select attributes: Select the most relevant attributes in the data.Visualize: View an interactive 2D plot of the data. 11. Content What is WEKA? Data set in WEKA The Explorer: Preprocess data Classification Clustering Association Rules Attribute Selection Data Visualization References and Resources11 01/07/13 12. Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF,CSV, C4.5, binary Data can also be read from a URL or from an SQL database(using JDBC) Pre-processing tools in WEKA are called filters WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, 1201/07/13 13. WEKA only deals with flat files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ...1301/07/13 14. WEKA only deals with flat files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ...1401/07/13 15. load filteranalyze15University of Waikato 01/07/13 16. 16 University of Waikato 01/07/13 17. 17 University of Waikato 01/07/13 18. 18 University of Waikato 01/07/13 19. 19 University of Waikato 01/07/13 20. 20 University of Waikato 01/07/13 21. 21 University of Waikato 01/07/13 22. 22 University of Waikato 01/07/13 23. 23 University of Waikato 01/07/13 24. 24 University of Waikato 01/07/13 25. 25 University of Waikato 01/07/13 26. 26 University of Waikato 01/07/13 27. 27 University of Waikato 01/07/13 28. 28 University of Waikato 01/07/13 29. 29 University of Waikato 01/07/13 30. 30 University of Waikato 01/07/13 31. 31 University of Waikato 01/07/13 32. 32 University of Waikato 01/07/13 33. 33 University of Waikato 01/07/13 34. 34 University of Waikato 01/07/13 35. 35 University of Waikato 01/07/13 36. Explorer: building classifiers Classifiers in WEKA are models for predicting nominal ornumeric quantities Implemented learning schemes include: Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, 3601/07/13 37. Decision Tree Induction: Training Dataset ageincome student credit_rating buys_computer 40 low yes fair yesID3 (Playing >40 low yes excellentno 3140 low yes excellentyesTennis)