
Page 1:

Artificial Intelligence Project #3: Analysis of Decision Tree Learning Using WEKA

May 23, 2006

Page 2:

Introduction

Decision tree learning is a method for approximating discrete-valued target functions.

The learned function is represented by a decision tree.

A decision tree can also be re-represented as a set of if-then rules to improve human readability.

Page 3:

An Example of a Decision Tree

Page 4:

Decision Tree Representation (1/2)

A decision tree classifies instances by sorting them down the tree from the root to some leaf node.

Node: specifies a test of some attribute of the instance

Branch: corresponds to one of the possible values for this attribute
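As an aside (not part of the original slides), the following minimal Java sketch shows how this sorting procedure can be expressed: an internal node tests one attribute and recurses into the branch matching the instance's value, while a leaf returns a class label. The class and field names here are hypothetical.

```java
import java.util.Map;

// Minimal sketch of a decision tree node (hypothetical names).
class TreeNode {
    String attribute;               // attribute tested at this node; null for a leaf
    String label;                   // class label, used only at leaves
    Map<String, TreeNode> branches; // one child per possible attribute value

    boolean isLeaf() { return attribute == null; }

    // Classify an instance (attribute -> value map) by sorting it
    // from this node down to a leaf and returning the leaf's label.
    String classify(Map<String, String> instance) {
        if (isLeaf()) {
            return label;
        }
        String value = instance.get(attribute);
        return branches.get(value).classify(instance);
    }
}
```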

Page 5:

Decision Tree Representation (2/2)

Each path corresponds to a conjunction of attribute tests. For example, the instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong) is sorted down the path where (Outlook=Sunny ∧ Humidity=High) holds, so it is classified as No.

Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances:

(Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)

[Figure: decision tree with root node Outlook (branches Sunny, Overcast, Rain); the Sunny branch tests Humidity (High → No, Normal → Yes), Overcast leads directly to Yes, and the Rain branch tests Wind (Strong → No, Weak → Yes).]

What is the merit of tree representation?

Page 6:

Appropriate Problems for Decision Tree Learning

Instances are represented by attribute-value pairs

The target function has discrete output values

Disjunctive descriptions may be required

The training data may contain errors

Both errors in classification of the training examples and errors in the attribute values

The training data may contain missing attribute values

Suitable for classification

Page 7:

Study

Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 35, 2003.

60 leukemia patients

Bone marrow samples

Affymetrix GeneChip arrays

Gene expression data

Page 8:

Gene Expression Data

# of data examples: 120 (60 before treatment, 60 after treatment)

# of genes measured: 12600 (Affymetrix HG-U95A array)

Task: classification between “before treatment” and “after treatment” based on gene expression pattern

Page 9:

Affymetrix GeneChip Arrays

Use short oligos to detect gene expression levels. Each gene is probed by a set of short oligos. Each gene's expression level is summarized by:

Signal: a numerical value describing the abundance of mRNA

A/P call: denotes the statistical significance of the signal

Page 10:

Preprocessing

Remove the genes having more than 60 ‘A’ calls (# of genes: 12600 → 3190)

Discretization of gene expression levels. Criterion: the median gene expression value of each sample; each value becomes 0 (low) or 1 (high), as sketched below.
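A rough Java sketch of these two preprocessing steps, assuming the raw data is held as a signal matrix plus a matching matrix of A/P/M calls (the data layout and all names are assumptions, not from the original slides). The thresholds follow the slide: a gene is dropped if it has more than 60 ‘A’ calls, and each sample is discretized against its own median.

```java
import java.util.Arrays;

// Sketch of the preprocessing described above (hypothetical data layout):
// signals[i][j] = expression signal of gene j in sample i,
// calls[i][j]   = 'A' (absent), 'P' (present) or 'M' (marginal).
public class Preprocess {

    // Keep a gene only if it has at most 60 'A' calls across all samples.
    static boolean keepGene(char[][] calls, int gene) {
        int absent = 0;
        for (char[] sample : calls) {
            if (sample[gene] == 'A') {
                absent++;
            }
        }
        return absent <= 60;
    }

    // Discretize one sample: values above the sample's median become 1 (high),
    // the rest become 0 (low). The upper median is used for simplicity.
    static int[] discretize(double[] sample) {
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        double median = sorted[sorted.length / 2];
        int[] out = new int[sample.length];
        for (int j = 0; j < sample.length; j++) {
            out[j] = sample[j] > median ? 1 : 0;
        }
        return out;
    }
}
```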

Page 11:

Gene Filtering

Using mutual information

Estimated probabilities were used. # of genes: 3190 → 1000

Final dataset: # of attributes: 1001 (1000 genes + one for the class)

Class: 0 (after treatment), 1 (before treatment)

# of data examples: 120

Mutual information: I(G; C) = Σ_{G,C} P(G, C) log [ P(G, C) / ( P(G) P(C) ) ]
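As an illustration of this filtering criterion (not part of the original slides), the sketch below estimates I(G; C) for one discretized gene against the binary class from empirical counts; genes would then be ranked by this score and the top 1000 kept. The method and variable names are hypothetical.

```java
// Sketch: estimate I(G; C) for a single discretized gene (values 0/1)
// against the binary class (values 0/1), using empirical probabilities.
public class GeneFilter {

    static double mutualInformation(int[] gene, int[] cls) {
        int n = gene.length;
        double[][] joint = new double[2][2]; // P(G = g, C = c)
        double[] pg = new double[2];         // P(G = g)
        double[] pc = new double[2];         // P(C = c)
        for (int i = 0; i < n; i++) {
            joint[gene[i]][cls[i]] += 1.0 / n;
            pg[gene[i]] += 1.0 / n;
            pc[cls[i]] += 1.0 / n;
        }
        double mi = 0.0;
        for (int g = 0; g < 2; g++) {
            for (int c = 0; c < 2; c++) {
                if (joint[g][c] > 0) {
                    mi += joint[g][c] * Math.log(joint[g][c] / (pg[g] * pc[c]));
                }
            }
        }
        return mi; // in nats; divide by Math.log(2) for bits
    }
}
```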

Page 12:

Final Dataset

[Figure: the final data matrix — 120 examples × 1000 gene attributes.]

Page 13:

Materials for the Project

Given: preprocessed microarray data file data2.txt

Downloadable: WEKA (http://www.cs.waikato.ac.nz/ml/weka/)

Page 14:

Analysis of Decision Tree Learning

Page 15:

Analysis of Decision Tree Learning

Page 16:

Submission

Due date: June 15 (Thu.), 12:00 (noon). Report: hard copy (301-419) & e-mail.

Use ID3, J48, and one other decision tree algorithm that has a learning parameter. Show the experimental results of each algorithm. Except for ID3, try to obtain better performance by changing the learning parameter. Analyze what makes the difference between the selected algorithms. E-mail: [email protected]
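For reference, here is a sketch of how ID3 and J48 might be run through WEKA's Java API with 10-fold cross-validation, varying J48's pruning confidence factor as the learning parameter. The file name data2.arff, the prior conversion of data2.txt to ARFF, and the class being the last attribute are assumptions; the same experiments can also be run from the WEKA Explorer GUI.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.Id3;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RunTrees {
    public static void main(String[] args) throws Exception {
        // data2.txt converted to ARFF beforehand (assumed file name).
        Instances data = DataSource.read("data2.arff");
        data.setClassIndex(data.numAttributes() - 1); // class assumed to be the last attribute

        // ID3: no pruning parameter to tune.
        Id3 id3 = new Id3();
        Evaluation evalId3 = new Evaluation(data);
        evalId3.crossValidateModel(id3, data, 10, new Random(1));
        System.out.println("ID3 accuracy: " + evalId3.pctCorrect() + "%");

        // J48: vary the pruning confidence factor as the learning parameter.
        for (float c : new float[] {0.1f, 0.25f, 0.5f}) {
            J48 j48 = new J48();
            j48.setConfidenceFactor(c);
            Evaluation evalJ48 = new Evaluation(data);
            evalJ48.crossValidateModel(j48, data, 10, new Random(1));
            System.out.println("J48 (C = " + c + ") accuracy: " + evalJ48.pctCorrect() + "%");
        }
    }
}
```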