Machine Learning
Reading: Chapter 18
Machine Learning and AI
Improve task performance through observation, teaching
Acquire knowledge automatically for use in a task
Learning as a key component in intelligence
What does it mean to learn?
Different kinds of Learning
Rote learning
Learning from instruction
Learning by analogy
Learning from observation and discovery
Learning from examples
Inductive Learning
Input: example pairs (x, f(x))
Output: a function h that approximates f
A good hypothesis, h, is a generalization or learned rule
How do systems learn?
Supervised
Unsupervised
Reinforcement
Three Types of Learning
Rule induction, e.g., decision trees
Knowledge-based, e.g., using a domain theory
Statistical, e.g., Naïve Bayes, nearest neighbor, support vector machines
Applications
Language/speech: machine translation, summarization, grammars
IR: text categorization, relevance feedback
Medical: assessment of illness severity
Vision: face recognition, digit recognition, outdoor scene recognition
Security: intrusion detection, network traffic, credit fraud
Social networks: email traffic
To think about: applications to systems, computer engineering, software?
Language Tasks
Text summarization
  Task: given a document, which sentences could serve as the summary?
  Training data: summary + document pairs
  Output: rules that extract sentences given an unseen document
Grammar induction
  Task: produce a tree representing syntactic structure given a sentence
  Training data: a set of sentences annotated with parse trees
  Output: rules that generate a parse tree given an unseen sentence
IR Task
Text categorization (http://www.yahoo.com)
  Task: given a web page, is it news or not?
    Binary classification (yes, no)
  Or: classify as one of business & economy, news & media, computers
  Training data: documents labeled with category
  Output: a yes/no response for a new document; or a category for a new document
Medical
Task: does a patient have heart disease (on a scale from 0 to 4)?
Training data:
  Age, sex, cholesterol, chest pain location, chest pain type, resting blood pressure, smoker?, fasting blood sugar, etc.
  Characterization of heart disease (0, 1-4)
Output: given a new patient, a classification by disease
General Approach
Formulate task
Prior model (parameters, structure)
Obtain data
What representation should be used? (attribute/value pairs)
Annotate data
Learn/refine model with data (training)
Use model for classification or prediction on unseen data (testing)
Measure accuracy
Issues
Representation
  How to map from a representation in the domain to a representation used for learning?
Training data
  How can training data be acquired?
Amount of training data
  How well does the algorithm do as we vary the amount of data?
Which attributes influence learning most?
Does the learning algorithm provide insight into the generalizations made?
Classification Learning
Input: a set of attributes and values
Output: a discrete-valued function
  Learning a continuous-valued function is called regression
Binary or boolean classification: the category is either true or false
Learning Decision Trees
Each node tests the value of an input attribute
Branches from the node correspond to possible values of the attribute
Leaf nodes supply the values to be returned if that leaf is reached
Example
Iris Plant Database (http://www.ics.uci.edu/~mlearn/MLSummary.html)
Which of 3 classes is a given Iris plant?
  Iris Setosa
  Iris Versicolour
  Iris Virginica
Attributes
  Sepal length in cm
  Sepal width in cm
  Petal length in cm
  Petal width in cm
Summary Statistics:

| | Min | Max | Mean | SD | Class correlation |
|---|---|---|---|---|---|
| sepal length | 4.3 | 7.9 | 5.84 | 0.83 | 0.7826 |
| sepal width | 2.0 | 4.4 | 3.05 | 0.43 | -0.4194 |
| petal length | 1.0 | 6.9 | 3.76 | 1.76 | 0.9490 (high!) |
| petal width | 0.1 | 2.5 | 1.20 | 0.76 | 0.9565 (high!) |
Rules to learn:
If sepal length > 6 and sepal width > 3.8 and petal length < 2.5 and petal width < 1.5, then class = Iris Setosa
If sepal length > 5 and sepal width > 3 and petal length > 5.5 and petal width > 2, then class = Iris Versicolour
If sepal length < 5 and sepal width > 3 and petal length ≥ 2.5 and ≤ 5.5 and petal width ≥ 1.5 and ≤ 2, then class = Iris Virginica
| Data | S-length | S-width | P-length | P-width | Class |
|---|---|---|---|---|---|
| 1 | 6.8 | 3 | 6.3 | 2.3 | Versicolour |
| 2 | 7 | 3.9 | 2.4 | 2.2 | Setosa |
| 3 | 2 | 3 | 2.6 | 1.7 | Virginica |
| 4 | 3 | 3.4 | 2.5 | 1.1 | Virginica |
| 5 | 5.5 | 3.6 | 6.8 | 2.4 | Versicolour |
| 6 | 7.7 | 4.1 | 1.2 | 1.4 | Setosa |
| 7 | 6.3 | 4.3 | 1.6 | 1.2 | Setosa |
| 8 | 1 | 3.7 | 2.8 | 2.2 | Virginica |
| 9 | 6 | 4.2 | 5.6 | 2.1 | Versicolour |
The same nine examples, with the P-width attribute dropped:

| Data | S-length | S-width | P-length | Class |
|---|---|---|---|---|
| 1 | 6.8 | 3 | 6.3 | Versicolour |
| 2 | 7 | 3.9 | 2.4 | Setosa |
| 3 | 2 | 3 | 2.6 | Virginica |
| 4 | 3 | 3.4 | 2.5 | Virginica |
| 5 | 5.5 | 3.6 | 6.8 | Versicolour |
| 6 | 7.7 | 4.1 | 1.2 | Setosa |
| 7 | 6.3 | 4.3 | 1.6 | Setosa |
| 8 | 1 | 3.7 | 2.8 | Virginica |
| 9 | 6 | 4.2 | 5.6 | Versicolour |
Constructing the Decision Tree
Goal: Find the smallest decision tree consistent with the examples
Find the attribute that best splits the examples
Form a tree with root = best attribute
For each value vi (or range) of the best attribute:
  Select those examples with best = vi
  Construct subtree_i by recursively calling decision tree with that subset of examples and all attributes except best
  Add a branch to the tree with label = vi and subtree = subtree_i
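The recursive procedure above can be sketched in Python (a minimal version assuming categorical attribute values; examples are dicts with a "class" key — the function and key names are illustrative, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """H of the empirical class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def remainder(examples, attr):
    """Expected entropy after splitting on attr."""
    by_value = {}
    for ex in examples:
        by_value.setdefault(ex[attr], []).append(ex["class"])
    n = len(examples)
    return sum(len(ls) / n * entropy(ls) for ls in by_value.values())

def build_tree(examples, attributes):
    classes = [ex["class"] for ex in examples]
    if len(set(classes)) == 1 or not attributes:
        return Counter(classes).most_common(1)[0][0]  # leaf: majority class
    # minimum remainder = maximum information gain (parent entropy is constant)
    best = min(attributes, key=lambda a: remainder(examples, a))
    branches = {}
    for v in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == v]
        rest = [a for a in attributes if a != best]
        branches[v] = build_tree(subset, rest)  # recurse without best
    return {"attr": best, "branches": branches}
```

Numeric attributes such as the Iris measurements would need an extra discretization step (the "or range" case above) before this sketch applies.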
Construct example decision tree
Tree and Rules Learned
S-length
  < 5.5: examples 3, 4, 8
  ≥ 5.5: P-length
    ≥ 5.6: examples 1, 5, 9
    ≤ 2.4: examples 2, 6, 7

If S-length < 5.5, then Virginica
If S-length ≥ 5.5 and P-length ≥ 5.6, then Versicolour
If S-length ≥ 5.5 and P-length ≤ 2.4, then Setosa
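Read as a classifier, the learned tree uses only two of the four attributes (a toy illustration; note the P-length gap between 2.4 and 5.6 is unobserved in training and falls to the Setosa branch by default here):

```python
def classify(s_length, p_length):
    """Apply the rules read off the learned tree (thresholds from the slide)."""
    if s_length < 5.5:
        return "Virginica"
    if p_length >= 5.6:
        return "Versicolour"
    return "Setosa"  # S-length >= 5.5 and P-length <= 2.4 in the training data
```

On the nine training rows this reproduces every label, e.g. row 1 (6.8, 6.3) is Versicolour and row 3 (2, 2.6) is Virginica.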
Comparison of Target and Learned
Learned:
  If S-length < 5.5, then Virginica
  If S-length ≥ 5.5 and P-length ≥ 5.6, then Versicolour
  If S-length ≥ 5.5 and P-length ≤ 2.4, then Setosa

Target:
  If sepal length > 6 and sepal width > 3.8 and petal length < 2.5 and petal width < 1.5, then class = Iris Setosa
  If sepal length > 5 and sepal width > 3 and petal length > 5.5 and petal width > 2, then class = Iris Versicolour
  If sepal length < 5 and sepal width > 3 and petal length ≥ 2.5 and ≤ 5.5 and petal width ≥ 1.5 and ≤ 2, then class = Iris Virginica
Text Classification
Is text_i a finance news article?
Classes: positive, negative
20 attributes
Investors 2, Dow 2, Jones 2, Industrial 1, Average 3, Percent 5, Gain 6, Trading 8, Broader 5, stock 5, Indicators 6, Standard 2, Rolling 1, Nasdaq 3, Early 10, Rest 12, More 13, first 11, Same 12, The 30
20 attributes
Men’s Basketball, Championship, UConn, Huskies, Georgia Tech, Women, Playing, Crown, Titles, Games, Rebounds, All-America, early, rolling, Celebrates, Rest, More, First, The, same
| Example | stock | rolling | the | class |
|---|---|---|---|---|
| 1 | 0 | 3 | 40 | other |
| 2 | 6 | 8 | 35 | finance |
| 3 | 7 | 7 | 25 | other |
| 4 | 5 | 7 | 14 | other |
| 5 | 8 | 2 | 20 | finance |
| 6 | 9 | 4 | 25 | finance |
| 7 | 5 | 6 | 20 | finance |
| 8 | 0 | 2 | 35 | other |
| 9 | 0 | 11 | 25 | finance |
| 10 | 0 | 15 | 28 | other |
Issues
Representation
  How to map from a representation in the domain to a representation used for learning?
Training data
  How can training data be acquired?
Amount of training data
  How well does the algorithm do as we vary the amount of data?
Which attributes influence learning most?
Does the learning algorithm provide insight into the generalizations made?
Constructing the Decision Tree
Goal: Find the smallest decision tree consistent with the examples
Find the attribute that best splits the examples
Form a tree with root = best attribute
For each value vi (or range) of the best attribute:
  Select those examples with best = vi
  Construct subtree_i by recursively calling decision tree with that subset of examples and all attributes except best
  Add a branch to the tree with label = vi and subtree = subtree_i
Choosing the Best Attribute: Binary Classification
Want a formal measure that returns a maximum value when the attribute makes a perfect split and a minimum value when it makes no distinction
Information theory (Shannon and Weaver 49)
H(P(v1), …, P(vn)) = ∑_{i=1..n} −P(vi) log2 P(vi)

H(1/2, 1/2) = −(1/2) log2 (1/2) − (1/2) log2 (1/2) = 1 bit
H(1/100, 99/100) = −0.01 log2 0.01 − 0.99 log2 0.99 ≈ 0.08 bits
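The two worked values can be checked in a few lines of Python (H here takes the probabilities directly; note the second distribution is (1/100, 99/100), which must sum to 1):

```python
import math

def H(*probs):
    """Entropy in bits of a discrete distribution (zero terms contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(H(1/2, 1/2))       # 1.0 bit: a fair coin is maximally uncertain
print(H(1/100, 99/100))  # ~0.08 bits: a near-certain outcome carries little information
```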
Information based on attributes
Expected information remaining after testing attribute A with v values:

Remainder(A) = ∑_{i=1..v} (pi + ni)/(p + n) · H(pi/(pi + ni), ni/(pi + ni))

For the example set, p = n = 5 (p + n = 10), so H(1/2, 1/2) = 1 bit
Information Gain
Information gain (from attribute test) = difference between the original information requirement and new requirement
Gain(A) = H(p/(p+n), n/(p+n)) − Remainder(A)
| Example | stock | rolling | the | class |
|---|---|---|---|---|
| 1 | 0 | 3 | 40 | other |
| 2 | 6 | 8 | 35 | finance |
| 3 | 7 | 7 | 25 | other |
| 4 | 5 | 7 | 14 | other |
| 5 | 8 | 2 | 20 | finance |
| 6 | 9 | 4 | 25 | finance |
| 7 | 5 | 6 | 20 | finance |
| 8 | 0 | 2 | 35 | other |
| 9 | 0 | 11 | 25 | finance |
| 10 | 0 | 15 | 28 | other |
stock:
  < 5: examples 1, 8, 9, 10 (1 finance, 3 other)
  5-10: examples 2, 3, 4, 5, 6, 7 (4 finance, 2 other)

rolling:
  < 5: examples 1, 5, 6, 8 (2 finance, 2 other)
  5-10: examples 2, 3, 4, 7 (2 finance, 2 other)
  ≥ 10: examples 9, 10 (1 finance, 1 other)

Gain(stock) = 1 − [4/10 · H(1/4, 3/4) + 6/10 · H(4/6, 2/6)]
            = 1 − [0.4 × 0.811 + 0.6 × 0.918] = 1 − 0.875 ≈ 0.125

Gain(rolling) = 1 − [4/10 · H(1/2, 1/2) + 4/10 · H(1/2, 1/2) + 2/10 · H(1/2, 1/2)] = 0
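Recomputing the two gains directly from the ten-row table (each branch's entropy uses the finance/other counts within that branch):

```python
import math

def H(p, n):
    """Entropy in bits of a branch with p finance and n other examples."""
    probs = [p / (p + n), n / (p + n)]
    return -sum(q * math.log2(q) for q in probs if q > 0)

# stock: <5 -> examples {1,8,9,10} (1 finance, 3 other);
#        5-10 -> {2,3,4,5,6,7} (4 finance, 2 other)
gain_stock = H(5, 5) - (4/10 * H(1, 3) + 6/10 * H(4, 2))

# rolling: <5 -> {1,5,6,8}; 5-10 -> {2,3,4,7}; >=10 -> {9,10};
#          every branch is half finance, half other
gain_rolling = H(5, 5) - (4/10 * H(2, 2) + 4/10 * H(2, 2) + 2/10 * H(1, 1))

print(round(gain_stock, 3))  # ~0.125: stock is the more informative split
print(gain_rolling)          # zero gain: rolling tells us nothing about the class
```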
Other cases
What if the class is discrete valued, not binary?
What if an attribute has many values (e.g., 1 per instance)?
Training vs. Testing
A learning algorithm is good if it uses its learned hypothesis to make accurate predictions on unseen data
Collect a large set of examples (with classifications)
Divide into two disjoint sets: the training set and the test set
Apply the learning algorithm to the training set, generating hypothesis h
Measure the percentage of examples in the test set that are correctly classified by h
Repeat for different sizes of training sets and different randomly selected training sets of each size.
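The procedure above can be sketched as follows (learn is any function from a training set to a hypothesis h; the function and parameter names are illustrative):

```python
import random

def holdout_accuracy(examples, learn, train_fraction=0.8, seed=0):
    """Train on one random part of the data, measure accuracy on the rest."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)               # random split into disjoint sets
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    h = learn(train)                    # h maps an input x to a predicted class
    return sum(h(x) == y for x, y in test) / len(test)
```

Repeating this over several seeds and several values of train_fraction gives the accuracy-vs-training-size curve the last step describes.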
Division into 3 sets
Inadvertent peeking:
  Parameters that must be learned (e.g., how to split values)
  Generate different hypotheses for different parameter values on the training data
  Choosing the values that perform best on the test data peeks at the test set, so tune on a separate validation set
Why do we need to do this for selecting the best attributes?
![Page 39: Machine Learning Reading: Chapter 18. 2 Machine Learning and AI Improve task performance through observation, teaching Acquire knowledge automatically](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d4f5503460f94a2eaca/html5/thumbnails/39.jpg)
39
Overfitting
Learning algorithms may use irrelevant attributes to make decisions
  For news, the day published and the newspaper
Decision tree pruning:
  Prune away attributes with low information gain
  Use statistical significance to test whether the gain is meaningful
K-fold Cross Validation
To reduce overfitting
Run k experiments:
  Use a different 1/k of the data for testing each time
  Average the results
Common choices: 5-fold, 10-fold, leave-one-out
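A minimal sketch of the k experiments (round-robin fold assignment for simplicity; in practice the data is shuffled first):

```python
def k_fold_accuracy(examples, learn, k=5):
    """Each fold serves as the test set once; return the average accuracy."""
    folds = [examples[i::k] for i in range(k)]  # round-robin assignment
    scores = []
    for i in range(k):
        test = folds[i]
        train = [ex for j in range(k) if j != i for ex in folds[j]]
        h = learn(train)
        scores.append(sum(h(x) == y for x, y in test) / len(test))
    return sum(scores) / k  # average the k results
```

Leave-one-out is the special case k = len(examples).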
Ensemble Learning
Learn from a collection of hypotheses
Majority voting
Enlarges the hypothesis space
Boosting
Uses a weighted training set
  Each example has an associated weight wj ≥ 0
  Higher-weighted examples have higher importance
Initially, wj = 1 for all examples
Next round: increase the weights of misclassified examples, decrease the other weights
From the new weighted set, generate hypothesis h2
Continue until M hypotheses have been generated
Final ensemble hypothesis = weighted-majority combination of all M hypotheses
  Weight each hypothesis according to how well it did on the training data
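The loop above can be sketched as AdaBoost-style code (binary labels in {−1, +1}; the decision-stump weak learner and the ½·ln((1−ε)/ε) hypothesis weight are standard AdaBoost choices assumed here, not spelled out on the slide):

```python
import math

def stump_learn(examples, w):
    """Weak learner: best rule of the form (s if x >= t else -s) by weighted error."""
    best_err, best_h = None, None
    for t in sorted({x for x, _ in examples}):
        for s in (1, -1):
            h = lambda x, t=t, s=s: s if x >= t else -s
            err = sum(wi for wi, (x, y) in zip(w, examples) if h(x) != y)
            if best_err is None or err < best_err:
                best_err, best_h = err, h
    return best_h

def boost(examples, weak_learn, M=5):
    n = len(examples)
    w = [1.0 / n] * n  # uniform initial weights (the slide's wj = 1, normalized)
    hs, alphas = [], []
    for _ in range(M):
        h = weak_learn(examples, w)
        err = sum(wi for wi, (x, y) in zip(w, examples) if h(x) != y)
        if err == 0:
            return h   # already perfect on the (weighted) training set
        if err >= 0.5:
            break      # no longer better than random: weak-learning assumption fails
        alpha = 0.5 * math.log((1 - err) / err)  # hypothesis weight
        # raise weights of misclassified examples, lower the rest, renormalize
        w = [wi * math.exp(-alpha * y * h(x)) for wi, (x, y) in zip(w, examples)]
        total = sum(w)
        w = [wi / total for wi in w]
        hs.append(h)
        alphas.append(alpha)
    return lambda x: 1 if sum(a * h(x) for a, h in zip(alphas, hs)) >= 0 else -1
```

The returned ensemble is exactly the weighted-majority combination described above: each stump votes, scaled by how well it did on the weighted training data.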
AdaBoost
If the input learning algorithm L is a weak learning algorithm
  L always returns a hypothesis with weighted error on the training set slightly better than random
AdaBoost returns a hypothesis that classifies the training data perfectly for large enough M
It boosts the accuracy of the original learning algorithm on the training data
Issues
Representation
  How to map from a representation in the domain to a representation used for learning?
Training data
  How can training data be acquired?
Amount of training data
  How well does the algorithm do as we vary the amount of data?
Which attributes influence learning most?
Does the learning algorithm provide insight into the generalizations made?