is random model better? - on its accuracy and efficiency - wei fan ibm t.j.watson joint work with...
TRANSCRIPT
![Page 1: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/1.jpg)
Is Random Model Better? -On its accuracy and efficiency-
Wei FanIBM T.J.Watson
Joint work withHaixun Wang, Philip S. Yu, and Sheng Ma
![Page 2: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/2.jpg)
Optimal Model Loss function L(t,y) to evaluate performance.
t is true label and y is prediction Optimal decision decision y* is the label that
minimizes expected loss when x is sampled repeatedly: 0-1 loss: y* is the label that appears the most
often, i.e., if P(fraud|x) > 0.5, predict fraud cost-sensitive loss: the label that minimizes the
“empirical risk”.• If P(fraud|x) * $1000 > $90 or p(fraud|x) > 0.09, predict
fraud
![Page 3: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/3.jpg)
How we look for optimal model? NP-hard for most “model representation” We think that simplest hypothesis that fits
the data is the best. We employ all kinds of heuristics to look
for it. info gain, gini index, Kearns-Mansour, etc pruning: MDL pruning, reduced error-pruning,
cost-based pruning. Reality: tractable, but still pretty
expensive
![Page 4: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/4.jpg)
On the other hand Occam’s Razor’s interpretation: two
hypotheses with the same loss, we should prefer the simpler one.
Very complicated hypotheses that are highly accurate: Meta-learning Boosting (weighted voting) Bagging (sampling without replacement)
Where are we? The above are very complicated to compute.
Question: do we have to?
![Page 5: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/5.jpg)
Do we have to be “perfect”? 0-1 loss binary problem:
P(positive|x) > 0.5, we predict x to be positive. P(positive|x) = 0.6, P(positive|x) = 0.9 makes
no difference in final prediction! Cost-sensitive problems:
P(fraud|x) * $1000 > $90, we predict x to be fraud.
Re-write it P(fraud|x) > 0.09 P(fraud|x) = 1.0 and P(fraud|x) = 0.091 makes
no difference.
![Page 6: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/6.jpg)
Random Decision Tree Building several empty iso-depth tree structures
without even looking at the data. Example is sorted through a unique path from
the root the the leaf. Each tree node records the number of instances belonging to each class.
Update each empty node by scanning the data set only once. It is like “classifying” the data. When an example reaches a node, the number of
examples belonging to a particular class label increments
![Page 7: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/7.jpg)
Classification Each tree outputs membership probability
p(fraud|x) = n_fraud/(n_fraud + n_normal) The membership probability from multiple
random trees are averaged to approximate the true probability
Loss function is required to make a decision 0-1 loss: p(fraud|x) > 0.5, predict fraud cost-sensitive loss: p(fraud|x) $1000 > $90
![Page 8: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/8.jpg)
Tree depth To create diversity Half of the number of features Combinations peak at half the size
of population Such as, combine 2 out 4 gives 6
choices.
![Page 9: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/9.jpg)
Number of trees Sampling theory:
30 gives pretty good estimate with reasonably small variance
10 is usually already in the range. Worst scenario
Only one feature is relevant. All the rest are noise.
Probability:
![Page 10: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/10.jpg)
Simple Feature Info Gain Limitation:
At least one feature with info gain by itself
Same limitation as C4.5 and dti Example
![Page 11: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/11.jpg)
Training Efficiency One complete scan of the training
data. Memory Requirement:
Hold one tree (or better multiple trees) One example read at a time.
![Page 12: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/12.jpg)
Donation Dataset Decide whom to send charity
solicitation letter. It costs $0.68 to send a letter. Loss function
![Page 13: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/13.jpg)
Result
![Page 14: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/14.jpg)
Result
![Page 15: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/15.jpg)
Credit Card Fraud Detect if a transaction is a fraud There is an overhead to detect a
fraud, {$60, $70, $80, $90} Loss Function
![Page 16: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/16.jpg)
Result
![Page 17: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/17.jpg)
Extreme situation
![Page 18: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/18.jpg)
Tree Depth
![Page 19: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/19.jpg)
Compare with Boosting
![Page 20: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/20.jpg)
Compare with Bagging
![Page 21: Is Random Model Better? - On its accuracy and efficiency - Wei Fan IBM T.J.Watson Joint work with Haixun Wang, Philip S. Yu, and Sheng Ma](https://reader036.vdocument.in/reader036/viewer/2022062404/551515ad550346c77d8b4ceb/html5/thumbnails/21.jpg)
Conclusion Point out the reality that
conventional inductive learning (single best and multiple complicated) are probably way too complicated beyond necessity
Propose a very efficient and accurate random tree algorithm