Learning with Trees
Rob Nowak, University of Wisconsin-Madison. Collaborators: Rui Castro, Clay Scott, Rebecca Willett. www.ece.wisc.edu/~nowak. Artwork: Piet Mondrian.
![Page 1: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/1.jpg)
Learning with Trees
University of Wisconsin-Madison
Collaborators: Rui Castro, Clay Scott, Rebecca Willett
Rob Nowak
Artwork: Piet Mondrian www.ece.wisc.edu/~nowak
![Page 2: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/2.jpg)
Basic Problem: Partitioning
Many problems in statistical learning theory boil down to finding a good partition
![Page 3: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/3.jpg)
Classification
Learning and Classification: build a decision rule based on labeled training data
Labeled training features
Classification rule: partition of feature space
![Page 4: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/4.jpg)
MRI data brain aneurysm
Signal and Image Processing
Recover complex geometrical structure from noisy data
Extracted vascular network
![Page 5: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/5.jpg)
Partitioning Schemes
Support Vector Machine
image partitions
![Page 6: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/6.jpg)
Why Trees ?
CART: Breiman, Friedman, Olshen, and Stone, 1984Classification and Regression Trees
C4.5: Quinlan 1993, C4.5: Programs for Machine Learning
• Simplicity of design
• Interpretability
• Ease of implementation
• Good performance in practice
Trees are one of the most popular and widely used machine learning / data analysis tools
JPEG 2000: Image compression standard, 2000 http://www.jpeg.org/jpeg2000/
![Page 7: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/7.jpg)
Example: Gamma-Ray Burst Analysis
One burst (tens of seconds) emits as much energy as our entire Milky Way does in one hundred years!
Compton Gamma-Ray Observatory, Burst and Transient Source Experiment (BATSE)
(light curve: photon counts vs. time, showing the burst and its x-ray “afterglow”)
![Page 8: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/8.jpg)
coarse partition
Trees and Partitions
fine partition
![Page 9: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/9.jpg)
Estimation using Pruned Tree
Each leaf corresponds to a sample f(t_i), i = 0, …, N-1.
A piecewise-constant fit to the data on each piece of the partition provides a good estimate.
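A minimal sketch of this idea (hypothetical code, not from the talk): fit a piecewise-constant estimate on a recursive dyadic partition, splitting an interval in half only when the drop in squared error exceeds a complexity charge per extra leaf. This greedy top-down rule is a stand-in for the optimal pruning discussed later.

```python
import numpy as np

def dyadic_fit(y, penalty):
    """Piecewise-constant fit on a recursive dyadic partition.

    An interval is split in half only if the split reduces the
    squared error by more than `penalty`; otherwise the interval
    becomes a leaf fitted by its sample mean.
    """
    n = len(y)
    fit = np.empty(n)

    def recurse(lo, hi):
        seg = y[lo:hi]
        mean = seg.mean()
        err_leaf = ((seg - mean) ** 2).sum()
        if hi - lo >= 2:
            mid = (lo + hi) // 2
            left, right = y[lo:mid], y[mid:hi]
            err_split = (((left - left.mean()) ** 2).sum()
                         + ((right - right.mean()) ** 2).sum())
            if err_leaf - err_split > penalty:  # split pays for itself
                recurse(lo, mid)
                recurse(mid, hi)
                return
        fit[lo:hi] = mean                       # keep as a leaf

    recurse(0, n)
    return fit
```

On a noisy step signal, the fit recovers the two constant pieces while refusing to split the (pure-noise) halves further.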
![Page 10: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/10.jpg)
piecewise linear fit on each cell
piecewise polynomial fit on each cell
Gamma-Ray Burst 845
![Page 11: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/11.jpg)
Recursive Partitions
![Page 12: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/12.jpg)
Adapted Partition
![Page 13: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/13.jpg)
Image Denoising
![Page 14: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/14.jpg)
Decision (Classification) Trees
labeled training data, Bayes decision boundary, complete partition, pruned partition
decision tree: majority vote at each leaf
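An illustrative sketch of "majority vote at each leaf" (assumed names, and a fixed grid partition of [0,1]^2 rather than the pruned tree on the slide):

```python
import numpy as np

def histogram_classifier(X, y, n_bins):
    """Majority-vote rule on a regular n_bins x n_bins grid.

    X: points in [0,1)^2, y: labels in {0, 1}. Each cell predicts
    the majority label of the training points falling in it.
    """
    votes = np.zeros((n_bins, n_bins, 2), dtype=int)
    cells = np.minimum((X * n_bins).astype(int), n_bins - 1)
    for (i, j), label in zip(cells, y):
        votes[i, j, label] += 1
    rule = votes.argmax(axis=2)          # ties (and empty cells) go to 0

    def predict(Xq):
        c = np.minimum((Xq * n_bins).astype(int), n_bins - 1)
        return rule[c[:, 0], c[:, 1]]

    return predict
```

If the Bayes boundary aligns with cell edges (here, a vertical split at x = 0.5), every occupied cell is pure and training error is zero.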
![Page 15: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/15.jpg)
Classification
Ideal classifier, adapted partition, histogram
256 cells in each partition
![Page 16: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/16.jpg)
Image Partitions
1024 cells in each partition
![Page 17: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/17.jpg)
![Page 18: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/18.jpg)
Image Coding
JPEG at 0.125 bpp (non-adaptive partitioning) vs. JPEG 2000 at 0.125 bpp (adaptive partitioning)
![Page 19: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/19.jpg)
Probabilistic Framework
![Page 20: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/20.jpg)
Prediction Problem
![Page 21: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/21.jpg)
Challenge
![Page 22: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/22.jpg)
Empirical Risk
![Page 23: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/23.jpg)
Empirical Risk Minimization
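Empirical risk minimization can be sketched concretely. A hypothetical example (names assumed): over a small family of 1-D threshold classifiers, pick the one minimizing the empirical 0/1 risk on the training data.

```python
import numpy as np

def empirical_risk(predict, X, y):
    """Fraction of training points the rule gets wrong (0/1 loss)."""
    return float((predict(X) != y).mean())

def erm_threshold(X, y, grid):
    """ERM over threshold rules x -> 1{x > t}, t ranging over grid."""
    best_t, best_r = None, np.inf
    for t in grid:
        r = empirical_risk(lambda Xq: (Xq > t).astype(int), X, y)
        if r < best_r:
            best_t, best_r = t, r
    return best_t, best_r
```

With labels generated by a true threshold at 0.3, the empirical minimizer lands near 0.3 with near-zero empirical risk.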
![Page 24: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/24.jpg)
Classification and Regression Trees
![Page 25: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/25.jpg)
Classification and Regression Trees
(recursive partition with a 0/1 label on each cell)
![Page 26: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/26.jpg)
Empirical Risk Minimization on Trees
![Page 27: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/27.jpg)
Overfitting Problem
Coarse partitions are stable but crude; fine partitions are accurate but variable.
![Page 28: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/28.jpg)
Bias/Variance Trade-off
fine partition: small bias, large variance
coarse partition: large bias, small variance
![Page 29: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/29.jpg)
Estimation and Approximation Error
![Page 30: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/30.jpg)
Estimation Error in Regression
![Page 31: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/31.jpg)
Estimation Error in Classification
![Page 32: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/32.jpg)
Partition Complexity and Overfitting
empirical risk
variance
# leaves
trust no trust
overfitting to data
risk
![Page 33: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/33.jpg)
Controlling Overfitting
![Page 34: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/34.jpg)
Complexity Regularization
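A toy version of complexity regularization (assumed setup: equal-width piecewise-constant models on a 1-D signal, not the talk's tree penalty): choose the number of pieces k minimizing empirical squared error plus a penalty proportional to k.

```python
import numpy as np

def penalized_model_select(y, max_leaves, lam):
    """Pick k minimizing  RSS(k) + lam * k  over equal-width
    piecewise-constant models with k pieces (complexity-regularized
    empirical risk minimization)."""
    n = len(y)
    best = None
    for k in range(1, max_leaves + 1):
        edges = np.linspace(0, n, k + 1).astype(int)
        rss = sum(((y[a:b] - y[a:b].mean()) ** 2).sum()
                  for a, b in zip(edges[:-1], edges[1:]))
        score = rss + lam * k
        if best is None or score < best[0]:
            best = (score, k)
    return best[1]
```

On a noisy two-level step signal the penalty steers the selection to k = 2: k = 1 pays a large bias (residual) cost, while k > 2 pays extra penalty with no fit improvement.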
![Page 35: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/35.jpg)
Per-Cell Variance Bounds: Regression
![Page 36: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/36.jpg)
Per-Cell Variance Bounds: Classification
![Page 37: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/37.jpg)
Variance Bounds
![Page 38: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/38.jpg)
A Slightly Weaker Variance Bound
![Page 39: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/39.jpg)
Complexity Regularization
“small” leaves contribute very little to the penalty
![Page 40: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/40.jpg)
Example: Image Denoising
This is a special case of “wavelet denoising” using the Haar wavelet basis.
![Page 41: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/41.jpg)
Theory of Complexity Regularization
![Page 42: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/42.jpg)
![Page 43: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/43.jpg)
Classification
(face image with “eyes” and “mustache” regions labeled)
![Page 44: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/44.jpg)
Probabilistic Framework
![Page 45: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/45.jpg)
Learning from Data
0
1
0
1
![Page 46: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/46.jpg)
Approximation and Estimation
0
1
Approximation
Model selection
BIAS
VARIANCE
![Page 47: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/47.jpg)
Classifier Approximations
0
1
![Page 48: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/48.jpg)
Approximation Error
Symmetric difference set
Error
![Page 49: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/49.jpg)
Approximation Error
boundary smoothness
risk functional (transition) smoothness
![Page 50: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/50.jpg)
Boundary Smoothness
![Page 51: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/51.jpg)
Transition Smoothness
![Page 52: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/52.jpg)
Transition Smoothness
![Page 53: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/53.jpg)
Fundamental Limit to Learning
Mammen & Tsybakov (1999)
![Page 54: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/54.jpg)
Related Work
![Page 55: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/55.jpg)
Box-Counting Class
![Page 56: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/56.jpg)
Box-Counting Sub-Classes
![Page 57: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/57.jpg)
Dyadic Decision Trees
labeled training data, Bayes decision boundary, complete RDP, pruned RDP
Dyadic decision tree: majority vote at each leaf
Joint work with Clay Scott, 2004
![Page 58: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/58.jpg)
Dyadic Decision Trees
![Page 59: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/59.jpg)
The Classifier Learning Problem
Problem:
Training Data:
Model Class:
![Page 60: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/60.jpg)
Empirical Risk
![Page 61: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/61.jpg)
Chernoff’s Bound
![Page 62: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/62.jpg)
Chernoff’s Bound
actual risk is probably not much larger than empirical risk
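This statement can be checked numerically. In Hoeffding's form, Chernoff's bound says the probability that an empirical risk (a mean of n bounded 0/1 losses) falls more than eps below the true risk is at most exp(-2 n eps^2). A small Monte Carlo sketch (helper names are assumptions):

```python
import numpy as np

def hoeffding_bound(n, eps):
    """Chernoff/Hoeffding bound: P(empirical risk < true risk - eps)
    <= exp(-2 n eps^2) for losses in [0, 1]."""
    return float(np.exp(-2.0 * n * eps ** 2))

def deviation_frequency(p, n, eps, trials, seed=0):
    """Monte Carlo: how often the mean of n Bernoulli(p) losses
    underestimates the true risk p by more than eps."""
    rng = np.random.default_rng(seed)
    means = (rng.random((trials, n)) < p).mean(axis=1)
    return float((means < p - eps).mean())
```

With n = 100 and eps = 0.1 the bound is exp(-2) ≈ 0.135, while the simulated deviation frequency is far smaller, as the bound promises.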
![Page 63: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/63.jpg)
Error Deviation Bounds
![Page 64: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/64.jpg)
Uniform Deviation Bound
![Page 65: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/65.jpg)
Setting Penalties
![Page 66: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/66.jpg)
Setting Penalties
prefix codes for trees:
(tree diagram with a 0/1 bit on each branch)
code: 0001001111 + 6 bits for leaf labels
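One standard way to realize such a prefix code (the exact bit convention here is an assumption and may differ from the slide's): traverse the tree in preorder, emitting '1' at each internal node and '0' at each leaf. A tree with L leaves then costs 2L - 1 bits, plus L more bits for the 0/1 leaf labels.

```python
def encode_tree(node):
    """Preorder prefix code for a binary tree: '1' marks an internal
    node (a 2-tuple of children), '0' marks a leaf."""
    if isinstance(node, tuple):                # internal: (left, right)
        return "1" + encode_tree(node[0]) + encode_tree(node[1])
    return "0"                                 # leaf

def decode_tree(bits):
    """Invert encode_tree; leaves are decoded as the string 'leaf'."""
    def rec(i):
        if bits[i] == "0":
            return "leaf", i + 1
        left, j = rec(i + 1)
        right, k = rec(j)
        return (left, right), k
    tree, used = rec(0)
    assert used == len(bits)                   # every bit consumed
    return tree
```

Because no codeword is a prefix of another, the decoder never needs a length field: the bit stream is self-delimiting.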
![Page 67: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/67.jpg)
Uniform Deviation Bound
![Page 68: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/68.jpg)
Decision Tree Selection
Compare with :
Oracle Bound:
ApproximationError (Bias)
EstimationError (Variance)
![Page 69: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/69.jpg)
Rate of Convergence
BUT…
Why too slow ?
![Page 70: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/70.jpg)
same number of leaves
Balanced vs. Unbalanced Trees
all trees with |T| leaves are equally favored
![Page 71: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/71.jpg)
Spatial Adaptation
local empirical error
local error
![Page 72: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/72.jpg)
Relative Chernoff Bound
![Page 73: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/73.jpg)
Designing Leaf Penalties
prefix code construction:
00, 01 = “left/right branch”, 11 = “terminate”, 0/1 = “label”
example code: 010001110
![Page 74: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/74.jpg)
Uniform Deviation Bound
Compare with :
![Page 75: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/75.jpg)
Spatial Adaptivity
Key: local complexity is offset by small volumes!
![Page 76: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/76.jpg)
Bound Comparison for Unbalanced Tree
J leaves, depth J-1
Non-adaptive bound:
Adaptive bound:
![Page 77: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/77.jpg)
same number of leaves
Balanced vs. Unbalanced Trees
![Page 78: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/78.jpg)
Decision Tree Selection
Oracle Bound:
ApproximationError
EstimationError
![Page 79: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/79.jpg)
Rate of Convergence
![Page 80: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/80.jpg)
Computable Penalty
achieves same rate of convergence
![Page 81: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/81.jpg)
Adapting to Dimension - Feature Rejection
01
![Page 82: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/82.jpg)
Adapting to Dimension - Data Manifold
![Page 83: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/83.jpg)
Computational Issues: additive penalty
Cyclic DDT: force coordinate splits in cyclic order
Free-Split DDT: no ordering enforced on splits
![Page 84: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/84.jpg)
DDTs in Action
![Page 85: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/85.jpg)
Comparison to State-of-Art
Best results: (1) AdaBoost with RBF-Network, (2) Kernel Fisher Discriminant, (3) SVM with RBF-Kernel.
ODCT = DDT + cross-validation
![Page 86: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/86.jpg)
Application to Level Set Estimation
Elevation map, St. Louis: noisy data, thresholded data, level set
Penalty proportional to |T| vs. spatially adaptive penalty
![Page 87: Learning with Trees](https://reader036.vdocument.in/reader036/viewer/2022062814/56816792550346895ddccab6/html5/thumbnails/87.jpg)
Conclusions and Future Work
Open Problem:
More Info: www.ece.wisc.edu/~nowak
www.ece.wisc.edu/~nowak/ece901