Learning with Trees

Page 1: Learning with Trees

Learning with Trees

Rob Nowak

University of Wisconsin-Madison
Collaborators: Rui Castro, Clay Scott, Rebecca Willett

www.ece.wisc.edu/~nowak (Artwork: Piet Mondrian)

Page 2: Learning with Trees

Basic Problem: Partitioning

Many problems in statistical learning theory boil down to finding a good partition

[Figure: a function and the partition it induces]

Page 3: Learning with Trees

Classification

Learning and Classification: build a decision rule based on labeled training data

Labeled training features

Classification rule: partition of feature space

Page 4: Learning with Trees

Signal and Image Processing

Recover complex geometrical structure from noisy data

[Figure: MRI data of a brain aneurysm; extracted vascular network]

Page 5: Learning with Trees

Partitioning Schemes

[Figure: partitioning schemes, including a Support Vector Machine boundary and image partitions]

Page 6: Learning with Trees

Why Trees ?

CART: Breiman, Friedman, Olshen, and Stone, 1984, Classification and Regression Trees

C4.5: Quinlan 1993, C4.5: Programs for Machine Learning

• Simplicity of design

• Interpretability

• Ease of implementation

• Good performance in practice

Trees are one of the most popular and widely used machine learning / data analysis tools

JPEG 2000: Image compression standard, 2000 http://www.jpeg.org/jpeg2000/
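The talk shows no code, so here is a minimal sketch of how little is needed to put the bullets above into practice, using scikit-learn's CART implementation (library, dataset, and parameters are illustrative choices, not the speaker's):

```python
# Fit and print a small CART tree; illustrates the "simplicity,
# interpretability, ease of implementation" points above.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# max_depth caps the partition complexity (cf. the bias/variance
# discussion later in the talk).
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Interpretability: the learned partition prints as if-then rules.
print(export_text(clf, feature_names=load_iris().feature_names))
```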

Page 7: Learning with Trees

Example: Gamma-Ray Burst Analysis

One burst (tens of seconds) emits as much energy as our entire Milky Way does in one hundred years!

Compton Gamma-Ray Observatory, Burst and Transient Source Experiment (BATSE)

[Figure: photon counts vs. time, showing a burst and its x-ray "afterglow"]

Page 8: Learning with Trees

Trees and Partitions

[Figure: coarse and fine partitions induced by a tree]

Page 9: Learning with Trees

Estimation using Pruned Tree

Each leaf corresponds to a sample f(t_i), i = 0, …, N−1. A piecewise-constant fit to the data on each piece of the partition provides a good estimate.
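A minimal sketch of such an estimate, assuming noisy samples and a given partition of the sample indices into intervals (the leaves of a pruned tree); names and data here are illustrative:

```python
import numpy as np

def piecewise_constant_fit(y, breakpoints):
    """Fit y by its mean on each interval of the partition."""
    estimate = np.empty_like(y, dtype=float)
    edges = [0] + list(breakpoints) + [len(y)]
    for lo, hi in zip(edges[:-1], edges[1:]):
        estimate[lo:hi] = y[lo:hi].mean()  # constant fit on this leaf
    return estimate

# Noisy step function; the partition is adapted to the jump at index 64.
rng = np.random.default_rng(0)
y = np.concatenate([np.ones(64), 3 * np.ones(64)]) + 0.3 * rng.standard_normal(128)
print(piecewise_constant_fit(y, breakpoints=[64])[:3])
```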

Page 10: Learning with Trees

Gamma-Ray Burst 845

[Figure: piecewise linear fit on each cell; piecewise polynomial fit on each cell]

Page 11: Learning with Trees

Recursive Partitions

Page 12: Learning with Trees

Adapted Partition

Page 13: Learning with Trees

Image Denoising

Page 14: Learning with Trees

Decision (Classification) Trees

[Figure: labeled training data, Bayes decision boundary, complete partition, pruned partition]

Decision tree: majority vote at each leaf

Page 15: Learning with Trees

Classification

[Figure: ideal classifier, adapted partition, and histogram classifier; 256 cells in each partition]

Page 16: Learning with Trees

Image Partitions

1024 cells in each partition

Page 17: Learning with Trees
Page 18: Learning with Trees

Image Coding

[Figure: JPEG at 0.125 bpp (non-adaptive partitioning) vs. JPEG 2000 at 0.125 bpp (adaptive partitioning)]

Page 19: Learning with Trees

Probabilistic Framework

Page 20: Learning with Trees

Prediction Problem

Page 21: Learning with Trees

Challenge

Page 22: Learning with Trees

Empirical Risk

Page 23: Learning with Trees

Empirical Risk Minimization

Page 24: Learning with Trees

Classification and Regression Trees

Page 25: Learning with Trees

Classification and Regression Trees

[Figure: training points labeled 0 and 1, partitioned by a tree]
Page 26: Learning with Trees

Empirical Risk Minimization on Trees

Page 27: Learning with Trees

Overfitting Problem

[Figure: stable but crude fit vs. accurate but variable fit]

Page 28: Learning with Trees

Bias/Variance Trade-off

[Figure: a fine partition has small bias but large variance; a coarse partition has small variance but large bias]
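This is the standard bias/variance decomposition of mean squared error; writing $\hat{f}$ for a partition-based estimate of $f$ (notation assumed, since the slide's formulas did not survive extraction):

$$ \mathbb{E}\big[(\hat{f}(x) - f(x))^2\big] \;=\; \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} \;+\; \underbrace{\mathrm{Var}\big(\hat{f}(x)\big)}_{\text{variance}} $$

Refining the partition shrinks the bias term but inflates the variance term; coarsening does the opposite.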

Page 29: Learning with Trees

Estimation and Approximation Error

Page 30: Learning with Trees

Estimation Error in Regression

Page 31: Learning with Trees

Estimation Error in Classification

Page 32: Learning with Trees

Partition Complexity and Overfitting

[Figure: risk, empirical risk, and variance as functions of the number of leaves; the empirical risk can be trusted for small trees but not for large ones, where it overfits the data]

Page 33: Learning with Trees

Controlling Overfitting

Page 34: Learning with Trees

Complexity Regularization

Page 35: Learning with Trees

Per-Cell Variance Bounds: Regression

Page 36: Learning with Trees

Per-Cell Variance Bounds: Classification

Page 37: Learning with Trees

Variance Bounds

Page 38: Learning with Trees

A Slightly Weaker Variance Bound

Page 39: Learning with Trees

Complexity Regularization

"Small" leaves contribute very little to the penalty
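Schematically (the slide's formulas were lost in extraction), complexity-regularized tree selection solves

$$ \hat{T} \;=\; \arg\min_{T} \; \hat{R}(T) + \mathrm{pen}(T), $$

where the classical CART choice $\mathrm{pen}(T) = \alpha |T|$ charges every leaf equally; the penalty used here instead weights each leaf by its size, which is why "small" leaves contribute very little.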

Page 40: Learning with Trees

Example: Image Denoising

This is a special case of "wavelet denoising" using the Haar wavelet basis
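A minimal sketch of that connection, using PyWavelets (assumed installed; the threshold value is illustrative, whereas in the talk it would come from the complexity penalty):

```python
import numpy as np
import pywt

def haar_denoise(y, threshold):
    coeffs = pywt.wavedec(y, "haar")               # Haar wavelet transform
    coeffs = [coeffs[0]] + [                       # keep the coarse average,
        pywt.threshold(c, threshold, mode="hard")  # hard-threshold details
        for c in coeffs[1:]
    ]
    return pywt.waverec(coeffs, "haar")            # invert the transform

# Noisy step function: thresholding kills noise but keeps the edge.
rng = np.random.default_rng(1)
y = np.concatenate([np.zeros(128), np.ones(128)]) + 0.2 * rng.standard_normal(256)
print(np.round(haar_denoise(y, threshold=0.5)[:4], 2))
```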

Page 41: Learning with Trees

Theory of Complexity Regularization

Page 42: Learning with Trees
Page 43: Learning with Trees

Classification

[Figure: face classification with features such as eyes and mustache]

Page 44: Learning with Trees

Probabilistic Framework

Page 45: Learning with Trees

Learning from Data

[Figure: labeled data, classes 0 and 1]

Page 46: Learning with Trees

Approximation and Estimation

[Figure: approximation error (BIAS) vs. model selection (VARIANCE)]

Page 47: Learning with Trees

Classifier Approximations


Page 48: Learning with Trees

Approximation Error

[Figure: approximation error measured by the symmetric difference set]

Page 49: Learning with Trees

Approximation Error

boundary smoothness

risk functional (transition) smoothness

Page 50: Learning with Trees

Boundary Smoothness

Page 51: Learning with Trees

Transition Smoothness

Page 52: Learning with Trees

Transition Smoothness

Page 53: Learning with Trees

Fundamental Limit to Learning

Mammen & Tsybakov (1999)

Page 54: Learning with Trees

Related Work

Page 55: Learning with Trees

Box-Counting Class

Page 56: Learning with Trees

Box-Counting Sub-Classes

Page 57: Learning with Trees

Dyadic Decision Trees

[Figure: labeled training data, Bayes decision boundary, complete RDP, pruned RDP]

Dyadic decision tree: majority vote at each leaf

Joint work with Clay Scott, 2004

Page 58: Learning with Trees

Dyadic Decision Trees

Page 59: Learning with Trees

The Classifier Learning Problem

Problem:

Training Data:

Model Class:

Page 60: Learning with Trees

Empirical Risk

Page 61: Learning with Trees

Chernoff’s Bound

Page 62: Learning with Trees

Chernoff’s Bound

The actual risk is probably not much larger than the empirical risk
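In its Hoeffding form for the 0/1 loss, the bound says that for a fixed classifier $f$ and $n$ i.i.d. training samples,

$$ \Pr\big( R(f) \ge \hat{R}_n(f) + \epsilon \big) \;\le\; e^{-2n\epsilon^2}, $$

so with probability at least $1 - \delta$, $R(f) \le \hat{R}_n(f) + \sqrt{\log(1/\delta)/(2n)}$.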

Page 63: Learning with Trees

Error Deviation Bounds

Page 64: Learning with Trees

Uniform Deviation Bound

Page 65: Learning with Trees

Setting Penalties

Page 66: Learning with Trees

Setting Penalties

prefix codes for trees:

[Figure: a tree with nodes labeled 0 (internal) and 1 (leaf)]

code: 0001001111 + 6 bits for leaf labels
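A sketch of the encoding idea, under one common convention (preorder traversal, 0 = internal node, 1 = leaf; the slide's bit-string follows a similar but not necessarily identical scheme):

```python
def encode(tree):
    """tree: ('leaf', label) or ('split', left, right).
    Returns (structure bits, leaf labels); the structure code is
    prefix-free, so code length can serve as a complexity penalty."""
    if tree[0] == "leaf":
        return "1", [tree[1]]                      # structure bit + label
    _, left, right = tree
    lbits, llabels = encode(left)
    rbits, rlabels = encode(right)
    return "0" + lbits + rbits, llabels + rlabels  # preorder concatenation

t = ("split", ("leaf", 0), ("split", ("leaf", 1), ("leaf", 0)))
bits, labels = encode(t)
print(bits, "+", len(labels), "label bits")        # 01011 + 3 label bits
```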

Page 67: Learning with Trees

Uniform Deviation Bound

Page 68: Learning with Trees

Decision Tree Selection

Compare with:

Oracle Bound:

Approximation Error (Bias)

Estimation Error (Variance)
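The formulas were lost in extraction, but an oracle bound of this type schematically reads

$$ \mathbb{E}\big[R(\hat{T})\big] - R^* \;\lesssim\; \min_{T} \Big\{ \big(R(T) - R^*\big) + \mathrm{pen}(T) \Big\}, $$

with $R(T) - R^*$ the approximation error (bias) and $\mathrm{pen}(T)$ bounding the estimation error (variance): the selected tree does nearly as well as the best bias/variance trade-off in the class.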

Page 69: Learning with Trees

Rate of Convergence

BUT…

Why too slow?

Page 70: Learning with Trees

Balanced vs. Unbalanced Trees

[Figure: a balanced and an unbalanced tree with the same number of leaves]

All trees with |T| leaves are equally favored

Page 71: Learning with Trees

Spatial Adaptation

[Figure: local empirical error and local error on each cell]

Page 72: Learning with Trees

Relative Chernoff Bound

Page 73: Learning with Trees

Designing Leaf Penalties

Prefix code construction:

00, 01 = branch codes (01 = "right branch"), 11 = "terminate", 0/1 = "label"

Example code: 010001110

Page 74: Learning with Trees

Uniform Deviation Bound

Compare with:

Page 75: Learning with Trees

Spatial Adaptivity

Key: local complexity is offset by small volumes!
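The slide's formula did not survive, but the point can be stated schematically. With a per-leaf penalty of roughly the form

$$ \mathrm{pen}(T) \;\approx\; \sum_{A \in \pi(T)} \sqrt{\frac{P(A)\,|c(A)|}{n}} $$

($\pi(T)$ the leaves, $|c(A)|$ the code length of leaf $A$, $P(A)$ its probability mass; the exact form here is assumed), a deep leaf has a long code but tiny mass, e.g. $P(A) \approx 2^{-\mathrm{depth}}$ for dyadic cells, so its contribution to the penalty is negligible.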

Page 76: Learning with Trees

Bound Comparison for Unbalanced Tree

J leaves, depth J−1

Non-adaptive bound:

Adaptive bound:

Page 77: Learning with Trees

Balanced vs. Unbalanced Trees

[Figure: a balanced and an unbalanced tree with the same number of leaves]

Page 78: Learning with Trees

Decision Tree Selection

Oracle Bound:

Approximation Error

Estimation Error

Page 79: Learning with Trees

Rate of Convergence

Page 80: Learning with Trees

Computable Penalty

achieves the same rate of convergence

Page 81: Learning with Trees

Adapting to Dimension - Feature Rejection


Page 82: Learning with Trees

Adapting to Dimension - Data Manifold

Page 83: Learning with Trees

Computational Issues

Cyclic DDT: coordinate splits forced in cyclic order

Free-Split DDT: no ordering enforced on splits

Additive penalty (see the sketch below)
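Additivity is what makes exact optimization tractable: if the penalized cost sums over leaves, the optimal pruning decomposes and one bottom-up pass finds the exact minimizer. A minimal sketch for a pen = alpha * |T| penalty (the dict-based tree representation is an illustrative choice):

```python
def prune(node, alpha):
    """node: dict with 'err' (training errors if this node is made a
    leaf) and optional 'left'/'right' children.
    Returns (optimal subtree, its penalized cost)."""
    leaf_cost = node["err"] + alpha                # cost of stopping here
    if "left" not in node:                         # already a leaf
        return {"err": node["err"]}, leaf_cost
    left, lcost = prune(node["left"], alpha)
    right, rcost = prune(node["right"], alpha)
    if lcost + rcost < leaf_cost:                  # splitting pays off
        return {"err": node["err"], "left": left, "right": right}, lcost + rcost
    return {"err": node["err"]}, leaf_cost         # prune to a leaf

# Splitting 5 errors into 1 + 1 is worth paying 2 * alpha here.
tree = {"err": 5, "left": {"err": 1}, "right": {"err": 1}}
best, cost = prune(tree, alpha=1.0)
print(cost)  # 4.0
```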

Page 84: Learning with Trees

DDTs in Action

Page 85: Learning with Trees

Comparison to State-of-Art

Best results: (1) AdaBoost with RBF-Network, (2) Kernel Fisher Discriminant, (3) SVM with RBF-Kernel.

ODCT = DDT + cross-validation

Page 86: Learning with Trees

Application to Level Set Estimation

Elevation Map, St. Louis

[Figure: noisy data, level set, thresholded data; penalty proportional to |T| vs. spatially adaptive penalty]

Page 87: Learning with Trees

Conclusions and Future Work

Open Problem:

More Info: www.ece.wisc.edu/~nowak and www.ece.wisc.edu/~nowak/ece901