ICVSS2008: Randomized Decision Forests

DESCRIPTION

Jamie Shotton's ICVSS 2008 tutorial.

TRANSCRIPT
![Page 1: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/1.jpg)
Randomized Decision Forests for Segmentation and Recognition
Jamie Shotton
ICVSS 2008, Sicily, Italy
http://jamie.shotton.org/work/presentations/ICVSS2008.zip
![Page 2: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/2.jpg)
Randomized Decision Forests
• Very fast
  – for classification
  – for clustering
• Generalization through random training
• Inherently multi-class
  – automatic feature sharing [Torralba et al. 07]
• Simple training / testing algorithms
“Randomized Decision Forests” = “Randomized Forests” = “Random Forests™”

(detailed references at end of slides)
![Page 3: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/3.jpg)
Randomized Forests in Vision (among others...)

• [Amit & Geman, 97] digit recognition
• [Lepetit et al., 06] keypoint recognition
• [Moosmann et al., 06] visual word clustering
• [Shotton et al., 08] object segmentation

(example segmentation labels: water, boat, chair, tree, road)
![Page 4: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/4.jpg)
Live Demo [Shotton et al. 08]
• Real-time object segmentation using randomized decision forests
  – trained on the MSRC 21-category database
• Segment the image and label the segments:
  airplane, bicycle, bird, boat, body, book, building, car, cat, chair, cow, dog, face, flower, grass, road, sheep, sign, sky, tree, water
Winner CVPR 2008 Best Demo Award!
![Page 5: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/5.jpg)
Outline
• Tutorial on Randomized Decision Forests
• Applications to Vision
  – keypoint recognition [Lepetit et al. 06]
  – object segmentation [Shotton et al. 08]
please ask questions as we go!
![Page 6: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/6.jpg)
The Basics: Is The Grass Wet?
(figure: a decision tree over the world state. "Is it raining?" yes → P(wet) = 0.95; no → "Is the sprinkler on?" yes → P(wet) = 0.9; no → P(wet) = 0.1)
![Page 7: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/7.jpg)
The Basics: Binary Decision Trees
(figure: a binary tree with numbered split nodes and leaf nodes; a feature vector v is routed left where f_n(v) < t_n and right where f_n(v) ≥ t_n, until it reaches a leaf)

• feature vector v
• split functions f_n(v)
• thresholds t_n
• classifications P_n(c) over category c
![Page 8: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/8.jpg)
Decision Tree Pseudo-Code
```
double[] ClassifyDT(node, v)
  if node.IsSplitNode then
    if node.f(v) >= node.t then
      return ClassifyDT(node.right, v)
    else
      return ClassifyDT(node.left, v)
    end
  else
    return node.P
  end
end
```
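A minimal runnable sketch of ClassifyDT in Python; the dict-based node layout and the example "is the grass wet?" tree are my own illustration of the slides' idea, not code from the tutorial:

```python
def classify_dt(node, v):
    """Route feature vector v down the tree; return the leaf's class distribution."""
    if 'P' in node:                        # leaf node: stored distribution P_n(c)
        return node['P']
    if node['f'](v) >= node['t']:          # split function vs. threshold: >= goes right
        return classify_dt(node['right'], v)
    return classify_dt(node['left'], v)

# The "is the grass wet?" tree from Page 6, with v = [raining, sprinkler_on]
# and distributions over [dry, wet]:
tree = {
    'f': lambda v: v[0], 't': 0.5,                 # is it raining?
    'right': {'P': [0.05, 0.95]},                  # raining: P(wet) = 0.95
    'left': {
        'f': lambda v: v[1], 't': 0.5,             # is the sprinkler on?
        'right': {'P': [0.1, 0.9]},                # sprinkler on: P(wet) = 0.9
        'left':  {'P': [0.9, 0.1]},                # otherwise: P(wet) = 0.1
    },
}
print(classify_dt(tree, [0, 1]))   # not raining, sprinkler on -> [0.1, 0.9]
```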
![Page 9: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/9.jpg)
Toy Learning Example

(figure: 2D feature space with axes x and y)

• feature vectors are x, y coordinates: v = [x, y]^T
• split functions are lines with parameters a, b: f_n(v) = ax + by
• the threshold t_n determines the intercept
• four classes: purple, blue, red, green

• Try several lines, chosen at random
• Keep the line that best separates the data (by information gain)
• Recurse
![Page 10: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/10.jpg)
(animation step; same text as Page 9)
![Page 11: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/11.jpg)
(animation step; same text as Page 9)
![Page 12: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/12.jpg)
(animation step; same text as Page 9)
![Page 13: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/13.jpg)
Randomized Learning

• Recursive algorithm: the set I_n of training examples that reach node n is split into
  – left split:  I_l = { i ∈ I_n | f(v_i) < t }
  – right split: I_r = { i ∈ I_n | f(v_i) ≥ t }
  where t is the threshold and f(v_i) is the split function applied to example i's feature vector
• Features f and thresholds t are chosen at random
• At leaf node n, P_n(c) is the histogram of the examples I_n that reach it
![Page 14: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/14.jpg)
More Randomized Learning

• Features f(v) are chosen from a feature pool, f ∈ F
• Thresholds t are chosen at random within the range of observed feature values
• Choose the f and t that maximize the gain in information between the left and right splits (see the reconstruction below)
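The gain equation itself was lost in extraction; the standard entropy-based information gain, which is presumably what the slide showed, is:

```latex
\Delta E \;=\; E(I_n) \;-\; \frac{|I_l|}{|I_n|}\,E(I_l) \;-\; \frac{|I_r|}{|I_n|}\,E(I_r),
\qquad
E(I) \;=\; -\sum_{c} p_I(c)\,\log_2 p_I(c)
```

where \(p_I(c)\) is the class histogram of the examples in \(I\), normalised to sum to one.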
![Page 15: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/15.jpg)
Implementation Details

• How many features and thresholds to try?
  – just one = "extremely randomized" [Geurts et al. 06]
  – few → fast training, may under-fit
  – many → slower training, may over-fit
• When to stop?
  – maximum depth
  – minimum entropy gain
  – delta class distribution
  – pruning?
• Unsupervised training
  – information gain → most balanced split
![Page 16: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/16.jpg)
Randomized Learning Pseudo-Code

```
TreeNode LearnDT(I)
  repeat featureTests times
    let f = RndFeature()
    repeat threshTests times
      let t = RndThreshold(I, f)
      let (I_l, I_r) = Split(I, f, t)
      let gain = InfoGain(I_l, I_r)
      if gain is best then remember f, t, I_l, I_r
    end
  end
  if best gain is sufficient then
    return SplitNode(f, t, LearnDT(I_l), LearnDT(I_r))
  else
    return LeafNode(HistogramExamples(I))
  end
end
```
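A runnable Python/NumPy sketch of LearnDT. Axis-aligned features (pick a coordinate of v) stand in for the slides' generic feature pool, and the stopping rules (max_depth, min_gain) are illustrative choices, not the slides':

```python
import numpy as np

def entropy(labels, n_classes):
    """Shannon entropy of the class histogram of `labels`."""
    p = np.bincount(labels, minlength=n_classes) / max(len(labels), 1)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def info_gain(labels, left, right, n_classes):
    """Entropy decrease achieved by splitting `labels` into left/right."""
    n = len(labels)
    return (entropy(labels, n_classes)
            - len(left) / n * entropy(left, n_classes)
            - len(right) / n * entropy(right, n_classes))

def learn_dt(X, y, n_classes, feature_tests=10, thresh_tests=10,
             depth=0, max_depth=8, min_gain=1e-3):
    best = None
    for _ in range(feature_tests):
        f = np.random.randint(X.shape[1])            # RndFeature()
        lo, hi = X[:, f].min(), X[:, f].max()
        for _ in range(thresh_tests):
            t = np.random.uniform(lo, hi)            # RndThreshold(I, f)
            go_right = X[:, f] >= t                  # Split(I, f, t)
            gain = info_gain(y, y[~go_right], y[go_right], n_classes)
            if best is None or gain > best[0]:       # remember the best f, t
                best = (gain, f, t, go_right)
    gain, f, t, go_right = best
    if (gain < min_gain or depth == max_depth
            or go_right.all() or not go_right.any()):
        hist = np.bincount(y, minlength=n_classes)   # LeafNode: histogram P_n(c)
        return {'P': hist / hist.sum()}
    return {'f': f, 't': t,                          # SplitNode: recurse both sides
            'left':  learn_dt(X[~go_right], y[~go_right], n_classes,
                              feature_tests, thresh_tests, depth + 1,
                              max_depth, min_gain),
            'right': learn_dt(X[go_right], y[go_right], n_classes,
                              feature_tests, thresh_tests, depth + 1,
                              max_depth, min_gain)}
```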
![Page 17: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/17.jpg)
A Forest of Trees    [Amit & Geman 97] [Breiman 01] [Lepetit et al. 06]

• Forest is an ensemble of several decision trees
  – classification averages the per-tree distributions: P(c|v) = (1/T) Σ_t P_t(c|v)

(figure: trees t1 … tT, each with split nodes and leaf nodes, each classifying the same feature vector v into a distribution over category c)
![Page 18: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/18.jpg)
Decision Forests Pseudo-Code
```
double[] ClassifyDF(forest, v)
  // allocate memory
  let P = double[forest.CountClasses]
  // loop over trees in forest
  for t = 1 to forest.CountTrees
    let P' = ClassifyDT(forest.Tree[t], v)
    P = P + P'  // sum distributions
  end
  // normalise
  P = P / forest.CountTrees
  return P
end
```
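The slide's pseudo-code drops off without returning P, so the `return P` above is made explicit. The same forest averaging in Python (node dicts as in the learn_dt sketch, with 'f' a feature index):

```python
import numpy as np

def classify_dt(node, v):
    """Single-tree lookup (dict node layout from the learn_dt sketch)."""
    if 'P' in node:
        return np.asarray(node['P'])
    branch = 'right' if v[node['f']] >= node['t'] else 'left'
    return classify_dt(node[branch], v)

def classify_df(forest, v):
    """Average the per-tree distributions: P(c|v) = (1/T) * sum_t P_t(c|v)."""
    P = sum(classify_dt(tree, v) for tree in forest)   # sum distributions
    return P / len(forest)                             # normalise
```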
![Page 19: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/19.jpg)
Learning a Forest
• Divide training examples into T subsets I_t ⊆ I
  – improves generalization
  – reduces memory requirements & training time
• Train each decision tree t on subset I_t
  – same decision tree learning as before
• Multi-core friendly

• Subsets can be chosen at random or hand-picked
• Subsets can have overlap (and usually do)
• Could also divide the feature pool into subsets
![Page 20: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/20.jpg)
Learning a Forest Pseudo Code
```
Forest LearnDF(countTrees, I)
  // allocate memory
  let forest = Forest(countTrees)
  // loop over trees in forest
  for t = 1 to countTrees
    let I_t = RandomSplit(I)
    forest[t] = LearnDT(I_t)
  end
  // return forest object
  return forest
end
```
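A matching Python sketch of LearnDF, reusing learn_dt from the earlier sketch; sampling roughly two thirds of the data without replacement is my illustrative choice of RandomSplit (the slides allow random or hand-picked, overlapping subsets):

```python
import numpy as np

def learn_df(count_trees, X, y, n_classes, subset_frac=2/3):
    forest = []
    for _ in range(count_trees):                     # loop over trees in forest
        idx = np.random.choice(len(X), int(subset_frac * len(X)),
                               replace=False)        # RandomSplit(I)
        forest.append(learn_dt(X[idx], y[idx], n_classes))
    return forest
```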
![Page 21: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/21.jpg)
Toy Forest Classification Demo
Toy Classification Demo
![Page 22: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/22.jpg)
Randomized Forests for Clustering [Moosmann et al. 06]
• Visual words are good for e.g. matching and recognition [Sivic et al. 03] [Csurka et al. 04], but k-means clustering is very slow
• Randomized forests can instead cluster the descriptors
  – e.g. SIFT, texton filter-banks, etc.
• Leaf nodes in the forest are the clusters
  – concatenate the histograms from the trees in the forest (see the sketch below)

(figure: trees t1 … tT with numbered leaf nodes acting as cluster indices)
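A sketch of the clustering use in Python (same illustrative node dicts as the earlier sketches): each descriptor is dropped through every tree, the leaf it lands in acts as its visual word, and the per-tree leaf histograms are concatenated:

```python
import numpy as np

def collect_leaves(node, leaves):
    """Depth-first pass assigning each leaf dict an integer id."""
    if 'P' in node:
        node['leaf_id'] = len(leaves)
        leaves.append(node)
    else:
        collect_leaves(node['left'], leaves)
        collect_leaves(node['right'], leaves)
    return leaves

def leaf_of(node, v):
    """Index of the leaf that descriptor v falls into."""
    if 'P' in node:
        return node['leaf_id']
    branch = 'right' if v[node['f']] >= node['t'] else 'left'
    return leaf_of(node[branch], v)

def bag_of_leaves(forest, descriptors):
    """Concatenate the per-tree leaf histograms into one 'bag of words'."""
    leaves = [collect_leaves(tree, []) for tree in forest]
    hists = [np.zeros(len(l)) for l in leaves]
    for v in descriptors:
        for t, tree in enumerate(forest):
            hists[t][leaf_of(tree, v)] += 1
    return np.concatenate(hists)
```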
![Page 23: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/23.jpg)
Randomized Forests for Clustering [Moosmann et al. 06]
(figure: descriptors fall into the numbered leaf nodes of trees t1 … tT; the per-tree frequency histograms over node indices are concatenated into a "bag of words". We'll see later how to use the whole tree hierarchy!)
![Page 24: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/24.jpg)
Relation to Cascades [Viola & Jones 04]
• Cascades
  – very unbalanced tree
  – good for unbalanced binary problems, e.g. sliding window object detection
• Randomized forests
  – less deep, fairly balanced
  – ensemble of trees gives robustness
  – good for multi-class problems
![Page 25: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/25.jpg)
Random Ferns
• Naïve Bayes classifier over random sets of features [Özuysal et al. 07] [Bosch et al. 07]
• Can be a good alternative to randomized forests

(equations on the slide: Bayes' rule; "naïve Bayes" over individual features; "random ferns" over sets of features; see the reconstruction below)
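The slide's equations did not survive extraction; the standard factorisations from [Özuysal et al. 07], which match the slide's labels, are:

```latex
% Bayes' rule over binary features f_1, ..., f_N:
P(c \mid f_1,\dots,f_N) \;\propto\; P(f_1,\dots,f_N \mid c)\, P(c)
% "naive Bayes": individual features treated as independent:
P(f_1,\dots,f_N \mid c) \;\approx\; \prod_{i=1}^{N} P(f_i \mid c)
% "random ferns": features grouped into M sets F_1, ..., F_M,
% independent between sets, jointly modelled within each set:
P(f_1,\dots,f_N \mid c) \;\approx\; \prod_{k=1}^{M} P(F_k \mid c)
```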
![Page 26: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/26.jpg)
Short Pause
Any Questions So Far?
![Page 27: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/27.jpg)
Outline
• Tutorial on Randomized Decision Forests
• Applications to Vision Problems
  – keypoint recognition [Lepetit et al. 06]
  – object segmentation [Shotton et al. 08]
![Page 28: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/28.jpg)
Fast Keypoint Recognition [Lepetit et al. 06]
• Wide-baseline matching cast as a classification problem
• Extract prominent keypoints in training images
• Forest classifies: patches → keypoints
• Features: pixel comparisons (see the sketch below)
• Augmented training set
  – gives robustness to patch scaling, translation, rotation
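The slides only say the features are pixel comparisons; a plausible sketch of one such binary split test (the positions and the comparison form are my illustration):

```python
import numpy as np

def pixel_comparison_split(patch, p1, p2):
    """Binary test: is the intensity at p1 greater than at p2?"""
    return patch[p1] > patch[p2]

patch = np.random.rand(32, 32)                       # a keypoint patch
print(pixel_comparison_split(patch, (5, 7), (20, 13)))
```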
![Page 29: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/29.jpg)
Fast Keypoint Recognition [Lepetit et al. 06]
• Example videos– from http://cvlab.epfl.ch/research/augm/detect.php
![Page 30: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/30.jpg)
Real-Time Object Segmentation [Shotton et al. 2008]
• Aim: a better visual vocabulary
  – image categorization: does this image contain cows, trees, etc.?
  – object segmentation: draw and label the outlines of the cow, grass, etc.
• Design goals
  – fast and accurate
  – use learned semantic information
![Page 31: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/31.jpg)
Object Recognition Pipeline
(pipeline diagram: extract features (SIFT, filter bank; hand-crafted) → clustering (k-means; unsupervised) → assignment (nearest neighbour) → classification algorithm (SVM, decision forest, boosting; supervised))
![Page 32: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/32.jpg)
Object Recognition Pipeline
(pipeline diagram: the STF replaces the first stages: clustering into 'semantic textons' plus local classification → classification algorithm (SVM, decision forest, boosting; supervised))

Semantic Texton Forest (STF)
• decision forest for both clustering & classification
• tree nodes have learned object category associations
![Page 33: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/33.jpg)
Object Recognition Pipeline
(pipeline diagram: test image → STF (clustering into 'semantic textons' + local classification) → SF → object segmentation (dog, road, building), and → SVM → image categorization (building, dog, road))

Semantic Texton Forest (STF)
• decision forest for both clustering & classification
• tree nodes have learned object category associations

Support Vector Machine (SVM)
• pyramid match kernel in learned tree hierarchies

Segmentation Forest (SF)
• second decision forest
• features use layout & context
• semantic context
![Page 34: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/34.jpg)
Object Recognition Pipeline
(pipeline overview repeated; see Page 33)
![Page 35: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/35.jpg)
Textons & Visual Words

• Textons [Julesz 81]
  – computed densely
  – clustered filter-bank responses [Malik 01] [Varma 05]
  – used for object recognition [Winn 05] [Shotton 07]
• Visual words
  – usually computed sparsely [Mikolajczyk 04]
  – clustered local descriptors [Lowe 04]
  – used for object recognition [Sivic 03] [Csurka 04]

(diagram: extract features (filter bank / local descriptors, e.g. [Lowe 04]) → clustering (k-means) → assignment (nearest neighbour). Expensive!)
![Page 36: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/36.jpg)
Semantic Texton Forests (STF)
• A STF is
  – a decision forest applied at each image pixel
  – built from simple pixel-based features
• How is this new?
  – no descriptors or filter-banks
  – the decision forest gives fast clustering & assignment (very fast)
  – and local classification (learned semantic information)
![Page 37: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/37.jpg)
Image Patch Features

(figure: pixel i gives a patch p, 21x21 pixels in the experiments; the tree split function asks: f(p) > learned threshold?)
![Page 38: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/38.jpg)
Example Semantic Texton Forest

(figure: input image, ground truth, and example patches reaching the nodes of a tree whose split tests include: A[r] + B[r] > 363, A[b] > 98, A[g] - B[b] > 28, A[g] - B[b] > 13, A[b] + B[b] > 284, |A[r] - B[b]| > 21, |A[b] - B[g]| > 37)
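The split tests in the example tree combine the colour channels of at most two pixels, A and B, inside the patch: a single value, a sum, a difference, or an absolute difference, compared against a learned threshold. A sketch of that feature family (function name and pixel choices are illustrative):

```python
import numpy as np

def stf_feature(patch, op, a, ca, b=None, cb=None):
    """patch: HxWx3 colour patch; a, b: pixel coords; ca, cb: channels (0=r, 1=g, 2=b)."""
    A = float(patch[a][ca])
    B = float(patch[b][cb]) if b is not None else 0.0
    return {'value': A, 'sum': A + B,
            'diff': A - B, 'absdiff': abs(A - B)}[op]

patch = np.random.randint(0, 256, (21, 21, 3))
# e.g. a test like the example tree's root, "A[r] + B[r] > 363":
goes_right = stf_feature(patch, 'sum', (3, 4), 0, (10, 12), 0) > 363
```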
![Page 39: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/39.jpg)
Leaf Node Visualization
• Average of all training patches at each leaf node
(figure: leaf node visualizations for trees 1-5)
![Page 40: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/40.jpg)
STF Training Examples
• Supervised training
• Regular grid
• Random transformations
  – to learn invariances [Lepetit et al. 06]

(ground-truth colors denote categories)
![Page 41: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/41.jpg)
Different Levels of Supervision
• STF can be trained with:
  – no supervision (just the images)
    • clustering only, no local classification
  – weak supervision (image labels)
    • trained as if the image labels held at every pixel
  – full supervision (pixel labels)

(example image labels: tree, bench, grass)
![Page 42: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/42.jpg)
Balancing the Training Set
• Datasets often unbalanced
  – leads to poor average class accuracy
• Weight training examples by inverse class frequency (see the sketch below)

(chart: proportion of pixels by class on the MSRC dataset, over building, grass, tree, cow, sheep, sky, airplane, water, face, car, bike, flower, sign, bird, book, chair, road, cat, dog, body, boat)
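A sketch of the inverse-class-frequency weighting in Python; the clipping constant is an illustrative guard, not from the slides:

```python
import numpy as np

def inverse_frequency_weights(pixel_labels, n_classes):
    """Per-class training weight proportional to 1 / class pixel frequency."""
    freq = np.bincount(pixel_labels, minlength=n_classes).astype(float)
    freq /= freq.sum()
    return 1.0 / np.maximum(freq, 1e-12)   # guard against empty classes
```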
![Page 43: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/43.jpg)
Semantic Textons & Local Classification
(figure: test image; ground truth, for reference; semantic textons, colored by leaf node index; local classification, colored by most likely category. The local classification is already comparable to the ground truth.)

Live Demo
![Page 44: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/44.jpg)
MSRC Naïve Segmentation Baseline
• Use only the local classification P(c|l) from the STF

| STF training      | global accuracy | average accuracy |
|-------------------|-----------------|------------------|
| supervised        | 49.7%           | 34.5%            |
| weakly supervised | 14.8%           | 24.1%            |
![Page 45: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/45.jpg)
Bags of Semantic Textons (BoSTs)

(figure: for an image region r, two summaries are built across trees t1 … tT: a semantic texton histogram, giving the frequency of each tree node index over both split nodes and leaf nodes at depths 2-5; and a region prior, giving the probability of each object category averaged over all trees)
![Page 46: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/46.jpg)
Choice of Regions for BoSTs
• Image categorization
  – region r = whole image
• Object segmentation
  – many image regions r1, r2, r3, … around each pixel i
![Page 47: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/47.jpg)
Other Clustering Methods
• Efficient codebooks [Jurie et al. 05]
• Hyper-grid clustering [Tuytelaars et al. 07]
• Hierarchical k-means [Nister & Stewénius 06]
• Discriminant embedding [Hua et al. 07]
• Randomized clustering forests [Moosmann et al. 06]
  – tree hierarchy not used
  – ignores the classification ability of the forest
  – uses expensive local descriptors
![Page 48: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/48.jpg)
Object Recognition Pipeline
(pipeline overview repeated as a progress marker; see Page 33)
![Page 49: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/49.jpg)
Image Categorization
• SVM with a learned Pyramid Match Kernel (PMK)
  – descriptor space [Grauman et al. 05]
  – image location space [Lazebnik et al. 06]
• New PMK acts on the semantic texton histogram
  – matches P and Q in the learned hierarchical histogram space
  – deeper node matches are more important

(equation: new matches at depth d are weighted by a normalised depth weight; see the reconstruction below)
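A reconstruction of the kernel in the spirit of the pyramid match kernel [Grauman et al. 05], with the slide's "deeper node matches are more important" realised as a depth weight \(w_d\) that grows with \(d\) (the exact weights and the normalisation \(Z\) are my notation):

```latex
% histogram intersection over the tree nodes at depth d:
I_d(P, Q) \;=\; \sum_{n \,:\, \mathrm{depth}(n) = d} \min\big(P[n],\, Q[n]\big)
% matches whose deepest shared node is at depth d, weighted by w_d:
K(P, Q) \;=\; \frac{1}{Z} \sum_{d=1}^{D} w_d \,\big( I_d(P, Q) - I_{d+1}(P, Q) \big),
\qquad I_{D+1} \equiv 0
```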
![Page 50: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/50.jpg)
Categorization Experiments on MSRC
• Learned PMK vs. radial basis function (RBF)

| kernel      | Mean AP |
|-------------|---------|
| RBF         | 49.9    |
| learned PMK | 76.3    |

(plot: mean average precision against the number of trees T)

NB: mean average precision is a tougher metric than EER or AuC
![Page 51: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/51.jpg)
Object Recognition Pipeline
(pipeline overview repeated as a progress marker; see Page 33)
![Page 52: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/52.jpg)
Segmentation Forest
• Object segmentation
• Adapt TextonBoost [Shotton et al. 06]
  – boosted classifier → randomized decision forest
  – textons → semantic textons + region priors
  – no conditional random field

(example segmentation labels: bicycle, road, building)
![Page 53: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/53.jpg)
Features in Segmentation Forest
(figure: a feature at pixel i looks up an offset rectangle r and counts, within it, either a semantic texton bin (the frequency of a tree node index) or a region prior bin (the probability of an object category); the tree split function asks: bin count > learned threshold?)
![Page 54: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/54.jpg)
How the Features Work
• Pairing rectangles with semantic textons can capture appearance, layout, and textural context [Shotton et al. 07]

(figure: input image and its semantic texton map; feature1 = (r1, t1) and feature2 = (r2, t2) give different responses at pixels i1 … i4)
![Page 55: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/55.jpg)
Features in Segmentation Forest
• Learning the randomized forest
  – regular grid (10x10 pixels)
  – discriminative pairs of region r and BoST bin
• Region prior allows semantic context
  – "sheep tend to stand on grass"
• Efficient calculation
  – compute bins only as required
  – use integral images [Viola & Jones 04] (see the sketch below)
  – sub-sample the integral images

Live Demo
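A sketch of the integral-image trick the slide refers to: precompute a 2D cumulative sum per BoST bin so that the bin count inside any offset rectangle costs four lookups [Viola & Jones 04] (helper names are illustrative):

```python
import numpy as np

def integral_image(bin_map):
    """bin_map: HxW 0/1 map marking pixels assigned to one BoST bin."""
    ii = bin_map.cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))    # zero row/col makes lookups uniform

def rect_count(ii, top, left, bottom, right):
    """Bin count inside rows [top, bottom) x cols [left, right): four lookups."""
    return (ii[bottom, right] - ii[top, right]
            - ii[bottom, left] + ii[top, left])
```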
![Page 56: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/56.jpg)
Image-Level Prior (ILP)

• Combine
  – image categorization (SVM with learned PMK)
  – object segmentation (decision forest)

(equation: the segmentation posterior combines the SF output with the image categorization result, which acts as a prior for segmentation, via a weighting; see the reconstruction below)

See also [Verbeek et al. 07]
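A plausible form of the combination, with \(\alpha\) as the slide's weighting (my notation; the SF output at pixel i is multiplied by the powered image-level prior):

```latex
P(c \mid i) \;\propto\; P_{\mathrm{SF}}(c \mid i)\;\cdot\; P_{\mathrm{ILP}}(c)^{\alpha}
```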
![Page 57: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/57.jpg)
MSRC Segmentation Results
(figure rows: test image, SF, SF + ILP; category legend: building, grass, tree, cow, sheep, sky, airplane, water, face, car, bicycle, flower, sign, bird, book, chair, road, cat, dog, body, boat)
![Page 58: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/58.jpg)
More MSRC Results
(category legend: building, grass, tree, cow, sheep, sky, airplane, water, face, car, bicycle, flower, sign, bird, book, chair, road, cat, dog, body, boat)
![Page 59: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/59.jpg)
MSRC Quantitative Comparison
• Pixel-wise segmentation accuracy (%); red = winner in the original slides

| method | Building | Grass | Tree | Cow | Sheep | Sky | Airplane | Water | Face | Car | Bicycle | Flower | Sign | Bird | Book | Chair | Road | Cat | Dog | Body | Boat | Global | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [Shotton 06] | 62 | 98 | 86 | 58 | 50 | 83 | 60 | 53 | 74 | 63 | 75 | 63 | 35 | 19 | 92 | 15 | 86 | 54 | 19 | 62 | 7 | 71 | 58 |
| [Verbeek 07] | 52 | 87 | 68 | 73 | 84 | 94 | 88 | 73 | 70 | 68 | 74 | 89 | 33 | 19 | 78 | 34 | 89 | 46 | 49 | 54 | 31 | - | 64 |
| SF | 41 | 84 | 75 | 89 | 93 | 79 | 86 | 47 | 87 | 65 | 72 | 61 | 36 | 26 | 91 | 50 | 70 | 72 | 31 | 61 | 14 | 68 | 63 |
| SF + ILP | 49 | 88 | 79 | 97 | 97 | 78 | 82 | 54 | 87 | 74 | 72 | 74 | 36 | 24 | 93 | 51 | 78 | 75 | 35 | 66 | 18 | 72 | 67 |

• Computation time

| method | Training Time | Test Time |
|---|---|---|
| [Shotton 06] | 2 days | 30 sec / image |
| [Verbeek 07] | 1 hr | 2 sec / image |
| SF | 2 hrs | < 0.125 sec / image |
![Page 60: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/60.jpg)
MSRC Quantitative Comparison

• Pixel-wise segmentation accuracy (%), now including [Tu 08]; red = winner in the original slides

| method | Building | Grass | Tree | Cow | Sheep | Sky | Airplane | Water | Face | Car | Bicycle | Flower | Sign | Bird | Book | Chair | Road | Cat | Dog | Body | Boat | Global | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [Shotton 06] | 62 | 98 | 86 | 58 | 50 | 83 | 60 | 53 | 74 | 63 | 75 | 63 | 35 | 19 | 92 | 15 | 86 | 54 | 19 | 62 | 7 | 71 | 58 |
| [Verbeek 07] | 52 | 87 | 68 | 73 | 84 | 94 | 88 | 73 | 70 | 68 | 74 | 89 | 33 | 19 | 78 | 34 | 89 | 46 | 49 | 54 | 31 | - | 64 |
| SF | 41 | 84 | 75 | 89 | 93 | 79 | 86 | 47 | 87 | 65 | 72 | 61 | 36 | 26 | 91 | 50 | 70 | 72 | 31 | 61 | 14 | 68 | 63 |
| SF + ILP | 49 | 88 | 79 | 97 | 97 | 78 | 82 | 54 | 87 | 74 | 72 | 74 | 36 | 24 | 93 | 51 | 78 | 75 | 35 | 66 | 18 | 72 | 67 |
| [Tu 08] | 69 | 96 | 87 | 78 | 80 | 95 | 83 | 67 | 84 | 70 | 79 | 47 | 61 | 30 | 80 | 45 | 78 | 68 | 52 | 67 | 27 | 78 | 69 |

• Computation time

| method | Training Time | Test Time |
|---|---|---|
| [Shotton 06] | 2 days | 30 sec / image |
| [Verbeek 07] | 1 hr | 2 sec / image |
| SF | 2 hrs | < 0.125 sec / image |
| [Tu 08] | a few days | 30-70 sec / image |
![Page 61: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/61.jpg)
MSRC Influence of Design Decisions

• MSRC dataset; category average accuracy

| variant | average accuracy |
|---|---|
| only leaf node bins | 64.1% |
| all tree node bins | 65.5% |
| only region prior bins | 66.1% |
| full model | 66.9% |
| no transformations | 64.4% |
| unsupervised STF | 64.2% |
| weakly supervised STF | 64.6% |
![Page 62: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/62.jpg)
VOC 2007 Segmentation Results
(figure: example VOC 2007 segmentations with table, person, dog, chair)

| method | Average Accuracy |
|---|---|
| [Brookes] | 9 |
| SF | 20 |
| SF + Image-Level Prior | 24 |
| [TKK] | 30 |
| SF + Detection-Level Prior | 42 |

• Detection-Level Prior
  – [TKK] detection bounding boxes used as a segmentation prior
![Page 63: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/63.jpg)
Driving Video Database
• [Brostow, Shotton, Fauqueur, Cipolla, ECCV 2008]
  – new structure-from-motion cues can improve object segmentation

(figure: test image, ground truth (!), STF + SF result)
![Page 64: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/64.jpg)
Semantic Texton Forests Summary
• Semantic texton forests
  – an effective alternative to textons for recognition
• Image categorization improves segmentation
  – "image-level prior"
  – can use identical image features
• Memory
  – high memory requirements for training
• Efficiency
  – very fast on CPU
![Page 65: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/65.jpg)
References (red = most relevant in the original slides)

• Amit & Geman. Shape Quantization and Recognition with Randomized Trees. Neural Computation 1997.
• Bosch et al. Image Classification using Random Forests and Ferns. ICCV 2007.
• Breiman. Random Forests. Machine Learning Journal 2001.
• Brostow et al. To appear, ECCV 2008.
• Csurka et al. Visual Categorization with Bags of Keypoints. ECCV Workshop on Statistical Learning in Computer Vision, 2004.
• Geurts et al. Extremely Randomized Trees. Machine Learning 2006.
• Grauman & Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. ICCV 2005.
• Hua et al. Discriminant Embedding for Local Image Descriptors. ICCV 2007.
• Jurie & Triggs. Creating Efficient Codebooks for Visual Recognition. ICCV 2005.
• Lazebnik et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR 2006.
• Lepetit et al. Keypoint Recognition using Randomized Trees. PAMI 2006.
• Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV 2004.
• Malik et al. Contour and Texture Analysis for Image Segmentation. IJCV 2001.
• Mikolajczyk & Schmid. Scale and Affine Invariant Interest Point Detectors. IJCV 2004.
• Moosmann et al. Fast Discriminative Visual Codebooks using Randomized Clustering Forests. NIPS 2006.
• Nister & Stewenius. Scalable Recognition with a Vocabulary Tree. CVPR 2006.
• Özuysal et al. Fast Keypoint Recognition in Ten Lines of Code. CVPR 2007.
• Sharp. To appear, ECCV 2008.
• Shotton et al. Semantic Texton Forests for Image Categorization and Segmentation. CVPR 2008.
• Shotton et al. TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context. IJCV 2007.
• Sivic & Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV 2003.
• Torralba et al. Sharing Visual Features for Multiclass and Multiview Object Detection. PAMI 2007.
• Tu. Auto-context and Its Application to High-level Vision Tasks. CVPR 2008.
• Tuytelaars & Schmid. Vector Quantizing Feature Space with a Regular Lattice. ICCV 2007.
• Varma & Zisserman. A Statistical Approach to Texture Classification from Single Images. IJCV 2005.
• Verbeek & Triggs. Region Classification with Markov Field Aspect Models. CVPR 2007.
• Viola & Jones. Robust Real-time Object Detection. IJCV 2004.
• Winn et al. Object Categorization by Learned Universal Visual Dictionary. ICCV 2005.
![Page 66: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/66.jpg)
Take Home Message
• Randomized decision forests are
  – very fast (GPU friendly [Sharp, ECCV 08])
  – simple to implement
  – flexible tools for computer vision

• Ideas for more research
  – biasing the randomness
  – optimal fusion of trees from different modalities
    • e.g. appearance, SfM, optical flow
![Page 67: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/67.jpg)
Thank you!
[email protected]
http://jamie.shotton.org/work/presentations/ICVSS2008.zip
Internships at MSRC available for next year. Talk to me or see:
http://research.microsoft.com/aboutmsr/jobs/internships/about_uk.aspx
![Page 68: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/68.jpg)
Example Tree in Segmentation Forest
Maximum Depth 14
![Page 69: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/69.jpg)
More Results
(figure rows: test image, no ILP, with ILP; object classes: sky, mountain, tree, road, sidewalk, building, grass, rock, sand, plant, car, snow, sign, water, person)
![Page 70: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/70.jpg)
Effect of Color Space
![Page 71: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/71.jpg)
MSRC Categorization Results
(precision-recall curves, recall on the x-axis and precision on the y-axis, both from 0.0 to 1.0: categories 1-5 (building, grass, tree, cow, sheep) and categories 6-10 (sky, aeroplane, water, face, car))
![Page 72: ICVSS2008: Randomized Decision Forests](https://reader033.vdocument.in/reader033/viewer/2022061108/544d5c9db1af9f1a4d8b4576/html5/thumbnails/72.jpg)
MSRC Categorization Results
(precision-recall curves, recall on the x-axis and precision on the y-axis, both from 0.0 to 1.0: categories 11-15 (bicycle, flower, sign, bird, book) and categories 16-21 (chair, road, cat, dog, body, boat))