ivan laptev irisa/inria, rennes, france december 08, 2006 boosted histograms for improved object...
Ivan Laptev
IRISA/INRIA, Rennes, France
December 08, 2006
Boosted Histograms for Improved Object Detection
Histograms for object recognition
Remarkable success of recognition methods using histograms of local image measurements:
• [Swain & Ballard 1991] - Color histograms
• [Schiele & Crowley 1996] - Receptive field histograms
• [Lowe 1999] - Localized orientation histograms (SIFT)
• [Schneiderman & Kanade 2000] - Localized histograms of wavelet coefficients
• [Leung & Malik 2001] - Texton histograms
• [Belongie et al. 2002] - Shape context
• [Dalal & Triggs 2005] - Dense orientation histograms
Likely explanation: histograms are robust to image variations such as limited geometric transformations and object class variability.
Histograms: What vs. Where
What to measure?
• Color [SB91]
• Gaussian derivatives [SC96]
• Wavelet coefficients [SK00]
• Textons [LM01]
• Gradient orientation [L99, DT05]
Where to measure?
• Whole image [SB91, SC96]
• Pre-defined grid [SK00, BMP02, DT05]
• Key points [L99]
Limitations:
• No guarantee for optimal recognition
• Different regions may have different discriminative power
Idea
• Efficient discriminative classifier [Freund & Schapire ’97]
• Good performance for face detection [Viola & Jones ’01]
[Diagram: AdaBoost. Labels: boosting, selected features, weak classifier. Features: Haar features, histogram features. Weak classifiers: optimal threshold, Fisher discriminant, SVM, ANN, 1-bin classifier]
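The boosting idea above can be illustrated with a minimal sketch: discrete AdaBoost that, in each round, fits a Fisher linear discriminant (FLD) on every candidate histogram feature and keeps the one with the lowest weighted error. This is an illustration under stated assumptions, not the implementation behind the slides: the function names, the midpoint threshold and the small regularisation term are all hypothetical choices.

```python
import numpy as np

def fit_fld(X, y, w):
    """Weighted Fisher linear discriminant on one histogram feature.
    X: (n_samples, n_bins), y in {+1, -1}, w: sample weights.
    Returns a projection direction d and a threshold t."""
    pos, neg = y == 1, y == -1
    wp, wn = w[pos] / w[pos].sum(), w[neg] / w[neg].sum()
    mp, mn = wp @ X[pos], wn @ X[neg]          # weighted class means
    # Weighted within-class scatter; the 1e-6 ridge is an assumption
    # added only to keep the matrix invertible.
    Sp = (X[pos] - mp).T @ (wp[:, None] * (X[pos] - mp))
    Sn = (X[neg] - mn).T @ (wn[:, None] * (X[neg] - mn))
    Sw = Sp + Sn + 1e-6 * np.eye(X.shape[1])
    d = np.linalg.solve(Sw, mp - mn)
    t = 0.5 * (d @ mp + d @ mn)                # midpoint threshold (simplification)
    return d, t

def predict_fld(X, d, t):
    return np.where(X @ d > t, 1, -1)

def adaboost(features, y, rounds):
    """features: list of (n_samples, n_bins) arrays, one per candidate region.
    Returns a list of (alpha, feature_index, d, t) weak classifiers."""
    w = np.full(len(y), 1.0 / len(y))
    strong = []
    for _ in range(rounds):
        best = None
        for k, X in enumerate(features):
            d, t = fit_fld(X, y, w)
            pred = predict_fld(X, d, t)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, k, d, t, pred)
        err, k, d, t, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)         # up-weight misclassified samples
        w /= w.sum()
        strong.append((alpha, k, d, t))
    return strong

def classify(strong, features):
    score = sum(a * predict_fld(features[k], d, t) for a, k, d, t in strong)
    return np.where(score > 0, 1, -1)
```

Each boosting round selects one region, so the number of rounds bounds the number of selected features.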
Histogram features
~10^5 rectangle features
Histograms over 4 gradient orientations, 4 subdivisions for each rectangle
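As a rough illustration of one such feature, the sketch below computes magnitude-weighted histograms over 4 unsigned gradient orientations inside a rectangle. The slide does not say how the 4 subdivisions are laid out; a 2x2 grid of sub-cells is assumed here, and the function name, the (x, y, w, h) rectangle format and the L1 normalisation are hypothetical choices.

```python
import numpy as np

def orientation_histogram(gray, rect, n_bins=4, grid=(2, 2)):
    """Gradient-orientation histograms for one rectangle.
    gray: 2-D image array; rect: (x, y, w, h) in pixels.
    ASSUMPTION: the rectangle is split into a grid[0] x grid[1] grid
    of sub-cells, giving grid[0] * grid[1] * n_bins values."""
    x, y, w, h = rect
    gy, gx = np.gradient(gray.astype(float))   # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), quantised into n_bins.
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    ys = np.linspace(y, y + h, grid[0] + 1).astype(int)
    xs = np.linspace(x, x + w, grid[1] + 1).astype(int)
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            b = bins[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
            m = mag[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
            feats.append(np.bincount(b, weights=m, minlength=n_bins))
    f = np.concatenate(feats)
    s = f.sum()
    return f / s if s > 0 else f               # L1 normalisation (assumption)
```

For example, `orientation_histogram(img, (0, 0, 16, 16))` returns a 16-dimensional vector for a 16x16 rectangle at the image origin.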
Training data
• Crop and resize
• Perturb annotation
• Increase training set ×10
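The augmentation steps above might look like the following sketch: each annotated box is jittered in position and scale, then the perturbed region is cropped and resized to a fixed window. The jitter ranges, the 32x32 output size, the nearest-neighbour resampling and both function names are assumptions made for illustration; the slides give only the ×10 factor.

```python
import numpy as np

def perturb_boxes(box, n_copies=10, shift=0.05, scale=0.05, seed=0):
    """Return n_copies jittered versions of box = (x, y, w, h).
    shift and scale are relative jitter ranges (hypothetical values)."""
    rng = np.random.default_rng(seed)
    x, y, w, h = box
    out = []
    for _ in range(n_copies):
        s = 1.0 + rng.uniform(-scale, scale)   # isotropic scale change
        dx = rng.uniform(-shift, shift) * w
        dy = rng.uniform(-shift, shift) * h
        nw, nh = w * s, h * s
        cx, cy = x + w / 2 + dx, y + h / 2 + dy
        out.append((cx - nw / 2, cy - nh / 2, nw, nh))
    return out

def crop_resize(img, box, size=(32, 32)):
    """Nearest-neighbour crop of box from img, resized to size (rows, cols).
    Sample positions outside the image are clamped to the border."""
    x, y, w, h = box
    rows = (y + (np.arange(size[0]) + 0.5) * h / size[0]).astype(int)
    cols = (x + (np.arange(size[1]) + 0.5) * w / size[1]).astype(int)
    rows = np.clip(rows, 0, img.shape[0] - 1)
    cols = np.clip(cols, 0, img.shape[1] - 1)
    return img[np.ix_(rows, cols)]
```

Calling `crop_resize(img, b)` for every `b` in `perturb_boxes(annotation)` turns one annotated object into ten slightly different training windows.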
Training: Selected Features
• 376 of ~10^5 features selected
• 0.999 correct classification
• 10^-5 false positives
Object detection
• Scan and classify image windows at different positions and scales
• Cluster detections in position-scale space
• Use the cluster size as the detection confidence
[Figure label: Conf.=5]
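A minimal sketch of this detection loop follows. It enumerates windows over positions and scales, then groups positively classified windows and reports the group size as confidence. The slides do not describe the clustering method; a greedy grouping by bounding-box overlap is swapped in here as a simple stand-in, and the window size, scale step, stride and function names are hypothetical.

```python
import numpy as np

def scan_windows(img_w, img_h, base=(32, 32), scale_step=1.25, stride=0.25):
    """Yield (x, y, w, h) windows at different positions and scales."""
    s = 1.0
    while base[0] * s <= img_w and base[1] * s <= img_h:
        w, h = int(base[0] * s), int(base[1] * s)
        step = max(1, int(stride * w))
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                yield (x, y, w, h)
        s *= scale_step

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)

def cluster_detections(boxes, min_overlap=0.5):
    """Greedy grouping of positive windows (ASSUMPTION: overlap-based).
    Returns a list of (mean_box, confidence), confidence = cluster size."""
    clusters = []
    for b in boxes:
        for c in clusters:
            if iou(b, np.mean(c, axis=0)) >= min_overlap:
                c.append(b)
                break
        else:
            clusters.append([b])
    return [(tuple(np.mean(c, axis=0)), len(c)) for c in clusters]
```

In use, every window from `scan_windows` would be passed to the boosted classifier, and only the accepted windows handed to `cluster_detections`.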
PASCAL Visual Object Classes Challenge 2005 (VOC’05)
• motorbikes: #217 / #220
• bicycles: #123 / #123
• people: #152 / #149
• cars: #320 / #341
Evaluation criteria
Ground truth annotation
Detection results:
• >50% overlap of bounding box with ground truth (GT)
• One bounding box for each object
• Confidence value for each detection
Precision-Recall (PR) curve
Average Precision (AP) value
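The criteria above can be sketched in code. The formulas themselves did not survive in the transcript, so this sketch assumes the overlap is intersection over union and that AP is the 11-point interpolated average precision commonly associated with the early VOC challenges; the function names and the single-image input format are hypothetical.

```python
import numpy as np

def overlap(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes (assumed measure)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(detections, gt_boxes, min_overlap=0.5):
    """detections: list of (confidence, box); gt_boxes: list of boxes.
    A detection is correct if it overlaps an unmatched GT box by more
    than min_overlap; extra detections of the same object count as false."""
    dets = sorted(detections, key=lambda d: -d[0])
    matched, tp = set(), []
    for _, box in dets:
        ious = [overlap(box, g) for g in gt_boxes]
        j = int(np.argmax(ious)) if ious else -1
        ok = j >= 0 and ious[j] > min_overlap and j not in matched
        if ok:
            matched.add(j)
        tp.append(1.0 if ok else 0.0)
    ctp = np.cumsum(np.array(tp))
    recall = ctp / max(len(gt_boxes), 1)
    precision = ctp / np.arange(1, len(tp) + 1)
    # 11-point interpolation: best precision at recall >= r, r = 0, 0.1, ..., 1.
    ap = 0.0
    for r in np.linspace(0, 1, 11):
        p = precision[recall >= r]
        ap += (p.max() if p.size else 0.0) / 11
    return ap
```

Sorting by confidence is what makes the confidence value matter: moving a false detection below the true ones raises the AP even though the set of detections is unchanged.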
Evaluation of detection
PR-curves for the “Motorbike” validation dataset
[Figure legend: FLD learner, + 1-bin classifier]
[Levi and Weiss, CVPR 2004] “Learning object detection from a small number of examples: The importance of good features”
Results for VOC’05 Challenge
[Figure panels: Bicycles test1, People test1, Cars test1, Motorbikes test1]
Results for VOC’05 Challenge
[Table: Average Precision values]
PASCAL Visual Object Classes Challenge 2006 (VOC’06)
[Figure: examples]
Results for VOC’06 Challenge
Competition "comp3" (train on VOC data), one slide per class:
• Class "bicycle"
• Class "cow"
• Class "horse"
• Class "motorbike"
• Class "person"
Results for VOC’06 Challenge
Average Precision values:
Method        bicycle  bus    car    cat    cow    dog    horse  motorbike  person  sheep
Cambridge     0.249    0.138  0.254  0.151  0.149  0.118  0.091  0.178      0.030   0.131
ENSMP         -        -      0.398  -      0.159  -      -      -          -       -
INRIA_Douze   0.414    0.117  0.444  -      0.212  -      -      0.390      0.164   0.251
INRIA_Laptev  0.440    -      -      -      0.224  -      0.140  0.318      0.114   -
TUD           -        -      -      -      -      -      -      0.153      0.074   -
TKK           0.303    0.169  0.222  0.160  0.252  0.113  0.137  0.265      0.039   0.227
How “interesting” are boosted regions?
[Figure: Affine Harris regions, Boosted regions]
• Harris value is not a significant measure for boosted regions
• “Interest regions” are not necessarily good for recognition?
Final Notes
• All results are obtained with a single set of parameters
• A small number of training samples is sufficient
• Efficient detection: 10 fps on 320x280 images
• Extension to texton/color histogram features is straightforward
Open questions:
• Are other free-shape regions better? How to find them?
• A better weak learner that takes advantage of histogram properties
• View transformations