IntroductionClassifiersBoosting
OptimizingBinarization
Questions
Detecting and Reading Text in Natural Scenes
Louka Dlagnekov
October 19, 2004
X. Chen, A. L. Yuille
Louka Dlagnekov Detecting and Reading Text in Natural Scenes
Outline
Goals
Classifiers
Boosting
Optimizing
Binarization
Questions
Introduction
Given an image of an outdoor scene, the goals are to:
Identify regions where there is text
Extract the text using OCR
Convey this information to a blind person
Example
Main Ideas
Build a strong classifier trained by AdaBoost
Apply classifier to sub-regions of the image
Binarize candidate text regions
Use OCR on binarized candidate text regions
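The second step, applying the classifier to sub-regions, can be sketched as a plain sliding-window scan. This is a minimal single-scale illustration, not the authors' implementation: `sub_windows`, `detect`, the window size, and the step are all hypothetical names and values, and `classifier` stands for any function that returns True for a text-like window.

```python
def sub_windows(height, width, win_h, win_w, step):
    """Yield (top, left, bottom, right) for every window that fits in the image."""
    for top in range(0, height - win_h + 1, step):
        for left in range(0, width - win_w + 1, step):
            yield top, left, top + win_h, left + win_w


def detect(image, classifier, win_h, win_w, step):
    """Run `classifier` on each sub-window of a 2-D list of pixel intensities."""
    hits = []
    for top, left, bottom, right in sub_windows(len(image), len(image[0]),
                                                win_h, win_w, step):
        window = [row[left:right] for row in image[top:bottom]]
        if classifier(window):
            hits.append((top, left, bottom, right))
    return hits


# Toy usage: flag windows that contain at least one dark pixel (assumed rule).
image = [[255, 255, 255, 255, 255, 255],
         [255, 255, 255, 255, 255, 255],
         [255, 255, 255, 0, 255, 255],
         [255, 255, 255, 255, 255, 255]]
has_dark = lambda window: min(min(row) for row in window) < 128
print(detect(image, has_dark, win_h=2, win_w=3, step=2))
```

A real detector would repeat the scan at several window sizes; only one scale is shown here to keep the sketch short.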
Results
Reasonably fast – 3 seconds on 3MP image
2.8% false negatives
10% false positives
93% accuracy of OCR
General Idea
Find good features of regions containing text that set them apart from other regions
Construct classifiers that classify using these features
Each classifier can be weak (accuracy only slightly higher than 50%)
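To make "weak" concrete, here is a minimal sketch of a one-feature threshold classifier (a decision stump). The feature values, labels, and threshold below are made up for illustration and do not come from the talk.

```python
def make_stump(threshold, polarity=1):
    """Return a weak classifier mapping one feature value to +1 (text) or -1."""
    def h(feature_value):
        return 1 if polarity * (feature_value - threshold) > 0 else -1
    return h


def accuracy(h, values, labels):
    """Fraction of examples on which the classifier agrees with the label."""
    correct = sum(1 for v, y in zip(values, labels) if h(v) == y)
    return correct / len(labels)


# Hypothetical feature values, one per image window, with +1 = text.
values = [0.9, 0.7, 0.2, 0.6, 0.1, 0.8, 0.3, 0.4]
labels = [1, 1, 1, -1, -1, 1, 1, -1]

h = make_stump(threshold=0.5)
print(accuracy(h, values, labels))  # 0.625: better than chance, far from perfect
```

A classifier like this is useless on its own but is exactly the kind of building block boosting can combine.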
How do we come up with text features in image? [1/3]
Text has few common features with faces
PCA Analysis leads to far more non-zero eigenvalues
1 http://www.geop.ubc.ca/CDSST/eigenfaces.html
How do we come up with text features in image? [2/3]
Examine x and y derivatives
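The slide does not say how the derivatives are summarized, so the following is only a sketch under an assumption: it uses the mean absolute finite difference in each direction as a per-window feature. The function name `derivative_features` is hypothetical.

```python
def derivative_features(img):
    """Mean absolute x and y finite differences of a 2-D list of intensities."""
    rows, cols = len(img), len(img[0])
    dx = [abs(img[r][c + 1] - img[r][c])
          for r in range(rows) for c in range(cols - 1)]
    dy = [abs(img[r + 1][c] - img[r][c])
          for r in range(rows - 1) for c in range(cols)]
    return sum(dx) / len(dx), sum(dy) / len(dy)


# Vertical strokes (as in many characters) give a large x derivative
# and no y derivative; a flat background gives neither.
stripes = [[0, 255, 0, 255] for _ in range(3)]
flat = [[128, 128, 128, 128] for _ in range(3)]
print(derivative_features(stripes))  # (255.0, 0.0)
print(derivative_features(flat))     # (0.0, 0.0)
```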
How do we come up with text features in image? [3/3]
Histogram of pixel intensities
Edge detection ⇒ intensity gradient thresholding ⇒ edge linking
What is boosting?
Start with an example:
junk email classification
Basic idea:
finding many rough rules of thumb is easier than finding a single highly accurate rule
History on boosting
Gist of AdaBoost
The AdaBoost algorithm is used to combine weak classifiers to make a strong classifier
Each weak classifier produces a yes/no answer for a particular feature
Need to come up with lots of features and let AdaBoost decide how to combine them
Algorithm
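Given a set of examples $(x_1, y_1), \ldots, (x_N, y_N)$, where $x_i \in X$ and $y_i \in Y = \{-1, +1\}$:
1 Initialize $D_1(i) = 1/N$
2 For $t = 1, \ldots, T$:
    1 Train weak classifier using distribution $D_t$
    2 Obtain weak hypothesis $h_t : X \to \{-1, +1\}$ with error $\epsilon_t$
    3 Choose $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
    4 Update distribution:
      $$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases} = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$
3 Output final hypothesis: $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$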
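The loop above can be sketched in a few lines of plain Python. This is an illustrative toy, not the implementation used by Chen and Yuille: the weak learner is assumed to be a threshold stump on a single scalar feature, and the names `train_stump`, `stump_predict`, `adaboost`, and `strong_classify` are hypothetical.

```python
import math


def stump_predict(x, thr, pol):
    """Weak hypothesis h(x): `pol` if x >= thr, otherwise -pol."""
    return pol if x >= thr else -pol


def train_stump(xs, ys, dist):
    """Return (weighted error, threshold, polarity) of the best stump under dist."""
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(w for w, x, y in zip(dist, xs, ys)
                      if stump_predict(x, thr, pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best


def adaboost(xs, ys, rounds):
    n = len(xs)
    dist = [1.0 / n] * n                        # D_1(i) = 1/N
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, dist)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # D_{t+1}(i) is proportional to D_t(i) * exp(-alpha * y_i * h_t(x_i)).
        dist = [w * math.exp(-alpha * y * stump_predict(x, thr, pol))
                for w, x, y in zip(dist, xs, ys)]
        z = sum(dist)                           # normalizer Z_t
        dist = [w / z for w in dist]
        ensemble.append((alpha, thr, pol))
    return ensemble


def strong_classify(ensemble, x):
    """H(x) = sign(sum_t alpha_t * h_t(x))."""
    score = sum(a * stump_predict(x, thr, pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data: no single threshold separates the classes (pattern + + - - + +).
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([strong_classify(ensemble, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

On this toy set the best single stump gets two of six examples wrong, while three boosted stumps classify all six correctly.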
Error
$Z_t = \sum_i D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}$
Upper bound on training error of strong classifier is $\prod_t Z_t$.
Minimizing $\prod_t Z_t$ therefore minimizes this bound on the overall classification error
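A short worked derivation may help connect this bound to the choice of $\alpha_t$ in the algorithm. It assumes binary weak hypotheses $h_t(x_i) \in \{-1, +1\}$ and writes the normalizer with $\alpha_t$ explicit; $\epsilon_t$ is the weighted error of $h_t$ under $D_t$.

```latex
\begin{align*}
Z_t &= \sum_{i:\,h_t(x_i) = y_i} D_t(i)\, e^{-\alpha_t}
     + \sum_{i:\,h_t(x_i) \neq y_i} D_t(i)\, e^{\alpha_t}
     = (1 - \epsilon_t)\, e^{-\alpha_t} + \epsilon_t\, e^{\alpha_t} \\
\frac{\partial Z_t}{\partial \alpha_t} &= -(1 - \epsilon_t)\, e^{-\alpha_t} + \epsilon_t\, e^{\alpha_t} = 0
  \;\Longrightarrow\; e^{2\alpha_t} = \frac{1 - \epsilon_t}{\epsilon_t}
  \;\Longrightarrow\; \alpha_t = \tfrac{1}{2} \ln\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right) \\
Z_t &= 2\sqrt{\epsilon_t (1 - \epsilon_t)} \;\le\; 1,
  \quad \text{with equality only when } \epsilon_t = \tfrac{1}{2}
\end{align*}
```

So every weak classifier that is even slightly better than chance multiplies the bound by a factor strictly below 1.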
Problems with AdaBoost
Minimizes classification error, not the number of false negatives.
To "fix" this, Viola and Jones propose modifying the distribution Dt to give more weight to positive examples.
They call this modification Asymmetric AdaBoost.
Cascade Classification
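[Figure: cascade of classifiers. Each sub-window enters stage 1; a T (true) result passes it on to stage 2, then stage 3, then further processing; an F (false) result at any stage rejects the sub-window.]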
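The early-rejection logic of the cascade can be sketched as follows. The stage functions and thresholds are invented for illustration; in a real cascade each stage would itself be a boosted classifier, ordered from cheapest to most expensive.

```python
def cascade_classify(stages, window):
    """stages: list of (score_fn, threshold) pairs, cheapest stage first."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False   # F: reject the sub-window, skip all later stages
    return True            # T at every stage: hand over for further processing


# Hypothetical two-stage cascade over a flat list of pixel intensities.
stages = [
    (lambda w: sum(w) / len(w), 50),    # stage 1: mean brightness
    (lambda w: max(w) - min(w), 100),   # stage 2: contrast
]
print(cascade_classify(stages, [0, 255, 0, 255]))      # True
print(cascade_classify(stages, [10, 10, 10, 10]))      # False (fails stage 1)
print(cascade_classify(stages, [120, 130, 120, 130]))  # False (fails stage 2)
```

Because most sub-windows in a natural image contain no text, most of them exit at the first, cheapest stage.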
Integral Images
First 3 layers of the Chen and Yuille cascade use only mean, STD, and derivative features
These are easily calculated from integral images
Integral Images
[Figure: four adjacent rectangles A (top left), B (top right), C (bottom left), and D (bottom right), with corner locations 1, 2, 3, and 4.]
Figure 3: The sum of the pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A + B, at location 3 is A + C, and at location 4 is A + B + C + D. The sum within D can be computed as 4 + 1 − (2 + 3).
sparse (or can be made so). A similar insight is that an invertible linear operation can be applied to $f$ if its
inverse is applied to $g$:
$$(f'') * \left(\iint g\right) = f * g$$
Viewed in this framework computation of the rectangle sum can be expressed as a dot product, $i \cdot r$, where
$i$ is the image and $r$ is the box car image (with value 1 within the rectangle of interest and 0 outside). This
operation can be rewritten
$$i \cdot r = \left(\iint i\right) \cdot r''$$
The integral image is in fact the double integral of the image (first along rows and then along columns). The
second derivative of the rectangle (first in row and then in column) yields four delta functions at the corners
of the rectangle. Evaluation of the second dot product is accomplished with four array accesses.
2.2 Feature Discussion
Rectangle features are somewhat primitive when compared with alternatives such as steerable filters [5, 7].
Steerable filters, and their relatives, are excellent for the detailed analysis of boundaries, image compression,
and texture analysis. In contrast rectangle features, while sensitive to the presence of edges, bars, and
other simple image structure, are quite coarse. Unlike steerable filters the only orientations available are
vertical and horizontal. It appears as though the set of rectangle features do however provide a rich image
representation which supports effective learning. The extreme computational efficiency of rectangle features
provides ample compensation for their limited flexibility.
D = I4 + I1 − (I2 + I3)
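A minimal sketch of the idea in plain Python: build the integral image once, then read any rectangle sum with four lookups, matching D = I4 + I1 − (I2 + I3). The helper names are hypothetical, and the mean/STD helper assumes a second integral image built from squared pixel values.

```python
import math


def integral_image(img):
    """ii[r][c] = sum of img over rows < r and columns < c (zero-padded border)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1: I4 + I1 - (I2 + I3)."""
    return ii[bottom][right] + ii[top][left] - (ii[top][right] + ii[bottom][left])


def window_mean_std(ii, ii_sq, top, left, bottom, right):
    """Mean and standard deviation of a window from two integral images."""
    n = (bottom - top) * (right - left)
    mean = rect_sum(ii, top, left, bottom, right) / n
    var = rect_sum(ii_sq, top, left, bottom, right) / n - mean * mean
    return mean, math.sqrt(max(var, 0.0))


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
ii_sq = integral_image([[v * v for v in row] for row in img])
print(rect_sum(ii, 1, 1, 3, 3))                # 28 = 5 + 6 + 8 + 9
print(window_mean_std(ii, ii_sq, 0, 0, 3, 3))
```

The cost of each rectangle sum is constant regardless of window size, which is why mean and STD features are cheap enough for the first cascade layers.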
Binarization
Use Niblack’s adaptive binarization method
$T_r(x) = \mu_r(x) + k\,\sigma_r(x)$
with a modification:
$r(x) = \min_r \left(\sigma_r(x) > T_\sigma\right)$
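One reading of the two formulas, sketched in plain Python: for each pixel, grow the window radius r until the local standard deviation exceeds T_σ, then threshold the pixel at µ + kσ computed over that window. The values of `k`, `t_sigma`, and `max_radius`, and the use of square windows, are assumptions for illustration; the slide does not give them.

```python
import math


def local_stats(img, r0, c0, radius):
    """Mean and standard deviation over a square window clipped to the image."""
    rows, cols = len(img), len(img[0])
    vals = [img[r][c]
            for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1))
            for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1))]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, math.sqrt(var)


def niblack_adaptive(img, k=-0.2, t_sigma=10.0, max_radius=8):
    """Return a 0/1 image: 1 where the pixel is brighter than its local threshold."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Smallest radius whose local std exceeds t_sigma; if none does,
            # the statistics of the largest radius tried are used.
            for radius in range(1, max_radius + 1):
                mean, std = local_stats(img, r, c, radius)
                if std > t_sigma:
                    break
            out[r][c] = 1 if img[r][c] > mean + k * std else 0
    return out


# Toy image: one dark vertical stroke on a bright background.
img = [[200, 200, 20, 200, 200] for _ in range(5)]
for row in niblack_adaptive(img):
    print(row)
```

The brute-force window statistics keep the sketch short; in practice they could be read from integral images instead.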
Questions [1/3]
Is AdaBoost the best thing out there for detecting objects? Does it not depend on features used?

Detector                False detections
                        10      31      50      65      78       95      110     167     422
Viola-Jones             78.3%   85.2%   88.8%   89.8%   90.1%    90.8%   91.1%   91.8%   93.7%
Rowley-Baluja-Kanade    83.2%   86.0%   -       -       -        89.2%   -       90.1%   89.9%
Schneiderman-Kanade     -       -       -       94.4%   -        -       -       -       -
Roth-Yang-Ahuja         -       -       -       -       (94.8%)  -       -       -       -

Table 3: Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.
The Sung and Poggio face detector [18] was tested on the MIT subset of the MIT+CMU test set since
the CMU portion did not exist yet. The MIT test set contains 23 images with 149 faces. They achieved a
detection rate of 79.9% with 5 false positives. Our detection rate with 5 false positives is 77.8% on the MIT
test set.
Figure 10 shows the output of our face detector on some test images from the MIT+CMU test set.
[Plot: "ROC curves for face detector"; x-axis: false positives (0 to 1200); y-axis: correct detection rate (0.8 to 0.94); curves: "step=1.0, first scale=1.0" and "step=1.5, first scale=1.25".]
Figure 9: ROC curves for our face detector on the MIT+CMU test set. The detector was run once using a step size of 1.0 and starting scale of 1.0 (75,081,800 sub-windows scanned) and then again using a step size of 1.5 and starting scale of 1.25 (18,901,947 sub-windows scanned). In both cases a scale factor of 1.25 was used.
2 Viola & Jones. Robust Real-time Object Detection.
Questions [2/3]
Sensitivity of the algorithm to the input data and parameters?
Modulus of derivative feature?!? What is that?
Questions [3/3]
Does the order of weak classifiers used matter in AdaBoost?
What about perspective distortion?