
Detecting and Reading Text in Natural Scenes

Louka Dlagnekov

October 19, 2004

X. Chen, A. L. Yuille


Outline

Goals

Classifiers

Boosting

Optimizing

Binarization

Questions






Introduction

Given an image of an outdoor scene, the goals are to:

Identify regions where there is text

Extract the text using OCR

Convey this information to a blind person






Example






Main Ideas

Build a strong classifier trained by AdaBoost

Apply classifier to sub-regions of the image

Binarize candidate text regions

Use OCR on binarized candidate text regions
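The step "apply classifier to sub-regions" can be sketched as a sliding-window scan. This is a minimal single-scale sketch: `sub_windows`, `detect`, the window size, and the step are illustrative assumptions, and `classify` stands in for whatever trained classifier is used.

```python
def sub_windows(width, height, win, step):
    """Yield (x, y, win, win) squares that tile an image of the given size."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield (x, y, win, win)


def detect(image, classify, win=4, step=2):
    """Return the sub-windows accepted by `classify` (a hypothetical callable)."""
    height, width = len(image), len(image[0])
    hits = []
    for (x, y, w, h) in sub_windows(width, height, win, step):
        # Cut the sub-window out of the image (a list of pixel rows).
        patch = [row[x:x + w] for row in image[y:y + h]]
        if classify(patch):
            hits.append((x, y, w, h))
    return hits
```

A real detector would repeat the scan at several window sizes; only one scale is shown here to keep the sketch short.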






Results

Reasonably fast: 3 seconds on a 3 MP image

2.8% false negatives

10% false positives

93% accuracy of OCR





General Idea

Find good features of regions containing text that set them apart from other regions

Construct classifiers that classify using these features

Each classifier can be weak (slightly higher than 50% accuracy)






How do we come up with text features in an image? [1/3]

Text shares few features with faces

PCA of text yields far more non-zero eigenvalues than PCA of faces

1 http://www.geop.ubc.ca/CDSST/eigenfaces.html






Examine x and y derivatives
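A minimal sketch of x and y derivatives, assuming central differences on a grayscale image stored as a list of pixel rows. The function names and the mean-absolute-value summary are illustrative assumptions, not the exact features used in the paper.

```python
def derivatives(image):
    """Central-difference x and y derivatives; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    dx = [[0.0] * w for _ in range(h)]
    dy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            dy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return dx, dy


def mean_abs(grid):
    """Average absolute value over a grid, usable as a simple region feature."""
    values = [abs(v) for row in grid for v in row]
    return sum(values) / len(values)


# A vertical stroke: strong x derivative, no y derivative.
stroke = [[0, 0, 9, 0, 0] for _ in range(5)]
dx, dy = derivatives(stroke)
```

For the vertical stroke, the x derivative is large on both sides of the stroke while the y derivative is zero everywhere, which is the kind of asymmetry a derivative feature can pick up.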







Histogram of pixel intensities

Edge detection ⇒ intensity gradient thresholding ⇒ edge linking
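A rough sketch of the first two ingredients, assuming 8-bit grayscale pixels in a list of rows. The bin count, the threshold, and both function names are illustrative assumptions; edge linking is not shown.

```python
def intensity_histogram(image, bins=4, max_value=256):
    """Count pixels per equal-width intensity bin."""
    hist = [0] * bins
    for row in image:
        for v in row:
            hist[min(v * bins // max_value, bins - 1)] += 1
    return hist


def edge_mask(image, threshold):
    """Mark interior pixels whose gradient magnitude exceeds the threshold."""
    h, w = len(image), len(image[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            mask[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return mask


# Dark region next to a bright region: two histogram peaks, one edge.
patch = [[10, 10, 200, 200] for _ in range(4)]
hist = intensity_histogram(patch)
mask = edge_mask(patch, threshold=50)
```

Text on a plain background tends to give a two-peaked histogram like the one in this toy patch, which is why the histogram is a plausible text feature.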





What is boosting?

Start with an example:

junk email classification

Basic idea:

finding many rough rules of thumb is easier than finding a single highly accurate rule

History of boosting






Gist of AdaBoost

The AdaBoost algorithm is used to combine weak classifiers to make a strong classifier

Each weak classifier produces a yes/no answer for a particular feature

Need to come up with lots of features and let AdaBoost decide how to combine them
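A small sketch of what "yes/no answer for a particular feature" and "combine" can look like, assuming each weak classifier is a threshold test (a decision stump) on one entry of a feature vector. The thresholds and weights below are made-up numbers for illustration; in practice AdaBoost chooses them.

```python
def stump(feature_index, threshold, polarity=1):
    """Weak classifier: +1 if polarity * feature > polarity * threshold, else -1."""
    def h(x):
        return 1 if polarity * x[feature_index] > polarity * threshold else -1
    return h


def strong_classify(x, weak, alphas):
    """Weighted vote of the weak answers; a zero score resolves to +1."""
    score = sum(a * h(x) for h, a in zip(weak, alphas))
    return 1 if score >= 0 else -1


# Three hypothetical stumps over a 3-entry feature vector.
weak = [stump(0, 0.5), stump(1, 2.0), stump(2, 10.0, polarity=-1)]
alphas = [0.9, 0.4, 0.3]
label_a = strong_classify((0.8, 1.0, 20.0), weak, alphas)
label_b = strong_classify((0.1, 3.0, 20.0), weak, alphas)
```

In the first call only the first stump says yes, but its larger weight outvotes the other two, which shows how a single reliable feature can dominate the combination.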






Algorithm

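Given a set of examples $(x_1, y_1), \ldots, (x_N, y_N)$, where $x_i \in X$ and $y_i \in Y = \{-1, +1\}$:

1. Initialize $D_1(i) = 1/N$

2. For $t = 1, \ldots, T$:

   1. Train weak classifier using distribution $D_t$

   2. Obtain weak hypothesis $h_t : X \to \{-1, +1\}$ with error $\varepsilon_t$

   3. Choose $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right)$

   4. Update distribution:

      $$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases} = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$

3. Output final hypothesis:

   $$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$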
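The steps above can be sketched in a few lines of Python. This is a toy version under stated assumptions: inputs are single numbers, the weak learner is an exhaustive search over threshold stumps, and the names `train_adaboost` and `predict` are illustrative. The paper's weak classifiers work on image features, which are not modelled here.

```python
import math


def train_adaboost(xs, ys, rounds):
    """Discrete AdaBoost over threshold stumps; labels ys are in {-1, +1}."""
    n = len(xs)
    dist = [1.0 / n] * n                        # D_1(i) = 1/N
    thresholds = sorted(set(xs))
    model = []                                  # (alpha_t, threshold, polarity)
    for _ in range(rounds):
        best = None
        for thr in thresholds:                  # weak learner: try every stump
            for pol in (1, -1):
                preds = [pol if x >= thr else -pol for x in xs]
                err = sum(d for d, p, y in zip(dist, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, thr, pol, preds)
        err, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, thr, pol))
        # Reweight: misclassified examples gain weight, correct ones lose it.
        dist = [d * math.exp(-alpha * y * p) for d, y, p in zip(dist, ys, preds)]
        z = sum(dist)                           # normaliser Z_t
        dist = [d / z for d in dist]
    return model


def predict(model, x):
    """Final hypothesis: sign of the alpha-weighted stump votes."""
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in model)
    return 1 if score >= 0 else -1


xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = train_adaboost(xs, ys, rounds=3)
```

No single stump separates this toy data, but three boosted stumps classify every training point correctly, which is the "many rough rules of thumb" idea in miniature.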






Error

$Z_t = \sum_i D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}$

An upper bound on the training error of the strong classifier is $\prod_t Z_t$.

Minimizing $\prod_t Z_t$ therefore minimizes this upper bound on the overall classification error
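Why the product of the normalisers bounds the training error can be seen by unrolling the distribution update. The derivation below is the standard argument; the shorthand $f(x)$ for the weighted sum of weak hypotheses is introduced here and does not appear on the slides.

```latex
\begin{align*}
D_{T+1}(i) &= \frac{e^{-y_i f(x_i)}}{N \prod_{t=1}^{T} Z_t},
  \qquad f(x) = \sum_{t=1}^{T} \alpha_t h_t(x), \\
\frac{1}{N} \sum_{i=1}^{N} \mathbf{1}[H(x_i) \neq y_i]
  &\le \frac{1}{N} \sum_{i=1}^{N} e^{-y_i f(x_i)}
   = \Big(\prod_{t=1}^{T} Z_t\Big) \sum_{i=1}^{N} D_{T+1}(i)
   = \prod_{t=1}^{T} Z_t, \\
Z_t &= (1 - \varepsilon_t)\, e^{-\alpha_t} + \varepsilon_t\, e^{\alpha_t}
   = 2 \sqrt{\varepsilon_t (1 - \varepsilon_t)}
   \quad \text{for } \alpha_t = \tfrac{1}{2} \ln\frac{1 - \varepsilon_t}{\varepsilon_t}.
\end{align*}
```

The inequality holds because a misclassified example has $y_i f(x_i) \le 0$, so its exponential term is at least 1. Each factor $2\sqrt{\varepsilon_t (1 - \varepsilon_t)}$ is below 1 whenever $\varepsilon_t \neq 1/2$, so every weak classifier that beats chance shrinks the bound.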






Problems with AdaBoost

Minimizes classification error, not the number of false negatives.

To "fix" this, Viola and Jones propose modifying the distribution $D_t$ to give more weight to positive examples.

They call this modification Asymmetric AdaBoost.
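One way to picture "more weight to positive examples" is a per-round reweighting step. The sketch below assumes each weight is multiplied by exp(y * log(sqrt(k)) / T), where k is a cost ratio between false negatives and false positives and T is the number of rounds. The function name and the value of k are illustrative assumptions, and this may differ in detail from the published scheme.

```python
import math


def asymmetric_reweight(dist, ys, k, rounds):
    """Scale weights by exp(y * log(sqrt(k)) / rounds), then renormalise."""
    factor = math.log(math.sqrt(k)) / rounds
    scaled = [d * math.exp(y * factor) for d, y in zip(dist, ys)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Two positives and two negatives, starting from a uniform distribution.
new_dist = asymmetric_reweight([0.25] * 4, [1, 1, -1, -1], k=4, rounds=1)
```

With k = 4 and a single round, each positive example ends up with four times the weight of each negative one, so a weak classifier that misses a positive is penalised more heavily.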





Cascade Classification

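[Figure: cascade of classifiers. Each sub-window enters stage 1; a T (true) result passes it on to stage 2, then stage 3, and finally to further processing, while an F (false) result at any stage rejects the sub-window.]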
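The cascade logic itself is short. In this sketch each stage is any callable that returns True or False for a sub-window; the function name is an illustrative assumption.

```python
def cascade_classify(sub_window, stages):
    """Run stages in order; reject on the first False, accept if all pass."""
    for stage in stages:
        if not stage(sub_window):
            return False      # reject sub-window, later stages never run
    return True               # passed every stage: send to further processing
```

The speed-up comes from ordering: cheap stages go first, so most non-text sub-windows are discarded before the expensive stages are evaluated.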






Integral Images

The first 3 layers of the Chen and Yuille cascade use only mean, standard deviation (STD), and derivative features

These are easily calculated from integral images






Integral Images

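[Figure: four adjacent rectangles, A (top left), B (top right), C (bottom left), and D (bottom right); locations 1, 2, 3, and 4 mark the bottom-right corners of A, B, C, and D respectively.]

Figure 3: The sum of the pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A + B, at location 3 is A + C, and at location 4 is A + B + C + D. The sum within D can be computed as 4 + 1 − (2 + 3).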

sparse (or can be made so). A similar insight is that an invertible linear operation can be applied to $f$ if its inverse is applied to $g$:

$$(f'') * \left(\iint g\right) = f * g$$

Viewed in this framework, computation of the rectangle sum can be expressed as a dot product, $i \cdot r$, where $i$ is the image and $r$ is the box car image (with value 1 within the rectangle of interest and 0 outside). This operation can be rewritten

$$i \cdot r = \left(\iint i\right) \cdot r''$$

The integral image is in fact the double integral of the image (first along rows and then along columns). The second derivative of the rectangle (first in row and then in column) yields four delta functions at the corners of the rectangle. Evaluation of the second dot product is accomplished with four array accesses.

2.2 Feature Discussion

Rectangle features are somewhat primitive when compared with alternatives such as steerable filters [5, 7]. Steerable filters, and their relatives, are excellent for the detailed analysis of boundaries, image compression, and texture analysis. In contrast rectangle features, while sensitive to the presence of edges, bars, and other simple image structure, are quite coarse. Unlike steerable filters the only orientations available are vertical and horizontal. It appears as though the set of rectangle features do however provide a rich image representation which supports effective learning. The extreme computational efficiency of rectangle features provides ample compensation for their limited flexibility.

D = I4 + I1 − (I2 + I3)
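The four-corner rule can be sketched as follows, assuming an integral image padded with a leading row and column of zeros so that no bounds checks are needed. The function names are illustrative; the mean and STD helper assumes a second integral image built from squared pixel values.

```python
def integral_image(image):
    """ii[y][x] holds the sum of image rows 0..y-1 and columns 0..x-1."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum over the w-by-h rectangle with top-left (x, y): I4 + I1 - (I2 + I3)."""
    return ii[y + h][x + w] + ii[y][x] - (ii[y][x + w] + ii[y + h][x])


def rect_mean_std(ii, ii_sq, x, y, w, h):
    """Mean and standard deviation of a rectangle from two integral images."""
    n = w * h
    mean = rect_sum(ii, x, y, w, h) / n
    var = rect_sum(ii_sq, x, y, w, h) / n - mean * mean
    return mean, max(var, 0.0) ** 0.5


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(image)
ii_sq = integral_image([[v * v for v in row] for row in image])
```

Once the two integral images are built, the sum, mean, and standard deviation of any rectangle cost a fixed number of lookups regardless of the rectangle's size.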





Binarization

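Use Niblack's adaptive binarization method

$T_r(x) = \mu_r(x) + k\,\sigma_r(x)$

where $\mu_r(x)$ and $\sigma_r(x)$ are the mean and standard deviation of the intensities in a window of radius $r$ around pixel $x$, with a modification:

$r(x) = \min_r \left(\sigma_r(x) > T_\sigma\right)$

that is, the smallest window radius whose standard deviation exceeds a threshold $T_\sigma$.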
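A direct, unoptimised sketch of this rule on a grayscale image stored as a list of rows. The values of `k`, `t_sigma`, and `max_r`, the square window clipped at the image border, and the fallback to the largest radius when no window is contrasty enough are all assumptions made for illustration.

```python
def local_stats(image, cx, cy, r):
    """Mean and standard deviation in the window of radius r around (cx, cy)."""
    h, w = len(image), len(image[0])
    values = [image[y][x]
              for y in range(max(0, cy - r), min(h, cy + r + 1))
              for x in range(max(0, cx - r), min(w, cx + r + 1))]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var ** 0.5


def niblack_binarize(image, k=-0.2, t_sigma=10.0, max_r=3):
    """Threshold each pixel at mu_r + k * sigma_r for the smallest usable r."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for r in range(1, max_r + 1):
                mean, std = local_stats(image, x, y, r)
                if std > t_sigma:
                    break         # smallest window with enough contrast
            # If no radius qualified, the stats from max_r are used.
            out[y][x] = 1 if image[y][x] > mean + k * std else 0
    return out


# A dark vertical stroke on a bright background.
scene = [[200, 200, 20, 200, 200] for _ in range(5)]
binary = niblack_binarize(scene)
```

Growing the window until the local standard deviation is large enough keeps flat background regions from being thresholded on noise alone.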





Questions [1/3]

Is AdaBoost the best thing out there for detecting objects?

Does it not depend on features used?

                        False detections
Detector                10     31     50     65     78       95     110    167    422
Viola-Jones             78.3%  85.2%  88.8%  89.8%  90.1%    90.8%  91.1%  91.8%  93.7%
Rowley-Baluja-Kanade    83.2%  86.0%  -      -      -        89.2%  -      90.1%  89.9%
Schneiderman-Kanade     -      -      -      94.4%  -        -      -      -      -
Roth-Yang-Ahuja         -      -      -      -      (94.8%)  -      -      -      -

Table 3: Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.

The Sung and Poggio face detector [18] was tested on the MIT subset of the MIT+CMU test set since the CMU portion did not exist yet. The MIT test set contains 23 images with 149 faces. They achieved a detection rate of 79.9% with 5 false positives. Our detection rate with 5 false positives is 77.8% on the MIT test set.

Figure 10 shows the output of our face detector on some test images from the MIT+CMU test set.

[Plot: ROC curves for face detector. x-axis: false positives (0 to 1200); y-axis: correct detection rate (0.8 to 0.94); curves: step=1.0, first scale=1.0 and step=1.5, first scale=1.25.]

Figure 9: ROC curves for our face detector on the MIT+CMU test set. The detector was run once using a step size of 1.0 and starting scale of 1.0 (75,081,800 sub-windows scanned) and then again using a step size of 1.5 and starting scale of 1.25 (18,901,947 sub-windows scanned). In both cases a scale factor of 1.25 was used.

2 Viola & Jones. Robust Real-time Object Detection.




Questions [2/3]

Sensitivity of the algorithm to the input data and parameters?

Modulus of derivative feature?!? What is that?





Questions [3/3]

Does the order of weak classifiers used matter in AdaBoost?

What about perspective distortion?

