Detection and Extraction of Artificial Text for Semantic Indexing

Detection and Extraction of Artificial Text for Semantic Indexing. Christian Wolf and Jean-Michel Jolion. Laboratoire Reconnaissance de Formes et Vision, Bât. Jules Verne, INSA de Lyon, 69621 Villeurbanne cedex, France. Dagstuhl Seminar on Content-Based Image and Video Retrieval, January 9th 2002. This presentation can be downloaded from: http://rfv.insa-lyon.fr/~wolf/presentations



TRANSCRIPT

Page 1: Detection and Extraction of Artificial Text for Semantic Indexing


Detection and Extraction of Artificial Text for Semantic Indexing

Christian Wolf and Jean-Michel Jolion

Laboratoire Reconnaissance de Formes et Vision, Bât. Jules Verne, INSA de Lyon

69621 Villeurbanne cedex, France

January 9th 2002

Dagstuhl Seminar on Content-Based Image and Video Retrieval

This presentation can be downloaded from: http://rfv.insa-lyon.fr/~wolf/presentations

Page 2: Detection and Extraction of Artificial Text for Semantic Indexing


Plan of the presentation

Introduction
Detection and tracking
Enhancement and binarization of the text boxes
Experiments and results
Open problems
Conclusion and Outlook


This work resulted in a patent submitted by France Télécom on May 23rd, 2001 under the reference FR 01 06776.

Introduction | Detection | Enh/Binarization | Exp. Results | Open problems | Conclusion

Page 3: Detection and Extraction of Artificial Text for Semantic Indexing


Content based image retrieval

[Figure: retrieval by example. Labels: Example image, Indexing phase, Similarity function, Result]


Page 4: Detection and Extraction of Artificial Text for Semantic Indexing


Similarity measures

[Figure: example image pairs labeled "similar", "similar" and "not similar"]


Page 5: Detection and Extraction of Artificial Text for Semantic Indexing


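Indexing using Text

Keyword based search

Key word: Patrick Mayhew

Text extracted in the indexing phase: "Patrick Mayhew", "Min. chargé de l´irlande de Nord", "ISRAEL", "Jerusalem", "montage", "T.Nouel", ...

[Figure: the key word is matched against the indexed text to produce the result]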


Page 6: Detection and Extraction of Artificial Text for Semantic Indexing


Video properties

[Figure: example video frame with annotated text sizes: 80 px, 12 px, 8 px]


Page 7: Detection and Extraction of Artificial Text for Semantic Indexing


Text extraction: general scheme

Video
Detection of the text in single frames
Tracking
Image enhancement - Multiple frame integration
Segmentation/Binarisation
OCR

Example output: "EVENEMENT", "ACTU", "SPELEOS", "Gouffre Berger (Isére)", "aujourd'hui", "France 3 Alpes", "un spéléologue sauveteur"


Page 8: Detection and Extraction of Artificial Text for Semantic Indexing

Text detection by accumulation of horizontal gradients (LeBourgeois, 1997).

Justification: Text forms a regular texture containing vertical edges which are aligned horizontally.

Post processing by mathematical morphology.
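The idea on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the window width, the threshold rule (half of the maximum response) and the 1 x 5 closing element are all hypothetical choices. The sketch sums absolute horizontal differences along each row, thresholds the result, and closes small horizontal gaps.

```python
import numpy as np

def accumulated_horizontal_gradients(gray, window=9):
    """Sum |I(x) - I(x-1)| over a horizontal window centred on each pixel."""
    gray = np.asarray(gray, dtype=float)
    grad = np.zeros_like(gray)
    grad[:, 1:] = np.abs(gray[:, 1:] - gray[:, :-1])  # horizontal difference
    kernel = np.ones(window)
    # Accumulate along each row; mode="same" keeps the image width.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, grad)

def horizontal_close(mask, width=5):
    """Morphological closing with a 1 x width element (width assumed odd)."""
    pad = width // 2
    cols = mask.shape[1]

    def dilate(m):
        p = np.pad(m, ((0, 0), (pad, pad)), constant_values=False)
        return np.any([p[:, i:i + cols] for i in range(width)], axis=0)

    def erode(m):
        p = np.pad(m, ((0, 0), (pad, pad)), constant_values=True)
        return np.all([p[:, i:i + cols] for i in range(width)], axis=0)

    return erode(dilate(mask))

def detect_text_mask(gray, window=9, threshold=None, close_width=5):
    """Boolean mask of candidate text pixels (True = candidate)."""
    acc = accumulated_horizontal_gradients(gray, window)
    if threshold is None:
        threshold = 0.5 * acc.max()  # hypothetical rule, chosen for the demo
    return horizontal_close(acc > threshold, close_width)

# Synthetic example: a band of vertical strokes on a flat background.
frame = np.zeros((20, 40))
frame[8:12, 10:30:2] = 255
mask = detect_text_mask(frame)
```

Rectangles would then be obtained from the connected components of the mask; that step is omitted here.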


Page 9: Detection and Extraction of Artificial Text for Semantic Indexing


Detection in video sequences

Detection per single frame
List of rectangles per frame
Tracking - keeping track of text occurrences
Suppression of false alarms
Image enhancement - multiple frame integration

[Figure: text occurrences plotted against frame nr. (time)]
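A minimal sketch of the tracking step, assuming rectangles are given as (x0, y0, x1, y1) tuples and that two rectangles in consecutive frames belong to the same text occurrence when their overlap (intersection over union) is large. The overlap criterion, both thresholds and all names are assumptions for illustration; the slide does not specify them. Occurrences that live for only a few frames are dropped as false alarms.

```python
def iou(a, b):
    """Intersection over union of two rectangles (x0, y0, x1, y1)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def track(frames, min_iou=0.5, min_length=3):
    """frames: one list of rectangles per frame.

    Returns text occurrences as dicts with 'start', 'end' (frame numbers)
    and 'boxes' (one rectangle per frame of the occurrence).
    """
    active, finished = [], []
    for t, rects in enumerate(frames):
        unmatched = list(rects)
        still_active = []
        for occ in active:
            last = occ["boxes"][-1]
            best = max(unmatched, key=lambda r: iou(last, r), default=None)
            if best is not None and iou(last, best) >= min_iou:
                occ["boxes"].append(best)
                occ["end"] = t
                unmatched.remove(best)
                still_active.append(occ)
            else:
                finished.append(occ)  # occurrence ended in the previous frame
        # Every rectangle that matched nothing starts a new occurrence.
        still_active.extend(
            {"start": t, "end": t, "boxes": [r]} for r in unmatched)
        active = still_active
    finished.extend(active)
    # Suppress false alarms: keep only occurrences that are stable over time.
    return [o for o in finished if o["end"] - o["start"] + 1 >= min_length]

detections = [
    [(10, 10, 50, 20)],
    [(11, 10, 51, 20), (100, 100, 120, 110)],  # second box: spurious
    [(12, 10, 52, 20)],
    [(12, 10, 52, 20)],
]
occurrences = track(detections)
```

A practical tracker would also tolerate a few frames in which the detector misses the text; this sketch ends an occurrence at the first miss.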


Page 10: Detection and Extraction of Artificial Text for Semantic Indexing


Image enhancement

Super-resolution (interpolation)

Multiple frame integration: averaging

Integration of multiple frames to create a single image of higher quality.

[Figure: interpolation scheme with pixels M1, M2, M3 and M4]

An additional weight is included in the interpolation scheme, which decreases the weights of temporal outlier pixels.
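The effect of down-weighting temporal outliers can be illustrated on plain averaging, leaving the super-resolution interpolation aside. The weight formula below is an assumption chosen for simplicity (the slide does not give the authors' formula): each pixel's weight shrinks with its distance from the temporal mean, measured in temporal standard deviations.

```python
import numpy as np

def integrate_frames(stack, eps=1e-6):
    """Robust temporal average of registered text-box images.

    stack: array of shape (frames, height, width), already aligned.
    The weight 1 / (1 + |I - mean| / std) is a hypothetical choice.
    """
    stack = np.asarray(stack, dtype=float)
    mean = stack.mean(axis=0)
    std = stack.std(axis=0)
    weights = 1.0 / (1.0 + np.abs(stack - mean) / (std + eps))
    return (weights * stack).sum(axis=0) / weights.sum(axis=0)

# Five frames of a flat patch; one frame has an outlier at pixel (0, 0).
frames = np.full((5, 2, 2), 100.0)
frames[2, 0, 0] = 200.0
enhanced = integrate_frames(frames)
plain = frames.mean(axis=0)
```

In this example the plain average at the outlier pixel is 120, while the weighted result lies closer to the stable value of 100.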


Page 11: Detection and Extraction of Artificial Text for Semantic Indexing


Binarization

Niblack: $T = m + k \cdot s$

Sauvola et al.: $T = m \cdot \left(1 - k \cdot \left(1 - \frac{s}{R}\right)\right)$

$I:\; C_L > a\,(C_{max} - C_F)$

$T = (1 - a)\,m + a\,M + a\,\frac{s}{R}\,(m - M)$

m: mean of the window
s: standard deviation of the window
k: parameter
R: dynamics of the gray values of the image
M: minimum gray value of the image

$C_L = m - I$: contrast in the center of the image

$C_{max} = m - M$: the maximum local contrast

$C_F = \frac{s}{R}\,(m - M)$: the contrast of the window
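The three thresholds on this slide can be compared with a short NumPy sketch. The formulas in the comments are reconstructed from the slide's symbols (m, s, k, R, M, a); the parameter values (k = -0.2 for Niblack, k = 0.5 and R = 128 for Sauvola, a = 0.5), the window size, the choice of R as the largest local standard deviation, and all function names are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_mean_std(gray, win=15):
    """Mean m and standard deviation s in a win x win window (win odd)."""
    gray = np.asarray(gray, dtype=float)
    padded = np.pad(gray, win // 2, mode="edge")
    windows = sliding_window_view(padded, (win, win))
    return windows.mean(axis=(-2, -1)), windows.std(axis=(-2, -1))

def niblack(m, s, k=-0.2):
    # T = m + k * s
    return m + k * s

def sauvola(m, s, k=0.5, R=128.0):
    # T = m * (1 - k * (1 - s / R))
    return m * (1.0 - k * (1.0 - s / R))

def contrast_threshold(m, s, M, R, a=0.5):
    # T = (1 - a) * m + a * M + a * (s / R) * (m - M)
    return (1.0 - a) * m + a * M + a * (s / R) * (m - M)

def binarize(gray, win=15, a=0.5):
    """True marks dark (text) pixels, i.e. gray value below the threshold."""
    gray = np.asarray(gray, dtype=float)
    m, s = local_mean_std(gray, win)
    M = gray.min()   # minimum gray value of the image
    R = s.max()      # assumption: largest local standard deviation
    if R == 0:       # flat image, nothing to separate
        return np.zeros(gray.shape, dtype=bool)
    return gray < contrast_threshold(m, s, M, R, a)

# Dark stroke on a bright background.
img = np.full((21, 21), 200.0)
img[9:12, 3:18] = 20.0
text = binarize(img)
```

With these definitions, a pixel of gray value I falls below the last threshold exactly when m - I > a * ((m - M) - (s / R) * (m - M)), which is the contrast condition on the slide.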


Page 12: Detection and Extraction of Artificial Text for Semantic Indexing


Binarization methods: examples

Original image

Fisher

Fisher (windowed)

Yanowitz B.

Niblack

Sauvola et al.

Our method


Page 13: Detection and Extraction of Artificial Text for Semantic Indexing


Binarization using a priori knowledge

Bayesian MAP estimation using prior knowledge on the spatial relationships in the image, modeled as a Markov random field.


(In collaboration with David Doermann from the Language and Media Processing Laboratory of the University of Maryland)
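To make the idea of MAP binarization with a Markov random field prior concrete, here is a generic sketch. It is not the model from this work: it assumes a simple Ising-style smoothness prior on 4-neighborhoods, Gaussian gray-level likelihoods with hand-picked means, and iterated conditional modes (ICM) as the optimizer. All names and parameter values are hypothetical.

```python
import numpy as np

def icm_binarize(gray, mu_text=50.0, mu_bg=200.0, sigma=40.0,
                 beta=1.5, iters=5):
    """Approximate MAP labelling (True = text) by iterated conditional modes.

    Energy per pixel: Gaussian data term for the chosen label plus
    beta for every 4-neighbour carrying the other label.
    """
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    # data[0]: cost of labelling background, data[1]: cost of labelling text.
    data = np.stack([(gray - mu_bg) ** 2,
                     (gray - mu_text) ** 2]) / (2.0 * sigma ** 2)
    labels = data[1] < data[0]  # maximum-likelihood initialisation
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                neigh = [labels[j, i]
                         for j, i in ((y - 1, x), (y + 1, x),
                                      (y, x - 1), (y, x + 1))
                         if 0 <= j < h and 0 <= i < w]
                cost = [data[lab, y, x]
                        + beta * sum(int(n != bool(lab)) for n in neigh)
                        for lab in (0, 1)]
                labels[y, x] = cost[1] < cost[0]
    return labels

# A 3 x 3 dark block plus one isolated noisy pixel on a bright background.
img = np.full((9, 9), 200.0)
img[3:6, 3:6] = 50.0
img[1, 7] = 110.0
labels = icm_binarize(img)
```

The isolated pixel is closer to the text mean and would be labelled text by a per-pixel decision; the prior flips it to background because none of its neighbors agree, while the compact block is kept.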

Page 14: Detection and Extraction of Artificial Text for Semantic Indexing

5 different MPEG 1 videos of resolution 384x288.

62 minutes
93000 frames
413 text appearances


Page 15: Detection and Extraction of Artificial Text for Semantic Indexing


Detection and OCR results

Detection results

DETECTION | Count | %
Pred. Text | 301 | 93.5
Pred. Non-Text | 21 |
Total | 322 |

Positives | 350 |
False alarms | 947 |
Logos | 75 |
Scene text | 72 |
Pos+Log+Scene | 497 | 34.4
Total | 1444 |

OCR results, classified by binarization method

Input | Bin. method | Recall | Precision | Cost
AIM2 | Niblack | 67.4 | 87.5 | 499
AIM2 | Sauvola R=128 | 53.8 | 87.6 | 616.5
AIM2 | R=ad | 75.0 | 87.8 | 384.5
AIM2 | R=ad, shift | 78.4 | 90.4 | 344.5
AIM3 | Niblack | 92.5 | 78.1 | 196
AIM3 | Sauvola R=128 | 69.9 | 89.6 | 206
AIM3 | R=ad | 85.3 | 92.5 | 110
AIM3 | R=ad, shift | 96.2 | 95.3 | 51.00
AIM4 | Niblack | 78.5 | 92.0 | 252.00
AIM4 | Sauvola R=128 | 48.6 | 87.7 | 490.50
AIM4 | R=ad | 69.8 | 84.8 | 360.50
AIM4 | R=ad, shift | 80.1 | 90.4 | 211.50
AIM5 | Niblack | 62.1 | 71.4 | 501.50
AIM5 | Sauvola R=128 | 66.7 | 89.3 | 324.50
AIM5 | R=ad | 64.8 | 90.1 | 328.00
AIM5 | R=ad, shift | 69.0 | 91.0 | 294.50
Total | Niblack | 73.1 | 82.6 | 1448.5
Total | Sauvola R=128 | 58.4 | 88.5 | 1637.5
Total | R=ad | 73.0 | 88.4 | 1183
Total | R=ad, shift | 79.6 | 91.5 | 901.5
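The two percentages in the detection results follow directly from the counts on the slide. The short check below recomputes them; reading the first as a detection recall and the second as a precision that counts logos and scene text as correct detections is an interpretation of the row labels, and the variable names are illustrative.

```python
def percentage(part, total):
    """Share of `part` in `total`, in percent, rounded to one decimal."""
    return round(100.0 * part / total, 1)

# Ground-truth text appearances that were detected or missed.
pred_text, pred_non_text = 301, 21
recall = percentage(pred_text, pred_text + pred_non_text)

# Detected rectangles by type.
positives, false_alarms, logos, scene_text = 350, 947, 75, 72
accepted = positives + logos + scene_text
precision = percentage(accepted, accepted + false_alarms)

print(recall, precision)
```

The totals are consistent with the slide: 301 + 21 = 322 text appearances and 497 + 947 = 1444 detected rectangles.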


[Legend: True pos., False pos., True neg., False neg.]

Page 16: Detection and Extraction of Artificial Text for Semantic Indexing


Open questions

Scene text (general orientations, deformations)
Moving text


Page 17: Detection and Extraction of Artificial Text for Semantic Indexing


What is scene text?

[Figure: video frames, frames containing artificial text, frames containing scene text]

We do not have enough information about the importance of text in the destination domain. How many frames contain text, and how many contain scene text?

Page 18: Detection and Extraction of Artificial Text for Semantic Indexing


Detection: from artificial text to scene text

Several constraints have to be removed when passing from artificial text to scene text:

The constraints on temporal stability need to be abandoned or at least softened (no initial frame integration).

Text can be aligned in any orientation (creation of an oriented feature in multiple directions, similar to invariant features).

Contrast is possibly lower because scene text is not designed to be read easily (is detection of unreadable text necessary?).


Page 19: Detection and Extraction of Artificial Text for Semantic Indexing


Text models

Simple models: sets of edges or vertical strokes...
+ Generalize well, respond to many kinds of text
- Many false alarms

Complex models: templates, probabilistic models (MRF)...
+ Powerful, fewer false alarms
- Do not generalize well

Assumptions are necessary (on the font, size, style, contrast, color, length, etc.) but not sufficient.

Main problem: distinguishing characters from structures that resemble text under the chosen model.


Page 20: Detection and Extraction of Artificial Text for Semantic Indexing


Sven Dickinson: evolution of models

Page 21: Detection and Extraction of Artificial Text for Semantic Indexing


What is text?

Whatever model we choose, we cannot detect or recognize all kinds of text without solving the general image understanding problem. The best we can do is to include richer features in the detection process: a composite model for text.

Structural analysis (e.g. detection and recognition of characters by strokes). Very hard and very unlikely to work on noisy images, low resolutions and difficult fonts.

Statistical modeling of text features (e.g. by learning techniques). Problem: robust detection needs large neighborhoods, which lead to a combinatorial explosion.

E.g.: texture-based methods for small text; segmentation plus perceptual grouping and structural methods for big text.


Page 22: Detection and Extraction of Artificial Text for Semantic Indexing


Learning techniques: pros and cons

Bibliography:

Learning directly the gray levels of the input image (Jung 2001)

Learning features, i.e. coefficients of the Haar wavelet (Li and Doermann 2000) or edge strength (Lienhart 2000)

+ Learning is an easy way to handle the complexity of text.

- Text can appear in videos in many different fonts, sizes, styles, colors, orientations etc. Learning all the different forms may not be feasible.


Page 23: Detection and Extraction of Artificial Text for Semantic Indexing


Color processing for detection?

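[Figure: original image, Sobel on grayscale image, Sobel on L*u*v* image]

Sobel kernel on the grayscale image:

$\begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$

On the L*u*v* image, the horizontal difference is replaced by a Euclidean distance between the left and right neighbors, weighted vertically by $(1, 2, 1)^T$:

$D_{euclid}(I_{x-1,0},\, I_{x+1,0})$

Saturating distance or non saturating distance?

Reflection processing?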
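A possible reading of the color variant, sketched in NumPy: the signed horizontal difference of the grayscale Sobel filter is replaced by the Euclidean distance between the left and right neighbors in color space, followed by the usual (1, 2, 1) vertical weighting. This is an interpretation of the slide, not a confirmed implementation; the function name is hypothetical and the input is assumed to be already converted to L*u*v*.

```python
import numpy as np

def color_sobel_horizontal(img):
    """Horizontal edge strength of a colour image.

    img: array of shape (height, width, channels), e.g. L*u*v* values.
    Returns a (height, width) array of non-negative responses.
    """
    img = np.asarray(img, dtype=float)
    dist = np.zeros(img.shape[:2])
    # Euclidean distance between the left and right neighbour of each pixel.
    dist[:, 1:-1] = np.linalg.norm(img[:, 2:] - img[:, :-2], axis=-1)
    # Vertical smoothing with weights (1, 2, 1), borders replicated.
    p = np.pad(dist, ((1, 1), (0, 0)), mode="edge")
    return p[:-2] + 2.0 * p[1:-1] + p[2:]

# Two flat colour regions separated by a vertical edge.
patch = np.zeros((5, 6, 3))
patch[:, 3:] = (30.0, 40.0, 0.0)   # colour distance to the left half: 50
edges = color_sobel_horizontal(patch)
```

Because a distance has no sign, this response cannot distinguish dark-to-bright from bright-to-dark transitions, unlike the grayscale Sobel filter.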


Page 24: Detection and Extraction of Artificial Text for Semantic Indexing


Tracking of moving scene text

Do we detect the text in single frames (as for artificial text), or do we treat the video stream as a whole?

Single frames: Multiple frame integration of moving text needs robust registration of the text boxes in different frames (e.g. rough segmentation into text and background pixels before the registration of the text pixels only). Robust methods, which are able to track objects in clutter, are needed.

Detection of moving objects, e.g. by optical flow, spatio-temporal methods.

Mosaicing techniques can be employed for image enhancement.


Page 25: Detection and Extraction of Artificial Text for Semantic Indexing


Conclusion and Outlook

We developed a system for detection, tracking, enhancement and binarization of artificial text in videos.

The total recognition rate for artificial text is surprisingly high, given the quality of the text, but not yet good enough for indexing purposes.

The remaining problems in text extraction seem to be typical for applications in visual information management: we went as far as we could with low-level features, but we cannot yet take the necessary step to semantic information. What is text? Possible definition: text is whatever a human or an OCR system can recognize as text.

We have to include as much a priori knowledge as possible into the process.
