DATeCH 2014, Session 1: Document Representation Refinement for Precise Region Description

Slides from the presentation of the paper "Document Representation Refinement for Precise Region Description" by Christian Clausner, Stefan Pletschacher and Apostolos Antonacopoulos.

Document Representation Refinement for Precise Region Description

Christian Clausner, Stefan Pletschacher and Apostolos Antonacopoulos

PRImA Lab, School of Computing, Science and Engineering, University of Salford, United Kingdom

Document Page Regions

DATeCH 2014 2

Segmentation, Classification

• Region (block, zone): Connected area of a document image with content of a single specific type

• Examples: Text, graphic, table

Region Representation

• By geometric objects

– Bounding box

– Stack of rectangles

– Polygon

• By pixels

– Bitmap

– Run-length encoding
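To make the difference between these representations concrete, here is a minimal Python sketch. It run-length encodes a small binary bitmap and derives the bounding box from the runs. The function names and the `(row, x_start, x_end)` run layout are illustrative assumptions, not something taken from the slides.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (row, x_start, x_end_exclusive); hypothetical layout

def bitmap_to_runs(bitmap: List[List[int]]) -> List[Run]:
    """Encode each row of a binary bitmap as runs of foreground pixels."""
    runs = []
    for y, row in enumerate(bitmap):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                runs.append((y, start, x))
            else:
                x += 1
    return runs

def bounding_box(runs: List[Run]) -> Tuple[int, int, int, int]:
    """Smallest axis-aligned box (left, top, right, bottom); right/bottom exclusive."""
    left = min(r[1] for r in runs)
    right = max(r[2] for r in runs)
    top = min(r[0] for r in runs)
    bottom = max(r[0] for r in runs) + 1
    return left, top, right, bottom

# An L-shaped region
region = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
runs = bitmap_to_runs(region)
box = bounding_box(runs)
region_area = sum(x1 - x0 for _, x0, x1 in runs)
box_area = (box[2] - box[0]) * (box[3] - box[1])
print(runs, box, region_area, box_area)
```

For this L-shaped region the bounding box covers 12 pixels while the region itself has only 8. That surplus is the kind of imprecision a box-only description carries.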

Need for Precise Region Descriptions

• Precise description is crucial for all but the most trivial document analysis and recognition applications

• For performance evaluation: The loss of quality introduced by imprecise regions can be larger than the variation in accuracy of the actual recognition method

The Situation

• Trend to more precise descriptions, but…

• Output of state-of-the-art OCR systems:

– Stacks of rectangles (ABBYY FineReader Engine 11)

– Bounding boxes (Tesseract OCR 3.02)

• Popular formats for layout analysis and OCR results:

– ALTO XML: boxes, ellipses, polygons (region level only)

– FineReader XML: stacks of rectangles (region level only)

– PAGE XML: polygons for all levels

– hOCR: boxes

Refinement through Polygonal Fitting

• Applicable to regions that have child objects in the document model

• A typical object hierarchy contains regions, text lines, words and glyphs (characters)

• Idea: Tightly wrap a polygon around the child objects

Polygonal Fitting Approach

1. Create bitmasks for the child objects and transfer them to an empty bitmap

2. Fill the gaps between the child objects by a smearing approach

3. Optional: Exclude neighbour regions

4. Trace the contour of the foreground and create a polygon

1 – Transferring Child Objects to Bitmap

• Starting point: Polygonal object (e.g. text line, word, or glyph)

• Lossless conversion to rectangle-based interval representation

• Transferring the rectangles to the target bitmap
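One possible way to sketch this step in Python is a scanline conversion: each pixel row of the polygon becomes one or more horizontal intervals (rectangles of height 1), which are then painted into the target bitmap. The even-odd rule, the pixel-centre test and all names below are assumptions made for illustration. The conversion is exact for axis-parallel polygons with integer vertices, and the slides do not say how other polygons are handled.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]
Interval = Tuple[int, int, int]  # (y, x_start, x_end_exclusive)

def polygon_to_intervals(polygon: List[Point]) -> List[Interval]:
    """Row intervals covering the pixels whose centres lie inside the polygon."""
    ys = [p[1] for p in polygon]
    n = len(polygon)
    intervals = []
    for y in range(min(ys), max(ys)):
        yc = y + 0.5  # scanline through the pixel centres of row y
        crossings = []
        for i in range(n):
            (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
            if (y0 <= yc) != (y1 <= yc):  # edge crosses the scanline
                crossings.append(x0 + (yc - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Even-odd rule: the inside lies between crossing pairs
        for xa, xb in zip(crossings[0::2], crossings[1::2]):
            start, end = math.ceil(xa - 0.5), math.ceil(xb - 0.5)
            if end > start:
                intervals.append((y, start, end))
    return intervals

def paint(bitmap: List[List[int]], intervals: List[Interval]) -> None:
    """Set the pixels of every interval to foreground (bitmap must be large enough)."""
    for y, x0, x1 in intervals:
        for x in range(x0, x1):
            bitmap[y][x] = 1

# L-shaped child object given as a polygon
child = [(0, 0), (2, 0), (2, 2), (4, 2), (4, 3), (0, 3)]
intervals = polygon_to_intervals(child)
bitmap = [[0] * 4 for _ in range(3)]
paint(bitmap, intervals)
```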

2 – Smearing Approach

• Goal: Connect all foreground components in the bitmap by filling the gaps in-between

1. Alternately fill horizontal and vertical gaps that are smaller than a dynamic threshold (the threshold is increased after each iteration)

2. If necessary, use diagonal smearing to connect remaining components
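The alternating part of the smearing can be sketched as follows. This is an illustration under assumptions, not the authors' implementation. The starting threshold, the step size and the stop condition (a single 4-connected component) are guesses, and the diagonal smearing of item 2 is left out.

```python
from collections import deque
from typing import List

Bitmap = List[List[int]]

def fill_row_gaps(bitmap: Bitmap, threshold: int) -> None:
    """Fill background gaps of at most `threshold` pixels between foreground pixels in each row."""
    for row in bitmap:
        last = None  # index of the previous foreground pixel in this row
        for x, value in enumerate(row):
            if value:
                if last is not None and 0 < x - last - 1 <= threshold:
                    for gap_x in range(last + 1, x):
                        row[gap_x] = 1
                last = x

def transpose(bitmap: Bitmap) -> Bitmap:
    return [list(column) for column in zip(*bitmap)]

def count_components(bitmap: Bitmap) -> int:
    """Number of 4-connected foreground components."""
    h, w = len(bitmap), len(bitmap[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if bitmap[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and bitmap[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

def smear(bitmap: Bitmap, start: int = 1, step: int = 1) -> Bitmap:
    """Alternate horizontal and vertical gap filling with a growing threshold."""
    result = [row[:] for row in bitmap]
    threshold = start
    limit = max(len(result), len(result[0]))
    while count_components(result) > 1 and threshold <= limit:
        fill_row_gaps(result, threshold)   # horizontal pass
        columns = transpose(result)
        fill_row_gaps(columns, threshold)  # vertical pass on the transposed bitmap
        result = transpose(columns)
        threshold += step                  # dynamic threshold grows each iteration
    return result

# Two "text lines" made of words separated by small gaps
words = [
    [1, 1, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 1],
]
smeared = smear(words)
```

Starting with a small threshold closes narrow gaps (between glyphs and words) first, so wider gaps are bridged only if the components are still disconnected.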

3 – Subtraction of Neighbours

• Optional step to avoid overlap with adjacent regions

• Simply erase the corresponding pixels from the created bitmap
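As a small illustration, erasing neighbour pixels amounts to a per-pixel mask subtraction. The sketch below assumes that the region bitmap and the neighbour bitmaps share the same size and coordinate frame. The function name is hypothetical.

```python
from typing import List

Bitmap = List[List[int]]

def subtract_neighbours(mask: Bitmap, neighbours: List[Bitmap]) -> Bitmap:
    """Copy of `mask` with every pixel cleared that is foreground in any neighbour."""
    return [
        [1 if value and not any(n[y][x] for n in neighbours) else 0
         for x, value in enumerate(row)]
        for y, row in enumerate(mask)
    ]

region = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
neighbour = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
cleaned = subtract_neighbours(region, [neighbour])
```

Note that erasing pixels can split the foreground into several components, which a later tracing step would need to handle.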

4 – Outline Tracing

• Trace the contour of the foreground component in the created bitmap

• Create the polygon on the fly by adding a point for each change of direction (corner)
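A minimal sketch of such a tracer is given below. It walks clockwise along pixel edges, keeping the foreground on its right, and records a vertex only where the direction changes. The turning rule, the 4-connectivity assumption and all names are illustrative choices, not details from the slides. Only the outer contour of the first component found is traced, so holes are ignored.

```python
from typing import List, Tuple

Point = Tuple[int, int]

DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # east, south, west, north (y grows downwards)
# Pixel offsets, relative to the current vertex, of the pixel ahead-right / ahead-left
AHEAD_RIGHT = [(0, 0), (-1, 0), (-1, -1), (0, -1)]
AHEAD_LEFT = [(0, -1), (0, 0), (-1, 0), (-1, -1)]

def trace_outline(bitmap: List[List[int]]) -> List[Point]:
    """Outer contour of the first foreground component as a list of corner points."""
    h, w = len(bitmap), len(bitmap[0])

    def fg(px: int, py: int) -> bool:
        return 0 <= px < w and 0 <= py < h and bool(bitmap[py][px])

    # Top-most, then left-most foreground pixel; its top-left corner is a polygon vertex
    start = next(((x, y) for y in range(h) for x in range(w) if bitmap[y][x]), None)
    if start is None:
        return []
    polygon = [start]
    pos, d = start, 0  # begin by walking east along the top edge of the start pixel
    while True:
        pos = (pos[0] + DIRS[d][0], pos[1] + DIRS[d][1])
        if pos == start:
            break
        right = fg(pos[0] + AHEAD_RIGHT[d][0], pos[1] + AHEAD_RIGHT[d][1])
        left = fg(pos[0] + AHEAD_LEFT[d][0], pos[1] + AHEAD_LEFT[d][1])
        if not right:
            new_d = (d + 1) % 4  # foreground ends: turn right
        elif left:
            new_d = (d - 1) % 4  # foreground blocks the way: turn left
        else:
            new_d = d            # keep going straight
        if new_d != d:
            polygon.append(pos)  # change of direction: add a corner
        d = new_d
    return polygon

shape = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
outline = trace_outline(shape)
print(outline)
```

For the L-shaped example the tracer yields six corners, one per change of direction, rather than one point per boundary pixel.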

Experiments

• Carried out on a dataset of contemporary documents consisting of scanned magazine and technical article pages

• Processed with Tesseract OCR 3.02 (open source)

• Exported to PAGE XML with and without refinement

[Figure panels: Original (unrefined) | Refined]

Results

• Measurement of region overlaps (number and area)

                     Overlapping Regions    Overlap Area (Megapixel)
Original Outlines    621 (45.8%)            19.9
Refined Outlines     286 (21.1%)            2.5
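The relative reductions implied by these figures can be worked out directly. The short Python snippet below uses only the values reported above. The helper name is illustrative.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank from `before` to `after`."""
    return (before - after) / before

count_reduction = relative_reduction(621, 286)   # overlapping regions
area_reduction = relative_reduction(19.9, 2.5)   # overlap area in megapixels

print(f"Overlapping regions reduced by {count_reduction:.1%}")
print(f"Overlap area reduced by {area_reduction:.1%}")
```

The number of overlapping regions drops by roughly 54% and the overlap area by roughly 87%.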

Impact on Performance Evaluation

• Real-world scenario

• Measure the performance of Tesseract OCR engine

• Evaluation metrics of previous ICDAR page segmentation competitions

Average success rate using original outlines    81.1%
Average success rate using refined outlines     84.5%
Average improvement for all documents           3.4%
Maximum improvement                             22.9%

Conclusion

• Existing geometric region data can be significantly refined by fitting precise polygons around child objects

• Validity and impact on real-world scenarios have been shown

• Refinement in performance evaluation helps to eliminate problems that arise from insufficient geometric descriptions → Concentrate on real issues of OCR methods

• Positive effect on accuracy of presentation/repurposing systems (highlighting, cropping, article tracking, etc.)

• Approach used in Aletheia ground truth editor and result viewer (primaresearch.org/tools)
