

Imaged Document Text Retrieval without OCR

IEEE Trans. on PAMI, vol. 24, no. 6, June 2002

Presenter: 周遵儒

Outline
- Introduction
- HTD and VTD
- Class of Character Objects
- Similarity Measure of Documents
- Experimental Results
- Conclusions

Introduction
- Retrieval of imaged documents
- Processing with OCR vs. without OCR
- Language dependence vs. language independence

Procedure
- Image preprocessing
- Feature extraction of character objects
  - Horizontal Traverse Density (HTD)
  - Vertical Traverse Density (VTD)
- Clustering: to identify classes of character objects
- Document representation: hash table
- N-gram: to construct indexes for imaged document retrieval
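The slides name a hash table for document representation and n-grams for indexing but give no further detail. A minimal sketch, assuming each document has already been reduced to a sequence of character-object class labels and that n-grams are taken over consecutive labels; `ngram_index` and the choice n = 3 are hypothetical.

```python
from collections import Counter


def ngram_index(labels, n=3):
    """Count the n-grams in a document's sequence of class labels.

    Assumption: the document is a list of character-object class labels.
    The Counter is a hash table keyed by n-gram tuples, mapping each
    n-gram to its frequency in the document.
    """
    grams = (tuple(labels[i:i + n]) for i in range(len(labels) - n + 1))
    return Counter(grams)


doc = [4, 7, 2, 4, 7, 2, 9]  # hypothetical class labels of one document
index = ngram_index(doc, n=3)
print(index[(4, 7, 2)])  # 2
```

Because the index is keyed by label n-grams rather than by recognized letters, the same code applies to any script, which matches the language-independence goal.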

Features: HTD and VTD
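The HTD and VTD features appear on the slide only as figures. A minimal sketch, assuming a character object is a binary image (1 = black pixel) and that HTD counts the black runs crossed by each of k evenly spaced horizontal scan lines, with VTD doing the same along vertical scan lines. The paper's exact sampling and normalisation may differ, and the function names and k are hypothetical.

```python
def runs_crossed(line):
    """Number of black runs (maximal stretches of 1s) along one scan line."""
    count, prev = 0, 0
    for px in line:
        if px == 1 and prev == 0:  # white-to-black transition starts a run
            count += 1
        prev = px
    return count


def sample_positions(length, k):
    """k evenly spaced indices in [0, length), taken at interval midpoints."""
    return [int((j + 0.5) * length / k) for j in range(k)]


def htd(image, k=8):
    """Horizontal traverse density: runs crossed by k horizontal scan lines."""
    rows = sample_positions(len(image), k)
    return [runs_crossed(image[r]) for r in rows]


def vtd(image, k=8):
    """Vertical traverse density: runs crossed by k vertical scan lines."""
    cols = sample_positions(len(image[0]), k)
    return [runs_crossed([row[c] for row in image]) for c in cols]


glyph = [  # hypothetical 5x5 "H"-shaped character object
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
print(htd(glyph, k=5))  # [2, 2, 1, 2, 2]
print(vtd(glyph, k=5))  # [1, 1, 1, 1, 1]
```

Concatenating `htd(img, k) + vtd(img, k)` yields a vector of length 2k for every character object regardless of its pixel size, so objects of different sizes can be compared directly.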

Class of Character Objects
- Unsupervised clustering with HTD and VTD
- Distance measure of character objects

Distance Measure of Character Objects
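The slides do not spell out the distance measure or the clustering algorithm. The sketch assumes Euclidean distance between HTD/VTD feature vectors and a simple leader (single-pass threshold) clustering, where each resulting label stands for one class of character objects; `distance`, `leader_cluster`, and the threshold value are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(vectors, threshold):
    """Assign each vector to its nearest cluster leader if that leader lies
    within `threshold`; otherwise the vector becomes the leader of a new
    cluster. Returns one class label per vector, in input order."""
    leaders, labels = [], []
    for v in vectors:
        best, best_d = None, math.inf
        for label, leader in enumerate(leaders):
            d = distance(v, leader)
            if d < best_d:
                best, best_d = label, d
        if best is None or best_d > threshold:
            leaders.append(v)  # start a new class of character objects
            best = len(leaders) - 1
        labels.append(best)
    return labels


feats = [[2, 2, 1, 2, 2], [2, 2, 1, 2, 3], [1, 1, 1, 1, 1]]  # hypothetical
print(leader_cluster(feats, threshold=1.5))  # [0, 0, 1]
```

Leader clustering needs no preset number of classes, which suits an unsupervised setting, but its result depends on the input order and on the threshold.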

Examples of Character Objects

Similarity Measure of Documents

- N-gram algorithm
- Cosine of the angle between two documents
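The cosine measure treats each document as a vector of n-gram frequencies and scores a pair of documents by cos θ = (a · b) / (|a| |b|). A minimal sketch over raw counts stored in hash tables (`Counter` objects keyed by n-gram tuples); the paper may weight or normalise the frequencies differently, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine of the angle between two sparse n-gram count vectors."""
    dot = sum(count * b.get(g, 0) for g, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


d1 = Counter({(4, 7, 2): 2, (7, 2, 4): 1})  # hypothetical document vectors
d2 = Counter({(4, 7, 2): 1, (2, 9, 9): 1})
print(round(cosine(d1, d2), 3))  # 0.632
```

Only n-grams present in both documents contribute to the dot product, so iterating over one sparse table is enough.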

Corpus: UW1 database (600 dpi)

Experimental Results

Corpus I: E01-E26

Experimental Results: Corpus II


Conclusion and Future Work
- A new method for imaged document retrieval without OCR
- Language-independent retrieval
- Improvement of robustness for different fonts and noisy documents
