

Page 1

Imaged Document Text Retrieval without OCR

IEEE Trans. on PAMI vol.24, no.6

June, 2002

Presenter: 周遵儒

Page 2

Outline
  Introduction
  HTD and VTD
  Class of Character Objects
  Similarity Measure of Documents
  Experimental Results
  Conclusions

Page 3

Introduction
  Retrieval of imaged documents
  Processing with OCR vs. without OCR
  Language dependence vs. language independence

Page 4

Procedure
  Image preprocessing
  Feature extraction of character objects
    Horizontal Traverse Density (HTD)
    Vertical Traverse Density (VTD)
  Clustering
    To identify classes of character objects
  Document representation
    Hash table
  N-gram
    To construct indexes for imaged document retrieval

Page 5

Features: HTD and VTD
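The slide's figure did not survive in this transcript, so the following is only a minimal sketch of how the two features could be computed. It assumes that "traverse density" means the number of strokes a scan line crosses (background-to-ink transitions), with HTD taken along rows and VTD along columns; the slides do not confirm this definition. The function names and the toy bitmap are hypothetical.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = ink (black) pixel, 0 = background


def count_crossings(line: List[int]) -> int:
    """Count background-to-ink transitions along one scan line."""
    crossings = 0
    previous = 0  # treat the area outside the bitmap as background
    for pixel in line:
        if pixel == 1 and previous == 0:
            crossings += 1
        previous = pixel
    return crossings


def horizontal_traverse_density(bitmap: Bitmap) -> List[int]:
    """One value per row: strokes crossed by a horizontal scan line (assumed definition)."""
    return [count_crossings(row) for row in bitmap]


def vertical_traverse_density(bitmap: Bitmap) -> List[int]:
    """One value per column: strokes crossed by a vertical scan line (assumed definition)."""
    return [count_crossings(list(column)) for column in zip(*bitmap)]


# A crude 5x5 letter "H" used as a toy character object.
LETTER_H = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]

htd = horizontal_traverse_density(LETTER_H)
vtd = vertical_traverse_density(LETTER_H)
```

For the toy "H", `htd` is `[2, 2, 1, 2, 2]` (two vertical strokes except at the crossbar) and `vtd` is `[1, 1, 1, 1, 1]`.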

Page 6

Class of Character Objects
  Unsupervised clustering with HTD and VTD
  Distance measure of character objects
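The slides do not say which unsupervised clustering algorithm is used. As an illustration only, the sketch below swaps in simple threshold-based "leader" clustering: a character object joins the nearest existing class if it is close enough, otherwise it founds a new class. The L1 distance, the `threshold` parameter, and the function names are assumptions made for the example, not the presented method.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Placeholder distance: sum of absolute differences (equal-length vectors)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def leader_cluster(
    features: List[Vector],
    threshold: float,
    distance: Callable[[Vector, Vector], float] = l1_distance,
) -> List[int]:
    """Assign each feature vector a class id without using any labels.

    A vector joins the nearest existing class if that class's representative
    lies within `threshold`; otherwise it starts a new class.
    """
    representatives: List[Vector] = []
    labels: List[int] = []
    for vec in features:
        best_id, best_dist = -1, float("inf")
        for class_id, rep in enumerate(representatives):
            d = distance(vec, rep)
            if d < best_dist:
                best_id, best_dist = class_id, d
        if best_id >= 0 and best_dist <= threshold:
            labels.append(best_id)
        else:
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


# Toy HTD-like vectors: three near-identical shapes and one different shape.
toy_features = [
    [2, 2, 1, 2, 2],
    [2, 2, 1, 2, 1],
    [1, 1, 1, 1, 1],
    [2, 2, 1, 2, 2],
]
class_ids = leader_cluster(toy_features, threshold=1.0)
```

With this toy input, `class_ids` is `[0, 0, 1, 0]`. The result of leader clustering depends on input order and on the threshold, which is one reason a real system might prefer a different algorithm.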

Page 7

Distance Measure of Character Objects
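The formula on this slide was not captured in the transcript. The sketch below is therefore not the presented distance; it only shows one plausible way to compare two character objects whose feature vectors differ in length: resample each HTD and VTD vector to a common length, then average the absolute differences. The resampling length of 16 and both function names are hypothetical choices.

```python
from typing import List, Sequence


def resample(vector: Sequence[float], length: int) -> List[float]:
    """Linearly interpolate a non-empty `vector` to exactly `length` samples."""
    if len(vector) == 1 or length == 1:
        return [float(vector[0])] * length
    scale = (len(vector) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(vector) - 1)
        frac = pos - lo
        out.append(vector[lo] * (1 - frac) + vector[hi] * frac)
    return out


def object_distance(
    htd_a: Sequence[float],
    vtd_a: Sequence[float],
    htd_b: Sequence[float],
    vtd_b: Sequence[float],
    length: int = 16,  # assumed common length, not taken from the slides
) -> float:
    """Mean absolute difference over the resampled HTD and VTD vectors."""
    total = 0.0
    for a, b in ((htd_a, htd_b), (vtd_a, vtd_b)):
        ra, rb = resample(a, length), resample(b, length)
        total += sum(abs(x - y) for x, y in zip(ra, rb)) / length
    return total / 2


# Toy features: an "H"-like object compared with itself and with a different shape.
h_htd, h_vtd = [2, 2, 1, 2, 2], [1, 1, 1, 1, 1]
other_htd, other_vtd = [1, 1, 1, 1, 1], [2, 2, 1, 2, 2]

same = object_distance(h_htd, h_vtd, h_htd, h_vtd)
different = object_distance(h_htd, h_vtd, other_htd, other_vtd)
```

Here `same` is `0.0` and `different` is positive. Resampling makes the comparison tolerant of size differences, at the cost of blurring fine detail in small characters.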

Page 8

Examples of Character Objects

Page 9

Similarity Measure of Documents
  N-gram algorithm
  Cosine angle between two documents
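The two bullets above can be sketched as follows, assuming each imaged document has already been reduced to a sequence of character-class ids. N-grams over that sequence are counted in a hash table (a `Counter` here), and two documents are compared by the cosine of the angle between their count vectors. The choice of `n = 3`, the absence of any weighting, and the function names are assumptions; the slides do not give these details.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(class_ids: Sequence[int], n: int = 3) -> Counter:
    """Count every run of `n` consecutive class ids (a hash table of n-grams)."""
    return Counter(
        tuple(class_ids[i:i + n]) for i in range(len(class_ids) - n + 1)
    )


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy documents expressed as sequences of character-class ids.
doc_a = [0, 1, 2, 0, 1, 2]
doc_b = [0, 1, 2, 0, 1, 2]
doc_c = [3, 4, 5, 3, 4, 5]

sim_same = cosine_similarity(ngram_counts(doc_a), ngram_counts(doc_b))
sim_disjoint = cosine_similarity(ngram_counts(doc_a), ngram_counts(doc_c))
```

Identical class sequences give a similarity of 1 (up to floating-point rounding), and documents with no n-gram in common give 0. No character is ever recognized: only the class ids produced by clustering are compared, which is what makes the approach independent of OCR.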

Page 10

Corpus
  UW1 database (600 dpi)

Page 11

Experimental Results
  Corpus I
  E01-E26

Page 12

Experimental Results
  Corpus II

Page 13

Experimental Results

Page 14

Experimental Results

Page 15

Experimental Results

Page 16

Experimental Results

Page 17

Conclusion and Future Work
  A new method for imaged document retrieval without OCR
  Language-independent retrieval
  Improvement of robustness for different fonts and noisy documents