Imaged Document Text Retrieval without OCR
IEEE Trans. on PAMI vol.24, no.6
June, 2002
Presenter: 周遵儒
Outline
Introduction
HTD and VTD
Class of Character Objects
Similarity Measure of Documents
Experimental Results
Conclusions
Introduction
Retrieval of Imaged Documents
Processing with OCR vs. without OCR
Language dependence vs. language independence
Procedure
Image preprocessing
Feature extraction of character objects
  Horizontal Traverse Density (HTD)
  Vertical Traverse Density (VTD)
Clustering
  To identify classes of character objects
Document representation
  Hash table
  N-gram
  To construct indexes for imaged document retrieval
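The document representation step can be sketched as follows. This is a minimal illustration, assuming each character object has already been replaced by the integer ID of its class; the function name `ngram_index`, the input format, and the choice n = 3 are hypothetical and not taken from the slides. A Python `Counter` plays the role of the hash table, mapping each n-gram of class IDs to its count.

```python
from collections import Counter

def ngram_index(class_ids, n=3):
    """Count every run of n consecutive class IDs in a document.

    class_ids: sequence of integer class labels, one per character object,
               in reading order (an assumed input format).
    Returns a Counter (a hash table) mapping n-gram tuples to occurrence counts.
    """
    grams = (tuple(class_ids[i:i + n]) for i in range(len(class_ids) - n + 1))
    return Counter(grams)

# Hypothetical document: class IDs of ten consecutive character objects.
doc = [4, 7, 2, 4, 7, 2, 9, 4, 7, 2]
index = ngram_index(doc, n=3)
print(index[(4, 7, 2)])  # number of times this trigram occurs
```

Because the index stores only class IDs, no character ever has to be recognized, which is what allows this kind of representation to stay language independent.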
Features: HTD and VTD
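As a rough illustration of the two features, the sketch below assumes that a traverse density vector holds, for each scan line, the number of times the line crosses from background into a stroke. That definition, the 0/1 bitmap format, and the function names are assumptions made for this example; the slide itself does not spell them out.

```python
def traverse_count(line):
    """Count background-to-ink transitions (0 -> 1) along one scan line."""
    count = 0
    previous = 0  # the area outside the bitmap is treated as background
    for pixel in line:
        if pixel == 1 and previous == 0:
            count += 1
        previous = pixel
    return count

def htd(bitmap):
    """Horizontal traverse density: one count per row."""
    return [traverse_count(row) for row in bitmap]

def vtd(bitmap):
    """Vertical traverse density: one count per column."""
    return [traverse_count(column) for column in zip(*bitmap)]

# A crude 5x5 bitmap of the letter "H" (1 = ink, 0 = background).
H = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
print(htd(H))  # [2, 2, 1, 2, 2]
print(vtd(H))  # [1, 1, 1, 1, 1]
```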
Class of Character Objects
Unsupervised clustering with HTD and VTD
Distance measure of character objects
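The slides do not say which clustering algorithm is used, so the following is only a sketch of one simple unsupervised option: a single-pass threshold ("leader") clustering. The placeholder distance `mean_abs_diff`, the threshold value, and the sample feature vectors are all hypothetical.

```python
def mean_abs_diff(a, b):
    """Placeholder distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cluster(features, threshold):
    """Assign each vector to the first class whose representative lies
    within `threshold`; otherwise open a new class with it as representative.

    Returns (labels, representatives).
    """
    representatives = []
    labels = []
    for feature in features:
        for class_id, rep in enumerate(representatives):
            if mean_abs_diff(feature, rep) <= threshold:
                labels.append(class_id)
                break
        else:  # no existing class was close enough
            representatives.append(feature)
            labels.append(len(representatives) - 1)
    return labels, representatives

# Hypothetical feature vectors for four character objects.
features = [
    [2, 2, 1, 2, 2],
    [1, 1, 1, 1, 1],
    [2, 2, 1, 2, 3],
    [1, 1, 2, 1, 1],
]
labels, reps = cluster(features, threshold=0.5)
print(labels)  # [0, 1, 0, 1]
```

The resulting labels are the class IDs that stand in for characters in the later indexing steps.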
Distance Measure of Character Objects
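Character objects come in different sizes, so their HTD and VTD vectors generally differ in length. One way to compare them, shown here purely as an assumption and not as the measure defined on the original slide, is to resample every vector to a fixed length and average the absolute differences. The names `resample` and `object_distance` and the length 16 are hypothetical.

```python
def resample(vector, length):
    """Stretch or shrink a vector to `length` entries by nearest-index sampling."""
    n = len(vector)
    return [vector[(i * n) // length] for i in range(length)]

def object_distance(htd_a, vtd_a, htd_b, vtd_b, length=16):
    """Mean absolute difference of the resampled HTD and VTD vectors."""
    total = 0.0
    for a, b in ((htd_a, htd_b), (vtd_a, vtd_b)):
        ra, rb = resample(a, length), resample(b, length)
        total += sum(abs(x - y) for x, y in zip(ra, rb)) / length
    return total / 2

# Features of a small "H" and of the same shape drawn at twice the size.
small = ([2, 2, 1, 2, 2], [1, 1, 1, 1, 1])
large = ([2, 2, 2, 2, 1, 1, 2, 2, 2, 2], [1] * 10)
print(object_distance(*small, *large))  # 0.0: same shape, different scale
```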
Examples of Character Objects
Similarity Measure of Documents
N-Gram Algorithm
Cosine angle between two documents
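The cosine measure treats each document's n-gram counts as a vector and computes cos θ = (a · b) / (|a| |b|). A minimal sketch follows, assuming the counts are held in dictionaries keyed by n-gram; the function name and the sample counts are illustrative only.

```python
import math

def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two sparse n-gram count vectors."""
    dot = sum(v * counts_b.get(k, 0) for k, v in counts_a.items())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical bigram counts over class IDs for three documents.
doc_a = {(1, 2): 2, (2, 3): 1}
doc_b = {(1, 2): 1, (2, 3): 1, (3, 4): 1}
doc_c = {(5, 6): 4}

print(cosine_similarity(doc_a, doc_b))  # partial overlap, between 0 and 1
print(cosine_similarity(doc_a, doc_c))  # no shared n-grams
```

A value near 1 means the two documents share most of their n-grams in similar proportions; a value of 0 means they share none.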
Corpus
UW1 database (600 dpi)
Experimental Results
Corpus I
E01-E26
Experimental Results
Corpus II
Conclusion and Future Work
A new method for imaged document retrieval without OCR
Language-independent retrieval
Improvement of robustness for different fonts and noisy documents