advances in arabic and latin handwriting recognition using … · advances in arabic and latin...
Post on 02-Apr-2018
235 Views
Preview:
TRANSCRIPT
Advances in Arabic and Latin Handwriting Recognitionusing the RWTH-OCR System
Philippe Dreuw, Stephan Jonas, Georg Heigold,David Rybach, Christian Gollan, Hermann Ney
dreuw@cs.rwth-aachen.de
Séminaire DGA:Traitement de la parole, du langage et des documents multimédias – July 2009
Human Language Technology and Pattern RecognitionLehrstuhl für Informatik 6
Computer Science DepartmentRWTH Aachen University, Germany
Dreuw et. al.: Arabic and Latin Handwriting Recognition 1 / 17 Seminaire DGA July 2009
Outline
1. Introduction
2. Adaptation of the RWTH-ASR framework for Handwriting Recognition
I White Space Modeling in Arabic HandwritingI Model Length Estimation
3. Experimental Results
4. Summary
Dreuw et. al.: Arabic and Latin Handwriting Recognition 2 / 17 Seminaire DGA July 2009
Introduction
I Arabic handwriting system
. right-to-left, 28 characters, position-dependent character writing variants
. ligatures and diacritics
. Pieces of Arabic Word (PAWs) as subwords
(a) Ligatures (b) Diacritics
I state-of-the-art
. preprocessing (normalization, baseline estimation, etc.) + HMMs
I our approach:
. adaptation of RWTH-ASR framework for handwriting recognition
. preprocessing-free feature extraction, focus on modeling
Dreuw et. al.: Arabic and Latin Handwriting Recognition 3 / 17 Seminaire DGA July 2009
System Overview
Image Input
FeatureExtraction
Character Inventory
Writing Variants Lexicon
Language Model
Global Search:
maximize
x1...xT
Pr(w1...wN) Pr(x1...xT | w1...wN)
w1...wN
RecognizedWord Sequence
over
Pr(x1...xT | w1...wN )
Pr(w1...wN)
Dreuw et. al.: Arabic and Latin Handwriting Recognition 4 / 17 Seminaire DGA July 2009
Visual Modeling: Feature Extraction and HMM Transitions
I recognition of characters within a context, temporal alignment necessary
I features: sliding window, no preprocessing, PCA reduction
I important: HMM whitespace models (a) and state-transition penalties (b)
(a) (b)
Dreuw et. al.: Arabic and Latin Handwriting Recognition 5 / 17 Seminaire DGA July 2009
Visual Modeling: Writing Variants Lexicon
I most reported error rates are dependent on the number of PAWsI without separate whitespace model
I always whitespaces between compound words
I whitespaces as writing variants between and within words
White-Space Models for Pieces of Arabic Words [Dreuw & Jonas+ 08] in ICPR 2008
Dreuw et. al.: Arabic and Latin Handwriting Recognition 6 / 17 Seminaire DGA July 2009
Visual Modeling: Model Length Estimation
I more complex characters should be represented by more HMM states
I the number of states Sc for each character c is updated by
Sc =Nx,c
Nc
· α
withSc = estimated number states for character c
Nx,c = number of observations aligned to character cNc = character count of c seen in trainingα = character length scaling factor.
[Visualization]
Dreuw et. al.: Arabic and Latin Handwriting Recognition 7 / 17 Seminaire DGA July 2009
RWTH-OCR Training and Decoding Architectures
I Training
. Maximum Likelihood (ML)
. CMLLR-based Writer Adaptive Training (WAT)
. discriminative training using modified-MMI criterion (M-MMI)
I Decoding
. 1-pass◦ ML model◦ M-MMI model
. 2-pass◦ segment clustering for CMLLR writer adaptation◦ unsupervised confidence-based M-MMI training for model adaptation
Dreuw et. al.: Arabic and Latin Handwriting Recognition 8 / 17 Seminaire DGA July 2009
Arabic Handwriting - IFN/ENIT Database
Corpus development
I ICDAR 2005 Competition: a, b, c, d sets for training, evaluation on set e
I ICDAR 2007 Competition: ICDAR05 + e sets for training, evaluation on set f
I ICDAR 2009 Competition: ICDAR 2007 for training, evaluation on set f
Dreuw et. al.: Arabic and Latin Handwriting Recognition 9 / 17 Seminaire DGA July 2009
Arabic Handwriting - IFN/ENIT Database
I 937 classes
I 32492 handwritten Arabic words (Tunisian city names)
I database is used by more than 60 groups all over the world
I writer statisticsset #writers #samplesa 102 6537b 102 6710c 103 6477d 104 6735e 505 6033
Total 916 32492
I examples (same word):
Dreuw et. al.: Arabic and Latin Handwriting Recognition 10 / 17 Seminaire DGA July 2009
Arabic Handwriting - Experimental Results for IFN/ENIT
I Writer Adaptive Training + CMLLR for Writer Adaptationsee [Dreuw & Rybach+ 09], ICDAR 2009 [Visualization]
I M-MMI Training + Unsupervised Confidence-Based Model Adaptationsee [Dreuw & Heigold+ 09], ICDAR 2009 [Details]
I ICDAR 2005 Setup [Comparison]
Train Test WER[%]1st pass 2nd pass
ML +MLE +M-MMI WAT+CMLLR M-MMI-confunsup. sup.
abc d 10.88 7.83 6.12 7.72 5.82 5.95abd c 11.50 8.83 6.78 9.05 5.96 6.38acd b 10.97 7.81 6.08 7.99 6.04 5.84bcd a 12.19 8.70 7.02 8.81 6.49 6.79abcd e 21.86 16.82 15.35 17.12 11.22 14.55
Dreuw et. al.: Arabic and Latin Handwriting Recognition 11 / 17 Seminaire DGA July 2009
Arabic Handwriting - Experimental Results for IFN/ENIT
I RWTH-OCR systems have been evaluated at ICDAR 2009. external evaluation at TU Braunschweig, Germany - set f and set s are unknown (not available)
Competition ID WRR[%]set fa set ff set fg set f set s
ICDAR 2007 Siemens, ID08 88.41 89.26 89.72 87.22 73.94UOB-ENST, ID12 83.23 84.79 85.29 81.81 70.57Paris V / A2iA, ID14 81.36 83.45 83.82 80.18 64.38
ICDAR 2009 RWTH-OCR, ID12 86.97 88.08 87.98 85.51 71.33RWTH-OCR, ID13 87.17 88.63 88.68 85.69 72.54RWTH-OCR, ID15 86.97 88.08 87.98 83.90 65.99
I RWTH-OCR system achieves best university result w.r.t. ICDAR 2007 resultsNote:
I Siemens: winner of the ICDAR 2007 competition, commercial system
I RWTH 2009 systems: focus on modeling (ID12 and ID13) and speed (ID15) - no preprocessing
I all competition results will be presented at ICDAR 2009
Dreuw et. al.: Arabic and Latin Handwriting Recognition 12 / 17 Seminaire DGA July 2009
Latin Handwriting - IAM Database
I English handwriting, continuous sentences
Train Devel Eval 1 Eval 2 TotalLines 6,161 1,861 900 940 9,862Running words 53,884 17,720 7,901 8,568 88,073Vocabulary size 7,754 3,604 2,290 2,290 11,368Characters 281,744 83,641 41,672 42,990 450,047Writers 283 128 46 43 500OOV Rate ≈15% ≈17% ≈15%
I Example lines:
Dreuw et. al.: Arabic and Latin Handwriting Recognition 13 / 17 Seminaire DGA July 2009
Latin Handwriting - UPV Preprocessing
I Original images
I Images after color normalisation
I Images after slant correction
I Images after height normalisation
Dreuw et. al.: Arabic and Latin Handwriting Recognition 14 / 17 Seminaire DGA July 2009
Latin Handwriting - Experimental Results on IAM Database
Systems Devel WER [%] Eval WER [%]RWTH-OCR*Baseline 81.07 83.60
+ UPV Preprocessing 57.59 65.26+ LBW LM & 20k Lexicon 34.64 41.45
+ character recognition 35.42 41.49+ discriminative training 29.40 35.32
Other Single Systems[Bertolami & Bunke 08] 30.98 35.52[Natarajan & Saleem+ 08] - 40.01[Romero & Alabau+ 07] 30.6 -System Combination[Bertolami & Bunke 08] 26.85 32.83
*see [Jonas 09] for details
Dreuw et. al.: Arabic and Latin Handwriting Recognition 15 / 17 Seminaire DGA July 2009
Summary
I RWTH-ASR→ RWTH-OCR
. simple feature extraction and preprocessing
. Arabic: created a SOTA system, results are close to world leader’s
. Latin: created a SOTA system, best single system
I discriminative training
. unsupervised confidence-based discriminative training criterion
. relative improvements of about 33% w.r.t. ML training
I ongoing work
. to be evaluated in ASR experiments
. impact of preprocessing in feature extraction, more complex features
. character context modeling (e.g. CART)
. improve writer clustering
. further databases/languages
Dreuw et. al.: Arabic and Latin Handwriting Recognition 16 / 17 Seminaire DGA July 2009
Thank you for your attention
Philippe Dreuw
dreuw@cs.rwth-aachen.de
http://www-i6.informatik.rwth-aachen.de/
Dreuw et. al.: Arabic and Latin Handwriting Recognition 17 / 17 Seminaire DGA July 2009
References
[Bertolami & Bunke 08] R. Bertolami, H. Bunke: Hidden Markov model-basedensemble methods for offline handwritten text line recognition. PatternRecognition, Vol. 41, No. 11, pp. 3452–3460, Nov 2008. 15
[Dreuw & Heigold+ 09] P. Dreuw, G. Heigold, H. Ney: Confidence-BasedDiscriminative Training for Writer Adaptation in Offline Arabic HandwritingRecognition. In International Conference on Document Analysis andRecognition, Barcelona, Spain, July 2009. 11
[Dreuw & Jonas+ 08] P. Dreuw, S. Jonas, H. Ney: White-Space Models forOffline Arabic Handwriting Recognition. In International Conference onPattern Recognition, Tampa, Florida, USA, Dec. 2008. 6
[Dreuw & Rybach+ 09] P. Dreuw, D. Rybach, C. Gollan, H. Ney: Writer AdaptiveTraining and Writing Variant Model Refinement for Offline Arabic HandwritingRecognition. In International Conference on Document Analysis andRecognition, Barcelona, Spain, July 2009. 11
[Jonas 09] S. Jonas: Improved Modeling in Handwriting Recognition. Master’sthesis, Human Language Technology and Pattern Recognition Group, RWTHAachen University, Aachen, Germany, Jun 2009. 15
Dreuw et. al.: Arabic and Latin Handwriting Recognition 18 / 17 Seminaire DGA July 2009
[Natarajan & Saleem+ 08] P. Natarajan, S. Saleem, R. Prasad, E. MacRostie,K. Subramanian: Arabic and Chinese Handwriting Recognition, Vol.4768/2008 of LNCS, chapter Multi-lingual Offline Handwriting RecognitionUsing Hidden Markov Models: A Script-Independent Approach, pp. 231–250.Springer Berlin / Heidelberg, 2008. 15
[Romero & Alabau+ 07] V. Romero, V. Alabau, J.M. Benedi: Combination ofN-Grams and Stochastic Context-Free Grammars in an Offline HandwrittenRecognition System. Lecture Notes in Computer Science, Vol. 4477,pp. 467–474, 2007. 15
Dreuw et. al.: Arabic and Latin Handwriting Recognition 19 / 17 Seminaire DGA July 2009
Appendix: Comparisons for IFN/ENIT
I ICDAR 2005 Evaluation
Rank Group WRR [%]abc-d abcd-e
1. UOB 85.00 75.932. ARAB-IFN 87.94 74.693. ICRA (Microsoft) 88.95 65.744. SHOCRAN 100.00 35.705. TH-OCR 30.13 29.62
BBN 89.49 N.A.1* RWTH 94.05 85.45
*own evaluation result (no tuning on test data)
Dreuw et. al.: Arabic and Latin Handwriting Recognition 20 / 17 Seminaire DGA July 2009
Appendix: Participating Systems at ICDAR 2005 and 2007
I MITRE: Mitre Cooperation, USAover-segmentation, adaptive lengths, character recognition with post-processing
I UOB-ENST: University of Balamand (UOB), Lebanon and Ecole Nationale Superieure des Telecommunications (ENST), ParisHMM-based (HTK), slant correction
I MIE: Mie University, Japansegmentation, adaptive lengths
I ICRA: Intelligent Character Recognition for Arabic, Microsoftpartial word recognizer
I SHOCRAN: Egyptconfidential
I TH-OCR: Tsinghua Universty, Beijing, Chinaover-segmentation, character recognition with post-processing
I CACI: Knowledge and Information Management Division, Lanham, USAHMM-based, trajectory features
I CEDAR: Center of Excellence for Document Analysis and Recognition, Buffalo, USAover-segmentation, HMM-based
I PARIS V / A2iA: University of Paris 5, and A2iA SA, Francehybrid HMM/NN-based, shape-alphabet
I Siemens: SIEMENS AG Industrial Solutions and Services, GermanyHMM-based, adapative lenghths, writing variants
I ARAB-IFN: TU Braunschweig, GermanyHMM-based
Dreuw et. al.: Arabic and Latin Handwriting Recognition 21 / 17 Seminaire DGA July 2009
Appendix: Visual Modeling - Model Length Estimation
Original Length
I overall mean of character length = 7.9 pixel (≈ 2.6 pixel/state)
I total #states = 357
Dreuw et. al.: Arabic and Latin Handwriting Recognition 22 / 17 Seminaire DGA July 2009
Appendix: Visual Modeling - Model Length Estimation
Estimated Length
I overall mean of character length = 6.2 pixel (≈ 2.0 pixel/state)
I total #states = 558
Dreuw et. al.: Arabic and Latin Handwriting Recognition 23 / 17 Seminaire DGA July 2009
Unsupervised Confidence-Based Discriminative Training
I example for a word-graph and the corresponding 1-best state alignment
���� ��������
�������� ��
������
������
������
����
������
������
���
���
��������
���
���
������
������
���
���
������
������
���� �
���
���������
���
���
���
���
���
���
���
������
������
��������
���
���
������
������
�������� �
��
���
���
���
������������
����
∑ ∑ ∑ ∑ ∑ ∑ ∑
c = 0.001
c = 0.1
c = 0.7
I necessary steps for model adaptation during decoding:
. 1-pass recognition (unsupervised transcriptions and word-graph)
. calculation of corresponding confidences (sentence, word, or state-level)
. unsupervised M-MMI-conf training on test datato adapt models (w/ regularization)
I can be done iteratively with unsupervised corpus update!
Dreuw et. al.: Arabic and Latin Handwriting Recognition 24 / 17 Seminaire DGA July 2009
Appendix: Unsupervised Confidence-Based Discriminative Training
I ML training: accumulation of observations xt:
accs =R∑r=1
Tr∑t=1
xt
I M-MMI training: weighted accumulation of observations xt:
accs =R∑r=1
Tr∑t=1
ωr,s,t · xt
I M-MMI-conf training: confidence-weighted accumulation of observations xt:
accs =R∑r=1
Tr∑t=1
ωr,s,t · cr,s,t · xt
. with confidence at sentence-, word, or state-level
Dreuw et. al.: Arabic and Latin Handwriting Recognition 25 / 17 Seminaire DGA July 2009
Appendix: Modified-MMI Criterion And Confidences
I Training: weighted accumulation of observations xt:
accs =R∑r=1
Tr∑t=1
ωr,s,t · xt
1. MMI:
ωr,s,t :=
∑sTr1 :st=s
p(xTr1 |sTr1 )p(sTr1 )p(Wr)∑
V
∑sTr1 :st=s
p(xTr1 |sTr1 )p(sTr1 )p(V )
2. M-MMI:
ωr,s,t(ρ 6= 0) :=
∑sTr1 :st=s
p(xTr1 |sTr1 )p(sTr1 )p(Wr) · e−ρδ(Wr,Wr)
∑V
∑sTr1 :st=s
p(xTr1 |sTr1 )p(sTr1 )p(V ) · e−ρδ(Wr,V )
Dreuw et. al.: Arabic and Latin Handwriting Recognition 26 / 17 Seminaire DGA July 2009
Appendix: Modified-MMI Criterion And Confidences
3. M-MMI-conf:
ωr,s,t(ρ 6= 0) :=
∑sTr1 :st=s
p(xTr1 |sTr1 )p(sTr1 )p(Wr) · e−ρδ(Wr,Wr)
∑V
∑sTr1 :st=s
p(xTr1 |sTr1 )p(sTr1 )p(V )
︸ ︷︷ ︸posterior
· e−ρδ(Wr,V )︸ ︷︷ ︸margin
· δ(cr,s,t > cthreshold)︸ ︷︷ ︸confidence
I weighted accumulation becomes:
accs =R∑r=1
Tr∑t=1
ωr,s,t(ρ)︸ ︷︷ ︸margin posteriorρ 6=0
· cr,s,t︸︷︷︸confidence
· xt
I confidences at:
. sentence-, word-, or state-level
Dreuw et. al.: Arabic and Latin Handwriting Recognition 27 / 17 Seminaire DGA July 2009
Appendix: Modified-MMI Criterion And Confidences
I word-confidence based M-MMI training and rejections
19.1
19.2
19.3
19.4
19.5
19.6
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0
1000
2000
3000
4000
5000
6000
7000
8000W
ER
[%]
word confidence
confidence-based M-MMI writer adaptationM-MMI baseline
#rejected segments
Dreuw et. al.: Arabic and Latin Handwriting Recognition 28 / 17 Seminaire DGA July 2009
Appendix: Alignment Visualization
I alignment visualization with and without discriminative training
I upper lines with 5-2 baseline setup, lower lines with additionaldiscriminative training
Dreuw et. al.: Arabic and Latin Handwriting Recognition 29 / 17 Seminaire DGA July 2009
Appendix: Segment Clustering
I unbalanced writer segment clustering histograms for IFN/ENIT database
Dreuw et. al.: Arabic and Latin Handwriting Recognition 30 / 17 Seminaire DGA July 2009
Arabic Handwriting - UPV Preprocessing
I Original images
I Images after slant correction
I Images after size normalisation
Experimental Results: [Details]
I important informations in ascender and descender areas are lost
I not yet suitable for Arabic HWR
Dreuw et. al.: Arabic and Latin Handwriting Recognition 31 / 17 Seminaire DGA July 2009
Arabic Handwriting - Experimental Results using Preprocessing
I Baseline (BASE)
I BASE with deslanting (SLANT)
I BASE with complete UPV preprocessing (UPV)
Train Test WER [%] CER [%]BASE SLANT UPV BASE SLANT UPV
abc d 9.29 13.87 32.74 4.01 4.38 9.58abd c 10.13 14.05 34.18 4.09 4.57 10.16acd b 9.05 12.98 33.14 3.62 4.12 9.70bcd a 9.71 14.22 35.96 4.26 4.72 10.90abcd e 20.32 26.92 53.39 8.25 9.18 16.02
I important informations in ascender and descender areas are lost
I not yet suitable for Arabic HWR
Dreuw et. al.: Arabic and Latin Handwriting Recognition 32 / 17 Seminaire DGA July 2009
top related