
Importance of recurrent layers for unconstrained handwriting recognition

Denis Coquenet, Yann Soullard, Clément Chatelain, Thierry Paquet

LITIS Laboratory - EA 4108 Normandie University - University of Rouen, France

SIFED, 6th June 2019


Table of Contents

1 Context

2 Studied architectures

3 Experiments

4 Conclusion


Handwriting recognition system

Constraints:
- Images (input) of variable size
- Sequences of characters (output) of variable length

These constraints have led to a heavy use of neural networks.


Deep learning handwriting recognition system

Architecture:
- Recurrent layers (LSTM) and/or non-recurrent ones (CNN)
- Language model inclusion

Sequence alignment: Connectionist Temporal Classification (CTC)

Focus on the optical model only, without language model or lexicon constraints.
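To make the alignment-free training concrete, here is a minimal CTC loss computation. It assumes TensorFlow 2.x; the shapes and label values are illustrative placeholders, not the presentation's actual pipeline.

```python
# Minimal CTC sketch, assuming TensorFlow 2.x.
import tensorflow as tf

batch, frames, n = 2, 50, 100                       # n = alphabet size
logits = tf.random.normal((batch, frames, n + 1))   # optical model output, +1 for the blank
labels = tf.ragged.constant([[3, 7, 7, 1], [5, 2]]).to_sparse()  # variable-length targets

loss = tf.nn.ctc_loss(
    labels=labels,
    logits=logits,
    label_length=None,                    # inferred from the sparse labels
    logit_length=tf.fill([batch], frames),
    logits_time_major=False,
    blank_index=n,                        # last class reserved for the CTC blank
)
print(loss.numpy())                       # one loss value per line image
```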


State of the art

Recurrent models: Multi-Dimensional Long Short-Term Memory (MDLSTM) [Pham2014].

Recurrence over the horizontal and vertical axes (in both directions): 4 LSTMs per layer.


State of the art

Recurrent models: Convolutional Neural Network + Bidirectional Long Short-Term Memory (CNN+BLSTM) [Puigcerver2017].

Recurrence over the horizontal axis only (in both directions): 2 LSTMs per layer.


State of the art

Non-recurrent models: Convolutional Neural Network (CNN)


State of the art

CNN for handwriting recognition

- Fully Convolutional Networks (FCN) + CTC [Ptucha2018]
  - Standard CNN without dense layers
- FCN with gating mechanism + CTC [Yousef2018]
  - Gates (tanh, sigmoid)
  - Residual connections
  - Depthwise Separable Convolutions
  - Heavy normalization (batch & layer)
- FCN with gating mechanism + CTC [Ingle2019]
  - Gates (ReLU, sigmoid)
  - Shared-weight layers


Context

Objective: design a convolutional network competitive with recurrent ones while reducing training time: the G-CNN.

Main questions:
- Are recurrent layers really necessary for handwriting recognition?
- Are CNNs really lighter than recurrent models in terms of parameters?
- Is it possible to easily obtain competitive results without recurrence?


Our baseline model - CNN+BLSTM

Architecture (read top to bottom):

Input (X, 32, 32, 1)
→ 4 × [Conv2D(u), Conv2D(u), MaxPooling(2, 2), Dropout(d)], with u = [32, 64, 128, 256] and d = [0.4, 0.4, 0.4, 0]
→ (X, 2, 2, 256)
→ Flatten → Dense(256) → BLSTM(256) → BLSTM(256) → Dense(n+1) → Softmax

Features:
- From [Soullard2019] (state-of-the-art results)
- Recurrent model
- 8 convolutions
- 2.5 million parameters

n: number of characters in the alphabet.
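As a sketch of how this stacks up, here is the baseline read off the diagram in Keras. The TimeDistributed wrapping over the sequence of 32x32 windows, the 3x3 kernels, and the ReLU activations are my assumptions; the slide does not specify them.

```python
# Hedged Keras sketch of the baseline CNN+BLSTM, not the authors' exact code.
import tensorflow as tf
from tensorflow.keras import layers, models

n = 100  # alphabet size; the extra class below is the CTC blank

inp = layers.Input(shape=(None, 32, 32, 1))          # (X windows, 32, 32, 1)
x = inp
for units, drop in zip([32, 64, 128, 256], [0.4, 0.4, 0.4, 0.0]):
    x = layers.TimeDistributed(layers.Conv2D(units, 3, padding="same", activation="relu"))(x)
    x = layers.TimeDistributed(layers.Conv2D(units, 3, padding="same", activation="relu"))(x)
    x = layers.TimeDistributed(layers.MaxPooling2D((2, 2)))(x)
    if drop > 0:
        x = layers.TimeDistributed(layers.Dropout(drop))(x)
x = layers.TimeDistributed(layers.Flatten())(x)      # each window is now (2, 2, 256) -> 1024
x = layers.TimeDistributed(layers.Dense(256, activation="relu"))(x)
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
out = layers.Dense(n + 1, activation="softmax")(x)   # n characters + CTC blank

model = models.Model(inp, out)
model.summary()
```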


Our G-CNN - overview

Architecture (read top to bottom; wiring as recovered from the slide diagram):

Input (X, 32, 32, 1)
→ 2 × [Conv2D(u), Conv2D(u), MaxPooling(2, 2), Dropout(0.4)], with u = [32, 64]
→ GateBlock #1 and GateBlock #2, built from Gates and 4 × [Conv2D(u), Conv2D(u), Dropout(0.4)], with u = [128, 256, 256, 512]
→ Concatenate → (X, 2, 2, 1536) → Conv2D(512, k=(1,1)) → (X, 2, 2, 512)
→ Flatten → Dense(512) → Dense(n+1) → Softmax

Features:
- Based on the baseline model
- Non-recurrent model
- 21 convolutions
- 6.9 million parameters
- Shared-weight layers
- Depthwise Separable Convolutions
- Residual connections


Our G-CNN - gates

Gating mechanism (shown as a figure on the slide; a sketch follows below).
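One plausible reading of the mechanism, in the spirit of the gated FCNs cited earlier [Yousef2018]: a tanh "content" branch modulated elementwise by a sigmoid "gate" branch, plus a residual connection. The exact branch layout is an assumption, not the authors' implementation.

```python
# Hedged sketch of a gate; branch layout assumed, not taken from the slides.
from tensorflow.keras import layers

def gate(x, filters):
    content = layers.Conv2D(filters, 3, padding="same", activation="tanh")(x)
    mask = layers.Conv2D(filters, 3, padding="same", activation="sigmoid")(x)
    gated = layers.Multiply()([content, mask])   # the sigmoid branch filters information
    return layers.Add()([x, gated])              # residual connection (assumes matching channels)
```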


Training details

Hyperparameters:
- Sliding window: 32x32 px
- Loss: CTC
- Optimizer: Adam
- Initial learning rate: 10^-4
- Momentum: 0.9
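In Keras terms, and assuming the stated momentum maps to Adam's beta_1 parameter (the slide does not say), the optimizer would be configured as:

```python
# Assumes "momentum 0.9" refers to Adam's beta_1; an illustration, not the authors' script.
from tensorflow.keras.optimizers import Adam

optimizer = Adam(learning_rate=1e-4, beta_1=0.9)  # initial learning rate 10^-4
```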


Model evaluation

Criteria:
- Character Error Rate (CER)
- Number of parameters
- Training time
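For reference, CER is conventionally the character-level edit distance between prediction and ground truth, divided by the ground-truth length. A small self-contained sketch (the presentation does not give its exact scorer):

```python
# Character Error Rate via Levenshtein distance; a common definition, assumed here.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(predictions, references):
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return 100 * edits / sum(len(r) for r in references)

print(cer(["bonjour"], ["bonjours"]))  # 12.5 (% CER): one missing character over 8
```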

Raw model comparison: we focus on the network performance alone
- No language model
- No lexicon constraints

Dataset: RIMES (line level)


RIMES dataset

Dataset characteristics:
- More than 1,300 writers
- French writing
- 12,723 pages segmented into lines

RIMES dataset split (number of lines; last column is the alphabet size):

Training   Validation   Test   Alphabet
9,947      1,333        778    100


First experiment: Raw comparison

Architecture      CER (%) validation   CER (%) test   Training time   Parameters (M)
CNN+BLSTM         6.98                 6.88           1d22h59         4.1
CNN+Dense only    17.73                19.03          1h10            1.5
G-CNN             9.92                 10.03          10h00           6.9

- The BLSTM layers account for a large share of the parameters (2.6 M; a worked count follows below)
- The BLSTM layers improve performance dramatically (-12.15 points of test CER)
- G-CNN: more parameters, but a shorter training time (parallelizable computations)
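The 2.6 M figure can be reproduced with the standard LSTM parameter formula, 4 * (d_in*d_h + d_h^2 + d_h) per direction; the layer input sizes below follow the baseline architecture described earlier.

```python
# Reproducing the 2.6 M BLSTM parameter figure from the standard LSTM formula.
def lstm_params(d_in, d_h):
    return 4 * (d_in * d_h + d_h * d_h + d_h)  # input, recurrent and bias weights for 4 gates

layer1 = 2 * lstm_params(256, 256)  # bidirectional; input is the Dense(256) output
layer2 = 2 * lstm_params(512, 256)  # input is the 512-dim concatenation from layer 1
print(layer1 + layer2)              # 2,625,536, i.e. about 2.6 M
```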


Second experiment - Robustness against complexified data

Modified version of the RIMES dataset: a lined-paper background is added to each line image (examples shown on the slide).


Architecture   Background   CER (%) validation   CER (%) test   Training time
CNN+BLSTM      Without      6.98                 6.88           1d22h59
CNN+BLSTM      With         8.81                 9.27           1d1h29
G-CNN          Without      9.92                 10.03          10h00
G-CNN          With         11.70                12.55          8h27

Similar behavior: test CER increases by 2.39 points for the CNN+BLSTM and 2.52 points for the G-CNN.


Third experiment - Impact of data augmentation

Augmentation transforms applied to RIMES (illustrated on the slide; sketches follow below):
1. Raw
2. Contrast
3. Sign flipping
4. Long scaling
5. Short scaling
6. Width dilation
7. Height dilation
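As illustration only, these transforms could look like the following with OpenCV. The parameter values are guesses, and "sign flipping" is read here as inverting ink/background polarity, which the slide does not confirm.

```python
# Hedged sketches of the listed transforms; parameters are illustrative guesses.
import cv2

def contrast(img, alpha=1.3):
    return cv2.convertScaleAbs(img, alpha=alpha, beta=0)  # 2. contrast change

def sign_flip(img):
    return 255 - img                                      # 3. one reading: invert ink/background

def scaling(img, f):
    return cv2.resize(img, None, fx=f, fy=f)              # 4./5. "long"/"short" = large/small factor, assumed

def width_dilation(img, fx=1.2):
    return cv2.resize(img, None, fx=fx, fy=1.0)           # 6. stretch horizontally only

def height_dilation(img, fy=1.2):
    return cv2.resize(img, None, fx=1.0, fy=fy)           # 7. stretch vertically only
```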


Architecture   Data augmentation   CER (%) validation   CER (%) test
CNN+BLSTM      Without             6.98                 6.88
CNN+BLSTM      With                6.59                 5.94
G-CNN          Without             9.92                 10.03
G-CNN          With                8.93                 8.73

Test CER decreases by 1.30 points for the G-CNN and 0.94 points for the CNN+BLSTM.

Assumption: the G-CNN needs more examples, whereas the CNN+BLSTM compensates through its use of context.


Ablation study - Part 1

Architecture                              CER (%) validation   CER (%) test   Training time   Parameters (M)
G-CNN                                     9.92                 10.03          10h00           6.9
(1) Only standard convolutions            10.02                9.97           6h41            9.0
(2) Max pooling from the very beginning   13.31                13.35          2h57            6.9
(3) No shared-weight layers               9.78                 9.85           8h54            7.7

(1) Depthwise Separable Convolutions save 2.1 M parameters while preserving performance (+0.06 points of test CER; see the comparison below)

(2) Delaying the use of max pooling improves performance (by 3.32 points)

(3) Shared-weight layers save 0.8 M parameters with a similar CER (+0.18 points only)
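The saving follows directly from the parameter counts of the two convolution types: a standard k×k convolution costs k²·c_in·c_out weights, a depthwise separable one only k²·c_in + c_in·c_out. An illustrative comparison, with sizes chosen arbitrarily rather than taken from the actual G-CNN layers:

```python
# Parameter count: standard vs depthwise separable convolution (biases ignored).
k, c_in, c_out = 3, 256, 512
standard = k * k * c_in * c_out           # 1,179,648 weights
separable = k * k * c_in + c_in * c_out   # 133,376 weights, roughly 9x fewer
print(standard, separable)
```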


Ablation study - Part 2

Architecture                             CER (%) validation   CER (%) test   Training time   Parameters (M)
G-CNN                                    9.92                 10.03          10h00           6.9
(4) Doubled convolutions in GateBlocks   9.96                 10.15          4h10            7.4
(5) Removal of the 2 GateBlocks          10.09                10.33          6h37            6.1

(4) Increasing the number of convolutions between gates is not necessary (+0.12 points)

(5) The majority of the work is done before the GateBlocks (+0.30 points)


Conclusion - recurrent models

Structure:
- Convolutional part (CNN): feature extraction
- Recurrent part (LSTM): sequence modeling

Advantages:
- Performance
- Simple architectures

Drawbacks: recurrent models have long training times:
- LSTM implies a large number of parameters
- Recurrence implies sequential computations


Conclusion - G-CNN models

Structure:
- Feature extraction similar to the CNN+BLSTM
- Gating mechanism to filter information

Advantages:
- Convolution: a parallelizable operation with few parameters
- Reduced training time
- Deeper networks, bigger receptive fields

Drawbacks:
- Many hyperparameters, hard tuning
- Complex architecture
- Performance hardly competitive


Future work

Toward an even lighter network: give up densely connected layers to build a Fully Convolutional Network.

Exploring other alternatives:
- Attention models [Chowdhury2018]
- DenseNet [Huang2016]


References

[Pham2014] V. Pham et al. "Dropout Improves Recurrent Neural Networks for Handwriting Recognition". In: ICFHR (2014).

[Huang2016] G. Huang et al. "Densely Connected Convolutional Networks". 2016.

[Puigcerver2017] J. Puigcerver. "Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition?". In: ICDAR. 2017, pp. 67–72.

[Chowdhury2018] A. Chowdhury et al. "An Efficient End-to-End Neural Model for Handwritten Text Recognition". 2018.

[Yousef2018] M. Yousef et al. "Accurate, Data-Efficient, Unconstrained Text Recognition with Convolutional Neural Networks". 2018.

[Ptucha2018] R. Ptucha, F. Petroski Such et al. "Intelligent Character Recognition using Fully Convolutional Neural Networks". In: Pattern Recognition 88 (Dec. 2018).

[Soullard2019] Y. Soullard et al. "CTCModel: a Keras Model for Connectionist Temporal Classification". 2019.

[Ingle2019] R. Ingle et al. "A Scalable Handwritten Text Recognition System". 2019.
