national university of sciences and technolgoy learning and optical character... · national...

23
NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition FAISAL SHAFAIT

Upload: others

Post on 07-Nov-2019

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Deep Learning and Optical Character Recognition

FAISAL SHAFAIT

Page 2: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Artificial Neural Networks (ANNs)

Goal: make computers “intelligent”

Idea: Model human brain

12/29/2016 Shafait: Deep Learning and OCR 2

Axon

Nucleus

Cell

Dendrite

Synapse

x1 ....

x2xn

hi

w1 wn

w2

ii xwha

Artificial Neural Network

Page 3: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

InputLayer

OutputLayer

Features

Output

HiddenLayers

xi1 ....

xi2xin

hi

ai

wi1win

wi2

....

....

255240255255

.. ..

.... ....

2.09.001.0

255255255255

239238247252

255232248255

255240255255

12/29/2016 Shafait: Deep Learning and OCR 3

Page 4: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

ANN Applications: Recognize Patterns

Image Analysis

• Detection (e.g., disease)

• Recognition (e.g., objects)

• Identification (e.g., persons)

Data Mining

• Classification

• Change and Deviation Detection

• Knowledge Discovery

Prognosis

• Ozone prognosis

• Weather Forecast

• Stock market prediction

Games, …

12/29/2016 Shafait: Deep Learning and OCR 4

Page 5: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

The Rise and Fall of ANNs

ANN widely used in 1990s

Suddenly went out of flavour in 2000s

Renaissance – Deep Learning

Popular deep architectures

• Neocognitron [Fukushima 1980]

• Recurrent Neural Networks [Hopfield 1982]

• Convolutional Neural Networks [LeCun 1989]

• Long Short-Term Memory Networks

[Schmidhuber 1997]

• Deep Belief Networks [Hinton 2006]

• Self-Taught Learning [Ng 2007]

12/29/2016 Shafait: Deep Learning and OCR 5

Input Layer

Output Layer

Features

Output

Hidden LayerInput Layer

Input LayerInput Layer

Hidden Layer n

Page 6: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Deep Learning Benchmarks

Highest accuracy on standard benchmarks

• The MNIST Handwritten Digits Benchmark

• The NORB Object Recognition Benchmark

The CIFAR Image Classification Benchmark

Winning Competitions

• ICDAR 2013 Arabic OCR Competition

• MICCAI 2013 Grand Challenge on Mitosis Detection

• IJCNN 2013 Traffic Sign Recognition Contest

• ICPR 2012 contest on Mitosis Detection in Histological Images

• ISBI 2012 Brain Image Segmentation Challenge

12/29/2016 Shafait: Deep Learning and OCR 6

Page 7: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

12/29/2016 Shafait: Deep Learning and OCR 7

Deep Learning with Long Short-Term Memory (LSTM) Networks

Page 8: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Recurrent Neural Networks (RNNs)

Proposed by Hopfield in 1982

Recurrent connections are added in order to keep

information of previous time stamps in the network

Novel equation for the activation:

Context information is used

How to train those networks …?

1t

hh

t

ii

t

h bwxwhb

Input Layer

Output Layer

Features

Output

Hidden Layer

12/29/2016 Shafait: Deep Learning and OCR 8

Page 9: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Training of RNNs –

Backpropagation Through Time

Unfold the network in time

• k timestamps (parameter)

• Perform Backpropagation for output at t

Repeat this for each

Input Layert-k

Output Layert

Featurest-k

Outputt

Hidden Layert-k Input Layert-k+1

Featurest-k+1

Hidden Layert-k+1 Input Layert

Featurest

Hidden Layert....

10 Tt

12/29/2016 Shafait: Deep Learning and OCR 9

Page 10: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Vanishing Gradient

12/29/2016 Shafait: Deep Learning and OCR 10

1 2 3 4 5 6 7

Inputs

Hidden

Layer

Outputs

time

Page 11: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Core Idea: New Memory Cell

Instead of Perceptron

12/29/2016 Shafait: Deep Learning and OCR 11

Page 12: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

No Vanishing Gradient

12/29/2016 Shafait: Deep Learning and OCR 12

1 2 3 4 5 6 7

Inputs

Hidden

Layer

Outputs

time

Page 13: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

12/29/2016

....

....

....Sigmoid or tanh

Multiplication

Full connection

Single connection

....

Input Layer

Forget Gate Layer

Hidden Layer

Output Gate Layer

Input Gate Layer

Output Layer

Shafait: Deep Learning and OCR 13

Page 14: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Bidirectional RNN

Trained with back-propagation through time (forward path through all time

stamps for each hidden layer sequentially)

Output Layert-1

Outputt-1

Input Layert-1

Featurest-1

Forward Layert-1

Backw. Layert-1

Hidden Layert-1

Output Layert

Outputt

Input Layert

Featurest

Forward Layert

Backw. Layert

Hidden Layert

Output Layert+1

Outputt+1

Input Layert+1

Featurest+1

Forward Layert+1

Backw. Layert+1

Hidden Layert+1

12/29/2016 Shafait: Deep Learning and OCR 14

Page 15: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Optical Character Recognition (OCR) with MD-BLSTM

Input: raw pixel data

Output machine-readable transcription

Importance of context

12/29/2016 Shafait: Deep Learning and OCR 15

Constitutional Irritation.

To

Capital “O” ?

Lower case “o” ?

Digit “0” ?

Mathematical circle symbol?

Page 16: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Scanning Neural Network Architecture

12/29/2016 Shafait: Deep Learning and OCR

16

Sheikh Faisal Rashid, Faisal Shafait, T Breuel. “Scanning Neural Network for Text Line Recognition”, 10th IAPR Workshop on

Document Analysis Systems, DAS’12. Gold Coast, Australia, Mar. 2012.

Best

Stu

dent

Paper

Aw

ard

Page 17: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Latin OCR with BLSTM

OCR System English Fontana

OCRopus 2.14 -

Tesseract 1.30 0.91

FineReader 0.85 1.23

BLSTM 0.59 0.15

12/29/2016 Shafait: Deep Learning and OCR 17

T Breuel, Adnan ul Hasan, M Al-Azawi, and Faisal Shafait. “High-Performance OCR for Printed English and

Fraktur Using LSTM Networks”, 12th Int. Conf. on Document Analysis and Recognition, ICDAR’13. Washington

DC, USA, Aug 2013.

Page 18: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Urdu OCR with BLSTM

Cursive script

No word spacing

Small inter-line gap

12/29/2016 Shafait: Deep Learning and OCR 18

Page 19: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Urdu OCR with MD-BLSTM

12/29/2016 Shafait: Deep Learning and OCR 19

Adnan ul Hasan, S. Ahmed, Sheikh Faisal Rashid, Faisal Shafait, T Breuel. “Offline Printed Urdu Nastaleeq Script Recognition with

Bidirectional LSTM Networks”, 12th Int. Conf. on Document Analysis and Recognition, ICDAR’13. Washington DC, USA, 2013.

Page 20: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Urdu OCR with MD-BLSTM

12/29/2016 Shafait: Deep Learning and OCR 20

Saeeda Naz, A. Umar, R. Ahmad, M. I. Razzak, Sheikh Faisal Rashid, Faisal Shafait, "Urdu Nastaliq Text Recognition using

Implicit Segmentation based on Multi-Dimensional Long Short Term Memory Neural Networks", SpringerPlus, 2016

Page 21: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Results of the ICDAR 2013 Arabic OCR Contest

Organized in four challenges

1. Font (B) in 12 pt size

2. Font (B) in multiple sizes

3. Font (I) in multiple sizes

4. All fonts in multiple sizes

Our system (jointly developed with

Siemens) won the TOP place in all

four challenges with a significant

margin

12/29/2016 Shafait: Deep Learning and OCR 21

Fouad Slimane, Slim Kanoun, Haikal El Abed, Adel M. Alimi, Rolf Ingold, Jean Hennebert: ICDAR2013 Competition on Multi-font and Multi-

size Digitally Represented Arabic Text. 12th International Conference on Document Analysis and Recognition, ICDAR 2013: 1433-1437

Page 22: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Conclusion

Deep learning architectures simulate human brain

During the years they became more powerful

• Better architectures and algorithms

• Faster hardware

Diverse application areas

Training deep architectures needs

• many CPU cores

• a lot of patience

Effective training remains an art [LeCun 2013]

12/29/2016 Shafait: Deep Learning and OCR 22

Page 23: NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Learning and Optical Character... · NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLGOY Deep Learning and Optical Character Recognition

SEECS, NUST

NATIONAL UNIVERSITY OFSCIENCES AND TECHNOLGOY

Questions / Comments ?

12/29/2016 Shafait: Deep Learning and OCR 23