Detecting Text in Natural Image with Connectionist Text Proposal Network

Zhi Tian¹, Weilin Huang¹˒², Tong He¹, Pan He¹, and Yu Qiao¹˒³
¹ Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
² University of Oxford, UK
³ The Chinese University of Hong Kong, China



Insight

Motivation

● Current bottom-up approaches are complicated, with weak robustness and reliability, and accumulated errors.

● State-of-the-art object detectors are powerful, but not accurate for text localisation.

● Fill the gap between general object detection (e.g., RPN [1]) and text detection.

Connectionist Text Proposal Network (CTPN)

CTPN Architecture

CTPN Proposals

● Recurrently connect sequential proposals by BLSTM

● Jointly predict text scores, y-axis coordinates, and refinement offsets

● Detect text in sequences of fine-scale proposals
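The last bullet implies that neighbouring fine-scale proposals are connected into text lines. The sketch below shows one simple way this could be done; it is not taken from the poster. The function name `group_into_lines`, the 50-pixel horizontal gap, and the 0.7 vertical-overlap threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels

MAX_GAP = 50.0        # assumed: largest horizontal gap between linked proposals
MIN_V_OVERLAP = 0.7   # assumed: minimum vertical overlap between linked proposals


def _vertical_overlap(a: Box, b: Box) -> float:
    """Overlap of the two vertical extents, relative to the shorter box."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(inter, 0.0) / shorter if shorter > 0 else 0.0


def _linked(left: Box, right: Box) -> bool:
    """True if `right` can continue a text line that currently ends at `left`."""
    close_enough = right[0] - left[2] <= MAX_GAP
    return close_enough and _vertical_overlap(left, right) >= MIN_V_OVERLAP


def _bounding_box(boxes: List[Box]) -> Box:
    """Smallest box that encloses every proposal of one line."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def group_into_lines(proposals: List[Box]) -> List[Box]:
    """Greedily chain proposals from left to right into text-line boxes."""
    lines: List[List[Box]] = []
    for box in sorted(proposals, key=lambda b: b[0]):
        for line in lines:
            if _linked(line[-1], box):
                line.append(box)
                break
        else:
            # No existing line accepts this proposal, so it starts a new one.
            lines.append([box])
    return [_bounding_box(line) for line in lines]
```

Each proposal is attached to the first open line whose right-most member it links to, so proposals on separate rows or separated by a wide gap end up in separate text lines.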

Recurrent Connectionist Text Proposals

Top: CTPN without recurrent connection. Bottom: with recurrent connection

● RNN layer connects sequential proposals directly in convolutional layer

● In-network recurrent architecture is end-to-end trainable

● Detect highly ambiguous text, and reduce false detections considerably

Side-Refinement

Red Box: with side-refinement. Yellow Box: without side-refinement

● Predict offsets for side-proposals - horizontal sides rectification

● Further improve localisation accuracy

● Joint predictions - not a post-processing step
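The poster does not spell out how the y-axis coordinates and side offsets are parameterised. The sketch below assumes a Faster R-CNN-style encoding relative to an anchor (centre and height for the vertical axis, a normalised horizontal offset for the side); the function names and the default 16-pixel anchor width are illustrative, not the authors' code.

```python
import math
from typing import Tuple


def encode_vertical(cy: float, h: float,
                    anchor_cy: float, anchor_h: float) -> Tuple[float, float]:
    """Regression targets for a box's vertical centre and height (assumed form)."""
    v_c = (cy - anchor_cy) / anchor_h
    v_h = math.log(h / anchor_h)
    return v_c, v_h


def decode_vertical(v_c: float, v_h: float,
                    anchor_cy: float, anchor_h: float) -> Tuple[float, float]:
    """Invert encode_vertical: recover the centre y and height in pixels."""
    cy = v_c * anchor_h + anchor_cy
    h = math.exp(v_h) * anchor_h
    return cy, h


def decode_side(offset: float, anchor_cx: float, anchor_w: float = 16.0) -> float:
    """Refined x position of a text-line side from a normalised offset (assumed form)."""
    return offset * anchor_w + anchor_cx
```

Under this assumed encoding, a side offset of 0.25 on an anchor centred at x = 104 moves the text-line boundary to x = 108, i.e. by a quarter of the 16-pixel proposal width.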

Detecting Text in Fine-Scale Proposals

RPN Proposals / CTPN Proposals

Details:

● Slide a 3x3 window through Conv5

● Text anchors are used for each window

● Output a sequence of 16-pixel width proposals

Advantages:

● Improve localisation accuracy

● Generalise to multi scales, aspects, and languages

● Using single-scale image

● Encode rich context information
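The fine-scale proposals described above can be illustrated by generating fixed-width text anchors at every Conv5 location. This is a minimal sketch, not the authors' implementation: the 16-pixel stride is inferred from the 16-pixel proposal width, while the number of anchors per location (10) and their heights (starting at 11 px and divided by 0.7 each step) are assumptions not stated on the poster.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels

STRIDE = 16  # assumed Conv5 stride, equal to the fixed proposal width
# Assumed anchor heights: 10 values from 11 px up to roughly 273 px.
ANCHOR_HEIGHTS = [11 / (0.7 ** i) for i in range(10)]


def text_anchors(feat_h: int, feat_w: int) -> List[Box]:
    """Return fixed-width, variable-height anchors for a feat_h x feat_w map."""
    anchors: List[Box] = []
    for row in range(feat_h):
        for col in range(feat_w):
            x1 = col * STRIDE                 # left edge of this 16-px column
            cy = row * STRIDE + STRIDE / 2    # vertical centre of the window
            for h in ANCHOR_HEIGHTS:
                anchors.append((x1, cy - h / 2, x1 + STRIDE, cy + h / 2))
    return anchors
```

Because every anchor has the same width, the network only needs to predict the vertical position and height of each proposal, which is what makes the horizontal localisation fine-grained.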

Experimental Results

Reference:

[1] S. Ren, K. He, R. Girshick, and J. Sun: Faster R-CNN: Towards real-time object detection with region proposal networks, NIPS, 2015.

[2] K. Simonyan and A. Zisserman: Very deep convolutional networks for large-scale image recognition, ICLR, 2015.

[3] P. He, W. Huang, Y. Qiao, C. C. Loy, and X. Tang: Reading scene text in deep convolutional sequences, AAAI, 2016.

Zhi Tian, Weilin Huang, Tong He, Pan He, and Yu Qiao. European Conference on Computer Vision (ECCV), 2016.

Online demo: textdet.com

Red Box: CTPN detection. Yellow Box: ground truth

Summary:

● Trained on 3K images in English and Chinese, generalise well to others (e.g., Korean)

● Fine-scale strategy improves Precision, while using RNN increases Recall and Precision

● Obtain 0.88 and 0.61 F-measures on the ICDAR 2013 and 2015, respectively

● Computationally efficient, with 0.14s/image GPU time (scale=600)

● Strong capability for detecting very small-size text
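For readers unfamiliar with the metric, the F-measure reported above is conventionally the harmonic mean of Precision and Recall. The helper below is a small sketch of that standard formula; the precision and recall values in the usage comment are hypothetical and are not results from the poster.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values for illustration only (not taken from the poster):
example_score = f_measure(0.9, 0.8)
```

The harmonic mean penalises an imbalance between the two quantities, so a detector cannot score well by maximising only one of them.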
