Deep Learning for Character-based Information Extraction

Yanjun Qi (University of Virginia, USA), Sujatha Das G (Penn. State University, USA), Ronan Collobert (IDIAP), Jason Weston (Google Research NY)

ECIR 2014



Task: Target Applications

• Target task: automatically extract information about pre-specified types of events from a linear sequence of unit tokens
  – character-based
  – no word boundaries
  – no capitalization cues
• For example:
  – Chinese language NLP: Chinese-character based
  – Protein sequence tagging: amino-acid based

Task: Case Study (1): Natural Language Processing on Chinese sequences

• Word Segmentation (WS): the basic task; separate contiguous characters into words
• Part-of-Speech (POS) tagging: determine the part of speech of each word in the text
• Named Entity Recognition (NER): identify person, organization, and location names in the text

Task: Case Study (2): Protein Sequence → Structural Segments

• Input X: primary sequence, e.g.
  MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
• Output Y: e.g. secondary structure (SS): helices, strands, loops

Task: Context-Window based Per-Character Tagging

Character inputs (X): XNVLALDTSQRIRIGLRKGEDLFEISYTGEKKHAEILPV ...
Labels (Y):           LBBBBBBLHHHBBBBBBBHHBBBBBBBBHLHHHHHHHHH ...

Each character is tagged using a context window around the character of interest.
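The windowing step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `context_windows`, the half-width of 2, and the padding symbol `_` are all assumptions made for the example; the slides do not state the window size or how sequence boundaries are handled.

```python
def context_windows(sequence, half_width=2, pad="_"):
    """Return one fixed-size window per character, centred on that character.

    half_width and pad are illustrative choices, not values from the slides.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


# First ten characters and labels of the example sequence shown above.
windows = context_windows("XNVLALDTSQ", half_width=2)
labels = "LBBBBBBLHH"
for window, label in zip(windows, labels):
    print(window, "->", label)
```

Each window becomes one training example whose target is the label of its centre character, so a sequence of length n yields n examples.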

Method: Deep Learning to Rescue

Previous approaches: task-specific, hand-crafted features with a shallow learning structure.

Feature engineering:
• The most time-consuming part of the development cycle
• Often hand-crafted and task-dependent in practice

Now: a task-independent "deep" structure, with simple input features fed to a deep neural network (NN) architecture.

Feature learning:
• Easily adaptable to new, similar tasks
• Layer-wise feature representation learning

• Learn a feature representation for each character
• Learn a representation for each segment around the current position
• Learn a function that maps from the representation to the output class label

Training criterion: negative log-likelihood, minimized using stochastic gradient descent (SGD)
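A minimal sketch of these three stages and the training criterion, assuming a single tanh hidden layer and a softmax output. The layer sizes, learning rate, and the names `forward` and `sgd_step` are illustrative assumptions; the slides do not specify the actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed toy sizes: vocabulary, embedding dim, window length, hidden units, classes.
V, d, win, h, K = 5, 4, 3, 8, 2

params = {
    "E": rng.normal(0, 0.1, (V, d)),         # per-character representation
    "W1": rng.normal(0, 0.1, (win * d, h)),  # segment (window) representation
    "b1": np.zeros(h),
    "W2": rng.normal(0, 0.1, (h, K)),        # map representation -> class scores
    "b2": np.zeros(K),
}


def forward(p, window_ids):
    x = p["E"][window_ids].reshape(-1)        # concatenate window embeddings
    hid = np.tanh(x @ p["W1"] + p["b1"])
    scores = hid @ p["W2"] + p["b2"]
    scores = scores - scores.max()            # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    return x, hid, log_probs


def sgd_step(p, window_ids, label, lr=0.1):
    """One SGD update on the negative log-likelihood of a single example."""
    x, hid, log_probs = forward(p, window_ids)
    loss = -log_probs[label]
    d_scores = np.exp(log_probs)
    d_scores[label] -= 1.0                    # gradient of NLL w.r.t. scores
    d_hid = p["W2"] @ d_scores
    d_pre = d_hid * (1.0 - hid ** 2)          # back through tanh
    d_x = p["W1"] @ d_pre
    p["W2"] -= lr * np.outer(hid, d_scores)
    p["b2"] -= lr * d_scores
    p["W1"] -= lr * np.outer(x, d_pre)
    p["b1"] -= lr * d_pre
    # subtract.at accumulates correctly when a character repeats in the window
    np.subtract.at(p["E"], window_ids, lr * d_x.reshape(len(window_ids), -1))
    return loss


window_ids, label = [0, 3, 1], 1              # made-up training pair
losses = [sgd_step(params, window_ids, label) for _ in range(50)]
print(losses[0], losses[-1])
```

Repeating the update on one example only shows that the loss goes down; real training would iterate over shuffled (window, label) pairs from the whole training set.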

Method: Character to Vector Representations Learning

• The first layer in our deep structure
• Idea: characters are embedded in a vector space
• Embeddings are trained

How to train this embedding layer:
• (1) Supervised: trained as a normal NN layer, using SGD, based on the target task's training pairs
• (2) Initialized with unsupervised "language model" (lm) pre-training: captures similarities among characters based on their contexts in unlabeled sequences, e.g. Chinese Wiki or the Swiss-Prot protein sequence DB
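The two initialization options can be illustrated with a small lookup table. This is a sketch under assumptions: `build_embedding`, the embedding dimension, the random-initialization scale, and the `pretrained` dictionary are hypothetical stand-ins; in option (2) the pre-trained vectors would come from a language model trained on unlabeled sequences, which is not shown here.

```python
import numpy as np


def build_embedding(alphabet, dim, pretrained=None, seed=0):
    """Build a character -> row index map and an embedding table.

    Rows start random (option 1) and are overwritten by pre-trained
    vectors where available (option 2). Either way the table is then
    trained further with SGD like any other layer.
    """
    rng = np.random.default_rng(seed)
    index = {ch: i for i, ch in enumerate(alphabet)}
    table = rng.normal(0.0, 0.1, (len(alphabet), dim))
    if pretrained is not None:
        for ch, vec in pretrained.items():
            if ch in index:
                table[index[ch]] = vec
    return index, table


# 20 standard amino acids plus X for unknown residues.
amino_acids = "ACDEFGHIKLMNPQRSTVWYX"
# Placeholder "pre-trained" vector, for illustration only.
index, table = build_embedding(amino_acids, dim=5, pretrained={"A": np.ones(5)})

vectors = table[[index[ch] for ch in "MTYKL"]]  # look up a 5-character window
print(vectors.shape)
```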

Method: Modeling spatial dependency among characters

• A Viterbi algorithm captures spatial dependencies between the output tags y_i
  – i.e. optimize the whole sentence-level log-likelihood
  – i.e. encourage valid paths of output tags
  – Path score = output tag transition scores + deep network scores

e.g. y2 – y2 – y1 – y1 – y1 – y2 – y2 – y3 – y3 – y3 – y2 – y2
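Decoding with such a path score can be sketched as below. This covers only the decoding step, assuming the per-position network scores and the tag transition scores are already given; the function name `viterbi`, the toy score values, and the omission of start-tag scores are simplifications for illustration, and training with the sentence-level log-likelihood is not shown.

```python
import numpy as np


def viterbi(network_scores, transition_scores):
    """Find the highest-scoring tag path.

    network_scores: (T, K) array, score of tag k at position t.
    transition_scores: (K, K) array, [i, j] = score of moving from tag i to tag j.
    Path score = sum of transition scores + sum of network scores.
    """
    T, K = network_scores.shape
    best = network_scores[0].copy()            # best score of a path ending in each tag
    back = np.zeros((T, K), dtype=int)         # back-pointers
    for t in range(1, T):
        cand = best[:, None] + transition_scores   # rows: previous tag, cols: current tag
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + network_scores[t]
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(best.max())


# Toy scores: position 1 slightly prefers tag 1, but switching tags is penalized.
network = np.array([[2.0, 1.0], [1.0, 1.5], [2.0, 1.0]])
transitions = np.array([[0.0, -5.0], [-5.0, 0.0]])
path, score = viterbi(network, transitions)
print(path, score)  # [0, 0, 0] 5.0
```

Tagging each position independently would give [0, 1, 0] here; the transition penalties make the consistent path [0, 0, 0] score higher, which is the sense in which valid tag paths are encouraged.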

Experiments: Data Sets

Experiments: Performance Comparison

Legend: c1: character unigrams; c2: character bigrams; lm: embedding obtained with deep language model; vit: Viterbi algorithm

Why is our method preferable?

• No particular task-specific feature engineering
• Robust and flexible
• Easily adaptable to other character-based tagging tasks, e.g. Japanese NER

