
Deep Learning for Character-based Information Extraction

Yanjun Qi (University of Virginia, USA)
Sujatha Das G (Penn State University, USA)
Ronan Collobert (IDIAP)
Jason Weston (Google Research NY)

ECIR 2014

• Target task: automatically extract information about pre-specified types of events from a linear sequence of unit tokens
  – character-based
  – no word boundaries
  – no capitalization cues

• For example:
  – Chinese language NLP: Chinese-character based
  – Protein sequence tagging: amino-acid based


Task: Target Applications

§  Natural Language Processing on Chinese sequences


Task: Case Study (1):

Word Segmentation (WS): basic task, separate contiguous characters into words
Part-of-Speech (POS) tagging: determine the part of speech of each word in the text
Named Entity Recognition (NER): determine person, organization and location names in text

Task: Case Study (2): Protein Sequence → Structural Segments

§ Input X: Primary sequence, e.g.
  MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
§ Output Y: e.g. Secondary structure (SS) labels: helices, strands, loops

Example input/output pair (per-character labels):
  X: XNVLALDTSQRIRIGLRKGEDLFEISYTGEKKHAEILPV ...
  Y: LBBBBBBLHHHBBBBBBBHHBBBBBBBBHLHHHHHHHHH ...

Task: Context-Window based Per-Character Tagging


Character inputs: a context window around each character of interest
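As a small illustration (the function name, padding symbol, and window size are assumptions, not from the slides), a Python sketch of building such context windows:

    # Sketch: build a fixed-size context window around each character.
    # The PAD symbol and window size 5 are illustrative assumptions.
    def context_windows(chars, size=5, pad="PAD"):
        half = size // 2
        padded = [pad] * half + list(chars) + [pad] * half
        return [padded[i:i + size] for i in range(len(chars))]

    # Each character of "MTYKL" gets its own 5-character window:
    # e.g. the window for 'Y' is ['M', 'T', 'Y', 'K', 'L'].
    print(context_windows("MTYKL"))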

Feature Engineering
✓ Most time-consuming part of the development cycle
✓ Often hand-crafted and task-dependent in practice

Feature Learning
✓ Easily adaptable to new, similar tasks
✓ Layer-wise feature representation learning


Method: Deep Learning to the Rescue

Previous approaches: Use task-specific/hand-crafted features with a shallow learning structure

Now: task-independent “deep” structure using simple features as input to a deep neural network (NN) architecture

Learn a feature representation for each character
Learn a representation for each segment (window) around the current position
Learn a function to map from the representation to the output class label (see the sketch below)
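A minimal PyTorch sketch of these three layers, assuming illustrative hyper-parameters (embedding size 50, a window of 5 characters, 300 hidden units); it is a sketch of the idea, not the authors' exact implementation:

    import torch
    import torch.nn as nn

    class CharWindowTagger(nn.Module):
        # Illustrative sizes; the paper's actual hyper-parameters may differ.
        def __init__(self, n_chars, n_tags, emb_dim=50, window=5, hidden=300):
            super().__init__()
            self.embed = nn.Embedding(n_chars, emb_dim)        # (1) character -> vector
            self.hidden = nn.Linear(window * emb_dim, hidden)  # (2) window/segment representation
            self.out = nn.Linear(hidden, n_tags)               # (3) representation -> tag scores

        def forward(self, windows):             # windows: (batch, window) of character ids
            e = self.embed(windows)             # (batch, window, emb_dim)
            e = e.view(e.size(0), -1)           # concatenate the window into one vector
            h = torch.tanh(self.hidden(e))
            return self.out(h)                  # unnormalised tag scores for the centre character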


Training criterion: negative log-likelihood, optimized with stochastic gradient descent (SGD)
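A hedged sketch of that training criterion, reusing the CharWindowTagger sketch above; the vocabulary size, tag-set size, and learning rate are placeholders:

    import torch.nn.functional as F
    import torch.optim as optim

    model = CharWindowTagger(n_chars=5000, n_tags=4)   # placeholder sizes
    opt = optim.SGD(model.parameters(), lr=0.01)       # stochastic gradient descent

    def train_step(windows, tags):
        # windows: (batch, window) char ids; tags: (batch,) gold labels for the centre characters
        opt.zero_grad()
        scores = model(windows)
        loss = F.cross_entropy(scores, tags)   # negative log-likelihood of the gold tags
        loss.backward()
        opt.step()
        return loss.item()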

Method: Character to Vector Representations Learning
✓ The first layer in our deep structure
✓ Idea: characters are embedded in a vector space
✓ Embeddings are trained

How to train this embedding layer:
✓ (1) Supervised: trained as a normal NN layer, using SGD, on the target task's training pairs
✓ (2) Initialized with unsupervised “language model” (lm) pre-training: captures similarities among characters based on their contexts in unlabeled sequences, e.g. Chinese Wikipedia, the Swiss-Prot protein sequence DB (see the sketch below)
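For option (2), a sketch of seeding the embedding layer before supervised fine-tuning; it assumes the pre-trained character vectors are available as a tensor named `pretrained` and reuses `model` from the sketch above:

    import torch
    import torch.nn as nn

    # Assumption: `pretrained` holds (n_chars, emb_dim) character vectors learned by the
    # unsupervised "lm" pre-training on unlabeled sequences (here a random placeholder).
    pretrained = torch.randn(5000, 50)

    # Seed the first layer with the pre-trained vectors, then keep fine-tuning them
    # together with the rest of the network during supervised training.
    model.embed = nn.Embedding.from_pretrained(pretrained, freeze=False)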

Method: Modeling spatial dependency among characters
• A Viterbi algorithm to capture spatial dependencies between the output tags y_i
• i.e. optimize the whole sentence-level log-likelihood
• i.e. encourage valid paths of output tags
• Path score = output tag transition scores + deep network scores

e.g. a tag path: y2 – y2 – y1 – y1 – y1 – y2 – y2 – y3 – y3 – y3 – y2 – y2
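As an illustration of combining transition and network scores (not the authors' code), a minimal NumPy sketch of Viterbi decoding; the score matrices in the toy example are random placeholders for the deep-network scores and the learned tag-transition scores:

    import numpy as np

    def viterbi(net_scores, trans):
        # net_scores: (T, n_tags) per-character scores from the deep network
        # trans: (n_tags, n_tags) transition scores, trans[i, j] = score of tag i -> tag j
        T, n = net_scores.shape
        delta = net_scores[0].copy()                 # best score of a path ending in each tag
        back = np.zeros((T, n), dtype=int)
        for t in range(1, T):
            cand = delta[:, None] + trans + net_scores[t][None, :]   # (prev_tag, cur_tag)
            back[t] = cand.argmax(axis=0)
            delta = cand.max(axis=0)
        path = [int(delta.argmax())]                 # backtrace the best tag path
        for t in range(T - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return path[::-1]

    # Toy example: 3 tags over 12 characters (scores are random placeholders).
    rng = np.random.default_rng(0)
    print(viterbi(rng.normal(size=(12, 3)), rng.normal(size=(3, 3))))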


Experiments: Data Sets

Experiments: Performance Comparison

Notation: c1 = character unigrams, c2 = character bigrams, lm = embedding obtained with deep language-model pre-training, vit = Viterbi algorithm

Summary: Why is our method preferable?

§ No particular task-specific feature engineering
§ Robust and flexible
§ Easily adaptable to other character-based tagging tasks, e.g. Japanese NER



