learning and generating from structured datashalei120.github.io/docs/slideshow.pdf · the...

56
Learning and Generating from Structured Data Lei Sha Institute of Computational Linguistics, Peking University [email protected] May 28, 2018 Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 1 / 53

Upload: others

Post on 15-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Learning and Generating from Structured Data

Lei Sha

Institute of Computational Linguistics, Peking [email protected]

May 28, 2018

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 1 / 53

Page 2: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table of Contents

1 Learning Structured Data from Raw Text

2 Generating Text from Structured Data

3 Other Works

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 2 / 53

Page 3: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Introduction

Structured data can be used in two symmetrical procedures:

Text −→ Structured data

Structured data −→ Text

On September 11th, five hijackers crashed American Airlines

Flight 11 into the World Trade Center’s North Tower.

On September 11th, five hijackers crashed American Airlines

Flight 11 into the World Trade Center’s North Tower.

Learning Structured Data from Raw Text

Generating Text from Structured Data

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 3 / 53

Page 4: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table of Contents

1 Learning Structured Data from Raw Text

2 Generating Text from Structured Data

3 Other Works

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 4 / 53

Page 5: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Learning Structured Data from Raw Text

Structured data usually serve for applications like question answering,dialogues, and information retrieval

The tasks of learning structured data includes:

Relation extraction

Event extraction

We focused on event extraction in our research.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 5 / 53

Page 6: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction

Motivation:

Event extraction is important for knowledge acquisition from largeamounts of news text.

The result of event extraction can be used to construct knowledgebase, which can be applied to question answering, dialogue system,etc.

Its paradigm is ubiquitous in our daily life:

Knowledge GraphStructured summary of search engineWikipedia infobox

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 6 / 53

Page 7: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Applications of Event Extraction

The Google search result of September 11 attacks:

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 7 / 53

Page 8: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Applications of Event Extraction

The Wikipedia infobox of September 11 attacks:

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 8 / 53

Page 9: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Applications of Event Extraction

The extracted events can be transferred into triples and store in theknowledge graphs.The knowledge graphs can be leveraged by upper applications.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 9 / 53

Page 10: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction

What’s an event?

Event Type: Business

Trigger Release

ArgumentCompany MicrosoftProduct Surface Pro

Place USA

Figure: Microsoft releases surface Pro in USA.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 10 / 53

Page 11: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction

What’s an event?

Event Type: Attack

Trigger Crash

Argument

Attacker Five hijackers

TargetWorld Trade Center’s

North TowerInstrument American Airlines Flight 11

Time September 11th

Figure: On September 11th, five hijackers crashed American Airlines Flight 11into the World Trade Center’s North Tower.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 11 / 53

Page 12: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

What should we do?

Extract trigger

Identify arguments

Classify roles

Event Type: Die

Trigger Die

ArgumentVictim cameramanPlace Baghdad

Instrument American tank

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 12 / 53

Page 13: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Challenges of event extraction:

Patterns are accurate (precision > 96%), but cannot cover every case

Pattern example

(weapon) tore [through] (building) at (place) ⇒ Attack{Roles· · · }

Some arguments tend to occur together: Cameraman & Americantank (in Die event)

Some arguments tend not to occur together: Baghdad & PalestineHotel (in Die event) (Hint: dependency distance is too long)

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 13 / 53

Page 14: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Conventional pattern-based and classifier-based event extraction method

... In Baghdad, a cameraman died when ...

n, v, adj: trigger candidate

trigger = died find best pattern①

get arguments & roles

yes

MaxEnt for argument② MaxEnt for role③

MaxEnt for reportable event④

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 14 / 53

Page 15: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Sha et al.: RBPB: Regularization-Based Pattern Balancing Method forEvent Extraction. (ACL 2016)

... In Baghdad, a cameraman died when ...

n, v, adj: trigger candidate

trigger = died

Classifier for Event type①

Classifier for argument② Classifier for role③

Regularization

... In Baghdad, a cameraman died when ...

n, v, adj: trigger candidate

trigger = died find best pattern①

get arguments & roles

yes

Classifier for argument② Classifier for role③

Classifier for reportable event④ Classifier for reportable event④

pattern balancing

argument-argument relationship

We transfer pattern information into embedded features

We use regularization method to capture argument-argumentrelationships

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 15 / 53

Page 16: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

How to use pattern better?

Turn pattern into event type probability distribution

pattern + other features → trigger identification & classification

Event type Trigger Pattern

Attack shoot pattern 1Injure shoot pattern 2

Die shoot pattern 3Injure shoot pattern 4Injure shoot pattern 5

Die shoot pattern 6Die shoot pattern 7

Attack shoot pattern 8Attack shoot pattern 9Attack shoot pattern 10

AttackInjure Die

Table: Candidate pattern result for trigger “shoot”

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 16 / 53

Page 17: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Argument-argument relationship regularization

Arguments tend to occur together: Positive relationship

Arguments tend not to occur together: Negative relationship

We need to capture these two relationships:

Use a bunch of position features and dependency features to decidethe probability to be “Pos”: P(rel = “Pos”|argi , argj)The training set of arg-arg relationship classifier is generated from theoriginal dataset

We obtain the arg-arg relationship matrix M

Mi ,j = P(rel = “Pos”|argi , argj)

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 17 / 53

Page 18: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

We use a score function to evaluate the current configuration of argument:x

x(i) represents the role taken by i-th candidate argument

xbin = Bin(x)

xbin(i) represents whether i-th candidate argument is an argument forthe current trigger

Score(x) = x>binMxbin + farg(xbin) + frole(x)

farg: function for argument identificationfrole: function for role classification

We use Beam Search to find the best configuration (largest score)

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 18 / 53

Page 19: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Example of the quadratic item:

Assume that “Baghdad” and “Palestine hotel” have negativerelationship

In this example, the closer the value is to 1, the more likely it ispositive; the closer the value is to 0, the more likely it is negative

In Baghdad, a cameraman died when an American tank fired on the Palestine hotel.

1.0 0.5 0.9 0.20.5 1.0 0.5 0.50.9 0.5 1.0 0.60.2 0.5 0.6 1.0

1 0 0 110 0 1

= 0.2 + …

Figure: Example calculation process of negative relation.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 19 / 53

Page 20: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Another example of the quadratic item:

Assume that “Baghdad” and “American tank” have positiverelationship

Again, the closer the value is to 1, the more likely it is positive; thecloser the value is to 0, the more likely it is negative

In Baghdad, a cameraman died when an American tank fired on the Palestine hotel.

1.0 0.5 0.9 0.20.5 1.0 0.5 0.50.9 0.5 1.0 0.60.2 0.5 0.6 1.0

1 0 1 010 10

= 0.9 + …

Figure: Example calculation process of positive relation.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 20 / 53

Page 21: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Argument-argument relationship regularization

Pos relation vs. Neg relationBilinear in score: Score(x) = x>binMxbin + farg(xbin) + frole(x)

We found that... Oops! That doesn’t work.

Strengthen the two relationships

Map float numbers of Mij to discrete integers: [0, 1] −→ −1, 0, 1

Powerful bombA waiting shed

Davao airportBus

Powerful bomb

A waiting shed

Davao airport

Bus

Powerful bombA waiting shed

Davao airportBus

Powerful bomb

A waiting shed

Davao airport

Bus

Figure: Visualization of M before and after strengthen.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 21 / 53

Page 22: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Argument-argument relationship regularization

Pos relation vs. Neg relationBilinear in score: Score(x) = x>binMxbin + farg(xbin) + frole(x)

We found that...

Oops! That doesn’t work.

Strengthen the two relationships

Map float numbers of Mij to discrete integers: [0, 1] −→ −1, 0, 1

Powerful bombA waiting shed

Davao airportBus

Powerful bomb

A waiting shed

Davao airport

Bus

Powerful bombA waiting shed

Davao airportBus

Powerful bomb

A waiting shed

Davao airport

Bus

Figure: Visualization of M before and after strengthen.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 21 / 53

Page 23: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Argument-argument relationship regularization

Pos relation vs. Neg relationBilinear in score: Score(x) = x>binMxbin + farg(xbin) + frole(x)

We found that... Oops! That doesn’t work.

Strengthen the two relationships

Map float numbers of Mij to discrete integers: [0, 1] −→ −1, 0, 1

Powerful bombA waiting shed

Davao airportBus

Powerful bomb

A waiting shed

Davao airport

Bus

Powerful bombA waiting shed

Davao airportBus

Powerful bomb

A waiting shed

Davao airport

Bus

Figure: Visualization of M before and after strengthen.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 21 / 53

Page 24: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Argument-argument relationship regularization

Pos relation vs. Neg relationBilinear in score: Score(x) = x>binMxbin + farg(xbin) + frole(x)

We found that... Oops! That doesn’t work.

Strengthen the two relationships

Map float numbers of Mij to discrete integers: [0, 1] −→ −1, 0, 1

Powerful bombA waiting shed

Davao airportBus

Powerful bomb

A waiting shed

Davao airport

Bus

Powerful bombA waiting shed

Davao airportBus

Powerful bomb

A waiting shed

Davao airport

Bus

Figure: Visualization of M before and after strengthen.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 21 / 53

Page 25: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Example of the quadratic item (strengthened):

Assume that “Baghdad” and “Palestine hotel” have negativerelationship

In this example, 1 represents positive relationship, −1 representsnegative relationship, 0 means unclear relationship.

In Baghdad, a cameraman died when an American tank fired on the Palestine hotel.

1 0 1 -1 0 1 0 0 1 0 1 0-1 0 0 1

1 0 0 110 0 1

= -1 + …

Figure: Example calculation process of negative relation.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 22 / 53

Page 26: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Another example of the quadratic item (strengthened):

Assume that “Baghdad” and “American tank” have positiverelationship

Again, 1 represents positive relationship, −1 represents negativerelationship, 0 means unclear relationship.

In Baghdad, a cameraman died when an American tank fired on the Palestine hotel.

1 0 1 -1 0 1 0 0 1 0 1 0-1 0 0 1

1 0 1 010 10

= 1 + …

Figure: Example calculation process of positive relation.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 23 / 53

Page 27: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Can we do better?Challenges of event extraction by the previous solutions√

Using syntax information as feature

× Using syntax information as architecture√

Capture two kinds of argument-argument relationship (Pos & Neg)

× Capture large amount of argument-argument relationship

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 24 / 53

Page 28: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Sha et al.(AAAI 2018) Jointly Extracting Event Triggers and Argumentsby Dependency-Bridge RNN and Tensor-Based Argument Interaction

Motivation 1:

Dependency relation −→ Dependency bridge

According to definition of dependency relation, dependency edgesusually contain some information about temporal, consequence,conditional or purpose.

Figure: Example of dependency parse tree.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 25 / 53

Page 29: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

We add dependency bridges to conventional LSTM-RNN architecture.Bidirectionality:

Forward: Set all dependency bridges as forward.Backward: Set all dependency bridges as backward.

A cameraman died when tank fired on the hotel

LSTM layer

Figure: Dependency bridge on LSTM. Apart from the last LSTM cell, each cellalso receives information from former syntactically related cells.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 26 / 53

Page 30: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Details of dependency bridge

We add a new gate dt and change the calculation of hidden state.

ht = ot � tanh(ct) + dt �(

1|Sin|

∑(i ,p)∈Sin aphi

)

𝑋#

𝜎 𝜎 𝑡𝑎𝑛ℎ 𝜎

× +

×𝑡𝑎𝑛ℎ

×

ℎ#

𝑓# 𝑖# 𝐶#~ 𝑜#

𝐶#01

ℎ#01

𝐶#

ℎ#

𝑋#

𝜎 𝜎 𝑡𝑎𝑛ℎ 𝜎

× +

×𝑡𝑎𝑛ℎ

×

ℎ#

𝑓# 𝑖# 𝐶#~ 𝑜#

𝐶#01

ℎ#01

𝐶#

ℎ#

𝜎

×

𝑊34567

𝑊345894

𝑊:;<=>

ℎ?ℎ>ℎ@

𝑑#

+

Figure: The calculation detail of dependency bridge.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 27 / 53

Page 31: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Motivation 2:

We represent each arg-arg relationship by a vector

We use a tensor to represent all kinds of arg-arg relationships in asentence

Tensor layerD

NN

Representation of Candidate Arguments

Figure: The calculation detail of tensor layer.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 28 / 53

Page 32: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

The whole architecture ...

Tensor layer is applied to the hidden layer of the dependency bridgeRNN

Then we apply max-pooling over arguments to find the mostimportant “interactive features” for the arguments

Tensor layer

...

Acameraman

diedwhentankfired

onthe

hotel

Candidate trigger

Candidate Arguments

DBLSTM

-RNN

DBRNN layer Pooling layer Output layer

DNN

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 29 / 53

Page 33: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Weights of each dependency relation

nm

od:a

ssm

od cc

nm

od:t

mod

ccom

pnsu

bjp

ass

advm

od:c

om

pco

nj

am

od

mark

:clf

com

pound:v

caux:a

sp etc

aux:p

rtm

od

neg

dis

cours

eadvcl

:loc

auxpass

aux:m

odal

nsu

bj:xsu

bj

para

:prn

mod

nm

od:t

opic

xco

mp

nsu

bj

num

mod

advm

od

punct

nm

od:r

ange

nm

od:p

rep

case

cop

am

od:o

dm

od

nam

edep

appos

det

nm

od

dobj

mark

advm

od:loc

com

pound:n

nadvm

od:d

vp

root

aux:b

aacl

0.0

0.5

1.0

1.5

2.0The w

eig

hts

of

each

dependency

rela

tions

Figure: The visualization of trained weights of each dependency relations.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 30 / 53

Page 34: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Event Extraction from News Text

Figure: Performances of various approaches on ACE 2005 dataset.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 31 / 53

Page 35: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table of Contents

1 Learning Structured Data from Raw Text

2 Generating Text from Structured Data

3 Other Works

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 32 / 53

Page 36: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

A table can be a list of RBF tuples:

John E Blaha birthDate 1942,08,26

John E Blaha birthPlace San Antonio

John E Blaha occupation Fighter pilot

San Antonio located in USA

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 33 / 53

Page 37: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

A table can be also a list of attributes (like Wiki infobox):

Figure: An example of Wikipedia infobox.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 34 / 53

Page 38: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

Generate brief summary from structured data is useful

In the last step of QA system, Table-to-text is used to generateanswer.

User

Question Analysis

Search from KB

Answer Generation

Answer

Question

Table

Text

Figure: Table-to-text in question answering system.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 35 / 53

Page 39: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

Table-to-text can also be used to generate response in dialogue system

Slot extraction Intenttracking

Statetracking

SearchKB

Slot extraction Intenttracking

Statetracking

SearchKB

Table to TextGenerator

Table to TextGenerator

Figure: Table-to-text in dialogue system.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 36 / 53

Page 40: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

We generate brief summary for wikipedia infobox

Sir Arthur Ignatius Conan Doyle(22 May 1859 – 7 July 1930) wasa British writer best known forhis detective fiction featuring thecharacter Sherlock Holmes.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 37 / 53

Page 41: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

Motivation:

Traditional: language model based generator

Use probability of word-by-word: P(wt |wt−1)Different from human’s generation process

Human: first plan for order, then write

Use probability of field-by-field: P(ft |ft−1)

We propose to add human nature into machine learning models

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 38 / 53

Page 42: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

Sha et al. (AAAI 2018) Order-Planning Neural Text Generation FromStructured Data (arxiv)

Content-based attention

Use the last output word yt−1 to predict the importance of each tablecontent for the next output.

Link-based attention

See which field we are going to generate this time.

Hybrid attention

Combine content-based and link-based attention together.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 39 / 53

Page 43: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

How to build field-by-field probability (P(ft |ft−1))?

The element in the i-th row and j-th column is the probability of fieldj occurs after field i

Figure: Field-by-field probability matrix (Link matrix).

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 40 / 53

Page 44: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

How to build link sub-matrix?

NameBorn Occupation

Nationality

Name

Born

Occupation

Nationality Link matrix

Link sub-matrix

Figure: The process of select link sub-matrix.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 41 / 53

Page 45: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

We calculate the hybrid attention as follows:

(a) Encoder: Table Representation(b) Dispatcher: Planning What to Generate Next

Name Arthur

Name Ignatius

Name Conan

Name Doyle

Born 22

Born May

Born 1859

Occupation writer

Occupation physician

Nationality British

LST

M=

Last step'sattention

Link (sub)matrix

Content-basedattention

Link-basedattention

Hybrid attention

Weightedsum

Attentionvector

Field Content

Figure: Illustration of content-based attention and link-based attention.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 42 / 53

Page 46: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

Then we generate text according to the hybrid attention:

Table content

LSTM LSTM LSTM

<start>

. . . LSTM

. . .

<eos>

LSTM

LastLSTMstate

Attentionvector

Embedding of thegenerated wordin last step

Figure: The decoder in our model, which is incorporated with a copyingmechanism.Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 43 / 53

Page 47: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

Overall performance of our model:

Figure: Comparison of the overall performance between our model and previousmethods. lBest results in Lebret, Grangier, and Auli (2016).

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 44 / 53

Page 48: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

Simple case study:

Figure: Case study. Left: Wikipedia infobox. Right: A reference and twogenerated sentences by different attention (both with the copy mechanism).

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 45 / 53

Page 49: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table-to-Text Brief Summary Generation

Visualization of attention probabilities in our model.

x-axis: generated words “...) was an american economist ...”;y-axis: 〈field : content word〉 pairs in the table.

)w

as anam

eric

anec

onom

ist

death place:uniteddeath place:states

nationality:americanoccupation:governor

occupation:ofoccupation:the

occupation:federaloccupation:reserveoccupation:system

occupation:,occupation:economics

occupation:professorknown for:expert

(a) αcontent

)w

as anam

eric

anec

onom

ist

(b) αlink

)w

as anam

eric

anec

onom

ist

(c) αhybrid

Figure: Subplot (b) exhibits strips because, by definition, link-based attention willyield the same score for all content words with the same field.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 46 / 53

Page 50: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Table of Contents

1 Learning Structured Data from Raw Text

2 Generating Text from Structured Data

3 Other Works

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 47 / 53

Page 51: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Other works

Research papers:

Event schema induction (Sha et al, NAACL 2016)

Chinese SRL (Sha et al, EMNLP 2016)

Textual Entailment Recognition (Sha et al, EMNLP 2015, Coling2016)

Repeated Reading (Sha et al, NLPCC 2017)

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 48 / 53

Page 52: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Other works

Automatically Design Neural Network Architecture (in SinovationVentures)

Implement a simple version of paper “Neural Architecture SearchWith Reinforcement Learning”Use policy gradient method to increase the probability of samplingchild networks with high reward

Figure: The process of a parent controller recurrent neural network sampling achild network.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 49 / 53

Page 53: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Other works

Multi-intent switch dialogue system (in MSRA system group)

Figure: The architecture of ISwitch.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 50 / 53

Page 54: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Publications (lead author)

Sha et al: Order-Planning Neural Text Generation From Structured Data. InAAAI 2018.

Sha et al: Joint Extracting Event Trigger and Arguments Using dependency bridgeRecurrent Tensor Network. In AAAI 2018.

Sha et al: Multi-View Fusion Neural Network for Answer Selection. In AAAI 2018.

Sha et al: Will Repeated Reading Benefit Natural Language Understanding? InNLPCC 2017.

Sha et al: Reading and Thinking: Re-read LSTM Unit for Textual EntailmentRecognition. In Coling 2016.

Sha et al: Capturing Argument Connection for Chinese Semantic Role Labeling. InEMNLP 2016.

Sha et al: RBPB: Regularization-Based Pattern Balancing Method for EventExtraction. In ACL 2016.

Sha et al: Joint Learning Templates and Slots for Event Schema Induction. InNAACL 2016.

Sha et al: Recognizing Textual Entailment Using Probabilistic Inference. InEMNLP 2015.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 51 / 53

Page 55: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Publications (co-author)

Feng Qian, Lei Sha, Baobao Chang, Lu-chen Liu, Ming Zhang: Syntax AwareLSTM Model for Chinese Semantic Role Labeling. In EMNLP 2017

Qiaolin Xia, Lei Sha, Baobao Chang, Zhifang Sui: A Progressive LearningApproach to Chinese SRL Using Heterogeneous Data. In ACL 2017

Xiaodong Zhang, Lei Sha, Sujian Li, Houfeng Wang: Attentive Interactive NeuralNetworks for Answer Selection in Community Question Answering. In AAAI 2017

Tingsong Jiang, Tianyu Liu, Tao Ge, Lei Sha, Sujian Li, Baobao Chang andZhifang Sui: Encoding Temporal Information for Time-Aware Link Prediction. InEMNLP 2016.

Tingsong Jiang, Tianyu Liu, Tao Ge, Lei Sha, Sujian Li, Baobao Chang andZhifang Sui: Towards Time-Aware Knowledge Graph Completion. In Coling 2016.

Li Li, Houfeng Wang, Lei Sha, Xu Sun, Baobao Chang, Shi Zhao: Multi-labelText Categorization with Joint Learning Predictions-as-Features Method. InEMNLP 2015.

Tingsong Jiang, Lei Sha and Zhifang Sui: Event Schema Induction Based onRelational Co-occurrence over Multiple Documents. In NLPCC 2014.

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 52 / 53

Page 56: Learning and Generating from Structured Datashalei120.github.io/docs/slideshow.pdf · the probability to be \Pos": P(rel = \Pos"jarg i;arg j) The training set of arg-arg relationship

Thank you. Any questions?

Lei Sha (ICL, Peking University) Learning and Generating from Structured Data May 28, 2018 53 / 53