Post on 06-Jun-2022


Page 1: Structure 2 Clustering Models - United States Naval Academy

Extracting Rich Event Structure from Text: Models and Evaluations

Clustering Models

Nate Chambers US Naval Academy

Page 2: Structure 2 Clustering Models - United States Naval Academy

A First Approach

Entity-focused

Clustering

Mutual Information

2

Page 3: Structure 2 Clustering Models - United States Naval Academy

The Protagonist

protagonist:

(noun)

1. the principal character in a drama or other literary work

2. a leading actor, character, or participant in a literary work or real event

3

Page 4: Structure 2 Clustering Models - United States Naval Academy

Inducing Narrative Relations

1. Dependency parse a document.

2. Run coreference to cluster entity mentions.

3. Count pairs of verbs with coreferring arguments.

4. Use pointwise mutual information to measure relatedness.

Chambers and Jurafsky. Unsupervised Learning of Narrative Event Chains. ACL-08

Narrative Coherence Assumption

Verbs sharing coreferring arguments are semantically connected by virtue of narrative discourse structure.
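The counting step (step 3) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes parsing and coreference have already been run, and the input format (a dict from a coreference cluster id to the `(verb, dependency)` slots its mentions fill) is hypothetical. The toy clusters are assigned by hand.

```python
from collections import Counter
from itertools import combinations

def count_coreferring_pairs(documents):
    """Count pairs of (verb, dependency) events that share a coreferring argument.

    `documents` is a hypothetical pre-processed format: each document maps an
    entity id (one coreference cluster) to the (verb, dependency) slots that
    the entity's mentions fill.
    """
    pair_counts = Counter()
    event_counts = Counter()
    for entities in documents:
        for slots in entities.values():
            unique = sorted(set(slots))  # count each event once per entity
            event_counts.update(unique)
            for a, b in combinations(unique, 2):
                pair_counts[(a, b)] += 1
    return pair_counts, event_counts

# Hand-assigned toy clusters for illustration only (not system output).
docs = [{
    "well": [("cap", "obj"), ("plug", "obj"), ("spew", "subj")],
    "engineers": [("work", "subj"), ("plug", "subj")],
}]
pair_counts, event_counts = count_coreferring_pairs(docs)
```

The resulting counts are the raw material for the pointwise mutual information in step 4.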

Page 5: Structure 2 Clustering Models - United States Naval Academy

Example Text

5

The oil stopped gushing from BP’s ruptured well in the Gulf of Mexico when it was capped on July 15 and engineers have since been working to permanently plug it. The damaged Macondo well has spewed about 4.9m barrels of oil into the gulf after an explosion on April 20 aboard the Deepwater Horizon rig which killed 11 people. BP said on Monday that its costs for stopping and cleaning up the spill had risen to $6.1bn.

Page 8: Structure 2 Clustering Models - United States Naval Academy

Example Text


The oil stopped

gushing from BP’s ruptured well

it capped

engineers working

engineers plug

plug it

The damaged Macondo well spewed

spewed 4.9m barrels of oil

spewed into the gulf

killed 11 people

BP said

risen to $6.1bn

Page 10: Structure 2 Clustering Models - United States Naval Academy

Example Text


The oil stopped gushing from BP’s ruptured well

it capped

engineers working

engineers plug

plug it

The damaged Macondo well spewed

spewed 4.9m barrels of oil

Page 11: Structure 2 Clustering Models - United States Naval Academy

11

Page 12: Structure 2 Clustering Models - United States Naval Academy

12

$$\mathrm{pmi}(x,y) = \log\frac{p(x,y)}{p(x)\,p(y)} \cdot \frac{\min(C(x),C(y))}{\min(C(x),C(y)) + 1}$$

Page 13: Structure 2 Clustering Models - United States Naval Academy

Chain Example

13

Page 14: Structure 2 Clustering Models - United States Naval Academy

Schema Example

Police, Agent, Authorities

Judge, Official

Prosecutor, Attorney

Plea, Guilty, Innocent

Suspect, Criminal, Terrorist, …

14

Page 15: Structure 2 Clustering Models - United States Naval Academy

Narrative Schemas

15

$$N = (E, C)$$

$$E = \{\text{arrest}, \text{charge}, \text{plead}, \text{convict}, \text{sentence}\}$$

$$C = \{C_1, C_2, C_3\}$$

Page 16: Structure 2 Clustering Models - United States Naval Academy

Add a Verb to a Schema

16

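$$\mathrm{narsim}(N, v) = \sum_{d \in D_v} \max\Big(\beta,\ \max_{c \in C} \mathrm{chainsim}(c, \langle v, d \rangle)\Big)$$

$$\mathrm{chainsim}(c, \langle v, d \rangle) = \max_{a \in \mathrm{Args}} \Big(\mathrm{score}(c, a) + \sum_{i=1}^{n} \mathrm{sim}(\langle e_i, d_i \rangle, \langle v, d \rangle, a)\Big)$$

$$\mathrm{sim}(\langle e, d' \rangle, \langle v, d \rangle, a) = \mathrm{pmi}(\langle e, d' \rangle, \langle v, d \rangle) + \lambda \log C(\langle e, d' \rangle, \langle v, d \rangle, a)$$

$$\max_{v \in V} \mathrm{narsim}(N, v)$$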
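A rough sketch of how a candidate verb could be scored against a schema, read off the formulas on this slide. The weights `lam` and `beta`, the lookup-table layout (`pmi`, `freq`, `chain_score`), and the choice to drop the log term when a count is zero are all assumptions for illustration; the toy numbers are invented.

```python
import math

def sim(e, v, a, pmi, freq, lam):
    """pmi(e, v) + lam * log C(e, v, a); the log term is dropped when C = 0."""
    count = freq.get((e, v, a), 0)
    bonus = lam * math.log(count) if count > 0 else 0.0
    return pmi.get((e, v), 0.0) + bonus

def chainsim(chain, v, args, chain_score, pmi, freq, lam):
    """Best argument a for the chain's slots plus the candidate slot v."""
    return max(
        chain_score.get((chain, a), 0.0)
        + sum(sim(e, v, a, pmi, freq, lam) for e in chain)
        for a in args
    )

def narsim(chains, verb, deps, args, chain_score, pmi, freq, lam, beta):
    """Sum over the verb's dependencies of max(beta, best chain similarity)."""
    total = 0.0
    for d in deps:
        best = max(chainsim(c, (verb, d), args, chain_score, pmi, freq, lam)
                   for c in chains)
        total += max(beta, best)
    return total

# Invented toy tables: one chain (arrest-obj, charge-obj), two candidate verbs.
chains = [(("arrest", "obj"), ("charge", "obj"))]
pmi = {(("arrest", "obj"), ("convict", "obj")): 2.0,
       (("charge", "obj"), ("convict", "obj")): 1.5,
       (("arrest", "obj"), ("eat", "obj")): -1.0}
freq = {(("arrest", "obj"), ("convict", "obj"), "suspect"): 10}

def score_verb(v):
    return narsim(chains, v, ["subj", "obj"], ["suspect"], {}, pmi, freq,
                  lam=0.1, beta=0.5)

best_verb = max(["convict", "eat"], key=score_verb)
```

The floor `beta` keeps a slot that matches no existing chain from dragging the total below a fixed value, which is what allows such a slot to start a new chain.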

Page 17: Structure 2 Clustering Models - United States Naval Academy

Learning Schemas

17

$$\mathrm{narsim}(N, v) = \sum_{d \in D_v} \max\Big(\beta,\ \max_{c \in C} \mathrm{chainsim}(c, \langle v, d \rangle)\Big)$$

Page 18: Structure 2 Clustering Models - United States Naval Academy

Argument Induction

• Induce semantic roles by scoring argument head words.

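$$\mathrm{score}(a) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (1 - \lambda)\,\mathrm{pmi}(e_i, e_j) + \lambda \log(\mathrm{freq}(e_i, e_j, a))$$

Example argument head words: criminal, suspect, man, student, immigrant, person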
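A small sketch of scoring one candidate head word over all event pairs in a chain, following the double sum on this slide. The weight `lam`, the table layout, and the treatment of zero counts (log term set to 0) are assumptions; the counts are invented.

```python
import math
from itertools import combinations

def argument_score(events, a, pmi, freq, lam):
    """Sum over event pairs i < j of (1 - lam) * pmi + lam * log(freq with head word a)."""
    total = 0.0
    for e_i, e_j in combinations(events, 2):
        count = freq.get((e_i, e_j, a), 0)
        log_term = math.log(count) if count > 0 else 0.0  # assumption: skip unseen
        total += (1 - lam) * pmi.get((e_i, e_j), 0.0) + lam * log_term
    return total

# Invented toy data for illustration.
events = ["arrest", "charge", "convict"]
pmi = {("arrest", "charge"): 1.0, ("arrest", "convict"): 1.0,
       ("charge", "convict"): 1.0}
freq = {("arrest", "charge", "suspect"): 8}
heads = ["suspect", "student"]
best_head = max(heads, key=lambda a: argument_score(events, a, pmi, freq, lam=0.5))
```

Ranking head words by this score gives an ordered list of likely fillers for the chain's shared role.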

Page 19: Structure 2 Clustering Models - United States Naval Academy

Learned Example: Viral

19

mosquito, aids, virus, tick, catastrophe, disease

virus, disease, bacteria, cancer, toxoplasma, strain

Page 20: Structure 2 Clustering Models - United States Naval Academy

Learned Example: Authorship

company, author, group, year, microsoft, magazine

book, report, novel, article, story, letter, magazine

Page 21: Structure 2 Clustering Models - United States Naval Academy

Database of Schemas

• 1813 base verbs

• 596 unique schemas

• Various sizes of schemas (6, 8, 10, 12)

• Temporal ordering data

– Available online: http://www.usna.edu/Users/cs/nchamber/data/schemas/acl09

Page 22: Structure 2 Clustering Models - United States Naval Academy

So What?

22

Page 23: Structure 2 Clustering Models - United States Naval Academy

Information Extraction (MUC)

23

2. INCIDENT: DATE 11 JAN 90
3. INCIDENT: LOCATION BOLIVIA: LA PAZ (CITY)
4. INCIDENT: TYPE BOMBING
5. INCIDENT: STAGE OF EXECUTION ATTEMPTED
6. INCIDENT: INSTRUMENT ID "BOMB"
7. INCIDENT: INSTRUMENT TYPE BOMB: "BOMB"
8. PERP: INCIDENT CATEGORY TERRORIST ACT
9. PERP: INDIVIDUAL ID -
10. PERP: ORGANIZATION ID "ZARATE WILLKA LIBERATION ARMED FORCES"
11. PERP: ORGANIZATION CONFIDENCE CLAIMED OR ADMITTED: "ZARATE WILLKA LIBERATION ARMED FORCES"
12. PHYS TGT: ID "GOVERNMENT HOUSE"
13. PHYS TGT: TYPE GOVERNMENT OFFICE OR RESIDENCE: "GOVERNMENT HOUSE"
14. PHYS TGT: NUMBER 1: "GOVERNMENT HOUSE"
15. PHYS TGT: FOREIGN NATION -
16. PHYS TGT: EFFECT OF INCIDENT SOME DAMAGE: "GOVERNMENT HOUSE"
17. PHYS TGT: TOTAL NUMBER -
18. HUM TGT: NAME -
19. HUM TGT: DESCRIPTION "CABINET MEMBERS" / "CABINET MINISTERS"
20. HUM TGT: TYPE GOVERNMENT OFFICIAL: "CABINET MEMBERS" / "CABINET MINISTERS"
21. HUM TGT: NUMBER PLURAL: "CABINET MEMBERS" / "CABINET MINISTERS"
22. HUM TGT: FOREIGN NATION -
23. HUM TGT: EFFECT OF INCIDENT -
24. HUM TGT: TOTAL NUMBER -

• Message Understanding Conference – 1992

• Semantic representation (messages) of a situation (incident) and its key attributes.

Page 24: Structure 2 Clustering Models - United States Naval Academy

MUC 4 Contains “Structure”

• Focus on the core attributes.

– Type of incident

– Main agent (perpetrator)

– Affected entities (targets, both physical and human)

BOMBING
DATE 11 JAN 90
LOCATION BOLIVIA: LA PAZ (CITY)
INSTRUMENT TYPE BOMB: "BOMB"
PERP "ZARATE WILLKA LIBERATION ARMED FORCES"
PHYS TGT "GOVERNMENT HOUSE"
PHYS TGT: EFFECT SOME DAMAGE
HUM TGT "CABINET MINISTERS"

Page 25: Structure 2 Clustering Models - United States Naval Academy

Labeled Template Schemas

1. Attack

2. Bombing

3. Kidnapping

4. Arson

25

Perpetrator Victim Target Instrument

• Assume these template schemas are unknown.

• Assume documents containing templates are unknown.

Page 27: Structure 2 Clustering Models - United States Naval Academy

Expanding the dataset

• MUC is a small dataset: 1300 documents

– The protagonist is too sparse.

• Instead, first cluster words based on proximity

27

kidnap, release, abduct, kidnapping, ransom, robbery

detonate, blow up, plant, hurl, stage, launch, detain, suspect, set off

Page 28: Structure 2 Clustering Models - United States Naval Academy

Information Retrieval

Retrieve new documents

28

kidnap, release, abduct, kidnapping, board

83 MUC documents

NYT Gigaword: 1.1 billion tokens

3954 documents

(Ji and Grishman, 2008)

Page 29: Structure 2 Clustering Models - United States Naval Academy

Cluster the Syntactic Slots?

• How do we learn the slots?

• Could just use PMI as before, but we still have low document counts.

Solution

- Represent events (e.g., subject/throw) as word vectors and cluster based on vector distance.

29

Page 30: Structure 2 Clustering Models - United States Naval Academy

Cluster the Syntactic Slots?

Novel Idea

Create a narrative vector of protagonist connections.

Each event is a vector of other events with which it shared a coreferring argument. Value is the frequency of coref.

Compare two events x and y:

$$\cos(\vec{x}_{nar}, \vec{y}_{nar})$$
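As a minimal sketch, a narrative vector can be stored as a sparse dict from other events to coreference counts, and two events compared by cosine. The `dep/verb` key format and all counts below are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical narrative vectors: how often each event shared a coreferring
# argument with the subject of "detonate" and with the subject of "blow up".
x_nar = {"subj/plant": 3, "subj/hurl": 1}
y_nar = {"subj/plant": 2, "subj/hurl": 2}
similarity = cosine(x_nar, y_nar)
```

Because the vectors are built from coreference links rather than raw co-occurrence, two events score high only when the same entities tend to participate in both.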

Page 31: Structure 2 Clustering Models - United States Naval Academy

Selectional Preferences

Borrowed Idea

Create a selectional preferences vector.

• Subject of Detonate

– Man, member, person, suspect, terrorist

• Object of Set off

– Dynamite, bomb, truck, explosive, device

31

cosine similarity

Katrin Erk, ACL 2007. Bergsma et al., ACL 2008. Zapirain et al., ACL 2009. Calvo et al., MICAI/CIARP 2009. Ritter et al., ACL 2010. Christian Scheible, LREC 2010. Walde, LREC 2010.

Page 32: Structure 2 Clustering Models - United States Naval Academy

Cluster the Syntactic Functions

• Agglomerative clustering, average link scoring

• Two types of cosine similarity

– Selectional preferences

– Narrative protagonist relations

32

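$$\mathrm{sim}(x, y) = \max\big(\cos(\vec{x}_{selpref}, \vec{y}_{selpref}),\ \cos(\vec{x}_{nar}, \vec{y}_{nar})\big)$$

$$\lambda \cos(\vec{x}_{selpref}, \vec{y}_{selpref}) + (1 - \lambda) \cos(\vec{x}_{nar}, \vec{y}_{nar})$$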
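The clustering step can be sketched in plain Python as below, using the max-of-cosines similarity and average-link merging. The stopping `threshold`, the slot names, and the toy vectors are assumptions for illustration; the slide does not state a stopping criterion.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def average_link(c1, c2, sim):
    """Mean pairwise similarity between the members of two clusters."""
    return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def agglomerate(items, sim, threshold):
    """Merge the most similar pair of clusters until none reaches threshold."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best_score, (i, j) = max(
            (average_link(clusters[i], clusters[j], sim), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best_score < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented toy vectors for three syntactic slots.
selpref = {"subj/detonate": {"terrorist": 3, "man": 1},
           "subj/blow_up": {"terrorist": 2, "suspect": 1},
           "obj/kidnap": {"villager": 4, "man": 1}}
nar = {"subj/detonate": {"subj/plant": 2},
       "subj/blow_up": {"subj/plant": 1},
       "obj/kidnap": {"obj/release": 3}}

def slot_sim(x, y):
    return max(cosine(selpref[x], selpref[y]), cosine(nar[x], nar[y]))

clusters = agglomerate(list(selpref), slot_sim, threshold=0.5)
```

With these toy vectors the two bombing subjects merge and the kidnapping object stays separate.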

Page 33: Structure 2 Clustering Models - United States Naval Academy

Constrain the Argument Classes

• Use WordNet (Fellbaum, 1998) to constrain types

– Person/Organization

– Physical Object

– Other

• The subject of detonate (Person)

– Man, member, person, suspect, terrorist

• The object of detonate (Physical Object)

– Dynamite, bomb, truck, explosive

• Only cluster events with the same type.

33

Page 34: Structure 2 Clustering Models - United States Naval Academy

Learned Templates

34

Learned Template

explode, detonate, blow up, explosion, damage, cause…

Person: X detonate, X blow up, X plant, X hurl, X stage

Phys Object: explode X, hurl X, X cause, X go off, plant X, …

Person: X raid, X question, X investigate, X defuse, X arrest

Phys Object: destroy X, damage X, explode at X, throw at X, …

Page 35: Structure 2 Clustering Models - United States Naval Academy

Learned Templates

35

Bombing

explode, detonate, blow up, explosion, damage, cause…

Perpetrator: Person who detonates, blows up, plants, hurls, stages, is detained, is suspected, is blamed on, launches

Target: Object that is damaged, is destroyed, is exploded at, is thrown at, is hit, is struck

Instrument: Object that is exploded, explodes, is hurled, causes, goes off, is planted, damages, is set off, is defused

Police: Person who raids, questions, investigates, defuses, arrests, …

Page 36: Structure 2 Clustering Models - United States Naval Academy

Learned Templates

Kidnapping

36

kidnap, release, abduct, kidnapping, board

Perpetrator: Person who releases, kidnaps, abducts, ambushes, holds, forces, captures, frees

Victim: Person who is kidnapped, is released, is freed, escapes, disappears, travels, is harmed

Page 37: Structure 2 Clustering Models - United States Naval Academy

New Templates

Elections

choose, favor, turns out, pledges, unites, blame, deny…

Voter: Person who chooses, is intimidated, favors, is appealed to, turns out

Government: Org. that authorizes, is chosen, blames, denies

Candidate: Person who resigns, unites, advocates, manipulates, pledges, is blamed

Smuggling

smuggle, transport, seize, confiscate, detain, capture…

Perpetrator: Person who smuggles, is seized from, is captured, is detained

Police: Person who raids, seizes, captures, confiscates, detains, investigates

Instrument: Object that is smuggled, is seized, is confiscated, is transported

Page 38: Structure 2 Clustering Models - United States Naval Academy

Evaluate Templates

38

1. Attack

2. Bombing

3. Kidnapping

4. Arson

Perpetrator Victim Target Instrument Police ??

Precision: 14 of 16 (88%)

Recall: 12 of 13 (92%)

Page 39: Structure 2 Clustering Models - United States Naval Academy

Induction Summary

1. Learned the original templates.

– Attack, Bombing, Kidnapping, Arson

2. Learned a new role (Police)

3. Learned new structures, not annotated by humans.

– Elections, Smuggling

• Now we can perform the standard extraction task.

39

Page 40: Structure 2 Clustering Models - United States Naval Academy

Extraction Approach

Kidnapping

40

kidnap, release, abduct, kidnapping, board

Perpetrator: X kidnap, X release, X abduct

Victim: kidnap X, release X, kidnapping of X, release of X, X’s release

They announced the initial release of the villagers last weekend.
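To make the matching concrete, here is a crude surface-pattern stand-in for the Victim patterns that apply to nominal triggers. It is only a sketch: the slide's patterns are syntactic and would be matched against a parse, whereas the regular expressions and the `the <word>` noun-phrase approximation below are assumptions for illustration.

```python
import re

NP = r"(the \w+)"  # crude stand-in for a noun phrase (assumption)

# Surface approximations of: kidnapping of X, release of X, X's release
VICTIM_PATTERNS = [
    re.compile(r"\bkidnapping of " + NP),
    re.compile(r"\brelease of " + NP),
    re.compile(NP + r"'s release\b"),
]

def extract_victims(sentence):
    """Return every span that fills a Victim pattern in the sentence."""
    text = sentence.lower()
    found = []
    for pattern in VICTIM_PATTERNS:
        found.extend(match.group(1) for match in pattern.finditer(text))
    return found

sentence = "They announced the initial release of the villagers last weekend."
victims = extract_victims(sentence)
```

On the slide's sentence, the "release of X" pattern fires and the villagers fill the Victim slot.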

Page 41: Structure 2 Clustering Models - United States Naval Academy

Evaluation

• Training: 1300 documents

– Learned template structure

– Developed extraction algorithm

• Testing: 200 documents

– Extracted slot fillers (perpetrator, target, etc.)

• Metric: F1-Score

– Standard metric; balances precision and recall

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
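For reference, the metric is a one-line computation. The precision and recall values below are made-up numbers for illustration, not results from the evaluation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

example = f1_score(precision=0.5, recall=0.25)  # illustrative values only
```

Because it is a harmonic mean, F1 sits closer to the lower of the two values, so a system cannot score well by maximizing only precision or only recall.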

Page 42: Structure 2 Clustering Models - United States Naval Academy

MUC Example

Two bomb attacks were carried out in La Paz last night, one in front of Government House following the message to the nation over a radio and television network by president Jaime Paz Zamora.

The explosions did not cause any serious damage but the police were mobilized, fearing a wave of attacks.

The self-styled `` Zarate willka Liberation Armed Forces '' sent simultaneous written messages to the media, calling on the people to oppose the government…

The second attack occurred at 2335 ( 0335 GMT on 12 January ), just after the cabinet members had left Government House where they had listened to the presidential message.

A bomb was placed outside Government House in the parking lot that is used by cabinet ministers .

The police placed the bomb in a nearby flower bed, where it went off.

The shock wave shattered some windows in Government House and street lamps in the Plaza Murillo.

As of 0500 GMT today, the police had received reports of two other explosions in two La Paz neighborhoods, but these have not yet been confirmed.

4. INCIDENT: TYPE BOMBING
7. INCIDENT: INSTRUMENT TYPE BOMB: "BOMB"
9. PERP: INDIVIDUAL ID -
10. PERP: ORGANIZATION ID "ZARATE WILLKA LIBERATION ARMED FORCES"
12. PHYS TGT: ID "GOVERNMENT HOUSE"
19. HUM TGT: DESCRIPTION "CABINET MEMBERS" / "CABINET MINISTERS"

Page 44: Structure 2 Clustering Models - United States Naval Academy

F1 Scores of Extraction Systems

• Rule-based systems (Chinchor et al. 93, Rau et al. 92)

• Supervised systems (Patwardhan/Riloff 2007,2009; Huang 2011)

• Event Schemas Unsupervised (2011)

[Figure: bar chart of slot-filling F1 (axis from 0 to 1.0) comparing rule-based, supervised, and unsupervised systems, with an annotation reading "Latest performance"]

Page 45: Structure 2 Clustering Models - United States Naval Academy

Summary

• Extracted without knowing what needed to be extracted.

• 0.40 F1, within range of more-informed approaches

• The first results on MUC-3 without schema knowledge

45

Page 46: Structure 2 Clustering Models - United States Naval Academy

Shortly after…

• More Progress!

– Jans et al. (EACL 2012)

– Balasubramanian et al. (NAACL 2013)

– Pichotta and Mooney (EACL 2014)

• Semantic Role Labeling

– Gerber and Chai (ACL 2010), Best Paper Award

• Coreference

– Irwin et al. (CoNLL 2011), Rahman and Ng (EMNLP 2012)

• Generative Models

– Cheung et al. (NAACL 2013)

– Bamman et al. (ACL 2013)

– Chambers (EMNLP 2013)

– Nguyen et al. (ACL 2015)

46

Page 47: Structure 2 Clustering Models - United States Naval Academy

Coming up

• Generative Models

– Cheung et al. (NAACL 2013)

– Bamman et al. (ACL 2013)

– Chambers (EMNLP 2013)

– Nguyen et al. (ACL 2015)

47