
Extracting Rich Event Structure from Text: Models and Evaluations (Clustering Models)

Nate Chambers, US Naval Academy

A First Approach

• Entity-focused
• Clustering
• Mutual Information

The Protagonist

protagonist:

(noun)

1. the principal character in a drama or other literary work

2. a leading actor, character, or participant in a literary work or real event


Inducing Narrative Relations

1. Dependency parse a document.

2. Run coreference to cluster entity mentions.

3. Count pairs of verbs with coreferring arguments.

4. Use pointwise mutual information to measure relatedness.

Chambers and Jurafsky. Unsupervised Learning of Narrative Event Chains. ACL-08
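The counting step in this pipeline (steps 2 and 3) is simple once parsing and coreference are done. Below is a minimal sketch, assuming a hypothetical input format in which each document has already been reduced to (verb, dependency, entity) tuples; the function and field names are illustrative, not the original system's.

```python
from collections import Counter
from itertools import combinations

def count_coreferring_pairs(docs):
    """Count how often two (verb, dependency) events share a coreferring argument.

    Assumes each document is already parsed and coreference-resolved, and is
    given as a list of (verb, dependency, entity_id) tuples; for example,
    ("plug", "object", 3) means entity 3 fills the object slot of "plug".
    """
    event_counts = Counter()   # C(x): occurrences of each event
    pair_counts = Counter()    # C(x, y): events x and y shared a coreferring argument

    for doc in docs:
        by_entity = {}
        for verb, dep, entity in doc:
            event_counts[(verb, dep)] += 1
            by_entity.setdefault(entity, set()).add((verb, dep))
        # Every unordered pair of events sharing an entity is one co-occurrence.
        for events in by_entity.values():
            for x, y in combinations(sorted(events), 2):
                pair_counts[(x, y)] += 1
    return event_counts, pair_counts
```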

Narrative Coherence Assumption

Verbs sharing coreferring arguments are semantically connected by virtue of narrative discourse structure.

Example Text


The oil stopped gushing from BP’s ruptured well in the Gulf of Mexico when it was capped on July 15 and engineers have since been working to permanently plug it. The damaged Macondo well has spewed about 4.9m barrels of oil into the gulf after an explosion on April 20 aboard the Deepwater Horizon rig which killed 11 people. BP said on Monday that its costs for stopping and cleaning up the spill had risen to $6.1bn.


Events and their arguments extracted from the passage:

• The oil stopped
• gushing from BP’s ruptured well
• it capped
• engineers working
• engineers plug
• plug it
• The damaged Macondo well spewed
• spewed 4.9m barrels of oil
• spewed into the gulf
• killed 11 people
• BP said
• risen to $6.1bn


For example, these events are linked through coreferring arguments:

• The oil stopped gushing from BP’s ruptured well
• it capped
• engineers working
• engineers plug
• plug it
• The damaged Macondo well spewed
• spewed 4.9m barrels of oil

Relatedness is measured with pointwise mutual information, discounted so that low-frequency event pairs score lower:

$$\text{pmi}(x,y) = \log\frac{P(x,y)}{P(x)\,P(y)} \cdot \frac{\min(C(x),C(y))}{\min(C(x),C(y)) + 1}$$
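As a rough sketch, the discounted PMI above can be computed directly from the pair and event counts gathered earlier; the relative-frequency probability estimates here are an illustrative choice, not necessarily what the original system used.

```python
import math

def discounted_pmi(x, y, event_counts, pair_counts, total_pairs, total_events):
    """PMI between events x and y, scaled by the min-count discount above.
    Counts come from count_coreferring_pairs(); probabilities are simple
    relative frequencies."""
    key = (x, y) if (x, y) in pair_counts else (y, x)   # pair counts are unordered
    c_xy = pair_counts.get(key, 0)
    if c_xy == 0:
        return float("-inf")
    p_xy = c_xy / total_pairs
    p_x = event_counts[x] / total_events
    p_y = event_counts[y] / total_events
    m = min(event_counts[x], event_counts[y])
    return math.log(p_xy / (p_x * p_y)) * m / (m + 1)
```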

Chain Example (figure)

Schema Example

• Police, Agent, Authorities
• Judge, Official, Prosecutor, Attorney
• Plea, Guilty, Innocent
• Suspect, Criminal, Terrorist, …

Narrative Schemas

A narrative schema N = (E, C) pairs a set of events E with a set of argument chains C:

$$E = \{\text{arrest, charge, plead, convict, sentence}\}$$
$$C = \{C_1, C_2, C_3\}$$

Add a Verb to a Schema

To decide whether a verb v should be added to a schema N, score each of its dependency slots d against every chain c in the schema:

$$\text{narsim}(N,v) = \sum_{d \in D_v} \max\Big(\beta,\ \max_{c \in C} \text{chainsim}(c, \langle v,d \rangle)\Big)$$

$$\text{chainsim}(c, \langle v,d \rangle) = \max_{a \in \text{Args}} \Big(\text{score}(c,a) + \sum_{i=1}^{n} \text{sim}(\langle e_i,d_i \rangle, \langle v,d \rangle, a)\Big)$$

$$\text{sim}(\langle e,d \rangle, \langle v,d' \rangle, a) = \text{pmi}(\langle e,d \rangle, \langle v,d' \rangle) + \lambda \log C(\langle e,d \rangle, \langle v,d' \rangle, a)$$

The verb actually added is the one that maximizes this score over the vocabulary: $\max_{v \in V} \text{narsim}(N,v)$.
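A compact sketch of these scoring functions is below. The corpus statistics (pmi values and argument co-occurrence counts) are assumed to be precomputed and passed in as plain dictionaries, and the weights lam and beta are illustrative, not the published settings.

```python
import math

def make_narsim(pmi, arg_freq, lam=0.08, beta=0.2):
    """pmi[(e1, e2)] -> float; arg_freq[(e1, e2, a)] -> int.
    Events are (verb, dependency) pairs such as ("arrest", "object")."""

    def sim(e, v, a):
        # Event relatedness plus a bonus when head word `a` often fills
        # the argument the two events share.
        return pmi.get((e, v), 0.0) + lam * math.log(1 + arg_freq.get((e, v, a), 0))

    def score(chain, a):
        # Coherence of an existing chain under candidate argument word `a`.
        return sum((1 - lam) * pmi.get((chain[i], chain[j]), 0.0)
                   + lam * math.log(1 + arg_freq.get((chain[i], chain[j], a), 0))
                   for i in range(len(chain)) for j in range(i + 1, len(chain)))

    def chainsim(chain, event, candidates):
        # Best fit of `event` to `chain`, maximized over candidate arguments.
        return max(score(chain, a) + sum(sim(e, event, a) for e in chain)
                   for a in candidates)

    def narsim(chains, verb, slots, candidates):
        # Fit of a new verb to the whole schema, summed over its dependency slots.
        return sum(max([beta] + [chainsim(c, (verb, d), candidates) for c in chains])
                   for d in slots)

    return narsim
```

The returned narsim function can then be maximized over candidate verbs to pick the next event to add to the schema.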

Learning Schemas

Schemas are grown by repeatedly choosing the verb with the highest narsim(N, v) score, using the definition above.

Argument Induction

• Induce semantic roles by scoring argument head words.

$$\text{score}(C,a) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (1-\lambda)\,\text{pmi}(e_i,e_j) + \lambda \log\big(\text{freq}(e_i,e_j,a)\big)$$

Example induced argument class: criminal, suspect, man, student, immigrant, person
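In the same spirit, a small sketch of ranking candidate argument head words for a chain; the lookup-table format and the weight lam are assumptions carried over from the sketch above.

```python
import math

def induce_role(chain, candidates, pmi, arg_freq, lam=0.08, top_k=6):
    """Rank candidate head words for a chain's shared argument slot.
    `pmi` and `arg_freq` are the corpus-derived lookup tables used earlier."""
    def score(a):
        return sum((1 - lam) * pmi.get((chain[i], chain[j]), 0.0)
                   + lam * math.log(1 + arg_freq.get((chain[i], chain[j], a), 0))
                   for i in range(len(chain)) for j in range(i + 1, len(chain)))
    return sorted(candidates, key=score, reverse=True)[:top_k]

# e.g. induce_role([("arrest", "obj"), ("convict", "obj"), ("sentence", "obj")],
#                  ["criminal", "suspect", "man", "car"], pmi, arg_freq)
```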

Learned Example: Viral


mosquito, aids, virus, tick, catastrophe, disease

virus, disease, bacteria, cancer, toxoplasma, strain

Learned Example: Authorship

company, author, group, year, microsoft, magazine

book, report, novel, article, story, letter, magazine

Database of Schemas

• 1813 base verbs

• 596 unique schemas

• Various sizes of schemas (6, 8, 10, 12)

• Temporal ordering data

– Available online: http://www.usna.edu/Users/cs/nchamber/data/schemas/acl09

So What?


Information Extraction (MUC)


2. INCIDENT: DATE 11 JAN 90
3. INCIDENT: LOCATION BOLIVIA: LA PAZ (CITY)
4. INCIDENT: TYPE BOMBING
5. INCIDENT: STAGE OF EXECUTION ATTEMPTED
6. INCIDENT: INSTRUMENT ID "BOMB"
7. INCIDENT: INSTRUMENT TYPE BOMB: "BOMB"
8. PERP: INCIDENT CATEGORY TERRORIST ACT
9. PERP: INDIVIDUAL ID -
10. PERP: ORGANIZATION ID "ZARATE WILLKA LIBERATION ARMED FORCES"
11. PERP: ORGANIZATION CONFIDENCE CLAIMED OR ADMITTED: "ZARATE WILLKA LIBERATION ARMED FORCES"
12. PHYS TGT: ID "GOVERNMENT HOUSE"
13. PHYS TGT: TYPE GOVERNMENT OFFICE OR RESIDENCE: "GOVERNMENT HOUSE"
14. PHYS TGT: NUMBER 1: "GOVERNMENT HOUSE"
15. PHYS TGT: FOREIGN NATION -
16. PHYS TGT: EFFECT OF INCIDENT SOME DAMAGE: "GOVERNMENT HOUSE"
17. PHYS TGT: TOTAL NUMBER -
18. HUM TGT: NAME -
19. HUM TGT: DESCRIPTION "CABINET MEMBERS" / "CABINET MINISTERS"
20. HUM TGT: TYPE GOVERNMENT OFFICIAL: "CABINET MEMBERS" / "CABINET MINISTERS"
21. HUM TGT: NUMBER PLURAL: "CABINET MEMBERS" / "CABINET MINISTERS"
22. HUM TGT: FOREIGN NATION -
23. HUM TGT: EFFECT OF INCIDENT -
24. HUM TGT: TOTAL NUMBER -

• Message Understanding Conference – 1992

• Semantic representation (messages) of a situation (incident) and its key attributes.

MUC-4 Contains “Structure”

• Focus on the core attributes.
  – Type of incident
  – Main agent (perpetrator)
  – Affected entities (targets, both physical and human)

BOMBING
DATE: 11 JAN 90
LOCATION: BOLIVIA: LA PAZ (CITY)
INSTRUMENT TYPE: BOMB: "BOMB"
PERP: "ZARATE WILLKA LIBERATION ARMED FORCES"
PHYS TGT: "GOVERNMENT HOUSE"
PHYS TGT EFFECT: SOME DAMAGE
HUM TGT: "CABINET MINISTERS"

Labeled Template Schemas

1. Attack

2. Bombing

3. Kidnapping

4. Arson

Slots: Perpetrator, Victim, Target, Instrument

• Assume these template schemas are unknown.

• Assume documents containing templates are unknown.

Expanding the dataset

• MUC is a small dataset: 1300 documents

– The protagonist is too sparse.

• Instead, first cluster words based on proximity:

kidnap, release, abduct, kidnapping, ransom, robbery

detonate, blow up, plant, hurl, stage, launch, detain, suspect, set off

Information Retrieval

Retrieve new documents: starting from 83 MUC documents, use the cluster (kidnap, release, abduct, kidnapping, board) as a query against the NYT portion of Gigaword (1.1 billion tokens), retrieving 3,954 related documents (following Ji and Grishman, 2008).

Cluster the Syntactic Slots?

• How do we learn the slots?

• Could just use PMI as before, but we still have low document counts.

Solution: represent events (e.g., subject/throw) as word vectors and cluster based on vector distance.

Cluster the Syntactic Slots?

Novel Idea

Create a narrative vector of protagonist connections.

Each event is a vector of other events with which it shared a coreferring argument. Value is the frequency of coref.

Compare two events x and y:

$$\cos(\vec{x}_{\text{nar}}, \vec{y}_{\text{nar}})$$
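A minimal sketch of building such a narrative vector from the coreference pair counts gathered earlier and comparing two of them with cosine similarity; the dictionary-based sparse representation is an assumption for illustration.

```python
from collections import Counter
import math

def narrative_vector(event, pair_counts):
    """Vector of other events that shared a coreferring argument with `event`;
    values are how often the coreference link was observed."""
    vec = Counter()
    for (x, y), c in pair_counts.items():
        if x == event:
            vec[y] += c
        elif y == event:
            vec[x] += c
    return vec

def cosine(u, v):
    """Cosine between two sparse count vectors (dicts)."""
    dot = sum(c * v.get(k, 0) for k, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```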

Selectional Preferences

Borrowed Idea

Create a selectional preferences vector.

• Subject of Detonate

– Man, member, person, suspect, terrorist

• Object of Set off

– Dynamite, bomb, truck, explosive, device

These vectors are also compared with cosine similarity.

Katrin Erk, ACL 2007. Bergsma et al., ACL 2008. Zapirain et al., ACL 2009. Calvo et al., MICAI/CIARP 2009. Ritter et al., ACL 2010. Christian Scheible, LREC 2010. Walde, LREC 2010.

Cluster the Syntactic Functions

• Agglomerative clustering, average link scoring

• Two types of cosine similarity

– Selectional preferences

– Narrative protagonist relations

$$\text{sim}(x,y) = \max\big(\cos(\vec{x}_{\text{sp}}, \vec{y}_{\text{sp}}),\ \cos(\vec{x}_{\text{nar}}, \vec{y}_{\text{nar}})\big)$$

or, interpolating the two:

$$\text{sim}(x,y) = \lambda \cos(\vec{x}_{\text{sp}}, \vec{y}_{\text{sp}}) + (1-\lambda)\,\cos(\vec{x}_{\text{nar}}, \vec{y}_{\text{nar}})$$
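A sketch of the clustering step, assuming the selectional-preference and narrative vectors are available as dicts of sparse count vectors. SciPy's average-link hierarchical clustering stands in for whatever implementation was actually used, and the weights lam and cut are illustrative values.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def _cos(u, v):
    # Cosine between two sparse count vectors (dicts), as in the earlier sketch.
    dot = sum(c * v.get(k, 0) for k, c in u.items())
    nu = sum(c * c for c in u.values()) ** 0.5
    nv = sum(c * c for c in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_events(events, sp_vecs, nar_vecs, lam=0.5, cut=0.8):
    """Average-link agglomerative clustering of events under the interpolated
    similarity above.  sp_vecs / nar_vecs map each event to its
    selectional-preference and narrative (protagonist) count vectors."""
    n = len(events)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            s = lam * _cos(sp_vecs[events[i]], sp_vecs[events[j]]) + \
                (1 - lam) * _cos(nar_vecs[events[i]], nar_vecs[events[j]])
            dist[i, j] = dist[j, i] = 1.0 - s  # similarity -> distance
    labels = fcluster(linkage(squareform(dist), method="average"),
                      t=cut, criterion="distance")
    return {e: int(lab) for e, lab in zip(events, labels)}
```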

Constrain the Argument Classes

• Use WordNet (Fellbaum, 1998) to constrain types:
  – Person/Organization
  – Physical Object
  – Other

• The subject of detonate (Person): man, member, person, suspect, terrorist

• The object of detonate (Physical Object): dynamite, bomb, truck, explosive

• Only cluster events with the same type.

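One way to implement the WordNet constraint, sketched with NLTK's WordNet interface; the specific synsets chosen as type anchors and the label names are my assumptions, not necessarily the original mapping.

```python
from nltk.corpus import wordnet as wn

# Coarse WordNet anchors for the three types (illustrative synset choices).
_PERSON = wn.synset("person.n.01")
_ORG = wn.synset("organization.n.01")
_OBJECT = wn.synset("object.n.01")   # "physical object" in WordNet

def coarse_type(head_word):
    """Map an argument head word to Person/Organization, Physical Object, or Other."""
    for syn in wn.synsets(head_word, pos=wn.NOUN):
        ancestors = set(syn.closure(lambda s: s.hypernyms())) | {syn}
        if _PERSON in ancestors or _ORG in ancestors:
            return "PERSON_ORG"      # checked first: persons are also physical objects
        if _OBJECT in ancestors:
            return "PHYS_OBJECT"
    return "OTHER"
```

Only events whose argument words map to the same coarse type are allowed to merge during clustering.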

Learned Templates


Learned Template

explode, detonate, blow up, explosion, damage, cause…

Person: X detonate, X blow up, X plant, X hurl, X stage

Phys Object: explode X, hurl X, X cause, X go off, plant X, …

Person: X raid, X question, X investigate, X defuse, X arrest

Phys Object: destroy X, damage X, explode at X, throw at X, …

Learned Templates


Bombing

explode, detonate, blow up, explosion, damage, cause…

Perpetrator: Person who detonates, blows up, plants, hurls, stages, is detained, is suspected, is blamed on, launches

Target: Object that is damaged, is destroyed, is exploded at, is thrown at, is hit, is struck

Instrument: Object that is exploded, explodes, is hurled, causes, goes off, is planted, damages, is set off, is defused

Police: Person who raids, questions, investigates, defuses, arrests, …

Learned Templates

Kidnapping


kidnap, release, abduct, kidnapping, board

Perpetrator: Person who releases, kidnaps, abducts, ambushes, holds, forces, captures, frees

Victim: Person who is kidnapped, is released, is freed, escapes, disappears, travels, is harmed

New Templates

Elections

choose, favor, turns out, pledges, unites, blame, deny…

Voter: Person who chooses, is intimidated, favors, is appealed to, turns out

Government: Org. that authorizes, is chosen, blames, denies

Candidate: Person who resigns, unites, advocates, manipulates, pledges, is blamed

Smuggling

smuggle, transport, seize, confiscate, detain, capture…

Perpetrator: Person who smuggles, is seized from, is captured, is detained

Police: Person who raids, seizes, captures, confiscates, detains, investigates

Instrument: Object that is smuggled, is seized, is confiscated, is transported

Evaluate Templates


1. Attack

2. Bombing

3. Kidnapping

4. Arson

Slots: Perpetrator, Victim, Target, Instrument, Police (?)

Precision: 14 of 16 (88%)

Recall: 12 of 13 (92%)

Induction Summary

1. Learned the original templates.

– Attack, Bombing, Kidnapping, Arson

2. Learned a new role (Police)

3. Learned new structures, not annotated by humans.

– Elections, Smuggling

• Now we can perform the standard extraction task.


Extraction Approach

Kidnapping


kidnap, release, abduct, kidnapping, board

Perpetrator: X kidnap, X release, X abduct

Victim: kidnap X, release X, kidnapping of X, release of X, X’s release

They announced the initial release of the villagers last weekend.
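The matching step can be sketched with a dependency parser. Below, spaCy stands in for whatever parser was used, the pattern table is a toy subset of the learned kidnapping patterns, and the printed output depends on the parse (the model "en_core_web_sm" must be installed).

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Toy slot-filling patterns: which syntactic position of which trigger fills which slot.
PATTERNS = {
    ("kidnap", "nsubj"): "Perpetrator",
    ("release", "nsubj"): "Perpetrator",
    ("kidnap", "dobj"): "Victim",
    ("release", "dobj"): "Victim",
}

def extract(sentence):
    doc = nlp(sentence)
    fillers = []
    for tok in doc:
        # Verbal patterns: subject/object of a trigger verb.
        if (tok.head.lemma_, tok.dep_) in PATTERNS and tok.pos_ in ("NOUN", "PROPN", "PRON"):
            fillers.append((PATTERNS[(tok.head.lemma_, tok.dep_)], tok.text))
        # Nominal pattern: "release of X" -> Victim.
        if tok.dep_ == "pobj" and tok.head.lower_ == "of" \
                and tok.head.head.lemma_ == "release":
            fillers.append(("Victim", tok.text))
    return fillers

print(extract("They announced the initial release of the villagers last weekend."))
# expected (parse-dependent): [('Victim', 'villagers')]
```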

Evaluation

• Training: 1300 documents
  – Learned template structure
  – Developed extraction algorithm

• Testing: 200 documents
  – Extracted slot fillers (perpetrator, target, etc.)

• Metric: F1 score
  – Standard metric; balances precision and recall

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
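For reference, the same formula as a tiny function, with purely illustrative numbers:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(round(f1(0.5, 0.4), 3))  # illustrative: 0.444
```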

MUC Example

Two bomb attacks were carried out in La Paz last night, one in front of Government House following the message to the nation over a radio and television network by president Jaime Paz Zamora.

The explosions did not cause any serious damage but the police were mobilized, fearing a wave of attacks.

The self-styled "Zarate Willka Liberation Armed Forces" sent simultaneous written messages to the media, calling on the people to oppose the government…

The second attack occurred at 2335 (0335 GMT on 12 January), just after the cabinet members had left Government House where they had listened to the presidential message.

A bomb was placed outside Government House in the parking lot that is used by cabinet ministers.

The police placed the bomb in a nearby flower bed, where it went off.

The shock wave shattered some windows in Government House and street lamps in the Plaza Murillo.

As of 0500 GMT today, the police had received reports of two other explosions in two La Paz neighborhoods, but these have not yet been confirmed.

4. INCIDENT: TYPE BOMBING
7. INCIDENT: INSTRUMENT TYPE BOMB: "BOMB"
9. PERP: INDIVIDUAL ID -
10. PERP: ORGANIZATION ID "ZARATE WILLKA LIBERATION ARMED FORCES"
12. PHYS TGT: ID "GOVERNMENT HOUSE"
19. HUM TGT: DESCRIPTION "CABINET MEMBERS" / "CABINET MINISTERS"

F1 Scores of Extraction Systems

• Rule-based systems (Chinchor et al. 93, Rau et al. 92)

• Supervised systems (Patwardhan/Riloff 2007,2009; Huang 2011)

• Event Schemas Unsupervised (2011)

[Figure: Slot Filling F1 on a 0 to 1.0 scale, comparing rule-based, supervised, and unsupervised (event schema) systems]

Summary

• Extracted without knowing what needed to be extracted.

• 0.40 F1, within range of more-informed approaches

• The first results on MUC-3 without schema knowledge


Shortly after…

• More Progress!
  – Jans et al. (EACL 2012)
  – Balasubramanian et al. (NAACL 2013)
  – Pichotta and Mooney (EACL 2014)

• Semantic Role Labeling
  – Gerber and Chai (ACL 2010), Best Paper Award

• Coreference
  – Irwin et al. (CoNLL 2011), Rahman and Ng (EMNLP 2012)

• Generative Models
  – Cheung et al. (NAACL 2013)
  – Bamman et al. (ACL 2013)
  – Chambers (EMNLP 2013)
  – Nguyen et al. (ACL 2015)

Coming up

• Generative Models
  – Cheung et al. (NAACL 2013)
  – Bamman et al. (ACL 2013)
  – Chambers (EMNLP 2013)
  – Nguyen et al. (ACL 2015)