structure 2 clustering models - united states naval academy
Extracting Rich Event Structure from Text: Models and Evaluations
Clustering Models
Nate Chambers US Naval Academy
A First Approach
Entity-focused
Clustering
Mutual Information
The Protagonist
protagonist:
(noun)
1. the principal character in a drama or other literary work
2. a leading actor, character, or participant in a literary work or real event
Inducing Narrative Relations
1. Dependency parse a document.
2. Run coreference to cluster entity mentions.
3. Count pairs of verbs with coreferring arguments.
4. Use pointwise mutual information to measure relatedness.
Chambers and Jurafsky. Unsupervised Learning of Narrative Event Chains. ACL-08
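Step 3 above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes parsing and coreference have already been run, and the `chains` data and the `count_event_pairs` helper are hypothetical names invented for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical pre-processed input: one list per coreference chain, holding
# the (verb, dependency) slots in which mentions of that entity appeared.
chains = [
    [("arrest", "obj"), ("charge", "obj"), ("convict", "obj")],  # a suspect
    [("arrest", "subj"), ("charge", "subj")],                    # the police
    [("arrest", "obj"), ("convict", "obj")],                     # another suspect
]

def count_event_pairs(chains):
    """Count unordered pairs of events that share a coreferring argument."""
    pair_counts = Counter()
    event_counts = Counter()
    for chain in chains:
        events = sorted(set(chain))  # ignore repeated slots within one chain
        event_counts.update(events)
        for a, b in combinations(events, 2):
            pair_counts[(a, b)] += 1
    return pair_counts, event_counts

pair_counts, event_counts = count_event_pairs(chains)
```

The resulting counts are the raw material for the PMI step (step 4).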
Narrative Coherence Assumption
Verbs sharing coreferring arguments are semantically connected
by virtue of narrative discourse structure.
Example Text
The oil stopped gushing from BP’s ruptured well in the Gulf of Mexico when it was capped on July 15 and engineers have since been working to permanently plug it. The damaged Macondo well has spewed about 4.9m barrels of oil into the gulf after an explosion on April 20 aboard the Deepwater Horizon rig which killed 11 people. BP said on Monday that its costs for stopping and cleaning up the spill had risen to $6.1bn.
The oil stopped
gushing from BP’s ruptured well
it capped
engineers working
engineers plug
plug it
The damaged Macondo well spewed
spewed 4.9m barrels of oil
spewed into the gulf
killed 11 people
BP said
risen to $6.1bn
The oil stopped gushing from BP’s ruptured well
it capped
engineers working
engineers plug
plug it
The damaged Macondo well spewed
spewed 4.9m barrels of oil
$$\mathrm{pmi}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)} \cdot \frac{\min(C(x), C(y))}{\min(C(x), C(y)) + 1}$$
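The discounted PMI score above might be computed along these lines. This is only a sketch: the function name `discounted_pmi` and the way probabilities are estimated from raw counts (dividing by hypothetical totals) are assumptions, not details given on the slide.

```python
import math

def discounted_pmi(c_xy, c_x, c_y, total_pairs, total_events):
    """PMI of events x and y, scaled by min(C(x), C(y)) / (min(C(x), C(y)) + 1).

    c_xy: times x and y shared a coreferring argument
    c_x, c_y: individual event counts
    total_pairs, total_events: normalizers (assumed estimation scheme)
    """
    if c_xy == 0:
        return float("-inf")  # never observed together
    p_xy = c_xy / total_pairs
    p_x = c_x / total_events
    p_y = c_y / total_events
    pmi = math.log(p_xy / (p_x * p_y))
    smaller = min(c_x, c_y)
    return pmi * smaller / (smaller + 1)

example = discounted_pmi(c_xy=2, c_x=4, c_y=4, total_pairs=10, total_events=20)
```

The discount factor approaches 1 for frequent events and shrinks the score for rare ones, which is the usual motivation for this kind of correction.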
Chain Example
Schema Example
Police, Agent, Authorities
Judge, Official Prosecutor, Attorney
Plea, Guilty, Innocent Suspect, Criminal, Terrorist, …
Narrative Schemas
N = (E, C)
E = {arrest, charge, plead, convict, sentence}
C = {C1, C2, C3}
Add a Verb to a Schema
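$$\mathrm{narsim}(N, v) = \sum_{d \in D_v} \max\Big(\beta,\ \max_{c \in C} \mathrm{chainsim}(c, \langle v, d \rangle)\Big)$$
$$\mathrm{chainsim}(c, \langle v, d \rangle) = \max_{a \in \mathrm{Args}} \Big(\mathrm{score}(c, a) + \sum_{i=1}^{n} \mathrm{sim}(\langle e_i, d_i \rangle, \langle v, d \rangle, a)\Big)$$
$$\mathrm{sim}(\langle e, d \rangle, \langle v, d' \rangle, a) = \mathrm{pmi}(\langle e, d \rangle, \langle v, d' \rangle) + \lambda \log C(\langle e, d \rangle, \langle v, d' \rangle, a)$$
$$\max_{v \in V} \mathrm{narsim}(N, v)$$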
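A toy sketch of scoring a candidate verb against a schema, following the narsim / chainsim / sim definitions above. All numbers, the `PMI` and `ARG_COUNTS` tables, and the values of `LAMBDA` and `BETA` are hypothetical. Two further assumptions are made here: the chain score is taken to be the sum of sim over event pairs within the chain, and the log term is dropped when a count is zero.

```python
import math
from itertools import combinations

LAMBDA = 0.08  # hypothetical weight on the argument-count term
BETA = 0.2     # hypothetical floor inside narsim

# Hypothetical toy statistics; an event is a (verb, dependency) pair.
PMI = {
    frozenset({("arrest", "obj"), ("charge", "obj")}): 2.0,
    frozenset({("arrest", "obj"), ("convict", "obj")}): 1.8,
    frozenset({("charge", "obj"), ("convict", "obj")}): 1.6,
    frozenset({("arrest", "subj"), ("charge", "subj")}): 1.5,
}
ARG_COUNTS = {
    (frozenset({("arrest", "obj"), ("convict", "obj")}), "suspect"): 10,
}

def sim(e1, e2, arg):
    """pmi(e1, e2) + lambda * log C(e1, e2, arg)."""
    key = frozenset({e1, e2})
    count = ARG_COUNTS.get((key, arg), 0)
    bonus = LAMBDA * math.log(count) if count > 0 else 0.0  # assumption
    return PMI.get(key, 0.0) + bonus

def score(chain, arg):
    """Assumed chain score: sum of sim over all event pairs in the chain."""
    return sum(sim(a, b, arg) for a, b in combinations(chain, 2))

def chainsim(chain, slot, args):
    """Best argument's chain score plus the new slot's similarity to the chain."""
    return max(score(chain, a) + sum(sim(e, slot, a) for e in chain) for a in args)

def narsim(chains, verb, deps, args):
    """Sum over the verb's dependencies of the best chain fit, floored at BETA."""
    return sum(
        max(BETA, max(chainsim(c, (verb, d), args) for c in chains))
        for d in deps
    )

chains = [
    [("arrest", "obj"), ("charge", "obj")],
    [("arrest", "subj"), ("charge", "subj")],
]
value = narsim(chains, "convict", ["subj", "obj"], ["suspect", "police"])
```

In a full system, the verb with the highest narsim over the vocabulary would be the next one added to the schema.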
Learning Schemas
$$\mathrm{narsim}(N, v) = \sum_{d \in D_v} \max\Big(\beta,\ \max_{c \in C} \mathrm{chainsim}(c, \langle v, d \rangle)\Big)$$
Argument Induction
• Induce semantic roles by scoring argument head words.
$$\mathrm{score}(a) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Big[(1 - \lambda)\,\mathrm{pmi}(e_i, e_j) + \lambda \log\big(\mathrm{freq}(e_i, e_j, a)\big)\Big]$$
criminal, suspect, man, student, immigrant, person
Learned Example: Viral
mosquito, aids, virus, tick, catastrophe, disease
virus, disease, bacteria, cancer, toxoplasma, strain
Learned Example: Authorship
company, author, group, year, microsoft, magazine
book, report, novel, article, story, letter, magazine
Database of Schemas
• 1813 base verbs
• 596 unique schemas
• Various sizes of schemas (6, 8, 10, 12)
• Temporal ordering data
– Available online: http://www.usna.edu/Users/cs/nchamber/data/schemas/acl09
So What?
Information Extraction (MUC)
2. INCIDENT: DATE 11 JAN 90
3. INCIDENT: LOCATION BOLIVIA: LA PAZ (CITY)
4. INCIDENT: TYPE BOMBING
5. INCIDENT: STAGE OF EXECUTION ATTEMPTED
6. INCIDENT: INSTRUMENT ID "BOMB"
7. INCIDENT: INSTRUMENT TYPE BOMB: "BOMB"
8. PERP: INCIDENT CATEGORY TERRORIST ACT
9. PERP: INDIVIDUAL ID -
10. PERP: ORGANIZATION ID "ZARATE WILLKA LIBERATION ARMED FORCES"
11. PERP: ORGANIZATION CONFIDENCE CLAIMED OR ADMITTED: "ZARATE WILLKA LIBERATION ARMED FORCES"
12. PHYS TGT: ID "GOVERNMENT HOUSE"
13. PHYS TGT: TYPE GOVERNMENT OFFICE OR RESIDENCE: "GOVERNMENT HOUSE"
14. PHYS TGT: NUMBER 1: "GOVERNMENT HOUSE"
15. PHYS TGT: FOREIGN NATION -
16. PHYS TGT: EFFECT OF INCIDENT SOME DAMAGE: "GOVERNMENT HOUSE"
17. PHYS TGT: TOTAL NUMBER -
18. HUM TGT: NAME -
19. HUM TGT: DESCRIPTION "CABINET MEMBERS" / "CABINET MINISTERS"
20. HUM TGT: TYPE GOVERNMENT OFFICIAL: "CABINET MEMBERS" / "CABINET MINISTERS"
21. HUM TGT: NUMBER PLURAL: "CABINET MEMBERS" / "CABINET MINISTERS"
22. HUM TGT: FOREIGN NATION -
23. HUM TGT: EFFECT OF INCIDENT -
24. HUM TGT: TOTAL NUMBER -
• Message Understanding Conference – 1992
• Semantic representation (messages) of a situation (incident) and its key attributes.
MUC 4 Contains “Structure”
• Focus on the core attributes.
– Type of incident
– Main agent (perpetrator)
– Affected entities (targets, both physical and human)
BOMBING
DATE 11 JAN 90
LOCATION BOLIVIA: LA PAZ (CITY)
INSTRUMENT TYPE BOMB: "BOMB"
PERP "ZARATE WILLKA LIBERATION ARMED FORCES"
PHYS TGT "GOVERNMENT HOUSE"
PHYS TGT: EFFECT SOME DAMAGE
HUM TGT "CABINET MINISTERS"
Labeled Template Schemas
1. Attack
2. Bombing
3. Kidnapping
4. Arson
Perpetrator Victim Target Instrument
• Assume these template schemas are unknown.
• Assume documents containing templates are unknown.
Expanding the dataset
• MUC is a small dataset: 1300 documents
– The protagonist is too sparse.
• Instead, first cluster words based on proximity
kidnap, release, abduct, kidnapping, ransom, robbery
detonate, blow up, plant, hurl, stage, launch, detain, suspect, set off
Information Retrieval
Retrieve new documents
kidnap, release, abduct, kidnapping, board
MUC: 83 documents
NYT Gigaword (1.1 billion tokens): 3954 documents
(Ji and Grishman, 2008)
Cluster the Syntactic Slots?
• How do we learn the slots?
• Could just use PMI as before, but we still have low document counts.
Solution
- Represent events (e.g., subject/throw) as word vectors and cluster based on vector distance.
Novel Idea
Create a narrative vector of protagonist connections.
Each event is a vector of other events with which it shared a coreferring argument. Value is the frequency of coref.
Compare two events x and y:
$$\cos(x_{\mathrm{nar}}, y_{\mathrm{nar}})$$
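The narrative-vector comparison can be sketched with sparse count vectors. The vectors below are hypothetical toy values, and the `"verb:slot"` string keys are just a naming convention chosen for this example.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical narrative vectors: for each event, how often it shared a
# coreferring argument with each other event.
x_nar = Counter({"plant:obj": 3, "explode:subj": 2})                   # detonate:obj
y_nar = Counter({"plant:obj": 1, "explode:subj": 4, "defuse:obj": 1})  # set_off:obj

similarity = cosine(x_nar, y_nar)
```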
Selectional Preferences
Borrowed Idea
Create a selectional preferences vector.
• Subject of Detonate
– Man, member, person, suspect, terrorist
• Object of Set off
– Dynamite, bomb, truck, explosive, device
cosine similarity
Katrin Erk, ACL 2007. Bergsma et al., ACL 2008. Zapirain et al., ACL 2009. Calvo et al., MICAI/CIARP 2009. Ritter et al., ACL 2010. Christian Scheible, LREC 2010. Walde, LREC 2010.
Cluster the Syntactic Functions
• Agglomerative clustering, average link scoring
• Two types of cosine similarity
– Selectional preferences
– Narrative protagonist relations
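$$\mathrm{sim}(x, y) = \max\big(\cos(x_{\mathrm{selpref}}, y_{\mathrm{selpref}}),\ \cos(x_{\mathrm{nar}}, y_{\mathrm{nar}})\big)$$
$$\lambda \cos(x_{\mathrm{selpref}}, y_{\mathrm{selpref}}) + (1 - \lambda) \cos(x_{\mathrm{nar}}, y_{\mathrm{nar}})$$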
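A small sketch of average-link agglomerative clustering over syntactic slots, using the max-of-two-cosines similarity. The event vectors are hypothetical, and the stopping `threshold` is an assumption; the slides do not state how clustering is stopped.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical vectors for four syntactic slots.
EVENTS = {
    "detonate:obj": {"selpref": {"bomb": 3, "dynamite": 1}, "nar": {"plant:obj": 2}},
    "set_off:obj": {"selpref": {"bomb": 2, "device": 1},
                    "nar": {"plant:obj": 1, "explode:subj": 1}},
    "kidnap:obj": {"selpref": {"villager": 2, "mayor": 1}, "nar": {"release:obj": 3}},
    "release:obj": {"selpref": {"villager": 1, "hostage": 2}, "nar": {"kidnap:obj": 3}},
}

def sim(x, y):
    """Max of selectional-preference cosine and narrative cosine."""
    return max(cosine(EVENTS[x]["selpref"], EVENTS[y]["selpref"]),
               cosine(EVENTS[x]["nar"], EVENTS[y]["nar"]))

def average_link_clustering(items, sim, threshold):
    """Repeatedly merge the two clusters with the highest average pairwise sim."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best, best_pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(sim(a, b) for a in clusters[i] for b in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if avg > best:
                    best, best_pair = avg, (i, j)
        if best_pair is None:  # nothing above the (assumed) threshold
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

clusters = average_link_clustering(list(EVENTS), sim, threshold=0.3)
```

With these toy vectors the two bombing slots merge first and the two kidnapping slots merge second. The cross-cluster similarity is zero, so clustering stops there.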
Constrain the Argument Classes
• Use WordNet (Fellbaum, 1998) to constrain types
– Person/Organization
– Physical Object
– Other
• The subject of detonate (Person)
– Man, member, person, suspect, terrorist
• The object of detonate (Physical Object)
– Dynamite, bomb, truck, explosive
• Only cluster events with the same type.
Learned Templates
Learned Template
explode, detonate, blow up, explosion, damage, cause…
Person: X detonate, X blow up, X plant, X hurl, X stage
Phys Object: explode X, hurl X, X cause, X go off, plant X, …
Person: X raid, X question, X investigate, X defuse, X arrest
Phys Object: destroy X, damage X, explode at X, throw at X, …
Learned Templates
Bombing
explode, detonate, blow up, explosion, damage, cause…
Perpetrator: Person who detonates, blows up, plants, hurls, stages, is detained, is suspected, is blamed on, launches
Target: Object that is damaged, is destroyed, is exploded at, is thrown at, is hit, is struck
Instrument: Object that is exploded, explodes, is hurled, causes, goes off, is planted, damages, is set off, is defused
Police: Person who raids, questions, investigates, defuses, arrests, …
Learned Templates
Kidnapping
kidnap, release, abduct, kidnapping, board
Perpetrator: Person who releases, kidnaps, abducts, ambushes, holds, forces, captures, frees
Victim: Person who is kidnapped, is released, is freed, escapes, disappears, travels, is harmed
New Templates
Elections
choose, favor, turns out, pledges, unites, blame, deny…
Voter: Person who chooses, is intimidated, favors, is appealed to, turns out
Government: Org. that authorizes, is chosen, blames, denies
Candidate: Person who resigns, unites, advocates, manipulates, pledges, is blamed
Smuggling
smuggle, transport, seize, confiscate, detain, capture…
Perpetrator: Person who smuggles, is seized from, is captured, is detained
Police: Person who raids, seizes, captures, confiscates, detains, investigates
Instrument: Object that is smuggled, is seized, is confiscated, is transported
Evaluate Templates
1. Attack
2. Bombing
3. Kidnapping
4. Arson
Perpetrator Victim Target Instrument Police ??
Precision: 14 of 16 (88%)
Recall: 12 of 13 (92%)
Induction Summary
1. Learned the original templates.
– Attack, Bombing, Kidnapping, Arson
2. Learned a new role (Police)
3. Learned new structures, not annotated by humans.
– Elections, Smuggling
• Now we can perform the standard extraction task.
Extraction Approach
Kidnapping
kidnap, release, abduct, kidnapping, board
Perpetrator: X kidnap, X release, X abduct
Victim: kidnap X, release X, kidnapping of X, release of X, X’s release
They announced the initial release of the villagers last weekend.
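One way to picture the extraction step: map each learned syntactic slot to its role, then look up the slots found in a parsed sentence. This is only a sketch. The dependency triples below are a hand-written, hypothetical parse of the example sentence, and the relation labels (`subj`, `obj`, `prep_of`, `poss`) are assumed names.

```python
# Learned Kidnapping template, written as role -> set of (head, relation) slots.
KIDNAPPING = {
    "Perpetrator": {("kidnap", "subj"), ("release", "subj"), ("abduct", "subj")},
    "Victim": {
        ("kidnap", "obj"), ("release", "obj"),
        ("kidnapping", "prep_of"), ("release", "prep_of"), ("release", "poss"),
    },
}

# Hypothetical parse of: "They announced the initial release of the
# villagers last weekend."
dependencies = [
    ("announce", "subj", "They"),
    ("announce", "obj", "release"),
    ("release", "prep_of", "villagers"),
]

def extract_slot_fillers(dependencies, template):
    """Assign each argument to a role when its (head, relation) slot matches."""
    fillers = {role: [] for role in template}
    for head, relation, argument in dependencies:
        for role, slots in template.items():
            if (head, relation) in slots:
                fillers[role].append(argument)
    return fillers

fillers = extract_slot_fillers(dependencies, KIDNAPPING)
```

Here "villagers" matches the "release of X" slot and is extracted as a Victim, while "They" fills no role because "X announce" is not part of the template.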
Evaluation
• Training: 1300 documents
– Learned template structure
– Developed extraction algorithm
• Testing: 200 documents
– Extracted slot fillers (perpetrator, target, etc.)
• Metric: F1-Score
– Standard metric; balances precision and recall
$$F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
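For reference, the F1 computation is a one-liner. The precision and recall values in the example call are made up for illustration and are not results from the talk.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical values, for illustration only.
example_f1 = f1_score(0.5, 0.25)
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two values, so a system cannot score well by maximizing only precision or only recall.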
MUC Example
Two bomb attacks were carried out in La Paz last night, one in front of Government House following the message to the nation over a radio and television network by president Jaime Paz Zamora.
The explosions did not cause any serious damage but the police were mobilized, fearing a wave of attacks.
The self-styled "Zarate Willka Liberation Armed Forces" sent simultaneous written messages to the media, calling on the people to oppose the government…
The second attack occurred at 2335 ( 0335 GMT on 12 January ), just after the cabinet members had left Government House where they had listened to the presidential message.
A bomb was placed outside Government House in the parking lot that is used by cabinet ministers .
The police placed the bomb in a nearby flower bed, where it went off.
The shock wave shattered some windows in Government House and street lamps in the Plaza Murillo.
As of 0500 GMT today, the police had received reports of two other explosions in two La Paz neighborhoods, but these have not yet been confirmed.
4. INCIDENT: TYPE BOMBING
7. INCIDENT: INSTRUMENT TYPE BOMB: "BOMB"
9. PERP: INDIVIDUAL ID -
10. PERP: ORGANIZATION ID "ZARATE WILLKA LIBERATION ARMED FORCES"
12. PHYS TGT: ID "GOVERNMENT HOUSE"
19. HUM TGT: DESCRIPTION "CABINET MEMBERS" / "CABINET MINISTERS"
F1 Scores of Extraction Systems
• Rule-based systems (Chinchor et al. 93, Rau et al. 92)
• Supervised systems (Patwardhan/Riloff 2007,2009; Huang 2011)
• Event Schemas Unsupervised (2011)
[Figure: bar chart of slot-filling F1 (axis from 0 to 1.0) for rule-based, supervised, and unsupervised systems, with latest performance marked]
Summary
• Extracted without knowing what needed to be extracted.
• 0.40 F1, within range of more-informed approaches
• The first results on MUC-4 without schema knowledge
Shortly after…
• More Progress!
– Jans et al. (EACL 2012)
– Balasubramanian et al. (NAACL 2013)
– Pichotta and Mooney (EACL 2014)
• Semantic Role Labeling
– Gerber and Chai (ACL 2010), Best Paper Award
• Coreference
– Irwin et al. (CoNLL 2011), Rahman and Ng (EMNLP 2012)
• Generative Models
– Cheung et al. (NAACL 2013)
– Bamman et al. (ACL 2013)
– Chambers (EMNLP 2013)
– Nguyen et al. (ACL 2015)
Coming up
• Generative Models
– Cheung et al. (NAACL 2013)
– Bamman et al. (ACL 2013)
– Chambers (EMNLP 2013)
– Nguyen et al. (ACL 2015)