Template-Based Event Extraction
Kevin Reschke – Aug 15th 2013
Martin Jankowiak, Mihai Surdeanu, Dan Jurafsky, Christopher Manning
Outline
• Recap from last time
  • Distant supervision
  • Plane crash dataset
• Current work
  • Fully supervised setting
  • MUC4 terrorism dataset
Underlying theme: Joint Inference Models
Goal: Knowledge Base Population
“… Delta Flight 14 crashed in Mississippi killing 40 …”
…
<Plane Crash>
<Flight Number = Flight 14>
<Operator = Delta>
<Fatalities = 40>
<Crash Site = Mississippi> …
News Corpus → Knowledge Base
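To make the target concrete, here is a minimal sketch of one knowledge-base record in Python; the PlaneCrash class and its field names are my own rendering of the slots shown above, not code from the talk.

```python
# Hypothetical record type mirroring the plane-crash template above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlaneCrash:
    flight_number: Optional[str] = None
    operator: Optional[str] = None
    fatalities: Optional[int] = None
    crash_site: Optional[str] = None

# The extraction goal: populate records like this one from news text.
event = PlaneCrash(flight_number="Flight 14", operator="Delta",
                   fatalities=40, crash_site="Mississippi")
```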
Distant Supervision
Use known events to automatically label training data.
Training Knowledge-Base
<Plane crash>
<Flight Number = Flight 11>
<Operator = USAir>
<Fatalities = 200>
<Crash Site = Toronto>
One year after [USAir]Operator [Flight 11]FlightNumber crashed in [Toronto]CrashSite, families of the [200]Fatalities victims attended a memorial service in [Vancouver]NIL.
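A minimal sketch of how such an automatic labeling step could work, assuming exact string matching between slot values and candidate mentions (all names here are illustrative, not the authors' code):

```python
# Distant supervision: any candidate mention that exactly matches a slot
# value in the known event gets that slot's label; everything else is NIL.
kb_event = {
    "FlightNumber": "Flight 11",
    "Operator": "USAir",
    "Fatalities": "200",
    "CrashSite": "Toronto",
}

def label_mentions(candidate_mentions, kb_event):
    value_to_slot = {value: slot for slot, value in kb_event.items()}
    return [(m, value_to_slot.get(m, "NIL")) for m in candidate_mentions]

mentions = ["USAir", "Flight 11", "Toronto", "200", "Vancouver"]
print(label_mentions(mentions, kb_event))
# [('USAir', 'Operator'), ('Flight 11', 'FlightNumber'),
#  ('Toronto', 'CrashSite'), ('200', 'Fatalities'), ('Vancouver', 'NIL')]
```

Real systems need fuzzier matching (aliases, spelled-out numbers), but the principle is the same: the training knowledge base, not a human annotator, supplies the labels.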
Plane Crash Dataset
• 80 plane crashes from Wikipedia infoboxes.
• Training set: 32; Dev set: 8; Test set: 40.
• Corpus: Newswire data from 1989 – present.
Extraction Models
• Local Model
  • Train and classify each mention independently.
• Pipeline Model
  • Classify sequentially; use the previous label as a feature.
  • Captures dependencies between labels.
  • E.g., Passengers and Crew go together: “4 crew and 200 passengers were on board.”
• Joint Model
  • Searn algorithm (Daumé III et al., 2009).
  • Jointly models all mentions in a sentence.
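To make the pipeline idea concrete, here is a sketch of left-to-right classification where each prediction becomes a feature for the next; extract_features and classifier are stand-ins for whatever feature extractor and multiclass model one uses, not the talk's actual code.

```python
# Pipeline mention classification: decisions are made in order, and each
# decision is fed to the next as a feature (capturing, e.g., that Crew and
# Passengers labels tend to co-occur in one sentence).
def pipeline_classify(mentions, extract_features, classifier):
    labels = []
    prev_label = "none"          # no previous label yet
    for mention in mentions:
        feats = extract_features(mention)
        feats["prev_label=" + prev_label] = 1.0
        label = classifier.predict(feats)
        labels.append(label)
        prev_label = label
    return labels
```

The local model is the same loop without the prev_label feature; the joint (Searn) model instead searches over entire label sequences for the sentence.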
Results

                         Precision   Recall   F1 score
Baseline (Maj. Class)      0.026      0.237     0.047
Local Model                0.159      0.407     0.229
Pipeline Model             0.154      0.422     0.226
Joint Model                0.213      0.422     0.283
Fully Supervised Setting: MUC4 Terrorism Dataset
• 4th Message Understanding Conference (1992).
• Terrorist activities in Latin America.
• 1700 docs (train / dev / test = 1300 / 200 / 200).
• 50/50 mix of relevant and irrelevant docs.
MUC4 Task
• 5 slot types:
  • Perpetrator Individual (PerpInd)
  • Perpetrator Organization (PerpOrg)
  • Physical Target (Target)
  • Victim (Victim)
  • Weapon (Weapon)
• Task: Identify all slot fills in each document.
• Don’t worry about differentiating multiple events.
MUC4 Example
THE ARCE BATTALION COMMAND HAS REPORTED THAT ABOUT [50 PEASANTS OF VARIOUS AGES]Victim HAVE BEEN KIDNAPPED BY [TERRORISTS]PerpInd OF THE [FARABUNDO MARTI NATIONAL LIBERATION FRONT [FMLN]]PerpOrg IN SAN MIGUEL DEPARTMENT.
MUC4 Example (with NIL mentions)
[THE ARCE BATTALION COMMAND]NIL HAS REPORTED THAT ABOUT [50 PEASANTS OF VARIOUS AGES]Victim HAVE BEEN KIDNAPPED BY [TERRORISTS]PerpInd OF THE [FARABUNDO MARTI NATIONAL LIBERATION FRONT [FMLN]]PerpOrg IN [SAN MIGUEL DEPARTMENT]NIL.
Baseline Results
• Local Mention Model
  • Multiclass logistic regression.
• Pipeline Mention Model
  • Previous non-NIL label (or “none”) is a feature for the current mention.

            Precision   Recall   F1 score
Local         0.522      0.448     0.478
Pipeline      0.578      0.405     0.471
Observation 1
• Local context is insufficient.
• Need a sentence-level relevance measure. (Patwardhan & Riloff, 2009)

Two bridges were destroyed …
  ✓ … in Baghdad last night in a resurgence of bomb attacks in the capital city.
  ✗ … and $50 million in damage was caused by a hurricane that hit Miami on Friday.
  ✗ … to make way for modern, safer bridges that will be constructed early next year.
Baseline Models + Sentence Relevance
• Binary relevance classifier – unigram / bigram features.
• HardSent: Discard all mentions in irrelevant sentences.
• SoftSent: Sentence relevance is a feature for mention classification (both strategies are sketched after the table).

                       Precision   Recall   F1 score
Local                    0.522      0.448     0.478
Local w/ HardSent        0.770      0.241     0.451
Local w/ SoftSent        0.527      0.446     0.478
Pipeline                 0.578      0.405     0.471
Pipeline w/ SoftSent     0.613      0.429     0.500
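A sketch of the two integration strategies, assuming a relevance classifier with a simple predict interface (all identifiers here are hypothetical, not the talk's code):

```python
# HardSent: mentions in sentences judged irrelevant are discarded outright.
def hard_sent(sentences, relevance_clf, mention_clf):
    labels = []
    for sent in sentences:
        if relevance_clf.predict(sent.features) == "irrelevant":
            labels.extend("NIL" for _ in sent.mentions)
        else:
            labels.extend(mention_clf.predict(m.features)
                          for m in sent.mentions)
    return labels

# SoftSent: relevance is just one more feature; the mention classifier
# learns how strongly to trust it.
def soft_sent_features(mention, sent, relevance_clf):
    feats = dict(mention.features)
    feats["sentence_relevant"] = (
        1.0 if relevance_clf.predict(sent.features) == "relevant" else 0.0)
    return feats
```

The table reflects the trade-off: HardSent buys precision (0.770) at a steep recall cost, while SoftSent lets the pipeline model reach a better balance (F1 0.500).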
Observation 2
• Sentence relevance depends on surrounding context. (Huang & Riloff, 2012)

“Obama was attacked.” (political attack vs. terrorist attack?)
“He used a gun.” (weapon in a terrorist event?)
Joint Inference Models
• Idea: Model sentence relevance and mention labels jointly, yielding globally optimal decisions (see the sketch below).
• Machinery: Conditional Random Fields (CRFs).
  • Model the joint probability of relevance labels and mention labels conditioned on input features.
  • Encode dependencies among labels.
• Software: Factorie (http://factorie.cs.umass.edu)
  • Flexibly design CRF graph structures.
  • Learning / classification algorithms with exact and approximate inference.
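The talk's implementation uses Factorie (in Scala); as a language-neutral sketch of what the joint model scores, here is the unnormalized log-linear score of one sentence's assignment, with relevance label s and mention labels ms. The weight keys echo the RelLabel factors reported later in the talk, but the code itself is only illustrative:

```python
# Unnormalized log-score of a joint assignment (relevance s, mentions ms)
# under a log-linear model: per-node feature factors plus a factor tying
# the sentence's relevance label to each mention label.
def joint_score(s, ms, sent_feats, mention_feats, w):
    score = sum(w.get(("Rel", s, f), 0.0) for f in sent_feats)
    for m, feats in zip(ms, mention_feats):
        score += sum(w.get(("Label", m, f), 0.0) for f in feats)
        score += w.get(("RelLabel", s, m), 0.0)
    return score

# P(s, ms | x) is exp(joint_score) normalized over all assignments; the
# richer the graph structure, the more one must rely on approximate
# inference rather than exact dynamic programming.
```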
First Pass
• Fully joint model.

[Diagram: one sentence-relevance node S connected to mention nodes M M M]

                                 Precision   Recall   F1 score
Mention Pipeline w/ SoftSent       0.613      0.429     0.500
Fully Joint Model                  0.54       0.39      0.45

• Approximate inference is a likely culprit.
Second Pass
• Two linear-chain CRFs with a relevance threshold (combination sketched below).

[Diagram: a linear chain over sentence-relevance nodes S–S–S and a linear chain over mention nodes M–M–M]

                                 Precision   Recall   F1 score
Mention Pipeline w/ SoftSent       0.613      0.429     0.500
Fully Joint Model                  0.54       0.39      0.45
CRF: Mention Chain Only            0.485      0.470     0.470
CRF: Sentence Chain Only           0.535      0.422     0.463
CRF: Both Chains                   0.513      0.470     0.481
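A sketch of how the two chains could be combined, assuming the sentence chain yields a per-sentence relevance probability; the threshold value is an assumption, not the tuned one from the talk:

```python
# Keep the mention chain's predictions only where the sentence chain is
# sufficiently confident that the sentence is relevant.
REL_THRESHOLD = 0.5  # illustrative; the actual tuned threshold isn't given

def combine(sent_relevance_probs, mention_chain_labels):
    output = []
    for p_rel, labels in zip(sent_relevance_probs, mention_chain_labels):
        if p_rel >= REL_THRESHOLD:
            output.extend(labels)                  # trust the mention chain
        else:
            output.extend("NIL" for _ in labels)   # suppress extractions
    return output
```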
Analysis
• Many errors are reasonable extractions, but come from irrelevant documents.
  E.g.: “The kidnappers were accused of kidnapping several businessmen for high sums of money.”
• Learned CRF model weights:
  RelLabel<+, <NIL>> = -0.071687
  RelLabel<+, Vict>  =  0.716669
  RelLabel<-, Vict>  = -1.688919
  ...
  RelRel<+, +> = -0.609790
  RelRel<+, -> = -0.469663
  RelRel<-, +> = -0.634649
  RelRel<-, -> =  0.572855
  E.g., the strongly negative RelLabel<-, Vict> weight penalizes Victim labels in irrelevant sentences, and the positive RelRel<-, -> weight encourages runs of irrelevant sentences.
Possibilities for improvement
• Label-specific relevance thresholds.
• Leverage coreference (skip-chain CRFs).
• Incorporate a document-level relevance signal.
State of the art
• Huang & Riloff (2012)
  • P / R / F1: 0.58 / 0.60 / 0.59
  • CRF sentence model with local mention classifiers.
  • Textual cohesion features to model sentence chains.
  • Multiple binary mention classifiers (SVMs).
Future Work
• Apply CRF models to the plane crash dataset.
• New terrorism dataset from Wikipedia.
• Hybrid models: combine supervised MUC4 data with distant supervision on Wikipedia data.
Thanks!