
Page 1:

Template-Based Event Extraction

Kevin Reschke – Aug 15th 2013

Martin Jankowiak, Mihai Surdeanu, Dan Jurafsky, Christopher Manning

Page 2:

Outline

• Recap from last time
  • Distant supervision
  • Plane crash dataset

• Current work
  • Fully supervised setting
  • MUC4 terrorism dataset

Underlying theme: Joint Inference Models

Page 3:

Goal: Knowledge Base Population

“… Delta Flight 14 crashed in Mississippi killing 40 …”

<Plane Crash>
  <Flight Number = Flight 14>
  <Operator = Delta>
  <Fatalities = 40>
  <Crash Site = Mississippi>

News Corpus → Knowledge Base

Page 4:

Distant Supervision

Use known events to automatically label training data.

Training Knowledge-Base

<Plane crash>
  <Flight Number = Flight 11>
  <Operator = USAir>
  <Fatalities = 200>
  <Crash Site = Toronto>

One year after [USAir]Operator [Flight 11]FlightNumber crashed in [Toronto]CrashSite, families of the [200]Fatalities victims attended a memorial service in [Vancouver]NIL.
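To make the labeling step concrete, here is a minimal sketch of distant-supervision alignment, assuming exact string match between known slot values and candidate mentions; the names (`KB_EVENT`, `label_mentions`) are illustrative, not from the talk.

```python
# Hypothetical sketch of distant-supervision labeling: align known slot
# values from one training-KB event against candidate mentions in text.

KB_EVENT = {                      # one <Plane crash> entry from the KB
    "FlightNumber": "Flight 11",
    "Operator": "USAir",
    "Fatalities": "200",
    "CrashSite": "Toronto",
}

def label_mentions(mentions, kb_event):
    """Label each candidate mention with the slot whose known value it
    matches; mentions matching no slot value are labeled NIL."""
    labeled = []
    for mention in mentions:
        label = next((slot for slot, value in kb_event.items()
                      if mention == value), "NIL")
        labeled.append((mention, label))
    return labeled

mentions = ["USAir", "Flight 11", "Toronto", "200", "Vancouver"]
print(label_mentions(mentions, KB_EVENT))
# [('USAir', 'Operator'), ('Flight 11', 'FlightNumber'),
#  ('Toronto', 'CrashSite'), ('200', 'Fatalities'), ('Vancouver', 'NIL')]
```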

Page 5:

Plane Crash Dataset

• 80 plane crashes from Wikipedia infoboxes.

• Training set: 32; dev set: 8; test set: 40.

• Corpus: newswire data from 1989 to present.

Page 6:

Extraction Models

• Local Model
  • Train and classify each mention independently.

• Pipeline Model (contrasted with the local model in the sketch below)
  • Classify mentions sequentially; use the previous label as a feature.
  • Captures dependencies between labels. E.g., Passengers and Crew go together: “4 crew and 200 passengers were on board.”

• Joint Model
  • Searn algorithm (Daumé III et al., 2009).
  • Jointly models all mentions in a sentence.
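As a toy illustration of the local vs. pipeline distinction (the rule in `toy_classify` is a stand-in for the talk's trained classifiers, and all names here are invented):

```python
# Illustrative contrast between local and pipeline decoding.

def extract_features(mention):
    # Placeholder local features (a real system uses word, context, etc.).
    return {"word": mention.lower()}

def toy_classify(feats):
    # Toy rule standing in for a learned model: a number following a
    # Crew label is Passengers, otherwise Crew.
    if feats["word"].isdigit():
        return "Passengers" if feats.get("prev_label") == "Crew" else "Crew"
    return "NIL"

def local_decode(mentions, classify):
    """Local model: classify each mention independently."""
    return [classify(extract_features(m)) for m in mentions]

def pipeline_decode(mentions, classify):
    """Pipeline model: classify left to right, feeding the previous
    predicted label in as a feature to capture label dependencies."""
    labels, prev = [], "NONE"
    for m in mentions:
        feats = extract_features(m)
        feats["prev_label"] = prev
        prev = classify(feats)
        labels.append(prev)
    return labels

mentions = ["4", "200"]   # “4 crew and 200 passengers were on board.”
print(local_decode(mentions, toy_classify))     # ['Crew', 'Crew']
print(pipeline_decode(mentions, toy_classify))  # ['Crew', 'Passengers']
```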

Page 7:

Results

                        Precision   Recall   F1
Baseline (Maj. Class)   0.026       0.237    0.047
Local Model             0.159       0.407    0.229
Pipeline Model          0.154       0.422    0.226
Joint Model             0.213       0.422    0.283
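(For reference, F1 is the harmonic mean of precision and recall; e.g., for the joint model: F1 = 2PR / (P + R) = 2 · 0.213 · 0.422 / (0.213 + 0.422) ≈ 0.283.)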

Page 8:

Fully Supervised Setting: MUC4 Terrorism Dataset

• 4th Message Understanding Conference (1992).

• Terrorist activities in Latin America.

• 1700 docs (train / dev / test = 1300 / 200 / 200).

• 50/50 mix of relevant and irrelevant documents.

Page 9:

MUC4 Task

• 5 slot types:
  • Perpetrator Individual (PerpInd)
  • Perpetrator Organization (PerpOrg)
  • Physical Target (Target)
  • Victim (Victim)
  • Weapon (Weapon)

• Task: Identify all slot fills in each document. Don’t worry about differentiating multiple events.

Page 10:

MUC4 Example

THE ARCE BATTALION COMMAND HAS REPORTED THAT [ABOUT 50 PEASANTS OF VARIOUS AGES]Victim HAVE BEEN KIDNAPPED BY [TERRORISTS]PerpInd OF THE [FARABUNDO MARTI NATIONAL LIBERATION FRONT (FMLN)]PerpOrg IN SAN MIGUEL DEPARTMENT.

Page 11:

MUC4 Example

[THE ARCE BATTALION COMMAND]NIL HAS REPORTED THAT [ABOUT 50 PEASANTS OF VARIOUS AGES]Victim HAVE BEEN KIDNAPPED BY [TERRORISTS]PerpInd OF THE [FARABUNDO MARTI NATIONAL LIBERATION FRONT (FMLN)]PerpOrg IN [SAN MIGUEL DEPARTMENT]NIL.

Page 12:

Baseline Results

• Local Mention Model
  • Multiclass logistic regression.

• Pipeline Mention Model
  • The previous non-NIL label (or “none”) is a feature for the current mention.

           Precision   Recall   F1
Local      0.522       0.448    0.478
Pipeline   0.578       0.405    0.471

Page 13:

Observation 1:

• Local context is insufficient.
• We need a sentence-level relevance measure (Patwardhan & Riloff, 2009).

“Two bridges were destroyed . . .”

  . . . in Baghdad last night in a resurgence of bomb attacks in the capital city.
  . . . and $50 million in damage was caused by a hurricane that hit Miami on Friday.
  . . . to make way for modern, safer bridges that will be constructed early next year.

Page 14:

Baseline Models + Sentence Relevance

• Binary sentence-relevance classifier with unigram/bigram features.

• HardSent: discard all mentions in irrelevant sentences.

• SoftSent: sentence relevance is a feature for mention classification.

(Both schemes are sketched after the results table below.)

                       Precision   Recall   F1
Local                  0.522       0.448    0.478
Local w/ HardSent      0.770       0.241    0.451
Local w/ SoftSent      0.527       0.446    0.478
Pipeline               0.578       0.405    0.471
Pipeline w/ SoftSent   0.613       0.429    0.500
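A rough sketch of the two schemes; the relevance classifier, threshold value, and feature names are all stand-ins, not the talk's actual models.

```python
# Sketch of HardSent vs. SoftSent. `sentence_relevance` stands in for the
# trained binary classifier over unigram/bigram features.

def sentence_relevance(sentence):
    # Stub: a real model would return P(relevant | sentence).
    return 0.9 if "kidnapped" in sentence.lower() else 0.1

def hard_sent(sentence, mentions, classify):
    """HardSent: discard (label NIL) every mention in an irrelevant sentence."""
    if sentence_relevance(sentence) < 0.5:       # illustrative threshold
        return ["NIL"] * len(mentions)
    return [classify({"word": m}) for m in mentions]

def soft_sent(sentence, mentions, classify):
    """SoftSent: pass the relevance score to the mention classifier as
    one more feature and let it weigh the evidence itself."""
    rel = sentence_relevance(sentence)
    return [classify({"word": m, "sent_relevance": rel}) for m in mentions]

def toy_classify(feats):
    # Stand-in for the trained mention classifier.
    if feats["word"] == "peasants" and feats.get("sent_relevance", 1.0) > 0.5:
        return "Victim"
    return "NIL"

s = "About 50 peasants have been kidnapped by terrorists."
print(hard_sent(s, ["peasants"], toy_classify))   # ['Victim']
print(soft_sent(s, ["peasants"], toy_classify))   # ['Victim']
```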

Page 15:

Observation 2:

• Sentence relevance depends on surrounding context (Huang & Riloff, 2012).

“Obama was attacked.” (political attack vs. terrorist attack?)

“He used a gun.” (weapon in a terrorist event?)

Page 16:

Joint Inference Models

• Idea: Model sentence relevance and mention labels jointly, yielding globally optimal decisions (toy example below).

• Machinery: Conditional Random Fields (CRFs).
  • Model the joint probability of relevance labels and mention labels conditioned on input features.
  • Encode dependencies among labels.

• Software: Factorie (http://factorie.cs.umass.edu)
  • Flexibly design CRF graph structures.
  • Learning and classification algorithms with exact and approximate inference.
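As a cartoon of what joint inference buys, the toy below scores every joint assignment of one sentence-relevance label and its mention labels under hand-set log-linear weights and takes the argmax. All weights and names are invented for illustration; the talk's actual models are built with Factorie's CRF machinery, not this code.

```python
# Toy joint MAP inference over one sentence: pick the (relevance,
# mention labels) assignment maximizing a sum of factor scores.
from itertools import product

# Local evidence for each mention taking each label (hand-set).
LOCAL = {("peasants", "Vict"): 1.2, ("peasants", "NIL"): 0.0,
         ("terrorists", "Vict"): -0.5, ("terrorists", "NIL"): 0.8}

# Factors coupling sentence relevance (+/-) with mention labels,
# echoing the RelLabel factors discussed later in the talk.
REL_LABEL = {("+", "Vict"): 0.7, ("-", "Vict"): -1.7,
             ("+", "NIL"): -0.1, ("-", "NIL"): 0.5}

def score(relevance, mentions, labels):
    """Unnormalized log-score of one joint assignment."""
    return sum(LOCAL[(w, l)] + REL_LABEL[(relevance, l)]
               for w, l in zip(mentions, labels))

def map_assignment(mentions):
    """Exhaustive MAP inference (exact on this tiny space; realistic
    graphs need approximate inference)."""
    candidates = product(["+", "-"], *([["Vict", "NIL"]] * len(mentions)))
    return max(candidates, key=lambda a: score(a[0], mentions, list(a[1:])))

print(map_assignment(["peasants", "terrorists"]))
# ('+', 'Vict', 'NIL'): relevance and mention labels are decided together.
```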

Page 17:

First Pass

• Fully joint model.

  [Diagram: a single sentence-relevance node S connected to every mention node M M M.]

• Approximate inference is a likely culprit for the drop.

                              Precision   Recall   F1
Mention Pipeline w/ SoftSent  0.613       0.429    0.500
Fully Joint Model             0.54        0.39     0.45

Page 18:

Second Pass

• Two linear-chain CRFs with a relevance threshold (sketched below the table).

  [Diagram: a chain of sentence nodes S–S–S over a chain of mention nodes M–M–M.]

                              Precision   Recall   F1
Mention Pipeline w/ SoftSent  0.613       0.429    0.500
Fully Joint Model             0.54        0.39     0.45
CRF: Mention Chain Only       0.485       0.470    0.470
CRF: Sentence Chain Only      0.535       0.422    0.463
CRF: Both Chains              0.513       0.470    0.481
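One way the relevance threshold might gate the two chains, under my reading of the slide; both chain models are stubs (the talk's versions are linear-chain CRFs in Factorie), and the threshold value is invented.

```python
# Sketch of the two-chain design: the sentence chain supplies relevance
# probabilities, and mentions in sentences below a threshold are forced
# to NIL before mention labels are assigned.

THRESHOLD = 0.5  # illustrative value, not reported in the talk

def sentence_chain(sentences):
    # Stub for the linear-chain CRF over sentences: P(relevant) each.
    return [0.8 if "kidnapped" in s.lower() else 0.2 for s in sentences]

def mention_chain(mentions):
    # Stub for the linear-chain CRF over mentions (run per relevant
    # sentence here for simplicity).
    return ["Victim" if m == "peasants" else "NIL" for m in mentions]

def two_chain_decode(doc):
    """doc: list of (sentence, candidate-mentions) pairs."""
    relevance = sentence_chain([s for s, _ in doc])
    labels = []
    for (sentence, mentions), p in zip(doc, relevance):
        if p < THRESHOLD:
            labels.extend(["NIL"] * len(mentions))   # gated out
        else:
            labels.extend(mention_chain(mentions))
    return labels

doc = [("About 50 peasants have been kidnapped.", ["peasants"]),
       ("The region is mountainous.", ["region"])]
print(two_chain_decode(doc))  # ['Victim', 'NIL']
```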

Page 19:

Analysis

• Many errors are reasonable extractions, but come from irrelevant documents.

• Learned CRF model weights:

  RelLabel<+, <NIL>> = -0.071687
  RelLabel<+, Vict>  =  0.716669
  RelLabel<-, Vict>  = -1.688919
  ...
  RelRel<+, +> = -0.609790
  RelRel<+, -> = -0.469663
  RelRel<-, +> = -0.634649
  RelRel<-, -> =  0.572855

“The kidnappers were accused of kidnapping several businessmen for high sums of money.”

Page 20:

Possibilities for improvement

• Label-specific relevance thresholds.

• Leverage Coref (Skip Chain CRFs).

• Incorporate doc-level relevance signal.

Page 21:

State of the art

• Huang & Riloff (2012)
  • P / R / F1: 0.58 / 0.60 / 0.59
  • CRF sentence model with local mention classifiers.
  • Textual cohesion features to model sentence chains.
  • Multiple binary mention classifiers (SVMs).

Page 22:

Future Work

• Apply CRF models to plane crash dataset.

• New terrorism dataset from Wikipedia.

• Hybrid models: combine supervised MUC4 data with distant supervision on Wikipedia data.

Page 23:

Thanks!