
Page 1: Features, Formalized

1

Features, Formalized

Stephen Mayhew, Hyung Sul Kim

Page 2: Features, Formalized

2

Outline
• What are features?
• How are they defined in NLP tasks in general?
• How are they defined specifically for relation extraction? (Kernel methods)

Page 3: Features, Formalized

3

What are features?

Page 4: Features, Formalized

4

Feature Extraction Pipeline
1. Define Feature Generation Functions (FGF)
2. Apply FGFs to data to make a lexicon
3. Translate examples into feature space
4. Learning with vectors

Page 5: Features, Formalized

5

Feature Generation Functions
• When we say "features", we are often actually talking about FGFs.
• An FGF defines a relation over the instance space.
• For example, let the instance be the phrase "little brown cow". The relation containsWord(w) is active (= 1) three times: containsWord(little), containsWord(brown), containsWord(cow).

Page 6: Features, Formalized

6

Feature Generation Functions
Let R be an enumerable collection of relations on the instance space X. A Feature Generation Function is a mapping

Φ : X → 2^R

that maps each x ∈ X to the set of all elements in R that are satisfied by x.

Common notation for an FGF: Φ(x) = { r ∈ R : r(x) = 1 }

Page 7: Features, Formalized

7

Feature Generation Functions
Example:

"Gregor Samsa woke from troubled dreams."

Let R = { isCap(·), hasLen4(·), endsWithS(·) }

Define an FGF Φ over R and apply it to the instance:
isCap(Gregor), isCap(Samsa), hasLen4(woke), hasLen4(from), endsWithS(dreams)
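To make this concrete, here is a minimal Python sketch of the FGF example above; the relation names and the sentence come from the slide, while the tokenization and implementation details are assumptions made for illustration.

```python
# Minimal sketch of an FGF, following the slide's example. The relations
# isCap, hasLen4, and endsWithS come from the slide; their implementations
# and the simple tokenization are assumptions for illustration.

def is_cap(word):
    return word[0].isupper()

def has_len4(word):
    return len(word) == 4

def ends_with_s(word):
    return word.endswith("s")

RELATIONS = {"isCap": is_cap, "hasLen4": has_len4, "endsWithS": ends_with_s}

def fgf(tokens):
    """Map an instance (a token list) to the set of active grounded relations."""
    active = set()
    for token in tokens:
        word = token.strip(".,!?")
        for name, relation in RELATIONS.items():
            if relation(word):
                active.add(f"{name}({word})")
    return active

print(sorted(fgf("Gregor Samsa woke from troubled dreams.".split())))
# ['endsWithS(dreams)', 'hasLen4(from)', 'hasLen4(woke)', 'isCap(Gregor)', 'isCap(Samsa)']
```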

Page 8: Features, Formalized

8

Feature Extraction Pipeline
1. Define Feature Generation Functions (FGF)
2. Apply FGFs to data to make a lexicon
3. Translate examples into feature space
4. Learning with vectors

Page 9: Features, Formalized

9

Lexicon
Apply our FGF to all input data. This creates grounded features and indexes them:
…
3534: hasWord(stark)
3535: hasWord(stamp)
3536: hasWord(stampede)
3537: hasWord(starlight)
…
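A sketch of the lexicon step under the same assumptions: apply an FGF (here a hypothetical hasWord FGF) to all training data and assign each grounded feature the next free index. The particular indices 3534 to 3537 on the slide depend on the corpus and are not reproduced here.

```python
# Sketch of building a lexicon: apply an FGF to every training example and
# give each grounded feature a unique integer index. The hasWord FGF, the toy
# training data, and the incremental indexing scheme are assumptions.

def has_word_fgf(tokens):
    return {f"hasWord({token.lower()})" for token in tokens}

class Lexicon:
    def __init__(self):
        self.feature_to_index = {}

    def add(self, feature):
        # Assign the next free index the first time a feature is seen.
        if feature not in self.feature_to_index:
            self.feature_to_index[feature] = len(self.feature_to_index)
        return self.feature_to_index[feature]

lexicon = Lexicon()
training_data = ["the stark starlight", "a stampede left its stamp"]  # toy corpus
for sentence in training_data:
    for feature in has_word_fgf(sentence.split()):
        lexicon.add(feature)

print(lexicon.feature_to_index)
```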

Page 10: Features, Formalized

10

Feature Extraction Pipeline
1. Define Feature Generation Functions (FGF)
2. Apply FGFs to data to make a lexicon
3. Translate examples into feature space
4. Learning with vectors

Page 11: Features, Formalized

11

Translate examples to feature space
From Lexicon:
…
98: hasWord(In)
…
241: hasWord(the)
…
3534: hasWord(stark)
3535: hasWord(stamp)
3536: hasWord(stampede)
3537: hasWord(starlight)
…

“In the stark starlight”

<98, 241, 3534, 3537>
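Continuing the sketch, translating an example into feature space is just a lexicon lookup; grounded features that never entered the lexicon are dropped. The lexicon entries below mirror the indices shown on the slide but are otherwise arbitrary.

```python
# Sketch: map an example to the sparse indices of its active features,
# using a toy lexicon (a dict from grounded feature to index). The indices
# mirror the slide's example; everything else is an assumption.
lexicon = {
    "hasWord(in)": 98,
    "hasWord(the)": 241,
    "hasWord(stark)": 3534,
    "hasWord(stamp)": 3535,
    "hasWord(stampede)": 3536,
    "hasWord(starlight)": 3537,
}

def to_feature_vector(tokens, lexicon):
    active = {f"hasWord({token.lower()})" for token in tokens}
    # Features missing from the lexicon (unseen at training time) are skipped.
    return sorted(lexicon[f] for f in active if f in lexicon)

print(to_feature_vector("In the stark starlight".split(), lexicon))
# [98, 241, 3534, 3537]
```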

Page 12: Features, Formalized

12

Feature Extraction Pipeline
1. Define Feature Generation Functions (FGF)
2. Apply FGFs to data to make a lexicon
3. Translate examples into feature space
4. Learning with vectors

Easy.

Page 13: Features, Formalized

13

Feature Extraction Pipeline (Testing)

1. FGFs are already defined
2. Lexicon is already defined
3. Translate examples into feature space
4. Learning with vectors

No surprises here.

Page 14: Features, Formalized

14

Structured Pipeline - Training
1. Define Feature Generation Functions (FGF)
   (Note: in this case the FGF is defined over input-output pairs, Φ(x, y))
2. Apply FGFs to data to make a lexicon
3. Translate examples into feature space
4. Learning with vectors

Exactly the same as before!

Page 15: Features, Formalized

15

Structured Pipeline - Testing
Remember, the FGF is Φ(x, y).

Now we don't have the gold y to use, but the idea is very similar: for every possible y we create features.
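A small sketch of what an FGF over input-output pairs can look like, assuming a toy tagging task with a hypothetical label set; at test time the same FGF is applied to every candidate y and the model scores each resulting feature set.

```python
# Sketch of a structured FGF: grounded features are generated over (x, y).
# At training time y is the gold label; at test time we generate features for
# every candidate y. The label set and feature templates are assumptions.
LABELS = ["PER", "LOC", "ORG", "O"]  # hypothetical label set

def structured_fgf(tokens, position, label):
    word = tokens[position]
    return {
        f"word={word.lower()}&y={label}",
        f"isCap={word[0].isupper()}&y={label}",
    }

def candidate_features(tokens, position):
    # One feature set per possible label; a linear model would score each.
    return {label: structured_fgf(tokens, position, label) for label in LABELS}

print(candidate_features("Gregor Samsa woke".split(), 0))
```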

Page 16: Features, Formalized

16

Automatic Feature Generation
Two ways to look at this:
1. Creating an FGF
   This is a black art, not even intuitive for humans to do.
2. Choosing the best subset of a closed set
   This is possible; algorithms exist.

Page 17: Features, Formalized

17

Exploiting Syntactico-Semantic Structures for Relation Extraction
Before doing the hard task of relation classification, apply some easy heuristics to recognize:
• Premodifiers: [the [Seattle] Zoo]
• Possessives: [[California's] Governor]
• Prepositions: [officials] in [California]
• Formulaics: [Medford], [Massachusetts]

These 4 structures cover 80% of the mention pairs (in ACE 2004)

Chan and Roth, ACL 2011

Page 18: Features, Formalized

18

Kernels for Relation Extraction

Hyung Sul Kim

Page 19: Features, Formalized

19

Kernel Tricks
• Borrowed a few slides from the ACL 2012 tutorial on Kernels in NLP by Moschitti

Page 20: Features, Formalized

20

Page 21: Features, Formalized

21

Page 22: Features, Formalized

22

Page 23: Features, Formalized

23

Page 24: Features, Formalized

24

All We Need is
• K(x1, x2) = ϕ(x1) · ϕ(x2)
• Computing K(x1, x2) is possible without mapping x to ϕ(x)
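A small numeric illustration of this point using the quadratic kernel K(x1, x2) = (x1 · x2)^2, which equals the inner product of the explicit degree-2 feature maps without ever constructing them; the vectors are arbitrary toy data, not from the slides.

```python
import numpy as np

# Kernel trick illustration: the quadratic kernel (x1 . x2)^2 equals
# phi(x1) . phi(x2), where phi(x) lists all degree-2 monomials x_i * x_j,
# but the kernel itself never builds phi explicitly.
def quadratic_kernel(x1, x2):
    return float(np.dot(x1, x2)) ** 2

def explicit_phi(x):
    # Explicit degree-2 feature map: all ordered products x_i * x_j.
    return np.outer(x, x).ravel()

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.5, -1.0, 2.0])
assert np.isclose(quadratic_kernel(x1, x2),
                  np.dot(explicit_phi(x1), explicit_phi(x2)))
print(quadratic_kernel(x1, x2))  # 20.25
```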

Page 25: Features, Formalized

25

Linear Kernels with Features (Zhou et al., 2005)

• Pairwise binary-SVM training
• Features:
  • Words
  • Entity Types
  • Mention Level
  • Overlap
  • Base Phrase Chunking
  • Dependency Tree
  • Parse Tree
  • Semantic Resources

Page 26: Features, Formalized

26

Word Features

Feature  Description                                                                          Example
WM1      bag-of-words in M1                                                                   {they}
HM1      head word of M1                                                                      they
WM2      bag-of-words in M2                                                                   {their, children}
HM2      head word of M2                                                                      children
HM12     combination of HM1 and HM2                                                           <they, children>
WBNULL   when no word in between                                                              0
WBFL     the only word in between, when only one word in between                              0
WBF      first word in between, when at least two words in between                            do
WBL      last word in between, when at least two words in between                             put
WBO      other words in between except first and last, when at least three words in between   not
BM1F     first word before M1                                                                 0
BM1L     second word before M1                                                                0
AM2F     first word after M2                                                                  in
AM2L     second word after M2                                                                 a
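A sketch of how a few of these word features could be computed from a tokenized sentence and two mention spans. The sentence below is implied by the example column of the table; the span convention (token offsets) and the helper itself are assumptions, and only a subset of the features is shown.

```python
# Sketch: a few Zhou et al. (2005)-style word features from a tokenized
# sentence and two mention spans given as (start, end) token offsets.
# The span convention and feature subset are assumptions for illustration.
def word_features(tokens, m1, m2):
    feats = {}
    between = tokens[m1[1]:m2[0]]                 # words between M1 and M2
    feats["WM1"] = set(tokens[m1[0]:m1[1]])       # bag-of-words in M1
    feats["WM2"] = set(tokens[m2[0]:m2[1]])       # bag-of-words in M2
    feats["WBNULL"] = int(len(between) == 0)      # no word in between
    if len(between) == 1:
        feats["WBFL"] = between[0]
    elif len(between) >= 2:
        feats["WBF"], feats["WBL"] = between[0], between[-1]
        feats["WBO"] = set(between[1:-1])         # words in between except first/last
    if m2[1] < len(tokens):
        feats["AM2F"] = tokens[m2[1]]             # first word after M2
    return feats

tokens = "they do not put their children in a".split()
print(word_features(tokens, m1=(0, 1), m2=(4, 6)))
# WM1={they}, WM2={their, children}, WBF=do, WBL=put, WBO={not}, AM2F=in
```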

Page 27: Features, Formalized

27

Entity Types, Mention Level, Overlap

Feature  Description                                                      Example 1    Example 2
ET12     combination of mention entity types (PER, ORG, FAC, LOC, GPE)   <PER, PER>   <GPE, LOC>
ML12     combination of mention levels (NAME, NOMINAL, PRONOUN)          <PRO, NOM>   <NAM, NAM>
#MB      number of other mentions in between                             0            0
#WB      number of words in between                                      3            0
M1>M2    1 if M2 is included in M1                                       0            1
M1<M2    1 if M1 is included in M2                                       0            0

Page 28: Features, Formalized

28

Base Phrase Chunking

Feature   Description                                                                                   Example
CPHBNULL  when no phrase in between                                                                     0
CPHBFL    the only phrase head, when only one phrase in between                                         0
CPHBF     first phrase head in between, when at least two phrases in between                            JAPAN
CPHBL     last phrase head in between, when at least two phrase heads in between                        KILLED
CPHBO     other phrase heads in between except first and last, when at least three phrases in between   0
CPHBM1F   first phrase head before M1                                                                   0
CPHBM1L   second phrase head before M1                                                                  0
CPHAM2F   first phrase head after M2                                                                    0
CPHAM2L   second phrase head after M2                                                                   0

Page 29: Features, Formalized

30

Performance of Features (F1 Measure)

Bar chart: F1 (scale 0 to 60) as feature groups are added cumulatively: Words, + Entity Type, + Mention Level, + Overlap, + Chunking, + Dependency Tree, + Parse Tree, + Semantic Resources.

Page 30: Features, Formalized

31

Performance Comparison

Year  Authors  Method  F-Measure

2005 Zhou et al. Linear Kernels with Handcrafted Features 55.5

Page 31: Features, Formalized

32

Syntactic Kernels (Zhao and Grishman, 2005)

• Syntactic Kernels (composite of 5 kernels):
  • Argument Kernel
  • Bigram Kernel
  • Link Sequence Kernel
  • Dependency Path Kernel
  • Local Dependency Kernel

Page 32: Features, Formalized

33

Bigram Kernel

• All unigrams and bigrams in the text from M1 to M2

Unigram    Bigram
they       they do
do         do not
not        not put
put        put their
their      their children
children
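A minimal sketch of this kernel: represent the token span from M1 to M2 by its unigrams and bigrams and let the kernel be the dot product of the two count vectors. The function names are illustrative, not from the paper.

```python
from collections import Counter

# Sketch of a bigram kernel: the span between M1 and M2 is represented by its
# unigram and bigram counts, and the kernel is the dot product of two bags.
def ngram_bag(tokens):
    unigrams = list(tokens)
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(unigrams + bigrams)

def bigram_kernel(tokens1, tokens2):
    bag1, bag2 = ngram_bag(tokens1), ngram_bag(tokens2)
    return sum(count * bag2[ngram] for ngram, count in bag1.items())

span = "they do not put their children".split()
print(ngram_bag(span))
# unigrams: they, do, not, put, their, children
# bigrams: they do, do not, not put, put their, their children
```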

Page 33: Features, Formalized

34

Dependency Path Kernel

Example sentence: "That's because Israel was expected to retaliate against Hezbollah forces in areas controlled by Syrian troops."

Page 34: Features, Formalized

35

Performance Comparison

Year  Authors  Method  F-Measure

2005 Zhou et al. Linear Kernels with Handcrafted Features 55.5

2005 Zhao and Grishman Syntactic Kernels (Composite of 5 Kernels) 70.35

Page 35: Features, Formalized

36

Composite Kernel (Zhang et al., 2006)

• Composite of two kernels:
  • Entity Kernel (linear kernel with entity-related features given by the ACE datasets)
  • Convolution Tree Kernel (Collins and Duffy, 2001)
• Two ways to combine the two kernels (see the sketch below):
  • Linear combination
  • Polynomial expansion
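The two composition schemes can be written down directly. A sketch, where k_entity and k_tree are the two component kernels, and the mixing weight alpha and the polynomial degree are assumed hyperparameters; here the polynomial expansion applies a polynomial kernel to the entity kernel before combining.

```python
# Sketch of the two ways to combine an entity kernel and a tree kernel:
# a linear combination, and a polynomial expansion of the entity kernel
# before combining. alpha and degree are assumed hyperparameters.
def linear_combination(k_entity, k_tree, x1, x2, alpha=0.4):
    return alpha * k_entity(x1, x2) + (1 - alpha) * k_tree(x1, x2)

def polynomial_expansion(k_entity, k_tree, x1, x2, alpha=0.4, degree=2):
    poly_entity = (k_entity(x1, x2) + 1) ** degree
    return alpha * poly_entity + (1 - alpha) * k_tree(x1, x2)
```

Both remain valid kernels, since sums and products of kernels (with non-negative weights) are themselves kernels.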

Page 36: Features, Formalized

37

Convolution Tree Kernel (Collins and Duffy, 2001)

An example tree is shown on the slide. K(x1, x2) can be computed efficiently, in O(|x1| · |x2|).
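A compact sketch of the Collins and Duffy (2001) convolution tree kernel: C(n1, n2) counts common subtrees rooted at a pair of nodes, and the kernel sums C over all node pairs, which is O(|x1| · |x2|) pairs. The nested-tuple tree encoding, the decay parameter lam, and the toy trees are assumptions for illustration.

```python
from functools import lru_cache

# Sketch of the Collins & Duffy (2001) convolution tree kernel.
# A tree is a nested tuple (label, child, child, ...); a leaf word is a string.
# common_subtrees(n1, n2) is the C(n1, n2) recursion with decay lam, and
# tree_kernel sums it over all node pairs.

def nodes(tree):
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    # The production at a node: its label plus its children's labels.
    return (node[0],) + tuple(c[0] if isinstance(c, tuple) else c for c in node[1:])

@lru_cache(maxsize=None)
def common_subtrees(n1, n2, lam):
    if not isinstance(n1, tuple) or not isinstance(n2, tuple):
        return 0.0                          # leaf words match only via their parents
    if production(n1) != production(n2):
        return 0.0
    if all(isinstance(c, str) for c in n1[1:]):
        return lam                          # identical preterminal production
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        score *= 1.0 + common_subtrees(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=0.4):
    return sum(common_subtrees(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

t1 = ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("V", "sat")))
t2 = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("V", "sat")))
print(tree_kernel(t1, t2))
```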

Page 37: Features, Formalized

38

Relation Instance Spaces

Chart comparing F1 across the relation instance spaces: 51.3, 61.9, 59.2, 60.4.

Page 38: Features, Formalized

39

Performance Comparison

Year  Authors  Method  F-Measure

2005 Zhou et al. Linear Kernels with Handcrafted Features 55.5

2005 Zhao and Grishman Syntactic Kernels (Composite of 5 Kernels) 70.35

2006 Zhang et al. Entity Kernel + Convolution Tree Kernel 72.1

Page 39: Features, Formalized

40

Context-Sensitive Tree Kernel (Zhou et al., 2007)

• Motivating example: "John and Mary got married", an instance of what they call the predicate-linked category (10% of cases)
• PT: 63.6; Context-Sensitive Tree Kernel: 73.2

Page 40: Features, Formalized

41

Performance Comparison

Year  Authors  Method  F-Measure

2005 Zhou et al. Linear Kernels with Handcrafted Features 55.5

2005 Zhao and Grishman Syntactic Kernels (Composite of 5 Kernels) 70.35

2006 Zhang et al. Entity Kernel + Convolution Tree Kernel 72.1

2007 Zhou et al. (Zhou et al., 2005) + Context-sensitive Tree Kernel 75.8

Page 41: Features, Formalized

42

Best Kernel (Nguyen et al., 2009)

• Use multiple kernels on:
  • Constituent trees
  • Dependency trees
  • Sequential structures
• Design 5 different kernel composites from 4 tree kernels and 6 sequential kernels

Page 42: Features, Formalized

43

Convolution Tree Kernels on 4 Special Trees

Individual tree-kernel F1 scores for PET, DW, GR, and GRW (chart on slide): 68.9, 56.3, 60.2, 58.5.

PET + GR = 70.5
DW + GR = 61.8

Page 43: Features, Formalized

44

Word Sequence Kernels on 6 Special Sequences

SK1. Sequence of terminals (lexical words) in the PET
  e.g. T2-LOC washington, U.S. T1-PER officials
SK2. Sequence of part-of-speech (POS) tags in the PET
  e.g. T2-LOC NN, NNP T1-PER NNS
SK3. Sequence of grammatical relations in the PET
  e.g. T2-LOC pobj, nn T1-PER nsubj
SK4. Sequence of words in the DW
  e.g. Washington T2-LOC In working T1-PER officials GPE U.S.
SK5. Sequence of grammatical relations in the GR
  e.g. pobj T2-LOC prep ROOT T1-PER nsubj GPE nn
SK6. Sequence of POS tags in the DW
  e.g. NN T2-LOC IN VBP T1-PER NNS GPE NNP

Individual F1 scores for SK1 through SK6 (chart on slide): 61.0, 60.8, 61.6, 59.7, 59.8, 59.7.

SK1 + SK2 + SK3 + SK4 + SK5 + SK6 = 69.8

Page 44: Features, Formalized

45

Word Sequence Kernels (Cancedda et al., 2003)

• Extended sequence kernels
• Map to high-dimensional spaces using every subsequence (see the sketch below)
• Penalties for:
  • common subsequences (using IDF)
  • longer subsequences
  • non-contiguous subsequences
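A deliberately naive sketch of a gap-weighted word subsequence kernel in this spirit: every length-n word subsequence is a feature, weighted by lam raised to the length of the span it covers, so longer and non-contiguous matches are penalized. The brute-force enumeration, the parameters n and lam, and the example sentences are assumptions; the IDF down-weighting of common subsequences mentioned on the slide is omitted, and real implementations use a dynamic-programming formulation.

```python
from collections import defaultdict
from itertools import combinations

# Naive sketch of a gap-weighted word subsequence kernel: each subsequence of
# n words is a feature weighted by lam ** (span it covers), so gappy and long
# matches contribute less. Brute force for clarity; n and lam are assumptions.
def subsequence_weights(tokens, n, lam):
    weights = defaultdict(float)
    for positions in combinations(range(len(tokens)), n):
        subseq = tuple(tokens[i] for i in positions)
        span = positions[-1] - positions[0] + 1
        weights[subseq] += lam ** span
    return weights

def sequence_kernel(tokens1, tokens2, n=2, lam=0.5):
    w1 = subsequence_weights(tokens1, n, lam)
    w2 = subsequence_weights(tokens2, n, lam)
    return sum(weight * w2[subseq] for subseq, weight in w1.items())

s1 = "officials in Washington".split()
s2 = "officials working in Washington".split()
print(sequence_kernel(s1, s2))  # shared pairs like (officials, Washington) match
```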

Page 45: Features, Formalized

46

Performance Comparison

Year Authors Method F-Measure

2005 Zhou et al. Linear Kernels with Handcrafted Features 55.5

2005 Zhao and Grishman Syntactic Kernels (Composite of 5 Kernels) 70.35

2006 Zhang et al. Entity Kernel + Convolution Tree Kernel 72.1

2007 Zhou et al. (Zhou et al., 2005) + Context-sensitive Tree Kernel 75.8

2009 Nguyen et al. Multiple Tree Kernels + Multiple Sequence Kernels 71.5

(Zhang et al., 2006): F-measure 68.9 in our settings.

(Zhou et al., 2007): "Such heuristics expand the tree and remove unnecessary information allowing a higher improvement on RE. They are tuned on the target RE task so although the result is impressive, we cannot use it to compare with pure automatic learning approaches, such as our models."

Page 46: Features, Formalized

47

Topic Kernel (Wang et al., 2011)

• Use Wikipedia InfoBox to learn topics of relations (like topics of words) based on co-occurrences

Topics   Top Relations
Topic 1  active_years_end_date, career_end, final_year, retired
Topic 2  commands, part_of, battles, notable_commanders
Topic 3  influenced, school_tradition, notable_ideas, main_interests
Topic 4  destinations, end, through, post_town
Topic 5  prizes, award, academy_awards, highlights
Topic 6  inflow, outflow, length, maxdepth
Topic 7  after, successor, ending_terminus
Topic 8  college, almamater, education
…

Page 47: Features, Formalized

48

Overview

Page 48: Features, Formalized

49

Performance Comparison

Year Authors Method F-Measure

2005 Zhou et al. Linear Kernels with Handcrafted Features 55.5

2005 Zhao and Grishman Syntactic Kernels (Composite of 5 Kernels) 70.35

2006 Zhang et al. Entity Kernel + Convolution Tree Kernel 72.1

2007 Zhou et al. (Zhou et al., 2005) + Context-sensitive Tree Kernel 75.8

2009 Nguyen et al. Multiple Tree Kernels + Multiple Sequence Kernels 71.5

2011 Wang et al. Entity Features + Word Features + Dependency Path + Topic Kernels 73.24