Machine Learning 2
Inductive Dependency Parsing
Joakim Nivre
Uppsala University, Department of Linguistics and Philology
Växjö University, School of Mathematics and Systems Engineering
Inductive Dependency Parsing
• Dependency-based representations …
  – have restricted expressivity but provide a transparent encoding of semantic structure.
  – have restricted complexity in parsing.
• Inductive machine learning …
  – is necessary for accurate disambiguation.
  – is beneficial for robustness.
  – makes (formal) grammars superfluous.
Dependency Graph
[Figure: dependency graph for the sentence "Economic news had little effect on financial markets ." (tokens 1–9, with 0 as the artificial root), PoS-tagged JJ NN VBD JJ NN IN JJ NNS ., with arcs labeled ROOT, SBJ, OBJ, PMOD, NMOD, and P]
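Such a graph can be encoded by giving every token a head index and a dependency type. A minimal sketch in Python, with the arc assignment reconstructed from the figure:

    # Dependency graph as parallel arrays: heads[i] is the head of token i,
    # labels[i] its dependency type; index 0 is the artificial root.
    words  = ["<root>", "Economic", "news", "had", "little", "effect",
              "on", "financial", "markets", "."]
    tags   = ["ROOT", "JJ", "NN", "VBD", "JJ", "NN", "IN", "JJ", "NNS", "."]
    heads  = [None, 2, 3, 0, 5, 3, 5, 8, 6, 3]     # reconstructed from the figure
    labels = [None, "NMOD", "SBJ", "ROOT", "NMOD",
              "OBJ", "NMOD", "NMOD", "PMOD", "P"]

    for i in range(1, len(words)):
        print(f"{labels[i]}({words[heads[i]]}-{heads[i]}, {words[i]}-{i})")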
Key Ideas
• Deterministic:
  – Deterministic algorithms for building dependency graphs (see the transition sketch after this list)
    (Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Nivre 2003)
• History-based:
  – History-based models for predicting the next parser action
    (Black et al. 1992, Magerman 1995, Ratnaparkhi 1997, Collins 1997)
• Discriminative:
  – Discriminative machine learning to map histories to actions
    (Veenstra and Daelemans 2000, Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Nivre et al. 2004)
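The deterministic algorithm of Nivre (2003) builds the graph in a single left-to-right pass over the input, using four transitions over a configuration consisting of a stack, an input buffer, and a set of arcs. A minimal sketch of the arc-eager transitions; tokens are integers and the representation details are illustrative rather than MaltParser's actual code:

    # Arc-eager transitions (Nivre 2003). A configuration is (stack, buffer, arcs);
    # arcs are (head, label, dependent) triples and token 0 is the artificial root.

    def left_arc(stack, buffer, arcs, label):
        # Precondition: stack top is not the root and has no head yet.
        # The next input token becomes the head of the stack top, which is popped.
        arcs.append((buffer[0], label, stack.pop()))

    def right_arc(stack, buffer, arcs, label):
        # The stack top becomes the head of the next input token,
        # which is then pushed onto the stack.
        arcs.append((stack[-1], label, buffer[0]))
        stack.append(buffer.pop(0))

    def reduce(stack, buffer, arcs):
        # Precondition: stack top already has a head. Pop it.
        stack.pop()

    def shift(stack, buffer, arcs):
        # Push the next input token without adding an arc.
        stack.append(buffer.pop(0))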
Guided Parsing
• Deterministic parsing:
  – Greedy algorithm for disambiguation
  – Optimal strategy given an oracle
• Guided deterministic parsing:
  – Guide = approximation of the oracle
  – Desiderata:
    • High prediction accuracy
    • Efficient implementation (constant time per prediction)
  – Solution:
    • Discriminative classifier induced from treebank data (see the sketch below)
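Putting the pieces together, the guide replaces the oracle in an otherwise unchanged greedy loop: one classifier call per transition, no backtracking. A minimal sketch, assuming the transitions above; guide.predict, extract_features, and apply_action are hypothetical interfaces:

    def parse(tokens, guide, extract_features, apply_action):
        """Greedy guided parsing: commit to the classifier's prediction at each step."""
        stack, buffer, arcs = [0], list(range(1, len(tokens) + 1)), []
        while buffer:
            state = (stack, buffer, arcs)
            action = guide.predict(extract_features(state))  # approximates the oracle
            apply_action(action, stack, buffer, arcs)        # one of the four transitions
        return arcs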
Learning
• Classification problem (S → T):
  – Parser states: S = { s | s = (f1, …, fp) }
  – Parser actions: T = { t1, …, tm }
• Training data (constructed as sketched below):
  – D = { (si−1, ti) | ti(si−1) = si in a gold standard derivation s1, …, sn }
• Learning methods:
  – Memory-based learning
  – Support vector machines
  – Maximum entropy modeling
  – …
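Each training instance pairs the feature representation of a state with the transition taken from it in the gold standard derivation. A minimal sketch, assuming an oracle function that reads the correct transition off the gold arcs (helper names are illustrative):

    def training_instances(treebank, oracle, extract_features, apply_action):
        """Build D = { (s_{i-1}, t_i) }: one (features, action) pair per transition."""
        D = []
        for tokens, gold_arcs in treebank:
            stack, buffer, arcs = [0], list(range(1, len(tokens) + 1)), []
            while buffer:
                state = (stack, buffer, arcs)
                t = oracle(state, gold_arcs)          # correct action in this state
                D.append((extract_features(state), t))
                apply_action(t, stack, buffer, arcs)  # follow the gold derivation
        return D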
Feature Models
• Model P: PoS: t1, top, next, n1, n2
• Model D: P + DepTypes: top.hd, top.ld, top.rd, next.ld
• Model L2: D + Words: top, next
• Model L4: L2 + Words: top.hd, n1
[Figure: parser configuration with stack tokens (…, t1, top) and input tokens (next, n1, n2, n3), and the head (hd), leftmost dependent (ld), and rightmost dependent (rd) relations addressed by the feature models; extraction is sketched below]
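A feature model is a list of addresses into the current configuration, each yielding a PoS tag, dependency type, or word form. A minimal sketch of extraction for model P; the richer models add dependency-type and word features at the addresses listed above, and the token representation (and reading of t1 as the token below the stack top) is an assumption:

    def pos(tokens, i):
        """PoS tag at token index i, or NIL for undefined addresses."""
        return tokens[i]["tag"] if i is not None and 0 <= i < len(tokens) else "NIL"

    def features_P(tokens, stack, buffer):
        """Model P: PoS of t1, top, next, n1, n2 (NIL where undefined)."""
        t1  = stack[-2] if len(stack) > 1 else None   # assumed: token below stack top
        top = stack[-1] if stack else None
        nxt = buffer[0] if len(buffer) > 0 else None  # next input token
        n1  = buffer[1] if len(buffer) > 1 else None
        n2  = buffer[2] if len(buffer) > 2 else None
        return [pos(tokens, i) for i in (t1, top, nxt, n1, n2)]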
Experimental Results (MBL)
Model   Swedish                      English
        ASU   ASL   EMU   EML        ASU   ASL   EMU   EML
P       77.4  70.1  26.6  17.8       79.0  76.1  14.4  10.0
D       82.5  75.1  33.5  22.2       83.4  80.5  21.9  17.0
L2      85.6  81.5  39.1  30.2       86.6  84.8  29.9  26.2
L4      85.9  81.6  39.8  30.4       87.3  85.6  31.1  27.7

(AS = attachment score, EM = exact match; U = unlabeled, L = labeled; computed as sketched below)
• Results:
  – Dependency features help
  – Lexicalisation helps …
  – … up to a point (?)
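For concreteness, the two metrics can be computed per token (AS) and per sentence (EM). A minimal sketch, assuming gold and predicted graphs as parallel lists of (head, label) pairs:

    def as_and_em(gold, pred, labeled=True):
        """AS = % of tokens with the correct head (and label, if labeled);
        EM = % of sentences whose whole graph is correct."""
        correct = total = exact = 0
        for g_sent, p_sent in zip(gold, pred):   # one [(head, label), ...] per sentence
            ok = [gh == ph and (not labeled or gl == pl)
                  for (gh, gl), (ph, pl) in zip(g_sent, p_sent)]
            correct += sum(ok)
            total += len(ok)
            exact += all(ok)
        return 100 * correct / total, 100 * exact / len(gold)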
Parameter Optimization
Model = L4 + PoS of n3

                                      Swedish          English
Parameter                             Manual  Param    Manual  Param
Number of neighbors (-k)              5       11       7       19
Distance metric (-m)                  MVDM    MVDM     MVDM    MVDM
Switching threshold (-L)              3       5        5       2
Feature weighting (-w)                None    GR       None    GR
Distance-weighted class voting (-d)   ID      IL       ID      IL
Unlabeled attachment score (ASU)      86.2    86.0     87.7    86.8
Labeled attachment score (ASL)        81.9    82.0     85.9    84.9
• Learning algorithm parameter optimization:
  – Manual (Nivre 2005) vs. paramsearch (van den Bosch 2003); a naive grid search is sketched below
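As an illustration only (paramsearch itself works differently, using progressively larger data samples rather than exhaustive search), a naive grid search over the TiMBL parameters from the table might look like this; train_and_score is a hypothetical wrapper around a TiMBL run:

    from itertools import product

    def grid_search(train, dev, train_and_score, grid):
        """Exhaustive search: return the setting with the best dev accuracy."""
        best, best_score = None, -1.0
        for values in product(*grid.values()):
            setting = dict(zip(grid, values))
            score = train_and_score(train, dev, setting)  # e.g. wraps a TiMBL run
            if score > best_score:
                best, best_score = setting, score
        return best, best_score

    # Parameter values drawn from the table above (illustrative grid):
    grid = {"-k": [5, 7, 11, 19], "-m": ["MVDM"], "-L": [2, 3, 5],
            "-w": ["None", "GR"], "-d": ["ID", "IL"]}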
Learning Curves
[Two plots: attachment score (65–90) against number of training sections (1–10 for Swedish, 1–8 for English), with one curve each for models D and L2, unlabeled (U) and labeled (L)]

• Swedish:
  – Attachment score (U/L)
  – Models: D, L2
  – 10K tokens/section
• English:
  – Attachment score (U/L)
  – Models: D, L2
  – 100K tokens/section
Dependency Types: Swedish
• High accuracy (labeled F ≥ 84%):
  – IM (marker → infinitive) 98.5%
  – PR (preposition → noun) 90.6%
  – UK (complementizer → verb) 86.4%
  – VC (auxiliary verb → main verb) 86.1%
  – DET (noun → determiner) 89.5%
  – ROOT 87.8%
  – SUB (verb → subject) 84.5%
• Medium accuracy (76% ≤ labeled F ≤ 80%):
  – ATT (noun modifier) 79.2%
  – CC (coordination) 78.9%
  – OBJ (verb → object) 77.7%
  – PRD (verb → predicative) 76.8%
  – ADV (adverbial) 76.3%
• Low accuracy (labeled F ≤ 70%):
  – INF, APP, XX, ID
Dependency Types: English
• High accuracy (labeled F ≥ 86%):
  – VC (auxiliary verb → main verb) 95.0%
  – NMOD (noun modifier) 91.0%
  – SBJ (verb → subject) 89.3%
  – PMOD (preposition modifier) 88.6%
  – SBAR (complementizer → verb) 86.1%
• Medium accuracy (73% ≤ labeled F ≤ 83%):
  – ROOT 82.4%
  – OBJ (verb → object) 81.1%
  – VMOD (verb modifier) 76.8%
  – AMOD (adjective/adverb modifier) 76.7%
  – PRD (predicative) 73.8%
• Low accuracy (labeled F ≤ 70%):
  – DEP (null label)
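The per-type figures on these two slides are labeled F scores. A minimal sketch of computing them from gold and predicted (head, label) token annotations; an arc counts as correct for a label only if both head and label match:

    from collections import Counter

    def per_type_f(gold, pred):
        """Labeled F per dependency type from (head, label) pairs per token."""
        tp, gold_n, pred_n = Counter(), Counter(), Counter()
        for (gh, gl), (ph, pl) in zip(gold, pred):
            gold_n[gl] += 1
            pred_n[pl] += 1
            if gh == ph and gl == pl:
                tp[gl] += 1
        scores = {}
        for lab in (gold_n | pred_n):
            p = tp[lab] / pred_n[lab] if pred_n[lab] else 0.0
            r = tp[lab] / gold_n[lab] if gold_n[lab] else 0.0
            scores[lab] = 2 * p * r / (p + r) if p + r else 0.0
        return scores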
MaltParser
• Software for inductive dependency parsing:
  – Freely available for research and education
    (http://www.msi.vxu.se/users/nivre/research/MaltParser.html)
• Version 0.3:
  – Parsing algorithms:
    • Nivre (2003) (arc-eager, arc-standard)
    • Covington (2001) (projective, non-projective)
  – Learning algorithms:
    • MBL (TiMBL)
    • SVM (LIBSVM)
  – Feature models:
    • Arbitrary combinations of part-of-speech features, dependency type features, and lexical features
• Auxiliary tools:
  – MaltEval
  – MaltConverter
  – Proj
CoNLL-X Shared Task
Language     #Tokens   #DTypes   ASU    ASL
Japanese     150K      8         92.2   90.3
English*     1000K     12        89.7   88.3
Bulgarian    200K      19        88.0   82.5
Chinese      350K      134       88.0   82.2
Swedish      200K      64        87.9   81.3
Danish       100K      53        86.9   82.0
Portuguese   200K      55        86.0   81.5
German       700K      46        85.0   82.0
Italian*     40K       17        82.9   75.7
Czech        1250K     82        80.1   72.8
Spanish      90K       21        79.0   74.3
Dutch        200K      26        76.0   71.7
Arabic       50K       27        74.0   61.7
Turkish      60K       26        73.8   63.0
Slovene      30K       26        73.3   62.2
Possible Projects
• CoNLL Shared Task:
  – Work on one or more languages
  – With or without MaltParser
  – Data sets available
• Parsing spoken language:
  – Talbanken05: Swedish treebank with written and spoken data; cross-training experiments
  – GSLC: 1.2M-word corpus of spoken Swedish