dep3.pdf
TRANSCRIPT
-
Dependency Parsing - Part II
Pawan Goyal
CSE, IIT Kharagpur
October 16, 2014
Pawan Goyal (IIT Kharagpur) Dependency Parsing - Part II October 16, 2014 1 / 48
-
Maximum Spanning Tree Based
Basic Idea
Starting from all possible connections, find the maximum spanning tree.
-
Directed Spanning Trees
A directed spanning tree of a (multi-)digraph $G = (V, A)$ is a subgraph $G' = (V', A')$ such that:
* $V' = V$
* $A' \subseteq A$, and $|A'| = |V'| - 1$
* $G'$ is a tree (acyclic)
A spanning tree of the following (multi-)digraphs
-
Weighted Directed Spanning Trees
Assume we have a weight function for each arc in a multi-digraph $G = (V, A)$.
Define $w_{ijk} \ge 0$ to be the weight of $(i, j, k) \in A$ for a multi-digraph.
Define the weight of a directed spanning tree $G'$ of graph $G$ as
$$w(G') = \sum_{(i,j,k) \in G'} w_{ijk}$$
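The tree weight defined above is just a sum over the arcs of the tree. A minimal Python sketch follows; the arc labels and the numeric weights are made up for illustration and do not come from the slides.

```python
from typing import Dict, Iterable, Tuple

Arc = Tuple[int, int, str]  # (head i, dependent j, label k)

def tree_weight(tree: Iterable[Arc], weights: Dict[Arc, float]) -> float:
    """Return w(G') as the sum of w_ijk over the arcs of the tree."""
    return sum(weights[arc] for arc in tree)

# Hypothetical weights for a three-word sentence; vertex 0 is the root.
weights = {
    (0, 2, "root"): 10.0,
    (2, 1, "subj"): 30.0,
    (2, 3, "obj"): 30.0,
    (1, 3, "obj"): 3.0,
}
tree = [(0, 2, "root"), (2, 1, "subj"), (2, 3, "obj")]
total = tree_weight(tree, weights)
```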
-
Maximum Spanning Trees (MST)
Let T(G) be the set of all spanning trees for graph G
The MST problem
Find the spanning tree $G'$ of the graph $G$ that has the highest weight
$$G' = \arg\max_{G' \in T(G)} w(G') = \arg\max_{G' \in T(G)} \sum_{(i,j,k) \in G'} w_{ijk}$$
-
Finding MST
Directed Graph
For each sentence $x$, define the directed graph $G_x = (V_x, E_x)$ given by
$V_x = \{x_0 = \text{root}, x_1, \ldots, x_n\}$
$E_x = \{(i, j) : i \ne j, (i, j) \in [0:n] \times [1:n]\}$
$G_x$ is a graph with
* the sentence words and the dummy root symbol as vertices,
* a directed edge between every pair of distinct words, and
* a directed edge from the root symbol to every word.
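The edge set defined above can be sketched in a few lines of Python. The function name `build_arcs` is an illustrative choice, not something from the slides; index 0 stands for the dummy root.

```python
from typing import List, Tuple

def build_arcs(n: int) -> List[Tuple[int, int]]:
    """All (head, dependent) pairs with head in 0..n, dependent in 1..n, head != dependent."""
    return [(i, j) for i in range(n + 1) for j in range(1, n + 1) if i != j]

words = ["root", "John", "saw", "Mary"]  # x_0 is the dummy root
arcs = build_arcs(len(words) - 1)
```

No arc ever points into the root, and a sentence of n words yields n * n candidate arcs.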
-
MST and Dependency Graph
Theorem
Every valid dependency graph for sentence $x$ is equivalent to a directed spanning tree for $G_x$ that originates out of vertex 0.
Three problems
Defining $w_{ijk}$ corresponding to a feature space
Learning $w_{ijk}$ from gold-standard data
At run-time, inferring the MST for a sentence $x$ using the learnt weights
Let's talk about the inference part first.
-
Chu-Liu-Edmonds Algorithm
Each vertex in the graph (other than the root) greedily selects the incoming edge with the highest weight.
If a tree results, it must be a maximum spanning tree.
If not, there must be a cycle.
* Identify the cycle and contract it into a single vertex.
* Recalculate edge weights going into and out of the cycle.
-
Chu-Liu-Edmonds Algorithm
x = John saw Mary
Build the directed graph
-
Chu-Liu-Edmonds Algorithm
Find the highest scoring incoming arc for each vertex
If this is a tree, then we have found the MST.
-
Chu-Liu-Edmonds Algorithm
If it is not a tree, identify the cycle and contract it
Recalculate arc weights into and out of the cycle
-
Chu-Liu-Edmonds Algorithm
Outgoing arc weights
Equal to the max of the outgoing arc weights over all vertices in the cycle
e.g., John → Mary is 3 and saw → Mary is 30, so the arc from the contracted vertex to Mary gets weight 30.
-
Chu-Liu-Edmonds Algorithm
Incoming arc weights
Equal to the weight of the best spanning tree that includes the head of the incoming arc and all nodes in the cycle
root → saw → John is 40
root → John → saw is 29
The arc from root into the contracted vertex therefore gets weight 40.
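The two recalculation rules can be checked with a little arithmetic. The figure with the full set of arc weights is not reproduced in this transcript, so the individual weights below are an assumption, chosen to agree with the numbers quoted on the slides (John → Mary = 3, saw → Mary = 30, and the totals 40 and 29).

```python
# ASSUMED arc weights (head, dependent) -> weight for "John saw Mary".
w = {
    ("root", "John"): 9, ("root", "saw"): 10, ("root", "Mary"): 9,
    ("John", "saw"): 20, ("saw", "John"): 30,
    ("John", "Mary"): 3, ("saw", "Mary"): 30,
    ("Mary", "John"): 11, ("Mary", "saw"): 0,
}

# The greedy step picked saw -> John and John -> saw, which form a cycle.
cycle = {"John": "saw", "saw": "John"}  # dependent -> head inside the cycle
cycle_weight = sum(w[(h, d)] for d, h in cycle.items())

def outgoing(target: str) -> int:
    """Weight of the arc from the contracted vertex to target."""
    return max(w[(v, target)] for v in cycle)

def incoming(source: str) -> int:
    """Weight of the arc from source into the contracted vertex.

    Entering the cycle at v keeps every cycle arc except the one pointing at v.
    """
    return max(w[(source, v)] + cycle_weight - w[(cycle[v], v)] for v in cycle)

out_mary = outgoing("Mary")
in_root = incoming("root")
in_mary = incoming("Mary")
```

Under these assumed weights, `out_mary` is 30 and `in_root` is 40, matching the values on the slides.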
-
Chu-Liu-Edmonds Algorithm
Calling the algorithm again on the contracted graph:
This is a tree and the MST for the contracted graph
Go back up the recursive call and reconstruct final graph
-
Chu-Liu-Edmonds Algorithm
The edge from $w_{js}$ to Mary was from saw.
The edge from root to $w_{js}$ represented a tree from root to saw to John.
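The whole procedure (greedy selection, cycle contraction, recursive call, reconstruction) can be sketched as a short recursive function. This is an unoptimised illustration for unlabeled arcs, not the lecture's own implementation. The arc weights are assumed values consistent with the numbers quoted on the slides, and vertices are numbered 0 = root, 1 = John, 2 = saw, 3 = Mary.

```python
from typing import Dict, List, Optional, Tuple

Scores = Dict[Tuple[int, int], float]  # (head, dependent) -> weight

def find_cycle(head: Dict[int, int]) -> Optional[List[int]]:
    """Return the vertices of one cycle in the head map, or None if there is none."""
    for start in head:
        path, node = [], start
        while node in head and node not in path:
            path.append(node)
            node = head[node]
        if node in path:
            return path[path.index(node):]
    return None

def chu_liu_edmonds(scores: Scores, root: int = 0) -> Dict[int, int]:
    """Return {dependent: head} for the maximum spanning tree rooted at root."""
    nodes = {d for _, d in scores} - {root}
    # Greedy step: every non-root vertex picks its best incoming arc.
    head = {d: max((h for h, dd in scores if dd == d), key=lambda h: scores[(h, d)])
            for d in nodes}
    cycle = find_cycle(head)
    if cycle is None:
        return head
    # Contract the cycle into a fresh vertex c and recompute arc weights.
    c = max(max(arc) for arc in scores) + 1
    in_cycle = set(cycle)
    cycle_weight = sum(scores[(head[v], v)] for v in cycle)
    new_scores: Scores = {}
    origin: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for (h, d), s in scores.items():
        if h in in_cycle and d in in_cycle:
            continue
        if h in in_cycle:            # arc leaving the cycle: keep the max
            key, val = (c, d), s
        elif d in in_cycle:          # arc entering the cycle: break the cycle at d
            key, val = (h, c), s + cycle_weight - scores[(head[d], d)]
        else:
            key, val = (h, d), s
        if key not in new_scores or val > new_scores[key]:
            new_scores[key], origin[key] = val, (h, d)
    contracted = chu_liu_edmonds(new_scores, root)
    # Go back up: map contracted arcs to the original arcs they stood for.
    result: Dict[int, int] = {}
    for d, h in contracted.items():
        orig_head, orig_dep = origin[(h, d)]
        result[orig_dep] = orig_head
    for v in cycle:                  # keep cycle arcs except the broken one
        result.setdefault(v, head[v])
    return result

# ASSUMED weights for "John saw Mary" (0 = root, 1 = John, 2 = saw, 3 = Mary).
scores = {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (2, 1): 30,
          (2, 3): 30, (3, 2): 0, (3, 1): 11, (1, 3): 3}
tree = chu_liu_edmonds(scores)
tree_total = sum(scores[(h, d)] for d, h in tree.items())
```

With these weights the result attaches saw to the root and both John and Mary to saw, as in the reconstruction described on the slides.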
-
Arc weights as linear classifiers
$w_{ijk} = w \cdot f(i, j, k)$
Arc weights are a linear combination of features of the arc $f(i, j, k)$ and a corresponding weight vector $w$
What arc features?
-
Arc Features $f(i, j, k)$
Features
Identities of the words $w_i$ and $w_j$ for a label $l_k$
head = saw & dependent = with
-
Arc Features $f(i, j, k)$
Features
Part-of-speech tags of the words $w_i$ and $w_j$ for a label $l_k$
head-pos = Verb & dependent-pos = Preposition
-
Arc Features $f(i, j, k)$
Features
Part-of-speech of words surrounding and between $w_i$ and $w_j$
inbetween-pos = Noun
inbetween-pos = Adverb
dependent-pos-right = Pronoun
head-pos-left = Noun
-
Arc Features $f(i, j, k)$
Features
Number of words between $w_i$ and $w_j$, and their orientation
arc-distance = 3
arc-direction = right
-
Arc Features $f(i, j, k)$
Features
Combinations
head-pos=Verb & dependent-pos=Preposition & arc-label=PP
head-pos=Verb & dependent=with & arc-distance=3
No limit: any feature over arc $(i, j, k)$ or input $x$
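The feature templates above can be turned into sparse indicator features, with the arc weight computed as a dot product with $w$. The sketch below is illustrative only: the feature-string format, the example sentence, its tags, and the weight values are all assumptions, and only a few of the templates are implemented.

```python
from typing import Dict, List

def arc_features(words: List[str], tags: List[str], i: int, j: int, label: str) -> List[str]:
    """Indicator features for the arc from head i to dependent j with the given label."""
    lo, hi = min(i, j), max(i, j)
    feats = [
        f"head={words[i]}&dependent={words[j]}&arc-label={label}",
        f"head-pos={tags[i]}&dependent-pos={tags[j]}&arc-label={label}",
        f"arc-distance={hi - lo - 1}",            # number of words between i and j
        f"arc-direction={'right' if i < j else 'left'}",
    ]
    feats += [f"inbetween-pos={tags[m]}" for m in range(lo + 1, hi)]
    return feats

def arc_score(w: Dict[str, float], feats: List[str]) -> float:
    """w . f(i, j, k) for binary features: add up the weights of the active features."""
    return sum(w.get(f, 0.0) for f in feats)

# Hypothetical sentence, tags, and weight vector.
words = ["<root>", "John", "saw", "Mary", "with", "binoculars"]
tags = ["ROOT", "Noun", "Verb", "Noun", "Preposition", "Noun"]
w = {
    "head=saw&dependent=with&arc-label=PP": 1.5,
    "head-pos=Verb&dependent-pos=Preposition&arc-label=PP": 2.0,
    "arc-direction=right": 0.5,
}
feats = arc_features(words, tags, 2, 4, "PP")
score = arc_score(w, feats)
```

Features missing from `w` contribute zero, which is what makes a sparse dictionary a convenient stand-in for the weight vector.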
-
Learning the parameters
Re-write the inference problem
$$G' = \arg\max_{G' \in T(G_x)} \sum_{(i,j,k) \in G'} w_{ijk}$$
$$= \arg\max_{G' \in T(G_x)} w \cdot \sum_{(i,j,k) \in G'} f(i, j, k)$$
$$= \arg\max_{G' \in T(G_x)} w \cdot f(G')$$
Which can be plugged into learning algorithms
-
Inference-based Learning
Training data: $T = \{(x_t, G_t)\}_{t=1}^{|T|}$
1. $w^{(0)} = 0$; $i = 0$
2. for $n : 1..N$
3.   for $t : 1..|T|$
4.     Let $G' = \arg\max_{G'} w^{(i)} \cdot f(G')$
5.     if $G' \ne G_t$
6.       $w^{(i+1)} = w^{(i)} + f(G_t) - f(G')$
7.       $i = i + 1$
8. return $w^{(i)}$
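The training loop above is a perceptron-style update, and a small runnable sketch makes the bookkeeping concrete. Two simplifications are assumed here: arcs are unlabeled with a single hypothetical word-pair feature per arc, and the argmax is found by brute-force enumeration of all trees, standing in for Chu-Liu-Edmonds on tiny inputs.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

Tree = Dict[int, int]  # dependent index -> head index (0 is the root)

def is_tree(heads: Tree) -> bool:
    """True if every word reaches the root 0 without running into a cycle."""
    for d in heads:
        seen, node = set(), d
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True

def tree_features(words: List[str], heads: Tree) -> Counter:
    """f(G): the sum of the arc feature vectors (one indicator feature per arc)."""
    return Counter(f"head={words[h]}&dependent={words[d]}" for d, h in heads.items())

def score(w: Counter, feats: Counter) -> float:
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def decode(words: List[str], w: Counter) -> Tree:
    """Brute-force argmax over all spanning trees (exponential; toy sentences only)."""
    n = len(words) - 1
    best, best_score = None, float("-inf")
    for choice in product(range(n + 1), repeat=n):
        heads = dict(enumerate(choice, start=1))
        if not is_tree(heads):
            continue
        s = score(w, tree_features(words, heads))
        if s > best_score:
            best, best_score = heads, s
    return best

def train(data: List[Tuple[List[str], Tree]], epochs: int) -> Counter:
    w: Counter = Counter()                      # w^(0) = 0
    for _ in range(epochs):
        for words, gold in data:
            pred = decode(words, w)
            if pred != gold:                    # update only on mistakes
                w.update(tree_features(words, gold))
                w.subtract(tree_features(words, pred))
    return w

data = [(["<root>", "John", "saw", "Mary"], {1: 2, 2: 0, 3: 2})]
w = train(data, epochs=3)
predicted = decode(data[0][0], w)
```

On this single training sentence one update is enough for the decoder to recover the gold tree.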
-
Dependency Parsing as a Constraint Satisfaction problem
The last two approaches were data-driven
-
Constraint Satisfaction
Uses Constraint Dependency Grammar (CDG)
Grammatical rules are given as constraints on word-to-word modification
Parsing is formalized as a constraint satisfaction problem
Parsing is an eliminative process rather than a constructive process
Constraint satisfaction removes values that contradict constraints
-
Constraint Propagation
Three steps
Step 1. Form initial constraint network using a core grammar
Step 2. Remove local inconsistencies
Step 3. If ambiguity remains, add new constraints and repeat step 2.
-
Constraint Propagation Example
Sentence: Put the block on the floor on the table in the room
Simplified representation: V1 NP2 PP3 PP4 PP5
Correct analysis
-
Initial Constraints
-
Adding New Constraints
Still 14 possible analyses.
Introduce more constraints.
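The count of 14 can be reproduced by brute force under one plausible reading of the initial constraints. The constraint tables themselves were figures and are not in this transcript, so the rules below are assumptions: NP2 modifies V1, each PP modifies some earlier word, and no two arcs may cross.

```python
from itertools import product
from typing import Dict, List, Tuple

CATS = {1: "V", 2: "NP", 3: "PP", 4: "PP", 5: "PP"}  # V1 NP2 PP3 PP4 PP5

def crosses(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if arcs a and b cross when drawn above the sentence."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1

def analyses() -> List[Dict[int, int]]:
    """Enumerate PP attachments that survive the assumed constraints."""
    pps = [d for d, c in CATS.items() if c == "PP"]
    found = []
    for heads in product(*[range(1, d) for d in pps]):  # each PP modifies an earlier word
        arcs = [(1, 2)] + list(zip(heads, pps))          # NP2 is attached to V1
        if not any(crosses(a, b) for a in arcs for b in arcs):
            found.append(dict(zip(pps, heads)))
    return found

remaining = analyses()
```

Of the 2 * 3 * 4 = 24 unconstrained attachments, 14 remain once crossing arcs are eliminated, which is consistent with the number on the slide.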
-
Modified Tables
No object can be on two different objects.
Once every arc has been resolved, we get a unique dependency tree
-
Example from a Sanskrit Sentence: POS along with dependency
Consider the sentence:
rāmaḥ vanaṃ gacchati
1. rāmaḥ = rāma {masc.} {sg.} {nom.}
2. rāmaḥ = rā {pr.} {1p} {pl.}
3. vanaṃ = vana {neu.} {sg.} {nom.}
4. vanaṃ = vana {neu.} {sg.} {acc.}
5. gacchati = gam {pr.} {3p.} {sg.}
6. gacchati = gam {pr. part.} {masc.} {sg.} {loc.}
7. gacchati = gam {pr. part.} {neu.} {sg.} {loc.}
-
Example from a Sanskrit Sentence
Figure: Possible Relations
-
Example from a Sanskrit Sentence
Figure: Adjacency and Possible paths
A path P is a sequence of edges which connects the nodes from S to F.
-
Example from a Sanskrit Sentence
Figure: A sample Path
For example, S-1-3-5-F is a path.
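Candidate paths can be enumerated by picking one morphological analysis per word. The sketch below assumes that every analysis of a word is adjacent to every analysis of the next word, since the adjacency figure is not reproduced in this transcript. The analysis numbers are the ones used in the list on the earlier slide.

```python
from itertools import product
from typing import Dict, List

# Analysis numbers per word, in sentence order.
analyses: Dict[str, List[int]] = {
    "rāmaḥ": [1, 2],
    "vanaṃ": [3, 4],
    "gacchati": [5, 6, 7],
}

def all_paths(analyses: Dict[str, List[int]]) -> List[str]:
    """Every S-to-F path that visits exactly one analysis of each word."""
    return ["-".join(["S", *map(str, combo), "F"])
            for combo in product(*analyses.values())]

paths = all_paths(analyses)
```

Under that assumption there are 2 * 2 * 3 = 12 candidate paths, and S-1-3-5-F is one of them.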
-
How Does the Parser Work?
Figure: Adjacency and Possible paths
Figure: Relations