TRANSCRIPT
-
Ph.D. Thesis Oral Defense
Stanford Artificial Intelligence Laboratory (SAIL)
Valentin I. Spitkovsky
Grammar Induction and Parsing with Dependency-and-Boundary Models
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 1 / 60
-
The Problem
Problem: Unsupervised Parsing (and Grammar Induction)
[dependency parse]
NN      NNS       VBD   IN   NN         ♦
Factory payrolls  fell  in   September  .
Input: Raw Text (Sentences, Tokens and POS-tags)
... By most measures, the nation’s industrial sector is now
growing very slowly — if at all. Factory payrolls fell in
September. So did the Federal Reserve ...
Output: Syntactic Structures (and a Probabilistic Grammar)
-
The Problem
Motivation: Unsupervised (Dependency) Parsing
Insert your favorite reason(s) why you’d like to parse anything in the first place...
... adjust for any data without reference treebanks — i.e., understudied languages or genres (e.g., legal).
Potential applications:
◮ machine translation — word alignment, phrase extraction, reordering;
◮ web search — retrieval, query refinement;
◮ question answering, speech recognition, etc.
-
The Problem
Evaluation: Directed Dependency Accuracy
Scoring example:
[dependency parse]
NN      NNS       VBD   IN   NN         ♦
Factory payrolls  fell  in   September  .
Directed Score: 3/5 = 60% (left/right-branching baselines: 2/5 = 40%);
Undirected Score: 4/5 = 80% (left/right-branching baselines: 4/5 = 80%).
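The directed/undirected scoring above is easy to state as code. A minimal sketch (not the thesis implementation), assuming parses are encoded as per-token head indices, with 0 standing for the root ♦ and tokens numbered from 1:

```python
# Hedged sketch of directed vs. undirected dependency accuracy; the head-index
# encoding (0 = root, 1-based token positions) is an illustrative assumption.

def dependency_accuracy(gold_heads, guess_heads):
    """Return (directed, undirected) accuracy of guessed dependency arcs."""
    assert len(gold_heads) == len(guess_heads)
    n = len(gold_heads)
    # directed: the guessed head must match exactly
    directed = sum(g == p for g, p in zip(gold_heads, guess_heads)) / n
    # undirected: credit an arc even if its direction is flipped
    gold_arcs = {frozenset((h, i + 1)) for i, h in enumerate(gold_heads)}
    undirected = sum(frozenset((h, i + 1)) in gold_arcs
                     for i, h in enumerate(guess_heads)) / n
    return directed, undirected
```

A guess that attaches one word to the right dependent but in the wrong direction loses a directed point while keeping the undirected one, which is how a 3/5 directed score can coexist with a 4/5 undirected score.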
-
Prior Art
Prior Art: A Brief History
1992 — word classes (Carroll and Charniak)
1998 — greedy linkage via mutual information (Yuret)
2001 — iterative re-estimation with EM (Paskin)
2004 — right-branching baseline; valence (DMV) (Klein and Manning)
2004 — annealing techniques (Smith and Eisner)
2005 — contrastive estimation (Smith and Eisner)
2006 — structural biasing (Smith and Eisner)
2007 — common cover link representation (Seginer)
2008 — logistic normal priors (Cohen et al.)
2009 — lexicalization and smoothing (Headden et al.)
2009 — soft parameter tying (Cohen and Smith)
-
Prior Art
Prior Art: Dependency Model with Valence
a head-outward model, with word classes and valence/adjacency (Klein and Manning, 2004)
[diagram: a head h (e.g., a verb) generates arguments a1, a2, ... (e.g., noun phrases) head-outward, then STOP]
$$
P(t_h) = \prod_{dir \in \{L,R\}} P_{\text{STOP}}\big(c_h, dir, \overbrace{\mathbb{1}_{n=0}}^{adj}\big) \prod_{i=1}^{n} P(t_{a_i})\, P_{\text{ATTACH}}(c_h, dir, c_{a_i}) \Big(1 - P_{\text{STOP}}\big(c_h, dir, \overbrace{\mathbb{1}_{i=1}}^{adj}\big)\Big),
\qquad n = |\mathrm{args}(h, dir)|
$$
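The DMV factorization above can be read off as a recursive scorer. A minimal sketch under assumed toy encodings — `p_stop`, `p_attach`, and the (class, arguments) tree representation are all illustrative, not the thesis code:

```python
import math

# Hedged sketch of the DMV scoring formula above.
# p_stop[(class, dir, adj)] and p_attach[(class, dir, arg_class)] are assumed
# toy probability tables; a tree is (word_class, {'L': [subtrees], 'R': [...]}),
# with each direction's arguments listed from closest to the head outward.

def tree_log_prob(head_class, args, p_stop, p_attach):
    """log P(t_h): stop/attach decisions in each direction, head-outward."""
    logp = 0.0
    for direction in ('L', 'R'):
        deps = args.get(direction, [])
        for i, (arg_class, arg_args) in enumerate(deps):
            adjacent = (i == 0)  # the 1_{i=1} adjacency indicator
            # choose NOT to stop, then attach an argument of this class
            logp += math.log(1.0 - p_stop[(head_class, direction, adjacent)])
            logp += math.log(p_attach[(head_class, direction, arg_class)])
            # the argument recursively heads its own subtree: P(t_{a_i})
            logp += tree_log_prob(arg_class, arg_args, p_stop, p_attach)
        # finally stop; adjacent iff no arguments were generated (1_{n=0})
        logp += math.log(p_stop[(head_class, direction, len(deps) == 0)])
    return logp
```

For instance, a VBD head with a single NNS argument on its left multiplies one continue-and-attach decision, the NNS subtree's two stop decisions, and the VBD head's two final stops.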
-
Prior Art
Prior Art: Unsupervised Learning Engine
parameter fitting via expectation-maximization (EM)
◮ by means of inside-outside re-estimation (Baker, 1979)
[figure: inside (β) and outside (α) probabilities for a nonterminal N^j spanning w_p ... w_q of sentence w_1 ... w_m (Manning and Schütze, 1999)]
-
Prior Art
Prior Art: The Standard Corpus — WSJ10
[plot: sentences (1,000s) and tokens (1,000s) in each WSJk gradation, for sentence-length cutoffs k up to 45, with WSJ10 highlighted]
-
Outline
Outline
Introduction to the Grammar Induction Problem
◮ The Inputs and Outputs (text → dependency parses)
◮ A Probabilistic Model (DMV)
◮ Parameter Fitting (EM)
Part I: Non-Convex Optimization
Part II: Parsing Constraints
Part III: Generative Models
State-of-the-Art Integrated Solution (in submission)
-
Outline
Optimization
Viterbi Training
Baby Steps
Lateen EM
-
Issues...
Issue I: Why so little data?
[plot: directed dependency accuracy (%) on WSJ40 vs. training gradation WSJk, for the Uninformed and Oracle initializers]
-
Issues...
Issue II: Non-convex objectives...
maximizing the probability of data (sentence strings):

$$\hat{\theta}_{\text{UNS}} = \arg\max_{\theta} \sum_{s} \log \underbrace{\sum_{t \in T(s)} P_{\theta}(t)}_{P_{\theta}(s)}$$

supervised objective would be convex:

$$\hat{\theta}_{\text{SUP}} = \arg\max_{\theta} \sum_{s} \log P_{\theta}(t^{*}(s))$$

could optimize likeliest parse trees (also non-convex):

$$\hat{\theta}_{\text{VIT}} = \arg\max_{\theta} \sum_{s} \max_{t \in T(s)} \log P_{\theta}(t)$$
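The unsupervised and Viterbi objectives differ only in how they aggregate over the candidate trees T(s). A toy sketch, assuming each sentence's tree probabilities P_θ(t) are simply enumerated:

```python
import math

# Hedged sketch contrasting the two non-convex objectives above; the
# enumerated tree probabilities are illustrative toy numbers.

def soft_objective(tree_probs_per_sentence):
    # sum_s log sum_{t in T(s)} P_theta(t) -- classic EM's objective
    return sum(math.log(sum(probs)) for probs in tree_probs_per_sentence)

def hard_objective(tree_probs_per_sentence):
    # sum_s max_{t in T(s)} log P_theta(t) -- Viterbi EM's objective
    return sum(math.log(max(probs)) for probs in tree_probs_per_sentence)

# two sentences, with 2 and 3 candidate trees respectively
corpus = [[0.10, 0.02], [0.05, 0.04, 0.01]]
# soft_objective(corpus) = log 0.12 + log 0.10
# hard_objective(corpus) = log 0.10 + log 0.05
```

The hard objective is a lower bound on the soft one, since a max never exceeds the corresponding sum.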
-
Issues... Classic EM
Classic EM:
[plot: Classic EM — directed dependency accuracy (%) on WSJ40 vs. WSJk, against the Uninformed and Oracle curves]
-
Issues... Viterbi EM
Viterbi EM:
[plot: Viterbi EM — directed dependency accuracy (%) on WSJ40 vs. WSJk, against the Uninformed and Oracle curves]
-
Issues... Viterbi EM
Hard vs. Soft EM:
Viterbi EM: zoom in on likeliest tree
◮ does not degrade with input length
◮ better retains supervised solutions
◮ actually learns from scratch
— see also Cohen and Smith (2010)
Classic EM: “focus across the board”
◮ hard to see the trees for the forest...
... but known to work with shorter inputs!
-
Baby Steps
Idea I: Baby Steps ... as Non-convex Optimization
start with an easy (convex) case
slowly extend it to the fully complex target task
take tiny (cautious) steps in the problem space
... try not to stray far from relevant neighborhoods in the solution space
base case: sentences of length one (trivial — no init)
incremental step: smooth WSJk; re-init WSJ(k + 1)
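The base case and incremental step above amount to a simple curriculum loop. A control-flow sketch with hypothetical `train_em` and `smooth` helpers — stand-ins for illustration, not the thesis code:

```python
# Hedged sketch of the Baby Steps schedule. `train_em(corpus, init=...)`
# (fits a model, optionally from an initializer) and `smooth(model)` are
# assumed helper functions; only the control flow is the point here.

def baby_steps(sentences, max_len, train_em, smooth):
    """Train on WSJ1, then re-initialize WSJ(k+1) from a smoothed WSJk model."""
    model = None  # base case needs no initializer: length-1 trees are trivial
    for k in range(1, max_len + 1):
        wsj_k = [s for s in sentences if len(s) <= k]  # the WSJk gradation
        model = train_em(wsj_k, init=smooth(model) if model else None)
    return model
```

Each step enlarges the data only slightly, so the previous solution stays a relevant starting point for the next, harder problem.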
-
Baby Steps
Idea I: Baby Steps ... as Graduated Learning
WSJ1 — Atone (verbs!)
WSJ2 — It is. (pronouns!)
       Darkness fell.
WSJ3 — Become a Lobbyist (determiners!)
       But many have.
       They didn’t.
-
Baby Steps
Idea I: Baby Steps ... and Related Notions
Psychology / Cognitive Science:
shaping (Skinner, 1938)
less is more (Kail, 1984; Newport, 1988; 1990)
starting small (Elman, 1993)
◮ scaffold on model complexity [restrict memory]
◮ scaffold on data complexity [restrict input]

NLP / AI:
stepping stones (Brown et al., 1993)
coarse-to-fine (Charniak and Johnson, 2005)
curriculum learning (Bengio et al., 2009)

Math / OR:
continuation methods (Allgower and Georg, 1990)
deterministic annealing (Rose, 1998)

successive approximations!
-
Baby Steps
Aside I: Speaking of successive approximations...
-
Baby Steps
Idea I: Baby Steps ... Observations & Results!
[plot: directed dependency accuracy (%) on WSJ40 vs. WSJk — Baby Steps and the “Less is More” run (K&M∗) against the Uninformed and Oracle curves]
-
Leapfrog
Idea II: Less is More & Leapfrog ... a Hack!
[plot: directed dependency accuracy (%) on WSJk — Leapfrog against Baby Steps, K&M∗, and the Uninformed and Oracle curves]
-
Leapfrog
Aside II: Also, to be fair...
[plot: undirected dependency accuracy (%) on WSJk for the Uninformed baseline]
-
Lateen EM
Lateen EM: Intuition
Use both objectives (a primary and a secondary).
As a captain can’t count on favorable winds, so an unsupervised learner can’t rely on co-operative gradients.
Lateen strategies de-emphasize fixed points, e.g., by tacking around local attractors, in a zig-zag fashion.
-
Lateen EM
Lateen EM: Simple Algorithm
Alternate ordinary soft and hard EM algorithms: switching when stuck helps escape local optima.
E.g., Italian grammar induction improves from 41.8% to 56.2% after three lateen alternations:
[plot: bits per token (bpt) vs. iteration, over alternating hard- and soft-EM phases across roughly 300 iterations]
Pumping action:
– hard EM pushes down the top curve (primary objective);
– soft EM pushes down the bottom curve (the secondary objective), often at the expense of the primary.
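The simple lateen alternation can be sketched as a loop over EM runs. `run_em` below is a hypothetical helper that runs one EM variant to a fixed point and reports the primary objective (lower is better, e.g., bits per token); only the switching logic is from the slide:

```python
# Hedged sketch of simple lateen EM: alternate soft and hard EM, switching
# whenever the current variant gets stuck. `run_em(model, corpus, mode)` is
# an assumed helper returning (new_model, primary_objective).

def lateen_em(model, corpus, run_em, max_alternations=10, tol=1e-6):
    """Alternate soft and hard EM; stop when an alternation stops helping."""
    best = float('inf')
    mode = 'soft'
    for _ in range(max_alternations):
        model, objective = run_em(model, corpus, mode)  # one EM to a fixed point
        if objective > best - tol:  # stuck: this alternation didn't improve
            break
        best = objective
        mode = 'hard' if mode == 'soft' else 'soft'  # tack to the other objective
    return model
```

Each switch hands the other objective a starting point at the current fixed point, which is what produces the pumping action in the plot.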
-
Lateen EM
Lateen EM: Early-Stopping
Use one objective to validate moves proposed by the other: stop if the secondary objective gets worse.
30% faster, on average, than either standard EM.
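Early stopping with a second objective takes only a few lines. A sketch assuming a hypothetical stream of EM steps, each reporting the candidate model and both objectives (lower is better):

```python
# Hedged sketch of lateen early-stopping: the primary objective proposes
# moves, the secondary objective vetoes them. The (model, primary, secondary)
# step stream is an assumed interface for illustration.

def early_stopping_em(steps):
    """Accept EM steps until the secondary objective starts getting worse."""
    model, best_secondary = None, float('inf')
    for candidate, _primary, secondary in steps:
        if secondary > best_secondary:  # the other view vetoes this move
            break
        model, best_secondary = candidate, secondary
    return model
```

Cutting off the long tail of ever-smaller steps is what makes this variant roughly 30% faster than running either EM to convergence.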
-
Lateen EM
Optimization: Summary
Take guesswork out of tuning EM:
◮ Lateen EM ties termination to a sign change;
◮ Viterbi EM and Baby Steps don’t require initializers.
Exploit multiple views and iterate (Blum and Mitchell, 1998):
◮ Lateen EM optimizes for Viterbi parse trees and forests;
◮ Baby Steps focuses on simpler sentences in the data.
Capitalize on EM’s advantages:
◮ guarantee to not harm a primary likelihood;
◮ begin with large steps in a parameter space.
Ameliorate EM’s disadvantages:
◮ avoid taking disproportionately many (and ever-smaller) steps to approach a likelihood’s fixed point;
◮ escape by changing objectives and mixing solutions!
-
Lateen EM
Constraints
Web Markup
Punctuation
Capitalization
-
Overview
Constraints: Supervised and Unsupervisedcompact summaries of high-level insights into a domain— can significantly reduce the search space— often easier to enforce than to model
relevant to unsupervised learning (less rope to hang self)— in general, steer at the “right” regularities in data— linguistic structure underdetermined by raw text
partial bracketings (Pereira and Schabes, 1992)
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 28 / 60
-
Overview
Constraints: Supervised and Unsupervised
compact summaries of high-level insights into a domain
— can significantly reduce the search space
— often easier to enforce than to model
relevant to unsupervised learning (less rope to hang oneself)
— in general, steer learning toward the “right” regularities in data
— linguistic structure is underdetermined by raw text
partial bracketings (Pereira and Schabes, 1992)
synchronous grammars (Alshawi and Douglas, 2000)
linear-time parsing, skewness of trees, etc. (Seginer, 2007)
sparse posterior regularization (Ganchev et al., 2009)
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 28 / 60
-
Linguistic Analysis
Syntax of Markup: Common Constituents
..., but [S [NP the Toronto Star] [VP reports [NP this] [PP in the softest possible way], [S stating ...]]]
S → NP VP
  → DT NNP NNP VBZ NP PP S
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 29 / 60
-
Linguistic Analysis
Syntax of Markup: Constituent Productions
Production %
NP → NNP NNP 9.6
NP → NNP 4.6
NP → NP PP 3.4
NP → NNP NNP NNP 2.4
NP → DT NNP NNP 2.1
NP → NN 1.8
NP → DT NNP NNP NNP 1.7
NP → DT NN 1.7
NP → DT NNP NNP 1.6
S → NP VP 1.4
NP → DT NNP NNP NNP 1.2
NP → DT JJ NN 1.1
NP → NNS 1.0
NP → JJ NN 0.8
NP → NP NP 0.8
Total 35.3
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 30 / 60
-
Linguistic Analysis
Syntax of Markup: Common Dependencies
..., but [S [NP the Toronto Star] [VP reports [NP this] [PP in the softest possible way], [S stating ...]]]
DT NNP NNP VBZ DT IN DT JJS JJ NN
DT NNP VBZ
“the Star reports”
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 31 / 60
-
Linguistic Analysis
Syntax of Markup: Head-Outward Spawns
Spawn %
NNP 24.4
NN 8.1
DT NNP 6.1
DT NN 5.9
NNS 4.5
NNPS 1.4
VBG 1.3
NNP NNP NN 1.2
VBD 1.0
IN 1.0
VBN 1.0
DT JJ NN 0.9
VBZ 0.9
POS NNP 0.9
JJ 0.8
Total 59.4
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 32 / 60
-
Transition Formatting
Formatting:
strong connection between markup and syntax
yields a suite of accurate parsing constraints
improves grammar induction over web data, but...
◮ works better with more natural-language resources!
missing structural cues:
— e.g., punctuation (will make heavy use)
and capitalization (will not discuss today)
raw word streams are often difficult even for humans
— e.g., transcribed utterances (Kim and Woodland, 2002)
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 33 / 60
-
Example Raw Text
Example: Raw Word Stream
ALTHOUGH IT PROBABLY HAS REDUCED THE LEVEL OF EXPENDITURES FOR SOME PURCHASERS UTILIZATION MANAGEMENT LIKE MOST OTHER COST CONTAINMENT STRATEGIES DOESN’T APPEAR TO HAVE ALTERED THE LONG-TERM RATE OF INCREASE IN HEALTH-CARE COSTS THE INSTITUTE OF MEDICINE AN AFFILIATE OF THE NATIONAL ACADEMY OF SCIENCES CONCLUDED AFTER A TWO-YEAR STUDY
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 34 / 60
-
Example Formatted Text
Example:
[SBAR Although it probably has reduced the level of expenditures for some purchasers], [NP utilization management] — [PP like most other cost containment strategies] — [VP doesn’t appear to have altered the long-term rate of increase in health-care costs], [NP the Institute of Medicine], [NP an affiliate of the National Academy of Sciences], [VP concluded after a two-year study].
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 35 / 60
-
Example Strong Assumption
Intuition:
strong constraint: (head ← head) in training
Other countries , including West Germany ,
may have a hard time justifying continued membership .
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 36 / 60
-
Example Weak Assumption
Intuition:
weak constraint: (head ← external word) in inference
IFI also has nonvoting preferred shares ,
which are quoted on the Milan stock exchange .
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 37 / 60
-
Linguistic Analysis Constituents
Linguistic Analysis:
punctuation and syntax are related (Nunberg, 1990; Briscoe, 1994; Jones, 1994; Doran, 1998, inter alia)
49.4% of inter-punctuation fragments are constituents
many fragments derive precisely themselves:
Fragment %
IN 9.6
NN 7.2
NNP 6.3
CD 3.8
VBD 3.2
VBZ 3.0
RB 2.8
VBG 2.2
VBP 1.9
NNS 1.8
WDT 1.6
MD 1.1
VBN 1.1
IN VBD 1.0
JJ 0.7
Total 52.8
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 38 / 60
-
Linguistic Analysis Strong Dependencies
Training:
strong constraint, e.g.,
... arrests followed a “ Snake Day ” at Utrecht ...
— already 74.0% agreement with head-percolated trees.
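One operational reading of the strong constraint can be sketched as a checker over candidate dependency trees (this is my illustration, not the thesis code): each inter-punctuation fragment may connect to the rest of the sentence through at most one word, its internal head. Token indices in the example are toy values.

```python
# Hedged sketch: verify that every inter-punctuation fragment of a sentence
# attaches to the outside through at most one word (its head).

def respects_fragments(parents, fragments):
    """parents[i] = index of token i's head (-1 for the root);
    fragments = list of (start, end) half-open token spans."""
    for start, end in fragments:
        external = [i for i in range(start, end)
                    if not (start <= parents[i] < end)]
        if len(external) > 1:      # two or more outside attachments
            return False
    return True

# Toy tree: the fragment spanning tokens 3-4 hangs off the rest of the
# sentence via token 4 alone, so the constraint is satisfied.
parents = [1, -1, 4, 4, 1, 4, 5]
print(respects_fragments(parents, [(3, 5)]))  # True
```

A tree in which two words of the same fragment both attach outside it would be rejected, which is exactly the kind of hypothesis the constraint prunes during training.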
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 39 / 60
-
Linguistic Analysis Weak Dependencies
Inference:
weak constraint, e.g.,
Maryland Club also distributes tea , which ...
— now 92.9% agreement with head-percolated trees!
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 40 / 60
-
Linguistic Analysis Weak Dependencies
Modeling
Context-Sensitive Unsupervised Tags
Dependency-and-Boundary Models
Reduced Models of Grammar
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 41 / 60
-
Gold Tags Unlexicalized Tokens
Example: Actually, state-of-the-art uses gold tags!
IN PRP RB VBZ VBN DT NN IN NNS IN DT NNS NN NN IN RBS JJ NN NN NNS VBZ RB VB TO VB VBN DT JJ NN IN NN IN JJ NNS DT NNP IN NNP DT NN IN DT NNP NNP IN NNPS VBD IN DT JJ NN
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 42 / 60
-
Gold Tags Unlexicalized Tokens
Word Categories: Two benefits of tag-sets...
Grouping: pooling statistics of words that play similar syntactic roles improves generalization (reduces sparsity):
◮ indeed, working directly with words does not do well.
Disambiguation: for words that take on multiple parts of speech, knowing gold tags limits the parsing search space:
◮ reducing manual tags to monosemous clusterings lowers performance below that of the unsupervised categories constructed by Finkel and Manning (2009) / Clark (2003).
Can improve with context-sensitive unsupervised clusters!
1 start with a hard assignment; (standard word clustering)
2 inject context-colored noise; (get out of the local optimum)
3 Viterbi-train a bitag HMM. (the unTagger)
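Step 3 can be sketched in miniature. The code below (a simplified illustration under my assumptions, not the thesis "unTagger") alternates best-path decoding of a bitag HMM with re-estimation of transition and emission counts from the decoded paths; smoothing, the floor value, and the noise injection of step 2 are placeholders.

```python
# Hedged sketch of Viterbi-training a bitag HMM: decode with the current
# log-probability tables, then re-estimate (unsmoothed) log-counts.

import math
from collections import Counter

def viterbi(sent, tags, trans, emit, floor=-10.0):
    """Single best tag sequence; trans/emit map tuples to log-probs."""
    V = [{t: emit.get((t, sent[0]), floor) for t in tags}]
    back = []
    for w in sent[1:]:
        col, bp = {}, {}
        for t in tags:
            p = max(tags, key=lambda q: V[-1][q] + trans.get((q, t), floor))
            col[t] = V[-1][p] + trans.get((p, t), floor) + emit.get((t, w), floor)
            bp[t] = p
        V.append(col)
        back.append(bp)
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return path[::-1]

def hard_em_step(corpus, tags, trans, emit):
    """One hard-EM update: Viterbi-decode, then collect counts."""
    tc, ec = Counter(), Counter()
    for sent in corpus:
        path = viterbi(sent, tags, trans, emit)
        ec.update(zip(path, sent))        # (tag, word) emissions
        tc.update(zip(path, path[1:]))    # (tag, tag) transitions
    return ({k: math.log(v) for k, v in tc.items()},
            {k: math.log(v) for k, v in ec.items()})

# Toy decode: with "D" favored for "the" and "N" for "dog", D->N wins.
trans = {("D", "N"): 0.0}
emit = {("D", "the"): 0.0, ("N", "dog"): 0.0}
print(viterbi(["the", "dog"], ["D", "N"], trans, emit))  # ['D', 'N']
```

Iterating `hard_em_step` until the decoded paths stop changing is the hard-assignment analogue of EM used throughout this part of the thesis.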
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 43 / 60
-
Gold Tags Unlexicalized Tokens
Word Categories: Breaking the barrier...
Swapped out gold tags for unsupervised word categories:
System Description Accuracy
“punctuation” with gold tags                          58.4
“punctuation” with monosemous induced tags            58.2 (−0.2)
“punctuation” with context-sensitive induced tags     59.1 (+0.7)
◮ only a small drop from switching to (monosemous) unsupervised clusters — previous systems lost ∼5 pts;
◮ first state-of-the-art system to improve with (context-sensitive) unsupervised clusters!
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 44 / 60
-
Dependency-and-Boundary Models
Dependency-and-Boundary Models (DBMs):
Use boundary cues in head-driven dependency grammars.
E.g., induce structure by working inwards from edges:
DT    NN   VBZ  IN  DT    NN
 |     |    |    |   |     |
[The check]  is  in [the mail].
(Subject NP)        (Object NP)
◮ learn from left fringe (determiner DT) to parse object NP
◮ based on right fringe (noun NN), correctly parse subject NP
◮ between them, glean make-up of larger phrases (e.g., VP)
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 45 / 60
-
Dependency-and-Boundary Models
Dependency-and-Boundary Models: Highlights
DBM-1: Use words at the fringes.
◮ truly head-outward model (Alshawi, 1996)
◮ conditions on what can more often be seen!
DBM-2: Fragments differ from complete sentences.
◮ incomplete fragments are uncharacteristically short
◮ roots of fragments are generally not verbs or modals
DBM-3: Learn to stitch together fragments.
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 46 / 60
-
Dependency-and-Boundary Models
Class-based, head-outward generation (Alshawi, 1996)
[Diagram: a head of class ch spawns dependents cd1, cd2, ... head-outward (here dir = R), with adjacency adj = T for the first attachment and adj = F thereafter, until a STOP decision conditioned on the fringe class ce.]
PROOT(ch | comp) · PATTACH(cd | ch, dir, cross) · PSTOP(· | dir, adj, ce, comp)
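The stop/attach part of this factorization can be illustrated with a toy function (my simplification, not the thesis code): one side of a head multiplies, per dependent, a continue term and an attachment term, and ends with a stop term conditioned on adjacency and the class at the current fringe. `p_attach` and `p_stop` are placeholders for learned distributions.

```python
# Hedged sketch: probability that a head generates `deps` (inner-to-outer)
# on one side and then stops. `edges[i]` is the fringe class seen before
# the i-th decision; the last entry conditions the final STOP.

def side_prob(head, deps, edges, p_attach, p_stop):
    p, adj = 1.0, True
    for dep, ce in zip(deps, edges):
        p *= (1.0 - p_stop(adj, ce)) * p_attach(dep, head)
        adj = False            # only the first decision is adjacent
    return p * p_stop(adj, edges[-1])

# Uniform toy numbers: one dependent gives (1 - 0.5) * 0.5 * 0.5 = 0.125.
p = side_prob("VBZ", ["NN"], ["NN", "NN"],
              lambda d, h: 0.5, lambda a, c: 0.5)
```

Conditioning the stop term on the fringe class `ce` rather than the head is what distinguishes the boundary models from a purely head-driven factorization.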
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 47 / 60
-
Reduced Models
How to better exploit more data?
more text in long sentences, but those can be hard...
◮ require Viterbi training, punctuation constraints, etc.
could we “start small” and still use more data?
◮ ... and wouldn’t it be nice if we could just split things up!
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 48 / 60
-
Reduced Models What If
What if we chopped up input at punctuation?
impact on quantity of data (with a 15-token threshold):
◮ more and simpler word sequences incorporated earlier
◮ much denser coverage of available data:
⋆ number of training inputs goes up 2.2x
⋆ number of tokens increases 4.3x
but also an impact on quality of data:
◮ many fewer complete sentences exhibiting full structure
◮ even less representative than short inputs:
⋆ but mostly phrases and clauses...
... however, we have an appropriate model family!
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 49 / 60
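The split itself is trivial to sketch. A hedged illustration in Python (the punctuation inventory and whitespace tokenization here are simplified stand-ins; the actual experiments operate on POS-tagged WSJ text):

```python
import re

# Split a training input at punctuation, keeping only fragments
# at or under a 15-token threshold.
THRESHOLD = 15

def fragments(sentence, threshold=THRESHOLD):
    pieces = re.split(r"[,;:.()!?\"]+", sentence)
    frags = [p.split() for p in pieces if p.split()]
    return [f for f in frags if len(f) <= threshold]

text = "Factory payrolls fell in September, so did the Federal Reserve."
for frag in fragments(text):
    print(" ".join(frag))
```

On this toy input the one long sentence becomes two short fragments, which is exactly the quantity-for-quality trade described above.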
-
Reduced Models Example Continued
Example (cont’d)

  length & type        left & right
  complete      51 S        IN   NN
  incomplete    12 SBAR     IN   NNS
                 2 NP       NN   NN
                 6 PP       IN   NNS
                14 VP       VBZ  NNS
                 4 NP       DT   NNP
                 8 NP       DT   NNPS
                 5 VP       VBD  NN

DBM-1, DBM-2: complete inputs
reduced model DBM-0: incomplete fragments
DBM-3: partial parse forests, “easy-first” (Goldberg and Elhadad, 2010)
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 50 / 60
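The table keys each fragment by its length and the POS tags at its left and right boundaries. A minimal sketch of extracting those features from a POS-tagged fragment (the function name and feature dictionary are illustrative, not the thesis implementation):

```python
# Extract the (length, left-boundary, right-boundary) features that
# boundary models condition on, from one POS-tagged fragment.
def boundary_features(tagged):
    """tagged: list of (word, POS) pairs for one fragment."""
    tags = [pos for _, pos in tagged]
    return {"length": len(tags), "left": tags[0], "right": tags[-1]}

fragment = [("Factory", "NN"), ("payrolls", "NNS"),
            ("fell", "VBD"), ("in", "IN"), ("September", "NN")]
print(boundary_features(fragment))
# {'length': 5, 'left': 'NN', 'right': 'NN'}
```

Boundary tags are observable even when a fragment is not a complete sentence, which is what lets the reduced model exploit inter-punctuation fragments.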
-
Reduced Models Example Continued
Integrated System
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 51 / 60
-
Reduced Models Example Continued
System: Combiners... (from Leapfrog)
[diagram: combiner — two models C1 and C2 are lexicalized, C∗1 = L(C1) and C∗2 = L(C2); each input is decoded (LD) by both, and the higher-scoring parse is kept (arg max over LD scores), yielding the combination C∗1 + C∗2 = C+]
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 52 / 60
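The per-input arg-max combination can be sketched as follows. This is a toy stand-in: the "parses" are placeholder values and the scoring functions are hypothetical, standing in for the lexicalized models' log-probabilities:

```python
# Sketch of a Leapfrog-style combiner: given two models' candidate
# parses for the same inputs and per-parse scores (e.g. lexicalized
# log-probabilities), keep the higher-scoring candidate per input.
def combine(parses1, parses2, score1, score2):
    combined = []
    for p1, p2 in zip(parses1, parses2):
        combined.append(p1 if score1(p1) >= score2(p2) else p2)
    return combined

# Toy stand-ins: "parses" are strings, scores are fake log-probs.
s1 = lambda p: -len(p)       # model 1's score
s2 = lambda p: -2 * len(p)   # model 2 penalizes length twice as much
print(combine(["abc", "de"], ["ab", "d"], s1, s2))
```

The design point is that combination happens per input, so each model contributes wherever its own scoring is more confident.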
-
Reduced Models Example Continued
System: Inductors... (from Baby Steps and Lateen EM)
[diagram: inductor pipeline over data D — stages HL·DBM, S·DBM and S·DBM0 produce classifiers C1, C2 and their retrained versions C′1, C′2 (with components C, F, S and initial data D0); stage l operates on the split D(l)split and hands off to D(l+1)split]
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 53 / 60
-
Reduced Models Example Continued
System: Iterating... (from Baby Steps, Less is More, etc.)
[diagram: iterate stages l = 1, 2, ..., 14, 15 — each stage trains HL·DBM on the split data D(l+1)split, starting from the previous classifier Cl (initially ∅) to produce Cl+1; final runs train HL·DBM on D45split and on D45C]
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 54 / 60
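The staged iteration is, at its core, a curriculum loop over length thresholds, warm-starting each stage from the previous model. A skeletal sketch (the `train_em` placeholder and toy corpus are hypothetical; a real stage would run EM over DBM on the thresholded split):

```python
# Skeletal curriculum loop in the spirit of Baby Steps / Less is More:
# grow the maximum input length, warm-starting each stage from the
# previous stage's model.
def train_em(model, corpus):
    # placeholder for one EM training stage: records its warm start
    # and how many inputs it saw
    return {"init": model, "n_inputs": len(corpus)}

corpus = [["NN", "VBD"],
          ["NN", "NNS", "VBD", "IN", "NN"],
          ["DT", "NN", "VBZ", "RB", "VBG"]]

model = None
for max_len in range(1, 16):                       # thresholds 1..15
    stage = [s for s in corpus if len(s) <= max_len]
    if stage:
        model = train_em(model, stage)

print(model["n_inputs"])  # all 3 toy inputs fit once max_len >= 5
```

Early stages see only the shortest (easiest) inputs; later stages see progressively more data, each initialized by its predecessor rather than at random.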
-
Reduced Models Example Continued
System: Results ... on Section 23 of WSJ
Right-Branching (Klein and Manning, 2004)         31.7%
DMV @10                                           34.2%
Baby Steps @15                                    39.2%
Baby Steps @45                                    39.4%
Soft Parameter Tying (Cohen and Smith, 2009)      42.2%
Less is More @15                                  44.1%
Leapfrog @45                                      45.0%
(Gimpel and Smith, 2012)                          53.1%
(Gillenwater et al., 2010)                        53.3%
(Bisk and Hockenmaier, 2012)                      53.3%
(Blunsom and Cohn, 2010)                          55.7%
(Tu and Honavar, 2012)                            57.0%
Integrated System (no initializers or gold tags)  64.4%
V.I. Spitkovsky (Stanford & Google) Ph.D. Thesis Oral Defense Gates #415 (2013-08-14) 55 / 60