A Survey of Unsupervised Grammar Induction
Baskaran Sankaran
Senior Supervisor: Dr. Anoop Sarkar
School of Computing Science, Simon Fraser University
2
Motivation
Languages have hidden regularities
◦ karuppu naay puunaiyai thurathiyathu ("the black dog chased the cat")
◦ iruttil karuppu uruvam marainthathu ("a black figure disappeared in the dark")
◦ naay thurathiya puunai vekamaaka ootiyathu ("the cat the dog chased ran fast")
4
FORMAL STRUCTURES
5
Phrase-Structure
Sometimes the bribed became partners in the company
6
Phrase-Structure
Binarize to CNF
◦ Sparsity issue with words
◦ Use POS tags instead
[Tree diagram: binarized parse of the tagged example sentence, with intermediate @S and @VP nodes]
S → ADVP @S
@S → NP VP
VP → VBD @VP
@VP → NP PP
NP → DT VBN
NP → DT NN
NP → NNS
PP → IN NP
ADVP → RB
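The binarization step shown above is mechanical; a minimal sketch (illustrative only, not code from the survey) that introduces @-symbols the same way the slide does:

```python
def binarize(lhs, rhs):
    """Binarize one CFG rule into binary rules, introducing an
    intermediate @-symbol as on the slide, e.g.
    S -> ADVP NP VP  becomes  S -> ADVP @S  and  @S -> NP VP.
    (Rules with more than three children would need a fresh
    @-symbol at each step; this sketch reuses one name.)"""
    rules = []
    cur = lhs
    rhs = list(rhs)
    while len(rhs) > 2:
        new = "@" + lhs
        rules.append((cur, (rhs[0], new)))  # peel off the first child
        cur, rhs = new, rhs[1:]
    rules.append((cur, tuple(rhs)))
    return rules

print(binarize("VP", ["VBD", "NP", "PP"]))
```

This reproduces the @S and @VP rules listed above from the original flat rules.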
7
Evaluation Metric-1
Unsupervised Induction
◦ Binarized output tree, possibly unlabelled
Evaluation
◦ Against a gold treebank parse
◦ Recall: % of true constituents found
◦ Also precision and F-score
◦ Wall Street Journal (WSJ) dataset
[Tree diagram: induced binary tree with unlabelled (X) internal nodes and root S over the POS sequence RB DT VBN VBD NNS IN DT NN]
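Bracket recall and precision compare the span sets of the induced and gold trees; a small sketch over nested-list trees (illustrative; standard PARSEVAL scoring additionally ignores trivial whole-sentence and single-word spans):

```python
def spans(tree, i=0):
    """Collect (start, end) spans of all constituents in a
    nested-list tree whose leaves are POS-tag strings.
    Returns (end position, set of spans)."""
    if isinstance(tree, str):          # a leaf covers one position
        return i + 1, set()
    end, out = i, set()
    for child in tree:
        end, s = spans(child, end)
        out |= s
    out.add((i, end))                  # the span of this constituent
    return end, out

def bracket_scores(gold, pred):
    """Unlabelled bracket precision and recall of pred vs. gold."""
    _, g = spans(gold)
    _, p = spans(pred)
    hit = len(g & p)
    return hit / len(p), hit / len(g)

gold = [["DT", "NN"], "VBD"]           # ((DT NN) VBD)
pred = ["DT", ["NN", "VBD"]]           # (DT (NN VBD))
print(bracket_scores(gold, pred))
```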
8
Dependency Structure
[Dependency diagram for "Sometimes the bribed became partners in the company": became (VBD) is the root; Sometimes (RB), bribed (VBN) and partners (NNS) attach to became; the (DT) attaches to bribed; in (IN) attaches to partners; company (NN) attaches to in; the (DT) attaches to company]
9
Dependency Structure
[Arc diagram over the tagged sentence]
RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company
10
Evaluation Metric-2
Unsupervised Induction
◦ Generates directed dependency arcs
Evaluation
◦ Compute (directed) attachment accuracy against gold dependencies
◦ WSJ10 dataset
RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company
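Directed attachment accuracy is just the fraction of words whose predicted head matches the gold head; a minimal sketch (heads given as 1-based indices, 0 for the root):

```python
def attachment_accuracy(gold_heads, pred_heads):
    """Directed attachment accuracy: fraction of words whose
    predicted head index equals the gold head index."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# Toy 3-word sentence: word 2 gets the wrong head.
print(attachment_accuracy([2, 3, 0], [2, 1, 0]))
```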
11
Unsupervised Grammar Induction
To learn the hidden structure of a language
◦ POS tag sequences as input
◦ Generates phrase structures / dependencies
◦ No attempt to find the meaning
Overview
◦ Phrase-structure and dependency grammars
◦ Mostly on English (a few on Chinese, German, etc.)
◦ Learning restricted to shorter sentences
◦ Significantly lags behind supervised methods
12
PHRASE-STRUCTURE INDUCTION
13
Toy Example
Corpus:
  the dog bites a man      dog sleeps
  a dog bites a bone       the man sleeps
Grammar:
  S → NP VP    NP → N      N → man
  VP → V NP    Det → a     N → bone
  VP → V       Det → the   V → sleeps
  NP → Det N   N → dog     V → bites
14
EM for PCFG (Baker ’79; Lari and Young ’90)
Inside-Outside
◦ EM instance for probabilistic CFGs
◦ Generalization of Forward-Backward for HMMs
◦ Non-terminals are fixed
◦ Estimate maximum-likelihood rule probabilities
Initial grammar (all candidate rules):
  S → NP VP; NP → Det N; NP → N; VP → V; VP → V NP; VP → NP V
  Det, N, V → each of: the, a, dog, man, bone, bites, sleeps

After EM (zero-probability rules omitted):
  S → NP VP 1.0       Det → the 0.428571   N → dog 0.5     V → bites 0.5
  NP → Det N 0.875    Det → a 0.571429     N → man 0.375   V → sleeps 0.5
  NP → N 0.125                             N → bone 0.125
  VP → V 0.5
  VP → V NP 0.5
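The E-step of Inside-Outside relies on inside probabilities computed bottom-up, CYK-style. A minimal sketch of the inside pass over the learned toy grammar above (probabilities rounded as on the slide; one level of unary closure suffices for this grammar):

```python
from collections import defaultdict

# Toy PCFG learned by EM (rounded probabilities from the slide)
binary = {("S", ("NP", "VP")): 1.0,
          ("NP", ("Det", "N")): 0.875,
          ("VP", ("V", "NP")): 0.5}
unary = {("NP", "N"): 0.125}
lexical = {("Det", "the"): 0.428571, ("Det", "a"): 0.571429,
           ("N", "dog"): 0.5, ("N", "man"): 0.375,
           ("V", "bites"): 0.5}

def inside(words):
    """Inside pass: chart[i][j][A] = P(A =>* words[i:j])."""
    n = len(words)
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (A, term), p in lexical.items():
            if term == w:
                chart[i][i + 1][A] += p
        for (A, B), p in unary.items():          # unary closure, one level
            chart[i][i + 1][A] += p * chart[i][i + 1][B]
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, (B, C)), p in binary.items():
                    chart[i][j][A] += p * chart[i][k][B] * chart[k][j][C]
            for (A, B), p in unary.items():
                chart[i][j][A] += p * chart[i][j][B]
    return chart[0][n]["S"]

p = inside("the dog bites a man".split())
print(p)
```

The outside pass and the expected-count accumulation for the M-step are omitted from this sketch.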
15
Inside-Outside
Sometimes the bribed became partners in the company
[Chart diagram for the rule @S → NP VP:
  inside probability P(NP ⇒* the bribed)
  inside probability P(VP ⇒* became … company)
  rule probability P(@S → NP VP)
  outside probability P(S ⇒* Sometimes @S)]
16
Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93)
Sometimes the bribed became partners in the company
[Diagram: partial bracketing constraints over the chart for the example sentence]
17
Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93; Hwa ’99)
Treebank bracketings
◦ Bracketing boundaries constrain induction
What happens with limited supervision?
◦ More bracketed data exposed iteratively
◦ 0% bracketed data: Recall 50.0
◦ 100% bracketed data: Recall 78.0
◦ Right-branching baseline: Recall 76.0
18
Distributional clustering (Adriaans et al. ’00; Clark ’00; van Zaanen ’00)
Cluster the word sequences
◦ Context: adjacent words or boundaries
◦ Relative frequency distribution of contexts
  the black dog bites the man
  the man eats an apple
Identifies constituents
◦ Evaluation on ATIS corpus: Recall 35.6
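The core statistic here is, for every candidate span, the distribution over its local contexts; spans with similar context distributions are clustered as likely constituents. A minimal sketch of the context-collection step (a hypothetical helper, not the cited systems' code):

```python
from collections import defaultdict

def span_contexts(sentences, max_len=3):
    """For every candidate span up to max_len words, count its
    (left-word, right-word) contexts; <s> and </s> mark the
    sentence boundaries, which also serve as contexts."""
    ctx = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            for j in range(i + 1, min(i + max_len, len(toks) - 1) + 1):
                span = tuple(toks[i:j])
                ctx[span][(toks[i - 1], toks[j])] += 1
    return ctx

ctx = span_contexts(["the black dog bites the man",
                     "the man eats an apple"])
print(dict(ctx[("man",)]))
```

Clustering would then compare the relative-frequency vectors of these context counts.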
19
Constituent-Context Model (Klein and Manning ’02)
◦ Valid constituents in a tree should not cross
[Tree diagrams: two alternative binary bracketings of the POS sequence RB DT VBN VBD NNS IN DT NN, illustrating that the constituents of a valid tree do not cross]
20
Constituent-Context Model
Sometimes the bribed became partners in the company
[Tree diagram: induced bracketing over the tagged sentence]
Recall: Right-branching 70.0; CCM 81.6
21
DEPENDENCY INDUCTION
22
Dependency Model w/ Valence (Klein and Manning ’04)
Simple generative model
◦ Choose head: P(Root)
◦ Argument: P(a | h, dir)
  Attachment dir (right, left); valence (head outward)
◦ End: P(End | h, dir, v)
RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company
Directed Accuracy: CCM 23.8; DMV 43.2; Joint 47.5
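The probability DMV assigns to a dependency tree is the product of these three event types: the root choice, the argument choices in each direction, and the stop decisions (conditioned on valence, here simplified to adjacency). A toy scoring sketch with hypothetical parameter tables, not the actual trained model:

```python
def dmv_score(root, deps, p_root, p_arg, p_stop):
    """Probability of a dependency tree under a toy DMV.
    deps[h] maps "left"/"right" to h's dependents, inner-to-outer;
    p_stop[(h, d, adj)] is P(End | h, d, valence), with valence
    encoded as adjacency (True = no dependent generated yet)."""
    p = p_root[root]
    for h, dd in deps.items():
        for d in ("left", "right"):
            adj = True
            for a in dd.get(d, []):
                # continue (1 - P(End)) and attach argument a
                p *= (1 - p_stop[(h, d, adj)]) * p_arg[(h, d)][a]
                adj = False
            p *= p_stop[(h, d, adj)]   # finally stop in direction d
    return p

# Hypothetical two-tag fragment: VBD roots the tree, VBN on its left.
p_root = {"VBD": 1.0}
deps = {"VBD": {"left": ["VBN"]}, "VBN": {}}
p_stop = {("VBD", "left", True): 0.5, ("VBD", "left", False): 1.0,
          ("VBD", "right", True): 1.0,
          ("VBN", "left", True): 1.0, ("VBN", "right", True): 1.0}
p_arg = {("VBD", "left"): {"VBN": 1.0}}
score = dmv_score("VBD", deps, p_root, p_arg, p_stop)
print(score)
```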
23
DMV Extensions (Headden et al. ’09; Blunsom and Cohn ’10)
Extended Valence (EVG)
◦ Valence frames for the head
◦ Allows different distributions over arguments
◦ Directed Accuracy: 65.0
Lexicalization (L-EVG)
◦ Directed Accuracy: 68.8
Tree Substitution Grammar (TSG)
◦ Tree fragments instead of CFG rules
◦ Directed Accuracy: 67.7
24
MULTILINGUAL SETTING
25
Bilingual Alignment & Parsing (Wu ’97)
Inversion Transduction Grammar (ITG)
◦ Allows reordering
[Diagram: synchronous parse of e1 e2 e3 e4 / f1 f2 f3 f4 with inverted rules aligning e1–f3, e2–f4, e3–f1, e4–f2]
26
Bilingual Parsing (Snyder et al. ’09)
◦ PP attachment ambiguity
  I saw (the student (from MIT)1)2
◦ Not ambiguous in Urdu:
  I ((MIT of) student) saw
27
Summary & Overview
Parametric Search Methods
◦ EM for PCFG
◦ Constrain with bracketing
◦ Contrastive Estimation
◦ CCM, DMV
◦ EVG & L-EVG, TSG + DMV
Structural Search Methods
◦ Distributional Clustering
◦ Data-oriented Parsing
State-of-the-art
◦ Phrase-structure (CCM + DMV, Prototype): Recall 88.0
◦ Dependency (Lexicalized EVG): Directed Accuracy 68.8
28
QUESTIONS?
Thanks!
29
Motivation
Languages have hidden regularities
30
Motivation
Languages have hidden regularities
◦ The guy in China
◦ … new leader in China
◦ That’s what I am asking you …
◦ I am telling you …
31
Issues with EM (Carroll and Charniak ’92; Pereira and Schabes ’92; de Marcken ’05; Liang and Klein ’08; Spitkovsky et al. ’10)
Phrase-structure
◦ Finds local maxima instead of the global one
◦ Multiple ordered adjunctions
Both phrase-structure & dependency
◦ Disconnect between likelihood and the optimal grammar
32
Constituent-Context Model (Klein and Manning ’02)
CCM
◦ Only constituent identity
◦ Valid constituents in a tree should not cross
33
Bootstrap phrases (Haghighi and Klein ’06)
Bootstrap with seed examples for constituent types
◦ Chosen from the most frequent treebank phrases
◦ Induces labels for constituents
◦ Recall: 59.6
Integrate with CCM
◦ CCM generates brackets (constituents)
◦ Proto labels them
◦ Recall: 68.4
34
Dependency Model w/ Valence (Klein and Manning ’04)
Simple generative model
◦ Choose head; attachment dir (right, left)
◦ Valence (head outward)
◦ End of generation modelled separately
Directed Accuracy: 43.2
RB DT VBN VBD NNS IN DT NN
Sometimes the bribed became partners in the company
35
Learn from how not to speak
Contrastive Estimation (Smith and Eisner ’05)
◦ Log-linear model of dependency
◦ Features f(q, T): P(Root); P(a | h, dir); P(End | h, dir, v)
◦ Conditional likelihood
36
Learn from how not to speak (Smith and Eisner ’05)
Contrastive Estimation
◦ Ex. the brown cat vs. cat brown the
◦ Neighborhoods: transpose (Trans); delete or transpose (DelOrTrans)
Directed Accuracy: 48.8
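A neighborhood is the set of perturbed sentences the model must beat; the observed sentence should be more probable than its neighbors. A minimal sketch of the Trans neighborhood (all single adjacent transpositions, plus the original):

```python
def trans_neighborhood(words):
    """Trans neighborhood: the original sequence plus every
    sequence obtained by swapping one adjacent pair of words."""
    out = [list(words)]
    for i in range(len(words) - 1):
        w = list(words)
        w[i], w[i + 1] = w[i + 1], w[i]   # swap positions i and i+1
        out.append(w)
    return out

n = trans_neighborhood("the brown cat".split())
print(n)
```

DelOrTrans would additionally include every single-word deletion; contrastive estimation then maximizes the observed sentence's probability relative to the whole neighborhood.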
37
DMV Extensions-1 (Cohen and Smith ’08, ’09)
Tying parameters
◦ Correlated Topic Model (CTM): correlation between different word types
◦ Two types of tying parameters:
  Logistic Normal (LN): Directed Accuracy 61.3
  Shared LN: Directed Accuracy 61.3
38
DMV Extensions-2 (Blunsom and Cohn ’10)
[Diagram: the example dependency tree for "Sometimes the bribed became partners in the company" decomposed into lexicalized tree-substitution fragments, e.g. one rooted at VBD (became) with VBN and NNS argument slots, and one rooted at IN (in) with an NN argument slot]
39
DMV Extensions-2 (Blunsom and Cohn ’10)
Tree Substitution Grammar (TSG)
◦ Lexicalized trees
◦ Hierarchical prior: different levels of backoff
Directed Accuracy: 67.7
[Diagram: lexicalized TSG fragments, e.g. rooted at VBD (became) and IN (in)]