Joint Parsing and Alignment with Weakly Synchronized Grammars
David Burkett, John Blitzer, & Dan Klein
Statistical MT Training Pipeline
1) Align sentence pairs (GIZA++)
2) Parse English sentences (Berkeley parser); parse foreign sentences
3) Extract rules (Galley et al. 2006)
4) Tune discriminative parameters
Joint model for (1) & (2)
[Figure: example sentence pair. Chinese: 在 (at) 办公室 (office) 里 (in) 读了 (read) 书 (book). English: read the book in the office]
Data Setting for Joint Models
English WSJ: (EN; ), (EN; ), (EN; ), ...
Chinese CTB: (中文; ), (中文; ), (中文; ), ...
Parallel, Aligned CTB: (EN, 中文; ), (EN, 中文; ), (EN, 中文; ), ...
Unlabeled parallel text: (EN; 中文), (EN; 中文), (EN; 中文), ...
Word alignment grids
[Figure: word alignment grid between the Chinese words 在 (at) 办公室 (office) 里 (in) 读了 (read) 书 (book) and the English words "read the book in the office"]
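A word alignment grid can be read as a boolean matrix with one row per Chinese word and one column per English word. The sketch below builds such a grid for the example sentence pair. The links are an assumption read off the word glosses (办公室/office, 里/in, 读了/read, 书/book), not the alignment drawn on the slide, and `build_grid` is a hypothetical helper.

```python
# Illustrative only: the links are inferred from the glosses, not from the slide.
zh = ["在", "办公室", "里", "读了", "书"]
en = ["read", "the", "book", "in", "the", "office"]

# (Chinese index, English index) pairs assumed from the glosses.
# 在 ("at") has no matching English word here, so its row stays empty.
links = {(1, 5), (2, 3), (3, 0), (4, 2)}

def build_grid(n_rows, n_cols, links):
    """Return a row-major boolean grid with True at each aligned cell."""
    return [[(i, j) in links for j in range(n_cols)] for i in range(n_rows)]

grid = build_grid(len(zh), len(en), links)

# Print one row per Chinese word: '#' marks an aligned cell, '.' an empty one.
for word, row in zip(zh, grid):
    print("".join("#" if cell else "." for cell in row), word)
```

Unaligned words simply leave an empty row or column, which is why the grid view handles function words such as "the" without extra machinery.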
Syntactic Correspondences
Build a model over EN and 中文
Correspondence via Synchronous Grammars
Synchronous Derivation
Synchronous Derivation
Weakly Synchronized Example
Weakly Synchronized Example
Separate PCFGs
Weakly Synchronized Example
ITG alignment
Weakly Synchronized Example
Points for synchronization, but not required
Correspondence Model & Feature Types
Feature type 1: Word Alignment (办公室 / office)
Feature type 2: Monolingual Parser (EN PP: in the office)
Feature type 3: Correspondence (EN PP / 中文 PP)
[HBDK09]
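The three feature types can be pictured as contributions to a single score for a candidate (EN tree, alignment, 中文 tree) triple. The sketch below assumes a linear scoring function over sparse feature counts; the feature names and weights are hypothetical and chosen only to mirror the slide's examples, not taken from the talk.

```python
# Hypothetical feature names and weights, for illustration only.
weights = {
    "align:办公室~office": 1.5,  # type 1: word alignment
    "en_parser:PP": 0.8,         # type 2: monolingual parser (EN)
    "zh_parser:PP": 0.7,         # type 2: monolingual parser (中文)
    "corr:PP~PP": 1.2,           # type 3: correspondence between tree nodes
}

def score(features, weights):
    """Dot product of a sparse feature-count vector with the weights."""
    return sum(weights.get(name, 0.0) * count for name, count in features.items())

# A candidate analysis that fires each feature once.
candidate = {
    "align:办公室~office": 1,
    "en_parser:PP": 1,
    "zh_parser:PP": 1,
    "corr:PP~PP": 1,
}

total = score(candidate, weights)
print(total)
```

Dropping the `corr:` feature leaves a score that decomposes into independent alignment and parser terms, which is one way to see why the correspondence features are what couple the pieces.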
Estimating the parameters
• Set the parameters to maximize the log-likelihood of the correct parses & alignments
• The normalizer makes the distribution sum to 1
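As a rough illustration of the two bullets, the sketch below assumes a log-linear model over a small, explicitly enumerated candidate set. It computes the normalizer and the log-probability of the correct candidate. The scores and the function name `log_prob` are hypothetical; in the real setting the candidate set is far too large to enumerate.

```python
import math

def log_prob(scores, gold):
    """Log-probability of candidate `gold` under a softmax over `scores`."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[gold] - log_z

# Hypothetical scores for three candidate analyses; index 0 is the correct one.
scores = [4.2, 3.0, 1.0]

log_likelihood = log_prob(scores, gold=0)
probs = [math.exp(log_prob(scores, i)) for i in range(len(scores))]

print(log_likelihood, sum(probs))
```

Training then amounts to adjusting the weights behind `scores` so that `log_likelihood`, summed over the training examples, goes up.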
Computing
• Correspondence features tie pieces together (e.g. an EN PP with a 中文 PP)
• Computing exactly is intractable
• The individual models (EN parser, 中文 parser, word alignment) have polynomial-time dynamic programming algorithms
Approximating: Mean Field
• Exploit tractability in individual models
• Factored approximation: one factor each for the EN parse, the 中文 parse, and the alignment
Algorithm
1) Initialize separately
2) Iterate:
• Set each factor in turn to minimize the divergence from the full model
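The initialize-then-iterate loop can be sketched on a toy model. The example below is not the talk's parser and aligner: it assumes just two discrete variables with unary scores `a`, `b` and a coupling table `c`, so that p(x, y) is proportional to exp(a[x] + b[y] + c[x][y]), and approximates it with a product q(x)q(y). All names and numbers are hypothetical.

```python
import math

def normalize(logits):
    """Turn unnormalized log-scores into a probability vector."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def mean_field(a, b, c, iters=50):
    """Coordinate-wise mean field for p(x, y) ~ exp(a[x] + b[y] + c[x][y])."""
    # 1) Initialize each factor separately from its own unary scores.
    qx, qy = normalize(a), normalize(b)
    # 2) Iterate: update one factor while holding the other fixed.
    #    Each update uses the expected coupling score under the other factor.
    for _ in range(iters):
        qx = normalize([a[i] + sum(qy[j] * c[i][j] for j in range(len(b)))
                        for i in range(len(a))])
        qy = normalize([b[j] + sum(qx[i] * c[i][j] for i in range(len(a)))
                        for j in range(len(b))])
    return qx, qy

a = [0.0, 0.0]                # x has no preference on its own
b = [0.5, 0.0]                # y mildly prefers value 0
c = [[2.0, 0.0], [0.0, 2.0]]  # coupling rewards x == y

qx, qy = mean_field(a, b, c)
print(qx, qy)
```

Although `x` has no preference of its own, the coupling term pulls `qx` toward the value that `y` prefers, which is the same mechanism by which one model's beliefs inform another's in a factored approximation.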
Large scale inference
We can approximate in polynomial time, but . . .
Sum over possible alignments is an O(n^6) algorithm.
But computers are fast, right?
• Medium-length sentences are 50 words long
• Small translation data sets are 250,000 sentences
• ~4 quadrillion operations
(See [BBK10, HBDK09] for speedup details)
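The "~4 quadrillion" figure can be checked with a line of arithmetic, assuming the per-sentence cost grows as the sixth power of sentence length (an assumption consistent with the numbers on the slide) and ignoring constant factors.

```python
# Back-of-the-envelope count; the n**6 growth rate is an assumption.
n = 50                # words in a medium-length sentence
sentences = 250_000   # size of a small translation data set

ops = n ** 6 * sentences
print(f"{ops:,}")  # 3,906,250,000,000,000
```

That is about 3.9 x 10^15 operations, which rounds to the 4 quadrillion quoted above.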
Quantitative Results: Parsing
[Chart: parsing results; y-axis from 78 to 90; series: Monolingual, Joint]
Quantitative Results: Parsing
[Chart: Chinese parser; y-axis from 78 to 90; series: Monolingual, Joint]
Chinese parser: Monolingual 83.6%, Joint 85.7%
Quantitative Results: Parsing
[Chart: Chinese parser and English parser; y-axis from 78 to 90; series: Monolingual, Joint]
English parser: Monolingual 81.2%, Joint 84.5%
Incorrect English PP Attachment
Corrected English PP Attachment
Quantitative Results: Translation
[Chart: word alignment; y-axis from 65 to 89; series: HMM, Discriminative ITG, Joint; values in extraction order: 69.5%, 85.0%, 79.5%]
BLEU improvement from 29.4 to 30.6
Better Translations with Bilingual Adaptation
Reference: At this point the cause of the plane collision is still unclear. The local caa will launch an investigation into this .
Baseline (GIZA++): The cause of planes is still not clear yet, local civil aviation department will investigate this .
Source: 目前 导致 飞机 相撞 的 原因 尚 不 清楚 , 当地 民航 部门 将 对此 展开 调查
Gloss: Currently cause plane crash DE reason yet not clear, local civil aeronautics bureau will toward open investigations
Bilingual Adaptation Model: The cause of plane collision remained unclear, local civil aviation departments will launch an investigation .
Thanks