Hitting The Right Paraphrases In Good Time
Stanley Kok
Dept. of Comp. Sci. & Eng.
Univ. of Washington
Seattle, USA
Chris Brockett
NLP Group
Microsoft Research
Redmond, USA

Overview
• Motivation
• Background
• Hitting Time Paraphraser
• Experiments
• Future Work

What’s a paraphrase of… “is on good terms with”
Paraphrase System
• “is friendly with”
• “is a friend of”
• …
Applications
• Query expansion
• Document summarization
• Natural language generation
• Question answering
• etc.

What’s a paraphrase of… “is on good terms with”
Paraphrase System
• “is friendly with”
• “is a friend of”
• …
Source of paraphrases: Bilingual Parallel Corpora

Bilingual Parallel Corpus
• …the cost dynamic is under control… / …die kostenentwicklung unter kontrolle…
• …keep the cost in check… / …die kosten unter kontrolle…
• …
Phrase Table

| English Phrase (E) | German Phrase (G) | P(G\|E) | P(E\|G) |
|---|---|---|---|
| under control | unter kontrolle | 0.75 | 0.40 |
| in check | unter kontrolle | 0.60 | 0.20 |
| … | … | … | … |

State of the Art
BCB system [Bannard & Callison-Burch, ACL’05]

$$P(E_2 \mid E_1) \approx \sum_{G} P(E_2 \mid G)\, P(G \mid E_1)$$

SBP system [Callison-Burch, EMNLP’08]

$$P(E_2 \mid E_1) \approx \sum_{G} P(E_2 \mid G, \mathrm{syn}(E_1))\, P(G \mid E_1, \mathrm{syn}(E_1))$$
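As a rough illustration of the BCB pivot formula, the sketch below sums P(E2|G) · P(G|E1) over shared foreign phrases, using only the two phrase-table rows shown earlier in the deck. The function name and the tuple layout are assumptions made for this example, not part of either system.

```python
from collections import defaultdict

# Rows from the phrase-table slide: (English, German, P(G|E), P(E|G)).
PHRASE_TABLE = [
    ("under control", "unter kontrolle", 0.75, 0.40),
    ("in check", "unter kontrolle", 0.60, 0.20),
]


def pivot_paraphrase_probs(e1, table):
    """Approximate P(e2 | e1) as the sum over pivots g of P(e2 | g) * P(g | e1)."""
    # P(g | e1) for every foreign phrase g aligned with the query phrase e1.
    p_g_given_e1 = {g: p_ge for e, g, p_ge, _ in table if e == e1}
    scores = defaultdict(float)
    for e2, g, _, p_eg in table:
        if e2 != e1 and g in p_g_given_e1:
            scores[e2] += p_eg * p_g_given_e1[g]
    return dict(scores)


probs = pivot_paraphrase_probs("under control", PHRASE_TABLE)
print(probs)  # "in check" scores 0.20 * 0.75, i.e. about 0.15
```

With a single pivot language the path from one English phrase to another always has length 2, which is the restriction the graphical view goes on to relax.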

Graphical View
[Figure: graph linking English phrase nodes E1 to E4 (examples: “in check”, “under control”) with foreign phrase nodes G1 to G3 and F1, F2 (example: “unter kontrolle”). Edges carry translation probabilities such as P(G1|E1), P(E2|G1), P(F2|E1) and P(E2|F2).]

Graphical View
• Path lengths > 2
• General graph
• Add nodes to represent domain knowledge
• Random Walks
• Hitting Times
[Figure: the same graph over English nodes E1 to E4 and foreign nodes G1 to G3, F1, F2.]

Random Walk
• Begin at node A
• Randomly pick neighbor n
• Move to node n
• Repeat
[Figure: example graph with nodes A, B, C, D, E and F; the walker starts at A and moves between neighboring nodes.]
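The random-walk steps can be sketched in a few lines. The adjacency list below is hypothetical: the edges of the slide's figure are not recoverable from the text, so only the node names A to F are taken from it.

```python
import random

# Hypothetical undirected graph over the slide's node names (edges assumed).
GRAPH = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "F"],
    "D": ["A", "E", "F"],
    "E": ["D", "F"],
    "F": ["C", "D", "E"],
}


def random_walk(graph, start, steps, rng):
    """Return the list of visited nodes, beginning at `start`."""
    path = [start]
    node = start
    for _ in range(steps):
        node = rng.choice(graph[node])  # randomly pick a neighbor n
        path.append(node)               # move to n, then repeat
    return path


walk = random_walk(GRAPH, "A", steps=5, rng=random.Random(0))
print(walk)
```

Passing a seeded `random.Random` keeps the walk reproducible; neighbors are chosen uniformly here, whereas a weighted graph would draw them in proportion to edge weights.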

Hitting Time from node i to j
• Expected number of steps, starting from node i, before node j is visited for the first time
• Smaller hitting time → closer to start node i
Truncated Hitting Time [Sarkar & Moore, UAI’07]
• Random walks are limited to T steps
• Estimated efficiently, and accurately with high probability, by sampling random walks [Sarkar, Moore & Prakash, ICML’08]
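One compact way to write the truncated hitting time, given as a sketch in standard notation. The symbol $p_{ik}$ (probability of stepping from node $i$ to node $k$), the superscript $T$, and $t_s$ are notation assumed here rather than taken from the slides.

$$
h^{T}_{ij} =
\begin{cases}
0 & \text{if } i = j \text{ or } T = 0,\\
1 + \sum_{k} p_{ik}\, h^{T-1}_{kj} & \text{otherwise.}
\end{cases}
$$

With $m$ sampled walks of at most $T$ steps from $i$, where $t_s$ is the step at which walk $s$ first reaches $j$ (taken as $T$ if it never does), the sampling estimate is

$$
\hat{h}^{T}_{ij} = \frac{1}{m} \sum_{s=1}^{m} \min(t_s, T).
$$

Both quantities lie between 0 and $T$, so a node that no walk reaches gets exactly $T$.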

Finding Truncated Hitting Time By Sampling
• T = 5
• Sampled walk from A: A → D → E → D → F → E
• Hitting time of each node = step of its first visit in the walk, or T if it is never visited
• hAA = 0, hAD = 1, hAE = 2, hAF = 4, hAB = 5, hAC = 5
[Figure: example graph with nodes A to F; the walk is traced one step per build.]
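The bookkeeping in this example is small enough to sketch directly: record the first step at which each node appears in a walk, fall back to T for nodes never seen, and average over walks. The function names are made up for this illustration.

```python
def hitting_times_from_walk(walk, nodes, T):
    """First-visit step of every node in one walk, capped at T."""
    times = {node: T for node in nodes}          # never visited -> T
    for step, node in enumerate(walk[: T + 1]):  # steps 0..T only
        if step < times[node]:
            times[node] = step
    return times


def average_hitting_times(walks, nodes, T):
    """Average the per-walk hitting times over all sampled walks."""
    totals = {node: 0 for node in nodes}
    for walk in walks:
        for node, t in hitting_times_from_walk(walk, nodes, T).items():
            totals[node] += t
    return {node: totals[node] / len(walks) for node in nodes}


walk = ["A", "D", "E", "D", "F", "E"]  # the walk traced on the slides
h = hitting_times_from_walk(walk, "ABCDEF", T=5)
print(h)  # {'A': 0, 'B': 5, 'C': 5, 'D': 1, 'E': 2, 'F': 4}
```

A single walk gives a very noisy estimate; `average_hitting_times` over many walks is what approximates the expectation.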

Hitting Time Paraphraser (HTP)
Phrase: “is on good terms with”
Phrase Tables
• English-German
• English-French
• German-French
• etc.
Paraphrases
• “is friendly with”
• “is a friend of”
• …

Graph Construction
• BFS from query phrase up to depth d or up to max. number n of nodes
• d = 6, n = 50,000
[Figure: graph grown outward from the query phrase, one build per step; edge weights shown on successive builds are 0.25 and 0.35, then 0.6, then 0.5 and 0.5.]
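A minimal sketch of the bounded breadth-first expansion, assuming the phrase tables are exposed through a `neighbors` callable that maps a phrase to the phrases it is aligned with. That callable and the toy table are hypothetical; only the limits d = 6 and n = 50,000 come from the slide.

```python
from collections import deque


def build_graph(query, neighbors, max_depth=6, max_nodes=50_000):
    """Breadth-first expansion from `query`, bounded by depth and node count."""
    depth = {query: 0}
    edges = []
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand nodes at depth d
        for nxt in neighbors(node):
            if nxt not in depth:
                if len(depth) >= max_nodes:
                    continue  # node budget n is used up; skip new nodes
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
            edges.append((node, nxt))
    return set(depth), edges


# Toy alignment table, for illustration only.
TOY = {
    "in check": ["unter kontrolle"],
    "unter kontrolle": ["in check", "under control"],
    "under control": ["unter kontrolle"],
}
nodes, edges = build_graph("in check", lambda p: TOY.get(p, []), max_depth=2)
print(sorted(nodes))
```

Edges are recorded only between nodes that made it into the graph, so the result is a closed subgraph around the query phrase.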

Estimate Truncated Hitting Times
• Run m truncated random walks to estimate truncated hitting time of each node
• T = 10, m = 1,000,000
• Prune nodes with hitting times = T
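The estimate-then-prune step might look like the sketch below. It assumes the graph is stored as a dict of outgoing transition probabilities per node, and it defaults to a much smaller m than the slide's 1,000,000 so that it runs quickly; both choices are illustrative.

```python
import random


def estimate_hitting_times(graph, start, T=10, m=1000, seed=0):
    """Monte Carlo estimate of truncated hitting times from `start`.

    `graph` maps each node to {neighbor: transition probability}.
    """
    rng = random.Random(seed)
    totals = {node: 0 for node in graph}
    for _ in range(m):
        first_hit = {start: 0}
        node = start
        for step in range(1, T + 1):
            out = graph[node]
            if not out:
                break  # dead end: this walk stops early
            node = rng.choices(list(out), weights=list(out.values()))[0]
            first_hit.setdefault(node, step)  # keep the first visit only
        for n in graph:
            totals[n] += first_hit.get(n, T)  # unvisited nodes count as T
    return {n: totals[n] / m for n in graph}


def prune(hitting_times, T):
    """Drop nodes whose estimate equals T, i.e. nodes no walk reached."""
    return {n: h for n, h in hitting_times.items() if h < T}


# Toy graph: C points into the graph but nothing points to C.
TOY = {"A": {"B": 1.0}, "B": {"A": 1.0}, "C": {"A": 1.0}}
h = estimate_hitting_times(TOY, "A", T=10, m=100)
print(prune(h, T=10))  # {'A': 0.0, 'B': 1.0}
```

An estimate equal to T means no sampled walk reached the node within T steps, which is why such nodes can be dropped before the feature nodes are added.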

Add Ngram Nodes
• Phrase nodes: “achieve the goal”, “achieve the aim”, “reach the objective”
• Ngram nodes: “achieve the”, “the aim”, “reach”, “objective”, “the”, …
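To show how n-gram nodes tie together phrases that share words, here is a small sketch that enumerates the n-grams of each phrase and pairs every phrase with its n-gram nodes. The cap `max_n=4` is an assumption; the slide does not state which n-gram lengths are used.

```python
def ngrams(phrase, max_n=4):
    """All contiguous word n-grams of `phrase` for n = 1..max_n."""
    words = phrase.split()
    return {
        " ".join(words[i : i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def ngram_edges(phrases, max_n=4):
    """Pairs (phrase node, n-gram node) to add to the graph."""
    return [(p, g) for p in phrases for g in sorted(ngrams(p, max_n))]


phrases = ["achieve the goal", "achieve the aim", "reach the objective"]
shared = ngrams(phrases[0]) & ngrams(phrases[1])
print(sorted(shared))  # ['achieve', 'achieve the', 'the']
```

Two phrases that share an n-gram node gain a short extra path between them, which lowers their hitting times relative to phrases with no words in common.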

Add “Syntax” Nodes
• Phrase nodes: “whose goal is”, “the aim is”, “the objective is”, “what goal”
• “Syntax” nodes: start with article, end with be, start with interrogatives
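The three “syntax” nodes named on the slide can be approximated with simple word-list checks. The word lists below are assumptions made for this sketch; the slide does not say how the features are detected.

```python
# Word lists are illustrative assumptions, not taken from the slides.
ARTICLES = {"a", "an", "the"}
BE_FORMS = {"be", "is", "are", "am", "was", "were", "been", "being"}
INTERROGATIVES = {
    "what", "which", "who", "whom", "whose", "when", "where", "why", "how",
}


def syntax_features(phrase):
    """Names of the "syntax" nodes a phrase node would be linked to."""
    words = phrase.lower().split()
    feats = set()
    if not words:
        return feats
    if words[0] in ARTICLES:
        feats.add("start with article")
    if words[-1] in BE_FORMS:
        feats.add("end with be")
    if words[0] in INTERROGATIVES:
        feats.add("start with interrogatives")
    return feats


for p in ["whose goal is", "the aim is", "the objective is", "what goal"]:
    print(p, "->", sorted(syntax_features(p)))
```

Phrases with the same surface shape end up sharing a feature node, so “the aim is” and “the objective is” become closer to each other than to “what goal”.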

Add Not-Substring-Of Nodes
• Phrase nodes: “reach the”, “reach the aim”, “reach the objective”, “objective”
• Feature node: not-substring-of

Feature Nodes
• ngram nodes
• “syntax” nodes
• not-substring nodes
• phrase nodes
[Figure: a phrase node with outgoing edges to the four node types, labeled p1 to p4; the values shown are 0.4, 0.1, 0.4 and 0.1.]

Re-estimate Truncated Hitting Times
• Run m truncated random walks again
• Rank paraphrases in increasing order of hitting times
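The final ranking is a sort on the re-estimated hitting times, restricted to phrase nodes other than the query. In the sketch below the hitting-time values and the `ngram:` prefix used to mark feature nodes are invented for illustration.

```python
def rank_paraphrases(hitting_times, query, is_phrase):
    """Candidate phrases in increasing order of truncated hitting time."""
    candidates = [
        (h, p)
        for p, h in hitting_times.items()
        if p != query and is_phrase(p)
    ]
    return [p for h, p in sorted(candidates)]


# Hypothetical estimates; feature nodes are marked with an "ngram:" prefix.
estimates = {
    "is on good terms with": 0.0,
    "ngram:is": 2.0,
    "is a friend of": 4.1,
    "is friendly with": 3.2,
}
ranked = rank_paraphrases(
    estimates,
    query="is on good terms with",
    is_phrase=lambda node: not node.startswith("ngram:"),
)
print(ranked)  # ['is friendly with', 'is a friend of']
```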

Data
• Europarl dataset [Koehn, MT-Summit’05]
• Use 6 of 11 languages: English, Danish, German, Spanish, Finnish, Dutch
• About a million sentences per language
• English-Foreign phrasal alignments by GIZA++ [Callison-Burch, EMNLP’08]
• Foreign-Foreign phrasal alignments by MSR aligner

Comparison Systems
• SBP system [Callison-Burch, EMNLP’08]
• HTP with no feature nodes
• HTP with bipartite graph

Evaluation Methodology
• NIST dataset
  • 4 English translations per Chinese sentence
  • 33,216 English translations
• Randomly selected 100 English phrases
  • From 1-4grams in both NIST & Europarl datasets
  • Exclude stop words, numbers, phrases containing periods and commas

Methodology
• For each phrase, randomly select a sentence from the NIST dataset that contains it
• Substitute each of the top 1 to 10 paraphrases for the phrase

Methodology
• Manually evaluated resulting sentences
  • 0: Clearly wrong; grammatically incorrect or does not preserve meaning
  • 1: Minor grammatical errors (e.g., subject-verb disagreement, wrong tenses), or meaning largely preserved but not completely
  • 2: Totally correct; grammatically correct and meaning is preserved
• Correct: 1 and 2; Wrong: 0
• Two evaluators; Kappa = 0.62 (substantial agreement)
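For readers unfamiliar with the agreement statistic, the sketch below computes Cohen's kappa for two raters. That the slide's Kappa is Cohen's variant is an assumption (it is the usual choice for two evaluators), and the toy labels are invented; they are not the study's judgments.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[l] * count_b[l] for l in count_a) / (n * n)
    # Undefined (division by zero) if both raters always use one label.
    return (observed - expected) / (1 - expected)


# Toy judgments on the Correct (1) / Wrong (0) scale.
rater_1 = [0, 1, 1, 0]
rater_2 = [0, 1, 0, 0]
print(cohens_kappa(rater_1, rater_2))  # 0.5
```

Kappa discounts the agreement two raters would reach by chance, which is why it is reported instead of raw percent agreement.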

HTP vs. SBP
[Figure: table with one row per query phrase (q1 … q100) and two columns, HTP and SBP, each listing the ranked paraphrases that system returned for the query. Successive builds highlight regions of the table and annotate them with counts of paraphrases and proportions judged correct.]
Values shown on the builds, in slide order (where two values share a line they presumably follow the column order HTP, SBP):
• 0.71, 0.53
• 0.56, 0.39; 373 paraphrases per system
• 0.54; 483 paraphrases
• 0.53, 0.50, 0.71, 0.61
• 0.54, 0.39; 0.43; 0.32; 975 paraphrases; 492 paraphrases; 373 paraphrases; 420 correct paraphrases; 145 correct paraphrases
The last build's counts are consistent with each other: 420 of 975 is about 0.43, and 145 of 373 is about 0.39.

Timings

| System | Timing (secs/phrase) |
|---|---|
| HTP | 48 |
| SBP | 468 |

Future Work
• Apply HTP to languages other than English
• Evaluate HTP impact on applications
  • e.g., improve performance of resource-sparse machine translation systems
• Add more features
• etc.

Conclusion
• HTP: a paraphrase system based on random walks
  • Good paraphrases have smaller hitting times
  • General graph
  • Path length > 2
  • Incorporate domain knowledge
• HTP outperforms state-of-the-art