Backward Machine Transliteration by Learning Phonetic Similarity
Intelligent Database Systems Lab
National Yunlin University of Science and Technology
Advisor: Dr. Hsu
Presenter: Chien Shing Chen
Authors: Wei-Hao Lin and Hsin-Hsi Chen
Backward Machine Transliteration by Learning Phonetic Similarity
Presented at the Sixth Conference on Natural Language Learning, Taipei, Taiwan, 2002
N.Y.U.S.T.
I. M.
Outline
Motivation
Objective
Introduction
Grapheme-to-Phoneme Transformation
Similarity Measurement
Learning Phonetic Similarity
Experimental Results
Conclusions
Personal Opinion
Motivation
A similarity-based framework to model the task of backward transliteration
A learning algorithm to automatically acquire phonetic similarities from a corpus
Backward transliteration: recovering the original-language word from its transliteration, e.g. “本拉登” => Bin Laden
Objective
Backward machine transliteration by learning phonetic similarity
雨果 (Yu-guo) => Hugo
Introduction
Introduction
IPA: International Phonetic Alphabet
Hugo => h j u g oU
Yu-guo => v k uo
Similarity Measurement
Introduction
CMU Pronouncing Dictionary, version 0.6: ftp://ftp.cs.cmu.edu/project/fgdata/dict
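Similarity Measurement: alignment
Let $\Sigma$ be the alphabet of the two strings S1 and S2, and let $\Sigma' = \Sigma \cup \{\_\}$, where ‘_’ stands for a space.
Spaces can be inserted into S1 and S2 so that the resulting strings S1’ and S2’ have equal length
S1’ and S2’ are aligned when every symbol in one string is matched with a symbol or a space in the other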
Similarity Measurement: score
<English, Chinese> pair: <Hugo, Yu3-guo3>
The phoneme pair: (h j u g oU, v k uo)
$\Sigma' = \{h, j, u, v, g, k, oU, uo, \_\}$
Similarity Measurement: score
$\Sigma' = \{h, j, u, v, g, k, oU, uo, \_\}$
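The scoring formula itself is not preserved in the transcript. A minimal sketch, assuming the score of an alignment is the sum of scoring-matrix entries over the aligned positions and that the matrix is symmetric. The numeric scores, the `default` penalty, and the function names are hypothetical, chosen only for illustration:

```python
GAP = "_"

# Hypothetical similarity scores for a few phoneme pairs (not from the paper).
M = {
    ("h", GAP): -1.0,
    ("u", GAP): -1.0,
    ("j", "v"): 2.0,
    ("g", "k"): 3.0,
    ("oU", "uo"): 2.0,
}


def pair_score(matrix, a, b, default=-2.0):
    """Look up the score of (a, b), treating the matrix as symmetric (an assumption)."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    if (b, a) in matrix:
        return matrix[(b, a)]
    return default  # assumed penalty for pairs that are not listed


def alignment_score(matrix, s1_aligned, s2_aligned):
    """Sum the pairwise scores over two aligned, equal-length phoneme sequences."""
    if len(s1_aligned) != len(s2_aligned):
        raise ValueError("aligned strings must have the same length")
    return sum(pair_score(matrix, a, b) for a, b in zip(s1_aligned, s2_aligned))


hugo = ["h", "j", "u", "g", "oU"]      # English "Hugo"
yu_guo = ["_", "v", "_", "k", "uo"]    # Chinese "Yu-guo" with spaces inserted
print(alignment_score(M, hugo, yu_guo))  # 5.0
```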
Similarity Measurement: dynamic programming
Dynamic programming finds the optimal alignment under the similarity scoring matrix M
Optimal alignment of:
S1 (h j u g oU)
S2 (v k uo)
Dynamic programming
T is an (n+1) by (m+1) table, where n is the length of S1 and m is the length of S2.
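The recurrence that fills T is not preserved in the transcript. A minimal sketch, assuming the standard global-alignment recurrence, in which each cell takes the best of a substitution, a space inserted into S2, or a space inserted into S1. The `toy_score` function and its values are hypothetical stand-ins for the real matrix M:

```python
GAP = "_"

# Hypothetical set of "similar" phoneme pairs, used only for this illustration.
SIMILAR = {frozenset(p) for p in [("g", "k"), ("j", "v"), ("oU", "uo")]}


def toy_score(a, b):
    """Toy similarity: identical > similar > space > unrelated (values are assumptions)."""
    if a == GAP or b == GAP:
        return -1.0
    if a == b:
        return 2.0
    if frozenset((a, b)) in SIMILAR:
        return 1.0
    return -2.0


def best_alignment_score(s1, s2, score):
    """Fill the (n+1) x (m+1) table T and return the optimal alignment score."""
    n, m = len(s1), len(s2)
    T = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):           # S1 prefix aligned against spaces only
        T[i][0] = T[i - 1][0] + score(s1[i - 1], GAP)
    for j in range(1, m + 1):           # S2 prefix aligned against spaces only
        T[0][j] = T[0][j - 1] + score(GAP, s2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            T[i][j] = max(
                T[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # align two phonemes
                T[i - 1][j] + score(s1[i - 1], GAP),            # space inserted into S2
                T[i][j - 1] + score(GAP, s2[j - 1]),            # space inserted into S1
            )
    return T[n][m]


s1 = ["h", "j", "u", "g", "oU"]
s2 = ["v", "k", "uo"]
print(best_alignment_score(s1, s2, toy_score))  # 1.0
```

With these toy values the best alignment pairs j with v, g with k, and oU with uo, and matches h and u against spaces.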
Learning Phonetic Similarity
Develop a learning algorithm that removes the effort of assigning the scores in the matrix by hand
Capture the subtle differences between phonemes
First, how to prepare a training corpus; then, the learning algorithm.
Learning Phonetic Similarity
Positive pairs: an original word matched with its own transliteration
Negative pairs: an original word mismatched with the transliteration of a different word
Ei: original English word
Ci: transliterated Chinese word
Corpus with n pairs (Ei, Ci):
Clinton, 克林頓
Bin Laden, 本拉登
Robinson, 魯賓遜
n positive pairs
n(n-1) negative pairs
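The pair counts above can be checked with a short sketch. The function name `build_pairs` is hypothetical; the three names are the ones listed on the slide:

```python
def build_pairs(corpus):
    """Return (positives, negatives) from a list of (English, Chinese) pairs.

    Positives are the corpus pairs themselves; negatives combine each English
    word with the transliteration of every other entry.
    """
    positives = list(corpus)
    negatives = [
        (english, chinese)
        for i, (english, _) in enumerate(corpus)
        for j, (_, chinese) in enumerate(corpus)
        if i != j
    ]
    return positives, negatives


corpus = [("Clinton", "克林頓"), ("Bin Laden", "本拉登"), ("Robinson", "魯賓遜")]
positives, negatives = build_pairs(corpus)
print(len(positives), len(negatives))  # 3 6, i.e. n and n(n-1) for n = 3
```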
Learning Algorithm
Treat each training sample as a linear equation
m is the size of the phoneme set (m = 9 in the Hugo example)
$w_{i,j}$ is the entry in row i and column j of the scoring matrix
$x_{i,j}$ is a binary value indicating the presence of $w_{i,j}$ in the alignment
y is the similarity score.
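The linear equation itself did not survive in the transcript. From the definitions above it presumably has the form y = sum over i, j of w_{i,j} x_{i,j}; the sketch below assumes that form and a full m-by-m index range. The function names and the all-ones weight matrix are hypothetical:

```python
PHONEMES = ["h", "j", "u", "v", "g", "k", "oU", "uo", "_"]  # m = 9
INDEX = {p: i for i, p in enumerate(PHONEMES)}


def encode(s1_aligned, s2_aligned):
    """Build the binary m x m indicator x: x[i][j] = 1 if pair (i, j) occurs in the alignment."""
    m = len(PHONEMES)
    x = [[0] * m for _ in range(m)]
    for a, b in zip(s1_aligned, s2_aligned):
        x[INDEX[a]][INDEX[b]] = 1
    return x


def linear_score(w, x):
    """Assumed form of the linear equation: y = sum_i sum_j w[i][j] * x[i][j]."""
    m = len(PHONEMES)
    return sum(w[i][j] * x[i][j] for i in range(m) for j in range(m))


x = encode(["h", "j", "u", "g", "oU"], ["_", "v", "_", "k", "uo"])
w = [[1.0] * len(PHONEMES) for _ in PHONEMES]  # placeholder weights, not learned values
print(linear_score(w, x))  # 5.0, one unit per aligned pair
```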
Learning Algorithm
The linear equations for the whole corpus can be conveniently represented in matrix form, $Xw = y$, where X has R rows and R is the number of pairs in the corpus
i stands for the ith sample pair in the corpus
• w holds the entries $w_{i,j}$ of the scoring matrix
• each $x_{i,j}$ is a binary value
• y holds the similarity scores
Learning Algorithm
The criterion to be minimized is the sum-of-squared error (SSE), $\|Xw - y\|^{2}$.
The classical solution is to take the pseudo-inverse of X, i.e. $X^{\dagger} = (X^{T}X)^{-1}X^{T}$, to obtain the w that minimizes the SSE, i.e. $w = X^{\dagger}y$.
The Widrow-Hoff rule is adopted to solve for w.
Learning Algorithm
k stands for the kth row of the matrix X
i is the iteration number
The update rule uses a learning rate and a momentum coefficient.
Both are set empirically.
Learning Algorithm
w(i) is updated iteratively until the learned w begins to overfit.
The iterations ensure that w converges to a vector satisfying $X^{T}Xw = X^{T}y$
w(i) is updated immediately after each new training sample, instead of accumulating the errors of all training samples
The other speed-up technique is momentum, which damps the oscillations.
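The exact update equation and the parameter values are not preserved in the transcript. A minimal sketch of a per-sample Widrow-Hoff (LMS) update with a momentum term, under stated assumptions: the parameter names `eta` and `alpha`, their values, and the fixed number of epochs are hypothetical, and a fixed epoch count stands in for the overfitting-based stopping described above:

```python
def widrow_hoff(X, y, eta=0.05, alpha=0.5, epochs=200):
    """Learn w so that X w is close to y, updating after every training sample.

    eta is the learning rate and alpha the momentum coefficient; both values
    here are illustrative, not the ones used in the paper.
    """
    dim = len(X[0])
    w = [0.0] * dim
    prev_step = [0.0] * dim
    for _ in range(epochs):
        for x_k, y_k in zip(X, y):                 # k-th row of X and its target
            prediction = sum(wj * xj for wj, xj in zip(w, x_k))
            error = y_k - prediction
            # Gradient step on this single sample plus a fraction of the previous step.
            step = [eta * error * xj + alpha * pj for xj, pj in zip(x_k, prev_step)]
            w = [wj + sj for wj, sj in zip(w, step)]
            prev_step = step
    return w


# Tiny consistent system whose exact solution is w = (1, -1).
X = [[1, 0], [0, 1], [1, 1]]
y = [1.0, -1.0, 0.0]
w = widrow_hoff(X, y)
print([round(v, 3) for v in w])
```

In practice the loop would stop when performance on held-out pairs starts to degrade rather than after a fixed number of passes.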
Experiments
The corpus consists of 1574 pairs of <English, Chinese> names.
313 have no entries in the pronouncing dictionary.
97 phonemes are used to represent these names, of which 59 are used for Chinese names and 51 for English names.
Rank is the position of the correct original word in the list of candidate words sorted by similarity score.
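The rank measure can be sketched as follows, assuming candidates are sorted from highest to lowest similarity score. The function name, the candidate words, and their scores are hypothetical:

```python
def rank_of_correct(candidates, correct):
    """Return the 1-based position of `correct` among candidates sorted by score.

    `candidates` is a list of (word, similarity_score) tuples; higher scores
    are assumed to mean more similar.
    """
    ordered = sorted(candidates, key=lambda item: item[1], reverse=True)
    for position, (word, _) in enumerate(ordered, start=1):
        if word == correct:
            return position
    raise ValueError(f"{correct!r} is not among the candidates")


candidates = [("Hugo", 0.9), ("Yoga", 0.4), ("Hugh", 0.7)]  # made-up scores
print(rank_of_correct(candidates, "Hugo"))  # 1
print(rank_of_correct(candidates, "Hugh"))  # 2
```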
Experiments
Experiments
Conclusions
Without any phonological analysis or human intervention, the learning algorithm can acquire phonetic similarities automatically.
Personal Opinion
Drawback
Obtaining the score matrix depends on a few empirical rules.
Do the experimental results depend on the particular testing samples?
Application
A different method for computing the similarity between words.
Future Work
The Widrow-Hoff rule could be used to estimate the parameters, replacing blind trial-and-error tuning.
Combine speech recognition with this method to produce a new, objective method.