Multi-Layer Filtering algorithm
Bilingual Chunk Alignment In Statistical Machine
Translation
An introduction to the Multi-Layer Filtering (MLF) algorithm
Dawei Hou
LING 575 MT WIN07
What is the “Chunk” here?
In this paper:
The “Chunk” doesn’t rely on information from tagging, parsing, syntax analysis or segmentation
A “Chunk” is simply a contiguous sequence of words
Why do we use “Chunks” in translation?
Leads to more fluent translations, since chunk-based translation captures local reordering phenomena.
Makes long sentences shorter, which benefits the SMT algorithm’s performance.
Obtains accurate one-to-one alignment of each pair of bilingual chunks.
Greatly decreases search space and time complexity during translation.
What about other approaches?
What about word-based translations?
Some background
SMT systems employ word-based alignment models based on the five word-based statistical models proposed by IBM.
Problem:
They still suffer from poor performance on language pairs with great structural differences, since these models fundamentally rely on word-level translation.
Some background
Alignment algorithms based on phrases, chunks or structures, most of which rely on complex syntactic information.
Problems:
They have proven to yield poor performance when dealing with long sentences;
They heavily depend on the performance of associated tools such as parsers, POS taggers ...
How does chunk-based translation improve on these problems?
Multi-Layer Filtering algorithm
To discover one-to-one pairs of bilingual chunks in untagged, well-formed bilingual sentence pairs,
multiple filtering layers are used to extract bilingual chunks according to different features of the
chunks in the bilingual corpus.
Summary of the procedure
Filter the most frequent chunks
Cluster similar words and filter the most frequent structures
Deal with the remnant fragments
Keep one-to-one alignment
Filtering the most frequent chunks -- Step 1
Assumption:
The most frequently co-occurring word sequences are potential chunks.
Applying formula-1 listed below, we filter those word sequences as initial monolingual chunks;
formula-1:  D_k = D(w_1, w_2, …, w_k) = λ · MI(w_1, w_2, …, w_k) + (1 − λ) · P(w_1, w_2, …, w_k)

formula-2:  MI(w_1, w_2, …, w_k) = P(w_1, w_2, …, w_k) · log [ P(w_1, w_2, …, w_k) / ( P(w_1) · P(w_2) · … · P(w_k) ) ]
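As a rough sketch of how formulas 1 and 2 combine, with probabilities estimated from corpus counts (the interpolation weight λ and the count-based estimates are assumptions; the slides do not give them):

```python
from math import log

def cohesion_degree(ngram_count, unigram_counts, total_tokens, lam=0.5):
    """D_k for a candidate chunk w1..wk (formula-1), combining the
    mutual information MI (formula-2) with the sequence probability.
    `lam` is an assumed interpolation weight."""
    p_seq = ngram_count / total_tokens          # P(w1,...,wk)
    p_indep = 1.0
    for c in unigram_counts:                    # P(w1) * ... * P(wk)
        p_indep *= c / total_tokens
    mi = p_seq * log(p_seq / p_indep)           # formula-2
    return lam * mi + (1.0 - lam) * p_seq       # formula-1
```

A sequence that co-occurs far more often than chance gets a large MI term and hence a large cohesion degree D_k.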
The result of Filtering Step 1
An example — cohesion degrees between adjacent words:

What || kind || of || room || do || you || want || to || reserve
   1.36   1.31   0.046   0.063   10.07   0.61   2.11   0.077

你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间
  0.69  0.17  1.39  0.076  7.80  0.87  0.30  1.27  4.52
Filtering the most frequent chunks -- Step 2
Now we have:
All the cohesion degrees between any two adjacent words in the source and target sentences.
Applying formula-3 listed below, we will find the entire set of initial monolingual chunks;

formula-3:  n = int{ (length of a sentence) / (the maximum length of a chunk) }
The result of Filtering Step 2-1
What || kind || of || room || do || you || want || to || reserve
   1.36   1.31   0.046   0.063   10.07   0.61   2.11   0.077

你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间
  0.69  0.17  1.39  0.076  7.80  0.87  0.30  1.27  4.52

In this case: n = int{ 10/4 } = 2;
The result of Filtering Step 2-(1)-EN
Now we get a table of the initial monolingual chunks:

Initial Chunks        Dk      Dk*
What kind             1.36    1.36
What kind of          2.10    5.25
Kind of               1.31    1.31
Do you                10.07   10.07
Do you want           0.31    0.77
Do you want to        0.13    0.90
You want              0.61    0.61
You want to           0.33    0.82
You want to reserve   0.086   0.60
Want to               2.11    2.11
Want to reserve       0.056   0.14
To reserve            0.077   0.077

formula-4: D_k* — the cohesion degree D_k rescaled against the maximum cohesion degree Max(D)
The result of Filtering Step 2-(2)-EN
Set threshold Dk* > 1.0, we get:

Kept:       What kind (1.36), What kind of (5.25), Kind of (1.31), Do you (10.07), Want to (2.11)
Discarded:  You want (0.61), You want to (0.82), You want to reserve (0.60), Do you want (0.77), Do you want to (0.90), Want to reserve (0.14), To reserve (0.077)

We still need more steps to do maximum matching and overlap discarding;
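The thresholding step can be sketched in a few lines of Python, using the D_k* values from the English candidate table:

```python
def filter_chunks(dstar, threshold=1.0):
    """Keep candidate chunks whose D_k* exceeds the threshold
    (the slides use D_k* > 1.0)."""
    return {c: v for c, v in dstar.items() if v > threshold}

# The English candidates from the table above.
dstar = {"What kind": 1.36, "What kind of": 5.25, "Kind of": 1.31,
         "Do you": 10.07, "Do you want": 0.77, "Do you want to": 0.90,
         "You want": 0.61, "You want to": 0.82,
         "You want to reserve": 0.60, "Want to": 2.11,
         "Want to reserve": 0.14, "To reserve": 0.077}
kept = filter_chunks(dstar)
```

Note that the comparison is strict, so a chunk sitting exactly at the threshold is discarded.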
The result of Filtering Step 2-(3)-EN
Initial Chunks   Dk      Dk*
What kind        1.36    1.36
What kind of     2.10    5.25
Kind of          1.31    1.31
Do you           10.07   10.07
Want to          2.11    2.11

According to the maximum matching principle, and to prevent the overlapping problem, we need to apply:

formula-4:  D_k* / D_(k−1)*
formula-5:  D_(K_i)* / D_(K_(i−1))*
The result of Filtering Step 2-(4)-EN
Deal with the remnant fragments:
we simply combine the leftover individual or sequential words into chunks.
So we get a much shorter sentence list below:
What & kind & of || room || do & you || want & to || reserve
The result of Filtering Step 2-(1)-CN
What || kind || of || room || do || you || want || to || reserve
   1.36   1.31   0.046   0.063   10.07   0.61   2.11   0.077

你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间
  0.69  0.17  1.39  0.076  7.80  0.87  0.30  1.27  4.52

In this case: n = int{ 10/4 } = 2;
The result of Filtering Step 2-(2)-CN
Now we get a table of the initial monolingual chunks:

Initial Chunks   Dk     Dk*
你想             0.69   0.69
预定             2.39   2.39
什么             7.80   7.80
什么样           0.44   1.00
什么样的         0.58   2.44
么样             0.87   0.87
么样的           0.37   0.84
么样的房         0.13   0.55
样的             0.30   0.30
样的房           0.13   0.30
样的房间         0.21   0.88
的房             1.27   1.27
的房间           2.45   5.88
房间             4.52   4.52

formula-4: D_k* — the cohesion degree D_k rescaled against the maximum cohesion degree Max(D)
The result of Filtering Step 2-(3)-CN
Set threshold Dk* > 1.0, we get:

Kept:       预定 (2.39), 什么 (7.80), 什么样的 (2.44), 的房 (1.27), 的房间 (5.88), 房间 (4.52)
Discarded:  你想 (0.69), 什么样 (1.00), 么样 (0.87), 么样的 (0.84), 么样的房 (0.55), 样的 (0.30), 样的房 (0.30), 样的房间 (0.88)

We still need more steps to do maximum matching and overlap discarding;
The result of Filtering Step 2-(4)-CN
Initial Chunks   Dk     Dk*
预定             2.39   2.39
什么             7.80   7.80
什么样的         0.58   2.44
的房             1.27   1.27
的房间           2.45   5.88
房间             4.52   4.52

According to the maximum matching principle, the character 的 is claimed by both of the overlapping candidates 什么样的 and 的房间. Applying formula-4 ( D_k* / D_(k−1)* ):

max( D*(什么样的) / D*(什么样), D*(的房间) / D*(房间) ) = max( 2.44/1.00, 5.88/4.52 ) = max(2.44, 1.30) = 2.44

So 的 stays with 什么样的.
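This ratio test can be sketched as follows; the interpretation that each overlapping chunk is compared with its own sub-chunk minus the shared word is an assumption based on the worked example above:

```python
def resolve_overlap(dstar, cand_a, sub_a, cand_b, sub_b):
    """Decide which of two overlapping candidates keeps the shared
    word (formula-4): the chunk whose D* gains more over its
    sub-chunk without that word wins. `dstar` maps chunk -> D_k*."""
    gain_a = dstar[cand_a] / dstar[sub_a]
    gain_b = dstar[cand_b] / dstar[sub_b]
    return cand_a if gain_a >= gain_b else cand_b

# 什么样的 and 的房间 both claim 的:
dstar = {"什么样的": 2.44, "什么样": 1.00, "的房间": 5.88, "房间": 4.52}
winner = resolve_overlap(dstar, "什么样的", "什么样", "的房间", "房间")
# max(2.44, 1.30) = 2.44, so 的 stays with 什么样的
```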
The result of Filtering Step 2-(5)-CN
Deal with the remnant fragments:
we simply combine the leftover individual or sequential words into chunks.
So we get a much shorter sentence list below:
你 || 想 || 预 & 定 || 什 & 么 & 样 & 的 || 房 & 间
Some problems
After the first filtering process, suppose we found an aligned chunk pair:
|| 在 & 五 & 点 ||
|| at & five & o’clock ||
But some potentially good chunks like:
|| at & six & o’clock ||
might have been broken into several fragments like:
|| at || six || o’clock ||
since this structure includes word sequences with a low frequency of occurrence (we suppose “six” is less frequent than “five” here).
Clustering the similar words and filtering the most frequent structures
Many frequent chunks have similar structures but differ in detail.
We can cluster similar words according to the position vectors of their behavior relative to anchor words.
We suppose that all the words in the same class form good chunks, then filter the most frequent structures according to the method introduced before.
Clustering the similar words and filtering the most frequent structures – Step 1
In the corpus resulting from the first filtering process, find the most frequent words as anchor words, for example:

Rank:  1    2   3    4     5    6   7   8   9   10
Word:  the  a   to   this  for  in  on  of  at  room

Why do we use the most frequent words?
As the anchor words are the most common words, a great deal of information can be obtained.
Words with similar position vectors in relation to the anchor words can be assumed to belong to similar word classes.
Clustering the similar words and filtering the most frequent structures – Step 2
Build word vectors and define the size of the observation window (in this case window size = 5).
For instance, we build a word vector whose anchor word is “in” and observe how often a candidate word “the” to be clustered falls within the window:

Size:      5
Position:  w−2   w−1   w    w+1   w+2
Word:      the   the   in   the   the
Value:     16    1     0    415   0

Formula-7,8:
V_ij = Σ_{k=1..N} δ(w_j, w)
δ(w_j, w) = 1 if w_j = w;  0 if w_j ≠ w
Clustering the similar words and filtering the most frequent structures – Step 3
In order to compare vectors fairly, these vectors must be normalized by formula-9 as follows:

formula-9:  V_ij* = V_ij / Σ_{j=1..m} V_ij
Example : “in/that” and “in/this”
Clustering the similar words and filtering the most frequent structures – Step 4
Measure the similarities of the various vectors and cluster the words which have similar distributions relative to the anchor words, using the Euclidean distance:

D(V_x, V_y) = Σ_{j=1..K} ( V_xj − V_yj )²
Example result:

Word classes                                        Anchor words
single double twin standard suite different quiet   (a, room)
the my your this that our                           (in, room)
America all fact Japan English                      (in, )
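Formulas 7–9 and the distance measure above might be sketched like this (how occurrences at each offset are counted is an assumption; the slides only show the resulting values):

```python
def position_vector(tokens, anchor, candidate, half_window=2):
    """Formulas 7-8: count how often `candidate` appears at each
    offset -half_window..+half_window around `anchor` (window size
    2*half_window + 1, i.e. 5 as in the slides)."""
    vec = [0] * (2 * half_window + 1)
    for i, tok in enumerate(tokens):
        if tok != anchor:
            continue
        for off in range(-half_window, half_window + 1):
            j = i + off
            if off != 0 and 0 <= j < len(tokens) and tokens[j] == candidate:
                vec[off + half_window] += 1
    return vec

def normalize(vec):
    """Formula-9: scale the vector so its components sum to 1
    (an all-zero vector is returned unchanged)."""
    s = sum(vec)
    return [v / s for v in vec] if s else vec

def distance(vx, vy):
    """Squared Euclidean distance between two normalized vectors."""
    return sum((a - b) ** 2 for a, b in zip(vx, vy))
```

Candidate words whose normalized vectors lie close together under this distance end up in the same word class.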
Clustering the similar words and filtering the most frequent structures – Step 5
For all of the words in the same class, replace each with a particular symbol, and then treat this symbol as an ordinary word. Then filter the most frequent structures by the Multi-Layer Filtering algorithm again.
For instance, if we have:
|| 在 & 五 & 点 ||
|| at & five & o’clock ||
and the parallel word classes:
{ one, two, …, five, …, twelve } & { 一 , 二 , …, 五 , …, 十二 }
we will get:
|| 在 & 一 & 点 ||
|| at & one & o’clock ||
|| 在 & 两 & 点 ||
|| at & two & o’clock || ...
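The class-substitution step can be sketched as follows (the class symbol `<NUM>` is made up for illustration):

```python
def generalize(chunks, word_classes):
    """Replace every word that belongs to a word class with the
    class symbol, so chunks that differ only in a class member
    (e.g. 'five' vs 'six') share one structure."""
    to_symbol = {w: sym for sym, members in word_classes.items()
                 for w in members}
    return [[to_symbol.get(w, w) for w in chunk] for chunk in chunks]

classes = {"<NUM>": {"one", "two", "five", "six", "twelve"}}
out = generalize([["at", "five", "o'clock"], ["at", "six", "o'clock"]], classes)
# both chunks collapse to the structure ['at', '<NUM>', "o'clock"]
```

After this substitution, rare variants such as "at six o'clock" become as frequent as the whole class and survive the frequency filter.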
Keeping one-to-one alignment
Next step:
Keeping one-to-one alignment
Now we have a pair of new parallel sentences with chunks:
你 || 想 || 预 & 定 || 什 & 么 & 样 & 的 || 房 & 间
What & kind & of || room || do & you || want & to || reserve
Our purpose is to find a one-to-one chunk alignment, on the assumption that the chunks to be aligned occur almost equally often in the corresponding parallel texts.
Keeping one-to-one alignment
formula-11:  θ = 2 · Num{ Co-occurrence(C_CHK, E_CHK) } / ( Num(C_CHK) + Num(E_CHK) )

By applying formula-11, we can get an alignment table:

θ              你      想      预定    什么样的  房间
What kind of   0.025   0.021   0.053   0.889    0.016
Room           0.021   0.029   0.09    0.014    0.888
Do you         0.460   0.014   0.002   0.012    0.020
Want to        0.007   0.069   0.013   0.002    0.023
reserve        0.002   0.001   0.083   0.034    0.047
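Formula-11 is a Dice-style score over chunk occurrence counts. A minimal sketch (how counts are collected per sentence pair is an assumption):

```python
from collections import Counter

def alignment_table(pairs):
    """Formula-11: theta = 2*Num{co-occurrence(C_CHK, E_CHK)} /
    (Num(C_CHK) + Num(E_CHK)), computed over parallel chunked
    sentence pairs. `pairs` is a list of
    (chinese_chunks, english_chunks) tuples."""
    c_cnt, e_cnt, co_cnt = Counter(), Counter(), Counter()
    for c_chunks, e_chunks in pairs:
        c_cnt.update(set(c_chunks))          # Num(C_CHK)
        e_cnt.update(set(e_chunks))          # Num(E_CHK)
        for c in set(c_chunks):              # co-occurrence counts
            for e in set(e_chunks):
                co_cnt[c, e] += 1
    return {(c, e): 2 * n / (c_cnt[c] + e_cnt[e])
            for (c, e), n in co_cnt.items()}
```

For each Chinese chunk, the English chunk with the highest θ in its row is taken as its one-to-one alignment.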
Experiments
Training data:
55,000 pairs of Chinese-English spoken-language parallel sentences
Test data:
400 pairs of Chinese-English spoken-language parallel sentences, chosen randomly from the same corpus.
These 400 sentence pairs were manually partitioned into monolingual chunks, and the corresponding bilingual chunks were then manually aligned, for computing the chunking and alignment accuracy.
Experiments
Evaluation:
Comparing the automatically obtained monolingual chunks and aligned bilingual chunks to the chunks discovered manually, we compute precision, recall and F-Measure by the following formulas:

precision = N_r / N_p × 100%
recall = N_r / N_a × 100%
F = (β² + 1) · precision · recall / ( β² · precision + recall )

(N_r: number of correct chunks; N_p: number of chunks produced; N_a: number of chunks in the manual answer)
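The three measures can be computed directly (N_r, N_p, N_a as in the formulas above; β = 1 gives the balanced F-Measure):

```python
def evaluate(n_right, n_produced, n_answer, beta=1.0):
    """precision = Nr/Np, recall = Nr/Na, and the beta-weighted
    F-Measure F = (beta^2 + 1)*P*R / (beta^2*P + R)."""
    p = n_right / n_produced
    r = n_right / n_answer
    f = (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)
    return p, r, f
```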
Experiments
Results:

The accuracy of chunking
Precision(%)  Recall(%)  F-Measure
77            65         0.70

The accuracy of alignment
Precision(%)  Recall(%)  F-Measure
89            72         0.80
Experiments
Comparison of chunk-based translation to word-based translation:

Systems      BLEU     NIST
Word-based   0.259    2.661
Chunk-based  0.290    2.921
Improvement  +0.031   +0.260

The improvement is about 10%.
Conclusions
This chunking and alignment algorithm doesn’t rely on the information from tagging, parsing or syntax analysis, and doesn’t even require sentence segmentation.
It obtains accurate one-to-one alignment of chunks.
It greatly decreases search space and time complexity during translation.
Its performance is better than the baseline word-alignment system (on some tasks).
Problem / Weakness
The authors didn’t discuss weaknesses themselves.
Maybe we can make some improvements at:
The maximum matching step
The step of building position vectors