Multi-Layer Filtering algorithm

Bilingual Chunk Alignment in Statistical Machine Translation

An introduction to the Multi-Layer Filtering (MLF) algorithm

Dawei Hou

LING 575 MT WIN07

What is a “chunk” here?

In this paper:

A “chunk” does not rely on information from tagging, parsing, syntactic analysis or segmentation.

A “chunk” is a contiguous sequence of words.

Why do we use chunks in translation?

Chunk-based translation can lead to more fluent output, because it captures local reordering phenomena.

Chunking makes long sentences shorter, which benefits the performance of SMT algorithms.

It yields an accurate one-to-one alignment for each pair of bilingual chunks.

It greatly decreases the search space and time complexity during translation.

What about other approaches?

What about word-based translations?

Some background

SMT systems employ word-based alignment models built on the five word-based statistical models proposed by IBM.

Problem:

These systems still perform poorly on language pairs with large structural differences, because the models fundamentally rely on word-level translation.

Some background

Other alignment algorithms are based on phrases, chunks or structures, and most of them rely on complex syntactic information.

Problem:

They have been shown to perform poorly on long sentences.

They depend heavily on the performance of associated tools such as parsers and POS taggers.

How can chunk-based translation improve on these problems?

Multi-Layer Filtering algorithm

Goal: to discover one-to-one pairs of bilingual chunks in untagged, well-formed bilingual sentence pairs.

Multiple layers are used to extract bilingual chunks according to different features of the chunks in the bilingual corpus.

Summary of procedures

Filtering the most frequent chunks

Clustering the similar words and filtering the most frequent structures

Dealing with the remnant fragments

Keeping one-to-one alignment

Filtering the most frequent chunks -- Step 1

Assumption:

Word sequences that co-occur most often are potential chunks.

Applying formula-1 below, we filter those word sequences as initial monolingual chunks:

$$D_k = D(w_1, w_2, \ldots, w_k) = (1 - \lambda)\, MI(w_1, w_2, \ldots, w_k) + \lambda\, P(w_1, w_2, \ldots, w_k)$$ (formula-1)

$$MI(w_1, w_2, \ldots, w_k) = P(w_1, w_2, \ldots, w_k) \log \frac{P(w_1, w_2, \ldots, w_k)}{P(w_1)\, P(w_2) \cdots P(w_k)}$$ (formula-2)

($\lambda$ is used here for the weighting coefficient in formula-1.)
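As a rough illustration of the two formulas, the sketch below computes the mutual-information term of formula-2 and then a cohesion degree, assuming formula-1 mixes the mutual information and the joint probability with a single weight. The parameter name `lam`, its default of 0.5, the natural logarithm and the probability values are all assumptions made for the example, not values from the paper.

```python
import math
from typing import Sequence


def mutual_information(p_joint: float, p_words: Sequence[float]) -> float:
    """formula-2: P(w1..wk) * log(P(w1..wk) / (P(w1) * ... * P(wk)))."""
    return p_joint * math.log(p_joint / math.prod(p_words))


def cohesion_degree(p_joint: float, p_words: Sequence[float], lam: float = 0.5) -> float:
    """Assumed reading of formula-1: (1 - lam) * MI + lam * P.

    `lam` is a hypothetical name and value for the weighting coefficient.
    """
    return (1.0 - lam) * mutual_information(p_joint, p_words) + lam * p_joint


# Made-up probabilities: two words that co-occur exactly as often as chance predicts,
# and two words that co-occur five times more often than chance.
independent = cohesion_degree(0.01, [0.1, 0.1])
cohesive = cohesion_degree(0.05, [0.1, 0.1])
```

With these toy numbers the pair that co-occurs more often than chance gets the higher cohesion degree, which is the behaviour the filtering step relies on.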

The result of Filtering Step 1

An example (each value is the cohesion degree of two adjacent words):

What || kind || of || room || do || you || want || to || reserve

1.36 1.31 0.046 0.063 10.07 0.61 2.11 0.077

你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间

0.69 0.17 1.39 0.076 7.80 0.87 0.30 1.27 4.52

Filtering the most frequent chunks -- Step 2

Now we have:

All the cohesion degrees between any two adjacent words in the source and target sentences.

Applying formula-3 below, we find the entire set of initial monolingual chunks:

$$n = \mathrm{int}\left\{ \frac{\text{length of a sentence}}{\text{the maximum length of a chunk}} \right\}$$ (formula-3)

The result of Filtering Step 2-1

What || kind || of || room || do || you || want || to || reserve

1.36 1.31 0.046 0.063 10.07 0.61 2.11 0.077

你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间

0.69 0.17 1.39 0.076 7.80 0.87 0.30 1.27 4.52

In this case: n = int{ 10/4 } = 2

The result of Filtering Step 2-(1)-EN

Now we get a table of the initial monolingual chunks:

Initial chunk          D_k      D_k*
What kind              1.36     1.36
What kind of           2.10     5.25
Kind of                1.31     1.31
Do you                 10.07    10.07
Do you want            0.31     0.77
Do you want to         0.13     0.90
You want               0.61     0.61
You want to            0.33     0.82
You want to reserve    0.086    0.60
Want to                2.11     2.11
Want to reserve        0.056    0.14
To reserve             0.077    0.077

$$D_k^{*} = \frac{\max(D_2)}{\max(D_k)}\, D_k$$ (formula-4)
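The rescaling in formula-4 can be sketched as a one-line function: a k-word cohesion degree is multiplied by the ratio of the largest two-word cohesion degree to the largest k-word cohesion degree. The slides do not give the maxima behind the table, so `max_d2` and `max_dk` below are hypothetical values chosen only to show the arithmetic.

```python
def normalize_cohesion(d_k: float, max_d2: float, max_dk: float) -> float:
    """formula-4: D_k* = D_k * Max(D_2) / Max(D_k)."""
    return d_k * max_d2 / max_dk


# Hypothetical maxima (not taken from the slides).
trigram_scaled = normalize_cohesion(2.10, max_d2=10.0, max_dk=4.0)

# For two-word chunks the two maxima coincide, so the value is unchanged.
bigram_scaled = normalize_cohesion(1.36, max_d2=10.0, max_dk=10.0)
```

This is consistent with the table, where every two-word chunk has D_k* equal to D_k while longer chunks are scaled up.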

The result of Filtering Step 2-(2)-EN

Setting the threshold D_k* > 1.0, we get:

Initial chunk          D_k      D_k*     Kept
What kind              1.36     1.36     yes
What kind of           2.10     5.25     yes
Kind of                1.31     1.31     yes
Do you                 10.07    10.07    yes
Do you want            0.31     0.77     no
Do you want to         0.13     0.90     no
You want               0.61     0.61     no
You want to            0.33     0.82     no
You want to reserve    0.086    0.60     no
Want to                2.11     2.11     yes
Want to reserve        0.056    0.14     no
To reserve             0.077    0.077    no

We still need more steps for maximum matching and for discarding overlaps.
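The threshold step is a plain filter over the D_k* column. A minimal sketch using the English values from the table (the variable names are illustrative):

```python
# D_k* values of the English candidate chunks, copied from the table.
scaled_cohesion = {
    "What kind": 1.36,
    "What kind of": 5.25,
    "Kind of": 1.31,
    "Do you": 10.07,
    "Do you want": 0.77,
    "Do you want to": 0.90,
    "You want": 0.61,
    "You want to": 0.82,
    "You want to reserve": 0.60,
    "Want to": 2.11,
    "Want to reserve": 0.14,
    "To reserve": 0.077,
}

THRESHOLD = 1.0

# Keep only the candidates whose rescaled cohesion degree exceeds the threshold.
kept = [chunk for chunk, d_star in scaled_cohesion.items() if d_star > THRESHOLD]
```

The five surviving candidates are the ones carried into the maximum matching step.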

The result of Filtering Step 2-(3)-EN

Initial chunk    D_k      D_k*
What kind        1.36     1.36
What kind of     2.10     5.25
Kind of          1.31     1.31
Do you           10.07    10.07
Want to          2.11     2.11

According to the maximum matching principle, and to prevent overlapping chunks, we need to apply:

formula-4: $$\frac{D_k}{D_{k-1}}$$

formula-5: $$\frac{D_{K_i}}{D_{K_{i-1}}}$$

The result of Filtering Step 2-(4)-EN

Deal with the remnant fragments:

We simply combine the remaining individual or sequential words into chunks.

So we get the much shorter sentence below:

What & kind & of || room || do & you || want & to || reserve
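The chunked sentence above can be produced from the selected chunk spans with a small helper. This is only a sketch of the output notation: it assumes the selected chunks are given as non-overlapping index ranges, and it turns every leftover word into its own chunk, which is what the example output shows. The function name and the span representation are illustrative.

```python
from typing import List, Sequence, Tuple


def render_chunks(tokens: Sequence[str], spans: List[Tuple[int, int]]) -> str:
    """Join words inside a chunk with ' & ' and separate chunks with ' || '.

    `spans` holds non-overlapping half-open (start, end) index pairs of the
    selected chunks; any word outside them becomes a one-word chunk.
    """
    end_of_span = {start: end for start, end in spans}
    pieces = []
    i = 0
    while i < len(tokens):
        end = end_of_span.get(i, i + 1)  # leftover word -> single-word chunk
        pieces.append(" & ".join(tokens[i:end]))
        i = end
    return " || ".join(pieces)


english = "What kind of room do you want to reserve".split()
# Spans for "What kind of", "do you" and "want to".
rendered = render_chunks(english, [(0, 3), (4, 6), (6, 8)])
```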

The result of Filtering Step 2-(1)-CN

What || kind || of || room || do || you || want || to || reserve

1.36 1.31 0.046 0.063 10.07 0.61 2.11 0.077

你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间

0.69 0.17 1.39 0.076 7.80 0.87 0.30 1.27 4.52

In this case: n = int{ 10/4 } = 2

The result of Filtering Step 2-(2)-CN

Now we get a table of the initial monolingual chunks:

Initial chunk    D_k     D_k*
你想             0.69    0.69
预定             2.39    2.39
什么             7.80    7.80
什么样           0.44    1.00
什么样的         0.58    2.44
么样             0.87    0.87
么样的           0.37    0.84
么样的房         0.13    0.55
样的             0.30    0.30
样的房           0.13    0.30
样的房间         0.21    0.88
的房             1.27    1.27
的房间           2.45    5.88
房间             4.52    4.52

$$D_k^{*} = \frac{\max(D_2)}{\max(D_k)}\, D_k$$ (formula-4)

The result of Filtering Step 2-(3)-CN

Setting the threshold D_k* > 1.0, we get:

Initial chunk    D_k     D_k*    Kept
你想             0.69    0.69    no
预定             2.39    2.39    yes
什么             7.80    7.80    yes
什么样           0.44    1.00    no
什么样的         0.58    2.44    yes
么样             0.87    0.87    no
么样的           0.37    0.84    no
么样的房         0.13    0.55    no
样的             0.30    0.30    no
样的房           0.13    0.30    no
样的房间         0.21    0.88    no
的房             1.27    1.27    yes
的房间           2.45    5.88    yes
房间             4.52    4.52    yes

We still need more steps for maximum matching and for discarding overlaps.

The result of Filtering Step 2-(4)-CN

Initial chunk    D_k     D_k*
预定             2.39    2.39
什么             7.80    7.80
什么样的         0.58    2.44
的房             1.27    1.27
的房间           2.45    5.88
房间             4.52    4.52

According to the maximum matching principle, by applying formula-4:

$$\frac{D_k}{D_{k-1}}$$

max( D_什么样的 / D_什么样 , D_的房间 / D_房间 ) = max(2.44, 1.30) = 2.44

?
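The two ratios in the example can be checked numerically. The figures 2.44 and 1.30 come out when the D_k* column of the Chinese table is used for both the longer chunk and its shorter sub-chunk; that reading is inferred from the numbers and is not stated on the slide.

```python
# D_k* values copied from the Chinese table.
scaled = {"什么样": 1.00, "什么样的": 2.44, "房间": 4.52, "的房间": 5.88}

# Ratio of a k-word chunk to the (k-1)-word chunk it contains.
ratio_what_kind = scaled["什么样的"] / scaled["什么样"]
ratio_room = scaled["的房间"] / scaled["房间"]

best = max(ratio_what_kind, ratio_room)
```

The larger ratio belongs to 什么样的, which is the chunk that appears in the final segmentation.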

The result of Filtering Step 2-(5)-CN

Deal with the remnant fragments:

We simply combine the remaining individual or sequential words into chunks.

So we get the much shorter sentence below:

你 || 想 || 预 & 定 || 什 & 么 & 样 & 的 || 房 & 间

Some problems

After the first filtering process, suppose we have found an aligned chunk pair:

|| 在 & 五 & 点 ||

|| at & five & o’clock ||

But some potentially good chunks, like:

|| at & six & o’clock ||

might have been broken into several fragments, like:

|| at || six || o’clock ||

because this structure includes word sequences with a low frequency of occurrence (we suppose here that “six” is less frequent than “five”).

Clustering the similar words and filtering the most frequent structures

Many frequent chunks have similar structures but differ in detail.

We can cluster similar words according to the position vectors of their behavior relative to anchor words.

For all the words in the same class, we suppose they form good chunks, and then filter the most frequent structures according to the method introduced before.

Clustering the similar words and filtering the most frequent structures – Step 1

In the corpus resulting from the first filtering process, find the most frequent words and use them as anchor words, for example:

Rank    1      2    3     4       5      6     7     8     9     10
Word    the    a    to    this    for    in    on    of    at    room

Why do we use the most frequent words?

Because the anchor words are the most common words, a great deal of information can be obtained about them.

Words with similar position vectors relative to the anchor words can be assumed to belong to similar word classes.
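Picking anchor words amounts to a frequency count over the corpus. A minimal sketch using `collections.Counter`; the toy corpus, the lowercasing and the function name are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def anchor_words(sentences: Iterable[Sequence[str]], top_n: int = 10) -> List[str]:
    """Return the `top_n` most frequent words of a tokenized corpus."""
    counts = Counter(token.lower() for tokens in sentences for token in tokens)
    return [word for word, _ in counts.most_common(top_n)]


# Toy corpus, only to show the call.
corpus = [
    "the room in the hotel".split(),
    "a room in the city".split(),
    "the room".split(),
]
anchors = anchor_words(corpus, top_n=3)
```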

Clustering the similar words and filtering the most frequent structures – Step 2

Build word vectors and define the size of the observation window (in this case, window size = 5).

For instance, we build a word vector whose anchor word is “in”, and we count how often the candidate word “the” falls at each position within the window:

Size        5
Position    w-2    w-1    w     w+1    w+2
Word        the    the    in    the    the
Value       16     1      0     415    0

Formula-7,8:

$$V_{ij} = \sum_{k=1}^{N} \delta(w_j, w)$$

$$\delta(w_j, w) = \begin{cases} 1, & w_j = w \\ 0, & w_j \neq w \end{cases}$$
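One way to read the position vector is as a set of counters, one per window position, that record how often the candidate word appears at that offset from each occurrence of the anchor word. The sketch below follows that reading; the function name, the tokenized-sentence input and the toy corpus are assumptions, and the centre position (the anchor itself) is always left at 0 as in the example table.

```python
from typing import Iterable, List, Sequence


def position_vector(
    sentences: Iterable[Sequence[str]],
    anchor: str,
    candidate: str,
    window: int = 5,
) -> List[int]:
    """Count occurrences of `candidate` at each offset around `anchor`.

    For window=5 the result is ordered as [w-2, w-1, w, w+1, w+2].
    """
    half = window // 2
    counts = [0] * window
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != anchor:
                continue
            for offset in range(-half, half + 1):
                j = i + offset
                # Skip the anchor position and positions outside the sentence.
                if offset != 0 and 0 <= j < len(tokens) and tokens[j] == candidate:
                    counts[offset + half] += 1
    return counts


toy_corpus = [
    ["in", "the", "room"],
    ["the", "man", "in", "the", "room"],
]
vector = position_vector(toy_corpus, anchor="in", candidate="the")
```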

Clustering the similar words and filtering the most frequent structures – Step 3

In order to compare vectors fairly, these vectors must be normalized by formula-9 as follows:

$$V_{ij}^{*} = \frac{V_{ij}}{\sum_{j=1}^{m} V_{ij}}$$

Example : “in/that” and “in/this”
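The normalization divides each component of a position vector by the sum of its components, so that frequent and rare words become comparable. A short sketch applied to the “in”/“the” counts from Step 2; returning an all-zero vector for a word that never occurs in the window is a choice made here, not something the slides specify.

```python
from typing import List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """formula-9: divide every component by the sum of all components."""
    total = sum(vector)
    if total == 0:
        # Assumed handling for a word never seen near the anchor.
        return [0.0] * len(vector)
    return [value / total for value in vector]


# Counts for anchor "in" and candidate "the" from the Step 2 table.
normalized = normalize([16, 1, 0, 415, 0])
```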

Clustering the similar words and filtering the most frequent structures – Step 4

Measure the similarities of the various vectors and cluster the words that have similar distributions relative to the anchor words:

Euclidean distance:

$$D(V_x, V_y) = \sqrt{\sum_{j=1}^{K} (V_{xj} - V_{yj})^2}$$

Example result:

Word classes                                           Anchor words
Single double twin standard suite different quiet      (a, room)
the my your this that our                              (in, room)
America all fact Japan English                         (in, )
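The distance used for clustering is the ordinary Euclidean distance between two normalized position vectors. A minimal sketch; the clustering procedure itself (how distances are turned into word classes) is not described on the slide and is not attempted here.

```python
import math
from typing import Sequence


def euclidean_distance(vx: Sequence[float], vy: Sequence[float]) -> float:
    """Square root of the summed squared component differences."""
    if len(vx) != len(vy):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vx, vy)))


distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

Words whose vectors lie close together under this distance are candidates for the same word class.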

Clustering the similar words and filtering the most frequent structures – Step 5

Replace all the words in the same class with a particular symbol, and then treat this symbol as an ordinary word. Then filter the most frequent structures with the Multi-Layer Filtering algorithm again.

For instance, if we have:

|| 在 & 五 & 点 ||

|| at & five & o’clock ||

and the parallel word classes:

{ 一 , 二 ,…, 五 ..., 十二 } & { One, two,…, five..., twelve }

we will get:

|| 在 & 一 & 点 ||

|| at & one & o’clock ||

|| 在 & 两 & 点 ||

|| at & two & o’clock ||...
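The class substitution can be sketched as a token-level replacement: every member of a word class is mapped to one placeholder, so that “at five o'clock” and “at six o'clock” are counted as the same structure in the next filtering pass. The placeholder `<NUM>` and the class contents below are illustrative, not taken from the paper.

```python
from typing import List, Sequence, Set

# Hypothetical English number class.
NUMBER_CLASS = {
    "one", "two", "three", "four", "five", "six",
    "seven", "eight", "nine", "ten", "eleven", "twelve",
}


def replace_class(tokens: Sequence[str], word_class: Set[str], symbol: str = "<NUM>") -> List[str]:
    """Replace every token that belongs to `word_class` with `symbol`."""
    return [symbol if token.lower() in word_class else token for token in tokens]


five = replace_class(["at", "five", "o'clock"], NUMBER_CLASS)
six = replace_class(["at", "six", "o'clock"], NUMBER_CLASS)
```

After the substitution both phrases share one frequent pattern, so the rarer “six” no longer causes the chunk to be broken up.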

Next step:

Keeping one-to-one alignment

Keeping one-to-one alignment

Now we have a pair of new parallel sentences with chunks:

你 || 想 || 预 & 定 || 什 & 么 & 样 & 的 || 房 & 间

What & kind & of || room || do & you || want & to || reserve

Our purpose is to find a one-to-one chunk alignment, on the assumption that the chunks to be aligned occur almost equally often in the corresponding parallel texts.

Keeping one-to-one alignment

By applying formula-11, we can get an alignment table:

formula-11:

$$\theta = \frac{2 \times Num\{Co\text{-}occurrence(C\_CHK, E\_CHK)\}}{Num(C\_CHK) + Num(E\_CHK)}$$

θ               你        想        预定      什么样的    房间
What kind of    0.025     0.021     0.053     0.889*     0.016
Room            0.021     0.029     0.09      0.014      0.888*
Do you          0.460*    0.014     0.002     0.012      0.020
Want to         0.007     0.069*    0.013     0.002      0.023
reserve         0.002     0.001     0.083*    0.034      0.047

(* marks the highlighted value in each row, i.e. the aligned chunk pair.)
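Formula-11 has the shape of a Dice-style coefficient: twice the co-occurrence count divided by the sum of the two individual counts. The sketch below implements that reading and then picks, for each English chunk, the Chinese chunk with the highest θ in the table. Taking the row maximum is an assumption about how the table is used; it happens to give a one-to-one mapping for this example.

```python
def association(cooccurrence: int, count_c: int, count_e: int) -> float:
    """Assumed reading of formula-11: 2 * co-occurrences / (Num(C_CHK) + Num(E_CHK))."""
    return 2.0 * cooccurrence / (count_c + count_e)


# θ values copied from the alignment table.
theta = {
    "What kind of": {"你": 0.025, "想": 0.021, "预定": 0.053, "什么样的": 0.889, "房间": 0.016},
    "Room": {"你": 0.021, "想": 0.029, "预定": 0.09, "什么样的": 0.014, "房间": 0.888},
    "Do you": {"你": 0.460, "想": 0.014, "预定": 0.002, "什么样的": 0.012, "房间": 0.020},
    "Want to": {"你": 0.007, "想": 0.069, "预定": 0.013, "什么样的": 0.002, "房间": 0.023},
    "reserve": {"你": 0.002, "想": 0.001, "预定": 0.083, "什么样的": 0.034, "房间": 0.047},
}

# For every English chunk, choose the Chinese chunk with the largest θ.
alignment = {english: max(row, key=row.get) for english, row in theta.items()}
```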

Experiments

Training data:

55,000 pairs of Chinese-English spoken parallel sentences

Test data:

400 pairs of Chinese-English spoken parallel sentences, chosen randomly from the same corpus.

These 400 sentence pairs were manually partitioned into monolingual chunks, and the corresponding bilingual chunks were then manually aligned, in order to compute the chunking and alignment accuracy.

Experiments

Evaluation:

Comparing the automatically obtained monolingual chunks and aligned bilingual chunks with the chunks identified manually, we compute precision, recall and F-measure with the following formulas:

$$precision = \frac{N_r}{N_p} \times 100\%$$

$$recall = \frac{N_r}{N_a} \times 100\%$$

$$F = \frac{(\beta^2 + 1) \times precision \times recall}{\beta^2 \times precision + recall}$$
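The three measures can be sketched as follows. The slide does not define its count symbols, so the code assumes the usual reading: `n_correct` is the number of automatically found chunks that match the manual ones, `n_proposed` the number of chunks the system produced, and `n_reference` the number of manually identified chunks. The F-measure is written in its standard weighted form with `beta` defaulting to 1, which is also an assumption.

```python
from typing import Tuple


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted F-measure: (beta^2 + 1) * P * R / (beta^2 * P + R)."""
    return (beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)


def evaluate(n_correct: int, n_proposed: int, n_reference: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F) as fractions rather than percentages."""
    precision = n_correct / n_proposed
    recall = n_correct / n_reference
    return precision, recall, f_measure(precision, recall)


# Hypothetical counts, only to show the call.
p, r, f = evaluate(n_correct=50, n_proposed=100, n_reference=200)

# The precision/recall pairs reported in the results, as fractions.
chunking_f = f_measure(0.77, 0.65)
alignment_f = f_measure(0.89, 0.72)
```

With `beta = 1`, the reported precision and recall values give F-measures that round to the reported 0.70 and 0.80.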

Experiments

Results:

The accuracy of chunking

Precision(%)    Recall(%)    F-Measure
77              65           0.70

The accuracy of alignment

Precision(%)    Recall(%)    F-Measure
89              72           0.80

Experiments

Comparison of chunk-based translation with word-based translation:

Systems        BLEU       NIST
Word-based     0.259      2.661
Chunk-based    0.290      2.921
Improvement    + 0.031    + 0.260

The improvement is about 10%.
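The "about 10%" figure can be checked as a relative gain over the word-based baseline, computed from the scores in the table:

```python
word_based = {"BLEU": 0.259, "NIST": 2.661}
chunk_based = {"BLEU": 0.290, "NIST": 2.921}

# Relative improvement over the word-based system, in percent.
relative_gain = {
    metric: (chunk_based[metric] - word_based[metric]) / word_based[metric] * 100
    for metric in word_based
}
```

The relative gain is roughly 12% for BLEU and a little under 10% for NIST.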

Conclusions

This chunking and alignment algorithm doesn’t rely on information from tagging, parsing or syntactic analysis, and doesn’t even require sentence segmentation.

It obtains an accurate one-to-one alignment of chunks.

It greatly decreases the search space and time complexity during translation.

Its performance is better than that of the baseline word alignment system (in some tasks).

Problem / Weakness

The authors didn’t discuss any.

Maybe some improvement is possible in:

The maximum matching step

The step of building position vectors