institute of computing technology forest-to-string statistical translation rules yang liu, qun liu,...

64
INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese Academy of Sciences

Upload: daniel-robertson

Post on 05-Jan-2016

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INS

TIT

UTE O

F C

OM

PU

TIN

G

TEC

HN

OLO

GY

Forest-to-String Statistical

Translation RulesYang Liu, Qun Liu, and Shouxun

LinInstitute of Computing Technology

Chinese Academy of Sciences

Page 2: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Outline

Introduction Forest-to-String Translation Rules Training Decoding Experiments Conclusion

Page 3: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Syntactic and Non-syntactic Bilingual Phrases

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

syntacticnon-syntactic

Page 4: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Importance of Non-syntactic Bilingual Phrases

About 28% of bilingual phrases are non-syntactic on a English-Chinese corpus (Marcu et al., 2006).

Requiring bilingual phrases to be syntactically motivated will lose a good amount of valuable knowledge (Koehn et al., 2003).

Keeping the strengths of phrases while incorporating syntax into statistical translation results in significant improvements (Chiang, 2005) .

Page 5: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Previous Work

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

Galley et al., 2004

Page 6: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Previous Work

NPB

DT JJ NN

the mutual understanding

THE MUTUAL UNDERSTANDING

DT JJ

the mutual

THE MUTUAL

*NPB_*NN

NPB

*NPB_*NN NN

Marcu et al., 2006

Page 7: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Previous Work

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

Liu et al., 2006

BUSH PRESIDENT

President Bush

NR NN

NP

President Bush

Page 8: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Our Work

We augment the tree-to-string translation model with forest-to-string rules that capture non-

syntactic phrase pairs auxiliary rules that help integrate forest-

to-string rules into the tree-to-string model

Page 9: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Outline

Introduction Forest-to-String Translation Rules Training Decoding Experiments Conclusion

Page 10: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Tree-to-String Rules

GUNMAN

NN

the gunman

VP

SB VP

WAS NP VV

NN KILLED

was killed by

IP

NP VP PU

Page 11: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Derivation

A derivation is a left-most composition of translation rules that explains how a source parse tree, a target sentence, and the word alignment between them are synchronously generated.IP

NP VP PU

NN

GUNMAN

the gunman

SB VP

WAS NP VV

NN KILLED

was killed by

POLICE

police .

.

Page 12: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs

Page 13: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs

IP

NP VP PU

IP

NP VP PU

Page 14: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs

IP

NP VP PU

GUNMAN

NN

the gunman

NP

NN

GUNMAN

the gunman

Page 15: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs

IP

NP VP PU

NN

GUNMAN

thegunman

VP

SB VP

WAS NP VV

NN KILLED

was killed by

SB VP

WAS NP VV

NN KILLED

was killed by

Page 16: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs

IP

NP VP PU

NN

GUNMAN

the gunman

police

SB VP

WAS NP VV

NN KILLED

was killed by

NN

POLICE

POLICE

police

Page 17: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs

.

PU

.

IP

NP VP PU

NN

GUNMAN

the gunman

SB VP

WAS NP VV

NN KILLED

was killed by

POLICE

police .

.

Page 18: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Forest-to-String and Auxiliary Rules

NN

GUNMAN

the gunman

SB

WAS

was

NP

forest = tree sequence !

IP

NP VP PU

SB VP

care about only root sequence while incorporating forest rules

Page 19: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs, FRs, and ARs

Page 20: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs, FRs, and ARs

IP

NP VP PU

SB VP

IP

NP VP PU

SB VP

Page 21: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs, FRs, and ARs

IP

NP VP PU

SB VP

NN

GUNMAN

the gunman

SB

WAS

was

NP

NN

GUNMAN WAS

the gunman was

Page 22: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs, FRs, and ARs

NP

killed by

PU

.

.

VP

VV

KILLED

IP

NP VP PU

NN

GUNMAN

the gunman

SB VP

WAS NP VV

KILLED

was killed by .

.

Page 23: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

A Derivation Composed of TRs, FRs, and ARs

IP

NP VP PU

NN

GUNMAN

the gunman

SB VP

WAS NP VV

NN KILLED

was killed by

POLICE

police .

.

police

NN

POLICE

NP

Page 24: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Outline

Introduction Forest-to-String Translation Rules Training Decoding Experiments Conclusion

Page 25: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Training

Extract both tree-to-string and forest-to-string rules from word-aligned, source-side parsed bilingual corpus

Bottom-up strategy Auxiliary rules are NOT learnt from

real-world data

Page 26: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An Example

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

NR

BUSH

Bush

Page 27: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An Example

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

NN

PRESIDENT

President

Page 28: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An Example

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

VV

MADE

made

Page 29: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An Example

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

NN

SPEECH

speech

Page 30: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An Example

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

NP

NR

President

NN

BUSH PRESIDENT

Bush

NP

NR

President

NN

PRESIDENT

NP

NR NN

BUSH

Bush

NP

NR NN

Page 31: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An Example

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

Page 32: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An Example

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

VP

VV NN

a

VP

VV NN

made a

MADE

VP

VV NN

a speech

SPEECH

VP

VV NN

made a speech

MADE SPEECH

Page 33: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An Example

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

NP VV NP

VVNR NN

NP

VVNR NN

MADEBUSH PRESIDENT

madePresident Bush

10 FRs

Page 34: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An Example

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

Page 35: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An Example

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

President Bush made a speech

NP

NP VP

max_height = 2

Page 36: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Why We Don’t Extract Auxiliary Rules ?

IP

NP VP-B

NP-B NP-B

NR NR NN CC NN NN

SHANGHAI PUDONG DEVE WITH LEGAL ESTAB

VV

STEP

The development of Shanghai ‘s Pudong is in step with the establishment of its legal system

Page 37: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Outline

Introduction Forest-to-String Translation Rules Training Decoding Experiments Conclusion

Page 38: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Decoding

Input: a source parse treeOutput: a target sentence

Bottom-up strategy Build auxiliary rules while decoding Compute subcell divisions for building

auxiliary rules

Page 39: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An ExampleNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

NR

BUSH

Bush

( NR BUSH ) ||| Bush ||| 1:1

Rule

Derivation

Translation

Bush

Page 40: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An ExampleNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

NN

PRESIDENT

President

( NN PRESIDENT ) ||| President ||| 1:1

Rule

Derivation

Translation

President

Page 41: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An ExampleNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

VV

MADE

made

( VV MADE ) ||| made ||| 1:1

Rule

Derivation

Translation

made

Page 42: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An ExampleNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

NN

SPEECH

speech

( NN SPEECH ) ||| speech ||| 1:1

Rule

Derivation

Translation

speech

Page 43: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An ExampleNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

NP

( NP ( NR ) ( NN ) ) ||| X1 X2 ||| 1:2 2:1( NR BUSH ) ||| Bush ||| 1:1

( NN PRESIDENT ) ||| President ||| 1:1

Rule

Derivation

Translation

President Bush

NR NN

Page 44: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An ExampleNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

Rule

Derivation

Translation

Page 45: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An ExampleNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

VP

( VP ( VV MADE ) ( NN ) ) ||| made a X ||| 1:1 2:3( NN SPEECH ) ||| speech ||| 1:1

Rule

Derivation

Translation

made a speech

VV NN

MADE

made a

Page 46: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An ExampleNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

NP

( NP ( NN ) ( NN PRESIDENT ) ) ( VV MADE ) ||| President X made a ||| 1:2 2:1 3:3

( NR BUSH ) ||| Bush ||| 1:1

Rule

Derivation

Translation

President Bush made a

NR NN VV

PRESIDENTMADE

President made a

Page 47: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An ExampleNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

Rule

Derivation

Translation

Page 48: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

An ExampleNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

Rule

Derivation

Translation

NP

NP VP

VV NN

( NP ( NP ) ( VP ( VV ) ( NN ) ) ) ||| X1 X2 || 1:1 2:1 3:2( NP ( NN ) ( NN PRESIDENT ) ) ( VV MADE )

||| President X made a ||| 1:2 2:1 3:3( NR BUSH ) ||| Bush ||| 1:1

( NN SPEECH ) ||| speech ||| 1:1

President Bush made a speech

Page 49: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Subcell DivisionNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH1:1 2:2 3:3 4:4

Page 50: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Subcell DivisionNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

1:3 4:4

Page 51: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Subcell DivisionNP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

1:41:1 2:41:2 3:41:3 4:4

1:1 2:2 3:41:1 2:3 4:41:2 3:3 4:4

1:1 2:2 3:3 4:4

2^(n-1)

Page 52: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Build Auxiliary Rule

NP

NP VP

NR NN VV NN

BUSH PRESIDENT MADE SPEECH

NP

NP VP

NR NN VV NN

Page 53: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Penalize the Use of FRs and ARs

Auxiliary rules, which are built rather than learnt, have no probabilities.

We introduce a feature that sums up the node count of auxiliary rules to balance the preference between conventional tree-to-string rules new forest-to-string and auxiliary rules

Page 54: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Outline

Introduction Forest-to-String Translation Rules Training Decoding Experiments Conclusion

Page 55: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Experiments

Training corpus: 31,149 sentence pairs with 843K Chinese words and 949K English words

Development set: 2002 NIST Chinese-to-English test set (571 of 878 sentences)

Test set: 2005 NIST Chinese-to-English test set (1,082 sentences)

Page 56: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Tools

Evaluation: mteval-v11b.pl Language model: SRI Language Modeling

Toolkits (Stolcke, 2002) Significant test: Zhang et al., 2004 Parser: Xiong et al., 2005 Minimum error rate training: optimizeV5

IBMBLEU.m (Venugopal and Vogel, 2005)

Page 57: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Rules Used in Experiments

Rule L P U Total

BP251, 173

0 0 251,173

TR 56, 983 41, 027 3, 529101, 539

FR 16, 609254, 346

25, 051296, 006

Page 58: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Comparison

System Rule Set BLEU4

Pharaoh BP0.2182±0.0089

Lynx

BP0.2059±0.0083

TR0.2302±0.0089

TR + BP0.2346±0.0088

TR + FR + AR0.2402±0.0087

Page 59: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

TRs Are Still Dominant

To achieve the best result of 0.2402, Lynx made use of: 26, 082 tree-to-string rules 9,219 default rules 5,432 forest-to-string rules 2,919 auxiliary rules

Page 60: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Effect of Lexicalization

Forest-to-String Rule Set

BLEU4

None0.2225±0.0085

L0.2297± 0.0081

P0.2279±0.0083

U0.2270±0.0087

L + P + U0.2312±0.0082

Page 61: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Outline

Introduction Forest-to-String Translation Rules Training Decoding Experiments Conclusion

Page 62: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Conclusion

We augment the tree-to-string translation model with forest-to-string rules that capture non-

syntactic phrase pairs auxiliary rules that help integrate forest-to-

string rules into the tree-to-string model Forest and auxiliary rules enable tree-to-

string models to derive in a more general way and bring significant improvement.

Page 63: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Future Work

Scale up to large data Further investigation in auxiliary rules

Page 64: INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese

INSTITUTE OF COMPUTING

TECHNOLOGY

Thanks!