PFA Node Alignment Algorithm

Upload: tia

Post on 22-Feb-2016

Page 1: PFA Node Alignment Algorithm

PFA Node Alignment Algorithm

Consider the parse trees of a Chinese-English parallel pair of sentences.

Page 2: PFA Node Alignment Algorithm

Each node in both trees stores a value.

All nodes are initialized with the value 1.

Each word-to-word alignment is assigned a unique prime number.
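
A minimal Python sketch of the prime assignment, assuming alignments are given as hypothetical (source index, target index) pairs; the helper names `first_primes` and `assign_primes` are illustrative, not from the slides. Trial division is enough here because a sentence pair needs only as many primes as it has alignment links.

```python
from itertools import count


def first_primes(n: int) -> list:
    """Return the first n primes using trial division (fine for sentence-sized n)."""
    primes = []
    for candidate in count(2):
        if len(primes) == n:
            break
        # candidate is prime if no already-found prime p with p*p <= candidate divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
    return primes


def assign_primes(links):
    """Give each word-to-word alignment link its own prime, in list order."""
    return dict(zip(links, first_primes(len(links))))


# Hypothetical links: source word 0 <-> target word 0, source word 1 <-> target word 1
print(assign_primes([(0, 0), (1, 1)]))  # {(0, 0): 2, (1, 1): 3}
```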

Page 3: PFA Node Alignment Algorithm

For every word-to-word alignment, we do the following:

• Let p be the unique prime value assigned to the alignment.

• Let w_s and w_t be the aligned words on the source and target side.

• Assign the value p to the nodes corresponding to the words w_s and w_t.

• Example: “Australia” gets value 2, “is” gets value 3.
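
The leaf-initialization step can be sketched as follows, assuming (hypothetically) that words are identified by their position in each sentence and that `prime_of` maps each one-to-one link to its prime. The example reproduces the values for “Australia” and “is” together with their Chinese counterparts 澳洲 and 是.

```python
def word_values(n_src, n_tgt, prime_of):
    """Start every word node at 1, then give both words of each link that link's prime."""
    src = [1] * n_src
    tgt = [1] * n_tgt
    for (s, t), p in prime_of.items():
        src[s] = p  # source-side word w_s
        tgt[t] = p  # target-side word w_t
    return src, tgt


# 澳洲 是 / Australia is: link (0, 0) gets prime 2, link (1, 1) gets prime 3
print(word_values(2, 2, {(0, 0): 2, (1, 1): 3}))  # ([2, 3], [2, 3])
```

Words with no alignment keep the initial value 1, so they never change a product later on.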

Page 4: PFA Node Alignment Algorithm

One-to-many alignments are treated as multiple one-to-one alignments, and all of these alignments are given the same prime value.

Example: “North Korea” corresponds to a single word on the Chinese side, 北韩. That word takes part in two alignments (to “North” and to “Korea”), both of which get the prime 5, so its value is the product 5 * 5 = 25.
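
A sketch of how one-to-many alignments can be folded in, assuming each alignment arrives as a hypothetical group (source words, target words) in which one side holds a single word. Words are keyed by their string only to keep the example short; a real implementation would key by position, since words can repeat. Each group is expanded into one-to-one links that share the group's prime, and every word multiplies in the prime of each link it takes part in, which yields 5 * 5 = 25 for 北韩.

```python
from collections import defaultdict


def leaf_values(groups, primes):
    """Word-node values when alignments may be one-to-many.

    groups: list of (source words, target words); one side has a single word.
    primes: one prime per group; all links expanded from a group share it.
    """
    src = defaultdict(lambda: 1)  # every word node starts at 1
    tgt = defaultdict(lambda: 1)
    for (src_words, tgt_words), p in zip(groups, primes):
        for s in src_words:
            for t in tgt_words:  # each (s, t) pair is one one-to-one link
                src[s] *= p
                tgt[t] *= p
    return dict(src), dict(tgt)


groups = [(["澳洲"], ["Australia"]), (["是"], ["is"]), (["北韩"], ["North", "Korea"])]
src, tgt = leaf_values(groups, [2, 3, 5])
print(src)  # {'澳洲': 2, '是': 3, '北韩': 25}
print(tgt)  # {'Australia': 2, 'is': 3, 'North': 5, 'Korea': 5}
```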

Page 5: PFA Node Alignment Algorithm

Once all the lexical items have values, we propagate the values up the tree as follows:

• Work bottom-up.

• A node updates its value to the product of the values of its children.

• Values could become large!
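
The bottom-up pass is a post-order traversal. In this sketch `Node` is a hypothetical minimal tree type (label, children, the word for lexical nodes, and the current value), and word nodes are assumed to already carry their alignment values. Python integers have arbitrary precision, so large products do not overflow here; a language with fixed-width integers would need a big-integer type.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import prod
from typing import Optional


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    word: Optional[str] = None  # set only on lexical (word-level) nodes
    value: int = 1              # all nodes are initialized with 1


def propagate(node: Node) -> int:
    """Set each internal node's value to the product of its children's values."""
    if node.children:
        # children are processed first, so the traversal works bottom-up
        node.value = prod(propagate(child) for child in node.children)
    return node.value


# English NP "North Korea": both words carry prime 5
north_korea = Node("NP", [Node("NNP", word="North", value=5),
                          Node("NNP", word="Korea", value=5)])
print(propagate(north_korea))  # 25
```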

Page 7: PFA Node Alignment Algorithm

Once all nodes have values, they can be aligned as follows:

• If a node on the Chinese side has the same value as a node on the English side, align them.

• If two nodes on the same side have equal values (this happens, for example, when a node has only one child), take the node at the lowest level in the tree, but not the lexical-level node.
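
A sketch of the matching rule, again with a hypothetical minimal `Node` type. Two details are assumptions rather than something the slides state: lexical nodes are taken to be the part-of-speech nodes that carry a word (so an NP over a single word is the node that gets aligned, as in the NP # NP :: 澳洲 # Australia example), and nodes with value 1, which span no aligned word, are never aligned. The trees and primes in the usage example are simplified and invented for illustration; they are not the parses from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import prod
from typing import Optional


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    word: Optional[str] = None  # set only on lexical (word-level) nodes
    value: int = 1


def leaf(tag: str, word: str, value: int) -> Node:
    return Node(tag, word=word, value=value)


def propagate(node: Node) -> int:
    if node.children:
        node.value = prod(propagate(c) for c in node.children)
    return node.value


def lowest_by_value(root: Node) -> dict:
    """Map each value to the deepest non-lexical node that carries it."""
    best = {}  # value -> (depth, node)
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        stack.extend((child, depth + 1) for child in node.children)
        if node.word is not None or node.value == 1:
            continue  # skip lexical nodes and nodes spanning no aligned word (assumption)
        if node.value not in best or depth > best[node.value][0]:
            best[node.value] = (depth, node)
    return {value: node for value, (_, node) in best.items()}


def align_nodes(src_root: Node, tgt_root: Node) -> list:
    """Pair source and target nodes whose propagated values are equal."""
    propagate(src_root)
    propagate(tgt_root)
    src, tgt = lowest_by_value(src_root), lowest_by_value(tgt_root)
    return [(src[v], tgt[v]) for v in sorted(src) if v in tgt]


# Hypothetical primes: 北韩-North/Korea 5 (so 北韩 = 25), 邦交-diplomatic/relations 7
# (so 邦交 = 49), 有-have 11, 与-with 13
zh = Node("VP", [
    Node("PP", [leaf("P", "与", 13), Node("NP", [leaf("NR", "北韩", 25)])]),
    Node("VP", [leaf("VE", "有", 11), Node("NP", [leaf("NN", "邦交", 49)])]),
])
en = Node("VP", [
    leaf("VBP", "have", 11),
    Node("NP", [leaf("JJ", "diplomatic", 7), leaf("NNS", "relations", 7)]),
    Node("PP", [leaf("IN", "with", 13),
                Node("NP", [leaf("NNP", "North", 5), leaf("NNP", "Korea", 5)])]),
])
for s, t in align_nodes(zh, en):
    print(s.label, "#", t.label, s.value)
# NP # NP 25
# NP # NP 49
# PP # PP 325
# VP # VP 175175
```

The two PPs are paired even though the PP comes first in the Chinese VP and last in the English one, while the Chinese VP over 有 邦交 (value 539) stays unaligned because no English node has that value.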

Page 9: PFA Node Alignment Algorithm

Features of the algorithm:

1. Order of the constituents does not matter in node alignment.

2. Constituents may contain extra (unaligned) words, but the alignment with the fewest extra words is preferred.
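
Both features follow from using products. Multiplication is commutative, so the order of the words inside a constituent does not affect its value; an unaligned word keeps its initial value 1, so adding it leaves the product unchanged. That second point is why a larger constituent differing only by unaligned words can tie with a smaller one, and taking the lowest matching node keeps the extra words to a minimum. In the check below, 北韩 = 5 * 5 follows the slides; the other leaf values are hypothetical.

```python
from math import prod

# Hypothetical leaf values for 与 北韩 有 邦交 / have diplomatic relations with North Korea
chinese = {"与": 13, "北韩": 25, "有": 11, "邦交": 49}
english = {"have": 11, "diplomatic": 7, "relations": 7, "with": 13, "North": 5, "Korea": 5}

zh_value = prod(chinese.values())
en_value = prod(english.values())
print(zh_value, en_value)  # 175175 175175

# 1. Word order does not matter: reversing the English words gives the same product.
assert prod(reversed(list(english.values()))) == en_value
# 2. An unaligned word (value 1, e.g. a hypothetical extra particle) leaves the value unchanged.
assert prod([*chinese.values(), 1]) == zh_value
```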

Page 10: PFA Node Alignment Algorithm

Extraction of Phrases:

Take the yields (the words spanned) of each pair of aligned nodes and build a phrase table tagged with syntactic categories on the source and target sides!

Example:

NP # NP :: 澳洲 # Australia
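
A sketch of the extraction step, assuming a hypothetical minimal tree type with words stored on lexical nodes; `yield_of` and `phrase_entry` are illustrative names, and the output mimics the `SRC # TGT :: source words # target words` format used here. Words are joined with spaces, as in the segmented Chinese phrases in the list of extracted phrases.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    word: Optional[str] = None  # set only on lexical (word-level) nodes


def yield_of(node: Node) -> list:
    """The words a node spans, left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in yield_of(child)]


def phrase_entry(src: Node, tgt: Node) -> str:
    """Format an aligned node pair as 'SRC # TGT :: source yield # target yield'."""
    src_words = " ".join(yield_of(src))
    tgt_words = " ".join(yield_of(tgt))
    return f"{src.label} # {tgt.label} :: {src_words} # {tgt_words}"


zh = Node("NP", [Node("NR", word="澳洲")])
en = Node("NP", [Node("NNP", word="Australia")])
print(phrase_entry(zh, en))  # NP # NP :: 澳洲 # Australia
```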

Page 11: PFA Node Alignment Algorithm

All phrases extracted from this tree pair:

1. IP # S :: 澳洲 是 与 北韩 有 邦交 的 少数 国家 之一 。 # Australia is one of the few countries that have diplomatic relations with North Korea .

2. VP # VP :: 是 与 北韩 有 邦交 的 少数 国家 之一 # is one of the few countries that have diplomatic relations with North Korea

3. NP # NP :: 与 北韩 有 邦交 的 少数 国家 之一 # one of the few countries that have diplomatic relations with North Korea

4. VP # VP :: 与 北韩 有 邦交 # have diplomatic relations with North Korea

5. NP # NP :: 邦交 # diplomatic relations

6. NP # NP :: 北韩 # North Korea

7. NP # NP :: 澳洲 # Australia

Page 12: PFA Node Alignment Algorithm

PFA Node Alignment Performance

• If the data is manually word-aligned, the word-alignment error rate is very small, and so is the PFA node-alignment error rate.

• What happens when word-alignments are done automatically?

Page 13: PFA Node Alignment Algorithm

• Evaluation data: Treebank corpus
  – Parallel Chinese-English Treebank with manual word alignments
  – 3342 sentence pairs

• Node alignments: 39874 (about 12 per tree pair)

• NP-to-NP alignments: 5427 (makes a good phrase table!)

• With the manual alignments as the gold standard, the evaluation was done using automatic word alignments.
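
The slides do not spell out how precision and recall are computed. A common set-based definition, assumed here, treats each node alignment as a pair and compares the pairs obtained from automatic word alignments against those obtained from the manual ones. Identifying a node by its (start, end) word span is likewise an assumption made for this sketch, and the spans below are made up.

```python
def precision_recall(predicted, gold):
    """Set-based precision and recall of predicted node alignments against gold ones."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical node alignments: ((source span), (target span)), spans as (start, end)
gold = {((0, 1), (0, 1)), ((2, 4), (3, 8)), ((3, 4), (5, 7))}
predicted = {((0, 1), (0, 1)), ((2, 4), (3, 8)), ((1, 2), (1, 2)), ((4, 5), (8, 9))}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```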

Page 14: PFA Node Alignment Algorithm

Viterbi Combination Strategy    Precision   Recall
Intersection                    0.6278      0.5525
Union                           0.8054      0.2778
Sym-1 (Thot Toolkit)            0.7182      0.4525
Sym-2 (Thot Toolkit)            0.7170      0.4602
Grow-Diag-Final (Pharaoh)       0.4040      0.0250

Viterbi word alignments in the Chinese-to-English and reverse directions were merged using different algorithms to test the performance of node alignment.
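
Intersection and union, the two simplest strategies in the table, are plain set operations once both directional Viterbi alignments are written as sets of (Chinese index, English index) links; the reverse-direction alignment has to be flipped first. The Thot symmetrizations and Pharaoh's grow-diag-final are more involved heuristics and are not reproduced here. The function name `combine` and the link sets below are made up for illustration.

```python
def combine(zh_to_en, en_to_zh):
    """Intersection and union of two directional word alignments.

    zh_to_en holds (zh, en) links; en_to_zh holds (en, zh) links and is flipped first.
    """
    forward = set(zh_to_en)
    backward = {(zh, en) for en, zh in en_to_zh}
    return forward & backward, forward | backward


zh_to_en = {(0, 0), (1, 2), (2, 3)}
en_to_zh = {(0, 0), (1, 1), (3, 2), (4, 2)}  # (en, zh) pairs
inter, union = combine(zh_to_en, en_to_zh)
print(sorted(inter))  # [(0, 0), (2, 3)]
print(sorted(union))  # [(0, 0), (1, 1), (1, 2), (2, 3), (2, 4)]
```

The intersection keeps only links both directions agree on, while the union keeps every link proposed by either direction.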