Semantic Evaluation of Machine Translation
Billy Wong, City University of Hong Kong
21st May 2010



Page 1: Semantic Evaluation of Machine Translation

Semantic Evaluation of Machine Translation

Billy Wong, City University of Hong Kong, 21st May 2010

Page 2: Semantic Evaluation of Machine Translation

Introduction

Surface text similarity is not a reliable indicator in automatic MT evaluation
  Insensitive to variation of translation
Deeper linguistic analysis is preferred
WordNet is widely used for matching synonyms
  E.g. METEOR (Banerjee & Lavie 2005), TERp (Snover et al. 2009), ATEC (Wong & Kit 2010), …
Is the similarity of words between MT outputs and references fully described?
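For illustration, a minimal sketch of the WordNet synonym matching such metrics rely on, using NLTK's WordNet interface (the helper name and the share-a-synset criterion are illustrative, not taken from METEOR, TERp or ATEC):

```python
# Minimal sketch: treat two words as synonym matches when they appear together
# in at least one WordNet synset, the kind of lookup used in synonym-matching
# stages of MT evaluation metrics. Illustrative only.
from nltk.corpus import wordnet as wn

def are_wordnet_synonyms(w1: str, w2: str) -> bool:
    """True if the two words share at least one synset (or are identical)."""
    if w1 == w2:
        return True
    return bool(set(wn.synsets(w1)) & set(wn.synsets(w2)))

print(are_wordnet_synonyms("car", "automobile"))  # share a synset -> True
print(are_wordnet_synonyms("journey", "tour"))    # per the slides, no shared synset
```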

Page 3: Semantic Evaluation of Machine Translation

Motivation

WordNet: granularity of sense distinctions is highly fine-grained
Word pairs not in the same sense yet similar in meaning: [mom vs mother], [safeguard vs security], [expansion vs extension], [journey vs tour], [impact vs influence], etc.
Problematic if they are ignored in evaluation
What is needed is a word similarity measure

Proposal: utilization of word similarity measures in automatic MT evaluation
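The observation can be checked directly with NLTK's WordNet (a sketch; the exact scores depend on the installed WordNet version and are not figures from the talk): these pairs share no synset, so a synonym-only match misses them, yet a graded measure still rates them as close.

```python
# Check the cited pairs: no shared synset, but a graded measure such as
# Wu-Palmer can still assign a high score. Illustrative sketch only.
from nltk.corpus import wordnet as wn

pairs = [("mom", "mother"), ("safeguard", "security"),
         ("expansion", "extension"), ("journey", "tour"),
         ("impact", "influence")]

for w1, w2 in pairs:
    shared = set(wn.synsets(w1)) & set(wn.synsets(w2))
    # Best Wu-Palmer score over all noun sense pairs (None-safe).
    best = max((s1.wup_similarity(s2) or 0.0
                for s1 in wn.synsets(w1, pos=wn.NOUN)
                for s2 in wn.synsets(w2, pos=wn.NOUN)), default=0.0)
    print(f"{w1}/{w2}: shared synset={bool(shared)}, best wup={best:.2f}")
```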

Page 4: Semantic Evaluation of Machine Translation

Word Similarity Measures

Knowledge-based (WordNet):
  Wup (Wu & Palmer 1994)
  Res (Resnik 1995)
  Jcn (Jiang & Conrath 1997)
  Hso (Hirst & St-Onge 1998)
  Lch (Leacock & Chodorow 1998)
  Lin (Lin 1998)
  Lesk (Banerjee & Pedersen 2002)

Corpus-based:
  LSA (Landauer et al. 1998)
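A sketch of how several of the WordNet-based measures can be computed with NLTK (Hso and Lesk are not covered here, and LSA would require a separately trained corpus model; taking the first noun sense of each word is a simplification a real metric would avoid):

```python
# Compute several knowledge-based similarity measures with NLTK's WordNet.
# Requires nltk.download('wordnet') and nltk.download('wordnet_ic').
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # information content for Res/Jcn/Lin

s1 = wn.synsets("journey", pos=wn.NOUN)[0]
s2 = wn.synsets("tour", pos=wn.NOUN)[0]

print("wup:", s1.wup_similarity(s2))            # Wu & Palmer 1994
print("lch:", s1.lch_similarity(s2))            # Leacock & Chodorow 1998
print("res:", s1.res_similarity(s2, brown_ic))  # Resnik 1995
print("jcn:", s1.jcn_similarity(s2, brown_ic))  # Jiang & Conrath 1997
print("lin:", s1.lin_similarity(s2, brown_ic))  # Lin 1998
```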

Page 5: Semantic Evaluation of Machine Translation

Experiment

Three questions:
  To what extent are two words considered similar?
  Which word similarity measure(s) is/are more appropriate to use?
  How much performance gain can an MT evaluation metric obtain by incorporating word similarity measures?

Page 6: Semantic Evaluation of Machine Translation

Setting

Data: MetricsMATR08 development data
  1992 MT outputs, 8 MT systems, 4 references

Evaluation metric: unigram matching
  Exact match / synonym / semantically similar, all with the same weight
  Three variants: precision (p), recall (r) and F-measure (f), where c is the MT output and t is the reference translation
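The formulas themselves are not reproduced in the transcript; the usual unigram reading, assumed below, is p = m/|c|, r = m/|t| and f = 2pr/(p+r), with m the number of matched unigrams. This sketch follows the slide's matching policy (exact, synonym, or semantically similar, all with the same weight), while the similarity threshold, the Wu-Palmer backoff and the greedy one-to-one alignment are assumptions:

```python
# Hedged sketch of the unigram metric described on the slide: candidate words
# are greedily aligned to reference words by exact match, WordNet synonymy, or
# a similarity score above a threshold, all counted with the same weight.
# The threshold (0.8) and the greedy alignment are illustrative assumptions.
from nltk.corpus import wordnet as wn

def similar(w1, w2, threshold=0.8):
    if w1 == w2:
        return True                                   # exact match
    syn1, syn2 = wn.synsets(w1), wn.synsets(w2)
    if set(syn1) & set(syn2):
        return True                                   # WordNet synonyms
    best = max((a.wup_similarity(b) or 0.0 for a in syn1 for b in syn2),
               default=0.0)
    return best >= threshold                          # semantically similar

def prf(candidate, reference):
    c, t = candidate.lower().split(), reference.lower().split()
    unused, m = list(t), 0
    for w in c:                                       # greedy 1-to-1 alignment
        for i, r in enumerate(unused):
            if similar(w, r):
                m += 1
                del unused[i]
                break
    p = m / len(c) if c else 0.0
    r = m / len(t) if t else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

print(prf("the journey was long", "the tour was long"))
```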

Page 7: Semantic Evaluation of Machine Translation

Result (1)

Correlation thresholds of each measure

Page 8: Semantic Evaluation of Machine Translation

Result (2)

Correlation of the metric
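The transcript does not include the correlation table, nor does it specify the statistic; metric scores are conventionally correlated with human judgments, e.g. Pearson or Spearman at the segment level, as in this sketch with illustrative numbers:

```python
# How such correlation figures are typically computed (sketch with made-up,
# clearly illustrative values; not figures from the talk).
from scipy.stats import pearsonr, spearmanr

metric_scores = [0.42, 0.55, 0.31, 0.78, 0.66]  # illustrative metric outputs
human_scores = [3.0, 4.0, 2.0, 5.0, 4.0]        # illustrative human judgments

print("Pearson:", pearsonr(metric_scores, human_scores)[0])
print("Spearman:", spearmanr(metric_scores, human_scores)[0])
```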

Page 9: Semantic Evaluation of Machine Translation

Conclusion

The importance of semantically similar words in automatic MT evaluation
Two word similarity measures, Wup and LSA, perform relatively better

Remaining problems:
  Semantic similarity vs. semantic relatedness, e.g. [committee vs chairman] (LSA)
  Most WordNet similarity measures run on verbs and nouns only

Page 10: Semantic Evaluation of Machine Translation

Thank you