Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

Post on 07-Feb-2016

May 26th 2014

Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

Ekaterina Stambolieva

ekaterina.stambolieva@euroscript.lu

Outline

Why?

MT Adequacy?

What?

Evaluation

Findings

Conclusion & Future Work

WHY?

MTE, May 26th 2014

Impending industry problem:

How do we compare MT systems over time?

We measure MT quality continuously

BLEU?

We want adequate translations

ADEQUACY

How do we define MT adequacy in business?

– accelerate time-to-delivery

– reduce translation costs

– achieve near-native fluency

Adequacy: improving MT output’s acceptance for the task of post-editing

WHAT

We aim to evaluate our MT systems continuously and compare the results over time.

We design our system’s improvements based on human end-user feedback.

We do not directly evaluate translation quality; instead, we assess how the MT output improves over time.

No annotation effort is required.

Outline

Why?

MT Adequacy?

What?

Evaluation

• Edit Distance

Findings

Conclusion & Future Work

THE EXAMPLE

We compare the results of two English<->Danish MT systems.

BLEU      System 1   System 2
EN->DA    59.22      58.84
DA->EN    64.26      63.98

CATEGORIES

Three objective categories for evaluating MT output:

– Does the MT output look better than before?

– Does the MT output look worse than before?

– Is it difficult for you to judge whether the MT output is better or not?

EVALUATION

We present an MT output evaluation based on the Edit Distance (ED) score.

We compute how many edits are needed to transform the MT output into the human translation of the same source segment.
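As an illustration of the edit count described above, here is a minimal Python sketch of a standard Levenshtein distance between an MT segment and its human translation. The slides do not say whether edits are counted over characters or words, nor how the count is turned into a score, so the function names `edit_distance` and `ed_score` and the percentage normalisation are assumptions made for this example only.

```python
def edit_distance(mt, ref):
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each.

    Works on any two sequences, so it can be applied to strings (character
    level) or to lists of tokens (word level).
    """
    prev = list(range(len(ref) + 1))  # distance from empty MT prefix to ref[:j]
    for i, a in enumerate(mt, start=1):
        curr = [i]  # distance from mt[:i] to the empty reference
        for j, b in enumerate(ref, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # delete a from the MT output
                            curr[j - 1] + 1,      # insert b into the MT output
                            prev[j - 1] + cost))  # substitute a with b, or match
        prev = curr
    return prev[-1]


def ed_score(mt, ref):
    """Hypothetical normalisation (not from the slides): 100 means identical."""
    longest = max(len(mt), len(ref))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(mt, ref) / longest)


if __name__ == "__main__":
    mt_output = "the cat sat on mat".split()
    human = "the cat sat on the mat".split()
    print(edit_distance(mt_output, human))   # word-level edits
    print(round(ed_score(mt_output, human), 2))
```

Passing token lists gives word-level edits, while passing the raw strings gives character-level edits; which of the two the presented system uses is not stated in the slides.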

FINDINGS

new MT ED   old MT ED
87.08       71.31
94.77       87.44
82.62       66.04
74.19       73.84
84.36       79.79
91.26       88.06
75.12       74.48

              Y     X     N
Annotator 1   60%   36%   4%
Annotator 2   76%   16%   8%
Annotator 3   68%   24%   8%

Improved MT acceptance for the task of post-editing

Length variance between MT output with the new and old system does not affect MT acceptance
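The comparison of ED scores between the new and old system can be summarised with a few lines of Python. This is only a sketch of the arithmetic: the scores are copied from the findings table, the slides do not label the seven rows, and the reading that a higher ED score is better is an assumption based on the slide's conclusion of improved acceptance.

```python
# ED scores copied from the findings table, one (new MT, old MT) pair per row.
# The slides do not say what each row represents.
ed_pairs = [
    (87.08, 71.31),
    (94.77, 87.44),
    (82.62, 66.04),
    (74.19, 73.84),
    (84.36, 79.79),
    (91.26, 88.06),
    (75.12, 74.48),
]

# Assumption: a higher ED score means the MT output is closer to the
# human translation, so a positive difference is an improvement.
deltas = [round(new - old, 2) for new, old in ed_pairs]
improved_rows = sum(1 for d in deltas if d > 0)
mean_delta = sum(deltas) / len(deltas)

print("per-row change:", deltas)
print(f"rows improved: {improved_rows}/{len(deltas)}")
print(f"mean change: {mean_delta:.2f}")
```

Under that assumption, every row moves in the same direction, which is consistent with the annotators' majority judgement reported on the same slide.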

FUTURE WORK

Modify ED to take the number of UNK (unknown) words into consideration

Modify the metric so that it detects small improvements in the system, such as:

– number isolation

– tag protection

Take segment character length into consideration

– so as not to penalize shorter segments too much

Thank you
