Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

Ekaterina Stambolieva [email protected]
May 26th 2014


Page 1: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

May 26th 2014

Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

Ekaterina Stambolieva [email protected]

Page 2: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

Outline

– Why?
– MT Adequacy?
– What?
– Evaluation
– Findings
– Conclusion & Future Work

Page 7: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

WHY?

Impending industry problem: how do we compare MT systems over time?

We measure MT quality continuously

BLEU? We want adequate translations

MTE, May 26th 2014

Page 10: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

ADEQUACY

How do we define MT adequacy in business?

– accelerate time-to-delivery
– reduce translation costs
– achieve near-native fluency

Page 12: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

ADEQUACY

Adequacy: improving the acceptance of MT output for the task of post-editing

Page 16: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

WHAT

– We aim to evaluate our MT systems continuously and compare results over time
– We design our system’s improvements based on human end-user feedback
– We do not directly evaluate translation quality; instead, we assess the improvement of MT output over time
– No annotation effort required

Page 17: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

Outline

– Why?
– MT Adequacy?
– What?
– Evaluation
  • Edit Distance
– Findings
– Conclusion & Future Work

Page 20: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

THE EXAMPLE

We compare the results of 2 English<->Danish MT systems

BLEU      System 1   System 2
EN->DA    59.22      58.84
DA->EN    64.26      63.98

Page 21: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

CATEGORIES

3 objective categories to evaluate MT output

– Does the MT output look better than before?
– Does the MT output look worse than before?
– Is it difficult for you to judge whether the MT output is better or not?

Page 23: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

EVALUATION

We present an evaluation of MT output based on the Edit Distance (ED) score

We compute how many edits are needed to transform the MT output into the human translation of the same source segment
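The edit count described above can be sketched with a standard Levenshtein distance. This is a minimal illustration and not the presenter's implementation: the slides do not say whether edits are counted over words or characters, nor how the ED score is scaled, so the word-level tokenisation and the 0 to 100 normalisation in the hypothetical `ed_score` helper are both assumptions.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two sequences (tokens or characters)."""
    # prev[j] holds the distance between the processed prefix of hyp and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h from hyp
                            curr[j - 1] + 1,      # insert r into hyp
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ed_score(mt_output, human_translation):
    """Hypothetical normalised score: 100 means the two segments are identical.

    Assumption: whitespace tokenisation and division by the longer segment.
    """
    hyp = mt_output.split()
    ref = human_translation.split()
    longest = max(len(hyp), len(ref))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(hyp, ref) / longest)


print(edit_distance("kitten", "sitting"))
print(ed_score("the cat sat on mat", "the cat sat on the mat"))
```

Because `edit_distance` accepts any sequence, passing the raw strings instead of token lists gives a character-level variant of the same measure.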

Page 26: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

FINDINGS

new MT ED   old MT ED
87.08       71.31
94.77       87.44
82.62       66.04
74.19       73.84
84.36       79.79
91.26       88.06
75.12       74.48

              Y     X     N
Annotator 1   60%   36%   4%
Annotator 2   76%   16%   8%
Annotator 3   68%   24%   8%
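To compare the two systems row by row, the ED figures from the findings table can be differenced directly. The sketch below only restates the numbers on the slide. The slides do not say what each row represents or how the score is scaled, so `compare_scores` is a hypothetical helper and the deltas should be read as raw differences, not as a validated metric.

```python
from statistics import mean

# (new MT ED, old MT ED) pairs copied from the findings table.
ED_SCORES = [
    (87.08, 71.31),
    (94.77, 87.44),
    (82.62, 66.04),
    (74.19, 73.84),
    (84.36, 79.79),
    (91.26, 88.06),
    (75.12, 74.48),
]


def compare_scores(pairs):
    """Return per-row deltas (new minus old) and the mean of each column."""
    deltas = [round(new - old, 2) for new, old in pairs]
    mean_new = round(mean(new for new, _ in pairs), 2)
    mean_old = round(mean(old for _, old in pairs), 2)
    return deltas, mean_new, mean_old


deltas, mean_new, mean_old = compare_scores(ED_SCORES)
print("per-row delta:", deltas)
print("mean new:", mean_new, "mean old:", mean_old)
```

Every row of the table has a higher value for the new system than for the old one, with gaps ranging from well under one point to more than sixteen.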

Page 27: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

FINDINGS

Improved MT acceptance for the task of post-editing

Page 28: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

FINDINGS

Length variance between the MT output of the new and the old system does not affect MT acceptance

Page 30: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

FUTURE WORK

Modify ED to take into account the number of unknown (UNK) words

Modify the metric so that it detects small improvements in the system
– such as number isolation
– tag protection

Take segment character length into consideration
– So as not to penalize shorter segments too much

Page 33: Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy

Thank you
