
The Translation Correction Tool: English-Spanish user studies

Ariadna Font Llitjós and Jaime Carbonell
{aria,jgc}@cs.cmu.edu
Language Technologies Institute, Carnegie Mellon University

Abstract

Machine translation systems should improve with feedback from post-editors, but none do, beyond statistical and example-based MT improving marginally if the corrected translation is added to the parallel training data. Rule-based systems to date improve only via manual debugging. In contrast, we introduce a largely automated method for capturing more information from the human post-editor so that corrections may be applied automatically to translation grammar rules and lexical entries. This paper focuses on the information-capture phase and reports on an experiment with English-Spanish translation.


Goal

Recycle non-expert post-editing efforts to:

- Refine translation rules automatically

- Improve overall translation quality

Proposed approach

- User-friendly online GUI: the Translation Correction Tool

>> non-expert bilingual speakers (abstract away from MT system details)

>> MT error classification specifically tailored to elicit as much information as possible with the least linguistic terminology

- Active Learning to obtain minimal pairs and perform feature detection

- Rule Refinement operations to automatically modify translation rules
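To make the rule-refinement idea concrete, here is a minimal sketch of one plausible refinement operation: specializing a transfer rule with an agreement constraint inferred from a user correction. This is an illustration under our own assumptions (the `TransferRule` structure and constraint notation are hypothetical), not the AVENUE formalism or implementation.

```python
# Illustrative sketch only: a toy transfer rule and one refinement
# operation. The real AVENUE rule formalism and refinement module are
# more elaborate; all names here are our own assumptions.
from dataclasses import dataclass, field

@dataclass
class TransferRule:
    lhs: str                        # constituent the rule builds, e.g. "NP"
    sl: list                        # source-side right-hand side
    tl: list                        # target-side right-hand side
    constraints: list = field(default_factory=list)

def add_agreement_constraint(rule: TransferRule, i: int, j: int,
                             feature: str) -> TransferRule:
    """Return a copy of `rule` specialized so that target constituents
    i and j (1-based positions) must agree on `feature`."""
    refined = TransferRule(rule.lhs, rule.sl, rule.tl,
                           list(rule.constraints))
    refined.constraints.append((f"y{i} {feature}", "=", f"y{j} {feature}"))
    return refined

# A user flags a gender-agreement error between DET and N, so the
# refinement adds the corresponding constraint to the NP rule:
np_rule = TransferRule("NP", ["DET", "N"], ["DET", "N"])
print(add_agreement_constraint(np_rule, 1, 2, "gender").constraints)
```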

AVENUE System

Rule-based MT system for rapid development of MT for resource-poor languages

Requirements: a small number of non-expert bilingual speakers to translate and align an elicitation corpus (Probst et al. 2001)

Goal: learn and refine translation rules automatically

The Translation Correction Tool v.01

MT error classification

A radically different approach to MT evaluation: instead of targeting end-users, translation experts, or developers, the error classification needs to be tailored to non-expert bilingual users.

Hypothesis:

>> non-expert bilingual users can accurately detect an error in a machine-translated sentence, given the source-language sentence and, optionally, some context.

>> they can also likely indicate which other word(s) in the target sentence provide the clue to why there is an error; for example, in agreement errors, the word it needs to agree with.
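As an illustration of the kind of structured datum this hypothesis aims to elicit, the record below pairs a flagged error word with its clue word. The field names and example are hypothetical, not the TCTool's actual data format.

```python
# Hypothetical record of user feedback on an agreement error; field
# names are our own, not the TCTool's actual format.
error_report = {
    "sl_sentence": "the red car",
    "tl_sentence": "el coche roja",   # *roja: wrong gender form
    "error_word": "roja",             # word the user flags as wrong
    "error_type": "agreement",        # picked from the error classification
    "clue_word": "coche",             # masculine noun "roja" must agree with
    "correction": "rojo",
}
print(error_report["error_word"], "should agree with",
      error_report["clue_word"])
```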

English-Spanish User Studies

Purpose: threefold

>> test naïve users' ability to detect and classify MT errors

>> assess GUI usefulness and user-friendliness

>> assess appropriateness of the MT error classification

32 English sentences extracted from the AVENUE elicitation corpus

The transfer MT system included a hand-crafted grammar with 12 rules and 442 lexical entries.
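For orientation only, toy lexical entries of the sort such a hand-crafted lexicon might contain are sketched below; the format is invented for illustration, not the actual AVENUE lexicon.

```python
# Toy transfer-lexicon entries (format invented for illustration;
# the actual AVENUE lexicon held 442 entries in its own formalism).
lexicon = {
    ("see", "V"):         {"tl": "ver",  "features": {"type": "transitive"}},
    ("you", "PRON"):      {"tl": "tú",   "features": {"person": 2, "num": "sg"}},
    ("yesterday", "ADV"): {"tl": "ayer", "features": {}},
}
print(lexicon[("see", "V")]["tl"])   # -> ver
```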

Correction Example with the TCTool

Output from MT system:

SL: i saw you yesterday
TL: vi tu ayer
AL: ((2,1),(3,2),(4,3))

Users need to correct the Spanish translation so that words are in the right form and in the right order. Note that an alignment is missing from “I” to “vi”, so users should also add an alignment between these two words.
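A minimal sketch of the alignment bookkeeping for this example, using the 1-based (SL position, TL position) pairs of the AL notation above; the code is illustrative, not the TCTool's implementation.

```python
# Alignments as 1-based (SL, TL) position pairs, as in the AL notation.
sl = "i saw you yesterday".split()
tl = "vi tu ayer".split()
alignments = {(2, 1), (3, 2), (4, 3)}   # saw-vi, you-tu, yesterday-ayer

# The user's fix: Spanish "vi" encodes both "saw" and its subject "I",
# so the missing alignment (1, 1) is added.
alignments.add((1, 1))

for j, word in enumerate(tl, start=1):
    sources = [sl[i - 1] for (i, k) in sorted(alignments) if k == j]
    print(f"{word} <- {sources}")   # vi <- ['i', 'saw'], etc.
```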

Actual user statistics

29 users completed all 32 sentences; 83% of the users were from Spain.

2/3 with no background in Linguistics

75% with a graduate degree and 25% with a Bachelor's degree.

Average translations fixed: 26.6 (out of 32)

Average duration: 1:30 hours, i.e., ~3 minutes per translation

Duration range: 28 minutes to 4:18 hours

Measuring user accuracy

Gold standard

10 users' log files (~300 files)

>> interested in high precision at the expense of lower recall

User corrections were not always consistent with those of other users.

Most of the time, when final translations differed from the gold standard, they were still correct.

On average, users produced only 2.5 translations that were worse than the gold standard (out of 26.6 fixed).

Users got most alignments right.
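To make the precision/recall trade-off concrete: if each user correction action is matched against the gold standard, precision is the fraction of user actions found in the gold standard, and recall is the fraction of gold-standard actions the user produced. The numbers below are toy values, not the study's data.

```python
# Toy scoring of one user's correction actions against a gold standard
# (illustrative values only, not the study's data).
gold = {("modify", "tu", "te"), ("move", "te", 1), ("add_align", 1, 1)}
user = {("modify", "tu", "te"), ("add_align", 1, 1),
        ("modify", "ayer", "hoy")}          # one spurious action

matches = gold & user
precision = len(matches) / len(user)        # 2/3
recall = len(matches) / len(gold)           # 2/3
print(f"precision={precision:.2f}, recall={recall:.2f}")
```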

Usability questionnaire

82% said the TCTool is user-friendly.

100% said it is easy to determine whether a sentence translation is correct, but only 88% felt that determining the source of errors is easy.

Users did not read most of the 23-page tutorial.

Conclusions

The TCTool is an online tool that elicits guided and structured user feedback on translations generated by a transfer-based MT system, with the ultimate goal of automatically improving the translation rules.

The first English-Spanish user study shows that users can detect errors with high accuracy (89%), but have a harder time classifying errors under the proposed MT error classification (72%). In general, most of the problems users had were due to not having read the instructions and tutorial.


Implementation

Version 01 consists of 5 CGI scripts in Perl and 1 JavaScript file, which together produce a total of 8 different HTML pages. A simplified data-flow diagram (not reproduced here) shows how the core of the TCTool works.
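Since the diagram is not reproduced, the following sketch reconstructs the per-sentence flow in plain Python; the real tool is Perl CGI emitting HTML pages, and all function names here are hypothetical.

```python
# Hypothetical reconstruction of the TCTool's per-sentence flow
# (the real tool is Perl CGI scripts generating HTML pages).
def ask_yes_no(prompt: str) -> bool:
    return input(prompt + " [y/n] ").strip().lower().startswith("y")

def elicit_action():
    # One correction action per prompt, e.g. "modify tu te";
    # an empty line means the user is done with this sentence.
    raw = input("action (blank to finish): ").strip()
    return tuple(raw.split()) or None

def correction_session(sentence_pairs):
    log = []
    for sl, tl in sentence_pairs:
        # First page: show SL and TL, ask whether the translation is OK.
        if ask_yes_no(f"SL: {sl}\nTL: {tl}\nIs the translation correct?"):
            log.append((sl, tl, []))          # no actions needed
            continue
        # Follow-up pages: elicit correction actions until the user stops.
        actions = []
        while (action := elicit_action()) is not None:
            actions.append(action)
        log.append((sl, tl, actions))
    return log

# Usage: correction_session([("i saw you yesterday", "vi tu ayer")])
```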

Set of possible actions to correct a sentence using the TCTool (see the logging sketch after this list):

• modify a word (brings up the set of error types associated with it)

• add a word

• delete a word

• drag a word into a different position (change word order)

• add an alignment

• delete an alignment
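Logged, each of these actions can become a structured record for the later rule-refinement stage. The encoding below is a sketch under our own assumptions, not the TCTool's actual log format or error-type labels.

```python
# Hypothetical encoding of the six TCTool actions as log records
# (not the actual log format; error-type label is illustrative).
from enum import Enum

class Action(Enum):
    MODIFY_WORD = "modify"        # also carries an error type
    ADD_WORD = "add"
    DELETE_WORD = "delete"
    MOVE_WORD = "move"            # word-order change via drag-and-drop
    ADD_ALIGNMENT = "add_align"
    DELETE_ALIGNMENT = "del_align"

# The corrections turning "vi tu ayer" into "te vi ayer":
log = [
    (Action.MODIFY_WORD, {"old": "tu", "new": "te", "error_type": "wrong form"}),
    (Action.MOVE_WORD, {"word": "te", "to_position": 1}),
    (Action.ADD_ALIGNMENT, {"sl_pos": 1, "tl_pos": 1}),
]
for action, detail in log:
    print(action.value, detail)
```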

Future Work

>> Interactive, dynamic tutorial

>> Higher precision in error classification: refine the MT error classification (snapshot not reproduced) by adding examples and a drop-down menu

>> Analyze all user feedback to see how the rule refinement process can be automated

Acknowledgements

This research was funded in part by NSF grant number IIS-0121631.

We would also like to thank Kenneth Sim and Patrick Milholl for the implementation of the JavaScript.

References

Flanagan, M., 1994. Error Classification for MT Evaluation. Proceedings of AMTA 94, pp. 65-72.

Imamura, K., Sumita, E. and Matsumoto, Y., 2003. Feedback Cleaning of Machine Translation Rules Using Automatic Evaluation. ACL-03: 41st Annual Meeting of the Association for Computational Linguistics, pp. 447-454.

Menezes, A. and Richardson, S., 2001. A Best-First Alignment Algorithm for Automatic Extraction of Transfer Mappings from Bilingual Corpora. Workshop on Example-Based Machine Translation, MT Summit VIII, pp. 35-42.

Papineni, K., Roukos, S. and Ward, T., 1998. Maximum Likelihood and Discriminative Training of Direct Translation Models. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP-98), pp. 189-192.

Probst, K., Brown, R., Carbonell, J., Lavie, A., Levin, L. and Peterson, E., 2001. Design and Implementation of Controlled Elicitation for Machine Translation of Low-density Languages. Proceedings of the MT2010 Workshop at MT Summit 2001.

Probst, K., Levin, L., Peterson, E., Lavie, A. and Carbonell, J., 2002. MT for Resource-Poor Languages Using Elicitation-Based Learning of Syntactic Transfer Rules. Machine Translation, Special Issue on Embedded MT, 17(4).

Su, K., Chang, J. and Una Hsu, Y., 1995. A Corpus-Based Statistics-Oriented Two-Way Design for Parameterized MT Systems: Rationale, Architecture and Training Issues. TMI-95: 6th Conference on Theoretical and Methodological Issues in Machine Translation, pp. 334-353.

White, J.S., O'Connell, T. and O'Mara, F., 1994. The ARPA MT Evaluation Methodologies: Evaluation, Lessons, and Future Approaches. Proceedings of AMTA 94, pp. 193-205.
