Advanced Decision Architectures Collaborative Technology Alliance

A Task-Based Evaluation Method for Embedded Machine Translation in Instant Messaging Systems

William Ogden

New Mexico State University

MT good enough?

Since 1981, Bill Ogg with human and computer interaction research is that cognitive psychologists. He is an expert ergonomic design, evaluation, sample, including the execution of the development of software interfaces are participating in the first half.

Work with the information retrieval research for the interaction between multiple languages and language users in applications is about a communication problem. In particular, the current instant messaging applications to evaluate machine translation technology used to apply the task-centric approach has been developed. He is currently in New Mexico State University, undergraduate and graduate students how to design a communication is to continue to teach.

MT Evaluation

• Typically, machine translations are compared to human translations with computed distance metrics

• Good for system development and cross-system comparisons.

• Translations can also be rated for adequacy and fluency.
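As a rough illustration of a computed distance metric, the sketch below scores a machine translation against a human reference using word-level edit distance divided by the reference length (a word error rate). This is a generic metric chosen for illustration; the slides do not say which metrics were used, and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


# Hypothetical example sentences, not taken from the study.
human = "send the supplies to the north bridge"
machine = "send supplies to north bridge now"
print(round(word_error_rate(human, machine), 2))
```

A lower score means the machine output is closer to the human translation. A score like this supports system comparison, but it does not by itself say whether a translation is usable for a task.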

Machine Translation (MT)

• What is it good for?

– Our goal is to evaluate the usefulness of automatically translated language for Army applications.

• Proposed applications

– Document translation

– Speech-to-speech dialog

– Text instant messaging

– … and others.

CCL+ (TrIM) in field tests

• CCL enabled precise, rapid reporting and consultation on illness and injury, as well as location and availability of medical resources, simultaneously in thirteen languages (Campbell and Hillenbrand, 2005).

• "The TRiM (Translingual Instant Messaging) language application tool is an example of a marvelous application that crosses the language barrier. As the Fire Support Coordinator (FSC) for JWID 03, responsible for all ground artillery, naval gun and close air support, I was required to work with the Spanish Army. TRiM effectively enabled me to write my message on a whiteboard … and send it straight to Spain. They receive it in Spanish and can then respond to … me and I receive it in English." (Joyce, 2003)

James R. Campbell and Chris Hillenbrand (2005). CCL for Operational Medical Support. Military Medical Technology Online Archives, Volume 9, Issue 1, Jan 26, 2005. Retrieved from http://www.military-medical-technology.com/article.cfm?DocID=784

John R. Joyce (2003). Coalition Interoperability Tested at Dahlgren During JWID 2003. CHIPS - The Department of the Navy Information Technology Magazine, Fall 2003. Retrieved from http://www.chips.navy.mil/archives/03_fall/PDF/JWID_2003.pdf

Evaluation Goals

• Evaluate MT in the context of Instant Messaging (TrIM - MITRE) for Army coalition coordination

• Discover applicable contexts

• Develop tasks that can be used to encourage realistic task-oriented conversation

• Characterize these task domains

• Improve the MT application technology

• Discover user expectations concerning the capabilities of MT in this environment

Task-based evaluations

• MT is embedded in the application

• Test users are given realistic tasks

• Task measures are used to evaluate MT usefulness and effectiveness

• Linguistic measures are used to evaluate MT technology

Task-based advantages

• Answers the “good enough?” question

• Provides formative evaluations of the user interface

• Provides insights for MT development.

Task-based disadvantages

• Are the tasks realistic enough?

• Are the test users representative and available?

General method

• Participants

– Pairs of native English, Japanese, Korean, Spanish, and Chinese speakers

• Environment

– Pairs were seated in separate rooms with a task window and a TrIM window

– Task window covered the English dialog for non-English pairs.

General method

• Procedure

– Collaborative task instructions were presented.

– 10–20 minute practice task

– Tasks were usually presented in two parts.

– Participants worked until the task was completed (mutual agreement) or the time limit was reached (rarely).

– Non-English participants rated the translation quality (separate session).

Evolving tasks

Conducted a series of studies with an evolving set of information sharing tasks

– Table fill-in

– Logistic Map

• Situation assessment

• Open-ended logistic requests

• Discrete trial logistic requests

– Shared whiteboard planning

– Picture identification

Analysis of Tasks

• Task characteristics determine structure of the communication

• Fixed or stylistic messaging strategies will be best served by application interfaces providing custom, human-translated versions

Typical Logistic Task Finding

Language Pair      Task Time (hrs)   Map Errors   Message Count
Chinese-English    1.4               6            162
Japanese-English   1.35              6            199
Korean-English     1.2               8            191
Spanish-English    1                 2            160
English-English    0.8               1            96

• Task performance is slowed

– But not prevented (completion rates > .90)

– Translations judged “adequate” > 80%
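One way to read these task times is as a slowdown relative to the English-English pairs. The sketch below computes that ratio from the reported figures. Treating English-English as the baseline is an assumption of this example, not an analysis reported in the slides.

```python
# Task times in hours, copied from the Typical Logistic Task Finding table.
task_time_hrs = {
    "Chinese-English": 1.4,
    "Japanese-English": 1.35,
    "Korean-English": 1.2,
    "Spanish-English": 1.0,
    "English-English": 0.8,
}

# Assumed baseline: the monolingual English-English pairs.
baseline = task_time_hrs["English-English"]

# Ratio of each pair's task time to the baseline time.
slowdown = {pair: hours / baseline for pair, hours in task_time_hrs.items()}

for pair, ratio in slowdown.items():
    print(f"{pair}: {ratio:.2f}x the English-English time")
```

Read this way, the translated conditions take somewhere between about one and a quarter and one and three quarters times as long as the control, which is consistent with "slowed but not prevented".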

Implications

• MT works for IM but could work better

– It works because people are engaged and can negotiate meanings

– It could work better if the technology supported negotiation and repair.

Meta comment analysis

Semantic Category   Occurrences (total/unique)   Percent Poorly Translated
                    Chinese      Korean          Chinese      Korean
Yes                 54/8         107/26          15           18
No                  16/6         29/9            19           4
OK                  92/13        85/11           19           9
Ready               27/24        22/20           11           23
What?               40/32        94/71           32           32
Understand?         11/9         8/7             9            0
Wait                8/5          4/3             37           50

Meta-buttons

Meta-Button Results

Measure         Without Meta-Buttons   With Meta-Buttons
Meta messages   31                     47
Task messages   65                     54
Task time       69 min                 52 min
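The meta-button figures can be summarized as the share of messages that were meta messages and the relative change in task time. The sketch below does that arithmetic. The slides do not state whether the counts are per-pair averages or totals, so the derived values are only descriptive, and the variable and function names are hypothetical.

```python
# Counts and times copied from the Meta-Button Results table.
results = {
    "without": {"meta": 31, "task": 65, "minutes": 69},
    "with": {"meta": 47, "task": 54, "minutes": 52},
}


def summarize(cond: dict) -> dict:
    """Total message count and the fraction that were meta messages."""
    total = cond["meta"] + cond["task"]
    return {"total_messages": total, "meta_share": cond["meta"] / total}


summary = {name: summarize(cond) for name, cond in results.items()}

# Relative reduction in task time when meta-buttons were available.
time_saved = 1 - results["with"]["minutes"] / results["without"]["minutes"]

for name, s in summary.items():
    print(f"{name} meta-buttons: {s['total_messages']} messages, "
          f"{s['meta_share']:.0%} meta")
print(f"task time reduced by {time_saved:.0%}")
```

With meta-buttons the meta share of the conversation goes up while task messages and task time go down, which fits the idea that cheap negotiation and repair messages make the task exchange more efficient.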

Discrete trial logistic task

Comparing Korean Translators

Condition         Average Solution Time (Seconds)   Correct solutions   Matching solutions
English Control   109                               16.5                16.0
Korean 1          155                               16.2                17.3
Korean 2          179                               14.5                15.0

Picture Identification

Multiple Web Translations

• Translations obtained from three web sites

– Google

– Bizlingo (from Excite in Japan)

– Amikai

• Source of initial translation balanced across trials.

• 16 pairs of Japanese-English teams

• 12 pairs of English-English teams

• 32 picture identification tasks
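Balancing the source of the initial translation across trials could be done by rotating through the services, with a different starting point for each team. The sketch below is one simple way to do that. The slides do not describe the actual balancing scheme, so the rotation, the per-team offset, and the function name are assumptions made for illustration.

```python
from collections import Counter
from itertools import cycle, islice

# Service names as listed on the slide.
SERVICES = ["Google", "Bizlingo", "Amikai"]


def initial_sources(n_trials: int, offset: int = 0) -> list:
    """Assign an initial translation source to each trial by rotation.

    `offset` (hypothetical) shifts the starting service, for example by
    team number, so that no service is always shown first.
    """
    k = offset % len(SERVICES)
    rotated = SERVICES[k:] + SERVICES[:k]
    return list(islice(cycle(rotated), n_trials))


# 32 picture identification tasks for a team with offset 1.
schedule = initial_sources(32, offset=1)
counts = Counter(schedule)
print(schedule[:6])
print(dict(counts))
```

Because 32 is not a multiple of 3, a single team cannot see each service exactly equally often. Varying the offset across teams spreads the leftover trials across services.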

Multiple translations helped

Available Translations   Time (sec)   Per cent correct   Message count
Single                   177          83                 8.80
Multiple 2               148          83                 7.78
Multiple 3               130          86                 7.74
English Control          109          94                 7.74

Translation server comparison

Initial Translation   View count   Time (sec)   Per cent correct   Message count
BizLingo              1.83         132          79                 7.41
Google                3.87         150          89                 7.98

Multiple helps even the best

Available Translations   Translation Service   Time (sec)   Message count
Multiple                 BizLingo              135          7.43
Single                   BizLingo              172          8.46
Multiple                 Google                150          8
Single                   Google                194          9.49
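To see how much each service gains from having multiple translations available, the reported times can be turned into a relative saving per service. The sketch below does this. The relative-saving measure and the names used are choices made for this example, not an analysis from the slides.

```python
# (availability, service) -> time in seconds, copied from the table.
time_sec = {
    ("Multiple", "BizLingo"): 135,
    ("Single", "BizLingo"): 172,
    ("Multiple", "Google"): 150,
    ("Single", "Google"): 194,
}


def relative_saving(service: str) -> float:
    """Fraction of the single-translation time saved with multiple translations."""
    single = time_sec[("Single", service)]
    multiple = time_sec[("Multiple", service)]
    return (single - multiple) / single


savings = {service: relative_saving(service) for service in ("BizLingo", "Google")}

for service, saving in savings.items():
    print(f"{service}: {saving:.1%} less time with multiple translations")
```

On these figures the two services gain by a similar proportion, a little over a fifth of the single-translation time.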

Evaluation method sensitivity

• Task-based evaluation method is sensitive to MT engine differences

• But differences may actually be a good thing when multiple translations are made available

Conclusions

• IM is a good application for MT

• Task-based evaluation is effective

• Current application technologies do not support negotiated meaning

• Improvements are possible

– e.g. meta-buttons, multiple translations

Available Technology

• Web-based task presentation and data collection

• Multiple translation chat client (Flash app)

Acknowledgments

• ARL – John Warner, Melissa Holland

• MITRE – Rod Holland, Galen Williamson

• NMSU – Sieun An, Emily Chaffin, Yuki Ishikawa, Wanying Jin, Jong Hwan Kim, Yosip Kim, Roberto Montalvo, Jeff Sorge and Ron Zacharski.