Advanced Decision Architectures Collaborative Technology Alliance
A Task-Based Evaluation Method for Embedded Machine Translation in Instant Messaging Systems
William Ogden, New Mexico State University
Is MT good enough?
Since 1981, Bill Ogden has been a cognitive psychologist conducting research in human-computer interaction. He is an expert in ergonomic design and evaluation, including the development of software interfaces.
His work in information retrieval research addresses the communication problems that arise when users of multiple languages interact through applications. In particular, he has developed a task-centric approach to evaluating the machine translation technology used in current instant messaging applications. He continues to teach undergraduate and graduate students at New Mexico State University how to design for communication.
MT Evaluation
• Typically, machine translations are compared to human translations with computed distance metrics
• Good for system development and cross-system comparisons.
• Translations can also be rated for adequacy and fluency.
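The slides do not say which distance metric was used. As one common example, word error rate (WER) counts the word insertions, deletions, and substitutions needed to turn a machine translation into a human reference translation, divided by the reference length. The sketch below computes it with a standard dynamic-programming edit distance; the function names are chosen for illustration, and n-gram metrics such as BLEU are another widely used family.

```python
def word_edit_distance(hypothesis: str, reference: str) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn the hypothesis into the reference (token Levenshtein)."""
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] = distance between the first i-1 hypothesis words
    # and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr[j] = min(
                prev[j] + 1,         # delete hypothesis word h
                curr[j - 1] + 1,     # insert reference word r
                prev[j - 1] + cost,  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word edit distance normalized by the reference length."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hypothesis, reference) / ref_len
```

For example, `word_error_rate("send the truck to gate", "send the trucks to the gate")` is 2/6: one substitution and one insertion against a six-word reference.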
Machine Translation (MT)
• What is it good for?
  – Our goal is to evaluate the usefulness of automatically translated language for Army applications.
• Proposed applications
  – Document translation
  – Speech-to-speech dialog
  – Text instant messaging
  – … and others.
CCL+ (TrIM) in field tests
• CCL enabled precise, rapid reporting and consultation on illness and injury, as well as on the location and availability of medical resources, simultaneously in thirteen languages (Campbell and Hillenbrand, 2005).
• "The TRiM (Translingual Instant Messaging) language application tool is an example of a marvelous application that crosses the language barrier. As the Fire Support Coordinator (FSC) for JWID 03, responsible for all ground artillery, naval gun and close air support, I was required to work with the Spanish Army. TRiM effectively enabled me to write my message on a whiteboard … and send it straight to Spain. They receive it in Spanish and can then respond to … me and I receive it in English." (Joyce, 2003)
James R. Campbell and Chris Hillenbrand (2005). CCL for Operational Medical Support. Military Medical Technology Online Archives, Volume 9, Issue 1, January 26, 2005. Retrieved from http://www.military-medical-technology.com/article.cfm?DocID=784
John R. Joyce (2003). Coalition Interoperability Tested at Dahlgren During JWID 2003. CHIPS - The Department of the Navy Information Technology Magazine, Fall 2003. Retrieved from http://www.chips.navy.mil/archives/03_fall/PDF/JWID_2003.pdf
Evaluation Goals
• Evaluate MT in the context of Instant Messaging (TrIM - MITRE) for Army coalition coordination
• Discover applicable contexts
• Develop tasks that can be used to encourage realistic task-oriented conversation
• Characterize these task domains
• Improve the MT application technology
• Discover user expectations concerning the capabilities of MT in this environment
Task-based evaluations
• MT is embedded in the application
• Test users are given realistic tasks
• Task measures are used to evaluate MT usefulness and effectiveness
• Linguistic measures are used to evaluate MT technology
Task-based advantages
• Answers the “good enough?” question
• Provides formative evaluations of the user interface
• Provides insights for MT development.
Task-based disadvantages
• Are the tasks realistic enough?
• Are the test users representative and available?
General method
• Participants
  – Pairs of native English, Japanese, Korean, Spanish, and Chinese speakers.
• Environment
  – Pairs were seated in separate rooms, each with a task window and a TrIM window.
  – The task window covered the English dialog for non-English pairs.
General method
• Procedure
  – Collaborative task instructions were presented.
  – 10–20 minute practice task.
  – Tasks were usually presented in two parts.
  – Participants worked until the task was completed (mutual agreement) or the time limit was reached (rarely).
  – Non-English participants rated the translation quality (separate session).
Evolving tasks
Conducted a series of studies with an evolving set of information-sharing tasks:
– Table fill-in
– Logistic Map
  • Situation assessment
  • Open-ended logistic requests
  • Discrete trial logistic requests
– Shared whiteboard planning
– Picture identification
Analysis of Tasks
• Task characteristics determine structure of the communication
• Fixed or stylistic messaging strategies will be best served by application interfaces that provide custom, human-translated versions
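One way to read this recommendation: route fixed messages (acknowledgments, readiness checks, requests to wait) through a table of human translations and send only free text to MT. The sketch below assumes an English-to-Spanish table and a caller-supplied `machine_translate` function; both are hypothetical stand-ins, not TrIM's actual interface.

```python
from typing import Callable, Dict

# Hypothetical human-translated versions of fixed messages (English -> Spanish),
# keyed by the exact English text a button or shortcut would send.
HUMAN_TRANSLATIONS: Dict[str, str] = {
    "Yes": "Sí",
    "No": "No",
    "OK": "De acuerdo",
    "Ready": "Listo",
    "What?": "¿Qué?",
    "Wait": "Espera",
}


def translate_message(text: str, machine_translate: Callable[[str], str]) -> str:
    """Return the human translation for a fixed message; otherwise fall back to MT."""
    canned = HUMAN_TRANSLATIONS.get(text.strip())
    if canned is not None:
        return canned
    return machine_translate(text)
```

The design keeps MT in the loop for task content, where wording varies, while short stylistic messages never pass through it.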
Typical Logistic Task Finding
Language Pair | Task Time (hrs) | Map Errors | Message Count
Chinese-English | 1.4 | 6 | 162
Japanese-English | 1.35 | 6 | 199
Korean-English | 1.2 | 8 | 191
Spanish-English | 1.0 | 2 | 160
English-English | 0.8 | 1 | 96
• Task performance is slowed
  – But not prevented (completion rates > .90)
  – Translations judged “adequate” > 80%
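Expressing each pair's task time as a multiple of the English-English control makes the slowdown concrete. This is plain arithmetic on the averages in the table; it carries no significance test.

```python
# Average task time (hours) per language pair, copied from the table above.
task_time_hrs = {
    "Chinese-English": 1.4,
    "Japanese-English": 1.35,
    "Korean-English": 1.2,
    "Spanish-English": 1.0,
    "English-English": 0.8,
}

control = task_time_hrs["English-English"]

# Ratio of each pair's time to the monolingual English-English control.
slowdown = {pair: hours / control for pair, hours in task_time_hrs.items()}

for pair, ratio in slowdown.items():
    print(f"{pair}: {ratio:.2f}x the English-English time")
```

The ratios run from 1.75x for Chinese-English down to 1.25x for Spanish-English.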
Implications
• MT works for IM but could work better
  – It works because people are engaged and can negotiate meanings
  – It could work better if the technology supported negotiation and repair.
Meta comment analysis
Semantic Category | Occurrences, Chinese (total/unique) | Occurrences, Korean (total/unique) | Percent Poorly Translated, Chinese | Percent Poorly Translated, Korean
Yes | 54/8 | 107/26 | 15 | 18
No | 16/6 | 29/9 | 19 | 4
OK | 92/13 | 85/11 | 19 | 9
Ready | 27/24 | 22/20 | 11 | 23
What? | 40/32 | 94/71 | 32 | 32
Understand? | 11/9 | 8/7 | 9 | 0
Wait | 8/5 | 4/3 | 37 | 50
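Summing the occurrence columns gives the overall volume of meta comments per language. The sketch assumes that "unique" counts distinct wordings within a category and that no wording is shared across categories, so unique counts can be added; the slide does not state this.

```python
# (total, unique) occurrences per semantic category, copied from the table above.
occurrences = {
    "Yes":         {"Chinese": (54, 8),  "Korean": (107, 26)},
    "No":          {"Chinese": (16, 6),  "Korean": (29, 9)},
    "OK":          {"Chinese": (92, 13), "Korean": (85, 11)},
    "Ready":       {"Chinese": (27, 24), "Korean": (22, 20)},
    "What?":       {"Chinese": (40, 32), "Korean": (94, 71)},
    "Understand?": {"Chinese": (11, 9),  "Korean": (8, 7)},
    "Wait":        {"Chinese": (8, 5),   "Korean": (4, 3)},
}


def totals(language: str) -> tuple:
    """Sum total and unique meta-comment counts over all categories.

    Adding unique counts assumes categories never share a wording."""
    total = sum(counts[language][0] for counts in occurrences.values())
    unique = sum(counts[language][1] for counts in occurrences.values())
    return total, unique
```

`totals("Chinese")` returns `(248, 97)` and `totals("Korean")` returns `(349, 147)`.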
Meta-buttons
Meta-Button Results
Measure | Without Meta-Buttons | With Meta-Buttons
Meta messages | 31 | 47
Task messages | 65 | 54
Task time | 69 min | 52 min
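Relative changes summarize the meta-button result: meta messages rose while task messages and task time fell. The figures below are computed from the reported values only.

```python
def percent_change(before: float, after: float) -> float:
    """Change from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (after - before) / before


# measure: (without meta-buttons, with meta-buttons), copied from the table above
results = {
    "Meta messages": (31, 47),
    "Task messages": (65, 54),
    "Task time (min)": (69, 52),
}

for measure, (without, with_) in results.items():
    print(f"{measure}: {percent_change(without, with_):+.1f}%")
```

This prints +51.6% for meta messages, -16.9% for task messages, and -24.6% for task time.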
Discrete trial logistic task
Comparing Korean Translators
Condition | Average Solution Time (sec) | Correct Solutions | Matching Solutions
English Control | 109 | 16.5 | 16.0
Korean 1 | 155 | 16.2 | 17.3
Korean 2 | 179 | 14.5 | 15.0
Picture Identification
Multiple Web Translations
• Translations obtained from three web sites
  – Google
  – BizLingo (from Excite in Japan)
  – Amikai
• Source of initial translation balanced across trials.
• 16 pairs of Japanese-English teams
• 12 pairs of English-English teams
• 32 picture identification tasks
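The slide says only that the source of the initial translation was balanced across trials; the exact scheme is not given. A cyclic rotation, offset by team so each trial position also varies, is one simple way to do it and is sketched here as an assumption. It applies to the 16 Japanese-English teams, since the English-English controls see no translation.

```python
from collections import Counter

SERVICES = ["Google", "BizLingo", "Amikai"]
N_PAIRS, N_TRIALS = 16, 32  # Japanese-English teams, picture identification tasks


def initial_source(pair_index: int, trial_index: int) -> str:
    """Hypothetical rotation: each service leads as equally often as
    32 trials over 3 services allows, both within and across teams."""
    return SERVICES[(pair_index + trial_index) % len(SERVICES)]


schedule = [[initial_source(p, t) for t in range(N_TRIALS)] for p in range(N_PAIRS)]
overall = Counter(service for row in schedule for service in row)
print(overall)
```

Over all 512 trials, Google and BizLingo each supply the initial translation 171 times and Amikai 170 times; within any one team the split is 11/11/10.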
Multiple translations helped
Available Translations | Time (sec) | Percent Correct | Message Count
Single | 177 | 83 | 8.80
Multiple 2 | 148 | 83 | 7.78
Multiple 3 | 130 | 86 | 7.74
English Control | 109 | 94 | 7.74
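Another way to read this table is to ask how much of the time gap between the single-translation condition and the English control each multiple-translation condition removes. This is arithmetic on the reported means, not an analysis from the study.

```python
# Average time (sec) per condition, copied from the table above.
times = {"Single": 177, "Multiple 2": 148, "Multiple 3": 130, "English Control": 109}

single = times["Single"]
control = times["English Control"]
gap = single - control  # 68 s between the single-translation condition and the control


def gap_closed(condition: str) -> float:
    """Fraction of the single-vs-control time gap removed by a condition."""
    return (single - times[condition]) / gap


for condition in ("Multiple 2", "Multiple 3"):
    print(f"{condition}: {gap_closed(condition):.0%} of the gap closed")
```

Two translations close about 43% of the gap, and three close about 69%.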
Translation server comparison
Initial Translation | View Count | Time (sec) | Percent Correct | Message Count
BizLingo | 1.83 | 132 | 79 | 7.41
Google | 3.87 | 150 | 89 | 7.98
Multiple helps even the best
Available Translations | Translation Service | Time (sec) | Message Count
Multiple | BizLingo | 135 | 7.43
Single | BizLingo | 172 | 8.46
Multiple | Google | 150 | 8
Single | Google | 194 | 9.49
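The time saved by multiple translations can be computed per initial service to check the claim that even the better engine benefits. The numbers come straight from the table; no variance information is available to test the difference.

```python
# Average time (sec) as (single, multiple) for each initial translation service,
# copied from the table above.
times = {"BizLingo": (172, 135), "Google": (194, 150)}


def savings(service: str) -> tuple:
    """Seconds saved with multiple translations, and that saving as a fraction."""
    single, multiple = times[service]
    saved = single - multiple
    return saved, saved / single


for service in times:
    saved, fraction = savings(service)
    print(f"{service}: {saved} s faster with multiple translations ({fraction:.1%})")
```

Both services gain roughly a fifth: 37 s (21.5%) for BizLingo and 44 s (22.7%) for Google.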
Evaluation method sensitivity
• Task-based evaluation method is sensitive to MT engine differences
• But differences may actually be a good thing when multiple translations are made available
Conclusions
• IM is a good application for MT
• Task-based evaluation is effective
• Current application technologies do not support negotiated meaning
• Improvements are possible
  – e.g., meta-buttons, multiple translations
Available Technology
• Web-based task presentation and data collection
• Multiple translation chat client (Flash app)
Acknowledgments
• ARL – John Warner, Melissa Holland
• MITRE – Rod Holland, Galen Williamson
• NMSU – Sieun An, Emily Chaffin, Yuki Ishikawa, Wanying Jin, Jong Hwan Kim, Yosip Kim, Roberto Montalvo, Jeff Sorge, and Ron Zacharski.