
Page 1: A Multi-Path Architecture for Machine Translation of English Text into American Sign Language Animation Matt Huenerfauth Student Workshop of the Human

A Multi-Path Architecture for Machine Translation of English Text into American Sign Language Animation

Matt Huenerfauth

Student Workshop of the Human Language Technologies conference / North American chapter of the Association for Computational Linguistics annual meeting. Boston, MA, USA. May 2, 2004

Computer and Information Science, University of Pennsylvania

Research Advisors: Mitch Marcus & Martha Palmer

Page 2

Motivations and Applications

• One half of Deaf high school graduates (age 18) can read English at a fourth-grade level (age 10).

– But most are fluent in ASL. (ASL ≠ English.)

– Many accessibility technologies assume English-fluency.

– ASL used by 500,000 Deaf people in North America.

• Applications for a Machine Translation System:

– TV captioning, teletype telephones.

– Computer user-interfaces in ASL.

– Educational tools, access to information/media.

Page 3

MT: Input / Output

What’s the input? English Text.

What’s the output? Less clear…

Imagine a 3D virtual reality human being…

One that can perform sign language…

But this character needs a set of instructions telling it how to move!

The task: English → These Instructions.

(Image: Vcom3D)

Page 4

Off-the-Shelf Virtual Humans

Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al., 2000); Vcom3D Corporation

Page 5

American Sign Language

• Sentence without classifier predicate:

Which university does Billy attend?

#BILLY IXx GO-TO UNIVERSITY WHICH (with "wh" non-manual marking)

• Sentence with classifier predicate:

The car drove down the bumpy road past a cat.

CAT ClassPred-bentV-{location of cat}

CAR ClassPred-3-{drive on bumpy road}

Page 6

Difficult to Generate but Important

The car drove down the bumpy road past the cat.

– Where’s the cat? The road? The car?

– How close are they? Where does the path start/stop?

– How to show that the path is bumpy vs. windy vs. hilly?

Some English sentences require a classifier predicate to be translated fluently.

– Spatial prepositions, adverbs, other phrases…

Signers use classifier predicates frequently.

– Depending on genre, one to 17 times per minute.

Page 7

Initial Approaches to ASL MT

Non-statistical Direct and Transfer MT Architectures

Page 8

Why not Statistical MT?

• ASL has no written form.

• Corpora are hard to collect and transcribe.

– Annotate video: multiple simultaneous channels of face, body, hand, and arm movements.

• There’s no training data.

Page 9

Machine Translation Pyramid

• Options in MT design.

– more work

– domain size

– subtler divergences handled

[Figure: MT Pyramid (Dorr, 1998). Source Text is analyzed upward through Word Structure, Syntactic Structure, and Semantic Structure to an Interlingua (Morphological Analysis, Syntactic Analysis, Semantic Analysis, Semantic Composition), then generated downward to Target Text (Semantic Decomposition, Semantic Generation, Syntactic Generation, Morphological Generation). Three paths cross between the two sides: Direct at the word level, Syntactic Transfer at the syntactic level, and Semantic Transfer at the semantic level.]

Page 10

Direct ‘ASL’ MT Systems

• Word-for-sign dictionary look-up system.

– Produces Signed English, not ASL.

– Definitely can’t generate classifier predicates.
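The word-for-sign look-up can be sketched in a few lines. This is a minimal illustration, not the design of any existing system: the lexicon entries, the `direct_translate` name, and the choice to mark out-of-lexicon words with `#` (as if fingerspelled) are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical toy lexicon (English word -> sign gloss); entries are illustrative only.
SIGN_LEXICON: Dict[str, str] = {
    "which": "WHICH",
    "university": "UNIVERSITY",
    "billy": "#BILLY",
    "attend": "GO-TO",
}


def direct_translate(sentence: str) -> List[str]:
    """Look up each English word in order; English word order is preserved."""
    glosses = []
    for token in sentence.lower().strip(".?!").split():
        # Assumption: words missing from the lexicon are fingerspelled, marked with '#'.
        glosses.append(SIGN_LEXICON.get(token, "#" + token.upper()))
    return glosses


print(direct_translate("Which university does Billy attend?"))
```

The glosses come out in English word order, which is why such a system yields Signed English rather than the ASL ordering shown on Page 5.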

Page 11

Transfer ASL MT Systems

• Syntactically analyze English text before crossing over to ASL.

– Capture more divergences, more phenomena.

– Still can’t handle the complex use of space.

– Still can’t generate classifier predicates.

Page 12

When the going gets tough…

…the tough try an interlingua.

– Direct or transfer architectures are insufficient.

– If not an interlingua, then at least an approach with more spatial knowledge/representation.

Of course, there’s a problem.

– It’s hard/impossible to build an interlingua system for an open-ended domain.

Page 13

Getting by with limited domain?

We can identify sentences that need complex translation. (That need classifier predicates.)

When do we use classifier predicates?

– Locations, orientations, or movements

– Spatial verbs, prepositions, adverbs

– Concrete or animate entities

– Don’t worry about abstractions, beliefs, intentions
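As a rough sketch of how such sentences might be flagged, the check below looks for a spatial verb together with a spatial preposition. The word lists and the `needs_classifier_predicate` name are hypothetical, and a keyword test is only a stand-in here for the syntactic and semantic analysis a real system would need.

```python
# Hypothetical cue lists; a real system would rely on parsing, not keywords.
SPATIAL_VERBS = {"drove", "walked", "moved", "parked"}
SPATIAL_PREPOSITIONS = {"down", "past", "between", "beside", "under"}


def needs_classifier_predicate(sentence: str) -> bool:
    """Flag a sentence when it contains both a spatial verb and a spatial preposition."""
    tokens = set(sentence.lower().strip(".?!").split())
    return bool(tokens & SPATIAL_VERBS) and bool(tokens & SPATIAL_PREPOSITIONS)


print(needs_classifier_predicate("The car drove down the bumpy road past a cat."))
print(needs_classifier_predicate("Which university does Billy attend?"))
```

The first sentence is flagged and the second is not, matching the two example sentences on Page 5.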

Page 14

“Multi-Path” MT

• Only when needed, use complex, sophisticated MT. (Interlingua?)

• Otherwise, use simpler, easier-to-build MT. (Transfer)

• Use the linguistic ‘breadth’ of one approach and the knowledge/spatial ‘depth’ of the other.

Page 15

“Multi-Path” MT

• Only when needed, use complex, sophisticated MT. (Interlingua?)

• Otherwise, use simpler, easier-to-build MT. (Transfer)

• If all else fails, use word-for-sign Direct transliteration.
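The three-way fallback can be sketched as a dispatcher that tries the deepest pathway first. Everything below is an illustrative assumption: the pathway stubs, their trigger words, and the placeholder outputs stand in for real interlingua, transfer, and direct components, which the slides do not specify at this level.

```python
from typing import Callable, List, Optional, Set, Tuple

Pathway = Callable[[str], Optional[List[str]]]


def _tokens(sentence: str) -> Set[str]:
    return set(sentence.lower().strip(".?!").split())


def interlingua_path(sentence: str) -> Optional[List[str]]:
    # Stub: pretend only sentences with these spatial cues can be modeled in 3D.
    if _tokens(sentence) & {"past", "down"}:
        return ["<classifier predicates>"]
    return None


def transfer_path(sentence: str) -> Optional[List[str]]:
    # Stub: pretend the parser succeeds only when it recognizes a verb.
    if _tokens(sentence) & {"attend", "drove"}:
        return ["<transferred ASL syntax>"]
    return None


def direct_path(sentence: str) -> Optional[List[str]]:
    # Word-for-sign transliteration always produces something.
    return [word.upper() for word in sentence.split()]


# Ordered from deepest to shallowest processing.
PATHWAYS: List[Tuple[str, Pathway]] = [
    ("interlingua", interlingua_path),
    ("transfer", transfer_path),
    ("direct", direct_path),
]


def multi_path_translate(sentence: str) -> Tuple[str, List[str]]:
    """Return the name and output of the first pathway that accepts the sentence."""
    for name, pathway in PATHWAYS:
        result = pathway(sentence)
        if result is not None:
            return name, result
    raise ValueError("no pathway produced output")


print(multi_path_translate("The car drove past a cat.")[0])
print(multi_path_translate("Which university does Billy attend?")[0])
print(multi_path_translate("asdf qwer"))
```

Ordering the list from deepest to shallowest means the expensive pathway only runs on input it claims to handle, while the direct pathway guarantees some output for every sentence.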

Page 16

“Pyramidal” MT

• No longer a set of options.

• Now a design for a new multi-path architecture.

[Figure: MT Pyramid (Dorr, 1998): Direct, Syntactic Transfer, and Semantic Transfer paths below an Interlingua, between Source Text and Target Text.]

Page 17

“Pyramidal” MT

• No longer a set of options.

• Now a design for a new multi-path architecture.

[Figure: MT Pyramid (Dorr, 1998) with its three paths labeled. Interlingual: Spatial Text. Transfer: Most Sentences. Direct: Unanalyzable Text.]

Page 18

But what’s our interlingua?

And is it really an interlingua?

Page 19

What do human interpreters do?

Listen to English about spatial topics → make 3D mental model of what’s said → produce ASL classifier predicates

• Using a spatial representation of reality…

Page 20

What could a computer do?

Computer analyzes English text → build 3D virtual reality of the scene → use VR as basis for generating the spatial classifier predicate movements

• University of Pennsylvania AnimNL system:

– 3D virtual reality model with characters/objects.

– Input: English directions for characters to follow.

– Builds animation: characters obey commands.

(Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer, 2000; Schuler, 2003)

Page 21

English-Controlled 3D Scene

http://hms.upenn.edu/software/PAR/images.html

Page 22

How it Works

English Text → Syntactic Analysis → Select a PAR Template → Fill the PAR Template → “Planning Process” → Animation Output

PAR = “Parameterized Action Representation” (on next slide)

Page 23

Parameterized Action Representation

participants: [ agent: AGENT
                objects: OBJECT list ]

semantics: [ motion: {Object, Translate?, Rotate?}
             path: {Direction, Start, End, Distance}
             termination: CONDITION
             duration: TIME-LENGTH
             manner: MANNER ]

start: TIME

prep conditions: CONDITION boolean-exp

sub-actions: sub-PARs

parent action: PAR24

previous action: PAR35

next action: PAR64

This is a subset of PAR info. http://hms.upenn.edu/software/PAR

Example: Bob tripped on the ball.

– “…tripped…” selects the Planning Operator: linked to 3D VR animated movements.

– agent: Bob

– objects: { ball_1 }

– motion: {Bob, translate…, rotate…}

– path: specifics of the path taken…

– manner: Accidentally.

Further modifiers fill further slots:

– termination: “…until dawn.” → End at 6am.

– duration: “…for 3 hours.” → 3 Hours.

– manner: “…rapidly.” → Accidentally, Rapidly.

Planning algorithm works out movement details.
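The slot-filling idea can be sketched as a small record type. This is an assumption-laden illustration, not the AnimNL data structure: the `PAR` class below keeps only a few of the slots listed on the slide, and the `action` field and `unfilled_slots` helper are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class PAR:
    """A hypothetical, heavily reduced subset of the PAR slots named on the slide."""
    action: str                      # assumed name for the selected template
    agent: str
    objects: List[str] = field(default_factory=list)
    manner: List[str] = field(default_factory=list)
    termination: Optional[str] = None
    duration: Optional[str] = None


def unfilled_slots(par: PAR) -> List[str]:
    """List the slots the sentence left empty."""
    return [f.name for f in fields(par) if getattr(par, f.name) in (None, [])]


# "Bob tripped on the ball."
par = PAR(action="trip", agent="Bob", objects=["ball_1"], manner=["accidentally"])
print(unfilled_slots(par))

# "...for 3 hours" and "...rapidly" would fill further slots.
par.duration = "3 hours"
par.manner.append("rapidly")
print(unfilled_slots(par))
```

Slots that stay empty after the sentence is analyzed are the kind of detail the slide leaves to the planning algorithm.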

Page 24

English-Controlled 3D Scene

http://hms.upenn.edu/software/PAR/images.html

Page 25

Using this technology…

An NL-Controlled 3D Scene

http://hms.upenn.edu/software/PAR/images.html

Page 26

Using this technology…

An NL-Controlled 3D Scene

Page 27

Using this technology…

An NL-Controlled 3D Scene

Original image from: Simon the Signer (Bangham et al., 2000)

Signing Character

Page 28

Using this technology…

An NL-Controlled 3D Scene

Original image from: Simon the Signer (Bangham et al., 2000)

Signing Character

Page 29

“Invisible World” Approach

• Tiny 3D virtual reality in front of signer’s hands.

• AnimNL: English sentences about locomotion → Move invisible objects accordingly

• Put hand on top of an object: go along for the ride!

We just built a CLASSIFIER PREDICATE.
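One way to picture "going along for the ride" is to shrink the object's scene path into the space in front of the signer and reuse it as the hand's path. The sketch below assumes a uniform scale factor and a fixed origin for that miniature space; both, along with the function name, are illustrative choices rather than details given on the slides.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def hand_path_from_object_path(
    object_path: List[Point], scale: float, origin: Point
) -> List[Point]:
    """Map each scene position into the miniature space in front of the signer."""
    ox, oy, oz = origin
    return [(ox + scale * x, oy + scale * y, oz + scale * z) for x, y, z in object_path]


# Hypothetical path of the invisible car through the scene.
car_path = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 2.0)]
hand_path = hand_path_from_object_path(car_path, scale=0.25, origin=(0.0, 1.0, 0.5))
print(hand_path)
```

The hand visits the same sequence of relative positions as the object, so the motion produced for the invisible object doubles as the movement of the classifier handshape.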

Page 30

Classifier Predicate Pathway

Page 31

Direct: Unanalyzable Text

Interlingual: Spatial Text

Transfer: Most Sentences

Page 32

Design Issues and Discussion

Page 33

Is the VR really an interlingua?

• Depends on your definition & how implemented.

– Semantic representation: Yes, model for 3D spatial domains.

– Useful for translation: We’ve shown how it can be.

– World knowledge beyond input semantics: Yes, in that it handles spatial/physical constraints.

– Language neutral: 3D coordinates: not just interlingual, it’s non-lingual. But might need other semantic/discourse information…

Page 34

Other Languages

• Alleviates tradeoff:

– Domain specificity vs. divergence-handling power.

– Use deeper approach in a broad coverage system.

• Translate variety of texts but perform deeper processing on certain inputs.

– Important or well-understood sentences.

– Sublanguage that requires special handling.

– Transfer or deeper/interlingual approach for “special” text and resource-lighter approach for the rest.

Page 35

Mixing Statistical/Symbolic MT

• This system had no statistical pathways.

– Nothing prevents their use with this design.

• Statistical approach for most inputs; manually override translation of certain texts.

– Statistical approach for direct (and transfer).

– Hand-build the higher pathways.

Page 36

Project Status

• Finishing design specification.

• Beginning implementation.

• Other considerations:

– Evaluation?

– Initial Applications?

– How to generate multiple classifier predicates?

– Representations to use in transfer pathway?

Page 37

Questions?

Page 38

References

Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000. Parameterized action representations and natural language instructions for dynamic behavior modification of embodied agents. AAAI Spring Symposium.

Bangham, Cox, Lincoln, and Marshall. 2000. Signing for the deaf using virtual humans. IEE2000.

DeMatteo, A. 1977. Visual Analogy and the Visual Analogues in American Sign Language. In Lynn Friedman (ed.), On the Other Hand: New Perspectives on American Sign Language, pp. 109-136. New York: Academic Press.

Holt, J. 1991. Demographic, Stanford Achievement Test - 8th Edition for Deaf and Hard of Hearing Students: Reading Comprehension Subgroup Results.

Liddell. 2003. Sources of Meaning in ASL Classifier Predicates. In Karen Emmorey (ed.), Perspectives on Classifier Constructions in Sign Languages. Workshop on Classifier Constructions, La Jolla, San Diego, California.

Liddell. 2003. Grammar, Gesture, and Meaning in American Sign Language. UK: Cambridge U. Press.

Morford and MacFarlane. 2003. Frequency Characteristics of ASL. Sign Language Studies, 3:2.

Schuler. 2003. Using model-theoretic semantic interpretation to guide statistical parsing and word recognition in a spoken language interface. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL’03), Sapporo, Japan.

Supalla, T. 1978. Morphology of Verbs of Motion and Location. In F. Caccamise and D. Hicks (eds.), Proceedings of the Second National Symposium on Sign Language Research and Teaching, pp. 27-45. Silver Spring, MD: National Association for the Deaf.

Supalla, T. 1982. Structure and Acquisition of Verbs of Motion and Location in American Sign Language. Ph.D. Dissertation, University of California, San Diego.

Supalla, T. 1986. The Classifier System in American Sign Language. In C. Craig (ed.), Noun Phrases and Categorization, Typological Studies in Language, 7, pp. 181-214. Philadelphia: John Benjamins.

Page 39

Advantages of Virtual Reality

• ASL signers can arrange objects under discussion in the space around them.

– Presence of a virtual reality model in this system enables sophisticated management of these positioned objects.

• The AnimNL system can also control the movements of virtual human figures who participate in the 3D scene. These figures possess skills useful for ASL signing; so, we can use one as our signer.

– Same technology for signer and 3D spatial model.

Page 40

System Diagram