Spatial and Planning Models of ASL Classifier Predicates for Machine Translation. Matt Huenerfauth. 10th International Conference on Theoretical and Methodological Issues in Machine Translation, October 4, 2004, Baltimore, MD, USA. Computer and Information Science, University of Pennsylvania. Research Advisors: Mitch Marcus & Martha Palmer.


Spatial and Planning Models of ASL Classifier Predicates for Machine Translation

Matt Huenerfauth

10th International Conference on Theoretical and Methodological Issues in Machine Translation
October 4, 2004, Baltimore, MD, USA

Computer and Information Science
University of Pennsylvania

Research Advisors: Mitch Marcus & Martha Palmer

Motivations and Applications

• Only half of Deaf high school graduates (age 18+) can read English at a fourth-grade (age 10) level, despite ASL fluency.

• Many Deaf accessibility tools forget that English is a second language for these students (and has a different structure).

• Applications for a Machine Translation System:
– TV captioning, teletype telephones.
– Computer user-interfaces in ASL.
– Educational tools, access to information/media.
– Transcription, storage, and transmission of ASL.

Input / Output

What’s our input? English Text.

What’s our output? ASL has no written form.

Imagine a 3D virtual reality human being…

One that can perform sign language…

But this character needs a set of instructions telling it how to move!

Our job: English → These Instructions.

Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al. “Signing for the Deaf Using Virtual Humans,” IEE 2000); Vcom3D Corporation.

Off-the-Shelf Virtual Humans

ASL Linguistics

• Some ASL sentences: structure similar to that of spoken/written languages.

• Other ASL sentences: use space around signer to topologically describe the 3D layout of a scene under discussion.
– The hands indicate the movement and location of entities in the scene.
– Called “Classifier Predicates.”

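Classifier Predicate

Example: The car parked between the cat and the house.

[Figure: timeline of the signed performance on three tracks (Gaze, Right, Left).
– sign:HOUSE while gazing at the viewer, then the house is placed at Loc#3 (gaze to Loc#3).
– sign:CAT while gazing at the viewer, then the cat is placed at Loc#1 (gaze to Loc#1).
– sign:CAR while gazing at the viewer, then the hand traces the path of the car and stops at Loc#2 (to Loc#2); the eyes follow the right hand.]

Note: Facial expression, head tilt, and shoulder tilt not included in this example.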

Previous ASL MT Systems

• Little ASL corpora – no statistical systems.

• Previous direct and transfer systems are only partial solutions.
– Some produce only Signed English, not ASL.
– None handle the spatial aspects of ASL.

• All ignore classifier predicates.

We can’t ignore CPs

• CPs are needed to convey many concepts.

• Signers use CPs frequently.*

• CPs needed for some important applications
– ASL user-interfaces
– literacy educational software

* Morford and McFarland. 2003. “Sign Frequency Characteristics of ASL.” Sign Language Studies. 3:2.

Focus and Assumptions

• Focus of this approach: producing classifier predicates of movement and location.

• Part of a larger project* to develop a multi-path English-ASL MT architecture
– Direct/transfer paths: most sentences.
– This path: produce Classifier Predicates.

* Huenerfauth, M. 2004. “A Multi-Path Architecture for English-to-ASL MT.” HLT-NAACL Student Workshop.

ASL Classifier Predicate Models

Overall Architecture

[Diagram of processing stages: English Sentence; Pred-Arg Structure; 3D Animation Planning Operator; 3D Animation of the Event; CP Discourse; CP Semantics; CP Syntax; CP Phonology]

CP Translation Models Discussed

• Scene Visualization

• Discourse

• Semantics

• Syntax

• Phonology (we’ll talk about this one first)

Phonological Model

Body Parts Moving Through Space: “Articulators”

ASL Phonetics/Phonology

• “Phonetic” Representation of Output
– Hundreds of animation joint angles.

• Traditional ASL Phonological Models
– Hand: shape, orientation, location, movement
– Some specification of non-manual features.
– Tailored to non-CP output: difficult to specify complex motion paths; CPs don’t use as many handshapes and orientation patterns.

Classifier Predicate

Example: The car parked between the cat and the house.

[Figure: Gaze, Right, and Left tracks for the example.
– Gaze at viewer during sign:HOUSE, then to Location #3 as the house is placed there.
– Gaze at viewer during sign:CAT, then to Location #1 as the cat is placed there.
– Gaze at viewer during sign:CAR, then the eyes follow the right hand along the path of the car, which stops at Location #2.]

Note: Facial expression, head tilt, and shoulder tilt not included in this example.

Phonological Model

• What is the output?
– Abstract model of (somewhat) independent body parts.

• “Articulators”
– Dominant Hand (Right)
– Non-Dominant Hand (Left)
– Eye Gaze
– Head Tilt
– Shoulder Tilt
– Facial Expression

What information do we specify for each of these?

Values for Articulators

• Dominant Hand, Non-Dominant Hand
– 3D point in space in front of the signer
– Palm orientation
– Hand shape (finite set of standard shapes)

• Eye Gaze, Head Tilt
– 3D point in space at which they are aimed.
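The articulator values listed above can be pictured as a small record type. This is a minimal sketch, not the system's actual data structures: the class names, the vector encoding of palm orientation, and every handshape other than “Sideways 3” are assumptions made for illustration. Shoulder tilt and facial expression are left out because the slides do not say which values they take.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point3D = Tuple[float, float, float]

# Hypothetical handshape inventory; only "Sideways 3" appears in the slides.
HANDSHAPES = {"Sideways 3", "Flat B", "Bent V"}


@dataclass
class HandSpec:
    location: Point3D          # 3D point in space in front of the signer
    palm_orientation: Point3D  # assumed encoding: direction the palm faces
    handshape: str             # one of a finite set of standard shapes

    def __post_init__(self):
        if self.handshape not in HANDSHAPES:
            raise ValueError(f"unknown handshape: {self.handshape}")


@dataclass
class AimSpec:
    target: Point3D  # 3D point at which eye gaze or head tilt is aimed


@dataclass
class PhonologicalFrame:
    """Values of the (somewhat) independent articulators at one moment."""
    dominant_hand: Optional[HandSpec] = None
    non_dominant_hand: Optional[HandSpec] = None
    eye_gaze: Optional[AimSpec] = None
    head_tilt: Optional[AimSpec] = None


# The right hand shows a vehicle while the eyes aim at the same point.
frame = PhonologicalFrame(
    dominant_hand=HandSpec((0.2, 0.0, 0.4), (0.0, -1.0, 0.0), "Sideways 3"),
    eye_gaze=AimSpec((0.2, 0.0, 0.4)),
)
```

Leaving an articulator as `None` is one way to say that it is unconstrained at that moment, which fits the idea that the articulators are only somewhat independent.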

Scene Visualization Approach

Converting an English sentence into a 3D animation of an event.

Previously-Built Technology

• AnimNL System

– Virtual reality model of 3D scene.

– Input: English sentences that tell the characters/objects in the scene what to do.

– Output: An animation in which the characters/objects obey the English commands.

Bindiganavale, Schuler, Allbeck, Badler, Joshi, & Palmer. 2000. "Dynamically Altering Agent Behaviors Using Nat. Lang. Instructions." Int'l Conf. on Autonomous Agents.

Related Work: Coyne and Sproat. 2001. “WordsEye: An Automatic Text-to-Scene Conversion System.” SIGGRAPH-2001. Los Angeles, CA.

How It Works

[Diagram: English Sentence; Pred-Arg Structure; 3D Animation Planning Operator; 3D Animation of the Event]

We won’t discuss all the details, but one part of the process is important to understand. (We’ll come back to it later.)

Step 1: Analyzing English Input

• The car parked between the cat and the house.
• Syntactic analysis.
• Identify word senses: e.g. park-23
• Identify discourse entities: car, cat, house.
• Predicate Argument Structure
– Predicate: park-23
– Agent: the car
– Location: between the cat and the house

Example
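The result of Step 1 can be written down as a small record. This is a hand-built sketch for the example sentence only: the `PredArg` class and the `landmarks` helper are hypothetical names, and a real system would obtain these values from a parser and word-sense tagger rather than from literals.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PredArg:
    predicate: str         # sense-tagged predicate, e.g. "park-23"
    roles: Dict[str, str]  # role name -> text of the argument
    entities: List[str] = field(default_factory=list)  # discourse entities


# Filled in by hand for "The car parked between the cat and the house."
pas = PredArg(
    predicate="park-23",
    roles={"agent": "the car", "location": "between the cat and the house"},
    entities=["car", "cat", "house"],
)


def landmarks(p: PredArg) -> List[str]:
    """Entities mentioned inside the location argument (simple word match)."""
    words = p.roles["location"].split()
    return [e for e in p.entities if e in words]


print(landmarks(pas))
```

The landmark entities matter later on, since they are the ones whose locations must already be established before the parking motion can be shown.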

Step 2: AnimNL builds 3D scene

Example

Discourse Model

Discourse Model Motivations

• Preconditions for Performing a CP
– (Entity is the current topic) OR (Starting point of this CP is the same as the ending point of a previous CP)

• Effect of a CP Performance
– (Entity is topicalized) AND (assigned a 3D location)

• Discourse Model must record:
– topicalized status of each entity
– whether a point has been assigned to an entity
– whether entity has moved in the virtual reality since the last time the signer showed its location with a CP

Discourse Model

• Topic(x) – X is the current topic.

• Identify(x) – X has been associated with a location in space.

• Position(x) – X has not moved since the last time that it was placed using a CP.

Step 3: Setting up Discourse Model

• Model includes a subset of the entities in the 3D scene: those mentioned in the text.

• All values initially set to false for each entity.

CAR: __ Topic? __ Location Identified? __ Still in Same Position?

HOUSE: __ Topic? __ Location Identified? __ Still in Same Position?

CAT: __ Topic? __ Location Identified? __ Still in Same Position?

Example
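The three flags per entity and their initial values can be sketched as follows. The function names are hypothetical, and setting all three flags after a CP is an assumption based on the effects listed for the LOCATE-STATIONARY-ANIMAL template; the slides do not give an update procedure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class EntityState:
    topic: bool = False       # Topic(x): x is the current topic
    identified: bool = False  # Identify(x): x has a location in space
    positioned: bool = False  # Position(x): x has not moved since last placed


def init_discourse_model(mentioned: Iterable[str]) -> Dict[str, EntityState]:
    """One record per entity mentioned in the text, all values false."""
    return {name: EntityState() for name in mentioned}


def can_perform_cp(model, entity, starts_where_previous_ended=False):
    """Precondition: entity is the topic, or this CP starts where one ended."""
    return model[entity].topic or starts_where_previous_ended


def apply_cp_effect(model, entity):
    """Effect of performing a CP: topicalized and assigned a 3D location."""
    state = model[entity]
    state.topic = True
    state.identified = True
    state.positioned = True  # assumption: a freshly placed entity is in position


model = init_discourse_model(["CAR", "HOUSE", "CAT"])
```

With every flag initially false, no classifier predicate can be performed right away, which is why the plan later has to introduce each entity before placing it.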

Semantic Model

Invisible 3D Placeholders: “Ghosts”

Semantic Model

• 3D representation of the arrangement of invisible placeholder objects in space

• These “ghosts” will be positioned based on the 3D virtual reality scene coordinates

• Choose the details, viewpoint, and timescale of the virtual reality scene for use by CPs

Step 4: Producing Ghost Scene

Example

[Figure: ghost scene with invisible placeholders labeled HOUSE, CAR, and CAT]
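One simple way to position ghosts from scene coordinates is a uniform scale and translation into the space in front of the signer. The sketch below is an illustration under loud assumptions: the slides do not say how the mapping is done, and the function name, the signing-space `center`, the `half_width`, and the sample coordinates are all made up. It covers only the choice of scale, not viewpoint or timescale.

```python
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]


def make_ghost_scene(scene: Dict[str, Point3D],
                     center: Point3D = (0.0, 1.2, 0.4),
                     half_width: float = 0.3) -> Dict[str, Point3D]:
    """Uniformly scale and translate scene positions into the signing space."""
    if not scene:
        return {}
    n = len(scene)
    centroid = tuple(sum(p[i] for p in scene.values()) / n for i in range(3))
    # The largest offset from the centroid along any axis decides the scale.
    extent = max(abs(p[i] - centroid[i])
                 for p in scene.values() for i in range(3))
    scale = half_width / extent if extent > 0 else 1.0
    return {
        name: tuple(center[i] + (p[i] - centroid[i]) * scale for i in range(3))
        for name, p in scene.items()
    }


# Made-up scene coordinates: house and cat on either side of the car.
scene = {
    "HOUSE": (10.0, 0.0, 0.0),
    "CAR": (0.0, 0.0, 0.0),
    "CAT": (-10.0, 0.0, 0.0),
}
ghosts = make_ghost_scene(scene)
```

Because the scale is uniform, the relative layout of the entities is preserved, which is what a topological description of the scene needs.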

Syntactic Model

Planning-Based Generation of CPs

CP Templates

• Recent linguistic analyses of CPs suggest that they can be generated by:
– Storing a lexicon of CP templates.
– Selecting a template that expresses the proper semantics and/or shows proper 3D movement.
– Instantiating the template by filling in the relevant 3D locations in space.

Huenerfauth, M. 2004. “Spatial Representation of Classifier Predicates for MT into ASL.” Workshop on Representation and Processing of Signed Languages, LREC-2004.

Liddell, S. 2003. Grammar, Gesture, and Meaning in ASL. Cambridge University Press.

Animation Planning Process

• This mechanism is actually analogous to how the AnimNL system generates 3D virtual reality scenes from English text.
– Stores templates of prototypical animation movements (as hierarchical planning operators)
– Select a template based on English semantics
– Use planning process to work out preconditions and effects to produce a 3D animation of event

Example

Database of Templates

Templates in the database: WALKING-UPRIGHT-FIGURE, MOVING-MOTORIZED-VEHICLE, LOCATE-BULKY-OBJECT, TWO-APPROACHING-UPRIGHT-FIGURES, LOCATE-SEATED-HUMAN, PARKING-VEHICLE.

PARKING-VEHICLE
Parameters: g0 (ghost car parking), g1..gN (other ghosts)
Restrictions: g0 is a vehicle
Preconditions: topic(g0) or (ident(g0) and position(g0))
for g=g1..gN: (ident(g) and position(g))
Articulator: Right Hand
Location: Follow_location_of( g0 )
Orientation: Direction_of_motion_path( g0 )
Handshape: “Sideways 3”
Effects: position(g0), topic(g0), express(park-23 agt:g0 loc:g1..gN)
Concurrently: PLATFORM(g0.loc.final), EYETRACK(g0)

Step 5: Initial Planner Goal

• Planning starts with a “goal.”

• Express the semantics of the sentence:
– Predicate: PARK-23
– Agent: “the car” discourse entity (we know from lexical information that this “car” is a vehicle, so some special CPs may apply)
– Location: 3D position calculated “between” locations for “the cat” and “the house.”

Example
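As a worked example of the location part of the goal, “between” can be read as the midpoint of the two landmark positions. This reading is an assumption, since the slides do not say how the position is calculated, and the coordinates and the `goal` dictionary layout are made up for illustration.

```python
def between(p, q):
    """Midpoint of two 3D points (an assumed reading of "between")."""
    return tuple((a + b) / 2 for a, b in zip(p, q))


# Made-up ghost positions for the two landmarks.
cat_loc = (-0.3, 1.2, 0.4)
house_loc = (0.3, 1.2, 0.4)

# The initial planner goal: express park-23 with the car ending up here.
goal = {
    "predicate": "park-23",
    "agent": "CAR",
    "location": between(cat_loc, house_loc),
}
```

Other readings are possible, for example a point chosen so that the car does not collide with either landmark, but the midpoint is enough to show where the value in the goal comes from.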

Step 6: Select Initial CP Template

PARKING-VEHICLE
Parameters: g_0, g_1, g_2 (ghost car & nearby objects)
Restrictions: g_0 is a vehicle
Preconditions: topic( g_0 ) or ( ident( g_0 ) and position( g_0 ))
(ident( g_1 ) and position( g_1 ))
(ident( g_2 ) and position( g_2 ))
Articulator: Right Hand
Location: Follow_location_of( g_0 )
Orientation: Direction_of_motion_path( g_0 )
Handshape: “Sideways 3”
Effects: position( g_0 ), topic( g_0 ), express(park-23 agt: g_0 loc: g_1, g_2 )
Concurrently: PLATFORM( g_0.loc.final), EYETRACK( g_0 )

Example

Step 7: Instantiate the Template

PARKING-VEHICLE
Parameters: CAR, HOUSE, CAT
Restrictions: CAR is a vehicle
Preconditions: topic(CAR) or (ident(CAR) and position(CAR))
(ident(CAT) and position(CAT))
(ident(HOUSE) and position(HOUSE))
Articulator: Right Hand
Location: Follow_location_of( CAR )
Orientation: Direction_of_motion_path( CAR )
Handshape: “Sideways 3”
Effects: position(CAR), topic(CAR), express(park-23 agt:CAR loc:HOUSE,CAT )
Concurrently: PLATFORM(CAR.loc.final), EYETRACK(CAR)

Example
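Instantiating a template amounts to substituting entities for its formal parameters and checking its restrictions. The sketch below shows one way to do that; the dictionary layout, the `substitute` and `instantiate` helpers, and the `ENTITY_TYPES` table are hypothetical, and only the fields needed for the illustration are included.

```python
# Each precondition clause is a list of alternatives; each alternative is a
# conjunction of (predicate, argument) pairs.
PARKING_VEHICLE = {
    "name": "PARKING-VEHICLE",
    "params": ["g0", "g1", "g2"],
    "restrictions": [("vehicle", "g0")],
    "preconditions": [
        [[("topic", "g0")], [("ident", "g0"), ("position", "g0")]],
        [[("ident", "g1"), ("position", "g1")]],
        [[("ident", "g2"), ("position", "g2")]],
    ],
    "handshape": "Sideways 3",
    "effects": [("position", "g0"), ("topic", "g0")],
}

# Hypothetical lexical information about the discourse entities.
ENTITY_TYPES = {"CAR": "vehicle", "HOUSE": "bulky-object", "CAT": "animal"}


def substitute(obj, binding):
    """Replace parameter names by entities throughout a nested structure."""
    if isinstance(obj, str):
        return binding.get(obj, obj)
    if isinstance(obj, tuple):
        return tuple(substitute(x, binding) for x in obj)
    if isinstance(obj, list):
        return [substitute(x, binding) for x in obj]
    return obj


def instantiate(template, args):
    """Return a copy of the template with its parameters bound to args."""
    if len(args) != len(template["params"]):
        raise ValueError("wrong number of arguments")
    binding = dict(zip(template["params"], args))
    inst = {key: substitute(value, binding) for key, value in template.items()}
    for required_type, entity in inst["restrictions"]:
        if ENTITY_TYPES.get(entity) != required_type:
            raise ValueError(f"{entity} is not a {required_type}")
    return inst


inst = instantiate(PARKING_VEHICLE, ["CAR", "HOUSE", "CAT"])
```

The stored template is left unchanged, so the same entry can be instantiated again for another sentence.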

Step 7: Instantiate the Template

PARKING-VEHICLE
Parameters: CAR, HOUSE, CAT
Restrictions: CAR is a vehicle
Preconditions: topic(CAR) or (ident(CAR) and position(CAR))
(ident(CAT) and position(CAT))
(ident(HOUSE) and position(HOUSE))
Effects: position(CAR), topic(CAR), express(park-23 agt:CAR loc:HOUSE,CAT )

Example

Gaze: Eyes follow right hand.
Right: Path of car, stop at Loc#2.
Left: To Loc#2

Step 8: Begin Planning Process

PARKING-VEHICLE
Parameters: CAR, HOUSE, CAT
Restrictions: CAR is a vehicle
Preconditions: topic(CAR) or (ident(CAR) and position(CAR))
(ident(CAT) and position(CAT))
(ident(HOUSE) and position(HOUSE))
Effects: position(CAR), topic(CAR), express(park-23 agt:CAR loc:HOUSE,CAT )

Example

Gaze: Eyes follow right hand.
Right: Path of car, stop at Loc#2.
Left: To Loc#2

Other Templates in the Database

• We’ve seen these:
– PARKING-VEHICLE
– PLATFORM
– EYEGAZE

• There’s also these:
– LOCATE-STATIONARY-ANIMAL
– LOCATE-BULKY-OBJECT
– MAKE-NOUN-SIGN

Example

Step 9: Planning Continues…

PARKING-VEHICLE
Parameters: CAR, HOUSE, CAT
Restrictions: CAR is a vehicle
Preconditions: topic(CAR) or (ident(CAR) and position(CAR))
(ident(CAT) and position(CAT))
(ident(HOUSE) and position(HOUSE))
Effects: position(CAR), topic(CAR), express(park-23 agt:CAR loc:HOUSE,CAT )

Example

Gaze: Eyes follow right hand.
Right: Path of car, stop at Loc#2.
Left: To Loc#2

LOCATE-STATIONARY-ANIMAL
Parameters: CAT
Restrictions: CAT is an animal
Preconditions: topic(CAT)
Effects: topic(CAT), position(CAT), ident(CAT)

Gaze: Eyes at Cat Location.
Right: Move to Cat Location.

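Step 9: Planning Continues…

[Diagram: the plan, with each operator linked to the next by the discourse facts it establishes.
– MAKE-NOUN:“CAR” establishes topic(CAR), identify(CAR) for PARKING-VEHICLE.
– MAKE-NOUN:“CAT” establishes topic(CAT), identify(CAT) for LOCATE-STATIONARY-ANIMAL, which establishes position(CAT) for PARKING-VEHICLE.
– MAKE-NOUN:“HOUSE” establishes topic(HOUSE), identify(HOUSE) for LOCATE-BULKY-OBJECT, which establishes position(HOUSE) for PARKING-VEHICLE.
– PARKING-VEHICLE runs EYEGAZE and PLATFORM (concurrently).]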

Example
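The way preconditions and effects force an ordering on the templates can be sketched with a tiny planner. This is not the hierarchical planning used by AnimNL or by the proposed system: it swaps in a plain breadth-first forward search over sets of facts, which is enough for the example. The operator encodings follow the templates shown in the slides where they are given; the effects of MAKE-NOUN, the preconditions of LOCATE-BULKY-OBJECT, and the rule that only one entity is the topic at a time are assumptions.

```python
from collections import deque


def op(name, pre, add, new_topic):
    return {"name": name, "pre": pre, "add": frozenset(add), "topic": new_topic}


def make_noun(x):
    # Assumed effects: the signed noun becomes topic and is identified.
    return op(f"MAKE-NOUN({x})", lambda s: True, {("ident", x)}, x)


def locate(kind, x):
    return op(f"{kind}({x})", lambda s: ("topic", x) in s,
              {("ident", x), ("position", x)}, x)


def parking_vehicle(g0, g1, g2):
    def pre(s):
        g0_ok = ("topic", g0) in s or {("ident", g0), ("position", g0)} <= s
        others = all({("ident", g), ("position", g)} <= s for g in (g1, g2))
        return g0_ok and others
    return op(f"PARKING-VEHICLE({g0},{g1},{g2})", pre,
              {("position", g0), ("express", "park-23")}, g0)


def apply(operator, state):
    # Assumption: only one entity is the current topic at a time.
    kept = {fact for fact in state if fact[0] != "topic"}
    return frozenset(kept | operator["add"] | {("topic", operator["topic"])})


def plan(operators, goal, start=frozenset()):
    """Breadth-first forward search; returns a shortest list of operator names."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal in state:
            return steps
        for o in operators:
            if o["pre"](state):
                nxt = apply(o, state)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [o["name"]]))
    return None


operators = [make_noun(e) for e in ("HOUSE", "CAT", "CAR")] + [
    locate("LOCATE-BULKY-OBJECT", "HOUSE"),
    locate("LOCATE-STATIONARY-ANIMAL", "CAT"),
    parking_vehicle("CAR", "HOUSE", "CAT"),
]
steps = plan(operators, ("express", "park-23"))
print(steps)
```

Under these assumptions every valid plan has six steps: each landmark is named and then placed, the car is named last so that it is the topic, and PARKING-VEHICLE comes at the end. The house and the cat may be handled in either order.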

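Step 10: Build Phonological Spec

[Figure: the planned operators laid out on the Gaze, Right, and Left tracks.
– Operators: MAKE-NOUN:“HOUSE”, LOCATE-BULKY-OBJECT, MAKE-NOUN:“CAT”, LOCATE-STATIONARY-ANIMAL, MAKE-NOUN:“CAR”, PARKING-VEHICLE, with PLATFORM and EYEGAZE.
– Gaze: at viewer during the noun signs HOUSE, CAT, and CAR; at Loc#3 and at Loc#1 during the placements; follow car during PARKING-VEHICLE.]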

Example
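Turning a finished plan into per-articulator tracks can be sketched as a lookup from each operator to the values it contributes. The plan order, the location names, and the wording of the track values follow the example figures; the `spec_for` function and its dictionary layout are hypothetical, and the left hand's part in the noun signs is not modeled.

```python
# Locations assigned to the entities in the example.
LOC = {"HOUSE": "Loc#3", "CAT": "Loc#1", "CAR": "Loc#2"}

# The plan for the example sentence, written as (operator, entity) pairs.
PLAN = [
    ("MAKE-NOUN", "HOUSE"),
    ("LOCATE-BULKY-OBJECT", "HOUSE"),
    ("MAKE-NOUN", "CAT"),
    ("LOCATE-STATIONARY-ANIMAL", "CAT"),
    ("MAKE-NOUN", "CAR"),
    ("PARKING-VEHICLE", "CAR"),
]


def spec_for(step):
    """Values that one operator contributes to the Gaze, Right, Left tracks."""
    kind, entity = step
    loc = LOC[entity]
    if kind == "MAKE-NOUN":
        return {"gaze": "at viewer", "right": f"sign:{entity}", "left": None}
    if kind.startswith("LOCATE"):
        return {"gaze": f"at {loc}", "right": f"move to {loc}", "left": None}
    if kind == "PARKING-VEHICLE":
        # PLATFORM (left hand) and eye tracking run concurrently with the path.
        return {"gaze": "follow right hand",
                "right": f"path of car, stop at {loc}",
                "left": f"platform at {loc}"}
    raise ValueError(f"unknown operator: {kind}")


timeline = [spec_for(step) for step in PLAN]
tracks = {name: [frame[name] for frame in timeline]
          for name in ("gaze", "right", "left")}

for name, values in tracks.items():
    print(name, values)
```

Each track has one value per plan step, so the simultaneous articulators stay aligned in time even though they are specified independently.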

Wrap-Up and Discussion

Wrap-Up

• This is the first MT approach proposed for producing ASL Classifier Predicates.

• Currently in early implementation phase.

• Generation models for ASL CPs
– discourse (topicalized/identified/positioned)
– semantics (invisible ghosts)
– syntax (planning operators)
– phonology (simultaneous articulators)

Discussion

• ASL as an MT research vehicle
– Need for a spatial representation to translate some English-to-ASL sentence pairs.
– Virtual reality: intermediate MT representation.
– A translation pathway tailored to a specific phenomenon as part of a multi-path system.
– Symmetry in use of planning in the analysis and generation sides of the MT architecture.

Questions?