see, hear, do: language and robots

IBM Research © 2002 IBM Corporation http://w3.ibm.com/ibm/presentations See, Hear, Do: Language and Robots Jonathan Connell Exploratory Computer Vision Group Etienne Marcheret Speech Algorithms & Engines Group Sharath Pankanti (ECVG) Josef Vopicka (Speech) Far Reaching Research (FRR) Project

Post on 24-Jan-2016

TRANSCRIPT

Page 1: See, Hear, Do: Language and Robots

IBM Research

© 2002 IBM Corporation

See, Hear, Do: Language and Robots

Jonathan Connell Exploratory Computer Vision Group

Etienne Marcheret Speech Algorithms & Engines Group

Sharath Pankanti (ECVG)

Josef Vopicka (Speech)

Far Reaching Research (FRR) Project

Page 2: See, Hear, Do: Language and Robots

© 2005 IBM Corporation

Challenge = Multi-modal instructional dialogs

Use speech, language, and vision to learn objects & actions

Innate perception abilities (objects / properties)

Innate action capabilities (navigation / grasping)

Easily acquire terms not knowable a priori

Example dialog:

User: Round up my mug.
Robot: I don’t know how to “round up” your mug.
User: Walk around the house and look for it. When you find it bring it back to me.
Robot: I don’t know what your “mug” looks like.
User: It is like this <shows another mug> but sort of orange-ish.
Robot: OK … I could not find your mug.
User: Try looking on the table in the living room.
Robot: OK … Here it is!

Language Learning & Understanding is an AAAI Grand Challenge

http://www.aaai.org/aitopics/pmwiki/pmwiki.php/AITopics/GrandChallenges#language

verb learning

command following

noun learning

advice taking

Page 3: See, Hear, Do: Language and Robots

Eldercare as an application

Example tasks:

Pick up dropped phone

Get blanket from another room

Bring me the book I was reading yesterday

Large potential market

Many affluent societies have a demographic imbalance (Japan, EU, US)

Institutional care can be very expensive (to person, insurance, state)

A little help can go a long way

Can be supplied immediately (no waiting list for admission)

Allows person to stay at home longer (generally easier & less expensive)

Boosts independence and feeling of control (psychological advantage)

Note: We are not attempting to address the whole problem

X Aggressive production cost containment

X Robust self-recharging and stairs traversal

X Bathing and bathroom care, patient transfer, cooking

X OSHA, ADA, FDA, FCC, UL or CE certification

Page 4: See, Hear, Do: Language and Robots

State of the art

Indoor navigation
Minerva from CMU, Jose from Univ. British Columbia
Limitations: No object perception; no manipulation capability

Perception & manipulation
Herb from CMU / Intel (Kanade), PR2 from Willow Garage
Limitations: Off-line object model generation; no natural language interface

Language learning
Ripley from MIT (Deb Roy), HAM from KTH in Sweden
Limitations: Either fetch or carry; no procedural learning

Dialog and speech
Honda system from IBM, call center handling from IBM
Limitations: No physical presence or action; no visual perception of objects

Page 5: See, Hear, Do: Language and Robots

Business Model

[Diagram labels: IBM; customers; OEM; buy hardware; Third Party; add software and services; $70B / year]

Page 6: See, Hear, Do: Language and Robots

Costs & revenue potential

OEM sales price for hardware                              $6000
    Electromechanical parts                               $1300
    Onboard computer                                      $500
    Assembly (15 hrs x $80 / hr)                          $1200
    + 30% Sales & distribution + 20% profit               $3000

Value-added wholesale price (w/ software)                 $15,000
    10% Continued R&D                                     $1500
    30% Sales & distribution                              $4500
    20% Profit                                            $3000
    Price = Less than a new car

Total cost of ownership                                   $8000 / yr
    Lifetime = 3 years                                    $5000 / yr
    Service (15 hrs / quarter x $50 / hr x 4 quarters)    $3000 / yr
    Effective wage (40 hrs / wk x 50 wks / yr = 2000 hrs / yr)    $4 / hr

$24B / yr: resell robot + value added software + field service

Eldercare market in US (x3 if EU and AP also)             3 million
    Total US population                                   300 million
    Ages 75-85                                            10%
    Suitable (ability level, desire, finances)            10%

Manufacturing business ($2000 / robot yr)                 $6B / yr

Services business ($3000 / robot yr)                      $9B / yr
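
The figures on this slide can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original deck: the dollar amounts and percentages come from the slide, while the variable names and the `gross_up` helper are illustrative. It also assumes that the $2000 / robot yr manufacturing figure is the OEM price spread over the 3-year lifetime, which the slide does not state explicitly.

```python
# Sketch of the slide's per-robot cost and market arithmetic.
# Dollar figures and percentages come from the slide; all names are illustrative.

PARTS = 1300              # electromechanical parts
COMPUTER = 500            # onboard computer
ASSEMBLY = 15 * 80        # 15 hrs x $80 / hr


def gross_up(cost: int, margin_pct: int) -> int:
    """Price at which `cost` is what is left after taking margin_pct percent of the price."""
    return cost * 100 // (100 - margin_pct)


# 30% sales & distribution + 20% profit are shares of the OEM price.
oem_price = gross_up(PARTS + COMPUTER + ASSEMBLY, 30 + 20)

# 10% R&D + 30% sales & distribution + 20% profit are shares of the wholesale price.
wholesale_price = gross_up(oem_price, 10 + 30 + 20)

LIFETIME_YEARS = 3
service_per_year = 15 * 50 * 4                  # 15 hrs / quarter x $50 / hr x 4 quarters
tco_per_year = wholesale_price // LIFETIME_YEARS + service_per_year
effective_wage = tco_per_year / (40 * 50)       # 40 hrs / wk x 50 wks / yr

# US market: 10% of the population is aged 75-85, and 10% of those are suitable.
us_market = 300_000_000 * 10 // 100 * 10 // 100

total_per_year = us_market * tco_per_year
# Assumption: $2000 / robot yr is the OEM price spread over the robot's lifetime.
manufacturing_per_year = us_market * (oem_price // LIFETIME_YEARS)
services_per_year = us_market * service_per_year

print(oem_price, wholesale_price, tco_per_year, effective_wage)
print(us_market, total_per_year, manufacturing_per_year, services_per_year)
```

Under these assumptions the sketch reproduces the slide's $6000, $15,000, $8000 / yr, $4 / hr, 3 million, $24B, $6B, and $9B figures.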

Page 7: See, Hear, Do: Language and Robots

Sample business case

Home eldercare now (employer costs) $25,000 / yr

1 aide from 8am to 6pm = 10 hrs

50wks x 5days / wk x 10hrs / day = 2500 hrs / yr

Federal min. wage = $7.25 / hr

+38% overhead (FICA + 401K + medical) = $10 / hr

Aide’s activities:

Help with clothes, hygiene, meals

Odd tasks such as fetching objects

Sitting around watching TV

Alternative: Half-time aide + robot $20,500 / yr

Human still helps with clothes, hygiene, meals

Robot potentially available after hours and on weekends

No problem with robot Training, Turnover, and Trust (stealing)

Value proposition (to client): 30% more hours @ 10% less cost

Split savings with customer ($50,000 to $45,000 per client)

Human 5 hrs + robot 8 hrs = 13 hrs / day during week

10% less revenue but 22% more profit (= $6.6B / yr extra profit if 100% market share)

Bill at $20,000 - $3000 service = $17,000 / yr revenue

10.6 months payback on $15,000 purchase
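
The headline numbers in this business case follow from simple arithmetic, sketched below as a hedged illustration rather than anything from the original deck. All inputs are the slide's own figures, the $8000 / yr robot cost is the total cost of ownership from the costs slide, and the variable names are hypothetical. The profit and market-share figures are not reproduced because the slide does not give enough detail to derive them.

```python
# Sketch of the slide's business-case arithmetic; names are illustrative.

HOURS_PER_YEAR = 50 * 5 * 10        # 50 wks x 5 days / wk x 10 hrs / day
LOADED_WAGE = round(7.25 * 1.38)    # min. wage + 38% overhead, rounded to $10 / hr as on the slide
ROBOT_TCO = 8000                    # $ / yr, total cost of ownership from the costs slide

aide_only = HOURS_PER_YEAR * LOADED_WAGE        # full-time aide, employer cost per year
aide_plus_robot = aide_only // 2 + ROBOT_TCO    # half-time aide plus one robot

hours_now = 10                                  # aide from 8am to 6pm
hours_with_robot = 5 + 8                        # human 5 hrs + robot 8 hrs
extra_hours_pct = (hours_with_robot - hours_now) * 100 // hours_now

price_cut_pct = (50_000 - 45_000) * 100 // 50_000   # client price change

net_revenue = 20_000 - 3000                     # billed amount minus service cost
payback_months = round(15_000 * 12 / net_revenue, 1)

print(aide_only, aide_plus_robot)
print(extra_hours_pct, price_cut_pct, payback_months)
```

With these inputs the sketch gives $25,000 and $20,500 per year, 30% more hours, a 10% lower client price, and a 10.6-month payback, matching the slide.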

Page 8: See, Hear, Do: Language and Robots

What’s different and important

Speech-driven interface

No headset required (far field), can learn new nouns and verbs

Multi-modal dialog

Responds to gestures, exploits synergies between modalities

Manipulation as well as mobility

Not just a walking telephone, can do useful physical work also

One-shot learning

No turntable scanning, not 100’s of examples, no trial-and-error experiments

Cost containment

Vision instead of special-purpose sensors and precise mechanicals