
Page 1

Perceptive Context for Pervasive Computing

Trevor Darrell, Vision Interface Group, MIT AI Lab

Page 2

MIT Project Oxygen

A multi-laboratory effort at MIT to develop pervasive, human-centric computing

Enabling people “to do more by doing less,” that is, to accomplish more with less work

Bringing abundant computation and communication, as pervasive as free air, naturally into people’s lives

Page 3

Human-centered Interfaces

• Free users from desktop and wired interfaces
• Allow natural gesture and speech commands
• Give computers awareness of users
• Work in open and noisy environments
  - Outdoors -- a PDA next to a construction site!
  - Indoors -- a crowded meeting room
• Vision’s role: provide perceptive context

Page 4

Perceptive Context

• Who is there? (presence, identity)
• What is going on? (activity)
• Where are they? (individual location)
• Which person said that? (audiovisual grouping)
• What are they looking / pointing at? (pose, gaze)

Page 5

Vision Interface Group Projects

• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
• Vision-guided microphone array
• Joint statistical models for audiovisual fusion
• Face pose estimation: rigid motion estimation with long-term drift reduction

Page 6

Vision Interface Group Projects

• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
• Vision-guided microphone array
• Joint statistical models for audiovisual fusion
• Face pose estimation: rigid motion estimation with long-term drift reduction

Page 7

Person Identification at a distance

• Multiple cameras
• Face and gait cues
• Approach: a canonical frame for each modality, obtained by placing a virtual camera at a desired viewpoint
  - Face: frontal view, fixed scale
  - Gait: profile silhouette
• Need to place the virtual camera:
  - explicit model estimation
  - search
  - motion-based heuristic trajectory
• We combine a trajectory estimate and limited search (see the sketch below)
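
As a rough illustration of the canonical-frame idea, the sketch below warps a detected face region to a fixed-scale frontal view with a planar homography. The planar approximation, the OpenCV calls, and all names are illustrative assumptions, not the system described in the deck.

```python
# Hypothetical sketch: warp a detected face quadrilateral to a canonical,
# fixed-scale frontal frame. A planar homography stands in for the full
# virtual-camera placement (trajectory estimate + limited search).
import cv2
import numpy as np

CANONICAL_SIZE = (64, 64)  # (width, height) of the canonical face frame

def normalize_face(image, corners):
    """corners: 4x2 array, the face region's corners in the input view,
    ordered top-left, top-right, bottom-right, bottom-left."""
    w, h = CANONICAL_SIZE
    target = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(np.float32(corners), target)
    return cv2.warpPerspective(image, H, CANONICAL_SIZE)
```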

Page 8

Virtual views

[Images: input view; frontal face view; profile silhouette]

Page 9

Examples: VH-generated views

[Images: VH-generated face views and gait silhouettes]

Page 10

Effects of view-normalization

Page 11

Vision Interface Group Projects

• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
• Vision-guided microphone array
• Joint statistical models for audiovisual fusion
• Face pose estimation: rigid motion estimation with long-term drift reduction

Page 12

Range-based stereo person tracking

• Range can be insensitive to fast illumination change
• Compare range values to a known background
• Project into a 2D overhead view

[Images: intensity, range, detected foreground, and plan view]

• Merge data from multiple stereo cameras
• Group into trajectories
• Examine height for sitting/standing (a pipeline sketch follows)
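
A minimal sketch of the first stages of this pipeline, assuming calibrated per-pixel 3D points from the stereo rig. Array names, the 10 cm threshold, and the grid geometry are illustrative assumptions:

```python
# Sketch: range-based foreground detection and plan-view projection.
import numpy as np

def plan_view(points, bg_depth, depth, thresh=0.1, cell=0.05, extent=10.0):
    """points: HxWx3 per-pixel world coordinates (X, Y, Z).
    bg_depth, depth: HxW background and current range maps, in meters."""
    # Range is insensitive to fast illumination change: call a pixel
    # foreground when its range departs from the known background.
    fg = np.abs(depth - bg_depth) > thresh
    xyz = points[fg & np.isfinite(depth)]
    # Project foreground points into a 2D overhead (plan-view) histogram;
    # downstream stages merge cameras, group trajectories, and check height.
    bins = int(2 * extent / cell)
    grid, _, _ = np.histogram2d(xyz[:, 0], xyz[:, 1], bins=bins,
                                range=[[-extent, extent], [-extent, extent]])
    return fg, grid
```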

Page 13

Visibility Constraints for Virtual Backgrounds

[Figure: cameras C1 and C2 observe point p, with range images I_D1 and I_D2; surfaces that C2 sees behind p define the virtual background for C1]

Page 14

Virtual Background Segmentation

[Images -- top row: sparse background, new image, detected foreground; bottom row: second view, virtual background for the first view, detected foreground. A construction sketch follows.]
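
One way to realize this construction, sketched under loud assumptions: reproject camera C2's range data into C1's frame and keep the farthest observed depth per pixel as a background bound. The calibration conventions and the max-depth rule are my illustration, not necessarily the exact algorithm behind these slides.

```python
# Hypothetical sketch: build a virtual background for camera C1 from
# camera C2's range data.
import numpy as np

def virtual_background(points_c2, R, t, K1, shape):
    """points_c2: Nx3 points in C2's frame; (R, t): C2 -> C1 transform;
    K1: 3x3 intrinsics of C1; shape: (H, W) of C1's image."""
    pts = points_c2 @ R.T + t            # C2's points in C1's frame
    pts = pts[pts[:, 2] > 0]             # keep points in front of C1
    z = pts[:, 2]
    uv = pts @ K1.T                      # project into C1's image
    u = (uv[:, 0] / z).astype(int)
    v = (uv[:, 1] / z).astype(int)
    H, W = shape
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    vbg = np.zeros(shape)                # 0 = no constraint at that pixel
    # farthest depth C2 has seen along each C1 ray bounds the background
    np.maximum.at(vbg, (v[ok], u[ok]), z[ok])
    return vbg

# Segmentation then marks pixels clearly nearer than the bound, e.g.
#   fg = (depth_c1 < vbg - 0.1) & (vbg > 0)
```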

Page 15

Points -> trajectories -> active sensing

[Diagram: spatio-temporal points are grouped into trajectories, which feed active camera motion, the microphone array, and activity classification]

Page 16

Vision Interface Group Projects

• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
• Vision-guided microphone array
• Joint statistical models for audiovisual fusion
• Face pose estimation: rigid motion estimation with long-term drift reduction

Page 17

Audio input in noisy environments

• Acquire high-quality audio from untethered, moving speakers

• “Virtual” headset microphones for all users

Page 18

Vision guided microphone array

[Photo: the cameras and microphone array]

Page 19

System flow (single target)

[Diagram: video streams feed a vision-based tracker, producing an initial target position r_vision; a gradient ascent search in array output power refines this to r_av; a delay-and-sum beamformer steered to r_av combines the audio streams into the output y(t, r_av). A beamformer sketch follows.]
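
The delay-and-sum stage is compact enough to sketch. Everything here (array layout, variable names, integer-sample delays) is an illustrative assumption; the output-power function is the objective the gradient ascent search would climb.

```python
# Sketch: delay-and-sum beamforming toward a target position r.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signals, mic_positions, r, fs):
    """signals: M x T samples; mic_positions: M x 3 (meters); r: target
    position (3,), e.g. from the vision-based tracker; fs: sample rate."""
    dists = np.linalg.norm(mic_positions - r, axis=1)
    delays = (dists - dists.min()) / SPEED_OF_SOUND   # relative delays (s)
    shifts = np.round(delays * fs).astype(int)        # in whole samples
    M, T = signals.shape
    out = np.zeros(T)
    for m in range(M):
        # advance each channel so wavefronts from r line up, then sum
        out[: T - shifts[m]] += signals[m, shifts[m]:]
    return out / M

def output_power(signals, mic_positions, r, fs):
    """Array output power: the objective refined by gradient ascent."""
    y = delay_and_sum(signals, mic_positions, r, fs)
    return float(np.mean(y ** 2))
```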

Page 20

Vision Interface Group Projects

• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
• Vision-guided microphone array
• Joint statistical models for audiovisual fusion
• Face pose estimation: rigid motion estimation with long-term drift reduction

Page 21

Audio-visual Analysis

• Multi-modal approach to source separation
• Exploit joint statistics of the image and audio signals
• Use non-parametric density estimation
• Audio-based image localization
• Image-based audio localization
• A/V verification: do this audio and this video come from the same person? (see the sketch below)
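
To make the joint-statistics idea concrete, here is a toy association score based on mutual information. The deck's approach uses non-parametric density estimation; the Gaussian shortcut below (MI = -1/2 log(1 - rho^2) for 1-D features) and the feature choices are stand-ins for illustration.

```python
# Toy sketch: audio-visual association by (Gaussian) mutual information.
import numpy as np

def gaussian_mi(audio_feat, video_feat):
    """audio_feat, video_feat: length-T series, e.g. per-frame audio energy
    and mean pixel change inside a detected face region."""
    rho = np.corrcoef(audio_feat, video_feat)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

# A/V verification: score the candidate audio against each visible face and
# accept the pairing only if its MI clearly beats the alternatives.
```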

Page 22

Audio-visual synchrony detection

Page 23

AVMI Applications

• Audio weighting from video (detected face)
• Image localization from audio
  - audio associated with left face
  - audio associated with right face
• New: synchronization detection!

[Images: image variance vs. AVMI]

Page 24

Audio-visual synchrony detection

[Example MI values: 0.68, 0.61, 0.19, 0.20]

Computing the confusion matrix for 8 subjects gives no errors -- with no training.

The same measure can also be used for audio/visual temporal alignment.

Page 25

Vision Interface Group Projects

• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
• Vision-guided microphone array
• Joint statistical models for audiovisual fusion
• Face pose estimation: rigid motion estimation with long-term drift reduction

Page 26

Face pose estimation

• Rigid motion estimation with long-term drift reduction

Page 27

Brightness and depth motion constraints

[Figure: brightness images I_t, I_{t+1} and range images Z_t, Z_{t+1} each contribute a motion constraint; the constraints are intersected in parameter space, starting from the prediction y_t = y_{t-1}]

Page 28

New bounded error tracking algorithm

[Figure: influence region; open-loop vs. closed-loop 2D tracker]

Track relative to all previous frames that are close in pose space (a sketch of this scheme follows).
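
The bounded-error idea admits a compact sketch. The version below treats poses as plain vectors and leans on an assumed register(frame_a, frame_b) routine standing in for the brightness-and-depth motion estimator; real head poses compose on SE(3), so this is a deliberate simplification.

```python
# Sketch: closed-loop tracking against stored frames near in pose space,
# bounding drift instead of chaining frame-to-frame estimates.
import numpy as np

class BoundedErrorTracker:
    def __init__(self, register, pose_radius=0.3):
        self.register = register    # register(a, b) -> relative pose a to b
        self.pose_radius = pose_radius
        self.keyframes = []         # list of (pose, frame) pairs

    def track(self, frame, predicted_pose):
        # previous frames whose pose lies in the influence region
        near = [(p, f) for p, f in self.keyframes
                if np.linalg.norm(p - predicted_pose) < self.pose_radius]
        if not near:
            self.keyframes.append((predicted_pose, frame))
            return predicted_pose
        # fuse pose estimates from every nearby base frame
        estimates = [p + self.register(f, frame) for p, f in near]
        pose = np.mean(estimates, axis=0)
        self.keyframes.append((pose, frame))
        return pose
```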

Page 29

Closed-loop 3D tracker

Track the user’s head gaze for hands-free pointing…

Page 30

Head-driven cursor

Related projects:
• Schiele
• Kjeldsen
• Toyama

Current application: a second pointer, or scrolling / focus of attention (see the mapping sketch below)
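
As a toy illustration of turning tracker output into a cursor, assuming the 3D tracker reports yaw and pitch in radians (zero when facing the screen center); the gains and screen size are made up:

```python
# Sketch: map head rotation to screen coordinates for a head-driven cursor.
def head_to_cursor(yaw, pitch, screen_w=1280, screen_h=1024, gain=2.0):
    x = screen_w / 2 + gain * screen_w * yaw    # look right -> cursor right
    y = screen_h / 2 - gain * screen_h * pitch  # look up -> cursor up
    # clamp to the screen
    return (min(max(int(x), 0), screen_w - 1),
            min(max(int(y), 0), screen_h - 1))
```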

Page 31

Head-driven cursor

Method                         Avg. error (pixels)
-----------------------------  -------------------
Cylindrical head tracker       25
2D optical flow head tracker   22.9
Hybrid                         30
3D head tracker (ours)         7.5
Eye gaze                       27
Trackball                      3.7
Mouse                          1.9

Page 32

Gaze aware interface

• Drowsy driver detection: head nod and eye blink
• Interface agent responds to the user’s gaze
  - the agent should know when it is being attended to
  - turn-taking pragmatics
  - anaphora / object reference
• First prototype
  - E21 interface “sam”
  - current experiments with the face tracker on a meeting room table
• Integrating with wall cameras and hand gesture interfaces

Page 33

“Look-to-talk”

Subject not looking at SAM -> ASR turned off
Subject looking at SAM -> ASR turned on
(see the gating sketch below)
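
The gating logic itself is tiny. A hypothetical sketch, assuming a pose tracker that reports yaw/pitch relative to SAM and an ASR object exposing enable/disable (both interfaces are assumptions):

```python
# Sketch: "look-to-talk" -- enable speech recognition only while the
# subject's head pose indicates they are facing the agent.
import math

GAZE_THRESHOLD = math.radians(15)   # how directly the user must face SAM

def update_asr_gate(asr, yaw, pitch):
    looking = abs(yaw) < GAZE_THRESHOLD and abs(pitch) < GAZE_THRESHOLD
    if looking:
        asr.enable()    # subject looking at SAM -> ASR on
    else:
        asr.disable()   # subject not looking at SAM -> ASR off
    return looking
```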

Page 34

Vision Interface Group Projects

• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
• Vision-guided microphone array
• Joint statistical models for audiovisual fusion
• Face pose estimation: rigid motion estimation with long-term drift reduction
• Conclusion and contact info

Page 35

Conclusion: Perceptual Context

Take-home message: vision provides Perceptual Context to make applications aware of users.

So far: detection, ID, head pose, audio enhancement, and synchrony verification.

Soon:
• activity -- adapting outdoor activity classification [Grimson and Stauffer] to the indoor domain
• gaze -- add eye tracking on the pose-stabilized face
• pointing -- arm gestures for selection and navigation

Page 36

Contact

Prof. Trevor Darrell

www.ai.mit.edu/projects/vip

• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
  - Greg Shakhnarovich
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
  - Neal Checka, Leonid Taycher, David Demirdjian
• Vision-guided microphone array
  - Kevin Wilson
• Joint statistical models for audiovisual fusion
  - John Fisher
• Face pose estimation: rigid motion estimation with long-term drift reduction
  - Louis Morency, Alice Oh, Kristen Grauman