![Page 1: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/1.jpg)
Perceptive Context for Pervasive Computing
Trevor Darrell
Vision Interface Group
MIT AI Lab
![Page 2: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/2.jpg)
MIT Project Oxygen
A multi-laboratory effort at MIT to develop pervasive, human-centric computing
Enabling people “to do more by doing less,” that is, to accomplish more with less work
Bringing abundant computation and communication, as pervasive as free air, naturally into people’s lives
![Page 3: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/3.jpg)
Human-centered Interfaces
• Free users from desktop and wired interfaces
• Allow natural gesture and speech commands
• Give computers awareness of users
• Work in open and noisy environments
  - Outdoors -- PDA next to construction site!
  - Indoors -- crowded meeting room
• Vision’s role: provide perceptive context
![Page 4: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/4.jpg)
Perceptive Context
• Who is there? (presence, identity)
• What is going on? (activity)
• Where are they? (individual location)
• Which person said that? (audiovisual grouping)
• What are they looking / pointing at? (pose, gaze)
![Page 5: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/5.jpg)
Vision Interface Group Projects
• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
• Vision guided microphone array
• Joint statistical models for audiovisual fusion
• Face pose estimation: rigid motion estimation with long-term drift reduction
![Page 7: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/7.jpg)
Person Identification at a distance
• Multiple cameras
• Face and gait cues
• Approach: canonical frame for each modality by placing the virtual camera at a desired viewpoint
• Face: frontal view, fixed scale
• Gait: profile silhouette
• Need to place virtual camera
  - explicit model estimation
  - search
  - motion-based heuristic trajectory
• We combine trajectory estimate and limited search
![Page 8: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/8.jpg)
Virtual views
• Input
• Frontal face
• Profile silhouette
![Page 9: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/9.jpg)
Examples: VH-generated views
• Faces:
• Gait:
![Page 10: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/10.jpg)
Effects of view-normalization
![Page 12: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/12.jpg)
Range-based stereo person tracking
• Range can be insensitive to fast illumination change
• Compare range values to known background
• Project into 2D overhead view
[Figure panels: Intensity, Range, Foreground, Plan view]
• Merge data from multiple stereo cameras…
• Group into trajectories…
• Examine height for sitting/standing…
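The steps on this slide (compare range to a known background, project into an overhead view, examine height) can be sketched roughly as follows. This is a minimal illustration, not the group's implementation: the function names, the 0.15 m tolerance, the grid extent, and the 1.3 m standing threshold are all assumptions, and the 3D points are assumed to be already expressed in a world frame with z pointing up.

```python
import numpy as np

def range_foreground(depth, background, tol=0.15):
    """Mark pixels whose range is valid and clearly closer than the background.

    depth, background: (H, W) range images in metres (NaN = no stereo match).
    tol is an assumed noise margin, not a value from the slides.
    """
    valid = np.isfinite(depth) & np.isfinite(background)
    return valid & (background - depth > tol)

def plan_view(points_xyz, mask, cell=0.1, extent=(0.0, 4.0, 0.0, 4.0)):
    """Project foreground 3D points into overhead occupancy and max-height maps.

    points_xyz: (H, W, 3) world coordinates per pixel; mask: (H, W) booleans.
    """
    x_min, x_max, y_min, y_max = extent
    nx = int(round((x_max - x_min) / cell))
    ny = int(round((y_max - y_min) / cell))
    occupancy = np.zeros((ny, nx), dtype=int)
    height = np.zeros((ny, nx))
    pts = points_xyz[mask]
    ix = np.floor((pts[:, 0] - x_min) / cell).astype(int)
    iy = np.floor((pts[:, 1] - y_min) / cell).astype(int)
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    # Unbuffered accumulation so repeated cells are counted correctly.
    np.add.at(occupancy, (iy[inside], ix[inside]), 1)
    np.maximum.at(height, (iy[inside], ix[inside]), pts[inside, 2])
    return occupancy, height

def is_standing(peak_height_m, threshold_m=1.3):
    """Crude sitting/standing test on the peak height of a plan-view blob."""
    return peak_height_m >= threshold_m
```

Maps from several stereo cameras could be merged by summing their occupancy grids once all points share one world frame; grouping cells into trajectories is left out of the sketch.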
![Page 13: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/13.jpg)
Visibility Constraints for Virtual Backgrounds
[Figure: cameras C1 and C2 viewing scene point p; labels I D1 and I D2; virtual background for C1]
![Page 14: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/14.jpg)
Virtual Background Segmentation
[Figure panels, row 1: Sparse Background, New Image, Detected Foreground!]
[Figure panels, row 2: Second View, Virtual Background for first view, Detected Foreground!]
![Page 15: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/15.jpg)
Points -> trajectories -> active sensing
[Diagram: spatio-temporal points -> trajectories -> active camera motion, microphone array, activity classification]
![Page 17: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/17.jpg)
Audio input in noisy environments
• Acquire high-quality audio from untethered, moving speakers
• “Virtual” headset microphones for all users
![Page 18: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/18.jpg)
Vision guided microphone array
Cameras
Microphones
![Page 19: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/19.jpg)
System flow (single target)
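Video streams -> vision-based tracker -> $r_{\text{vision}}$
$r_{\text{vision}}$ + audio streams -> gradient ascent search in array output power -> $r_{\text{av}}$
$r_{\text{av}}$ + audio streams -> delay-and-sum beamformer -> $y(t, r_{\text{av}})$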
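A rough sketch of the audio half of this flow is given below, assuming known microphone positions and a starting location from the vision tracker. It is illustrative only: the function names and parameters are invented here, delays are rounded to whole samples, and the refinement step uses a simple derivative-free hill climb on output power as a stand-in for the gradient ascent search named on the slide.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def delay_and_sum(signals, mic_positions, focus, fs):
    """Align each microphone signal to the focus point and average them.

    signals: (n_mics, n_samples); mic_positions: (n_mics, 3); focus: (3,).
    Farther microphones hear the source later, so their signals are advanced.
    """
    dists = np.linalg.norm(mic_positions - focus, axis=1)
    delays = np.round((dists - dists.min()) / SPEED_OF_SOUND * fs).astype(int)
    n = signals.shape[1] - delays.max()
    aligned = np.stack([s[d:d + n] for s, d in zip(signals, delays)])
    return aligned.mean(axis=0)

def output_power(signals, mic_positions, focus, fs):
    """Mean squared value of the beamformer output when steered at focus."""
    y = delay_and_sum(signals, mic_positions, focus, fs)
    return float(np.mean(y ** 2))

def refine_location(signals, mic_positions, start, fs, step=0.05, iters=50):
    """Hill-climb on array output power, starting from the vision estimate."""
    best = np.asarray(start, dtype=float)
    best_power = output_power(signals, mic_positions, best, fs)
    for _ in range(iters):
        improved = False
        for axis in range(3):
            for sign in (-1.0, 1.0):
                cand = best.copy()
                cand[axis] += sign * step
                p = output_power(signals, mic_positions, cand, fs)
                if p > best_power:
                    best, best_power, improved = cand, p, True
        if not improved:
            break
    return best, best_power
```

Starting the search at the vision estimate matters because the output-power surface typically has many local maxima; the audio search only needs to correct a small residual error.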
![Page 21: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/21.jpg)
Audio-visual Analysis
• Multi-modal approach to source separation
• Exploit joint statistics of image and audio signal
• Use non-parametric density estimation
• Audio-based image localization
• Image-based audio localization
• A/V Verification: is this audio and video from the same person?
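To make "joint statistics of image and audio" concrete, here is a small sketch that scores each pixel by the mutual information between its intensity over time and an audio energy track. It uses a plain histogram estimate as the non-parametric density model; that choice, the bin count, and the function names are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in bits for two 1-D sequences."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def localize_audio_in_image(audio_energy, frames, bins=8):
    """Per-pixel MI between an audio energy track and pixel intensity.

    audio_energy: (T,); frames: (T, H, W). Returns an (H, W) MI map whose
    peaks suggest image regions that co-vary with the sound.
    """
    _, height, width = frames.shape
    mi_map = np.zeros((height, width))
    for r in range(height):
        for c in range(width):
            mi_map[r, c] = mutual_information(audio_energy, frames[:, r, c], bins)
    return mi_map
```

The same score read the other way round (fix an image region, compare candidate audio streams) gives a simple form of image-based audio localization.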
![Page 22: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/22.jpg)
Audio-visual synchrony detection
![Page 23: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/23.jpg)
AVMI Applications
• Audio weighting from video (detected face)
• Image localization from audio
• New: Synchronization Detection!
[Figure labels: audio associated with left face; audio associated with right face; image variance; AVMI]
![Page 24: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/24.jpg)
Audio-visual synchrony detection
MI: 0.68 0.61 0.19 0.20
Compute confusion matrix for 8 subjects:
No errors!
No training!
Can also be used for audio/visual temporal alignment…
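One way such a training-free verification test could be set up is sketched below: score every audio track against every video track and assign each audio track to its highest-scoring video. The mutual information here is computed under a joint-Gaussian assumption from the correlation coefficient, which is a deliberate simplification and not the estimator behind the numbers on the slide; all names are hypothetical.

```python
import numpy as np

def gaussian_mi(x, y):
    """MI in nats under a joint-Gaussian assumption: -0.5 * ln(1 - rho^2)."""
    rho = np.corrcoef(x, y)[0, 1]
    rho2 = min(rho ** 2, 1.0 - 1e-12)  # guard against log(0) for identical inputs
    return -0.5 * np.log(1.0 - rho2)

def match_matrix(audio_tracks, video_tracks):
    """scores[i, j] = MI between audio of subject i and video feature of subject j."""
    n = len(audio_tracks)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            scores[i, j] = gaussian_mi(audio_tracks[i], video_tracks[j])
    return scores

def confusion_from_scores(scores):
    """Assign each audio track to its highest-MI video track and tally the result."""
    n = scores.shape[0]
    confusion = np.zeros((n, n), dtype=int)
    for i, j in enumerate(np.argmax(scores, axis=1)):
        confusion[i, j] += 1
    return confusion
```

A purely diagonal confusion matrix means every audio track was paired with its own video, with no training data involved.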
![Page 26: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/26.jpg)
Face pose estimation
• rigid motion estimation with long-term drift reduction
![Page 27: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/27.jpg)
Brightness and depth motion constraints
[Figure: intensity frames $I_t$, $I_{t+1}$ and depth frames $Z_t$, $Z_{t+1}$; label $y_t = y_{t-1}$; parameter space]
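For reference, the usual linearized forms of a brightness constraint and a depth constraint are written out below, assuming small motion between frames. The symbols $v_x$, $v_y$ (image-plane velocity of a pixel) and $v_z$ (rate of change of that point's depth) are notation introduced here, and the slide's own formulation may differ.

```latex
\begin{aligned}
I_x v_x + I_y v_y + I_t &= 0 \\
Z_x v_x + Z_y v_y + Z_t &= v_z
\end{aligned}
```

Subscripts on $I$ and $Z$ denote partial derivatives. Each pixel contributes one equation of each kind, and a rigid motion model ties the per-pixel velocities to a small set of pose parameters.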
![Page 28: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/28.jpg)
New bounded error tracking algorithm
Influence region
open loop 2D tracker / closed loop 2D tracker
Track relative to all previous frames which are close in pose space
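The idea of tracking against every earlier frame inside an influence region can be sketched as below. This is a simplified illustration: poses are treated as plain vectors (for example 2D translation) so that composition is addition, the estimates are combined by an unweighted mean, and `relative_motion` is a hypothetical callable standing in for whatever frame-to-frame registration is used.

```python
import numpy as np

def select_base_frames(past_poses, predicted_pose, radius):
    """Indices of previous frames whose pose lies inside the influence region."""
    past = np.asarray(past_poses, dtype=float)
    dist = np.linalg.norm(past - np.asarray(predicted_pose, dtype=float), axis=1)
    return np.flatnonzero(dist <= radius)

def closed_loop_update(past_poses, predicted_pose, relative_motion, radius):
    """Estimate the current pose from every nearby previous frame and average.

    relative_motion(i) returns the measured pose change from past frame i to
    the current frame (hypothetical interface).
    """
    idx = select_base_frames(past_poses, predicted_pose, radius)
    if idx.size == 0:
        # Nothing nearby: fall back to the most recent frame (open-loop step).
        idx = np.array([len(past_poses) - 1])
    estimates = [np.asarray(past_poses[i], dtype=float) + relative_motion(i)
                 for i in idx]
    return np.mean(estimates, axis=0), idx
```

Because revisited poses are registered against old frames rather than only the previous one, error does not keep accumulating along the whole trajectory, which is the intuition behind the bounded error claim.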
![Page 29: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/29.jpg)
Closed-loop 3D tracker
Track user's head gaze for hands-free pointing…
![Page 30: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/30.jpg)
Head-driven cursor
Related Projects:
• Schiele
• Kjeldsen
• Toyama
Current application for second pointer or scrolling / focus of attention…
![Page 31: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/31.jpg)
Head-driven cursor
| Method | Avg. error (pixels) |
|---|---|
| Cylindrical head tracker | 25 |
| 2D Optical Flow head tracker | 22.9 |
| Hybrid | 30 |
| 3D head tracker (ours) | 7.5 |
| Eye gaze | 27 |
| Trackball | 3.7 |
| Mouse | 1.9 |
![Page 32: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/32.jpg)
Gaze aware interface
• Drowsy driver detection: head nod and eye-blink…
• Interface Agent responds to gaze of user
  - agent should know when it’s being attended to
  - turn-taking pragmatics
  - anaphora / object reference
• First prototype
  - E21 interface “sam”
  - current experiments with face tracker on meeting room table
• Integrating with wall cameras and hand gesture interfaces…
![Page 33: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/33.jpg)
“Look-to-talk”
Subject not looking at SAM: ASR turned off
Subject looking at SAM: ASR turned on
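The look-to-talk behaviour amounts to gating speech recognition on head pose. A minimal sketch follows, assuming the face tracker reports head yaw and pitch in degrees and the agent's direction is known. The 15 degree tolerance, the frame-count debounce, and all names are assumptions for illustration rather than details of the SAM prototype.

```python
import math

def is_looking_at(head_yaw_deg, head_pitch_deg,
                  target_yaw_deg, target_pitch_deg, tolerance_deg=15.0):
    """True when the head direction is within tolerance_deg of the target."""
    # Wrap the yaw difference into [-180, 180) so 359 and 1 degree are close.
    d_yaw = (head_yaw_deg - target_yaw_deg + 180.0) % 360.0 - 180.0
    d_pitch = head_pitch_deg - target_pitch_deg
    return math.hypot(d_yaw, d_pitch) <= tolerance_deg

class LookToTalkGate:
    """Switches ASR on after the user has looked at the agent for hold_frames
    consecutive frames, and off after looking away for the same number."""

    def __init__(self, hold_frames=5):
        self.hold_frames = hold_frames
        self.asr_on = False
        self._streak = 0

    def update(self, looking):
        if looking == self.asr_on:
            self._streak = 0          # observation agrees with current state
        else:
            self._streak += 1         # count frames that contradict the state
            if self._streak >= self.hold_frames:
                self.asr_on = looking
                self._streak = 0
        return self.asr_on
```

The debounce keeps a brief glance away, or a single noisy pose estimate, from toggling recognition mid-utterance.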
![Page 34: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/34.jpg)
Vision Interface Group Projects
• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
• Vision guided microphone array
• Joint statistical models for audiovisual fusion
• Face pose estimation: rigid motion estimation with long-term drift reduction
• Conclusion and contact info.
![Page 35: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/35.jpg)
Conclusion: Perceptual Context
Take-home message: vision provides Perceptual Context to make applications aware of users.
So far: detection, ID, head pose, audio enhancement and synchrony verification…
Soon:
• gaze -- add eye tracking on pose stabilized face
• pointing -- arm gestures for selection and navigation
• activity -- adapting outdoor activity classification [Grimson and Stauffer] to indoor domain…
![Page 36: Perceptive Context for Pervasive Computing Trevor Darrell Vision Interface Group MIT AI Lab](https://reader036.vdocument.in/reader036/viewer/2022062517/568135b5550346895d9d1d3c/html5/thumbnails/36.jpg)
Contact
Prof. Trevor Darrell
www.ai.mit.edu/projects/vip
• Person Identification at a distance from multiple cameras and multiple cues (face, gait)
  - Greg Shakhnarovich
• Tracking multiple people in indoor environments with large illumination variation and sparse stereo cues
  - Neal Checka, Leonid Taycher, David Demirdjian
• Vision guided microphone array
  - Kevin Wilson
• Joint statistical models for audiovisual fusion
  - John Fisher
• Face pose estimation: rigid motion estimation with long-term drift reduction
  - Louis Morency, Alice Oh, Kristen Grauman