Applying Vision to Intelligent Human-Computer Interaction
Guangqi Ye
Department of Computer ScienceThe Johns Hopkins University
Baltimore, MD 21218October 21, 2005
Vision for Natural HCI
• Advantages: affordable, non-intrusive, rich information
• Crucial in multimodal interfaces: speech/gesture systems
• Vision-based HCI: natural 3D interfaces
• Examples: HandVu and ARToolKit by M. Turk et al.
Motivation
• Haptics + Vision: removes the constant-contact limitation
• Gestures for vision-based HCI: intuitive, with strong representational power
  Applications: 3D virtual environments, tele-operation, surgery
• Addressed problems: visual data collection; analysis, modeling, and recognition
Outline
• Vision/Haptics system
• Modular framework for VBHCI
• 4DT platform
• Novel scheme for hand motion capture
• Modeling composite gestures
• Human factors experiment
Vision + Haptics
• 3D registration via visual tracking: removes the constant-contact limitation
• Different passive objects generate various sensations
Vision: Hand Segmentation
• Model background: color histograms
• Foreground detection: histogram matching
• Skin modeling: Gaussian model on Hue
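The slide's pipeline — histogram-based background matching plus a hue Gaussian for skin — can be sketched as follows. The bin count, matching threshold, and Gaussian parameters are illustrative assumptions, not values from the talk.

```python
import numpy as np

def block_histogram(block, bins=16):
    """Normalized color/intensity histogram of an image block (values in [0, 1])."""
    h, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1.0 = identical, 0.0 = disjoint."""
    return np.minimum(h1, h2).sum()

def is_foreground(bg_block, cur_block, thresh=0.7):
    """Foreground = the block's histogram no longer matches the background model."""
    return histogram_intersection(block_histogram(bg_block),
                                  block_histogram(cur_block)) < thresh

def skin_likelihood(hue, mu=0.05, sigma=0.03):
    """Gaussian model on the Hue channel; mu/sigma are placeholder skin statistics."""
    return np.exp(-0.5 * ((hue - mu) / sigma) ** 2)
```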
Vision: Fingertip Tracking
• Fingertip detection: model-based
• Tracking: prediction (Kalman) + local detection
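The predict-then-detect loop can be illustrated with a constant-velocity Kalman filter: predict where the fingertip will be, run the model-based detector only around that prediction, then fuse the detection back in. The noise covariances below are placeholder values, not ones from the talk.

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],      # state transition for [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],       # only the fingertip position is observed
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-3              # process noise (assumed)
R = np.eye(2) * 1e-1              # measurement noise (assumed)

def predict(x, P):
    """Predicted state and covariance; search for the fingertip near x[:2]."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Fuse the locally detected fingertip position z into the filter."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - H @ x), (np.eye(4) - K @ H) @ P
```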
Haptics Module
• 3-D registration
• Interaction simulation
• Examples: planes, buttons
Experimental Results
• System: Pentium III PC, 12 fps
Vision + Haptics: Video
(embedded video demo)
Outline
• Vision/Haptics system
• Modular framework for VBHCI
• 4DT platform
• Novel scheme for hand motion capture
• Modeling composite gestures
• Human factors experiment
• Conclusions
Visual Modeling of Gestures: General Framework
• Gesture generation
• Gesture recognition
Related Research in Modeling Gestures for HCI
Targeted Problems
• Analysis: mostly tracking-based
  Our approach: localized parsers
• Modeling: single modality (static or dynamic)
  Our model: a coherent multimodal framework
• Recognition: limited vocabulary and users
  Our contribution: large-scale experiments
Visual Interaction Cues (VICs) Paradigm
• Site-centered interaction (example: cell-phone buttons)
VICs State Model
• Extends interaction functionality to 3D gestures
VICs Principle: Sited Interaction
• Component mapping
Localized Parsers
• Low-level parsers: motion, shape
• Learning-based modeling: neural networks, HMMs
System Architecture
Outline
• Vision/Haptics system
• Modular framework for VBHCI
• 4DT platform
• Novel scheme for hand motion capture
• Modeling composite gestures
• Human factors experiment
• Conclusions
4D-Touchpad System
• Geometric calibration: homography-based
• Chromatic calibration: affine model for the appearance transform
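Homography-based geometric calibration can be sketched with the standard Direct Linear Transform, assuming at least four point correspondences between the display plane and the camera image (the chromatic affine model is omitted here).

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: fit H (3x3) so that dst ~ H * src.
    src, dst: (N, 2) arrays of matched points, N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)          # null space of A gives h
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H with the perspective divide."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```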
System Calibration Example
Hand Detection
• Foreground segmentation: image differencing
• Skin-color modeling: thresholding in YUV space (trained on 16 users, 98% accuracy)
• Hand-region detection: merge skin pixels inside the segmented foreground
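The detection chain above can be sketched in a few lines. The U/V ranges below are illustrative assumptions; the talk's thresholds were trained on 16 users and are not given.

```python
import numpy as np

U_RANGE = (77, 127)    # assumed chrominance box for skin
V_RANGE = (133, 173)

def skin_mask(yuv):
    """Boolean skin mask from an (H, W, 3) YUV image by box-thresholding U and V."""
    u, v = yuv[..., 1], yuv[..., 2]
    return ((U_RANGE[0] <= u) & (u <= U_RANGE[1]) &
            (V_RANGE[0] <= v) & (v <= V_RANGE[1]))

def hand_regions(foreground, skin):
    """Hand pixels = skin-colored pixels inside the segmented foreground."""
    return foreground & skin
```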
Hand Detection Example
(embedded videos of the hand detection example)
Integrated into Existing Interface
• Shape parser + state-based gesture modeling
Outline
• Vision/Haptics system
• Modular framework for VBHCI
• 4DT platform
• Novel scheme for hand motion capture
• Modeling composite gestures
• Human factors experiment
• Conclusions
Efficient Motion Capture of 3D Gesture
• Captures shape and motion in a local space
• Appearance feature volume: region-based stereo matching
• Motion: differencing of appearance features
Appearance Feature Example
Posture Modeling Using 3D Feature
• Model 1: three-layer neural network
  Input: raw feature; 20 hidden nodes

| Posture | Training | Testing |
| --- | --- | --- |
| Pick | 99.97% | 99.18% |
| Push | 100.00% | 99.93% |
| Press-Left | 100.00% | 99.89% |
| Press-Right | 100.00% | 99.96% |
| Stop | 100.00% | 100.00% |
| Grab | 100.00% | 99.82% |
| Drop | 100.00% | 99.82% |
| Silence | 99.98% | 98.56% |
Posture Modeling Using 3D Feature
• Model 2: histogram-based maximum likelihood
  Input: vector quantization, 96 clusters

| Posture | Training | Testing |
| --- | --- | --- |
| Pick | 96.95% | 97.50% |
| Push | 96.98% | 100.00% |
| Press-Left | 100.00% | 94.83% |
| Press-Right | 99.07% | 98.15% |
| Stop | 99.80% | 100.00% |
| Grab | 98.28% | 95.00% |
| Drop | 100.00% | 98.85% |
| Silence | 98.90% | 98.68% |
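Model 2 can be sketched as vector quantization against a codebook (assumed given, e.g. from k-means with 96 clusters) followed by per-class symbol histograms; the add-one smoothing below is an illustrative choice, not stated in the talk.

```python
import numpy as np

def quantize(features, codebook):
    """Assign each feature vector to its nearest codebook cluster (VQ)."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def train_histograms(symbol_seqs, labels, n_clusters, n_classes):
    """Per-class symbol histograms with add-one smoothing."""
    hists = np.ones((n_classes, n_clusters))
    for seq, y in zip(symbol_seqs, labels):
        for s in seq:
            hists[y, s] += 1
    return hists / hists.sum(axis=1, keepdims=True)

def classify(seq, hists):
    """Maximum-likelihood class: highest summed log-probability of the symbols."""
    return np.log(hists)[:, seq].sum(axis=1).argmax()
```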
Dynamic Gesture Modeling
• Hidden Markov Models
  Input: VQ, 96 symbols; extension: modeling the stop state p(s_T)

| Gesture | Standard Training (%) | Standard Testing (%) | Extended Training (%) | Extended Testing (%) |
| --- | --- | --- | --- | --- |
| Twist | 96.30 | 81.48 | 100.00 | 85.19 |
| Twist-Anti | 93.62 | 93.10 | 100.00 | 93.10 |
| Flip | 100.00 | 96.43 | 100.00 | 96.43 |
| Negative | — | 79.58 | — | 98.79 |
| Overall | 96.64 | 81.05 | 100.00 | 97.89 |
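The stop-state extension can be illustrated with the discrete forward algorithm: the standard sequence likelihood is additionally weighted by a per-state termination distribution, penalizing gestures that do not end in a plausible final state. Shapes are generic, not the 96-symbol models from the talk.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B, stop=None):
    """Log-likelihood of a discrete observation sequence under an HMM.
    pi: (S,) initial probs, A: (S, S) transitions, B: (S, K) emissions.
    stop: optional (S,) probability of terminating in each state — the
    'stop state' extension sketched here as a final reweighting."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # forward recursion
    if stop is not None:
        alpha = alpha * stop            # weight by termination distribution
    return np.log(alpha.sum())
```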
Outline
• Vision/Haptics system
• Modular framework for VBHCI
• 4DT platform
• Novel scheme for hand motion capture
• Modeling composite gestures
• Human factors experiment
• Conclusions
Model Multimodal Gestures
• Low-level gestures as Gesture Words (GWords); three classes: static, dynamic, parameterized
• High-level gestures: sequences of GWords
• Bigram model to capture transition constraints
Example Model
Learning and Inference
• Learning the bigram: maximum likelihood
• Inference: greedy choice for online operation
  Choose the path maximizing p(v_t | v_{t-1}) p(s_t | v_t)
Outline
• Vision/Haptics system
• Modular framework for VBHCI
• 4DT platform
• Novel scheme for hand motion capture
• Modeling composite gestures
• Human factors experiment
• Conclusions
Human Factors Experiment
• Gesture vocabulary: 14 gesture words
  Multimodal: postures, parameterized and dynamic gestures; 9 possible gesture sentences
• Data collection: 16 volunteers (7 of them women); 5 training and 3 testing sequences each
• Gesture cuing: video + text
Example Video Cuing
(embedded video of the gesture cuing)
Modeling Parameterized Gesture
• Three gestures: moving, rotating, resizing
• Region tracking on the segmented image
  Pyramid SSD tracker: X′ = R(θ)X + T; template size: 150 × 150
• Evaluation: average residual errors of 5.5/6.0/6.7 pixels
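The SSD residual minimized by the tracker can be illustrated with an exhaustive translation-only search on a single pyramid level; the actual tracker also estimates the rotation R(θ) and works coarse-to-fine, which this sketch omits.

```python
import numpy as np

def track_ssd(image, template, search=3):
    """Exhaustive SSD search for the template's best (dy, dx) offset within
    a +/- search window around the previous position (assumed at the window
    center). Returns the offset and its sum-of-squared-differences residual."""
    th, tw = template.shape
    best, best_off = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = search + dy, search + dx
            patch = image[y:y + th, x:x + tw]
            r = ((patch - template) ** 2).sum()
            if r < best:
                best, best_off = r, (dy, dx)
    return best_off, best
```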
Composite Gesture Modeling Result
| Gesture | Sequences | Recognition Rate |
| --- | --- | --- |
| Pushing | 35 | 97.14% |
| Twisting | 34 | 100.00% |
| Twisting-Anti | 28 | 96.42% |
| Dropping | 29 | 96.55% |
| Flipping | 32 | 96.89% |
| Moving | 35 | 94.29% |
| Rotating | 27 | 92.59% |
| Stopping | 33 | 100.00% |
| Resizing | 30 | 96.67% |
| Total | 283 | 96.47% |
User Feedback on Gesture-based Interface
• Gesture vocabulary: easy to learn (100% agree)
• Fatigue compared to a mouse-driven GUI: 50% comparable, 38% more tiring, 12% less tiring
• Overall convenience compared to a mouse-driven GUI: 44% more comfortable, 44% comparable, 12% more awkward
Contributions
• Vision + Haptics: a novel multimodal interface
• VICs/4DT: a new framework for VBHCI and data collection
• Efficient motion capture for gesture analysis
• Modeling of heterogeneous gestures
• Large-scale gesture experiments
Acknowledgements
• Dr. G. Hager
• Dr. D. Burschka, J. Corso, A. Okamura
• Dr. J. Eisner, R. Etienne-Cummings, I. Shafran
• CIRL Lab: X. Dai, L. Lu, S. Lee, M. Dewan, N. Howie, H. Lin, S. Seshanami
• Haptic Exploration Lab: J. Abott, P. Marayong
Thanks