![Page 1: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/1.jpg)
3D Scene Accessibility For The Blind
Via Auditory-Multitouch Interfaces
Juan D. Gomez, Sinan Mohammed, Guido Bologna and Thierry Pun
UNIVERSITY OF GENEVA,
COMPUTER VISION & MULTIMEDIA LAB
28-30 November 2011 in Brussels, Belgium
![Page 2: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/2.jpg)
“Object Detection” The annual PASCAL Visual Objects Challenge
![Page 3: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/3.jpg)
“Object Detection” The annual PASCAL Visual Objects Challenge
![Page 4: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/4.jpg)
V. Hedau, D. Hoiem, D. Forsyth,
“Recovering the Spatial Layout of Cluttered Rooms” IEEE International Conference on Computer Vision (ICCV), 2009.
![Page 5: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/5.jpg)
S. Y. Bao, M. Sun, S. Savarese.
“Coherent Object Detection And Scene Layout Understanding” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
![Page 6: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/6.jpg)
Figure labels: Cylinder, Square, Triangle, Circle; 40 cm
Gomez, J., Mohammed, S., Bologna, G. and Pun, T.
“Toward 3D Scene Understanding via Audio-description:
Kinect-iPad fusion for the visually impaired” International Conference on Computers and Accessibility (ASSETS), 2011.
Preliminary Target Scene
![Page 7: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/7.jpg)
Gomez, J., Bologna, G. and Pun, T.
“A virtual ceiling mounted depth-camera
using orthographic Kinect” IEEE International Conference on Computer Vision (ICCV), 2011.
One-Shot Semiautomatic Kinect Calibration
Before Calibration
After Calibration
![Page 8: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/8.jpg)
Elements Extraction Via Depth-Based Segmentation
Objectless Image
Layering across the Depth Layers in which an object was detected after scanning
![Page 9: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/9.jpg)
Neural-Based Object Recognition
Four features per object:
Feature values range from 0 to 1.
All weights equal 1, so the features are of equal importance.
All features are scale-invariant.
All features are rotation-invariant.
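$\left| 1 - \dfrac{\text{majorAxisLength} - \text{minorAxisLength}}{\text{majorAxisLength}} \right|$
$\dfrac{\text{perimeter}}{\pi \cdot \text{majorAxisLength}}$
$\left| \dfrac{\pi \cdot \text{Radius}^2 - \text{area}}{\text{area}} \right|$
$\left| 1 - \dfrac{\left| \pi \cdot \text{majorAxisLength} - \text{perimeter} \right|}{\text{perimeter}} \right|$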
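The four expressions above can be computed directly from basic region measurements. The following is a minimal sketch, not the authors' implementation: the function name `shape_features` and its parameter names are hypothetical, and because the slide does not define "Radius", the sketch simply takes it as an input (for example, half the major axis length would be one possible choice).

```python
import math


def shape_features(major_axis, minor_axis, perimeter, area, radius):
    """Return the four shape descriptors from the slide as a tuple.

    All inputs are plain region measurements in consistent units.
    `radius` is an assumption: the slide does not say how it is obtained.
    """
    # Feature 1: axis ratio, equal to 1 when both axes have the same length.
    f1 = abs(1 - (major_axis - minor_axis) / major_axis)
    # Feature 2: perimeter relative to the circumference of a circle
    # whose diameter is the major axis.
    f2 = perimeter / (major_axis * math.pi)
    # Feature 3: relative gap between the area of a circle of the given
    # radius and the measured region area.
    f3 = abs((math.pi * radius ** 2 - area) / area)
    # Feature 4: one minus the relative gap between that circumference
    # and the measured perimeter.
    f4 = abs(1 - abs(math.pi * major_axis - perimeter) / perimeter)
    return f1, f2, f3, f4


# Usage: an ideal circle of radius 3 (hypothetical test shape).
r = 3.0
print(shape_features(2 * r, 2 * r, 2 * math.pi * r, math.pi * r ** 2, r))
```

For an ideal circle the first, second and fourth values come out at (approximately) 1 and the third at 0, which is a quick sanity check on the formulas.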
![Page 10: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/10.jpg)
Early Scene Description
So far:
The frontal view gives only a relative understanding of the layout.
A top view of the scene is highly desirable for grasping how objects are distributed.
Whereas frontal distances (depths) are known, lateral distances are still missing.
How can all this information be delivered to the blind user?
![Page 11: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/11.jpg)
Delivering Visual Information via Finger-Triggered Audio
Natural top-view of the scene
Artificial top-view of the scene
Target sensation to be achieved on the iPad
iPad holding the artificial top-view
Target sensation of spatial audio
![Page 12: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/12.jpg)
Deceptive Object Location Caused by Perspective
Causes Mistaken Spatial Sonification
And Top-View Is Unreachable Despite Depth
Vanishing Point and Scene Optical Geometry Example
![Page 13: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/13.jpg)
Orthographic Vs Perspective Cameras
A perspective camera (bottom-right): objects farther away appear smaller, and their image positions vary with distance.
An orthographic camera (top-left): objects preserve their natural proportions in size and position.
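The difference between the two camera models can be illustrated with a small sketch. It assumes an idealized pinhole camera with a hypothetical focal length of 1 and camera-centered coordinates (x lateral, y vertical, z depth); the function names are illustrative only.

```python
def project_perspective(point, focal_length=1.0):
    """Pinhole projection: image coordinates shrink as depth z grows."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def project_orthographic(point):
    """Orthographic projection: depth is simply dropped."""
    x, y, _ = point
    return (x, y)


# Two points with the same lateral offset (0.5) at different depths.
near = (0.5, 0.0, 1.0)
far = (0.5, 0.0, 2.0)

print(project_perspective(near))   # (0.5, 0.0)
print(project_perspective(far))    # (0.25, 0.0)
print(project_orthographic(near))  # (0.5, 0.0)
print(project_orthographic(far))   # (0.5, 0.0)
```

Under perspective the farther point drifts toward the image center although its lateral offset is unchanged, whereas the orthographic projection reports the same lateral position at both depths.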
![Page 14: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/14.jpg)
Top-View Based on Virtual Orthographic Cam
![Page 15: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/15.jpg)
Top-View Based on Virtual Orthographic Cam
![Page 16: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/16.jpg)
Top-View Based on Virtual Orthographic Cam
Natural depth map from above using the virtual orthographic Kinect
Artificial top-view using the virtual orthographic Kinect and object recognition methods.
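One way to picture how a top view can be derived from a frontal depth map is to back-project each depth pixel to a metric lateral coordinate and accumulate it in a grid seen from above. The sketch below is an illustration under stated assumptions, not the calibration procedure of the cited paper: it assumes a pinhole model with hypothetical intrinsics `fx` (focal length in pixels) and `cx` (principal point column), a camera whose optical axis is parallel to the floor, and depth values in metres with 0 meaning "no reading".

```python
import math


def top_view(depth, fx, cx, cell=0.05, width_m=2.0, depth_m=2.0):
    """Build a top-view occupancy grid from a frontal depth image.

    depth   : list of rows, each a list of depth values in metres
    fx, cx  : assumed pinhole intrinsics (hypothetical values in practice)
    cell    : grid cell size in metres
    Returns a grid indexed as grid[depth_cell][lateral_cell] with hit counts.
    """
    cols = int(round(width_m / cell))
    rows = int(round(depth_m / cell))
    grid = [[0] * cols for _ in range(rows)]
    for image_row in depth:
        for u, z in enumerate(image_row):
            if z <= 0:
                continue  # skip pixels without a depth reading
            # Back-project the pixel column to a lateral offset in metres.
            x = (u - cx) * z / fx
            col = math.floor((x + width_m / 2) / cell)
            row = math.floor(z / cell)
            if 0 <= col < cols and 0 <= row < rows:
                grid[row][col] += 1
    return grid


# Usage with a tiny, made-up 1x3 depth image: one reading at 1 m, centered.
grid = top_view([[0.0, 1.0, 0.0]], fx=1.0, cx=1.0, cell=0.5)
print(grid)
```

Because the lateral coordinate is expressed in metres rather than pixels, an object keeps the same grid position regardless of its distance from the sensor, which is the property the orthographic view provides.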
![Page 17: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/17.jpg)
Gomez, J., Mohammed, S., Bologna, G. and Pun, T.
“Scene accessibility for the blind
via computer-vision and multi-touch interfaces”
Conference on Open Accessibility Everywhere (AEGIS), 2011.
Experiments With Blindfolded Users
Original Layout
User Guess
Centroids Shifting
![Page 18: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/18.jpg)
Results
The X axis represents 30 different scenes with four elements each. The Y axis represents the average distance (cm)
between the original and the final locations of the four objects.
This average distance has been normalized by dividing it by the table diagonal (244 cm).
The bar colors vary according to each scene's exploration time, which ranges from 0 to 10 minutes (colormap).
The standard deviation of the four elements' relocation distances is shown on top of each bar.
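The per-scene quantities described above can be sketched as follows. This is a hedged illustration rather than the authors' analysis code: the function name `relocation_error` is hypothetical, positions are assumed to be 2D table coordinates in centimetres, and the population standard deviation is an assumption, since the slide does not say which estimator was used or whether the deviation was also normalized.

```python
import math
import statistics

TABLE_DIAGONAL_CM = 244.0  # diagonal stated on the slide


def relocation_error(original, guessed, diagonal=TABLE_DIAGONAL_CM):
    """Return (normalized mean distance, standard deviation in cm).

    original, guessed : equal-length lists of (x, y) positions in cm.
    """
    # Euclidean distance between each original position and its relocation.
    distances = [math.dist(p, q) for p, q in zip(original, guessed)]
    mean_normalized = statistics.mean(distances) / diagonal
    # Population standard deviation (an assumption, see lead-in).
    spread = statistics.pstdev(distances)
    return mean_normalized, spread


# Usage with made-up positions for a two-object scene.
original = [(0.0, 0.0), (10.0, 0.0)]
guessed = [(3.0, 4.0), (10.0, 0.0)]
print(relocation_error(original, guessed))
```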
![Page 19: 27 3 d scene accesibility for the blind via](https://reader033.vdocument.in/reader033/viewer/2022051819/54c56cd74a79599e358b4589/html5/thumbnails/19.jpg)
Conclusions
The mean error distance for object relocation across all experiments was 3.3%
of the table diagonal. This is around 8.5 cm of separation between
an object's original position and its relocated position.
In both cases, i.e. scenes with three and with four objects, this distance remained roughly constant.
The exploration time varied according to the number of elements on the table.
On average, 3.4 minutes were enough to build a mental layout of a scene composed of three elements,
whereas scenes with four elements required 5.4 minutes.
This difference was due to the larger number of sound-color associations
to be learned; the results nonetheless showed no misclassification of objects.
The results presented in this work reveal that the participants
were capable of grasping the general spatial structure of the sonified
environments and of accurately estimating scene layouts.