Translation of sign language to speech
Michael Abd-El-Malak
Student ID: 12045132
Supervisor: Iain Murray
Presentation outline
The requirements of translating level 2 ASL
Problem outline and project objectives
The proposed system
Achievements and results
Conclusion
Level 2 ASL: Hand position and roll
In many cases hand movement and roll are necessary for correct word disambiguation.
Example: Roll required to differentiate between ‘weigh’ and ‘balance’
Level 2 ASL: Facial feature extraction
ASL-to-English translation is a very difficult task.
Facial expressions and head movement convey a great deal of additional information: raising the eyebrows at the end of a phrase indicates a question, while shaking the head implies the opposite meaning.
By using facial feature tracking, shorter phrases can be used.
The problem
To track the hand with 4 degrees of freedom: x, y and z coordinates, plus hand roll
To extract the positions of the facial features (pupils and eyebrows) to facilitate feature tracking
Project Objectives
Develop a system that can simultaneously track the hand (4 DOF) and facial features, and which is:
As non-obtrusive as possible
Portable
Real-time
Robust
Cost-efficient
Proposed System
Based on the input from a 720 × 540 CCD video camera.
Uses a glove with infrared LEDs to track the hand.
Uses neural networks to find the facial features.
(Block diagram: tracking system with an IR LED controller and head locator producing a position list.)
Tracking the hand: Overview
The hand position is tracked using infrared LEDs, which serve as optical trackers.
Six IR LEDs are required for each hand.
The IR LEDs are sequentially turned on and off in synchronisation with the camera.
Single detection cycle requires 8 frames.
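The slides do not spell out the detection algorithm, but the scheme they describe — LEDs flashed in sync with the camera — suggests simple frame differencing. A minimal sketch of that idea, assuming grayscale frames stored as 2-D lists; the frame ordering within the 8-frame cycle (one all-off reference frame, then one frame per LED) is an assumption, not taken from the thesis:

```python
# Sketch: subtracting the frame captured with all LEDs off from a frame
# captured with one LED on leaves (ideally) a single bright spot at that
# LED's image position.

def locate_led(frame_off, frame_on, threshold=40):
    """Return the (row, col) of the brightest above-threshold difference
    pixel, or None if no pixel exceeds the threshold."""
    best, best_pos = threshold, None
    for r, row in enumerate(frame_on):
        for c, v in enumerate(row):
            diff = v - frame_off[r][c]
            if diff > best:
                best, best_pos = diff, (r, c)
    return best_pos

def detection_cycle(frames):
    """One cycle: frame 0 is an all-off reference, frames 1..6 each have one
    of the six LEDs on (assumed ordering; remaining frames are spare)."""
    reference = frames[0]
    return [locate_led(reference, frames[i]) for i in range(1, 7)]
```

The per-LED time-multiplexing is what makes this cheap: each difference image should contain at most one spot, so no correspondence problem arises between LEDs.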
Tracking the hand: Example
(Image: the frame captured with one LED on.)
Tracking the hand: Problems faced
Infrared LEDs have small viewing angles, and LED intensity can be very weak, so good noise suppression is required.
People don't remain still, even if they're trying to. This results in a ghostly outline of the person appearing in the difference image, which can be removed using differences from other frames and low-pass filtering.
Reflective items such as glasses can look very similar to LEDs turned on; a proximity filter is used to remove outlier points.
There is not enough information to obtain the z location; this requires stereo vision.
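The proximity filter mentioned above is not detailed in the slides. One plausible minimal version, sketched here under assumptions of our own (the per-axis median as the cluster centre and a fixed pixel radius are illustrative choices, not the thesis's parameters):

```python
# Reject candidate points (e.g. reflections from glasses) that lie far from
# the main cluster of LED detections: keep only points within a fixed radius
# of the per-axis median point.

def proximity_filter(points, radius=50.0):
    """Drop (x, y) points farther than `radius` pixels from the median."""
    if not points:
        return []
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    mx, my = xs[len(xs) // 2], ys[len(ys) // 2]
    return [p for p in points
            if (p[0] - mx) ** 2 + (p[1] - my) ** 2 <= radius ** 2]
```

The median is used rather than the mean so that a single bright reflection far from the hand cannot drag the cluster centre towards itself.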
Facial feature extraction: Overview
Detection is done via neural networks. A non-linear two-layer perceptron structure is used — the most commonly used structure for pattern recognition.
256 inputs are used, obtained using a log-polar mapping around the sampling position.
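A log-polar mapping samples densely near the centre and exponentially more sparsely outwards. A minimal sketch of how 256 such inputs could be gathered; the 16-ring × 16-angle split and the radii below are illustrative assumptions, not the thesis's exact parameters:

```python
import math

def log_polar_offsets(rings=16, angles=16, r_min=1.0, r_max=40.0):
    """Return rings*angles (dx, dy) sample offsets on a log-polar grid:
    radii spaced logarithmically from r_min to r_max, angles uniformly."""
    offsets = []
    for i in range(rings):
        r = r_min * (r_max / r_min) ** (i / (rings - 1))
        for j in range(angles):
            theta = 2.0 * math.pi * j / angles
            offsets.append((r * math.cos(theta), r * math.sin(theta)))
    return offsets

def sample_inputs(image, cx, cy, offsets):
    """Read the image at each offset around (cx, cy), using the nearest
    pixel and clamping coordinates to the image border."""
    h, w = len(image), len(image[0])
    vals = []
    for dx, dy in offsets:
        x = min(max(int(round(cx + dx)), 0), w - 1)
        y = min(max(int(round(cy + dy)), 0), h - 1)
        vals.append(image[y][x])
    return vals
```

With 16 × 16 = 256 sample points, one network evaluation per candidate position yields exactly the 256 inputs the slide mentions, with fine detail preserved only near the hypothesised feature location.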
Facial feature extraction: Problems faced
Number of inputs: 256 (128 proved too undependable)
Size of the sampling structure:
≈ face size → predicts the feature position
≈ eye size → simply looks for dark spots
≈ eye separation → best results
Facial feature extraction: sampling example
Results: Performance summary
Hand tracking system:
Position and pitch determined in 3 seconds
X, Y coordinate is accurate to within 20 pixels
Pitch is accurate to ±10 degrees
Facial feature extraction system:
Eyes, eyebrows and left eye edge located in 20 seconds
Generally all features are located with an accuracy of ±20 pixels (720 × 540)
Results: Hand tracking system
Results: Facial feature localisation
Pupil
Left corner of eye
Eyebrow
Ideal conditions
Results: Facial feature localisation
Out-of-focus image; reflection on the pupil
Pupils and right brow slightly off
Results: Facial feature localisation
Detection with an angled face
Matches each side correctly
Summary and future work
A system was developed that is capable of tracking the hand position and roll, and of extracting the positions of the eyes and eyebrows.
Further work is required:
Add more features such as lips
Implement stereo vision
Further optimise the algorithms
Implement the whole system in hardware
Thank you for your attention
Any questions?
"The best way to predict the future is to invent it"
Alan Kay
Artificial neural networks in 60 seconds
Simple feed-forward network, no recurrence
Choice of transfer function critical
Above is an example of a perceptron structure
Statistical template is embedded in weights
(Diagram: inputs x1 … xn, weighted by W1 … Wn plus a bias, feed a summing node and transfer function (∑, TF) in the hidden layer to produce the output.)
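The diagram on this slide corresponds to a forward pass like the following minimal sketch. The sigmoid transfer function and the toy weights in the test are illustrative assumptions, not the thesis's trained values:

```python
import math

# Each neuron computes TF(sum(x_i * w_i) + bias), matching the slide's
# summing node followed by a transfer function.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)

def forward(inputs, hidden_layer, output_neuron):
    """Two-layer perceptron forward pass.
    hidden_layer: list of (weights, bias) pairs, one per hidden neuron.
    output_neuron: (weights, bias) for the single output neuron."""
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron(hidden, w_out, b_out)
```

Training adjusts the weights so that the network's "statistical template" of the feature ends up embedded in them, as the slide puts it; the non-linearity of the transfer function is what lets the two-layer structure represent non-linear decision boundaries.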