![Page 1: Dynamic Captioning: Video Accessibility Enhancement for Hearing Impairment Richang Hong, Meng Wang, Mengdi Xuy Shuicheng Yany and Tat-Seng Chua School](https://reader036.vdocument.in/reader036/viewer/2022081603/56649f2b5503460f94c464ee/html5/thumbnails/1.jpg)
Dynamic Captioning: Video Accessibility Enhancement for Hearing Impairment
Richang Hong, Meng Wang, Mengdi Xu†, Shuicheng Yan† and Tat-Seng Chua
School of Computing, National University of Singapore, 117417, Singapore
†Department of ECE, National University of Singapore
Outline
• Introduction
• Processing
  • Face Detection, Tracking, Grouping
  • Script-Face Mapping
  • Non-Salient Region Detection
  • Script-Speech Alignment
  • Volume Analysis
• Experiments
• Conclusion
Introduction
• For hearing-impaired viewers, simply placing subtitles on the screen may lose the following information:
  • Emotion (changes in volume)
  • Multiple people speaking simultaneously (messy subtitles)
  • Losing track of the subtitle (changes in speaking pace)
Introduction
• Dynamic Captioning
  • Sets up an indicator to represent speaking volume
  • Draws an arrow from the subtitle to the speaker's mouth
  • Highlights the words being spoken
Flowchart
Script & Subtitle Alignment
(“Hello! My name is... Buffy” – Automatic Naming of Characters in TV Video) [22]
Face Detection, Tracking, Grouping
• Face detector [17]
• Robust foreground correspondence tracker [18]
  • Faces in adjacent frames are grouped into the same track when the size of their overlap area exceeds a threshold
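The overlap rule above can be illustrated with a minimal sketch. It is not the tracker of [18]; it only shows how face boxes from adjacent frames might be linked when their overlap area exceeds a threshold. The box format `(x1, y1, x2, y2)`, the function names, and the greedy matching strategy are assumptions made for illustration.

```python
def overlap_area(a, b):
    """Area of intersection of two boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def link_faces(frames, min_overlap):
    """Greedily group face boxes into tracks (hypothetical helper).

    frames: list of per-frame lists of boxes.
    Returns a list of tracks; each track is a list of (frame_index, box).
    """
    tracks = []
    active = []  # tracks that received a box in the previous frame
    for t, boxes in enumerate(frames):
        next_active = []
        used = set()
        for box in boxes:
            best, best_area = None, min_overlap
            for track in active:
                if id(track) in used:
                    continue
                area = overlap_area(track[-1][1], box)
                if area > best_area:  # overlap must exceed the threshold
                    best, best_area = track, area
            if best is None:
                best = []  # no sufficiently overlapping face: start a new track
                tracks.append(best)
            else:
                used.add(id(best))
            best.append((t, box))
            next_active.append(best)
        active = next_active
    return tracks
```

A track ends as soon as no box in the next frame overlaps it enough, so a face that reappears after a gap starts a new track in this simplified version.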
Script-Face Mapping
Determine who the speaker is
• Lip motion analysis [19]
  • Locate the mouth region with a Haar-feature-based cascade mouth detector
  • Compute the mean squared distance between the pixel values of the mouth region in every two consecutive frames
  • Set two thresholds to separate three states: {speaking, nonspeaking, difficult to judge}
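As a rough illustration of the lip motion steps above, the sketch below computes the mean squared distance between two already-cropped mouth regions and maps it to one of the three states. The threshold values `low` and `high` and the function names are assumptions; the slides do not give the actual thresholds, and mouth detection itself is left out.

```python
import numpy as np


def mouth_motion(prev_mouth, curr_mouth):
    """Mean squared distance between two equally sized grayscale mouth crops."""
    a = np.asarray(prev_mouth, dtype=np.float64)
    b = np.asarray(curr_mouth, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def classify_lip_state(msd, low=5.0, high=20.0):
    """Map a motion value to a state using two thresholds.

    low and high are placeholder values chosen only for this example.
    """
    if msd >= high:
        return "speaking"
    if msd <= low:
        return "nonspeaking"
    return "difficult to judge"
```

Keeping an explicit "difficult to judge" band avoids forcing a decision on ambiguous frames, which matches the three-state split described on the slide.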
Script-Face Mapping
Script-Face Mapping
• Extract SIFT descriptors at 9 facial keypoints (9 × 128 = 1152 dimensions) to form the facial feature vector
• If only one person is speaking, the speaker can be confirmed from the script and subtitle file; such faces are labeled with high confidence and their feature vectors are used as training data
• If two or more persons are speaking, use the training data to identify the unknown faces (sparse representation classification [20])
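The classification step above can be sketched as follows. This is a simplified illustration of sparse representation classification, not the implementation of [20]: the query feature is coded over the labeled training features with an L1 penalty, and the class whose coefficients reconstruct it with the smallest residual is chosen. The ISTA solver, the value of `lam`, and all function names are assumptions made for the example.

```python
import numpy as np


def ista_lasso(D, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1 with ISTA."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - D.T @ (D @ x - y) / L  # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x


def src_classify(train_feats, train_labels, query, lam=0.01):
    """Return the label with the smallest class-wise reconstruction residual.

    train_feats: (n_samples, dim) array of labeled feature vectors.
    """
    feats = np.asarray(train_feats, dtype=np.float64)
    D = feats.T / np.linalg.norm(feats, axis=1)  # unit-norm columns
    y = np.asarray(query, dtype=np.float64)
    y = y / np.linalg.norm(y)
    x = ista_lasso(D, y, lam)
    labels = np.asarray(train_labels)
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)  # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get)
```

In the setting described above, `train_feats` would hold the 1152-dimensional vectors collected from single-speaker scenes, and `query` would be the feature vector of an unidentified face.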
Script-Face Mapping
Non-Salient Region Detection
(b): For each pixel, calculate the Gaussian distance between the pixel and its adjacent pixels
Lighter pixels are more important (more salient)
Non-Salient Region Detection
• Partition the image into a 5 × 5 grid of blocks (chosen empirically)
• Assign weights to the blocks around the speaker's face block
  • Weight wi = 1 for the blocks to the left and right of the talking block
  • Weight wi = 0.8 for the right-top, left-top, right-down and left-down (RT/LT/RD/LD) blocks
• For each block b, a saliency energy s (0 < s < 1) is computed by averaging the normalized energies of all pixels within b
• Calculate a score for each block
• Insert the caption in the region with the maximal score
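The block selection above can be sketched as follows. The score formula is not preserved in this transcript, so the sketch assumes `score = w * (1 - s)`, which favors heavily weighted blocks with low saliency; this formula, the function name, and the zero weight for all other blocks are assumptions, not the authors' definition.

```python
import numpy as np


def pick_caption_block(saliency, face_block, grid=5):
    """Choose a caption block near the speaker's face (hypothetical helper).

    saliency: 2-D array of normalized pixel energies in [0, 1].
    face_block: (row, col) of the grid block containing the speaker's face.
    """
    h, w = saliency.shape
    bh, bw = h // grid, w // grid
    # Saliency energy per block: mean of the pixel energies inside it.
    s = saliency[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw).mean(axis=(1, 3))

    fr, fc = face_block
    weights = np.zeros((grid, grid))
    neighbours = [(0, -1, 1.0), (0, 1, 1.0),        # left / right blocks
                  (-1, -1, 0.8), (-1, 1, 0.8),      # left-top / right-top
                  (1, -1, 0.8), (1, 1, 0.8)]        # left-down / right-down
    for dr, dc, wgt in neighbours:
        r, c = fr + dr, fc + dc
        if 0 <= r < grid and 0 <= c < grid:  # skip blocks outside the frame
            weights[r, c] = wgt

    score = weights * (1.0 - s)  # ASSUMED formula, see the note above
    r, c = np.unravel_index(np.argmax(score), score.shape)
    return (int(r), int(c)), score
```

Blocks that receive no weight keep a score of zero, so the caption always lands next to the speaker's face block whenever at least one neighbouring block is not fully salient.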
Script-Speech Alignment
Script-Speech Alignment
• Use 39-dimensional MFCC features to describe each sound segment
• Translate each word into a phonetic sequence using the CMU pronouncing dictionary
• Run the SPHINX II recognition engine with the pronouncing dictionary
• Find matched parts that contain more than 3 words (chosen empirically) and use them as anchors
• Repeat the matching while unmatched segments remain
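The anchor step above can be illustrated with a small sketch that compares the script words with the recognizer's output and keeps only matched runs longer than 3 words. It uses Python's `difflib.SequenceMatcher` as a stand-in matcher; the slides do not say which matching procedure was used, and the function name and tuple layout are assumptions.

```python
from difflib import SequenceMatcher


def find_anchors(script_words, recognized_words, min_words=3):
    """Return (script_index, recognized_index, length) for each matched run
    that contains more than min_words words."""
    matcher = SequenceMatcher(None, script_words, recognized_words, autojunk=False)
    return [(m.a, m.b, m.size)
            for m in matcher.get_matching_blocks()
            if m.size > min_words]


script = "i think we should go home now because it is late".split()
recognized = "i think we should grow home cow because it is late".split()
print(find_anchors(script, recognized))  # [(0, 0, 4), (7, 7, 4)]
```

The single matched word "home" is dropped because it is too short to be a reliable anchor; the unmatched stretch between the two anchors is what a further matching pass would work on.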
Volume Analysis
Symbolize and illustrate the voice volume
Compute the power of the audio signal in a small local window (30 ms)
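A minimal sketch of the windowed power computation, assuming non-overlapping 30 ms windows and power defined as the mean squared amplitude; the slides do not state whether windows overlap, and the function name is hypothetical.

```python
import numpy as np


def short_time_power(signal, sample_rate, window_ms=30):
    """Mean squared amplitude of the signal in consecutive windows."""
    x = np.asarray(signal, dtype=np.float64)
    win = int(sample_rate * window_ms / 1000)  # samples per window
    n = len(x) // win                          # drop the incomplete tail
    frames = x[:n * win].reshape(n, win)
    return np.mean(frames ** 2, axis=1)
```

At a 16 kHz sample rate each window holds 480 samples, so one second of audio yields 33 power values that could drive the on-screen volume indicator.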
Experiments
Experiments
Experiments
Conclusion
Contribution:
• Helps hearing-impaired audiences enjoy videos more
Future Work:
1. Improve script-face mapping accuracy and extend the study to a larger dataset
2. Deal with videos without scripts
The End