Classifying Motion Picture Audio
Eirik Gustavsen, 07.06.07
Outline
• Motivation
• Thesis
• State of the Art
• Proposed system
• Experimental setup
• Results
• Future work
• Conclusion
Motivation
• Most projects classify clear classes or classes with noise.
• Few clear boundaries in motion picture audio
• Subjective descriptions of movies
• Difficult to compare movie content
Thesis
It is possible to automatically create a table of contents of a motion picture, based on its audio track only.
Research questions
• Find the best low level descriptors (LLDs) to classify motion picture audio
• Detect boundaries between audio classes within complex audio segments
• Automatically create a TOC based on the audio track only
Pre-Processing
• 44100 Hz sample rate
• Mono
• 16 bits
• 30 ms windows (LW)
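The windowing step above can be sketched as follows. This is a minimal illustration, assuming non-overlapping windows (the slides do not say whether windows overlap); the function name `frame_signal` and the test signal are hypothetical.

```python
def frame_signal(samples, sample_rate=44100, window_ms=30):
    """Split a mono signal into consecutive windows of window_ms milliseconds.

    Assumption: windows do not overlap and an incomplete tail is dropped.
    """
    window_len = sample_rate * window_ms // 1000  # 1323 samples at 44100 Hz
    n_frames = len(samples) // window_len
    return [samples[i * window_len:(i + 1) * window_len] for i in range(n_frames)]


# One second of stand-in mono data (a sawtooth, not real movie audio)
signal = [(i % 200) - 100 for i in range(44100)]
frames = frame_signal(signal)
print(len(frames), len(frames[0]))
```

At 44100 Hz, a 30 ms window holds 1323 samples, so one second of audio yields 33 complete windows.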
Low Level Descriptors
• Time domain
• Frequency domain
Low Level Descriptors
• Total of 23 low level descriptors
TIME DOMAIN
• Audio Power
• Audio Wave Form
• Root-Mean Square
• Short Time Energy
• Low Short Time Energy Ratio
• Zero-Crossing Rate
• High Zero-Crossing Rate Ratio
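Several of the time domain descriptors listed above can be sketched per window in a few lines. The definitions below are common textbook forms and are assumptions: the slides do not give formulas, and the 1.5 and 0.5 threshold factors for the two ratio descriptors are typical choices from the literature, not values confirmed by this presentation.

```python
import math


def rms(frame):
    """Root-mean square amplitude of one window."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def short_time_energy(frame):
    """Mean squared amplitude of one window (assumed definition)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def hzcrr(zcrs, factor=1.5):
    """Share of windows whose ZCR exceeds factor * mean ZCR (factor is assumed)."""
    mean = sum(zcrs) / len(zcrs)
    return sum(1 for z in zcrs if z > factor * mean) / len(zcrs)


def lster(energies, factor=0.5):
    """Share of windows whose energy is below factor * mean energy (factor is assumed)."""
    mean = sum(energies) / len(energies)
    return sum(1 for e in energies if e < factor * mean) / len(energies)


frame = [3, -3, 3, -3]
print(rms(frame), short_time_energy(frame), zero_crossing_rate(frame))
```

The two ratio descriptors are computed over a group of windows (for example, all windows in one second) rather than over a single window.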
FREQUENCY DOMAIN
• Audio Spectrum Centroid
• Fundamental Frequency
• 10 Mel-Frequency Cepstral Coefficients
• Spectrum Flux
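Two of the frequency domain descriptors above, the spectrum centroid and the spectrum flux, can be illustrated with a small sketch. It assumes a plain linear-frequency centroid over the magnitude spectrum, which may differ from the exact descriptor definition used in this work, and it uses a naive DFT so that it stays dependency-free (a real system would use an FFT). All function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, fine for short demos)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags, sample_rate, frame_len):
    """Magnitude-weighted mean frequency in Hz (simplified, linear scale)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / frame_len * m for k, m in enumerate(mags)) / total


def spectrum_flux(prev_mags, mags):
    """Sum of squared magnitude differences between two consecutive windows."""
    return sum((m - p) ** 2 for p, m in zip(prev_mags, mags))


# A 400 Hz tone sampled at 3200 Hz, 32 samples long (falls exactly on bin 4)
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
silence = [0.0] * 32
tone_mags = magnitude_spectrum(tone)
centroid = spectral_centroid(tone_mags, 3200, 32)
flux = spectrum_flux(magnitude_spectrum(silence), tone_mags)
print(round(centroid, 3), round(flux, 3))
```

A pure tone concentrates its energy in one bin, so the centroid lands on the tone frequency, while the flux is large at the transition from silence to tone and zero between identical windows.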
Dimensionality reduction
Principal components analysis (PCA) is a technique used to reduce multidimensional data sets to lower dimensions for analysis.
f(1), f(2), f(3), f(4), f(5), ..., f(23) → PCA → d(1), d(2), d(3)
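The reduction from a long feature vector to a few principal components can be sketched without external libraries. The version below is an illustration only: it finds the leading eigenvectors of the covariance matrix by power iteration with re-orthogonalization, a stand-in for whatever eigen-solver the actual system used. The function names and the toy data are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _orthonormalize(v, basis):
    """Remove the parts of v along each basis vector, then scale to unit length."""
    for b in basis:
        proj = _dot(v, b)
        v = [x - proj * y for x, y in zip(v, b)]
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v] if norm > 1e-12 else None


def pca(rows, n_components=3, n_iter=200):
    """Project rows onto their first n_components principal directions."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    components = []
    for _ in range(n_components):
        # Arbitrary start vector, kept orthogonal to the components found so far
        v = _orthonormalize([float(j + 1) for j in range(dim)], components)
        if v is None:
            raise ValueError("start vector is degenerate for this data")
        for _step in range(n_iter):
            w = _orthonormalize([_dot(row, v) for row in cov], components)
            if w is None:  # no variance left in the remaining directions
                break
            v = w
        components.append(v)
    projected = [[_dot(c, comp) for comp in components] for c in centered]
    return projected, components


# Toy data: three features, all variance lies along the direction (1, 2, 0)
rows = [[float(t), 2.0 * t, 0.0] for t in range(5)]
projected, components = pca(rows, n_components=2)
print([round(abs(x), 3) for x in projected[0]])
```

In the setting of the slides, each row would hold the 23 descriptor values of one window and `n_components` would be 3.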
K Nearest Neighbors
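A k nearest neighbors classifier assigns a window the majority class among its k closest training examples. The sketch below assumes Euclidean distance and k = 3; neither choice is stated in the slides, and the training points are made-up stand-ins for PCA-reduced feature vectors.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    neighbors = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data (real input would be the reduced descriptors)
train_points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (5.0, 5.0), (5.2, 4.8), (4.9, 5.3)]
train_labels = ["speech", "speech", "speech", "music", "music", "music"]

print(knn_classify(train_points, train_labels, (0.2, 0.1)))
print(knn_classify(train_points, train_labels, (4.8, 5.1)))
```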
Proposed system
Pre-processing → LLD → Norm → PCA → KNN → Post-processing → TOC generation
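The normalization stage of the pipeline puts descriptors with very different ranges on a comparable scale before PCA and KNN. The slides do not say which normalization was used; the sketch below assumes z-score normalization per descriptor, and the function name is hypothetical.

```python
import math


def zscore_normalize(rows):
    """Scale each feature column to zero mean and unit standard deviation.

    Assumption: z-score scaling; constant columns are mapped to 0.0.
    """
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(dim)]
    stds = [s if s > 0 else 1.0 for s in stds]  # avoid division by zero
    return [[(r[j] - means[j]) / stds[j] for j in range(dim)] for r in rows]


normalized = zscore_normalize([[1.0, 10.0], [3.0, 10.0]])
print(normalized)
```

Without such a step, a descriptor with large raw values would dominate both the principal components and the KNN distances.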
Classifying Audio
Speech
Noise (white)
Music
”Silence”
Mixed audio classes
Class Boundary Detection
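One simple way to turn per-window class labels into class boundaries is to smooth the label sequence with a sliding majority vote and then report the positions where the label changes. This is an illustrative approach only, not necessarily the method used in this work; the window width and function names are assumptions.

```python
from collections import Counter


def smooth_labels(labels, half_width=2):
    """Replace each label by the majority label in a window centered on it."""
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half_width)
        hi = min(len(labels), i + half_width + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed


def find_boundaries(labels):
    """Indices at which the label differs from the previous one."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Hypothetical classifier output with one isolated misclassification at index 5
raw = ["speech"] * 5 + ["music"] + ["speech"] * 4 + ["music"] * 6
smoothed = smooth_labels(raw)
print(find_boundaries(raw), find_boundaries(smoothed))
```

Smoothing removes the isolated "music" window, so only the real transition from speech to music remains as a boundary.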
Finding most suitable LLDs
Most Suitable:
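• ASC (Audio Spectrum Centroid)
• AWF (Audio Wave Form)
• RMS (Root-Mean Square)
• HZCRR (High Zero-Crossing Rate Ratio)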
Sample Results
Music with low volume
Clear speech
Speech with background environmental sounds
Fading between music and speech
Speech with Background music
Jingle
”Some mistakes”
Future Work
• To be done in this thesis
– Post processing
– TOC
• Open research questions for future work
– New motion picture audio classes
– Detecting sound objects
– Speech recognition
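Since post processing and TOC generation are listed above as work still to be done, the following is only a guess at what the TOC step could look like: consecutive windows with the same class are merged into one entry with a start and end time. The entry format, the function name and the 30 ms window length passed in are assumptions.

```python
def build_toc(labels, window_ms=30):
    """Merge runs of equal labels into (start_ms, end_ms, label) entries."""
    toc = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or on a label change
        if i == len(labels) or labels[i] != labels[start]:
            toc.append((start * window_ms, i * window_ms, labels[start]))
            start = i
    return toc


# Hypothetical per-window labels: 3 s of speech followed by 1.5 s of music
labels = ["speech"] * 100 + ["music"] * 50
for entry in build_toc(labels):
    print(entry)
```

Times are kept as integer milliseconds so that entry boundaries line up exactly with window boundaries.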
Conclusion
• Pre-processing makes it possible to classify motion picture audio correctly
• Using the right combination of LLDs improves the classification result
Questions?