lecture15-audio retrieval
TRANSCRIPT
![Page 1: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/1.jpg)
http://www.xkcd.com/655/
![Page 2: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/2.jpg)
Audio Retrieval!
David Kauchak
cs458 Fall 2012
Thanks to Doug Turnbull for some of the slides
![Page 3: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/3.jpg)
Administrative
n Assignment 4 part 1 due Tuesday
n Final project instructions out soon… n I’ll send an e-mail
n Homework 4 n No class next Tuesday!
![Page 4: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/4.jpg)
Audio Index construction
Audio files to be indexed
indexer
Index
audio preprocessing
wav midi
mp3
slow, jazzy, punk
may be keyed off of text may be keyed off of audio features
![Page 5: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/5.jpg)
Audio retrieval
Query Index
Systems differ by what the query input is and how they figure out the result
![Page 6: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/6.jpg)
Song identification
Index
Given an audio signal, tell me what the “song” is
Examples: n Query by Humming
n 1-866-411-SONG
n Shazam
n Bird song identification
n …
![Page 7: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/7.jpg)
Song identification
How might you do this?
Query by humming
“song” name
![Page 8: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/8.jpg)
Song identification
“song” name
![Page 9: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/9.jpg)
Song similarity
Index
Find the songs that are most similar to the input song
Examples: n Genius
n Pandora
n Last.fm
song
![Page 10: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/10.jpg)
Song similarity
How might you do this?
IR approach
f1 f2 f3 … fn
f1 f2 f3 … fn
f1 f2 f3 … fn
f1 f2 f3 … fn
f1 f2 f3 … fn
f1 f2 f3 … fn
rank by cosine sim
![Page 11: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/11.jpg)
Song similarity: collaborative filtering
song1
song2
song3
songm
…
What might you conclude from this information?
![Page 12: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/12.jpg)
Songs using descriptive text search
Index
Examples: n Very few commercial systems like this …
jazzy, smooth, easy listening
![Page 13: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/13.jpg)
![Page 14: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/14.jpg)
![Page 15: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/15.jpg)
![Page 16: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/16.jpg)
![Page 17: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/17.jpg)
![Page 18: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/18.jpg)
Music annotation
The key behind keyword based systems is annotating the music with tags
dance, instrumental, rock
blues, saxaphone, cool vibe
pop, ray charles, deep
Ideas?
![Page 19: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/19.jpg)
Annotating music
The human approach
“expert musicologists” from Pandora
Pros/Cons?
![Page 20: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/20.jpg)
Annotating music
Another human approach: games
![Page 21: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/21.jpg)
Annotating music
the web: music reviews
challenge?
![Page 22: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/22.jpg)
Automatically annotating music
Learning a music tagger
song signal review
tagger
tagger
blues, saxaphone, cool vibe
![Page 23: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/23.jpg)
Automatically annotating music
Learning a music tagger
song signal review
tagger What are the tasks we need to accomplish?
![Page 24: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/24.jpg)
System Overview
Learning
T T
Annotation
Training Data
Data
Audio-Feature Extraction (X)
Vocabulary
Document Vectors (y)
Features Model
![Page 25: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/25.jpg)
Automatically annotating music
Frank Sinatra - Fly me to the moon This is a jazzy, singer / songwriter song that is
calming and sad. It features acoustic guitar, piano, saxophone, a nice male vocal solo, and emotional, high-pitched vocals. It is a song with a light beat and a slow tempo.
Dr. Dre (feat. Snoop Dogg) - Nuthin' but a 'G' thang This is a dance poppy, hip-hop song that is
arousing and exciting. It features drum machine, backing vocals, male vocal, a nice acoustic guitar solo, and rapping, strong vocals. It is a song that is very danceable and with a heavy beat.
First step, extract “tags” from the reviews
![Page 26: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/26.jpg)
Automatically annotating music
Frank Sinatra - Fly me to the moon This is a jazzy, singer / songwriter song that is
calming and sad. It features acoustic guitar, piano, saxophone, a nice male vocal solo, and emotional, high-pitched vocals. It is a song with a light beat and a slow tempo.
Dr. Dre (feat. Snoop Dogg) - Nuthin' but a 'G' thang This is a dance poppy, hip-hop song that is
arousing and exciting. It features drum machine, backing vocals, male vocal, a nice acoustic guitar solo, and rapping, strong vocals. It is a song that is very danceable and with a heavy beat.
First step, extract “tags” from the reviews
![Page 27: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/27.jpg)
Learn a probabilistic model that captures a relationship between audio content and tags.
‘Jazz’ ‘Male Vocals’
‘Sad’ ‘Slow Tempo’
Autotagging
Frank Sinatra ‘Fly Me to the Moon’
Content-Based Autotagging
p(tag | song)
![Page 28: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/28.jpg)
Modeling a Song
+ + + + + + + + +
+ + + + + +
+ +
+ + +
+ + +
+ + +
+
+
+
+
+ + +
+ + + +
+
+
+
+
+
+ +
+ + + + + +
+ +
+ +
+
+ +
+
+ + + + +
+
+
+
+ + + + +
+ + +
+
+
+
+
+ + + + +
+ + +
+
+
+
+
+
+ + +
+ + +
+
+
+
+
+ + +
+
+ +
+
Bag of MFCC vectors
+
+ +
+ +
+
+ +
+
+ +
+ + +
+ +
+ + +
+ + +
+
+
+
+
+ + + +
+ + + +
+
+
+
+ +
+ + +
+ +
+
+
+
+ + +
+ + +
+
+
+
+
+
+ + + + +
+
+
+
+ + + + +
+ +
+
+
+
+
+ + +
+ + + +
+
+
+
+ +
+ + +
+ + +
+
+
+
+
+ + + + + +
+
+
+
cluster feature vectors
![Page 29: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/29.jpg)
1. Take all songs associated with tag t
2. Estimate ‘features clusters’ for each song
3. Combine these clusters into a single representative model for that tag
romantic
Tag Model Mixture of
song clusters
p(x|t)
clusters
romantic
Modeling a Tag
![Page 30: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/30.jpg)
1. Calculate the likelihood of the features for a tag model
Romantic?
Determining Tags
Inference
with Romantic Tag
Model
S1
S2
romantic?
![Page 31: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/31.jpg)
31
Annotation Semantic Multinomial for “Give it Away” by the Red Hot Chili Peppers http://www.youtube.com/watch?v=Mr_uHJPUlO8
![Page 32: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/32.jpg)
The CAL500 data set
The Computer Audition Lab 500-song (CAL500) data set n 500 ‘Western Popular’ songs n 174-word vocabulary
n genre, emotion, usage, instrumentation, rhythm, pitch, vocal characteristics
n 3 or more annotations per song
n 55 paid undergrads annotated music for 120 hours
Other Techniques 1. Text-mining of web documents
2. ‘Human Computation’ Games
![Page 33: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/33.jpg)
Retrieval The top 3 results for - “pop, female vocals, tender”
0.33
0.02
0.02
0.02
1. Shakira - The One
2. Alicia Keys - Fallin’
3. Evanescence - My Immortal
![Page 34: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/34.jpg)
Retrieval
‘Tender’ Crosby, Stills and Nash - Guinnevere Jewel - Enter from the East Art Tatum - Willow Weep for Me John Lennon - Imagine Tom Waits - Time
‘Female Vocals’ Alicia Keys - Fallin’ Shakira - The One Christina Aguilera - Genie in a Bottle Junior Murvin - Police and Thieves Britney Spears - I'm a Slave 4 U
‘Tender’
AND
‘Female Vocals’
Jewel - Enter from the East Evanescence - My Immortal Cowboy Junkies - Postcard Blues Everly Brothers - Take a Message to Mary Sheryl Crow - I Shall Believe
Query Retrieved Songs
![Page 35: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/35.jpg)
Annotation results
Annotation of the CAL500 songs with 10 words from a vocabulary of 174 words.
Model Precision Recall
Random 0.14 0.06
Our System 0.27 0.16
Human 0.30 0.15
![Page 36: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/36.jpg)
Retrieval results
Model AROC
Random 0.50
Our System - 1 Word 0.71
Our System - 2 Words 0.72
Our System - 3 Words 0.73
![Page 37: lecture15-audio retrieval](https://reader031.vdocument.in/reader031/viewer/2022020702/61fa714a5ae55560a25b8f1d/html5/thumbnails/37.jpg)
Echo nest
http://the.echonest.com/