
Audio Meets Image Retrieval Techniques

Dave Kauchak

Department of Computer Science

University of California, San Diego


Image vs. Audio

[Figure: image and audio examples; labels Classical, Country, Rock]

Image techniques to audio

Idea: apply image retrieval (and classification) techniques to audio
Image is 2-D; audio is 1-D
Benefits:
Don't have to reinvent the wheel
Image techniques have had fairly good success
More literature in image processing
Audio retrieval is a relatively new field

Key Concepts and Goals

Image techniques applied to audio processing
Apply a number of different image techniques (and show they work)
Relate various parts of audio to their counterparts in images
Novel data set with known ground truth
Multiple input songs for audio retrieval
Raw audio as input

A first step… Audio retrieval

Input: a number of songs
Output: "similar" songs from an audio database
Histogramming methods (Puzicha et al.)
Wavelets instead of Gabor filters

Basic Technique

[Figure: diagram with components DWT, histogram, Database, and Most "similar" songs]
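The slides do not say which wavelet or how many levels were used. As a minimal sketch, assuming a Haar wavelet and a hand-rolled transform (pure Python, no wavelet library), the DWT step could look like this. The function name is hypothetical. Note how each level has half as many coefficients as the one before it.

```python
import math


def haar_dwt_levels(signal, levels):
    """Multi-level Haar DWT (Haar is an assumption; the slides only say "wavelets").

    Returns the detail coefficients of each level, finest first, followed by
    the final approximation. Each level halves the number of samples.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:  # pad odd-length input by repeating the last sample
            approx.append(approx[-1])
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        details.append(detail)
    return details + [approx]
```

For example, 16 samples decomposed over 3 levels give detail levels of 8, 4 and 2 coefficients plus a 2-coefficient approximation. This unequal sample count per level is what separates normal from proportional histogramming.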

Normal vs. Proportional Histogramming

Remember DWT: different number of samples per level
Normal: histogram each level with the same number of bins
Proportional: histogram each level keeping samples/bin equal
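The slides leave several details open, so the following is only a sketch under stated assumptions. In proportional mode, the longest DWT level is assumed to get the nominal bin count (one of 4, 8, 16, 32, 64), and shorter levels get proportionally fewer bins. Histograms are assumed to use a fixed value range [lo, hi] shared by all songs, so that bins line up across songs. Counts are normalized to fractions. Both function names are hypothetical.

```python
def bins_per_level(level_lengths, base_bins, mode):
    """Choose a bin count for every DWT level.

    normal: every level gets base_bins.
    proportional (assumed reading): the longest level gets base_bins and
    shorter levels get proportionally fewer, so samples per bin stay
    roughly equal; every level gets at least 1 bin.
    """
    if mode == "normal":
        return [base_bins for _ in level_lengths]
    if mode == "proportional":
        longest = max(level_lengths)
        return [max(1, round(base_bins * n / longest)) for n in level_lengths]
    raise ValueError(f"unknown mode: {mode}")


def histogram(values, n_bins, lo, hi):
    """Normalized histogram over the fixed range [lo, hi].

    Values outside the range (and values equal to hi) are clamped into the
    first or last bin.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts] if total else counts
```

With levels of 8, 4, 2 and 2 samples and 16 bins, normal mode gives 16 bins everywhere, while proportional mode gives 16, 8, 4 and 4 bins (0.5 samples per bin on every level).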

Compare Histograms

Chi-square distance on each level
Sum the chi-square values over all levels and use the total as the dissimilarity measure (lower is better)
Sum the dissimilarity over all input songs

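$$D(I, J) = \sum_i \frac{(f_I(i) - f_J(i))^2}{f_I(i) + f_J(i)}$$

where f_I(i) and f_J(i) are the values of bin i in histograms I and J.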
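The chi-square distance D(I, J) and the two summations described here (first over DWT levels, then over the input songs) can be sketched as follows. This is an illustration, not the author's code. Function names are hypothetical, and bins that are empty in both histograms are skipped to avoid dividing by zero, an assumption the slides do not address.

```python
def chi_square(f_i, f_j):
    """Chi-square distance between two histograms with identical bins."""
    total = 0.0
    for a, b in zip(f_i, f_j):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return total


def song_dissimilarity(levels_i, levels_j):
    """Sum of per-level chi-square values; each song is a list of per-level histograms."""
    return sum(chi_square(h_i, h_j) for h_i, h_j in zip(levels_i, levels_j))


def rank_database(query_songs, database):
    """Rank database songs by dissimilarity summed over all query songs (lower is better).

    query_songs: list of songs, each a list of per-level histograms.
    database: dict mapping song name -> list of per-level histograms.
    """
    scores = {
        name: sum(song_dissimilarity(q, levels) for q in query_songs)
        for name, levels in database.items()
    }
    return sorted(scores, key=scores.get)
```

The first five names returned by `rank_database` would be the "most similar" songs for a query.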

Ground Truth Data Set

Songs by 4 different bands (10 songs each): Dave Matthews Band, U2, Blink 182, Green Day
Mono, sampled at 22 kHz, from a number of sources

Experiment

Input = 5 songs by a single band
Goal = pull out the 5 other songs by that band
10 random experiments per band (40 total)
Normal bins: 8, 16, 32, 64, 128, 192, 256, 320, 384, 448, 512
Proportional bins: 4, 8, 16, 32, 64
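A minimal sketch of how the random query/target splits could be generated: 5 of a band's 10 songs form the query, and the band's remaining 5 songs are the targets. The function name and the fixed seed are hypothetical. The slides do not say whether the query songs were excluded from the ranked database; that would be a natural choice but is an assumption.

```python
import random


def make_experiments(songs_by_band, n_query=5, runs_per_band=10, seed=0):
    """Build (band, query songs, target songs) triples for random experiments."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    experiments = []
    for band, songs in songs_by_band.items():
        for _ in range(runs_per_band):
            query = rng.sample(songs, n_query)
            targets = [s for s in songs if s not in query]
            experiments.append((band, query, targets))
    return experiments
```

With 4 bands of 10 songs each, this yields the 40 experiments described above, each with 5 query songs and 5 target songs.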

Scoring

By points: 5 pts. for a correct answer in first place, 4 pts. for a correct answer in second place, etc.; perfect = 5+4+3+2+1 = 15
Percentage correct at each place
Percentage of experiments with a correct answer at or before each place
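A sketch of the three scoring measures, assuming each experiment yields a ranked list of song names plus the set of the band's held-out songs. The cumulative measure is read as "at least one correct song within the top k". That reading is an assumption, though it is consistent with the non-decreasing values in the result tables. Function names are hypothetical.

```python
def points(ranked, targets, depth=5):
    """5 points for a correct song in 1st place, 4 for 2nd, ..., 1 for 5th (max 15)."""
    return sum(depth - k for k, song in enumerate(ranked[:depth]) if song in targets)


def correct_at_each_place(results, depth=5):
    """Fraction of experiments whose song at place k (1..depth) is correct.

    results: list of (ranked song list, set of target songs) pairs.
    """
    return [sum(r[k] in t for r, t in results) / len(results) for k in range(depth)]


def correct_within_place(results, depth=5):
    """Fraction of experiments with at least one correct song at place <= k."""
    return [
        sum(any(s in t for s in r[:k + 1]) for r, t in results) / len(results)
        for k in range(depth)
    ]
```

A perfect ranking scores 15 points, and a single correct song in second place scores 4.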

Results: Points

[Figure: Normal Histogramming: Increasing Bin Size. Score (0 to 4) vs. number of bins (8, 16, 32, 64, 128, 192, 256, 320, 384, 448, 512)]

Results: Points (Proportional)

[Figure: Proportional Histogramming: Increasing Bin Size. Score vs. number of bins (4, 8, 16, 32, 64)]

Best Score Results: 16 bins

Band            1st    2nd    3rd    4th    5th    Score
Dave Matthews   .6     .8     .4     .3     .2     8.2
Blink 182       .3     .1     .1     0      .1     2.3
U2              0      0      0      .1     0      .2
Green Day       .2     .3     .2     0      .5     3.3
Average         .275   .3     .175   .1     .2     3.5
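The Score column appears to be the point scheme applied to the per-place values p_k (the fraction of a band's experiments with a correct song at place k), with each place's fraction weighted by its points. Checking the Dave Matthews row:

```latex
\text{Score} = \sum_{k=1}^{5} (6 - k)\, p_k
  = 5(0.6) + 4(0.8) + 3(0.4) + 2(0.3) + 1(0.2)
  = 3.0 + 3.2 + 1.2 + 0.6 + 0.2 = 8.2
```

The same weighting reproduces the other rows (e.g. U2: 2(0.1) = 0.2), and the Average row is the mean of the four bands (e.g. (8.2 + 2.3 + 0.2 + 3.3) / 4 = 3.5).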

Different Bands

Band            Normal   Proportional
Dave Matthews   6.9      5.8
Blink 182       1.3      2
U2              .9       1.5
Green Day       2.1      2
Average         2.8      2.8

Percentage correct

                1st    2nd    3rd    4th    5th
Normal          .23    .17    .17    .17    .18
Proportional    .16    .21    .24    .15    .15

One last result

Normal: percentage of experiments with a correct answer at or before each place

Bins   1       2       3       4       5
8      0.3     0.375   0.425   0.5     0.625
16     0.275   0.4     0.475   0.5     0.55
32     0.25    0.325   0.4     0.45    0.525
64     0.25    0.3     0.4     0.475   0.55
128    0.225   0.325   0.425   0.5     0.6
192    0.2     0.3     0.4     0.5     0.675
256    0.225   0.35    0.475   0.525   0.65
320    0.2     0.35    0.475   0.55    0.625
384    0.225   0.325   0.5     0.55    0.625
448    0.2     0.325   0.425   0.55    0.625
512    0.2     0.35    0.45    0.55    0.625

Proportional: percentage of experiments with a correct answer at or before each place

Bins   1       2       3       4       5
4      0.2     0.375   0.625   0.65    0.75
8      0.225   0.375   0.525   0.55    0.625
16     0.175   0.4     0.55    0.55    0.575
32     0.125   0.375   0.45    0.5     0.575
64     0.075   0.3     0.45    0.5     0.575

Summary of Results

Overall, results are not amazing
Band choice has a large influence
Normal and proportional perform somewhat similarly
Proportional is more even over all bands
Bin size doesn't appear to be crucial
Up to a 75% chance that a song by the same band ends up in the top 5 (proportional, 4 bins)

Next Step…

Adaptive binning
Vary parameters: levels, song length, histogram comparison methods
Another image retrieval algorithm
Boosting for feature selection using a large feature set?
Other?
Larger and more diverse database

Conclusion

Even though the results are not fabulous, image processing techniques CAN be used for audio processing

Using bands for testing allows for ground truth

Audio files are BIG!