
Page 1: Retrieval Methods for  QBSH (Query By Singing/Humming)

Retrieval Methods for QBSH (Query By Singing/Humming)

J.-S. Roger Jang (張智星)


http://mirlab.org/jang

Multimedia Information Retrieval Lab

CSIE Dept, National Taiwan University

Page 2: Retrieval Methods for  QBSH (Query By Singing/Humming)

Retrieval Methods for QBSH

Goal: Given a query, find the most similar melody in the database.

Challenges:
  Robust pitch tracking for various acoustic inputs
    Input from a mobile device
    Input in a noisy karaoke room
  Comparison methods need to deal with...
    Key variations in users' input (due to gender difference)
    Tempo variations in users' input
    Reasonable response time, e.g., 5 seconds

Page 3: Retrieval Methods for  QBSH (Query By Singing/Humming)

Evaluation of QBSH Methods

Two criteria for evaluating QBSH methods:
  Efficiency: How fast is the system?
    Can it deal with a database of 100K songs?
  Effectiveness: How accurate is the system?
    Several performance indices for effectiveness

Page 4: Retrieval Methods for  QBSH (Query By Singing/Humming)

A Typical Query Result

Page 5: Retrieval Methods for  QBSH (Query By Singing/Humming)

Performance Indices of Effectiveness in QBSH Methods

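Queries always in the database:
  Top-10 recognition rate (RR) for n queries: a query scores 1 if its ground truth appears in the top 10, and 0 otherwise.
    RR = (1+0+0+1+1+…)/n
  Top-10 mean reciprocal rank (MRR) for n queries: a query scores 1/rank of its ground truth, with 1/inf = 0 when the ground truth falls outside the top 10.
    MRR = (1/3+1/inf+1/4+1/2+1/5+…)/n

Queries may not be in the database:
  True positive and true negative rates to deal with the out-of-vocabulary (OOV) problem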

Page 6: Retrieval Methods for  QBSH (Query By Singing/Humming)

Examples of RR and MRR

Specs:
  No. of queries: 10
  Database size: 100
  No OOV: the GT (ground truth) of every query in the set is within the DB

Test result:
  GT ranking: [1 3 8 4 9 21 2 5 8 2]
  Top-5 RR = (1+1+0+1+0+0+1+1+0+1)/10 = 6/10 = 60%
  Top-5 MRR = (1/1+1/3+1/∞+1/4+1/∞+1/∞+1/2+1/5+1/∞+1/2)/10 = 0.2783

Quiz!
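The RR and MRR arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `top_k_rr_mrr` is made up for illustration, and it assumes ranks are 1-based and that a rank beyond k contributes 0 to both indices.

```python
def top_k_rr_mrr(gt_ranks, k):
    """Return (recognition rate, mean reciprocal rank) for a top-k list.

    gt_ranks[i] is the 1-based rank of the ground truth for query i.
    A rank beyond k is a miss: it adds 0 to RR and 1/inf = 0 to MRR.
    """
    n = len(gt_ranks)
    hits = [1 if r <= k else 0 for r in gt_ranks]
    reciprocal = [1.0 / r if r <= k else 0.0 for r in gt_ranks]
    return sum(hits) / n, sum(reciprocal) / n


gt_ranks = [1, 3, 8, 4, 9, 21, 2, 5, 8, 2]  # GT ranking from the example
rr, mrr = top_k_rr_mrr(gt_ranks, k=5)
print(f"Top-5 RR = {rr:.2f}, Top-5 MRR = {mrr:.4f}")
# prints: Top-5 RR = 0.60, Top-5 MRR = 0.2783
```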

Page 7: Retrieval Methods for  QBSH (Query By Singing/Humming)

Types of QBSH Approaches

Categories of approaches to QBSH:
  Histogram/statistics-based
  Note vs. note: edit distance
  Frame vs. note: HMM
  Frame vs. frame: linear scaling, DTW, recursive alignment

Page 8: Retrieval Methods for  QBSH (Query By Singing/Humming)

Linear Scaling (LS)

Concept: Scale the query linearly to match the candidates

Assumption: Uniform tempo variation

Rest handling:
  Cut leading and trailing zeros (silence)
  All the other zeros (rests) are replaced with the previous non-zero pitch

Quiz! Example: Row, Row, Row Your Boat
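The two rest-handling rules can be sketched in Python as follows. The function name `handle_rests` and the sample pitch values are made up for illustration; the sketch assumes a pitch vector is a list of semitone values in which 0 marks silence or a rest.

```python
def handle_rests(pitch):
    """Trim leading/trailing zeros, then fill interior zeros with the previous pitch."""
    # Indices of voiced (non-zero) frames.
    voiced = [i for i, p in enumerate(pitch) if p != 0]
    if not voiced:
        return []  # the whole vector is silence
    trimmed = pitch[voiced[0]:voiced[-1] + 1]
    filled = []
    for p in trimmed:
        # trimmed starts with a non-zero value, so filled[-1] exists whenever p == 0
        filled.append(p if p != 0 else filled[-1])
    return filled


example = [0, 0, 60, 60, 0, 62, 0, 0, 64, 0]  # made-up pitch values
print(handle_rests(example))  # [60, 60, 60, 62, 62, 62, 64]
```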

Page 9: Retrieval Methods for  QBSH (Query By Singing/Humming)

Linear Scaling

Scale the query pitch linearly to match the candidates

[Figure: linear scaling of a pitch vector. Curves: original input pitch; stretched by 1.25; stretched by 1.5; compressed by 0.75; compressed by 0.5; target pitch in database. Annotations: best match; original pitch, most likely from a MIDI file.]

Page 10: Retrieval Methods for  QBSH (Query By Singing/Humming)

Strength and Weakness of LS

Strength:
  One-shot for dealing with key transposition
  Efficient and effective
  Indexing methods available

Weakness:
  Cannot deal with non-uniform tempo variations

[Figure: typical mapping path]

Quiz!

Page 11: Retrieval Methods for  QBSH (Query By Singing/Humming)

Compress or Expand a Pitch Vector

Given a pitch vector y of length m, how do we compress or expand it to length n?

  x2 = interp1(1:m, y, linspace(1, m, n));

Examples (figures): m=7, n=13 and m=7, n=9

Quiz!
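For readers without MATLAB, the same resampling can be sketched in plain Python. This is an illustrative equivalent of the interp1 call with linear interpolation, not code from the slides; the name `resample_linear` and the sample vector are assumptions.

```python
def resample_linear(y, n):
    """Resample y (length m >= 2) to n >= 2 points by linear interpolation."""
    m = len(y)
    if m < 2 or n < 2:
        raise ValueError("need m >= 2 and n >= 2")
    out = []
    for k in range(n):
        # Position of output sample k on the 0-based input axis [0, m-1].
        pos = k * (m - 1) / (n - 1)
        lo = min(int(pos), m - 2)  # left neighbour, clamped so lo + 1 is valid
        frac = pos - lo
        out.append(y[lo] * (1 - frac) + y[lo + 1] * frac)
    return out


y = [60, 62, 64, 65, 67, 69, 71]   # made-up pitch vector, m = 7
expanded = resample_linear(y, 13)  # m = 7 -> n = 13
shorter = resample_linear(y, 5)    # m = 7 -> n = 5
print(len(expanded), len(shorter))  # 13 5
```

The first and last samples are always preserved, so only the interior of the contour is stretched or squeezed.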

Page 12: Retrieval Methods for  QBSH (Query By Singing/Humming)

Distance Function for LS

Commonly used distance function for LS: normalized Lp-norm

$$\hat{L}_p(\mathbf{x}) = \left( \frac{|x_1|^p + |x_2|^p + \cdots + |x_n|^p}{n} \right)^{1/p}$$

Characteristics:
  Usually p=1 or 2 for LS
  Normalization to get rid of length variations

Quiz!
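A minimal Python sketch of the normalized Lp-norm follows. It assumes the vector x is the element-wise difference between two equal-length pitch vectors; the helper names `normalized_lp` and `lp_distance` are made up for illustration.

```python
def normalized_lp(x, p=1):
    """Normalized Lp-norm: ((|x_1|^p + ... + |x_n|^p) / n) ** (1/p)."""
    n = len(x)
    return (sum(abs(v) ** p for v in x) / n) ** (1.0 / p)


def lp_distance(a, b, p=1):
    """Distance between two equal-length pitch vectors (assumed usage)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return normalized_lp([u - v for u, v in zip(a, b)], p)


# The same per-frame error gives the same value regardless of length,
# which is the point of dividing by n.
print(normalized_lp([2, -2, 2, -2], p=2))  # 2.0
print(normalized_lp([2, -2] * 4, p=2))     # 2.0
```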

Page 13: Retrieval Methods for  QBSH (Query By Singing/Humming)

Key Transposition in LS

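How to find the best transposed query, i.e., the one with the smallest distance from the database item?

Best transposition, for query q and database item r:

$$\hat{s} = \arg\min_s L_p(\mathbf{q} - s - \mathbf{r})$$

In practice…

$$p = 1: \quad \hat{s} = \mathrm{median}(\mathbf{q} - \mathbf{r}) \approx \mathrm{median}(\mathbf{q}) - \mathrm{median}(\mathbf{r})$$

$$p = 2: \quad \hat{s} = \mathrm{mean}(\mathbf{q} - \mathbf{r}) = \mathrm{mean}(\mathbf{q}) - \mathrm{mean}(\mathbf{r})$$

[Figure: query, database item, and transposed query]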
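The key-transposition step can be sketched in Python as below. The sketch assumes that q and r already have the same length and that the transposed query is q minus the shift s, so that the distance is measured on q - s - r. The names `best_shift` and `transpose` and the sample values are made up for illustration.

```python
from statistics import mean, median


def best_shift(q, r, p=1):
    """Shift s minimizing the Lp-norm of (q - s - r): median for p=1, mean for p=2."""
    diff = [a - b for a, b in zip(q, r)]
    if p == 1:
        return median(diff)
    if p == 2:
        return mean(diff)
    raise ValueError("closed form shown only for p = 1 or p = 2")


def transpose(q, r, p=1):
    """Return the query moved to the key that best matches r."""
    s = best_shift(q, r, p)
    return [a - s for a in q]


q = [48, 50, 52, 50]   # made-up query, one octave below the database item
r = [60, 62, 64, 62]   # made-up database item
print(best_shift(q, r, p=1), best_shift(q, r, p=2))
print(transpose(q, r, p=1))
```

Because the shift has a closed form, no search over candidate keys is needed, which is what makes key handling a one-shot operation in LS.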

Page 14: Retrieval Methods for  QBSH (Query By Singing/Humming)

Example of Linear Scaling via L1 Norm

linScaling01.m

[Figure with three panels.
  Panel 1: Database and input pitch vectors (semitones); curves: database pitch, input pitch.
  Panel 2: Database and scaled pitch vectors (semitones); curves: database pitch, scaled pitch.
  Panel 3: Normalized distance vs. scaling factor (0.5 to 1.5).]
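The whole LS procedure with the L1 norm can be sketched in Python as below. This is not the linScaling01.m script; it is a minimal illustration under stated assumptions: the scaled query is compared with the beginning of the database item, the key shift is the median of the difference, and the candidate scaling factors are supplied by the caller. The names `resample_linear` and `linear_scaling` and the toy data are made up.

```python
from statistics import median


def resample_linear(y, n):
    """Linearly resample y (length >= 2) to n >= 2 points."""
    m = len(y)
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)   # position on the input axis
        lo = min(int(pos), m - 2)     # clamp so that lo + 1 stays valid
        frac = pos - lo
        out.append(y[lo] * (1 - frac) + y[lo + 1] * frac)
    return out


def linear_scaling(query, db, factors):
    """Return (best_distance, best_factor) over the given scaling factors."""
    best_dist, best_factor = float("inf"), None
    for f in factors:
        n = round(f * len(query))
        if n < 2 or n > len(db):
            continue                  # scaled query would not fit
        scaled = resample_linear(query, n)
        diff = [a - b for a, b in zip(scaled, db[:n])]
        shift = median(diff)          # L1-optimal key transposition
        dist = sum(abs(d - shift) for d in diff) / n  # normalized L1
        if dist < best_dist:
            best_dist, best_factor = dist, f
    return best_dist, best_factor


# Toy data (made up): the query is 5 semitones lower and sung faster.
db = [60] * 3 + [62] * 3 + [64] * 3 + [65] * 3
query = [55, 55, 57, 57, 59, 59, 60, 60]
dist, factor = linear_scaling(query, db, [0.5, 0.75, 1.0, 1.25, 1.5])
print(f"best factor = {factor}, normalized L1 distance = {dist:.3f}")
```

A real system would repeat this against every database item and rank the items by their best distance.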

Page 15: Retrieval Methods for  QBSH (Query By Singing/Humming)

Linear Scaling via L1 and L2 Norm

linScaling02.m

[Figure with three panels.
  Panel 1: Database and input pitch vectors (semitones); curves: database pitch, input pitch.
  Panel 2: Database and scaled pitch vectors (semitones); curves: database pitch, scaled pitch via L1 norm, scaled pitch via L2 norm.
  Panel 3: Normalized distances via L1 & L2 norm vs. scaling factor (0.5 to 1.5); curves: L1 norm, L2 norm.]

Page 16: Retrieval Methods for  QBSH (Query By Singing/Humming)

DTW (Dynamic Time Warping)

About DTW:
  DTW introduction
  DTW for QBSH
  #1 method for task 2 in QBSH/MIREX 2006
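For orientation, a textbook form of DTW is sketched below in Python. The slides do not spell out the local path constraints or the key handling used for QBSH, so the three-way step pattern, the absolute-difference cost, and the name `dtw_distance` are assumptions for illustration only.

```python
def dtw_distance(query, ref):
    """Accumulated DTW cost between two pitch vectors (textbook step pattern)."""
    inf = float("inf")
    m, n = len(query), len(ref)
    # d[i][j] = best accumulated cost of aligning query[:i] with ref[:j]
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(query[i - 1] - ref[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[m][n]


# Made-up example: same notes, but held for different numbers of frames.
print(dtw_distance([60, 60, 62, 64, 64, 64], [60, 62, 62, 64]))  # 0.0
```

Unlike LS, the warping path may bend, so non-uniform tempo variations can be absorbed, at the price of a cost that grows with the product of the two lengths.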

Page 17: Retrieval Methods for  QBSH (Query By Singing/Humming)

RA (Recursive Alignment)

Characteristics:
  Combine characteristics of LS & DTW
  #1 method for task 1 in QBSH/MIREX 2006

[Figure: a typical mapping path]

Page 18: Retrieval Methods for  QBSH (Query By Singing/Humming)

Modified Edit Distance

Note segmentation

Modified edit distance

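$$
d_{i,j} = \min
\begin{cases}
d_{i-1,j} + w(a_i, \varnothing) & \text{(deletion)} \\
d_{i,j-1} + w(\varnothing, b_j) & \text{(insertion)} \\
d_{i-1,j-1} + w(a_i, b_j) & \text{(replacement)} \\
\{ d_{i-k,j-1} + w(a_{i-k+1}, \ldots, a_i, b_j),\ 2 \le k \le i \} & \text{(consolidation)} \\
\{ d_{i-1,j-k} + w(a_i, b_{j-k+1}, \ldots, b_j),\ 2 \le k \le j \} & \text{(fragmentation)}
\end{cases}
$$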
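The recurrence with deletion, insertion, replacement, consolidation, and fragmentation can be sketched in Python as below. The slides do not give the weight function w, so the weights here are purely illustrative assumptions: a note is a (pitch, duration) pair, deleting or inserting a note costs its duration, and matching costs the pitch difference plus the duration difference. The cap `max_k` on the group size is also an assumption added to bound the running time.

```python
def modified_edit_distance(a, b, max_k=3):
    """Edit distance over note sequences a and b, each a list of (pitch, duration)."""
    def w_gap(x):                 # deletion or insertion of one note
        return x[1]

    def w_match(group, y):        # one or more notes in `group` against note y
        pitch_cost = sum(abs(g[0] - y[0]) for g in group)
        dur_cost = abs(sum(g[1] for g in group) - y[1])
        return pitch_cost + dur_cost

    inf = float("inf")
    m, n = len(a), len(b)
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i >= 1:            # deletion of a_i
                best = min(best, d[i - 1][j] + w_gap(a[i - 1]))
            if j >= 1:            # insertion of b_j
                best = min(best, d[i][j - 1] + w_gap(b[j - 1]))
            if i >= 1 and j >= 1:
                # replacement of a_i by b_j
                best = min(best, d[i - 1][j - 1] + w_match([a[i - 1]], b[j - 1]))
                for k in range(2, min(i, max_k) + 1):   # consolidation
                    best = min(best, d[i - k][j - 1] + w_match(a[i - k:i], b[j - 1]))
                for k in range(2, min(j, max_k) + 1):   # fragmentation
                    best = min(best, d[i - 1][j - k] + w_match(b[j - k:j], a[i - 1]))
            d[i][j] = best
    return d[m][n]


# Made-up example: the singer splits one long note into two short ones.
sung = [(60, 1), (60, 1), (62, 2)]
score = [(60, 2), (62, 2)]
print(modified_edit_distance(sung, score))  # 0
```

In this toy case consolidation absorbs the split note at no cost, whereas a plain edit distance would have to pay for a deletion.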