Similarity Matrix Processing for Music Structure Analysis

Yu Shiu, Hong Jeng

C.-C. Jay Kuo

ACM Multimedia 2006

System Framework

Pitch Class Profile (PCP)

• The PCP vector is a 12-dimensional vector that gives the relative intensities of the 12 pitch classes {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}

• Normalized to a unit vector
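
As a rough illustration of the two bullets above, the sketch below folds a list of spectral peaks into 12 pitch-class bins and normalizes the result to unit length. The input format (`(frequency_hz, magnitude)` pairs), the function name `pcp_vector`, and the A4 = 440 Hz tuning reference are assumptions made for this example; the slides do not say how the spectrum is mapped to pitch classes.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pcp_vector(peaks, ref_a4=440.0):
    """Fold (frequency_hz, magnitude) peaks into a unit-norm 12-bin PCP vector.

    Hypothetical helper: the peak-list input and the 440 Hz reference are
    assumptions, not details taken from the slides.
    """
    bins = [0.0] * 12
    for freq, mag in peaks:
        if freq <= 0:
            continue
        # MIDI note number: A4 is note 69, and note % 12 == 0 is pitch class C.
        midi = round(69 + 12 * math.log2(freq / ref_a4))
        bins[midi % 12] += mag
    norm = math.sqrt(sum(b * b for b in bins))
    # Normalize to a unit vector (leave an all-zero vector unchanged).
    return [b / norm for b in bins] if norm > 0 else bins


# A C major triad (C4, E4, G4) puts equal energy in the C, E and G bins.
triad = pcp_vector([(261.63, 1.0), (329.63, 1.0), (392.00, 1.0)])
```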

Measure-based Similarity Matrix

• Previous similarity matrix
  – Pre-defined window size
  – Results in a large similarity matrix, which makes further processing more expensive

• In this paper
  – Use the measure as the element of the similarity matrix

• PCP vector generation
  – Choose a window size equal to the duration of one half beat
  – Detect the onset signal
    • Compute the change in spectral content between adjacent sliding windows, each 20 ms long with 50% overlap

  – The autocorrelation function (ACF) of the onset signal is calculated to determine the beat period

  – Example:
    • 100 BPM → a half beat lasts 300 ms
    • This is longer than the window size commonly used in previous work
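
The beat-period step can be sketched as follows, assuming the onset signal is already available as one value per 10 ms hop (20 ms windows with 50% overlap). The function names, the lag search range, and the choice of the single largest ACF peak are illustrative assumptions; the slides only state that the ACF is used.

```python
def beat_period_from_onsets(onset, hop_s=0.010, min_lag_s=0.25, max_lag_s=1.5):
    """Return the lag (in seconds) with the largest autocorrelation value.

    Assumptions: `onset` holds one onset-strength value per hop, and the
    beat period lies between min_lag_s and max_lag_s.
    """
    n = len(onset)
    mean = sum(onset) / n
    x = [v - mean for v in onset]  # remove the DC offset before correlating
    lo = max(1, int(round(min_lag_s / hop_s)))
    hi = min(n - 1, int(round(max_lag_s / hop_s)))
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, hi + 1):
        acf = sum(x[i] * x[i + lag] for i in range(n - lag))
        if acf > best_val:
            best_lag, best_val = lag, acf
    return best_lag * hop_s


def half_beat_seconds(bpm):
    """Window length used for one PCP vector: half of one beat."""
    return 60.0 / bpm / 2.0


# Synthetic onset signal: one impulse every 0.6 s, i.e. 100 BPM.
onset = [1.0 if i % 60 == 0 else 0.0 for i in range(600)]
period = beat_period_from_onsets(onset)  # close to 0.6 s
window = half_beat_seconds(100)          # close to 0.3 s, matching the example
```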

• Grouping N successive PCP vectors

• Since PCP vectors are unit vectors, 0 <= sij <= 1

• dynamic time warping (DTW) can be used to enhance the sij value
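
The formula for sij did not survive in this transcript. The sketch below assumes one plausible definition: the mean inner product between the N corresponding PCP vectors of measures i and j. Because PCP vectors are non-negative unit vectors, each inner product (and hence the mean) lies in [0, 1], which agrees with the bound stated above. The function names are hypothetical.

```python
def measure_similarity(measure_i, measure_j):
    """Mean inner product of corresponding PCP vectors (assumed definition)."""
    n = len(measure_i)
    if n == 0 or n != len(measure_j):
        raise ValueError("measures must hold the same, non-zero number of PCP vectors")
    total = sum(
        sum(a * b for a, b in zip(u, v))
        for u, v in zip(measure_i, measure_j)
    )
    return total / n


def measure_similarity_matrix(measures):
    """Build the square matrix of pairwise measure similarities."""
    m = len(measures)
    return [
        [measure_similarity(measures[i], measures[j]) for j in range(m)]
        for i in range(m)
    ]
```

Each entry of `measures` is a list of N unit-norm 12-dimensional vectors, one per half-beat window.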

Dynamic Time Warping
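
The slide content for this section is not in the transcript, so the following is a generic DTW sketch rather than the authors' formulation. It aligns two measures (sequences of unit-norm PCP vectors) using a local cost of 1 minus the inner product, and converts the accumulated cost into a similarity by dividing by the path length. The cost function, the allowed steps, and the normalization are all assumptions.

```python
def dtw_similarity(seq_a, seq_b):
    """Similarity in [0, 1] between two PCP sequences after DTW alignment."""

    def cost(u, v):
        return 1.0 - sum(a * b for a, b in zip(u, v))

    na, nb = len(seq_a), len(seq_b)
    inf = float("inf")
    # acc[i][j] = (accumulated cost, path length) of the best path ending at (i, j)
    acc = [[(inf, 0)] * nb for _ in range(na)]
    for i in range(na):
        for j in range(nb):
            c = cost(seq_a[i], seq_b[j])
            if i == 0 and j == 0:
                acc[i][j] = (c, 1)
                continue
            candidates = []
            if i > 0:
                candidates.append(acc[i - 1][j])      # vertical step
            if j > 0:
                candidates.append(acc[i][j - 1])      # horizontal step
            if i > 0 and j > 0:
                candidates.append(acc[i - 1][j - 1])  # diagonal step
            prev_cost, prev_len = min(candidates)     # lowest cost, then shortest
            acc[i][j] = (prev_cost + c, prev_len + 1)
    total, length = acc[-1][-1]
    return 1.0 - total / length
```

For two measures whose chord change is offset by one half beat, a frame-by-frame comparison is penalized at the misaligned frame, while the warped alignment can absorb the shift. This is one way DTW could raise the sij value.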

Measure-based Similarity Matrix

• After the simplification, a 3-minute song with a tempo of 100BPM can form a 75 × 75 similarity matrix
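
The 75 × 75 figure follows from simple arithmetic if the song is assumed to be in 4/4 time (four beats per measure); the slides do not state the meter, so `beats_per_measure=4` below is an assumption.

```python
def num_measures(duration_min, bpm, beats_per_measure=4):
    """Number of measures in a song, assuming a constant tempo and meter."""
    return int(duration_min * bpm / beats_per_measure)


def half_beats_per_measure(beats_per_measure=4):
    """Number N of half-beat PCP vectors grouped into one measure."""
    return 2 * beats_per_measure


side = num_measures(3, 100)        # 3 min x 100 beats/min / 4 beats = 75 measures
n_pcp = half_beats_per_measure()   # 8 PCP vectors per measure under the same assumption
```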

• The measure-based similarity matrix (MSM) reveals chord similarity more than melodic similarity

• Johnny Cash’s Hurt repeatedly uses the chord succession {Am, Am, C, D} in the 1st and 3rd sections and {G, A, F, C} in the 2nd and 4th sections.

• The Beatles’ Yesterday has no short-period chord succession. Its musical form is P = {I V V C V C V O}

Two MSM Examples

Detection of Local Similarity

• Using a 2D moving window

• Move the 2D moving window along the diagonal of the MSM
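
The transcript does not say what is computed inside each window, so the sketch below only shows the sliding mechanics: a w × w block is cut out at each position along the main diagonal. The statistic `mean_off_diagonal` is a hypothetical placeholder for whatever local measure the authors apply.

```python
def diagonal_windows(msm, w):
    """Yield (start, block) for every w x w block along the main diagonal."""
    m = len(msm)
    for start in range(m - w + 1):
        yield start, [row[start:start + w] for row in msm[start:start + w]]


def mean_off_diagonal(block):
    """Hypothetical local statistic: mean similarity excluding self-similarity."""
    w = len(block)
    vals = [block[i][j] for i in range(w) for j in range(w) if i != j]
    return sum(vals) / len(vals)


msm = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.5, 0.0],
    [0.0, 0.5, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
local = [(start, mean_off_diagonal(block)) for start, block in diagonal_windows(msm, 2)]
```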

Detection of Long Range Similarity

• The Viterbi algorithm is used to find segments with consecutive large similarity values along the 45-degree direction

• The output of the second module, which provides the chord succession similarity, can be exploited to enhance long-range similarity detection.

• Interpret the x-axis as “time” and the y-axis as “state”

• Use “scores” instead of “probabilities”

• The score of a path is defined as the product of the similarity values of all states on the path and the scores of all state transitions

• PT0 > PT1 guarantees a preference for the 45-degree direction.
  – The larger the ratio PT0/PT1, the more strongly the path is favored to proceed along the 45-degree direction.
  – In our experiment, the ratio PT0/PT1 is set to 1.5
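
A minimal sketch of a score-based Viterbi pass, under several assumptions that the transcript does not confirm: PT0 is taken to be the score of the 45-degree transition (state j-1 at time t-1 to state j at time t), PT1 the score of any other allowed transition, and the allowed non-diagonal transitions are "stay at j" and "come from j-2". Rows of `msm` are states (y-axis) and columns are time steps (x-axis). The path score is the product of similarity values and transition scores, and backtracking starts from the state with the highest score at the final time step.

```python
def viterbi_diagonal(msm, pt0=1.5, pt1=1.0):
    """Return (path, Q): best state per time step and the score table Q[t][j].

    Assumed transition set: j-1 -> j scores pt0; j -> j and j-2 -> j score pt1.
    """
    m, length = len(msm), len(msm[0])
    Q = [[0.0] * m for _ in range(length)]
    back = [[-1] * m for _ in range(length)]
    for j in range(m):
        Q[0][j] = msm[j][0]
    for t in range(1, length):
        for j in range(m):
            cands = [(j, pt1)]                # stay in the same state
            if j >= 1:
                cands.append((j - 1, pt0))    # 45-degree step (preferred)
            if j >= 2:
                cands.append((j - 2, pt1))    # steeper step
            prev, best = max(
                ((k, Q[t - 1][k] * p) for k, p in cands),
                key=lambda kp: kp[1],
            )
            Q[t][j] = best * msm[j][t]
            back[t][j] = prev
    # Backtrack from the highest-scoring state at the last time step.
    j = max(range(m), key=lambda k: Q[length - 1][k])
    path = [j]
    for t in range(length - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path, Q
```

On a real MSM the main diagonal (self-similarity) would have to be excluded or masked first, otherwise it dominates every path. For long songs, summing log-scores instead of multiplying raw scores is a common way to avoid numerical overflow or underflow.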

• Pruning with chord succession information
  – Sections with repetitive chord successions of a certain period should be similar to sections of the same period
  – Each measure is tagged with a period value p

Post-processing

• We begin with the state j that gives the highest Q(L, j) at time L and perform a backtracking process.

• Segments shorter than φ measures are removed
  – In our implementation, φ = 8.

• Segments whose mean similarity value is less than a threshold τ are removed
  – τ = mean + standard deviation (over all sij)
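
The two removal rules can be sketched as a simple filter. A segment is represented here as a list of `(row, col)` cells in the MSM, which is an assumed data layout, and the population standard deviation is used for τ because the slides do not say which variant is meant.

```python
import statistics


def filter_segments(segments, msm, phi=8):
    """Drop segments shorter than phi or with mean similarity below tau.

    tau = mean + standard deviation over all entries of the MSM
    (population standard deviation is an assumption).
    """
    all_vals = [v for row in msm for v in row]
    tau = statistics.mean(all_vals) + statistics.pstdev(all_vals)
    kept = []
    for seg in segments:
        if len(seg) < phi:
            continue  # too short
        mean_sim = sum(msm[r][c] for r, c in seg) / len(seg)
        if mean_sim < tau:
            continue  # not similar enough
        kept.append(seg)
    return kept, tau
```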

• A segment should be divided
  – if its two corresponding sections in the song overlap with each other
  – if there is a significant difference between the similarity values before and after a certain point in the segment.

• If sections conflict, the one with the higher similarity value has priority in keeping its boundaries

• For songs in verse-chorus form, similarity values are clustered into two classes
  – Sections with high similarity values are taken to be the chorus
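
The clustering method is not named in the transcript. As one possible reading, the sketch below runs a one-dimensional two-means clustering on the per-segment similarity values and marks the higher cluster as chorus. The function name and the choice of k-means are assumptions.

```python
def two_class_split(values, iters=50):
    """Return a list of booleans: True where the value falls in the high cluster.

    One-dimensional 2-means, initialized at the minimum and maximum value.
    """
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:
            break  # all values identical: nothing to split
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return [abs(v - lo) > abs(v - hi) for v in values]


# Mean similarity of five detected segments; True marks a chorus candidate.
is_chorus = two_class_split([0.50, 0.55, 0.90, 0.92, 0.52])
```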

Experiment

• A collection of 120 pop, country and rock songs from the 1960s onward.

• 100 of them are of the verse-chorus form and 20 are of the AAA or other form

• Mono audio sampled at 22,050 Hz with 16 bits per sample.

Experimental Results

• The pattern extraction of a song is counted as correct if all patterns in the song are extracted, without distinguishing between verse and chorus

• The correct detection rate is 112/120 = 93.33%.
