using the shape of music to compute the similarity between symbolic musical pieces
Post on 08-Jul-2015
Using the Shape of Music to
Compute the Similarity between
Symbolic Musical Pieces
Julián Urbano, Juan Lloréns, Jorge Morato and Sonia Sánchez-Cuadrado
http://julian-urbano.info
Twitter: @julian_urbano
CMMR 2010 · Málaga, Spain · June 24th
Outline
• Introduction
• Melodic Similarity Requirements
• General Solutions to the Requirements
• A Model Based on Interpolation
• Implementation and Experimental Results
• Conclusions and Future Work
2
Symbolic Melodic Similarity
• Given a musical piece (i.e. query), retrieve others deemed melodically similar to it (i.e. results)
• Traditional approaches? [Typke et al., 2005a]
▫ Geometry [Ukkonen et al., 2003][Typke et al., 2004]
▫ n-grams [Uitdenbogerd et al., 1999][Doraisamy et al., 2003]
▫ Alignment [Hanna et al., 2007]
• What do we do?
▫ Use a local alignment algorithm
▫ whose symbols are n-grams
▫ according to a geometric substitution function
3
General Requirements
• Any Music Information Retrieval system should meet several requirements [Selfridge-Field, 1998][Byrd et al., 2002][Mongeau et al., 1990]
• Particularly focused on non-experts
• We just put together and thoroughly describe traditional, well-known requirements, mostly related to transposition invariance
▫ Vertical requirements (i.e. pitch)
▫ Horizontal requirements (i.e. time)
4
Vertical Requirements
• Query [simplified riff from Layla by Derek and the Dominos]
• Octave Equivalence
5
Vertical Requirements (II)
• Query
• Degree Equality
6
Vertical Requirements (III)
• Query
• Note Equality
7
Vertical Requirements (IV)
• Query
• Pitch Variation
8
Vertical Requirements (V)
• Query
• Harmonic Similarity
9
Vertical Requirements (and VI)
• Voice Separation
10
Horizontal Requirements
• Query [simplified beginning from op.81 no.10 by S. Heller]
• Time Signature Equivalence
11
Horizontal Requirements (II)
• Query
• Tempo Equivalence
12
Horizontal Requirements (III)
• Query
• Duration Equality
13
Horizontal Requirements (and IV)
• Query
• Duration Variation
14
General Vertical Solutions
• Octave Equivalence
▫ Disregard octave number but consider relative changes (G5 to C6 is not the same as G5 to C5)
• Degree Equality
▫ Use the degrees within the tonality
• Note Equality
▫ Use actual pitch values
• Some approaches use both, but the key signature is not always available in Standard MIDI Files (SMF) [Hanna et al., 2007]
• The accepted solution is to consider relative pitch differences between successive notes
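A minimal sketch of the relative-pitch idea, assuming notes are given as MIDI pitch numbers (with C4 = 60, so G5 = 79, C5 = 72 and C6 = 84). The function name and the sample melody are illustrative only.

```python
def pitch_intervals(pitches):
    """Return the differences in semitones between successive pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


melody = [74, 81, 72, 76]              # hypothetical MIDI pitch numbers
transposed = [p + 5 for p in melody]   # same melody, five semitones higher

# Transposition does not change the interval sequence.
print(pitch_intervals(melody) == pitch_intervals(transposed))

# Direction matters: G5 to C6 rises, G5 to C5 falls.
print(pitch_intervals([79, 84]), pitch_intervals([79, 72]))
```

Because only differences are kept, any transposition yields the same sequence, while the sign of each interval still distinguishes an upward from a downward leap.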
15
General Horizontal Solutions
• Time Signature Equivalence
▫ Just ignore it
• Tempo Equivalence
▫ Use actual note durations
• Duration Equality
▫ Use score durations
• Again, this information is not mandatory in SMF, and users with different expertise would prefer different approaches
• It is usual to just ignore time altogether, or use the duration ratio between successive notes
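The duration-ratio option can be sketched as follows, assuming durations are given in integer ticks. The function name and the sample values are illustrative.

```python
from fractions import Fraction


def duration_ratios(durations):
    """Return the ratio of each duration to the one before it."""
    return [Fraction(b, a) for a, b in zip(durations, durations[1:])]


ticks = [240, 480, 240, 720]           # hypothetical note durations in ticks
half_speed = [2 * d for d in ticks]    # same rhythm played twice as slowly

# Scaling every duration by the same factor leaves the ratios unchanged.
print(duration_ratios(ticks) == duration_ratios(half_speed))
```

Exact fractions are used instead of floats so that equal rhythms compare equal without rounding issues.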
16
A Model based on Interpolation
• Consider the time-pitch plane
• Arrange the notes as points in the plane, according to their pitch and duration
• With different voices, get new pitch-dimensions sharing the same time dimension
• Define the curve Ci(t) as the one interpolating the notes of the i-th voice (pitch-dimension)
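A small sketch of how one voice could be arranged in the time-pitch plane. It assumes each note is placed at its onset time, taken as the sum of the durations before it; the slides do not specify the exact placement, and the names are illustrative.

```python
from itertools import accumulate


def to_points(pitches, durations):
    """Place each note at (onset time, pitch) for a single voice."""
    # The first note starts at 0; each later onset is the running total
    # of the durations that precede it.
    onsets = [0] + list(accumulate(durations))[:-1]
    return list(zip(onsets, pitches))


# Hypothetical voice: MIDI pitches and durations in ticks.
points = to_points([74, 81, 72, 76], [240, 480, 240, 720])
print(points)
```

With several voices, the same function would be applied per voice, giving one list of points (one pitch dimension) per voice over a shared time axis.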
17
A Model based on Interpolation (II)
18
A Model based on Interpolation (and III)
• The similarity of two pieces is thought of as their similarity in shape
• Most requirements are directly met
▫ Neither pitch nor time invariants change the shape of the curve
▫ Pitch and Duration Variations can be measured analytically
19
Measure of Similarity
• Consider the curves as polynomials
▫ $C(t) = a_n t^n + a_{n-1} t^{n-1} + \dots + a_1 t + a_0$
• The first derivative measures how much the shape is changing at any time
• The shape dissimilarity between two curves (songs) can be measured as the area between their first derivatives
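A minimal numerical sketch of this measure, assuming "area between the first derivatives" means the integral of the absolute difference of the derivatives over a common interval. Polynomials are stored as coefficient lists (index k multiplies t**k); the function names, interval and sample curves are illustrative.

```python
def derivative(coeffs):
    """Coefficients of the derivative; coeffs[k] multiplies t**k."""
    return [k * a for k, a in enumerate(coeffs)][1:]


def evaluate(coeffs, t):
    """Evaluate the polynomial at t with Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * t + a
    return result


def shape_difference(c, d, t0=0.0, t1=1.0, steps=10_000):
    """Approximate the area between C'(t) and D'(t) on [t0, t1] (midpoint rule)."""
    dc, dd = derivative(c), derivative(d)
    h = (t1 - t0) / steps
    total = 0.0
    for i in range(steps):
        t = t0 + (i + 0.5) * h
        total += abs(evaluate(dc, t) - evaluate(dd, t))
    return total * h


curve = [60, 4, -3]        # hypothetical: 60 + 4t - 3t^2
transposed = [65, 4, -3]   # same shape, five semitones higher
other = [60, 0, 1]         # hypothetical: 60 + t^2

print(shape_difference(curve, transposed))   # the constant term vanishes
print(shape_difference(curve, other))
```

Since the derivative drops the constant term, a transposed copy of a curve is at distance zero from the original, which is how this measure meets the transposition requirement.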
20
It is Metric
• Non-negativity
▫ diff(C, D) ≥ 0
• Identity of indiscernibles
▫ diff(C, D) = 0 ⇔ C = D
• Symmetry
▫ diff(C, D) = diff(D, C)
• Triangle inequality
▫ diff(C, E) ≤ diff(C, D) + diff(D, E)
• So we could use vantage objects [Bozkaya et al., 1999]
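A short worked argument for these properties, assuming the measure is written as the integral of the absolute difference of the first derivatives over a common interval $[t_0, t_1]$ (the interval bounds are notation introduced here, not taken from the slides):

```latex
\mathrm{diff}(C, D) = \int_{t_0}^{t_1} \left| C'(t) - D'(t) \right| \, dt

% Non-negativity and symmetry follow from |x| \ge 0 and |x - y| = |y - x|.
% Triangle inequality: apply the pointwise bound and integrate both sides.
\left| C'(t) - E'(t) \right| \le \left| C'(t) - D'(t) \right| + \left| D'(t) - E'(t) \right|
\;\Longrightarrow\;
\mathrm{diff}(C, E) \le \mathrm{diff}(C, D) + \mathrm{diff}(D, E)
```

Under this reading, diff(C, D) = 0 means the two derivatives coincide, so the curves can differ at most by a constant pitch offset. They are equal as curves once both are anchored to the same starting pitch, for instance by expressing pitch relative to the first note.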
21
Interpolation with Splines
• Easier to handle than Lagrange polynomials
• They avoid Runge’s phenomenon [de Boor, 2001]
22
Interpolation with Splines (II)
• Defined as piece-wise functions
• Very handy to measure the Pitch and Duration Variations
▫ Span durations can be normalized from 0 to 1
23
Interpolation with Splines (and III)
• Defined as parametric functions
▫ One function per dimension
• Pitch and Time can be compared separately
• Voices can be isolated easily
▫ Using partial derivatives
• More weight can be given to pitch than to time
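A sketch of how pitch and time could be compared separately and then weighted. It assumes uniform cubic B-spline spans with the parameter normalized to [0, 1], treats the four values of each dimension as the control values of the span, and combines the two dimensions with a weighted sum. The weighted sum, the function names and the default weights are assumptions made for illustration.

```python
def bspline_derivative(ctrl, t):
    """Derivative at t in [0, 1] of one uniform cubic B-spline span.

    ctrl holds the four control values of the span (one dimension only).
    """
    p0, p1, p2, p3 = ctrl
    return (
        -0.5 * (1 - t) ** 2 * p0
        + 0.5 * (3 * t * t - 4 * t) * p1
        + 0.5 * (-3 * t * t + 2 * t + 1) * p2
        + 0.5 * t * t * p3
    )


def area_between_derivatives(x, y, steps=1000):
    """Midpoint-rule area between the derivatives of two spans on [0, 1]."""
    h = 1.0 / steps
    return h * sum(
        abs(bspline_derivative(x, (i + 0.5) * h)
            - bspline_derivative(y, (i + 0.5) * h))
        for i in range(steps)
    )


def span_difference(span_a, span_b, k_pitch=0.75, k_time=0.25):
    """Weighted difference of two spans, each a (pitches, durations) pair."""
    pitch = area_between_derivatives(span_a[0], span_b[0])
    time = area_between_derivatives(span_a[1], span_b[1])
    return k_pitch * pitch + k_time * time


# Hypothetical spans: same rhythm, different pitch contour.
a = ([0, 0, 0, 0], [1, 1, 1, 1])
b = ([0, 0, 0, 6], [1, 1, 1, 1])
print(span_difference(a, b))
```

Setting `k_time=0` ignores time entirely, and keeping one function per dimension is what makes that kind of weighting straightforward.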
24
First Implementation
• Dynamic programming has been widely used with textual representations of music
▫ Levenshtein distance
▫ Needleman-Wunsch global alignment
▫ Smith-Waterman local alignment [Smith et al., 1981]
Shown to be the most effective [Hanna et al., 2007, 2008]
• The symbols in the sequences are defined as n-grams of successive notes, according to the spans defined by the curve
• The substitution score between two n-grams is the area between their curves’ derivatives
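A compact sketch of Smith-Waterman local alignment over sequences of n-grams with a pluggable substitution function. The `toy_score` function is only a hypothetical stand-in that turns a distance into a similarity; it is not the area-based score described above, and the gap penalty is likewise an assumption.

```python
def smith_waterman(a, b, score, gap=-1.0):
    """Best local alignment score between sequences a and b.

    score(x, y) is the substitution score for aligning symbols x and y;
    gap is the (negative) score of an insertion or deletion.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0.0,                                      # start a new alignment
                h[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                h[i - 1][j] + gap,                        # skip a symbol of a
                h[i][j - 1] + gap,                        # skip a symbol of b
            )
            best = max(best, h[i][j])
    return best


def toy_score(x, y):
    """Hypothetical similarity: 1 for identical n-grams, lower as they diverge."""
    return 1.0 - 0.5 * sum(abs(p - q) for p, q in zip(x, y))


# Hypothetical sequences of 4-grams (pitches relative to the first note).
query = [(0, 7, -2, 2), (0, -9, -5, -4), (0, 4, -1, 5)]
piece = [(0, 1, 1, 1), (0, 7, -2, 2), (0, -9, -5, -3), (0, 0, 0, 0)]
print(smith_waterman(query, piece, toy_score))
```

The zero in the recurrence is what makes the alignment local: a poorly matching prefix never penalizes a well matching fragment later in the piece.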
25
First Implementation (and II)
• We used degree 3 Uniform B-Splines [de Boor, 2003]
▫ Results in spans of 4 notes (n-gram length)
Noted to be effective [Doraisamy et al., 2003]
• Pitch relative to the first note’s
▫ 74, 81, 72, 76
▫ 7, -2, 2 (actually 0, 7, -2, 2)
• Duration relative to the first note’s
▫ 240, 480, 240, 720
▫ 2, 1, 3 (actually 1, 2, 1, 3 or 1/7, 2/7, 1/7, 3/7)
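The normalization of a 4-note span can be reproduced with the numbers on this slide. This is a minimal sketch; the function names are illustrative.

```python
from fractions import Fraction


def relative_pitches(pitches):
    """Pitches expressed as semitone offsets from the first note."""
    first = pitches[0]
    return [p - first for p in pitches]


def relative_durations(durations):
    """Durations expressed as multiples of the first note's duration."""
    first = durations[0]
    return [Fraction(d, first) for d in durations]


def duration_shares(durations):
    """Durations expressed as fractions of the whole span."""
    total = sum(durations)
    return [Fraction(d, total) for d in durations]


print(relative_pitches([74, 81, 72, 76]))
print(relative_durations([240, 480, 240, 720]))
print(duration_shares([240, 480, 240, 720]))
```

The first value of each relative sequence is always 0 (pitch) or 1 (duration), which is why the slide lists only the remaining three values before giving the full sequence.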
26
Results
• Tested with MIREX 2005 test collections
▫ Training and evaluation collections
▫ 11 queries per collection
▫ About 550 songs per collection
▫ Ground truth given as partially ordered lists of relevant pieces [Typke et al., 2005b]
▫ Effectiveness measured with Average Dynamic Recall (ADR) [Typke et al., 2006]
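A sketch of how ADR can be computed, assuming the usual reading of the measure: at each rank i, recall is taken over the pieces a user should have seen by then (every ground-truth group needed to fill the first i ranks), and ADR averages that recall over the length of the ground truth. The function name and the toy data are illustrative.

```python
def average_dynamic_recall(ranked, groups):
    """ADR of a ranked result list against partially ordered ground truth.

    groups is a list of sets: every piece in groups[0] is more relevant than
    every piece in groups[1], and so on; pieces inside a group are tied.
    """
    n = sum(len(g) for g in groups)
    total = 0.0
    for i in range(1, n + 1):
        # Pieces allowed so far: all groups needed to fill the first i ranks.
        allowed, covered = set(), 0
        for g in groups:
            if covered >= i:
                break
            allowed |= g
            covered += len(g)
        hits = sum(1 for piece in ranked[:i] if piece in allowed)
        total += hits / i
    return total / n


truth = [{"a"}, {"b", "c"}]   # hypothetical: a first, then b and c tied
print(average_dynamic_recall(["a", "c", "b"], truth))
print(average_dynamic_recall(["c", "a", "b"], truth))
```

Swapping "b" and "c" costs nothing because they are tied in the ground truth, whereas ranking "c" above "a" is penalized at the first position only.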
27
Results (II)
• Two alternatives tested
▫ Kpitch=1 and Ktime=0
▫ Kpitch=0.75 and Ktime=0.25
Chosen by others [Doraisamy et al., 2003][Hanna et al., 2007]
• We found the improvement from considering time to be completely incidental
28
| Tuning | Collection | Avg. | Min. | Max. |
|---|---|---|---|---|
| Kpitch=1, Ktime=0 | Training | 0.639 | 0.271 | 0.864 |
| Kpitch=0.75, Ktime=0.25 | Training | 0.643 | 0.312 | 0.864 |
| Kpitch=1, Ktime=0 | Evaluation | 0.709 | 0.314 | 0.911 |
| Kpitch=0.75, Ktime=0.25 | Evaluation | 0.710 | 0.314 | 0.911 |
Results (and III)
• Compared with the official MIREX 2005 results
▫ We would have ranked first
▫ Best ADR scores for 5 of the 11 queries
bold for best per query, italics for best per system
* for significant difference at the 0.10 level, ** at the 0.05 level and *** at the 0.01 level
29
| Query | Splines | GAM | O | US | TWV | L(P3) | L(DP) | FM |
|---|---|---|---|---|---|---|---|---|
| 190.011.224-1.1.1 | 0.803 | 0.820 | 0.717 | 0.824 | 0.538 | 0.455 | 0.547 | 0.443 |
| 400.065.784-1.1.1 | 0.879 | 0.846 | 0.619 | 0.624 | 0.861 | 0.614 | 0.839 | 0.679 |
| 450.024.802-1.1.1 | 0.722 | 0.450 | 0.554 | 0.340 | 0.554 | 0.340 | 0.340 | 0.340 |
| 600.053.475-1.1.1 | 0.911 | 0.883 | 0.911 | 0.911 | 0.725 | 0.661 | 0.650 | 0.567 |
| 600.053.481-1.1.1 | 0.630 | 0.293 | 0.629 | 0.486 | 0.293 | 0.357 | 0.293 | 0.519 |
| 600.054.278-1.1.1 | 0.810 | 0.674 | 0.785 | 0.864 | 0.731 | 0.660 | 0.527 | 0.418 |
| 600.192.742-1.1.1 | 0.703 | 0.808 | 0.808 | 0.703 | 0.808 | 0.642 | 0.642 | 0.808 |
| 700.010.059-1.1.2 | 0.521 | 0.521 | 0.521 | 0.521 | 0.521 | 0.667 | 0.521 | 0.521 |
| 700.010.591-1.4.2 | 0.314 | 0.665 | 0.314 | 0.314 | 0.314 | 0.474 | 0.314 | 0.375 |
| 702.001.406-1.1.1 | 0.689 | 0.566 | 0.874 | 0.675 | 0.387 | 0.722 | 0.606 | 0.469 |
| 703.001.021-1.1.1 | 0.826 | 0.730 | 0.412 | 0.799 | 0.548 | 0.549 | 0.692 | 0.561 |
| Average | 0.710 | 0.660 | 0.650 | 0.642 | 0.571* | 0.558*** | 0.543** | 0.518*** |
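The summary figures for the Splines column can be re-derived from the per-query scores above. The sketch below only re-does that arithmetic; the variable names are illustrative and ties count as best.

```python
systems = ["Splines", "GAM", "O", "US", "TWV", "L(P3)", "L(DP)", "FM"]

# Per-query ADR scores copied from the table, in the order of `systems`.
scores = {
    "190.011.224-1.1.1": [0.803, 0.820, 0.717, 0.824, 0.538, 0.455, 0.547, 0.443],
    "400.065.784-1.1.1": [0.879, 0.846, 0.619, 0.624, 0.861, 0.614, 0.839, 0.679],
    "450.024.802-1.1.1": [0.722, 0.450, 0.554, 0.340, 0.554, 0.340, 0.340, 0.340],
    "600.053.475-1.1.1": [0.911, 0.883, 0.911, 0.911, 0.725, 0.661, 0.650, 0.567],
    "600.053.481-1.1.1": [0.630, 0.293, 0.629, 0.486, 0.293, 0.357, 0.293, 0.519],
    "600.054.278-1.1.1": [0.810, 0.674, 0.785, 0.864, 0.731, 0.660, 0.527, 0.418],
    "600.192.742-1.1.1": [0.703, 0.808, 0.808, 0.703, 0.808, 0.642, 0.642, 0.808],
    "700.010.059-1.1.2": [0.521, 0.521, 0.521, 0.521, 0.521, 0.667, 0.521, 0.521],
    "700.010.591-1.4.2": [0.314, 0.665, 0.314, 0.314, 0.314, 0.474, 0.314, 0.375],
    "702.001.406-1.1.1": [0.689, 0.566, 0.874, 0.675, 0.387, 0.722, 0.606, 0.469],
    "703.001.021-1.1.1": [0.826, 0.730, 0.412, 0.799, 0.548, 0.549, 0.692, 0.561],
}

# Mean ADR per system over the 11 queries.
averages = {
    name: sum(row[k] for row in scores.values()) / len(scores)
    for k, name in enumerate(systems)
}

# Queries where Splines reaches the best score of the row (ties included).
best_for_splines = sum(1 for row in scores.values() if row[0] == max(row))

print(round(averages["Splines"], 3), best_for_splines)
```

This reproduces the 0.710 average and the 5 of 11 queries on which Splines obtains the best score, one of them tied with O and US.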
Conclusions
• We presented a new geometric model to compute the similarity of symbolic pieces
▫ Opens a very promising line for further research
• It has a very intuitive interpretation, but a not-so-intuitive implementation
• A very early prototype has been shown to perform quite well with the MIREX 2005 test collections
▫ Would have ranked first
▫ Though not significantly better than the top 3
• The modeling of time is once again shown not to improve the overall effectiveness
30
So Now What?
• We presented very early work
• We are currently improving it
▫ As of today we reach avg. ADR scores of over 0.82
• Other considerations
▫ Local alignment? Domain-dependent tuning?
▫ Uniform B-Splines? Cardinal? Hermite?
▫ n-grams of length 4? Split at inflection points?
▫ Area between derivatives? Between the curves?
▫ Shape as a nominal variable (concave, convex)?
▫ Harmony: all possible paths? Polyphony?
• We will see...
▫ Submitting 3 or 4 versions to MIREX 2010
31
And That’s It!
Picture by 姒儿喵喵
32