
Key, Chord, and Rhythm Tracking of Popular Music Recordings

Arun Shenoy and Ye Wang
School of Computing
National University of Singapore
3 Science Drive 2, Singapore

Computer Music Journal, 29(3), pp. 75–86, Fall 2005. © 2005 Massachusetts Institute of Technology.

In this article, we propose a framework to analyze a musical audio signal (sampled from a popular music CD) and determine its key, provide usable chord transcriptions, and obtain the hierarchical rhythm structure representation comprising the quarter-note, half-note, and whole-note (or measure) levels. This framework is just one specific aspect of the broader field of content-based analysis of music.

There would be many useful applications of content-based analysis of musical audio, most of which are not yet fully realized. One of these is automatic music transcription, which involves the transformation of musical audio into a symbolic representation such as MIDI or a musical score, which, in principle, could then be used to recreate the musical piece (e.g., Plumbley et al. 2002). Another application lies in the field of music information retrieval, that is, simplifying interaction with large databases of musical multimedia by annotating audio data with information that is useful for search and retrieval (e.g., Martin et al. 1998).

Two other applications are structured audio and emotion detection in music. In the case of structured audio, we are interested in transmitting sound by describing rather than compressing it (Martin et al. 1998). Here, content analysis could be used to partly automate the creation of this description through the automatic extraction of various musical constructs from the audio. Regarding emotion detection, Hevner (1936) carried out experiments that substantiated a hypothesis that music inherently carries emotional meaning. Huron (2000) has pointed out that, because the preeminent functions of music are social and psychological, emotion could serve as a very useful measure for the characterization of music in information retrieval systems. The influence of musical chords on listeners' emotion has been demonstrated by Sollberger et al. (2003).

Whereas we would expect human listeners to be reasonably successful at general auditory scene analysis, it is still a challenge for computers to perform such tasks. Even simple human acts of cognition, such as tapping the foot to the beat or swaying in time with the music, are not easily reproduced by a computer program. A brief review of audio analysis as it relates to music, followed by case studies of recently developed systems that analyze specific aspects of music, has been presented by Dixon (2004). The landscape of music-content processing technologies is discussed in Aigrain (1999). The current article does not present new audio signal-processing techniques for content analysis, instead building a framework from existing techniques. However, it does represent a unique attempt at integrating harmonic and metric information within a unified system in a mutually informing manner.

Although the detection of individual notes constitutes low-level music analysis, it is often difficult for the average listener to identify them in music. Rather, it is the overall quality conveyed by the combination of notes to form chords that is perceived. Chords are the harmonic description of music and, like melody and rhythm, could serve to capture the essence of the musical piece. Non-expert listeners tend to hear groups of simultaneous notes as chords, and it can be quite difficult to identify whether or not a particular pitch has been heard in a chord. Furthermore, although a complete and accurate polyphonic transcription of all notes would undoubtedly yield the best results, it is often possible to classify music by genre, identify musical instruments by timbre, or segment music into sectional divisions without this low-level analysis.

Tonality is an important structural property of music, and it has been described by music theorists and psychologists as a hierarchical ordering of the pitches of the chromatic scale such that these notes are perceived in relation to one central and stable pitch, the tonic (Smith and Schmuckler 2000). This hierarchical structure is manifest in listeners' perceptions of the stability of pitches in tonal contexts. The key of a piece of music is specified by its tonic and one of two modes: major or minor. A system to determine the key of acoustic musical signals has been demonstrated in Shenoy et al. (2004) and will be summarized later in this article.

Rhythm is another component that is fundamental to the perception of music. Measures of music divide a piece into time-counted segments, and time patterns in music are referred to in terms of meter. The beat forms the basic unit of musical time, and in a meter of 4/4 (also called common or quadruple time) there are four beats to a measure. Rhythm can be perceived as a combination of strong and weak beats. In a 4/4 measure consisting of four successive quarter notes, there is usually a strong beat on the first and third quarter notes and a weak beat on the second and fourth (Goto and Muraoka 1994). If the strong beat constantly alternates with the weak beat, the inter-beat interval (the temporal difference between two successive beats) would usually correspond to the temporal length of a quarter note. For our purpose, the strong and weak beats as defined above correspond to the alternating sequence of equally spaced phenomenal impulses that define the tempo of the music (Scheirer 1998). A hierarchical structure like the measure (bar-line) level can provide information more useful for modeling music at a higher level of understanding (Goto and Muraoka 1999). Key, chords, and rhythm are important expressive dimensions in musical performances. Although expression is necessarily contained in the physical features of the audio signal, such as amplitudes, frequencies, and onset times, it is better understood when viewed from a higher level of abstraction, that is, in terms of musical constructs (Dixon 2003) like the ones discussed here.

    Related Work

Existing work in key determination has been restricted to either the symbolic domain (MIDI and score) or, in the audio domain, single-instrument and simple polyphonic sounds (see, for example, Ng et al. 1996; Chew 2001, 2002; Povel 2002; Pickens 2003; Raphael and Stoddard 2003; Zhu and Kankanhalli 2003; Zhu et al. 2004). A system to extract the musical key from classical piano sonatas sampled from compact discs has been demonstrated by Pauws (2004). Here, the spectrum is first restructured into a chromagram representation in which the frequencies are mapped onto a limited set of twelve chroma values. This chromagram is used in a correlative comparison with the key profiles of all 24 Western musical keys, which represent the perceived stability of each chroma within the context of a particular musical key. The key profile that has the maximum correlation with the computed chromagram is taken as the most likely key. It has been mentioned that the performance of the system on recordings from other instrumentation or from other musical idioms is unknown. Additionally, factors in music perception and cognition such as rhythm and harmony are not modeled. These issues have been addressed in Shenoy et al. (2004) in extracting the key of popular music sampled from compact-disc audio.

Most existing work in the detection and recognition of chords has similarly been restricted to the symbolic domain or to single-instrument and simple polyphonic sounds (see, for example, Carreras et al. 1999; Fujishima 1999; Pardo and Birmingham 1999; Bello et al. 2000; Pardo and Birmingham 2001; Ortiz-Berenguer and Casajus-Quiros 2002; Pickens and Crawford 2002; Pickens et al. 2002; Klapuri 2003; Raphael and Stoddard 2003). A statistical approach to perform chord segmentation and recognition on real-world musical recordings, using Hidden Markov Models (HMMs) trained with the Expectation-Maximization (EM) algorithm, has been demonstrated in Sheh and Ellis (2003). This work draws on the prior idea of Fujishima (1999), who proposed a representation of audio termed pitch-class profiles (PCPs), in which the Fourier transform intensities are mapped to the twelve semitone classes (chroma). This system assumes that the chord sequence of an entire piece is known beforehand, which limits the technique to the detection of known chord progressions. Furthermore, because the training and testing data are restricted to the music of the Beatles, it is unclear how this system would perform for other kinds of music. A similar learning method has been used for chord detection in Maddage et al. (2004). Errors in chord detection have been corrected using knowledge from the key-detection system in Shenoy et al. (2004).


Much research in the past has also focused on rhythm analysis and the development of beat-tracking systems. However, most of this research does not consider higher-level beat structures above the quarter-note level, or it was restricted to the symbolic domain rather than working in real-world acoustic environments (see, for example, Allen and Dannenberg 1990; Goto and Muraoka 1994; Vercoe 1997; Scheirer 1998; Cemgil et al. 2001; Dixon 2001; Raphael 2001; Cemgil and Kappen 2003; Dixon 2003). Goto and Muraoka (1999) perform real-time higher-level rhythm determination up to the measure level in musical audio without drum sounds, using onset times and chord-change detection for musical decisions. The provisional beat times are a hypothesis of the quarter-note level and are inferred by an analysis of onset times. The chord-change analysis is then performed at the quarter-note level and at the interpolated eighth-note level, followed by an analysis of how much the dominant frequency components included in chord tones and their harmonic overtones change in the frequency spectrum. Musical knowledge of chord change is then applied to detect the higher-level rhythm structure at the half-note and measure (whole-note) levels. Goto has extended this work to apply to music with and without drum sounds, using drum patterns in addition to the onset times and chord changes discussed previously (Goto 2001). The drum-pattern analysis can be performed only if the musical audio signal contains drums, and hence a technique that measures the autocorrelation of the snare drum's onset times is applied. Based on the premise that drum sounds are noisy, the signal is determined to contain drum sounds only if this autocorrelation value is high enough. Based on the presence or absence of drum sounds, the knowledge of chord changes and/or drum patterns is selectively applied. The highest level of rhythm analysis, at the measure (whole-note/bar) level, is then performed using only musical knowledge of chord-change patterns. In both of these works, chords are not recognized by name, and thus rhythm detection has been performed using chord-change probabilities rather than actual chord information.

    System Description

A well-known algorithm used to identify the key of music is the Krumhansl-Schmuckler key-finding algorithm (Krumhansl 1990). The basic principle of the algorithm is to compare the input music with prototypical major (or minor) scale-degree profiles. In other words, the distribution of pitch classes in a piece is compared with an ideal distribution for each key. This algorithm and its variations (Huron and Parncutt 1993; Temperley 1999a, 1999b, 2002), however, could not be directly applied in our system, as these require a list of notes with note-on and note-off times, which cannot be directly extracted from polyphonic audio. Thus, the problem has been approached in Shenoy et al. (2004) at a higher level by clustering individual notes to obtain the harmonic description of the music in the form of the 24 major/minor triads. Then, based on a rule-based analysis of these chords against the chords present in the major and minor keys, the key of the song is extracted. The audio has been framed into beat-length segments to extract metadata in the form of quarter-note detection of the music. The basis for this technique is to assist in the detection of chord structure based on the musical knowledge that chords are more likely to change at beat times than at other positions (Goto and Muraoka 1999).
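
As a concrete illustration of the key-profile idea, the sketch below (ours, not the authors' code) correlates a 12-element pitch-class distribution with the 24 rotated major and minor profiles; the profile values are the published Krumhansl (1990) ratings, and the input vector is assumed to come from whatever chroma or note-counting front end is available.

```python
import numpy as np

# Krumhansl-Kessler key profiles (Krumhansl 1990), indexed from the tonic.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def estimate_key(pitch_class_dist):
    """Return the key whose rotated profile best correlates with the
    12-element pitch-class distribution of the piece (C = index 0)."""
    best = (-2.0, None)
    for tonic in range(12):
        for mode, profile in (('major', MAJOR), ('minor', MINOR)):
            rotated = np.roll(profile, tonic)        # profile starting on this tonic
            r = np.corrcoef(pitch_class_dist, rotated)[0, 1]
            if r > best[0]:
                best = (r, f"{NAMES[tonic]} {mode}")
    return best[1]

# Toy usage: a distribution dominated by C, E, G, and B suggests C major.
print(estimate_key(np.array([5, 0, 1, 0, 4, 1, 0, 4, 0, 1, 0, 2], dtype=float)))
```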

The beat-detection process first detects the onsets present in the music using sub-band processing (Wang et al. 2003). This technique of onset detection is based on sub-band intensity to detect the perceptually salient percussive events in the signal.

We draw on the prior ideas of beat tracking discussed in Scheirer (1998) and Dixon (2003) to determine the beat structure of the music. As a first step, all possible values of inter-onset intervals (IOIs) are computed. An IOI is defined as the time interval between any pair of onsets, not necessarily successive. Then, clusters of IOIs are identified, and a ranked set of hypothetical inter-beat intervals (IBIs) is created based on the size of the corresponding clusters and by identifying integer relationships with other clusters. The latter process is to recognize harmonic relationships between the beat (at the quarter-note level) and simple integer multiples of the beat (at the half-note and whole-note levels). An error margin of 25 msec has been set in the IBI to account for slight variations in the tempo. The highest-ranked value is returned as the IBI, from which we obtain the tempo, expressed as an inverse value of the IBI. Patterns of onsets in clusters at the IBI are tracked, and beat information is interpolated into sections in which onsets corresponding to the beat might not be detected.
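
The clustering step can be illustrated roughly as follows (a minimal sketch under our own assumptions: onset times in seconds from any onset detector, greedy clustering, and a simple integer-multiple bonus; it is not the published implementation). It groups inter-onset intervals within a 25-msec tolerance, scores each cluster by its size plus the support of clusters at integer multiples, and returns the winning inter-beat interval and the implied tempo.

```python
import itertools

def estimate_ibi(onsets, tolerance=0.025, max_ioi=2.0):
    """Rank inter-onset-interval clusters and return (IBI in seconds, tempo in BPM)."""
    # All pairwise IOIs, not only between successive onsets.
    iois = [b - a for a, b in itertools.combinations(sorted(onsets), 2) if b - a <= max_ioi]

    # Greedy clustering: an IOI joins a cluster if it lies within the tolerance
    # (the 25-msec error margin mentioned above) of the cluster's running mean.
    clusters = []  # each cluster is [mean, count]
    for ioi in sorted(iois):
        for c in clusters:
            if abs(ioi - c[0]) <= tolerance:
                c[0] = (c[0] * c[1] + ioi) / (c[1] + 1)
                c[1] += 1
                break
        else:
            clusters.append([ioi, 1])

    # Score each cluster by its own size plus the size of clusters at
    # (approximate) integer multiples, which favors the quarter-note level.
    def score(c):
        s = c[1]
        for other in clusters:
            ratio = other[0] / c[0]
            if ratio >= 1.5 and abs(ratio - round(ratio)) * c[0] <= tolerance:
                s += other[1]
        return s

    ibi = max(clusters, key=score)[0]
    return ibi, 60.0 / ibi

# Toy usage: onsets roughly every 0.5 sec imply an IBI of about 0.5 sec (~120 BPM).
print(estimate_ibi([0.00, 0.51, 1.00, 1.49, 2.01, 2.50, 3.00]))
```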

The audio feature for harmonic analysis is a reduced spectral representation of each beat-spaced segment of the audio based on a chroma transformation of the spectrum. This feature class represents the spectrum in terms of pitch class and forms the basis for the chromagram (Wakefield 1999). The system takes into account the chord distribution across the diatonic major scale and the three types of minor scales (natural, harmonic, and melodic).

Furthermore, the system has been biased by assigning relatively higher weights to the primary chords in each key (tonic, dominant, and subdominant). This concept has also been demonstrated in the Spiral Array, a mathematical model for tonality (Chew 2001, 2002).
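
A bare-bones chroma transformation of one beat-length segment might look like the following sketch; the window, frequency range, and sample rate are illustrative assumptions rather than values taken from the article.

```python
import numpy as np

def chroma_vector(segment, sample_rate=44100, fmin=55.0, fmax=2000.0):
    """Fold the magnitude spectrum of one beat-length segment into 12 pitch classes."""
    windowed = segment * np.hanning(len(segment))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)

    chroma = np.zeros(12)
    for f, mag in zip(freqs, spectrum):
        if fmin <= f <= fmax:
            midi = 69 + 12 * np.log2(f / 440.0)   # MIDI note number of this bin
            chroma[int(round(midi)) % 12] += mag  # accumulate into its pitch class (C = 0)
    peak = chroma.max()
    return chroma / peak if peak > 0 else chroma
```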

It is observed that the chord-recognition accuracy of the key system, though sufficient to determine the key, is not sufficient to provide usable chord transcriptions or to determine the hierarchical rhythm structure across the entire duration of the music. Thus, in this article, we enhance the four-step key-determination system with three post-processing stages that allow us to perform these two tasks with greater accuracy. The block diagram of the proposed framework is shown in Figure 1.

Figure 1. System framework.

    System Components

We now discuss the three post-key processing components of the system. These consist of a first phase of chord accuracy enhancement, rhythm structure determination, and a second phase of chord accuracy enhancement.

    Chord Accuracy Enhancement (Phase 1)

In this step, we aim to increase the accuracy of chord detection. For each audio frame, we perform two checks.

Check 1: Eliminate Chords Not in the Key of the Song

Here, we perform a rule-based analysis of the detected chord to see if it exists in the key of the song. If not, we check for the presence of the major chord of the same pitch class (if the detected chord is minor), and vice versa. If that chord is present in the key, we replace the erroneous chord with it. This is because the major and minor chords of a pitch class differ only in the position of their mediants. The chord-detection approach often suffers from recognition errors that result from overlaps of harmonic components of individual notes in the spectrum; these are quite difficult to avoid. Hence, there is a possibility of error in the distinction between the major and minor chords of a given pitch class. It must be highlighted that chords outside the key are not necessarily erroneous, and this usage is a simplification made by this system. If this check fails, we eliminate the chord.

Check 2: Perform Temporal Corrections of Detected or Missing Chords

If the chords detected in the adjacent frames are the same as each other but different from the current frame, then the chord in the current frame is likely to be incorrect. In these cases, we coerce the current frame's chord to match the one in the adjacent frames.

We present an illustrative example of the above checks over three consecutive quarter-note-spaced frames of audio in Figure 2.
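
Both checks can be expressed compactly, as in the sketch below; the chord-string representation (e.g., 'C' for C major, 'Cm' for C minor), the key given as its set of major/minor triads, and None for an eliminated chord are our assumptions for illustration, not part of the article.

```python
def parallel(chord):
    """Swap a chord between major and minor on the same root, e.g., 'E' <-> 'Em'."""
    return chord[:-1] if chord.endswith('m') else chord + 'm'

def check1(chords, key_chords):
    """Check 1: replace an out-of-key chord with its parallel major/minor chord
    if that chord lies in the key; otherwise eliminate it (None)."""
    out = []
    for c in chords:
        if c is None or c in key_chords:
            out.append(c)
        elif parallel(c) in key_chords:
            out.append(parallel(c))
        else:
            out.append(None)
    return out

def check2(chords):
    """Check 2: if both neighboring frames agree and the current frame differs
    (or is missing), coerce the current frame to the neighbors' chord."""
    out = list(chords)
    for i in range(1, len(out) - 1):
        if out[i - 1] is not None and out[i - 1] == out[i + 1] and out[i] != out[i - 1]:
            out[i] = out[i - 1]
    return out

# Toy usage with the key of C major given as its major/minor triads.
key_c_major = {'C', 'Dm', 'Em', 'F', 'G', 'Am'}
frames = ['C', 'Cm', 'C', 'F', None, 'F']
print(check2(check1(frames, key_c_major)))   # ['C', 'C', 'C', 'F', 'F', 'F']
```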

    Rhythm Structure Determination

Next, our system checks for the start of measures based on the premise that chords are more likely to change at the beginning of a measure than at other beat positions (Goto 2001). Because there are four quarter notes to a measure in 4/4 time (which is by far the most common meter in popular music), we check for patterns of four consecutive frames with the same chord to demarcate all possible measure boundaries. However, not all of these boundaries may be correct. We will illustrate this with an example in which a chord sustains over two measures of the music. From Figure 3c, it can be seen that four possible measure boundaries are detected across the twelve quarter-note-spaced frames of audio. Our aim is to eliminate the two erroneous ones, shown with a dotted line in Figure 3c, and to interpolate an additional measure line at the start of the fifth frame to give us the required result as seen in Figure 3b.

The correct measure boundaries along the entire length of the song are thus determined as follows. First, obtain all possible patterns of boundary locations that have integer relationships in multiples of four. Then, select the pattern with the highest count as the one corresponding to the actual measure boundaries. Finally, track the boundary locations in the detected pattern, and interpolate missing boundary positions across the rest of the song.
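
The grouping-by-four logic can be sketched as follows (an illustrative reconstruction, not the published code): frames that start a run of four identical chords are candidate boundaries, each candidate votes for its position modulo four, and the winning phase defines the measure grid, which is then interpolated across the whole song.

```python
from collections import Counter

def measure_boundaries(chords):
    """Return indices of estimated measure starts, assuming 4/4 time and
    one chord label per quarter-note frame."""
    # Candidate boundaries: frames that begin four consecutive identical chords.
    candidates = [i for i in range(len(chords) - 3)
                  if chords[i] is not None and len(set(chords[i:i + 4])) == 1]

    # Each candidate supports the measure grid whose phase is (index mod 4);
    # the most supported phase is taken as the true grid.
    phase = Counter(i % 4 for i in candidates).most_common(1)[0][0] if candidates else 0

    # Interpolate boundaries at every fourth frame from that phase, covering
    # measures in which the chord changes and no candidate run was found.
    return list(range(phase, len(chords), 4))

# Toy usage: a chord sustained over two measures yields spurious candidates,
# but the phase vote still recovers boundaries at frames 0, 4, and 8.
frames = ['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'F', 'F', 'F']
print(measure_boundaries(frames))   # [0, 4, 8]
```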

    Chord Accuracy Enhancement (Phase 2)

Now that the measure boundaries have been extracted, we can increase the chord accuracy in each measure of audio with a third check.

Figure 2. Chord accuracy enhancement (Phase 1).


    Check 3: Intra-Measure Chord Check

From Goto and Muraoka (1999), we know that chords are more likely to change at the beginning of a measure than at the other half-note positions. Hence, if three of the chords in a measure are the same, then the fourth chord is likely to be the same as the others (Check 3a). Also, if a chord is common to both halves of the measure, then all the chords in the measure are likely to be the same as this chord (Check 3b).

It is observed that all possible cases of chords under Check 3a are already handled by Checks 1 and 2 above. Hence, we only implement Check 3b, as illustrated in Figure 4 with an example. This check is required because, in the case of a minor key, we can have both the major and minor chords of the same pitch class present in the song.

Figure 3. Error in measure-boundary detection.

Figure 4. Chord accuracy enhancement (Phase 2).


A classic example of this can be seen in Hotel California by the Eagles. This song is in the key of B minor, and the chords in the verse include an E major and an E minor chord, which shows a possible musical shift from the ascending melodic minor to the natural minor. Here, if an E major chord is detected in a measure containing the E minor chord, Check 1 would not detect any error, because both the major and minor chords are potentially present in the key of B minor. The melodic minor, however, rarely occurs as such in popular music melodies, but it has been included in this work along with the natural and harmonic minor for completeness.
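
Check 3b itself is only a few lines (again a sketch using the same hypothetical chord-string representation as above): if any chord label is common to both halves of a four-frame measure, every frame of that measure is set to that chord.

```python
def check3b(measure):
    """Check 3b: if a chord is common to both halves of a 4/4 measure,
    assign that chord to all four quarter-note frames."""
    first, second = set(measure[:2]), set(measure[2:])
    common = (first & second) - {None}
    if common:
        return [common.pop()] * 4
    return list(measure)

# Toy usage: an E major misdetected inside an E-minor measure is corrected.
print(check3b(['Em', 'E', 'Em', 'Em']))   # ['Em', 'Em', 'Em', 'Em']
```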

    Experiments

    Setup

The results of our experiments, performed on 30 popular songs in English spanning five decades of music, are tabulated in Table 1. The songs have been carefully selected to represent a variety of artists and time periods. We assume the meter to be 4/4, which is the most frequent meter of popular songs, and the tempo of the input song is assumed to be constrained between 40 and 185 quarter notes per minute.

Table 1. Experimental Results

No. | Year | Artist | Song Title | Chord Detection (%) | Original Key | Detected Key | Chord Accuracy After Phase 1 (%) | Measure Detection | Chord Accuracy After Phase 2 (%)
1 | 1965 | Righteous Brothers | Unchained Melody | 57.68 | C major | C major | 70.92 | Yes | 85.11
2 | 1977 | Bee Gees | Stayin' Alive | 39.67 | F minor | F minor | 54.91 | Yes | 71.40
3 | 1977 | Eric Clapton | Wonderful Tonight | 27.70 | G major | G major | 40.82 | Yes | 60.64
4 | 1977 | Fleetwood Mac | You Make Lovin' Fun | 44.37 | A# major | A# major | 60.69 | Yes | 79.31
5 | 1979 | Eagles | I Can't Tell You Why | 52.41 | D major | D major | 68.74 | Yes | 88.97
6 | 1984 | Foreigner | I Want to Know What Love Is | 55.03 | D# minor | D# minor | 73.42 | No | 58.12
7 | 1986 | Bruce Hornsby | The Way It Is | 59.74 | G major | G major | 70.32 | Yes | 88.50
8 | 1989 | Chris Rea | Road to Hell | 61.51 | A minor | A minor | 76.64 | Yes | 89.24
9 | 1991 | R.E.M. | Losing My Religion | 56.31 | A minor | A minor | 70.75 | Yes | 85.74
10 | 1991 | U2 | One | 56.63 | C major | C major | 64.82 | Yes | 76.63
11 | 1992 | Michael Jackson | Heal the World | 30.44 | A major | A major | 51.76 | Yes | 68.62
12 | 1993 | MLTR | Someday | 56.68 | D major | D major | 69.71 | Yes | 87.30
13 | 1995 | Coolio | Gangsta's Paradise | 31.75 | C minor | C minor | 47.94 | Yes | 70.79
14 | 1996 | Backstreet Boys | As Long as You Love Me | 48.45 | C major | C major | 61.97 | Yes | 82.82
15 | 1996 | Joan Osborne | One of Us | 46.90 | A major | A major | 59.30 | Yes | 80.05
16 | 1997 | Bryan Adams | Back to You | 68.92 | C major | C major | 75.69 | Yes | 95.80
17 | 1997 | Green Day | Time of Your Life | 54.55 | G major | G major | 64.58 | Yes | 87.77
18 | 1997 | Hanson | MMMBop | 39.56 | A major | A major | 63.39 | Yes | 81.08
19 | 1997 | Savage Garden | Truly, Madly, Deeply | 49.06 | C major | C major | 63.88 | Yes | 80.86
20 | 1997 | Spice Girls | Viva Forever | 64.50 | D# minor | F# major | 74.25 | Yes | 91.42
21 | 1997 | Tina Arena | Burn | 35.42 | G major | G major | 56.13 | Yes | 77.38
22 | 1998 | Jennifer Paige | Crush | 40.37 | C# minor | C# minor | 55.41 | Yes | 76.78
23 | 1998 | Natalie Imbruglia | Torn | 53.00 | F major | F major | 67.89 | Yes | 87.73
24 | 1999 | Santana | Smooth | 54.53 | A minor | A minor | 69.63 | No | 49.91
25 | 2000 | Corrs | Breathless | 36.77 | B major | B major | 63.47 | Yes | 77.28
26 | 2000 | Craig David | Walking Away | 68.99 | A minor | C major | 75.26 | Yes | 93.03
27 | 2000 | Nelly Furtado | Turn Off the Light | 36.36 | D major | D major | 48.48 | Yes | 70.52
28 | 2000 | Westlife | Seasons in the Sun | 34.19 | F# major | F# major | 58.69 | Yes | 76.35
29 | 2001 | Shakira | Whenever, Wherever | 49.86 | C# minor | C# minor | 62.82 | Yes | 78.39
30 | 2001 | Train | Drops of Jupiter | 32.54 | C major | C major | 53.73 | Yes | 69.85
Overall accuracy at each stage | | | | 48.13 | | 28/30 songs correct | 63.20 | 28/30 songs | 78.91


It can be observed that the average chord-detection accuracy across the length of the entire music performed by the chord-detection step (module 3 in Figure 1) is relatively low at 48.13%. The rest of the chords are either not detected or are detected in error. This accuracy is sufficient, however, to determine the key accurately for 28 out of 30 songs in the key-detection step (module 4 in Figure 1), which reflects an accuracy of over 93% for key detection. This was verified against the information in commercially available sheet music for the songs (www.musicnotes.com, www.sheetmusicplus.com). The average chord-detection accuracy of the system improves by 15.07% on average upon applying Chord Accuracy Enhancement (Phase 1). (See module 5 in Figure 1.) Errors in key determination do not have any effect on this step, as will be discussed next.

The new accuracy of 63.20% has been found to be sufficient to determine the hierarchical rhythm structure (module 6 in Figure 1) across the music for 28 out of the 30 songs, thus again reflecting an accuracy of over 93% for rhythm tracking. Finally, the application of Chord Accuracy Enhancement (Phase 2; see module 7 in Figure 1) makes a substantial performance improvement of 15.71%, leading to a final chord-detection accuracy of 78.91%. This could have been higher were it not for the performance drop for two songs (numbers 6 and 24 in Table 1) owing to errors in measure-boundary detection. This exemplifies the importance of accurate measure detection to performing intra-measure chord checks based on the previously discussed musical knowledge of chords.

    Analysis of Key

It can be observed that for two of the songs (numbers 20 and 26 in Table 1), the key has been determined incorrectly, because the major key and its relative minor are very close. Our technique assumes that the key of the song is constant throughout the length of the song. However, many songs often use both major and minor keys, perhaps choosing a minor key for the verse and a major key for the chorus, or vice versa. Sometimes, the chords used in the song are present in both the major key and its relative minor. For example, the four main chords used in the song Viva Forever by the Spice Girls are D-sharp minor, A-sharp minor, B major, and F-sharp major. These chords are present in both the key of F-sharp major and the key of D-sharp minor; hence, it is difficult for the system to determine whether the song is in the major key or the relative minor. A similar observation can be made for the song Walking Away by Craig David, in which the main chords used are A minor, D minor, F major, G major, and C major. These chords are present both in the key of C major and in its relative minor, A minor.

The use of a weighted-cosine-similarity technique causes the shorter major-key patterns to be preferred over the longer minor-key patterns, owing to the normalization that is performed while applying the cosine similarity. For the minor keys, normalization is applied taking into account the count of chords that can be constructed across all three types of minor scales. However, from an informal evaluation of popular music, we observe that popular music in minor keys usually shifts across only two of the three scales, primarily the natural and harmonic minor. In such cases, the normalization technique applied would cause the system to become slightly biased toward the relative major key, where this problem is not present, as there is only one major scale.

Errors in key recognition do not affect the Chord Accuracy Enhancement (Phase 1) module, because we also consider chords present in the relative major/minor key in addition to the chords in the detected key. A theoretical explanation of how to perform key identification in such cases of ambiguity (as seen above) based on an analysis of sheet music can be found in Ewer (2002).
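
The normalization effect described above can be illustrated with a simplified weighted cosine similarity between a song's chord histogram and each key's chord template; the binary-plus-weights template, the specific weights, and the use of a plain chord histogram are assumptions made for this sketch, not the published configuration. Because the minor-key template spans the triads of all three minor scales, its vector norm is larger, which slightly depresses its score relative to the relative major.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def key_score(chord_counts, key_chords, primary_chords, primary_weight=2.0):
    """Weighted cosine similarity between the song's chord histogram and a key
    template; primary chords (tonic, dominant, subdominant) get a higher weight."""
    labels = sorted(set(chord_counts) | set(key_chords))
    hist = np.array([chord_counts.get(c, 0.0) for c in labels])
    template = np.array([(primary_weight if c in primary_chords else 1.0)
                         if c in key_chords else 0.0 for c in labels])
    return cosine(hist, template)

# Toy usage (Walking Away example): chords shared by C major and A minor.
counts = {'Am': 10, 'Dm': 6, 'F': 5, 'G': 7, 'C': 8}
c_major = {'C', 'Dm', 'Em', 'F', 'G', 'Am'}
# The A-minor template spans the natural, harmonic, and melodic scales, so it is longer.
a_minor = {'Am', 'Bm', 'C', 'D', 'Dm', 'E', 'Em', 'F', 'G'}
print(key_score(counts, c_major, {'C', 'F', 'G'}))
print(key_score(counts, a_minor, {'Am', 'Dm', 'E'}))
```

With these toy counts, the C-major score comes out higher than the A-minor score, mirroring the relative-major bias discussed above.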

    Analysis of Chord

The variation in the chord-detection accuracy of the system can be explained in terms of the use of other chords and of key changes.

    Use of Other Chords

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, which likely contribute to the variation in chord detection. These chords include the augmented and diminished triads, seventh chords (e.g., the dominant seventh, major seventh, and minor seventh), and so on.

Polyphony, with its multidimensional sequences of overlapping tones and overlapping harmonic components of individual notes in the spectrum, might cause the elements in the chroma vector to be weighted wrongly. As a result, a C major 7th chord (C, E, G, B) in the music might incorrectly be detected as an E minor chord (E, G, B) if the latter three notes are assigned a relatively higher weight in the chroma vector.
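
The following toy computation makes that failure mode concrete (the binary triad templates and the invented chroma weights are ours, not data from the article): a chroma vector for a C major 7th sound in which E, G, and B dominate correlates better with an E minor template than with a C major one.

```python
import numpy as np

PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def triad_template(root, minor=False):
    """Binary 12-bin template for a major or minor triad on the given root."""
    template = np.zeros(12)
    third = 3 if minor else 4
    for interval in (0, third, 7):
        template[(PITCH_CLASSES.index(root) + interval) % 12] = 1.0
    return template

def correlate(chroma, template):
    return float(np.dot(chroma, template) / (np.linalg.norm(chroma) * np.linalg.norm(template)))

# Hypothetical chroma for a Cmaj7 sound in which E, G, and B outweigh C.
chroma = np.zeros(12)
for note, weight in {'C': 0.4, 'E': 1.0, 'G': 0.9, 'B': 0.8}.items():
    chroma[PITCH_CLASSES.index(note)] = weight

print('C major:', round(correlate(chroma, triad_template('C')), 3))
print('E minor:', round(correlate(chroma, triad_template('E', minor=True)), 3))
```

With these weights, the E minor correlation is the larger of the two, reproducing the kind of error described above.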

    Key Change

In some songs, there is a key change toward the end of the song to make the final repeated part(s) (e.g., the chorus or refrain) slightly different from the previous parts. This is effected by transposing the song to a higher key, usually up a half step. This has also been highlighted by Goto (2003). Because our system does not currently handle key changes, the chords detected in this section will not be recognized. This is illustrated with an example in Figure 5.

Another point to be noted here is that of chord substitution, or the simplification of extended chords, in the evaluation of our system. For simplicity, extended chords can be substituted by their respective major/minor triads. As long as the notes in the extended chord are present in the scale and the basic triad is present, the simplification can be done. For example, the C major 7th can be simplified to the C major triad. This substitution has been performed on the extended chords annotated in the sheet music in the evaluation of our system.

    Rhythm Tracking Observations

We conjecture the errors in rhythm tracking to be caused by errors in chord detection, as discussed above. Chords present in the music and not handled by our system could be incorrectly classified into one of the 24 major/minor triads owing to complexities in polyphonic audio analysis. This can result in incorrect clusters of four chords being captured by the rhythm-detection process, resulting in an incorrect pattern of measure boundaries having the highest count. Furthermore, beat detection is a non-trivial task; the difficulties of tracking the beats in acoustic signals are discussed in Goto and Muraoka (1994). Any error in beat detection can cause a shift in the rhythm structure determined by the system.

    Discussion

We have presented a framework to determine the key, chords, and hierarchical rhythm structure from acoustic musical signals. To our knowledge, this is the first attempt to use a rule-based approach that combines low-level features with high-level musical knowledge of rhythm and harmonic structure to determine all three of these dimensions of music.

Figure 5. Key shift.


The methods should (and do) work reasonably well for key and chord labeling of popular music in 4/4 time. However, they would not extend very far beyond this idiom or to more rhythmically and tonally complex music. This framework has been applied successfully in various other aspects of content analysis, such as singing-voice detection (Nwe et al. 2004) and the automatic alignment of textual lyrics and musical audio (Wang et al. 2004). The human auditory system is capable of extracting rich and meaningful data from complex audio signals (Sheh and Ellis 2003), and existing computational auditory analysis systems fall clearly behind humans in performance. Toward this end, we believe that the model proposed here provides a promising platform for the future development of more sophisticated auditory models based on a better understanding of music. Our current and future research that builds on this work is highlighted below.

    Key Detection

Our technique assumes that the key of the song is constant throughout the duration of the song. However, owing to the properties of the relative major/minor key combinations, we have made the chord- and rhythm-detection processes (which use the key of the song as input) quite robust against changes across this key combination. However, the same cannot be said about other kinds of key changes in the music. This is because such key changes are quite difficult to track, as there are no fixed rules, and they tend to depend more on the songwriter's creativity. For example, the song Let It Grow by Eric Clapton switches from a B minor key in the verse to an E major key in the chorus. We believe that an analysis of the song structure (verse, chorus, bridge, etc.) could likely serve as an input to help track this kind of key change. This problem is currently being analyzed and will be tackled in the future.

    Chord Detection

In this approach, we have considered only the major and minor triads. However, in addition to these, there are other chord possibilities in popular music, and future work will be targeted toward the detection of dominant-seventh chords and extended chords, as discussed earlier. Chord-detection research can be further extended to include knowledge of chord progressions based on the function of chords in their diatonic scale, which relates to the expected resolution of each chord within a key; that is, the analysis of chord progressions based on the need for a sounded chord to move from an unstable sound (a dissonance) to a more final, stable-sounding one (a consonance).

    Rhythm Tracking

Unlike that of Goto and Muraoka (1999), the rhythm-extraction technique employed in our current system does not perform well for drumless music signals, because the onset detector has been optimized to detect the onsets of percussive events (Wang et al. 2003). Future effort will be aimed at extending the current work to music signals that do not contain drum sounds.

    Acknowledgments

We thank the editors and two anonymous reviewers for helpful comments and suggestions on an earlier version of this article.

    References

Aigrain, P. 1999. "New Applications of Content Processing of Music." Journal of New Music Research 28(4):271–280.

Allen, P. E., and R. B. Dannenberg. 1990. "Tracking Musical Beats in Real Time." Proceedings of the 1990 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 140–143.

Bello, J. P., et al. 2000. "Techniques for Automatic Music Transcription." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Carreras, F., et al. 1999. "Automatic Harmonic Description of Musical Signals Using Schema-Based Chord Decomposition." Journal of New Music Research 28(4):310–333.

Cemgil, A. T., et al. 2001. "On Tempo Tracking: Tempogram Representation and Kalman Filtering." Journal of New Music Research 29(4):259–273.

Cemgil, A. T., and B. Kappen. 2003. "Monte Carlo Methods for Tempo Tracking and Rhythm Quantization." Journal of Artificial Intelligence Research 18(1):45–81.

Chew, E. 2001. "Modeling Tonality: Applications to Music Cognition." Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society. Wheat Ridge, Colorado: Cognitive Science Society, pp. 206–211.

Chew, E. 2002. "The Spiral Array: An Algorithm for Determining Key Boundaries." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 18–31.

Dixon, S. 2001. "Automatic Extraction of Tempo and Beat from Expressive Performances." Journal of New Music Research 30(1):39–58.

Dixon, S. 2003. "On the Analysis of Musical Expressions in Audio Signals." SPIE: The International Society for Optical Engineering 5021(2):122–132.

Dixon, S. 2004. "Analysis of Musical Content in Digital Audio." In Computer Graphics and Multimedia: Applications, Problems, and Solutions. Hershey, Pennsylvania: Idea Group, pp. 214–235.

Ewer, G. 2002. Easy Music Theory. Lower Sackville, Nova Scotia: Spring Day Music Publishers.

Fujishima, T. 1999. "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music." Proceedings of the 1999 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 464–467.

Goto, M. 2001. "An Audio-Based Real-Time Beat-Tracking System for Music with or without Drum Sounds." Journal of New Music Research 30(2):159–171.

Goto, M. 2003. "A Chorus-Section Detecting Method for Musical Audio Signals." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers, pp. 437–440.

Goto, M., and Y. Muraoka. 1994. "A Beat-Tracking System for Acoustic Signals of Music." Proceedings of the 1994 ACM Multimedia Conference. New York: Association for Computing Machinery, pp. 365–372.

Goto, M., and Y. Muraoka. 1999. "Real-Time Beat Tracking for Drumless Audio Signals: Chord Change Detection for Musical Decisions." Speech Communication 27(3–4):311–335.

Hevner, K. 1936. "Experimental Studies of the Elements of Expression in Music." American Journal of Psychology 48:246–268.

Huron, D. 2000. "Perceptual and Cognitive Applications in Music Information Retrieval." Paper presented at the 2000 International Symposium on Music Information Retrieval, University of Massachusetts at Amherst, 23–25 October.

Huron, D., and R. Parncutt. 1993. "An Improved Model of Tonality Perception Incorporating Pitch Salience and Echoic Memory." Psychomusicology 12(2):154–171.

Klapuri, A. 2003. "Automatic Transcription of Music." Proceedings of SMAC 2003. Stockholm: KTH.

Krumhansl, C. 1990. Cognitive Foundations of Musical Pitch. Oxford: Oxford University Press.

Maddage, N. C., et al. 2004. "Content-Based Music Structure Analysis with Applications to Music Semantics Understanding." Proceedings of the 2004 ACM Multimedia Conference. New York: Association for Computing Machinery, pp. 112–119.

Martin, K. D., et al. 1998. "Music Content Analysis Through Models of Audition." Paper presented at the 1998 ACM Multimedia Workshop on Content Processing of Music for Multimedia Applications, Bristol, UK, 12 April.

Ng, K., et al. 1996. "Automatic Detection of Tonality Using Note Distribution." Journal of New Music Research 25(4):369–381.

Nwe, T. L., et al. 2004. "Singing Voice Detection in Popular Music." Paper presented at the 2004 ACM Multimedia Conference, New York, 12 October.

Ortiz-Berenguer, L. I., and F. J. Casajus-Quiros. 2002. "Polyphonic Transcription Using Piano Modeling for Spectral Pattern Recognition." Proceedings of the 2002 Conference on Digital Audio Effects. Hoboken, New Jersey: Wiley, pp. 45–50.

Pardo, B., and W. Birmingham. 1999. "Automated Partitioning of Tonal Music." Technical Report CSE-TR-396-99, Electrical Engineering and Computer Science Department, University of Michigan.

Pardo, B., and W. Birmingham. 2001. "Chordal Analysis of Tonal Music." Technical Report CSE-TR-439-01, Electrical Engineering and Computer Science Department, University of Michigan.

Pauws, S. 2004. "Musical Key Extraction from Audio." Paper presented at the 2004 International Symposium on Music Information Retrieval, Barcelona, 11 October.

Pickens, J., et al. 2002. "Polyphonic Score Retrieval Using Polyphonic Audio Queries: A Harmonic Modeling Approach." Paper presented at the 2002 International Symposium on Music Information Retrieval, Paris, 15 October.

Pickens, J., and T. Crawford. 2002. "Harmonic Models for Polyphonic Music Retrieval." Proceedings of the 2002 ACM Conference on Information and Knowledge Management. New York: Association for Computing Machinery, pp. 430–437.

Pickens, J. 2003. "Key-Specific Shrinkage Techniques for Harmonic Models." Poster presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Plumbley, M. D., et al. 2002. "Automatic Music Transcription and Audio Source Separation." Cybernetics and Systems 33(6):603–627.

Povel, D. J. L. 2002. "A Model for the Perception of Tonal Melodies." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 144–154.

Raphael, C. 2001. "Automated Rhythm Transcription." Paper presented at the 2001 International Symposium on Music Information Retrieval, Bloomington, Indiana, 15–17 October.

Raphael, C., and J. Stoddard. 2003. "Harmonic Analysis with Probabilistic Graphical Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Scheirer, E. D. 1998. "Tempo and Beat Analysis of Acoustic Musical Signals." Journal of the Acoustical Society of America 103(1):588–601.

Sheh, A., and D. P. W. Ellis. 2003. "Chord Segmentation and Recognition Using EM-Trained Hidden Markov Models." Paper presented at the 2003 International Symposium on Music Information Retrieval, Baltimore, 26–30 October.

Shenoy, A., et al. 2004. "Key Determination of Acoustic Musical Signals." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.

Smith, N. A., and M. A. Schmuckler. 2000. "Pitch-Distributional Effects on the Perception of Tonality." Paper presented at the Sixth International Conference on Music Perception and Cognition, Keele, 5–10 August.

Sollberger, B., et al. 2003. "Musical Chords as Affective Priming Context in a Word-Evaluation Task." Music Perception 20(3):263–282.

Temperley, D. 1999a. "Improving the Krumhansl-Schmuckler Key-Finding Algorithm." Paper presented at the 22nd Annual Meeting of the Society for Music Theory, Atlanta, 12 November.

Temperley, D. 1999b. "What's Key for Key? The Krumhansl-Schmuckler Key-Finding Algorithm Reconsidered." Music Perception 17(1):65–100.

Temperley, D. 2002. "A Bayesian Model of Key-Finding." Proceedings of the 2002 International Conference on Music and Artificial Intelligence. Vienna: Springer, pp. 195–206.

Vercoe, B. L. 1997. "Computational Auditory Pathways to Music Understanding." In I. Deliège and J. Sloboda, eds., Perception and Cognition of Music. London: Psychology Press, pp. 307–326.

Wakefield, G. H. 1999. "Mathematical Representation of Joint Time-Chroma Distributions." SPIE: The International Society for Optical Engineering 3807:637–645.

Wang, Y., et al. 2003. "Parametric Vector Quantization for Coding Percussive Sounds in Music." Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers.

Wang, Y., et al. 2004. "LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics." Proceedings of the 2004 ACM Multimedia Conference. New York: Association for Computing Machinery, pp. 212–219.

Zhu, Y., and M. Kankanhalli. 2003. "Music Scale Modeling for Melody Matching." Proceedings of the Eleventh International ACM Conference on Multimedia. New York: Association for Computing Machinery, pp. 359–362.

Zhu, Y., et al. 2004. "A Method for Solmization of Melody." Paper presented at the 2004 International Conference on Multimedia and Expo, Taipei, 27–30 June.
