
Copyright 2011 G.Tzanetakis

Music Information Retrieval

George Tzanetakis ([email protected])
Associate Professor, IEEE Senior Member
Tier II Canada Research Chair
Computer Science Department (also in Music, ECE)
University of Victoria, Canada


MIR

‣ Interdisciplinary science of retrieving information from music

‣ ISMIR - Int. Symposium -> Int. Conf. on MIR -> Int. Conf. of the Society of MIR

‣ First ISMIR in 2000

‣ Increasing presence in ICASSP, ICME, ACMM, TMM, TASLP, MMTA

‣ All proceedings are freely available online

‣ [email protected]


Connections

Machine Learning

Signal Processing

Psychology

Computer Science

Information Science

Human-Computer Interaction

MUSIC


Music today

Music is produced, distributed and consumed digitally

2011: digital music sales > physical album sales


Industry


Music Collections

‣ Personal music collections ~ thousands

‣ Streaming music sites, stores ~ millions

‣ Great celestial jukebox in the sky ~ all of recorded music in human history

‣ A 5-minute music track is digitally represented using approximately 26 million floating point numbers
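The 26 million figure can be checked with simple arithmetic. The sketch below assumes CD-quality audio (44,100 samples per second, two channels); the slide does not state these parameters, so they are an assumption.

```python
# Assumed CD-quality parameters (not stated on the slide).
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
DURATION_S = 5 * 60  # a 5-minute track

num_samples = SAMPLE_RATE_HZ * CHANNELS * DURATION_S
print(num_samples)  # 26460000
```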


Overview

Focus on signal processing and audio

Audio Feature Extraction

Timbre, Pitch, Rhythm

Analysis

Similarity, Classification, Modelling Time

Tasks

Similarity, Genre classification, Tag annotation, Query-by-Humming, Audio-Score Alignment


Audio Feature Extraction

Sound and sine waves

Timbral Features

Short Time Fourier Transform (STFT)

Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Audio Compression

Pitch and Harmony

Rhythm


Linear Systems and Sinusoids

[Figure: linearity. Inputs in1, in2 and in1 + in2 produce outputs out1, out2 and out1 + out2]

[Figure: sinusoid. Amplitude, Period = 1 / Frequency, Phase (0, 180, 360)]

True sine waves last forever

sine wave -> LTI -> new sine wave
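A small numerical check of "sine wave -> LTI -> new sine wave": the sketch below feeds a sampled sinusoid through a two-point moving average. That filter is a hypothetical choice of LTI system, and the frequency value is assumed. It then compares the output with a sinusoid of the same frequency whose amplitude and phase are predicted by theory.

```python
import math

OMEGA = 2 * math.pi * 0.05  # angular frequency in radians per sample (assumed)
N = 200

x = [math.sin(OMEGA * n) for n in range(N)]

# Two-point moving average: a very simple LTI system (illustrative choice).
y = [0.5 * (x[n] + x[n - 1]) for n in range(1, N)]

# Theory: same frequency, amplitude scaled by cos(w/2), delayed by half a sample.
gain = math.cos(OMEGA / 2)
predicted = [gain * math.sin(OMEGA * n - OMEGA / 2) for n in range(1, N)]

max_error = max(abs(a - b) for a, b in zip(y, predicted))
print(max_error)  # close to zero, limited only by floating-point rounding
```

Only the amplitude and phase change; the frequency does not, which is why sinusoids are a natural basis for analysing linear systems.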


Fourier Transform

Joseph Fourier, 1768-1830


Short Time Fourier Transform

Time-varying spectra

Fast Fourier Transform (FFT)

[Figure: Input over Time (frames t, t+1, t+2) -> Filters -> Oscillators -> Output; axes: Amplitude, Frequency]
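The idea of time-varying spectra can be sketched as follows: cut the signal into overlapping frames, window each frame, and take one spectrum per frame. This is a minimal, dependency-free illustration. The frame size, hop size and Hann window are assumptions, and the naive O(N^2) DFT stands in for the FFT that a real implementation would use.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame (naive DFT, bins 0..N/2)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def stft(signal, frame_size=64, hop=32):
    """Return one magnitude spectrum per windowed, overlapping frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_size)
              for t in range(frame_size)]  # Hann window
    spectra = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_size)]
        spectra.append(dft_magnitudes(frame))
    return spectra

# Test signal: a tone at bin 4 that jumps to bin 12 halfway through.
sig = ([math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
       + [math.sin(2 * math.pi * 12 * t / 64) for t in range(256)])
spectra = stft(sig)

# Strongest bin in each frame: shows the spectrum changing over time.
peaks = [max(range(len(s)), key=s.__getitem__) for s in spectra]
print(peaks[0], peaks[-1])  # 4 12
```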


Spectrum and Shape Descriptors

Centroid, Rolloff, Flux, Bandwidth, Moments, ...

[Figure: magnitude (M) against frequency (F) with the Centroid marked; the descriptors form a Feature vector, a point in Feature Space]
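Three of the descriptors named above can be sketched directly on a magnitude spectrum. Definitions vary between toolkits, so treat these as one common formulation rather than the exact one used in the talk. The 85% rolloff fraction and the toy spectra are assumptions.

```python
def spectral_centroid(mags):
    """Magnitude-weighted mean bin index: the spectrum's centre of gravity."""
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total else 0.0

def spectral_rolloff(mags, fraction=0.85):
    """Smallest bin at or below which `fraction` of the total magnitude lies."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k
    return len(mags) - 1

def spectral_flux(mags, prev_mags):
    """Sum of squared differences between successive magnitude spectra."""
    return sum((a - b) ** 2 for a, b in zip(mags, prev_mags))

# Two toy spectra from consecutive frames (hypothetical values).
prev = [0.0, 1.0, 2.0, 1.0, 0.0]
curr = [0.0, 0.0, 1.0, 2.0, 1.0]

feature_vector = [spectral_centroid(curr),
                  spectral_rolloff(curr),
                  spectral_flux(curr, prev)]
print(feature_vector)  # [3.0, 4, 4.0]
```

Each frame yields one such feature vector, which is the point in feature space referred to on the slide.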


Mel Frequency Cepstral Coefficients

Mel-scale: 13 linearly-spaced filters, 27 log-spaced filters

Linearly-spaced filter: centre CF, edges CF - 130 and CF + 130

Log-spaced filter: centre CF, edges CF / 1.0718 and CF * 1.0718

Mel-filtering -> Log -> DCT -> MFCCs
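The Mel-filtering -> Log -> DCT chain can be sketched for a single magnitude spectrum as below. This is one reading of the slide, not a reference implementation. The first centre frequency (130 Hz), the triangular filter shape, the use of log10 and the number of coefficients kept are all assumptions; only the filter counts, the 130 Hz step and the 1.0718 ratio come from the slide.

```python
import math

def centre_frequencies(first_hz=130.0, n_linear=13, n_log=27,
                       step_hz=130.0, ratio=1.0718):
    """13 linearly spaced, then 27 log-spaced centres (first_hz is assumed)."""
    cfs = [first_hz + i * step_hz for i in range(n_linear)]
    for _ in range(n_log):
        cfs.append(cfs[-1] * ratio)
    return cfs

def triangle(freq, lo, centre, hi):
    """Triangular weight: 0 at the edges, 1 at the centre frequency."""
    if lo < freq <= centre:
        return (freq - lo) / (centre - lo)
    if centre < freq < hi:
        return (hi - freq) / (hi - centre)
    return 0.0

def mfcc(mags, sample_rate, n_coeffs=13):
    """Mel-filtering -> log -> DCT for one magnitude spectrum (bins 0..N/2)."""
    n_fft = 2 * (len(mags) - 1)
    log_energies = []
    for i, cf in enumerate(centre_frequencies()):
        linear = i < 13
        lo = cf - 130.0 if linear else cf / 1.0718
        hi = cf + 130.0 if linear else cf * 1.0718
        energy = sum(m * triangle(k * sample_rate / n_fft, lo, cf, hi)
                     for k, m in enumerate(mags))
        log_energies.append(math.log10(energy + 1e-10))  # avoid log(0)
    n = len(log_energies)
    # DCT-II of the log filterbank energies; keep the first n_coeffs terms.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]

# Usage with a flat 257-bin spectrum at an assumed 22,050 Hz sample rate.
coeffs = mfcc([1.0] * 257, 22050)
print(len(coeffs))  # 13
```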


Audio Feature Extraction



Traditional Music Representations



Pitch content

Harmony, melody = pitch concepts

Music Theory: Score = Music

Bridge to symbolic MIR

Automatic music transcription

Non-transcriptive arguments

Split the octave into discrete logarithmically spaced intervals
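Splitting the octave into logarithmically spaced intervals can be made concrete with Western 12-tone equal temperament. The slide names neither the number of steps nor a reference pitch, so both (12 steps per octave, A4 = 440 Hz, MIDI-style numbering) are assumptions in this sketch.

```python
import math

A4_HZ = 440.0          # assumed reference tuning
STEPS_PER_OCTAVE = 12  # assumed: Western equal temperament

def freq_to_note_number(freq_hz):
    """Nearest equal-tempered note, numbered MIDI-style (A4 = 69)."""
    return round(69 + STEPS_PER_OCTAVE * math.log2(freq_hz / A4_HZ))

def note_number_to_freq(note):
    """Frequency in Hz of an equal-tempered note number."""
    return A4_HZ * 2 ** ((note - 69) / STEPS_PER_OCTAVE)

def pitch_class(note):
    """Octave-independent class 0..11 (0 = C)."""
    return note % STEPS_PER_OCTAVE

print(freq_to_note_number(261.63))  # 60, i.e. C4
print(pitch_class(60))              # 0
```

Each step multiplies the frequency by the same ratio, 2 ** (1/12), which is what "logarithmically spaced" means here.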


Pitch Detection

Time-domain, Frequency-domain, Perceptual

Pitch is a PERCEPTUAL attribute, correlated with but not equivalent to fundamental frequency


Time Domain

[Figure: waveforms of a C4 clarinet note and a C4 sine wave]

Number of zero-crossings: sensitive to noise, needs a low-pass filter (LPF)
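A minimal sketch of the zero-crossing idea on a clean sine wave: a periodic signal with a simple waveform crosses zero twice per period, so the crossing count gives a frequency estimate. The sample rate and the phase offset are assumptions. On noisy or harmonically rich signals, such as the clarinet note, extra crossings appear, which is why low-pass filtering is needed first.

```python
import math

def zero_crossing_pitch(signal, sample_rate):
    """Estimate frequency from the number of sign changes (two per period)."""
    crossings = sum(1 for a, b in zip(signal, signal[1:])
                    if (a < 0) != (b < 0))
    duration_s = (len(signal) - 1) / sample_rate
    return crossings / (2 * duration_s)

SAMPLE_RATE = 44_100  # assumed
C4_HZ = 261.63        # approximate frequency of C4

# One second of a C4 sine wave, with a small phase offset.
sine = [math.sin(2 * math.pi * C4_HZ * n / SAMPLE_RATE + 0.1)
        for n in range(SAMPLE_RATE)]
estimate = zero_crossing_pitch(sine, SAMPLE_RATE)
print(estimate)  # within about 1 Hz of 261.63
```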


AutoCorrelation

Efficient computation possible for powers of 2 using FFT

F(f) = FFT(X(t))
S(f) = F(f) F*(f)
R(l) = IFFT(S(f))
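The three steps (FFT, multiply by the complex conjugate, inverse FFT) can be sketched as below. The recursive radix-2 FFT is written out only to keep the example dependency-free; a library FFT would be used in practice. Zero-padding to a power of 2, the `min_lag` parameter and the test tone are assumptions of this sketch.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])

def ifft(x):
    """Inverse FFT via the conjugation trick."""
    n = len(x)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in x])]

def autocorrelation(signal):
    """R(l) = IFFT(F(f) F*(f)), zero-padded to avoid circular wrap-around."""
    size = 1
    while size < 2 * len(signal):
        size *= 2
    spectrum = fft(list(signal) + [0.0] * (size - len(signal)))
    power = [f * f.conjugate() for f in spectrum]
    return [v.real for v in ifft(power)][:len(signal)]

def pitch_from_autocorrelation(signal, sample_rate, min_lag):
    """Pick the lag (>= min_lag) with the largest autocorrelation."""
    r = autocorrelation(signal)
    best_lag = max(range(min_lag, len(r) // 2), key=r.__getitem__)
    return sample_rate / best_lag

# A 200 Hz tone sampled at 8000 Hz has a period of 40 samples.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(512)]
estimate = pitch_from_autocorrelation(tone, 8000, min_lag=20)
print(estimate)
```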


Frequency Domain

Fundamental frequency (as well as pitch) will correspond to peaks in the Spectrum. The fundamental does not necessarily have the highest amplitude.

[Figure: spectra of a C4 sine wave and a C4 clarinet note]


Chroma – Pitch perception



Automatic Rhythm Description


Beat Histograms (Tzanetakis et al., AMTA01)

Beat Histogram Features: max(h(i)), argmax(h(i))
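The two features named above are straightforward to compute once a beat histogram h is available. The histogram values and tempo bins below are hypothetical, made up only to illustrate the calculation.

```python
def beat_histogram_features(histogram, bpm_bins):
    """Return (peak strength, tempo at peak): max(h(i)) and argmax(h(i))."""
    peak_index = max(range(len(histogram)), key=histogram.__getitem__)
    return histogram[peak_index], bpm_bins[peak_index]

# Hypothetical beat histogram: beat strength per tempo bin.
bpm_bins = [60, 80, 100, 120, 140, 160]
h = [0.10, 0.15, 0.05, 0.60, 0.08, 0.02]

strength, tempo = beat_histogram_features(h, bpm_bins)
print(strength, tempo)  # 0.6 120
```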


Analysis Overview

Musical Piece

Trajectory

Point

Cloud



Content-based Similarity Retrieval

(or query-by-example)

Point

Input: Query example

Output: Ranked list of similar audio files based on feature vector similarity
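Query-by-example can be sketched as a nearest-neighbour ranking over feature vectors. Euclidean distance, the file names and the two-dimensional vectors are all assumptions for illustration; the slide does not specify a distance measure.

```python
import math

def rank_by_similarity(query, collection):
    """Return file names sorted by Euclidean distance to the query vector."""
    def distance(vec):
        return math.sqrt(sum((q - v) ** 2 for q, v in zip(query, vec)))
    return sorted(collection, key=lambda name: distance(collection[name]))

# Hypothetical collection: one feature vector (a point) per audio file.
collection = {
    "a.wav": [0.9, 0.1],
    "b.wav": [0.2, 0.8],
    "c.wav": [0.5, 0.5],
}

ranked = rank_by_similarity([0.25, 0.75], collection)
print(ranked)  # ['b.wav', 'c.wav', 'a.wav']
```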


Classification

Decision boundary

Partitioning of feature space

Generative vs discriminative models

P(class | x) = p(x | class) * P(class) / p(x)

Classes: Music, Speech
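A generative classifier applies Bayes' rule to class-conditional densities. The sketch below uses one-dimensional Gaussians for the Music and Speech classes. The feature, the means, the standard deviations and the priors are hypothetical values chosen only to make the example run.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

# Hypothetical 1-D class models: (mean, std) of some feature, plus priors.
MODELS = {"music": (0.2, 0.1), "speech": (0.6, 0.2)}
PRIORS = {"music": 0.5, "speech": 0.5}

def posteriors(x):
    """Bayes' rule: P(class | x) = p(x | class) * P(class) / p(x)."""
    joint = {c: gaussian_pdf(x, *MODELS[c]) * PRIORS[c] for c in MODELS}
    evidence = sum(joint.values())  # p(x)
    return {c: j / evidence for c, j in joint.items()}

def classify(x):
    """Pick the class with the highest posterior probability."""
    post = posteriors(x)
    return max(post, key=post.get)

print(classify(0.15), classify(0.7))  # music speech
```

The decision boundary is the value of x where the two posteriors are equal; a discriminative model would learn that boundary directly instead of modelling each class.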


Classification

Genre/Style

Emotion/Mood

Artist

Instrument

MIREX 2007

10 genres, 700 30-second clips / genre


Multi-tag annotation

Free-form tags (female voice, woman singing)

Multi-label classification problems with twists

Issues: synonyms, subpart relations, sparse, noisy

Cold start problem

Typically each tag is treated independently as a classification problem

Inverse also interesting (query-by-keywords)
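Treating each tag independently amounts to building one binary classification target per tag. The sketch below shows only that data preparation step, with hypothetical annotations; training a classifier per tag is left out.

```python
def binary_relevance_targets(annotations, vocabulary):
    """One binary label vector per tag: 1 if the clip carries the tag, else 0."""
    return {tag: [1 if tag in tags else 0 for tags in annotations]
            for tag in vocabulary}

# Hypothetical free-form annotations for three clips.
annotations = [
    {"female voice", "piano"},
    {"woman singing"},
    {"piano", "guitar"},
]

vocabulary = sorted(set().union(*annotations))
targets = binary_relevance_targets(annotations, vocabulary)
print(targets["piano"])  # [1, 0, 1]
```

Note that "female voice" and "woman singing" end up as unrelated targets, which illustrates the synonym issue listed above.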



Stacking



Polyphonic Audio-Score Alignment

Representation: Time Series of Chroma

Matching Procedure: Dynamic Time Warping


Dynamic Time Warping

Aligned performances of the same orchestral piece

Attempting to align two different orchestral pieces
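A minimal sketch of dynamic time warping over sequences of feature frames. The step pattern (three allowed moves, no slope weights), the L1 frame distance and the toy two-bin "chroma" frames are assumptions; real chroma vectors have 12 bins, and production systems usually add path constraints.

```python
def dtw_cost(seq_a, seq_b, dist):
    """Cost of the best monotonic alignment between two sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # only seq_a advances
                                    cost[i][j - 1],      # only seq_b advances
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def l1(u, v):
    """Sum of absolute differences between two frames."""
    return sum(abs(x - y) for x, y in zip(u, v))

# Toy 2-bin frames standing in for chroma vectors.
performance = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
score = [[1, 0], [0, 1]]   # same content, different timing
other = [[0, 1], [1, 0]]   # different content

print(dtw_cost(performance, score, l1))  # 0.0
print(dtw_cost(performance, other, l1))  # 6.0
```

A low cost indicates that the two sequences can be warped onto each other, as with two performances of the same piece; a high cost corresponds to the failed alignment of two different pieces.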


Query-by-humming

User sings a melody

Computer searches database for song containing the melody

The challenge of difficult queries



The MUSART system

Query preprocessing: Pitch contour extraction (audio), Note segmentation (symbolic)

Target preprocessing (symbolic): Theme extraction, Model-forming, representation

Search to find approximate match: Dynamic Time Warping, HMMs


Conclusions

Through a combination of digital signal processing and machine learning techniques, a variety of music information retrieval tasks have been explored in the literature.

The tasks covered in this presentation are representative of existing work, and commercial implementations already exist for them. Many more are actively being investigated.

Music is a complex and fascinating signal, and we are just beginning to understand it better using computers.
