Copyright 2011 G.Tzanetakis
Music Information Retrieval
George Tzanetakis
Associate Professor, IEEE Senior Member
Tier II Canada Research Chair
Computer Science Department (also in Music, ECE)
University of Victoria, Canada
MIR
‣ Interdisciplinary science of retrieving information from music
‣ ISMIR (name history): Int. Symposium on MIR -> Int. Conf. on MIR -> Int. Conf. of the Society for MIR
‣ First ISMIR in 2000
‣ Increasing presence at conferences (ICASSP, ICME, ACMM) and in journals (TMM, TASLP, MMTA)
‣ All proceedings are freely available online
Connections
‣ Machine Learning
‣ Signal Processing
‣ Psychology
‣ Computer Science
‣ Information Science
‣ Human-Computer Interaction
‣ Music
Music today
‣ Music is produced, distributed and consumed digitally
‣ 2011: digital music sales exceeded physical album sales
Industry
Music Collections
‣ Personal music collections ~ thousands
‣ Streaming music sites, stores ~ millions
‣ Great celestial jukebox in the sky ~ all of recorded music in human history
‣ A 5-minute music track is digitally represented using approximately 26 million floating point numbers (for CD-quality stereo: 44,100 samples/s × 2 channels × 300 s ≈ 26.5 million samples)
Overview (focus on signal processing and audio)
‣ Audio Feature Extraction: Timbre, Pitch, Rhythm
‣ Analysis: Similarity, Classification, Modelling Time
‣ Tasks: Similarity, Genre classification, Tag annotation, Query-by-Humming, Audio-Score Alignment
Audio Feature Extraction
‣ Sound and sine waves
‣ Timbral features: Short Time Fourier Transform (STFT), Mel-Frequency Cepstral Coefficients (MFCC), perceptual audio compression
‣ Pitch and harmony
‣ Rhythm
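Linear Systems and Sinusoids
‣ Linearity (superposition): if in1 -> out1 and in2 -> out2, then in1 + in2 -> out1 + out2
‣ A sine wave is described by its amplitude, its frequency (period = 1 / frequency) and its phase (in degrees, 0 to 360)
‣ True sine waves last forever
‣ sine wave -> LTI system -> new sine wave of the same frequency (only amplitude and phase change)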
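A minimal NumPy sketch of the two properties above, using a hypothetical 3-tap FIR filter `h` as the LTI system and an assumed 8 kHz sampling rate. It checks superposition numerically and compares the filter's response to a 440 Hz sine with a sine of the same frequency, scaled by |H| and shifted by the phase of H, the filter's frequency response at that frequency.

```python
import numpy as np

def fir_filter(x, h):
    """Causal FIR filter y[n] = sum_k h[k] * x[n - k] with zero initial state (an LTI system)."""
    return np.convolve(x, h)[:len(x)]

sr = 8000                          # sampling rate in Hz (assumed)
f = 440.0                          # input frequency in Hz
n = np.arange(800)
w = 2 * np.pi * f / sr             # angular frequency in radians per sample
x = np.sin(w * n)
h = np.array([0.25, 0.5, 0.25])    # hypothetical 3-tap low-pass filter

# Superposition: filtering a sum equals summing the filtered parts
x2 = 0.5 * np.cos(3 * w * n)
superposition_error = np.max(np.abs(fir_filter(x + x2, h) - (fir_filter(x, h) + fir_filter(x2, h))))

# Sine in -> sine out: the frequency response H at w gives the gain |H| and the phase shift angle(H)
H = np.sum(h * np.exp(-1j * w * np.arange(len(h))))
y = fir_filter(x, h)
predicted = np.abs(H) * np.sin(w * n + np.angle(H))
start = len(h) - 1                 # skip start-up samples that still "see" zeros before the signal
sine_error = np.max(np.abs(y[start:] - predicted[start:]))

print(f"gain |H| = {np.abs(H):.4f}, phase = {np.angle(H):.4f} rad")
print(superposition_error, sine_error)  # both at floating-point round-off level
```

The comparison skips the first `len(h) - 1` output samples, where the filter is still reading the zeros before the signal starts; this is the practical counterpart of "true sine waves last forever".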
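Short Time Fourier Transform
‣ Time-varying spectra: the Fast Fourier Transform (FFT) is applied to successive short frames of the input (t, t+1, t+2, ...)
‣ Interpretations: a bank of filters, a bank of oscillators
‣ Output for each frame: amplitude as a function of frequency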
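A minimal sketch of the frame-by-frame computation, assuming NumPy, a Hann window, and a frame length `n_fft` and hop size `hop` chosen only for illustration (real systems tune both):

```python
import numpy as np

def stft_magnitude(x, n_fft=512, hop=256):
    """Magnitude STFT of a 1-D signal.

    Row t is the spectrum of frame t (samples t*hop .. t*hop + n_fft - 1);
    column k is FFT bin k, i.e. frequency k * sr / n_fft.
    """
    window = np.hanning(n_fft)                   # taper frames to reduce spectral leakage
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, n_fft // 2 + 1)

# Usage: 1 second of a 1000 Hz sine sampled at 8 kHz
sr = 8000
x = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
S = stft_magnitude(x, n_fft=256, hop=128)
peak_bins = S.argmax(axis=1)                     # strongest bin in every frame
print(S.shape, peak_bins[0] * sr / 256)          # frame count x bins, peak frequency in Hz
```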
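Spectrum and Shape Descriptors
‣ Shape descriptors summarize the magnitude spectrum M over frequency F: Centroid, Rolloff, Flux, Bandwidth, Moments, ...
‣ The descriptors of a frame form a feature vector, i.e. a point in feature space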
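The descriptors can be sketched for a single magnitude spectrum as below. Exact definitions vary across papers and toolkits; here centroid and bandwidth are the magnitude-weighted mean and standard deviation of frequency, rolloff is the frequency below which an assumed 85% of the magnitude lies, and flux is the squared difference between successive normalized spectra.

```python
import numpy as np

def spectral_features(M, sr, prev_M=None, rolloff_pct=0.85):
    """Shape descriptors of one magnitude spectrum M (n_fft // 2 + 1 bins).

    Returns the feature vector [centroid, rolloff, flux, bandwidth]."""
    n_fft = 2 * (len(M) - 1)
    freqs = np.arange(len(M)) * sr / n_fft               # bin frequencies F in Hz
    total = max(float(np.sum(M)), 1e-12)                 # guard against silent frames
    centroid = np.sum(freqs * M) / total                 # "center of mass" of the spectrum
    bandwidth = np.sqrt(np.sum((freqs - centroid) ** 2 * M) / total)
    cumulative = np.cumsum(M)
    rolloff = freqs[np.searchsorted(cumulative, rolloff_pct * cumulative[-1])]
    flux = 0.0
    if prev_M is not None:                               # change relative to the previous frame
        prev_total = max(float(np.sum(prev_M)), 1e-12)
        flux = np.sum((M / total - prev_M / prev_total) ** 2)
    return np.array([centroid, rolloff, flux, bandwidth])

# Idealized spectrum with all energy in bin 32 (1000 Hz at sr = 8000, n_fft = 256)
sr = 8000
M = np.zeros(129)
M[32] = 1.0
features = spectral_features(M, sr, prev_M=M)
print(features)
```

In a full system these would be computed for every STFT frame, so each frame becomes one point in feature space.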
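Mel Frequency Cepstral Coefficients
‣ Mel scale: 13 linearly-spaced filters followed by 27 log-spaced filters
‣ Linear filter with center frequency CF: edges at CF - 130 Hz and CF + 130 Hz
‣ Log-spaced filter with center frequency CF: edges at CF / 1.0718 and CF * 1.0718
‣ Processing chain: Mel-filtering -> Log -> DCT -> MFCCs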
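A sketch of the processing chain with a filterbank laid out as described above (13 linear filters with ±130 Hz edges, then 27 log-spaced filters with a 1.0718 ratio). Assumptions not on the slide: triangular filter shapes, a first center frequency of 130 Hz, a Hann window, an orthonormal DCT-II, and keeping 13 coefficients. Toolkits differ in these constants and in normalization, so this is illustrative only.

```python
import numpy as np

def filterbank(n_fft, sr, n_linear=13, n_log=27, lin_step=130.0, log_step=1.0718):
    """Triangular filters: n_linear filters with edges CF -/+ lin_step, then n_log
    filters with edges CF / log_step and CF * log_step. First CF = lin_step (assumed)."""
    linear_cf = lin_step * np.arange(1, n_linear + 1)
    log_cf = linear_cf[-1] * log_step ** np.arange(1, n_log + 1)
    lower = np.concatenate([linear_cf - lin_step, log_cf / log_step])
    center = np.concatenate([linear_cf, log_cf])
    upper = np.concatenate([linear_cf + lin_step, log_cf * log_step])
    freqs = np.arange(n_fft // 2 + 1) * sr / n_fft                # FFT bin frequencies (Hz)
    rise = (freqs[None, :] - lower[:, None]) / (center - lower)[:, None]
    fall = (upper[:, None] - freqs[None, :]) / (upper - center)[:, None]
    return np.clip(np.minimum(rise, fall), 0.0, None)            # (n_filters, n_bins)

def dct_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II matrix."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    D = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    D[0] /= np.sqrt(2.0)
    return D

def mfcc(frame, sr, n_coeffs=13):
    """Mel-filtering -> Log -> DCT for one frame of audio samples."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
    energies = filterbank(n_fft, sr) @ spectrum
    log_energies = np.log(energies + 1e-10)                     # avoid log(0) for empty filters
    return dct_matrix(n_coeffs, len(log_energies)) @ log_energies

sr = 44100
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
coeffs = mfcc(frame, sr)
print(coeffs.shape)
```

A 44.1 kHz rate is used because, with these constants, the top log-spaced filter extends to roughly 11.8 kHz.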
Pitch content
‣ Harmony and melody are pitch concepts
‣ Music theory: score = music
‣ Bridge to symbolic MIR
‣ Automatic music transcription
‣ Non-transcriptive approaches: split the octave into discrete, logarithmically spaced intervals
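Splitting the octave into logarithmically spaced intervals can be sketched with the equal-tempered MIDI mapping, assuming A4 = 440 Hz and 12 bins per octave (so bin 0 is C and bin 9 is A):

```python
import numpy as np

def frequency_to_midi(f_hz, a4_hz=440.0):
    """Continuous MIDI pitch: 12 logarithmically spaced steps per octave, A4 = 69."""
    return 69 + 12 * np.log2(np.asarray(f_hz, dtype=float) / a4_hz)

def pitch_class(f_hz, bins_per_octave=12):
    """Fold a frequency into one of `bins_per_octave` classes, ignoring the octave."""
    steps = frequency_to_midi(f_hz) * bins_per_octave / 12.0
    return np.round(steps).astype(int) % bins_per_octave

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
for f in [261.63, 392.0, 440.0, 880.0]:
    print(f, NAMES[pitch_class(f)])   # 440 and 880 Hz land in the same class (A)
```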
Pitch Detection
‣ Approaches: time-domain, frequency-domain, perceptual
‣ Pitch is a PERCEPTUAL attribute, correlated with but not equivalent to fundamental frequency
Time Domain
‣ Example waveforms: C4 clarinet note and C4 sine wave
‣ Counting zero-crossings is sensitive to noise and needs a low-pass filter (LPF)
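A minimal sketch of a zero-crossing pitch estimate and of why it needs low-pass filtering. The 9-point moving average stands in for a proper LPF and, like the noise level and test tone, is an arbitrary choice for illustration.

```python
import numpy as np

def zero_crossing_pitch(x, sr):
    """Estimate f0 as (#zero crossings) / (2 * duration): a simple periodic wave
    with one positive and one negative lobe crosses zero twice per period."""
    crossings = np.count_nonzero(np.signbit(x[:-1]) != np.signbit(x[1:]))
    return crossings * sr / (2.0 * len(x))

def moving_average_lpf(x, k=9):
    """Crude low-pass filter: k-point moving average (k chosen arbitrarily)."""
    return np.convolve(x, np.ones(k) / k, mode="same")

sr = 16000
t = np.arange(sr) / sr                                # 1 second
sine = np.sin(2 * np.pi * 261.63 * t)                 # C4 sine wave
noisy = sine + 0.2 * np.random.default_rng(0).standard_normal(sr)

est_clean = zero_crossing_pitch(sine, sr)
est_noisy = zero_crossing_pitch(noisy, sr)            # noise adds spurious crossings
est_filtered = zero_crossing_pitch(moving_average_lpf(noisy), sr)
print(est_clean, est_noisy, est_filtered)
```

Note that even a clean clarinet note can cross zero more than twice per period, which is another reason this estimator is rarely used alone.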
AutoCorrelation
‣ Efficient computation is possible for lengths that are powers of 2 using the FFT:
$$F(f) = \mathrm{FFT}(X(t)), \qquad S(f) = F(f)\,F^{*}(f), \qquad R(l) = \mathrm{IFFT}(S(f))$$
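The three equations map directly onto NumPy's FFT routines. One detail left implicit above: the FFT yields a circular autocorrelation, so the signal is zero-padded to a power of 2 of at least 2N - 1 samples to obtain the ordinary (linear) autocorrelation. The pitch step, picking the strongest lag inside an assumed 50 to 1000 Hz search range, is one simple choice among many.

```python
import numpy as np

def autocorrelation_fft(x):
    """R(l) for l = 0..N-1 via F = FFT(X), S = F F*, R = IFFT(S)."""
    n = len(x)
    n_fft = 1 << (2 * n - 1).bit_length()      # power of 2 >= 2N - 1 avoids circular wrap-around
    F = np.fft.rfft(x, n_fft)                  # zero-padded FFT
    S = F * np.conj(F)                         # power spectrum |F|^2
    return np.fft.irfft(S, n_fft)[:n]

def autocorrelation_pitch(x, sr, fmin=50.0, fmax=1000.0):
    """Pick the lag with the largest autocorrelation in [sr/fmax, sr/fmin]."""
    r = autocorrelation_fft(x)
    lo = int(sr / fmax)
    hi = min(int(sr / fmin), len(r) - 1)
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    return sr / lag

sr = 16000
x = np.sin(2 * np.pi * 200 * np.arange(2048) / sr)   # 200 Hz: period of 80 samples
r = autocorrelation_fft(x)
f0 = autocorrelation_pitch(x, sr)
print(f0)
```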
Frequency Domain
‣ Fundamental frequency (as well as pitch) corresponds to peaks in the spectrum; the fundamental does not necessarily have the highest amplitude
‣ Example spectra: C4 sine wave and C4 clarinet note
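Because the fundamental need not be the strongest peak, simply taking the spectral maximum can report a harmonic instead. The sketch below contrasts that with the Harmonic Product Spectrum, a standard frequency-domain method swapped in here for illustration (the slide does not name a specific algorithm). The test tone and its harmonic amplitudes are made up.

```python
import numpy as np

def harmonic_product_spectrum_pitch(x, sr, n_harmonics=3):
    """Multiply the magnitude spectrum by copies of itself downsampled by 2, 3, ...
    so that the bin where all harmonics line up (the fundamental) is reinforced."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    n = len(spectrum) // n_harmonics
    hps = spectrum[:n].copy()
    for h in range(2, n_harmonics + 1):
        hps *= spectrum[::h][:n]           # element k of spectrum[::h] is spectrum[h * k]
    hps[0] = 0.0                            # ignore the DC bin
    return np.argmax(hps) * sr / len(x)

sr, N = 8000, 8192
t = np.arange(N) / sr
f0 = 250.0
amps = [0.3, 1.0, 0.8, 0.6, 0.4]            # fundamental WEAKER than the 2nd harmonic
x = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t) for k, a in enumerate(amps))

strongest_peak = np.argmax(np.abs(np.fft.rfft(x * np.hanning(N)))) * sr / N
hps_pitch = harmonic_product_spectrum_pitch(x, sr)
print(strongest_peak, hps_pitch)
```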
Beat Histograms (Tzanetakis et al., AMTA01)
‣ Beat histogram features: max(h(i)), argmax(h(i))
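A simplified sketch of the idea, not a reproduction of the method in the cited paper: it assumes an onset envelope sampled at `env_rate` Hz is already available, autocorrelates it, converts each lag to beats per minute, and accumulates positive autocorrelation values into a BPM histogram h. The 40 to 200 BPM range is an assumption.

```python
import numpy as np

def beat_histogram(envelope, env_rate, bpm_min=40, bpm_max=200):
    """Simplified beat histogram from an onset envelope sampled at env_rate Hz."""
    e = envelope - np.mean(envelope)
    r = np.correlate(e, e, mode="full")[len(e) - 1:]      # r[lag] for lag >= 0
    h = np.zeros(bpm_max + 1)                              # h[i] = strength at i BPM
    lag_min = int(np.ceil(60.0 * env_rate / bpm_max))
    lag_max = int(np.floor(60.0 * env_rate / bpm_min))
    for lag in range(lag_min, min(lag_max, len(r) - 1) + 1):
        bpm = int(round(60.0 * env_rate / lag))            # periodicity -> tempo
        if bpm_min <= bpm <= bpm_max and r[lag] > 0:
            h[bpm] += r[lag]
    return h

def beat_histogram_features(h):
    """The two features named above: max(h(i)) and argmax(h(i))."""
    return h.max(), int(np.argmax(h))

env_rate = 100                       # envelope frames per second (assumed)
envelope = np.zeros(1000)            # 10 seconds
envelope[::50] = 1.0                 # one onset every 0.5 s, i.e. 120 BPM
h = beat_histogram(envelope, env_rate)
strength, tempo = beat_histogram_features(h)
print(tempo)
```

Besides the main peak, this toy histogram also shows weaker entries at related tempi (60 and 40 BPM), which is why the histogram as a whole, not just its maximum, can be used as a rhythm descriptor.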
Content-based Similarity Retrieval (or query-by-example)
‣ Input: query example
‣ Output: ranked list of similar audio files based on feature vector similarity
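A minimal query-by-example sketch, assuming each file has already been summarized as a feature vector (the three features and file names below are hypothetical) and using z-score normalization plus Euclidean distance, a common but not the only choice of similarity measure:

```python
import numpy as np

def rank_by_similarity(query, collection, names):
    """Rank collection rows by Euclidean distance to the query after
    per-dimension z-score normalization (so Hz and BPM are comparable)."""
    mu = collection.mean(axis=0)
    sigma = collection.std(axis=0) + 1e-12           # avoid division by zero
    q = (query - mu) / sigma
    C = (collection - mu) / sigma
    distances = np.linalg.norm(C - q, axis=1)
    order = np.argsort(distances)                   # most similar first
    return [(names[i], float(distances[i])) for i in order]

# Hypothetical features: [spectral centroid (Hz), rolloff (Hz), tempo (BPM)]
collection = np.array([[1500.0, 3000.0, 120.0],
                       [800.0, 1600.0, 90.0],
                       [2500.0, 5200.0, 140.0],
                       [850.0, 1700.0, 95.0]])
names = ["song_a.wav", "song_b.wav", "song_c.wav", "song_d.wav"]
query = np.array([810.0, 1620.0, 91.0])
ranking = rank_by_similarity(query, collection, names)
print(ranking)
```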
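Classification
‣ Decision boundary: partitioning of the feature space
‣ Generative vs. discriminative models
‣ Bayes rule, e.g. for Music vs. Speech:
$$P(c \mid x) = \frac{p(x \mid c)\,P(c)}{p(x)}, \qquad c \in \{\text{Music}, \text{Speech}\}$$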
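A minimal generative classifier that applies the Bayes rule above, assuming each class-conditional density p(x | c) is a Gaussian with diagonal covariance (a naive-Bayes style assumption). The two-dimensional "music" and "speech" features are synthetic stand-ins, not real audio features.

```python
import numpy as np

class GaussianBayesClassifier:
    """Models p(x | c) as a diagonal Gaussian per class and picks argmax_c p(x | c) P(c).
    p(x) is the same for every class, so it only matters when computing posteriors."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.log_priors = np.log([np.mean(y == c) for c in self.classes])
        return self

    def log_joint(self, X):
        # log p(x | c) + log P(c): rows are samples, columns are classes
        diff = X[:, None, :] - self.means[None, :, :]
        ll = -0.5 * (np.log(2 * np.pi * self.vars)[None, :, :] + diff ** 2 / self.vars[None, :, :]).sum(axis=2)
        return ll + self.log_priors[None, :]

    def posterior(self, X):
        # Bayes rule: divide by p(x) = sum_c p(x | c) P(c)
        lj = self.log_joint(X)
        lj -= lj.max(axis=1, keepdims=True)          # numerical stability
        p = np.exp(lj)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes[np.argmax(self.log_joint(X), axis=1)]

rng = np.random.default_rng(0)
music = rng.normal([0.0, 0.0], 1.0, size=(200, 2))    # synthetic "music" features
speech = rng.normal([5.0, 5.0], 1.0, size=(200, 2))   # synthetic "speech" features
X = np.vstack([music, speech])
y = np.array(["music"] * 200 + ["speech"] * 200)
clf = GaussianBayesClassifier().fit(X, y)
accuracy = np.mean(clf.predict(X) == y)
print(accuracy)
```

A discriminative model (e.g. logistic regression or an SVM) would instead learn the decision boundary directly without modelling p(x | c).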
Classification
‣ Genre/Style
‣ Emotion/Mood
‣ Artist
‣ Instrument
‣ MIREX 2007: 10 genres, 700 30-second clips per genre
Multi-tag annotation
‣ Free-form tags (e.g. "female voice", "woman singing")
‣ A multi-label classification problem with twists
‣ Issues: synonyms, subpart relations, sparse and noisy labels
‣ Cold start problem
‣ Typically each tag is treated independently as a binary classification problem
‣ The inverse is also interesting (query-by-keywords)
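Treating each tag independently can be sketched as one logistic-regression model per column of a binary song-by-tag matrix Y, trained with plain gradient descent; the learning rate, epoch count, and synthetic data are assumptions. Ranking songs by one tag's probability gives a simple version of the inverse task, query-by-keywords.

```python
import numpy as np

def train_tag_models(X, Y, lr=0.5, epochs=500):
    """X: (n_songs, n_features), Y: (n_songs, n_tags) in {0, 1}.
    Column t of the returned W is an independent logistic-regression model for tag t."""
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append a bias feature
    W = np.zeros((Xb.shape[1], Y.shape[1]))
    for _ in range(epochs):
        P = 1.0 / (1.0 + np.exp(-Xb @ W))          # per-tag probabilities
        W -= lr * Xb.T @ (P - Y) / len(X)          # gradient of the mean log-loss, per column
    return W

def tag_probabilities(X, W):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ W))

def query_by_keyword(X, W, tag_index):
    """Inverse task: song indices ordered by probability of the given tag."""
    return np.argsort(-tag_probabilities(X, W)[:, tag_index])

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))                                   # synthetic song features
Y = np.column_stack([X[:, 0] > 0, X[:, 1] > 0]).astype(float)   # two hypothetical tags
W = train_tag_models(X, Y)
accuracy = np.mean((tag_probabilities(X, W) > 0.5) == Y)
ranking = query_by_keyword(X, W, tag_index=0)
print(accuracy, ranking[:5])
```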
Polyphonic Audio-Score Alignment
‣ Representation: time series of chroma vectors
‣ Matching procedure: Dynamic Time Warping
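A minimal sketch of the chroma representation: the power spectrum of each frame is folded into 12 pitch classes with the equal-tempered mapping (A4 = 440 Hz, class 0 = C). The frame size, hop, and 55 to 5000 Hz frequency range are assumptions.

```python
import numpy as np

def chroma_vector(frame, sr, f_min=55.0, f_max=5000.0):
    """12-bin chroma of one frame, normalized to sum to 1 (octave information is discarded)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.arange(len(spectrum)) * sr / len(frame)
    valid = (freqs >= f_min) & (freqs <= f_max)
    midi = 69 + 12 * np.log2(freqs[valid] / 440.0)
    chroma = np.zeros(12)
    np.add.at(chroma, np.round(midi).astype(int) % 12, spectrum[valid])  # accumulate per pitch class
    return chroma / (chroma.sum() + 1e-12)

def chroma_sequence(x, sr, n_fft=4096, hop=2048):
    """Time series of chroma vectors: one row per frame."""
    n_frames = 1 + (len(x) - n_fft) // hop
    return np.stack([chroma_vector(x[i * hop:i * hop + n_fft], sr) for i in range(n_frames)])

sr = 22050
t = np.arange(sr) / sr
x = np.concatenate([np.sin(2 * np.pi * 440.0 * t),     # 1 s of A4
                    np.sin(2 * np.pi * 261.63 * t)])   # then 1 s of C4
C = chroma_sequence(x, sr)
print(C.shape, C[0].argmax(), C[-1].argmax())
```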
Dynamic Time Warping
‣ Example: aligned performances of the same orchestral piece
‣ Example: attempting to align two different orchestral pieces
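A textbook DTW sketch on two feature sequences (for audio-score alignment these would be chroma frames). It uses Euclidean frame distances and the three standard step directions; practical aligners often add step weights or path constraints, which are omitted here.

```python
import numpy as np

def dtw(X, Y):
    """X: (n, d), Y: (m, d) feature sequences.
    Returns the total alignment cost and the warping path as (i, j) pairs."""
    n, m = len(X), len(Y)
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)   # pairwise frame distances
    D = np.full((n + 1, m + 1), np.inf)                           # accumulated cost
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the optimal path
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: D[s])
        path.append((i - 1, j - 1))
    return D[n, m], path[::-1]

X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[0.0], [0.0], [1.0], [2.0], [2.0], [3.0]])   # same "piece", different timing
Z = np.array([[3.0], [3.0], [0.0], [1.0]])                 # a different "piece"
cost, path = dtw(X, Y)
cost_diff, _ = dtw(X, Z)
print(cost, path, cost_diff)
```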
Query-by-humming
‣ User sings a melody
‣ Computer searches a database for a song containing the melody
‣ The challenge of difficult queries
The MUSART system
‣ Query preprocessing: pitch contour extraction (audio), note segmentation (symbolic)
‣ Target preprocessing (symbolic): theme extraction, model-forming, representation
‣ Search to find an approximate match: Dynamic Time Warping, HMMs
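A toy version of the search step, not the MUSART implementation: it assumes the query has already been reduced to a pitch contour in Hz, represents each target theme as a hypothetical note sequence, removes the key by subtracting the median semitone value, and uses DTW to absorb tempo differences before ranking themes.

```python
import numpy as np

def midi_to_hz(midi):
    return 440.0 * 2.0 ** ((np.asarray(midi, dtype=float) - 69) / 12)

def to_relative_semitones(f0_hz):
    """Hz -> semitones, minus the median so that transposed versions match."""
    midi = 69 + 12 * np.log2(np.asarray(f0_hz, dtype=float) / 440.0)
    return midi - np.median(midi)

def dtw_cost(a, b):
    """DTW alignment cost between two 1-D contours, normalized by n + m."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def rank_themes(query_f0, themes):
    """Theme names ordered from best to worst match."""
    q = to_relative_semitones(query_f0)
    costs = {name: dtw_cost(q, to_relative_semitones(f0)) for name, f0 in themes.items()}
    return sorted(costs, key=costs.get)

# Hypothetical target themes (MIDI note numbers converted to Hz)
themes = {
    "theme_a": midi_to_hz([60, 62, 64, 65, 67]),
    "theme_b": midi_to_hz([67, 65, 64, 62, 60]),
    "theme_c": midi_to_hz([60, 60, 67, 67, 69, 69, 67]),
}
# Query: theme_a sung 3 semitones higher, with some notes held longer
query = midi_to_hz([63, 63, 65, 67, 67, 68, 70])
print(rank_themes(query, themes))
```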
Conclusions
‣ Through a combination of digital signal processing and machine learning techniques, a variety of music information retrieval tasks have been explored in the literature
‣ The tasks covered in this presentation are representative of existing work, and commercial implementations of them already exist. Many more tasks are actively being investigated.
‣ Music is a complex and fascinating signal, and we are just beginning to understand it better using computers