music analysis and retrieval for audio signals george tzanetakis postdoctoral fellow computer...

38
Music Analysis and Retrieval for Audio Signals eorge Tzanetakis ostDoctoral Fellow omputer Science Department arnegie Mellon University [email protected] http://www.cs.cmu.edu/~gtzan

Post on 30-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Music Analysis and Retrieval for Audio

Signals

George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University

[email protected]://www.cs.cmu.edu/~gtzan

Page 2: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Bits of the history of bits

01011110101010 Hello world

Understanding multimedia content =

Web Multimedia

Page 3: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Music

➢ 4 million recorded CDs➢ 4000 CDs / month ➢ 60-80% ISP bandwidth➢ Global ➢ Pervasive➢ Complex

Page 4: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

The not so far future of MIR

➢ Library of all recorded music ➢ Tasks: organize, search, retrieve,

classify recommend, browse, listen, annotate

➢ Examples:

Page 5: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Talk Outline

DancingHuman ComputerInteraction

HearingSignal Processing

UnderstandingMachine Learning

MP3 feature extractionDWT Beat Histograms

QBE similarity retrievalGenre Classification

Content & Context AwareTimbregrams Timbrespaces

Page 6: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Hearing -Feature extraction

FeatureSpace

Feature vector

Page 7: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Timbral Texture

Timbre = differentiate sounds of samepitch and loudness

Timbral Texture = differentiate mixtures of sounds(possibly with the same or similar rhythmic and pitch content)

Global, statistical and fuzzy properties

Page 8: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Time-domain waveform

Input

Time

?

Decompose to building blocks

Frequency

Time

Page 9: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Spectrum and Shape Descriptors

M

F

CentroidRolloffFlux BandwidthMoments....

Centroid

FeatureSpace

Feature vector

=

Page 10: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Time-Frequency Analysis Fourier Transform

f t = 12Æ

+ f Î e i Î t dt

f Î =+ f t e i Î t dt

ei¾= cos ¾ƒ i sin ¾

f x =n=0

ancos n x ƒn=0

bn sin n x

Page 11: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Short TimeFourier Transform

Time-varying spectraFast Fourier Transform FFT

Input

Time

t

t+1

t+2

Filters Oscillators

Output

Amplitude

Frequency

Page 12: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

STFT- Wavelets

È t È Î e 1/ 4Heisenberg uncertaintyTime – Frequency

Page 13: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

MPEG Audio Feature Extraction

AnalysisFilterbank

Psychoacoustics Model

Available bits

Perceptual Audio Coding (slow encoding, fast decoding)

MP3

Pye ICASSP 00 Tzanetakis & Cook ICASSP 00

Page 14: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Summary of Timbral Texture Features

➢ Time-Frequency analysis ➢ Signal processing (STFT, DWT)➢ Perceptual (MFCC, MPEG) ➢ Statistics over “texture” window

Page 15: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Rhythm

➢ Rhythm = movement in time➢ Origins in poetry (iamb, trochaic...)➢ Foot tapping ➢ Hierarchical semi-periodic structure ➢ Linked to motion➢ Running vs global

Page 16: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Wavelet-basedRhythm Analysis

Tzanetakis et al AMTA01Goto, Muraoka CASA98Foote, Uchihashi ICME01Scheirer JASA98

InputSignal

Full Wave Rectification - Low Pass Filtering - Normalization

+

Peak Picking

Beat Histogram

Envelope Extraction

DWT

Autocorrelation

Page 17: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Beat HistogramsTzanetakis et al AMTA01

max(h(i)), argmax(h(i))

Page 18: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Musical Content Features

➢ Timbral Texture (19) ➢ Spectral Shape ➢ MFCC (perceptually motivated features,

ASR)➢ Rhythmic structure (6)

➢ Beat Histogram Features➢ Harmonic content (5)

➢ Pitch Histogram Features

Page 19: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Understanding

Musical Piece

Trajectory Point

Page 20: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Query-by-Example Content-based Retrieval

Rank List Collection of 3000 clips

Page 21: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Automatic Musical Genre Classification

➢ Categorical music descriptions created by humans

➢ Fuzzy boundaries➢ Statistical properties

➢ Timbral texture, rhythmic structure, harmonic content

➢ Evaluate musical content features➢ Structure audio collections

Page 22: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

p( | ) * P( )

Statistical Supervised Learning

Decision boundary

Partitioning of feature space

P( | )= p( )

MusicSpeech

Page 23: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Non-parametric classifiers

p( | ) * P( ) P( | )=

p( )

Nearest-neighbor classifiers(K-NN)

Page 24: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Parametric classifiers

p( | ) * P( ) P( | )=

p( )

Gaussian Classifier

Gaussian Mixture Models

Page 25: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Classification Evaluation – 10 genres

Automatic (different collection)

Gaussian Mixture Model (GMM)10-fold cross-validation 61% (70%)

Perrot & Gjerdingen, M.Cognition 99 Tzanetakis & Cook, TSAP 10(5) 2002

0

10

20

30

40

50

60

70

80

90

100

Classification Accuracy

Random

Automatic

Manual (52 subjects)

0.25 seconds 40% 3 seconds 70%

0

10

20

30

40

50

60

70

80

90

100

Classification Accuracy

Random

0.25

3

Page 26: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

GenreGram DEMO

Dynamic real time 3D display for classification of radio signals

Page 27: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Audio Segmentation

➢ Segmentation = changes of sound "texture"

Music Male Voice Female Voice

News:

Page 28: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Multifeature Segmentation Methodology

➢ Time series of feature vectors V(t)➢ f(t) = d(V(t), V(t-1))

➢ D(x,y) = (x-y)C-1(x-y)t (Mahalanobis)➢ df/dt peaks correspond to texture

changes

Tzanetakis & Cook, WASPAA 99

Page 29: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Interaction

➢ Automatic results not perfect➢ Music listening subjective➢ Browsing vs retrieval ➢ Adapt UI to audio “Content & Context”

➢ Computer Audition➢ Visualization

Cooledit

Page 30: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Content and Context

➢ Content ~ file➢ Genre, male voice, high frequency

➢ Context ~ file and collection➢ Similarity ➢ Slow – fast

➢ Multiple visualizations ➢ Same content, different context

Billie Holiday

Ella Fitzerald

Christina Aguilera

Page 31: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Principal Component Analysis

Projection matrix

PCAEigenanalysisof collection correlation matrix

Page 32: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Timbregrams and Timbrespaces

PCA = content & context

Tzanetakis & Cook DAFX00, ICAD01

Page 33: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Integration

Page 34: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Implementation

➢ MARSYAS : free software framework for computer audition research

➢ Server in C++ (numerical signal processing and machine learning)

➢ Client in JAVA (GUI) ➢ Linux, Solaris, Irix and Wintel (VS , Cygwin)

➢ Apr. 5500 downloads, 2300 different hosts, 30 countries since March 2001

Tzanetakis & Cook Organized Sound 4(3) 00

Page 35: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Marsyas users

Desert Island

Jared HoberockDan KellyBen Tietgen

Marc Cardle

Music-drivenmotion editing

Real timemusic-speechdiscrimination

Page 36: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Current Work-Collaborations

➢ CMU➢ Structural analysis ➢ Query -by-humming➢ MIR over P2P networks (Jun Gao) ➢ Informedia

➢ Princeton: sound fx analysis-synthesis(P.Cook)

➢ Rochester: machine learning (Tao Li)➢ Northwestern: perception of musical genre

(R.Gjerdingen)

Tzanetakis, Hu and Dannenberg, WIAMIS 03

Page 37: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Future Work

➢ Music➢ Singer identification➢ Chord progression detection➢ Intermediate representations

➢ Motion capture signals➢ Biological signals – time series in

general➢ Content and context aware

multimedia UI

Page 38: Music Analysis and Retrieval for Audio Signals George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University gtzan@cs.cmu.edu

Auditory Scene AnalysisAlbert Bregman