pitch-based representations, analysis and applicationspitch-based representations, analysis and...

121
1 Copyright 2011 G.Tzanetakis Pitch-based representations, analysis and applications George Tzanetakis ([email protected] ) Associate Professor Canada Research Chair (Tier II) Computer Science Department (also in Music, ECE) University of Victoria, Canada

Upload: others

Post on 29-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

1 Copyright 2011 G.Tzanetakis

Pitch-based representations, analysis and applications

George Tzanetakis ([email protected]) Associate Professor Canada Research Chair (Tier II) Computer Science Department (also in Music, ECE) University of Victoria, Canada

Page 2: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

2 Copyright 2011 G.Tzanetakis

Traditional Music Representations

Page 3: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

3 Copyright 2011 G.Tzanetakis

Pitch content Ø  Harmony, melody = pitch concepts

Ø  Music Theory Score = Music

Ø  Bridge to symbolic MIR

Ø  Automatic music transcription

Ø  Non-transcriptive arguments Split the octave to discrete logarithmically spaced intervals

Page 4: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

4 Copyright 2011 G.Tzanetakis

MIDI Ø  Musical Instrument Digital Interfaces

Ø  Hardware interface Ø  File Format

Ø  Note events

Ø  Duration, discrete pitch, "instrument" Ø  Extensions

Ø  General MIDI Ø  Notation, OMR, continuous pitch

Page 5: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

5 Copyright 2011 G.Tzanetakis

Representations Ø  Score

Ø  Discrete, high level abstraction, explicit structure, no performance info

Ø  MIDI Ø  Discrete, medium level of abstraction, explicit

time but less structure, targeted to keyboard performance

Ø  Audio Ø  Continuous, low level abstraction, timing and

structure implicit

Page 6: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

6 Copyright 2011 G.Tzanetakis

Psychoacoustics

Ø  Scientific study of sound perception

Ø  Frequently limits of perception

Ø  Range (20Hz – 20000Hz) Ø  Intensity (0dB-120dB) Ø  Masking

Ø  Missing fundamental (2xf, 3xf, 4xf) give humans the impression of 1xf pitch

Page 7: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

7 Copyright 2011 G.Tzanetakis

Pitch Detection

P

Time-domain Frequency-domain Perceptual

Rhythm -> ~20 Hz Pitch (courtesy of R.Dannenberg – Nyquist)

Pitch is a PERCEPTUAL attribute correlated but not equivalent to fundamental frequency

Page 8: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

8 Copyright 2011 G.Tzanetakis

Pitch Perception I

Ø  Pitch is not just fundamental frequency

Ø  Periodicity or harmonicity or both ?

Ø  Human judgements (adjust sine method)

Ø  1924 Fletcher – harmonic partials missing fundamental (pitch is still heard)

Ø  Examples: phone, small radio Ø  Terhardt (1972), Licklider (1959)

Ø  duplex theory of pitch (virtual & spectral pitch)

Page 9: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

9 Copyright 2011 G.Tzanetakis

Pitch Perception II

Ø  One perception – two overlapping mechanisms

Ø  Counting cycles of period < 800Hz Ø  Place of excitation along basilar membrane >

1600 Hz

FFT Pick sinusoids weighting / masking candidate generation

common divisors most likely pitch

Page 10: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

10 Copyright 2011 G.Tzanetakis

Time Domain

C4 Clarinet Note C4 Sine Wave

# zero-crossings sensitive to noise – needs LPF

Page 11: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

11 Copyright 2011 G.Tzanetakis

AutoCorrelation

Efficient computation possible for powers of 2 using FFT

F(f) = FFT(X(t)) S(f) = F(f) F*(f) R(l) = IFFT(S(f))

Page 12: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

12 Copyright 2011 G.Tzanetakis

Average Magnitude Difference Function

No multiplies – more efficient for fixed point

Page 13: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

13 Copyright 2011 G.Tzanetakis

Frequency Domain

Fundamental frequency (as well as pitch) will correspond to peaks in the Spectrum. The fundamental does not necessarily have the highest amplitude.

Sine C4 Clarinet C4

Page 14: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

14 Copyright 2011 G.Tzanetakis

Multiple Pitch Detection

> 1kHz

< 1kHz

Half-wave Rectifier LowPass

Periodicity Detection

Periodicity Detection

SACF Enhancer

Pitch Candidates

Tzanetakis et al, ISMIR 01 Tolonen and Karjalainen, TSAP00

Page 15: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

15 Copyright 2011 G.Tzanetakis

Plotting over Time

A4, B4, C4

Page 16: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

16 Copyright 2011 G.Tzanetakis

Polyphonic Transcription

Original Transcribed

Mixture signal Noise Suppression

Klapuri et al, DAFX 00

Predominant pitch estimation

Remove detected sound

Estimate # voices iterate

Page 17: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

17 Copyright 2011 G.Tzanetakis

Perceptual Pitch

Scales Ø  Perception of frequency

Ø  Various perceptual scales based on JND

Ø  Typically linear below a break frequency Fb and logarithmic above

Ø  Popular choices for break frequency (1000, 700, 625, 228)

Ø  What the hell is a mel ?

Page 18: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

18 Copyright 2011 G.Tzanetakis

Mel Mapping

Page 19: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

19 Copyright 2011 G.Tzanetakis

Musical Pitch

Ø  Tuning = different ways of subdividing the octave logarithmically (as ratios) into intervals

Ø  Tension between harmonic ratios, modulation to different keys, regularity, pure fifths (ratio of 1.5 or 3:2)

> Many tuning systems have been explored through history

Page 20: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

20 Copyright 2011 G.Tzanetakis

Tuning systems

Ø  Just intonation (1:1, 9:8, 5:4, 4:3, 3:2, 5:3, 15:8, 2:1)

Ø  Pythagorean tuning all notes derives from 3:2 (1:1, 256:243, 9:8,…)

Ø  Equal temperament

Ø  All notes spaced by logarithmically equal distances (100 cents). Each step is higher by 21/12 (1.0594) from previous.

Page 21: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

21 Copyright 2011 G.Tzanetakis

Notation

Ø  A, B, C, D, E, F, G

Ø  Number indicate octave

Ø  A4 is 440Hz and MIDI number 69 Ø  Do, Re, Mi, Fa, Sol, La, Ti

Ø  MIDI (0-128)

Ø  m = 69 + 12 log2(f/440)

Page 22: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

22 Copyright 2011 G.Tzanetakis

Pitch Helix

Linear pitch (i.e log(frequency) is wrapped around a cylinder – in order to model the octave equivalence. Pitch perception has two dimensions: Height: naturally organizes pitches from low to high Chroma: represents the inherent circularity of pitch

Page 23: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

23 Copyright 2011 G.Tzanetakis

Pitch Histograms

C G C G

(7 * c ) mod 12

Circle of 5s

Page 24: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

24 Copyright 2011 G.Tzanetakis

Chroma

Page 25: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

25 Copyright 2011 G.Tzanetakis

Calculating Pitch Profiles – Warping

Ø  Calculate FFT of a signal segment

Ø  Map each FFT bin to Hertz

Ø  512 time domain samples -> 256 FFT bins @ 22050 Hz. Each bin will be 11025/256 ~= 43 Hz

Ø  f = k * (srate / fft_size)

Ø  Map each bin (in Hertz) to MIDI:

Ø  m = 69 + 12 log2(f/440)

Page 26: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

26 Copyright 2011 G.Tzanetakis

Pitch Histogram

Ø  Average amplitudes of bins mapping to the same MIDI note number

(different averaging shapes can be used)

Ø  If desired fold the resulting histogram, collapsing bins that belong to the same pitch class into one

Ø  Frequently more than 12 bins per octave to account for tuning/performance variations

Page 27: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

27 Copyright 2011 G.Tzanetakis

Chroma Profiles

Sine C4 Clarinet C4

0 bin is A and spacing is chromatic

Page 28: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

28 Copyright 2011 G.Tzanetakis

Chromagrams

Page 29: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

29 Copyright 2011 G.Tzanetakis

Time Alignment

Ø  ARTHUR (Foote 2000)

Ø  Two sequences of energy contours corresponding to two performances of the same symphony

Ø  We are given two pitch sequences of the same melody sung by different singers

Ø  How can we find if they match ?

Page 30: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

30 Copyright 2011 G.Tzanetakis

Dynamic Time Warping

Page 31: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

31 Copyright 2011 G.Tzanetakis

Music Representations

Ø  Symbolic Representation –  easy to manipulate –  “flat” performance

Ø  Audio Representation –  expressive

performance –  opaque & unstructured

Align

POLYPHONIC AUDIO AND MIDI ALIGNMENT

Page 32: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

32 Copyright 2011 G.Tzanetakis

Similarity Matrix

Similarity Matrix for Beethoven’s 5th Symphony, first movement

Optimal Alignment

Path

Oboe solo: • Acoustic Recording • Audio from MIDI

(Duration: 6:17)

(Dur

atio

n: 7

:49)

POLYPHONIC AUDIO AND MIDI ALIGNMENT

Page 33: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

33 Copyright 2011 G.Tzanetakis

Performance matching

A B C D

Power plot

24d “pitch” vectors from FFT

Nearest neighbor with Locality-Sensitive Hashing Identical, different copy, different vocals, different performance (80%)

Characteristic sequence

Yang, WASPAA 99

Foote, ISMIR 00

Dynamic programming Symphonies

Page 34: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

34 Copyright 2011 G.Tzanetakis

Structural Analysis

Ø  Similarity matrix

Ø  Representations

Ø  Notes Ø  Chords Ø  Chroma

Ø  Greedy hill-climbing algorithm

Ø  Recognize repeated patterns Ø  Result = AABA (explanation)

Dannenberg & Hu, ISMIR 2002 Tzanetakis, Dannenberg & Hu, WIAMIS 03

Page 35: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

35 Copyright 2011 G.Tzanetakis

Similarity Matrices

Satin Doll - MIDI Satin Doll – Audio-from-midi

Page 36: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

36 Copyright 2011 G.Tzanetakis

An example – Naima

Page 37: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

37 Copyright 2011 G.Tzanetakis

Perception-based approaches

Ø  Pitch perception Ø  Loudness percetion Ø  Critical Bands Ø  Mel-Frequency Cepstral Coefficients Ø  Masking Ø  Perceptual Audio Compression (MPEG)

Page 38: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

38 Copyright 2011 G.Tzanetakis

The Human Ear

Pinna Auditory canal Ear Drum Stapes-Malleus-Incus (gain control) Cochlea (freq. analysis) Auditory Nerve ?

Wave travels to cutoff slowing down increasing in amplitude power is absorbed

Each frequency has a position of maximum displacement

Page 39: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

39 Copyright 2011 G.Tzanetakis

Masking

Two frequencies -> beats -> harsh -> seperate

Inner Hair Cell excitation Frequency Masking Temporal Masking

Pairs of sine waves (one softer) – how much weaker in order to be masked ? (masking curves) wave of high frequency can not mask a wave of lower frequency

Page 40: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

40 Copyright 2011 G.Tzanetakis

Mel Frequency Cepstral Coefficients

Mel-scale 13 linearly-spaced filters 27 log-spaced filters

CF CF-130 CF / 1.0718

CF+130 CF * 1.0718

Mel-filtering

Log

DCT

MFCCs

Page 41: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

41 Copyright 2011 G.Tzanetakis

Discrete Cosine Transform Ø  Strong energy compaction

Ø  For certain types of signals approximates KL transform (optimal)

Ø  Low coefficients represent most of the signal - can throw high

Ø  MFCCs keep first 13-20

Ø  MDCT (overlap-based) used in MP3, AAC, Vorbis audio compression

Page 42: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

42 Copyright 2011 G.Tzanetakis

MPEG Audio Feature Extraction

Analysis Filterbank

Psychoacoustics Model

Available bits

Perceptual Audio Coding (slow encoding, fast decoding)

MP3

Pye ICASSP 00 Tzanetakis & Cook ICASSP 00

Page 43: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

43 Copyright 2011 G.Tzanetakis

Psychoacoustic Model

Ø  Each band is quantized

Ø  Quantization introduces noise

Ø  Adapt the quantization so that it is inaudible

Ø  Take advantage of masking

Ø  Hide quantization noise where it is masked Ø  MPEG standarizes how the quantized bits are

transmitted not the psychoacoustic model – (only recommended)

Page 44: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

44 Copyright 2011 G.Tzanetakis

HMM segmentation

Hidden

p( | )

Observed

Model

1 2

P( | )

3 4 5

t t-1

Aucouturier & Sandler, AES 01

Page 45: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

45 Copyright 2011 G.Tzanetakis

Locating singing voice segments

Berenzweig & Ellis, WASPAA 99

Multi-layer perceptron 2000 hidden units 54 phone classes

16 msec p(phone class)

80% accuracy

Page 46: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

46 Copyright 2011 G.Tzanetakis

“Classic” multi-stage approach

Grouping Cue 1

Time-Frequency representation

Short Time Fourier Transform Discrete basis: windowed sine waves

Grouping Cue 2

Partial Tracking (McAuley & Quatieri)

Sound source formation: grouping of partials based on harmonicity

PROBLEMS: Difficult to decide ordering, brittle

Page 47: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

47 Copyright 2011 G.Tzanetakis

Spectral Clustering

Ø  Alternative to traditional point-based algorthms (k-means)

Ø  Doesn’t assume convex shapes

Ø  Doesn’t assume Gaussians

Ø  Avoid multiple restarts

Ø  Eigenstructure of similarity matrix

Page 48: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

48 Copyright 2011 G.Tzanetakis

Sound Source Separation using Spectral Clustering

Page 49: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

49 Copyright 2011 G.Tzanetakis

Comparison with partial tracking

MacAuly and Quatieri Tracking of Partials

Proposed Approach

Page 50: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

50 Copyright 2011 G.Tzanetakis

050

100150

200 0500

1000

15002000

25000

0.05

0.1

0.15

0.2

0.25

Frequency (Hz)Time (frames)

Amplitude

Cluster 0Cluster 1

050

100150

200 0500

1000

15002000

25000

0.05

0.1

0.15

0.2

0.25

Frequency (Hz)Time (frames)

Amplitude

Cluster 0Cluster 1

Synthetic Mixtures of Instruments

4-note mixture

Instrumentation detection based on timbral models Martins, et al, ISMIR07

Page 51: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

51 Copyright 2011 G.Tzanetakis

“Real world” separation results

Ø  Mirex database

Ø  Live U2

More examples: http://opihi.cs.uvic.ca/NormCutAudio http://opihi.cs.uvic.ca/Dafx2007

Page 52: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

52 Copyright 2011 G.Tzanetakis

Playlist generation

(s1,s2,s3, ... , sn) 20% slow songs, 80% fast, female jazz singers

Tewfik, ICASSP 99 Pachet, IEEE Multimedia00

Constraint-satisfaction problem Smooth transitions

Technical attributes (artist, album, name) Content attributes (jazz singer, brass)

Page 53: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

53 Copyright 2011 G.Tzanetakis

Audio Thumbnailing

Ø  Representative short summary of piece

Ø  Segmentation-based Ø  Repetition-based

Ø  Hard to evaluate

Page 54: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

54 Copyright 2011 G.Tzanetakis

Segmentation-based Thumbnailing

Ø  Begin and end times of a 2 second thumbnail that best represents the segment

" 62% first two seconds of the segment " 92% two seconds within the first five seconds

of the segment Ø  Automatic thumbnailing

" first 5 sec + best effort about 80% "correct"

Page 55: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

55 Copyright 2011 G.Tzanetakis

Repetition-based thumbnailing

Thumbnail = maximum repeated segment

Logan, B., ICASSP 00 Bartch and Wakefield,WASPAA99

Alternatives: Clustering, HMM

Page 56: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

56 Copyright 2011 G.Tzanetakis

Structure from similarity

Feature vector trajectory Correlation at various time lags

Foote et al, ISMIR 02 Dannenberg et al, ICMC 02

ABAA'

Page 57: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

57 Copyright 2011 G.Tzanetakis

Query-by-humming

Ø  User sings a melody

Ø  Computer searches database for song containing the melody

Ø  Probably less useful than it sounds but interesting problem

Ø  The challenge of difficult queries

Page 58: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

58 Copyright 2011 G.Tzanetakis

The MUSART system

Ø  Query preprocessing

Ø  Pitch contour extraction (audio) Ø  Note segmentation (symbolic)

Ø  Target preprocessing (symbolic)

Ø  Theme extraction Ø  Model-forming, representation

Ø  Search to find approximate match

Ø  Dynamic Programming, HMMs

Page 59: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

59 Copyright 2011 G.Tzanetakis

Representations

Ø  Pitch and tempo invariance

Ø  Quantized pitch intervals Ø  Quantized IOI ratio

Ø  Approximate matching

Ø  HMM Ø  Dynamic programming Ø  Time Series

Page 60: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

60 Copyright 2011 G.Tzanetakis

Audio Fingerprinting and Watermarking

Ø  Watermarking

Ø  Copyright protection Ø  Proof of ownership Ø  Usage policies

Ø  Metadata hiding

Ø  Fingerprinting

Ø  Tracking Ø  Copyright protection Ø  Metadata linking

Page 61: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

61 Copyright 2011 G.Tzanetakis

Watermarking

Ø  Steganography (hiding information in messages – invisible ink )

Watermark Embedder

watermark data

key

signal repres.

transmission attacks

Watermark Extractor

key

watermark data

(original music)

Page 62: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

62 Copyright 2011 G.Tzanetakis

Desired Properties

Ø  Perceptually hidden (inaudible) Ø  Statistically invisible Ø  Robust against signal processing Ø  Tamper resistant Ø  Spread in the music, not in header Ø  key dependent

Page 63: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

63 Copyright 2011 G.Tzanetakis

Representations for Watermarking

Ø  Basic Principles

Ø  Psychoacoustics Ø  Spread Spectrum

Ø  redundant spread of information in TF plane Ø  Representations

Ø  Linear PCM

Ø  Compressed bitstreams Ø  Phase, stereo Ø  Parametric representations

Page 64: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

64 Copyright 2011 G.Tzanetakis

Watermarking on parametric representations

Yi-Wen Liu J. Smith 2004

message

parameters p

QIM p' Audio

Synthesis Attack

Parameter Estimation Min dist

decoding

W

W' p'' Choose attack tolerance quantize so perceptual distortion < t and lattice finding possible

Page 65: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

65 Copyright 2011 G.Tzanetakis

Problems with watermarking

Ø  The security of the entire system depends on devices available to attackers

Ø  Breaks Kerckhoff's Criterion: A security system must work even if reverse-engineered

Ø  Mismatch attacks

Ø  Time stretch audio – stretch it back (invertible)

Ø  Oracle attacks

Ø  Poll watermark detector

Page 66: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

66 Copyright 2011 G.Tzanetakis

Audio Fingerprinting

Ø  Each song is represent as a fingerprint (small robust representation)

Ø  Search database based on fingerprint

Ø  Main challenges

Ø  highly robust fingerprint extraction Ø  efficient fingerprint search strategy

Ø  Information is summarized from the whole song – attacks degrade unlike watermarking

Page 67: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

67 Copyright 2011 G.Tzanetakis

Hash functions

Ø  H(X) -> maps large X to small hash value

Ø  compare by comparing hash value

Ø  Perceptual hash function ?

Ø  impossible to get exact matching Ø  Perceptually similar objects result in similar

fingerprints

Ø  Detection/false alarm tradeoff

Page 68: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

68 Copyright 2011 G.Tzanetakis

Properties

Ø  Robustness

Ø  Reliability

Ø  Fingerprint size

Ø  Granularity

Ø  Search speed and scalability

Page 69: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

69 Copyright 2011 G.Tzanetakis

Fraunhofer

Ø  LLD Mpeg-7 framework (SFM)

Ø  Vector quantization (k-means)

Ø  Codebook of representative vectors Ø  Database target signature is the codebook

Ø  Query -> sequence of feature vectors

Ø  Matching by finding “best” codebook

Ø  Robust not very scalable (O(n) search))

Allamanche Ismir 2001

Page 70: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

70 Copyright 2011 G.Tzanetakis

Philips Research

Ø  32-bit subfingerprints for every 11.6 msec

Ø  overlapping frames of 0.37 seconds (31/32 overlap)

Ø  PSD -> logarithmic band spacing (bark)

Ø  bits 0-1 sign of energy

Ø  looks like a fingerprint

Ø  assume one fingerprint perfect – hierarchical database layout (works ok)

Haitsa & Kalker Ismir 2002

Page 71: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

71 Copyright 2011 G.Tzanetakis

Shazam Entertainment

Ø  Pick landmarks on audio – calculate fingerprint

Ø  histogram of relative time differences for filtering

Ø  Spectrogram peaks (time, frequency)

Page 72: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

72 Copyright 2011 G.Tzanetakis

Spectrogram Peaks

Very robust – even over noisy cell phones

Page 73: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

73 Copyright 2011 G.Tzanetakis

Audio Fingerprinting

Music piece

Signature Database

Signature 1.5 million Matching

Robustness

300 bytes

1sec

20msec

Copyright, metadata

moodlogic.net

Page 74: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

STEREO PANNING FEATURES FOR CLASSIFYING

RECORDING PRODUCTION STYLE

George Tzanetakis [email protected]

Randy Jones

Kirk McNally

ISMIR 2007 8th International Conference on Music Information Retrieval

23rd – 27th September 2007

Vienna, Austria

Page 75: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

75 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Motivation

•  The “classic” audio MIR system

•  Mixing and production are critical in modern recordings

–  The Producer as a Composer - Virgil Moorefield, MIT Press, 2005

–  “Famous” record producers

•  Phil Spector, George Martin, Quincy Jones, Brian Eno

•  Can recording production information and more specifically stereo mixing information assist the automatic extraction of information from audio signals ?

Convert2Mono � Feature�Extraction �

Machine�Learning �

Page 76: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

76 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Related Work

•  The “album” effect

–  Artist identification performance degrades when different albums are used for training & evaluation (Whitman, IEEE NNSP, 2003)

–  Compensate mastering equalization curves (Kim, ISMIR 2006)

•  “Glass” ceiling of timbral features

–  (Aucouturier, Journal of Negative Results in Speech and Audio 2004)

•  Stereo-based source separation

–  For each FFT bin calculate panning coefficient

–  Group together bins that have similar panning coefficients as belonging to the same sound source

–  (Avendano, WASPAA 2003) (Woodruff, ISMIR 2003)

Page 77: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

77 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Stereo Panning Spectrum

Based on �(Avendano, WASPAA 2003)�

For every FFT bin a �panning coefficient �

between -1 (left) and +1 (right)�with the center at 0 �

Page 78: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

78 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Example I

Hell’s Bells - ACDC (black left, white right)�Important note: Just panning information �(invariant to frequency content/dynamics)�

Page 79: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

79 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Example II

Supervixen by Garbage (black left, white right)�

Page 80: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

80 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Stereo Panning Spectrum Features

Panning RMS for subbands�Low (0-250 Hz) �Medium (250-2500 Hz)�High (2500-22050 Hz) �

Texture-window features �For dynamics �

Final features are the means and standard deviations of �the entire audio clip resulting in a 4*2*2 = 16 dimensional vector / audio clip �

Page 81: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

81 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Experiments

•  Collections

–  1960s “garage” (The Byrds, The Kinks, Buddy Holly)

–  1980s “grunge” (Nirvana, PearlJam, RadioHead)

–  Acoustic Jazz (Miles Davis, John Coltrange, Wynton Marsalis)

–  Electric Jazz (Return to Forever, Weather Report, Mahavishnu Orchestra)

•  Configurations

–  STEREO Stereo Panning Spectrum Features (SPSF)

–  STEREO MFCC (concatenate MFCCs for left and right)

–  STEREO (SPSF+MFCC)

Page 82: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

82 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Results (10-fold cross-validation)

� ��

Naïve Bayes Classifier�Linear SVM trained �

Using SMO�Decision Tree �

Page 83: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

83 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Some more results

Mean Average Panning Histograms�

Acoustic Jazz� Electric Jazz�

Artist20 Id �with album folds�(not in paper) �

Page 84: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

84 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

MIREX 2007 results

•  Music Information Retrieval Evaluation Exchange (MIREX)

–  Annual forum for comparing algorithms of different MIR tasks

–  2007 “classification tasks”

•  Audio Artist Identification, Classical Composer Identification

•  Audio Genre Classification, Audio Mood Classification

Page 85: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

85 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Implementation

http://marsyas.sourceforge.net ���

bextract -e STEREOSPS /pathto/mirexcollection.txt -w fmatrix.arff�

�bextract -e STEREOMFCC /pathto/mirexcollection.txt -w fmatrix.arff�

bextract -e STEREOSPSMFCC /pathto/mirexcollection.txt -w fmatrix.arff�

Page 86: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

86 ISMIR 2007 – Stereo Panning Features for Classifying Recording Production Style

Future Work & Acknowledgements

•  Characterize more generally recording production style

•  Audio-based record producer identification

–  Maybe a future MIREX task ?

•  Dave Pensado describing one of his mixes (Mix Magazine)

–  “Massive club bottom, hiphop sensibility in the bottom, and this real smoothed-out, classy Quincy Jones types top”

–  Can we move this type of description into audio MIR ?

•  Many thanks to:

–  NSERC, SSHRC Canada

–  Carlos Avendano for clearly describing his algorithm

–  Dan Ellis for providing stereo files for the ArtistID 20

–  Perry Cook for suggesting using stereo information a long time ago

Page 87: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

87 Copyright 2011 G.Tzanetakis

Computational Musicology - recitation research

Ø  Transition from oral to written transmission

Ø  Study how diverse recitation traditions having their origin in primarily non-notated melodies, later became codified

Ø  Hungarian siratok, torah cantillation, koran recitation, 10th century St. Gallen plainchant

Page 88: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

88 Copyright 2011 G.Tzanetakis

Pitch Contour Extraction

Page 89: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

89 Copyright 2011 G.Tzanetakis

Histogram-based contour abstraction

Page 90: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

90 Copyright 2011 G.Tzanetakis

Dynamic-Time Warping

Page 91: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

91 Copyright 2011 G.Tzanetakis

Cantillion http://cantillion.sness.net

Page 92: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

92 Copyright 2011 G.Tzanetakis

Retrieval

Page 93: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

93 Copyright 2011 G.Tzanetakis

Retrieval at different levels of abstraction

Page 94: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

94 Copyright 2011 G.Tzanetakis

Marsyas Overview Ø  Software framework for audio analysis, synthesis and retrieval

Ø  Efficient and extensible framework design

Ø  specific emphasis on Music Information Retrieval (MIR)

Ø  C++, OOP Ø  Multiplatform (Linux, MS Windows®, MacOSX®, …)

Ø  Provides a variety of building blocks for performing common audio tasks:

Ø  soundfile IO, audio IO, signal processing and machine learning modules

Ø  blocks can be combined into data flow networks that can be modified and controlled dynamically while they process data in soft real-time.

WAV source

FFT

KNN

WAV source Hanning FFT

Page 95: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

95 Copyright 2011 G.Tzanetakis

Page 96: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

96 Copyright 2011 G.Tzanetakis

Marsyas Overview

Ø  Marsyas Brief History

Ø  1998 ~2000

Ø  Created by George Tzanetakis during his PhD activities at Princeton Ø  2000 ~2002

Ø  Marsyas 0.1 Ø  First stable revisions of the toolkit Ø  Distributions hosted at SourceForge Ø  Creation of a developer community

Ø  User and Developer Mailing lists Ø  2002 ~ …

Ø  Marsyas 0.2 Ø  Major framework revision Ø  SourceForge SubVersion

Page 97: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

97 Copyright 2011 G.Tzanetakis 04.17.2009 Open Source Software Framework for Audio Research

Related Work Context

Ø  Open Source frameworks Ø  CLAM (http://clam.iua.upf.edu/)

Ø  STK (http://ccrma.stanford.edu/software/stk/)

Ø  Chuck (http://chuck.cs.princeton.edu/)

Ø  PureData (Pd) (http://crca.ucsd.edu/~msp/software.html)

Ø  Open Sound Control (OSC) (http://cnmat.berkeley.edu/OpenSoundControl/)

Commercial toolkits

Ø  MAX/MSP® (http://www.cycling74.com/)

Ø  MATLAB® Simulink® (http://www.mathworks.com/products/simulink/)

Ø  LabView® (http://www.ni.com/labview/)

Ø  DirectShow® GraphEdit (http://windowssdk.msdn.microsoft.com/en-us/library/ms787460.aspx)

Page 98: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

98 Copyright 2011 G.Tzanetakis

Statistics

Page 99: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

99 Copyright 2011 G.Tzanetakis

Statistics

Page 100: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

100 Copyright 2011 G.Tzanetakis

Statistics

Page 101: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

101 Copyright 2011 G.Tzanetakis

Statistics

Page 102: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

102 Copyright 2011 G.Tzanetakis

Users & Applications

Ø  Musicream (Masataka Goto)

Ø  Music playback system with similarity capabilities

Ø  Uses Marsyas as its music similarity engine

http://staff.aist.go.jp/m.goto/Musicream/

Page 103: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

103 Copyright 2011 G.Tzanetakis

Users & Applications

http://www.cs.princeton.edu/sound/software/sndpeek/

Page 104: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

104 Copyright 2011 G.Tzanetakis

Users and Applications

Joao Oliveira, Fabien Gouyon, and Luis Paulo Reis INESC Porto, Portugal

Page 105: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

105 Copyright 2011 G.Tzanetakis

Users and Applications

MusieMood Vladimir Kim Steven Bergner Torsten Moller Simon Fraser Univ. Vancouver, Canada

Page 106: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

106 Copyright 2011 G.Tzanetakis

Usage Scenarios

Ø  Marsyas command line tools Ø  Efficient Ø  Execute in real-time (when applied) Ø  No library dependencies Ø  Tools and examples:

Ø  sfplay

Ø  bextract

Ø  phasevocoder

Ø  sfplugin

Ø  …

Page 107: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

107 Copyright 2011 G.Tzanetakis

Usage Scenarios

Ø  Some examples: > bextract -e STFTMFCC music.mf speech.mf -p ms.mpl -w myweka.arff

> sfplugin –p ms-mpl unknownAudioSignal.wav

Page 108: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

108 Copyright 2011 G.Tzanetakis

Usage Scenarios

Ø  Qt4 User Interfaces MarGrid2 MarPlayer MarPhasevocoder MarNetworkWidget …

Page 109: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

109 Copyright 2011 G.Tzanetakis

MIREX 2008 results

Page 110: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

110 Copyright 2011 G.Tzanetakis

MIREX 2008 Results

Page 111: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

111 Copyright 2011 G.Tzanetakis

Architecture

Ø  Biggest challenge: expressivity without sacrificing efficiency

Ø  Compile-time

Ø  Definition of processing blocks

Ø  Run-time

Ø  Assembling network

Ø  Passing data through it

Ø  Changing behavior through controls

Page 112: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

112 Copyright 2011 G.Tzanetakis

Architecture

Ø  Marsyas 0.2 Ø  New dataflow model of audio computation

hierarchical messaging system used to control the dataflow network (inspired on Open Sound Control (OSC) )

Ø  general matrices instead of 1-D arrays as data

Processing 1

Processing 2

Composite/Parallel

SinkSource

Composite/Series

Page 113: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

113 Copyright 2011 G.Tzanetakis

Architecture

FFT512 smpls * 1 obs

22050 Hz

22050 / 512 Hz

1 smpls * 512 obs

Ø  MarSystem Slices

Page 114: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

114 Copyright 2011 G.Tzanetakis

Architecture

Source

Fanout

F1

F2

F3

Destination

IMPLICIT PATCHING

Source

F1

F2

F3

Destination

EXPLICIT PATCHING # EXPLICIT PATCHING: source, F1, F2, F3, destination; # Connect the in/out ports connect(source, F1); connect(source, F2); connect(source, F3); connect(F1, destination); connect(F2, destination); connect(F3, destination); # IMPLICIT PATCHING source, F1, F2, F3, destination; Fanout mix; mix.add([F1, F2, F3]); Series net; net.add(source,mix,destination);

Page 115: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

115 Copyright 2011 G.Tzanetakis

Architecture

Page 116: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

116 Copyright 2011 G.Tzanetakis

Architecture Ø  Feature Extraction using Implicit Patching MarSystemManager mng;

MarSystem* Series1 = mng.create("Series", “Series1");

MarSystem* Fanout1 = mng.create(“Fanout", “Fanout1");

MarSystem* Series2 = mng.create("Series", “Series2");

MarSystem* Fanout2 = mng.create(“Fanout”, “Fanout2”);

MarSystem* Fanout3 = mng.create(“Fanout”, “Fanout3”);

Fanout3->addMarSystem(mng.create(“Mean”, “Mean”));

Fanout3->addMarSystem(mng.create(“Variance”, “Variance”));

Fanout2->addMarSystem(mng.create(“Centroid”, “Centroid”));

Fanout2->addMarSystem(mng.create(“RollOff”, “Rolloff”));

Fanout2->addMarSystem(mng.create(“Flux”, “Flux”);

Series2->addMarSystem(mng.create(“Spectrum”, “Spectrum”);

Series2->addMarSystem(Fanout2);

Fanout1->addMarSystem(mng.create(“ZeroCrossings”, “ZeroCrossings”);

Fanout1->addMarSystem(Series2);

Series1->addMarSystem(mng.create("SoundFileSource",“Source"));

Series1->addMarSystem(Fanout1);

Series1->addMarSystem(mng.create(“Memory”, “TextureMemory”));

Series1->addMarSystem(Fanout3);

Series1->addMarSystem(mng.create(“classifier”, “Classifier”));

Source

Series1Fanout 1

Zero Crossings

TextureMemory

Fanout 3

Mean

Variance

Classifier

Spectrum

Fanout 2

Centroid

Rolloff

Flux

Series 2

Page 117: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

117 Copyright 2011 G.Tzanetakis

Interoperability

Ø  Python (Ruby, Java) bindings

Ø  Qt 4.x toolkit

Ø  MATLAB

Ø  Weka

Ø  MIDI

Ø  Open Sound Control (OSC)

Ø  Max/MSP external

Ø  VAMP plugin

Qt4® is available as GPL open source code for all platforms

Page 118: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

118 Copyright 2011 G.Tzanetakis

Industrial Collaborations

Ø  Marsyas is licensed under the GNU Public License

Ø  Anyone can download and modify the source code for free

Ø  Software that uses Marsyas source code must also be GPL

Ø  Copyright remains with the author

Page 119: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

119 Copyright 2011 G.Tzanetakis

Modes of collaboration

Ø  Modes of collaboration with industry

Ø  Consulting Ø  Prototype and rewrite Ø  Internal tool/batch processing

Ø  Based on proprietary data Ø  Commercial Licensing

Page 120: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

120 Copyright 2011 G.Tzanetakis

How can you help ?

Ø  Encourage your students to use Marsyas

Ø  Encourage your students to work on open source projects

Ø  Donate money to open source projects - even small amounts can make a huge difference

Ø  Do not hesitate to get involved in the community of a project

Ø  Hire open source developers for particular tasks

Page 121: Pitch-based representations, analysis and applicationsPitch-based representations, analysis and applications George Tzanetakis (gtzan@cs.uvic.ca) Associate Professor Canada Research

121 Copyright 2011 G.Tzanetakis

Summary

Ø  Marsyas is a free software framework for research in audio processing

Ø  It has been in development for 10 years and has steadily been growing

Ø  Several academic and commercial projects have used Marsyas

Ø  Open source development in academic environments is challenging but has its rewards

Ø  Any questions ?