informed feature representations meinard...

22
Informed Feature Representations for Music and Motion Meinard Müller Bonn University and MPI Informatik meinard@mpi-inf.mpg.de CREST Symposium Human-Harmonized Information Technology April 2012 meinard@mpi-inf.mpg.de 2007 Habilitation, Bonn 2007-2012 Max-Planck Institute for Informatics in Saarbrücken Senior Researcher Meinard Müller Multimedia Information Retrieval & Music Processing March 2012 Bonn University Professorship Audiosignalanalyse Thanks Sebastian Ewert Peter Grosche Andreas Baak Tido Röder Music and Motion Overview Audio Features based on Chroma Information Application: Audio Matching Motion Features based on Geometric Relations Application: Motion Retrieval Audio Features based on Tempo Information Application: Music Segmentation Depth Image Features based on Geodesic Extrema Application: Data-Driven Motion Reconstruction Overview Audio Features based on Chroma Information Application: Audio Matching Motion Features based on Geometric Relations Application: Motion Retrieval Audio Features based on Tempo Information Application: Music Segmentation Depth Image Features based on Geodesic Extrema Application: Data-Driven Motion Reconstruction

Upload: others

Post on 22-Feb-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Informed Feature Representations for Music and Motion

Meinard Müller

Bonn University and MPI Informatikmeinard@mpi -inf.mpg.de

CREST SymposiumHuman-Harmonized Information Technology

April 2012

meinard@mpi -inf.mpg.de

2007 Habilitation, Bonn

2007-2012 Max-Planck Institute forInformatics in Saarbrücken

Senior Researcher

Meinard Müller

Senior Researcher Multimedia Information Retrieval & Music Processing

March 2012 Bonn University

ProfessorshipAudiosignalanalyse

Thanks

� Sebastian Ewert

� Peter Grosche

� Andreas Baak

� Tido Röder

Music and Motion

Overview

� Audio Features based on Chroma InformationApplication: Audio Matching

� Motion Features based on Geometric RelationsApplication: Motion Retrieval

� Audio Features based on Tempo InformationApplication: Music Segmentation

� Depth Image Features based on Geodesic ExtremaApplication: Data-Driven Motion Reconstruction

Overview

� Audio Features based on Chroma InformationApplication: Audio Matching

� Motion Features based on Geometric RelationsApplication: Motion Retrieval

� Audio Features based on Tempo InformationApplication: Music Segmentation

� Depth Image Features based on Geodesic ExtremaApplication: Data-Driven Motion Reconstruction

Page 2: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Chroma-based Audio Features

� Very popular in music signal processing

� Based equal-tempered scale of Western music

� Captures information related to harmony

� Robust to variations in instrumentation or timbre

Chroma-based Audio Features

Example: Chromatic scale

Fre

quen

cy (

Hz)

Inte

nsity

(dB

)

Spectrogram

Time (seconds)

Fre

quen

cy (

Hz)

Inte

nsity

(dB

)

Chroma-based Audio Features

Example: Chromatic scale

Inte

nsity

(dB

)

Spectrogram

C8: 4186 Hz

Time (seconds)

Inte

nsity

(dB

)

C4: 261 HzC5: 523 Hz

C6: 1046 Hz

C7: 2093 Hz

C3: 131 Hz

Chroma-based Audio Features

Example: Chromatic scale

Inte

nsity

(dB

)

Log-frequency spectrogram

C6: 1046 Hz

C7: 2093 Hz

C8: 4186 Hz

Inte

nsity

(dB

)

Time (seconds)

C4: 261 Hz

C5: 523 Hz

C6: 1046 Hz

C3: 131 Hz

Chroma-based Audio Features

Example: Chromatic scale

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Log-frequency spectrogram

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Time (seconds)

Chroma-based Audio Features

Example: Chromatic scale

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Log-frequency spectrogram

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Time (seconds)Chroma C

Page 3: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Chroma-based Audio Features

Example: Chromatic scale

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Log-frequency spectrogram

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Time (seconds)Chroma C#

Chroma-based Audio Features

Example: Chromatic scale

Inte

nsity

(dB

)

Chroma representation

Chr

oma

Inte

nsity

(dB

)

Time (seconds)

Chroma-based Audio Features

Example: Chromatic scale

Inte

nsity

(no

rmal

ized

)

Chroma representation (normalized, Euclidean)

Inte

nsity

(no

rmal

ized

)

Chr

oma

Time (seconds)

Enhancing Chroma Features

� Making chroma features more robust to changes in timbre

� Combine ideas of speech and music processing

� Usage of audio matching framework for evaluatingthe quality of obtained audio features

M. Müller and S. EwertTowards Timbre-Invariant Audio Features for Harmony-Based Music.IEEE Trans. on Audio, Speech & Language Processing, Vol. 18, No. 3, pp. 649-662, 2010.

Motivation: Audio Matching

Time (seconds)

Motivation: Audio Matching

Four occurrences of the main theme

Third occurrenceFirst occurrence

1 2 3 4

Time (seconds)

Page 4: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Chroma Features

First occurrence Third occurrence

Chr

oma

scal

e

Time (seconds)Time (seconds)

Chr

oma

scal

e

Chroma Features

First occurrence Third occurrence

Chr

oma

scal

e

How to make chroma features more robust to timbre changes?

Chr

oma

scal

e

Time (seconds) Time (seconds)

Chroma Features

First occurrence Third occurrence

Chr

oma

scal

e

How to make chroma features more robust to timbre changes?

Idea: Discard timbre-related information

Chr

oma

scal

e

Time (seconds) Time (seconds)

MFCC Features and TimbreM

FC

C c

oeffi

cien

t

Time (seconds)

MF

CC

coe

ffici

ent

MFCC Features and Timbre

MF

CC

coe

ffici

ent

Lower MFCCs Timbre

Time (seconds)

MF

CC

coe

ffici

ent

MFCC Features and Timbre

MF

CC

coe

ffici

ent

Idea: Discard lower MFCCs to achieve timbre invariance

Lower MFCCs Timbre

Time (seconds)

MF

CC

coe

ffici

ent

Page 5: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Enhancing Timbre Invariance

Short-Time Pitch Energy

Pitc

h sc

ale

1. Log-frequency spectrogram

Steps:

Pitc

h sc

ale

Time (seconds)

Enhancing Timbre Invariance

Log Short-Time Pitch Energy

Pitc

h sc

ale

1. Log-frequency spectrogram2. Log (amplitude)

Steps:

Pitc

h sc

ale

Time (seconds)

Enhancing Timbre Invariance

PFCC

Pitc

h sc

ale

1. Log-frequency spectrogram2. Log (amplitude)3. DCT

Steps:

Pitc

h sc

ale

Time (seconds)

Enhancing Timbre Invariance

PFCC

Pitc

h sc

ale

1. Log-frequency spectrogram2. Log (amplitude)3. DCT4. Discard lower coefficients

Steps:

Pitc

h sc

ale 4. Discard lower coefficients

[1:n-1]

Time (seconds)

Enhancing Timbre Invariance

PFCC

Pitc

h sc

ale

1. Log-frequency spectrogram2. Log (amplitude)3. DCT4. Keep upper coefficients

Steps:

Pitc

h sc

ale 4. Keep upper coefficients

[n:120]

Time (seconds)

Enhancing Timbre Invariance

Pitc

h sc

ale

1. Log-frequency spectrogram2. Log (amplitude)3. DCT4. Keep upper coefficients

Steps:

Pitc

h sc

ale 4. Keep upper coefficients

[n:120]5. Inverse DCT

Time (seconds)

Page 6: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Enhancing Timbre Invariance

Chr

oma

scal

e 1. Log-frequency spectrogram2. Log (amplitude)3. DCT4. Keep upper coefficients

Steps:

Chr

oma

scal

e

Time (seconds)

4. Keep upper coefficients[n:120]

5. Inverse DCT6. Chroma & Normalization

Enhancing Timbre Invariance

1. Log-frequency spectrogram2. Log (amplitude)3. DCT4. Keep upper coefficients

Steps:

Chr

oma

scal

e

4. Keep upper coefficients[n:120]

5. Inverse DCT6. Chroma & Normalization

Chroma DCT-Reduced Log-Pitch

CRP(n)

Chr

oma

scal

e

Time (seconds)

Chroma versus CRPShostakovich Waltz

Third occurrenceFirst occurrence

Chroma

Time (seconds) Time (seconds)

Chroma versus CRPShostakovich Waltz

Third occurrenceFirst occurrence

Chroma

CRP(55)

n = 55

Time (seconds) Time (seconds)

Quality: Audio MatchingQuery: Shostakovich, Waltz / Yablonsky (3. occurrence)

Shostakovich/Chailly Shostakovich/Yablonsky

Time (seconds)

Quality: Audio MatchingQuery: Shostakovich, Waltz / Yablonsky (3. occurrence)

Shostakovich/Chailly Shostakovich/Yablonsky

Time (seconds)

Page 7: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Quality: Audio MatchingQuery: Shostakovich, Waltz / Yablonsky (3. occurrence)

Shostakovich/Chailly Shostakovich/Yablonsky

Time (seconds)

Quality: Audio MatchingQuery: Shostakovich, Waltz / Yablonsky (3. occurrence)

Shostakovich/Chailly Shostakovich/Yablonsky

Time (seconds)

Quality: Audio Matching

Standard Chroma (Chroma Pitch)

Query: Shostakovich, Waltz / Yablonsky (3. occurrence)

Shostakovich/Chailly Shostakovich/Yablonsky

Time (seconds)

Quality: Audio Matching

Standard Chroma (Chroma Pitch)

CRP(55)

Query: Shostakovich, Waltz / Yablonsky (3. occurrence)

Shostakovich/Chailly Shostakovich/Yablonsky

Time (seconds)

Quality: Audio Matching

Standard Chroma (Chroma Pitch)

Query: Free in you / Indigo Girls (1. occurence)

Free in you/Indigo Girls Free in you/Dave Cooley

Time (seconds)

Quality: Audio Matching

Standard Chroma (Chroma Pitch)

CRP(55)

Query: Free in you / Indigo Girls (1. occurence)

Free in you/Indigo Girls Free in you/Dave Cooley

Time (seconds)

Page 8: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Chroma Toolbox

� There are many ways to implement chroma features� Properties may differ significantly� Appropriateness depends on respective application

� http://www.mpi-inf.mpg.de/resources/MIR/chromatoolbox/� MATLAB implementations for various chroma variants

Overview

� Audio Features based on Chroma InformationApplication: Audio Matching

� Motion Features based on Geometric RelationsApplication: Motion Retrieval

� Audio Features based on Tempo InformationApplication: Music Segmentation

� Depth Image Features based on Geodesic ExtremaApplication: Data-Driven Motion Reconstruction

Motion Capture Data

� 3D representations of motions

� Computer animation

� Sports

� Gait analysis

Motion Capture Data

Optical System

Motion Capture Data Motion Retrieval

� = MoCap database

� = query motion clip

� Goal: find all motion clips in similar to

Page 9: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Motion Retrieval Motion Retrieval

� Numerical similarity vs. logical similarity

� Logically related � Logically related motions may exhibit significant spatio-temporal variations

Relational Features

� Exploit knowledge of kinematic chain

� Express geometric relations of body parts

� Robust to motion variations� Robust to motion variations

Meinard Müller, Tido Röder, and Michael ClausenEfficient content-based retrieval of motion capture data.ACM Transactions on Graphics (SIGGRAPH), vol. 24, pp. 677-685, 2005.

Meinard Müller and Tido RöderMotion templates for automatic classification and retrieval of motion capture data.Proceedings of the 2006 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA), Vienna, Austria, pp. 137-146, 2006.

Relational Features

Relational Features Relational Features

Right knee bent?

Right footfast?

Right handmoving upwards?

Page 10: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Motion Templates (MT) Motion Templates (MT)

Fea

ture

s

Time (frames) Time (frames)

Fea

ture

s

1

0

Motion Templates (MT)

Fea

ture

s

Temporal alignment

Time (frames) Time (frames)

Fea

ture

s

1

0

Motion Templates (MT)

Fea

ture

s

1

Superimpose templates

Time (frames)

Fea

ture

s

0

Motion Templates (MT)

Fea

ture

s

1

Compute average

Time (frames)

Fea

ture

s

0

Motion Templates (MT)

Page 11: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Motion Templates (MT)Average template

Motion Templates (MT)Quantized template

1

*

0

� Gray areas indicate inconsistencies / variations� Achieve invariance by disregarding gray areas

MT-based Motion Retrieval MT-based Motion Retrieval

Time (seconds)

Fea

ture

s

MT-based Motion Retrieval: Jumping Jack MT-based Motion Retrieval: Jumping Jack

Page 12: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

MT-based Motion Retrieval: Jumping Jack MT-based Motion Retrieval: Jumping Jack

MT-based Motion Retrieval: Jumping Jack

τ

MT-based Motion Retrieval: Elbow-To-Knee

τ

MT-based Motion Retrieval: Cartwheel

Matching curve using average MT

Matching curve blending out variations

MT-based Motion Retrieval: Throw

Page 13: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

MT-based Motion Retrieval: Throw MT-based Motion Retrieval: Basketball

MT-based Motion Retrieval: Basketball MT-based Motion Retrieval: Lie Down Floor

MT-based Motion Retrieval: Lie Down Floor Overview

� Audio Features based on Chroma InformationApplication: Audio Matching

� Motion Features based on Geometric RelationsApplication: Motion Retrieval

� Audio Features based on Tempo InformationApplication: Music Segmentation

� Depth Image Features based on Geodesic ExtremaApplication: Data-Driven Motion Reconstruction

Page 14: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Music Signal Processing

Analysis tasks� Segmentation� Structure analysis� Genre classification� Cover song identification� Music synchronization� Music synchronization� …

Music Signal Processing

Analysis tasks� Segmentation� Structure analysis� Genre classification� Cover song identification� Music synchronization� Music synchronization� …

Audio features � Musically meaningful� Semantically expressive� Robust to deviations� Low dimensionality� …

Music Signal Processing

Analysis tasks� Segmentation� Structure analysis� Genre classification� Cover song identification� Music synchronization

Relative comparisonof music audio data

� Music synchronization� …

Audio features � Musically meaningful� Semantically expressive� Robust to deviations� Low dimensionality� …

Music Signal Processing

Analysis tasks� Segmentation� Structure analysis� Genre classification� Cover song identification� Music synchronization

Relative comparisonof music audio data

� Music synchronization� …

Audio features � Musically meaningful� Semantically expressive� Robust to deviations� Low dimensionality� …

Need of robust mid-levelrepresentations

Mid-Level Representations

Musical Aspect Features Dimension

Timbre MFCC features 10 - 15

Harmony Pitch features 60 - 120

Harmony Chroma features 12Harmony Chroma features 12

Tempo Tempogram > 100

Mid-Level Representations

Musical Aspect Features Dimension

Timbre MFCC features 10 - 15

Harmony Pitch features 60 - 120

Harmony Chroma features 12Harmony Chroma features 12

Tempo Tempogram > 100

Tempo Cyclic tempogram 10 - 30

Peter Grosche, Meinard Müller, and Frank KurthCyclic tempogram – a mid-level tempo representation for music signals.Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, Texas, USA, pp. 5522-5525, 2010.

Page 15: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Novelty Curve

Example: Waltz, Jazz Suite No. 2

Fre

quen

cy (

Hz)

Novelty Curve

1. Spectrogram

Steps:Spectrogram

Fre

quen

cy (

Hz)

Time (seconds)

Novelty Curve

1. Spectrogram2. Log compression

Steps:Compressed spectrogram

Fre

quen

cy (

Hz)

Time (seconds)

Fre

quen

cy (

Hz)

Novelty Curve

1. Spectrogram2. Log compression3. Differentiation

Steps:Difference spectrogram

Fre

quen

cy (

Hz)

Time (seconds)

Fre

quen

cy (

Hz)

Novelty Curve

1. Spectrogram2. Log compression3. Differentiation4. Accumulation

Steps:

4. Accumulation

Novelty curve

Time (seconds)

Novelty Curve

1. Spectrogram2. Log compression3. Differentiation4. Accumulation

Steps:

4. Accumulation

Novelty curve / local average

Time (seconds)

Page 16: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Novelty Curve

1. Spectrogram2. Log compression3. Differentiation4. Accumulation

Steps:

4. Accumulation5. Normalization

Normalized novelty curve

Time (seconds)

Tempogram

Tem

po (

BP

M)

Time (seconds)

Tempogram

Tem

po (

BP

M)

Short-time Fourier analysis

Time (seconds)

TempogramTe

mpo

(B

PM

)

Short-time Fourier analysis

Time (seconds)

Tempogram

Tem

po (

BP

M)

Time (seconds)

Tempogram

480

240

Time (seconds)

120

60 30

Page 17: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Log-Scale Tempogram

480

240

120

60

Time (seconds)

60

30

Cyclic Tempogram

Rel

ativ

e te

mpo

Rel

ativ

e te

mpo

Time (seconds)

Cylic projection

Relative to tempo class […,30,60,120,240,480,…]

Cyclic Tempogram

Rel

ativ

e te

mpo

Rel

ativ

e te

mpo

Time (seconds)

Quantization: 60 tempo bins

Cyclic TempogramR

elat

ive

tem

poR

elat

ive

tem

po

Time (seconds)

Quantization: 30 tempo bins

Cyclic Tempogram

Rel

ativ

e te

mpo

Rel

ativ

e te

mpo

Time (seconds)

Quantization: 15 tempo bins

Audio Segmentation

0.5

Rel

ativ

e te

mpo

2

1.5

Example: Brahms Hungarian Dance No. 5

0

Rel

ativ

e te

mpo

1

Time (seconds)

Page 18: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Audio Segmentation

0.5

Rel

ativ

e te

mpo

2

1.5

Example: Zager & Evans: In the year 2525

0

Rel

ativ

e te

mpo

1

Time (seconds)

Audio Segmentation

0.5

Rel

ativ

e te

mpo

2

1.5

Example: Beethoven Pathétique

0

Rel

ativ

e te

mpo

1

Time (seconds)

Overview

� Audio Features based on Chroma InformationApplication: Audio Matching

� Motion Features based on Geometric RelationsApplication: Motion Retrieval

� Audio Features based on Tempo InformationApplication: Music Segmentation

� Depth Image Features based on Geodesic ExtremaApplication: Data-Driven Motion Reconstruction

Data-Driven Motion Reconstruction

� Goal: Reconstruction of 3D human poses from a depth image sequence

� Data-driven approach using MoCap database

Andreas Baak, Meinard Müller, Gaurav Bharaj, Hans-Peter Seidel, and Christian TheobaltA data-driven approach for real-time full body pose reconstruction from a depth camera.Proceedings of the 13th International Conference on Computer Vision (ICCV), 2011.

� Depth image features: Geodesic extrema

Data-Driven Motion Reconstruction

Input: Depth image Output: 3D pose

Data-Driven Motion Reconstruction

Voting

Local opt. Previous frameInput Output

Database lookup

Page 19: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Data-Driven Motion Reconstruction

Voting

Local opt. Previous frameInput Output

Database lookup

� Database lookup � Local optimization� Voting scheme

Data-Driven Motion Reconstruction

Voting

Local opt. Previous frameInput Output

Database lookup

� Database lookup � Local optimization� Voting scheme

Data-Driven Motion Reconstruction

Voting

Local opt. Previous frameInput Output

Database lookup

� Database lookup � Local optimization� Voting scheme

Database Lookup

Voting

Local opt. Previous frameInput Output

Database lookup

� Database lookup � Local optimization� Voting scheme

Need of motion featuresfor cross-modal comparison

Depth Image Features

� Point cloud

[Plagemann, Ganapathi, Koller, Thrun, ICRA 2010] Depth Image Features

� Point cloud� Graph

[Plagemann, Ganapathi, Koller, Thrun, ICRA 2010]

Page 20: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Depth Image Features

� Point cloud� Graph

[Plagemann, Ganapathi, Koller, Thrun, ICRA 2010] Depth Image Features

� Point cloud� Graph� Distances from root

[Plagemann, Ganapathi, Koller, Thrun, ICRA 2010]

Depth Image Features

� Point cloud� Graph� Distances from root� Geodesic extrema

[Plagemann, Ganapathi, Koller, Thrun, ICRA 2010]

Observation: First fiveextrema often correspondto end-effectors and head

Database Lookup

Local Optimization Voting Scheme

� Combine database lookup & local optimization

� Inherit robustness from database pose

� Inherit accuracy from local optimization pose

� Compare with original raw data poseusing a sparse symmetric Hausdorff distance

Page 21: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Voting SchemeDistance measure

Voting SchemeDistance measure (Hausdorff)

Voting SchemeDistance measure (Hausdorff, symmetric, sparse)

Experiments

Informed Feature Representations

� Audio Features based on Chroma InformationApplication: Audio Matching

� Motion Features based on Geometric RelationsApplication: Motion Retrieval

� Audio Features based on Tempo InformationApplication: Music Segmentation

� Depth Image Features based on Geodesic ExtremaApplication: Data-Driven Motion Reconstruction

Informed Feature Representations

� Audio Features based on Chroma InformationApplication: Audio Matching

� Motion Features based on Geometric RelationsApplication: Motion Retrieval

� Audio Features based on Tempo InformationApplication: Music Segmentation

� Depth Image Features based on Geodesic ExtremaApplication: Data-Driven Motion Reconstruction

Page 22: Informed Feature Representations Meinard Müllersap.ist.i.kyoto-u.ac.jp/crest/sympo12/program/S2-Mueller.pdf · Informed Feature Representations for Music and Motion Meinard Müller

Informed Feature Representations

� Exploit model assumptions – Equal-tempered scale– Kinematic chain

� Deal with variances on feature level– Enhancing timbre invariance

Features with explicit meaning.

Makes subsequent steps more robust– Relational features

– Quantized motion templates

� Consider requirements for specific application – Explicit information often not required– Mid-level features

steps more robustand efficient!

Avoid making problem harder as

it is.

Selected Publications (Music Processing)� M. Müller, P.W. Ellis, A. Klapuri, G. Richard (2011):

Signal Processing for Music Analysis.IEEE Journal of Selected Topics in Signal Processing, Vol. 5, No. 6, pp. 1088-1110.

� P. Grosche and M. Müller (2011):Extracting Predominant Local Pulse Information from Music Recordings.IEEE Trans. on Audio, Speech & Language Processing, Vol. 19, No. 6, pp. 1688-1701.

� M. Müller, M. Clausen, V. Konz, S. Ewert, C. Fremerey (2010):A Multimodal Way of Experiencing and Exploring Music.A Multimodal Way of Experiencing and Exploring Music.Interdisciplinary Science Reviews (ISR), Vol. 35, No. 2.

� M. Müller and S. Ewert (2010):Towards Timbre-Invariant Audio Features for Harmony-Based Music.IEEE Trans. on Audio, Speech & Language Processing, Vol. 18, No. 3, pp. 649-662.

� F. Kurth, M. Müller (2008): Efficient Index-Based Audio Matching.IEEE Trans. Audio, Speech & Language Processing, Vol. 16, No. 2, 382-395.

� M. Müller (2007): Information Retrieval for Music and Motion .Monograph, Springer, 318 pages

Selected Publications (Motion Processing)� J. Tautges, A. Zinke, B. Krüger, J. Baumann, A. Weber, T. Helten, M. Müller, H.-P. Seidel,

B. Eberhardt (2011):Motion Reconstruction Using Sparse Accelerometer Data.ACM Transactions on Graphics (TOG) , Vol. 30, No. 3

� A. Baak, M. Müller, G. Bharaj, H.-P. Seidel, C. Theobalt (2011):A Data-Driven Approach for Real-Time Full Body Pose Reconstruction from a Depth Camera.Proc. International Conference on Computer Vision (ICCV)

� G. Pons-Moll, A. Baak, T. Helten, M. Müller, H.-P. Seidel, B. Rosenhahn (2010):Multisensor-Fusion for 3D Full-Body Human Motion Capture.Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

� A. Baak, B. Rosenhahn, M. Müller, H.-P. Seidel (2009): Stabilizing Motion Tracking Using Retrieved Motion Priors.Proc. International Conference on Computer Vision (ICCV)

� M. Müller, T. Röder, M. Clausen (2005): Efficient Content-Based Retrieval of Motion Capture Data.ACM Transactions on Graphics (TOG) , Vol. 24, No. 3, pp. 677-685, (SIGGRAPH)