Gesture Recognition & Machine Learning for Real-Time Musical Interaction

Rebecca Fiebrink, Assistant Professor of Computer Science (also Music), Princeton University
Nicholas Gillian, Postdoc in Responsive Environments, MIT Media Lab


Page 1

Gesture Recognition & Machine Learning for Real-Time Musical Interaction

Rebecca Fiebrink, Assistant Professor of Computer Science (also Music), Princeton University

Nicholas Gillian, Postdoc in Responsive Environments, MIT Media Lab

Page 2

Introductions

Page 3

Outline

• ~40 min: Machine learning fundamentals
• ~1 hour: Wekinator: Intro & hands-on
• ~1 hour: EyesWeb: Intro & hands-on
• Wrap-up

Page 4

Models in gesture recognition & mapping

[Diagram: human + sensors → sensed action → computer (model) → interpretation → response (music, visuals, etc.) → sound, visuals, etc. back to the human]

The model answers questions such as:
• What is the current state (e.g., pose)?
• Was a control motion performed? If so, which one, and how?
• What sound should result from this state, motion, motion quality, etc.?

Page 5

Supervised learning

[Diagram: Training — training data → learning algorithm → model; the model maps inputs to outputs]

Page 6

Supervised learning

[Diagram: Training — training data labeled “Gesture 1”, “Gesture 2”, “Gesture 3” → learning algorithm → model. Running — a new input → model → output, e.g. “Gesture 1”]

Page 7

Why use supervised learning?

• Models capture complex relationships from the data. (feasible)
• Models can generalize to new inputs. (accurate)
• Supervised learning circumvents the need to explicitly define mapping functions or models. (efficient)

Page 8

Data, features, algorithms, and models: the basics

Page 9

Features

• Each data point is represented as a feature vector

Example #   Red(pixel1)   Green(pixel1)   Blue(pixel1)   …   Label
1           84            120             34             …   Gesture1
2           43            25              85             …   Gesture1
3           12            128             4              …   Gesture2
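As a minimal sketch of this representation (plain Python, using the RGB values from the table above):

```python
# Each training example pairs a feature vector with its label.
# Values are the RGB pixel features from the table above.
examples = [
    ([84, 120, 34], "Gesture1"),
    ([43, 25, 85],  "Gesture1"),
    ([12, 128, 4],  "Gesture2"),
]

# Split into the inputs (features) and outputs (labels) that a
# supervised learning algorithm expects.
features = [x for x, _ in examples]
labels = [y for _, y in examples]

print(features[0], labels[0])
```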

Page 10

Features

• Good features can make a problem easier to learn!

Example #   X(r_hand)   Y(r_hand)   Depth(r_hand)   Label
1           0.1         0.5         0.6             Gesture1
2           0.2         0.4         0.1             Gesture1
3           0.9         0.9         0.1             Gesture2

Page 11

Classification

This model: a separating line or hyperplane (decision boundary)

[Plot: two classes of points in a 2-D feature space (axes feature1, feature2), separated by the decision boundary]

Page 12

Regression

This model: a real-valued function of the input features

[Plot: a real-valued curve mapping feature (x-axis) to output (y-axis)]
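A minimal illustration of the idea (the data points and variable names here are invented for this sketch): fitting a real-valued function of one feature by ordinary least squares.

```python
# Fit y = a*x + b to (feature, output) pairs by least squares.
# The data points are made up for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(a, b)          # slope and intercept of the fitted line
print(a * 1.5 + b)   # predicted output for an unseen input
```

Regression models used for mapping (e.g., neural networks) are more flexible than a straight line, but the training/prediction pattern is the same.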

Page 13

Unsupervised learning

• Training set includes examples, but no labels
• Example: Infer clusters from the data:

[Plot: unlabeled points forming clusters in a 2-D feature space (axes feature1, feature2)]
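A minimal k-means sketch of cluster inference (the 2-D points and the choice of initial centers are invented for illustration):

```python
import math

# Toy 2-D data: two well-separated clusters (points invented for this sketch).
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
          (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]

def kmeans(points, k, iters=10):
    # Deterministic initialization for the sketch: first and last points.
    centers = [points[0], points[-1]]
    for _ in range(iters):
        # Assign each point to its nearest center...
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # ...then move each center to the mean of its cluster.
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

centers, clusters = kmeans(points, 2)
print([len(cl) for cl in clusters])  # → [3, 3]
```

No labels were given: the grouping emerges from the data alone.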

Page 14

Temporal modeling

• Examples and inputs are sequential data points in time

• Model used for following, identification, recognition

Image: Bevilacqua et al., NIME 2007

Page 15

Temporal modeling

Image: Bevilacqua et al., NIME 2007

Page 16

How supervised learning algorithms work (the basics)

Page 17

The learning problem

• Goal: Build the “best” model given the training data
  – Definition of “best” depends on context, assumptions…

Page 18

Which classifier is best?

Image from Andrew Ng

Competing goals:
• Accurately model the training data
• Accurately classify unseen data points

A model tailored too closely to the training data is “overfit”; one too simple to capture the pattern is “underfit”.

Page 19

A simple classifier: nearest neighbor

[Plot: an unlabeled point “?” among labeled points in a 2-D feature space (axes feature1, feature2); it takes the label of its nearest neighbor]
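A nearest-neighbor classifier is short enough to sketch in full (the training points and labels here are invented):

```python
import math

# Labeled training points in a 2-D feature space
# (values invented for this sketch).
training = [((0.1, 0.2), "Gesture1"), ((0.2, 0.1), "Gesture1"),
            ((0.8, 0.9), "Gesture2"), ((0.9, 0.8), "Gesture2")]

def nearest_neighbor(x):
    """Classify x with the label of the closest training point."""
    _, label = min(training, key=lambda ex: math.dist(x, ex[0]))
    return label

print(nearest_neighbor((0.15, 0.15)))  # → Gesture1
print(nearest_neighbor((0.85, 0.85)))  # → Gesture2
```

k-nearest neighbor generalizes this by voting among the k closest points instead of trusting a single neighbor.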

Page 20

Another simple classifier: Decision tree

Images: http://ai.cs.umbc.edu/~oates/classes/2009/ML/homework1.html, http://nghiaho.com/?p=1300

Page 21

AdaBoost: Iteratively train a “weak” learner

Image from http://www.cc.gatech.edu/~kihwan23/imageCV/Final2005/FinalProject_KH.htm

Page 22

Support vector machine

• Re-map the input space into a higher-dimensional space and find a separating hyperplane

Page 23

Choosing a classifier: Practical considerations

• k-nearest neighbor
  + Can tune k to adjust smoothness of decision boundaries
  – Sensitive to noisy, redundant, irrelevant features; prone to overfitting; weird in high dimensions
• Decision tree
  + Can prune to reduce overfitting; produces a human-understandable model
  – Can still overfit
• AdaBoost
  + Theoretical benefits; less prone to overfitting
  + Can tune by changing base learner, number of training rounds
• Support vector machine
  + Theoretical benefits similar to AdaBoost
  – Many parameters to tune; training can take a long time

Page 24

How to evaluate which classifier is better?

• Compute a quality metric
  – Metrics on the training set (e.g., accuracy, RMS error)
  – Metrics on a held-out test set
  – Cross-validation
• Use it to compare models

Image from http://blog.weisu.org/2011/05/cross-validation.html
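A minimal sketch of k-fold cross-validation (the stand-in classifier and the tiny dataset are invented for this sketch; a real evaluation would train the actual model on each fold):

```python
# k-fold cross-validation: split the data into k folds; for each fold,
# train on the other k-1 folds, test on the held-out fold, then
# average the per-fold accuracies.

def majority_classifier(train):
    """Stand-in 'learner': always predicts the most common training label."""
    labels = [y for _, y in train]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common

def cross_validate(data, k=3):
    fold_size = len(data) // k
    accuracies = []
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        model = majority_classifier(train)
        correct = sum(model(x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

# Tiny invented dataset: feature vectors (here 1-D) with labels.
data = [((0,), "A"), ((1,), "A"), ((2,), "A"),
        ((3,), "B"), ((4,), "A"), ((5,), "A")]
print(cross_validate(data, k=3))  # mean accuracy across the 3 folds
```

Because every point is used for testing exactly once, cross-validation gives a less optimistic estimate than accuracy on the training set.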

Page 25

Neural Networks

• TODO: Use Nick’s slides

Page 26

Which learning method should you use?

• Classification (e.g., kNN, AdaBoost, SVM, decision tree):
  – Apply 1 of N labels to a static pose or state
  – Label a dynamic gesture, when segmentation & normalization are trivial (e.g., the feature vector is a fixed-length window in time)
• Regression (e.g., with neural networks):
  – Produce a real-valued output (or vector of real-valued outputs) for each feature vector
• Dynamic time warping, HMMs, other temporal models:
  – Identify when a gesture has occurred, identify the probable location within a gesture, possibly also apply a label
  – Necessary when segmentation is non-trivial or online following is needed
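Dynamic time warping is compact enough to sketch (the toy 1-D sequences are invented; real gesture features would be multidimensional):

```python
# Dynamic time warping: a distance between two sequences that tolerates
# local stretching/compression in time, computed by dynamic programming.

def dtw(a, b):
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between samples
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[len(a)][len(b)]

# The same "gesture" traced slowly and quickly still aligns closely:
slow = [0, 1, 2, 3, 3, 3, 2, 1, 0]
fast = [0, 2, 3, 2, 0]
print(dtw(slow, fast))  # small despite the different lengths
```

This is why temporal models handle gestures performed at varying speeds, where a fixed-length feature window would not.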

Page 27

Suggested ML reading

• Bishop, 2006: Pattern Recognition and Machine Learning. Springer.
• Duda, 2001: Pattern Classification. Wiley-Interscience.
• Witten, 2005: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.

Page 28

Suggested NIME-y reading

• Lee, Freed, & Wessel, 1992. Neural networks for simultaneous classification and parameter estimation in musical instrument control. Adaptive and Learning Systems, 1706:244–55. (early example of ML in music)

• Hunt, A. and Wanderley, M. M. 2002. Mapping performer parameters to synthesis engines. Organised Sound 7, 2, 97–108. (learning as a tool for generative mapping creation)

• Chapter 2 of Rebecca’s dissertation: http://www.cs.princeton.edu/~fiebrink/thesis/ (historical/topic overview)

• Recent publications by F. Bevilacqua & team @ IRCAM (HMMs, gesture follower)

• TODO: Nick, anything else?

Page 29

Hands-on with Wekinator

Page 30

The Wekinator: Running in real time

[Diagram: feature extractor(s) continuously send feature vectors (e.g., .01, .59, .03, …) over OSC to the model(s); the models send parameter streams (e.g., 5, .01, 22.7, …) over OSC to a parameterizable process, over time]

Inputs: from built-in feature extractors or OSC. Outputs: control a ChucK patch or go elsewhere using OSC.

Page 31

Brief intro to OSC

• Messages are sent to a host (e.g., localhost) and port (e.g., 6448)
  – The listener must listen on the same port
• A message contains a message string (e.g., “/myOscMessage”) and optionally some data
  – Data can be int, float, or string types
  – Listener code may listen for specific message strings & data formats
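As a sketch of what such a message looks like on the wire (OSC 1.0 binary encoding; real projects typically use an OSC library rather than hand-encoding, and the argument values here are invented):

```python
import struct

# Minimal OSC 1.0 message encoder. Strings are null-terminated and
# padded to a multiple of 4 bytes; ints and floats are big-endian 32-bit.

def osc_string(s):
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address, args):
    type_tags = ","          # type tag string starts with a comma
    payload = b""
    for a in args:
        if isinstance(a, int):
            type_tags += "i"
            payload += struct.pack(">i", a)
        elif isinstance(a, float):
            type_tags += "f"
            payload += struct.pack(">f", a)
        else:
            type_tags += "s"
            payload += osc_string(a)
    return osc_string(address) + osc_string(type_tags) + payload

# A message like the slide's example, with one int and one float argument:
msg = osc_message("/myOscMessage", [1, 0.5])
print(len(msg))  # → 28 (every OSC message length is a multiple of 4)
```

The resulting bytes would be sent over UDP to the listener’s host and port (e.g., localhost:6448 from the slide).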

Page 32

Wekinator: Under the hood

[Diagram: features (Feature1, Feature2, Feature3 … FeatureN, e.g., joystick_x, joystick_y, webcam_1) feed models (Model1, Model2 … ModelM), which output parameters (Parameter1, Parameter2 … ParameterM, e.g., pitch, volume; sample outputs 3.3098, Class24)]

Page 33

Under the hood

[Diagram: as on the previous slide, features (Feature1 … FeatureN) feed models (Model1 … ModelM), which output parameters (Parameter1 … ParameterM)]

Learning algorithms:
• Classification: AdaBoost.M1, J48 decision tree, support vector machine, k-nearest neighbor
• Regression: multilayer perceptron NNs

Page 34

Interactive ML with Wekinator

[Diagram: Training — training data labeled “Gesture 1”, “Gesture 2”, “Gesture 3” → learning algorithm → model. Running — inputs → model → output, e.g. “Gesture 1”]

Page 35

Interactive ML with Wekinator

[Diagram: the supervised learning loop again; the user is creating training data]

Page 36

Interactive ML with Wekinator

[Diagram: the supervised learning loop again; the user is creating training data, then evaluating the trained model]

Page 37

Interactive ML with Wekinator

[Diagram: the supervised learning loop again; the user is creating training data, evaluating the trained model, and modifying the training data (and repeating): interactive machine learning]

Page 38

Time to play

• Discrete classifier
• Continuous neural net mapping
• Free-for-all