
Page 1: Multimodal Analysis Of Stand-up Comedians

Multimodal Analysis of Stand-up Comedians: Audio, Video and Lexical Analysis

Yash Singh, Madhav Sharan, Sree Priyanka Uppu, Nandan PC,

Harsh Fatepuria, Rahul Agrawal

Page 2: Multimodal Analysis Of Stand-up Comedians

Motivation

Data set Description

Feature Engineering

Feature Analysis

Machine Learning

Conclusion

Page 3: Multimodal Analysis Of Stand-up Comedians

Why stand-up comedians?

● We love watching stand-up comedy

● They express a variety of emotions

● Audience feedback is available in the form of laughter

● Relatively new as a research domain

Motivation

Page 4: Multimodal Analysis Of Stand-up Comedians

H1: Certain facial expressions could contribute to laughter

Hypotheses

Page 5: Multimodal Analysis Of Stand-up Comedians

H2: Pauses and word elongation contribute to laughter

Hypotheses

Page 6: Multimodal Analysis Of Stand-up Comedians

H3: Voice modulation (pitch and intensity changes) can also play a crucial role

Page 7: Multimodal Analysis Of Stand-up Comedians

H4: Laughter is sequential in nature; small laughs can add up to bigger laughs.

Page 8: Multimodal Analysis Of Stand-up Comedians

• We collected 3 hours 46 minutes of data from ‘The Tonight Show Starring Jimmy Fallon’ and ‘Late Night with Conan O’Brien’.

• 46 videos (11.76 GB), approximately 5 minutes each

• 27 male and 19 female artists

• The backdrop in the videos is dark

• For most of each video (80-90%), the artist faces the camera.

Data Collection

Page 9: Multimodal Analysis Of Stand-up Comedians

• For facial feature extraction, we manually blacked out the frames in which the camera does not capture the artist’s face, setting the video features for those frames to 0.

• Audio features can also be 0, e.g., during a pause or while the audience is laughing.

Pre-Processing

Page 10: Multimodal Analysis Of Stand-up Comedians

• Manually segment the videos based on punch lines

• Annotate the laughter level in each segment, based on the product of mean pitch and mean intensity, into:

o Big (55% ~ 100% intensity)

o Small (36% ~ 55% intensity)

o No laughter (0% ~ 36% intensity)

• A pitch range of 75 to 625 Hz, sampled every 10 ms, covers a wide range of frequencies.

• The pitch of laughter varies across videos, so it is normalized to the range [0, 1].

Data Annotation
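A minimal sketch of this bucketing step, assuming mean pitch and mean intensity have already been extracted for each segment; the thresholds mirror the percentages above, and the function name and input format are illustrative:

```python
# Hypothetical helper mirroring the annotation rule above: the product of
# mean pitch and mean intensity is normalized within each video to [0, 1]
# and then bucketed into no / small / big laughter.
def annotate_laughter(segments):
    """segments: list of dicts with 'mean_pitch' and 'mean_intensity' keys."""
    scores = [s["mean_pitch"] * s["mean_intensity"] for s in segments]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0                    # avoid division by zero
    labels = []
    for score in scores:
        norm = (score - lo) / span             # normalize within this video
        if norm >= 0.55:
            labels.append("big")
        elif norm >= 0.36:
            labels.append("small")
        else:
            labels.append("no")
    return labels
```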

Page 11: Multimodal Analysis Of Stand-up Comedians

openSMILE

• Extracted 5 low-level descriptors:

✧ Musical chroma features (tone)

✧ Prosody features (loudness and pitch)

✧ Energy (1)

✧ MFCC (13 coefficients, 0-12, from 26 Mel-frequency bands)

• All these features were captured at 10 ms frame intervals.

• Processing: aggregated the features by mean and standard deviation over each segment

Feature Engineering - Audio
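A minimal sketch of this per-segment aggregation, assuming the frame-level openSMILE descriptors have been exported to a CSV with one row per 10 ms frame and a segment_id column; the file and column names are illustrative:

```python
import pandas as pd

# One row per 10 ms frame: chroma, loudness, pitch, energy, and MFCC 0-12
# columns, plus a segment_id column marking the punchline segment.
frames = pd.read_csv("audio_lld_frames.csv")

# Aggregate every descriptor by mean and standard deviation per segment,
# yielding one feature row per segment.
agg = frames.groupby("segment_id").agg(["mean", "std"])
agg.columns = [f"{col}_{stat}" for col, stat in agg.columns]
agg.to_csv("audio_segment_features.csv")
```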

Page 12: Multimodal Analysis Of Stand-up Comedians

OpenFace

• Extracted:

✧ Eye gaze direction vectors in world coordinates for both eyes

✧ Head location with respect to the camera (in millimeters) and head rotation (in radians)

✧ 68 facial landmark locations in 2D pixel format (x, y)

✧ 33 rigid and non-rigid shape parameters

✧ 11 AU intensities and AU occurrences

• Processing: aggregated the features by mean and standard deviation over each segment

Feature Engineering - Video
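A minimal sketch of selecting these columns from OpenFace's per-frame CSV output, assuming the standard OpenFace 2.x column naming (gaze_*, pose_*, AU*_r/AU*_c, x_0..x_67, y_0..y_67); aggregation then follows the same mean/std scheme as the audio features, and the file name is illustrative:

```python
import pandas as pd

# OpenFace writes one CSV row per video frame; some versions pad column
# names with spaces, so strip them before selecting.
df = pd.read_csv("comedian_01.csv")
df.columns = df.columns.str.strip()

gaze_cols = [c for c in df.columns if c.startswith("gaze_")]
pose_cols = [c for c in df.columns if c.startswith("pose_")]
au_cols = [c for c in df.columns if c.startswith("AU")]   # intensities (_r) and occurrences (_c)
landmark_cols = [c for c in df.columns if c[:2] in ("x_", "y_")]

video_features = df[gaze_cols + pose_cols + au_cols + landmark_cols].copy()

# Zero out frames where the face was blacked out or detection failed,
# mirroring the pre-processing step described earlier.
video_features.loc[df["success"] != 1] = 0.0
```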

Page 13: Multimodal Analysis Of Stand-up Comedians

• Analyze features such as Action Units, gaze (y and z directions), pose (head rotation), various facial landmark points, frown, and eyebrow raise.
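A minimal sketch of how the frown and eyebrow-raise distances could be derived from the 68-point landmark columns; the specific landmark indices and helper names are illustrative assumptions, not necessarily the ones used in the project:

```python
import numpy as np
import pandas as pd

def landmark(df: pd.DataFrame, i: int) -> np.ndarray:
    """Return (x, y) pixel coordinates of landmark i for every frame."""
    return df[[f"x_{i}", f"y_{i}"]].to_numpy()

def frown_distance(df: pd.DataFrame) -> np.ndarray:
    # Distance between the inner ends of the two eyebrows (points 21 and 22
    # in the 68-point scheme); a smaller distance suggests a frown.
    return np.linalg.norm(landmark(df, 21) - landmark(df, 22), axis=1)

def eyebrow_raise(df: pd.DataFrame) -> np.ndarray:
    # Vertical gap between a mid-eyebrow point and the upper eyelid below it
    # (points 19 and 37 here, chosen for illustration); the gap grows as the
    # brow is raised, since image y-coordinates increase downward.
    return landmark(df, 37)[:, 1] - landmark(df, 19)[:, 1]
```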

Page 14: Multimodal Analysis Of Stand-up Comedians

IBM Watson

● Pauses

● Last pause

● Word elongation

● Sentiments

Feature Engineering - Textual
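A minimal sketch of deriving the pause and elongation features from word-level timestamps; IBM Watson Speech to Text returns per-word start and end times, but the tuple format, pause threshold, and elongation proxy below are illustrative assumptions:

```python
def textual_features(word_timestamps, min_pause=0.3):
    """word_timestamps: list of (word, start_sec, end_sec) tuples for one segment."""
    pauses = []
    for (_, _, prev_end), (_, next_start, _) in zip(word_timestamps, word_timestamps[1:]):
        gap = next_start - prev_end
        if gap >= min_pause:                  # count only noticeable silences
            pauses.append(gap)

    # Rough elongation proxy: seconds per character, so a drawn-out word
    # stands out against normally spoken ones.
    elongations = [(end - start) / max(len(word), 1)
                   for word, start, end in word_timestamps]

    return {
        "num_pauses": len(pauses),
        "last_pause_length": pauses[-1] if pauses else 0.0,
        "max_elongation": max(elongations) if elongations else 0.0,
    }
```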

Page 15: Multimodal Analysis Of Stand-up Comedians

H1: AU-related features

Feature Analysis - Visual

[Figures: AU 07 (Lid Tightener), AU 14 (Dimpler)]

Page 16: Multimodal Analysis Of Stand-up Comedians

H1: Facial features

Feature Analysis - Visual

[Figure: Frown (distance)]

Page 17: Multimodal Analysis Of Stand-up Comedians

H2: Pause-related features

Feature Analysis - Textual

[Figures: Last pause length, Number of pauses]

Page 18: Multimodal Analysis Of Stand-up Comedians

H3: Pitch-, loudness-, and energy-related features

Feature Analysis - Audio

[Figures: Pitch variation, Loudness variation]

Page 19: Multimodal Analysis Of Stand-up Comedians

H3: Pitch-, loudness-, and energy-related features

Feature Analysis - Audio

[Figures: Energy variation, Energy mean]

Page 20: Multimodal Analysis Of Stand-up Comedians

Multimodal analysis using a boosted decision tree classifier

Machine Learning
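A minimal sketch of this setup using XGBoost's scikit-learn API, assuming the aggregated audio, video, and textual features have been joined into one table per segment with a laughter-level label; the file name, label column, and hyperparameters are illustrative:

```python
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical per-segment feature table: concatenated audio, video, and
# textual features plus a 'laughter' label (0 = no, 1 = small, 2 = big).
data = pd.read_csv("segment_features.csv")
X = data.drop(columns=["laughter"])
y = data["laughter"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = xgb.XGBClassifier(
    objective="multi:softprob",   # three-way laughter-level classification
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```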

Page 21: Multimodal Analysis Of Stand-up Comedians

H1: Certain facial expressions could contribute to laughter

H2: Pauses and word elongation contribute to laughter

Results

Page 22: Multimodal Analysis Of Stand-up Comedians

H3: Voice modulation (pitch and intensity changes) can also play a crucial role

XGBoost

Page 23: Multimodal Analysis Of Stand-up Comedians

Early Fusion:

● Min video frames = 100 frames/segment

● Min audio frames = 30 frames/segment

● Min text = 0 words/segment

● No good way of taking an equal number of frames from each modality → early fusion is difficult

H4: Laughter is sequential in nature:

LSTM - Challenge

Page 24: Multimodal Analysis Of Stand-up Comedians

H4: Laughter is sequential in nature:

Late Fusion
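A minimal sketch of late fusion for this setting: train one boosted-tree classifier per modality on its own segment-level features and combine their predicted class probabilities; the weighting scheme and function signature are illustrative assumptions:

```python
import numpy as np
import xgboost as xgb

def late_fusion(train_sets, y_train, test_sets, weights=None):
    """train_sets/test_sets: per-modality feature matrices (audio, video, text),
    one row per segment; returns fused laughter-level predictions."""
    weights = weights or [1.0] * len(train_sets)
    fused = None
    for X_train, X_test, w in zip(train_sets, test_sets, weights):
        clf = xgb.XGBClassifier(objective="multi:softprob").fit(X_train, y_train)
        probs = w * clf.predict_proba(X_test)      # per-class probabilities
        fused = probs if fused is None else fused + probs
    # Weighted sum of probabilities across modalities; pick the top class.
    return np.argmax(fused, axis=1)
```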

Page 25: Multimodal Analysis Of Stand-up Comedians

To do

● Tune LSTM

● Try other classifiers:

✧ SVM

✧ Naive Bayes

Page 26: Multimodal Analysis Of Stand-up Comedians

Thank you.