emotion detection for - gtc on demand · emotion detection for mobile jay turcot director of...

@affectiva

Facial expression and emotion detection for mobile

Jay Turcot

Director of Applied AI

Affectiva

What if technology could identify emotions as humans can?

Our vision is to humanize technology with Emotion Intelligence by enabling machines to be emotion-aware and by allowing businesses to get emotion analytics.

How it works

• Face detection and tracking
• Facial action & attribute classification
• Facial expression interpretation

Task: Facial expression recognition

• Multi-attribute classification (20+ classes)
• Upright, fixed-size, grayscale
• Fast enough to run on-device!

Example attributes: brow furrow, brow raise, smile
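In multi-attribute classification, several attributes (for example a smile and a brow raise) can be present in the same face, so each class gets its own yes/no decision. A minimal sketch, assuming the network emits one independent logit per attribute and that a sigmoid with a 0.5 threshold is used; the attribute names come from the slide, while the logit values and the threshold are hypothetical:

```python
import math

# Three of the 20+ attributes named on the slide.
ATTRIBUTES = ["brow_furrow", "brow_raise", "smile"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(logits, threshold=0.5):
    """Return every attribute whose independent probability exceeds threshold."""
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(ATTRIBUTES, probs) if p > threshold]

# Hypothetical logits for one face crop (assumption, not real model output).
active = decode([-2.0, 1.5, 3.0])
print(active)
```

Unlike a softmax, the per-attribute sigmoid lets zero, one, or many attributes fire at once.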

Emotion AI platform built on deep learning

• Input: labeled and unlabeled videos (+voice) data. Meta data. Latest training used 1M+ images.
• Convolutional Neural Networks
• Output: 11 facial expressions, gender

• Sadness, Joy, Anger, Surprise, Contempt, Disgust, Fear
• Age, Ethnicity, Gender

Training Setup

FAST
• NVIDIA Titan X (Pascal): 12 GB, 3584 CUDA cores
• NVIDIA CUDA® Deep Neural Network: CUDA 8.0, cuDNN 5

SIMPLE
• Keras + TensorFlow, Docker: TensorFlow 1.0, nvidia-docker

A few tips on training

Use all annotated data available!

Use every frame in a video as well as partially annotated data

[Figure: three bar charts (Training Loss, Testing Loss, Accuracy (overall)) comparing three training sets: sampled frames, all frames, and all frames + data w/ partial annotation]
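The slides do not say how partially annotated frames enter training. One common approach, shown here purely as an assumption, is to mask the loss so that only the attributes that were actually labeled contribute. A minimal sketch, where `masked_bce` is a hypothetical helper and `None` marks a missing annotation:

```python
import math

def masked_bce(probs, labels):
    """Mean binary cross-entropy over annotated attributes only.

    labels[i] is 1, 0, or None (attribute not annotated for this frame).
    """
    eps = 1e-7  # keeps log() away from zero
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # unannotated attribute: contributes no loss
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0

# Hypothetical frame where only the first of three attributes is labeled.
loss = masked_bce([0.9, 0.2, 0.5], [1, None, None])
print(loss)
```

With this kind of masking, a frame labeled for a single attribute can still be used without inventing labels for the others.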

Balancing data isn’t strictly required

Classes with ~3 times more data

[Figure: bar chart comparing balanced sampling and unique sampling for Gender, Smile, Brow raise, and Brow furrow]

Balanced: 90.5%
Natural (unbalanced): 90.8%
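To make the two sampling strategies concrete, here is a small sketch under stated assumptions: "natural" sampling draws examples uniformly from the dataset, and "balanced" sampling first picks a class uniformly and then an example from that class. The toy dataset, its 3:1 ratio, and both helper names are hypothetical.

```python
import random
from collections import Counter

def natural_sample(items, k, rng):
    """Draw k examples uniformly, so the dataset's class ratio is preserved."""
    return [rng.choice(items) for _ in range(k)]

def balanced_sample(items, k, rng):
    """Draw k examples, picking the class uniformly first, then an example."""
    by_label = {}
    for example, label in items:
        by_label.setdefault(label, []).append((example, label))
    labels = sorted(by_label)
    return [rng.choice(by_label[rng.choice(labels)]) for _ in range(k)]

rng = random.Random(0)
# Hypothetical toy data: one class has 3 times more examples than the other.
data = [(i, "no_smile") for i in range(300)] + [(i, "smile") for i in range(100)]
nat = Counter(label for _, label in natural_sample(data, 4000, rng))
bal = Counter(label for _, label in balanced_sample(data, 4000, rng))
print(nat, bal)
```

Natural sampling keeps roughly the 3:1 ratio, while balanced sampling yields roughly equal counts per class.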

Building fast models

Speeding up deep learning models

Several approaches are used for speeding up models:

• Model pruning
• Model compression
• Model quantization
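As an illustration of the first approach, magnitude pruning removes the weights with the smallest absolute values. This is a sketch of one common pruning criterion, not necessarily the one used in this work; `prune_by_magnitude` and the weight values are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices of the n_prune weights with the smallest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

# Hypothetical weights of one small layer, pruned to 50% sparsity.
pruned = prune_by_magnitude([0.8, -0.05, 0.3, 0.01, -0.6, 0.02], sparsity=0.5)
print(pruned)
```

Zeroed weights only translate into speed when the runtime can skip them, for example by removing whole filters or using sparse kernels.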

Match architecture to the problem

Avoid network architectures that are larger than needed

Problem: Object detection (& classification)
• Details: 1000 classes; ~224x224 pixels, color; objects with arbitrary scales / positions / orientations
• Architectures: VGG-16 [1], 16 layers (~30.9 GOP/image); ResNet [2], 152 layers (~22.6 GOP/image); others: Inception v4, E-Net

Problem: Facial action & attribute classification
• Details: 20+ classes; ~100x100 pixels, grayscale; faces only, upright & registered
• Architectures: ?

Lots of big filters are expensive!

Use smaller filters to condense information
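A rough cost model shows why. The sketch below counts multiply-accumulates (MACs) for a stride-1, same-padded convolution and compares a direct 5x5 layer with a 1x1 "condensing" layer followed by the same 5x5 layer on fewer channels. The channel counts are hypothetical, and reading "smaller filters" as 1x1 bottlenecks is an assumption rather than something the slide states.

```python
def conv_macs(height, width, c_in, c_out, k):
    """Multiply-accumulates for a stride-1, same-padded k x k convolution."""
    return height * width * c_in * c_out * k * k

H = W = 100    # roughly the face-crop size quoted in the slides
C = 64         # hypothetical number of input and output channels
REDUCED = 16   # hypothetical bottleneck width

# One big layer: 5x5 filters applied to all 64 input channels.
direct = conv_macs(H, W, C, C, 5)

# Condense to 16 channels with 1x1 filters, then apply the 5x5 filters.
bottleneck = conv_macs(H, W, C, REDUCED, 1) + conv_macs(H, W, REDUCED, C, 5)

print(direct / 1e6, bottleneck / 1e6)  # both in millions of MACs
```

In this toy setting the bottleneck variant needs about a quarter of the MACs of the direct 5x5 layer.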

Look for redundancy in your layers

Small filters are faster… but can be highly correlated
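One simple way to look for such redundancy, offered as a sketch and not as the method used in this work, is to compute the pairwise Pearson correlation between the flattened filters of a layer and flag pairs above a threshold. The filters and the 0.95 threshold below are hypothetical.

```python
import math

def correlation(a, b):
    """Pearson correlation between two flattened filters (non-constant inputs)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)

def redundant_pairs(filters, threshold=0.95):
    """Return index pairs of filters whose absolute correlation >= threshold."""
    pairs = []
    for i in range(len(filters)):
        for j in range(i + 1, len(filters)):
            if abs(correlation(filters[i], filters[j])) >= threshold:
                pairs.append((i, j))
    return pairs

# Hypothetical 3x3 filters, flattened row by row.
filters = [
    [1, 0, -1, 2, 0, -2, 1, 0, -1],   # Sobel-style gradient filter
    [2, 0, -2, 4, 0, -4, 2, 0, -2],   # scaled copy of filter 0
    [1, 2, 1, 0, 0, 0, -1, -2, -1],   # the same filter rotated by 90 degrees
]
print(redundant_pairs(filters))
```

Filters flagged this way are candidates for removal, since a scaled copy adds computation without adding new information.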

Small networks still work very well…

… for simpler problems

[Figure: two bar charts comparing CNN (small), CNN (medium), and ENet]
• MFLOPs (bar values): 7.14, 21.97, 15.96
• Accuracy (bar values): 92.99%, 93.09%, 92.97%

Result

• Smaller models (<10 MFLOPs) still outperform traditional methods
  • Don’t just copy architectures like VGG (30+ GOP/image)
  • Explore network architectures that prioritize efficiency (E-Net)
• Other methods still apply to improve runtime performance:
  • Quantize models to 8-bit fixed point or binary
  • Prune models
• Models deployed in our on-device SDK: http://developer.affectiva.com/
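As an illustration of the 8-bit option, the sketch below applies symmetric linear quantization with a single shared scale. It is a simplified stand-in for a real fixed-point scheme; the helper names and weight values are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [v * scale for v in q]

# Hypothetical weights of one layer.
weights = [0.50, -0.25, 0.10, -1.00]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, restored)
```

Each restored weight differs from the original by at most half a quantization step, which is the price paid for storing one byte per weight.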

Questions
