Multi‐Modal Affective Data Analytics
Mykola Pechenizkiy
SDAD 2012 @ ECML PKDD 2012, 2 September 2012
Bristol, UK
http://www.win.tue.nl/stressatwork
Affective data Social media
• Social media generates masses of affective data related to people’s emotions, sentiments and opinions
• In the recent past, this data was used mainly for marketing needs
– Web analytics, social media
• Whatever the incentive to study it, sentiment classification has become much more accurate
Multilingual Sentiment Classification
Rule‐based polarity detection
Rule‐based emission model: 8 kinds of rules
SentiCorr
How much positive and negative content do we read or write?
Mobile SentiCorr App
What a fantastic idea, now if
This app is designed to make someone else (or a computer) read our e‐mails for us and “protect” us from WHAT??? How lazy can we get? Like someone commented on CNN reactions, if we are getting upset by tones and/or scoldings in e‐mails, we certainly have bigger issues that need to be dealt with. C’mon, guys, go invent something useful. Not to mention, does it detect irony? Will it weed out the liars? Pleeeeezzzeeee, what a WASTE OF SOMEONE’S COLLEGIATE TIME AND ENERGY. Don’t we have houses to clean and poor people to feed and old folks to help with their shopping? Go do something useful with your time, inventors of this app!!!
Great idea! Get it on iOS soon (anonymous)
“Stress is often made worse by the anticipation of an unpleasant event and actually dissipated once you tackle the problem directly”
Pamela Briggs, British Psychological Society
OLAP Style Exploration of Data Summaries
Exploration of Individual Cases, e.g. e‐Mails
Sentiment vs. Fact Classification
News in media or business is considered to be sentiment neutral,
– but it often contains positive or negative information, e.g.
“You will be fired in 3 months because of the serious budget cuts.”
– no sentiment, but negative information
Similarly, in work‐related correspondence there could be stressful information:
– How can we identify it?
Sentiment discovery: State‐of‐the‐art
• Sentiment analysis/classification is mature!
– Commercial products, free services, open‐source, variety of apps, evolves in many directions
• Several great overviews:
– Sentiment Analysis in Practice ‐ ICDM2011 tutorial by Tiger Zhang (eBay Research Labs)
• http://web.cs.dal.ca/~yongzhen/publication/paper/ICDM2011_SentimentAnalysisInPracticeTutorial.pdf
– Modeling Opinions and Beyond in Social Media by Bing Liu (UIC)
• http://kdd2012.sigkdd.org/sites/images/summerschool/Bing‐Liu.pptx
Outline
• Framework for Stress Analytics:
– Data management, OLAP support
– Shape‐based Query‐by‐Example
• Stress detection from speech and GSR
– Predictive features and classification
– From controlled experiments to real life
What is stress? Is it a bad thing?
Stress in NL according to Coosto.nl
Not really job related
Impact of Stress at Work
WHO: by 2020, the top 5 diseases will be stress related.
USA: health care expenditures are ~50% greater for workers who report high levels of stress at work (J. Occup. Env. Med, 40:843‐854).
The Netherlands (TNO, 2006; TU/e Cursor 2012):
• The direct costs of stress are 4 billion euro per year.
• Every year 150,000 to 300,000 employees become ill because of stress at work.
• 1/7 disabled because of stress at work.
• At TU Delft, 53% of surveyed students indicated that they experienced huge stress during their studies.
What do organizations try (not) to do?
Discuss psychological load (28%)
Change work processes (17%)
Source: (TNO, dossier Werkdruk)
Extend regulations (9%)
Reduce workload (33%)
Improve managers’ skills (13%)
Improve work/life balance (14%)
What can go wrong?
• They are not always aware of the problem or don’t know the exact cause
• People do not always want to share what they experience with others
• Not always timely enough
• Expensive to organize meetings with psychologists, interventions
• The individual causes are different and not always well understood
• Giving practical advice is not trivial
Types of Stress and Stressors
Different types of stress:
• Survival stress – a response to a physical danger
• Environmental stress ‐ noise, crowding, pressure from work or family
• Internal stress ‐ worrying about things we can't control; putting ourselves in situations we know will cause us stress (addicted to stress – expanding the to‐do list with more and more conference deadlines)
• Fatigue and overwork ‐ in a long term perspective
Stress affects both body and mind
Types of Stress and Stressors
Three kinds of stress:
• Acute: caused by an acute short‐term stress factor.
• Episodic acute: occurs more frequently & periodically.
• Chronic: caused by long‐term stress factors ‐ harmful.
Factors causing stress@work:
• long work hours, work overload, time pressure, difficult, demanding or complex tasks, high responsibility, lack of breaks, lack of training
• conflicts, underpromotion, job insecurity, lack of variety, and poor physical work conditions (limited space, temperature and lighting conditions)
Concept
Be‐eep!Be‐eep!
StressAnalytics
Make people aware of their stress and stressors
Overview of stressors
Exploration of relations
Access to evidence, i.e. annotated, measured stress
Empowerment by awareness (+ implicit/explicit advice)
Our approach to StressAnalytics
What, When, Where, with Whom
Physiological signs
OLAP cube
Pattern Mining
Our approach to Stress Analytics
• Make a person aware of what is happening
– how they spend their time, and when and from where the stress comes in
• Provide valuable input for pattern mining/knowledge discovery
– Much richer data sources
• Visual analytics
– Interactive exploration of stress‐related data
– Collecting subjective data/labels from a person through the interaction
Framework layers (figure):
• GUI: exploration, interaction, visual analytics
• OLAP: zoom in & out, slice & dice
• Data mining: pattern mining, prediction, query‐by‐example
• Feature extraction, peak/change detection, classification
• Raw data, objective evidence
Data sources (figure):
• Physiological signs related data: GSR, temp., voice, heart rate, facial expressions
• External user‐related data: KPI, e‐mail, calendar, social media, news
• External environment: temperature, lighting, noise, air conditioning
Evidence: physiol. signals & external sources
GSR, Temperature, Speech, Facial expressions, Sentiment in text
Alignment of Information Sources
• What a person reads and writes: SentiCorr
• What a person does in general according to the agenda
• Environment context (lighting, noise, temp etc.)
• Annotate data from video, sound, text processing, and vital signs
• …
• What a person does with the computer
– http://wakoopa.com/
Different aspects of pre‐processing, storing, managing
Stress Data Cube/OLAP
Quick data summaries w.r.t. predefined dimensions
Stress Analytics Visualization
• OLAP‐style exploration: selecting multiple dimensions, zoom‐in, zoom‐out.
• Navigating to the evidence, i.e. raw data:
– GSR, skin temperature, speech, and email
• Shape‐based time‐series similarity search
– State‐of‐the‐art UCR‐Suite (Keogh et al.)
• Demo: http://www.win.tue.nl:8080/saw_analytics/stress_visualization.jsp
OLAP system, a Star Schema
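A star schema keeps one central fact table of measurements with foreign keys into small dimension tables, so that OLAP roll-ups and slices become joins with GROUP BY. The sketch below illustrates the idea with Python's standard sqlite3 module. All table and column names (dim_time, dim_activity, fact_stress, stress_level) and the sample rows are hypothetical and are not taken from the talk.

```python
import sqlite3

def demo_cube():
    """Build a tiny in-memory star schema (hypothetical tables and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER);
        CREATE TABLE dim_activity (activity_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_stress (
            time_id INTEGER REFERENCES dim_time(time_id),
            activity_id INTEGER REFERENCES dim_activity(activity_id),
            stress_level REAL
        );
    """)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                     [(1, "Mon", 9), (2, "Mon", 14), (3, "Tue", 9)])
    conn.executemany("INSERT INTO dim_activity VALUES (?, ?)",
                     [(1, "meeting"), (2, "e-mail")])
    conn.executemany("INSERT INTO fact_stress VALUES (?, ?, ?)",
                     [(1, 1, 0.8), (2, 2, 0.2), (3, 1, 0.6), (3, 2, 0.4)])
    return conn

def roll_up_by_activity(conn):
    """Roll up: average stress level per activity over all time."""
    return conn.execute("""
        SELECT a.name, AVG(f.stress_level)
        FROM fact_stress f
        JOIN dim_activity a ON a.activity_id = f.activity_id
        GROUP BY a.name ORDER BY a.name
    """).fetchall()

def slice_by_day(conn, day):
    """Slice: fix the day dimension, then summarize per activity."""
    return conn.execute("""
        SELECT a.name, AVG(f.stress_level)
        FROM fact_stress f
        JOIN dim_activity a ON a.activity_id = f.activity_id
        JOIN dim_time t ON t.time_id = f.time_id
        WHERE t.day = ?
        GROUP BY a.name ORDER BY a.name
    """, (day,)).fetchall()
```

Adding a dimension (for example location or contact person) would mean one more small table and one more foreign key in the fact table; the existing queries stay unchanged.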
Shape‐Based Query‐by‐Example
• Given a subsequence s of a GSR time series
• Find time‐series subsequences with a shape similar to s
Figure: Query, Result
Shape‐based QBE
• Euclidean Distance
• Dynamic Time Warping (DTW)
State‐of‐the‐art UCR‐Suite (Keogh et al.)
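For two equal-length sequences q and c, the Euclidean distance is sqrt(sum_i (q_i - c_i)^2); DTW relaxes the one-to-one alignment and lets points be matched across small shifts in time. The following is a brute-force sketch of shape-based query-by-example using both distances on z-normalized windows. It is not the UCR Suite, which adds lower bounds and early abandoning to make the search fast; the function names are chosen here for illustration only.

```python
import math

def znorm(xs):
    """Z-normalize so that only the shape matters, not offset or scale."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    if std == 0:
        return [0.0] * len(xs)
    return [(x - mean) / std for x in xs]

def euclidean(a, b):
    """Point-by-point distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Dynamic Time Warping distance via the classic O(n*m) recurrence."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def best_match(series, query, dist=euclidean):
    """Slide the query over the series; return (start index, distance) of the best window."""
    q = znorm(query)
    best_pos, best_d = -1, float("inf")
    for start in range(len(series) - len(query) + 1):
        window = znorm(series[start:start + len(query)])
        d = dist(window, q)
        if d < best_d:
            best_pos, best_d = start, d
    return best_pos, best_d
```

Passing `dist=dtw` to `best_match` switches the search to DTW at a higher cost per window.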
How to measure stress
Determine stress level based on observed sweat production
Detection and Categorization of Stress
Based on GSR data alone ‐ not as easy as the following figure may suggest:
Challenges in Stress Detection
• All kinds of noise, e.g. losing contact with the skin
• Activity (exercising), environment (cold/hot) context and personal differences may impact the GSR we observe
Interpretation isn’t straightforward
Detection as Classification
GSR features
• Total number of GSR responses.
• The sum of GSR amplitudes.
• The sum of response rise times.
• The sum of response energy.
• Mean, SD, min and max of GSR.
• Mean, min and max of peak height.
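The features listed here can be computed from one window of GSR samples roughly as follows. This is a sketch under stated assumptions, not the extraction used in the talk: a "response" is taken to be a rise from a local minimum to the next local maximum of at least `min_amp`, and "energy" is taken to be the summed squared elevation above the onset level during the rise. Both definitions, the parameter names and the default threshold are hypothetical.

```python
import statistics

def find_responses(gsr, min_amp=0.05):
    """Return (onset_index, peak_index) for every rise of at least min_amp.

    Assumption: a rise still in progress at the end of the window is ignored.
    """
    responses = []
    onset = 0
    for i in range(1, len(gsr)):
        if gsr[i] < gsr[i - 1]:                 # the signal starts falling at i
            peak = i - 1
            if gsr[peak] - gsr[onset] >= min_amp:
                responses.append((onset, peak))
            onset = i                            # follow the signal down to the next minimum
    return responses

def gsr_features(gsr, fs=1.0, min_amp=0.05):
    """Compute the listed features for one window sampled at fs Hz."""
    responses = find_responses(gsr, min_amp)
    peaks = [gsr[p] for _, p in responses]
    return {
        "n_responses": len(responses),
        "sum_amplitude": sum(gsr[p] - gsr[o] for o, p in responses),
        "sum_rise_time": sum((p - o) / fs for o, p in responses),
        # Assumed energy definition: squared elevation above onset, summed over the rise.
        "sum_energy": sum(
            sum((gsr[k] - gsr[o]) ** 2 for k in range(o, p + 1)) / fs
            for o, p in responses
        ),
        "mean": statistics.mean(gsr),
        "sd": statistics.pstdev(gsr),
        "min": min(gsr),
        "max": max(gsr),
        "peak_mean": statistics.mean(peaks) if peaks else 0.0,
        "peak_min": min(peaks) if peaks else 0.0,
        "peak_max": max(peaks) if peaks else 0.0,
    }
```

In practice the signal would be smoothed first, since sensor noise otherwise produces many spurious local maxima.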
Adding more data to disambiguate
• Skin and room temperature, noise, accelerometer, voice, face, …
e.g. activity recognition can help
Writing vs. typing vs. walking vs. teaching vs …
Analyzing accelerometer data only (wrist band)
Uncontrolled and semi‐controlled
• Philips Research employees wearing the device during their working hours
• Students taking written and multiple‐choice exams
• Students presenting demos/posters with course project results
• More to come via HumanCapitalCare
Experiment demo
Measuring GSR in (un)controlled settings
• Philips prototype
• Self‐made, based on the LEGO Mindstorms NXT
Multi‐Source Affective Data Classification
Stress/Emotion classification from text, GSR & speech
GSR & other sensors
Facial expression analysis
Automatic Stress Detection
Setups shown in the figure:
• Single source: speech → speech features → classification (speech model); GSR → GSR features → classification (GSR model)
• Feature enrichment: speech features and GSR features → combine features → classification
• Ensemble learning: speech features → classification, GSR features → classification, then an ensemble of the two
Stress and Skin Conductance
Stress → changes in the Autonomic Nervous System (ANS) → activation of sweat glands → changes in the amount of sweat produced → changes of skin conductance
• Relax → skin is drier → skin conductance is lower
• Stress → sweat increases → skin conductance is higher
GSR features
• Total number of GSR responses.
• The sum of GSR amplitudes.
• The sum of response rise times.
• The sum of response energy.
• Mean, SD, min and max of GSR.
• Mean, min and max of peak height.
Change detection approach
Online settings
Preprocessing steps
Stress and Speech
Stress → respiration rate increases → increased subglottal pressure → increased pitch
Voice is a good indicator of stress [Scherer, 1986]
Speech Features
• Voiced and unvoiced speech
• Pitch / Fundamental frequency
• Mel Frequency Cepstral Coefficients (MFCCs): coefficients that approximate the human auditory response.
MFCC pipeline: audio (temporal) → FFT → frequency → Mel scale filter → filtered frequency → log → log power → DCT → DCT representation → store the first coefficients → MFCCs
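The MFCC pipeline can be sketched for a single audio frame in plain Python as below. This is a didactic sketch, not the feature extractor used in the talk: it uses a naive DFT instead of an FFT, a Hamming window, the common 2595·log10(1 + f/700) mel formula, and default values (20 filters, 13 coefficients) that are assumptions made here.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(abs(s) ** 2 / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):
            filt[k] = (k - lo) / (mid - lo)      # rising edge of the triangle
        for k in range(mid, hi):
            filt[k] = (hi - k) / (hi - mid)      # falling edge of the triangle
        bank.append(filt)
    return bank

def dct2(values, n_out):
    """Unnormalized DCT-II, keeping only the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_out)]

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Audio frame -> spectrum -> mel filterbank -> log -> DCT -> first coefficients."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spec = power_spectrum(windowed)
    bank = mel_filterbank(n_filters, n, sample_rate)
    # Floor the energies to avoid log(0) for empty or silent filters.
    log_energies = [math.log(max(sum(f * p for f, p in zip(filt, spec)), 1e-10))
                    for filt in bank]
    return dct2(log_energies, n_coeffs)
```

For example, `mfcc([math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)], 8000)` returns 13 coefficients for a 32 ms frame of a 440 Hz tone.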
Classification Methods
• Support Vector Machine (SVM): state of the art.
• Decision Tree classifier.
• K‐means using Vector Quantization (VQ). This method is chosen as a baseline.
• Gaussian Mixture Model (GMM). This method works well for the speaker recognition task.
• Change detectors: ADWIN, thresholding
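Of the change detectors named in the list, thresholding is the simplest to illustrate. The sketch below flags a change whenever a new sample deviates from the mean of a trailing window by more than k standard deviations. It is one possible thresholding rule assumed here for illustration, and it is not ADWIN, which maintains an adaptive window and uses a statistical test to decide where to cut it. The parameter names and defaults are hypothetical.

```python
from collections import deque
import statistics

def threshold_detector(stream, window=5, k=3.0):
    """Return the indices at which a sample leaves the recent baseline."""
    recent = deque(maxlen=window)
    changes = []
    for i, x in enumerate(stream):
        if len(recent) == window:
            mean = statistics.mean(recent)
            sd = statistics.pstdev(recent)
            # Guard against a zero standard deviation on flat segments.
            if abs(x - mean) > k * max(sd, 1e-6):
                changes.append(i)
                recent.clear()       # restart the baseline after a change
        recent.append(x)
    return changes
```

The detector processes one sample at a time and keeps only `window` values in memory, which suits the online setting.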
Stress Dataset
• Three types of GSR patterns.
• First type:
• Second type:
• Third type:
Aligning of data sources
Figure: the GSR and speech streams are aligned in time and split into instances (Instance 1, Instance 2, Instance 3); the marked interval is 60 seconds.
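Aligning the two sources can be sketched as bucketing time-stamped samples from each stream into fixed-length instances and keeping only the instances present in both. The 60-second default below is an assumption based on the interval shown in the figure; the data layout (lists of (timestamp in seconds, value) pairs) and function names are hypothetical.

```python
def to_instances(samples, length=60.0):
    """Group (timestamp_seconds, value) samples by instance index."""
    instances = {}
    for t, v in samples:
        instances.setdefault(int(t // length), []).append(v)
    return instances

def align(gsr, speech, length=60.0):
    """Return (instance_index, gsr_values, speech_values) for instances covered by both streams."""
    g = to_instances(gsr, length)
    s = to_instances(speech, length)
    return [(k, g[k], s[k]) for k in sorted(g.keys() & s.keys())]
```

Instances covered by only one stream are dropped, which matters in practice because a person does not speak all the time.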
Stress Dataset: Speech Features
Stress Model using GSR features
• SVM outperformed the other methods.
• Recognizing light vs heavy workload is harder than recovery vs heavy workload.
Accuracy (percent), 10‐times 10‐fold CV (not subject independent)
Columns: Recovery vs workloads / Recovery vs heavy workload / Light vs heavy workload
• k‐means: 46.12 / 55.54 / 53.21
• GMM: 70.51 / 74.9 / 66.82
• SVM: 79.66 / 80.72 / 70.6
• Decision Tree: 73.45 / 77.81 / 62.52
Stress Model using speech features
• SVM outperforms the other classifiers.
• K‐means and GMM do not perform well for speech.
• MFCC is a good indicator for stress detection.
Accuracy (percent)
Columns: Pitch / MFCC / MFCC‐Pitch / RASTA
• k‐means: 49.65 / 55.39 / 49.17 / 50.6
• GMM: 58.82 / 56.78 / 59.08 / 52.3
• SVM: 62.08 / 92.39 / 92.56 / 91.69
• Decision Tree: 55.6 / 68.86 / 70.69 / 71.47
1‐subject‐leave‐out cross‐validation (subject independent model)
GSR tasks, accuracy (percent)
Columns: Recovery vs workloads / Recovery vs heavy workload / Light vs heavy workload
• 10‐times 10‐fold CV: 79.66 / 80.72 / 70.6
• 1‐subject‐leave‐out CV: 74.84 / 75 / 63.04
Speech features, accuracy (percent)
Columns: Pitch / MFCC / MFCC‐Pitch / RASTA PLP
• 10‐times 10‐fold CV: 62.08 / 92.39 / 92.56 / 91.69
• 1‐subject‐leave‐out CV: 53.04 / 67.82 / 70 / 72.17
It is better to address the problem of stress detection using a subject dependent model
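The difference between the two evaluation protocols lies in how instances are split: in 10-fold CV, instances of the same person can land in both training and test sets, while in 1-subject-leave-out CV every fold holds out all instances of one person. A minimal sketch of the latter split is given below; the function name and the representation of subjects as a list of ids, one per instance, are assumptions made for illustration.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test
```

A model evaluated this way never sees data from the test person during training, so its accuracy estimates how well it transfers to a new, unseen person.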
Fusion Approaches
• Feature enrichment
• Ensemble learning
Fusion of GSR and Speech
Accuracy (percent), light vs. heavy workload, balanced data
Columns: MFCC and GSR / MFCC‐Pitch and GSR / Pitch and GSR
• Enriching Feature Space: 90.73 / 91.34 / 69.04
• Logistic Regression as MetaLearner: 92.43 / 92.47 / 70.17
Kappa Agreement for Classifiers
• Measure agreement between two models using Cohen’s Kappa test.
• Kappa = 1: complete agreement.
• Kappa = 0: agreement no better than chance.
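Cohen's kappa compares the observed agreement p_o between two sets of predictions with the agreement p_e expected by chance from their label frequencies: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two equal-length lists of predicted labels; the function name is chosen here for illustration.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of predicted labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:                                        # both always predict the same single label
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

Identical predictions give 1, predictions that agree only as often as chance would suggest give values near 0, and systematic disagreement gives negative values.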
Stress detection summary
• Speech is more reliable (in lab settings) than GSR, but more subject dependent.
• SVM performs better on both GSR and speech signals.
• ADWIN & thresholding detectors do well on GSR
• Combining GSR and Speech is not trivial:
– Speech and GSR predictions are highly independent (low kappa value)
– This diversity may be exploited with dynamic integration methods
Further directions
• Extend the notion of stress (positive and negative) in the stress analytics framework.
• Stress analytics → affective data analytics
• Collect more data to enable the OLAP/KDD part of the framework.
• Combine with other signals, such as facial expression, heart rate, nutrition.
• Long path from lab setting to real‐life situation; but both are needed.
Is Acute Stress Good or Bad?
What is the Relaxation Then?
Is “Normal” Condition Good or Bad?
What if someone’s pattern looks like NNNNNNNNNNNNNNNN ……
Summary
• The fun parts come from
– The fact that not much is known about stress
– Playing with heterogeneous/multi‐modal data
– Multi‐disciplinary (data collection, data management, data mining, visual analytics)
– Engineering approach to data mining
• How to show the utility, i.e. what we do
– helps to better understand stress as a phenomenon, and the stressors, and how to
– helps people at the end
Take home messages
• Lab settings vs. real world
• Availability and quality of the signal
– Voice recorded
– Someone else’s voice recorded
– Noise and missing data, uncertainty
– A person cannot speak (during the meeting while someone else is speaking)
• Ground truth, labels, subjective vs. objective
• A large problem space
– If you know how to help us with any part of StressAnalytics, talk to me