Machine Learning Challenges for Automated Prompting in Smart Homes
DESCRIPTION
As the world's population ages, the prevalence of aging-related diseases such as dementia is increasing. Caring for individuals with dementia is frequently associated with extreme physical and emotional stress, which often leads to depression. Smart home technology and advances in machine learning techniques can provide innovative solutions to reduce caregiver burden. One key service that caregivers provide is prompting individuals with memory limitations to initiate and complete daily activities. We hypothesize that sensor technologies combined with machine learning techniques can automate the process of providing reminder-based interventions, or prompts. This dissertation addresses the machine learning challenges that arise in devising an effective automated prompting system. Our first goal is to emulate the natural interventions a caregiver provides to individuals with memory impairments, using a supervised machine learning approach to classify pre-segmented activity steps into prompt or no-prompt classes. However, the lack of training examples representing prompt situations causes an imbalanced class distribution. We propose two probabilistic oversampling techniques, RACOG and wRACOG, that help classifiers learn the "prompt" class better. Moreover, in certain prompt situations the sensor triggering signature is quite similar to situations in which the participant would probably need no prompt. The absence of sufficient data attributes to differentiate between the prompt and no-prompt classes causes class overlap. We propose ClusBUS, a clustering-based under-sampling technique that identifies ambiguous data regions. ClusBUS preprocesses the data to give more importance to the minority class during classification. Our second goal is to automatically detect activity errors in real time, while an individual performs an activity. We propose a collection of one-class classification-based algorithms, known as DERT, that learn only from normal activity patterns, without using any training examples of activity errors. When evaluated on unseen activity data, DERT is able to identify abnormalities or errors, which can be potential prompt situations. We validate the effectiveness of the proposed algorithms in predicting potential prompt situations on sensor data from ten activities of daily living, collected from 580 participants in two smart home studies.
TRANSCRIPT
Machine Learning Challenges for Automated Prompting in Smart Homes
Barnan Das
May 22, 2014
2
Older adult (65+) population in US
2009: 40mn
2030: 72mn
3
Alzheimer’s patients: 5 million
Unpaid caregivers: 15 million
Caregivers who report stress: 60%
4
5
“Machine learning algorithms trained on smart home sensor data can predict when an individual faces difficulty while performing everyday activities.”
6
7
Smart Home Studies
                  Study 1        Study 2
Participants      400            180
Activities        8              6
Activity Errors   Naturalistic   Naturalistic
8
Overview
Automated Prompting
• Emulating Caregiver Prompt Timing (Study 1)
  • Imbalanced Class Distribution
  • Class Overlap
• Detecting Activity Errors in Real Time (Study 1, 2)
  • One-Class Classification
9
Emulating Caregiver Prompt Timing
Raw Data
• 8 Daily Activities (Study 1)
• Prompts issued when errors were committed
Used by Algorithms
• 1 Activity Step = 1 Training Example
• 17 Engineered Features
• Binary class (0/1): {prompt, no-prompt}
10
Total # training examples: 3980
Class Distribution: 3.94% prompt class
12
Imbalanced Class Distribution
13
Existing Solutions
Preprocessing
Sampling
• Over-sampling the minority class
• Under-sampling the majority class
Oversampling
• Spatial location of training examples in Euclidean space
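Oversampling based on the spatial location of training examples in Euclidean space can be illustrated with a SMOTE-style interpolation. The sketch below is a minimal illustration, not the reference SMOTE implementation: the function name `smote_like_oversample`, the parameter `k`, and the toy points are assumptions made for this example.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the line segment between
    a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Toy minority class (hypothetical values)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=5)
```

Because every synthetic point is an interpolation between two existing minority points, the method depends entirely on distances in feature space, which is less natural for discrete attributes.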
14
Proposed Approach
Preprocessing technique to oversample minority class
• Approximate discrete probability distribution using Chow-Liu’s algorithm
• Generate new minority class data points using Gibbs sampling
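The two steps named on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the published RACOG code: it runs a single Markov chain, uses Laplace smoothing, and the names `chow_liu_parents`, `gibbs_oversample`, `burn_in`, and `lag`, as well as the toy data, are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information between discrete attributes i and j."""
    n = len(data)
    pi = Counter(r[i] for r in data)
    pj = Counter(r[j] for r in data)
    pij = Counter((r[i], r[j]) for r in data)
    return sum(c / n * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu_parents(data):
    """Chow-Liu dependence tree: maximum spanning tree over pairwise mutual
    information (Prim's algorithm), rooted at attribute 0."""
    d = len(data[0])
    mi = {frozenset(e): mutual_information(data, *e)
          for e in combinations(range(d), 2)}
    parent = {0: None}
    while len(parent) < d:
        i, j = max(((i, j) for i in parent for j in range(d) if j not in parent),
                   key=lambda e: mi[frozenset(e)])
        parent[j] = i
    return parent

def gibbs_oversample(minority, n_new, burn_in=50, lag=5, seed=0):
    """Draw n_new synthetic minority samples with a Gibbs sampler whose
    conditionals come from the Chow-Liu tree (assumed single chain)."""
    rng = random.Random(seed)
    d = len(minority[0])
    values = [sorted({r[i] for r in minority}) for i in range(d)]
    parent = chow_liu_parents(minority)
    children = {i: [c for c, p in parent.items() if p == i] for i in range(d)}

    def prob(i, v, state):
        # Laplace-smoothed P(x_i = v | x_parent(i)) from minority counts
        p = parent[i]
        rows = minority if p is None else [r for r in minority if r[p] == state[p]]
        return (sum(r[i] == v for r in rows) + 1) / (len(rows) + len(values[i]))

    samples, state, step = [], list(rng.choice(minority)), 0
    while len(samples) < n_new:
        for i in range(d):
            weights = []
            for v in values[i]:
                trial = state[:i] + [v] + state[i + 1:]
                w = prob(i, v, trial)          # term from the parent
                for c in children[i]:          # terms from the children
                    w *= prob(c, trial[c], trial)
                weights.append(w)
            state[i] = rng.choices(values[i], weights=weights)[0]
        step += 1
        if step > burn_in and (step - burn_in) % lag == 0:
            samples.append(tuple(state))       # keep every lag-th state
    return samples

# Toy discrete minority-class data (hypothetical)
minority = [(0, 1, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1), (1, 0, 1), (0, 1, 1)]
synthetic = gibbs_oversample(minority, n_new=4)
```

Discarding the burn-in states and keeping only every `lag`-th state reduces the correlation between consecutive samples drawn from the chain.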
17
Minority Class Samples
Majority Class Samples
Markov Chains
Gibbs Sampling
18
RACOG & wRACOG
(wrapper-based) RApidly COnverging Gibbs Sampler
Sample selection
• RACOG: Pre-defined lag on Markov chain
• wRACOG: Highest probability of misclassification by wrapper classifier
Stopping criteria
• RACOG: Pre-defined number of iterations
• wRACOG: No improvement of a performance measure
19
Experimental Setup
Datasets: Study 1 (Prompting), 9 UCI Datasets
Approaches: Baseline Classifier, SMOTE, SMOTEBoost, RUSBoost, Baseline Prompting, RACOG, wRACOG
Classifiers: C4.5 Decision Tree, SVM, K-Nearest Neighbor, Logistic Regression
20
Results (True Positive Rate)
[Bar chart: true positive rate (axis 0 to 1) for Baseline Classifier, SMOTE, SMOTEBoost, RUSBoost, Baseline Prompting, RACOG, and wRACOG]
21
Results (G-mean)
[Bar chart: G-mean (axis 0 to 1) for Baseline Classifier, SMOTE, SMOTEBoost, RUSBoost, Baseline Prompting, RACOG, and wRACOG]
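The two measures reported on these result slides can be computed from predictions as in the sketch below. True positive rate is the fraction of minority (prompt) examples that are correctly identified, and G-mean is the geometric mean of the true positive and true negative rates. The function name and the toy labels are assumptions for illustration.

```python
import math

def tpr_and_gmean(y_true, y_pred, positive=1):
    """Return (true positive rate, G-mean) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return tpr, math.sqrt(tpr * tnr)

# Toy imbalanced labels: 1 = prompt, 0 = no-prompt (hypothetical)
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
tpr, gmean = tpr_and_gmean(y_true, y_pred)

# A classifier that always predicts the majority class is 80% accurate on
# these labels, yet both measures are 0.
tpr_majority, gmean_majority = tpr_and_gmean(y_true, [0] * len(y_true))
```

Both measures stay at zero for a classifier that ignores the minority class, which is why they are more informative than accuracy on imbalanced data.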
23
Class Overlap
24
Class Overlap in Prompting Data
3-dimensional PCA plot of prompting data
25
Tomek Links
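A Tomek link is a pair of examples from opposite classes that are each other's nearest neighbour, so such pairs mark borderline or overlapping regions. The sketch below is a minimal brute-force illustration; the function name and the toy data are assumptions.

```python
import math

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different class labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]

# Toy data (hypothetical): points 0 and 1 are close but have different labels
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0), (5.2, 5.0)]
y = [0, 1, 0, 0, 1, 1]
links = tomek_links(X, y)
```

Removing the majority-class member of each link is one common way to clean the class boundary.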
26
Form clusters
Under-sampling clusters
Cluster-Based Under-Sampling
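The two steps on this slide, forming clusters and then under-sampling selected clusters, can be sketched as below. This is an assumption-laden illustration rather than the ClusBUS implementation: the cluster ids are hand-written here (in practice they could come from a clustering algorithm such as DBSCAN), and the rule "drop majority points from clusters whose minority share is at least `threshold` but below 1" is an assumed selection criterion.

```python
from collections import defaultdict

def cluster_based_undersample(y, cluster_ids, minority=1, threshold=0.25):
    """Return the indices to keep after removing majority-class points from
    clusters that mix both classes with a minority share >= threshold."""
    members = defaultdict(list)
    for idx, c in enumerate(cluster_ids):
        members[c].append(idx)
    keep = []
    for idxs in members.values():
        share = sum(y[i] == minority for i in idxs) / len(idxs)
        overlapping = threshold <= share < 1.0   # ambiguous region
        keep.extend(i for i in idxs if not overlapping or y[i] == minority)
    return sorted(keep)

# Toy labels and precomputed cluster ids (hypothetical)
y           = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]
cluster_ids = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2]
kept = cluster_based_undersample(y, cluster_ids)
```

In this toy example only cluster 1 has a minority share above the threshold, so its two majority points (indices 5 and 8) are dropped and everything else is kept.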
27
ClusBUS Ensemble
28
Experimental Setup
Dataset: Study 1 (Prompting)
Clustering Algorithm: DBSCAN
Approaches: Baseline, SMOTE, ClusBUS, ClusBUS Ensemble
Classifiers: C4.5 Decision Tree, Naive Bayes, K-Nearest Neighbor, SVM
29
Result (True Positive Rate)
[Bar chart: true positive rate (axis 0 to 1) of Baseline, SMOTE, ClusBUS, and ClusBUS Ensemble with the C4.5, Naïve Bayes, IBk, and SMO classifiers]
30
Result (G-mean)
[Bar chart: G-mean (axis 0 to 1) of Baseline, SMOTE, ClusBUS, and ClusBUS Ensemble with the C4.5, Naïve Bayes, IBk, and SMO classifiers]
32
Detecting Activity Errors in Real Time
Sensor events labeled with activity steps
Availability of information on activity errors
33
Basic Idea
Participants with no reported errors
One-Class Classifier
Participants who committed errors
Normal Activity Data
Train Test
Activity Data with Errors
Activity Data
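The train-on-normal, test-on-errors idea can be sketched with any one-class model. The example below deliberately uses a simple centroid-and-radius model as a stand-in; it is not a One-Class SVM and not the DERT algorithms. The class name, the `quantile` parameter, and the toy feature vectors are all assumptions for illustration.

```python
import math

class CentroidOneClass:
    """Stand-in one-class model: learns a centroid and radius from normal
    data only, then flags points outside the radius as errors."""

    def fit(self, X, quantile=0.95):
        d = len(X[0])
        self.center = [sum(r[k] for r in X) / len(X) for k in range(d)]
        dists = sorted(math.dist(r, self.center) for r in X)
        # Radius that covers roughly the given fraction of normal data
        self.radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
        return self

    def predict(self, X):
        return ["normal" if math.dist(r, self.center) <= self.radius else "error"
                for r in X]

# Training data: only normal activity feature vectors (hypothetical values)
normal_train = [(1.0, 2.0), (1.1, 1.9), (0.9, 2.1), (1.0, 2.2), (1.2, 2.0)]
model = CentroidOneClass().fit(normal_train)

# Test data: one normal-looking sample and one far from the normal region
predictions = model.predict([(1.0, 2.0), (3.0, 0.5)])
```

No error examples are needed at training time; anything sufficiently unlike the normal data is reported as a potential error.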
34
DERT Data
Raw Data
• 6 Daily Activities
• 580 Participants
• Annotated for error start times
Used by Algorithms
• 1 Sensor Event = 1 Training Example
• >70 Engineered Features
• One-class (1): {normal}
35
One-Class SVM
[Figure: plot with axes x1 and x2]
36
Model Selection
37
Activity Error Classification
WHY? To characterize change in daily activities of older adults
HOW? Sensor data
           Error Types   Accuracy*
Study 1    4             73%
Study 2    9             54%
*Using C4.5 decision tree and 10-fold CV
41
Activity Error Models
One-Class Multi-Class
42
Ensembles
One-Class SVM
Test Sample
Error Model
One-Class / Multi-Class
Logical AND
Normal/Error
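One way to read this diagram is that a test sample is labeled as an error only when the One-Class SVM and the error model both flag it. The sketch below shows that combination under this assumption; the function name and the boolean inputs are hypothetical, and the two component models are represented only by their per-sample decisions.

```python
def ensemble_labels(ocsvm_outlier, error_model_error):
    """Combine two per-sample boolean decisions with a logical AND:
    'error' only if both components flag the sample."""
    return ["error" if a and b else "normal"
            for a, b in zip(ocsvm_outlier, error_model_error)]

# Hypothetical decisions of the two components for four test samples
ocsvm_outlier     = [True, True, False, False]
error_model_error = [True, False, True, False]
labels = ensemble_labels(ocsvm_outlier, error_model_error)
```

Requiring agreement can only reduce the number of samples flagged as errors, which trades some recall for fewer false positives.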
43
Experimental Setup
Datasets: Study 1 (400 participants), Study 2 (180 participants)
Approaches: Baseline, OCSVM, OCSVM + OCEM, OCSVM + MCEM
44
Results: Study 1
[Bar charts: recall (axis 0 to 1) and precision (axis 0 to 0.14) of Baseline, OCSVM, OCSVM+OCEM, and OCSVM+MCEM for Sweeping and Dusting, Taking Medication, Watering Plants, and Cooking]
45
Results: Study 2
[Bar charts: recall (axis 0 to 0.6) and precision (axis 0 to 0.06) of Baseline, OCSVM, OCSVM+OCEM, and OCSVM+MCEM for Sweeping and Dusting, Cleaning Countertops, Taking Medication, Watering Plants, Washing Hands, and Cooking]
46
Clinical Evaluation
• Evaluation of algorithm-predicted false positives
• Psychology clinician looked at participants’ videos
18%: Continuation of previous error
33%: Actually true positives
47
Conclusion
Summary
• Emulate caregiver intervention.
• Class imbalance and overlap.
• Detect activity errors in real-time.
Significance
• Validated primary hypothesis.
• Foundation of a real-world prompting system.
Future Work
• RACOG and wRACOG for continuous values.
• ClusBUS in other domains.
• Precise annotation for activity errors.
48
Publications
Book Chapter
B. Das, N.C. Krishnan, D.J. Cook, “Handling Imbalanced and Overlapping Classes in Smart Environments Prompting Dataset”, Springer book on Big Data, 2014.
B. Das, N.C. Krishnan, D.J. Cook, “Automated Activity Intervention to Assist with Activities of Daily Living”, IOS Press book on Agent-Based Approaches to Ambient Intelligence, 2012.
Journal
B. Das, N.C. Krishnan, D.J. Cook, “Real-Time Activity Error Prediction to Assist Older Adults in Smart Homes: An Outlier Detection-Based Approach”, AI in Medicine, 2014. (Submitted)
B. Das, N.C. Krishnan, D.J. Cook, “RACOG and wRACOG: Two Probabilistic Oversampling Techniques”, IEEE Transactions on Knowledge and Data Engineering, 2014.
A.M. Seelye, M. Schmitter-Edgecombe, B. Das, D.J. Cook, “Application of cognitive rehabilitation theory to the development of smart prompting technologies”, IEEE Reviews in Biomedical Engineering, 2012.
B. Das, D.J. Cook, M. Schmitter-Edgecombe, A.M. Seelye, “PUCK: An Automated Prompting System for Smart Environments”, Journal on Personal and Ubiquitous Computing, 2012.
49
Publications
Conference
B. Das, N.C. Krishnan, D.J. Cook, “wRACOG: A Gibbs Sampling-Based Oversampling Technique”, International Conference on Data Mining, 2013.
S. Dernbach, B. Das, N.C. Krishnan, B.L. Thomas, D.J. Cook, “Simple and Complex Activity Recognition Through Smart Phones”, International Conference on Intelligent Environments, 2012.
B. Das, C. Chen, A.M. Seelye, D.J. Cook, “An Automated Prompting System for Smart Environments”, International Conference on Smart Homes and Health Telematics, 2011.
E. Nazerfard, B. Das, L.B. Holder, D.J. Cook, “Conditional Random Fields for Activity Recognition in Smart Environments”, ACM Symposium on Human Informatics, 2010.
C. Chen, B. Das, D.J. Cook, “A Data Mining Framework for Activity Recognition in Smart Environments”, International Conference on Intelligent Environments, 2010.
Workshop
B. Das, N.C. Krishnan, D.J. Cook, “Handling Imbalanced and Overlapping Classes in Smart Environments”, ICDM Workshop in Data Mining in Bioinformatics and Healthcare, 2013.
B. Das, A.M. Seelye, B.L. Thomas, D.J. Cook, L.B. Holder, “Using Smart Phones for Context-Aware Prompting in Smart Environments”, International Workshop on Consumer eHealth Platforms, Services and Applications, 2012.
B. Das, D.J. Cook, “Data Mining Challenges in Automated Prompting Systems”, Interactions with Smart Objects Workshop, 2011.
B. Das, C. Chen, N. Dasgupta, D.J. Cook, “Automated Prompting in Smart Home Environment”, ICDM Workshop on Data Mining Services, 2010.
C. Chen, B. Das, D.J. Cook, “Energy Prediction Using Resident’s Activity”, International Workshop on Knowledge Discovery from Sensor Data, 2010.
50
Acknowledgement
Dr. Diane Cook, Prafulla Dawadi, Adri Seelye
Dr. Larry Holder, Dr. Ehsan Nazerfard, Carolyn Parsey
Dr. Narayanan C. Krishnan (CK), Dr. Kyle Feuz, Christa Simon
Dr. Maureen Schmitter-Edgecombe, Brian Thomas, Alyssa Weakley
Dr. Behrooz Shirazi, Chris Cain, Jennifer Williams
Dr. Alex Mihailidis, Shirin Shahsavand
Dr. Aaron Crandall
Dr. Hassan Ghasemzadeh
And, all previous colleagues, collaborators and friends…
51