
Applications of Machine Learning to Support Dementia Care through Commercially Available Off-the-Shelf Sensing

George Netscher

Electrical Engineering and Computer Sciences

University of California at Berkeley

Technical Report No. UCB/EECS-2016-204

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-204.html

December 15, 2016

Copyright © 2016, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Applications of Machine Learning to Support Dementia Care through Commercially Available Off-the-Shelf Sensing

by

George Netscher

A project report submitted in partial satisfaction of the

requirements for the degree of

Master of Science, Plan II

in

Computer Science

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Alexandre M. Bayen, Research Advisor

Trevor Darrell, Second Reader

Fall 2016

12/13/16

Applications of Machine Learning to Support Dementia Care through Commercially Available Off-the-Shelf Sensing

Copyright 2016 by George Netscher


Abstract

Applications of Machine Learning to Support Dementia Care through Commercially Available Off-the-Shelf Sensing

by

George Netscher

Master of Science, Plan II in Computer Science

University of California, Berkeley

Alexandre M. Bayen, Research Advisor

In this report, we discuss a project that began in August 2014 and ended in December 2016, through which four applications of machine learning to dementia care were explored. The purpose of this project was to determine how advances in machine learning could be applied to commercially available off-the-shelf sensing equipment to make a positive impact in care for individuals with Alzheimer’s disease and related dementias (ADRD), a cause personally important to the author and the research advisor. The report is written for an audience familiar with the state of the art in machine learning but unfamiliar with the open problems in dementia care. The first chapter gives background on Alzheimer’s disease and places the current work in the context of the challenges faced by the Alzheimer’s research community. The following four chapters each discuss one application. The first discusses how a wearable system can be designed to support daily monitoring of individuals affected by Alzheimer’s disease in order to study the functional changes that can occur as the disease progresses. The second discusses how analysis of speech can be used to detect the presence of dementia. The third discusses how video monitoring can be used to detect safety-critical events, with a particular focus on falls. The fourth provides preliminary pilot study results from the application of video monitoring in one 40-resident memory care community. The final chapter concludes by discussing the gaps between the available technology and current needs, and offers suggestions for future work to bridge those gaps.


Contents

List of Figures

List of Tables

1 Introduction and Background
1.1 Background on Alzheimer’s Disease
1.2 Context of Work

2 Functional Monitoring through Wearables
2.1 Chapter Abstract
2.2 Introduction
2.3 The Dementia Care Ecosystem
2.4 System Architecture
2.5 Biggest Development Challenges
2.6 Analysis Methods
2.7 Indoor Positioning
2.8 Results of Beta Test
2.9 Conclusion

3 Diagnosis through Speech
3.1 Chapter Abstract
3.2 Introduction
3.3 Background and Related Work
3.4 Feature Extraction Process
3.5 Classification Process
3.6 Results
3.7 Discussion
3.8 Conclusion

4 Fall Detection through Video Analysis
4.1 Chapter Abstract
4.2 Introduction
4.3 Methods
4.4 Results and Discussion
4.5 Conclusion

5 Fall Reduction through Video Review
5.1 Chapter Abstract
5.2 Introduction
5.3 Materials and Methods
5.4 Results
5.5 Discussion
5.6 Conclusions

6 Conclusions
6.1 Review of Project Report
6.2 Final Conclusion: Hybrid Solutions are Required for Practical Challenges

Bibliography


List of Figures

1.1 Alzheimer’s disease is the most expensive disease in the US and the only disease in the top six for which the number of deaths is increasing [2]

1.2 Factors which impact the likelihood of cognitive decline and dementia

2.1 The architecture deployed for the Dementia Care Ecosystem.

2.2 The five system components and their uses.

2.3 CPU usage before optimizing for battery life. Waking up the CPU frequently to sample from the sensors caused rapid battery loss.

2.4 Outliers are those points labeled by DBSCAN as not belonging to any cluster [53].

2.5 Global outlier detection with k-NN fails for a nonstationary distribution. Local outlier detection methods provide empirically worse performance as described in [29]. We instead prefer k-NN with a rolling window when anomaly detection over a nonstationary window is required.

2.6 RANSAC fits a regression line by consensus, providing robustness to noise [53].

2.7 KLMS outperforms linear LMS in fitting nonlinear functions when the nonlinear function class is known a priori [45].

2.8 The random forest increases classifier accuracy by averaging over random decision trees to reduce variance [53].

2.9 Salesforce user interface for care team navigators to view metrics and analysis for monitored individuals with dementia

2.10 The Android user interface for administrators and a selected screen from home setup. The interface for non-administrator users provides a subset of the functions available.

2.11 Representative plot of room inference from raw RSSI

2.12 Representative plot of inferred room location with outlier detection applied

2.13 Representative plot of step count data with trend detection applied

3.1 Distribution of the 126 individuals with respect to disease. The set comprises 66 healthy controls (HC, 52.4%) and 60 individuals with Alzheimer’s disease and related dementias (ADRD, 47.6%). Of the affected individuals, the primary diagnosis for 16 is Alzheimer’s disease (AD, 12.7%), for 20 is behavioral Fronto-Temporal Dementia (bvFTD, 15.9%), for 1 is Dementia with Lewy Bodies (DLB, 0.8%), and for 23 is Primary Progressive Aphasia (PPA, 18.3%). Within the PPA segment, 14 show the semantic variant (svPPA, 11.1%), 7 show the right semantic variant (rsvPPA, 5.6%), and 2 show the non-fluent variant (nfvPPA, 1.6%)

3.2 Distribution of the 126 individuals with respect to gender and age.

3.3 Vocabulary Features Extraction Process. Each word considered relevant is used to compute aggregation functions. The others are counted to compute the relevance ratio

3.4 Procedure to perform classification using regressors. 0, 1 and 2 represent the labels to predict (HC, AD, FTD, PPA). One regressor is associated to each of them

3.5 The best results in determining whether a speech segment belongs to an individual with dementia or a healthy control. Two-step AdaBoost, or Selective Boosting, demonstrates 92% accuracy and greater than 90% precision and recall.

3.6 The best results in determining whether a speech segment belongs to an individual with dementia or a healthy control. Most tree-based methods reach accuracies higher than 80%.

3.7 The best results in determining the diagnosis if present. Gradient Boosting demonstrates 70% accuracy, benefiting greatly from high recall among healthy controls.

3.8 The effect of feature selection on the bimodal classification results. Feature selection is performed a priori using the AdaBoost score function, so the change in accuracy for AdaBoost is most indicative.

3.9 The best results when limited to 15 features. Note multiBoost still demonstrates 85% accuracy. Here feature selection is performed using a decision tree with Gini criterion.

4.1 Examples of data from the day-time and night-time settings.

4.2 Domain confusion net, based on [74], used for experiments. Note that the first seven layers are initialized from the VGG weights [68]. We lock the weights for all layers except fc7 and fc8. In implementation, we use two fcD layers with shared weights to connect to light and dark fc7 layers, respectively.

4.3 An example of deep artistic style transfer from [26] whereby the content of image A is transformed into the style of 3 separate paintings in images B, C, and D.

4.4 Examples of style transfer with originals on the left and transformed images on the right.

4.5 Measurement of the domain classifier’s ability to distinguish between light and dark domains over training process. There are initial spikes in precision and recall, followed by convergence to under 50%. Note that the light and dark domain confusion net results in Tables 4.2 and 4.3 occur at 25,000 and 15,000 iterations, respectively.

4.6 Success and failure examples in fall detection in the light domain.

4.7 Success and failure examples in fall detection in the dark domain.

5.1 Equipment. IP cameras were placed in all common areas and approved private rooms. Video was transmitted from the cameras to the network attached storage (NAS) via Wi-Fi where it was maintained locally for 72 hours after which it was transmitted to a remote server for archiving. Live video and video from the previous 72 hours were made available to facility management via smartphone applications.

5.2 Fall rate. In the four months prior to video review, the fall rate at the community was 10.5 ± 2.5 falls per month, 79% of the national average. In the final month, 2 falls occurred, 17% of the national average.


List of Tables

3.1 Top 10 features for bimodal classification. The symbol ∘ denotes composition of functions.

3.2 Top 10 features for bimodal classification with ANOVA. The symbol ∘ denotes composition of functions.

3.3 Top 10 features for multimodal classification. The symbol ∘ denotes composition of functions.

4.1 SGD solver parameters used to train all nets.

4.2 Fall detection results for baseline, domain confusion, and style transfer methods.

4.3 Dark domain detection results for domain confusion method. The dark domain detection results in this table correspond to the snapshots used to evaluate fall detection in Table 4.2. Note that the test set used for dark domain detection is the same test set used in Table 4.2 but is partitioned by domain rather than by category.


Acknowledgments

This work would not have been possible without help from so many people. I want to thank my advisor Alex Bayen for supporting each of the directions followed as we searched for the best way to make a positive impact and gracefully supporting transitions between many projects and roles. I want to thank my coworker Julien Jacquemot for all of the late nights and long hours working together on projects. I want to thank the many students and faculty who worked on each of these projects and without whom this work would not have been possible including: Pulkit Agrawal, Nick Boyd, Sébastien Levy, Bradley Zylstra, Yanrong Li, Ludovic Thea, Jun Jie Ng, Marie Douriez, Chong Wee Tan, Cyril Tamraz, Peter Pressman, Bob Levenson, Katrin Schenk, Steve Bonasera, Kate Possin, and Bruce Miller. I want to thank each of my office mates for supporting this work through daily discussions and weekend barbecues including Cathy Wu, Jerome Thai, Walid Krichene, and Francois Belletti. Finally, I am so thankful to my girlfriend, Casey Maas. Without her daily support, I am quite sure my sanity would have been lost somewhere along the way.


Chapter 1

Introduction and Background

1.1 Background on Alzheimer’s Disease

In the US, Alzheimer’s disease is the sixth leading cause of death and the single most expensive disease ($236B in direct costs; an estimated $221B in indirect costs). As shown in Figure 1.1, among the top six, it is the only disease for which the number of deaths is increasing. As the median age of the US population continues to increase, Alzheimer’s disease will only become more prevalent, and the resulting cost of Alzheimer’s care will continue rising to unsustainable levels [2]. Unfortunately, the drug failure rate for Alzheimer’s disease remains among the highest, currently 99.6% (as compared to 81% for cancer) [14], due to our limited understanding of the brain and the root causes of Alzheimer’s disease.

Figure 1.1: Alzheimer’s disease is the most expensive disease in the US and the only disease in the top six for which the number of deaths is increasing [2]


Alzheimer’s disease is characterized by a progressive loss of cognitive ability. In the mild stage, affected individuals may have trouble remembering words, planning, and completing tasks independently. In the moderate stage, those affected might forget their own personal history, become confused about time and place, and suffer alterations in personal behavior and self-control. By the severe stage, individuals usually require full-time care to help complete the activities of daily living (ADLs) such as eating and toileting, and they may experience changes in physical ability such as the ability to walk, sit, and swallow.

The prevalence of Alzheimer’s disease is alarming. In the United States, 5.4 million people are affected, including 1 in 9 people over 65 years old and 1 in 3 over 85 years old. Thus, almost surely, every person who reads this statement will be personally affected through a loved one, a close personal friend, or a personal diagnosis. One reason for the high prevalence is that it is not uncommon for an individual with Alzheimer’s disease to live for 20 years after diagnosis. For much of this time, individuals will require support with activities of daily living and struggle with preventable emergency room visits due to urinary tract infections, extrinsic fall incidents, and bedsores. These incidents could be prevented, respectively, through proper hydration and toileting habits, proper design of the environment to remove external factors contributing to falls, and periodic changes in body positioning. With costs already at unsustainable levels, Alzheimer’s disease threatens to cripple the US healthcare system, and Alzheimer’s disease accounts for only about two thirds of all those affected by dementia [2].

1.2 Context of Work

In this work, we discuss three approaches currently under development in the research community to address the challenges presented by Alzheimer’s disease and related dementias.

1. Curing, delaying, or mitigating disease effects: This research area focuses broadly on the root cause of the disease. It includes the many pharmaceutical approaches attempted to cure Alzheimer’s and many public health studies aimed at determining whether certain interventions such as proper diet and exercise can mitigate the effects on a population level.

2. Early detection: This research focuses on identifying relevant biomarkers which can be detected before significant brain damage has occurred. By detecting these warning signs early, the available interventions identified in the previous research area can be applied to delay and/or mitigate the effects of the disease.

3. Caring for those currently affected: This research area focuses on improving the quality of life and reducing the cost of care for those currently affected. Major themes in this area include reducing the rate of hospitalization, to which falls are the greatest contributor, and enabling individuals to remain independent for longer.


Although several other interesting areas of Alzheimer’s research exist, these three themes capture major thrusts within the research community. Moreover, we focus here because in each area, there appear to be significant opportunities for the development of technology which could make a far-reaching positive impact.

Curing, Delaying, or Mitigating the Effects

The first approach where technology may provide support is in curing, delaying, or mitigating the effects of Alzheimer’s disease. Unfortunately, it seems that every year a promising pharmaceutical raises hope before failing in clinical trials. Most recently, Eli Lilly’s experimental Alzheimer’s therapy solanezumab, which showed potential for slowing the effects of cognitive impairment in 2015 [55], failed a large-scale clinical trial in November 2016 [10]. This therapy targets the amyloid plaques that accumulate in the brains of those affected by Alzheimer’s disease. The failure of this therapy adds supporting evidence that these amyloid plaques, which are characteristic of Alzheimer’s disease, may only be a symptom and not the root cause of the disease.

Although the search for a cure has so far proven unsuccessful, interesting results have been uncovered with respect to delaying the symptoms. For instance, with respect to brain training games, a consensus statement was released by a group of leading geriatricians expressing concerns regarding the lack of supporting evidence [3]. Specifically, although evidence existed that individuals continued to perform well on brain training games, this success did not appear to extend to more general cognitive abilities. Since then, several interesting results have been released, including [60] and [4], most notably showing from a sample of 2,832 volunteers that those who completed gamified training sessions were 48% less likely to be diagnosed with some form of dementia after ten years.

The Alzheimer’s Association’s official stance on preventative measures is based on a 2015 review of the literature [9] in which sufficient evidence was found to support that regular physical exercise and management of cardiovascular risk factors including diabetes, obesity, smoking, and hypertension reduce the risk of cognitive decline and may reduce the risk of dementia. Healthy diet and lifelong learning or cognitive training may also reduce the risk of cognitive decline, but sufficient evidence does not exist to suggest that they may reduce the risk of dementia. Our work with respect to this first thrust of Alzheimer’s research is discussed in Chapter 2, where we discuss the design and implementation of a system for monitoring the disease progression of individuals with Alzheimer’s. The aim of this system is to provide researchers with the necessary tools to define more fine-grained relationships between the amount of exercise individuals undertake and the eventual onset of the disease. Traditional approaches require participants to fill out daily surveys, and off-the-shelf tools provide limited functionality such as a step count. We design here an open-source platform based on off-the-shelf components for interacting with sensors around the home (e.g., to see if the stove is on) and performing typical signal processing and machine learning techniques on the fine-grained data (e.g., to detect anomalies). More details on similar approaches are discussed in Chapter 2 alongside the relevant sensing equipment and available machine learning methods.
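As a rough illustration of the kind of anomaly flagging mentioned above, the following sketch scores each daily step count by its mean distance to the k nearest values in a trailing window and flags days whose score exceeds a threshold. The function names, the window length, k, the threshold, and the sample data are all assumptions made for illustration; this is not the implementation used in the platform.

```python
from statistics import mean


def rolling_knn_scores(values, window=7, k=3):
    """Score each point by its mean distance to the k nearest values
    among the preceding `window` observations (None until k are available)."""
    scores = []
    for i, v in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < k:
            scores.append(None)  # not enough history to score this point
            continue
        nearest = sorted(abs(v - h) for h in history)[:k]
        scores.append(mean(nearest))
    return scores


def flag_anomalies(values, window=7, k=3, threshold=2000):
    """Return the indices whose rolling k-NN score exceeds the threshold."""
    scores = rolling_knn_scores(values, window, k)
    return [i for i, s in enumerate(scores) if s is not None and s > threshold]


# Hypothetical daily step counts with one unusually inactive day (index 8).
steps = [5200, 4900, 5100, 5300, 5000, 4800, 5150, 5050, 600, 4950]
print(flag_anomalies(steps))
```

Because the score is computed only against recent history, a slow drift in activity level moves the reference window along with it, whereas an abrupt one-day change stands out.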

(a) Strength of evidence on risk factors for cognitive decline

(b) Strength of evidence on risk factors for dementia

Figure 1.2: Factors which impact the likelihood of cognitive decline and dementia

Early Detection

The second research thrust focuses on identifying ways in which Alzheimer’s disease can be detected before noticeable changes in behavior occur. Typically, Alzheimer’s disease is diagnosed after a family member or friend notices perceptible changes in the individual’s memory. Unfortunately, these changes often become apparent only after significant brain damage has occurred. At this point, the damage is thought to be irreversible [1]. Thus, significant interest exists in the development of screening tools which could be applied early and with high sensitivity to detect individuals living in the community who may have Alzheimer’s disease or a related dementia before significant damage occurs. The screening tool would then refer the individual to clinical personnel for more accurate diagnosis.

Clinical diagnosis of Alzheimer’s disease is typically performed through personal history, family history, memory tests including the commonly used mini-mental state exam [22], physical tests such as blood and urine analysis, and brain scans such as CT, MRI, and PET. Although brain scans would seem to be a useful tool for a human expert to perform diagnosis of brain abnormalities, the traditional role of these scans has been to rule out other possible causes of symptoms such as tumor growth [1], [38]. Even with all of these tools, the diagnostic accuracy of clinical experts is surprisingly low. A 5-year review of 919 subjects from Alzheimer’s Disease Centers sponsored by the National Institute on Aging who had died and been autopsied revealed sensitivity for the centers ranging from 70.9% to 87.3% and specificity ranging from 44.3% to 70.8%. Currently, the only way to identify Alzheimer’s disease with 100% accuracy is through post-mortem histology to identify the plaques characteristic of the disease. These low diagnostic accuracies are particularly startling considering the years of expertise required of the clinical expert and the potential harm that can be caused by misdiagnosis. Individuals with Dementia with Lewy Bodies, for instance, respond particularly negatively and can die when inappropriately prescribed antipsychotic medication [47].
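For readers less familiar with the two measures quoted above, sensitivity is the fraction of truly affected individuals who are correctly identified, and specificity is the fraction of unaffected individuals who are correctly ruled out. The short sketch below computes both from confusion-matrix counts; the counts are hypothetical and chosen only to make the arithmetic easy to follow, not taken from the autopsy review.

```python
def sensitivity(tp, fn):
    """True positive rate: affected individuals correctly diagnosed."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: unaffected individuals correctly ruled out."""
    return tn / (tn + fp)


# Hypothetical counts for illustration only (not data from the review).
tp, fn = 80, 20   # autopsy-confirmed cases: diagnosed vs. missed
tn, fp = 60, 40   # autopsy-negative cases: ruled out vs. misdiagnosed

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```

A specificity near the lower end of the reported range means that more than half of the individuals without the disease would be given an incorrect positive diagnosis.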

These low diagnostic accuracies suggest that clinicians do not have the necessary tools to obtain clear signals regarding patient state. Thus, a primary focus in this research area is identifying relevant biomarkers that provide strong indications of particular diseases and the stages of these diseases. The most relevant work in this regard concerns new techniques for identifying Alzheimer’s disease from cerebrospinal fluid [27], [67], where the concentration of amyloid-β-derived diffusible ligands provides a relevant marker for Alzheimer’s disease. In Chapter 3, we discuss how traditional machine learning techniques can be applied to conversational speech data from individuals with Alzheimer’s disease and their caregivers to detect the presence of dementia. This approach is appealing in that a smartphone application could readily be developed to act as an early screening tool, but refining the approach would require longitudinal data collection, which was not practically feasible within the scope of this project.

Caring for Those Currently Affected

The final research area focuses on how we can support care for those currently affected by Alzheimer’s disease and related dementias. Particular areas of interest include how we can improve the quality of care and reduce the cost of providing care. The biggest contributors to cost include the need for assistance with the activities of daily living and the high rate of hospitalization for individuals affected by Alzheimer’s disease [2]. A particular focus area in this regard is delaying or reducing the need for institutionalized care by empowering family caregivers and home care services to support care in the home of the affected individual for longer. Work in this area includes the development of proper tools such as those provided by the Alzheimer’s Association and Family Caregiver Alliance for educating caregivers about the resources available to them such as adult day care services and memory care communities which may provide short-term respite care. Another interesting avenue includes the study of how technology can be used with a human assistant in the loop. This includes work done by the UC San Francisco and University of Nebraska Medical Centers on the Dementia Care Ecosystem, where anomalies can be detected by home sensors and screened by a low-cost case manager before escalating to the need for an emergency room visit, as discussed in Chapter 2.

Several interesting commercial product offerings also exist in this space, including traditional fall detection pendants like the Philips Lifeline, Emerald non-wearable fall detection, and wander detection systems like the GPS SmartSole and Bluetooth SafeWander. Unfortunately, although these products provide wander and fall detection, there are no systems with significant supporting evidence for reducing the rate at which these safety accidents occur. In Chapters 4 and 5, we discuss methods for detecting falls from video and the first deployment of such techniques. Falls are the leading cause of hospitalization in Alzheimer’s care and are a particular concern in managed care, where residents with dementia have been observed to fall at an average rate of 4 times per year, roughly twice that of cognitively healthy elderly residents [17]. Moreover, less than 10% of falls lead to serious injury [15], [17], but 50-75% of elderly fallers experience repeat falls. Although preliminary, it appears the use of cameras in dementia care communities may provide significant benefit with respect to lowering the rate of repeat falls. In one pilot study with a 40-resident memory care community, the fall rate was reduced by 80% following video review of fall incidents, after which personalized changes could be made to individual room environments based on the way in which residents were falling. The technology behind this video fall detection is discussed in Chapter 4, and the results from the pilot study are discussed in Chapter 5.
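The fall figures quoted above can be checked with simple arithmetic. The sketch below assumes that 4 falls per resident per year serves as the reference rate and that all 40 places in the community were occupied; the baseline of 10.5 falls per month and the 2 falls in the final month are taken from the caption of Figure 5.2. The variable and function names are illustrative only.

```python
RESIDENTS = 40                  # community size in the pilot study
FALLS_PER_RESIDENT_YEAR = 4     # reference rate quoted for managed dementia care
BASELINE_FALLS_PER_MONTH = 10.5 # mean over the four months before video review
FINAL_MONTH_FALLS = 2           # falls in the final month of the pilot


def expected_falls_per_month(residents, falls_per_resident_year):
    """Monthly falls expected if every resident fell at the reference rate."""
    return residents * falls_per_resident_year / 12


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


expected = expected_falls_per_month(RESIDENTS, FALLS_PER_RESIDENT_YEAR)
share = BASELINE_FALLS_PER_MONTH / expected
reduction = relative_reduction(BASELINE_FALLS_PER_MONTH, FINAL_MONTH_FALLS)

print(f"expected falls per month at the reference rate: {expected:.1f}")
print(f"baseline as a share of the reference rate: {share:.0%}")
print(f"relative reduction in monthly falls: {reduction:.0%}")
```

Under these assumptions the baseline sits a little below the reference rate, and the drop from 10.5 to 2 falls per month corresponds to the roughly 80% reduction reported for the pilot.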


Chapter 2

Functional Monitoring through Wearables

2.1 Chapter Abstract

The increasing availability of wearable computing opens up new avenues for cyberphysical systems which can provide content personalized to user preferences. In the past, testing these ideas has required expertise in disparate areas ranging from embedded systems to machine learning. Max is a platform for rapidly prototyping new ideas in personalized wearable computing without the need for in-depth expertise and long design cycles. It is built from off-the-shelf components (Bluetooth home sensors, an Android smartwatch, and an Android smartphone) and (mostly) open source libraries (Android OS and scikit-learn). From these components, Max is a full-stack system including methods for collecting data from the individual via the watch and from the environment via the sensors; maintaining data securely; transmitting data to a backend server; performing standard machine learning and signal processing tasks such as filtering, classifying, detecting trends, and flagging anomalies; and displaying data through Android UX and the Salesforce API. In addition, novel methods for cost-effective approximate indoor positioning are developed. We show one use case for Max, monitoring individuals with Alzheimer’s disease (AD) through the Dementia Care Ecosystem. The Dementia Care Ecosystem defines a new proactive model from the UCSF and UNMC medical centers aimed at reducing emergency room use. This chapter describes the design and implementation of Max including the challenges faced, the tradeoffs made, and beta test results from 13 healthy users over 39 total months. These results show 96.1% accuracy in room-level localization and highlight the many tradeoffs that must be made concerning battery life.
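To make the idea of approximate room-level positioning from Bluetooth signal strength concrete, the sketch below shows one simple baseline: assign each scan to the room whose sensor reports the strongest RSSI, then smooth the resulting labels with a sliding majority vote. This is only an illustrative baseline under assumed data, not the positioning method developed in this chapter; the room names, RSSI values, and window size are hypothetical.

```python
from collections import Counter


def strongest_room(reading):
    """Return the room whose sensor has the highest (least negative) RSSI."""
    return max(reading, key=reading.get)


def smooth(labels, window=3):
    """Majority vote over a centered window to suppress one-off jumps."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        votes = Counter(labels[max(0, i - half):i + half + 1])
        smoothed.append(votes.most_common(1)[0][0])
    return smoothed


# Hypothetical RSSI samples in dBm, one dict per Bluetooth scan.
scans = [
    {"kitchen": -58, "bedroom": -80},
    {"kitchen": -60, "bedroom": -79},
    {"kitchen": -83, "bedroom": -75},  # transient dip in the kitchen signal
    {"kitchen": -59, "bedroom": -81},
    {"kitchen": -61, "bedroom": -78},
]

raw = [strongest_room(s) for s in scans]
print(raw)
print(smooth(raw))
```

In this toy trace the single scan that favors the bedroom is outvoted by its neighbors, so the smoothed sequence stays in the kitchen throughout.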


2.2 Introduction

In the last four years the global market for wearable devices has grown from $0.75B to $2.93B. This nearly fourfold growth has been fueled mainly by high market demand for fitness trackers which can monitor bodily signals such as step count, heart rate, and hours of sleep. For the user to view this data, these wearable devices usually use a Bluetooth radio to pair with the owner’s smartphone. This Bluetooth radio, however, allows for communication not just with the owner’s smartphone, but also with the owner’s environment. Since efficient and robust algorithms exist for determining whether the device is worn at a given instant, these platforms offer the potential for an as yet unrealized paradigm in which users monitor not just their bodies but also their environments through communication with ambient Bluetooth sensors. We believe this paradigm sets the stage for the next phase of home automation, where the home is able to provide functions individualized to particular users such as TVs turning on to particular settings, shared vehicles moving to preset user settings, and home automation for particular habits. It also pushes the current paradigm of personal tracking forward by enabling tracking of habits which are exhibited not only by bodily signals such as step count, but also by environmental signals such as how long a wearer spends in different rooms of the house, uses different appliances, or spends in a car.
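As a small example of the environmental signals described above, the sketch below totals how long a wearer spends in each room from a time-ordered list of room observations. It assumes that each observation holds until the next one arrives; the sample format, room names, and timestamps are hypothetical and serve only to illustrate the calculation.

```python
from collections import defaultdict


def time_per_room(samples):
    """Total seconds per room from time-ordered (timestamp_seconds, room) pairs.

    Each interval between consecutive samples is attributed to the room
    observed at the start of the interval.
    """
    totals = defaultdict(float)
    for (t0, room), (t1, _) in zip(samples, samples[1:]):
        totals[room] += t1 - t0
    return dict(totals)


# Hypothetical observations: seconds since the start of monitoring and room label.
samples = [
    (0, "bedroom"),
    (600, "kitchen"),
    (1500, "living room"),
    (1800, "kitchen"),
    (2400, "kitchen"),
]

for room, seconds in time_per_room(samples).items():
    print(f"{room}: {seconds / 60:.0f} min")
```

The same accumulation applies to other interval-style signals mentioned in the text, such as time spent with an appliance in use or time spent in a car.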

In the past, those with a compelling idea for a personalized computing application have been faced with a difficult challenge. They had to bring together diverse skill sets ranging from hardware expertise to build the physical device, web or mobile expertise to create the interface, and machine learning expertise to provide the analysis. The difficulty of bringing together all these skills on a low budget made the success of new endeavors extremely unlikely, as evinced by the recent failings of startup companies like Lively and Ninja Blocks. With the recent deployment of Android Wear, we aim to provide the next logical step. We develop a platform for rapid prototyping of personalized computing by pulling together the basic hardware, web/mobile, and machine learning building blocks. The core of this platform is three (mostly) open-source projects: 1) Android Wear, 2) TI Sensortag, and 3) Sci-Kit Learn. Using these projects requires only purchasing commercial off-the-shelf components such as an Android watch and TI Sensortag. Our project is built with generality, extensibility, and scalability in mind, where methods are developed for hosting a full-stack application complete with extensive development both for computation on a local Android host and on a remote server.

In this work, we use the Sony Smartwatch 3, currently available for $130, and the Bluetooth Smart TI Sensortag, currently available for $30. We present results for one use case: home monitoring of an individual with dementia. After 12 months of beta testing, this use case is currently being deployed through the Dementia Care Ecosystem, a $10M project sponsored by the Centers for Medicare and Medicaid Services. The rest of the paper continues as follows. Section 2.3 gives background on Alzheimer’s disease and the specific use case. Section 2.4 describes the system architecture in detail. Section 2.5 describes the biggest challenges faced in development. Section 2.6 defines the analysis methods available and the subset included for the Alzheimer’s use case. Section 2.7 describes a new indoor positioning infrastructure for low-cost, room-level indoor positioning. Section 2.8 presents the results from beta testing. Section 2.9 provides concluding remarks and recommendations for future use.

2.3 The Dementia Care Ecosystem

One example of a new program is the Dementia Care Ecosystem sponsored by the Centers for Medicare and Medicaid Services [56]. The Dementia Care Ecosystem is a 3-year clinical trial evaluating a care model called Navigated Care for people with dementia and their family caregivers. The goal is to improve quality of life, health care utilization, caregiver burden, and satisfaction with care. Central to the Care Ecosystem are minimally trained staff called ‘care team navigators’ (CTNs). These staff members are the primary point of contact for up to 80 families, allowing for personalized communication between families and their medical network. The Dementia Care Ecosystem is composed of four modules:

1. The Caregiver Module includes educational forums and caregiver support, and connects families with community resources.

2. The Decision-Making Module facilitates proactive medical, financial, and safety decisions.

3. The Medication Module tracks and reduces inappropriate medications or doses, and triggers pharmacist review when indicated.

4. The Functional Monitoring Module uses smartphones and sensors to rapidly detect and respond to changes in functional status, which is particularly important for patients living remotely, alone, or who are at-risk for acute declines.

The use case of the system described in this paper is for the functional monitoring module. The goal of the functional monitoring module is to calculate five metrics and provide alerts when these metrics deviate from expected. These five metrics include daily step count, approximate gait speed, daily lifespace, daily and hourly room percentage, and daily room transitions. Based on these five metrics, 2 sets of analyses are conducted: outlier detection and trend detection. Data is collected with an Android smartphone, Android smartwatch, and in-home sensors, analyzed using a backend server, and displayed for CTNs using a Salesforce dashboard. CTNs respond to alerts by confirming the data appears abnormal then calling the family or clinical team as needed.

This system is the first long-term, continuous, personalized monitoring system developed for individuals with cognitive disorders. This represents part of a growing paradigm shift from medical equipment that is designed to react to disease states (e.g., X-ray imaging), to equipment which is designed to enable a proactive medical system. Many other technology systems for Alzheimer’s care are available including offerings from companies like HealthSense and BeClose, but they fail to identify a person uniquely.


Thus, the primary impetus for this system was the need for a system which could monitor the behavior patterns of a specific individual in the presence of multiple individuals living in the same residence. The greatest challenge in accomplishing this goal was designing a system capable of robust indoor positioning using cost-effective off-the-shelf equipment. The solution is described in Section 2.7.

2.4 System Architecture

The goal of the system architecture is to provide a pipeline whereby data from the external environment can be collected, analyzed, and used for generating notifications if necessary. For the Alzheimer’s monitoring use case, the pipeline is used to calculate five metrics, determine possible causes for alarm, and raise notifications for care team navigators responsible for monitoring affected individuals. The full system architecture instantiated for this use case is shown in Figure 2.1.

Figure 2.1: The architecture deployed for the Dementia Care Ecosystem.

The five metrics used for the Dementia Care Ecosystem include:


1. Step count: The number of steps taken per day

2. Approximate gait speed: The steps per minute during each period of activity

3. Lifespace: The maximum Euclidean distance from the home per day

4. Room percentage: The percentage of time spent in each room on a daily and hourly basis

5. Room transitions: The number of transitions between rooms per day
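
The room-based metrics and lifespace in the list above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code deployed in Max: the function names, the input format (one room label per scan, GPS fixes as latitude/longitude pairs), and the flat-earth projection used to obtain a Euclidean distance are all assumptions.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def room_percentage(rooms):
    """Percentage of samples spent in each room.

    `rooms` is a list of room labels sampled at a fixed interval
    (hypothetical input format).
    """
    counts = Counter(rooms)
    total = len(rooms)
    return {room: 100.0 * n / total for room, n in counts.items()}


def room_transitions(rooms):
    """Number of times two consecutive samples fall in different rooms."""
    return sum(1 for a, b in zip(rooms, rooms[1:]) if a != b)


def lifespace_m(home, fixes):
    """Maximum distance in meters from `home` over a day's GPS fixes.

    Uses a local flat-earth projection around the home so that the
    distance is Euclidean; adequate for distances of a few kilometers.
    """
    lat0, lon0 = math.radians(home[0]), math.radians(home[1])
    farthest = 0.0
    for lat, lon in fixes:
        x = (math.radians(lon) - lon0) * math.cos(lat0) * EARTH_RADIUS_M
        y = (math.radians(lat) - lat0) * EARTH_RADIUS_M
        farthest = max(farthest, math.hypot(x, y))
    return farthest
```

Hourly room percentages follow by calling `room_percentage` on the samples that fall within each hour.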

Although these are the five metrics chosen for the Dementia Care Ecosystem, we take care to design a system which is flexible enough to handle the inclusion of any Bluetooth sensors. Off-the-shelf, the application allows for the collection of all data sources from the smartphone and smartwatch depending on the capabilities of each. For instance, although we do not use heart rate data, it can be collected at the desired interval simply by connecting a smartwatch with a heart rate sensor to the smartphone through standard Android Wear procedures then updating the appropriate settings.

To collect these metrics, the system architecture is composed of five units. The Estimote beacons output a Bluetooth low energy (BLE) ping at a user-defined power and frequency. The Android smartwatch acts as a BLE receiver for in-home sensors, collects step-count data, and buffers data until it can be transmitted to the smartphone. The Android smartphone collects GPS data, provides a user interface (UI), and uploads data to the backend server at fixed 12-hour intervals. The Android smartphone communicates with the backend server through an optional security proxy called Mulesoft. The backend server performs data processing to estimate the room location, calculate desired metrics, flag outliers, and detect undesirable trends. The Salesforce dashboard provides an interface for administrators to receive alerts and view metrics from the backend server. The decision to push most of the computation to the server was made to preserve battery life. This allows for prototyping before isolating those functions which must be performed in real time. For instance, room estimation is performed offline with the method that provides the highest accuracy for our use case, but can be performed locally using the methods discussed below.

BLE Sensors

Two types of Bluetooth low energy (BLE) sensors are used: one to provide consistent room locationing and another to infer activities of daily living (ADLs). The sensors used for room locationing are called Estimote beacons. They provide a simple BLE ping with customizable power settings, so the receiving device may output a received signal strength indication (RSSI) which decreases roughly with the distance from the beacon. The second type of sensor used is the TI Sensortag. Equipped with 10 different sensors including temperature, motion, and humidity, the Sensortags allow for collecting many types of data which are useful for inferring the performance of different activities (e.g., medicating). These two sensor types are actually redundant. The Estimotes are equipped with temperature and motion sensors, and the TI Sensortags provide RSSI values. In practice, using both sensors allows for rapid prototyping which is both flexible and modular, where users have the option to only include those sensors most relevant for their use case.

Figure 2.2: The five system components and their uses.

The sensors used to perform room detection are called Estimote beacons. The Estimote beacons provide a simple BLE ping to infer distance from the Android application. By default, the beacons ping every 200 ms and the smartwatch scans for these pings every 10 seconds for a 1-second period. The Estimote beacons have sticky backs and can be placed on ceilings or walls without reducing the aesthetics of the room.

The sensors used for inferring user activity are called TI Sensortags. They provide sensing for 1) temperature, 2) acceleration, 3) orientation, 4) humidity, 5) magnetic flux, 6) ambient light, 7) pressure, and 8) audio. Because the Sensortag is equipped with so many types of sensors, it allows for prototyping of many different features with minimal changes to source code. Data is sampled from the Sensortags as follows. Every minute the watch scans for available tags. It then connects to the three tags with the highest RSSI value. This gives some measure of which tags are closest and likely to provide data related to the user activity. Following connection, data is streamed at the desired interval from the desired sensors on the tag. For example, tags identified for monitoring shower use provide readings from the humidity and temperature sensors at a slower rate than tags identified for detecting if a specific object has moved via the accelerometer. Because data from the Sensortags can quickly grow to large scales, by default this data is stored as BSON (binary JSON) and only transmitted to the phone once per hour to conserve battery.
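
The tag-selection step described above (connect to the three tags with the highest RSSI) amounts to a sort over the most recent scan. The following is a minimal sketch in Python for illustration only; the scan format (a mapping from tag identifier to RSSI in dBm) and the tag names are hypothetical, and the deployed logic runs on the Android watch.

```python
def closest_tags(scan, n=3):
    """Return the identifiers of the n tags with the highest RSSI.

    `scan` maps a tag identifier to its RSSI in dBm; values closer
    to zero indicate a stronger signal and, typically, a nearer tag.
    """
    return sorted(scan, key=scan.get, reverse=True)[:n]


# Hypothetical scan result from one once-per-minute sweep
scan = {"shower": -60, "pillbox": -48, "door": -75, "fridge": -82}
selected = closest_tags(scan)
```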

Thus, Max allows collecting data and testing algorithms for different functions, but once battery life becomes a constraint new methods may need to be devised. Because the Sensortag software is open source, the user can implement simple detection algorithms such as the spike events discussed below on the tags themselves and only transmit resulting detections to the watch. This would allow for Bluetooth beaconing mode instead of pairing mode to be used, reducing battery consumption from both the sensor and the watch.

Smartwatch

The smartwatch used to determine the wearer’s location in the home and to collect step count data is the Sony Smartwatch 3. We chose this watch because, based on empirical tests, it provides a longer battery life than the Moto 360 or Samsung Gear Live (no longer produced). It further provides IP68 dust and water resistance: it is dust tight and submersible in water up to 1.5 meters. It should be noted that this water resistance requires a charging cap to be closed, which cannot be guaranteed in the use case discussed here. As discussed below, the battery on the smartwatch is the greatest bottleneck in the project, for which reason extreme care has been taken to develop efficient asynchronous data collection from the smartwatch. Once data is collected, it is stored in a SQLite database on the watch and uploaded every hour to the phone by default.

Smartphone

The smartphone used to collect GPS data, provide a user interface, and upload data to the backend server is the Motorola Moto G. This phone was chosen because it was the most cost-effective phone supporting the latest version of Android. At the time of publication it cost $150 retail. The system developed is not phone specific, however, and has been tested on a number of Android phones. The smartphone uploads data to the backend server by default every 12 hours. GPS data is recorded through standard Android protocols whenever a change greater than a certain threshold occurs.

Server

The server used to perform indoor positioning, outlier detection, and trend detection is a high-performance desktop computer located at the UCSF Memory and Aging Center. The hardware specifications include 32 GB of RAM and 1 Intel quad-core x86 CPU operating at 2.7 GHz. The operating system used is Linux Ubuntu 14.04. The server hardware specifications can be chosen to match the algorithms and scale required. The server operates as follows. An HTTP receiver asynchronously writes incoming data to a SQL table responsible for storing all metrics. When new data is received, functions are applied to calculate the required metrics (e.g., percentage time spent in each room). After metrics are calculated, analysis is performed to detect anomalies and trends. If any detection occurs, events are sent through the Salesforce API.


Dashboard

The dashboard for the administrator to observe events detected is implemented using Salesforce. For this use case, events are detected on a daily basis based on the data from the previous day, so the care team navigator can review adverse situations and handle them as needed. The goal here is for the clinical team to collect the data necessary to determine which information is most useful for evaluation before translating the system into a device most suitable for patient and clinician needs.

2.5 Biggest Development Challenges

Full system development from initial prototyping to deployment with patients required 18 months from 3 active developers. The greatest challenges involved battery life, always-on functionality, security, robustness, and methods to encourage adherence specific to the use case.

Battery Life

Battery life concerns limited the amount of data that could be collected and increased the latency between data collection and analysis. We strove to maintain 1-day battery life to avoid any deviation from normal routines which many users would find overly burdensome. In working toward this goal, we found that sampling the sensors on the watch caused the greatest battery drain. As shown in Figure 2.3, this prevented the CPU from sleeping properly and led to rapid battery loss. We explored storing data in a hardware FIFO [30], but found the best solution to be focusing on what data we really needed. We read the step count directly from a chip present in many smartwatches where the step count algorithm is implemented directly in hardware. We read gyroscope data once per minute, infrequently enough to prevent excessive battery drain. We then calculate the variance of this data over a rolling 30-minute window and check whether it has passed an empirically defined threshold to determine if the watch is currently on the body or not. We further conserve battery life by reducing the sampling rate from the Estimote beacons when the user is determined to be outside the house by GPS or because an Estimote ping has not been received in a preset amount of time.
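
The on-body check described above can be sketched as a rolling variance compared against a threshold. The sketch below is written in Python with NumPy for illustration; the threshold value, the use of a single gyroscope magnitude per minute, and the function name are assumptions, not the values or code used on the watch.

```python
import numpy as np


def is_worn(gyro_per_minute, window=30, threshold=0.01):
    """Flag each minute as on-body when recent gyroscope variance is high.

    `gyro_per_minute` holds one gyroscope magnitude reading per minute.
    `threshold` is a placeholder; the report sets it empirically.
    Returns a boolean array with one entry per reading.
    """
    x = np.asarray(gyro_per_minute, dtype=float)
    worn = np.zeros(len(x), dtype=bool)
    for i in range(len(x)):
        start = max(0, i - window + 1)  # trailing window of up to 30 readings
        worn[i] = np.var(x[start:i + 1]) > threshold
    return worn
```

A watch lying on a table produces near-constant readings and therefore near-zero variance, while a worn watch does not.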

Figure 2.3: CPU usage before optimizing for battery life. Waking up the CPU frequently to sample from the sensors caused rapid battery loss.


Always-on Functionality

In order to maximize data collection time, the Android application starts itself when the phone turns on and cannot be closed by the user. In this way, data collection is ensured unless either the phone or watch is turned off. This always-on functionality is accomplished through the use of the notification services provided by Android (foreground service) that are independent of the application user interface. Thus, unless the user forbids the notifications, there is no way to stop the service, but the user is always informed by a notification when monitoring is underway. As a further failsafe, a background service independent of the others regularly checks if the other services are running, and if not, restarts them. This keeps services running even if the application crashes.

Security

Before deployment with patients, the platform was required to pass security review at UCSF. The end result includes four measures to increase security.

Encryption

Data is encrypted in the internal SQLite databases on the phone and watch. When it is transmitted, it is sent through a secure HTTPS tunnel and decrypted on the server.

Mulesoft

Mulesoft is used as a security proxy following standard protocol at UCSF, preventing the need for the Android application to store a private server key. Mulesoft further provides a simple mechanism for archiving all encrypted data before it reaches the backend server. Passing the Mulesoft security review at UCSF required installing a CA certificate on the backend server; using HTTPS for pushing data from Android to Mulesoft and from Mulesoft to the backend server; creating Maven build profiles for each of the development, staging, and production environments; exporting all server URLs, domain names, ports, and related networking information to external property files; and modifying error logging to only record the most severe errors.

HockeyApp

HockeyApp is used for private distribution of the Android application. It allows sharing the Android application with new devices via email and pushing software updates without requiring posting the application publicly on Google Play. HockeyApp can be used for free with up to 2 apps and unlimited storage, crash reports, users, and user feedback. We also explored creating a Google Play private channel, but found the $50/user/year fee to be excessive. The most challenging part of incorporating HockeyApp was implementing automatic updates. By default HockeyApp only checks for updates when the app is opened. For the Dementia Care Ecosystem use case, however, it was expected that the app would operate continuously in the background without the user ever necessarily opening the app after the initial home installation. Unfortunately, HockeyApp provides no API for updating the application on a regular basis, so automatic updating was implemented by decompiling their library and reimplementing automatic checks for an update in a background service which is active whenever the phone is on through the always-on functionality described previously.

ProGuard

ProGuard provides code obfuscation for Android. We make it available in this system, but disable it by default since we found it caused too many dependencies to be broken within our code. It also makes it only slightly more difficult for a would-be attacker to find a password or SSH credentials hidden within the code, since the password itself will still be present in the code even if the variable name is obfuscated. We thus prefer the Mulesoft security proxy discussed above to any code obfuscation techniques where security is required.

Robustness

To ensure robustness, the system was beta tested by 13 different healthy subjects over 39 total months before deploying with individuals affected by Alzheimer’s disease. This testing exposed unexpected challenges including difficulty with certain ceiling types when attaching the Estimote sensors, difficulty connecting to the backend server through certain routers, difficulty connecting to certain beacons, and many other smaller issues. This led to numerous bug fixes and an extensive troubleshooting guide for new installations. After such extensive testing, we are happy to share Max with high confidence in its robustness.

Adherence

To maximize adherence, we detect when the watch has not been worn, when the watch-phone connection has been lost, and when either the watch or phone is out of battery. We alert the caregiver through email or text message based on personal preference. The preference is determined during the initial home setup. The on-body detection is performed with the gyroscope as described above. The break in watch-phone connection and out-of-battery signal are both provided by the Android API.

2.6 Analysis Methods

Two sets of analyses are provided to determine possible causes for alarm based on metrics collected.


1. Outlier Detection: Two methods for outlier detection are provided, DBSCAN and a nearest neighbors approach from [29]. These methods are used to determine if any metric significantly differs from a baseline which is set to 30 days by default. Both are used in a fully unsupervised setting.

2. Trend Detection: The RANSAC algorithm is used to determine if any metric has declined by greater than a certain threshold over a previous window. In the dementia use case, we examine if any metric has declined over 33% over the previous 30 days.

DBSCAN

DBSCAN performs clustering based on proximity. The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm clusters points using two hyperparameters: the minimum number of points required to form a cluster and the maximum distance two points can be apart to be considered within the same cluster. The clusters are formed by choosing an arbitrary starting point, connecting all points in the cluster, then choosing an arbitrary new starting point. The process is repeated until all points are either assigned to a cluster or identified as not belonging to any cluster, in which case they are labeled outliers.

This method of outlier detection is provided because it provides good results in many use cases and allows for seamless substitution with other methods from the Sci-Kit Learn API where many other outlier detection methods are available. We prefer the method discussed next, which was implemented for this project due to its higher level of support in the outlier detection literature.
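
As a minimal sketch of how DBSCAN can flag outliers in a single daily metric, the following uses the Sci-Kit Learn `DBSCAN` class, which labels points outside every cluster with `-1`. The wrapper function, the standardization step, and the hyperparameter values are assumptions for illustration, not the settings used in Max.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_outliers(values, eps=0.5, min_samples=5):
    """Return a boolean mask marking the values DBSCAN labels as noise.

    `eps` is the maximum neighbor distance (in standard deviations after
    scaling) and `min_samples` the minimum cluster size; both are
    placeholder values.
    """
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    std = x.std()
    x = (x - x.mean()) / (std if std > 0 else 1.0)  # standardize the metric
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(x)
    return labels == -1  # -1 is the noise label in scikit-learn
```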

Figure 2.4: Outliers are those points labeled by DBSCAN as not belonging to any cluster [53].


k-NN Outlier Detection

As recommended by [29], we prefer k-NN for global outlier detection tasks. This method has been shown to provide high performance on diverse applications ranging from breast cancer diagnosis to handwritten digit classification. The caveat is that these techniques work well in low-dimensional spaces where distance metrics to determine the nearest neighbors can be calculated in a manner which is meaningful and efficient. In all applications discussed in [29] where k-NN produced the best or very good results, k-NN was applied in a low dimensional feature space ($d \le 40$).

Outlier detection with k-NN is performed as follows. For each point, an outlier score is calculated by choosing the k closest points and determining the average distance to these points. We choose k = 4 and the Euclidean distance metric by default. This outlier score is used to define an ordering over the points. With this ordering defined, the most anomalous point is that with the greatest outlier score. It is the point furthest from its closest neighbors. After defining this ordering, some threshold must be set to determine what values constitute an outlier. In the semi-supervised setting, this threshold may be determined empirically from data or from thresholds which may be relevant for the use case. Before data collection begins, however, there may be no way of determining an appropriate threshold. For this case, we provide thresholds based on percentiles. By default, a severity level 1 event is triggered when it is in the top 3% of the ordering, a level 2 event in the top 1%, a level 3 event in the top 0.3%, a level 4 event in the top 0.1%, and a level 5 event in the top 0.03%. For the metrics defined in the dementia use case which are measured on a daily basis, a level 1 event will occur in normal data approximately every 30 days and a level 5 event will occur in normal data approximately every 3000 days, much like using the term 100-year storm to describe a weather event so severe it should only happen once every 100 years.
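
The scoring and percentile thresholds described above can be sketched with the Sci-Kit Learn `NearestNeighbors` class. This is an illustrative reimplementation under stated assumptions, not the project's code: the function names and the way ties in the ordering are broken are ours, while k = 4, the Euclidean metric, and the percentile cutoffs come from the text.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# (severity level, top-percent cutoff), ordered from least to most severe
SEVERITY_PERCENT = [(1, 3.0), (2, 1.0), (3, 0.3), (4, 0.1), (5, 0.03)]


def knn_outlier_scores(X, k=4):
    """Average Euclidean distance from each point to its k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    # Ask for k + 1 neighbors because each point is its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1, metric="euclidean").fit(X)
    distances, _ = nn.kneighbors(X)
    return distances[:, 1:].mean(axis=1)


def severity_levels(scores):
    """Map outlier scores to severity levels 0 (normal) through 5."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    order = np.argsort(-scores)          # most anomalous first
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(1, n + 1)
    top_percent = 100.0 * rank / n       # position in the ordering, in percent
    levels = np.zeros(n, dtype=int)
    for level, cutoff in SEVERITY_PERCENT:
        levels[top_percent <= cutoff] = level  # stricter cutoffs overwrite
    return levels
```

With roughly 100 days of data, only levels 1 and 2 can be reached, since the most anomalous day sits at about the top 1% of the ordering.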

The challenge with these approaches is that they view the data as stationary. As shown in Figure 2.5, as the data distribution changes over time as would be expected in the dementia use case, these methods will fail to identify the shift and instead identify those points which are furthest from the baseline as anomalous. If in actuality the distribution is shifting, these points may signal an important trend rather than an anomaly, but will still be highlighted as anomalous until enough of them occur. In situations where the stationarity assumption is violated, we recommend detecting anomalies based on a rolling window where the severity levels are determined not for the whole data set, but for windows of various sizes which can be thresholded in the same manner as discussed above.

RANSAC

Figure 2.5: Global outlier detection with k-NN fails for a nonstationary distribution. Local outlier detection methods provide empirically worse performance as described in [29]. We instead prefer k-NN with a rolling window when anomaly detection over a nonstationary window is required.

The default method for trend detection is RANSAC [20] linear regression. The RANSAC (RANdom SAmple Consensus) algorithm performs model fitting in the presence of outliers. In the original method, hyperparameters defining the tolerance and threshold are provided [20]. In the method defined by Sci-Kit Learn [53] and used here, the maximum number of trials is provided. The algorithm iterates by selecting a random sample of the data containing the minimum number of points required to estimate the model parameters. In the case of linear regression with 2-dimensional data (e.g., daily step count), this requires 2 points. From this sample, a model is fit. By [20], if the predefined threshold of points fit within the tolerance, the algorithm terminates. If not, the program iterates. In practice, this method can result in infinite iteration if no acceptable model with the predefined hyperparameters exists. [53] instead simply repeats for a predefined number of iterations and then chooses the parameters which result in the greatest number of inliers. This surprisingly simple and highly computationally efficient model grew out of the computer vision community and has found wide success on a number of applications, for which reason it is the default here. Note that alternative methods such as the Theil-Sen regressor for median fitting are provided by Sci-Kit Learn and can be easily substituted given the matching API.
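
A minimal sketch of RANSAC-based trend detection with the Sci-Kit Learn `RANSACRegressor` is given below. The helper names, the definition of decline as the relative change of the fitted line between the first and last day of the window, and the default `max_trials` value are assumptions for illustration; the 33% threshold over 30 days follows the dementia use case in the text.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor


def relative_change(values, max_trials=100, random_state=0):
    """Relative change of the RANSAC-fitted line across the window.

    `values` holds one metric value per day. The default base estimator
    of RANSACRegressor is ordinary linear regression.
    """
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y)).reshape(-1, 1)
    model = RANSACRegressor(max_trials=max_trials, random_state=random_state)
    model.fit(t, y)
    start, end = model.predict(np.array([[0], [len(y) - 1]]))
    return (end - start) / start


def has_declined(values, threshold=0.33, **kwargs):
    """True when the fitted trend drops by more than `threshold`."""
    return relative_change(values, **kwargs) < -threshold
```

Because the line is fit by consensus, a few days of wildly high or low readings inside the window do not change the estimated trend.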

2.7 Indoor Positioning

Figure 2.6: RANSAC fits a regression line by consensus, providing robustness to noise [53].

As discussed in Section 2.4, in order to perform indoor positioning, Estimote beacons are used and the received signal strength indicator (RSSI) is detected by the smartwatch. The RSSI is filtered, then the approximate location is determined through supervised learning. This method allows for approximate positioning to be performed by matching the wearer of the watch to the closest beacon. It was chosen over methods based on triangulation that require multiple beacons per room to ensure at least 3 beacons are visible at all times. In contrast, this method allows for cost-effective approximate indoor positioning by binning users into one location from a set of possible locations. In the Care Ecosystem, it is used for cost-effective room-level indoor positioning where one beacon is placed in each room. By default, the RSSI values are adaptively filtered with the kernel least mean square (KLMS) algorithm then labeled with the random forest supervised learning algorithm; however, many more techniques for filtering and supervised learning are provided. Both default methods are empirically chosen based on results demonstrated in the following section. It should be noted that although we do not provide off-the-shelf support for triangulation, the same Estimote beacons can be used for this function if needed.

Radio Wave Propagation and WPL

The primary challenge in accurate indoor positioning is the highly variable nature of the radio wave received signal strength indicator (RSSI). Because standard construction materials provide moderate impedance to radio waves, signals may be both transmitted and reflected through the surrounding environment. Thus, certain regions of the room may demonstrate high RSSI despite being further away from the source. Moreover, changes in the room environment such as people walking to different positions significantly alter these multipath effects. For this reason, fingerprinting techniques that attempt to laboriously map the RSSI of the room struggle to maintain robustness [52]. To compensate, these methods often use techniques like k-nearest neighbors (kNN) to increase accuracy, but show only marginal gains within an individual block of signal strength emitters (e.g., when only one emitter is used per room). In order to account for the exponential decay of radio waves, many nonlinear filters have been applied including particle filters, extended Kalman filters, and path loss models [52, 82, 76]. We leverage the path loss method here because it provides a natural fit as a feature in supervised classification. The path loss model used is based on the ITU Indoor Propagation Model [82] in which the signal strength can be expressed as the path loss over a distance $d$ (m) at frequency $f$ (MHz)

$$PL(d, f) = 20\log(f) + 10\alpha\log(d) + c(k, f) - 28 \tag{2.1}$$

where $\alpha$ is the path loss exponent, $k$ is the number of floors between the transmitter and receiver, $c$ is an empirical floor penetration loss factor, and $f$ is the radio frequency. With $f$ considered constant in this case for Bluetooth at 2.4 GHz, the signal strength can be expressed as

$$PL(d) = PL_0 + 10\alpha\log(d) \tag{2.2}$$

In the weighted path loss model (WPL), the indoor propagation model is used to estimate position based on the RSS [82]. Weights are assigned by solving Equation (2.2) for $d$ and defining the weighting factor for the $i$th RSSI as

$$w_i = \frac{1/d_i}{\sum_i 1/d_i} \tag{2.3}$$

The unknown position of the person is then estimated as

$$(q, r, s) = \sum_i w_i\,(x_i, y_i, z_i) \tag{2.4}$$

where $(x_i, y_i, z_i)$ is the position of the $i$th beacon. WPL has traditionally been used to replace techniques like kNN as a supplement to fingerprinting. We use it here to define the kernel for KLMS.
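
Equations (2.2) through (2.4) can be worked through numerically as follows. This is a sketch under stated assumptions: the logarithm is taken as base 10 (as is standard for decibel quantities), the reference loss `pl0` and exponent `alpha` are placeholder values rather than fitted ones, and the inputs are path losses (transmit power minus RSSI) rather than raw RSSI readings.

```python
import numpy as np


def distance_from_path_loss(pl, pl0, alpha):
    """Solve Eq. (2.2), PL = PL0 + 10*alpha*log10(d), for the distance d."""
    pl = np.asarray(pl, dtype=float)
    return 10.0 ** ((pl - pl0) / (10.0 * alpha))


def wpl_position(path_losses, beacon_xyz, pl0=40.0, alpha=2.0):
    """Weighted path loss position estimate, Eqs. (2.3) and (2.4).

    `path_losses` has one entry per beacon and `beacon_xyz` one
    (x, y, z) row per beacon, in the same order.
    """
    d = distance_from_path_loss(path_losses, pl0, alpha)
    inverse = 1.0 / d
    weights = inverse / inverse.sum()                      # Eq. (2.3)
    return weights @ np.asarray(beacon_xyz, dtype=float)   # Eq. (2.4)
```

Nearer beacons (lower path loss) receive larger weights, so the estimate is pulled toward the beacon with the strongest signal.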

KLMS

Adaptive filtering techniques provide a framework for estimating a non-stationary signal. They converge to the optimal linear filter in the mean square error sense. The Kernel Least Mean Square (KLMS) algorithm is a technique for adaptively filtering nonlinear data.

As an adaptive filtering technique, KLMS requires an iterative convex optimization algorithm to converge to the minimum mean square error. KLMS traditionally is the application of just one technique, the popular stochastic gradient method, but many convex algorithms can be applied to affect the convergence of the filter. We prefer stochastic gradient descent with Nesterov momentum here due to the increased convergence rate. The general equation for gradient descent with Nesterov momentum is:

$$x_{n+1} = x_n - \mu\nabla f\big(x_n + \beta(x_n - x_{n-1})\big) + \beta(x_n - x_{n-1}) \tag{2.5}$$

where $0 < \beta < 1$ defines the momentum hyperparameter. Note that if $\beta = 0$, no momentum is present and the iterates are the same as the gradient method. Compared to traditional gradient descent, this method is more robust to ill-conditioning and provides faster convergence bounds, which affects how well the adaptive filter approaches optimality.


In stochastic methods, rather than accessing the gradient directly, we compute a function with the same expected value [61]. That is, we can approximate the gradient by a function $g(x)$ such that $E[g(x)] = \nabla f(x_n)$. In KLMS, this amounts to minimizing the mean square error over a fixed number of filter taps rather than the true mean square error. As in other stochastic methods, because we replace the actual function by one that only shares the expected value, we now converge only in the expected value. That is, some randomness will be introduced into our convergence and we will converge to a ball with some radius rather than a fixed point.

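The end result is a minor update to the traditional KLMS method to perform stochastic gradient descent with Nesterov momentum, where

$$w_{n+1} = w_n + \mu e_n x_n^H \tag{2.6}$$

is updated to

$$w_{n+1} = w_n + \mu \big(d_n - (w_n + \beta (w_n - w_{n-1}))^T x_n\big) x_n^H + \beta (w_n - w_{n-1}) \tag{2.7}$$

Setting $\beta = 0$ in (2.7) recovers (2.6) with error $e_n = d_n - w_n^T x_n$.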

As with traditional KLMS, this update rule is applied in the kernel space, not the time domain. This KLMS with momentum is applied to the RSSI values from each beacon independently as a univariate analysis. The result is that correlations in the time domain are handled through filtering, so that the next supervised learning phase can handle each point as if it is independent.
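To make the momentum update concrete, the following is a minimal sketch of a real-valued, linear LMS filter using the same update rule. It deliberately omits the kernel mapping, so it is not the KLMS filter used in Max; the function name, tap count, step size, and synthetic signal are all assumptions for illustration.

```python
import random


def lms_nesterov(inputs, desired, n_taps, mu, beta):
    """Linear LMS with Nesterov momentum (real-valued, no kernel)."""
    w_prev = [0.0] * n_taps
    w = [0.0] * n_taps
    errors = []
    for n in range(n_taps - 1, len(inputs)):
        x = [inputs[n - k] for k in range(n_taps)]  # most recent sample first
        velocity = [beta * (a - b) for a, b in zip(w, w_prev)]
        lookahead = [a + v for a, v in zip(w, velocity)]
        # Error evaluated at the look-ahead weights, as in the momentum update.
        e = desired[n] - sum(wi * xi for wi, xi in zip(lookahead, x))
        w_next = [a + mu * e * xi + v for a, xi, v in zip(w, x, velocity)]
        w_prev, w = w, w_next
        errors.append(e)
    return w, errors


# Hypothetical system identification task: recover a known 3-tap filter.
rng = random.Random(0)
true_w = [0.5, -0.3, 0.2]
signal = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
target = [0.0] * len(signal)
for n in range(2, len(signal)):
    target[n] = sum(true_w[k] * signal[n - k] for k in range(3))

weights, errors = lms_nesterov(signal, target, n_taps=3, mu=0.1, beta=0.5)
print(weights)
```

Because the synthetic target is noise-free, the estimated weights should approach `true_w`; with noisy data the estimate would instead fluctuate around the optimum, as described for stochastic methods above.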

Figure 2.7: KLMS outperforms linear LMS in fitting nonlinear functions when the nonlinear function class is known a priori [45].


Random Forest

Many supervised learning techniques are available through scikit-learn. In our room-estimation pipeline we try several methods and choose the one which produces the best result over 3-fold cross-validation. The best resulting classifier is often the random forest classifier. We thus describe the random forest classifier here to provide full insight into one potential pipeline, with the caveat that the scikit-learn API is nearly identical for all supervised learning methods, so any number of alternate methods are available for substitution.

The random forest classifier and related methods such as extra trees, gradient boosting and AdaBoost increase accuracy by ensembling many weak classifiers, a technique known as bagging or boosting depending on how the ensemble is formed and accumulated. The weak classifier for the random forest is the decision tree, and the random forest is formed by averaging over the predictions of many decision trees. The difference between the two can be seen in Figure 2.8. The resulting classifier is improved if each of the weak classifiers is similarly good, but uses different features to form the decision boundary (i.e., averaging over many instances of the same decision tree will produce a result no better than the original decision tree). For this reason, randomness is injected by selecting a random subset of the available features and forming the best decision tree from this subset.

The fact that the random forest performs well in this scenario highlights that even after filtering with a nonlinear kernel, the resulting data points remain difficult to separate with a linear decision boundary. This suggests the RSSI data is highly nonlinear not only due to the exponential decay of radio waves, but also due to the variable impedances present in the environment.

2.8 Results of Beta Test

We present results from 39 total months of beta testing the features of Max required for the Dementia Care Ecosystem use case.

User Interface

The results are available for viewing through Salesforce. Max provides the methods necessary to transmit metrics and analysis to Salesforce through a predefined API. The implementation in Salesforce itself was performed by contractors at UCSF with the end result shown in Figure 2.9.

The primary function of the Android user interface is to allow new users to perform initial home setup and allow administrators to view appropriate debugging information. Examples of each are shown in Figure 2.10. This Android user interface can be extended and customized through standard Android development techniques. As a side note, when developing user interfaces in collaboration with individuals without an engineering background, we found InVision to be an extremely useful service. InVision allows users to develop the look and feel of an application in PowerPoint, a more broadly available skill.


Figure 2.8: The random forest increases classifier accuracy by averaging over random decision trees to reduce variance [53].

Room Estimation

Based on 13 home setups with number of rooms equal to 3.1 ± 1.7, the room detection accuracy is 96.1% ± 2.6%. Of these setups, only one was performed in a house with rooms on multiple floors. More rooms and more floors would naturally decrease the detection accuracy as the number of neighboring rooms increases. Room sizes were allowed to vary as they naturally do in the home setting, with small rooms on the order of 1 meter diameter (e.g., bathrooms) and large rooms on the order of 6 meter diameter (e.g., living rooms). Estimote beacon settings were all set to the same parameters, with broadcasting power set to -20 dBm, sufficiently large to cover any room size, and advertising interval set to 200 ms, sufficiently small to provide many opportunities for detection even when just passing through a room.

Figure 2.9: Salesforce user interface for care team navigators to view metrics and analysis for monitored individuals with dementia

Results from one representative plot are shown in Figure 2.11. The top line of circles shows the true room at each point in time. The line of triangles below it shows the prediction made at each time point. The squares scattered below show the RSSI value from each beacon at each point, with higher values denoting beacons expected to be closer. As shown, the method is able to successfully resolve uncertainty when the RSSI value is fluctuating between two rooms, a situation which is highly detrimental to the Care Ecosystem use case, in which a key metric is the number of transitions between rooms. The cost is decreased accuracy when a true room transition is made. In most situations this occurs far less frequently.
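The effect of fluctuating predictions on the transition count can be sketched with a toy example. The sliding-window majority vote below is a stand-in chosen for brevity, not the KLMS and random forest pipeline described in this chapter, and the room labels and window size are assumptions.

```python
from collections import Counter


def count_transitions(rooms):
    """Number of times consecutive room labels differ."""
    return sum(1 for a, b in zip(rooms, rooms[1:]) if a != b)


def mode_filter(rooms, window):
    """Replace each label by the most common label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(rooms)):
        segment = rooms[max(0, i - half): i + half + 1]
        smoothed.append(Counter(segment).most_common(1)[0][0])
    return smoothed


# Hypothetical per-time-step predictions with brief flickers between rooms.
raw = ["bed", "bed", "bath", "bed", "bed", "bed",
       "bath", "bath", "bath", "bed", "bath", "bath"]
smoothed = mode_filter(raw, window=3)

print(count_transitions(raw))       # 5
print(count_transitions(smoothed))  # 1
```

The single-step flickers inflate the raw transition count, while the smoothed sequence reports only the one sustained move. The same smoothing delays the reported time of a true transition, which mirrors the trade-off noted above.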

Analysis

After data is collected from the watch and ambient sensors, metrics can be formed and analysis methods applied. Two example results of the analysis are displayed in Figures 2.12 and 2.13. In Figure 2.12, outlier detection is applied after room estimation has been applied to infer the percentage of the day the user spends in the bathroom and the bedroom. In this situation, the home environment was an apartment limited to these two rooms. From the outlier detection, a distinct pattern emerges. On most days, the user spends very little time in the bathroom. On some days, the user spends more time in the bathroom. This cluster was mostly composed of days in which the user showered. Finally, an outlier is detected when a day deviates markedly from the normal pattern. In the Dementia Care Ecosystem, this would flag the care team navigator to call the user and ask about specific symptoms defined by a flowchart designed by the clinicians involved. In Figure 2.13, trend detection is applied to the step count data collected on the watch in the first 30 days of use. The use of the fitness tracker appears to show the desired increase in number of steps taken over this time. Note that the two points in which many steps are taken are ignored in the resulting model. Similarly, this robust RANSAC regression allows model fitting in the presence of many days when the user forgets or chooses not to wear the device.
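The robustness of RANSAC-style trend fitting to such days can be sketched as follows. This is a minimal pure-Python illustration on noise-free synthetic step counts, not the implementation used in Max; the function names, threshold, number of trials, and data are assumptions.

```python
import random


def fit_line(points):
    """Ordinary least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, threshold, n_trials=200, seed=0):
    """Keep the two-point candidate line with the most inliers, then refit on them."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return fit_line(best_inliers), best_inliers


# Hypothetical 30 days of step counts rising by 50 steps per day.
days = [(d, 3000.0 + 50.0 * d) for d in range(1, 31)]
for d in (10, 20):           # two unusually active days
    days[d - 1] = (d, 12000.0)
for d in (5, 15, 25):        # days the device was not worn
    days[d - 1] = (d, 0.0)

(slope, intercept), inliers = ransac_line(days, threshold=300.0)
print(slope, intercept, len(inliers))
```

The five atypical days are excluded from the final fit, so the recovered trend matches the underlying increase rather than being pulled toward the extreme values.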


Figure 2.10: The Android user interface for administrators and a selected screen from home setup. The interface for non-administrator users provides a subset of the functions available.

2.9 Conclusion

In this work, we describe Max, an open source prototyping platform constructed from off-the-shelf components for designing cyber-physical systems personalized to individual users. The current starting price is $400 assuming the smartphone and smartwatch used here, three sensors, and an available server for computation. At this price, many interesting new applications are feasible. We hope Max reduces the engineering burden of creating such systems to spur innovation in creating new and interesting applications. We describe one such application here in the Dementia Care Ecosystem. The Dementia Care Ecosystem aims to reduce the cost of dementia care through cost-effective continuous monitoring to detect changes in behavior early enough to respond before a painful and expensive emergency room visit may be needed. One example is monitoring changes in bathroom use for signs of possible urinary tract infections, an unfortunately common problem in dementia care.

Figure 2.11: Representative plot of room inference from raw RSSI

Figure 2.12: Representative plot of inferred room location with outlier detection applied

We present the system architecture for collecting data, maintaining data securely, and performing several common data analysis techniques including filtering, classification, anomaly detection, and trend detection. Where possible, we give concrete examples from the Dementia Care Ecosystem use case and highlight where other methods are available through open-source libraries such as scikit-learn. We further derive and demonstrate the efficacy of a new technique for cost-effective approximate indoor positioning, a common need for many personalized applications which is not met by current offerings.

There are several features that we plan for future inclusion in Max but are not yet available. One significant gap is the absence of any actuators. Given the current disjointed market for sensors and actuators, it is difficult to establish a common API for inclusion of the many types which would be interesting to use with Max. By building this project as an open-source collaboration, we hope to gain support from the community in developing support for prevalent IoT platforms. Some of the notable platforms for which we will encourage inclusion in the near future include Automatic for automotive and Samsung SmartThings for home. Another gap is the lack of analytical methods which leverage large quantities of data for increased performance. To this end, we anticipate the future inclusion of Lasagne and Theano, open source libraries for deep learning and computational graph analysis. For example, as the Dementia Care Ecosystem scales, this inclusion will enable the labeling of sequences using recurrent neural networks based on the annotated data already collected through the current implementation.

Figure 2.13: Representative plot of step count data with trend detection applied

The challenge from a hardware perspective now stems not from the wearable device, but from the surrounding sensors. We thus conclude that companies such as Google and Apple that wish to drive innovation in the wearable computing market, similar to that seen in current smartphone ecosystems, should produce a developer's kit with the sensors and actuators necessary to enable a large array of potential applications. The seamless inclusion of these sensors would dramatically reduce the burden of producing the next generation of wearables, where interaction is enabled not only with the user's body but also with the surrounding environment.

Chapter 3

Diagnosis through Speech


The aim of this work is to provide computer tools to help diagnose subjects with various dementias by applying machine learning algorithms to recorded conversations between patients and close caregivers. The dataset includes 126 conversations collected between 2002 and 2014, including patients with Alzheimer's Disease (AD), behavioral variant Frontotemporal Dementia (bvFTD), Primary Progressive Aphasia (PPA), and healthy controls (HC). By combining both acoustic and text features, we reach a level of 92% accuracy in distinguishing dementia from healthy controls and 75% in distinguishing between subtypes (AD vs. bvFTD vs. PPA vs. HC). Most notably, by collecting more than 1200 features and selecting the most relevant ones, we highlight highly relevant features that cannot practically be collected by a human during clinical observation, suggesting new avenues for computer-aided diagnosis and prognosis of dementia.

3.2 Introduction

In this chapter, we develop, prototype, and test a set of signal processing and machine learning tools to support computational diagnosis of dementia. We focus on conversational speech data due to its high availability through cellphones and connected devices (i.e., no custom sensors are needed) and its high expressive power (i.e., much can be inferred about an individual's state from the content and quality of his/her speech). The primary aim of this article is to show that this speech data can provide valuable insight into the presence or absence of dementia and into the specific kinds of dementia if present. Towards this aim, we set two specific goals:

1. To create an algorithm with leading results for determining whether an individual has dementia or not based on recorded speech.

2. To determine the key features needed for this classification.

The methods used in this article follow a three-stage process. First, features are extracted using readily available open-source tools including the openSMILE package for acoustic feature extraction and the Google speech recognition API for text-based feature extraction [18, 81]. Second, feature selection techniques are applied to remove noisy features. This step was originally applied in post-processing to select those features which were most indicative to the clinical team. We later found this feature selection process significantly improved the final classification results and so included it as the second stage of the process. The final stage performs classification, in which we undertake both the binary task of determining whether an individual is healthy or has dementia and the multiclass task of determining what diagnosis, if any, an individual should receive.

The system and study presented here were produced with two potential future applications in mind. First, we aim to pave the way toward an early detection mechanism for dementias such as the ones described here. Although the results we present are on a dataset that demonstrates considerable selection bias (i.e., the proportions of dementia subtypes differ from the true population prevalence), results approaching human performance suggest that the proposed techniques could one day be applied to early detection through an easily accessible medium such as a smartphone application. Second, we aim to support general practitioners who may not have specialists nearby to whom they could refer difficult cases. For instance, we believe that this proof-of-concept demonstrates the potential to facilitate distinction between diseases that typically require special training to distinguish (e.g., bvFTD vs. PPA).

Outline

The rest of the chapter is organized as follows. Section 3.3 gives background on speech processing and dementia and details related work in automatic dementia detection. Section 3.4 discusses the feature extraction process, describing the dataset and the collection of acoustic and text-based features. Section 3.5 discusses the classification process, including methods for feature selection and classification. The remaining sections describe the results obtained, discuss the results and limitations of the work, and provide some conclusions and possible future directions.

3.3 Background and Related Work

Types of Dementia Relevant to this Work

Dementia is defined clinically as a progressive cognitive disorder that leads to an inability for an individual to independently perform their activities of daily living. While many view dementia as synonymous with Alzheimer's disease, there are in fact several forms of dementia. Alzheimer's disease is most common over the age of 65, but dementia can strike younger people, often resulting in misdiagnosis. In people under the age of 65, dementias can be mistaken for personality changes or a psychiatric illness such as depression [78]. Even if a dementia is suspected, the wider variety of possible dementia types at a younger age makes a precise diagnosis difficult.

Frontotemporal dementia (FTD) is at least as common as Alzheimer's disease in people under the age of 65. There are three main forms of frontotemporal dementia: behavioral variant frontotemporal dementia (bvFTD), semantic variant primary progressive aphasia (svPPA), and the nonfluent variant of primary progressive aphasia (nfvPPA). All FTDs interfere with social interaction: bvFTD causes a loss of social and emotional regulation and appropriate interaction, whereas the two forms of primary progressive aphasia interfere with language comprehension and production [58, 57].

Accurate diagnosis of dementia, including the subtype, can have important implications for treatment and prognosis of the disease. This will only become more important with the advent of new therapeutic agents, as there is a growing recognition that treatments are most likely to be effective early in the disease course, therefore requiring early diagnoses.

Background on Speech Processing

Methods in computational processing of speech have advanced considerably in recent years. Private companies have developed state-of-the-art automatic speech recognition (ASR) schemes by leveraging massive quantities of labeled training data. With these large datasets, deep learning techniques first replaced more classical Gaussian mixture models (GMMs) for recognizing individual phonemes, and then hidden Markov models (HMMs) for modeling temporal probability distributions.

In limited data regimes where deep learning methods are prone to overfitting, however, GMM-HMM techniques continue to provide cutting-edge results. These techniques typically represent acoustic data by Mel Frequency Cepstral Coefficients (MFCCs) or linear spectral pairs (LSPs) and their first or second temporal derivatives. The purpose of this preprocessing is to define summary statistics of the raw acoustic waveform that are smaller in size by selecting the information relevant for making accurate discrimination between the sounds to which humans are sensitive. This preprocessing reduces the feature space from having dimension in the hundreds of thousands (e.g., a 10 second window sampled at 44 kHz provides 440,000 data points per sequence) to a low dimensional manifold in which we expect the relevant information to occur. In this way, efficient calculations over acoustic data can be performed while minimizing the loss in expressive power.

Related Work

Recent studies in automatic dementia detection have focused on extracting content-based features and training simple classification algorithms.

The authors of [70] use the ACADIE corpus of transcribed conversations of AD patients compiled within a study of donepezil. Using the frequencies of common words in the text, they achieve 95% accuracy in detecting AD. [32, 33] use the Carolina Conversation Collection, composed of both raw conversations and transcripts. [33] mixes lexical richness measurements with hand-designed features including filler words, repetitions, incomplete words and go-ahead utterances.

Some studies have attempted to combine textual features with acoustic measures to detect specific dementia types. [64] adds a few acoustic measurements to text-based features but focuses on detecting trouble-indicating speech for subjects already diagnosed with AD. [23] focuses on PPA detection using audio of patients asked to tell the Cinderella story and its transcribed version made by research assistants. By using frequency-based text features as well as pauses and fundamental frequency variations, they achieve 87% accuracy in detecting PPA on a dataset of 40 people. [51] uses a combination of part-of-speech features and pauses to discriminate between variants of frontotemporal lobar degeneration in 38 patients.

In our study, we focus on a fully automatic dementia detection procedure. We use a considerably larger dataset, with 124 individuals after preprocessing and including various subtypes of dementia. Thus, the high accuracy presented here is less likely to be due to sample bias or overfitting than previous results presented in the literature. Our main objective being early detection, we used patient data for early dementia variants. By using only raw conversation, we designed algorithms that are easily applied in real-world contexts (phone conversations, recorded appointments). We apply a more statistical approach using a large combination of acoustic features (frequencies, pitch, loudness, pauses) and textual features (word frequencies, richness, word similarities, reaction times). By looking at different measurements (accuracy, recall, precision, importance of features), we provide an analysis of how we could generalize with bigger datasets and what kind of measurements could be of interest for practitioners.

3.4 Feature Extraction Process

Description of the Dataset

The dataset was obtained by gathering recordings of couples (participants with dementia and a familial caregiver). Patients were diagnosed with bvFTD, svPPA, nfvPPA and early-onset Alzheimer's disease (eoAD) by a team of neurologists, speech pathologists and neuropsychologists. BvFTD diagnoses were determined using the Neary clinical criteria [48], and svPPA and nfvPPA by consensus criteria [31]. AD was diagnosed using National Institute on Aging-Alzheimer's Association diagnostic guidelines [41]. The patients were in early stages of dementia as judged by a mean Mini-Mental State Exam score of 23.4 (SD 5.8) [22]. All assessments were conducted between 2002 and 2014. All study participants provided written consent regarding study participation. The study was approved by appropriate institutional review boards. Patients with bvFTD, svPPA, nfvPPA and eoAD were evaluated along with a healthy companion, usually a close friend or family member.

The dataset used for this research consisted of 98 audio conversations between an individual with dementia and his/her close caregiver, obtained during an assessment of emotional functioning conducted at the Berkeley Psychophysiology Laboratory at the University of California, Berkeley. Laboratory procedures for obtaining samples of conversations were derived from those originally developed by [44]. Couples were instructed to discuss a mutually selected topic of continuing disagreement in their relationship. Each conversation lasted between 10 and 15 minutes. Recordings of the conversations were obtained using unidirectional Shure lavalier microphones attached to each participant.

The audio from the conflict conversations was then transformed into .wav files. A spectral noise gating algorithm was used to remove background noise in Audacity 2.0.3 [69]. Trained research assistants blinded to speaker diagnosis labeled controls' and participants' speech in Praat [12], an acoustic analysis program. Environmental noises and non-speech sounds were labeled for exclusion. Each labeled conversation was checked for quality before use. A Praat script then extracted intervals of uninterrupted speech for each speaker in the conversation. These intervals were then subjected to further analysis.

In addition to the raw audio files, the times each speaker concluded or began were manually marked in a file that we will call the textgrid. Marking times in this fashion allowed for simple segmentation of the conversation. To ensure quality, each timing file was independently checked and verified. Demographics for 126 of the individuals contained within the sample population were also provided. Two of the individuals in the sample spoke so little that no analysis was possible. Thus, from the original 98 audio conversations, segmented audio from 124 individuals was extracted and analyzed alongside matching diagnoses and demographic information. The others could not be used because the diagnosis was not provided.

The sample characteristics of the dataset are shown in Figure 3.1. The dataset shows minor bias toward healthy controls (52.4%). Those controls were obtained by taking the healthy caregiver speech in the conversation. Within the dementia subsample, the set is divided among three classes of disease: Alzheimer's disease (AD, 12.7%), behavioral variant Frontotemporal Dementia (bvFTD, 15.9%), and Primary Progressive Aphasia (PPA, 18.3%). The gender distribution is 51% male, 49% female. The age distribution is approximately Gaussian, centered at the 60-65 age bracket (Figure 3.2).

While conclusions drawn here may not generalize to the whole population due to the inclusion of a higher proportion of less common subtypes of dementia (e.g., bvFTD, PPA), the diversity of the dataset enables better accuracy for less frequent subtypes as well as better detection in a relatively young population.

Acoustic Features

Vocal production can be influenced by social, emotional, autonomic and motoric processes, all of which may be altered by neurodegenerative illness. This has led to demonstrated differences in vocal production from healthy controls in a wide range of neurological illnesses such as Alzheimer's disease. Vocal production can be quantified by several measures. For example, the Mel-frequency cepstral coefficients (MFCCs), popular for automatic speech processing tasks, use a scale accounting for human hearing perception. They are obtained by mapping the power spectrum of the sound onto mel scale bands via equidistant triangular overlapping filters, taking the logarithm of the powers within each mel frequency band, and applying a discrete cosine transform to the resulting log powers. Line spectral pairs (LSPs) are a representation of linear prediction coefficients (LPCs), which themselves represent the spectral envelope of speech. Compared to LPCs, LSPs have relatively high stability and low sensitivity to quantization noise [66].

Figure 3.1: Distribution of the 126 individuals with respect to disease. The set comprises 66 healthy controls (HC, 52.4%), and 60 individuals with Alzheimer's disease and related Dementias (ADRD, 47.6%). Of the affected individuals, the primary diagnosis for 16 is Alzheimer's disease (AD, 12.7%), for 20 is behavioral Frontotemporal Dementia (bvFTD, 15.9%), for 1 is Dementia with Lewy Bodies (DLB, 0.8%), and for 23 is Primary Progressive Aphasia (PPA, 18.3%). Within the PPA segment, 14 show the semantic variant (svPPA, 11.1%), 7 show the right semantic variant (rsvPPA, 5.6%), and 2 show the non-fluent variant (nfvPPA, 1.6%).

Figure 3.2: Distribution of the 126 individuals with respect to gender and age.

In recent studies of computer-aided dementia detection [32, 33, 23, 25, 51], the study of speech characteristics plays an important role but is generally focused on content. The specific acoustic features described above have been used to a limited extent in some studies [23, 25, 51] and have proven significant in similar tasks such as emotion recognition [5] or autism detection [40].

To gather a wider variety of measurements, we computed a large acoustic feature set using the open-source tool openSMILE [18]. Given a part of the audio conversation, we created 26 indicators for every frame of the audio, including principal frequency, MFCC, LSP, loudness and voicing probabilities. By optionally applying discrete differentiation (finite difference quotient) followed by 19 aggregating functions such as mean, standard deviation, quantiles, or regression slope and offset, we obtained a set of 988 features.
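The count of 988 can be checked with a short calculation, assuming that the optional differentiation means each of the 26 indicators is aggregated both in raw and in differentiated form:

```latex
\underbrace{26}_{\text{indicators}}
\times \underbrace{2}_{\text{raw and differentiated}}
\times \underbrace{19}_{\text{aggregating functions}}
= 988
```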

To compute the aforementioned features, we first divided the pre-filtered audio files into samples of length at least 8 seconds containing speech from only one person (between 4 and 20 samples per person). This ensured that each sample is long enough to be relevant. We computed the features on each sample and then took the mean over all the samples extracted for the same person from the same conversation. We additionally stored 8 features on the whole conversation, including the segments shorter than 8 seconds. We refer to these as conversational features because they encapsulate more global information relating to the tone of the conversation, including ratio of speech, mean length of utterances, and the number of uninterrupted parts of speech per conversation.

Following the study from Pakhomov et al. [51] and other similar studies [23, 25], we finally added 5 features to describe the pauses in the speech (functions of length and number), using a custom pause extraction process that finds pauses by merging close sets of consecutive silent frames. We extract pauses of length more than 1 ms, which correspond to what the human ear can detect. The resulting dataset of acoustic features has 1001 elements.
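The idea of merging close sets of consecutive silent frames can be sketched as below. This is an illustrative reconstruction rather than the custom extraction code used in the study; the function name, frame length, gap tolerance, minimum pause length, and silence mask are all assumptions.

```python
def extract_pauses(silent, frame_ms, max_gap_frames, min_pause_ms):
    """Return (start_ms, end_ms) pauses from a per-frame silence mask."""
    # 1. Collect runs of consecutive silent frames as [start, end) frame indices.
    runs = []
    start = None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(silent)])

    # 2. Merge runs separated by at most max_gap_frames non-silent frames.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= max_gap_frames:
            merged[-1][1] = run[1]
        else:
            merged.append(list(run))

    # 3. Keep only pauses longer than the minimum duration.
    return [(s * frame_ms, e * frame_ms)
            for s, e in merged
            if (e - s) * frame_ms > min_pause_ms]


# Hypothetical silence mask with 10 ms frames.
mask = [False, True, True, False, True, True, True,
        False, False, False, False, True, False, False]
pauses = extract_pauses(mask, frame_ms=10, max_gap_frames=1, min_pause_ms=25)
print(pauses)  # [(10, 70)]
```

Pause features such as the number of pauses and their mean length can then be computed from the returned intervals.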

Textual Features

Given the good results of classification on content-based features [32, 33, 23, 25, 51], we also extracted textual features. We started with an automatic transcription with a sufficient word-by-word accuracy (see Appendix C). The text was then modelled as a bag of words, meaning the order of the words was ignored and the text was viewed simply as a set of words with multiplicity. The general process is therefore to apply text-based metrics on each word separately and then use four aggregation functions (min, max, mean and standard deviation) to derive features in a similar manner to the generation of acoustic features. In order to maintain feature relevancy, we provided a specific function for each measurement to test whether each word was relevant. For example, in bigram frequencies of the letters, we did not consider words of length less than 3. We then stored the ratio of words not relevant in a feature: the relevance ratio (Figure 3.3).
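A minimal sketch of this per-word aggregation follows. The metric (word length) and the relevance test (length of at least 3) are toy stand-ins, the function and key names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def aggregate_word_metric(words, metric, is_relevant):
    """Aggregate a per-word metric over relevant words and report the relevance ratio."""
    relevant = [w for w in words if is_relevant(w)]
    values = [metric(w) for w in relevant]
    features = {
        # Ratio of words that were NOT relevant for this measurement.
        "relevance_ratio": (len(words) - len(relevant)) / len(words),
    }
    if values:
        features["min"] = min(values)
        features["max"] = max(values)
        features["mean"] = statistics.mean(values)
        features["std"] = statistics.pstdev(values)
    return features


# Bag of words: order is ignored, repeated words are kept.
transcript = "we went to the store and we bought an apple".split()
features = aggregate_word_metric(
    transcript,
    metric=len,                        # toy metric: word length
    is_relevant=lambda w: len(w) >= 3  # toy relevance test
)
print(features)
```

Each measurement therefore contributes four aggregated values plus one relevance ratio.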

A large number of the word measurements were selected from the Elexicon project [7], which provides behavioral and descriptive data on 40,481 words. The corresponding features have been either computed or collected among six universities on normal students and staff. From this, 39 measurements were used, divided into six categories: word complexity (frequency in different corpora), orthographic neighborhood (size and complexity of words close in spelling), phonographic neighborhood (close in sound), numbers of phonemes/syllables/letters, bigram frequencies (mean frequencies of consecutive pairs of letters) and reaction times to speeded naming and lexical decisions (time to pronounce a word and to identify that a combination of letters is a word). For these features, the relevance function was set to accept all words present in the Elexicon project. Plural and conjugate forms were assigned the value of the root word.

Figure 3.3: Vocabulary Features Extraction Process. Each word considered relevant is used to compute aggregation functions. The others are counted to compute the relevance ratio.

Eight other measurements were used in addition: bigram and trigram frequencies of words not included in the Elexicon project (4 features), vowel and consonant distributions (2 features), and the age of acquisition (2 features) [43]. The age of acquisition has been used in several similar studies [24, 25, 23] and originates from a database that provides the mean and the standard deviation of the user-reported age at which users learned each word. Data for this database was collected using Amazon Mechanical Turk, the web-based crowdsourcing technology. The relevance function was therefore set to reject unavailable words as well as those with high standard deviation (greater than 4).

In addition to vocabulary-oriented text features, inspired by the work of [13, 32, 33], we also added four features that measure the richness of the vocabulary: number of words (N), richness ratio (RR), Brunet Index (BI), and Honore Statistic (HS). They are defined as:

$$\begin{cases} N \\ RR = \dfrac{V}{N} \\ BI = N^{V^{-0.165}} \\ HS = \dfrac{100 \log(N)}{1 - V_1 / V} \end{cases}$$

where $N$ is the number of words, $V$ the number of distinct words and $V_1$ the number of words used exactly once. To the richness features, we also added the TF-IDF indices (Term Frequency-Inverse Document Frequency) [65] for words present in at least 25% of the documents and that are in the top 15 features selected with at least one of the feature selection techniques described below. TF-IDF is defined as:

$$\mathrm{tfIdf}_{i,j} = \mathrm{tf}_{i,j} \cdot \frac{|D|}{|\{d_j : t_i \in d_j\}|}$$

where $t_i$ is the $i$th term, $d_j$ is the $j$th document, $\mathrm{tf}_{i,j}$ the frequency of $t_i$ in $d_j$ and $|D|$ the total number of documents. This process resulted in 10 words. All combined, a total of 249 textual features were defined.
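The richness measures and the TF-IDF index defined above can be computed with a few lines of code. The sketch below makes several assumptions that the text does not settle: the logarithm is taken as natural, the term frequency is taken as a relative frequency within the document, and the function names and example data are hypothetical.

```python
import math
from collections import Counter


def richness_features(words):
    """Compute N, RR, BI and HS for a list of words."""
    counts = Counter(words)
    n = len(words)                                    # N: number of words
    v = len(counts)                                   # V: number of distinct words
    v1 = sum(1 for c in counts.values() if c == 1)    # V1: words used exactly once
    return {
        "N": n,
        "RR": v / n,
        "BI": n ** (v ** -0.165),
        # HS is undefined when every distinct word is used once (V1 == V).
        "HS": 100 * math.log(n) / (1 - v1 / v) if v1 < v else float("inf"),
    }


def tf_idf(term, doc_index, documents):
    """tf-idf of a term in one document, without a logarithm, as defined in the text."""
    doc = documents[doc_index]
    tf = doc.count(term) / len(doc)  # assumption: relative frequency
    containing = sum(1 for d in documents if term in d)
    return tf * len(documents) / containing if containing else 0.0


richness = richness_features("the cat sat on the mat the end".split())
docs = [["the", "cat"], ["the", "dog"], ["a", "cat", "cat"]]
score = tf_idf("cat", 2, docs)
print(richness, score)
```

Note that many TF-IDF variants take a logarithm of the document ratio; the sketch follows the definition as written here instead.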

3.5 Classification Process

Preprocessing and Feature Selection

Once the feature set was computed, we performed several preprocessing steps. We separated the data by gender due to variance in acoustic features for men and women. We discarded every sample with missing acoustic features and took the mean over all samples to obtain a single value for each person. We also discarded every person associated with a transcript with fewer than 20 words in order to keep relevant vocabulary features. Due to limited data, we did not discard subjects for other reasons. If values were missing for age, we took the mean value. If no information on pauses could be extracted, we set a 0 value indicating no pauses were present. We finally eliminated any feature constant for every subject since these features only extend computation time without providing any discriminative power. After the preprocessing steps, 124 subjects remained in the dataset. To this dataset we applied leave-one-out cross validation.

Because of the high number of features coupled to a small number of samples, feature selection was critical to high performance. Because we wanted to keep interpretability as well as some stability in the set of features selected, we used precomputed scoring functions on features to select the k best ones. On every cross-validation set, the scoring function was computed for every feature and then the average was taken across all sets. This had the primary advantage of being easily interpretable and more stable than simply computing the score functions directly. It is also significantly faster than computing it for each set of the cross-validation individually. However, because it uses the target values of all samples, it risks overfitting. To verify that only minor overfitting occurred, we tested the final algorithm on 10 new subjects and observed a drop in accuracy of less than 5%.
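The averaging scheme described above can be sketched as follows. This is an illustrative sketch, not the study's implementation: `class_separation` is a deliberately simple stand-in for the real scoring functions, and all function names and the toy data are hypothetical.

```python
from statistics import mean

def class_separation(column, labels):
    """Stand-in score (an assumption): absolute gap between the two class means."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    return abs(mean(pos) - mean(neg))

def averaged_scores(X, y, score_fn):
    """Average a per-feature score over all leave-one-out training sets."""
    n_samples, n_features = len(X), len(X[0])
    totals = [0.0] * n_features
    for held_out in range(n_samples):
        rows = [r for i, r in enumerate(X) if i != held_out]
        labels = [l for i, l in enumerate(y) if i != held_out]
        for f in range(n_features):
            totals[f] += score_fn([r[f] for r in rows], labels)
    return [t / n_samples for t in totals]

def select_k_best(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:k]

# Hypothetical toy data: 4 subjects, 3 features, binary labels.
X = [[0.1, 5.0, 1.0], [0.2, 4.0, 1.1], [0.9, 5.1, 0.9], [1.0, 4.2, 1.0]]
y = [0, 0, 1, 1]
scores = averaged_scores(X, y, class_separation)
top = select_k_best(scores, k=1)
```

Because every sample appears in all but one of the training sets, the averaged score depends on every label, which is the source of the overfitting risk noted in the text.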

We used three different types of scoring functions for the feature selection. First we used 3 variance-based approaches: ANOVA, the Welch t-test and the Chi2 test. In brief review, ANOVA tests the probability that, according to one feature, individuals from two labels are likely to come from the same population. The Welch t-test takes a similar approach but does not assume equal variance in both labels. The Chi2 test assesses the likelihood of independence between every feature and the label. We next used the coefficients from both Lasso and Ridge regression as scoring functions. We tuned the sparsifying coefficient of each to select approximately 100 variables. We finally used the feature importance in several tree-based methods: Decision Tree, Random Forest, and AdaBoost with Trees as base learners, using either the entropy or the Gini impurity as criterion for the cuts. The methods used were extracted directly from the Scikit Learn Python library [54], except for the Welch t-test, where the SciPy library [39] was used.
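To make the difference from an equal-variance test concrete, the Welch statistic can be written out with the standard library alone. This is a sketch for illustration; the study itself used SciPy, where `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the Welch variant. The sample values below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def welch_df(a, b):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))

# Hypothetical values of one feature in each group; the second group is far more spread out.
controls = [10.0, 12.0, 11.0, 13.0]
patients = [20.0, 30.0, 25.0, 35.0]
t = welch_t(controls, patients)
df = welch_df(controls, patients)
```

The absolute value of `t` can then serve as a per-feature score. The degrees of freedom fall below the pooled value of n1 + n2 - 2 when the group variances differ, which makes the test more conservative.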


Figure 3.4: Procedure to perform classification using regressors. 0, 1 and 2 represent the labels to predict (HC, AD, FTD, PPA). One regressor is associated with each of them.

Classifiers

For the classification itself, we used a large number of standard classifiers from the Scikit Learn Python library [54]. We first used simple linear models such as Logistic Regression, Linear Discriminant Analysis (LDA), and Support Vector Machine (SVM). We also used the Decision Tree classifier and, given its good results, we tried different tree-based methods exploring the bias-variance tradeoff, such as Random Forest, boosting (AdaBoost and Gradient Boosting), and Extra Trees (a more randomized version of Random Forest). In addition to the above algorithms, based on the good accuracy of neural networks in many speech-based classifications, we implemented the Extreme Learning Machine (ELM) [35], which is a one-hidden-layer neural network with the weights between the input and the single hidden layer set randomly. It has the advantage of being significantly faster than traditional multilayer networks and less prone to overfitting given the reduced parameter set. We did not apply any deep networks given the relatively small dataset and the unavailability in the literature of relevant pre-trained deep networks such as those available for vision through ImageNet [16].
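A minimal sketch of an Extreme Learning Machine illustrates why it trains quickly: the input-to-hidden weights are drawn once at random and only the output weights are fitted, here by least squares. The class name, the tanh activation, the hidden layer size, and the toy data are assumptions and are not taken from the study.

```python
import numpy as np

class ExtremeLearningMachine:
    """One hidden layer; input weights are random and never trained."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Assumption: tanh activation on a random affine projection.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(n_classes)[y]            # one-hot labels
        # Only the output weights are fitted, via the pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Toy XOR problem, which is not linearly separable.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 0])
model = ExtremeLearningMachine().fit(X, y)
pred = model.predict(X)
```

Fitting reduces to one matrix pseudo-inverse, with no iterative back-propagation through the hidden layer.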

In order to leverage penalization properties (sparsity, handling of highly correlated features) on simple models, we used regression algorithms such as Lasso, Ridge regression, Least Angle Regression, and Elastic Net. To do so, we trained a regression algorithm for each label in the output and used the index of the regression with maximum output as a prediction (see Figure 3.4). The probability of belonging to each class was computed using the following formula:

\[
\mathrm{Prob}(x \in C_l) = \frac{R_l(x) - \min\left(-1, \min_{i \in L} R_i(x)\right)}{\sum_{j \in L} \left[ R_j(x) - \min\left(-1, \min_{i \in L} R_i(x)\right) \right]}
\]

with $C_l$ the class of label $l$, $R_l(x)$ the score of the regressor associated with the label $l$ for $x$, and $L$ the set containing all the labels. With the definition above, the regressor with the lowest output receives a probability of 0 if that output is below -1, and a positive probability otherwise.
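The conversion from regressor outputs to class probabilities can be sketched directly from the formula. The function name and the example scores are hypothetical, and the uniform fallback for the degenerate all-equal case is an assumption added so the sketch never divides by zero.

```python
def class_probabilities(scores):
    """Map one regressor score per label to probabilities that sum to 1."""
    shift = min(-1.0, min(scores))          # min(-1, min_i R_i(x))
    shifted = [s - shift for s in scores]
    total = sum(shifted)
    if total == 0:                          # all scores equal and below -1 (assumed fallback)
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in shifted]

# Lowest score above -1: every label keeps a positive probability.
probs_a = class_probabilities([0.9, 0.2, -0.1])
# Lowest score below -1: that label receives probability 0.
probs_b = class_probabilities([0.9, 0.2, -1.5])
predicted = max(range(len(probs_a)), key=probs_a.__getitem__)
```

The predicted label is unchanged by the conversion, since subtracting a common shift and normalizing preserves the ordering of the scores.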


Figure 3.5: The best results in determining whether a speech segment belongs to an individual with dementia or a healthy control. Two-step AdaBoost, or Selective Boosting, demonstrates 92% accuracy and greater than 90% precision and recall.

Finally, we used different Voting Classifiers. As is typical in ensemble methods, the idea was to combine the predicted probabilities from different algorithms to get a more stable one which incorporates the benefits in prediction from different algorithms. In order to avoid giving too much importance to some algorithms, the minimal probability was set to 0.01. This threshold also enabled the method to discredit the labels with null probability for one of the classifiers while still differentiating if several labels had null probabilities.
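The effect of the 0.01 floor can be illustrated with a small sketch. The text does not state how the probabilities were combined; the multiplicative rule below is an assumption chosen because it shows why a floor is needed, and the function name and example probabilities are hypothetical.

```python
from math import prod

def vote(prob_lists, floor=0.01):
    """Combine per-classifier class probabilities after flooring them.

    Assumption: probabilities are multiplied across classifiers and renormalized.
    """
    n_labels = len(prob_lists[0])
    combined = [
        prod(max(p[label], floor) for p in prob_lists)
        for label in range(n_labels)
    ]
    total = sum(combined)
    return [c / total for c in combined]

# Three hypothetical classifiers scoring three labels.
votes = vote([
    [0.7, 0.3, 0.0],
    [0.6, 0.0, 0.4],
    [0.5, 0.0, 0.5],
])
```

Under this rule, a label given null probability by one classifier is strongly penalized but still ranks above a label given null probability by two classifiers, which a hard zero would not allow.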

3.6 Results

Classification Accuracy

The best classification accuracy scores for the bimodal problem (Dementia vs. Control) are shown in Figure 3.5. The best result of 92% accuracy was achieved using AdaBoost with the 50 best features selected by AdaBoost a priori, as discussed in the previous section. The precision and recall are each over 90%. All other classifiers show significantly weaker results. With this result, the first goal of providing leading-edge prediction accuracy is accomplished. The result also seems to show that a two-step AdaBoost significantly improves accuracy. In the sections below, we refer to this method as Selective Boosting.

The best accuracy scores on the multimodal problem are shown in Figure 3.7. The scores are considerably lower. In the best case, the overall accuracy is 70%, obtained using Gradient Boosting. As shown, this result strongly benefits from the ability to separate healthy controls with greater than 90% recall. Among the three disease types, each presents similar recall, suggesting the three variants present similar classification difficulty.


Figure 3.6: The best results in determining whether a speech segment belongs to an individual with dementia or a healthy control. Most tree-based methods reach accuracies higher than 80%.

Figure 3.7: The best results in determining the diagnosis if present. Gradient Boosting demonstrates 70% accuracy, greatly benefiting from high recall among healthy controls.

Feature Selection

As shown in Figure 3.8, the impact of feature selection is significant. In this case, feature selection is performed a priori using the AdaBoost scoring function. Thus, the benefits of feature selection are less apparent for the other algorithms. In the case of AdaBoost, the benefits are significant. With 200 features, the accuracy is 74%. The accuracy increases as noisy features are removed and overfitting is reduced until 50 features are used, where the accuracy is 92%. Beyond this point, further feature selection reduces the accuracy as useful information is lost. When only 5 features are used, the accuracy drops to 76%. Although lower than the peak performance, this result is unexpectedly high given how little information the algorithm has access to with only 5 features present. These features are shown in Table 3.1 and discussed in the following section.

To examine the effects of feature selection, we limit each algorithm to only using 15 features. We perform feature selection using a decision tree with Gini criterion to provide similar benefit to each of the tree-based algorithms. The results are shown in Figure 3.9. In this case, the best results are achieved by multiBoost with 85% accuracy. As discussed in Section 3.2, multiBoost is a Voting Classifier based on the predicted probabilities of different tree-based methods to reduce variance (Random Forest, Extra Trees, AdaBoost and Gradient Boosting on Decision Trees). As compared to Figure 3.8, the accuracy of AdaBoost is drastically reduced. This demonstrates the benefit of allowing a specific algorithm to perform feature selection. It also shows that although AdaBoost is able to leverage less informative features to achieve a higher final accuracy, it does not perform as well as the other classifiers with limited information.

Figure 3.8: The effect of feature selection on the bimodal classification results. Feature selection is performed a priori using the AdaBoost score function, so the change in accuracy for AdaBoost is most indicative.

Figure 3.9: The best results when limited to 15 features. Note multiBoost still demonstrates 85% accuracy. Here feature selection is performed using a decision tree with Gini criterion.

Most significant features

The most significant features for the bimodal classification problem are shown in Table 3.1. The most indicative feature is the proportion of the conversation in which the affected individual is speaking. Six of the features are acoustic features based on functions of the Mel-frequency cepstral coefficients (MFCC) and line spectral pairs (LSP). Interestingly, the remaining features are each based on how difficult a word is to say (e.g., the variance of words used which start with a vowel). Other text-based features, such as how difficult a word is to comprehend, are not seen.

Table 3.1: Top 10 features for bimodal classification. The symbol ∘ denotes composition of functions.

Bimodal Features

 1  Proportion of Conversation Spent Talking
 2  MFCC-9 ∘ Moving Average ∘ Linear Regression Slope ∘ Delta
 3  Letter Trigram ∘ Minimum
 4  MFCC-12 ∘ Moving Average ∘ Linear Regression Y-Intercept
 5  First Vowel ∘ Variance
 6  Orthographic Neighborhood Frequency ∘ Mean ∘ Minimum
 7  MFCC-12 ∘ Moving Average ∘ Linear Regression Slope ∘ Delta
 8  LSP-1 ∘ Moving Average ∘ Slope
 9  MFCC-5 ∘ Moving Average ∘ Interquartile Range 2-3
10  MFCC-9 ∘ Moving Average ∘ Minimum

Although it performs less well in classification than tree-based selection, ANOVA selection yields features that have strong discriminative power for the prediction of dementia. Its lower classification accuracy is most likely due to high correlations in the feature space that can be captured by the tree but not by ANOVA. However, the top features are still interesting to consider from a medical perspective. The top 10 features are summarized in Table 3.2. Like the tree-based selection, the list contains MFCC features, here three of them (on the 8th and 9th coefficients). It contains two features on voicing probabilities, which are generally more powerful in the multimodal problem. We also see trigrams and orthographic neighborhood features in the top 10 features selected by ANOVA. This seems to suggest that words with unfamiliar sounds are not used as frequently by dementia subjects. Finally, average intensity, which is often noted in clinical practice, appears among the top ANOVA features.

The most significant features for the multimodal classification problem are shown in Table 3.3. The most significant feature is based on the zero-crossing rate, a common feature for determining whether or not a sound belongs to human speech. The second feature is based on the orthographic neighborhood. The remaining eight features are all functions of the acoustics. Again, no features directly involving the complexity of a word in comprehension are present.


Table 3.2: Top 10 features for bimodal classification with ANOVA. The symbol ∘ denotes composition of functions.

Bimodal ANOVA Features

 1  Letter Trigram ∘ Minimum
 2  Number of Samples
 3  MFCC-9 ∘ Moving Average ∘ Skewness
 4  Orthographic Neighborhood Frequency ∘ Mean ∘ Minimum
 5  Intensity ∘ Moving Average ∘ Skewness
 6  MFCC-8 ∘ Moving Average ∘ Delta ∘ Kurtosis
 7  MFCC-9 ∘ Moving Average ∘ Delta ∘ Linear Regression Offset
 8  Voicing Probability ∘ Moving Average ∘ LR Quadratic Error
 9  Trigram with Subtl Norm ∘ Minimum
10  Voicing Probability ∘ Moving Average ∘ LR Linear Error

Table 3.3: Top 10 features for multimodal classification. The symbol ∘ denotes composition of functions.

Multimodal Features

 1  Zero-Crossing Rate ∘ Moving Average ∘ Delta ∘ Skew
 2  Orthographic Neighborhood Frequency ∘ Minimum
 3  MFCC-5 ∘ Moving Average ∘ Linear Regression Slope
 4  MFCC-3 ∘ Moving Average ∘ Linear Regression Slope
 5  MFCC-2 ∘ Moving Average ∘ Delta ∘ Mean
 6  MFCC-8 ∘ Moving Average ∘ Kurtosis
 7  Fundamental Frequency ∘ Moving Average ∘ Linear Reg. Y-Int.
 8  MFCC-5 ∘ Moving Average ∘ Delta ∘ Quartile 2
 9  LSP-3 ∘ Moving Average ∘ Maximum
10  LSP-0 ∘ Moving Average ∘ Delta ∘ Variance

3.7 Discussion

Clinical Relevance

As the population continues to age, interest in accurate diagnosis of dementia is increasing. At this time, clinical diagnosis can demand much time on behalf of caregivers, patients, and medical personnel, and these demands are expected to grow. A simpler screen that can be performed in the home environment without imposing additional demands on caregivers, patients or medical staff would be a valuable tool. This analysis shows greater than 90% agreement with clinical judgment in the diagnosis of dementia based on conversational speech alone. Given the ready availability of speech samples, it is possible that a similar approach could permit screening for dementia with potential benefits of early detection and following disease progression over time. This sets the stage for future progress in early diagnosis, prognosis with readily available data streams, and inexpensive distinction between similar speech pathologies.

Classification

The high accuracy in determining whether an individual has dementia or not suggests that computer-aided diagnosis of dementia is worth pursuing. Although the results are on a dataset which shows some bias in dementia pathologies, the accuracy is approaching that of a human expert. With significantly more data, including diagnoses confirmed by post-mortem histology as well as data from patients across disease types and stages, an algorithm competitive with human experts seems possible. On the task of dementia detection from conversational speech, we present the highest accuracy achieved by an algorithm thus far on a dataset of this size. It should be noted, however, that the dataset is still relatively small for machine learning applications, and accuracies may therefore be overestimated. By achieving the first goal of the article, to produce state-of-the-art results for determining whether an individual has dementia from recorded speech, we hope to encourage future work in 1) early detection of dementia and 2) support of fine-grained diagnosis by the general practitioner without access to specialists in clinical neurology. The lower accuracy on the multinomial classification problem suggests that continued research is needed before a computational diagnosis can be performed independently.

Feature Selection

The most discriminative features are related to the proportion of the conversation spent talking, the pronunciation complexity, and the Mel-frequency cepstral coefficients (MFCC). The potential significance of a decreased proportion of conversational time spent talking has been noted by clinicians. Although features capturing how difficult a word is to say appeared, no features appeared which are used to describe how complex a word is conceptually, such as the age of acquisition. This suggests that in this cohort of early stage patients, mental capacity is diminished less than muscular ability to perform difficult articulation. The presence of many MFCC features, which are often used in speech recognition algorithms, suggests that the quality of the speech can be very indicative of the underlying disease state. The presence of these features also suggests that the algorithm could be improved by incorporating advances that have been made in recent years to improve upon hand-engineered features like the MFCCs by learning features from speech data itself (i.e., deep learning). Although we chose not to use deep-learning approaches here based on the paucity of data, future attempts could be made to tune a pre-trained deep speech network to the data presented here.

With key features needed to characterize dementia-like speech, the approach presented here can pose a privacy-preserving, minimally invasive method for monitoring disease progression. By extracting the features studied and identified by our approach from regular cellphone conversations, clinicians and family may obtain unique insight into the progression of the disease over time. Rather than saving the recorded conversation itself, these features could be extracted locally, providing little insight into the actual content of the conversation in a privacy-preserving architecture. These same features could then be used to perform regression whereby an individual may appear more dementia-like following a change in medication, primary caregiver, place of residence, etc.

Limitations of the study

The dataset could be improved by including more participants, including participants from varying stages of disease, providing data from the same participants over time, and obtaining data with true labels based on post-mortem histology rather than clinical diagnosis. In particular, the dataset analyzed here contained only individuals presenting early dementia pathologies. In order to generalize these results to general diagnosis, the methods should be applied to a broader dataset. It will likely not be possible to claim an algorithm is capable of surpassing the performance of a human expert until a dataset is created which contains labels based on post-mortem histology. A more interesting goal, however, may be to see how an algorithm can be developed to support the clinical expert to improve the final diagnostic capability, as discussed next.

The methods used could be improved by limiting overfitting and improving transcription. To that end, one could perform the feature selection directly inside the cross-validation. It would complicate the computation of top features and increase the computational cost but would limit the likelihood of overfitting. By improving transcription, one could obtain better accuracies with textual features and add measurements of sentence structure. Moreover, if a team of clinical experts created a list of the features used in diagnosis, an individual clinician could provide a score for each of these features when making a new diagnosis. These feature scores could be used in tandem with the features selected here to perform final classification through standard machine learning techniques. This strategy would avoid the difficult technical hurdle of automatically detecting nuanced features such as changes in eye movements while allowing for the inclusion of features which are difficult for a human to measure, such as the frequency with which certain word types are used (e.g., orthographic neighborhood). Existing methods could be further improved by more extensive hyper-parameter optimization (e.g., through random search) and by including pruning techniques in the tree-based learning algorithms, which are not included off-the-shelf in the Scikit Learn Python library.


Given the size of the dataset available for the present work, several design decisions were made that could be improved upon with the addition of more data. The results are provided based on leave-one-out cross validation. A better indicator of generalizability would be to reserve a strict holdout set containing 20% of the data. This cross-validation method is prone to overfitting, but has received recent justification through iterative data analysis techniques such as the thresholdout method.

3.8 Conclusion

This work presents a method for distinguishing the speech of healthy individuals from people with dementia. The method is based on assembling a large vector of features to characterize the speech, then selecting those features that demonstrate the most discriminative power. From this feature selection, we highlight certain features that are more widely clinically recognized, such as the proportion of the conversation spent speaking, and others which are not currently used by clinicians, such as the Mel-frequency cepstral coefficients. We show that this method provides discriminative power approaching that of clinicians in binomial classification, but only moderate ability to discriminate between Alzheimer’s disease, behavioral fronto-temporal dementia, and primary progressive aphasia.

In the future, we believe that this work could be improved in several ways. First, the method could be fully automated by implementing computational segmentation of the speakers in the conversation. This is easily obtainable from cell phone conversations and obtainable with high accuracy in more natural settings if multiple microphones are used and readily available algorithms such as independent component analysis (ICA) are applied. Second, the study would benefit from more data, data from each individual from multiple points in time, and particularly from data in which the post-mortem histology is known. Although we show 70% accuracy in predicting the diagnosis, a more interesting result would be the accuracy in predicting the true disease. The predictive features highlighted could be added to clinical procedures in early dementia detection. However, although a physician may have an intuitive understanding that the MFCC provide discrimination based on pitch in a scale approximating the human auditory system, there is not yet a practical clinical system for detecting and determining correlation between certain abnormalities in MFCCs and certain disease types. Thus, the physician cannot yet use these features to inform their own diagnosis in the same way they can use cues obtained naturally by a human expert in practice, such as hand tremor or eye movement.

We hope that as medicine shifts from a reactive to a proactive paradigm, the present work demonstrates a proof-of-concept process showing that recorded speech provides a data source that is both easily obtainable and presents high expressive power. Moreover, by highlighting the most influential features of the data, we propose a privacy-protecting method for performing daily prognosis and suggest methods in which features that are best detected by humans and features that are best detected by machines can be used together to improve the overall quality of care.

Chapter 4

Fall Detection through Video Analysis


We study robust fall detection in the context of images collected in the light with standard RGB sensing and images collected in the dark with IR sensing. We collect a data set in which four healthy adults simulate falls in the home environment. The data set contains 103,315 images in the light and 43,485 images in the dark, with 30,608 fall images in the light domain and 10,842 fall images in the dark domain. We explore three methods for domain adaptation, none of which have previously been explored in the context of fall detection: (1) tuning the pre-trained VGG network to the fall-detection task [68], (2) applying the domain confusion loss developed by Tzeng et al. [74], and (3) implementing a novel domain-specific data augmentation technique based on the deep style work of Gatys et al. [26].

The best results for our application indicate 0.92 precision and 0.86 recall in the light domain and 0.72 precision and 0.63 recall in the dark domain, both originating from simply tuning VGG. For future work, we will generate a larger data set in a 3-month pilot study, extend the discrete domain adaptation results to continuous domain adaptation under daylight cycles, and explore the use of recurrent neural networks to exploit the time dependencies between video frames for better fall detection.

4.2 Introduction

Fall accidents account for 26% of all Alzheimer’s-related hospitalizations [2] and are thus a major concern and key cost contributor. Unfortunately, safety products developed for falls require a wearable device; they were developed for cognitively aware adults and not designed specifically for individuals with Alzheimer’s disease and related dementias. No dementia-friendly fall detection solution currently exists to affordably provide home AD care within a comprehensive framework.

Our team proposes a system which uses off-the-shelf wall-mounted cameras and wireless sensors to passively detect the key safety concerns for individuals with Alzheimer’s disease. The proposed system does not require any action of individuals or their caregivers, such as wearing a fall pendant, and is therefore well-suited for individuals with dementia. Although the proposed system provides several functions critical to AD care, we focus here on the critical issue of fall detection from video.

Prior work in vision-based fall detection [19, 36, 46, 80] generally follows the same process. The interested group collects a small data set of falls, which is necessarily limited given the rarity of fall events and the difficulty for an actor to replicate an authentic fall event. The group then proposes a method which is generally based on a three-stage pipeline:

1. Detection of the person within the frame

2. Extraction of key features from the detected region

3. Classification into fall / no-fall based on these features

In the literature, Stage 1 is often accomplished by simple background subtraction under the assumption that the only movement in the frame is due to human motion. Stage 2 is accomplished through numerous techniques including Gabor feature extraction, ellipse fitting to the human profile, projection histograms, Gram-Schmidt orthogonalization, nonlinear PCA, collections of heuristics, and deep features extracted from neural networks. Stage 3 is performed by applying traditional SVMs, hierarchical SVMs, shallow neural networks, and deep neural networks [19, 36, 46, 80].

Although there is clear room for improvement in Stage 1 using algorithms which can be tuned to person-specific localization such as fast-RCNN [28], the focus of this work is on Stages 2 and 3. Namely, we provide the first application of pre-trained deep networks to fall classification using VGG [68] pretrained on ImageNet [16], we study the robustness of this classification to a change in the image capture modality from day-time RGB sensing (light) to night-time IR sensing (dark), and we explore two supervised learning techniques for maintaining robustness across modalities. The first method applies the deep domain transfer techniques developed by Tzeng et al. [74], and the second applies the deep style techniques developed by Gatys et al. [26] to learn a domain transfer mapping for domain-specific data augmentation.

4.3 Methods

In this section, we first describe how we generated a fall dataset. Then, we outline the network architectures and training methods used to perform fall detection in multiple domains. All experiments were implemented using Caffe and deployed on an NVIDIA Titan X GPU [37].

Data Collection

In order to train and evaluate our model, we built a small fall data set. We recorded roughly 1 hour of video containing four individuals in a standard living room environment simulating typical fall behavior. To create a realistic domain shift, we recorded data in a day-time setting with a color camera and in a night-time setting with infrared (IR) cameras. We created bounding boxes and fall labels for human figures with Amazon’s Mechanical Turk, using the Video Annotation Tool from Irvine California (VATIC) [75]. Workers were given instructions on how to select regions they believed contained humans and on how to label what they thought of as a fall or someone on the ground. Images were labeled as “fall” or “no fall”, with a fall defined as a person lying on the ground and not in an intermediate pose. To control the quality of the region proposals and labels, we fine-tuned results from the workers, discarding errant proposals and trimming bounding boxes.

We generated 103,315 light and 43,485 dark data-points this way, with 30,608 falls in the light domain and 10,842 falls in the dark domain. We show examples of data from the two different domains in Figure 4.1.

(a) Example of “fall” in the night-time setting.

(b) Example of “no fall” in the day-time setting.

Figure 4.1: Examples of data from the day-time and night-time settings.

Baselines

Our first attempt at performing fall detection was to simply fine-tune two distinct VGG-16 nets [68] for the light and dark domains. The baseline nets were initialized from the pre-trained VGG-16 nets; only the final layer, fc8, was not initialized with weights from VGG. For training, we locked the weights for the first 6 layers and only allowed the two final fully connected layers (fc7 and fc8) to update weights. The Stochastic Gradient Descent (SGD) solver parameters used for all experiments in this paper are listed in Table 4.1. For the baselines, we noted convergence by 20,000 iterations.


Figure 4.2: Domain confusion net, based on [74], used for experiments. Note that the first seven layers are initialized from the VGG weights [68]. We lock the weights for all layers except fc7 and fc8. In implementation, we use two fcD layers with shared weights to connect to light and dark fc7 layers, respectively.

Parameter            Value
Batch Size           64
Base Learning Rate   0.01
LR Policy            Step
Step Size            5000
Momentum             0.9
Weight Decay         0.0005

Table 4.1: SGD solver parameters used to train all nets.
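The step policy in Table 4.1 lowers the learning rate by a constant factor every 5000 iterations. A minimal sketch of the resulting schedule follows. The decay factor `gamma` is not listed in the table, so the value of 0.1 used here is an assumption, and the function name is hypothetical.

```python
def step_learning_rate(iteration, base_lr=0.01, step_size=5000, gamma=0.1):
    """Learning rate under a step policy: multiplied by gamma every step_size iterations.

    base_lr and step_size come from Table 4.1; gamma=0.1 is an assumed value.
    """
    return base_lr * gamma ** (iteration // step_size)

# Learning rate at a few iterations up to the point where the baselines converged.
schedule = {it: step_learning_rate(it) for it in (0, 4999, 5000, 10000, 20000)}
```

Under this assumed decay factor, the learning rate would have dropped by four orders of magnitude by iteration 20,000.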

Domain Confusion

After performing the baseline experiments, we wanted to see if we could discover a domain-invariant representation to allow use of a single net to perform fall detection in both light and dark domains. We hoped this would allow the accuracy in both domains to be improved by leveraging all available information. In [74], the authors achieved domain transfer through maximizing domain confusion and transferring task correlations from a source domain to a target domain. This method results in a feature representation that is difficult to classify by domain but simple to classify by category, with categories that were close to each other in the source domain representation still close in the resultant domain-invariant feature representation.

We used this technique with a few modifications. Since our problem is a binary classification task (i.e., fall detection), we did not apply the soft label loss to achieve task transfer; we only used the domain confusion loss. In addition, we used VGG as the base net for feature representation rather than AlexNet [42].

Model                    Precision  Recall  Number of “Falls”  Number of “No Falls”
                                            in Test Set        in Test Set
Light Baseline           0.859      0.920   2506               18157
Light Domain Confusion   0.884      0.840   2506               18157
Dark Baseline            0.715      0.632   1542               7155
Dark Domain Confusion    0.939      0.511   1542               7155
Style Transfer           0.558      0.640   1542               7155

Table 4.2: Fall detection results for baseline, domain confusion, and style transfer methods.

Model                    Precision  Recall  Number of Light Images  Number of Dark Images
                                            in Test Set             in Test Set
Light Domain Confusion   0.424      0.463   20663                   8697
Dark Domain Confusion    0.457      0.409   20663                   8697

Table 4.3: Dark domain detection results for domain confusion method. The dark domain detection results in this table correspond to the snapshots used to evaluate fall detection in Table 4.2. Note that the test set used for dark domain detection is the same test set used in Table 4.2 but is partitioned by domain rather than by category.

A graphical representation of the domain confusion net is shown in Figure 4.2. Note that we feed in both the light and dark data simultaneously; each input layer has a batch size of 64. In addition, the DomainConfusionInnerProduct layer (i.e., fcD in Figure 4.2) was provided by the authors of [74]. It implements an iterative update for back-propagation, which is explained in the original paper. For this layer, we used the recommended loss weight of 0.1 for both domain classifier and domain confusion losses. We noticed convergence after 30,000 iterations.
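To compare the rows of Table 4.2 with a single number, precision and recall can be combined into their harmonic mean (the F1 score). The F1 score is not reported in this work; the sketch below is only a convenience for reading the table, with the precision and recall values copied from Table 4.2 and hypothetical function and variable names.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from Table 4.2.
light = {
    "Light Baseline": (0.859, 0.920),
    "Light Domain Confusion": (0.884, 0.840),
}
dark = {
    "Dark Baseline": (0.715, 0.632),
    "Dark Domain Confusion": (0.939, 0.511),
    "Style Transfer": (0.558, 0.640),
}
f1_light = {name: f1_score(p, r) for name, (p, r) in light.items()}
f1_dark = {name: f1_score(p, r) for name, (p, r) in dark.items()}
best_light = max(f1_light, key=f1_light.get)
best_dark = max(f1_dark, key=f1_dark.get)
```

By this measure the two baselines come out ahead in their respective domains: domain confusion raises precision in the dark domain but loses more in recall.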

Style Transfer

Gatys et al. demonstrate that the content from one image and the style from another can be merged by extracting deep features from each image and matching first-order statistics from the content image with second-order statistics from the style image [26]. One example is shown in Figure 4.3. The result is achieved by starting with a white-noise image and iteratively updating it to minimize two Euclidean losses: one between its features and the exact features of the content image at one layer, and one between the Gram matrices of its features and those of the style image at several layers. For example, Figure 4.3 is achieved by extracting the features from one feed-forward pass of the original VGG network pre-trained on ImageNet, then matching the content reconstruction from convolution layer 4_2 with the style reconstructions from convolution layers 1_1, 2_1, 3_1, 4_1 and 5_1.
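To make the role of the Gram matrix concrete, the sketch below computes it for a tiny hand-written feature map and evaluates a content loss and a style loss in plain Python. The feature values, function names, and the normalization constant are illustrative assumptions; an actual implementation would operate on VGG activations with a tensor library.

```python
def gram_matrix(features):
    """Channel-by-channel inner products of a feature map.

    features: list of C channels, each a flat list of N activations.
    Returns a C x C nested list.
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def content_loss(generated, content):
    """Half the squared Euclidean distance between raw feature maps."""
    return 0.5 * sum((g - c) ** 2
                     for g_row, c_row in zip(generated, content)
                     for g, c in zip(g_row, c_row))


def style_loss(generated, style):
    """Squared distance between Gram matrices.

    Normalized by 4 * C^2 * N^2, which is one common convention.
    """
    c, n = len(generated), len(generated[0])
    g_gen, g_sty = gram_matrix(generated), gram_matrix(style)
    sq = sum((a - b) ** 2
             for row_a, row_b in zip(g_gen, g_sty)
             for a, b in zip(row_a, row_b))
    return sq / (4.0 * c ** 2 * n ** 2)


# Toy feature map: 2 channels, 3 spatial positions (made-up numbers).
feat = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
# Same activations with the spatial positions permuted consistently.
shuffled = [[3.0, 1.0, 2.0], [0.0, 0.0, 1.0]]
```

In this toy case the permuted feature map has zero style loss but a non-zero content loss against the original, which illustrates why the Gram matrix is used for style: it records which channels fire together while discarding where they fire.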

In this work, we apply the deep style algorithm to domain-specific data augmentation. Given that training data is present in both domains, we propose to learn a mapping from one domain to the other whereby data from both domains can be leveraged to improve overall accuracy. Given that the dark domain by definition captures less information, we propose to map all data from the light domain into the style of night-vision capture. Although the opposite direction is also possible, we show in Figure 4.4d that attempting the opposite direction is an ill-posed problem due to the relative lack of information in the dark domain.

Style transfer was achieved by applying the Gatys style transfer algorithm on a frame-by-frame basis using code developed by [49]. To ensure diversity in the data augmentation scheme, frames from the light domain were randomly matched with frames from the dark domain. The intent of this scheme was to develop an augmented data set matching the global statistics of the night-vision data set rather than the specific statistics of a single frame. This algorithm was parallelized and run on an NVIDIA Titan X GPU, where 8 transformations could occur simultaneously. Each individual image transform was limited to 200 iterations. With this implementation, the processing time required to transform 30,000 images was 8 days. Due to the limited time remaining before the deadline, the transformation was halted after 7,500 transformations, from which 27,645 cropped images were extracted. This augmented data was added to the existing training set containing 34,788 images from night-vision capture for a total augmented data set containing 62,433 images.
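The random matching of content and style frames described above can be sketched as follows. This is only an illustration of the pairing step, not the code from [49]: the function name, the file names, and the fixed seed are hypothetical, and the style transfer itself is not shown. The last line checks the data set size quoted in the text.

```python
import random


def pair_frames(light_frames, dark_frames, seed=0):
    """Pair each light-domain content frame with a random dark-domain style frame.

    Sampling the style frame independently for every content frame spreads the
    augmented set over many night-vision frames instead of a single one.
    """
    rng = random.Random(seed)
    return [(light, rng.choice(dark_frames)) for light in light_frames]


# Hypothetical file names standing in for extracted video frames.
light = [f"light_{i:05d}.png" for i in range(6)]
dark = [f"dark_{i:05d}.png" for i in range(4)]
pairs = pair_frames(light, dark)

# Bookkeeping from the text: augmented crops plus original night-vision images.
augmented_total = 27_645 + 34_788
```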

4.4 Results and Discussion

In this section we discuss the results from our domain confusion and style transfer experiments. The key results are displayed in Table 4.2. We measured the precision and recall of the different approaches with a positive result corresponding to a fall. The results reveal that domain confusion produces a classifier that has fewer false alarms (i.e. better precision) at the expense of more missed detections (i.e. lower recall). Alternately, our results from style transfer show minor improvements in recall at the expense of dramatically reduced precision.

Baseline vs. Domain Confusion

There is an interesting trade-off between the baseline model and the domain confusion model. Recall was higher for the baseline model for both the dark and light domains by roughly 10 percentage points (0.920 vs. 0.840 in the light domain and 0.632 vs. 0.511 in the dark domain). In contrast, precision was higher for the domain confusion model for both domains. This suggests that the models that are trained only on a single domain are more sensitive fall


Figure 4.3: An example of deep artistic style transfer from [26] whereby the content of image A is transformed into the style of 3 separate paintings in images B, C, and D.

detectors. However, the models that are trained on both domains concurrently and exhibit domain confusion are better at rejecting “no fall” cases than the single domain models.

In this application, missing a fall is more costly than raising a false alarm. However, too many false alarms could overload supervisors tasked with deploying resources in an emergency situation. In the light domain, the baseline model is preferable as there is higher recall with relatively low loss in precision. However, in the dark domain, there is a sizeable drop in precision if we choose the dark baseline model. Thus, we do derive value from domain confusion for detecting falls in dark environments and further work is warranted.

To verify that the domain confusion worked as expected, we looked at the domain confusion net’s ability to distinguish between light and dark domains at different training iterations. Table 4.3 shows the precision and recall of the domain confusion net at the iteration matching the best performance in light and dark domains (i.e. the same set of weights used for the results in Table 4.2). In this context, we designate the dark domain as a positive result and the light domain as a negative result in calculating precision and recall. Figure 4.5 shows that the nets quickly learn how to tell the domains apart but then converge to a result where both precision and recall are under 50%. Therefore, we conclude that the domain


(a) Original picture from the dark domain.

(b) Transformed picture with content of light domain (4.4c) and style of dark domain (4.4a).

(c) Original picture from the light domain.

(d) Transformed picture with content of dark domain (4.4a) and style of light domain (4.4c).

Figure 4.4: Examples of style transfer with originals on the left and transformed images on the right.

confusion loss is working and that our trained domain classifier is poor at distinguishing the feature representation of a dark image from a light image.

Style Transfer

Figure 4.4 shows the qualitative results of style transfer. When transforming the content from a light image (4.4c) to appear in the style of the dark domain (4.4a), the results appear as expected (4.4b). For example, areas of local brightness are transformed to resemble lights in the dark domain. This can be seen in the bottom left corner of Figure 4.4c where the white spot of table surrounded by dark headphones is transformed to resemble the light emitting from the bicycle reflectors in Figure 4.4b. Similarly, the cloudiness from Figure 4.4a appears in Figure 4.4b although minor separations across color channels occur. When attempting the transformation in the reverse direction (Figures 4.4a, 4.4c, 4.4d), the inability to compensate for information loss becomes readily apparent. For example, in comparing the top right corner of Figures 4.4a and 4.4c, the colors on the canvas are lost by the IR camera.


Figure 4.5: Measurement of the domain classifier’s ability to distinguish between light and dark domains over the training process. There are initial spikes in precision and recall, followed by convergence to under 50%. Note that the light and dark domain confusion net results in Tables 4.2 and 4.3 occur at 25,000 and 15,000 iterations, respectively.

In the same region of Figure 4.4d, the style transfer mechanism generates a seemingly random accumulation of shapes and colors where it is unable to compensate for the information loss. Interestingly, it is still able to learn selected transformations such as the appropriate color for skin.

As seen in Table 4.2, augmenting the training set in the dark domain with transformed images from the light domain did not provide significant benefit. Relative to the dark baseline, recall improves by 0.008 at the expense of a 0.157 reduction in precision. Although the priority in this work is on detecting all events at the expense of possible false alarms, a trade-off this large is not warranted. Given that only one fourth of the available light data set was transformed due to time constraints, it remains to be seen how this result scales with the amount of transformed data. It may be that the results continue to improve as more augmented data is added, but it may also be that once the majority of the training set originates from simulated data, the statistics of the training set deviate too much from those of the test set to provide satisfactory performance. Similarly, it may be that if standard data augmentation techniques were applied to generate an equal amount of supplemental synthetic images, they would actually perform better than the domain-specific data augmentation performed here.


Successes and Failures

To better understand the representation of the domain confusion net utilized, we took illuminating sample failure and success cases in light and dark domains and used them to identify where improvements could be made.

Figure 4.6 contains sample light images that are particularly instructive. In the left column, we see that Figure 4.6a and Figure 4.6c appear similar, yet our classifier predicts that Figure 4.6a contains a fall while Figure 4.6c contains no fall. To make sense of this, we recall that our net was pre-trained on ImageNet with many human images in upright positions with visible heads. A possible cause of this failure mode may have been that the image in Figure 4.6c contains a barely visible head in a non-standard position, making it difficult to determine if the object on the floor is human.

In the right column, we have two images, Figure 4.6b and Figure 4.6d, with fallen humans in similar positions. However, the large occlusion in Figure 4.6d fools the net into making the wrong prediction.

Samples of the dark images provide even more information on success and failure modes of our classifier. In the left column (Figures 4.7a and 4.7c), we observe two images that appear nearly identical. Yet, the net correctly classifies one image and not the other. We explored this perplexing behavior further. We found that the difference between the two images is that Figure 4.7c has a uniformly lower pixel intensity value. This misclassification could suggest that the image lies directly on the decision boundary, but more likely is a sign that the net is over-fitting to features specific to the training set. The images in the right column (Figures 4.7b and 4.7d) indicate that strong motion blur might also be a reason for misclassification.

4.5 Conclusion

The hope of this study was to develop a method which could be used to implement fall detection with high accuracy regardless of the domain of origin. We extend the prior work in the field by employing a pre-trained deep network for feature extraction, performing classification by implementing the current state-of-the-art for domain adaptation, and proposing one novel method for domain-specific data augmentation. Unfortunately, the results do not yet provide the high performance guarantees needed to provide this system to families caring for a loved one with Alzheimer’s disease, as evinced by the low accuracy in the dark domain. In the light domain, however, results are on par with existing wearable techniques for fall detection which show 90% accuracy on realistic datasets [6].

The best results from the light domain indicate 8% of falls will be missed and 2% of non-falls will generate false alarms. Given that the camera sampling rate is 7 frames per second, this corresponds to an average false alarm rate of 8.4 false alarms per minute. Although the distribution of false alarms is not likely to be uniform and false alarms will likely cluster around areas of uncertainty, this false alarm rate remains too high to perform independent fall detection (i.e., without the support of a human observer). More significantly, 8% of frames


(a) Example of a fall labeled correctly.

(b) Example of a fall labeled correctly.

(c) Example of a fall labeled incorrectly. This image is similar to 4.6a except that the human head is not well distinguished. This is a possible reason for failure, as it would be hard to detect the human.

(d) Example of a fall labeled incorrectly. This image is similar to 4.6b except for the occlusion of the subject of interest. This is a possible reason for failure.

Figure 4.6: Success and failure examples in fall detection in the light domain.

containing falls are missed in this test set, preventing the current system from providing the high safety guarantees needed for a home safety system.
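The miss rate and false alarm rate quoted above can be recovered from the Light Baseline row of Table 4.2. The sketch below is a back-of-the-envelope check, not the evaluation code used in this work; the function and variable names are assumptions, and the per-minute figure assumes a stream consisting entirely of non-fall frames with the false alarm rate rounded to 2% as in the text.

```python
def confusion_counts(precision, recall, n_positive):
    """Approximate true and false positive counts from summary metrics.

    Uses recall = TP / n_positive and precision = TP / (TP + FP).
    """
    tp = recall * n_positive
    fp = tp * (1.0 - precision) / precision
    return tp, fp


# Light Baseline row of Table 4.2.
precision, recall = 0.859, 0.920
n_falls, n_no_falls = 2506, 18157

tp, fp = confusion_counts(precision, recall, n_falls)
miss_rate = 1.0 - recall              # fraction of fall frames missed
false_alarm_rate = fp / n_no_falls    # fraction of non-fall frames flagged

FRAMES_PER_SECOND = 7
alarms_per_minute = round(false_alarm_rate, 2) * FRAMES_PER_SECOND * 60
```

Because the counts are reconstructed from rounded precision and recall values, they are approximate; the unrounded false alarm rate is slightly above 2%, so the per-minute figure is best read as an estimate.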

In the dark domain, neither domain confusion nor domain-specific data augmentation provides significant improvements over the baseline detection results. Moreover, the baseline results leave much to be desired where 36% of falls are missed and 11% of non-falls generate false alarms.

To improve on the current work and to better understand the problem, we will be generating a larger data set through a 3-month pilot study where falls will be observed under natural conditions. In this study, we will further investigate how these results generalize to continuous domain adaptation given daylight cycles present in normal home conditions, and we will explore oversampling to adjust for class imbalance. We will also explore the use of recurrent neural networks to leverage the time-dependencies inherent in fall detection to provide a more natural, accurate, and robust classifier. Finally, we will explore how domain-specific data augmentation changes with more data and how it compares to standard data augmentation techniques.


(a) Example of a fall labeled correctly.

(b) Example of a fall labeled correctly.

(c) Example of a fall labeled incorrectly. This image is similar to 4.7a except for uniformly lower intensity values. Incorrect classification could be a sign of over-fitting.

(d) Example of a fall labeled incorrectly. Strong motion blur obfuscates the image, making detection harder. Figure 4.7b, a similar image with less blurring, is labeled correctly.

Figure 4.7: Success and failure examples in fall detection in the dark domain.


Chapter 5

Fall Reduction through Video Review


A camera system composed of off-the-shelf video recording equipment was placed in one 40-resident memory care community for 3 months to collect real-world data regarding the nature in which fall incidents occur. The purpose was to use this data to develop computational algorithms for fall detection. We describe here an unexpected result in which the memory care facility used the fall video to review how incidents were occurring, updated policies and room layout to reduce potential fall risks, and reduced the fall rate from 10.5 falls per month on average to 2 falls in the final month. Preliminary analysis shows a statistically significant fall reduction with p=0.030. Given the small sample size, further studies are needed and underway to validate this result.

5.2 Introduction

Significance

Fall accidents are the primary cause of AD-related hospitalization, contributing to 26% of all hospitalizations at an estimated annual Medicare cost of $5.3B [15]. In nursing facilities, individuals with dementia fall 4.05 times/year on average versus 2.33 times/year for other residents [17]. Unfortunately, safety products such as wearable pendants were developed for cognitively aware adults and fail to meet the needs of individuals with dementia who cannot reliably wear or use such devices. Detecting falls early and in an ongoing manner provides significant potential for reduced hospitalization and system-wide savings. Less than 10% of falls lead to serious injury [15], [17], but 50-75% of elderly fallers experience repeat falls [8], [11], [50], [59], [72], [71]. Thus, detecting the first fall and taking preventative action provides significant potential for reducing fall risk. Through a randomized clinical trial of 160 ambulatory fallers, [63] showed that a nurse practitioner analysing a patient and fall circumstances after the event led to 26% fewer hospitalizations and 52% reduction in hospital days. Rapid fall detection also limits the amount of time fallers spend on the ground. Mary Tinetti, developer of the well-known Tinetti score for fall risk assessment [73], noted the risks of time spent on the ground following a fall in a study of 596 non-injurious falls [71]. Of 313 fallers, 47% were unable to get up independently after at least 1 fall. Fallers who were unable to get up were more likely to die, to be hospitalized, and to suffer lasting decline in activities of daily living (35% vs 26%). These correlations are confirmed in [21], [77].

Related Work

Current commercial solutions for fall detection fail to address how a fall occurred. The most well-known commercial solutions include wearable systems like Philips Lifeline which demonstrate limited success in dementia care where individuals forget or refuse to wear a device. Non-wearable fall detection systems based on radar and optical sensors are under development by groups including Emerald and C2S, but are not yet commercially available in the US and have not yet demonstrated robustness through evidence-based studies. Fall mats and bed alarms are prevalent solutions in memory care but are intended only for those residents that should never be walking independently. When applied to general fall detection, these alarms suffer from high false alarm rates due to the prevalence of night-time wandering in dementia care. None of these solutions allow care providers to see how falls occur. The academic literature lacks work examining how fall review can impact the quality of care. The most relevant study is conducted by Robinovitch et al. [62], in which video was collected from cameras in two long-term care facilities over a period of 3 years, capturing 227 falls. This dataset was collected to determine the most common causes of falls in managed care and so was collected in coordination with care facilities at which cameras were already installed. Video was not reviewed with facility staff with the specific intention of identifying and removing any possible cause. The study thus offers little insight into the effect of introducing cameras or how video review can impact fall rate. It does confirm an increased fall incidence among residents with Alzheimer’s disease and highlights that 43% of falls captured involved a cause which could be addressed by facility staff including trips, stumbles, hits, bumps, and loss of support from external objects. Many causes such as incorrect transfer of body weight, responsible for 41% of falls, do not suggest obvious changes to the environment which staff could make. Continued data collection from this group appears to be in progress [79]. Only one other study conducting video review of falls appears to exist [34], but it has a significantly smaller sample size and is also based on pre-existing cameras with no feedback to staff.


Figure 5.1: Equipment. IP cameras were placed in all common areas and approved private rooms. Video was transmitted from the cameras to the network attached storage (NAS) via Wi-Fi where it was maintained locally for 72 hours after which it was transmitted to a remote server for archiving. Live video and video from the previous 72 hours were made available to facility management via smartphone applications.

5.3 Materials and Methods

Equipment

Figure 5.1 shows the off-the-shelf video recording equipment used. Cameras were placed in all common areas and private rooms of consenting residents and families in accordance with the privacy guidelines described below. Camera video was transmitted using Wi-Fi to local network attached storage (NAS) devices. Facility Wi-Fi coverage was upgraded using off-the-shelf routers and range extenders to remove Wi-Fi dead zones. Video was maintained on the local NAS for 72 hours before being transmitted to a university server where the complete video data set was maintained. A smartphone application developed by the makers of the NAS was provided for viewing video from the previous 72 hours. A smartphone application developed by the makers of the cameras was provided for accessing the live video from each camera. Cameras were configured to record only on motion to filter unneeded video, and software was developed to support video transcoding and uploading from the NAS to work around bandwidth limitations defined by the upload speed granted to the memory care facility through their internet service provider. The specific equipment provided to the facility included the following:

1. DLink 932L IP camera (x43),

2. QNAP 451+ network attached storage (x2),

3. Netgear AC5300 Nighthawk X8 Wifi Router (x2),


4. Netgear Nighthawk AC1900 Wifi Range Extender (x2).

Fall Review

In the first two months of the three-month study, no video review took place. The original purpose of the study was only to collect video of falls for development of fall detection algorithms. Thus, although video recordings from the previous 72 hours were available to facility management, no formal review occurred. Facility management reported hardly ever using the video feeds during this time due to the many other challenges faced with operating a memory care facility and the little obvious value of the video. After two months, a particularly severe fall incident was recorded. In accordance with procedures approved by the university institutional review board, this incident was reported to facility management. After reviewing this fall, facility management requested reviewing other significant falls. Video was provided and facility management chose interventions which they believed would address the causes. Interventions included movement of furniture that had caused tripping hazards and head injuries and changes to policy that included checking on high-risk residents every hour instead of every two hours at night. The facility did not use any devices for fall detection before or after the incident occurred.

Privacy and Consent Procedures

Privacy and consent procedures were developed with support from the university institutional review board. Permission was granted from facility management to place cameras in common areas and to speak at a town hall meeting to introduce the idea to families. After speaking, interested families volunteered their contact information for follow-up discussion regarding cameras within the private room. At the discussion, the study was explained in plain English, and for those families who decided to participate, surrogate consent for the resident living in the care facility was obtained following university guidelines for surrogate decision makers. Residents were required to give assent; if they ever expressed a desire not to take part in the study or have cameras placed in their rooms through verbal or non-verbal communication, they were not included. Video segments defined as improper by the review board were deleted, including any video containing sexual activity, actions that could imply abuse if taken out of context, and other incriminating behaviours. Before deleting, the dementia care nurse on the team was responsible for determining if the matter should be taken to facility management or to adult protective services. Following California state guidelines, audio recording was disabled and signs were posted visibly on the door of each private room in which video recording occurred. Before publishing video in any way, media release forms were signed by individuals contained in the videos or by their surrogate decision makers allowing for public release of the specific videos in question. The number of falls recorded and resident population for each month were determined by interview with facility management.


Figure 5.2: Fall rate. In the four months prior to video review, the fall rate at the community was 10.5 ± 2.5 falls per month, 79% of the national average. In the final month, 2 falls occurred, 17% of the national average.

5.4 Results

During the three-month study period from July to September, 26 falls occurred. In the two months prior, 18 falls occurred. Overall, the rate was 10.5 ± 2.5 falls per month before video review, and 2 falls occurred in the month following video review. The facility supported an average of 38 residents with a slight dip in the final month. For a facility with 38 residents, the expected fall rate is 12.7 falls per month based on the national average of 4 falls per year for individuals with dementia living in care facilities [17]. In the final month, the resident population declined slightly and thus the national average is adjusted accordingly in Figure 5.2. The overall fall rate in this community was 79% of the national average for the 4 months prior to review and 17% of the national average in the month following fall review. Applying a one-tailed, two-sample t-test, the reduction in fall rate normalized by the number of residents cared for at the time is statistically significant with p=0.030.
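The expected fall rate quoted above follows from simple arithmetic. The sketch below assumes a national average of roughly 4 falls per resident per year (the 4.05 figure cited from [17] in the Introduction, rounded); the function name is hypothetical.

```python
def expected_falls_per_month(residents, falls_per_resident_per_year=4.0):
    """Expected facility-wide falls per month at the national average rate."""
    return residents * falls_per_resident_per_year / 12.0


expected = expected_falls_per_month(38)
```

Because the resident count varied from month to month, the percentages of the national average reported in the text depend on the monthly census and are not reproduced by this single calculation.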


5.5 Discussion

Although promising, these results are only preliminary. For example, one resident who fell four times in the previous month passed away in the final month. She fell zero times in the month before that. As can be seen in Figure 5.2, deviations in the fall risk naturally occur as some residents become greater risks before eventually passing away. If these 4 falls had been removed, there would only have been 7 falls in the second month, the same number of falls experienced in the month before the pilot study began. Controlling for this resident (i.e., removing her falls from all months), the result of the one-tailed two-sample t-test applied to the per-resident fall rate rises to p=0.058, no longer significant at the often-used 0.05 threshold. Moreover, this t-test requires the assumption that the variance in the fall rate is the same before and after the introduction of video review. With only one data point after the introduction of video review, there is no evidence to support this assumption. Clearly, more data is needed to validate that the fall rate in managed care facilities can be reduced through interactive video review of falls.
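The dependence of this test on the equal-variance assumption can be illustrated on raw monthly counts. The sketch below is a reconstruction under stated assumptions, not the analysis reported here: the monthly counts before review (11, 7, 13, 11) are inferred from the totals quoted in the text, and the test is run on raw counts rather than on fall rates normalized by resident count, so its p-value is not expected to match the reported values exactly. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(before, after):
    """Student's two-sample t statistic with pooled variance.

    A group with a single observation contributes no variance information,
    so the pooled variance then comes entirely from the other group.
    """
    n1, n2 = len(before), len(after)
    ss1 = variance(before) * (n1 - 1) if n1 > 1 else 0.0
    ss2 = variance(after) * (n2 - 1) if n2 > 1 else 0.0
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(before) - mean(after)) / se, df


def upper_tail_p_df3(t):
    """One-tailed p-value P(T > t) for Student's t with exactly 3 degrees
    of freedom, using the closed-form CDF for that case."""
    x = t / math.sqrt(3.0)
    cdf = 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi
    return 1.0 - cdf


# Monthly fall counts inferred from the text (an assumption, see lead-in).
before_review = [11, 7, 13, 11]
after_review = [2]

t_stat, df = pooled_t(before_review, after_review)
p_value = upper_tail_p_df3(t_stat)
```

With a single post-review month, the entire variance estimate comes from the four pre-review months, which is exactly the assumption questioned above.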

5.6 Conclusions

If verified, the impact of these results could be tremendous. Based on feedback from the families, the reduction from 10.5 falls per month to 2 falls in the final month is equivalent to approximately $20k in savings in emergency room visits alone both for the families and for Medicare. More importantly, any one of these falls could have led to serious fracture, severe loss in mobility, significantly decreased quality of life, and significantly increased cost of care. Given that Alzheimer’s disease is the most expensive disease in the US, and fall accidents are the leading cause of hospitalization in Alzheimer’s care, this simple intervention may be the first step toward a big impact. Based on this preliminary result, the operators of this care community have agreed to expand the study to 10-20 more facilities in California. In this next phase, we will deploy the same system, conduct video review after the first month, and observe if a statistically significant reduction in the fall rate occurs.


Chapter 6

Conclusions

6.1 Review of Project Report

In this project report we discuss 4 projects spanning a total of 2.5 years. We begin by discussing in Chapter 1 the relevant open problems in the dementia research community that could be supported by the development of new technologies. In the following 4 chapters we discuss several applications of machine learning and commercially available sensing to develop new technologies for the Alzheimer’s research community.

We first discuss the search for methods of curing, mitigating, or delaying the effects of Alzheimer’s disease. Relevant research in this area includes monitoring the effects of diet, exercise, cognitive stimulation, and related factors on disease progression. To support this research area, in Chapter 2, we present the design and implementation of a wearable system called Max for collecting fine-grained measurements from environmental sensors and for analyzing this data for trends and anomalies. With this system, we hope to provide the Alzheimer’s research community with the proper tools to perform functional monitoring of individual patients, to study how potential mitigating factors influence patients on an individual as opposed to a population level, and to perform new studies on how non-invasive home monitoring could be performed to recognize risk factors for conditions like urinary tract infections before they escalate into emergency room visits. The Max system is currently deployed with 18 individuals living in private homes and monitored by the University of Nebraska Medical Center as part of the Dementia Care Ecosystem, with data collection ongoing at the time this project report was submitted.

We next discuss the difficulty with accurately diagnosing Alzheimer’s disease and related dementias. Since typical clinical diagnosis only provides accuracy near 75% for Alzheimer’s disease, a focus here is on identifying biomarkers which are particularly indicative of a particular disease type. The ideal biomarker would be obtainable non-invasively and would identify both the disease and the stage of the disease. The most relevant biomarker for Alzheimer’s disease involves measuring the concentration of molecules deriving from the characteristic amyloid-β in cerebrospinal fluid. In Chapter 3, we discuss another possible biomarker which involves performing computational assessment of speech patterns. We show that 92% accuracy can be achieved in matching an existing diagnosis for individuals with dementia from one short conversation using traditional machine learning approaches. Unfortunately, practical circumstances limit the scope of this study within the project report. Specifically, the much more interesting results require the collection of extensive longitudinal data, which is not feasible within the scope of this project report: showing that the final clinical diagnosis performed from post-mortem histology can be determined with accuracies matching that of a human expert, or that a screening tool for early diagnosis could be developed with high sensitivity based on a smartphone app alone.

We finally discuss how care could be improved for those currently living with Alzheimer’s disease and related dementias, paying particular attention to improving the quality of care and reducing the cost. This is a major research focus within the public health community since Alzheimer’s disease is currently the most expensive disease in the US and the number of affected individuals is continuing to rise at an ever-increasing rate [2]. In Chapters 4 and 5, we explore how existing methods in computer vision can be applied to fall detection and prevention. As discussed, since falls are the greatest cause of hospitalization in Alzheimer’s care and 3 in 4 fallers experience repeat falls [50], it seems that a technology capable of both detecting falls and showing caregivers how falls are occurring could provide significant benefits. In Chapter 4, we show that the technology is possible by applying existing domain adaptation techniques to a dataset of 200 falls acted out by healthy individuals. In this dataset, we show 86% precision and 92% recall in daylight recordings (Table 4.2), comparable with results from wearable fall detection systems [6]. Given the deep-learning paradigm, the practical challenge in this case appears to arise from the shortage of data containing true falls collected with IR night-vision sensors. By applying the same techniques to a sufficiently large database of true falls, it appears technically feasible to perform fall detection with high accuracy from video. In Chapter 5, we study practical concerns with placing video cameras in memory care through one 3-month pilot study at a 40-resident memory care community in the San Francisco Bay Area. From active reviews of falls in the community, the management at this community reduced the fall rate by 80% during the study period. Moreover, possible concerns around the invasion of privacy for residents and staff appeared minimal given the possible safety benefits. Although preliminary, this result appears encouraging for reducing the fall rate in managed dementia care.

6.2 Final Conclusion: Hybrid Solutions are Required for Practical Challenges

From this work, the key conclusion is the need for hybrid solutions when solving practical problems given the current limitations of the machine learning methods discussed. Despite amazing successes in artificial intelligence, the best results continue to emerge from the application of supervised learning techniques to large datasets where clear loss functions can be applied (e.g., image classification on ImageNet [16]). The need for large labeled training sets puts several limitations on the value which can be provided by fully automated systems. This is especially true for early stage companies that are continuing to learn about the most useful services they can provide and cannot afford the long design cycles and high resource costs required to develop large training sets.

The first type of hybrid solution is like that presented in Chapter 2, whereby a system like the Max smartwatch system must be designed before data collection can begin. In this case, we found that unsupervised anomaly detection methods provide the most value. Once an anomaly is detected, a human observer can step in to determine whether or not the anomaly actually signifies a noteworthy event. The difficulty in this case is designing a feature space in which anomalous events are likely to contain information that is interesting to a human observer. For instance, in the functional monitoring case, we want to flag events where the affected individual is dramatically less active, but we do not want to flag every time she visits a new GPS location. Hence, anomaly detection algorithms are applied to the total Euclidean distance traveled in a given day, not to the raw GPS data itself.
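
As a minimal sketch of this feature choice, the Python below reduces each day's location fixes to a single total distance and flags unusually inactive days. It is an illustration only: the function names, the assumption that fixes are already projected to planar (x, y) coordinates in meters, and the median-absolute-deviation rule with a 3.5 cutoff are assumptions made for this example, not the detector used in Chapter 2.

```python
import math
from statistics import median


def daily_distance(points):
    """Total straight-line (Euclidean) distance between consecutive (x, y) fixes.

    Assumes the fixes are already projected to planar coordinates in meters.
    """
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def flag_low_activity_days(distances, threshold=3.5):
    """Return indices of days whose distance is anomalously LOW.

    Uses a modified z-score based on the median absolute deviation (MAD).
    Only the low side is flagged, so an unusually active day is not reported.
    """
    med = median(distances)
    mad = median(abs(d - med) for d in distances)
    if mad == 0:
        return []  # no spread in the data, so nothing can be called anomalous
    flagged = []
    for day, d in enumerate(distances):
        score = 0.6745 * (d - med) / mad
        if score < -threshold:
            flagged.append(day)
    return flagged


# Hypothetical week of daily totals in meters; day 5 is far less active.
week = [5200, 4800, 5100, 4950, 5300, 300, 5050]
low_days = flag_low_activity_days(week)  # a human observer would review these days
```

Because the detector sees only one number per day, a visit to a new location changes the result only if it changes how far the individual traveled, which is the behavior the paragraph above asks for.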

The second type of hybrid solution is similar to the first but based on a semi-supervised instead of unsupervised approach. For instance, in Chapters 4 and 5, a system is discussed for detecting falls automatically, but evaluation of what caused the fall is left to a human observer. Thus, a simpler problem is posed which can fit within the current limitations of the computational methods, while reasoning about cause and effect is left to a human expert. This approach appears particularly fruitful in situations such as the fall detection problem where substantial benefit could be provided by a human expert reviewing all of the data, but without the use of computational detection, this review would not be practically feasible due to the volume of data collected. Another example of this is the screening tool for Alzheimer’s disease based on speech discussed in Chapter 3.
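
The division of labor described above can be sketched as a review queue: the detector scores every clip, and only clips above a threshold reach the human reviewer. The Python below is a hedged illustration. The clip records, scores, labels, and the 0.5 threshold are hypothetical values invented for this example and are not results from Chapters 4 or 5.

```python
def review_queue(clips, threshold=0.5):
    """Return ids of clips the detector flags, highest score first."""
    flagged = [c for c in clips if c["score"] >= threshold]
    flagged.sort(key=lambda c: c["score"], reverse=True)
    return [c["id"] for c in flagged]


def precision_recall(clips, threshold=0.5):
    """Precision and recall of the thresholded detector against known labels."""
    tp = sum(1 for c in clips if c["score"] >= threshold and c["is_fall"])
    fp = sum(1 for c in clips if c["score"] >= threshold and not c["is_fall"])
    fn = sum(1 for c in clips if c["score"] < threshold and c["is_fall"])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical detector output for six clips (not data from the pilot study).
clips = [
    {"id": "a", "score": 0.95, "is_fall": True},
    {"id": "b", "score": 0.80, "is_fall": True},
    {"id": "c", "score": 0.60, "is_fall": False},
    {"id": "d", "score": 0.40, "is_fall": True},
    {"id": "e", "score": 0.10, "is_fall": False},
    {"id": "f", "score": 0.05, "is_fall": False},
]

queue = review_queue(clips)               # clips a human expert would watch
precision, recall = precision_recall(clips)
```

Lowering the threshold sends more clips to the reviewer and tends to raise recall at the cost of precision, so the threshold is in effect a choice about how much human review time is available.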

The final type of hybrid solution is like that posed at the end of Chapter 3, whereby a highly accurate Alzheimer’s detection system could be developed by implementing a system where human experts and trained computational models work side-by-side. In this type of system, the human is able to detect features which can be difficult for the computational model to discover, such as difficulty responding to subtle social cues, and the model is able to detect features which are difficult for the human to track, such as changes in the frequency with which the affected individual uses the word ‘you’ over time. Although untested here, it would be very interesting to build a joint model to classify affected individuals based on the features and weights chosen both by a human expert and by a trained computational model.
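
One possible shape for such a joint model is sketched below in Python, purely as an untested illustration. It assumes a logistic form in which expert-rated features and automatically extracted features contribute to a single score. The feature names, weights, and values are hypothetical placeholders and carry no clinical meaning.

```python
import math


def joint_probability(expert_features, model_features,
                      expert_weights, model_weights, bias=0.0):
    """Combine expert-rated and automatically extracted features in one logistic score.

    Each argument is a dict keyed by feature name. Features with no matching
    weight contribute nothing to the score.
    """
    z = bias
    z += sum(expert_weights.get(name, 0.0) * value
             for name, value in expert_features.items())
    z += sum(model_weights.get(name, 0.0) * value
             for name, value in model_features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: one expert rating and one automatically tracked feature.
expert_features = {"missed_social_cues": 1.0}       # rated by a clinician
model_features = {"second_person_rate_change": 0.5}  # computed from transcripts
expert_weights = {"missed_social_cues": 2.0}         # chosen by the expert
model_weights = {"second_person_rate_change": 1.0}   # would be learned from data

p = joint_probability(expert_features, model_features,
                      expert_weights, model_weights)
```

Keeping the two weight sets separate makes it possible to fix the expert-chosen weights while fitting only the model weights to data, which is one way to read the joint model proposed above.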

Bibliography

[1] National Institute on Aging: Alzheimer’s Disease Education and Referral Center. Alzheimer’s Disease Fact Sheet. Aug. 2016. url: https://www.nia.nih.gov/alzheimers/publication/alzheimers-disease-fact-sheet.

[2] Alzheimer’s Association. “2016 Alzheimer’s Disease Facts and Figures”. In: Alzheimer’s and Dementia 11.3 (2016).

[3] J. A. Anguera et al. “A Consensus on the Brain Training Industry from the Scientific Community”. In: Stanford Center on Longevity (Oct. 2014). url: http://longevity3.stanford.edu/blog/2014/10/15/the-consensus-on-the-brain-training-industry-from-the-scientific-community-2/.

[4] J. A. Anguera et al. “Video game training enhances cognitive control in older adults”. In: Nature 501 (Sept. 2013), pp. 97–101.

[5] Moataz El Ayadi, Mohamed S. Kamel, and Fakhri Karray. “Survey on speech emotion recognition: Features, classification schemes, and databases”. In: Pattern Recognition (Sept. 2010), pp. 572–587.

[6] Fabio Bagala et al. “Evaluation of Accelerometer-Based Fall Detection Algorithms on Real-World Falls”. In: PLoS ONE 7.5 (2012).

[7] David A. Balota et al. “The English Lexicon Project”. In: Behavior Research Methods 39 (2007), pp. 445–459.

[8] L. J. Baraff et al. “Practice guideline for the ED management of falls in community-dwelling elderly persons”. In: Annals of Emergency Medicine 30 (1992), pp. 480–492.

[9] Matthew Baumgart et al. “Summary of the evidence on modifiable risk factors for cognitive decline and dementia: A population-based perspective”. In: Alzheimer’s and Dementia: The Journal of the Alzheimer’s Association 11.6 (June 2015), pp. 718–726.

[10] Pam Belluck. “Eli Lilly’s Experimental Alzheimer’s Drug Fails in Large Trial”. In: The New York Times (Nov. 2016). url: http://www.nytimes.com/2016/11/23/health/eli-lillys-experimental-alzheimers-drug-failed-in-large-trial.html.

[11] A. J. Blake, K. Morgan, and M. J. Bendall. “Falls by elderly persons at home: prevalence and associated factors”. In: Age Ageing 17 (1988), pp. 365–372.

[12] Paul Boersma et al. “Praat, a system for doing phonetics by computer”. In: Glot International 5.9/10 (2002), pp. 341–345.

[13] R. S. Bucks et al. “Analysis of spontaneous, conversational speech in dementia of Alzheimer type: Evaluation of an objective technique for analysing lexical performance”. In: Aphasiology 14 (2000), pp. 2–35.

[14] Maria Burke. “Why Alzheimer’s Drugs Keep Failing”. In: Scientific American (July 2014).

[15] Julie P. W. Bynum et al. “The Relationship Between a Dementia Diagnosis, Chronic Illness, Medicare Expenditures, and Hospital Use”. In: Journal of the American Geriatrics Society 52.2 (2004), pp. 187–194.

[16] Jia Deng et al. “ImageNet: A large-scale hierarchical image database”. In: IEEE Conference on Computer Vision and Pattern Recognition. IEEE. 2009, pp. 248–255.

[17] Carol Van Doorn et al. “Dementia as a Risk Factor for Falls and Fall Injuries Among Nursing Home Residents”. In: Journal of the American Geriatrics Society 51.9 (2003), pp. 1213–1218.

[18] Florian Eyben et al. “Recent developments in opensmile, the munich open-source multimedia feature extractor”. In: Proceedings of the 21st ACM international conference on Multimedia. ACM. 2013, pp. 835–838.

[19] P. Feng et al. “Deep learning for posture analysis in fall detection”. In: 2014 19th International Conference on Digital Signal Processing. Aug. 2014, pp. 12–17. doi: 10.1109/ICDSP.2014.6900806.

[20] Martin A. Fischler and Robert C. Bolles. “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography”. In: Communications of the ACM 24.6 (June 1981), pp. 381–395.

[21] J. Fleming and C. Brayne. “Inability to Get up after Falling, Subsequent Time on Floor, and Summoning Help: Prospective Cohort Study in People over 90”. In: BMJ 337.1 (Nov. 2008).

[22] Marshal F. Folstein, Susan E. Folstein, and Paul R. McHugh. “Mini-mental state: a practical method for grading the cognitive state of patients for the clinician”. In: Journal of psychiatric research 12.3 (1975), pp. 189–198.

[23] Kathleen C. Fraser, Frank Rudzicz, and Elizabeth Rochon. “Using text and acoustic features to diagnose progressive aphasia and its subtypes”. In: Proceedings of Interspeech (2013).

[24] Kathleen C. Fraser et al. “Automated classification of primary progressive aphasia subtypes from narrative speech transcripts”. In: Cortex (May 2014), pp. 43–60.

[25] Kathleen Fraser et al. “Automatic speech recognition in the diagnosis of primary progressive aphasia”. In: Fourth Workshop on Speech and Language Processing for Assistive Technologies (2013), pp. 47–54.

[26] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. “A Neural Algorithm of Artistic Style”. In: arXiv:1508.06576 [cs, q-bio] (Aug. 2015). arXiv: 1508.06576. url: http://arxiv.org/abs/1508.06576 (visited on 05/06/2016).

[27] Dimitra G. Georganopoulou et al. “Nanoparticle-based detection in cerebral spinal fluid of a soluble pathogenic biomarker for Alzheimer’s disease”. In: Proceedings of the National Academy of Sciences 102.7 (2004), pp. 266–273.

[28] Ross Girshick. “Fast R-CNN”. In: arXiv:1504.08083 [cs] (Apr. 2015). arXiv: 1504.08083. url: http://arxiv.org/abs/1504.08083 (visited on 05/09/2016).

[29] Markus Goldstein and Seiichi Uchida. “A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data”. In: PLOS ONE (Apr. 2016).

[30] Google. Android Developer Guide. 2016. url: https://source.android.com.

[31] M. L. Gorno-Tempini et al. “Classification of primary progressive aphasia and its variants”. In: Neurology 76.11 (2011), pp. 1006–1014.

[32] Curry Guinn and Anthony Habash. “Language Analysis of Speakers with Dementia of the Alzheimer’s Type”. In: Artificial Intelligence for Gerontechnology Technical Report (2012), pp. 8–13.

[33] Curry Guinn, Ben Singer, and Anthony Habash. “A Comparison of Syntax, Semantics, and Pragmatics in Spoken Language among Residents with Alzheimer’s Disease in Managed-Care Facilities”. In: IEEE Symposium on Computational Intelligence in Healthcare and E-Health (Dec. 2014).

[34] P. J. Holliday et al. “Video recording of spontaneous falls of the elderly”. In: American Society for Testing and Materials (1990), pp. 7–16.

[35] Guang-Bin Huang, Qin-Yu Zhu, and Chee-Kheong Siew. “Extreme learning machine: Theory and applications”. In: Neurocomputing 70 (May 2006), pp. 489–501.

[36] Stanislaw Jankowski et al. “Deep learning classifier for fall detection based on IR distance sensor data”. In: IEEE, Sept. 2015, pp. 723–727. isbn: 978-1-4673-8359-2 978-1-4673-8361-5. doi: 10.1109/IDAACS.2015.7341398. url: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7341398 (visited on 05/09/2016).

[37] Yangqing Jia et al. “Caffe: Convolutional Architecture for Fast Feature Embedding”. In: arXiv preprint arXiv:1408.5093 (2014).

[38] Keith A. Johnson et al. “Brain Imaging in Alzheimer Disease”. In: Cold Spring Harb Perspect Med 2.4 (Apr. 2012).

[39] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python. 2001–. url: http://www.scipy.org/.

[40] Yasuhiro Kakihara et al. “Acoustic Feature Selection Utilizing Multiple Kernel Learning for Classification of Children with Autism Spectrum and Typically Developing Children”. In: Proceedings of the 2013 IEEE/SICE International Symposium on System Integration, Kobe International Conference Center, Kobe, Japan (Dec. 2013), pp. 490–494.

[41] Zaven S. Khachaturian. “Revised criteria for diagnosis of Alzheimer’s disease: National Institute on Aging-Alzheimer’s Association diagnostic guidelines for Alzheimer’s disease”. In: Alzheimer’s & dementia: the journal of the Alzheimer’s Association 7.3 (2011), pp. 253–256.

[42] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105. url: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

[43] Victor Kuperman, Hans Stadthagen-Gonzalez, and Marc Brysbaert. “Age of acquisition ratings for 30,000 English words”. In: Behavior Research Methods (May 2012), pp. 978–990.

[44] Robert W. Levenson and John M. Gottman. “Marital interaction: physiological linkage and affective exchange.” In: Journal of personality and social psychology 45.3 (1983), p. 587.

[45] Weifeng Liu, Puskal P. Pokharel, and Jose C. Principe. “The Kernel Least-Mean-Square Algorithm”. In: IEEE Transactions on Signal Processing 56.2 (2008), pp. 543–554.

[46] Hong Lu et al. “Intelligent Human Fall Detection for Home Surveillance”. In: IEEE, Dec. 2014, pp. 672–676. isbn: 978-1-4799-7646-1. doi: 10.1109/UIC-ATC-ScalCom.2014.56. url: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7307023 (visited on 05/09/2016).

[47] Bruce L. Miller and Bradley F. Boeve. The Behavioral Neurology of Dementia. Cambridge University Press, 2009.

[48] David Neary et al. “Frontotemporal lobar degeneration: A consensus on clinical diagnostic criteria”. In: Neurology 51.6 (1998), pp. 1546–1554.

[49] Neural Artistic Style in Python. url: https://github.com/andersbll/neural_artistic_style (visited on 05/10/2016).

[50] M. C. Nevitt et al. “Risk factors for recurrent nonsyncopal falls: a prospective study”. In: JAMA 261 (1989), pp. 2663–2668.

[51] Serguei V. S. Pakhomov et al. “Computerized Analysis of Speech and Language to Identify Psycholinguistic Correlates of Frontotemporal Lobar Degeneration”. In: Cognitive and Behavioral Neurology (Sept. 2010), pp. 165–177.

[52] Anindya S. Paul and Eric A. Wan. “RSSI-Based Indoor Localization and Tracking Using Sigma-Point Kalman Smoothers”. In: IEEE Journal of Selected Topics in Signal Processing 3.5 (Oct. 2009), pp. 543–554.

[53] F. Pedregosa et al. “Scikit-learn: Machine Learning in Python”. In: Journal of Machine Learning Research 12 (2011), pp. 2825–2830.

[54] F. Pedregosa et al. “Scikit-learn: Machine Learning in Python”. In: Journal of Machine Learning Research 12 (2011), pp. 2825–2830.

[55] Andrew Pollack. “New Data on 2 Alzheimer’s Drugs Alters Hope and Expectation”. In: The New York Times (July 2015). url: http://www.nytimes.com/2015/07/23/business/new-data-on-2-alzheimers-drugs-alters-hope-and-expectation.html.

[56] Katherine L. Possin and Bruce L. Miller. Care Ecosystem: Navigating Patients and Families Through Stages of Care. 2016. url: https://www.nia.nih.gov/alzheimers/clinical-trials/care-ecosystem-navigating-patients-and-families-through-stages-care.

[57] Peter S. Pressman and Gorno-Tempini. “Introduction and History of Primary Progressive Aphasia”. In: Neurobiology of Language (2015). Ed. by Greg Hickock and Steve Small.

[58] Peter S. Pressman and Bruce L. Miller. “Diagnosis and management of behavioral variant frontotemporal dementia”. In: Biological psychiatry 75.7 (2014), pp. 574–581.

[59] D. Prudham and J. G. Evans. “Factors associated with falls in the elderly: a community study”. In: Age Ageing 10 (1981), pp. 141–146.

[60] George W. Rebok et al. “Ten-Year Effects of the Advanced Cognitive Training for Independent and Vital Elderly Cognitive Training Trial on Cognition and Everyday Functioning in Older Adults”. In: Journal of the American Geriatrics Society 62.1 (Jan. 2014), pp. 16–24.

[61] Benjamin Recht. UC Berkeley Course Notes for EE227C: Convex Algorithms. 2015.

[62] Steven N. Robinovitch et al. “Video Capture of the Circumstances of Falls in Elderly People Residing in Long-term Care: An Observational Study”. In: The Lancet 381.9860 (2013), pp. 47–53.

[63] Laurence Z. Rubenstein et al. “The Value of Assessing Falls in an Elderly Population”. In: Annals of Internal Medicine 113.4 (1990), p. 308.

[64] Frank Rudzicz et al. “Automatically Identifying Trouble-indicating Speech Behaviors in Alzheimer’s Disease”. In: Proceedings of the 16th International ACM SIGACCESS Conference on Computers & Accessibility. ASSETS ’14. Rochester, New York, USA: ACM, 2014, pp. 241–242. isbn: 978-1-4503-2720-6. doi: 10.1145/2661334.2661382. url: http://doi.acm.org/10.1145/2661334.2661382.

[65] Gerard Salton and Christopher Buckley. “Term-weighting approaches in automatic text retrieval”. In: Information processing & management 24.5 (1988), pp. 513–523.

[66] Bjorn Schuller and Anton Batliner. Computational paralinguistics: emotion, affect and personality in speech and language processing. John Wiley & Sons, 2013. Chap. 8.

[67] Leslie M. Shaw et al. “Cerebrospinal fluid biomarker signature in Alzheimer’s disease neuroimaging initiative subjects”. In: Annals of Neurology 65.4 (Apr. 2009), pp. 403–413.

[68] Karen Simonyan and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition”. In: arXiv:1409.1556 [cs] (Sept. 2014). arXiv: 1409.1556. url: http://arxiv.org/abs/1409.1556 (visited on 05/06/2016).

[69] Audacity Team. Audacity (Version 2.0.5). 2014. url: http://audacity.sourceforge.net/.

[70] Calvin Thomas et al. “Automatic detection and rating of dementia of Alzheimer type through lexical analysis of spontaneous speech”. In: IEEE International Conference Mechatronics and Automation, 2005. Vol. 3. IEEE. 2005, pp. 1569–1574.

[71] Mary E. Tinetti, Wen-Liang Liu, and Elizabeth B. Claus. “Predictors and Prognosis of Inability to Get Up After Falls Among Elderly Persons”. In: JAMA 261.1 (1993), p. 65.

[72] Mary E. Tinetti, M. Speechley, and S. F. Ginter. “Risk factors for falls among elderly persons living in the community”. In: N Engl J Med 319 (1988), pp. 1701–1707.

[73] Mary E. Tinetti, T. Franklin Williams, and Raymond Mayewski. “Fall Risk Index for Elderly Patients Based on Number of Chronic Disabilities”. In: The American Journal of Medicine 80.3 (1986), pp. 429–434.

[74] Eric Tzeng et al. “Simultaneous Deep Transfer Across Domains and Tasks”. In: IEEE, Dec. 2015, pp. 4068–4076. isbn: 978-1-4673-8391-2. doi: 10.1109/ICCV.2015.463. url: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7410820 (visited on 05/06/2016).

[75] Carl Vondrick, Donald Patterson, and Deva Ramanan. “Efficiently Scaling up Crowdsourced Video Annotation”. In: International Journal of Computer Vision, pp. 1–21. issn: 0920-5691. doi: 10.1007/s11263-012-0564-1. url: http://dx.doi.org/10.1007/s11263-012-0564-1.

[76] Kevin Weekly et al. “Indoor Occupant Positioning System Using Active RFID Deployment and Particle Filters”. In: DCOSS ’14 Proceedings of the 2014 IEEE International Conference on Distributed Computing in Sensor Systems. May 2014, pp. 35–42.

[77] D. Wild, U. S. Nayak, and B. Isaacs. “How Dangerous Are Falls in Old People at Home?” In: BMJ 282.6260 (1981), pp. 266–268.

[78] Josh D. Woolley et al. “The diagnostic challenge of psychiatric symptoms in neurodegenerative disease: rates of and risk factors for prior psychiatric diagnosis in patients with early neurodegenerative disease”. In: The Journal of clinical psychiatry 72.2 (2011), pp. 126–133.

[79] R. Woolrych et al. “Exploring the potential of using real life video capture to investigate the circumstances of falls among older adults in long-term care”. In: Gerontechnology 13.2 (2014), pp. 132–133.

[80] M. Yu et al. “A Posture Recognition-Based Fall Detection System for Monitoring an Elderly Person in a Smart Home Environment”. In: IEEE Transactions on Information Technology in Biomedicine 16.6 (Nov. 2012), pp. 1274–1286. issn: 1089-7771. doi: 10.1109/TITB.2012.2214786.

[81] Anthony Zhang. Speech Recognition (Version 2.0) [Software]. 2015.

[82] Han Zou et al. “An RFID indoor positioning system by using weighted path loss and extreme learning machine”. In: Cyber-Physical Systems, Networks, and Applications (CPSNA), 2013 IEEE 1st International Conference on. Aug. 2013.
