
ARTICLE OPEN

Wearable sensors for Parkinson’s disease: which data are worth collecting for training symptom detection models

Luca Lonini1,2, Andrew Dai1,3, Nicholas Shawen1,4, Tanya Simuni5, Cynthia Poon5, Leo Shimanovich5, Margaret Daeschler6, Roozbeh Ghaffari7, John A. Rogers7,8 and Arun Jayaraman1,2,9

Machine learning algorithms that use data streams captured from soft wearable sensors have the potential to automatically detect PD symptoms and inform clinicians about the progression of disease. However, these algorithms must be trained with annotated data from clinical experts who can recognize symptoms, and collecting such data is costly. Understanding how many sensors and how much labeled data are required is key to successfully deploying these models outside of the clinic. Here we recorded movement data using 6 flexible wearable sensors in 20 individuals with PD over the course of multiple clinical assessments conducted on 1 day and repeated 2 weeks later. Participants performed 13 common tasks, such as walking or typing, and a clinician rated the severity of symptoms (bradykinesia and tremor). We then trained convolutional neural networks and statistical ensembles to detect whether a segment of movement showed signs of bradykinesia or tremor based on data from tasks performed by other individuals. Our results show that a single wearable sensor on the back of the hand is sufficient for detecting bradykinesia and tremor in the upper extremities, whereas using sensors on both sides does not improve performance. Increasing the amount of training data by adding other individuals can lead to improved performance, but repeating assessments with the same individuals, even at different medication states, does not substantially improve detection across days. Our results suggest that PD symptoms can be detected during a variety of activities and are best modeled by a dataset incorporating many individuals.

npj Digital Medicine (2018) 1:64 ; doi:10.1038/s41746-018-0071-z

INTRODUCTION

Parkinson’s disease (PD) is a neurological movement disorder that affects ~1% of people above 60 years of age in industrialized countries.1,2 Cardinal motor symptoms of PD that are responsive to levodopa therapy include tremors (particularly while at rest), rigidity, and slowness of movements (bradykinesia). These motor symptoms gradually worsen, hindering daily living and negatively impacting quality of life.3,4 Dopaminergic medications are used to alleviate some of these symptoms, but their effects tend to wear off more quickly as the disease progresses and affects wider domains. Some individuals also experience frequent changes in symptoms (‘OFF/ON’ state) or involuntary movements (dyskinesia) as a medication side effect,5 which in turn forces individual adjustments in dosing. Tracking how motor symptoms and their response to medication change over time is crucial in quantifying the progression of PD for a given individual, in order to craft personalized treatment regimens.

The current gold standard of care to evaluate PD symptoms is through clinical examinations, whereby a trained clinician asks the patient to perform a series of standardized motor tasks (e.g. hand pronation-supination or heel tapping) while visually evaluating the quality of their movements and providing a symptom score. The most common rating scale to perform such evaluation is the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS).6 In addition, patients can be asked to keep a diary of their symptoms (e.g. Hauser diary,7 CAPSIT-PD,8 Parkinson’s Symptom Diary9). These methods are, however, limited by poor temporal resolution and low accuracy, respectively.8

Automatic evaluation of PD symptoms using wearable accelerometers, inertial, and electromyography sensors has been proposed as a way to overcome these limitations.10 Several studies have shown that it is possible to estimate MDS-UPDRS scores or detect motor symptoms using data from wearable sensors and machine learning models. Whereas signal processing has been used to manually design specific features to identify symptoms,11 machine learning algorithms such as support vector machines or neural networks are increasingly used because of their ability to learn a classification model from the data and generalize to unseen scenarios. These models can be trained based on movement data collected from several individuals, as

Received: 15 June 2018 Accepted: 2 November 2018

1Max Nader Lab for Rehabilitation Technologies and Outcomes Research, Shirley Ryan AbilityLab, Chicago, IL 60611, USA; 2Department of Physical Medicine and Rehabilitation, Northwestern University, Chicago, IL 60611, USA; 3Department of Biomedical Engineering, Northwestern University, Evanston, IL 60208, USA; 4Medical Scientist Training Program, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; 5Department of Neurology, Northwestern University, Chicago, IL 60611, USA; 6The Michael J. Fox Foundation for Parkinson’s Research, New York, NY 10163, USA; 7Center for Bio-Integrated Electronics, Departments of Materials Science and Engineering, Biomedical Engineering, Chemistry, Mechanical Engineering, Electrical Engineering and Computer Science, Neurological Surgery, Simpson Querrey Institute for Nano/Biotechnology, McCormick School of Engineering, Feinberg School of Medicine, Northwestern University, Evanston, IL 60208, USA; 8Frederick Seitz Materials Research Laboratory, Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA and 9Department of Physical Therapy and Human Movement Sciences, Northwestern University, Chicago, IL 60611, USA

Correspondence: Arun Jayaraman ([email protected])

These authors contributed equally: Luca Lonini, Andrew Dai, Nicholas Shawen.

www.nature.com/npjdigitalmed

Scripps Research Translational Institute

http://orcid.org/0000-0002-9783-5463


they perform standardized tasks in the clinic, as well as ground truth scores provided by clinicians.12–15 The data collection procedure often involves recording data from multiple subjects, wearing multiple sensors and undergoing repeated clinical assessments over several hours, so as to collect enough data on transitions between ‘OFF’ and ‘ON’ medication states. However, there is no consensus on the minimal data collection procedure, including number of sensors and sensor locations, or number of repeated clinical assessments and type of activities required to develop an accurate model while maximizing patient compliance.16,17 This information is critical towards facilitating the translation of this approach to real-world symptom detection and monitoring in the clinic, as well as in the home and community.18,19

One of the main challenges for real-world deployment is collecting sufficient labeled data to comprehensively model the varied presentation of motor symptoms during daily activities. In particular, symptoms may manifest differently across individuals and activities, and may also vary within the same individual over time.20–23 This renders deployment of symptom detection models in naturalistic environments challenging,24,25 and complicates the problem of generalizing symptom detection across individuals.26 As such, it is still unclear how many individuals and sensors data should be collected from, as well as whether data collected during several medication states are needed to train a system that can generalize symptom detection across days.

The objective of this study was to evaluate the relative value of several methods for increasing the amount of training data when developing machine learning models for PD symptom detection during activities representative of common daily tasks. We recorded movement data from 6 body-conforming flexible wearable sensors attached to the hands, arms, and thighs, and trained a machine learning classifier to detect the presence of tremor or bradykinesia in upper extremities, as individuals with PD performed a series of common daily activities and standard tasks used in clinical assessments. We trained statistical ensembles (random forest) and convolutional neural network (CNN) classifiers using data from a single sensor or multiple sensors, and evaluated the contribution of having sensors on both sides of the body. We then evaluated the effect of number of participants used to train the model on model performance. Finally, we evaluated whether incorporating training data from multiple clinical assessments rather than a single assessment improved the detection of these symptoms.

RESULTS

Sensor data from 19 participants performing the repeated clinical assessments (Fig. 1a) were segmented into 5-second clips, yielding a total of 41,802 data clips. Demographic data of participants are reported in Table 1, whereas Table 2 contains the list of tasks used for the assessments. The mean proportion of task performances showing symptoms across participants was 48.5% (SD: 21.7%) for bradykinesia, 22% (SD: 24.4%) for tremor, and 8% (SD: 11.5%) for dyskinesias. The prevalence of tremor was substantially lower and more variable than that of bradykinesia, with one participant showing no manifestation of tremor at all. Because the overall prevalence of dyskinesia in this dataset was low, and only 8 individuals showed dyskinesia during more than one task performance, we did not develop models for detection of dyskinesia. We trained separate random forest (RF) classifiers27 for the detection of each symptom (tremor or bradykinesia) from a set of features computed on the sensor data (Table 3). In addition, we compared the symptom detection models based on RF classifiers to those based on deep convolutional neural networks (CNNs). Details on the implementation are described in the Methods section.
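As a rough illustration of the kind of hand-engineered features listed in Table 3, the sketch below computes three of the simpler ones (range, skewness, and kurtosis) per axis for a single clip. The function names, the dictionary layout of a clip, and the use of population moments with excess kurtosis are assumptions made for this example; the paper does not specify these details. The dimension check at the end only confirms that the 28 dimensions in Table 3, computed on two modalities, add up to the 56 features reported in the Methods.

```python
# Minimal sketch (assumptions: helper names, clip layout, population moments).

def axis_range(x):
    """Peak-to-peak range of one axis."""
    return max(x) - min(x)

def _central_moment(x, order):
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)

def skewness(x):
    """Population skewness; returns 0.0 for a constant signal."""
    m2 = _central_moment(x, 2)
    return _central_moment(x, 3) / m2 ** 1.5 if m2 > 0 else 0.0

def kurtosis(x):
    """Population excess kurtosis (normal distribution -> 0)."""
    m2 = _central_moment(x, 2)
    return _central_moment(x, 4) / m2 ** 2 - 3.0 if m2 > 0 else 0.0

def simple_clip_features(clip):
    """clip: dict mapping 'x', 'y', 'z' to lists of samples.
    Returns [range_x, range_y, range_z, skew_x, ..., kurt_z]."""
    feats = []
    for fn in (axis_range, skewness, kurtosis):
        for axis in ("x", "y", "z"):
            feats.append(fn(clip[axis]))
    return feats

# Feature dimensions as listed in Table 3 (per sensing modality).
TABLE3_DIMS = {
    "range": 3, "skew": 3, "kurtosis": 3,
    "xcorr_peak": 3, "xcorr_lag": 3,
    "dominant_frequency": 1, "relative_magnitude": 1,
    "psd_moments": 4, "jerk_moments": 4, "sample_entropy": 3,
}
N_FEATURES_TWO_MODALITIES = 2 * sum(TABLE3_DIMS.values())  # accelerometer + gyroscope
```

The remaining features in Table 3 (cross-correlation, spectral moments, sample entropy) would follow the same per-clip pattern but need more code than fits a short sketch.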

Effect of adding sensors

Random forest models trained on hand data yielded the highest mean AUROC for both the detection of bradykinesia (0.73, 95% CI: 0.68–0.77) and tremor (0.79, 95% CI: 0.74–0.84) across all activities (Fig. 2a). Using data from sensors on both hands to detect symptoms on the impaired side did not significantly improve performance (bradykinesia: 0.73, CI: 0.69–0.78; paired t-test, t = −0.74, p = 0.47; tremor: 0.79, CI: 0.74–0.84; paired t-test, t = 0.35, p = 0.73). Similarly, using data from hand, arm, and thigh sensors together did not confer a significant advantage relative to using one hand sensor only for bradykinesia (0.73, 95% CI: 0.68–0.77;

Fig. 1 Data collection and sensor setup. a Individuals with PD underwent multiple clinical assessments spaced by 30 min during a first visit (day 1); assessments were done before and after each participant took their PD medication. A single follow-up assessment was performed about 2 weeks later during a second visit (day 2). During each assessment, participants performed a series of daily activities and standardized clinical tasks. b Overview of the MC10 BioStampRC sensor; c Position of sensors on the body and sensor modalities; sensors were placed on both sides, although only one side is shown for clarity. Data were recorded from the accelerometer (acc) and gyroscope (gyro) sensors, or from the accelerometer and electromyography (EMG) sensor. Data from the EMG sensor were not used in the current study


paired t-test, t = 0.54, p = 0.59) or for tremor (0.79, 95% CI: 0.72–0.86; t = 0.19, p = 0.85).

Using a CNN-based approach showed a similar result (Fig. 2b; bradykinesia, single hand: 0.69, CI: 0.63–0.75; hands bilateral: 0.68, CI: 0.63–0.73, p = 0.72; tremor, single hand: 0.78, CI: 0.73–0.82; hands bilateral: 0.70, CI: 0.62–0.77, p = 0.06); using data from all sensors significantly degraded model performance (bradykinesia: 0.60, CI: 0.54–0.66, p = 0.014; tremor: 0.60, CI: 0.50–0.69, p = 0.001), suggesting overfitting of the data. Therefore, detection of symptoms in the upper extremities across all activities was best achieved using one sensor placed on the hand with the dominant symptoms. All subsequent analyses use only data from this hand sensor.
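The comparisons in this section rest on per-participant AUROC values summarized as a mean with a 95% confidence interval. As a hedged illustration of those two quantities, the sketch below computes AUROC from its rank-based definition (the probability that a randomly chosen symptomatic clip receives a higher score than a randomly chosen non-symptomatic one) and a confidence interval on the mean. The function names and the normal-approximation interval (factor 1.96) are assumptions for this example; the paper does not state how its intervals were computed.

```python
import math
import statistics

def auroc(labels, scores):
    """AUROC from pairwise comparisons; ties count as half a win.
    labels: 0/1 per clip, scores: classifier output per clip."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        # Undefined when a participant shows only one class.
        raise ValueError("AUROC needs both positive and negative clips")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_ci95(values):
    """Mean and normal-approximation 95% CI (assumed, not from the paper)."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half
```

In a leave-one-subject-out setting, `auroc` would be evaluated once per held-out participant and `mean_ci95` applied to the resulting list. The `ValueError` branch matters here because, as noted above, one participant showed no tremor at all.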

Symptom detection across activities

Random forest models were able to detect bradykinesia and tremor during both clinical structured activities (finger to nose and alternating hand movements), and activities of daily living (Fig. 3). Bradykinesia detection during fine motor tasks and walking yielded the highest mean AUROC across participants (walking: 0.77, CI: 0.67–0.87; fine motor: 0.76, CI: 0.70–0.81); detection accuracy was not significantly different from that achieved on clinical structured tasks (0.73, CI: 0.66–0.80, p ≥ 0.50). In contrast, detection during gross motor tasks had the lowest AUROC and was significantly lower compared to that on clinical tasks (0.62; CI: 0.54–0.70, p = 0.04). Tremor was most accurately detected from clinical tasks (0.79; CI: 0.71–0.88), though mean AUROC here was

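Table 1. Participants’ demographics and associated clinical data

| Participant ID | Sex | Age | Onset year | Diagnosis year | Fluctuator (Y/N) | Side predominantly affected at first assessment | MDS part III, day 1, time 0 | MDS part III, day 1, time 60 | MDS part III, day 2 | Days between visits |
|---|---|---|---|---|---|---|---|---|---|---|
| 1004 | M | 52 | 2011 | 2013 | Y | Bilateral | 31 | 30 | 14 | 11 |
| 1016 | F | 66 | 2016 | 2016 | N | Bilateral | 19 | 21 | 32 | 14 |
| 1018 | M | 58 | 2012 | 2015 | N | Left | 18 | 13 | 14 | 18 |
| 1019 | F | 36 | 2015 | 2015 | N | Left | 36 | 14 | 10 | 19 |
| 1020 | F | 58 | 2005 | 2005 | N | Right | 24 | 21 | 24 | 14 |
| 1024 | M | 70 | 2000 | 2000 | Y | Left | 42 | 18 | 19 | 22 |
| 1029 | M | 74 | 2009 | 2010 | N | Left | 42 | 32 | 21 | 13 |
| 1030 | M | 68 | 2010 | 2010 | N | Left | 18 | NA | 22 | 20 |
| 1032 | M | 70 | 2012 | 2012 | N | Bilateral | 28 | 12 | 26 | 16 |
| 1038 | M | 72 | 2007 | 2007 | N | Right | 30 | 25 | 20 | 69 |
| 1044 | M | 59 | 2013 | 2014 | Y | Left | 29 | 24 | 24 | 13 |
| 1046 | F | 69 | 2012 | 2014 | N | Right | 21 | 18 | 18 | 14 |
| 1047 | M | 52 | 2009 | 2010 | Y | Right | 18 | 9 | 6 | 20 |
| 1049 | F | 54 | 2006 | 2008 | Y | Left | NA | 24 | 24 | 13 |
| 1051 | M | 62 | 2013 | 2015 | N | Left | 14 | 6 | 11 | 49 |
| 1052 | M | 69 | 2007 | 2008 | Y | Bilateral | 31 | 15 | NA | NA |
| 1053 | F | 66 | 2014 | 2014 | Y | Left | 25 | 15 | NA | NA |
| 1054 | F | 65 | 2000 | 2002 | Y | Right | 44 | 15 | NA | NA |
| 1055 | M | 75 | 2006 | 2009 | Y | Right | 36 | 26 | NA | NA |
| 1056 | M | 72 | 2005 | 2006 | Y | Left | 46 | 61 | NA | NA |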

Table 2. Tasks performed by participants during the visits for the assessment of PD symptoms

| Task | Symptom detected | Type of task |
|---|---|---|
| Walking | Bradykinesia/Tremor | Functional |
| Walking while counting | Bradykinesia/Tremor | Functional |
| Finger to nose | Bradykinesia/Tremor | Clinical |
| Alternating hand movements | Bradykinesia/Tremor | Clinical |
| Sit to stand | Bradykinesia | Functional |
| Sitting | Tremor | Functional |
| Standing | Tremor | Functional |
| Drawing on paper | Bradykinesia/Tremor | Fine motor |
| Typing on a computer keyboard | Bradykinesia/Tremor | Fine motor |
| Nuts and bolts | Bradykinesia/Tremor | Fine motor |
| Pouring water from a bottle and drinking | Bradykinesia/Tremor | Gross motor |
| Organizing a set of folders | Bradykinesia/Tremor | Gross motor |
| Folding towels | Bradykinesia/Tremor | Gross motor |

Tasks were divided into functional, fine motor, and gross motor groups

Table 3. Features computed on both the accelerometer and gyroscope data to train the symptom detection classifier

| Feature | Feature dimension |
|---|---|
| Range (X,Y,Z) | 3 |
| Skew (X,Y,Z) | 3 |
| Kurtosis (X,Y,Z) | 3 |
| Cross-correlation peak (XY,XZ,YZ) | 3 |
| Cross-correlation lag (XY,XZ,YZ) | 3 |
| Dominant frequency (acceleration magnitude) | 1 |
| Relative magnitude | 1 |
| Moments of power spectral density (acceleration magnitude) | 4 |
| Moments of jerk magnitude | 4 |
| Sample entropy (X,Y,Z) | 3 |



not significantly higher than AUROC for walking (0.70; CI: 0.53–0.88; p = 0.25) or fine motor tasks (0.72; CI: 0.62–0.81, p = 0.23). Again, detection during the execution of gross motor tasks was significantly worse than during clinical tasks (mean AUROC: 0.56; 95% CI: 0.49–0.63; p < 0.001).

Overall, AUROC values for CNN models for detection of bradykinesia and tremor were comparable to those of random forest models. Detection performance was highly variable across subjects, and as such there was no significant difference in mean AUROC values (p ≥ 0.26) (see Supplementary Material). Interestingly, detection of bradykinesia from walking achieved better results than for all other tasks (p = 0.02). This suggests that it is possible to detect symptoms from some daily activities and structured clinical assessments with comparable accuracies.

Effect of number of training subjects

To evaluate the effect on accuracy of number of subjects in the training dataset, we trained random forest models on data from an increasing number of participants (from 3 to 18). At each iteration, a model was trained on a random subset of participants and tested on the left-out ones.

Models for tremor detection showed a noticeable improvement with added training subjects (Fig. 4). The mean AUC when evaluating data from a new subject improved from 0.76 (CI: 0.74–0.78) for a training set using data from only three individuals to 0.80 (CI: 0.79–0.82, p < 0.001) for a training set using data from 10 individuals. However, improvement seemed to plateau, showing limited improvement beyond that point. Similarly, for bradykinesia detection, increased training pool size also correlated with increases in expected AUC (three subjects: 0.68, CI: 0.66–0.70; 12 subjects: 0.76, CI: 0.74–0.77, p < 0.001), and plateaued around a similar number of training subjects.

Due to the large number of trainable parameters, CNNs require larger datasets than random forest models to prevent overfitting, and therefore were not included in this analysis.
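The subject-subset procedure described in this section can be sketched as follows: for each training-pool size, draw a random subset of participants for training and hold out the rest for testing, so that no participant contributes to both sets. The function name, the number of repeats, and the seed are hypothetical choices for this example; the paper does not report how many random subsets were drawn per pool size.

```python
import random

def subject_subset_splits(subject_ids, n_train, n_repeats, seed=0):
    """Return a list of (train_ids, test_ids) pairs.
    Splits are made by participant, never by clip, so that test
    participants are unseen during training."""
    if not 0 < n_train < len(subject_ids):
        raise ValueError("n_train must leave at least one test subject")
    rng = random.Random(seed)  # fixed seed for reproducibility (assumed)
    splits = []
    for _ in range(n_repeats):
        train = sorted(rng.sample(list(subject_ids), n_train))
        test = sorted(set(subject_ids) - set(train))
        splits.append((train, test))
    return splits

# Example: 19 participants, training pools of 3 to 18 subjects.
subjects = list(range(19))  # placeholder IDs, not the study's IDs
all_splits = {
    n: subject_subset_splits(subjects, n, n_repeats=5)
    for n in range(3, 19)
}
```

Each `(train, test)` pair would then be used to fit a classifier on clips from the training participants and to score clips from the held-out participants.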

Effect of number of training sessions

We analyzed whether training models on data from multiple clinical assessments (sessions) performed on day 1 improved their performance at detecting symptoms on the same day or on a different day (day 2). Random forest models for detection of bradykinesia benefited from adding training data collected over multiple sessions, relative to models trained on a single session, when tested on data from the same day (Fig. 5). However, the improvement was modest (mean AUROC change: 0.02; p < 0.001), and was not significant when models were tested on data from a subsequent visit (mean AUROC change population: 0.02, p = 0.053).

A similar pattern was observed for detection of tremor (day 1: mean AUROC change population: 0.03, p < 0.001), although a small but significant improvement was also observed on day 2 when data from multiple sessions were added (day 2: mean AUROC change population: 0.03, p < 0.001). As above, CNNs were not included in this analysis as they tend to underperform with smaller datasets. Therefore, collecting data from repeated assessments does not substantially help generalization across days.
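The "mean AUROC change" reported here is a paired quantity: each participant has one AUROC from the single-session model and one from the multi-session model. The sketch below shows how such a mean change and a paired t statistic could be computed. The function name and the AUROC values in the usage example are made up for illustration, and the use of a paired t-test for this particular analysis is an assumption (the paper names that test only for the sensor comparisons).

```python
import math
import statistics

def paired_change(single_session, multi_session):
    """Mean paired difference and paired t statistic.
    Both inputs hold one AUROC per participant, in the same order."""
    if len(single_session) != len(multi_session):
        raise ValueError("inputs must be paired per participant")
    diffs = [m - s for s, m in zip(single_session, multi_session)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)       # sample SD of the differences
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return mean_d, t_stat, n - 1         # n - 1 degrees of freedom

# Illustrative (made-up) per-participant AUROCs:
single = [0.70, 0.72, 0.75, 0.80]
multi = [0.73, 0.73, 0.78, 0.81]
mean_change, t_stat, dof = paired_change(single, multi)
```

Converting `t_stat` to a p-value requires the t distribution with `dof` degrees of freedom, which is left to a statistics library.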

DISCUSSION

This study aimed to investigate how to efficiently collect wearable sensor data for detection of bradykinesia and tremor in the upper extremities. To that end, we evaluated the effects on model performance of number and location of wearable sensors used, types of tasks performed by participants, and number of data


Fig. 2 The effect of sensor location and number of sensors on model performance. AUROC curves for detection of bradykinesia and tremor using random forest population models a when trained on data from either a single hand (with dominant symptoms), both hands (Hand_Bi), or a combination of hand, forearm and thigh sensors unilaterally (Combo). Corresponding AUROC curves from CNN population models b for detection of bradykinesia and tremor. Using a combination of sensors did not yield any advantage over using only hand sensors. Solid lines indicate mean AUROC and shaded areas represent 95% confidence intervals




collection sessions performed. In addition, we evaluated how many individuals are needed to reach optimal model performance. We used random forest classifiers, based on pre-engineered features, and compared them to convolutional neural networks trained on raw sensor data.

Overall, CNN-based models showed comparable mean AUROC values to those of classifiers based on random forest. This suggests that the networks can learn features for detection of PD symptoms directly from the raw sensor data, without the need to engineer a feature set. Although CNNs have the potential to learn representations of the data that can distinguish between different activities or contexts (i.e. walking vs. gross vs. fine motor tasks), we did not observe any substantial improvement relative to random forest models for the detection of symptoms. Future work should explore alternative network architectures, including LSTM models,28 to see whether any improvement in generalization can be obtained.

Another novelty of our study was the use of soft wearable sensors,29 which adhere to the skin and are able to conform to its deformation during movements. This allows greater flexibility in sensor placement for collecting data from any part of the body, and with minimal burden for the wearer. Our results suggest that a single motion sensing device placed on the hand is sufficient to detect symptoms of tremor or bradykinesia in the upper extremities. This is in agreement with a previous study evaluating the minimum number of traditional wearables to estimate bradykinesia.16 Combining data from sensors on both sides of the body did not aid detection on the impaired side, for either bradykinesia or tremor. Other studies have evaluated the contribution of using non-movement sensing modalities, such as EMG, though it is unclear how necessary these modalities are for symptom detection.30 It is also possible that the optimal combination of sensor locations and modalities may be dependent on the types of tasks that are analyzed, but this study was focused on the evaluation of upper limb symptoms. It remains to be seen whether soft sensors provide an advantage over traditional wearables in terms of improved symptom detection during daily activities.

We also found that fine motor tasks and walking are suitable activities for automated evaluation of PD symptoms in the upper limbs using a single-hand sensor. Detection of bradykinesia and tremor from fine motor tasks or walking had comparable accuracy to detection from tasks used for clinical evaluation of these symptoms (e.g. finger to nose movements). Symptom detection during gait has previously been focused primarily on freezing of gait, rather than bradykinesia or tremor.11,31–34 Other studies have shown that motor symptoms can be quantified by prompting users to perform gait and finger tapping tests at home.35,36 Our results suggest that fine motor tasks and walking may also be useful targets for symptom monitoring during naturalistic behaviors at home.

We observed that increasing the number of subjects in the training dataset improved model performance, even if accuracy tended to decrease when the model was trained on data from more than 10 individuals. This may be partially due to the high variability in manifestation of PD symptoms, particularly bradykinesia, across individuals. Therefore, a smaller amount of appropriately targeted data may lead to more accurate models than a larger pool of more general data. In the field of activity recognition, some studies have proposed methods for developing “semi-personalized” models.37 The best method for doing so in the context of symptom detection and scoring in Parkinson’s disease should be explored in future studies.

Collecting data from repeated assessments significantly improved symptom detection, although to a limited extent. Incorporating data from ‘ON’ and ‘OFF’ medication states, as well as transition states, may aid symptom detection, but this effect seems to be attenuated in population models. Interestingly, bradykinesia detection on a separate visit (day 2) was not improved by increasing the number of data collection sessions used for training. The difference in model performance could be explained by changes in the behavior of participants,38 or in the medication state of participants at the subsequent visit. Sensors were removed between visits and replaced at the second visit, which may also explain this observation. Therefore, generalizing symptom detection to a different day does not seem to be aided by the use of data from additional assessments, even when altering the medication state.

The cost-benefit ratio of training data collection needs to be considered when developing symptom detection models for PD, especially given the need for a trained clinician to evaluate the presence and severity of symptoms. It may be important to target specific types of activity, as symptom detection is improved during fine motor tasks and walking relative to gross upper extremity activity. Our findings also indicate that increasing the number of body sensor devices, the number of subjects whose data are used for training models, or the number of data collection sessions per individual does not necessarily translate into an improved accuracy for detection of bradykinesia and tremor. Unique strategies for leveraging the value of extremely large datasets (deep learning), or for personalizing detection models (transfer learning), are likely to be important directions for continued improvement of accuracy in motor symptom detection outside of the clinic.

Fig. 3 Symptom detection across activities. AUROC curves for bradykinesia and tremor detection using population models (random forest), split by groups of activities. Symptoms were detected equally well during both clinical structured tasks (e.g. finger to nose movements) and most daily functional activities (walking and fine motor tasks, e.g. typing on a keyboard). The lowest AUROC was during the execution of gross motor tasks




Limitations

Because of the limited number of individuals in this study who exhibited symptoms of dyskinesia, this study did not examine models for dyskinesia detection. The low number of dyskinesia events may be in part associated with restricting the monitoring to upper extremity symptoms only. Dyskinesia is a significant side effect of levodopa use, and is another important target of symptom monitoring in patients with PD. A study with a larger number of individuals may be necessary to capture sufficient instances of dyskinesia and to perform a similar analysis for detection of this important symptom.

Our study is limited in exploring longitudinal model performance due to having available data for only two distinct visits. Additionally, not all participants were able to complete the second visit, leading to reduced statistical power in that analysis. Also, data used here were collected in a supervised clinical environment, and thus are not perfectly representative of real-world behaviors. However, the tasks used here were intended to approximate some typical daily activities that patients might perform. Accelerometer data collected from a smartwatch during an extended period of time outside the clinic, as performed in the parent CIS-PD study, may be able to address this limitation and will be explored in a future analysis.

Another limitation of our study was the fact that we only had one rater for scoring the standardized motor assessments. Therefore, we could not estimate the inter-rater variability, and thus the uncertainty of the target scores provided to the model. Furthermore, whereas our aim was to evaluate the relative contribution of different data sources to the accuracy of detection models, our dataset was still too limited in size to draw general conclusions. Future studies should examine these aspects with a larger and broader dataset in terms of subjects and monitoring periods.

METHODS

Study design

Twenty individuals (13 males; mean age: 63.35 ± 9.63 y) diagnosed with PD and a Hoehn and Yahr stage of 2 provided written consent and participated in the study, which took place at Northwestern Memorial Hospital (Chicago, IL, USA) and was approved by the Northwestern University Institutional Review Board (IRB #: STU00203796). Ten participants were considered as “motor fluctuators”, meaning that they were ‘OFF’ for an average of 2 h daily, as determined by the clinician interview and confirmed by the modified Hauser diary; the remaining 10 participants tended to experience stable symptoms before and after taking medications (Table 1). One participant had a deep brain stimulation (DBS) device implanted. This analysis was a sub-study (Wireless Adhesive Sensor Sub-Study) of a larger multi-center study sponsored by the Michael J. Fox Foundation and named “Clinician Input Study on Parkinson Disease” (CIS-PD), which aimed at developing objective biomarkers to monitor Parkinson’s symptoms based on wearable sensor data.

In order to capture changes in symptom severity over time, the study included two sets of experimental visits (day 1 and day 2), performed in the clinic on 2 different days, spaced about 2–3 weeks apart. The day 1 visit consisted of 6 repeated assessments (sessions), spaced at least 30 min apart (Fig. 1a). Participants were instructed to come to the clinic without taking any PD medication for at least 12 h prior to the visit, so as to start the assessment in the ‘OFF’ medication state. After completing the first session, they were instructed to take their PD medications, in order to assess their symptoms as they transitioned into the ‘ON’ medication state. During each session, participants performed 13 different motor tasks (standardized motor assessments, Table 2), some of which resembled daily activities, with others being standard clinical tasks used to elicit symptoms, e.g. finger to nose and alternating hand movements. A trained clinician rated the severity of tremor, bradykinesia and dyskinesia in the upper extremities for the left and right side on a scale from 0 to 4 during the execution of each task (Table 2). The day 2 visit consisted of a single session and was meant to introduce longitudinal variability and characterize participants’ symptoms on a completely different day and during their regular medication schedule. Only the first 15 participants performed the day 2 assessment. In addition to the aforementioned activities, an MDS-UPDRS (Part III) motor assessment was also performed during sessions 1 and 3 (time 0 and 60 minutes) on day 1, and during the visit on day 2. The total MDS-UPDRS scores for each subject are reported in Table 1. Sensor data from one of the participants (1020) were not used as not all of their clinical assessments were rated by the trained neurologist.

Sensor setup

Study participants were instrumented with 6 BioStampRC sensors (MC10 Inc., Lexington, MA, USA), which are flexible wearable sensors that can be attached directly to the skin and include a tri-axial accelerometer, a gyroscope, and a single lead (2 electrodes) analog voltage sensor (Fig. 1b). A single BioStampRC can record data from up to 2 sensors simultaneously. Here, sensors were placed on the dorsal part of each hand, on each thigh proximal to the femur epicondyle, and on each forearm on top of the flexor carpi radialis (Fig. 1c). All sensors were set to record data from the tri-axial accelerometer (sampling rate: 62.5 Hz; range: ±4G) and from the gyroscope (sampling rate: 62.5 Hz; range: ±1000 °/s), except for the forearm sensors, which recorded the acceleration and the electromyography (EMG) signal from the left and right flexor carpi radialis muscles (1000 Hz). Data from each sensor were initially stored on the local memory (32 MB), then transmitted to a tablet via Bluetooth, and finally uploaded to the MC10 cloud server. EMG data were not used in the current analysis.

Data analysis and preprocessing

Data processing and analysis were performed using Python 3.5. Sensor data for each participant were downloaded and organized according to side and sensor modality, as well as activity and session, based on the timestamp information recorded during data collection using the BioStampRC app. Data from sensors on the left (right) side were matched with clinical scores for bradykinesia, tremor and dyskinesia for the corresponding side. All sensor data (accelerometer and gyroscope) were segmented into 5-second

Fig. 4 Comparing performance of population models with different number of individuals in the training pool. Plots showing the trend inexpected AUROC for bradykinesia and tremor detection when testing on a new individual, when using population models with varyingnumbers of individuals in the training pool. Shaded areas mark 95% confidence interval on the mean


6


clips with 50% overlap, and data from each clip was filtered using a 4thorder Butterworth filter. The length of the time window was chosenaccording to previous work12,13 and considering the average time scale ofactivities (e.g. opening a water bottle, pouring, and drinking), Acceler-ometer data were filtered with a high-pass filter (0.5 Hz) to remove theeffect of limb orientation; in addition, a low-pass filter with cutoff at 3 Hzwas applied to the accelerometer and gyroscope data for bradykinesiadetection. Such a choice has been found to help detection of symptoms inprevious studies.12 Accelerometer and gyroscope data from each clip werematched with their corresponding metadata (patient ID, side, activity,session, clinical score). A total of 41,802 clips were generated from thisprocess. Out of these, 41% had bradykinesia and 22% tremor. There werenot enough examples of dyskinesia (8%), and therefore these data werenot used to train a dyskinesia detection model.
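The segmentation and filtering steps described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`segment`, `butter_filter`, `preprocess_for_bradykinesia`) are hypothetical, the synthetic signals are made up, and two details are assumptions because the text does not specify them: the 312.5 samples in a 5 s window at 62.5 Hz are truncated to 312, and the Butterworth filter is applied forward and backward (zero-phase) with SciPy's `sosfiltfilt`.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 62.5          # Hz, accelerometer/gyroscope sampling rate reported in the text
CLIP_SECONDS = 5   # clip length reported in the text
OVERLAP = 0.5      # 50% overlap between consecutive clips


def segment(recording, fs=FS, clip_seconds=CLIP_SECONDS, overlap=OVERLAP):
    """Split an (n_samples, n_axes) recording into overlapping clips.

    Returns an array of shape (n_clips, clip_len, n_axes).
    """
    recording = np.asarray(recording, dtype=float)
    clip_len = int(clip_seconds * fs)              # 312 samples (assumed truncation)
    step = max(1, int(clip_len * (1 - overlap)))   # 156 samples for 50% overlap
    starts = range(0, recording.shape[0] - clip_len + 1, step)
    clips = [recording[s:s + clip_len] for s in starts]
    if not clips:
        return np.empty((0, clip_len, recording.shape[1]))
    return np.stack(clips)


def butter_filter(clips, cutoff_hz, btype, fs=FS, order=4):
    """Apply a 4th order Butterworth filter along the time axis of each clip.

    Assumption: zero-phase (forward-backward) filtering; the text gives only
    the filter order and the cutoff frequencies.
    """
    sos = butter(order, cutoff_hz, btype=btype, fs=fs, output="sos")
    return sosfiltfilt(sos, clips, axis=1)


def preprocess_for_bradykinesia(accel, gyro):
    """Clip both signals, then filter each clip as described for bradykinesia."""
    acc_clips = butter_filter(segment(accel), 0.5, "highpass")  # remove orientation offset
    acc_clips = butter_filter(acc_clips, 3.0, "lowpass")        # keep slow movements
    gyr_clips = butter_filter(segment(gyro), 3.0, "lowpass")
    return acc_clips, gyr_clips


# Synthetic 60 s recording (not study data): constant offset plus a 1 Hz movement.
t = np.arange(int(60 * FS)) / FS
accel = np.column_stack([1.0 + 0.2 * np.sin(2 * np.pi * 1.0 * t)] * 3)
gyro = np.column_stack([50.0 * np.sin(2 * np.pi * 1.0 * t)] * 3)
acc_clips, gyr_clips = preprocess_for_bradykinesia(accel, gyro)
print(acc_clips.shape, gyr_clips.shape)  # (23, 312, 3) (23, 312, 3)
```

Forward-backward filtering doubles the effective filter order, so a single causal pass (`scipy.signal.sosfilt`) may be closer to a literal reading of "4th order"; either choice would need to be checked against the published repository.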

Symptom detection models
We trained 2 random forest (RF) classifiers:27 one for detecting whether a data clip had bradykinesia, and the other for tremor. Accordingly, clips with a clinical score >0 were assigned a label of 1 (symptom present). Each classifier took as input a vector of 56 features (Table 3), which were calculated on both the accelerometer and gyroscope data from each clip. These features capture movement properties related to speed and frequency and were successfully used in prior PD studies.12 We used RF because of the low number of hyperparameters to be tuned and its ability to reduce overfitting. The number of trees was set to 50 and was optimized using the out-of-bag error method.

In addition, we trained two classification models based on deep convolutional neural networks (CNN). CNN-based models can perform end-to-end learning of symptom detection from the raw sensor data, and therefore do not need a hand-developed set of features to encode a data clip. Features representing the relationship between sensor data and symptoms are learned from the raw data and encoded in the weights across the network layers. CNNs have been used in a variety of tasks, such as image classification, speech recognition, and natural language processing (NLP), where they tend to outperform traditional approaches based on feature engineering.39 Deep networks have also been used to classify wearable sensor data for activity recognition28 and recently symptom detection in Parkinson’s14,31, where they proved to be at least as effective as traditional methods based on feature engineering.39 Each CNN (Fig. S1) consisted of the following layers: 2 convolutional layers (kernel sizes = 32 and 16 samples) with 16 and 32 rectified linear units (ReLU), respectively, each followed by a max-pooling layer (pool size = 4 and 6 units). The convolutional layers are followed by 2 dense layers with 32 neurons each, also with ReLU activation functions. Dense layers used dropout,40 such that a fixed proportion (0.5) of units were ‘shut down’ during training to reduce overfitting. The output layer used a softmax function for the classification, with as many neurons as classes (2), which output the probability of the input clip showing a symptom (bradykinesia/tremor). The total number of trainable parameters for each model was 24,722. Further details on the CNN models are provided in the supplementary material.

We trained population-based models, i.e., models trained on data from other subjects to detect bradykinesia and tremor in a new subject. Area under the receiver operating characteristic curve (AUROC) was used to assess model performance. Mean AUROC across subjects was computed using leave-one-subject-out cross-validation for population models. Statistical comparisons were performed using t-tests for either paired or independent samples.
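The population-model evaluation described above (binarized clinical scores, a 50-tree random forest, and leave-one-subject-out AUROC) can be sketched with scikit-learn. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `loso_auroc` is hypothetical, the feature matrix is random synthetic data rather than the 56 features of Table 3, the out-of-bag tuning of the number of trees is not reproduced, and skipping held-out subjects that show only one class is a choice made here because AUROC is undefined in that case.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut


def loso_auroc(features, scores, subject_ids, n_trees=50, seed=0):
    """Return {subject: AUROC} for models trained on all other subjects."""
    features = np.asarray(features, dtype=float)
    labels = (np.asarray(scores) > 0).astype(int)   # clinical score > 0 -> symptom present
    subject_ids = np.asarray(subject_ids)
    results = {}
    splitter = LeaveOneGroupOut()
    for train_idx, test_idx in splitter.split(features, labels, groups=subject_ids):
        y_train, y_test = labels[train_idx], labels[test_idx]
        # Assumption: skip folds where AUROC (or a 2-class model) is undefined.
        if np.unique(y_train).size < 2 or np.unique(y_test).size < 2:
            continue
        model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        model.fit(features[train_idx], y_train)
        prob_symptom = model.predict_proba(features[test_idx])[:, 1]
        held_out = subject_ids[test_idx][0].item()
        results[held_out] = roc_auc_score(y_test, prob_symptom)
    return results


# Synthetic example (not study data): 6 subjects, 80 clips each, 56 features,
# with feature 0 made informative about the 0-4 clinical score.
rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(6), 80)
scores = rng.integers(0, 5, size=subjects.size)
X = rng.normal(size=(subjects.size, 56))
X[:, 0] += scores

per_subject = loso_auroc(X, scores, subjects)
print("mean AUROC across subjects:", np.mean(list(per_subject.values())))
```

Grouping the split by subject is what makes this a population model: no clip from the test subject contributes to training, so the per-subject AUROC values estimate performance on a new individual.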

Code availability
The code used to process the data and analyze the findings of this publication is available in a GitHub repository: https://github.com/Luke3D/CIS_PD-NDM/.

DATA AVAILABILITY
The dataset used to support the findings of this publication is available from the Michael J. Fox Foundation, but restrictions apply to the availability of these data, which were used under license for this study. The Michael J. Fox Foundation plans to release the dataset used in this publication alongside a significant, additional portion of related PD data from a separate smartwatch as part of a community analysis in the larger CIS-PD study timeline. Data are, however, available from the authors upon reasonable request and with permission from the Michael J. Fox Foundation.

ACKNOWLEDGEMENTS
We thank the Division of Movement Disorders in the Department of Neurology at Northwestern Memorial Hospital and the Feinberg School of Medicine at Northwestern University along with Carolyn Taylor, NP for conducting clinical patient studies and data collection and providing support for this study. We also thank the Michael J. Fox Foundation for Parkinson’s Research for their support and for providing study funding through the Clinician Input Study (CIS-PD) Wireless Adhesive Sensor Sub-Study (Grant No. 14702). A.D. acknowledges support from the Department of Biomedical Engineering at Northwestern University. N.S. acknowledges support from an NIH grant to the Medical Scientist Training Program at Northwestern University (Grant No. T32GM008152). R.G. and J.A.R. acknowledge support from the Center for Bio-Integrated Electronics at Northwestern University. Funding for this study was provided by the Michael J. Fox Foundation for Parkinson’s Research under the Clinician Input Study (CIS-PD) Wireless Adhesive Sensor Sub-Study (Grant No. 14702). This research was also partially funded by an NIH training grant to the Medical Scientist Training Program at Northwestern University (Grant No. T32GM008152).

Fig. 5 The effect of using training data from multiple assessments (sessions) on model performance. Boxplots show the distribution of AUROC across participants for population models trained on data from day 1 and tested on either day 1 or day 2. Adding data from multiple sessions improves the AUROC for bradykinesia detection on day 1. However, no significant changes in AUROC occur when models are tested on data from a different day (day 2). A similar pattern is observed for detection of tremor, although a modest improvement was observed on day 2



AUTHOR CONTRIBUTIONS
Conception, design, and study direction: L.L., A.D., N.S., T.S., D.D., R.G., J.A.R., and A.J. Clinical studies: L.L., A.D., N.S., T.S., C.P., and L.S. Data analysis: L.L., A.D., N.S., R.G., J.A.R., and A.J. Manuscript writing: L.L., A.D., N.S., T.S., C.P., L.S., D.D., R.G., J.A.R., and A.J.

ADDITIONAL INFORMATION
Supplementary information accompanies the paper on the npj Digital Medicine website (https://doi.org/10.1038/s41746-018-0071-z).

Competing interests: J.A.R. and R.G. both hold equity in the company MC10 that makes wearable devices for medical applications. The remaining authors declare no competing interests.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

REFERENCES

1. Tysnes, O. B. & Storstein, A. Epidemiology of Parkinson’s disease. J. Neural Transm. 124, 901–905 (2017).

2. Pringsheim, T., Jette, N., Frolkis, A. & Steeves, T. D. L. The prevalence of Parkinson’s disease: a systematic review and meta-analysis. Mov. Disord. 29, 1583–1590 (2014).

3. Jankovic, J. Parkinson’s disease: clinical features and diagnosis. J. Neurol. Neurosurg. Psychiatry 79, 368–376 (2008).

4. Antony, P. M. A., Diederich, N. J., Krüger, R. & Balling, R. The hallmarks of Parkinson’s disease. FEBS J. 280, 5981–5993 (2013).

5. Bastide, M. F. et al. Pathophysiology of L-dopa-induced motor and non-motor complications in Parkinson’s disease. Prog. Neurobiol. 132, 96–168 (2015).

6. The Unified Parkinson’s Disease Rating Scale (UPDRS): status and recommendations. Mov. Disord. 18, 738–750 (2003).

7. Hauser, R. A. et al. A home diary to assess functional status in patients with Parkinson’s disease with motor fluctuations and dyskinesia. Clin. Neuropharmacol. 23, 75–81 (2000).

8. Reimer, J., Grabowski, M., Lindvall, O. & Hagell, P. Use and interpretation of on/off diaries in Parkinson’s disease. J. Neurol. Neurosurg. Psychiatry 75, 396–400 (2004).

9. Montgomery, G. K. & Reynolds, N. C. Compliance, reliability, and validity of self-monitoring for physical disturbances of Parkinson’s disease: the Parkinson’s symptom diary. J. Nerv. Ment. Dis. 178, 636–641 (1990).

10. Maetzler, W., Domingos, J., Srulijes, K., Ferreira, J. J. & Bloem, B. R. Quantitative wearable sensors for objective assessment of Parkinson’s disease. Mov. Disord. 28, 1628–1637 (2013).

11. Moore, S., MacDougall, H. & Ondo, W. Ambulatory monitoring of freezing of gait in Parkinson’s disease. J. Neurosci. Methods 167, 340–348 (2008).

12. Patel, S. et al. Monitoring motor fluctuations in patients with Parkinson’s disease using wearable sensors. IEEE Trans. Inf. Technol. Biomed. 13, 864–873 (2009).

13. Klucken, J. et al. Unbiased and mobile gait analysis detects motor impairment in Parkinson’s disease. PLoS ONE 8, e56956 (2013).

14. Eskofier, B. et al. Recent machine learning advancements in sensor-based mobility analysis: deep learning for Parkinson’s disease assessment. 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (2016). https://doi.org/10.1109/EMBC.2016.7590787

15. Hoff, J. I., Plas, A. A., Wagemans, E. A. H. & van Hilten, J. J. Accelerometric assessment of levodopa-induced dyskinesias in Parkinson’s disease. Mov. Disord. 16, 58–61 (2001).

16. Daneault, J. F. et al. Estimating bradykinesia in Parkinson’s disease with a minimum number of wearable sensors. in 2017 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE) 5–6 (2017). https://doi.org/10.1109/CHASE.2017.94

17. Odin, P. et al. Viewpoint and practical recommendations from a movement disorder specialist panel on objective measurement in the clinical management of Parkinson’s disease. npj Park. Dis. 4, 14 (2018).

18. Espay, A. J. et al. Technology in Parkinson’s disease: challenges and opportunities. Mov. Disord. 31, 1272–1282 (2016).

19. Del Din, S., Godfrey, A., Mazzà, C., Lord, S. & Rochester, L. Free-living monitoring of Parkinson’s disease: lessons from the field. Mov. Disord. 31, 1293–1313 (2016).

20. Marras, C. Subtypes of Parkinson’s disease: state of the field and future directions. Curr. Opin. Neurol. 28, 382–386 (2015).

21. Thenganatt, M. A. & Jankovic, J. Parkinson disease subtypes. JAMA Neurol. 71, 499–504 (2014).

22. Wickremaratchi, M. M. et al. The motor phenotype of Parkinson’s disease in relation to age at onset. Mov. Disord. 26, 457–463 (2011).

23. Chou, K. L. et al. The spectrum of ‘off’ in Parkinson’s disease: what have we learned over 40 years? Parkinsonism Relat. Disord. (2018). https://doi.org/10.1016/j.parkreldis.2018.02.001

24. Hammerla, N. Y. et al. PD disease state assessment in naturalistic environments using deep learning. in Proc. Twenty-Ninth AAAI Conference on Artificial Intelligence and Twenty-Seventh Innovative Applications of Artificial Intelligence Conference. 1742–1748 (AAAI Press, Cambridge, 2015).

25. Fisher, J. M. et al. Unsupervised home monitoring of Parkinson’s disease motor symptoms using body-worn accelerometers. Park. Relat. Disord. 33, 44–50 (2016).

26. Jankovic, J. Motor fluctuations and dyskinesias in Parkinson’s disease: clinical manifestations. Mov. Disord. 20, S11–S16 (2005).

27. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

28. Guan, Y. & Ploetz, T. Ensembles of deep LSTM learners for activity recognition using wearables. Proc. ACM Interactive, Mobile, Wearable Ubiquitous Technol. https://doi.org/10.1145/3090076 (2017).

29. Liu, Y. et al. Epidermal mechano-acoustic sensing electronics for cardiovascular diagnostics and human-machine interfaces. Sci. Adv. 2, e1601185 (2016).

30. Roy, S. H. et al. High-resolution tracking of motor disorders in Parkinson’s disease during unconstrained activity. Mov. Disord. 28, 1080–1087 (2013).

31. Camps, J. et al. Deep learning for freezing of gait detection in Parkinson’s disease patients in their homes using a waist-worn inertial measurement unit. Knowl. Based Syst. 139, 119–131 (2018).

32. Rodríguez-Martín, D. et al. Home detection of freezing of gait using Support Vector Machines through a single waist-worn triaxial accelerometer. PLoS ONE 12, e0171764 (2017).

33. Silva de Lima, A. L. et al. Freezing of gait and fall detection in Parkinson’s disease using wearable sensors: a systematic review. J. Neurol. 264, 1642–1654 (2017).

34. Pham, T. & Moore, S. Freezing of gait detection in Parkinson’s disease: a subject-independent detector using anomaly scores. IEEE Trans. Biomed. Eng. 64, 2719–2728 (2017).

35. Arora, S. et al. Detecting and monitoring the symptoms of Parkinson’s disease using smartphones: a pilot study. Park. Relat. Disord. 21, 650–653 (2015).

36. Zhan, A. et al. Using smartphones and machine learning to quantify Parkinson disease severity: the mobile Parkinson disease score. 21218 (2018).

37. Hong, J.-H., Ramos, J. & Dey, A. K. Toward personalized activity recognition systems with a semipopulation approach. IEEE Trans. Human Mach. Syst. 46, 101–112 (2016).

38. Lonini, L., Gupta, A., Kording, K. & Jayaraman, A. Activity recognition in patients with lower limb impairments: do we need training data from each patient? in 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 3265–3268 (IEEE, 2016). https://doi.org/10.1109/EMBC.2016.7591425

39. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

40. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2018

