

Activity monitor reliability and validity in community-dwelling older adults

By

Stephanie Andrea Maganja

B.Sc., Simon Fraser University, 2015

Thesis Submitted in Partial Fulfillment of the

Requirements for the Degree of

Master of Science

in the

Department of Biomedical Physiology and Kinesiology

Faculty of Science

© Stephanie Andrea Maganja 2018

SIMON FRASER UNIVERSITY

Fall 2018

Copyright in this work rests with the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation.


Approval

Name: Stephanie Andrea Maganja

Degree: Master of Science

Title: Activity monitor reliability and validity in community-dwelling older adults

Examining Committee:

Chair: Thomas Claydon, Associate Professor

Dawn Mackey, Senior Supervisor, Associate Professor

David Clarke, Supervisor, Assistant Professor

Scott Lear, Supervisor, Professor

Maureen Ashe, External Examiner, Associate Professor, Department of Family Practice, University of British Columbia

Date Defended/Approved: November 20, 2018


Ethics Statement


Abstract

Monitoring older adult physical activity is central to surveillance of individual and population health as well as delivery and evaluation of health promotion programs, and it requires reliable and valid measurement tools. I investigated step count test-retest reliability and criterion validity across consumer-grade activity monitors in community-dwelling older adults with and without self-reported mobility limitations during over-ground walking (n = 36; mean age 71.4 years). I evaluated six activity monitors (Fitbit Charge, Fitbit One, Garmin vívofit 2, Jawbone UP2, Misfit Shine, and New-Lifestyles NL-1000) during two 100-step walks, one continuous 400-metre walk, and one interrupted 400-metre walk. On average, all monitors undercounted steps. Step counts from hip-worn monitors generally exhibited better reliability and validity than those from wrist-worn monitors. Mobility status did not affect monitor step count errors, but interruptions to walking negatively impacted criterion validity. The hip-worn Fitbit One was the only monitor with sufficiently high test-retest reliability and criterion validity.

Keywords: older adults; activity monitors; step counting; mobility; reliability; validity


Dedication

To my dad.

Thank you for ensuring I was always prepared.


Acknowledgements

Foremost, I want to begin by thanking my senior supervisor, Dr. Dawn Mackey. I am ever grateful for the opportunities you have provided me, the hours you have spent in one-on-one meetings and writing thoughtful feedback on my work, and your overwhelming support. Thank you for your patience and understanding, especially within this last year. I admire your dedication to your students, work, and family and the wonderful environment that surrounds you. You are someone I look up to and it has been a pleasure to work with and learn from you over the last four years as a co-op student, research assistant, tutor-marker, and graduate student. Thank you.

Thank you to my supervisory committee, Drs. David Clarke and Scott Lear. I am greatly appreciative of your willingness to advance my learning, insightful advice, and various contributions to my thesis work. Thank you, Dr. Maureen Ashe, for taking the time to act as the external examiner for my defence and to review my thesis. Thank you, Dr. Meghan Donaldson, for your guidance during my training at the Centre for Hip Health and Mobility and in the development of the STRIDES Study. I am thankful for the abundance of thoughtful feedback and generous support from all throughout the course of my studies.

I would like to acknowledge my fellow students, committee and group members, and friends in the BPK department who helped shape my two degrees at Simon Fraser University. Thank you to the wonderful BPK department staff who always made asking for help an enjoyable experience. I owe a debt of gratitude to the current and former members of the Aging and Population Health Lab. Amy, Ashley, Chantelle, Emaan, Kristina, Michal, Nicole, Tim, and Valeriya, thank you for reading over my emails, attending and providing valued feedback on every presentation, and supporting me with your friendship during our time together.

This research would not have been possible without the generosity of the STRIDES Study participants. In addition, thank you to Amy, Kaitlin, the pilot participants, and the study volunteers for their time and dedication to the project.

Finally, I want to thank my family, friends, and loved ones for their support over the years. Mike, you have seen me through this degree and were always someone to lean on. Thank you for all that you have done to help me reach this point. Mom, thank you for being a source of stability and for your constant unwavering strength.


Table of Contents

Approval
Ethics Statement
Abstract
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Acronyms
Glossary

Chapter 1. Background
  1.1. Physical Activity in Canada
    1.1.1. Canadian Physical Activity Guidelines
    1.1.2. Older Adult Physical Activity Levels
  1.2. Interventions and Measurement for Physical Activity
  1.3. Step Counting
  1.4. Activity Monitors
  1.5. Evaluation of Measurement Instruments
  1.6. Activity Monitor Studies in Adults
  1.7. Activity Monitor Studies in Older Adults
    1.7.1. Wrist-Worn Monitors
    1.7.2. Hip-Worn Monitors
    1.7.3. Additional Considerations
      Mobility
      Sedentary, Ambulatory, and Household Activities
      Speed and Placement
  1.8. Research Gaps
  1.9. Objectives

Chapter 2. Test-Retest Reliability and Criterion Validity of Step Counts from Consumer-Grade Activity Monitors Worn by Older Adults: The STRIDES Study
  2.1. Introduction
  2.2. Methods
    2.2.1. Participants
    2.2.2. Descriptive Measures
    2.2.3. Outcome Measures
      Monitors
      Walking Trials
      Measurements
    2.2.4. Statistical Analysis
      Sample Size
      Descriptive Analysis
      Test-Retest Reliability
      Criterion Validity
  2.3. Results
    2.3.1. Participants
    2.3.2. Test-Retest Reliability
    2.3.3. Criterion Validity
    2.3.4. Activity Monitor Preferences
  2.4. Discussion
    2.4.1. Strengths
    2.4.2. Limitations
    2.4.3. Conclusion

Chapter 3. General Discussion
  3.1. Feasibility of Recruiting and Studying Older Adults with Intact- and Limited-Mobility
  3.2. Additional Challenges and Limitations
  3.3. Knowledge Translation
  3.4. Future Directions
  3.5. Conclusions

References

Appendix A. STRIDES Participant Information Questionnaire

Appendix B. STRIDES Physical Inactivity Questionnaire

Appendix C. STRIDES Functional Comorbidities Index

Appendix D. STRIDES Device and Technology Survey


List of Tables

Table 1.1 Physical activity recommendations for adults from Warburton et al., 2010

Table 1.2 Studies evaluating activity monitor step count validity and reliability within the older adult population

Table 1.3 Outcomes of studies evaluating activity monitor step count validity and reliability within the older adult population

Table 2.1 Description of activity monitors

Table 2.2 Participant characteristics


List of Figures

Figure 2.1 Images of activity monitors.86–91

Figure 2.2 Walking trials completed in a level hallway. Participants walked four laps of the continuous and interrupted courses to reach 400 metres.

Figure 2.3 Box plots with median absolute difference step count percent errors between test-retest reliability walk 1 (RW1) and walk 2 (RW2) for the six indicated activity monitors (n = 36 for all monitors except Misfit Shine, n = 28) (95CI = 95% confidence interval, SEM = standard error of measurement). Central rectangle spans the interquartile range (IQR), and the whiskers represent the inner fence (upper: Q3 + 1.5 × IQR, lower: Q1 – 1.5 × IQR). *Charge different than One (p = 0.017); Shine different than One (p < 0.001), NL-1000 (p < 0.001), and vívofit 2 (p = 0.004); UP2 different than One (p < 0.001), NL-1000 (p < 0.001), and vívofit 2 (p = 0.002).

Figure 2.4 Box plots with median step count percent errors for the indicated activity monitors (n = 72 for all monitors except the Misfit Shine, n = 67) and for walk interruptions (n = 213 for Continuous, n = 214 for Interrupted) in 36 older adults (95CI = 95% confidence interval). Central rectangle spans the interquartile range (IQR), and the whiskers represent the inner fence (upper: Q3 + 1.5 × IQR, lower: Q1 – 1.5 × IQR). Horizontal dotted lines represent zero step count percent error. *vívofit 2 different than Shine (p = 0.016); Charge different than Shine (p < 0.001), One (p < 0.001), and NL-1000 (p = 0.031); Interrupted different than Continuous (p = 0.018).

Figure 2.5 Bland-Altman analyses for each activity monitor (n = 72 for all monitors except the Misfit Shine, n = 67) and for walk interruptions (n = 213 for Continuous, n = 214 for Interrupted) compared to the criterion tally counts in 36 older adults. The solid lines represent the mean step count error (horizontal) and line of best fit (trend line). Dotted lines represent the limits of agreement (mean ± 1.96SD). Grey points represent self-reported mobility-intact participants, red points represent self-reported mobility-limited participants.

Figure 2.6 Activity monitor (n = 72 for all monitors except the Misfit Shine, n = 67) and walk interruptions (n = 213 for Continuous, n = 214 for Interrupted) equivalence tests. Mean step count errors (%) with 95% confidence intervals (95CI). Area between dotted vertical lines represents equivalence bounds (±5.0%).


List of Acronyms

ANOVA Analysis of Variance

BPK Biomedical Physiology and Kinesiology

COPD Chronic Obstructive Pulmonary Disease

COSMIN COnsensus-based Standards for the selection of health Measurement INstruments

CW 400-metre Continuous Walk

EPESE Established Populations for the Epidemiologic Study of the Elderly

GPS Global Positioning System

IAGG International Association of Gerontology and Geriatrics

ICC Intraclass Correlation Coefficient

IQR Interquartile Range

IW 400-metre Interrupted Walk

LIFE Study Lifestyles Interventions and Independence for Elders Study

MET Metabolic Equivalent

MoCA Montreal Cognitive Assessment

MVPA Moderate-to-vigorous physical activity

NHANES National Health and Nutrition Examination Survey

PAR-Q+ The Physical Activity Readiness Questionnaire for Everyone

RW 100-step Reliability Walk

RW1 100-step Reliability Walk 1

RW2 100-step Reliability Walk 2

SD Standard Deviation

SEM Standard Error of Measurement

SFU Simon Fraser University

SPPB Short Physical Performance Battery

STRIDES Step Tracking Reliability In Devices worn by Seniors

Tukey’s HSD Tukey’s Honest Significant Difference

95CI 95% Confidence Interval


Glossary

Community-Dwelling An older adult living independently in the community, not living in an assisted-living residence or long-term care.

Criterion Validity “How well the scores of the measurement instrument agree with the scores of the gold standard.”1,2

Mobility Limitation Any self-reported difficulty walking a quarter of a mile (2-3 blocks) outside on level ground and/or walking up a flight of stairs (about 10 steps) without resting.3–5

Reliability “The extent to which scores for participants who have not changed are the same for repeated measurement under several conditions, e.g., using different sets of items from the same multi-item measurement instrument (internal consistency); over time (test-retest); by different assessors on the same occasions (inter-rater); or by the same assessors (i.e. raters or responders) on different occasions (intra-rater).”1,2

Validity “The degree to which an instrument truly measures the construct(s) it purports to measure.”1,2


Chapter 1. Background

As the number and proportion of older adults increase worldwide,6,7 there will be concomitant increases in the prevalence of chronic diseases and functional limitations,8,9 as well as healthcare utilization.10,11 Physical inactivity is a major contributor to chronic disease, mortality, and poor physical function in older adults.8,12 Fortunately, physical activity is a modifiable risk factor, and adequate levels of physical activity decrease the risk of negative health outcomes.8,12

Interventions to increase physical activity have proven effective at increasing physical activity levels in older adults.10,13–18 However, the measures of physical activity used to assess intervention effects have been variable across studies, and their reliability and validity among older adults are not certain.19 To successfully assess the adherence to and effects of long-term interventions, reliable and valid tools for measuring physical activity must be identified.

In my thesis, I aimed to evaluate the test-retest reliability and criterion validity of step counts from consumer-grade activity monitors in older adults. In this Chapter, I start by describing physical activity levels of Canadian older adults in relation to our national physical activity guidelines. I then review the effectiveness of interventions to increase physical activity. This serves to provide general context for my research and specific justification for the importance of physical activity measurement. I next describe how step counts from activity monitors can be used as a measure of physical activity. Additionally, I outline the importance of test-retest reliability and criterion validity of a measurement tool. I then provide an overview of studies that have evaluated the reliability and validity of step counts from activity monitors in adults and older adults. I conclude this Chapter with a summary of the most critical research gaps in the currently available literature and present the specific aims of my thesis.


1.1. Physical Activity in Canada

1.1.1. Canadian Physical Activity Guidelines

The current Canadian physical activity guidelines recommend that older adults (65 years and older) should “accumulate at least 150 minutes of moderate-to-vigorous intensity aerobic physical activity per week, in bouts of 10 minutes or more… perform muscle and bone strengthening activities using major muscle groups, at least two days per week… [and] those with poor mobility should perform physical activities to enhance balance and prevent falls”.20,21 By adhering to these guidelines, individuals can significantly reduce their risk of premature death and a range of chronic diseases including coronary artery disease, stroke, hypertension, colon cancer, breast cancer, type 2 diabetes, and osteoporosis.12,20–22 Sufficient levels of physical activity can also help to maintain functional independence, mobility, bone health, and mental health, as well as improve fitness and improve or maintain body weight.20,21

Walking is a common form of daily physical activity;23 thus, its relation to the physical activity recommendations is of interest. In studies conducted with younger adults, a cadence of 100 steps per minute equates to a moderate-intensity walk on flat terrain.24,25 Therefore, walking 3,000 steps in 30 minutes on 5 days of the week would meet the physical activity recommendation of 150 minutes of moderate-to-vigorous physical activity (MVPA) per week.24,25 Activity intensity was verified using metabolic equivalents (METs), where 1 MET equates to 3.5 ml O2/kg/min or 1 kcal/kg/hour.24,25 It is important to note that the authors caution against using step rates based on pedometers as a substitute for METs because of the large prediction error.24,25 In addition, Tudor-Locke and colleagues emphasized that MVPA should be completed in addition to the activities of normal living.24 Accordingly, daily step count recommendations for older adults are approximately 7,000 to 10,000 steps per day, with at least 3,000 steps completed at a moderate-to-vigorous pace.24
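The arithmetic linking cadence to the weekly recommendation can be sketched as follows. This is a minimal illustration, assuming the cadence of 100 steps per minute, the weekly target of 150 minutes, and the MET definition of 3.5 ml O2/kg/min quoted above. The function and constant names are hypothetical, and the 3-MET input is only an example value. It is not a validated method for estimating intensity from step rates, which the cited authors caution against.

```python
# Illustrative arithmetic only. Names are hypothetical; the numeric values
# are the figures quoted in the text, not a validated intensity algorithm.

MODERATE_CADENCE = 100          # steps per minute, moderate walk on flat terrain
WEEKLY_MVPA_TARGET_MIN = 150    # recommended minutes of MVPA per week
ML_O2_PER_KG_MIN_PER_MET = 3.5  # 1 MET = 3.5 ml O2/kg/min


def steps_for_bout(minutes, cadence=MODERATE_CADENCE):
    """Steps accumulated in one bout walked at a constant cadence."""
    return minutes * cadence


def weekly_mvpa_minutes(bout_minutes, days_per_week):
    """Weekly MVPA minutes from one equal-length bout on each active day."""
    return bout_minutes * days_per_week


def mets_to_vo2(mets):
    """Convert METs to oxygen uptake in ml O2/kg/min."""
    return mets * ML_O2_PER_KG_MIN_PER_MET


bout_steps = steps_for_bout(30)              # 30 min at 100 steps/min
weekly_minutes = weekly_mvpa_minutes(30, 5)  # 30 min on 5 days
print(bout_steps)                                # 3000
print(weekly_minutes >= WEEKLY_MVPA_TARGET_MIN)  # True
print(mets_to_vo2(3))                            # 10.5
```

A 30-minute bout at the moderate cadence yields the 3,000 steps cited in the text, and five such bouts per week reach the 150-minute target.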

The current Canadian physical activity guidelines are supported by a large body of scientific evidence that was summarized in a 2010 systematic review.12 The review examined the dose-response relationship between adult physical activity and all-cause mortality as well as the incidence of cardiovascular disease, stroke, hypertension, type 2 diabetes, colon cancer, breast cancer, and osteoporosis (Table 1.1). In many situations there was a linear dose-response relationship: as physical activity increased, health benefits also increased. The review also provided clear support for physical activity, throughout the adult lifespan, as an effective preventative strategy against premature mortality and the aforementioned chronic conditions.

An additional systematic review further contributed to the development of the Canadian physical activity guidelines. Paterson and Warburton examined the relationship between physical activity and the maintenance of functional independence in older adults.8 Their review included studies with community-dwelling older adults between the ages of 65 and 85 years. Sixty-six studies published between 1995 and 2009 with over 80,000 participants were reviewed, including 35 prospective cohort studies of physical activity and functional limitations in older adults. Generally, higher levels of physical activity resulted in higher functional status. They established that older adults could reduce their relative risk of morbidity, mortality, and loss of independence by 30% or more by meeting the physical activity recommendations. In addition, further benefits occur with more physical activity and greater fitness gains (~60% reduction in risk). Unfortunately, the majority of Canadian older adults are not meeting the current physical activity recommendations.


Table 1.1 Physical activity recommendations for adults from Warburton et al., 2010

Condition: All-cause mortality
Number of articles: 70
Published (years): 1985 - 2007
Number of participants: 1,525,377
Results:
• Dose-response relationship between premature all-cause mortality and physical activity
• 31% risk reduction with physical activity, compared to physical inactivity
• With only small increases in physical activity, physically inactive and sedentary individuals will see a significantly reduced risk in all-cause mortality
Recommendation: “For a reduced risk for premature mortality, it is recommended that individuals should participate in 30 min or more of moderate to vigorous exercise on most days of the week. Greater health benefits appear to occur with higher volumes and/or intensities of activity.”12

Condition: Cardiovascular disease
Number of articles: 49
Published (years): 1975 - 2007
Number of participants: 726,474
Results:
• Graded inverse dose-response relationship between cardiovascular disease and physical activity/fitness
• 36% risk reduction in the incidence of cardiovascular disease
Recommendation: “For a reduced risk for cardiovascular disease-related events and mortality, it is recommended that individuals participate in 30 min or more of moderate to vigorous exercise on most days of the week. Greater health benefits appear to occur with high volume and/or intensities of activity. Health benefits may also occur with as little as one hr of brisk walking per week.”12

Condition: Stroke
Number of articles: 25
Published (years): 1993 - 2007
Number of participants: 479,336
Results:
• Strong evidence that physical activity is associated with a reduced risk of stroke
• 31% mean risk reduction
Recommendation: “For a reduced risk of stroke, it is recommended that individuals should participate in 30 min or more of moderate to vigorous exercise on most days of the week. Brisk walking appears to be protective against the development of stroke. It remains to be determined whether lower volumes of physical activity lead to a reduced risk for stroke.”12

Condition: Hypertension
Number of articles: 12
Published (years): 1983 - 2007
Number of participants: 112,636
Results:
• Moderate to vigorous intensity physical activity can lead to a noticeably reduced risk
• 32% mean risk reduction with physical activity/fitness
Recommendation: “For a reduced risk for hypertension, it is recommended that individuals should participate in 30 min or more of moderate to vigorous exercise on most days of the week.”12

Condition: Colon (c) and breast (b) cancer
Number of articles: 33 (c), 43 (b)
Published (years): 1985 - 2008 (c), 1993 - 2007 (b)
Number of participants: 1,433,103 (c), 1,861,707 (b)
Results, colon cancer:
• In most studies, there was a dose-dependent relationship
• 30% mean risk reduction when comparing most active group to the least active group
Results, breast cancer:
• Reduced risk of developing breast cancer is strongly associated with physical activity
• 20% mean risk reduction when comparing most active group to the least active group
• A dose-response relationship
Recommendation: “For a reduced risk for site specific cancers (such as colon cancer and breast cancer), it is recommended that individuals should participate in 30 min or more of moderate to vigorous exercise on most days of the week.”12

Condition: Type 2 diabetes
Number of articles: 20
Published (years): 1991 - 2007
Number of participants: 624,952
Results:
• Inverse relationship between type 2 diabetes and increasing levels of physical activity
• 42% mean risk reduction when comparing most active group to the least active group
• With only small increases in physical activity, physically inactive and sedentary individuals will see a significantly reduced risk for type 2 diabetes
Recommendation: “For a reduced risk for type 2 diabetes, it is recommended that individuals should participate in 30 min or more of moderate to vigorous exercise on most days of the week.”12

Condition: Osteoporosis
Number of articles: 2
Published (years): 2008
Number of participants: 8,790
Results:
• Unclear if there is a dose-response relationship
• Current guidelines are sufficient to maintain and improve bone health based off of preliminary evidence
• Further research is needed
Recommendation: “For a reduced risk for osteoporosis, it is recommended that individuals should participate in load bearing activities for 30 min or more on most days of the week.”12

Condition: Musculoskeletal fitness and health
Number of articles: n/a
Published (years): n/a
Number of participants: n/a
Results:
• Limited, but there is evidence it is positively associated with health status
Recommendation: “For improved health status and reduced risk for chronic disease and disability, it is recommended that individuals should include daily activities that tax the musculoskeletal system”12


1.1.2. Older Adult Physical Activity Levels

Physical inactivity among Canadian adults accounted for an estimated $6.8 billion of the total healthcare costs in 2009.26 Older adults contribute substantially to these costs: according to the 2012 and 2013 Canadian Health Measures Survey, 88% of older Canadians (60 to 79 years) are not meeting the national physical activity recommendations of 150 minutes of MVPA per week.9 Adults between the ages of 18 and 39 (32%) and between 40 and 59 (18%) met the guidelines more often than older adults. Adults between the ages of 60 and 79 were the most sedentary group, accumulating about 10 hours per day of sedentary time and only about 14 minutes of MVPA per day.

Normative data from the National Health and Nutrition Examination Survey (NHANES) indicated that, on average, individuals 60 years and older took between 2,749 and 4,490 steps per day.27 Furthermore, within defined age groups, steps per day decreased as age increased. Individuals between the ages of 65 and 69 walked, on average, between 3,302 and 5,269 steps per day, while those aged 85 and up walked between 667 and 1,415 steps per day. These step counts represent exceptionally low levels of physical activity and do not meet the recommendation, mentioned previously, of 7,000 to 10,000 steps per day.

It is evident that physical inactivity is prevalent among older adults. Interventions are needed, as this population is at risk for several chronic diseases, premature mortality, and functional decline due to inactivity. There are a number of strategies that may help to mitigate low levels of physical activity, including launching community-wide physical activity campaigns, providing social support interventions, adapting the built environment, and/or promoting active transport.28 Promisingly, physical activity or exercise promotion programs have been shown to improve physical activity in the older adult population.10,13–18

1.2. Interventions and Measurement for Physical Activity

Numerous reviews over the last two decades have published the positive effects

of interventions on older adult physical activity levels.10,13–18 In terms of adherence and

effectiveness, participation rates were high for various programs (e.g., home-based)10,17,18

and there is strong evidence that older adults will successfully adhere to physical activity

interventions up to 12 months in duration. Regarding changes in physical activity,

treatment groups had higher levels of physical activity post-intervention than control

groups, and these levels were often maintained.10,13–18 Moreover, a large

number of short-term group-based studies (less than one year) saw significant increases

in physical activity compared to education programs.18 Overall, these reviews found that

interventions can improve physical activity levels and outcomes in older adults.10,13–18

Many studies included in these aforementioned reviews relied on self-report

measures of physical activity.10,13,15–17 For example, a 2016 review by Falck and

colleagues found that most older-adult physical activity intervention studies used self-

report measures of activity; however, the measures used had limited evidence of reliability

and validity.19 Forty-four studies were included in their review, with 33 different measures

of physical activity. Thirty-two studies implemented self-report measures, while only 12

relied on objective measures (e.g., accelerometers). Additional reviews have indicated the

limitations of self-report in detecting changes in older adult physical activity levels.19,29

While self-report measures can be simple and inexpensive, they are subject to recall bias,

which can compromise the reliability and validity of physical activity measurements, and

in turn, negatively impact the assessment of changes in physical activity in response to

intervention.16,19

Objective measures of physical activity can provide reliable and valid

measurement.19 These measures are often more costly but they eliminate recall bias and

their use is recommended in order to reach accurate conclusions about interventions.19

Evidence that identifies low-cost, valid, and reliable objective methods for long-term

monitoring of physical activity levels in older adults is limited, and this represents a

significant barrier to implementing and comparing the results of physical activity

interventions.18,19,30–32

1.3. Step Counting

In Canada, walking is the most common leisure-time physical activity, with 63% of

men and 76% of women regularly partaking in walking in 2005.23 Walking is accessible for

a large number of older adults and is a preferred form of physical activity.33 As a result,

step counts are a natural measure of physical activity behaviour in older adults and have

been widely used as a measure of walking and physical activity in previous research. Step

counting is low cost, simple, and can be implemented on a large scale (e.g., community

physical activity interventions). Moreover, it is desirable for interventions that involve

physical activity goal setting as participants can easily understand the outcome, a target

number of steps can be set, and values are easy to compare.34

1.4. Activity Monitors

Pedometers are the simplest type of activity monitor, and they count the number

of steps taken per day. Total steps accumulate throughout the day and are displayed on

a digital screen for the user.34 A traditional pedometer records steps when the vertical

acceleration of the hip (where the device is worn) exceeds the predetermined force-

sensitivity threshold and the internal lever is disturbed to the point where the electronic

circuit is completed.34 The device is outfitted with a horizontal, spring-suspended lever arm

which moves up and down when the hip produces vertical accelerations.34 Some newer

monitors generate sine waves based on hip accelerations via a piezoelectric

accelerometer.34 When used in research settings, pedometer-based interventions lead to

increases in physical activity and steps walked per day.34 In addition, the effectiveness of

pedometers has been linked to their sensitivity to walking, the simplicity of step counts as

a measure of physical activity, high user acceptance, clear and immediate feedback of

physical activity behaviour, and the ability to easily self-monitor behaviour and set goals.34

Although older adults are comfortable and familiar with pedometers,35 traditional

pedometers cannot be synced (connected) to a smartphone, tablet, or web-based

application. Therefore, research studies involving pedometers require manual data entry

of step counts by participants or researchers, which is undesirable for long-term monitoring

studies.35

Newer activity monitors are capable of counting steps and wirelessly syncing data.

Thus, researchers can receive information on the intensity, frequency, duration, and timing

of steps without direct input from the study participant.35 Moreover, within the last fifteen

years, numerous consumer-grade wearable physical activity monitors (activity monitors,

a.k.a. activity trackers, electronic activity monitors, fitness or activity trackers, and

wearable activity trackers/monitors)35,36 have become available on the market. The

wearable activity monitor market is continuing to expand as monitors become more

affordable and customizable.35 In 2017, 115.4 million wearable monitors were shipped (an

increase of 10.3% from the previous year).37 As of the end of 2017, Apple and Fitbit topped

the fitness wearables market.37 Since their development, several studies have explored

the use of consumer-grade activity monitors to assess step counts.36,38 These monitors

have been used in physical activity interventions for reinforcement, goal-setting, self-

monitoring, and measurement.36,39–44

Current consumer-grade activity monitors are affordable and have several features

in addition to step counting, which make them a promising technology for long-term physical

activity monitoring. These monitors allow for goal-setting and have smartphone, tablet, or

web-based applications that can be personalized and provide physical activity and health

recommendations or tips to the user.35 Data, such as steps, calories of energy

expenditure, distance walked, and sleep characteristics, can be stored and accessed

through these applications which provide minute-by-minute to yearly overviews.35,36 There

are a variety of brands that produce activity monitors (e.g., Fitbit, Misfit), and each brand’s

products differ in capabilities, battery life, display features, and cost.45

Brands have also created multiple monitors to satisfy a range of consumers such as less

expensive monitors with fewer features and more costly monitors with global positioning

systems (GPS). Some monitors allow for competition with peers or an online community

and help promote self-monitoring and goal setting (daily or long-term).36

To detect steps, activity monitors are outfitted with an accelerometer. An

accelerometer can detect acceleration in one (uni), two (bi), or three (tri) directions.

Research-grade triaxial accelerometers, such as the ActiGraph and StepWatch, have

been regarded as gold standards in the measurement of physical activity.46,47

Unfortunately, research-grade monitors are typically expensive, have extra software costs,

may not be comfortable for the user, and do not provide instantaneous feedback.45,48

Consumer-grade activity monitors may be more suitable when cost, comfort, and immediate

feedback matter, as well as for health care monitoring with clinicians.

Consumer-grade activity monitors are typically equipped with a three-axis

accelerometer.49 They have predefined and proprietary algorithms that are designed to

detect motion patterns that resemble walking.49 When the individual wearing the monitor

moves enough to pass the predefined acceleration threshold, a step will be recorded.49 It

is possible that movements other than steps can pass such thresholds and incorrectly be

recorded as steps, or that thresholds may not be met while walking (e.g., walking on a soft

surface like a plush carpet).49 Activity monitors also typically estimate distance walked,

calories burned, as well as sleep and physical activity duration. Some monitors have been

outfitted with altimeters (to count floors walked), heart rate monitors, and GPS. Most

current monitors are designed to be worn on the wrist; however, some can be clipped onto

the waistband, pocket, or bra (e.g., Fitbit Zip), while others can be worn on the wrist or in

a necklace pendant (e.g., Fitbit Flex 2).
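The threshold principle described above can be illustrated with a short Python sketch. It counts a step each time the magnitude of a three-axis acceleration signal rises above a fixed threshold. This is a simplified illustration only: the algorithms in commercial monitors are proprietary, and the function name `count_steps`, the 1.2 g threshold, and the three-sample minimum gap between steps are assumptions chosen for the example, not values taken from any monitor.

```python
import math


def count_steps(samples, threshold=1.2, refractory=3):
    """Count upward threshold crossings of the acceleration magnitude.

    samples:    iterable of (x, y, z) accelerations in g
    threshold:  magnitude in g that must be exceeded (assumed value)
    refractory: minimum number of samples between counted steps (assumed value)
    """
    steps = 0
    above = False              # was the previous sample above the threshold?
    since_last = refractory    # lets the very first crossing be counted
    for x, y, z in samples:
        magnitude = math.sqrt(x * x + y * y + z * z)
        since_last += 1
        # Count only a new upward crossing that is far enough from the last step.
        if magnitude > threshold and not above and since_last > refractory:
            steps += 1
            since_last = 0
        above = magnitude > threshold
    return steps


quiet = (0.0, 0.0, 1.0)        # standing still: gravity only
firm_step = (0.0, 0.0, 1.5)    # heel strike on a hard surface
soft_step = (0.0, 0.0, 1.1)    # damped heel strike (e.g., plush carpet)
arm_shake = (1.0, 0.0, 1.0)    # non-step movement

print(count_steps(([quiet] * 4 + [firm_step]) * 5))   # 5 steps taken, 5 counted
print(count_steps(([quiet] * 4 + [soft_step]) * 5))   # 5 steps taken, 0 counted
print(count_steps([quiet, arm_shake, quiet]))         # 0 steps taken, 1 counted
```

The last two calls reproduce, in miniature, the two failure modes noted above: real steps that never reach the threshold are missed, and other movements that exceed it are recorded as steps.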

General adoption of technology is increasing in older adults, and older adults have

also been shown to be accepting of activity monitors. Mercer and colleagues investigated older

adults’ perceptions of these monitors.35 Their mixed-methods study tested four monitors

(Fitbit Zip, Jawbone UP24, Misfit Shine, and Withings Pulse) and assessed participants’

responses through thematic analysis. Thirty adults aged 50 and older first wore a

pedometer (Sportline or Mio) for three days, then each activity monitor for three days.

If participants did not own a smartphone, they were lent one for the study. Participants

synced their data with their smartphone or tablet and recorded their daily step counts. In

general, participants enjoyed using the monitors and preferred them over the pedometer.

They found that the activity monitors could be helpful for motivation and promotion of self-

awareness, and felt that they should be used for health care purposes rather than entertainment.

When given the option, older adults would choose a newer activity monitor over a simple

pedometer (rated as the least preferred option). These results are positive, as adoption

and uptake of these devices by older adults could lead to simpler implementation in long-

term monitoring studies.

The invention and advancement of consumer-grade activity monitors provide

researchers with new opportunities for physical activity measurement. Data, including the

number of steps, can be tracked over time and may be used to gain insights into the older

population’s daily physical activity.48 These monitors may be able to detect the lower

levels of physical activity that make up the majority of older adults’ daily life, which are

often underestimated or missed in self-report questionnaires or diaries.48 Unfortunately,

manufacturers release limited information on the reliability and validity of their measures

and estimations. In order for researchers to use activity monitors in intervention studies,

we must determine how well they work in specified target populations by evaluating the

reliability and validity of step counting.48

1.5. Evaluation of Measurement Instruments

To confidently use an activity monitor to measure intervention-induced changes in

physical activity (i.e., steps) in an older adult population, the monitor must be reliable and valid in

that population.19 If monitors are used without first establishing their reliability

and validity, false negatives or inconsistent results can follow, and researchers are at

risk of reporting incorrect conclusions.19

The COnsensus-based Standards for the selection of health Measurement

INstruments (COSMIN) defines reliability and validity as follows:1,2

Reliability: “the extent to which scores for patients/participants who have not changed are the same for repeated measurement under several conditions, e.g., using different sets of items from the same multi-item measurement instrument (internal consistency); over time (test-retest); by different persons on the same occasions (inter-rater); or by the same persons (i.e. raters or responders) on different occasions (intrarater).”1,2

Validity: “The degree to which an instrument truly measures the construct(s) it purports to measure.”1,2

Criterion Validity: “How well the scores of the measurement instrument agree with the scores of the gold standard,”1,2 e.g., how well the activity monitor step count agrees with the direct observation of steps (tally count).

A number of methodologies have been applied to evaluate the reliability and

validity of activity monitors. With respect to reliability, previous studies have assessed the

test-retest (intradevice) reliability, which indicates consistency within the same monitor,

and the interdevice reliability, which expresses consistency across a brand or monitor

while worn at the same time in the same location.36 Regarding validity, researchers have

evaluated the criterion and construct validity. The former compares the monitor step count

to a criterion measure (e.g., tally count, research-grade accelerometer) and the latter

involves comparing an activity monitor to another construct and determining if there is

convergent (positively correlated) or divergent (negatively correlated) validity.36

When investigating the measurement properties of an instrument, the focus is on

quantifying measurement error: the difference between the observed value and the true

value.50 According to Hopkins, the most important aspects of measurement error are test-

retest reliability and concurrent validity (criterion validity).50 Regarding reliability,

measurement error has an impact on measuring changes between repeated

measurements (or a single measurement).50 It is studied through repeated measurements

on a reasonable number of individuals.50 The most important error is the random error

(within-subject variation). This random error is the random variation between trials when

one person is tested repeatedly.50 A smaller error indicates a better measure and makes

a change in performance easier to detect.50 Validity is concerned with the agreement

between the activity monitor (observed) and the criterion step count measure (truth).50

Again, a smaller error value represents a better measure.50
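The two error concepts above can be made concrete with a small Python sketch. For reliability, one common way to estimate the within-subject (random) error from two repeated trials is to divide the standard deviation of the between-trial differences by the square root of two, often called the typical error. For validity, the simplest summary is the mean difference between the monitor and the criterion. The function names and the step counts below are hypothetical and serve only to illustrate the calculations; they are not data or methods taken from any study cited here.

```python
import math
import statistics


def typical_error(trial1, trial2):
    """Within-subject error: SD of the trial-to-trial differences / sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / math.sqrt(2)


def mean_bias(observed, criterion):
    """Mean of (observed - criterion); negative values mean undercounting."""
    diffs = [o - c for o, c in zip(observed, criterion)]
    return statistics.mean(diffs)


# Hypothetical step counts for four participants walking the same course twice.
trial_1 = [210, 198, 225, 240]
trial_2 = [212, 195, 226, 236]
print(typical_error(trial_1, trial_2))

# Hypothetical monitor counts against a tally count of 100 steps per walk.
print(mean_bias([98, 95, 101], [100, 100, 100]))   # -2: undercounts by 2 steps on average
```

A typical error close to zero means that repeated walks by the same person give nearly the same count, while a mean bias close to zero means the monitor neither overcounts nor undercounts on average.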

1.6. Activity Monitor Studies in Adults

A 2015 systematic review from Evenson and colleagues compiled studies that

assessed the reliability and validity of popular consumer-grade activity monitors (Fitbit and

Jawbone).36 They focused on the step, distance, physical activity, energy expenditure, and

sleep tracking features of these monitors. Their search excluded studies on special

populations (e.g., stroke, traumatic brain injury, COPD, assisted-living). Twenty-two

studies were included in the systematic review; 20 tested at least one Fitbit monitor

(Classic, Ultra, One, Zip, and Flex), and eight reported on Jawbone monitors (UP and

UP24). The majority of the studies were from the United States with data collected

between 2010 and 2015. One study focused on older adults51 and two on youth

populations52,53; the majority included generally healthy individuals 18 years and older.

In the systematic review, most studies evaluated the validity of one or more activity

monitors, with sample sizes ranging from six30 to 65.53 Twelve studies focused on the validity

of step counts from Fitbit monitors.30–32,54–62 Four studies evaluated the validity of step

counts from Jawbone monitors.31,32,58,60 Criterion measures included direct observation

(tally count58–60 or video recording30,31,54,56), pedometers,62 and research-grade

accelerometers.32,55,57,61,62 In terms of validity, hip-worn monitors were superior to wrist-

worn monitors,30,56,58,60 and one study found an ankle-worn Fitbit One to perform better

than a hip-worn monitor.54 In some controlled setting studies, the Fitbit Ultra,59,60 One,31,58

Zip,58 and Jawbone UP60 monitors correlated well with direct observation and the Fitbit

Classic,57 Ultra,55,57 Zip,62 One,32 and UP32 monitors correlated well with criterion

accelerometers. During treadmill tests, some monitors undercounted steps when

compared to direct observation (One,56 Flex,56,58 Ultra,59 and UP2458) and criterion

accelerometers (One,61 Flex,56,58 UP,32,61 and UP2458). Lastly, after two days of free-living

monitoring, a study reported that the Fitbit One overcounted steps compared to a criterion

accelerometer.32 Overall, validity was generally high for activity monitors (especially during

treadmill testing), better in hip-worn monitors, and steps were, on average, undercounted.

No studies assessed the intradevice reliability (test-retest) and only five

studies31,56,57,59,63 investigated the interdevice reliability of step counting for Fitbit monitors.

Sample sizes ranged from one59,63 to 30.31 In multiple studies, the Fitbit Classic and Ultra

performed reliably during treadmill walking.57,59 Studies also found that, when walking or

running on the treadmill, speed and hip placement or wrist placement (left vs right) did not

negatively impact reliability.31,56 Lastly, in a free-living study, Fitbit Ultras had high reliability

(high ICC).63 Overall, the interdevice reliability of activity monitors was generally high,

especially during treadmill testing.
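Because several of these studies summarize reliability with an intraclass correlation coefficient, a brief Python sketch of one common form may be useful. It computes the two-way random-effects, absolute-agreement, single-measurement ICC, usually written ICC(2,1), from the mean squares of a participants-by-devices table. The function name `icc_2_1` and the example values are hypothetical, and the studies reviewed here may have used other ICC forms.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per participant and one column per device."""
    n = len(scores)        # participants
    k = len(scores[0])     # devices (or trials) per participant
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Two devices that agree exactly for three participants.
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))   # 1.0
# The second device always reads one unit higher than the first.
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
```

In the second example the two devices rank participants identically, yet the coefficient falls to about 0.67 because this form of the ICC also penalizes a systematic offset between devices.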

Evenson and colleagues recommended several directions for future research.36

First, they advised continuing to test the validity and reliability of new monitors within a

company, testing during specific activities, and evaluating monitors in a range of

populations not only within controlled settings but also free-living settings.64 Additionally,

they recommended standardizing validity and reliability analyses65 because the studies in

their review used numerous analyses and inconsistent interpretations. For replication

purposes, monitors should be set up consistently and this information should be provided

in publications. It could also be important to include the monitor type, date purchased, and

date tested. Lastly, research should target the reliability of activity monitors. At the time of

the review, there were no intradevice (test-retest) reliability studies and no interdevice

reliability studies on any Jawbone monitors (or the Fitbit Zip).

An additional systematic review and narrative synthesis of Fitbit studies was

published in August 2018 with findings consistent with Evenson et al.66 Feehan and

colleagues examined 27 studies that tested the criterion validity of Fitbit monitors (21

studies with healthy adult populations, six with limited mobility or chronic disease

populations). Feehan et al. reported that, in controlled settings, Fitbit monitors had step

count errors within ± 3% approximately 50% of the time. In general, monitors

undercounted steps in these conditions and there was evident undercounting during very

slow walking. Moreover, placement of the device tended to affect step count error during

specific activities. Monitors had smaller step count errors during normal or self-paced

walking activities when worn at the hip, on the wrist when jogging, and on the ankle during

slow walking activities. In free-living scenarios, in healthy adults, monitors had step count

errors within ± 10%, around 50% of the time, when compared to research-grade monitors.

In these conditions, monitors generally overcounted steps but considerably undercounted

in older adults with limited mobility. In conclusion, hip-worn Fitbit monitors produced low

step count errors in adults with no mobility limitations when walking at normal or self-

selected speeds.

1.7. Activity Monitor Studies in Older Adults

It is important to evaluate the performance of activity monitors in older adult

populations because of the impact that age-related changes in gait, including reduced gait

speed, decreased stride length, and increased stride width, could have on the validity and

reliability of step counting.67 However, relative to the large number of studies conducted

with adult populations, limited research has been conducted to evaluate the validity and

reliability of activity monitors when worn by older adults.

Nine studies have specifically evaluated consumer-grade activity monitors in an older

adult population. Tables 1.2 and 1.3 provide an overview of the nine studies and the

evaluation of the step count validity and reliability of 12 different activity

monitors.30,45,46,54,68–72 These studies included a total of 312 older adults, 60 years of age

or older, who had a mean age of 73.6 years. Seven monitors were tested on the wrist:

Fitbit Charge HR,45 Fitbit Flex,46,69,70 Fitbit Ultra,30 Jawbone UP,69 Jawbone UP24,70 Misfit

Shine,45 and Polar A300.68 Seven monitors were tested on the hip: Fitbit One,54,69–71 Fitbit

Ultra,30 Fitbit Zip,70–72 Misfit Shine,45 Omron HJ-112,69 Omron HJ-720IT,70 and the

Samsung G-I9300 phone with “Noom” application.30 One monitor was tested on the ankle,

the Fitbit One.54 These studies used numerous step count criterion measures:

ActiGraph,45,46,71 Bodymedia Sensewear,68 direct observation by tally count70,71 and by

video,30,54,69 NL2000i,45 Shimmer3,72 and the StepWatch.69 Four out of nine studies (44%)

tested monitors in controlled settings,30,54,69,70 and five (56%) were conducted in free-

living scenarios with monitoring periods ranging from 24 hours to 7 days.45,46,68,71,72

While the majority of studies included individuals 60 years of age or older,

community-dwelling, and with unassisted ambulation,45,54,69,71 some studies had more

specific inclusion criteria. Thorup et al. recruited individuals 18 years and older who were

hospitalized with acute coronary syndrome, heart failure, coronary artery bypass grafting

or valve surgery.72 Their testing occurred in hospital and free-living conditions. Boeselt’s

group’s study population was individuals between 40 and 90 years diagnosed with

COPD.68 Lauritzen et al. recruited three groups of participants: healthy adults (25 to 45

years of age) with no gait disabilities, nursing home residents with normal mobility (≥ 65

years), and nursing home residents with reduced mobility (≥ 65 years) who used rollator

walking aids.30 Lastly, Alharbi and colleagues tested monitors on individuals diagnosed

with coronary heart disease or who had a family member with coronary heart disease.46

Studies used a variety of analyses to assess the step count validity and reliability

of these activity monitors in older adults. Due to the lack of consistency, it is difficult to

compare results across studies. The most common analyses included correlation30 (Pearson,46

Spearman Rank45), equivalency,68,69 intraclass correlation coefficient (ICC),45,68,69,71,72

limits of agreement,45,46,54,68,69,71,72 mean absolute percent error,30,69,70 and mean percent

error.45,54,69,71,72
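Three of the analyses listed above are simple enough to sketch in Python: mean percent error, mean absolute percent error, and 95% limits of agreement (the mean monitor-minus-criterion difference plus or minus 1.96 standard deviations of the differences). The function names and the example counts are hypothetical, and individual studies may define or report these statistics slightly differently.

```python
import statistics


def percent_errors(monitor, criterion):
    """Signed percent error of each monitor count relative to its criterion count."""
    return [100.0 * (m - c) / c for m, c in zip(monitor, criterion)]


def mean_percent_error(monitor, criterion):
    """Average signed error; over- and undercounts can cancel out."""
    return statistics.mean(percent_errors(monitor, criterion))


def mean_absolute_percent_error(monitor, criterion):
    """Average error size, ignoring direction."""
    return statistics.mean([abs(e) for e in percent_errors(monitor, criterion)])


def limits_of_agreement(monitor, criterion):
    """95% limits of agreement: mean difference +/- 1.96 * SD of the differences."""
    diffs = [m - c for m, c in zip(monitor, criterion)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias - spread, bias + spread


# Hypothetical example: one walk undercounted by 10 steps, one overcounted by 10.
monitor = [90, 110]
criterion = [100, 100]
print(mean_percent_error(monitor, criterion))            # 0.0
print(mean_absolute_percent_error(monitor, criterion))   # 10.0
print(limits_of_agreement(monitor, criterion))
```

The example shows one reason results are hard to compare across studies: the same data give a mean percent error of 0% because the two errors cancel, but a mean absolute percent error of 10%.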

1.7.1. Wrist-Worn Monitors

In general, during free-living conditions, wrist-worn monitors overcounted steps but

correlated well with criterion monitors. Alharbi and colleagues found that when individuals

with heart disease wore the Fitbit Flex for four days, regardless of sex or cardiac diagnosis,

it overcounted steps (~1000 steps/day) and was significantly correlated with the criterion

(ActiGraph).46 Boeselt’s group found that the Polar A300 was significantly correlated with

the criterion (Bodymedia Sensewear) after a three-day monitoring period and overcounted

the number of steps per day (~180 to 600 steps/day).68 Farina et al. discovered after a

seven-day monitoring period that the Misfit Shine undercounted by about 10 steps per day

and the Fitbit Charge HR overcounted by about 1700 steps per day compared to the criterion

monitors (NL2000i and ActiGraph) in community-dwelling older adults.45

1.7.2. Hip-Worn Monitors

Overall, step count errors per day were inconsistent for hip-worn monitors (some

overcounted while some undercounted) in select populations. Farina et al. found that the

Misfit Shine undercounted by approximately 630 steps per day compared to the NL2000i

and overcounted by about 170 steps per day compared to the ActiGraph.45 Paul and

colleagues reported the Fitbit (One or Zip) had good agreement with the criterion

(ActiGraph) and excellent agreement with direct observation (tally count) after a two-

minute walk test with community-dwelling older adults.71 Furthermore, after a seven-day

monitoring period, there was excellent agreement between the Fitbit and ActiGraph and

the Fitbit overcounted approximately 750 steps per day.71 Lastly, Thorup’s group tested

the Fitbit Zip in individuals with cardiovascular disease in hospital and at home for 24

hours.72 Step count percent error in hospital was -47%, while at home the percent error

was smaller in magnitude at -28%. The Zip undercounted by about 2400 steps per day compared to the

criterion (Shimmer3).

1.7.3. Additional Considerations

Mobility

Floegel and colleagues assessed the validity of five activity monitors during a 100-

metre walk in four different populations: older adults with no walking impairment, walking

impairment, cane use, and walker use.69 They found that monitors undercounted steps for

older adults with no walking impairment (2.6% to 26.9% mean error) and only the Fitbit

One was equivalent to the criterion (direct observation). Additionally, the monitors

undercounted for older adults with walking impairment (1.7% to 16.3% mean error) and

only three monitors were equivalent to the criterion (StepWatch, Fitbit One, and Omron).

As expected, monitors undercounted steps for cane- and walker-users (≥ 1.8% StepWatch;

> 11.5% all other monitors).

In 2013, Lauritzen et al. tested two monitors on a 20-metre path in nursing home

residents with reduced mobility (walker-dependent), nursing home residents with normal

mobility, and healthy adults.30 They found that older adults with reduced mobility had larger

mean absolute percent errors than healthy adults for the Fitbit Ultra at the hip and wrist

and the Samsung phone. Also, the older adults with reduced mobility had larger mean

absolute percent errors than older adults with normal mobility for the Fitbit Ultra at the wrist

and the Samsung phone. Gait assessment score was significantly negatively correlated

with absolute percent error while time to complete the walk and the number of steps taken

were positively correlated with absolute percent error. All monitors undercounted steps for

older adults with reduced mobility and did not record steps for 80% of the trials.

Sedentary, Ambulatory, and Household Activities

Nelson and colleagues tested five monitors during three types of activities

(sedentary, ambulatory, and household).70 Their sample included 10 participants in

multiple age groups (18 to 39, 40 to 59, and 60 to 90 years). Unfortunately, they did not

perform age-group specific analyses. When testing the monitors during sedentary

activities, no monitors were different than the criterion (direct observation). In addition,

when testing during ambulatory conditions, monitors were not different than the criterion

(except the Fitbit Zip) with mean absolute percent errors less than 10%. Monitors

performed best during walking (mean absolute percent error hip: 2% to 3%, wrist: 8% to 11%).

During stair walking and cycling, the mean absolute percent errors were 70% to

93% and 10% to 41%, respectively. Lastly, during household activities, four monitors

undercounted steps significantly (mean absolute percent errors 54% to 79%).

Speed and Placement

Simpson and colleagues evaluated the Fitbit One at various speeds on a treadmill

in a group of older adults.54 In their study, when worn on the hip, the Fitbit recorded zero

steps while participants walked at speeds between 0.3 and 0.5 m/s, but when worn at the

ankle it always recorded steps. The step count percent error was below 5% at speeds

between 0.5 and 0.9 m/s for the ankle-worn Fitbit One. When worn at the hip, the Fitbit

One had a mean percent error below 10% for only 0.8 and 0.9 m/s.

When taking these additional considerations together, individuals who require a

cane or walker will see large step count errors (e.g., > 11.5%) when using an activity

monitor69 and in extreme cases, monitors may not record any steps (100% error).30 In

addition, monitors have been shown to perform well during sedentary activities and walking, but

high errors occur when walking up stairs and performing household activities.70 Lastly,

speed affects step count percent error, and individuals will see better step

count performance at walking speeds of 0.8 m/s or faster.54

Table 1.2 Studies evaluating activity monitor step count validity and reliability within the older adult population

Paul et al., 2012⁷¹
Country: Australia
Sample size: 32
Mean age (SD), years: 67.7 (5.7)
Monitoring period: 2MWT; 7 days, waking hours, free-living
Inclusion: ≥ 60 years; community dwelling; regular (weekly) users of the internet via computer or tablet device; regular (at least once/week) out of home activity without physical assistance from another person
Exclusion: cognitive impairment; insufficient English language skills; had a progressive neurological condition or medical conditions precluding exercise; participating in >150 min/week of moderate intensity physical activity; had undergone a fall risk assessment in the past year

Lauritzen et al., 2013³⁰
Country: Spain
Sample size: 18 (RME n = 5; NME n = 7)
Mean age (SD), years: RME 87.6 (3.9); NME 84.1 (3.7); HA 35.3 (6.5)
Monitoring period: straight 20-metre path; controlled
Inclusion (RME): ≥ 65 years; Expanded Disability Status Scale⁷³ score of 6.5 - 7.0; utilize a rollator walking aid
Inclusion (NME): ≥ 65 years; Expanded Disability Status Scale⁷³ score < 6.0
Inclusion (HA): age 25 - 45; no gait disabilities
Exclusion: cognitive impairment; disabilities or other limitations that would hinder gait and correct placement of recording monitors

Simpson et al., 2015⁵⁴
Country: Canada
Sample size: 42
Mean age (SD), years: 73.0 (6.9)
Monitoring period: 15-metre; 8 trials (self-selected speed and 7 trials at 0.3 - 0.9 m/s, increments of 0.1 m/s); pace-setter; controlled
Inclusion: ≥ 65 years; able to walk independently for at least 30 metres with or without an assistive device
Exclusion: had a major medical condition that affected their ability to walk or had had major surgery in the past 12 months

Boeselt et al., 2016⁶⁸
Country: Germany
Sample size: 20
Mean age (SD), years: 66.4 (7.4)
Monitoring period: 3 days, 24 hrs, free-living
Inclusion: 40 - 90 years; diagnosed with COPD stage I to IV; able to walk; involved in outpatient, multidisciplinary pulmonary rehabilitation program
Exclusion: lack of mobility, paralysis of the arm, diseases that exclude any physical activity

Nelson et al., 2016⁷⁰
Country: United States
Sample size: 30 (60 - 90 yrs: n = 10)
Mean age (SD), years: 48.9 (19.4)
Monitoring period: 11 activity protocols (3 sedentary, 4 household, and 4 ambulatory), 5 minutes each; self-selected pace; controlled
Inclusion: adults; no gait abnormalities; ability to participate in a variety of activities for at least 5 minutes each
Exclusion: acute illness; unstable chronic conditions; pregnancy

Alharbi et al., 2016⁴⁶
Country: Australia
Sample size: 48
Mean age (SD), years: 65.6 (6.9)
Monitoring period: 4 days, waking hours, free-living
Inclusion: diagnosed with coronary heart disease or family with coronary heart disease; completed phase II cardiac rehabilitation; engaged in regular physical activity of 30 min/day; able to participate for the full four-day study; able to apply and wear both monitors simultaneously
Exclusion: major comorbidities that would limit their ability to participate in regular physical activity during the study period or affect their ability to apply the monitors

Floegel et al., 2017⁶⁹
Country: United States
Sample size: 99
Mean age (SD), years: 78.9 (8.6)
Monitoring period: 100-metre walk; self-selected pace; controlled
Inclusion: ≥ 62 years; met the health conditions on modified Physical Activity Readiness Questionnaire-Revised including stable blood pressure for 3 months, no chest pain with activity or joint problems limiting physical activity; able to walk 100 metres without stopping with/without an assistive device; willing to wear all activity monitors

Thorup et al., 2017⁷²
Country: Denmark
Sample size: 24
Mean age (SD), years: 67 (10.03)
Monitoring period: 24 hrs in hospital; 24 hrs at home; free-living
Inclusion: ≥ 18; hospitalized with acute coronary syndrome, heart failure, coronary artery bypass grafting or valve surgery
Exclusion: pregnant; breast feeding; non-Danish speaking; gait disorder or any other conditions that might affect walking

Farina and Lowry, 2018⁴⁵
Country: United Kingdom
Sample size: 25
Mean age (SD), years: 72.5 (4.9)
Monitoring period: 7 days, waking hours, free-living
Inclusion: 65 - 84 years; community-dwelling; independently ambulatory (walking aids excluded)

Abbreviations: HA: Healthy Adults; NME: Healthy Elderly; RME: Elderly with reduced mobility; 2MWT: 2-minute walk test.

Table 1.3 Outcomes of studies evaluating activity monitor step count validity and reliability within the older adult population

Authors, Year

Activity Monitors & Criterions

Step Count Analyses Step Count Outcomes

Paul et al., 2012⁷¹

Fitbit One (h) Fitbit Zip (h) Did not evaluate monitors separately ActiGraph (c) 2MWT: Direct observation (tally count) (c)

• ICC

• Percent Error

• Limits of Agreement

• Good agreement between Fitbit & ActiGraph on 2MWT (ICC2,1 = 0.66, 95% confidence interval 0.41 - 0.82). Excellent agreement between Fitbit & visually counted steps on 2MWT (ICC2,1 = 0.88, 95% confidence interval 0.76 - 0.94)

• Excellent agreement between Fitbit & ActiGraph in mean steps/day over 7 days (ICC2,1 = 0.94, 95% confidence interval 0.88 - 0.97)

• No systematic bias in mean daily step counts

• Fitbit over-counted by 716.7 steps/day

Lauritzen et al., 2013³⁰

Fitbit Ultra (h, w) Samsung G-I9300 (h) with “Noom” application Direct observation (video) (c)

• Mean Absolute Percent Error

• Correlation

• Fitbit (h), absolute percent error significant difference (p = 0.008). RME performed more poorly than HA (p = 0.009)

• Fitbit (w), absolute percent error significant difference (p = 0.005). RME performed more poorly than NME (p = 0.003) and HA (p = 0.004)

• Samsung, absolute percent error significant difference (p = 0.015). RME performed more poorly than NME (p = 0.005) and HA (p = 0.017)

• Gait assessment score was significantly negatively correlated with the absolute percent error (poorer gait = larger absolute percent error), Fitbit (w) (rs = -0.663, p = 0.003), Samsung (rs = -0.658, p = 0.003), Fitbit (h) (rs = -0.670, p = 0.002)

• Time to complete the 20-metre walk and number of steps taken during the walk: significantly positively correlated with absolute percent error (Fitbit (h) rs = 0.778, p < 0.01/ rs = 0.666, p = 0.003; Fitbit (w) rs = 0.551, p = 0.018/ rs = 0.518, p = 0.028; Samsung rs = 0.561, p = 0.15/ rs = 0.490, p = 0.039)

• RME: large undercounting errors (Fitbit (h) -27.45% ± 79.9%, Fitbit (w) -99.64% ± 0.8% and Samsung -61.57% ± 25.5%, using non-absolute data)

• Fitbit (w) did not detect any steps in 80% of the trials and detected only 1.79% of the steps taken in trials where it collected data

Simpson et al., 2015⁵⁴

Fitbit One (h, a) Direct observation (video) (c)

• Percent Error

• Mean Difference

• Limits of Agreement

• Fitbit (h) recorded 0 steps for numerous participants at speeds of 0.3 - 0.5 m/s. Fitbit (a) never recorded zero steps at any speed

• Limits of agreement for Fitbit (a) narrower than limits of agreement for Fitbit (h) at all speeds

• Fitbit (a): percent error below 5% for all but the 2 slowest walking speeds (0.3, 0.4 m/s) and < 10% at speeds of 0.4 - 0.9 m/s

• Fitbit (h): percent error never below 5% and only < 10% at 0.8 and 0.9 m/s

• Error of Fitbit (a) was significantly lower than the error of Fitbit (h) at all speeds

Boeselt et al., 2016⁶⁸

Polar A300 (w) Bodymedia Sensewear (c)

• ICC

• Limits of Agreement

• Equivalency

• Polar A300 significantly correlated with Bodymedia Sensewear (r = 0.96; p < 0.01)

• 3-day data analysis showed 90% of steps (95% confidence interval −4223 to 1887) within limits of agreement

• No systematic deviation and data within the limits of agreement

• A300 vs criterion: ICC = 0.986 (p < 0.01)

• Over-counted by 183 - 596 (range) steps/day

Nelson et al., 2016⁷⁰

Fitbit One (h) Fitbit Zip (h) Fitbit Flex (w) Jawbone UP24 (w) Omron HJ-720IT (h) Direct observation (tally count) (c)

• Mean Absolute Error (MAE)

• Mean Absolute Percent Error (MAPE)

• Root Mean Square Error (RMSE)

• Household: Omron, Fitbit One, Fitbit Zip, and Jawbone UP24 significantly undercounted steps by 35% to 74% (all p = 0.006 to < 0.001) (raw step counts). MAPE ranged from 54% to 79%

• Ambulatory: All monitors except the Fitbit Zip were not statistically different from researcher-counted steps, with mean steps within 4% of researcher-counted steps

o All monitors displayed MAPE ≤ 10%, ranging from 3% to 6%

o Walking: hip-worn monitors MAPE of 2% to 3%, and wrist-worn monitors MAPE of 8% to 11%

o Cycling: all monitors undercounted steps by 68% - 94% (p = 0.013 to < 0.001), MAPE was large for all monitors, ranging from 70% to 93%

o Stairs: no monitors were statistically different from researcher-counted steps, with mean step count differences ranging from -11% to 3% (p = 0.16 to > 0.99). All monitors displayed large MAPE, ranging from 10% to 41%

• Sedentary: hip monitors recorded 0 steps for all participants, except the Fitbit One, which recorded one step taken during the writing task in one subject

• No monitors were statistically different from researcher-counted steps for the sedentary category (all p = 0.99)

Alharbi et al., 2016⁴⁶

Fitbit Flex (w) ActiGraph (c)

• Pearson Correlation

• Limits of Agreement

• Absolute Differences

• Sensitivity, Specificity, PPV (positive predictive value) and AUC (area under curve)

• Fitbit Flex significantly correlated with ActiGraph (regardless of gender or cardiac diagnosis)

o Fitbit Flex significantly correlated with ActiGraph in males, females, total participants and cardiac patients (r = 0.964; r = 0.945; r = 0.948; r = 0.947, p < 0.01)

• Fitbit Flex overcounted in females (556 steps/ day), males (1462 steps/day) and total participants (1038 steps/day)

• Fitbit Flex high sensitivity (100% accuracy) for identifying participants achieving physical activity guidelines of 7000 and 10,000 steps/day cut-off points and 150 min/week cut-off points

• Specificity and PPV of Fitbit-Flex for these cut-off points varied (0.83 - 0.57 and 0.67 respectively)

• Fitbit Flex high accuracy and specificity for participants who failed to achieve the cut-off values above but moderate specificity and high accuracy for participants who failed to achieve 7000 steps/day

Floegel et al., 2017⁶⁹

Fitbit One (h) Omron HJ-112 (h) Fitbit Flex (w) Jawbone UP (w) Direct observation (video) (c) StepWatch (a) (c)

• ICC

• Mean Percent Error

• Mean Absolute Percent Error

• Limits of Agreement

• Equivalency

• Monitors undercounted steps for older adults with no impairment: 4.4% StepWatch (ICC = 0.87); 2.6% Fitbit One (ICC = 0.80); 4.5% Omron HJ-112 (ICC = 0.72); 26.9% Fitbit Flex (ICC = 0.15); 2.9% Jawbone UP (ICC = 0.55)

• Older adults with no impairment Fitbit One was equivalent to the criterion (StepWatch, Omron, and Jawbone UP had marginal equivalence)

• Monitors undercounted steps for older adults with impairment: 3.5% StepWatch (ICC = 0.91); 1.7% Fitbit One (ICC = 0.96); 3.2% Omron HJ-112 (ICC = 0.89); 16.3% Fitbit Flex (ICC = 0.25); 8.4% Jawbone UP (ICC = 0.50)

• Older adults with impairment: StepWatch, Fitbit One, and Omron were equivalent to criterion

• Monitors undercounted steps for cane and walker-users: 1.8% StepWatch (ICC = 0.98) (cane); 1.3% StepWatch (ICC = 0.99) (walker); > 11.5% all other monitors (ICC < 0.05)

Thorup et al., 2017⁷²

Fitbit Zip (h) Shimmer3 (c)

• ICC

• Percent Error

• Limits of Agreement

• Hospitalized patients 24-h: percent error of −47.15 ± 24.11 (ICC = 0.60)

• At home, percent error −27.51 ± 28.78 (ICC = 0.87)

• Zip undercounted by 2402.7 steps/day

Farina and Lowry, 2018⁴⁵

Misfit Shine (h, w) Fitbit Charge HR (w) ActiGraph (h) (c) NL2000i (h) (c)

• Spearman Rank Correlations

• ICC

• Percentage Error

• Limits of Agreement

• Criterion monitors (ActiGraph and NL2000i) had near perfect agreement (ICC = 0.96, 95% confidence interval: 0.89 - 0.98), ActiGraph: mean 7,503.67 (SD = 3526.26) steps/day, NL2000i: mean 8,350.42 (SD = 3906.67) steps/day

• Misfit Shine (h): Undercounted steps/day vs NL2000i (rs = 0.90, p < 0.001) 633.2 steps/day, and overcounted vs ActiGraph device (rs = 0.96, p < 0.001) 167.6 steps/day

o Near perfect agreement vs ActiGraph and NL2000i

• Misfit Shine (w): Undercounted steps/day compared to both reference monitors, NL2000i (rs = 0.89, p < 0.001) 899.7 steps/day and ActiGraph (rs = 0.91, p < 0.001) 10.1 steps/day

o Good agreement (wide CIs), moderately wide limits of agreement

• Fitbit Charge HR (w): Overcounted steps vs both ActiGraph (rs = 0.84, p < 0.001) 2690.3 steps/day, and NL2000i (rs = 0.83, p < 0.001) 1721.6 steps/day

o Very wide limits of agreement against both criterions

Abbreviations: ICC: intraclass correlation coefficient; (a): ankle-worn activity monitor; (c): criterion device; (h): hip-worn activity monitor; (w): wrist-worn activity monitor; 2MWT: 2-minute walk test

1.8. Research Gaps

Considering the sum of evidence generated by past research on the reliability and

validity of step counts from activity monitors, there are three critical research gaps that

remain. Firstly, the test-retest reliability of any activity monitor, to my knowledge, has not

been evaluated in older adults. This is crucial information to evaluate when assessing

whether activity monitors should be used in an older adult population, as it indicates

whether the error of the monitor (i.e., the deviation of the monitor’s step count from the true step count) will be

consistent throughout a day, week, or longer period of time. Secondly, participants in

individual studies have not always been comparable as their living situations, mobility, and

chronic diseases varied across studies.38 Due to these inconsistencies, comparison

across studies is limited. For example, out of the nine studies discussed above, only four

clearly indicated including community-dwelling older adults who did not use walking

aids.45,54,69,71 Finally, several monitors that have been previously evaluated are now

outdated and newer monitors, as well as additional brands, have not been tested in older

adults.36

1.9. Objectives

The objective of my thesis was to evaluate the test-retest reliability and criterion

validity of step counts from six consumer-grade activity monitors when worn by older

adults. The specific aims of my thesis were as follows.

1. To determine how the test-retest reliability of step counting varied across six consumer-grade activity monitors and was affected by the presence of self-reported mobility limitations in community-dwelling older adults during over-ground walking.

2. To determine how the criterion validity of step counting varied across six consumer-grade activity monitors and was affected by the presence of self-reported mobility limitations and walk interruptions in community-dwelling older adults during over-ground walking.

Chapter 2. Test-Retest Reliability and Criterion Validity of Step Counts from Consumer-Grade Activity Monitors Worn by Older Adults: The STRIDES Study

2.1. Introduction

Physical inactivity is a widespread problem among older adults, and it carries

serious health implications. Almost 90% of older Canadians (≥ 65 years) are not meeting

the national physical activity recommendation of ≥ 150 minutes per week of MVPA.9

Physical inactivity is linked to an increased risk of type 2 diabetes, cardiovascular disease,

colon cancer, osteoporosis, and postmenopausal breast cancer.12,20–22,74,75 In addition,

physically inactive older adults are at risk for falls, dependence in activities of daily living,

and mobility limitation.30 Mobility limitation impacts approximately 30% of older adults in

Canada and the United States, and is linked to adverse health outcomes including mobility

disability and nursing home admission.3,76,77 Older adults with mobility limitation could

especially benefit from physical activity interventions and corresponding physical activity

monitoring.3,76,77

Monitoring physical activity in older adult populations in both research and clinical

settings is useful for several reasons: to detect longitudinal changes in physical activity

levels,78 to determine effects of interventions,10,13–18 to assess adherence to physical

activity programs,10,17 to quantify daily physical activity patterns,79,80 and to motivate older

adults toward meeting physical activity goals.34 Consumer-grade activity monitors are a

relatively affordable type of wearable technology that count steps in addition to quantifying

other metrics of physical activity behaviour. Older adults are accepting of activity monitors,

find them helpful for motivation, and often prefer them over simple pedometers.35

To confidently use an activity monitor to count steps of older adults in research

and clinical settings for any of the aforementioned purposes, the step count measures

must be reliable and valid.19 If unreliable, the error in step count measurement will vary

from day-to-day for an individual, and in turn, this will limit the ability to precisely detect

changes in an individual’s physical activity over time.50 In addition, high criterion validity is

important to ensure that an activity monitor does not substantially under or overcount one’s

true step count, which could lead to incorrect conclusions in intervention studies aimed at

increasing physical activity or assessing the effects of physical activity on health

outcomes.19

Substantial evidence indicates that step counts from consumer-grade activity

monitors exhibit high interdevice reliability and criterion validity in healthy adults.36

However, age-related changes in gait may impact the precision and accuracy of step

counting.67 To this end, emerging evidence from studies of older adults shows that criterion

validity of step counts from consumer-grade activity monitors is high during short-distance

walks conducted in controlled laboratory settings at walking speeds > 0.8 m/s.38 However,

consumer-grade activity monitors tend to overcount steps of older adults during longer

distance walking in free-living conditions38,45,46,68 and undercount steps when older adults

walk with an assistive device, such as a walker.30,38,54,69

Important gaps in evidence remain to be addressed. First, test-retest reliability of

step counts from consumer-grade activity monitors has not been evaluated in older

adults.38 Second, the influence of self-reported mobility limitation on the reliability and

validity of activity monitor step counts in older adults has not been investigated. Finally,

although aspects of the walking environment, including interruptions to continuous

walking, have been suggested to influence reliability and validity of step counting in

adults,31,61,70,81 the effect of interruptions has not been studied in older adults.

The purpose of the STRIDES Study was to evaluate the reliability and validity of

step counts from consumer-grade activity monitors when worn by older adults. The first

aim was to determine how the test-retest reliability of step counting varied across six

consumer-grade activity monitors and was affected by the presence of self-reported

mobility limitations in community-dwelling older adults during over-ground walking. The

second aim was to determine how the criterion validity of step counting varied across six

consumer-grade activity monitors and was affected by the presence of self-reported

mobility limitations and walk interruptions in community-dwelling older adults during over-

ground walking.

2.2. Methods

2.2.1. Participants

Older adults were recruited through a variety of methods: study flyers posted

around the community (e.g., libraries, community centres, senior centres, coffee shops);

presentations by researchers and fitness instructors to groups of older adults (e.g.,

Osteofit exercise classes); advertisements in newspapers and recreation program guides;

and emails to previous research participants, fitness class attendees, and university

alumni. Individuals were eligible for inclusion, determined through telephone screening, if

they were 65 years and older, community-dwelling, and were able to speak, read, and

write English. Volunteers were classified as mobility-limited if they self-reported difficulty

walking one-quarter mile (2 to 3 blocks) outside on level ground or up a flight of stairs

(about 10 steps) without resting;3–5 otherwise they were classified as mobility-intact.

Individuals were excluded if they reported being unable to walk 400 metres independently

or scored below 26 (indicative of cognitive impairment) on the Montreal Cognitive

Assessment (MoCA).82,83 If the Physical Activity Readiness Questionnaire for Everyone

(PAR-Q+)84 indicated any medical contraindication to physical activity, the volunteer was

asked to seek physician approval to participate in the study. If they did not receive

physician approval, they were excluded. Approval of this study was obtained from Simon

Fraser University’s (SFU) research ethics board and University of British Columbia’s

clinical research ethics board. All participants provided verbal consent to telephone

screening and written informed consent to participate in the study.

2.2.2. Descriptive Measures

Participant demographic information including age, sex, racial background, marital

status, living arrangement, level of education, employment status, mobility aid use, and

smoking history was obtained through a self-report questionnaire (Appendix A). To assess

overall perception of health, participants self-rated their health compared to others of

similar age on a five-point scale (i.e., excellent, good, fair, poor, or very poor). Physical

measurements included height (seca stadiometer), and weight (seca digital scale).

Additional information was collected through self-report questionnaires, including physical

inactivity, functional comorbidities (functional comorbidity index),85 activity monitor

preferences, and computer and cellphone use (see Appendices). The Short Physical

Performance Battery (SPPB) was used to assess lower-extremity physical function and

involved tests of standing balance, 6-metre gait speed, and leg strength.3,77 Participants

received a score out of 12; a higher score indicated better function.

2.2.3. Outcome Measures

Monitors

Six activity monitors were evaluated (Table 2.1). Three monitors were worn on the

hip: Fitbit One, Misfit Shine, and New-Lifestyles NL-1000 Pedometer. The other three

monitors, the Fitbit Charge, Garmin vívofit 2, and the Jawbone UP2, were worn on the

wrist (Figure 2.1). The settings for each monitor were customized to the participant’s

height, weight, and age, and all monitors were placed simultaneously on the non-dominant side of their

body according to the manufacturer’s instructions. Wrist-worn monitors were randomized

to their location on the arm (closest to wrist, middle, or furthest from wrist). Two of the hip-

worn monitors were randomly assigned to one of two sites, either closer to the belly button

or to the hip. The position of the NL-1000 hip-worn monitor was not randomized and was

always placed halfway between the belly button and hip, per the manufacturer’s

recommendation. The randomization procedure was performed prior to the date of testing.
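The placement scheme described above can be illustrated with a short sketch. This is not the procedure or software actually used in the study; the function name, the seed, and the assumption that the two randomized hip-worn monitors (Fitbit One and Misfit Shine) each received a different site are hypothetical.

```python
import random

WRIST_MONITORS = ["Fitbit Charge", "Garmin vívofit 2", "Jawbone UP2"]
WRIST_POSITIONS = ["closest to wrist", "middle", "furthest from wrist"]
HIP_MONITORS = ["Fitbit One", "Misfit Shine"]
HIP_SITES = ["closer to belly button", "closer to hip"]
NL1000_SITE = "halfway between belly button and hip"  # fixed, never randomized


def assign_positions(rng):
    """Return a monitor -> position mapping for one participant (hypothetical helper)."""
    # Random order of the three wrist positions, one per wrist-worn monitor.
    placement = dict(zip(WRIST_MONITORS, rng.sample(WRIST_POSITIONS, k=3)))
    # Assumption: each randomized hip-worn monitor gets a distinct site.
    placement.update(zip(HIP_MONITORS, rng.sample(HIP_SITES, k=2)))
    placement["New-Lifestyles NL-1000"] = NL1000_SITE
    return placement


# Generated before the testing date, one draw per participant.
participant_rng = random.Random(2018)  # seed chosen only for reproducibility of the sketch
for monitor, position in assign_positions(participant_rng).items():
    print(f"{monitor}: {position}")
```

Seeding the generator makes the allocation list reproducible, which is convenient when the randomization is prepared ahead of the testing session.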

Table 2.1 Description of activity monitors

Monitor                 Manufacturer                          Placement  Digital Display  Step Counting Instrument
Fitbit Charge           Fitbit, San Francisco, CA, US         Wrist      Yes              Three-axis accelerometer
Fitbit One              Fitbit, San Francisco, CA, US         Hip        Yes              Three-axis accelerometer
Garmin vívofit 2        Garmin, Olathe, KS, US                Wrist      Yes              Three-axis accelerometer
Jawbone UP2             JAWBONE, San Francisco, CA, US        Wrist      No               Three-axis accelerometer
Misfit Shine            Misfit, Burlingame, CA, US            Hip        No               Three-axis accelerometer and magnetometer
New-Lifestyles NL-1000  New-Lifestyles, Lee’s Summit, MO, US  Hip        Yes              Piezoelectric pedometer

Figure 2.1 Images of activity monitors.86–91

Walking Trials

Participants completed four walking trials in a hallway at SFU’s Burnaby campus

(Figure 2.2). Testing was conducted on weekends to avoid weekday foot traffic and signs

were displayed to minimize interruptions. If disruptions occurred during walks, the details

were recorded. For each walk, one researcher instructed the participant to start and stop

walking; the other researcher timed the walk with a stopwatch and recorded the time to

complete the walk. During the walks, the two researchers walked slightly behind the

participant and counted their steps using tally counters. Tally counts obtained were used

as the criterion measure, which is common in activity monitor assessment.19,58,70 When

discrepancies occurred between steps counted by the two researchers the following

procedures were followed: first, if a researcher believed they undercounted the steps, the

other researcher’s number was used; if there was no indication of a miscount, the median

value was used and rounded up to the nearest whole number. For comparison to the

criterion, the six activity monitor step counts were recorded immediately before and after

each walk. The participant was permitted to move once step counts from all monitors were

written down. For all trials, participants were instructed to walk at their preferred walking

speed, defined as a speed comfortable for them that they could maintain for the duration

of the walk. To prevent fatigue, participants were provided adequate rest time between

the walks (5 to 15 minutes). The four walking trials were typically completed within one

hour, with about 15 minutes of walking for testing purposes (about 1 minute per 100-step

walk, and about 5 to 7 minutes for 400-metre walks).
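The rule for reconciling the two researchers' tally counts can be sketched as follows. The function and argument names are hypothetical, and the handling of the case where both researchers suspect an undercount (falling back to the rounded-up median) is an assumption, since the text does not address it.

```python
import math


def resolve_tally(count_a, count_b, a_undercounted=False, b_undercounted=False):
    """Return the criterion step count from two researchers' tally counts."""
    if count_a == count_b:
        return count_a
    # If exactly one researcher believes they undercounted, use the other's count.
    if a_undercounted and not b_undercounted:
        return count_b
    if b_undercounted and not a_undercounted:
        return count_a
    # Otherwise use the median of the two counts, rounded up to a whole number.
    return math.ceil((count_a + count_b) / 2)


print(resolve_tally(100, 103))                       # median 101.5, rounded up
print(resolve_tally(100, 103, a_undercounted=True))  # second researcher's count used
```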

Figure 2.2 Walking trials completed in a level hallway. Participants walked four laps of the continuous and interrupted courses to reach 400 metres.

Reliability Walks (RW1 & RW2)

The first two walks required the participant to walk 100 steps. A researcher notified

the participants when they had five steps left to walk and provided a verbal count-down to

the end of the walk. If a participant did not walk exactly 100 steps on their first walk, the

participant was instructed to walk the same number of steps for their second walk.

400-metre Continuous Walk (CW)

A course of 100 metres was defined using pylons. Participants completed four laps

of the course without stopping, beginning and ending at the same point on the course.

400-metre Interrupted Walk (IW)

Participants walked the same 400 metres as in the CW. Seven interruption elements, comprising nine individual interruptions, were

incorporated into the 100-metre lap using additional pylons and signs. The interruptions

included an s-curve, two consecutive five-second stops, object avoidance (stepping over

a tree branch), a sharp turn to change direction, one five-second stop, two successive 90-

degree angle turns, and an additional sharp turn. In completing four laps, participants

encountered each interruption four times (36 interruptions in total). This walk was included

to more closely mimic conditions of daily walking than the CW because research suggests

that walking environment and interruptions could affect the validity or reliability of the

monitors.31,61,81

Measurements

Walking trial step counts for each activity monitor were calculated by subtracting

the step count recorded at the beginning of the walk from the step count recorded at the

end of the walk (e.g., End of CW Step Count – Beginning of CW Step Count = CW Step

Count). To account for participants walking a different number of steps, all step counts

were converted to step count percent errors (see equation). Step count percent errors

closer to zero were more desirable. Positive step count percent errors indicated that an

activity monitor was overcounting steps relative to the tally count (criterion), while negative

step count percent errors indicated undercounting relative to the tally count.
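Consistent with the sign convention described above (positive values for overcounting, negative values for undercounting relative to the tally count), the step count percent error can be written as follows. The symbols $S_{\text{monitor}}$ and $S_{\text{tally}}$ are introduced here for illustration only.

```latex
\[
\text{Step count percent error}
  = \frac{S_{\text{monitor}} - S_{\text{tally}}}{S_{\text{tally}}} \times 100\%
\]
```

For example, a monitor that records 95 steps during a walk with a tally count of 100 steps has a step count percent error of -5%.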

2.2.4. Statistical Analysis

Sample Size

We calculated that a sample size of 34 participants, 17 within each group (mobility-

intact and mobility-limited), would provide 80% statistical power to detect an effect size of

5% step count error within each group with significance level α = 0.05, assuming that the

standard deviation in step count error was similar to what was observed in our pilot data

(n = 5 young adults). We aimed to recruit 20 participants within each group to account for

potential incomplete data from some participants.

Descriptive Analysis

Descriptive data are presented as means and standard deviations (SD) for

normally distributed continuous variables and medians and interquartile ranges (IQR) for

skewed continuous variables. Judgements of normality were based on visual inspection

of frequency distributions. Frequencies and percentages are reported for categorical

variables. To assess differences in descriptive characteristics between the mobility-intact

and mobility-limited groups, three types of statistical tests were applied depending on how

the data were distributed: independent-samples t-tests were conducted for normally-

distributed continuous variables, Wilcoxon Rank Sum test for skewed continuous

variables, and chi-square for categorical variables. These tests were performed using JMP

software [Version 13.1, 2016]. Descriptive information for step count errors is presented

as means and 95% confidence intervals (95CI). Statistical modeling was conducted using

RStudio version 1.0.136. The significance level for each test was set at α = 0.05.

Test-Retest Reliability

As a measure of trial-to-trial consistency, the absolute difference between step

count percent errors from RW1 and RW2 was calculated.

A two-way analysis of variance (ANOVA) was used to compare the mean absolute

difference between RW1 step count percent error and RW2 step count percent error

based on activity monitor and mobility status. Post hoc analysis of pairwise differences

was conducted using Tukey’s HSD (Honest Significant Difference) test, where

appropriate, which held the experiment-wise error constant at α = 0.05. To assess

normality, we visually inspected the quantiles of the distribution, histograms, and density

plots and ran a Shapiro-Wilk normality test. Due to suggestions of lack of normality, we

also ran a non-parametric test, Kruskal-Wallis, which produced the same results and led

to the same conclusions as the ANOVA. For ease of interpretation, we reported results

from the ANOVA.

Additionally, the standard error of measurement (SEM) was calculated as a

descriptive measure of reliability. SEM was calculated as the SD of the differences

between the step count percent errors of RW1 and RW2, divided by the square root of the

number of walks, in accordance with Hopkins.50
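The two reliability summaries described above can be sketched in a few lines. This is an illustration only, not the analysis code used in the study (which was run in R); the function names and the step counts are hypothetical.

```python
import math
from statistics import mean, stdev


def percent_error(monitor_steps, tally_steps):
    """Signed step count percent error relative to the tally (criterion) count."""
    return (monitor_steps - tally_steps) / tally_steps * 100.0


def reliability_summary(rw1, rw2):
    """rw1, rw2: lists of (monitor_steps, tally_steps), one pair per participant."""
    errors_rw1 = [percent_error(m, t) for m, t in rw1]
    errors_rw2 = [percent_error(m, t) for m, t in rw2]
    diffs = [e2 - e1 for e1, e2 in zip(errors_rw1, errors_rw2)]
    # Trial-to-trial consistency: mean absolute difference in percent error.
    mean_abs_diff = mean(abs(d) for d in diffs)
    # SEM: SD of the differences divided by the square root of the number of walks (2).
    sem = stdev(diffs) / math.sqrt(2)
    return mean_abs_diff, sem


# Hypothetical data for one monitor and four participants.
rw1 = [(98, 100), (101, 100), (95, 100), (100, 100)]
rw2 = [(100, 100), (99, 100), (97, 100), (100, 100)]
mean_abs_diff, sem = reliability_summary(rw1, rw2)
print(f"mean absolute difference = {mean_abs_diff:.2f}%, SEM = {sem:.2f}%")
```

Working with percent errors rather than raw counts keeps the comparison fair when participants did not walk exactly 100 steps.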

Criterion Validity

A three-way ANOVA was used to determine whether activity monitor, interruptions

to walking, and mobility status had effects on the mean step count percent errors. Again,

post-hoc analysis of pairwise comparisons was conducted using Tukey’s HSD test, where

appropriate.

Bland-Altman plots92,93 were produced to assess systematic bias and limits of

agreement in step counts, on average, for each activity monitor and the CW and IW. The

mean of the two measures was plotted on the x-axis (e.g., ([activity monitor step count] +

[tally counter step count])/2) and the error between the two measures was plotted on the

y-axis (e.g., [activity monitor step count] – [tally counter step count]). Reference lines

indicated the mean step count error, trend, and 95% limits of agreement (mean ± 1.96

SD).
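The quantities behind a Bland-Altman plot can be computed as in the sketch below. It is illustrative only: the function name and step counts are hypothetical, and the trend line mentioned above is omitted.

```python
from statistics import mean, stdev


def bland_altman(monitor_counts, tally_counts):
    """Return the plotted points, mean error, and 95% limits of agreement."""
    pairs = list(zip(monitor_counts, tally_counts))
    means = [(m + t) / 2 for m, t in pairs]  # x-axis: mean of the two measures
    errors = [m - t for m, t in pairs]       # y-axis: monitor minus tally
    bias = mean(errors)
    half_width = 1.96 * stdev(errors)
    return {
        "means": means,
        "errors": errors,
        "bias": bias,
        "lower_loa": bias - half_width,
        "upper_loa": bias + half_width,
    }


# Hypothetical 400-metre walk data for one monitor and four participants.
result = bland_altman([590, 610, 575, 640], [600, 605, 600, 650])
print(result["bias"], result["lower_loa"], result["upper_loa"])
```

A bias near zero with narrow limits of agreement indicates that the monitor tracks the tally count closely across the observed range of step counts.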

Lastly, in line with past studies,69 equivalence testing was conducted to evaluate

whether mean step count percent errors were equivalent to zero step count percent error

for each activity monitor and both 400-metre walks. We defined an equivalence bound as

± 5.0% step count error, which we deemed to be clinically relevant. Two one-sided t-tests

were conducted to test both sides of the equivalence interval. If there was sufficient

evidence to reject both the null hypothesis for the upper threshold (mean error ≥ 5%) and

the null hypothesis for the lower threshold (mean error ≤ -5%), then the mean step count error

was interpreted as “practically” equivalent to zero step count error.
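The two one-sided tests procedure can be sketched as below, assuming a one-sample design with a symmetric ± 5% bound. It is a self-contained illustration, not the analysis code used in the study: the function names and data are hypothetical, and the Student-t distribution function is approximated by numerical integration so that only the standard library is needed (suitable for small samples).

```python
import math
from statistics import mean, stdev


def t_cdf(t, df, steps=2000):
    """Student-t CDF, via Simpson's rule on the density between 0 and t."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # signed step width; `steps` must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return min(1.0, max(0.0, 0.5 + total * h / 3))


def tost_one_sample(percent_errors, bound=5.0, alpha=0.05):
    """Return (p_lower, p_upper, equivalent) for the interval [-bound, +bound]."""
    n = len(percent_errors)
    m = mean(percent_errors)
    se = stdev(percent_errors) / math.sqrt(n)
    df = n - 1
    # Lower test, H0: mean <= -bound; a small p supports mean > -bound.
    p_lower = 1.0 - t_cdf((m + bound) / se, df)
    # Upper test, H0: mean >= +bound; a small p supports mean < +bound.
    p_upper = t_cdf((m - bound) / se, df)
    # Equivalence is declared only if both one-sided nulls are rejected.
    return p_lower, p_upper, max(p_lower, p_upper) < alpha


# Hypothetical step count percent errors for one monitor on one walk.
errors = [-1.0, 0.5, -2.0, 1.5, -0.5, 0.0, -1.5, 1.0]
print(tost_one_sample(errors))
```

Declaring equivalence at α = 0.05 in this way corresponds to the 90% confidence interval for the mean error lying entirely within the equivalence bounds.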

2.3. Results

2.3.1. Participants

A total of 36 individuals participated in the study, including 20 mobility-intact (seven

females) and 16 mobility-limited (12 females) (Table 2.2). The mean age of the

participants was 71.4 years (SD = 4.7 years) and mean body mass index was 29.4 kg/m²

(SD = 5.9 kg/m²). For most characteristics, there were no significant differences between

the mobility-intact and mobility-limited groups. However, the mobility-limited group had

significantly slower gait speed than the mobility-intact group for the 6-metre (p < 0.001)

and continuous 400-metre (p < 0.001) walks. Additionally, the mobility-limited group had

a greater number of comorbidities (p = 0.018).

Table 2.2 Participant characteristics

Characteristic                        Overall (n = 36)    Mobility-Intact (n = 20)  Mobility-Limited (n = 16)  p value
Female                                19 (52.8)           7 (35.0)                  12 (75.0)                  0.023
Age, years                            71.4 (4.7)          73.1 (3.7)                71.6 (5.8)                 0.400
Weight, kg                            82.0 (16.8)         78.1 (17.4)               87.0 (15.1)                0.110
Body mass index, kg/m²                29.4 (5.9)          27.7 (4.4)                31.5 (7.0)                 0.070
Caucasian                             25 (69.4)           13 (65.0)                 12 (75.0)                  0.458
University education                  16 (44.0)           9 (45.0)                  7 (43.8)                   > 0.999
MoCA score (out of 30)ᵃ               27 (26-28)          27 (26-27)                28 (27-29)                 0.026ᶠ
Previously smoked                     17 (47.2)           8 (40.0)                  9 (56.3)                   0.332ᵍ
Self-rated good/excellent health      29 (80.6)           18 (90.0)                 11 (68.8)                  0.204
SPPB score (out of 12)ᵃ               11.0 (10.0-11.0)    11.0 (10.0-12.0)          10.0 (9.8-11.0)            0.056ᶠ
6-metre gait speed, m/s               1.2 (0.2)           1.3 (0.2)                 1.1 (0.2)                  < 0.001
400-metre gait speed, m/s             1.3 (0.2)           1.4 (0.1)                 1.2 (0.2)                  < 0.001
CW step count                         600 (69.5)          562 (29.1)                648 (76.1)                 < 0.001
IW step count                         656 (72.8)          617 (40.8)                703 (76.5)                 < 0.001
Self-reported MVPA, min/week          342.8 (368.0)       342.8 (272.0)             342.8 (471.4)              0.100
Self-reported walking, min/week       207.6 (197.4)       240.0 (207.6)             167.2 (182.3)              0.271
Number of comorbiditiesᵃ              2.0 (0.0-3.0)       0.0 (0.0-2.0)             2.5 (1.0-4.0)              0.018ᶠ
Arthritis                             13 (36.1)           4 (20.0)                  9 (56.3)                   0.038
Obesity                               10 (27.8)           3 (15.0)                  7 (43.8)                   0.073
Visual impairmentsᵇ                   10 (27.8)           7 (35.0)                  3 (18.8)                   0.456
Degenerative disc diseaseᶜ            6 (16.7)            2 (10.0)                  4 (25.0)                   0.374
Upper gastrointestinal diseaseᵈ       5 (13.9)            2 (10.0)                  3 (18.8)                   0.637
Depression                            4 (11.1)            1 (5.0)                   3 (18.8)                   0.303
Diabetes (type 1 or 2)                4 (11.1)            1 (5.0)                   3 (18.8)                   0.303
Hearing impairmentsᵉ                  3 (8.3)             1 (5.0)                   2 (12.5)                   0.574
Osteoporosis                          3 (8.3)             0 (0.0)                   3 (18.8)                   0.078
COPD, ARDS or emphysema               2 (5.6)             0 (0.0)                   2 (12.5)                   0.191
≥ 1 comorbidities                     19 (52.8)           8 (40.0)                  11 (68.8)                  0.041
≥ 2 comorbidities                     12 (33.3)           4 (20.0)                  8 (50.0)                   0.086ᵍ
Has a computer with Internet access   31 (86.1)           17 (85.0)                 14 (87.5)                  1.000
Frequent computer use                 27 (87.1)           16 (94.1)                 11 (78.6)                  0.304
Has a cellphone or smartphone         33 (91.7)           18 (90.0)                 14 (87.5)                  0.574
Frequent cellphone/smartphone use     28 (84.8)           16 (88.9)                 12 (75.0)                  1.000

Abbreviations: ARDS, Acquired respiratory distress syndrome; CW, 400-metre continuous walk; COPD, Chronic obstructive pulmonary disease; IW, 400-metre interrupted walk; MoCA, Montreal Cognitive Assessment; MVPA, Moderate-to-vigorous physical activity, includes self-reported walking; SPPB, Short Physical Performance Battery.

Data are presented as mean (SD) or N (%) unless otherwise indicated. Obesity, body mass index > 30 kg/m². Frequent computer use, 4 to 7 days per week. Frequent cellphone/smartphone use, a few times per week or every day. p values comparing mobility-intact vs mobility-limited, from Chi-Square Fisher’s Exact Test for categorical variables and from independent samples t-test for continuous variables.

ᵃ Median (IQR) is reported; ᵇ e.g., cataracts, glaucoma, macular degeneration; ᶜ e.g., back disease, spinal stenosis, or severe chronic back pain; ᵈ e.g., ulcer, hernia, reflux; ᵉ e.g., very hard of hearing, even with hearing aids; ᶠ from Wilcoxon Rank Sum Test; ᵍ from Chi-Square Pearson Test

2.3.2. Test-Retest Reliability

We found a significant main effect of activity monitor on the absolute difference

between the step count percent errors of RW1 and RW2 (p < 0.001) but no main effect of

mobility status (p = 0.312) and no interaction between activity monitor and mobility status

(p = 0.292). The One (1.0%, 95% confidence interval [95CI]: 0.6% to 1.3%), NL-1000

(2.6%, 95CI: 1.3% to 3.9%), and vívofit 2 (6.0%, 95CI: 3.2% to 8.8%) exhibited the

smallest mean absolute differences in step count percent errors. Post-hoc tests revealed

that the Charge (p = 0.017), UP2 (p < 0.001), and Shine (p < 0.001) exhibited significantly

higher mean absolute differences than the One. Compared to the NL-1000, the UP2 (p <

0.001) and Shine (p < 0.001) had greater mean absolute differences. Lastly, relative to

the vívofit 2, the UP2 (p = 0.002) and Shine (p = 0.004) had greater mean absolute

differences. The SEM values ranged from 1.0% (One) to 23.5% (UP2) (Figure 2.3).
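
The reliability outcome reported above can be illustrated with a short calculation. This is a minimal sketch only: the function names and step counts are hypothetical (not study data), and it assumes that percent error is defined as (monitor count − criterion count) / criterion count × 100, so that negative values indicate undercounting.

```python
def percent_error(monitor_steps, criterion_steps):
    """Signed step count percent error; negative values mean undercounting."""
    return (monitor_steps - criterion_steps) / criterion_steps * 100.0


def abs_retest_difference(rw1, rw2):
    """Absolute difference between the percent errors of two repeat walks.

    Each argument is a (monitor_steps, criterion_steps) pair.
    """
    return abs(percent_error(*rw1) - percent_error(*rw2))


# Hypothetical participant: 100 criterion steps on each repeat walk.
walk_1 = (98, 100)  # monitor recorded 98 steps on RW1
walk_2 = (95, 100)  # monitor recorded 95 steps on RW2
difference = abs_retest_difference(walk_1, walk_2)
print(f"{difference:.1f}%")
```

Averaging this quantity over participants for a given monitor would give a mean absolute difference of the kind compared across monitors above.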

2.3.3. Criterion Validity

All monitors undercounted steps relative to the manual criterion counts (Figure

2.4), with the Shine exhibiting the lowest step count percent error (-1.3%). The activity

monitor (p < 0.001) and walk interruptions (p = 0.018) factors affected the step count

percent error, but mobility status (p = 0.648) did not. We observed no interactions between

any of the factors. Post-hoc tests revealed that the Charge (p < 0.001) and vívofit 2 (p =

0.016) exhibited significantly higher step count percent errors than the Shine. Relative to

the One, the Charge had a greater mean step count percent error (p < 0.001). Lastly,

compared to the NL-1000, the Charge had a greater mean step count percent error (p =

0.031). Regarding the main effect of interruptions, compared to the CW, the IW resulted

in a greater mean step count percent error (mean difference 1.9%, p = 0.018).

Bland-Altman plots illustrated non-systematic bias across the range of observed

step counts for the One and UP2 (Figure 2.5). Systematic bias and wide limits of

agreement were observed for the Shine, NL-1000, vívofit 2, and the Charge. Additionally,

Bland-Altman plots indicated systematic bias across the range of steps and wide limits of

agreement for the CW and IW.
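
For readers less familiar with Bland-Altman analysis, the bias and limits of agreement (mean ± 1.96 SD of the differences) can be computed as sketched below. The data and the function name are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, stdev


def bland_altman(monitor_counts, criterion_counts):
    """Return (bias, lower limit, upper limit) for paired step counts."""
    differences = [m - c for m, c in zip(monitor_counts, criterion_counts)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)  # sample SD of the differences
    return bias, bias - spread, bias + spread


# Hypothetical paired counts for one monitor (not study data).
monitor = [590, 612, 640, 575, 601]
criterion = [600, 620, 655, 580, 610]
bias, lower, upper = bland_altman(monitor, criterion)
print(f"bias = {bias:.1f} steps, limits of agreement = [{lower:.1f}, {upper:.1f}]")
```

A trend line fitted through the differences, as drawn in Figure 2.5, would additionally indicate whether the error changes with the number of steps taken.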

Figure 2.3 Box plots with median absolute difference step count percent errors between test-retest reliability walk 1 (RW1) and walk 2 (RW2) for the six indicated activity monitors (n = 36 for all monitors except Misfit Shine, n = 28) (95CI = 95% confidence interval, SEM = Standard error of measurement). Central rectangle spans the interquartile range (IQR), and the whiskers represent the inner fence (upper: Q3 + 1.5 × IQR, lower: Q1 – 1.5 × IQR). *Charge different than One (p = 0.017); Shine different than One (p < 0.001), NL-1000 (p < 0.001), and vívofit 2 (p = 0.004); UP2 different than One (p < 0.001), NL-1000 (p < 0.001), and vívofit 2 (p = 0.002).

Figure 2.4 Box plots with median step count percent errors for the indicated activity monitors (n = 72 for all monitors except the Misfit Shine, n = 67) and for walk interruptions (n = 213 for Continuous, n = 214 for Interrupted) in 36 older adults (95CI = 95% confidence interval). Central rectangle spans the interquartile range (IQR), and the whiskers represent the inner fence (upper: Q3 + 1.5 × IQR, lower: Q1 – 1.5 × IQR). Horizontal dotted lines represent zero step count percent error. *vívofit 2 different than Shine (p = 0.016); Charge different than Shine (p < 0.001), One (p < 0.001), and NL-1000 (p = 0.031); Interrupted different than Continuous (p = 0.018).

Figure 2.5 Bland-Altman analyses for each activity monitor (n = 72 for all monitors except the Misfit Shine, n = 67) and for walk interruptions (n = 213 for Continuous, n = 214 for Interrupted) compared to the criterion tally counts in 36 older adults. The solid lines represent the mean step count error (horizontal) and line of best fit (trend line). Dotted lines represent the limits of agreement (mean ± 1.96SD). Grey points represent self-reported mobility-intact participants, red points represent self-reported mobility-limited participants.

Equivalence tests indicated that the mean step count percent errors of two monitors,

the One (p < 0.001) and the Shine (p = 0.001), lay within the equivalence bounds; thus, step counts

from these monitors were deemed equivalent to zero step count percent error (Figure 2.6).

Regarding interruptions, the CW mean step count percent error was statistically equivalent

to zero step count percent error (p = 0.002) while the IW mean step count percent error

lay outside the equivalence bounds (p = 0.277).
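
One common way to run an equivalence test of this kind is the two one-sided tests (TOST) procedure. The sketch below is illustrative and rests on stated assumptions: it uses a normal approximation to the sampling distribution of the mean (a t distribution is more usual), the data are hypothetical, and it is not necessarily the exact procedure used in this analysis.

```python
from statistics import NormalDist, mean, stdev


def tost_equivalence(errors, bound=5.0):
    """Two one-sided tests of mean percent error against +/- bound.

    Uses a normal approximation (an assumption of this sketch).
    Returns the larger of the two one-sided p values; a small value
    supports equivalence to zero within the bound.
    """
    n = len(errors)
    standard_error = stdev(errors) / n ** 0.5
    z_lower = (mean(errors) + bound) / standard_error  # H0: mean <= -bound
    z_upper = (mean(errors) - bound) / standard_error  # H0: mean >= +bound
    p_lower = 1.0 - NormalDist().cdf(z_lower)
    p_upper = NormalDist().cdf(z_upper)
    return max(p_lower, p_upper)


# Hypothetical percent errors (not study data).
small_errors = [-1.2, 0.4, -2.1, -0.8, 0.9, -1.5, -0.3, 0.2]
large_errors = [-9.0, -4.5, -12.0, -6.2, -3.1, -8.8, -7.4, -5.9]
print(tost_equivalence(small_errors) < 0.05)  # equivalence supported
print(tost_equivalence(large_errors) < 0.05)  # equivalence not supported
```

Equivalence is concluded only when both one-sided null hypotheses are rejected, which is why the larger of the two p values is reported.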

Figure 2.6 Activity monitor (n = 72 for all monitors except the Misfit Shine, n = 67) and walk interruptions (n = 213 for Continuous, n = 214 for Interrupted) equivalence tests. Mean step count errors (%) with 95% confidence intervals (95CI). Area between dotted vertical lines represents equivalence bounds (± 5.0%).

2.3.4. Activity Monitor Preferences

Thirty-five of 36 participants had heard of activity monitors prior to participating in

the study, and 30.6% owned an activity monitor (or pedometer). When participants were

asked about their activity monitor preferences, the vívofit 2 (30.6% of participants) and

Charge (27.8%) were most often selected as the monitor participants preferred.

Participants were less likely to select the One (16.7%), Shine (13.9%), UP2 (5.6%), or NL-

1000 (5.6%). The features frequently reported as most liked or most useful were step

counting, time of day display, ease of use, impermeability to water, and caloric expenditure

estimation.

2.4. Discussion

The STRIDES Study aimed to determine how test-retest reliability of step counting

by six consumer-grade activity monitors was affected by the presence of self-reported

mobility limitations and how the criterion validity of step counting by these six activity

monitors was affected by the presence of self-reported mobility limitations and walk

interruptions in community-dwelling older adults during over-ground walking. We found

that test-retest reliability varied across activity monitors (highest for the Fitbit One) and

was unaffected by self-reported mobility status. Criterion validity also varied across the

activity monitors (highest for the Fitbit One), was negatively impacted by walk

interruptions, and was unaffected by self-reported mobility status.

Our study is the first to report on the test-retest reliability of consumer-grade activity

monitors in the community-dwelling older adult population.38 We found that test-retest

reliability of step counting, measured by mean absolute percent difference in step count

error between repeat 100-step walks, varied across activity monitors. Specifically, only

two monitors, the One and the NL-1000, had mean absolute percent differences in step

count error (RW1 - RW2) below 5.0%. Three monitors (Charge, UP2, and Shine) were less

reliable than the One, and two of these (UP2 and Shine) were also less reliable than the NL-1000. Finally, the SEM of the

One was small, within ± 2.5%, which translates into a between-trial difference of ± 4.9%

step count error in 95 of 100 instances (95% likely range of -4.9 to 4.9%). All other monitors

had SEM values indicative of poor reproducibility. Therefore, only the One had sufficiently

high test-retest reliability.
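
The conversion from an SEM to a 95% likely range quoted above amounts to multiplying the SEM by 1.96. A minimal sketch, assuming normally distributed measurement error; the function name is hypothetical:

```python
Z_95 = 1.96  # two-sided 95% multiplier from the standard normal distribution


def likely_range_95(sem_percent):
    """Half-width of the 95% likely range implied by an SEM, in percent."""
    return Z_95 * sem_percent


print(f"±{likely_range_95(2.5):.1f}%")   # SEM threshold of 2.5%
print(f"±{likely_range_95(23.5):.1f}%")  # largest SEM reported (UP2)
```

The first line reproduces the ± 4.9% range stated above; the second shows how much wider the range becomes for the least reproducible monitor.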

We found that criterion validity of step counting was affected by both activity

monitor and walk interruptions during 400-metre walks, with no interaction observed

between the two factors. The One was the only monitor with high criterion validity. This

was determined based on the One’s small mean step count percent error (less than ±

5.0%), lack of systematic bias, narrow limits of agreement, and statistical equivalence

to zero step count percent error (equivalence bound of ± 5.0%). Three of the other

monitors (Shine, NL-1000, and UP2) exhibited moderate correspondence to the criterion,

while both the vívofit 2 and Charge had poor correspondence with the criterion. The results

for criterion validity are consistent with previous research by Floegel and colleagues who

found that the Fitbit One had the lowest mean step count percent error and outperformed

other monitors (StepWatch, Omron HJ-112, Fitbit Flex, Jawbone UP) when compared to

direct observation during a 100-metre walk involving older adults both without impairment

(mobility-intact) and with impairment (mobility-limited).69

For all activity monitors, walking with interruptions resulted in greater mean step

count percent errors than walking continuously. In addition, the mean step count percent

error for interrupted walking was not equivalent to zero step count percent error

(equivalence bound of ± 5.0%), while the mean step count percent error for continuous

walking was equivalent. However, Bland-Altman plots revealed both walking conditions

showed systematic bias, specifically that step count errors increased in proportion to the

number of steps, and wide limits of agreement. Previous studies did not explicitly test older

adults in a controlled setting with interrupted walking; however, studies have investigated

activity monitors during free-living conditions in older adults.45,46,68,71,72 In these studies,

five out of eight consumer-grade hip- and wrist-worn activity monitors were found to

overcount the criterion measures.45,46,68,71,72 These results are inconsistent with our finding

that the activity monitors undercounted steps during continuous and interrupted walking.

A possible reason for the discrepancy is that, during free-living conditions, movements

other than stepping (e.g., moving during eating or conversation) may reach accelerations

greater than the monitor algorithm thresholds, causing steps to be erroneously recorded.49

Tudor-Locke and colleagues compared the hip-worn and wrist-worn ActiGraph

accelerometer during treadmill walking and in free-living conditions.94 During treadmill

testing, they found the wrist-worn monitor detected fewer steps than the hip-worn monitor;

however, during free-living conditions the wrist-worn monitor counted more steps than the

hip-worn monitor.

Regarding self-reported mobility, we found that test-retest reliability and criterion

validity of step counting were not affected by the presence of a self-reported mobility

limitation, suggesting that older adults with a self-reported mobility limitation can expect

similar performance from the activity monitors tested in this study as older adults with self-

reported intact mobility. Consistent with our results, Floegel and colleagues reported that

mean step count errors for most monitors tested were similar and small for older adults

with or without walking impairment who did not walk with a cane or walker

(StepWatch -4.42% vs -3.45%, Fitbit One -2.59% vs -1.71%, Omron -4.48% vs -3.15%,

Fitbit Flex -26.94% vs -16.31%, Jawbone UP -2.86% vs -8.43%).69 In contrast, Lauritzen

and colleagues reported that mobility limitations decreased activity monitor validity when

comparing a small group of walker-dependent older adults in nursing homes to healthy

older adults.30 In that study, lower gait assessment scores were significantly associated with

larger absolute percent errors, as were longer walk times and larger

step counts. Our study population differed in

that participants did not use walking aids. Furthermore, previous literature indicates slow

gait speed significantly affects the criterion validity of activity monitors.54 Simpson and

colleagues reported that the Fitbit One, when worn on the hip, recorded zero steps when

participants walked at speeds between 0.3 and 0.5 m/s, and it had a mean percent error

below 10% only when walking speed was 0.8 and 0.9 m/s.54 Our participants walked, on

average, at 1.2 m/s (mobility-intact 1.3 m/s, mobility-limited 1.1 m/s); thus, speed would

not have impacted activity monitor performance. Had our participants with self-reported

mobility limitation had very slow gait speed or more severe asymmetries in their gait, we

may have seen a difference in the reliabilities or validities of the monitors based on self-

reported mobility status.

2.4.1. Strengths

This study had a number of strengths. First, all walking tests were performed

during over-ground walking, which more closely represented natural walking conditions

than treadmill walking. Treadmill walking has been used frequently in previous studies to

evaluate measurement properties of activity monitors, and it enables testing monitors at

controlled walking speeds, but older adults unfamiliar with treadmill walking exhibit

increased heart rate and oxygen consumption while walking on a treadmill compared to

over-ground walking.95 Moreover, treadmills impose greater symmetry in gait than may be

observed naturally, which could in turn influence measurement of reliability and validity.

Second, this study tested six different activity monitors, and, to our knowledge, four of six

(Charge, vívofit 2, UP2, and NL-1000) have not been previously tested in older adults.

Lastly, we included older adults with self-reported mobility limitations, which is important

because they are a relevant population for physical activity interventions and surveillance

and comprise a sizeable proportion of the older adult population.

2.4.2. Limitations

This study featured several limitations. First, the results have limited

generalizability with respect to the activity monitors. We tested a single monitor of each

activity monitor model with a relatively small sample size. Thus, results obtained from this

study may not be applicable to all versions of the activity monitor model tested or other

monitors produced by the same brand. A poorly calibrated monitor (in relation to the

average monitor) would have led to underestimated monitor validity while a better-than-

average calibrated monitor would have led to overestimation of monitor validity. Ideally,

multiple units of each monitor model would have been tested and the differences between the

monitors assessed. We had to limit the number and distance of walks performed with our

older adult study population, so it was not feasible to conduct additional testing. However,

we believe that inter-device variation would likely have been minimal based on a

systematic review on consumer-grade activity monitors that reported high inter-device

reliability for step counts from four studies testing three Fitbit models (Classic, One, and

Ultra; ICCs ranged from 0.76 to 1.00).36

Additionally, since the reliability of consumer-grade activity monitors had not been

previously evaluated in older adults, we chose to begin by assessing test-retest reliability

on short, 100-step continuous walks. Future studies are needed to examine the effects of

walking interruptions and longer distances on test-retest reliability; our results suggest that

reliability under these conditions would likely be better for a hip-worn Fitbit monitor, such

as the One, than for other monitors.

In this study we did not consider the contributions of sex, walking speed, participant

height, and stride length on test-retest reliability or criterion validity. Additionally, we did

not investigate how common daily tasks, other than walking, affected activity monitor step

counts. It will be important for future studies to evaluate the reliability and validity of step

counting by consumer-grade activity monitors during a wider range of everyday

movements than was tested in this study. Further, future studies should seek to determine

the sources of error during activities of daily living which often result in overcounting during

free-living assessment of consumer-grade activity monitors.

Occasional monitor malfunctions occurred in several of the activity monitors that

we evaluated. For example, the Misfit Shine intermittently failed to register any completed

steps, even after numerous attempts to sync the monitor. If the monitor did not

sync after a few minutes, the step count was recorded as missing.

2.4.3. Conclusion

The STRIDES Study provides essential information about the test-retest reliability

and criterion validity of several consumer-grade activity monitors in older adults with self-

reported intact- and limited-mobility and thus contributes to the emerging literature in this

area. Researchers can refer to this study, and others in the field, to help choose an activity

monitor for future physical activity studies with older adults designed to detect changes in

activity levels, assess adherence to a physical activity program, quantify daily physical

activity patterns (in conjunction with self-report questionnaires), or use as a motivational

tool (goal setting).

Specifically, we found that step count test-retest reliability and criterion validity

differed across activity monitors when worn by a sample of older adults with self-reported

intact- and limited-mobility. We also found that walk interruptions increased the step count

error for all monitors but did not affect any monitor to a greater extent than the others.

Fortunately, we found that the presence of self-reported mobility limitations did not affect

step count error. Only one monitor exhibited both high test-retest reliability and criterion

validity, the hip-worn Fitbit One, and it is recommended for use in groups of older adults

with self-reported intact- and limited-mobility.

Chapter 3. General Discussion

The STRIDES Study aimed to determine how the test-retest reliability and criterion

validity of step counting varied across six consumer-grade activity monitors and how it

was affected by the presence of self-reported mobility limitations in community-dwelling

older adults during over-ground walking. We also sought to evaluate the effect of

interruptions on the criterion validity in the same population. We tested six devices: the

Fitbit Charge, Fitbit One, Garmin vívofit 2, Jawbone UP2, Misfit Shine, and New-Lifestyles

NL-1000. Test-retest reliability and criterion validity varied across activity monitors.

Additionally, criterion validity was poorer when interruptions were incorporated into a 400-

metre walk. Furthermore, test-retest reliability and criterion validity were unaffected by

self-reported mobility status. In general, the hip-worn monitors performed better than wrist-

worn monitors, with only the Fitbit One displaying both high test-retest reliability and high

criterion validity. The results of this study will aid researchers in selecting an appropriate

activity monitor to measure steps in community-dwelling older adults.

It is evident that test-retest reliability and criterion validity are both important

aspects of a measurement instrument and require evaluation. When considering activity

monitors, both the steps that are missed (undercounting) and the steps that are erroneously

counted (overcounting) are of concern. Through evaluation, the mean step count percent error

for a monitor can be determined. For example, if this error is -10%, we expect that, on

average, the monitor would count 9,000 of 10,000 actual steps. If this device is always

undercounting by 10%, this bias can be accounted for and step count goals or how

physical activity levels are determined in a study can be adjusted accordingly. For the

purpose of long-term monitoring in studies with older adults, I believe a step count error

within ± 5% would be acceptable. However, if a monitor is unreliable, its step

count percent error will vary from day to day. Unfortunately, it is not straightforward to

account for an unreliable device.
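
The bias adjustment described in this paragraph can be sketched as follows. The function names are hypothetical, and the sketch assumes a monitor whose percent error is stable over time, which is exactly what an unreliable device cannot guarantee.

```python
def expected_recorded_steps(actual_steps, mean_percent_error):
    """Steps a monitor is expected to record, given its mean percent error."""
    return actual_steps * (1 + mean_percent_error / 100.0)


def corrected_steps(recorded_steps, mean_percent_error):
    """Estimate actual steps from a recorded count with a known, stable bias."""
    return recorded_steps / (1 + mean_percent_error / 100.0)


print(round(expected_recorded_steps(10_000, -10.0)))  # 9000
print(round(corrected_steps(9_000, -10.0)))           # 10000
```

Under these assumptions, a 10,000-step goal would correspond to a displayed target of 9,000 steps on a monitor that consistently undercounts by 10%.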

3.1. Feasibility of Recruiting and Studying Older Adults with Intact- and Limited-Mobility

We sought to recruit older adults at high risk for functional decline and poor health

outcomes, since these individuals stand to benefit most from physical activity monitoring

and corresponding interventions. We recruited participants by using telephone screening

to identify individuals who self-reported mobility limitation. An important question to

consider is whether this was an effective method to identify the high-risk group of older

adults we targeted.

Consistent with many studies in the field, we defined an individual as mobility-

limited if they self-reported any difficulty walking one-quarter mile (2 to 3 blocks) outside

on level ground or up a flight of stairs (about 10 steps) without resting.3–5 Our study

recruited 20 individuals with self-reported intact mobility and 16 with mobility limitation.

During the study visit, the SPPB was used to assess lower-extremity physical function and

involved tests of standing balance, 6-metre gait speed, and leg strength.3,77 Participants

received a score out of 12, with a higher score indicating better function. In line with our

expectations, the mobility-intact group had higher functioning (mean score of 11.0) than

the mobility-limited group (mean score of 10.0). This difference was clinically relevant.96

Additionally, the mobility-limited group had a slower mean 6-metre gait speed (1.1 vs. 1.2

m/s, p < 0.001), a higher number of comorbidities (p = 0.018), and featured more

individuals with two or more comorbidities (p = 0.041).

These observations notwithstanding, the mobility-limited group was still relatively

healthy. The five-year survival rate for men and women 65 to 74 years of age, which

incorporates our participants’ mean age, is 90% and 96%, respectively, when walking at

a usual pace between ≥ 1.0 and < 1.2 m/s.97 In contrast, with the same age group, walking

between ≥ 0.6 and < 0.8 m/s results in reduced five-year survival rates (79% for men, 91%

for women).97 Furthermore, individuals with SPPB scores equal to or less than nine are at

a higher risk for mobility-related disability compared to those with the best performance

(10 to 12).3,98 In the Established Populations for the Epidemiologic Study of the Elderly

(EPESE), the relative risk of mobility-related disability ranged from 1.5 to 2.1 for those with

scores between 7 and 9, compared to the best performance.98 This raises the possibility

that in order to recruit high-risk older adults with poorer physical functioning, we would

need to recruit people by direct or objective observation of their slow gait speeds or low

SPPB scores, similar to the Lifestyle Interventions and Independence for Elders (LIFE)

Study.99

Even though our population was healthier than the LIFE Study population, and we

used telephone screening which was logistically simpler and less time consuming than in-

person screening, it was still a challenge to recruit individuals with self-reported mobility

limitations. This challenge could have been due to the nature of the study. Our recruitment

advertisements indicated that we were looking for older adults with and without walking

difficulties to perform a number of simple walks. It is possible that individuals who have

difficulty walking were not comfortable with having their walking “tested” or were

apprehensive of travelling up to our SFU campus on Burnaby Mountain. In addition, eight

individuals with mobility limitations were screened but did not provide physician approval

to participate in the study. Some of these individuals did not seek approval while others

did not receive written approval from their physician. Making the testing more accessible

might increase recruitment of individuals with limited mobility. For example, study

procedures and walking trials could be conducted at community centres around the city,

eliminating the need to travel a long distance or to an unfamiliar location. Additionally,

individuals responded well to recruitment from familiar sources (e.g., activity coaches,

community program leaders, SFU Alumni). Targeted advertisement and promotion from

these sources may lead to more successful recruitment.

3.2. Additional Challenges and Limitations

Our study was conducted in a controlled setting, and we did not test monitors

during activities common amongst older adults other than walking, such as household

chores (e.g., cooking, cleaning), or leisure activities (e.g., knitting, gardening). It is

important to understand the amount of activity detected during these types of activities.

Since step counts are not equivalent to all the physical activity one performs in a day,

there is a desire for devices to measure all physical activity using other metrics (e.g., minutes

of MVPA) but at the same time not produce false positives (i.e., overcounting steps). False

positives often occur in these monitors during non-stepping activities which can affect the

quantification of physical activity.100 Additionally, Nelson et al. found that activity monitors

perform well when worn by adults (18 to 90 years) during sedentary and walking activities,

but perform poorly (larger step count errors) during cycling, stair walking, and performing

household activities.70 In the future, studies should evaluate these devices during non-

walking activities within a community-dwelling older adult population.

The six activity monitors we evaluated all undercounted steps on average;

however, it was unclear why these monitors missed steps. Each activity monitor was

outfitted with a triaxial accelerometer (or piezoelectric pedometer: NL-1000). Companies

employ algorithms that specify acceleration thresholds to detect motion patterns that

resemble walking.49 Therefore, if the acceleration of the hip or wrist does not meet this

defined threshold, then a step will not be recorded.49 Research suggests that if hip

acceleration is slow, the predefined required threshold might not be met, thus resulting in

missed steps (undercounting).30,54 In our interrupted walking, participants were

decelerating and accelerating throughout when stopping or starting, avoiding obstacles,

and/or making turns. Often, the participants’ steps were slow and stride length decreased

during these tasks, which possibly resulted in low wrist and hip accelerations. Generally,

our population did not walk particularly slowly97 in any of the walking trials; however, we

did find that some wrist-worn monitors had significantly higher mean step count percent

errors than hip-worn devices. This observation was consistent with those of Tudor-Locke

and colleagues.94 They compared the ActiGraph worn at the hip and wrist during treadmill

walking.94 Through treadmill testing, they found the wrist-worn monitor detected fewer

steps than the hip-worn monitor.94 Another consideration is whether the monitor software

updates adjust the algorithms and detection thresholds, which would cause changes to

recorded step counts.
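
To illustrate how an acceleration threshold can lead to missed steps, the sketch below counts upward threshold crossings in a simulated signal. It is a deliberately simplified, hypothetical detector: commercial algorithms are proprietary and more elaborate, and the signal values and threshold here are invented for illustration.

```python
def count_steps(acceleration, threshold):
    """Count upward crossings of `threshold` in an acceleration signal.

    `acceleration` is a sequence of magnitude samples (arbitrary units).
    Each time the signal rises from below the threshold to at or above it,
    one step is counted.
    """
    steps = 0
    above = False
    for sample in acceleration:
        if sample >= threshold and not above:
            steps += 1
        above = sample >= threshold
    return steps


# Hypothetical signals: four steps taken with brisk and with slow accelerations.
brisk = [0.1, 1.4, 0.2, 1.5, 0.1, 1.3, 0.2, 1.6, 0.1]
slow = [0.1, 0.9, 0.2, 1.1, 0.1, 0.8, 0.2, 1.2, 0.1]
print(count_steps(brisk, threshold=1.0))
print(count_steps(slow, threshold=1.0))
```

In this toy example all four brisk steps are counted, whereas two of the four slow steps fall below the threshold and are missed, mirroring the undercounting mechanism described above.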

As discussed, activity monitors can estimate a number of outcomes such as

MVPA. However, our study only analysed the reliability and validity of step counting.

Assessment of validity and reliability of activity duration and energy expenditure estimation

has produced variable results in adults.36 Straiton’s review of studies with older adults identified

only two studies that evaluated physical activity duration estimated by activity monitors,

and these again reported conflicting results.38 It is also important to note that these devices

are most commonly designed with healthy adults in mind, so the algorithm thresholds that

define a step or MVPA are calibrated to healthy adults rather than to older

adults.38,101

Another challenge of conducting studies with activity monitors is the inability to

keep up with the rapidly changing market. Of the six devices tested, only three models are still

in production: the Garmin vívofit 2, Misfit Shine, and New-Lifestyles NL-1000. Since the

monitors tested in our study were purchased, Jawbone has gone out of business and Fitbit

has released at least 10 new activity monitors. Fortunately, we now have a practical

method of testing the reliability and validity of these devices in a population of older adults

that can be implemented in the future.

3.3. Knowledge Translation

The results of the STRIDES Study have been communicated in various formats. I

presented results as a poster at the Canadian Society for Exercise Physiology conference

in Victoria, British Columbia, October 2016. In July 2017, I attended the 21st International

Association of Gerontology and Geriatrics (IAGG) World Congress of Gerontology and

Geriatrics in San Francisco, California where I participated in a poster session102 and the

Technology and Aging Day Minute Madness: Breaking Technology Research from New

Investigators. Most recently, in October 2018, I presented a poster of the results at the

Canadian Association of Gerontology conference in Vancouver, British Columbia.

Additionally, the results have been presented in seminars and poster format internally at

SFU. In the future, study participants will be provided with a lay summary of the results.

Additionally, we are planning to submit the study as a manuscript for publication.

3.4. Future Directions

Our study revealed a high test-retest reliability and criterion validity of the Fitbit

One, but other features must also be considered when deciding which monitor to use in a

study. Researchers must consider the adherence and feasibility of using the device: Is an

older adult willing to wear a device on their hip daily? Older adults have numerous opinions

on these monitors and some have indicated they feel less comfortable with a device if it

clips on to clothing out of fear of losing or breaking the monitor. Additionally, studies have

reported that both adult and older adult participants have lost hip-worn monitors.39,103–105

In our study, participants most often preferred two wrist-worn monitors (Garmin vívofit 2

and Fitbit Charge). Future studies that survey older adults could be useful to determine

which monitors and features are ideal for this population and to help researchers choose between

available monitors.

Given the low physical activity levels of older adults9 and general acceptance of

activity monitors among older adults,35 companies may be inclined to target older adults

with specifically designed devices. In addition to using algorithms appropriate for the age

group,106 other researchers have suggested providing users with a simple paper-based

manual, a more familiar medium than a web-based one, and using large, high-contrast

text on the monitor and corresponding application.35 Also, waterproof monitors

might be ideal for older adults because of how common water-based activities are for

leisure and exercise and to eliminate the worry of damaging the monitor by forgetting to

take it off when bathing or washing the dishes.35 Our study participants indicated that a

waterproof device was desirable.

The feasibility of future physical activity interventions can be improved by

implementing reliable and valid activity monitors, such as the Fitbit One, as measures of

step counting. Moreover, by using a reliable and valid monitor and reducing the

dependency on self-report measures, older adult physical activity studies will achieve

more accurate conclusions. The ability to detect changes in activity levels (step counts),

assess adherence to a program, and quantify daily physical activity patterns will improve,

ultimately enhancing the implementation and comparison of future physical activity

interventions. Furthermore, with improved intervention strategies, an increase in

levels of physical activity among older adults may occur. Improving physical activity levels

is of great importance in this population because of their high risk for premature death and

a range of chronic diseases.12,20–22 While there are many benefits to uncovering a reliable

and valid activity monitor for the community-dwelling older adult population, it is important

to remember that these activity monitors should only be used in study populations where

they have been deemed reliable and valid.

3.5. Conclusions

The STRIDES Study provides new information about the use of consumer-grade activity monitors in older adults. We assessed six consumer-grade activity monitors (Fitbit Charge, Fitbit One, Garmin vívofit 2, Jawbone UP2, Misfit Shine, and New-Lifestyles NL-1000) and aimed to determine how the test-retest reliability and criterion validity of step counting varied across these monitors and how they were affected by the presence of self-reported mobility limitations in community-dwelling older adults during over-ground walking. Additionally, we aimed to evaluate the effect of interruptions on criterion validity in the same population. Consistent with previous studies, reliability and validity were variable in hip- and wrist-worn consumer-grade activity monitors in controlled settings.30,54,69 Specifically, test-retest reliability and criterion validity varied across the six activity monitors. We also found that validity was affected by walking interruptions: criterion validity was poorer when interruptions were incorporated into a 400-metre walk. The Fitbit One monitor excelled in terms of both reliability and validity, and, in general, hip-worn monitors outperformed wrist-worn devices. Furthermore, the self-reported mobility status of the participants did not significantly affect the error of the monitors. The results of the STRIDES Study contribute to the growing literature on consumer-grade activity monitors in the older adult community.

References

1. de Vet HC, Terwee CB, Mokkink LB, Knol DL. Measurement in Medicine. New York: Cambridge University Press; 2011.

2. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737-745. doi:10.1016/j.jclinepi.2010.02.006

3. Guralnik JM, Ferrucci L, Simonsick EM, Salive ME, Wallace RB. Lower-Extremity Function in Persons over the Age of 70 Years as a Predictor of Subsequent Disability. N Engl J Med. 1995;332(9):556-562. doi:10.1056/NEJM199503023320902

4. Simonsick EM, Newman AB, Visser M, et al. Mobility limitation in self-described well-functioning older adults: importance of endurance walk testing. J Gerontol A Biol Sci Med Sci. 2008;63A(8):841-847. doi:10.1093/gerona/63.8.841

5. Wolinsky FD, Miller DK, Andresen EM, Malmstrom TK, Miller JP. Further evidence for the importance of subclinical functional limitation and subclinical disability assessment in gerontology and geriatrics. J Gerontol B Psychol Sci Soc Sci. 2005;60B(3):S146-S151.

6. United Nations Department of Economic and Social Affairs Population Division. World Population Ageing.; 2015. http://www.un.org/en/development/desa/population/publications/pdf/ageing/WPA2015_Report.pdf. Accessed May 28, 2018.

7. United Nations Department of Economic and Social Affairs Population Division. World Population Prospects: The 2015 Revision, Key Findings and Advance Tables.; 2015. https://esa.un.org/unpd/wpp/publications/files/key_findings_wpp_2015.pdf. Accessed May 28, 2018.

8. Paterson DH, Warburton DE. Physical activity and functional limitations in older adults: a systematic review related to Canada’s Physical Activity Guidelines. Int J Behav Nutr Phys Act. 2010;7:38. doi:10.1186/1479-5868-7-38

9. Statistics Canada. Directly Measured Physical Activity of Adults, 2012 and 2013.; 2015. http://www.statcan.gc.ca/pub/82-625-x/2015001/article/14135-eng.htm. Accessed November 15, 2016.

10. King AC, Rejeski WJ, Buchner DM. Physical activity interventions targeting older adults. A critical review and recommendations. Am J Prev Med. 1998;15(4):316-333. doi:10.1016/S0749-3797(98)00085-3

11. Hoffman C, Rice D, Sung H-Y. Persons With Chronic Conditions. Their Prevalence and Costs. JAMA. 1996;276(18):1473-1479. doi:10.1001/jama.1996.03540180029029

12. Warburton DE, Charlesworth S, Ivey A, Nettlefold L, Bredin SS. A systematic review of the evidence for Canada’s Physical Activity Guidelines for Adults. Int J Behav Nutr Phys Act. 2010;7:39. doi:10.1186/1479-5868-7-39

13. Cyarto EV, Moorhead GE, Brown WJ. Updating the evidence relating to physical activity intervention studies in older people. J Sci Med Sport. 2004;7(1):30-38. doi:10.1016/S1440-2440(04)80275-5

14. Conn VS, Valentine JC, Cooper HM. Interventions to Increase Physical Activity Among Aging Adults: A Meta-Analysis. Ann Behav Med. 2002;24(3):190-200. doi:10.1207/S15324796ABM2403

15. Conn VS, Minor MA, Burks KJ, Rantz MJ, Pomeroy SH. Integrative Review of Physical Activity Intervention Research with Aging Adults. J Am Geriatr Soc. 2003;51(8):1159-1168. doi:10.1046/j.1532-5415.2003.51365.x

16. Müller AM, Khoo S. Non-face-to-face physical activity interventions in older adults: a systematic review. Int J Behav Nutr Phys Act. 2014;11(1):35. doi:10.1186/1479-5868-11-35

17. Ashworth NL, Chad KE, Harrison EL, Reeder BA, Marshall SC. Home versus center based physical activity programs in older adults. Cochrane Database Syst Rev. 2005;(1). doi:10.1002/14651858.CD004017.pub2

18. Van Der Bij AK, Laurant MGH, Wensing M. Effectiveness of physical activity interventions for older adults. A review. Am J Prev Med. 2002;22(2):120-133. doi:10.1016/S0749-3797(01)00413-5

19. Falck RS, McDonald SM, Beets MW, Brazendale K, Liu-Ambrose T. Measurement of physical activity in older adult interventions: a systematic review. Br J Sports Med. 2016;50:464-470. doi:10.1136/bjsports-2014-094413

20. Canadian Society for Exercise Physiology. Canadian Physical Activity Guidelines. For Older Adults - 65 Years & Older.; 2011.

21. Tremblay MS, Warburton DER, Janssen I, et al. New Canadian physical activity guidelines. Appl Physiol Nutr Metab. 2011;36(1):36-46. doi:10.1139/H11-009

22. Warburton DER, Katzmarzyk PT, Rhodes RE, Shephard RJ. Evidence-informed physical activity guidelines for Canadian adults. Can J Public Heal. 2007;98:Suppl 2:S16-68.

23. Statistics Canada. Leisure-Time Physical Activity.; 2013. https://www.statcan.gc.ca/pub/82-229-x/2009001/deter/lpa-eng.htm. Accessed May 29, 2018.

24. Tudor-Locke C, Craig CL, Aoyagi Y, et al. How many steps/day are enough? For older adults and special populations. Int J Behav Nutr Phys Act. 2011;8:80. doi:10.1186/1479-5868-8-80

25. Marshall SJ, Levy SS, Tudor-Locke CE, et al. Translating Physical Activity Recommendations into a Pedometer-Based Step Goal. 3000 Steps in 30 Minutes. Am J Prev Med. 2009;36(5):410-415. doi:10.1016/j.amepre.2009.01.021

26. Janssen I. Health care costs of physical inactivity in Canadian adults. Appl Physiol Nutr Metab. 2012;37(4):803-806. doi:10.1139/h2012-061

27. Tudor-Locke C, Schuna JM, Barreira T V., et al. Normative steps/day values for older adults: NHANES 2005-2006. Journals Gerontol - Ser A Biol Sci Med Sci. 2013;68(11):1426-1432. doi:10.1093/gerona/glt116

28. Baker PRA, Francis DP, Soares J, Weightman AL, Foster C. Community wide interventions for increasing physical activity. Cochrane Database Syst Rev. 2015;(1). doi:10.1002/14651858.CD008366.pub3

29. Helmerhorst HJ, Brage S, Warren J, Besson H, Ekelund U. A systematic review of reliability and objective criterion-related validity of physical activity questionnaires. Int J Behav Nutr Phys Act. 2012;9:103. doi:10.1186/1479-5868-9-103

30. Lauritzen J, Munoz A, Luis Sevillano J, Civit A. The usefulness of activity trackers in elderly with reduced mobility: a case study. Stud Health Technol Inform. 2013;192:759-762. doi:10.3233/978-1-61499-289-9-759

31. Takacs J, Pollock CL, Guenther JR, Bahar M, Napier C, Hunt MA. Validation of the Fitbit One activity monitor device during treadmill walking. J Sci Med Sport. 2014;17(5):496-500. doi:10.1016/j.jsams.2013.10.241

32. Ferguson T, Rowlands AV, Olds T, Maher C. The validity of consumer-level, activity monitors in healthy adults worn in free-living conditions: a cross-sectional study. Int J Behav Nutr Phys Act. 2015;12:42. doi:10.1186/s12966-015-0201-9

33. Szanton SL, Walker RK, Roberts L, et al. Older adults’ favorite activities are resoundingly active: Findings from the NHATS study. Geriatr Nurs (Minneap). 2015;36(2):131-135. doi:10.1016/j.gerinurse.2014.12.008

34. Tudor-Locke C, Lutes L. Why do pedometers work? A reflection upon the factors related to successfully increasing physical activity. Sport Med. 2009;39(12):981-993. doi:10.2165/11319600-000000000-00000

35. Mercer K, Giangregorio L, Schneider E, Chilana P, Li M, Grindrod K. Acceptance of Commercially Available Wearable Activity Trackers Among Adults Aged Over 50 and With Chronic Illness: A Mixed-Methods Evaluation. JMIR mHealth uHealth. 2016;4(1):e7. doi:10.2196/mhealth.4225

36. Evenson KR, Goto MM, Furberg RD. Systematic review of the validity and reliability of consumer-wearable activity trackers. Int J Behav Nutr Phys Act. 2015;12:159. doi:10.1186/s12966-015-0314-1

37. Ubrani J, Llamas R, Shirer M. Global Wearables Market Grows 7.7% in 4Q17 and 10.3% in 2017 as Apple Seizes the Leader Position, Says IDC. International Data Corporation. https://www.idc.com/getdoc.jsp?containerId=prUS43598218. Published 2018. Accessed May 30, 2018.

38. Straiton N, Alharbi M, Bauman A, et al. The validity and reliability of consumer-grade activity trackers in older, community-dwelling adults: a systematic review. Maturitas. 2018;112:85-93. doi:10.1016/j.maturitas.2018.03.016

39. Cadmus-Bertram LA, Marcus BH, Patterson RE, Parker BA, Morey BL. Randomized Trial of a Fitbit-Based Physical Activity Intervention for Women. Am J Prev Med. 2015;49(3):414-418. doi:10.1016/j.amepre.2015.01.020

40. Bentley F, Tollmar K, Stephenson P, et al. Health Mashups: Presenting Statistical Patterns between Wellbeing Data and Context in Natural Language to Promote Behavior Change. ACM Trans Comput Interact. 2013;20(5):1-27. doi:10.1145/2503823

41. Kurti AN, Dallery J. Internet-based contingency management increases walking in sedentary adults. J Appl Behav Anal. 2013;46(3):568-581. doi:10.1002/jaba.58

42. Washington WD, Banna KM, Gibson AL. Preliminary efficacy of prize-based contingency management to increase activity levels in healthy adults. J Appl Behav Anal. 2014;47(2):231-245. doi:10.1002/jaba.119

43. Thompson WG, Kuhle CL, Koepp GA, McCrady-Spitzer SK, Levine JA. “Go4Life” exercise counseling, accelerometer feedback, and activity levels in older people. Arch Gerontol Geriatr. 2014;58(3):314-319. doi:10.1016/j.archger.2014.01.004

44. Wang JB, Cadmus-Bertram LA, Natarajan L, et al. Wearable Sensor/Device (Fitbit One) and SMS Text-Messaging Prompts to Increase Physical Activity in Overweight and Obese Adults: A Randomized Controlled Trial. Telemed J E Health. 2015;21(10):782-792. doi:10.1089/tmj.2014.0176

45. Farina N, Lowry RG. The Validity of Consumer-Level Activity Monitors in Healthy Older Adults in Free-Living Conditions. J Aging Phys Act. 2018;26:128-135. doi:10.1123/japa.2016-0344

46. Alharbi M, Bauman A, Neubeck L, Gallagher R. Validation of Fitbit-Flex as a measure of free-living physical activity in a community-based phase III cardiac rehabilitation population. Eur J Prev Cardiol. 2016;23(14):1476-1485. doi:10.1177/2047487316634883

47. Garatachea N, Luque GT, Gallego JG. Physical activity and energy expenditure measurements using accelerometers in older adults. Nutr Hosp. 2010;25(2):224-230. doi:10.3305/nh.2010.25.2.4439

48. Schrack JA, Cooper R, Koster A, et al. Assessing Daily Physical Activity in Older Adults: Unraveling the Complexity of Monitors, Measures, and Methods. J Gerontol A Biol Sci Med Sci. 2016;71(8):1039-1048. doi:10.1093/gerona/glw026

49. Fitbit. How does my Fitbit device calculate my daily activity? http://help.fitbit.com/articles/en_US/Help_article/1141/?q=accelerometer&l=en_US&fs=Search&pn=1. Published 2018. Accessed June 6, 2018.

50. Hopkins WG. Measures of reliability in sports medicine and science. Sport Med. 2000;30(1):1-15. doi:10.2165/00007256-200030050-00006

51. Stahl S, Patrick JH. Older adults’ daily activity and everyday memory: a 10-day diary study using Fitbit® trackers. In: The Gerontological Society of America 64th Annual Scientific Meeting.; 2011:39.

52. Meltzer LJ, Hiruma LS, Avis K, Montgomery-Downs H, Valentin J. Comparison of a Commercial Accelerometer with Polysomnography and Actigraphy in Children and Adolescents. Sleep. 2015;38(8):1323-1330. doi:10.5665/sleep.4918

53. de Zambotti M, Baker FC, Colrain IM. Validation of Sleep-Tracking Technology Compared with Polysomnography in Adolescents. Sleep. 2015;38(9):1461-1468. doi:10.5665/sleep.4990

54. Simpson LA, Eng JJ, Klassen TD, et al. Capturing step counts at slow walking speeds in older adults: comparison of ankle and waist placement of measuring device. J Rehabil Med. 2015;47(9):830-835. doi:10.2340/16501977-1993

55. Gusmer RJ, Bosch TA, Watkins AN, Ostrem JD, Dengel DR. Comparison of FitBit® Ultra to ActiGraph™ GT1M for Assessment of Physical Activity in Young Adults During Treadmill Walking. Open Sport Med J. 2014;8(1):11-15. doi:10.2174/1874387001408010011

56. Diaz KM, Krupka DJ, Chang MJ, et al. Fitbit®: An accurate and reliable device for wireless physical activity tracking. Int J Cardiol. 2015;185:138-140. doi:10.1016/j.ijcard.2015.03.038

57. Noah J, Spierer DK, Gu J, Bronner S. Comparison of steps and energy expenditure assessment in adults of Fitbit Tracker and Ultra to the Actical and indirect calorimetry. J Med Eng Technol. 2013;37(7):456-462. doi:10.3109/03091902.2013.831135

58. Case MA, Burwick HA, Volpp KG, Patel MS. Accuracy of Smartphone Applications and Wearable Devices for Tracking Physical Activity Data. JAMA. 2015;313(6):625-626. doi:10.1001/jama.2014.17841

59. Mammen G, Gardiner S, Senthinathan A, McClemont L, Stone M, Faulkner G. Is this Bit Fit? Measuring the Quality of the Fitbit Step-Counter. Heal Fit J Canada. 2012;5(4):30-39.

60. Stackpool C, Porcari JP, Mikat RP, Gillette C, Foster C. The accuracy of various activity trackers in estimating steps taken and energy expenditure. J Fit Res. 2014;3(3):32-48. doi:10.1017/CBO9781107415324.004

61. Storm FA, Heller BW, Mazza C. Step detection and activity recognition accuracy of seven physical activity monitors. PLoS One. 2015;10(3):e0118723. doi:10.1371/journal.pone.0118723

62. Tully MA, McBride C, Heron L, Hunter RF. The validation of Fibit Zip™ physical activity monitor as a measure of free-living physical activity. BMC Res Notes. 2014;7:952. doi:10.1186/1756-0500-7-952

63. Dontje ML, de Groot M, Lengton RR, van der Schans CP, Krijnen WP. Measuring steps with the Fitbit activity tracker: an inter-device reliability study. J Med Eng Technol. 2015;39(5):286-290. doi:10.3109/03091902.2015.1050125

64. Lee JM, Kim YY, Welk GJ. Validity of consumer-based physical activity monitors. Med Sci Sport Exerc. 2014;46(9):1840-1848. doi:10.1249/MSS.0000000000000287

65. Montgomery-Downs HE, Insana SP, Bond JA. Movement toward a novel activity monitoring device. Sleep Breath. 2012;16(3):913-917. doi:10.1007/s11325-011-0585-y

66. Feehan LM, Geldman J, Sayre EC, et al. Accuracy of Fitbit devices: Systematic review and narrative syntheses of quantitative data. JMIR mHealth uHealth. 2018;6(8):e10527. doi:10.2196/10527

67. Ko SU, Hausdorff JM, Ferrucci L. Age-associated differences in the gait pattern changes of older adults during fast-speed and fatigue conditions: results from the Baltimore longitudinal study of ageing. Age Ageing. 2010;39(6):688-694. doi:10.1093/ageing/afq113

68. Boeselt T, Spielmanns M, Nell C, et al. Validity and Usability of Physical Activity Monitoring in Patients with Chronic Obstructive Pulmonary Disease (COPD). PLoS One. 2016;11(6):e0157229. doi:10.1371/journal.pone.0157229

69. Floegel TA, Florez-Pregonero A, Hekler EB, Buman MP. Validation of Consumer-Based Hip and Wrist Activity Monitors in Older Adults With Varied Ambulatory Abilities. J Gerontol A Biol Sci Med Sci. 2017;72(2):229-236. doi:10.1093/gerona/glw098

70. Nelson MB, Kaminsky LA, Dickin DC, Montoye AHK. Validity of Consumer-Based Physical Activity Monitors for Specific Activity Types. Med Sci Sports Exerc. 2016;48(8):1619-1628. doi:10.1249/MSS.0000000000000933

71. Paul SS, Tiedemann A, Hassett LM, et al. Validity of the Fitbit activity tracker for measuring steps in community-dwelling older adults. BMJ Open Sport Exerc Med. 2015;1:e000013. doi:10.1136/bmjsem-2015-000013

72. Thorup CB, Andreasen JJ, Sørensen EE, Grønkjær M, Dinesen BI, Hansen J. Accuracy of a step counter during treadmill and daily life walking by healthy adults and patients with cardiac disease. BMJ Open. 2017;7(3):e011742. doi:10.1136/bmjopen-2016-011742

73. Kurtzke JF. Rating neurologic impairment in multiple sclerosis: An expanded disability status scale (EDSS). Neurology. 1983;33(11):1444-1452. doi:10.1212/WNL.33.11.1444

74. Pedersen BK. The diseasome of physical inactivity - and the role of myokines in muscle-fat cross talk. J Physiol. 2009;587(23):5559-5568. doi:10.1113/jphysiol.2009.179515

75. Katzmarzyk PT, Janssen I. The economic costs associated with physical inactivity and obesity in Canada: an update. Can J Appl Physiol. 2004;29(1):90-115.

76. Statistics Canada. Disability in Canada : Initial Findings from the Canadian Survey on Disability.; 2013. http://www.statcan.gc.ca/pub/89-654-x/89-654-x2013002-eng.htm. Accessed December 21, 2016.

77. Guralnik JM, Simonsick EM, Ferrucci L, et al. A short physical performance battery assessing lower extremity function: association with self-reported disability and prediction of mortality and nursing home admission. J Gerontol. 1994;49(2):M85-94. doi:10.1093/geronj/49.2.M85

78. Boisvert-Vigneault K, Payette H, Audet M, Gaudreau P, Bélanger M, Dionne IJ. Relationships between physical activity across lifetime and health outcomes in older adults: Results from the NuAge cohort. Prev Med (Baltim). 2016;91:37-42. doi:10.1016/j.ypmed.2016.07.018

79. Colley RC, Garriguet D, Janssen I, Craig CL, Clarke J, Tremblay MS. Physical activity of Canadian adults: Accelerometer results from the 2007 to 2009 Canadian Health Measures Survey. Heal Reports. 2011;22(1):Statistics Canada, Catalogue no. 82-003-XPE. http://www.statcan.gc.ca/pub/82-003-x/2011001/article/11396-eng.htm.

80. Loprinzi PD. Light-Intensity Physical Activity and All-Cause Mortality. Am J Heal Promot. 2017;31(4):340-342. doi:10.4278/ajhp.150515-ARB-882

81. Fulk GD, Combs SA, Danks KA, Nirider CD, Raja B, Reisman DS. Accuracy of 2 Activity Monitors in Detecting Steps in People With Stroke and Traumatic Brain Injury. Phys Ther. 2014;94(2):222-229. doi:10.2522/ptj.20120525

82. Nasreddine ZS. MoCA Montreal - Cognitive Assessment. MoCA Montreal - Cognitive Assessment - FAQS. http://www.mocatest.org/. Published 2016. Accessed December 20, 2016.

83. Nasreddine ZS, Phillips NA, Bédirian V, et al. The Montreal Cognitive Assessment, MoCA: A Brief Screening Tool For Mild Cognitive Impairment. J Am Geriatr Soc. 2005;53(4):695-699. doi:10.1111/j.1532-5415.2005.53221.x

84. Canadian Society for Exercise Physiology. PAR-Q+ The Physical Activity Readiness Questionnaire for Everyone. Sept 12. 2011:1-4. http://www.csep.ca/cmfiles/publications/parq/parqplussept2011version_all.pdf. Accessed November 13, 2016.

85. Groll DL, To T, Bombardier C, Wright JG. The development of a comorbidity index with physical function as the outcome. J Clin Epidemiol. 2005;58(6):595-602. doi:10.1016/j.jclinepi.2004.10.018

86. Amazon. Fitbit One Wireless Activity Plus Sleep Tracker, Black. https://www.amazon.com/Fitbit-Wireless-Activity-Sleep-Tracker/dp/B0095PZHPE?th=1. Accessed October 24, 2018.

87. Vivid Living. Top 15 Sleep Gadgets and Apps for Better Sleep. http://www.vividliving.com/blog/2015/7/top-15-gadgets-and-app-for-better-sleep. Published 2015. Accessed October 24, 2018.

88. New Lifestyles. NL-1000 Accelerometer. http://dgncu.rmams.servertrust.com/NL-1000-Accelerometer-p/nl-1000.htm. Accessed October 24, 2018.

89. Amazon. Fitbit Charge Wireless Activity Wristband, Black, Large. https://www.amazon.com/Fitbit-Charge-Wireless-Activity-Wristband/dp/B00N2BVOUE?th=1. Accessed October 24, 2018.

90. Garmin. vívofit® 2. https://buy.garmin.com/en-CA/CA/p/504038. Accessed October 24, 2018.

91. Souq. JAWBONE UP2 Activity Tracker Wristband Adjustable one size - Black. https://uae.souq.com/ae-en/jawbone-up2-activity-tracker-wristband-adjustable-one-size-black-8121722/i/. Published 2018. Accessed October 24, 2018.

92. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8(2):135-160. doi:10.1191/096228099673819272

93. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Int J Nurs Stud. 2010;47(8):931-936. doi:10.1016/j.ijnurstu.2009.10.001

94. Tudor-Locke C, Barreira T V., Schuna JM. Comparison of step outputs for waist and wrist accelerometer attachment sites. Med Sci Sports Exerc. 2015;47(4):839-842. doi:10.1249/MSS.0000000000000476

95. Parvataneni K, Ploeg L, Olney SJ, Brouwer B. Kinematic, kinetic and metabolic parameters of treadmill versus overground walking in healthy older adults. Clin Biomech. 2009;24(1):95-100. doi:10.1016/j.clinbiomech.2008.07.002

96. Perera S, Mody SH, Woodman RC, Studenski SA. Meaningful change and responsiveness in common physical performance measures in older adults. J Am Geriatr Soc. 2006;54(5):743-749. doi:10.1111/j.1532-5415.2006.00701.x

97. Studenski S, Perera S, Patel K, et al. Gait Speed and Survival in Older Adults. J Am Med Assoc. 2011;305(1):50-58. doi:10.1001/jama.2010.1923

98. Guralnik JM, Ferrucci L, Pieper CF, et al. Lower extremity function and subsequent disability: Consistency across studies, predictive models, and value of gait speed alone compared with short physical performance battery. Journals Gerontol Med Sci. 2000;55A(4):M221-M231.

99. Fielding RA, Rejeski WJ, Blair S, et al. The Lifestyle Interventions and Independence for Elders Study: Design and Methods. Journals Gerontol - Ser A Biol Sci Med Sci. 2011;66A(11):1226-1237. doi:10.1093/gerona/glr123

100. O’Connell S, ÓLaighin G, Quinlan LR. When a Step Is Not a Step! Specificity Analysis of Five Physical Activity Monitors. PLoS One. 2017;12(1):e0169616. doi:10.1371/journal.pone.0169616

101. Piwek L, Ellis DA, Andrews S, Joinson A. The Rise of Consumer Health Wearables: Promises and Barriers. PLoS Med. 2016;13(2):e1001953. doi:10.1371/journal.pmed.1001953

102. Maganja SA, Donaldson MD, Mackey DC. Step count accuracy and reliability from activity monitors worn by mobility-intact older adults. In: Innovation in Aging. Vol 1. San Francisco; :402.

103. Guo F, Li Y, Kankanhalli MS, Brown MS. An Evaluation of Wearable Activity Monitoring Devices. In: Proceedings of the 1st ACM International Workshop on Personal Data Meets Distributed Multimedia.; 2013:31-34. doi:10.1145/2509352.2512882

104. Harrison D, Marshall P, Berthouze N, Bird J. Tracking Physical Activity: Problems Related to Running Longitudinal Studies with Commercial Devices. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication.; 2014:699-702. doi:10.1145/2638728.2641320

105. Stahl ST, Insana SP. Caloric expenditure assessment among older adults: Criterion validity of a novel accelerometry device. J Health Psychol. 2014;19(11):1382-1387. doi:10.1177/1359105313490771

106. Strath SJ, Rowley TW. Wearables for promoting physical activity. Clin Chem. 2018;64(1):53-63. doi:10.1373/clinchem.2017.272369

Appendix A. STRIDES Participant Information Questionnaire

Appendix B. STRIDES Physical Inactivity Questionnaire

Appendix C. STRIDES Functional Comorbidities Index

Appendix D. STRIDES Device and Technology Survey
