stairstep recognition and counting in a serious game for …aiolli/papers/pucstairstep16.pdf ·...
TRANSCRIPT
Personal and Ubiquitous Computing manuscript No.(will be inserted by the editor)
Stairstep Recognition and Counting in a Serious Game forIncreasing Users’ Physical Activity
Matteo Ciman · Michele Donini · Ombretta Gaggi · Fabio Aiolli
Received: date / Accepted: date
Abstract The high diffusion of smartphones in the
users’ pockets allows to sense their movements, thus
monitoring the amount of physical activity they do dur-
ing the day. But, it also gives the possibility to use these
devices to persuade people to change their behaviors. In
this paper, we present ClimbTheWorld, a serious game
which uses a machine learning based technique to recog-
nize and count stairsteps and aims at persuading people
to use stairs instead of elevators or escalators. We per-
form a fine-grained analysis by exploiting smartphone
sensors to recognize single stairsteps. Energy consump-
tion is widely investigated to avoid exhausting smart-
phone battery. Moreover, we present game appreciation
and persuasive power results after a trial experiment
with 13 participants.
Keywords activity recognition · energy consump-
tion · pervasive and mobile computing · ubiquitous
applications
1 Introduction
According to the World Health Organization, at least
3.2 million people die every year due to heart diseases,
diabetes, cancer and obesity [33]. One of the main causes
of these diseases can be attributed to insufficient phys-
ical activity (both in adults and children): in the last
few decades the technological progress has completely
changed people’s lifestyle, making everyday activities
M. Ciman, M. Donini, O. Gaggi and F. AiolliDepartment of MathematicsVia Trieste, 63, University of Padua, ItalyE-mail: {mciman, mdonini, gaggi, aiolli}@math.unipd.it
M. DoniniComputational Statistics and Machine LearningIstituto Italiano di Tecnologia, Genoa, Italy
much easier. This progress also moved people to a seden-
tary life, thus increasing the occurrence of these diseases
and the medical costs for their treatment.
Unfortunately, the availability of many infrastruc-
tures that reduce movements necessary to reach a des-
tination is not balanced by the presence of tools which
help people to modify unhealthy behaviors, and to avoid
negative consequences of physical inactivity. Since ev-
eryday choices between taking a bus and a short dis-
tance walk, or between stairs and elevator, could greatly
affect the total amount of activity made, we propose as
a solution the use of a serious game, ClimbTheWorld,
to incentivize people to use stairs instead of elevators
or escalators.
ClimbTheWorld is a smartphone application. The
idea underlying the game is simple: the user has to
climb real world buildings, e.g., the Empire State Build-
ing or the Eiffel Tower, climbing stairs during his/her
everyday life. Once started, the game records and ana-
lyzes data from the accelerometer and counts the num-
ber of stairsteps climbed by the user. The use of mobile
technologies in real-time persuasive solutions is a very
good combination since smartphones are commonly avail-
able in the users’ pocket and can invisibly work and help
people to change their unhealthy behavior.
To be efficient, the game must fulfill two require-
ments: it has to count stairsteps without immediately
exhausting the smartphone battery, and to persuade
users to change their behavior. ClimbTheWorld imple-
ments a new method for stairsteps recognition and count-
ing. The game requires a fine-grained classification, that
is a real-time counting of the number of stairsteps along
with a real-time feedback. Moreover, it does not im-
pose any constraints on the smartphone position, and
it does not require people to buy expensive tools (e.g.,
2 Matteo Ciman et al.
the Nike+ FuelBand bracelet1), but it uses only the
smartphone sensors.
To avoid exhausting the smartphone battery, the
whole design and implementation process has taken
into account the energy consumption issue, since sev-
eral studies have shown that energy consumption is a
critical aspect for mobile applications [25,28] and can
influence performances on some user experience metrics
[8]. In particular, battery lifetime is one of the most im-
portant aspects considered by users when dealing with
applications on mobile devices. For this reason, applica-
tions developers must avoid to waste energy since this
can require a user to recharge frequently the smart-
phone. Therefore, data acquisition frequency must be
carefully considered since it is an extremely energy con-
suming task. Similarly, our approach gathers from data
strictly necessary information only, avoiding the use of
a high number of features computations that could lead
the final classification to become too expensive.
We first implemented the game to prove the feasi-
bility of the stairstep counter: some preliminary results
have been presented in [1]. Then we designed the game
to be persuasive, and to be able to engage the users. We
use a smartphone to avoid requiring the user to buy ex-
pensive and bulky devices, since this could be a reason
for not using the game. In fact, the use of smartphones
allows the implementation of a pervasive and ubiquitous
solution. Moreover, with respect to the first tentative
implementation, we have now completely re-designed
the game with new social features: the game can now
use Facebook to save and share results, and to create
some multiplayer game modes. An experiment with a
group of 13 users has shown that the engagement of
friends in the building climbing can increase the num-
ber of stairsteps made by the users. Finally, to improve
persuasiveness of the game we followed the Fogg Be-
haviour Model [15] during the whole design process.
Beside a restyling of the game modes and interface,
other important issues that were not considered in [1]
are considered in this paper. First of all, the dataset and
the size of the training set has been increased. Then, a
data smoothing phase, used to clean data from errors
due to the sensibility and variability of the sensors, has
been added to the application pipeline. Finally, a deep
analysis has been performed to find the most relevant
(and not redundant) features, thus allowing to remove a
huge set of features whose computation would represent
a waste of battery power. All these novelties allowed to
improve results obtained during the first, initial, study.
Other works in literature and commercial apps pro-
pose solutions for activity recognition, e.g., running,
walking or driving. These approaches collect data for
1 http://www.nike.com/us/en_us/c/nikeplus-fuelband
an interval of time, and, after another interval of time
for calculation, they guess which activity was performed
by the user in that period. Our approach is different,
since we aim at counting, in real-time, the number of
stairsteps, which is a more difficult task than recogniz-
ing an activity during a longer time interval. Moreover,
recognizing stairsteps versus simple steps is clearly far
more difficult, since data retrieved by sensors are very
similar for staisteps and steps, as we will discuss in Sec-
tion 4.
Since the required data analysis frequency is ex-
tremely high (each stairstep just requires about 500ms
to be completed), we implemented a data-dependent
window segmentation, instead of the traditional sliding
window with a fixed time duration, and the classifica-
tion is applied only if that window is suitable to repre-
sent a stairstep. In this way, the number of windows to
analyze is reduced and hence the computational cost.
The paper is organized as follows. Section 2 dis-
cusses related works. Section 3 presents all the princi-
ples behind the design and implementation of the seri-
ous game. From Section 4 to Section 8 we present all the
technical details of the system, following the path from
data acquired with smartphone sensors to the classifica-
tion output. In Section 9 we present the experimental
results, in terms of precision of stairsteps recognition
and energy consumption. Section 10 describes results
of an experiment with 13 participants. We finally con-
clude in Section 11.
2 Background and Related Works
2.1 Activity recognition
Before the usage of smartphone sensors to perform ac-
tivity recognition (and in particular step counting), an
analysis of performances reached by pedometer dur-
ing stair climbing was made by Ayabe et al. [6]. The
purpose of their experiments was to understand how
well pedometers perform during stair climbing and de-
scending. They evaluated three different commercial pe-
dometers and different stepping rate (from 40 to 120
steps·min−1). Although they do not distinguish between
steps and stairsteps, results showed that pedometer can
assess stairstep counting within an error rate of ±5%,
being a great tool to count number of stairsteps (and
steps) made by each user.
Maurer et al. [23] created eWatch, a monitoring sys-
tem based on the usage of an accelerometer to classify
different user activities: running, sitting, climbing stairs
and walking. eWatch can be used in six fixed different
positions of the body, and data is acquired for each of
these positions. Even in this case, the authors stated
Stairstep Recognition and Counting in a Serious Game for Increasing Users’ Physical Activity 3
that the more difficult classification is to distinguish
between stairs and steps.
Thanks to the increasing performances of smart-
phones and their ability to acquire lots of data from
the surrounded environment, mobile applications using
data from sensors for activity recognition and better
lifestyle promoting, received considerable interest from
the research community.
A first tentative to use a mobile application to recog-
nize user activity can be find in [20]. The authors used a
cellphone augmented with a Bluetooth-connected sen-
sor board. The system works for only three fixed posi-
tions of the cellphone, the ones considered by the au-
thors asthe most common positions. The authors col-
lected a total of 651 features, but this approach makes
their computation and the following classification step
computationally very expansive.
Anjum and Ilyas [4] developed an application for on-
line activity recognition like walking, running, climbing
stairs, descending stairs, cycling, driving, and remain-
ing inactive. They provided an analysis of 5-seconds
length windows using several classification algorithms
like k-NN, Naive Bayes, Decision Trees, and Support
Vector Machines. They collected data from the acce-
lerometer, the gyroscope, and the GPS. The accuracy
of the results ranged between 79% (using Support Vec-
tor Machine) and 94% (Decision Tree). Their analysis
also showed that data retrieved from the gyroscope do
not provide any useful information, therefore they re-
moved all the features related to this sensor. Moreover,
they stated that the recognition of stairs climbing and
descending is a really difficult task, achieving a much
lower accuracy, and often stairs climbing is confused
with walking (accuracy of 84.6% for going upstairs,
90.5% for going downstairs).
Shoaib et al. [30] analyzed activity recognition using
accelerometer, gyroscope, and magnetometer, alone or
combined together, at a frequency of 50Hz. They con-
sidered four different positions of the smartphone (arm,
belt, pocket and wrist). They showed that the magne-
tometer, both alone or in combination with other sen-
sors, performs poorly since it causes overfitting for the
classifier. Moreover, they found that the accelerometer
performs better than the gyroscope in recognizing the
six different activities, but, in contrast with the results
provided by Anjum and Ilvas [4], this work showed that
the combination of accelerometer and gyroscope data
performs better than the two taken individually.
Brajdic and Harle [9] discussed about walk detec-
tion and step counting algorithms with data acquired
from a smartphone; the user can choose to carry the
smartphone in six different positions. This work dis-
cusses some issues in common with our approach, but
in our case we count stairsteps (so we need to discrim-
inate between steps and stairsteps which is a more dif-
ficult task). The authors evaluated several algorithms
using a dataset of 130 walks traces from 27 different
users, getting that most of the algorithms are able to
detect walking within a trace that contains only walk-
ing and idle periods, with a best median error of 1.3
using thresholding technique, but no algorithm is 100%
reliable.
Wu et al. [34] analyzed activity recognition using
an iPod Touch, acquiring data from the accelerometer
and the gyroscope. They recognize activity like sitting,
walking, jogging, going upstairs and downstairs. They
extracted both time, frequency and Fast Fourier Trans-
form features. They make data analysis offline, using
a 2-second window size with data acquisition at 30Hz.
They achieved the best results using k-NN. When sit-
ting, walking and jogging, the accuracy is very high
(90.1%-94.1%), while up and down stair walking is clas-
sified with a lower accuracy (52.3%-79.4%).
2.2 Persuasive mobile applications
Several serious games and applications have been de-
veloped to promote better lifestyle using smartphone
activity recognition.
Ubifit Garden [11], one of the first mobile persuasion
system for improving physical wellbeing, uses the wall-
paper of mobile phones to dynamically provide feed-
back about the different types of physical exercises per-
formed by the user. BeWell [19] is a smartphone ap-
plication that aims at monitoring user behavior and
wants to promote wellbeing. The application continu-
ously tracks activities that impact several aspects of
daily life of people, like social interactions or physical
activity, using smartphone’s sensor, and provides sev-
eral feedback to promote better health. The application
is based on a serious game and displays the “state” of
the user as an aquatic ecosystem that changes behavior
of different animals depending on the changes in user
wellbeing. The entire system is based on a cloud in-
frastructure used to analyze the data acquired with the
smartphone and to build a model of the user based on
his/her behavior. The second version of BeWell, called
BeWell+, introduces several improvements [22]. Firstly,
daily activity goals are adjusted depending on the per-
son. This means, for example, that expected activity
for an older man is different from a young one, or from
a student and an employee. Secondly, energy consump-
tion is considered, reducing resources used by the appli-
cation when monitoring if the behavior of a user tends
closely to healthy norms. In this way, less system re-
sources are required.
4 Matteo Ciman et al.
Authors
Use
ofcell
phone
Real-tim
e
Sensors
Orientatio
n
Win
dow
Activ
ity
Stairstep
countin
g
[6] No Yes Acc., Gyr. 1 - stairs Yes[23] No Yes Acc., Light sensor 6 4 s Walk, run, sitting, stairs, standing No[20] Yes - Acc., Microphone, Bar. 3 15 s Sitting, stand, walk, stairs, elevator No[4] Yes Yes Acc., Gyr., GPS 4 5 s stand, walk, run, cycle, stairs No[30] Yes Yes Acc. 1 4 s stand, walk, cycle, drive, run No[9] Yes No Acc. 6 (*) Walking No[34] No No Acc., Gyr. 1 2 s Stand, walk, run, stairs No[7] Yes - Acc., GPS, WiFi Free - Walk No[11] No Yes Acc., Bar. Fixed - Cardio, walk, resistance, flexibility No[19,22] Yes Yes Acc., GPS Free - Walk, drive, stand, run No[13] Yes No Acc., GPS 4 - Walk, run, stand No[17] No Yes Bracelet Free - Walk, run, dance, sleep No[12] Yes Yes Acc. Free - Steps No[5] Yes Yes Acc., Alt., GPS Free - Walk, run No[32] Yes Yes Acc., GPS Free - Walk, run, riding No[24] Yes Yes Bracelet Free - Walk, run, sports No[14] Yes Yes Bracelet, GPS Free - Walk, run, sleep NoOur solution Yes Yes Acc., Rot. Free (**) (***) Stairstep Yes(*) [9] allows different window sizes(**) The only constraint is not to use the trouser pocket(***) Our solution uses a data dependent window size
Table 1: Resume of the different approaches presented in Section 2 compared with our solution
HealthyLife [13] is a smartphone application that
automatically recognizes users activities. It recognizes
activities like walking, running, driving and staying-
still. It uses the accelerometer data and the GPS sig-
nal. Moreover, it applies Ambiguity reasoning to in-
crease classification results and to disambiguate data.
The idea is to apply a set of weak constraints that at-
taches to any possible answer of the classifier a violation
cost that depends on the number and type of violated
constraints. The right classification answer will be the
one with violation cost equals to 0. The final precision
of the system ranges from 100% (staying-still) to 73%
(walking).
Move2Play [7] is a smartphone application which
encourages a healthier lifestyle and motivates users to
participate in daily physical activities. It combines daily
targets, for short term motivation, with longer term tar-
gets, mainly using social elements, i.e., targets sharing,
competitions, etc., in order to avoid dropouts from the
game and to constantly try to keep users engaged.
2.3 Commercial applications
We also analyzed some commercial applications used to
monitor physical activity. Jawbone Up [17] is a wearable
system to monitor physical activity, sleep and resting
hearth rate. It uses a bracelet to record physical ac-
tivity and a mobile application to monitor it. It does
not distinguish between stairsteps and simple steps, it
simply measures the amount of physical activity.
Accupedo [12] is a smartphone application which
implements a pedometer using an intelligent 3D motion
recognition algorithm. The user is not forced to use the
smartphone in a particular position, so this work seems
to be similar to our approach, but Accupedo is not able
to distinguish between stairsteps and simple steps, but
it counts both as steps.
Apple Health [5] is another smartphone application
for monitoring physical activity. The application tracks
simple steps and the distance of walking or running. Ap-
ple Health simulates stairstep counting: it recognizes a
flight of stairs, using altimeter, as approximately 10 feet
(3 meters) of elevation gain, and it counts it as approx-
imately 16 stairsteps. Therefore, the application does
not recognize stairstep but flight of stairs. Moreover,
it uses the altimeter sensor that requires more battery
energy.
Endomondo [32] is a smartphone application which
can be used as fitness tracker. It is able to track a wide
set of fitness activities, e.g., running, walking, riding,
but it does not perform a fine-grained recognition of
these activities, i.e., it is not able to count the num-
ber of steps, and therefore, the number of stairsteps.
Moreover, it is not able to recognize a stairstep ver-
Stairstep Recognition and Counting in a Serious Game for Increasing Users’ Physical Activity 5
sus a simple step. Finally, it uses external devices like
bracelets or a cardio frequency meter.
Misfit and Fitbit produce bracelet for activity and
sleep tracking. Misfit Flash [24] is a sporty fitness and
sleep tracker. The bracelet can be worn anywhere, it is
connected to a smartphone application and it is able
to recognize different activities like running, walking,
cycling, playing tennis, soccer or basketball. Unlike our
application, it does not recognize and count stairsteps,
not even flight of stairs. Bracelets produced by Fit-
bit [14] work in a very similar manner, but they are
also able to recognize flight of stairs. Even in this case,
the application does not perform a fine grained activity
recognition and does not count stairsteps, it only rec-
ognizes flight of stairs and count the number of climbed
floors.
To the best of our knowledge, commercial applica-
tions are not able to count in real-time the number of
climbed stairsteps and address a different task, moni-
toring physical activity and sleep. These works do not
count stairsteps but simple steps and, in fact, they ei-
ther are not able to distinguish a step from a stair-
step [17,12,32] or they only calculate the number of
climbed floors using the altimeter, as in [5,14], or to
approximate the number of stairsteps using the altime-
ter and the average height of a stairstep, as in [5]. On
the other side, our application counts stairsteps in real-
time and it is able to distinguis them from normal steps.
Additionally, the above mentioned applications often
use external (intrusive) device like bracelets [17,32,24,
14], while we aim at creating a more pervasive solution
which does not require anything more than a smart-
phone which, nowadays, is present in almost all users’
pockets.
2.4 Comparison
Table 1 provides a comparison between our approach
and the others presented in this section. The analysis
of the related works suggests several open issues which
need a further study. First of all, since we are pursu-
ing a smartphone based real-time recognition, energy
consumption is a fundamental aspect, never considered
before. For example, the usage of multiple sensors at
the same time (accelerometer, gyroscope, magnetome-
ter and GPS) will rapidly drain the battery, causing a
lot of stress to the user. For this reason, we rely on the
accelerometer and the rotation sensor only, avoiding the
usage of more expensive sensors. Moreover, differently
from other solutions that recognize an activity over a
large window of time, we aim at recognizing each sin-
gle stairstep, and distinguish it from a simple step. As
shown in Figure 3b and Figure 3c, this is not an easy
task, since the signal obtained for a stairstep and a step
has a similar behavior. Finally, most of the previously
proposed approaches impose a set of fixed positions for
the smartphone2 and the training of a different classifier
for each supported position. As we will see, we do not
impose any position of the smartphone, we only ask to
avoid the trousers pocket, and no (expensive) external
devices are required to acquire data.
3 Game Design
The aim of ClimbTheWorld is to persuade people to
choose stairs instead of escalators, thus increasing their
physical activity, using gamification, i.e., providing a
way to have fun while climbing stairs. The whole de-
sign process of the game followed the Fogg Behaviour
Model (FBM) [15] to improve game’s persuasiveness.
FBM shows that three elements must converge at the
same moment for a behaviour to occur: Motivation,
Ability and Triggers. Since the game is not intended
for impaired people, Ability is not a problem since our
target audience is able to climb the stairs. But it is also
true that, even if climbing the stairs can be considered
an easy activity, it can become tiring or even frustrating
if the goal is too far away. For this reason, ClimbThe-
World has different sub-goals, denoted by the stars in
Figure 1(a), to encourage users to never give up.
In the same way, to avoid discouraging the users,
the game proposes different difficulty levels: easier lev-
els correspond to a lower number of stairsteps necessary
to reach the top of the building. Each stairstep in real
life corresponds to one (or more) stairstep in the game:
higher difficulty means less stairsteps in the game. Once
the user reaches the top of the building, a slideshow of
pictures is displayed, showing the view from the top of
the building. Different difficulty levels also bring differ-
ent quality and number of provided photos. Figure 1(e)
shows a screenshot of the gallery of the Eiffel Tower.
To improve Motivation and Ability, we designed four
different game modes, which can involve or not the
user’s friends3. The first game mode requires the user
to climb a building alone (see Figure 1(a)). Since some
building may have a large number of stairsteps, the
user can also invite friends to help him/her to climb
the building. This second mode is called Social Climb.
Figure 1(b) shows a screenshot where the user has in-
2 We must note here that even if most of the commercialapps do not impose a fixed position to wear the smartphone,they often require an external device, which is even worst interms of cost and intrusiveness of the system.3 In this case, we require the user to connect to Facebook
and to give the application the right to explorer his/her net-work of friends.
6 Matteo Ciman et al.
Fig. 1: Five screenshots of ClimbTheWorld
vited one of his/her Facebook friends to help him/her to
reach the top of the Pyramid of Giza. This game mode
improves Ability, since it lowers the required number of
stairsteps.
Motivation is strongly affected by two other game
modes, Social Challenge and Team vs. Team. The first
one implements a challenge between two (or more) play-
ers. Differently from Social Climb, the players do not
collaborate but compete. The winner is the first user
who reaches the top of the building. Figure 1(c) shows
a screenshot during a challenge between three players.
The game reports the number of climbed stairsteps for
each user. The player with yellow background is the
one who started the challenge and can decide to add or
remove other players from the game.
The Team vs. Team mode implements a challenge
between teams of equal number of players. In Figure
1(d) the two bars on the sides show the progress of the
two teams.
Since smartphones are not always connected to the
Internet, the game continues working even in absence of
network connection. If the game is in one of the multi-
player modes, and some players are not connected, the
game pauses at the end of the climbing and waits for
data from all the users before declaring the winner.
Another problem related to the multiplayer modes
are the management of freeloaders, i.e., players who
join a team or a social climb but do not contribute
to the climb with stairsteps. To avoid this type of play-
ers, ClimbTheWorld imposes a threshold (see Figure
1(b,d)): players who do not contribute with a minimal
set of stairsteps are not rewarded even in case of victory.
The last element of the Fogg Behaviour Model, the
Triggers, are implemented with a push notification that
remembers the player to use stairs when he/she lowers
the number of stairsteps climbed during the day with
respect to the day before. According to the FBM, this
type of Triggers are called signal. More details about
the use of Triggers in ClimbTheWorld can be found in
[10].
To increase the engagement of the user, the game
provides a set of bonuses, depending on his/her perfor-
mances. For example, if the user improves his/her per-
formance with respect to the day before, he/she gets a
30% increase on the total number of stairsteps made.
These bonuses are used to constantly encourage and
help the user. Since we aim at designing a non-invasive
pervasive application, we do not fix the smartphone ori-
entation and we consider energy consumption issues.
Finally, the game provides the possibility for a user
to share his/her performance with friends through Face-
book to further increase the user engagement and Mo-
tivation. Therefore the application, which records con-
fidential data, e.g., user movements and physical activ-
ity, has also the possibility to post a message on the
Facebook wall of the user. This situation could rise
privacy issues, therefore we decided to separate data
about user and his/her friends from data about his/her
movements recorded through the smartphone sensors,
which are recorded and stored separately. The applica-
tion has only the possibility to share game scores and
results, e.g., the winner of a challenge or the result of a
climbing, and not the user location or other data about
his/her movements. Data about stairsteps and physical
activity is stored only locally on the smartphone.
4 Pipeline overview
In this section, we give a broad overview of the system
for stairsteps recognition and counting, whose modules
will be discussed in details in the following sections.
Stairstep Recognition and Counting in a Serious Game for Increasing Users’ Physical Activity 7
Fig. 2: Pipeline overview of the system
The application’s pipeline is depicted in Figure 2.
When active, the application constantly acquires data
from smartphone’ sensors to identify whether the user
is climbing or descending stairs. Since it is very im-
portant to provide real-time feedback to increase user
engagement, the application does not rely on a server
that analyzes data and provides classification output,
but everything is performed on the smartphone. In this
way, we avoid network availability and delay problems.
Moreover, the use of a server can create a bottleneck in
the system. On the other side, if all the computational
tasks reside on the smartphone, energy consumption
has to be considered and deeply investigated, to avoid
wasting energy and drain smartphone battery.
The first step of the pipeline is data acquisition:
the smartphone acquires data about movements of the
device through its accelerometer and rotation sensor.
The second step of the pipeline is called “data -
smoothing & standardization” and performs two op-
erations. First of all, it cleans data by reducing sensor
noise and variability. This operation helps to increase
the accuracy of the classification task. After that, data
is standardized, since sensor data completely changes
depending on how the user carries the smartphone, as
we will discuss in Section 6, and the application does
not impose a fixed smartphone position. To overcome
this problem, smoothed data is translated to a fixed
coordinate system, using information from the rota-
tion sensor. In this way, the same activity has the same
data pattern independently from the orientation of the
smartphone.
The segmentation step splits the standardized data
into consecutive segments or windows. In this module,
a window segmentation technique based on data anal-
ysis (i.e., without fixing the time size of a window) is
used to reduce the computational cost, thus reducing
energy consumption, and to improve the classification
accuracy. The proposed method allows to consider time
as a parameter. In this way, windows that do not fall
into a predefined time duration, i.e., the time necessary
to complete a stairstep, can be automatically labeled as
not a stairstep without forwarding it to the next steps
(thus reducing energy consumption).
After segmentation, features extraction and stan-
dardization for the classifier take place.
The stairstep recognition task is a classification task,
where there are two possible labels (“NO STAIR” and
“STAIR”) and the examples consist of the vector rep-
resentations of segmented windows. The training of the
model is performed offline only once (one single model
shared by all different users), while the recognition phase
has to be made in real-time. This implies that the classi-
fication outputs must be promptly provided to the user
without delays. We must note here that a simple high-
pass filter or a peak detection algorithm would not be
sufficient in this context. In fact, as it is shown in Figure
3b and in Figure 3c, a stairstep and a step have almost
the same shape, that is, a data peak on one axis. For
this reason, a more complex method able to distinguish
between this two different activities is necessary.
In the following sections, we provide a deep analysis
of the main modules compounding the complete sys-
8 Matteo Ciman et al.
(a) Noise that affects accele-rometer data
(b) Example of a step
(c) Example of a stairstep (d) Example of a stairstep us-ing data smoothing
Fig. 3: Several examples of data acquired from the ac-
celerometer
tem. Finally, we present the results of the classification
algorithms in a simulation of the system, an analysis ofissues related to energy consumption and the results of
a small test for user engagement and behavior change
with 13 participants.
5 Data smoothing
One of the biggest problems that affects data retrieved
from the accelerometer and any other sensor of a smart-
phone acquiring real-time information from the envi-
ronment is the noise that affects this data. This noise
is caused, for example, by sensor precision and sensibil-
ity, or by the external environment itself. Let us con-
sider, for example, data from the accelerometer. If the
smartphone is on a table with the screen facing out the
sky, the expected accelerometer data is a series of vec-
tors equal to v = (0.0, 0.0, 9.8), where X- and Y- axes
should record values equals to 0.0 m/s2, since there
is no movement, and with the gravity force that influ-
ences only the Z-axis, whose value should be equal to
9.8 m/s2.
What actually happens is shown in Figure 3a. Each
accelerometer axis is affected by both background noise
and sensor sensibility that make the signal very unsta-
ble. The presence of this noise makes the classification
task more difficult, as data is affected by this back-
ground error that disturbs the recording of the real
movement. The idea behind the data smoothing step
is to reduce the effect of background noise and hence to
have cleaner data.
The idea is the following. A buffer is used to store
the accelerometer records of the last 300ms. The time
length of the buffer is short enough to keep all the in-
formation useful for the classification task (e.g., it is
shorter than the time length of a stairstep). Let us sup-
pose that a new vector value a = (ax, ay, az) of acceler-
ation data is received from the accelerometer sensor. If
the buffer is not full, i.e., its time length is lower than
300ms, the vector a is simply stored into the buffer and
nothing more happens. If the buffer is full, the average
vector m = (mx,my,mz) of every reading stored in the
buffer is calculated, and the vector m is sent forward to
the data standardization step. Then, the original vector
a is stored into the buffer. In particular, a cyclic buffer is
used, so the first element of the buffer is deleted when
a new record is added to the buffer. In this way, the
background noise is reduced since variations of a single
reading is smoothed from all the other readings.
The result of this technique in a stairsteps pattern
is shown in Figure 3, where it is possible to see the
same pattern of a stairstep without (Figure 3c) and
with (Figure 3d) the data smoothing step. As we can
see, the signal is less disturbed and clearer after the data
smoothing step, making the learning task, and hence
the classification task, easier and more accurate.
6 Data standardization
One of the main issue which is necessary to address
when working with activity recognition using data from
smartphone sensors like the accelerometer, is smart-
phone orientation. This problem is related to the fact
that data from motion sensors changes depending on
how the smartphone is carried by the user. Consider,
for example, a walking activity, as depicted in Figure
4a. When the smartphone is rotated to a different posi-
tion from the initial one (the “R” arrow), the data com-
pletely changes. In particular, it is possible to observe
how accelerometer data completely switches before and
after the rotation of the smartphone. Consequently, the
learning task of the classifier becomes extremely diffi-
cult. One of the possible solution to this problem is to
Stairstep Recognition and Counting in a Serious Game for Increasing Users’ Physical Activity 9
(a) Raw data acquired from the acce-lerometer: same activity but differentaxis values after rotation of the smart-phone
(b) Our method applied to a walkingactivity while rotating the smartphone
(c) Native linear acceleration plus rota-tion method
Fig. 4: The comparison among the raw signal (a) and the signal from the methods for the orientation-independence:
our method (b), Linear (c). The instant in which the rotation takes place is denoted by the letter “R”. Each line
represents data acquired with reference to one axis X, Y and Z.
force users to keep the smartphone in a particular po-
sition. For example, some previous works followed this
approach, but this is clearly a strong limitation, since
imposing a fixed smartphone position reduces perva-
siveness of our application and its persuasive power.
Another solution would be to train the classifier with
every possible orientation of the smartphone. This is
clearly an unpractical solution, since there are infinite
possible orientations, and thus it becomes unfeasible to
train a proper classifier.
The proposed solution aims at standardizing data
to a fixed coordinate system, independently from the
orientation of the smartphone. In this way, the same ac-
tivity is represented by a similar signal behavior, mean-
ing that how the user carries the smartphone becomes
an insignificant information, and the task to train an
accurate classifier becomes easier.
A first method for gravity influence removal, to cap-
ture the real movement of the user, was proposed by
Mizell [26]. The solution is the following: given a sam-
pling interval, the gravity component g = (gx, gy, gz) ∈R3 on each axis can be estimated by averaging over
data read on each axis. When an accelerometer pro-
duces the original signal a = (ax, ay, az) ∈ R3, it is
possible to calculate the so called dynamic component
of a as d = (ax−gx, ay−gy, az−gz) where the influence
of the gravity is eliminated. Finally, the vertical part p
of the dynamic component d (parallel to the gravity)
is computed as p = ( d·mm·m )m, and the horizontal part
(orthogonal to the gravity) as h = d−p. As a positive
aspect, this method requires only accelerometer data,
meaning that energy consumption is very low. On the
other side, it loose relevant information, since it trans-
lates a three-dimensional movement (the one captured
by the device) into a two-dimensional coordinate sys-
tem (vertical and horizontal directions).
The proposed method aims at increasing the
precision of the transformation and preserving the
orientation-invariance property. How it works and how
it changes the signal acquired by the accelerometer is
shown in Figure 4b. The fixed target coordinate system
is the following: the X-axis is defined as the one tangen-
tial to the ground pointing approximately toward East,
the Y -axis is tangential to the ground pointing toward
the geomagnetic North, and the Z-axis is orthogonal
to the ground plane and points toward the sky4. To
implement this method, a buffer used to estimate the
gravity component acting on each axis and data from
the rotation vector sensor are necessary.
Firstly, the gravity component is removed from the
accelerometer signal using a buffer which stores data
acquired during the last 500ms, and averaging over the
axes. The vector g = (gx, gy, gz) is used to compute the
dynamic component vector d = (ax−gx, ay−gy, az−gz)where a = (ax, ay, az) is the original accelerometer
reading. Once the vector d has been computed, it is
rotated to the fixed coordinate system using data from
the rotation vector sensor. This sensor provides infor-
4 http://developer.android.com/guide/topics/
sensors/sensors_overview.html
10 Matteo Ciman et al.
mation about the current orientation of the device with
a three-component vector, where each component rep-
resents a rotation angle around an axis. Now, it is pos-
sible to apply a three dimensional rotation to the vector
d = (dx, dy, dz) to obtain a new vector d′ = (d′x, d′y, d′z),
representing the real movement of the user with respect
to the target coordinate system. As we can see in Fig-
ure 4b, the signal remains affected from the rotation for
a very short interval of time, which is bounded by the
buffer size.
Another possibility to capture data about the real
movement of the user, e.g., without the influence of the
gravity, is to use the linear sensor, natively available
on each smartphone. An example of a stairstep data
retrieved using the linear sensor is provided in Figure
4c. In the following, we will compare our method for
gravity removal against the native solution both from a
stairstep recognition and an energy consumption point
of view. As we will show, the proposed solution can
reach better performances in terms of F1 evaluation
while consuming less energy.
7 Segmentation with data-dependent window
Once data is standardized to a fixed coordinate system,
then the data segmentation step is used to divide data
into time segments, i.e., windows, from which we cap-
ture the features that the algorithm uses to understand
if the user is climbing stairs or not.
The standard approach for data segmentation is
based on sliding windows, that is a set of readings of a
fixed interval of time, usually a time length sufficiently
long to contain the activity to recognize. Since the ac-
tivity could start at any point inside a sliding window,
an improvement can be obtained using the overlapping
sliding windows, in order to increase the recognition
probability. In this case, even if time duration remains
the same, two consecutive sliding windows overlap, typ-
ically for about 50% of the total time duration. Figure
5a gives an example of a segmentation using sliding
windows which overlaps on 50% of the time length.
The approaches described above have several draw-
backs. First of all, since time duration is fixed, there is
no user adaptation, especially for short activities, e.g.,
the time necessary to perform a stairstep may vary be-
tween a child and an elderly. Another issue is that,
since data pattern inside a window is not considered,
windows are always considered for classification, even
when it is clear that they are not the target activity.
Therefore, this approach wastes energy, since battery is
consumed for activities and analysis not strictly needed.
Moreover, the usage of overlapping sliding windows re-
quires to consider the same data twice (for two different
windows).
Our approach focuses on data patterns (and not on
time) in order to decide the starting and the ending
point of a sliding window. Since the data pattern of a
stairstep is generally the same, i.e., a local maximum
followed by a negative local minimum for the Z-axis,
while the X-axis and the Y -axis get a much smaller
variation, we suggest to use this pattern to divide data
coming from sensors into windows. We must note here
that, thanks to the previous orientation-independence
step, this pattern is not affected by smartphone orienta-
tion. An example of how windows are built is provided
in Figure 5b.
Since time length of the windows is no more fixed
but becomes variable, it can be used for energy saving
purposes. Analyzing the training set with stairsteps,
we found out that, on average, people need a time span
between 300ms and 2 seconds to complete a stairstep.
Using this information, all the windows that do not fall
into this time length interval can be automatically dis-
carded and considered as not stairsteps, even when they
follow the data pattern. In this case, we can omit the
real classification step of our pipeline, hence saving en-
ergy. Finally, our approach adapts a classification task
to the user variability, since the window are appropri-
ately re-sized to contain a stairstep without the need of
a previous step of calibration.
Another solution for data segmentation was pre-
sented in [18]. In this paper, authors proposed a solu-
tion for data segmentation using a bottom up approach,
where data segments are combined together depending
on a cost calculated as the error of approximating the
signal with its linear regression. If this cost exceeds a
threshold value, the two segments are not combined to-
gether. It is clear that, if we compare this method with
the solution proposed in this section, our approach re-
quires less operations and calculations, thus requiring
less battery energy, being more suitable for mobile ap-
plications.
Summing up, the proposed solution has several ad-
vantages with respect to the time-based sliding window.
The first one is that we are sure that each stairstep will
fit in a window, and this makes the learning task much
more easier. Moreover, it reduces the possible errors in
the training set. The second advantage is that just the
windows suitable for stairsteps are taken into account
for the analysis, reducing the total computational cost.
Finally, this approach allows to deal with user vari-
ability. In fact, even if different users require different
amount of time to complete a stairstep, the window will
be appropriately re-sized to contain it without the need
of calibration.
Stairstep Recognition and Counting in a Serious Game for Increasing Users’ Physical Activity 11
(a) Sliding windows using time (b) Data-dependent sliding windows
Fig. 5: The comparison between the segmentation in windows of possible stairsteps obtained from the time-driven
sliding windows (a) and the data-dependent sliding windows (b)
8 Representation and features standardization
In the representation phase, the dynamic window re-
sulting from the segmentation phase is transformed into
a vector of real values representing the information ob-
tained by the device sensors contained in that particular
time window. This mapping permits a natural applica-
tion of standard machine learning methods to the task
of recognizing whether the user is climbing stairs or
not. The way data is represented is crucial for the ef-
fectiveness of a learning algorithm. Specifically, a good
representation method is able to maintain the informa-
tion which is really relevant for the task and to reduce
the noise in the data as much as possible.
In the following we formally describe our choices for
the representation module. In particular, let the fre-
quency of sampling be fixed. Then, each window ob-
tained by the segmentation step will consist of a se-
quence of n vectors,
vi = (xi, yi, zi) ∈ R3, i ∈ {1, .., n},
each one consisting of the standardized values of accel-
eration with respect to the three axes.
We also consider two additional computed values
that, in our opinion, can carry precious information.
Firstly, we consider the norm of the vector vi which
well represents the global amount of activity being per-
formed and, secondly, the average value of horizontal
accelerations, namely xi+yi2 which combines the hor-
izontal activities in the fixed coordinate system in a
single value.
More formally, for any single standardized vector viof a dynamic window, a new vector si ∈ R5 can be built
as follows:
vi = (xi, yi, zi) 7→ si = (xi, yi, zi, ‖vi‖2,xi + yi
2)
Now, the entire sequence of observations in a window
can be represented in the matrix S ∈ R5×n where the
vectors si are accommodated in columns. Finally, the
representation of the sequence corresponding to the en-
tire dynamic window is obtained by applying a simple
transformation Φ : R5×n → R79, by evaluation of dif-
ferent statistical estimates over the rows of S.
Specifically, the 79 dimensions of Φ(S) are created
evaluating standard statistics (i.e., average, standard
deviation, variance and difference between minimal and
maximal values) which are very common in time se-
ries analysis or evaluating the ratio among the statis-
tical features above and the correlations from different
sources [31,29].
Table 2 presents a detailed description of the entire
set of computed features in order to represent a se-
quence of acceleration measures in a dynamic window.
The first set of 20 features considers standard statistics
computed over the rows Sj of the sequence matrix S.
The next group of 40 features considers the ratio of the
same statistics computed on different rows. An addi-
tional group of 9 features (61-69) takes into account
other important ratio statistics (among vertical and
horizontal activities and the norm of the total activ-
ity), and two other features, the Magnitude Area (MA)
and the Signal Magnitude Area (SMA) of {S1,S2,S3}.Finally, the last 10 features correspond to the value
of the correlation between pairs of rows. For reason of
computational complexity and energy consumption we
didn’t use the features in the Fast Fourier Transform
(FFT) family.
In order to fairly compare our orientation-
independent supporting method with other methods
presented earlier in this paper, we used a similar rep-
resentation mapping. In particular, exactly the same
mapping is used for the Linear method. Concerning the
12 Matteo Ciman et al.
Features No. Description1− 5 Ave(Sj), j ∈ {1, . . . , 5}6− 10 Std(Sj), j ∈ {1, . . . , 5}11− 15 V ar(Sj), j ∈ {1, . . . , 5}16− 20 Max(Sj)−Min(Sj), j ∈ {1, . . . , 5}21− 30
Ave(Sj)
Ave(Sh), j, h ∈ {1, . . . , 5}, j > h
31− 40Std(Sj)
Std(Sh), j, h ∈ {1, . . . , 5}, j > h
41− 50V ar(Sj)
V ar(Sh), j, h ∈ {1, . . . , 5}, j > h
51− 60Max(Sj)−Min(Sj)
Max(Sh)−Min(Sh), j, h ∈ {1, . . . , 5}, j > h
61 Max(S3)
Max(S5)
62 |Min(S3)
Min(S5)|
63√Ave(S1)2 +Ave(S2)2 +Ave(S3)2
64 Ave(|S1|+ |S2|+ |S3|)65 Std(s1)
2
Std(s4)
66 Std(s2)2
Std(s4)
67 Std(s3)2
Std(s4)
68 (Std(s1)+Std(s2))2
Std(s3)
69 (Std(s1)+Std(s2))2
Std(s4)
70− 79 Corr(Sj ,Sh), j, h ∈ {1, . . . , 5}, j > h
Table 2: The 79 features extracted from dynamic win-
dows
Mizell method, that returns two measurements vectors
instead of one, we have considered the natural exten-
sion of the function Φ applied to the two vectors inde-
pendently and concatenating the resulting vectors, thus
obtaining a new vector of dimension 79× 2 = 158.
Finally, it’s well-known that the classification algo-
rithms are badly influenced by features with different
orders of magnitude. For this reason, we have rescaled
all the features from the dynamic windows used to train
the classifiers between [−1, 1] using a linear transforma-
tion. These transformations (that are fixed and different
for each feature) are applied to all the features before
the classification takes place, in order to improve the
classification performance.
9 Experiments and results
In this section, we present experimental results ob-
tained considering classification performances, energy
consumption and features evaluation.
9.1 Classification setting and results
Our initial experimental results were presented in [1].
Starting from the same dataset, we added a new set of
raw information captured from device sensors for any
of the three methods (Mizell, Linear and our method).
This information was acquired through direct data col-
lection from 6 different users using different smart-
phones. This data is recorded keeping the devices con-
sistent with the body movement5 (e.g., hand held, in
a backpack or in a handbag) and without other con-
straints with respect to where or how to keep the device
when the data was collected. All the methods presented
in Section 8 have been used to compute a vector of fea-
tures for each dynamic window contained in the whole
dataset. Data has been then manually labeled in order
to use it with the supervised learning algorithms. Our
new dataset is composed by a total of 4128 windows
with 1554 stairsteps from 12 people, ranging between 7
and 72 years old.
We use algorithms from three different families to
tackle this task:
– Decision Trees (DT) generated using the C4.5 algo-
rithm,
– K-Nearest Neighbors (KNN) and
– Kernel Optimization of the Margin Distribution
(KOMD) that is a kernel machine (SVM-like) de-
scribed in [3], with RBF as the kernel.
For the KOMD algorithm, we used our own implemen-
tation6, while for the Decision Trees and the K-Nearest
Neighbors we used the Weka [16] implementation.
Raw data was sampled at three different frequen-
cies, namely 50Hz, 30Hz and 20Hz, with the aim of
exploring the relation between performance and energy
consumption, to find the best trade-off. Higher frequen-
cies correspond to more accurate information given to
the system, but, obviously, they also correspond to an
higher energy consumption for the device.
In our application, we face with the binary classi-
fication task of stairstep recognition. The distribution
of different labels in this case is very imbalanced since
we have far more negative examples than positive ones.
For this, we evaluated Precision and Recall, instead of
accuracy, to obtain a more proper performance estima-
tion and comparison of different experimental settings.
Then, we used the Fβ-score (with β = 1.0) to combine
recall and precision in a single aggregated effectiveness
score. The analytic formulas for these measures are:
PrecisionTP
TP + FP
RecallTP
TP + FN
F1-score2 Precision ·RecallPrecision+Recall
5 In this paper, we consider a device movement consistentwith the body movement whenever the device movement isnegligible relatively to the body’s barycenter6 Our KOMD implementations in Python and R can be
found at https://github.com/jmikko/EasyMKL
Stairstep Recognition and Counting in a Serious Game for Increasing Users’ Physical Activity 13
Frequency 20HzMizell Linear Rotation method Smoothed
Prec Rec F1±std Prec Rec F1±std Prec Rec F1±std Prec Rec F1±std
DT 0.81 0.84 0.825±0.012 0.67 0.70 0.685±0.024 0.77 0.79 0.780±0.015 0.73 0.71 0.720±0.010
KNN 0.80 0.77 0.785±0.010 0.87 0.77 0.817±0.024 0.80 0.75 0.774±0.012 0.81 0.81 0.810±0.009
KOMD 0.83 0.84 0.835±0.011 0.83 0.83 0.830±0.018 0.84 0.85 0.845±0.008 0.84 0.90 0.869±0.007
Frequency 30HzMizell Linear Rotation method Smoothed
Prec Rec F1±std Prec Rec F1±std Prec Rec F1±std Prec Rec F1±std
DT 0.82 0.86 0.840±0.012 0.80 0.78 0.790±0.024 0.82 0.73 0.772±0.015 0.73 0.71 0.720±0.010
KNN 0.83 0.84 0.835±0.010 0.88 0.61 0.721±0.024 0.90 0.78 0.836±0.011 0.87 0.85 0.860±0.009
KOMD 0.89 0.85 0.870±0.010 0.86 0.83 0.845±0.018 0.90 0.88 0.890±0.008 0.91 0.91 0.910±0.007
Frequency 50HzMizell Linear Rotation method Smoothed
Prec Rec F1±std Prec Rec F1±std Prec Rec F1±std Prec Rec F1±std
DT 0.70 0.77 0.733±0.012 0.82 0.89 0.854±0.023 0.81 0.75 0.779±0.015 0.86 0.83 0.845±0.009
KNN 0.82 0.85 0.835±0.009 0.81 0.84 0.825±0.024 0.90 0.78 0.836±0.011 0.82 0.85 0.835±0.009
KOMD 0.85 0.88 0.865±0.010 0.79 0.86 0.824±0.018 0.91 0.90 0.905±0.008 0.92 0.92 0.920±0.006
Table 3: Experimental results of Precision (Prec), Recall (Rec) and F1±std score for the algorithms and methods
used with frequency of sampling of 20Hz, 30Hz and 50Hz. The result highlighted has the largest value of F1
where TP stands for True Positives, FP as False Posi-
tives and FN as False Negatives.
Starting from the original dataset with manually as-
signed labels for each dynamic window, the experiments
are performed using a user-independent nested strati-
fied 10-Fold cross validation. Firstly, the dataset has
been randomly divided in 10 partitions (each partition
with the same distribution of stairsteps, i.e. stratified).
The best parameters are selected for each algorithm
using the classical 10-Fold cross validation among the
windows contained in 9 out of the 10 partitions (i.e.,
the training set). Finally, the machine learning algo-
rithms are trained to generate a model using the best
parameters and the classification statistics are evalu-
ated among the windows contained in the remaining
partition, used as test set. This procedure is repeated
selecting all the 10 partitions as test set and evaluating
the average of the performance. The obtained results
over the test set are summarized in Table 3.
These results show that smoothing of data improves
the proposed method for the support of the orientation-
independence. This technique, when combined with the
KOMD classification algorithm, obtains the best perfor-
mances against any other combination of methods and
algorithms. The results are satisfactory considering the
high difficulty of the classification task and given the
small training sample obtained on each single dynamic
window. We are also interested in finding the best trade-
off between energy consumption and performances. On
this respect, it is possible to see that the best compro-
mise is obtained at 30Hz, as we can notice that the
difference in performances between 50Hz and 30Hz is
not statistically significant considering the standard de-
viation. Moreover, our method with KOMD shows an
higher F1 score at 30Hz than any other classifier at
50Hz.
9.2 Energy consumption results
When dealing with smartphone devices and applica-
tions, energy consumption is one of the key aspects to
consider. One of the most important aspect for mobile
devices users is battery lifetime, meaning that it is not
possible to develop an application that wastes a lot of
energy and asks the user to recharge smartphone more
than once a day. This kind of situation usually takes
the user to remove the application.
A deep analysis of each step of the pipeline is a
fundamental requirement during design and develop-
ment. In our particular case, we studied how resources
are used and which are the computationally expensive
tasks. Furthermore, we wanted to understand how data
frequency acquisition from the accelerometer and the
rotation sensor affects energy consumption and in which
measure, how much energy the data-smoothing step re-
quires and how well the data segmentation technique
performs with respect to the sliding window technique.
To measure the energy consumption, we used the
Monsoon Power Monitor7. Differently form other ap-
proaches like background application running on the
smartphone (see [25,28]) that could introduce an un-
known and not fixed overhead that could affect mea-
sured data, the Monsoon Power Monitor is an external
device that directly measures the “requested” energy
from the smartphone to the battery. In this way, no
overhead is introduced. Thanks to the software for re-
7 http://www.msoon.com/LabEquipment/PowerMonitor/
14 Matteo Ciman et al.
mote data reading, it is possible to have information
about “Energy Consumption”, “Average Power” and
“Average current”, “Expected Battery Life”, etc. These
information are extremely important, since it is possi-
ble to understand how much energy is consumed by the
smartphone and the application, for how much time,
which are the tasks that are more expensive, etc.
Tests were performed using a Samsung Galaxy i9250
with a 1750mAh battery. Even if this smartphone is not
up to date, this is not a real problem since we do not
want to consider absolute values of energy consump-
tion, but we aim at comparing energy consumption of
different methods to determine which is the less expen-
sive.
Figure 6 reports the energy consumption measure-
ments. Here and in the following discussion we just con-
sider the “Consumed Energy” value, which is the en-
ergy the smartphone is requesting to the battery when
acquiring data from the sensors (accelerometer and ro-
tation sensor) and running one of the methods we con-
sider in this paper. Each experiment lasted two minutes
and we repeated each test three times for each method.
We tested our method using the target frequency of
30Hz.
The first evidence we can note from the results is
that the Mizell method is the less consuming one. This
is not surprising, since this method gets data from the
accelerometer sensor only, thus requiring less energy.
On the other side, this method shows a significantly
poorer performance (see Table 3).
Comparing our method, which uses both the ro-
tation sensor and the accelerometer, and the linear
method, which uses the rotation sensor and the linear
sensor, we can note that the usage of the accelerometer
is less demanding than the usage of the linear sensor.
These results, along with the ones presented in Table 3,
demonstrate that our solution is able to improve both
the accuracy in terms of stairstep recognition and the
energy consumption with respect to the linear solution.
Moreover, when adding the data smoothing phase, the
performance in accuracy are further improved without
affecting the energy consumption (only about +0.26%).
Another key aspect of our pipeline is the use of data-
driven windows that, differently from standard time-
based windows, use the data pattern to determine how
to segment data coming from sensors. To compare the
two methods in terms of energy consumption, we tested
both of them acquiring data at 30Hz and using KOMD
as the classification algorithm. For time-based windows,
a time length of 500ms and overlapping at 50% were
used. Final results are reported in Figure 7.
As we can see, data-driven segmentation allows to
save more energy (more than 15%) with respect to the
time-based sliding windows. This is because in the time-
based approach every windows is classified, while, with
the proposed method, only the ones that contains the
correct pattern are taken into consideration. Moreover,
the time filter let us further reduce the number of ana-
lyzed windows, and cannot be applied to standard win-
dows. Clearly, less windows analyzed by the classifier
means less energy consumption.
9.3 Irrelevant features
A further analisys has been performed to investigate
which features are more useful for our specific task.
The motivation for this is three-fold. On one hand we
wanted to understand what kind of information is suf-
ficient to recognize stairsteps. On the other hand, re-
moving useless features allows a classifier to learn more
quickly, that is, it needs fewer examples to obtain the
same performance in classification [27]. Finally, having
fewer features to compute can be beneficial from the
point of view of energy consumption.
Feature Selection (FS) is a very popular dimension-
ality reduction technique. FS achieves dimensionality
reduction by removing the irrelevant and redundant
features and is especially useful for applications where
the original features are important for model interpreta-
tion and knowledge extraction. In particular, we applied
a Multiple Kernel Learning (MKL) approach for feature
selection, called EasyMKL [2]. Basically, in the MKL
paradigm the kernel used by the classifier (KOMD) is
learned from data as a positive linear combination of a
set of base kernels. The base kernels in our case are ob-tained from single features. The weight, and hence the
importance, of a feature is given by the value obtained
in the linear combination returned by EasyMKL. Given
in input the whole dataset, EasyMKL returns an order
of the features (from the best to the worst) for our spe-
cific classification task. Experiments have shown that it
is possible to achieve a performance that is very similar
to the one obtained using the whole set of 79 features
(F1-score 0.905±0.009) by keeping only the first 49 most
important features. Specifically, the following features:
69, 31, 13, 48, 49, 17, 47, 24, 76, 1, 27, 73, 7, 64, 35, 39,
47, 72, 53, 67, 59, 66, 55, 33, 45, 25, 37, 23, 21 and 57
have been discarded (the indexes refer to the Table 2
and they are sorted in order of importance).
10 Users Tests
In order to evaluate the persuasive power of ClimbThe-
World, we conducted a test with real users, at the end of
Stairstep Recognition and Counting in a Serious Game for Increasing Users’ Physical Activity 15
Energy consumption (µAh)Mizell Linear Rotation Smoothing and
method rotation method9289±36 10003±21 9923±3 9949±8
Fig. 6: Energy consumption of the four different methods
Segmentation approachTime-based Data-driven
window windowEnergy
consumption 14535±17 12554±14
(µAh)
Fig. 7: Energy consumption comparison between time-based and data-driven windows
which we provided a small questionnaire to understand
their impression of the game.
The test involved 13 participants, 8 females and 5
males, aged between 24 and 30. All the participants
are students from the Bachelor and Master degrees in
Computer Science at the University of Padua, Italy. No
participants were involved in the development process,
and they did not have any knowledge about the project
before the experiment.
The experiment lasted 9 days, and each participant
used his/her own smartphone. We created a new Face-
book account for each participant in order to preserve
their privacy (no need to use the personal account to
ask friendship to someone they did not know) and toprovide them full access to all functionality of the game.
We did not ask participants to change their daily rou-
tine, but only to use freely the application whenever
they could and wanted.
Participants were divided into two different groups
of 7 and 6 users, respectively. Each participant was
randomly assigned to one of the two groups. Both the
groups played the same game with all the game modes,
we used these two groups to randomize our test. Since
we provide two different game modes, singleplayer and
multiplayer, we asked one group to use the singleplayer
mode first and the multiplayer mode afterwards, while
the second group started with the multiplayer modality.
The complete organization and task order of the
two groups is provided in Table 4. The first two days
we asked participants to use the “Step counter” mode
to record a baseline on the number of stairsteps made
without using a serious game. Then, the first group
played two days in singleplayer mode and the follow-
ing three days in multiplayer mode, and vice versa for
the second group. On last two days both groups con-
cluded the experiment using only the “Step counter”
modality.
At the end of the experiment, each participant com-
pleted a 5-point Likert questionnaire, based on the IBM
Computer Usability Satisfaction Questionnaires [21],
with possible answers ranging from “Strongly disagree”
to “Strongly agree”. The questionnaire was divided into
two different main sections: the first one was used to get
a description of the participants taking part to the ex-
periment, e.g., what they think about physical activity
and being physically active, which kind of game players
they are (single or multiplayer), etc. The second part of
the questionnaire contained questions about ClimbThe-
World, the different game modes and their experience
during the days.
Moreover, during the experiment, a background log-
ger was used to trace user activity with the game. We
recorded activities performed with the game and, in
particular, the number of stairsteps participants made
during each day and with each game mode. In this way,
we could understand how the number of stairsteps made
changed during the experiment related to the game
mode each participants played.
The first part of the questionnaire showed that 53%
of participants do not play any sport, while the oth-
ers 47% play an individual sport, and 77% of partici-
pants have positive feelings about being physically ac-
tive. Moreover, about 61% does not frequently use el-
evators or escalators, but prefer to take stairs in or-
der to be more active. Finally, 54% of them think they
do not need any form of external stimulus to keep
16 Matteo Ciman et al.
Day Group A Group B1 Stairstep counter Stairstep counter2 Stairstep counter Stairstep counter3 Singleplayer Multiplayer (Social Climb)4 Singleplayer Multiplayer (Social Challenge)5 Multiplayer (Social Climb) Multiplayer (Team vs. Team)6 Multiplayer (Social Challenge) Singleplayer7 Multiplayer (Team vs. Team) Singleplayer8 Stairstep counter Stairstep counter9 Stairstep counter Stairstep counter
Table 4: Game mode order for each of the two groups.
them active. From the obtained answers, we can say
that participants represent a difficult test case since
they already prefer using stairs and have an active life,
meaning that, they do not need a serious game to in-
crease physical activity. Considering questions and an-
swers about their relation with video games, they de-
fined themselves mainly as “casual players” (69%), that
play almost alone (61.5%) or with another player in the
same room (46.2%). Finally, only 23% of participants
frequently play with mobile games.
Analyzing participants experience with ClimbThe-
World, about 70% of participants preferred to play in
single mode, while only 38.5% preferred to play with
his/her friends. Moreover, 92.2% of participants liked
the Solo Climb mode and no one said he/she would
not play again with it. These data are in accordance
with participants description about preferring single-
player more than multiplayer: the multiplayer mode ob-
tained less appreciation. In particular, within the mul-
tiplayer modes, the most preferred one was the Social
Climb mode, since all the participants used it at least
one time and 92.3% of them ranked it positively. The
second preference was Social Challenge, played by 77%
of participants and 80% of them liked it. Finally, Team
vs. Team mode was played by 61.5% of participants and
63.6% would play again with it. This rank can be ex-
plained by the fact that this mode, that should be the
most challenging and engaging one, has the drawback
that it is difficult to set up, since it is necessary to find
at least 4 users, active in the same interval of time, to
be able to start the game (and this could take time that
not all users are happy to wait for).
The second step of our analysis has focused on the
data acquired with the logger. Data logger registered
how many stairsteps participants climbed during the
experimental period, and which game mode they used.
We compared answers provided with the questionnaire
with objective data, and performances of participants
depending on the game mode used. Figure 8 shows the
number of stairsteps climbed by the participants along
the days of our experiment. Together with Table 4, the
figure also shows the number of stairsteps climbed by
participants depending on the game mode used: the
number of stairsteps climbed using the serious game
(both in singleplayer or multiplayer) is higher than the
number when using simply the stairstep counter. In par-
ticular, singleplayer mode increased the average amount
of about 61%, while multiplayer of about 64%. This
means that the game is able to engage the users and
is effective in incentivizing people in taking stairs. This
result is particularly important since the test groups
were made by people that think that they do not need
to be incentivized to be physically active.
Moreover, we can note that, when used, the Team
vs. Team game mode allows to reach the highest num-
ber of stairsteps climbed. This probably comes from the
fact that this game mode combines both collaboration
and competition among users, a combination that is
able to engage participants and create high motivation.
On the other side, the big difference between the two
groups also shows the limitation of this game mode,since the setup phase is longer than other game modes
and could reduce users interest.
There is another important difference in the behav-
ior of the two groups. The second group, the one that
used the Team vs. Team game mode, approximately
doubled the number of stairsteps climbed when using
the simple counter in the last two days, while the first
group lowered the number of stairsteps climbed when
not using the game in the last two days with respect to
the first two days of the experiment. This means that
the Team vs. Team game mode is not always accepted
by the users, due to the initial setup phase, but, if used,
is able to achieve good results in persuading people to
change their behavior, and this result remains also in
absence of the game. On the other side, singleplayer
games were able to engage both groups, showing how
an easy entry setup of the game makes it more engag-
ing.
Stairstep Recognition and Counting in a Serious Game for Increasing Users’ Physical Activity 17
Fig. 8: Number of stairsteps made by both groups each day of the test.
10.1 Limitations of the current study
Although the user study represents the users in a real-
istic situation, it also has some limits. The main lim-
itation is represented by the restricted data collection
period. But, the collected data show an increment in
the number of climbed stairstep even in this short time
interval, so the result is encouraging.
We have deployed our study on a real user context.
Moreover, the study is not affected by problems related
to privacy, i.e., that the users could be afraid to ask
other people to play the game together, since we pro-
vide each user with a test Facebook account and we do
not give any information about the association between
Facebook accounts and real users to the participants.
All the users came from an homogeneous group, they
all are students of Computer Science. Clearly, a user
study with a more heterogeneous group of real friends
would provide further insight.
Given the results achieved from the tests we per-
formed, we can claim that ClimbTheWorld is really ef-
fective in incentivizing people in taking stairs, and both
singleplayer and multiplayer modes are engaging and
appreciated by users.
11 Conclusions and Future works
In this paper, we have presented a method for real-
time stairsteps recognition and counting that is inde-
pendent from the orientation of the smartphone. This
algorithm does not impose constraints concerning the
walking speed of the users as it autonomously adjusts
the size of the window for its analysis of data. Addi-
tionally, a smoothing preprocessing of data makes the
performance improve further as it removes the technical
noise that affects the smartphone sensors.
We reported several experiments showing that we
are able to obtain a very accurate recognition, compa-
rable or even better than the one obtained with more
time-demanding methods. In particular, using a state-
of-the-art kernel method (KOMD) in synergy with our
orientation -invariant preprocessing method, a data-
driven window segmentation and the smoothing of the
raw data, we achieved the best performances against
all the others combinations of methods and algorithms,
with a precision of 92% and a recall of 92%, using a
sampling rate of 50Hz.
The purpose of this work was to study a classifi-
cation algorithm inside a real-time smartphone appli-
cation (namely ClimbTheWorld) whose aim is to per-
suade people to use stairs instead of elevators or esca-
lators. An aspect that has been mandatory to take into
account during the development of the application was
the control of energy consumption. Our experiments
have shown that the new proposed method outperforms
the native Android solution in terms of energy saving.
The best trade-off between energy saving and precision
is reached using KOMD classification algorithm com-
bined with our method for data acquisition, using a
sampling rate of 30Hz (in this case, the obtained preci-
sion is 91%, recall is 91%).
The serious game was tested during an experiment
with real users, divided into two groups. The obtained
results have shown that the game is able to engage users
and to persuade them to change their behavior. In par-
ticular, we noted that the Team vs. Team game mode
is capable to achieve better persistency results, that is
collaboration and competitions tend to motivate users
to continue their activity during the time.
Although the performance of our method is not
much affected by the smartphone orientation, we no-
ticed a performance degradation when the user is car-
rying the smartphone on his/her trouser pocket (i.e.,
not consistently with the body).
Future analysis will be dedicated to solve this par-
ticular issue. A possible solution could be to exploit
18 Matteo Ciman et al.
additional information, like the proximity or light sen-
sors, to recognize this case. Another possibility could be
to analyze the user movement to find out when he/she
puts the smartphone on his/her trouser pocket.
Acknowledgements
The authors would like to thank Nicola Beghin, Sil-
via Segato and Mattia Bazzega for their contribute to
the implementation of ClimbTheWorld. Moreover, the
authors would like to thank the users who have parteci-
pated to the user test, and those who helped to collect
stairstep data.
References
1. Aiolli, F., Ciman, M., Donini, M., Gaggi, O.: Climbthe-world: Real-time stairstep counting to increase physicalactivity. In: Proceedings of the 11th International Con-ference on Mobile and Ubiquitous Systems: Computing,Networking and Services (Mobiquitous 14), pp. 218–227.London, UK (2014)
2. Aiolli, F., Donini, M.: EasyMKL: a scalable multiple ker-nel learning algorithm. Neurocomputing pp. 1–10 (2015)
3. Aiolli, F., Martino, G.D.S., Sperduti, A.: A kernelmethod for the optimization of the margin distribution.In: Proceedings of the 18th International Conference onArtificial Neural Networks (ICANN), pp. 305–314 (2008)
4. Anjum, A., Ilyas, M.: Activity recognition using smart-phone sensors. In: Proceedings of the IEEE ConsumerCommunications and Networking Conference (CCNC2013), pp. 914–919 (2013)
5. Apple Inc.: Health. http://www.apple.com/ios/whats-new/health/ (2015)
6. Ayabe, M., Aoki, J., Ishii, K., Takayama, K., Tanaka,H.: Pedometer accuracy during stair climbing and benchstepping exercises. Journal of sports science & medicine7(2), 249 (2008)
7. Bielik, P., Tomlein, M., Kratky, P., Mitrık, v., Barla, M.,Bielikova, M.: Move2play: An innovative approach to en-couraging people to be more physically active. In: Pro-ceedings of the 2nd ACM SIGHIT International HealthInformatics Symposium, IHI ’12, pp. 61–70 (2012)
8. Bloom, L., Eardley, R., Geelhoed, E., Manahan, M., Ran-ganathan, P.: Investigating the relationship between bat-tery life and user acceptance of dynamic, energy-awareinterfaces on handhelds. In: Proceedings of the Interna-tional Conference on Human Computer Interaction withMobile Devices & Services, pp. 13–24 (2004)
9. Brajdic, A., Harle, R.: Walk detection and step countingon unconstrained smartphones. In: Proceedings of the2013 ACM International Joint Conference on Pervasiveand Ubiquitous Computing (UbiComp ’13), pp. 225–234(2013)
10. Ciman, M., Gaggi, O.: Exploiting users natural compet-itiveness to promote physical activity. In: EAI Interna-tional Conference on Games fOr WELL-being (2016)
11. Consolvo, S., McDonald, D.W., Toscos, T., Chen, M.Y.,Froehlich, J., Harrison, B., Klasnja, P., LaMarca, A.,LeGrand, L., Libby, R., Smith, I., Landay, J.A.: Activity
sensing in the wild: A field trial of ubifit garden. In: Pro-ceedings of the SIGCHI Conference on Human Factors inComputing Systems, CHI ’08, pp. 1797–1806 (2008)
12. Corusen LLC: Accupedo. http://www.accupedo.com/(2015)
13. Do, T.M., Loke, S.W., Liu, F.: Healthylife: An activityrecognition system with smartphone using logic-basedstream reasoning. In: Proceedings of the InternationalConference on Mobile and Ubiquitous Systems: Com-puting, Networking, and Services (MOBIQUITOUS), pp.188–199. Springer (2013)
14. Fitbit Inc.: Fitbit. http://www.fitbit.com/ (2015)15. Fogg, B.J.: A behavior model for persuasive design. In:
Proceedings of the 4th International Conference on Per-suasive Technology, Persuasive ’09, pp. 1–7. ACM, NewYork, NY, USA (2009)
16. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reute-mann, P., Witten, I.H.: The weka data mining software:An update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
17. Jawbone: Jawbone Up. https://jawbone.com/ (2015)18. Junker, H., Amft, O., Lukowicz, P., Troster, G.: Gesture
spotting with body-worn inertial sensors to detect useractivities. Pattern Recognition 41(6), 2010–2024 (2008)
19. Lane, N.D., Mohammod, M., Lin, M., Yang, X., Lu, H.,Ali, S., Doryab, A., Berke, E., Choudhury, T., Camp-bell, A.T.: Bewell: A smartphone application to monitor,model and promote wellbeing. In: Proceedings of theInternational ICST Conference on Pervasive ComputingTechnologies for Healthcare, pp. 23–26 (2011)
20. Lester, J., Choudhury, T., Borriello, G.: A practical ap-proach to recognizing physical activities. In: Proceedingsof the 4th International Conference on Pervasive Com-puting, pp. 1–16 (2006)
21. Lewis, J.R.: Ibm computer usability satisfaction ques-tionnaires: Psychometric evaluation and instructions foruse. International Journal on Human-Computer Interac-tion 7(1), 57–78 (1995)
22. Lin, M., Lane, N.D., Mohammod, M., Yang, X., Lu, H.,Cardone, G., Ali, S., Doryab, A., Berke, E., Campbell,A.T., Choudhury, T.: Bewell+: Multi-dimensional well-being monitoring with community-guided user feedbackand energy optimization. In: Proceedings of the Confer-ence on Wireless Health, WH ’12, pp. 10:1–10:8 (2012)
23. Maurer, U., Smailagic, A., Siewiorek, D.P., Deisher, M.:Activity recognition and monitoring using multiple sen-sors on different body positions. In: 2006 InternationalWorkshop on Wearable and Implantable Body SensorNetworks (BSN 2006), 3-5 April 2006, Cambridge, Mas-sachusetts, USA, pp. 113–116 (2006). DOI 10.1109/BSN.2006.6
24. Misfit: Flash. http://misfit.com/products/flash (2015)25. Mittal, R., Kansal, A., Chandra, R.: Empowering de-
velopers to estimate app energy consumption. In: Pro-ceedings of the 18th annual International Conference onMobile Computing and Networking (MobiCom ’12), pp.317–328 (2012)
26. Mizell, D.: Using gravity to estimate accelerometer ori-entation. In: Proceedings of the 7th IEEE InternationalSymposium on Wearable Computers (ISWC’03), p. 252(2003)
27. Ng, A.Y.: On Feature selection: Learning with Exponen-tially many Irrelevant Features Training Examples. In:Proceedings of the 15th Interntional Conference on Ma-chine Learning, pp. 404–412 (1998)
28. Pathak, A., Hu, Y.C., Zhang, M.: Where is the energyspent inside my app?: Fine grained energy accountingon smartphones with eprof. In: Proceedings of the 7th
Stairstep Recognition and Counting in a Serious Game for Increasing Users’ Physical Activity 19
ACM European Conference on Computer Systems (Eu-roSys ’12), pp. 29–42 (2012)
29. Pirttikangas, S., Fujinami, K., Nakajima, T.: Feature se-lection and activity recognition from wearable sensors. In:H. Youn, M. Kim, H. Morikawa (eds.) Ubiquitous Com-puting Systems, Lecture Notes in Computer Science, vol.4239, pp. 516–527. Springer Berlin Heidelberg (2006)
30. Shoaib, M., Scholten, H., Havinga, P.: Towards physicalactivity recognition using smartphone sensors. In: Pro-ceedings of the 10th International Conference on Ubiqui-tous Intelligence and Computing, pp. 80–87 (2013)
31. Siirtola, P., Roning, J.: Recognizing human activitiesuser-independently on smartphones based on accelerome-ter data. International Journal of Interactive Multimediaand Artificial Intelligence 1(5), 38–45 (2012)
32. Under Armour ConnectedFitness: Endomondo.https://www.endomondo.com/ (2015)
33. World Health Organization: Physical In-activity: A Global Public Health Problemwww.who.int/dietphysicalactivity/factsheet inactivity
34. Wu, W., Dasgupta, S., Ramirez, E.E., Peterson, C., Nor-man, G.J.: Classification accuracies of physical activitiesusing smartphone motion sensors. Journal of medical In-ternet research 14(5), e130 (2012)