My App is an Apparatus: How to do Mobile HCI Research in the Large
DESCRIPTION
Since the introduction of application stores for mobile devices there has been increasing interest in using this distribution platform to collect user feedback. Mobile application stores can make research prototypes widely available and enable researchers to conduct user studies “in the wild” with participants from all over the world. Using apps as an apparatus goes beyond just distributing research prototypes. Considering apps as a research tool means distributing specifically designed prototypes in order to extend our understanding of mobile HCI. In this tutorial we provide an overview of recent research in this domain. We will show that well-designed tasks and users’ motivation are crucial aspects. We will discuss how to design app-based experiments, what kind of users one can expect, and how to avoid ethical and legal issues.
TRANSCRIPT
How to do Mobile HCI Research in the large?
Niels HenzeUniversity of StuttgartVisualization and Interactive Systems Institute
Martin PielotTelefónica I+DHCI and Mobile Computing Group
… but let’s start with a question:
Who of you ever participated in a user study?
Do you think that any of these guys ever did?
Photo by Robertobra, http://en.wikipedia.org/wiki/File:Guarani_Family.JPG (GFDL)
Outline
1. Limitations of common studies
2. Into the large
3. Types of studies
4. What is so special?
5. What works for us
6. Wrap up
User studies at MobileHCI 2010
20% acceptance rate
43 short+long papers
subjects per paper
subject’s gender
often a biased sample
http://nhenze.net/?p=810
undergraduate or graduate students at the local university studying a variety of majors
members in a joint research project
most participants were students
studying or working in the University of Glasgow
university students
most subjects were students with a background in computer sciences
students or employees at our university
all with a university degree, recruited in the Institute community
recruited through flyers, posters and various mailing lists at the university
10 university students and 2 participants are marketing professionals
small samples
artificial context
artificial task
convenient samples
Some male students from the lab took part in our study...
Small sample size isn’t necessarily an issue for a study
Not every study needs a perfect sample of the population
Focussing on studies with few subjects prevents many findings
We stew in our own juices if we use our own students by default
User studies at MobileHCI 2011
22.8% acceptance rate
63 short+long papers
subjects per paper
http://nhenze.net/?p=865
Some motivation
Large numbers are expensive in the lab
– 1,000 subjects for an hour -> 10,000€
– 1,000 subjects for an hour -> 6 months
– 1,000 subjects from around the world -> impossible

Different contexts are hard to address
– We have no airplane in our lab
– Don’t want to buy train tickets for my participants
– And what are the relevant contexts anyway?
Outline
1. Limitations of common studies
2. Into the large
3. Types of studies
4. What is so special?
5. What works for us
6. Wrap up
Example of getting large…

Target selection on mobile phones
thirty right-handed subjects
different target locations and sizes
[Park2008MobileHCI]

Taps are skewed
fixed posture
single device
Korean students
vague results
[Park2008MobileHCI]
…same thing in the large
game published on the Android Market
we inform the player about the study
just looks like an ordinary game
participants get some introduction
they tap the targets
we vary targets’ size and position
there is even a high score list

published on the Android Market
100,000 installations in three months
120 million touch events
more than a hundred different devices
players from all over the world
[Park2008MobileHCI]
[Henze2011MobileHCI]
Outline
1. Limitations of common studies
2. Into the large
3. Types of studies
4. What is so special?
5. What works for us
6. Wrap up
Types of work
Proof of concept
– Showing that an idea/concept/product works
– Lots of users, good ratings, positive comments, ...

App stores as research tool
– Experience report
– Ethical and legal issues

Investigating app-specific aspects
– How a specific app is used
– Compare different visualizations

Observing general aspects
– Learn about how people and devices behave
– How apps are used, how people touch the screen, ...
Proof of concept
Smule’s iPhone Ocarina
music instrument for the iPhone
million installations
[Wang2009NIME]
Shapewriter
developed gesture-based keyboard + notepad
qualitative feedback from App Store comments
[Zhai2009CHI]
App stores as research tool
Into the wild with Hungry Yoshi
location-based game for the iPhone
94,642 unique downloaders
investigated how to get subjective feedback
[McMillan2010Pervasive]
Experience from 5 studies
compare amount of collected data
experience with collecting qualitative data
discuss internal and external validity
[Henze2011IJMHCI]
[Chart: share of users opting in across five apps (SINLA, PocketNavigator, MapExplorer, Poke the Rabbit, Tap It), values 0.46%, 7.32%, 54.76%, 83.68%, 81.31%]
Local vs. wild
local study with 11 participants
wild study with over 10,000 users
combine the findings of both approaches
[Morrison2012CHI]
Investigating app-specific aspects
Ratings for Mobile Applications
compare amount of collected data
experience with collecting qualitative data
discuss internal and external validity
[Girardello2010DSZ]
Compare off-screen visualisations
using repeated measures
using a tutorial for a map application
and using a simple game
[Henze2010MobileHCI] [Henze2010NordiCHI]
Observing general aspects
Falling Asleep with … appazaar
[Böhmer2011MobileHCI]
A Study of Battery Life
[Ferreira2011Pervasive]
proof of concept
app stores as a research tool
ethics and legal issues
investigating app-specific aspects
investigating general aspects
[Wang2009NIME]
[Zhai2009CHI]
[Gilbertson2008CiE]
[Oliver2010HotPlanet]
[McMillan2010RiL]
[Miluzzo2010RiL]
[Henze2011IJMHCI]
[McMillan2010Pervasive]
[Cramer2010UbiComp]
[Morrison2010RiL]
[Poppinga2010OMUE]
[Pielot2011ELV]
[Henderson2009HotPlanet]
[Morrison2011CHI]
[Norcie2011ELV]
[Girardello2010DSZ]
[Riccamboni2010IB]
[Kuhn2010MM]
[Yan2011MobiSys]
[Budde2010IoT]
[Karpischek2011RiL]
[Henze2010MobileHCI]
[Henze2010NordiCHI]
[Hood2011IJTR]
[Henze2011MobileHCIa]
[Henze2011MobileHCIb]
[Watzdorf2010LocWeb]
[Ferreira2011Pervasive]
[Buddharaju2010CHI]
[Sahami2011CHI]
[Verkasalo2010MB]
[Böhmer2011MobileHCI]
Outline
1. Limitations of common studies
2. Into the large
3. Types of studies
4. What is so special?
5. What works for us
6. Wrap up
but what is special about app store studies?
App-based vs. other studies

             | Common controlled studies       | Mining existing data | App-based studies
Participants | Few participants                | Many participants    | Many participants
Context      | Artificial context              | Natural context      | Natural context
Task         | Defined task                    | No tasks             | Defined tasks (if needed)
Control      | Total control over participants | No control           | Weak control over participants
Sample       | Heavily biased sample           | Unbiased sample      | Biased to unbiased sample
You have to “sell” your study

The study has a goal
– Collect information about specific behaviour
– Performance for a specific task

Users have to install the app of their own free will
– App needs a purpose
– Good ratings, high ranking

Find a compromise
– Maintain the goals of the study
– Attract sufficient participants
Types of apps
Applications Games Widgets
Participants
How do we count the number of participants?
[Chart: installations vs. opt-in vs. active users, ranging from 0 to 100,000]
[McMillan2010Pervasive] [Morrison2010RiL]
Participants
How do we count the number of participants?
A good sample of the population?
[Chart: age distribution of US Android users vs. US population, age groups 18-34, 35-44, 45-54, 55-64, 65+]
[Nielsen2011] [USCensusBureau2008]
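One way to quantify how far an installed base deviates from the population is to compare the two age distributions directly. A minimal sketch, with invented placeholder shares (not the actual Nielsen/Census figures):

```python
def sample_bias(sample_shares, population_shares):
    """Per-bucket over/under-representation plus total variation distance.

    Both arguments are dicts mapping age bucket -> share (summing to 1.0).
    """
    bias = {k: sample_shares[k] - population_shares[k] for k in population_shares}
    tvd = sum(abs(v) for v in bias.values()) / 2  # total variation distance
    return bias, tvd

# Placeholder numbers for illustration only
android = {"18-34": 0.55, "35-44": 0.20, "45-54": 0.13, "55-64": 0.08, "65+": 0.04}
census  = {"18-34": 0.31, "35-44": 0.18, "45-54": 0.19, "55-64": 0.16, "65+": 0.16}
bias, tvd = sample_bias(android, census)
```

A positive bias for a bucket means that group is over-represented among app users; the total variation distance gives a single number for how skewed the sample is.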
Collecting information
Objective data
– As early as possible [Henze2011IJMHCI]
– More than just the task performance
  • All aspects that affect the results
  • E.g. device type, locale, time, screen size, resolution, ...
  • In particular: a version number
– Compromise between permissions and data to collect
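The bullets above can be turned into a concrete log record. A minimal sketch in Python; all field names and values here are illustrative, not the format used in the tutorial's apps (on Android you would read the device fields from `android.os.Build` and `Locale.getDefault()`):

```python
import json
import time

def make_log_record(event, payload, app_version):
    """Bundle a raw measurement with the context needed to interpret it later."""
    record = {
        "event": event,               # e.g. "touch", "level_completed"
        "payload": payload,           # the raw, unaggregated measurement
        "timestamp": time.time(),     # client-side time of the event
        "app_version": app_version,   # crucial: separates data per release
        "device_model": "Nexus One",  # placeholder; real apps read Build.MODEL
        "locale": "en_US",            # placeholder; real apps read the device locale
        "screen": {"width": 480, "height": 800},
    }
    return json.dumps(record)

line = make_log_record("touch", {"x": 120, "y": 301, "target": 3}, "1.4.2")
```

Logging the version number with every record means a later analysis can discard or separate data from buggy releases.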
Collecting information
Subjective data
– App Store comments can provide information
  • but usually don't [Henze2011IJMHCI]
  • Might help to claim an app is great (e.g. [Zhai2009CHI])
  • Ratings without baseline are meaningless
– Investigated how to get subjective feedback [McMillan2010Pervasive]
  • In-game “tasks” with dynamically loaded questions
  • Integration with Facebook
  • Interviewed 10 people over VoIP for $25
Collecting information
You have to measure what you intend to measure!
Case Study: Pocket Navigator [Pielot2012CHI]
motivation: distraction
one in six (17%) cell-toting adults say they have been so distracted while talking or texting that they have physically bumped into another person or an object
Madden and Rainie, 2010, http://pewinternet.org/Reports/2010/Cell-Phone-Distractions.aspx
pocketnavigator
navigation system similar to Google Maps
runs on OpenStreetMap
key innovation: convey navigation information in vibration patterns
evaluated in a field study
vibration patterns found to be effective
they reduce level of distraction

but users were no experts
and did not use navigation support out of a necessity

Instead of bringing the user into the “lab”,
we bring the lab to the user’s daily life
Collecting data, Feb – Dec 2011

quick facts
18,000 downloads
mostly US and Europe

Between Feb – Dec 2011:
8,187 routes calculated
34,035,316 log entries
9,400 hours of usage

a lot of data! But …
pedestrian navigation?
we cannot prevent people from using the app anywhere, e.g. in cars
in fact, 87% of all log data are from indoor use
hence filtering (route length, travel time, movement speed) is required

lessons learned
double-check that you measure the intended use!
filtering data might be necessary
acknowledge the fact that there is always uncertainty!
[Pielot2012CHI]
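The filtering step described above, dropping recorded trips that cannot plausibly be pedestrian navigation, can be sketched as follows. The thresholds are invented for illustration; they are not the values used in the Pocket Navigator study:

```python
def is_pedestrian_trip(route_length_m, travel_time_s):
    """Heuristic filter: keep only trips that plausibly happened on foot."""
    if travel_time_s <= 0:
        return False
    speed_ms = route_length_m / travel_time_s
    return (
        50 <= route_length_m <= 20_000   # discard trivial and absurdly long routes
        and 60 <= travel_time_s          # discard trips shorter than a minute
        and 0.3 <= speed_ms <= 3.0       # roughly walking speed (1-10 km/h)
    )

trips = [
    (1200, 1000),  # 1.2 km in ~17 min: plausible walking
    (9000, 600),   # 9 km in 10 min: driving, filter out
    (10, 5),       # sensor noise
]
kept = [t for t in trips if is_pedestrian_trip(*t)]
```

Whatever thresholds are chosen, they should be reported with the results, since they directly shape what "usage" means in the analysis.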
Collecting information
You have to measure what you intend to measure!
Another Example: TypeIt
TypeIt
compare approaches to improve text entry
people play as long as they want
[Henze2012CHIa, Henze2012CHIb, Henze2012Text]
TypeIt
condition affects the number of played levels
4 conditions
An ANOVA shows that the feedback has a significant effect on the total number of levels played (p<.01).
Factor out the number of played levels using an ANCOVA.
“Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression.” (Wikipedia)
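The ANOVA mentioned above boils down to comparing between-group and within-group variance. A stdlib-only sketch of the F statistic for a one-way design; the per-condition level counts are invented, not TypeIt data:

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over lists of observations per group."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    # between-group sum of squares: how far group means sit from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares: spread of observations around their group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)

# levels played per participant, one list per feedback condition (invented numbers)
conditions = [
    [12, 15, 11, 14],
    [22, 25, 21, 24],
    [13, 12, 16, 15],
]
f = one_way_anova_f(conditions)  # compare against the F(2, 9) critical value
```

An ANCOVA extends this by adding a continuous covariate (here, the number of played levels) to the model, so the condition effect is tested after the covariate's contribution is removed; in practice one would use a statistics package rather than hand-rolling it.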
Ready for prime time
users don’t care if it’s a research prototype
low quality results in low ratings
and few installations

“FC the rabbit.... uninstalled” – Godimus Prime
“Stupid waste of time!!!” – cailan
“Stupid waste of time.” – lance
“Realy stupid” – hope
“1 word...... dumb!” – josue
“Its ok” – erika
“boring and dumb.” – Beba
“What the hell is this??” – Luci
“Boo!” – Cullen Girl
“5 stars if there is a way to turn the music off. Doesnt go to well with slipknot” – Allen
“Stupid and offincive to my pet rabbit bayleigh” – Logan
Ethical and legal issues
“Primum non nocere”/”First, do no harm” (Thomas Sydenham)
“One should treat others as one would like others to treat oneself” [Flew1979Dictionary]
Informed consent
Presentation highly affects the conversion rate
[Chart: conversion rates of 6.96%, 57.28%, 67.42%, and 87.57% for different presentations]
[Pielot2011ELV]

Participants aren’t aware what data is collected
[Morrison2011CHI]
Regulations
Which rules to follow?
e.g. the EU Data Protection Directive [Henderson2009HotPlanet]

“any information relating to an identified or identifiable natural person”
• Transparency: the persons whose data are being collected or accessed have the right to be informed when such data processing is taking place.
• Legitimate purpose: data can only be collected for specific purposes.
• Proportionality: data should be processed in a fashion that is not excessive beyond the purposes for which they were collected.
Outline
1. Limitations of common studies
2. Into the large
3. Types of studies
4. What is so special?
5. What works for us
6. Wrap up
… or what works for us
Games vs. Apps
our games are more successful
[Chart: number of installations per app (SINLA, MapExplorer, Poke the Rabbit, Tap It!, Type It!, Hit It!), ranging from 0 to 400,000]

there are more apps than games available in the Android Market
apps 84.4%, games 15.6%
http://www.androlib.com/appstatstype.aspx

players execute the strangest tasks
widgets and background services are perfect for longitudinal observations
but sometimes an app is just the only option
Informing the user
provide information in the Market
show a modal dialog at the first start
provide more information and a link to an about page
Publishing
fancy screenshots and icon (that’s the first thing someone sees)
title & description contain words users search for
of course I don’t want to miss a single user
prepare a dedicated webpage for each app
Playing with the market
frequent updates
rate your app as soon as it becomes available
Keep it simple
focused and specialized studies
learning by doing
release early, often, and try it again if it doesn’t work
Logging
use HTTP and port 80 to transmit data
store unaggregated measures [Henze2012CHI]
consider limited resources, seriously!

CSV files from ~400,000 users: 392,401 files, 27,331,383,646 bytes in total
Compressed binary data from less than 3,000 users
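The resource gap between the two logging formats on the slide can be illustrated in a few lines. The event stream is synthetic and the record layout is an assumption, not the format used in the deployments described above:

```python
import gzip
import struct

# 10,000 synthetic touch events: timestamp (ms), x, y
events = [(1_300_000_000_000 + i * 17, (i * 37) % 480, (i * 53) % 800)
          for i in range(10_000)]

# naive approach: one CSV text line per event
csv_bytes = "\n".join(f"{t},{x},{y}" for t, x, y in events).encode()

# fixed-width binary records (8-byte timestamp, two 2-byte coordinates), gzipped
packed = b"".join(struct.pack("<qHH", t, x, y) for t, x, y in events)
binary_bytes = gzip.compress(packed)

assert len(binary_bytes) < len(csv_bytes)  # binary+gzip is substantially smaller
```

The saving compounds on the server side: with hundreds of thousands of users, the difference between plain CSV and compressed binary is the difference between gigabytes and a manageable archive.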
Advertisements
does not work!
200$ for AdMob over a couple of days
TapSnap: http://tiny.cc/tapsnap

well, sometimes it does!
100$ for AppBrain on a single day
TypeIt II: http://tiny.cc/TypeIt2

focus all your efforts on a very short time
get additional users naturally
What to do?
No harm!
Inform the user
Don’t store data you don’t want

Choose a type of app
Games worked for me
But if you have a great system anyway...

Sell your study
You compete with commercial apps
Graphics, design, ...

Release
Keywords, description, ...
Rate and comment
Focus your advertisement efforts

Test it
Well, I don’t do that
At least fix it

Think about the data
Do you store everything interesting?
Can you store data from 10,000 users?
Can you analyse it?
small samples -> large
artificial context -> natural?
artificial task -> artificial task?
convenient samples -> very convenient samples
but how bad is it?
How to do Mobile HCI Research in the large?
Niels HenzeUniversity of StuttgartVisualization and Interactive Systems Institute
Martin PielotTelefónica I+DHCI and Mobile Computing Group
ethnography, controlled experiments, observations, … can all work in the large
collect data early, release often, be flexible
respect ethics, consider regulations
References
[Morrison2012CHI] Alistair Morrison, Donald McMillan, Stuart Reeves, Scott Sherwood, Matthew Chalmers: A Hybrid Mass Participation Approach to Mobile Software Trials. Proc. CHI, 2012.
[Wang2009NIME] Ge Wang: Designing Smule’s iPhone Ocarina. Proc. NIME, 2009.
[Zhai2009CHI] Shumin Zhai, Per Ola Kristensson, Pengjun Gong, Michael Greiner, Shilei Peng, Liang Liu, Anthony Dunnigan: Shapewriter on the iPhone: From the Laboratory to the Real World. Adjunct Proc. CHI, 2009.
[Gilbertson2008CiE] Paul Gilbertson, Paul Coulton, Fadi Chehimi, Tamas Vajk: Using 'Tilt' as an Interface to Control 'No Button' 3-D Mobile Games. ACM Computers in Entertainment, 2008.
[Oliver2010HotPlanet] Earl Oliver: The Challenges in Large-Scale Smartphone User Studies. Invited talk @ HotPlanet, 2010.
[McMillan2010RiL] Donald McMillan: iPhone Software Distribution for Mass Participation. Proc. Research in the Large Workshop @ UbiComp, 2010.
[Miluzzo2010RiL] Emiliano Miluzzo, Nicholas D. Lane, Hong Lu, Andrew T. Campbell: Research in the App Store Era: Experiences from the CenceMe App Deployment on the iPhone. Proc. Research in the Large Workshop @ UbiComp, 2010.
[Henze2011IJMHCI] Niels Henze, Martin Pielot, Benjamin Poppinga, Torben Schinke, Susanne Boll: My App is an Experiment: Experience from User Studies in Mobile App Stores. International Journal of Mobile Human Computer Interaction (IJMHCI), 2011.
[McMillan2010Pervasive] Donald McMillan, Alistair Morrison, Owain Brown, Malcolm Hall, Matthew Chalmers: Further into the Wild: Running Worldwide Trials of Mobile Systems. Proc. Pervasive, 2010.
[Cramer2010UbiComp] Henriette Cramer, Mattias Rost, Nicolas Belloni, Didier Chincholle, Frank Bentley: Research in the Large. Using App Stores, Markets, and Other Wide Distribution Channels in Ubicomp Research. Adjunct Proc. UbiComp, 2010.
[Morrison2010RiL] Alistair Morrison, Stuart Reeves, Donald McMillan, Matthew Chalmers: Experiences of Mass Participation in Ubicomp Research. Proc. Research in the Large Workshop @ UbiComp, 2010.
[Poppinga2010OMUE] Benjamin Poppinga, Martin Pielot, Niels Henze, Susanne Boll: Unsupervised User Observation in the App Store: Experiences with the Sensor-based Evaluation of a Mobile Pedestrian Navigation Application. Proc. OMUE @ NordiCHI, 2010.
[Pielot2011ELV] Martin Pielot, Niels Henze, Susanne Boll: Experiments in App Stores – How to Ask Users for their Consent? Proc. CHI Workshop on Ethics, Logs & Videotape, 2011.
[Henderson2009HotPlanet] Tristan Henderson, Fehmi Ben Abdesslem: Scaling Measurement Experiments to Planet-Scale: Ethical, Regulatory and Cultural Considerations. Proc. HotPlanet, 2009.
[Morrison2011CHI] Alistair Morrison, Owain Brown, Donald McMillan, Matthew Chalmers: Informed Consent and Users' Attitudes to Logging in Large Scale Trials. Adjunct Proc. CHI, 2011.
[Norcie2011ELV] Greg Norcie: Ethical and Practical Considerations for Compensation of Crowdsourced Research Participants. Proc. CHI Workshop on Ethics, Logs & Videotape, 2011.
[Girardello2010DSZ] Andrea Girardello, Florian Michahelles: Explicit and Implicit Ratings for Mobile Applications. 3. Workshop “Digitale Soziale Netze” at the 40. Jahrestagung der Gesellschaft für Informatik, Leipzig, 2010.
[Riccamboni2010IB] Rodolfo Riccamboni, Alessio Mereu, Chiara Boscarol: Keys to Nature: A Test on the iPhone Market. Tools for Identifying Biodiversity: Progress and Problems, 2010.
[Kuhn2010MM] Michael Kuhn, Roger Wattenhofer, Samuel Welten: Social Audio Features for Advanced Music Retrieval Interfaces. Proc. MM, 2010.
[Yan2011MobiSys] Bo Yan, Guanling Chen: AppJoy: Personalized Mobile Application Discovery. Proc. MobiSys, 2011.
[Budde2010IoT] Andreas Budde, Florian Michahelles: Product Empire – Serious Play with Barcodes. Proc. IoT, 2010.
[Karpischek2011RiL] Stephan Karpischek, Geron Gilad, Florian Michahelles: Towards a Better Understanding of Mobile Shopping Assistants – A Large Scale Usage Analysis of a Mobile Bargain Finder Application. Proc. Research in the Large Workshop @ UbiComp, 2011.
[Henze2010MobileHCI] Niels Henze, Susanne Boll: Push the Study to the App Store: Evaluating Off-Screen Visualizations for Maps in the Android Market. Proc. MobileHCI, 2010.
[Henze2010NordiCHI] Niels Henze, Benjamin Poppinga, Susanne Boll: Experiments in the Wild: Public Evaluation of Off-Screen Visualizations in the Android Market. Proc. NordiCHI, 2010.
[Hood2011IJTR] Jeffrey Hood, Elizabeth Sall, Billy Charlton: A GPS-based Bicycle Route Choice Model for San Francisco, California. Transportation Letters: The International Journal of Transportation Research, 2011.
[Henze2011MobileHCIa] Niels Henze, Enrico Rukzio, Susanne Boll: 100,000,000 Taps: Analysis and Improvement of Touch Performance in the Large. Proc. MobileHCI, 2011.
[Henze2011MobileHCIb] Niels Henze, Susanne Boll: Release Your App on Sunday Eve: Finding the Best Time to Deploy Apps. Adjunct Proc. MobileHCI, 2011.
[Henze2012CHIa] Niels Henze, Enrico Rukzio, Susanne Boll: Observational and Experimental Investigation of Typing Behaviour using Virtual Keyboards on Mobile Devices. Proc. CHI, 2012.
[Henze2012CHIb] Niels Henze: Hit It!: An Apparatus for Upscaling Mobile HCI Studies. Proc. CHI Extended Abstracts, 2012.
[Henze2012Text] Niels Henze: Ten Male Colleagues Took Part in Our Lab-Study About Mobile Texting. Proc. Workshop on Designing and Evaluating Text Entry Methods @ CHI, 2012.
[Watzdorf2010LocWeb] Stephan von Watzdorf, Florian Michahelles: Accuracy of Positioning Data on Smartphones. Proc. LocWeb, 2010.
[Ferreira2011Pervasive] Denzil Ferreira, Anind K. Dey, Vassilis Kostakos: Understanding Human-Smartphone Concerns: A Study of Battery Life. Proc. Pervasive, 2011.
[Buddharaju2010CHI] Pradeep Buddharaju, Yuichi Fujiki, Ioannis Pavlidis, Ergun Akleman: A Novel Way to Conduct Human Studies and Do Some Good. Adjunct Proc. CHI, 2010.
[Sahami2011CHI] Alireza Sahami, Michael Rohs, Robert Schleicher, Sven Kratz, Alexander Müller, Albrecht Schmidt: Real-Time Nonverbal Opinion Sharing through Mobile Phones during Sports Events. Proc. CHI, 2011.
[Verkasalo2010MB] Hannu Verkasalo: Analysis of Smartphone User Behavior. Proc. Ninth International Conference on Mobile Business, 2010.
[Böhmer2011MobileHCI] Matthias Böhmer, Brent Hecht, Johannes Schöning, Antonio Krüger, Gernot Bauer: Falling Asleep with Angry Birds, Facebook and Kindle – A Large Scale Study on Mobile Application Usage. Proc. MobileHCI, 2011.
[Agarwal2010HotNets] Sharad Agarwal, Ratul Mahajan, Alice Zheng, Victor Bahl: There’s an App for That, but It Doesn’t Work. Diagnosing Mobile Applications in the Wild. Proc. HotNets, 2010.
[Morrison2010RiL] Alistair Morrison, Matthew Chalmers: SGVis: Analysis of Mass Participation Trial Data. Proc. Research in the Large Workshop @ UbiComp, 2010.
[Lane2010CM] Nicholas D. Lane, Emiliano Miluzzo, Hong Lu, Daniel Peebles, Tanzeem Choudhury, Andrew T. Campbell: A Survey of Mobile Phone Sensing. IEEE Communications Magazine, 2010.