A Privacy Analysis of Cross-device Tracking
Sebastian Zimmeck, Carnegie Mellon University; Jie S. Li and Hyungtae Kim, unaffiliated; Steven M. Bellovin and Tony Jebara, Columbia University

This paper is included in the Proceedings of the 26th USENIX Security Symposium, August 16–18, 2017, Vancouver, BC, Canada. ISBN 978-1-931971-40-9. Open access to the Proceedings of the 26th USENIX Security Symposium is sponsored by USENIX.
https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/zimmeck



A Privacy Analysis of Cross-device Tracking∗

Sebastian Zimmeck, Jie S. Li, Hyungtae Kim, Steven M. Bellovin, Tony Jebara
School of Computer Science, Carnegie Mellon University
Department of Computer Science, Columbia University

[email protected], [email protected], [email protected], {smb, jebara}@cs.columbia.edu

Abstract

Online tracking is evolving from browser- and device-tracking to people-tracking. As users are increasingly accessing the Internet from multiple devices, this new paradigm of tracking—in most cases for purposes of advertising—is aimed at crossing the boundary between a user's individual devices and browsers. It establishes a person-centric view of a user across devices and seeks to combine the input from various data sources into an individual and comprehensive user profile. By its very nature, such cross-device tracking can principally reveal a complete picture of a person and, thus, become more privacy-invasive than the siloed tracking via HTTP cookies or other traditional and more limited tracking mechanisms. In this study we are exploring cross-device tracking techniques as well as their privacy implications.

Particularly, we demonstrate a method to detect the occurrence of cross-device tracking, and, based on a cross-device tracking dataset that we collected from 126 Internet users, we explore the prevalence of cross-device trackers on mobile and desktop devices. We show that the similarity of IP addresses and Internet history for a user's devices gives rise to a matching rate of F-1 = 0.91 for connecting a mobile to a desktop device in our dataset. This finding is especially noteworthy in light of the increase in learning power that cross-device companies may achieve by leveraging user data from more than one device. Given these privacy implications of cross-device tracking, we also examine compliance with applicable self-regulation for 40 cross-device companies and find that some are not transparent about their practices.

1 Introduction

A recent study by Google showed that 98% of surveyed Internet users in the U.S. use multiple devices on a daily basis, and 90% switch devices sequentially to accomplish a task over time [37]. From an ad network's1 perspective these developments create a challenging environment as they increase the complexity of targeting advertising to specific users. Attributing conversions of ads to actual purchases and frequency capping to avoid showing a user the same ad over and over again become more difficult as well. However, there is a solution to these challenges: cross-device tracking. This technique presents a fundamental shift from device tracking to people tracking. As shown in Figure 1, it principally allows ad networks to follow a user on his or her online journey through all devices. However, at the same time, cross-device tracking of users is potentially more privacy-invasive than the tracking of individual devices without connecting them.

∗Most of the work on which we are reporting was done when Sebastian Zimmeck and Hyungtae Kim were at Columbia University.

Figure 1: Identifying and correlating Sally's phone and desktop among all devices on the Internet allows cross-device companies to target ads on both of her devices.

In this study we are exploring the emerging cross-device tracking ecosystem from a privacy perspective. Particularly, we are interested in studying the tracking techniques used by cross-device companies,2 understanding the extent to which cross-device tracking occurs on desktop and mobile devices, and evaluating the privacy implications of machine learning applications to cross-device data. We understand cross-device tracking to mean the tracing of an individual's usage of the Internet on multiple devices and combining all resulting information into one comprehensive user profile.

1We are using the term ad network broadly, encompassing ad exchanges, demand/supply side platforms, and other companies in the online ad space.

2The term cross-device company encompasses ad networks, analytics services, and other companies that are using cross-device tracking techniques.

USENIX Association 26th USENIX Security Symposium 1391

Cross-device tracking exists in a deterministic and a probabilistic variant. The former is based on a first-party relationship that often permits user identification with certainty, for example, when a user logs into a social network account from multiple devices. For the majority of our study we focus on probabilistic cross-device tracking, which is used by services that are limited to a third-party relationship with users. To that end, ad networks and analytics services oftentimes cooperate with web and app publishers that have a first-party user relationship and deploy tracking mechanisms on their properties. Applying machine learning, they then correlate the various data streams to identify those belonging to the same users. Probabilistic and deterministic cross-device tracking approaches are often combined as companies of different provenance collaborate and exchange data [32]. While we examine cross-device tracking via HTTP cookies, pixel tags, and other traditional mechanisms, such tracking can also occur via ultrasound signals [7, 31, 59] or other side channels, which we do not examine here.

As some cross-device companies match billions of devices [22] and social networks have cross-device functionality naturally built into their systems, lawmakers began to take notice. In particular, the U.S. Federal Trade Commission (FTC) hosted a cross-device workshop [29] facilitating an initial public discussion on the privacy implications of this form of Internet tracking. The regulators discussed privacy risks, consumer transparency, and effective industry self-regulation with industry representatives, academics, and various other stakeholders. They followed up with privacy recommendations for cross-device companies [32]. As evidenced by a recent case on cross-device tracking via ultrasound signals and the withdrawal of the service from the U.S. market, the FTC is determined to enforce the existing laws and regulations [31, 32]; however, it is hampered by insufficient insight into the used technologies [30].

In this study we are exploring cross-device tracking through the lens of privacy, contributing the following:

1. By means of a brief case study we introduce a method for detecting cross-device trackers. We find statistical significance for various ad networks' capabilities of targeting mobile users on their desktop. (§ 3.)

2. We make publicly available a cross-device tracking dataset as well as software that we used for collecting the data.3 We give a statistical overview of cross-device usage patterns for the users in our dataset. (§ 4.)

3The dataset and software can be found at https://github.com/SebastianZimmeck/Cross_Device_Tracking.

3. We design a basic algorithm and evaluate features and parameters for probabilistic cross-device tracking based on relevant patent and other industry documents. Using IP addresses, web domains, and app domains, our techniques achieve an F-1 score of 0.91 on the collected data. (§ 5.)

4. Leveraging our dataset we analyze how the availability of both mobile and desktop data may impact the prediction of users' demographics and interests. Specifically, we examine predictions for gender and interest in finance. (§ 6.)

5. Based on our dataset we calculate the penetration of cross-device tracking on the Internet and conclude that some cross-device companies seem to have broad insight into Internet users' cross-device usage. (§ 7.)

6. Finally, we explore the efficacy of the industry's self-regulation and find that some cross-device companies do not transparently disclose their practices. (§ 8.)

2 Related Work

Our study is based on work in online tracking (§ 2.1), human-computer interaction (§ 2.2), and machine learning (§ 2.3).

2.1 Online Tracking

Much research was published on online tracking. Notably, Roesner et al. [72] developed a tracker taxonomy and examined how tracking occurs in the wild. Lerner et al. [56] provided a historical perspective of tracker evolution over time. However, few existing efforts discuss tracking across devices. Similar to traditional tracking, such cross-device tracking requires the identification of individual users' browser instances. In this regard, Englehardt et al. [27] point out that cookies allow for linking a user's visits to different websites even if his or her device IP address varies. They conducted a large-scale measurement of traditional online tracking using their OpenWPM platform [26]. Cross-device tracking further requires the correlation of users' different devices. As Olejnik et al. [67] remarked, browsing histories could potentially identify the same user across multiple devices.

In the closest work to ours, Brookman and his co-authors from the FTC [11] examine the potential for device correlation by surveying the occurrence of cross-device trackers on 100 popular websites. They also evaluate the extent to which cross-device companies notify users of their practices. While this inquiry into privacy transparency is part of our study as well (§ 8), we extend their work. In particular, we provide statistical support for the occurrence of cross-device tracking (§ 3), evaluate cross-device tracking techniques (§ 5), analyze the potential increase in learning power from cross-device data (§ 6), and examine the penetration of cross-device trackers (§ 7); all on real user data (§ 4). Our study is also complementary to the work of Mavroudis et al. [59] and Arp et al. [7], who present analyses, attacks, and defenses for ultrasound-based cross-device tracking. We are exploring cross-device tracking based on cookies and other traditional tracking mechanisms.

If a browser does not accept cookies, it still can be tracked via device fingerprinting, as initially shown by Kohno et al. [51] and Eckersley et al. [25], and extensively surveyed by Lerner et al. [56]. Kurtz et al. [52] and Gulyas et al. [39] showed that mobile device fingerprints are often unique, distinguishable, and re-identifiable. Fingerprinting can be based on sensors [19] and, notably for our purposes, can be employed across browsers [15]. With their FPDetective, Acar et al. [2] conducted a large-scale study of device fingerprinting. Nikiforakis et al. [66] provided insight into the practices of three popular browser-fingerprinting libraries and introduced PriVaricator [65], which is a defense against browser fingerprinting based on randomization. Three advanced tracking mechanisms—canvas fingerprinting, evercookies, and use of cookie syncing—were investigated by Acar et al. [1]. In our study we are now exploring the extent to which fingerprinting can play a role in cross-device tracking (§§ 5.1, 7.1). Various works on website fingerprinting [12, 13, 17, 41, 44, 69] inform our study in this regard as well.

As we conduct a first cross-device tracking data flow experiment, our work also relates to similar experiments and methodologies in other areas of online tracking. Particularly, our work relates to the study of Meng et al. [60], who showed that there is a correlation between Google ads and users' profiles and evaluated the likelihood of learning users' sensitive information. Focusing on Google as well, Lecuyer et al. [54, 55] were able to show a correlation between users' e-mail content and ads served to them. Further, Book and Wallach [10] collected a set of about 225K ads on 32 simulated devices and analyzed how the ads were targeted by correlating them to targeting profiles. In addition, Zarras et al. [85] performed a large-scale study on the security of ad serving, and Meng et al. [61] presented an ad fraud attack that enables publishers to increase their ad revenue. In our experiment we follow the recommendations given for information flow experiments by Tschantz et al. [81].

2.2 Human-Computer Interaction

While there are only a few online tracking studies investigating how users are tracked across devices, various efforts on human-computer interaction are informative for our purposes. The goal of these studies is to improve website navigation, browser prediction of user destinations, and search result relevance for search engines [3]. To that end, we leverage the insight of some studies focusing on website revisit patterns and highlighting the identifying potential of such revisits. In this regard, Tauscher and Greenberg [78] found that 58% of a user's visits to websites constitute revisits. People tend to access only a few pages frequently and browse in small clusters of related pages. Adar et al.'s [3] analysis reveals various patterns of revisits, each with unique behavioral, content, and structural characteristics.

Some studies took a closer look at website revisits across devices. Tossell et al. [79] were able to detect that revisits occurred very infrequently, with approximately 25% of URLs revisited by each user. They further found that, compared to desktops, mobile browsers are accessed less frequently, for shorter durations, and to visit fewer pages. Users seem to rely on apps instead. Different from websites, apps have a revisit rate of 97.1%, driven by a high number of visits to the five most frequently accessed apps. It appears that mobile web use is more concentrated and narrow than its desktop counterpart. Indeed, Kamvar et al.'s work [46] confirms this conjecture for the use of web search.

In their quest for improving the sharing of bookmarks and other information across devices, Kane et al. [47] found that users tend to visit many of the same domains on both their mobile phones and desktops. Specifically, they found that a median of 75.4% of the domains viewed on the phone were also viewed on the desktop, and a median of 13.1% of the domains viewed on the desktop were also viewed on the phone. Despite the differing browsing habits across devices, particularly the higher number of websites visited on desktops, they conclude that users' web browsing activities are similar across devices. However, users do not use all of their devices in the same way but rather assign them different roles, as Dearman and Pierce [20] found. As we will explore further (§ 6), these different roles could be a reason why learning about users' interests can be more comprehensive with data from more than one device. While the sharing of devices can also principally impact cross-device companies' ability to track users across those, Matthews et al.'s study [58] found that phones were never shared among multiple individuals for mutual use, and computers were shared moderately.

2.3 Machine Learning

Different from traditional types of online tracking, cross-device tracking is often based on machine learning. In 2015, Drawbridge [22], an ad network specialized in cross-device tracking, hosted the ICDM 2015: Drawbridge Cross-Device Connections competition, asking competition participants to leverage machine learning techniques to correlate devices to users [23]. The competition participants were given access to an anonymized proprietary dataset with mostly hidden features. The competition generated various short papers [14, 48, 50, 53, 62, 71, 75, 82] that take the perspective of an ad network. They focus on improving machine learning performance for a narrow set of features; essentially, only exploiting similarity of IP addresses. Our evaluation of tracking techniques broadens this research and is centered on privacy implications.

The first place solution of Walthers [82], which reached an F-0.5 score of 0.9, is in some ways representative of the


1. google.com
2. google.com; buy pet food - Google Search
3. m.petsmart.com; PetSmart
4. m.petsmart.com; Food
5. m.petsmart.com; Fancy Feast Classic Adult Cat
6. google.com; petco - Google Search
7. m.petco.com; Pet Supplies, Pet Food, and Pet P.
8. m.petco.com; Cat Furniture: Cat Trees, Towers
9. m.petco.com; Cat Food
10. m.petco.com; Browse & Buy Hill's Science Diet
11. m.petco.com; Hills Science Diet Adult Perfect W.
12. instinctpetfood.com; Instinct Pet Food
13. instinctpetfood.com; Instinct Pet Food For Your Cat
14. instinctpetfood.com; Instinct Raw for Cats
15. google.com; beneful cat food - Google Search
16. google.com; instacart
17. google.com
18. google.com; buy watch - Google Search
19. brilliantearth.com; Beyond Conflict Free Diamonds
20. google.com; buy refrigerator - Google Search
21. offers.geappliances.com; Drimmers - Offers GE A.
22. m.homedepot.com; Top Freezer Refrigerators - Re.
23. m.homedepot.com; Refrigerators
24. searshometownstores.com; Refrigerators & Freez.
25. searsoutlet.com; Refrigerators & Freezers for Sale
26. amazon.com
27. amazon.com; search for refrigerator
28. amazon.com; LG LSXS26366S 35-Inch Side
29. shoppermart.net; ShopperMart.net: Find the best
30. samsung.com; Galaxy TabPro S - 2-in-1 Tablet

Figure 2: The mobile browser history (without visits to the Alexa-ranked homepages in the first two months of the experiment). The list shows the domains and the titles of the webpages, if any.

Ad             Served on     Ad Server          Ad Network       Cross-device Provider
A. PetSmart    nytimes.com   adsense.com        Google AdSense   Google Display Network
B. Miele/Abt   latimes.com   as.chango.com      Rubicon Project  Tapad
C. Kate Spade  aol.com       redirectingat.com  Skimlinks        Lotame

Figure 3: Selected ads served to the desktop browser after visiting the sites in Figure 2 on the mobile browser. We had not seen any of these ads in our desktop browser session two months before.

approaches taken in the competition. Comparable to other participants' solutions [14, 50, 53], it identified IP addresses that devices of the same user were connected to as the most important feature. Intuitively, as conjectured by Cao et al. [14], devices with similar IP footprints are more likely to be used by the same individual. Thus, the simple reliance on IP address history can already lead to an F-0.5 score of 0.86 [14]. However, various studies found that not all IP addresses are equally meaningful, in particular because the same public or cellular IP address can be assigned to many different users at different times [48, 82].

Participants in the Drawbridge competition [23] did not find online history particularly useful for their task. They reported that correlating online history across devices provided only minimal gain [50, 53]. This result, which seemingly contradicts the previously discussed usability studies hinting at cross-device website revisit patterns as an important feature, could be due to the fact that the Drawbridge dataset [45] provided only app history for mobile devices. Thus, the absence of mobile web history could be a reason for participants' inability to reach Drawbridge's precision of 0.97 [22]. While the Drawbridge competition was about the correlation of different user devices, it did not address the purpose of the correlation: the prediction of users' demographics and interests, which we will discuss in our study (§ 6).

3 Detecting Cross-device Trackers

In order to evaluate the occurrence of cross-device tracking in the wild we conducted an exploratory case study.

Purpose. It could be argued that our case study is aimed at the obvious: detecting the existence of cross-device tracking. However, we emphasize that it is our intention to show a procedure for identifying unknown cross-device companies. The procedure is also intended to be used for determining whether known companies are adhering to the limits that (self-)regulation imposes on them, in particular as users are given the right to opt out from cross-device tracking [21]. Our case study provides an initial information flow experiment in the cross-device space. However, we caution that we leave a comprehensive analysis, which was done in other areas of online tracking [54, 55], for further research.

Establishing an IP Address Connection. We began our experiment by connecting two devices—a desktop and a mobile device—to the same router and modem. Using a fresh desktop browser without any user data we visited the homepages of five randomly selected news websites from the Alexa rankings [4]—aol.com, latimes.com, nytimes.com, wsj.com, and washingtonpost.com (the test homepages). We refreshed each test homepage ten times, as recommended for these types of information flow experiments [81], and observed the ads that were served. We also set up a desktop device with a fresh browser connected to a different router and modem as a control instance. In the following two months we occasionally and randomly visited 100 highly ranked homepages [4] on our fresh mobile browser.

Observing Cross-device Ads. After two months we used the mobile browser to visit the websites shown in Figure 2. We searched Google for various consumer products and clicked on ads served for those on the search results pages. After a few hours we switched to our desktop browser and accessed the test homepages. We refreshed each ten times. Some of the served ads, which we had not seen before, were for products we had searched for on the mobile device. Figure 3 shows the ads and associated information, that is, the name of the ad (e.g., PetSmart), the domain on which it was served (e.g., nytimes.com), the domain of the ad server (e.g., adsense.com), the ad network serving the ad (e.g., Google AdSense), and the involved cross-device tracking provider (e.g., Google Display Network).4 Our results suggest that the ad networks serving the ads had learned that the user who did the search on the phone was the same as the user on the desktop.

To assess the similarity of ads we categorized each ad according to Google ad categories [35]. Then, based on an exact one-tailed permutation test, as recommended [81], we compared the ad distribution served on the desktop browser to the ad distribution served on the desktop control browser. We evaluated the null hypothesis that both distributions do not differ from each other at the 0.05 significance level. However, the result of p = 0.02 indicates that the null hypothesis should be rejected and that the deviation of both distributions is statistically significant at the 0.05 level. This finding suggests that we successfully identified instances of cross-device tracking. We also found mobile cookie syncing between Rubicon Project and Tapad. However, confirming earlier observations [11], we did not detect any cookie syncing across devices.

Direction of Ad Serving and App-Web Correlation. In addition to cross-device tracking from mobile to desktop, we were further interested in the reverse direction. However, searching Google on our desktop for buying products did not seem to lead to ads for these products on our mobile browser. One explanation might be that the ad serving was limited to one direction—from mobile to desktop—as users tend to move from a smaller to a larger screen [33, 37]. Another explanation could be that ad networks attached more weight to the history on the device to which an ad was served and less to other connected devices. Further, we might simply have missed all cross-device campaigns at the time for the products we searched for. Finally, we were not able to notice any correlation in ad serving in either direction when repeating our experiment with mobile apps instead of websites.
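The statistical comparison of the two ad distributions can be sketched as follows. The sketch uses the total variation distance as test statistic and a Monte Carlo approximation of the permutation distribution; both choices, as well as all names below, are illustrative assumptions rather than our exact procedure (which used an exact test):

```python
import random
from collections import Counter

def tv_distance(a, b):
    """Total variation distance between two empirical category distributions."""
    ca, cb = Counter(a), Counter(b)
    cats = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[c] / len(a) - cb[c] / len(b)) for c in cats)

def permutation_test(sample_a, sample_b, n_perm=10000, seed=0):
    """One-tailed permutation test: p-value for the null hypothesis that
    both samples of ad categories come from the same distribution."""
    rng = random.Random(seed)
    observed = tv_distance(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the pooled categories and recompute the statistic.
        if tv_distance(pooled[:len(sample_a)], pooled[len(sample_a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A small p-value, as with our p = 0.02, indicates that the ads on the treated browser differ from those on the control browser beyond chance.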

4 The Cross-device Tracking Dataset

A major reason for the scarcity of academic research in cross-device tracking is the unavailability of data. Generally, only proprietary industry data exists.5 Thus, we collected our own cross-device tracking dataset (the CDT dataset). Here we describe how we collected the data and highlight cross-device usage patterns of the users in the dataset.

4We assume that the Google Display Network covers sites using AdSense, DoubleClick, Blogger, YouTube, and AdMob. On one side, this is likely an overestimation as not all sites using these trackers are part of the Google Display Network. On the other side, it is an underestimation as there are sites that are part of it, however, not using any of the trackers. In total, the Google Display Network covers over two million sites [34].

5The Drawbridge dataset [45] was only accessible to participants of the Drawbridge competition [23] and limited in its use for that purpose.

          Desktop Web   Mobile Web   Mobile Apps
Users     125           102          104
IPs       1,994         5,784 (mobile web and apps combined)
Domains   23,517        3,876        845

Table 1: Summary statistics showing the total number of unique users, IP addresses, and domains in the CDT dataset.

          Desktop Web        Mobile Web         Mobile Apps
          25th, 50th, 75th   25th, 50th, 75th   25th, 50th, 75th
Days      19, 22, 26         9, 17, 23          19, 22, 24
IPs       6, 17, 24          25, 63, 92 (mobile web and apps combined)
Domains   149, 251, 374      9, 31, 70          19, 30, 44

Table 2: Summary statistics for the CDT dataset per user, showing the unique values at the 25th, 50th, and 75th percentiles. The data was collected for the same continuous time period for every user. However, not every user made use of his or her devices every day.

Data Collection Procedure. Before we began the data collection we obtained approval from Columbia University's Institutional Review Board (IRB). We built our data collection system such that interested users could sign up on our project website, at which point we also took a device fingerprint for each signed-up device. We asked users to supply basic information on their demographics (e.g., age and gender), interests (e.g., finance, games, shopping) [36], and personas (e.g., avid runners, bookworms, pet owners) [84]. In order to capture users' mobile and desktop history we provided them with browser extensions and an Android app that we developed for automatically collecting such information.6 Details on the types of information that we collected are contained in Appendix A. We do not have any indication that users behaved differently in our study than under real-world conditions.

We only signed up users of Android phones with Android's native browser, Google Chrome, or the Samsung S-Browser. We did not support iOS or other operating systems. Our app requires Android 4.0.3 and runs without root access. Every minute it checks for a new foreground app running on the device as well as new entries in the browsing history database of the phones' browsers. If new apps or URLs are detected, a new history datapoint is transmitted to our server.7 On the desktop side we provided users of all operating systems with data collection browser extensions for Google Chrome, Mozilla Firefox, and Opera. At the conclusion of the study we rewarded each user with an Amazon gift card for $15 to $50, depending on the amount of data we received from them.

Dataset Characteristics. We collected data from 126 users. Tables 1 and 2 show further details. We signed up 125 desktop and 108 mobile users with an intersection of 107 users from whom we obtained both mobile and desktop data. While our data faithfully represents that not every Internet user has multiple devices, it does not reflect that users in the real world can have more than two devices. However, despite this limitation we believe that our dataset is generally an accurate reflection of real multi-device usage on the Internet because the vast majority of mobile devices is associated with only one desktop browser [71]. Therefore, it seems plausible to adopt this understanding of the problem here as well. Further, only 3/108 (3%) of mobile users and 4/125 (3%) of desktop users in our study reported that they are sharing their devices. As this result seems in line with findings that phones are never shared for mutual use and that computers are only shared to a moderate extent [58], it appears that our data is a realistic representation in this regard.

6When we refer to desktops, we include laptops but exclude tablets.

7For some users with Google Chrome and Android 6.0 or higher we did not receive the full browsing history due to browser restrictions. We asked affected users to send us their history manually.

118 users in our study were affiliates of Columbia University, mostly students. Based on this population we believe that our data is more homogeneous than data from, say, the general population of New York City. However, we also note that our users are less likely to encounter typical restrictions of device use that many employees face in the workplace, e.g., corporate networks blocking certain websites. For the median user we collected about three weeks of data, of which IP addresses and domains are of particular importance for probabilistic cross-device tracking because they can be used to measure the similarity between devices (§ 5.2).

It is noteworthy that the total unique mobile IP count (5,784) is nearly three times the total unique desktop IP count (1,994), which reflects mobile usage on the go. It should be noted, though, that the real unique mobile IP count is likely even higher, as our method did not allow us to collect mobile IPs with every datapoint. However, the high number of unique desktop domains (23,517), compared to the homogeneous usage of apps (845), underscores the diversity of desktop browsing. While it is much more diverse in terms of domains (3,876), mobile web usage pales in comparison to app usage. As shown by the 25th, 50th, and 75th percentiles, the median user accessed the mobile web only for 17 days, visiting only 31 unique domains.8 While app usage is more popular with a median of 22 days, the median usage of 30 unique apps is comparable to that of the mobile web. However, the median number of unique mobile IP addresses (63) more than triples the desktop IP addresses (17).

Figure 4 shows that many users visit a relatively large number of unique mobile device IP addresses and desktop web domains. However, there does not seem to be a correlation between desktop and mobile devices to the effect that lower usage of one would imply more usage of the other or that both are used to an equal degree.

8A day was counted if a user's device had at least one desktop web, mobile web, or app access on a given day. Also, uniqueness of a domain is dependent on its top and second level. Thus, for example, we treat facebook.com and linkedin.com as different domains; however, linkedin.com and blog.linkedin.com as the same domain.
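The domain uniqueness rule from footnote 8 amounts to truncating a hostname to its top and second level. A minimal sketch (the function name is ours, and, as a simplifying assumption, multi-label public suffixes such as co.uk are ignored):

```python
def registrable_domain(host: str) -> str:
    """Normalize a hostname to its top and second level, so that
    blog.linkedin.com and linkedin.com count as the same domain."""
    parts = host.lower().rstrip(".").split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host
```

For example, registrable_domain("blog.linkedin.com") and registrable_domain("linkedin.com") both yield "linkedin.com", while facebook.com remains a distinct domain.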

Figure 4: Unique IP address (top) and web domain (bottom) count for each user in our dataset for whom we had both mobile and desktop data. For example, Peggy has 82 unique mobile and 35 unique desktop IP addresses (top). To the right of Peggy, about two thirds of users visited fewer than 56 unique mobile domains, and to the right of Don, about a fourth visited fewer than ten (bottom).

5 Methods for Cross-device Tracking

How cross-device companies operate is not known in detail [49]. In order to get an understanding of their capabilities we designed an algorithm and evaluated features and parameters informed by a review of public materials, particularly Adelphic's cross-device patent [83] and Tapad's patent application for managing associations between device identifiers [80]. Essentially, cross-device tracking is based on resolving two tasks: first, uniquely identifying users' devices (§ 5.1), and, second, correlating those that belong to the same user (§ 5.2).

5.1 Identifying Devices

Traditionally, HTTP cookies are used to identify desktop devices. Indeed, many cross-device companies are employing cookies for their tracking purposes as well. For mobile devices the use of advertising identifiers, such as Google's Advertising ID (AdID), is common and often combined with cookie tracking. Thus, if users are allowing cookies and do not opt out from being tracked, both their mobile and desktop devices can be easily identified. However, with the surge of tracking- and ad-blocking software, which some consider a mainstream technology on mobile by now [68], unconventional identification technologies, such as device fingerprinting, are becoming more prevalent. While it does not appear that they will generally replace cookies and advertising identifiers any time soon, various cross-device companies—for example, BlueCava [9] and AdTruth [28]—are making use of device fingerprinting.
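As a rough illustration of feature-based device identification, browser attributes of the kind listed in Table 3 can be canonicalized and hashed into a single identifier. This is a minimal sketch; the attribute names and values below are illustrative and not those used by any particular tracker:

```python
import hashlib

def fingerprint(attrs: dict) -> str:
    """Serialize browser attributes in a fixed key order and hash them
    into a stable identifier (illustrative, not any vendor's scheme)."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

mobile = {
    "user_agent": "Mozilla/5.0 (Linux; Android 4.4) Chrome/58.0",
    "display": "1080x1920x24",
    "accept": "text/html,application/xhtml+xml",
    "language": "en-US",
    "time_zone": "UTC-05:00",
}
device_id = fingerprint(mobile)  # 64 hex characters
```

Two devices with the same attribute values collide on the same identifier, which is why high-entropy features such as the user agent matter most.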

1396 26th USENIX Security Symposium USENIX Association


                       Desktop Devices       Mobile Devices
                       H     Hn    Ĥ         H     Hn    Ĥ
User Agent             4.46  0.64  4.96      6.43  0.95  8.5
Display Size/Colors    5.34  0.77  6.08      1.72  0.25  2.08
Fonts                  6.11  0.88  7.33      1.21  0.18  1.33
Accept Headers         2.86  0.41  3.29      2.34  0.35  3
System Language        0.41  0.06  0.51      0.81  0.12  1
Time Zone              0.25  0.04  0.35      0.53  0.07  0.74
Mobile Carrier         N/A   N/A   N/A       1.39  0.21  1.45
Do Not Track Enabled   0.67  0.1   0.67      0.18  0.03  0.19
Geolocation Enabled    0.45  0.07  0.45      1     0.15  1
Touch Enabled          0.72  0.1   0.72      N/A   N/A   N/A
Total per Device Type  6.96  1     12.95     6.69  0.99  10.87
Total                  7.84  1     13.37

Table 3: Entropy (H), normalized entropy (Hn), and estimated entropy (Ĥ) for various browser features in our CDT dataset. The normalized entropy ranges from 0 (all features are the same) to 1 (all features are different). We calculated the estimated entropy according to Chao and Shen [16]. For the totals we considered all listed features. Overall, our dataset contains 3 duplicate mobile fingerprints and 1 duplicate desktop fingerprint.

Cross-device companies that are solely relying on device fingerprinting must be able to identify both desktop and mobile devices using this technique. While it was reported that device fingerprints generally do not work well on mobile devices [25], our results do not support such a broad conclusion. Particularly, mobile user agents often contain distinctive features and are far more diverse (6.43 bits) than user agents on desktops (4.46 bits). Also, the entropy in our dataset only represents a lower bound as we imposed substantial limitations for users' participation in our study; most notably, requiring them to have a phone with Android 4.0.3 or higher and use the native browser, Chrome, or S-Browser. We also did not consider, for instance, canvas fingerprinting [1], sensor data [19], or the order in which fonts and plugins were detected [25]. However, most mobile devices in our dataset were still identifiable. The detailed findings for the 107 mobile and 126 desktop devices in our CDT dataset are shown in Table 3.9 Due to the small size of our dataset we caution against interpreting our results as indicative of the reliability of mobile device fingerprinting, though.
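The plain and normalized entropies reported in Table 3 can be computed from observed feature values as sketched below (the estimated entropy Ĥ uses the Chao–Shen estimator [16], which is not reproduced here):

```python
import math
from collections import Counter

def entropy_bits(values) -> float:
    """Shannon entropy H (in bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def normalized_entropy(values) -> float:
    """H divided by its maximum log2(n): 0 when all observed values are
    identical, 1 when all are distinct."""
    n = len(values)
    return 0.0 if n <= 1 else entropy_bits(values) / math.log2(n)

# Toy example: four devices, three distinct user agents.
user_agents = ["UA-1", "UA-1", "UA-2", "UA-3"]
h, hn = entropy_bits(user_agents), normalized_entropy(user_agents)
```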

5.2 Correlating Devices

After uniquely identifying each device, cross-device companies must match those that appear similar. Successfully matching devices at scale is the core challenge for cross-device companies. Devices are represented in graphs known as Device Graphs [76], Connected Consumer Graphs [22], or under similar proprietary monikers. From a graph-theoretical perspective a device graph can be built from connected

9One user did not submit a mobile fingerprint and another user submitted two different desktop fingerprints.

Figure 5: Our cross-device tracking approach. A. First, a mobile device is identified. B. Its similarity, s, to each identified desktop device is calculated. C. The mobile-desktop pair with the maximum similarity, max, that is above a similarity threshold, t, is determined, if any. D. If such a pair exists, it is added to the device graph and the next iteration starts with a new mobile device. This routine is repeated in three consecutive stages, each evaluating similarities between mobile and desktop IP addresses, mobile and desktop URLs, and mobile apps and desktop URLs, respectively. If a mobile device cannot be matched in one stage due to not overcoming the similarity threshold, a match is attempted in the next.

components (each of which represents a user) with a maximum number of vertices (devices) and edges (device connections) [18]. Matching every mobile with exactly one desktop device will result in a bipartite graph. The goal is to achieve a perfect matching of similar devices.

Algorithm, Features, and Parameters. While deterministic cross-device companies can simply match a user's devices based on his or her login information, which may also extend towards third party properties through single sign-on functionality, achieving a high match rate is more difficult for probabilistic cross-device tracking companies. In the Drawbridge competition [23] many participants applied gradient boosting [48, 50, 53, 62, 71]. However, some participants also combined support vector machines and factorization models into field-aware factorization machines [75] or employed pairwise ranking and ensemble learning techniques [14]. Interestingly, the best performing solution relied on learning-to-rank models instead of the more conventional binary classification models [82].

In our approach, as outlined in Figure 5, we determine the similarity between devices based on distance metrics, most notably the Bhattacharyya coefficient, which is defined for distributions p and q as Bhatta(p,q) = Σ_{x∈X} √(p(x)·q(x)).
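A minimal implementation of this coefficient over empirical distributions (here using IP addresses as the example feature; the IPs are from documentation ranges) could look as follows:

```python
import math
from collections import Counter

def bhattacharyya(p_obs, q_obs) -> float:
    """Bhattacharyya coefficient of the empirical distributions of two
    observation lists: 0 for disjoint support, 1 for identical
    distributions."""
    p, q = Counter(p_obs), Counter(q_obs)
    n_p, n_q = len(p_obs), len(q_obs)
    return sum(math.sqrt((p[x] / n_p) * (q[x] / n_q)) for x in set(p) | set(q))

mobile_ips = ["203.0.113.7", "203.0.113.7", "198.51.100.2"]
desktop_ips = ["203.0.113.7", "192.0.2.55"]
similarity = bhattacharyya(mobile_ips, desktop_ips)
```

Because the coefficient already lies in [0, 1], it can be compared directly against a per-stage similarity threshold.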

The use of distance metrics for device correlation was described in Adelphic's cross-device patent [83]. Our cross-device tracking algorithm works in multiple stages. Using a key insight from the patent, for each feature a


            Sim Feature Mapping    Distance Metric  Sim Thresh  Mean Sim          nq  Acc   Prec  Rec   F-1
Stage 1     Mob IPs to Desk IPs    Bhatta           0.07        0.33              44  0.61  1     0.63  0.77
Stage 2     Mob URLs to Desk URLs  Bhatta'          0.13        0.18              44  0.52  0.85  0.59  0.7
Stage 3     Mob Apps to Desk URLs  Bhatta*          0.02        0.11              44  0.16  0.19  0.5   0.27
Stages 1–3  Same as in the individual stages above  0.33, 0.16, 0.03              44  0.84  0.88  0.95  0.91

Table 4: Test set results. The first three rows show the results for running each stage individually. The fourth row shows the results for running the three stages consecutively. We normalized the Bhattacharyya coefficient (Bhatta) to a range between 0 (low similarity) and 1 (high similarity). Bhatta' denotes the exclusion of URLs in the Alexa Top 50 [4] and all columbia.edu URLs. Further, Bhatta* excludes the 100 most used apps according to our training set. We selected the best similarity threshold (Sim Thresh) for each stage according to observations in our training runs. Mean Sim is the mean similarity across all 44 device pairs in the test set, and nq is the size of the test set.

similarity threshold is set. If a threshold is reached at one stage, a match is declared for the mobile-desktop device pair with the highest similarity score. Otherwise, the algorithm continues to evaluate whether the similarity threshold for a different feature is reached in the next stage. To evaluate the similarity between a mobile and a desktop device it compares mobile to desktop IP addresses, mobile to desktop web URLs, and mobile apps to desktop web URLs.

Test Set Results. To test our approach we randomly separated the set of device pairs in our dataset into a training set (nt = 63) and a test set (nq = 44). We used the former to tune our algorithm and features and held out the latter for performance evaluation. As shown in Table 4, running all three stages of the algorithm consecutively on our test set leads to precision, recall, and F-1 scores of 0.88, 0.95, and 0.91, respectively. The F-0.5 score [50], which emphasizes precision over recall, reaches 0.91. In detail, we obtained 37 true positives (TP), 5 false positives (FP), 0 true negatives (TN), and 2 false negatives (FN). These results are based on the usual definitions, i.e., accuracy, Acc = (TP+TN)/(TP+TN+FP+FN), precision, Prec = TP/(TP+FP), recall, Rec = TP/(TP+FN), and F-1 score, F-1 = (2·Prec·Rec)/(Prec+Rec).
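The staged routine described above can be sketched as follows; the similarity functions and thresholds are placeholders to be instantiated per stage (IP addresses, URLs, apps to URLs):

```python
def match_devices(mobiles, desktops, stages):
    """Greedy multi-stage matching. `stages` is a list of
    (similarity_fn, threshold) pairs, tried in order. A mobile device
    left unmatched in one stage is retried in the next."""
    matches = {}
    for sim_fn, threshold in stages:
        for mob in mobiles:
            if mob in matches:
                continue  # already matched in an earlier stage
            scored = [(sim_fn(mob, desk), desk) for desk in desktops]
            best_sim, best_desk = max(scored, key=lambda t: t[0],
                                      default=(0.0, None))
            if best_desk is not None and best_sim > threshold:
                matches[mob] = best_desk
    return matches
```

This is a sketch of the staged-threshold idea rather than a reimplementation of any company's system; in particular, how ties and already-matched desktop devices are handled is our own simplification.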

To make the matching more difficult we included in each run of our algorithm, in every stage, data from users for whom we only had data from one device type: data from one user who only submitted mobile data and from 18 users who only submitted desktop data. Further, our results are based on modeling device correlation as binary classification. Specifically, for each correct match between a user's mobile and desktop device we counted a true positive. For each incorrect match we noted a false positive. If a mobile device had no corresponding desktop device, it would have been counted as a true negative if it remained unmatched. However, as there was only one such instance in our test set and that mobile device was actually matched, we counted it as a false positive. A false negative means that an instance should have been matched but remained unmatched.
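Plugging the confusion counts reported earlier (37 TP, 5 FP, 0 TN, 2 FN) into the standard definitions reproduces the scores in Table 4:

```python
def classification_scores(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F-1 from a confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

# Test-set counts from the evaluation: 37 TP, 5 FP, 0 TN, 2 FN.
acc, prec, rec, f1 = classification_scores(37, 5, 0, 2)
# Rounded: acc 0.84, prec 0.88, rec 0.95, f1 0.91 (matching Table 4).
```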

Running the three stages of our algorithm consecutively leads to approximately balanced results for precision (0.88) and recall (0.95), as shown in Table 4. However, when running the stages individually, we obtain relatively higher precision and lower recall in the first two stages and lower

[Figure 6: precision and recall curves for Stages 1–3 at thresholds >0 through >9, comparing the Bhatta, Cosine, and Jaccard metrics.]

            Bhatta           Cosine           Jaccard
            F-1, Sim Thresh  F-1, Sim Thresh  F-1, Sim Thresh
Stage 1     0.84, 0.1        0.83, 0.1        0.73, 0
Stage 2     0.74, 0.2        0.59, 0.1        0.67, 0
Stage 3     0.29, 0.1        0.29, 0.4        0.12, 0

Figure 6: Precision and recall for matching devices based on various distance metrics and thresholds. The table shows the best F-1 scores and their corresponding similarity thresholds. The features are the same as described in the respective stages in Table 4. However, the evaluation is performed here on the full dataset. For higher thresholds recall scores tend to decrease while precision scores tend to increase (except when they exclude too many true positives). Overall, the Bhattacharyya coefficient returns the best results.

precision and higher recall in the third stage. This difference highlights the tradeoff between achieving correct matches (precision) and broad user coverage (recall). While it is challenging to improve one without adversely affecting the other [49, 73], the similarity thresholds provide the controls for adjustment. Figure 6 shows changes in precision and recall for different similarity thresholds and distance metrics.

The high precision scores of Drawbridge (0.97 [22]) and Tapad (0.91 [77]) seem to suggest that the industry favors precision over recall.10 However, there is also an argument

10We interpret Tapad’s usage of the term accuracy to mean precision(“[W]henever our Device Graph indicated a relationship between two ormore devices, it was accurate 91.2 percent of the time.”).


to be made against emphasizing precision: some device mismatches may be irrelevant. Particularly, we believe that mismatches might happen for people living in the same household (in case of mobile IP to desktop IP similarity) or individuals having the same interests (in case of web domain and app to web domain similarity). In these situations a mismatched device might still be a meaningful ad target [24]. The reason is that targeted purchase decisions might be made at the household level or look-alike audiences might be sufficiently valuable for an ad network [80].

Our results show that IP addresses are very meaningful for matching devices, which is in line with Cao et al.'s findings [14]. They reached an average F-0.5 score of 0.86 in the Drawbridge competition [23] using only features from IP address data. However, beyond this finding our results further suggest that visited web domains are a good indicator for device similarity as well. In fact, there might be situations in which they can be more revealing than IP addresses. For example, if users of the same household share an IP address, their devices cannot be distinguished based on this feature. Also, while the correlation between apps and desktop domains does not contribute as much as the IP address and domain correlations, it still provides some meaningful signal as the results for the individual run of the third stage in Table 4 demonstrate. Most importantly, however, performance seems to increase if multiple features are applied consecutively. Some users can be better matched based on IP addresses and others on web domains or apps.

We note that we leveraged a manual mapping between apps and desktop domains via company names or other common identifiers, thereby transforming a feature with minimal effect in our dataset and the Drawbridge competition [14, 48, 50, 53, 62, 71, 75, 82] into a useful feature. Similarly, the domain mapping proved to be useful as well due to users' visits to the same domains across devices. These results highlight that cross-device matching is not completely reliant on IP matching, as suggested by the results in the Drawbridge competition. Our results seem to confirm the conjecture that carefully hand-crafted similarity features are of paramount importance while algorithms play a smaller role for the task of correlating mobile and desktop devices [82].

We experimented with various other features that ultimately did not prove useful. In particular, an algorithm leveraging system language and time zone did not match devices better than random. We also tried excluding sets of frequently used public IP addresses. However, different from excluding domains and apps, which, as described in Table 4, proved to be beneficial, this measure did not lead to better performance. We further tried different matching thresholds and evaluated various distance metrics as shown in Figure 6. In future work it would be interesting to examine the extent to which the time, order, and duration of app and URL accesses play a role for device correlation. E-mail and other message content is an obvious candidate for a useful feature as well.

Applicability to Larger Datasets. With a runtime of O(n(n−1)/2) our algorithm is suitable for large-scale analysis. However, it is obvious that our dataset is many orders of magnitude smaller than the data that cross-device companies are usually working with. This difference in size raises the question of the extent to which our findings are applicable to larger datasets. For the similarity of IP addresses this question was already reliably answered. The Drawbridge competition results, for instance, by Landry et al. [53], are based on a set of about 62K mobile devices and confirm the meaningfulness of IP features. For web domain features the situation is different as the Drawbridge data did not contain those for mobile devices. However, we can make an argument that lends some support for the applicability of our results to larger datasets.

Whether web data can be correlated across devices rests on two premises: first, users visiting an intersecting set of domains on both their mobile and desktop devices, and, second, domains being sufficiently distinct to allow identification of users. To examine the first premise we randomly selected 50 U.S. domains out of the top 5K sites quantified by Quantcast [70] and found a mean of 17.1% of users visiting a website both on a mobile and a desktop device (during a 30-day period and at the 95% confidence level with a lower bound of 14.4% and an upper bound of 19.5% using the bootstrap technique). As to the second premise, it was shown for a set of about 368K desktop and mobile Internet users that 97% of them were uniquely identifiable if at least four visited websites were known.11
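A percentile-bootstrap confidence interval of the kind used above can be sketched as follows; the per-domain overlap fractions below are synthetic stand-ins for the 50 measured sites:

```python
import random

def bootstrap_mean_ci(samples, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, take the mean of
    each resample, and read the CI off the sorted means."""
    rng = random.Random(seed)
    k = len(samples)
    means = sorted(sum(rng.choices(samples, k=k)) / k
                   for _ in range(n_resamples))
    lo = means[int(n_resamples * alpha / 2)]
    hi = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return lo, hi

# Synthetic per-domain overlap fractions (illustrative data only).
data_rng = random.Random(1)
overlap = [data_rng.uniform(0.10, 0.25) for _ in range(50)]
lo, hi = bootstrap_mean_ci(overlap)
```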

Limitations. It would be an interesting exercise to compare our techniques against those currently in use in industry. However, we are not aware of any publicly available resources allowing us to do so. The same is true for cross-device tracking datasets. To our knowledge, there is no dataset publicly available beyond the CDT dataset that we created. The only other cross-device tracking dataset we know of was made available by Drawbridge to participants of the Drawbridge competition solely for competition purposes [23]. However, even if this dataset were available, it would only allow an incomplete analysis, particularly as features were generally anonymized and mobile web history was not included in the dataset. Consequently, at this point it does not seem possible to compare our approach to others or evaluate its performance on a different dataset. However, as we implemented key design elements that we found in available industry materials, we believe that our results provide a first approximation of cross-device tracking approaches applied in practice.

There are various considerations of identifying and correlating devices in practice that we cannot meaningfully test. A first point concerns the time period for which users are being tracked. We believe that the three weeks of data that we have available for most users (Table 2)—the concrete

11All users in our dataset who visited at least one mobile and one desktop website had unique web histories as well.


length depending on the number of days on which they used their devices—are realistic. However, we lack the insight for which duration cross-device tracking actually occurs in practice. Also, despite some cross-device companies' broad coverage of websites and apps (§ 7), none of them has access to complete IP, web, and app data of Internet users. However, ultimately this limitation is one of reach and not of performance. By setting similarity thresholds high, even companies with limited data can obtain precise results, albeit at the cost of low recall. Further, our dataset does not contain full IP histories either. In addition, our data is probably more homogeneous than real data, and, thus, more difficult to assess. Users in our study were mostly students located in a confined space with many commonly shared web domains and IP addresses.

6 Learning from Cross-device Data

In this section we examine whether cross-device data enables cross-device companies to make more accurate predictions than they could make using data from individual devices alone. We address this question for two inquiries: users' interest in finance—a randomly selected interest category—and gender. Both are relevant ad targeting criteria. For interest in finance we obtained the most accurate predictions by using data from both mobile and desktop devices. Consequently, this is a task in which predictions about a user from cross-device data appear more privacy-invasive than those from single device data sources. However, as we also found a lack of performance increase for predicting a user's gender, it appears that some prediction tasks might not become more accurate with the availability of cross-device data.

Predicting Interest in Finance. As a starting point for our feature creation we used Alexa category rankings [5] and Google Play store categories [38] to identify the top 25 finance domains that have both a website and an app. Then, we used the Weka machine learning toolkit [40] to explore the potential for predicting interest in finance. We experimented with various features and all available standard algorithms. We used a word-to-vector preprocessor and found logistic regression to be the most effective technique. Due to the class imbalance of only 23% of users in our dataset expressing an interest in finance, we ran logistic regression as a cost-sensitive classifier increasing the cost for a false positive on average 1.5 times over the cost for a false negative. Our results, which are shown in Figure 7 and based on 10-fold cross validation, suggest that predicting an interest in finance for users in our dataset is more accurate if both desktop and mobile data are available.
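The paper used Weka's cost-sensitive wrapper around logistic regression; a from-scratch sketch of the same idea, where errors on negative examples (potential false positives) are weighted 1.5 times those on positives, might look like this (the single feature and data points are toy placeholders):

```python
import math

def train_cost_sensitive_logreg(X, y, fp_cost=1.5, fn_cost=1.0,
                                lr=0.1, epochs=500):
    """SGD logistic regression whose per-example gradient is scaled by the
    misclassification cost of that example's class."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp for numeric safety
            p = 1.0 / (1.0 + math.exp(-z))
            cost = fn_cost if yi == 1 else fp_cost
            g = cost * (p - yi)
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    return 1 if z >= 0 else 0

# Toy 1-D data: feature = fraction of visits to top finance domains.
X = [[0.0], [0.05], [0.1], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_cost_sensitive_logreg(X, y)
```

Equivalently, the same weighting can be obtained in off-the-shelf libraries via per-class weights rather than a hand-rolled training loop.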

In particular, predicting from mobile data alone proved to be the weakest option. One reason seems to be that we only had 90 features from the mobile data compared to 106 and 107 for the desktop and combined data, respectively. Using desktop data only we tried to increase the performance to the level of the combined mobile and desktop data, which

[Figure 7: ROC curves (true positive rate over false positive rate) for the four feature sets: A. Mob - 91 Features, B. Desk - 107 Features, C. Desk - 2,923 Features, D. Mob & Desk - 108 Features.]

    Acc   95% CI     Prec  Rec   F-1   ROC
A.  0.64  0.55–0.73  0.26  0.22  0.24  0.5
B.  0.75  0.67–0.83  0.5   0.52  0.51  0.68
C.  0.79  0.71–0.87  0.57  0.59  0.58  0.75
D.  0.83  0.76–0.9   0.68  0.63  0.65  0.79

Figure 7: Logistic regression for predicting interest in finance from mobile web domains and apps (Mob) and desktop web domains (Desk). 95% CI designates the binomial proportion confidence interval for the accuracy at the 95% level assuming a normal distribution. The F-1 score based on features from both types of data (Mob & Desk - 108 Features) is higher than the scores obtained using mobile and desktop data individually (even with more features as in Desk - 2,923 Features). We observed similar results for value shoppers with F-1 scores of 0.17 (Mob - 85 Features), 0.25 (Desk - 99 Features), and 0.41 (Mob & Desk - 104 Features).

reached an F-1 score of 0.65. However, we were only able to obtain an F-1 score of 0.58 by substantially increasing the feature space to 2,923 features, at which point we saw no further improvement. Combining desktop and mobile data and leveraging 107 features outperformed all other approaches. The ROC curves in Figure 7 visualize this finding. The predictions that users have an interest in finance are shown in orange while the negative predictions for not having an interest in finance are displayed in blue. For the latter the F-1 scores are: Mob - 91 Features: 0.77, Desk - 107 Features: 0.83, Desk - 2,923 Features: 0.86, and Mob & Desk - 108 Features: 0.89.

Predicting Gender. While the predictive performance for a user's interest in finance increased with the availability of both desktop and mobile data, it appears that such improvement does not necessarily hold for all classification tasks. Particularly, classifier performance for the prediction of gender from combined desktop and mobile data was not better than the performance using desktop data alone. Applying logistic regression with 10-fold cross validation we obtained identical scores for precision, recall, and F-1 with values of 0.82, 0.81, and 0.82, respectively. It did not make a difference whether mobile data was added to the desktop data or not. This result suggests that for some tasks the availability of cross-device data does not lead to better predictions.

Impact of Device Usage Patterns. What could be the reason for the differing utility of cross-device data in the two prediction tasks? Subject to the results of further experiments


[Figure 8: stacked bar chart (0–100%) of Internet access type (Desk, Mob, Mob & Desk) for Interest in Finance (12/25, 48%; 13/25, 52%) and Value Shoppers (2/10, 20%; 7/10, 70%; 1/10, 10%).]

Figure 8: Mobile and desktop device usage patterns. Some users in our dataset access finance and value shopping domains only from their desktop or mobile device (i.e., from a mobile website or app).

it seems that having both mobile and desktop data available can be an advantage for predictions that rely on features exhibited on one device type only. We did not only observe such patterns for users with an interest in finance but also for value shoppers—a randomly selected persona category. For both the interest in finance and the value shopper persona we evaluated to which extent users respectively accessed the top 25 finance and value shopping domains on their mobile and desktop devices. Our results, illustrated in Figure 8, support the conclusion that having data available from both mobile and desktop devices increases the chances of capturing (more) salient features for the aforementioned predictions. For example, an ad network without access to desktop data would have difficulty making correct classifications for users that only access respective domains on their desktop device. We note that the observed patterns are based on a small number of users. Thus, further investigation is warranted.

Absence of Device Usage. During our study we realized the possibility of making predictions about users who do not make use of their devices. Predicting a user's Jewish religion serves as an illustrative example.12 Obviously, religious web domains and apps can be meaningful features for predicting adherence to a particular faith. However, such predictions are also possible based on subtler user behaviors. Most notably, as the data collection of our study covered the last two days of the Jewish Passover holiday, we noticed that a few users in our study did not use either of their devices, as the Jewish faith prescribes abstinence from using electronics. Among all users in our study the pattern of holiday observation became obvious. This signal was especially clear from the insight into multiple devices. While some users did not use one of their devices, only those observant of Passover used neither. This example illustrates that device activity as such can be a useful predictor that might be exploitable by cross-device tracking.

7 The Scope of Cross-device Tracking

The scope of cross-device tracking on the Internet is yet to be explored. For example, through their integration into many websites and apps, Facebook and Google appear to have vast reach into the various devices of their users as well as the ability to deterministically match those [73].

12As we obtained this result by chance we confirmed that its publication is covered by applicable IRB regulations of Columbia University.

However, the percentage to which a typical Internet user is tracked across devices by those and other cross-device companies is not known. We examine this question for the users in our dataset based on a procedure for detecting the presence of cross-device trackers in their browsing and app histories (§ 7.1) and analyzing their occurrence accounting for industry collaborations and consolidation (§ 7.2).

7.1 Detecting Cross-device Trackers

Procedure. We examined the trackers on the websites that the users in our study visited by automating a Firefox browser with Selenium [74]. The browser included the Lightbeam [63] and User Agent Switcher [64] browser extensions, which allowed us to record the trackers on each domain for both mobile and desktop websites. Third party trackers that we found in a subdomain were added to the domain, however, not vice versa. Thus, for example, the domain linkedin.com contains all trackers that we found on blog.linkedin.com but not the other way around. To identify trackers inside of apps we selected a total of 153 third party software development kits (SDKs) listed on AppBrain [6], encompassing SDKs of ad networks (e.g., Smaato), social networks (e.g., Twitter), and analytics services (e.g., comScore). Leveraging AppBrain's statistics on the inclusion of SDKs in apps, we then determined which SDKs are included in the apps of our dataset.

We qualify a company as a cross-device company based on our detection of its trackers on both mobile and desktop domains, the former including apps, and its website's claims that it indeed performs cross-device tracking. We identified cross-device trackers by using Whois domain searches and tracker blocking lists, especially the list of the Better tracker blocker [42]. For some companies—Google AdSense, Rubicon Project, Skimlinks, Tapad, and Lotame—our information flow experiment (§ 3) provides empirical support for qualifying them as cross-device companies. It would be interesting to extend this or a similar information flow experiment towards other companies.

Lower Bound. Our approach for detecting trackers should be understood as a lower bound for various reasons. First, trackers not identified in Lightbeam will remain undetected. The same is true for SDKs not included in the pre-defined set of 153 SDKs from AppBrain. Further, we only detect app tracking via SDKs and do not account for Android WebViews and app-internal browsers that could contain tracking cookies or other traditional online tracking mechanisms [18]. We believe that these technologies warrant substantial further investigation as we suspect that a large number of trackers make use of them.

Limitations. It is a limitation of our crawl that some websites in our dataset were not accessible (e.g., sites that required a user login). In some cases our crawl was also redirected or the requested page was not found. However, these limitations only affected a few URLs. Also, it should


[Figure 9: tracking coverage (in %) of the desktop web, mobile web, and mobile apps for ad networks and analytics services. Legend: G = Google Analytics, G* = Google Display, F = Facebook, T = Twitter, C = comScore, A = Atlas, L = LinkedIn, R = Rubicon Project, A* = Advertising.com, L* = Lotame, D = DrawBridge, S = Skimlinks, T* = Tapad, B = BlueCava, S* = Smaato.]

Figure 9: As in the ad ecosystem in general, cross-device tracking is characterized by a few large companies with extensive reach and a long tail of smaller companies, some of which are focusing on the mobile space (e.g., Smaato with 2% mobile and 0.2% desktop coverage).

[Figure 10: tracker counts by type: 9,732 (Desk Web), 3,243 (Mob Web), 2,571 (Desk & Mob Web), 81 (Mob Apps), and 124 (Cross-device).]

Figure 10: Total unique third party trackers in our dataset. We found 124 cross-device trackers that belong to 87 different companies.

be noted that we crawled the sites about a month after we finished collecting data from the study participants. Thus, in the meantime, some websites might have different trackers than at the time they were actually visited. Ideally, it would have been possible to capture the trackers live from our users' devices during the study. However, such collection is difficult due to the constraints of the Android environment, most notably the sandboxing of mobile browsers. In addition, our mobile tracker count may be off as we did not use real mobile devices but instead a spoofed desktop browser.
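The qualification logic of § 7.1 can be sketched with set operations: a tracker counts as cross-device if it appears on both desktop and mobile websites, or if its company's app SDK is associated with a desktop tracker. The domain and SDK names below are illustrative:

```python
def cross_device_trackers(desktop, mobile_web, app_sdks, sdk_company_tracker):
    """Return tracker domains qualifying as cross-device: seen on both
    desktop and mobile websites, or reachable through an app SDK whose
    company also runs a desktop tracker."""
    on_both_webs = desktop & mobile_web
    via_sdk = {sdk_company_tracker[s] for s in app_sdks
               if sdk_company_tracker.get(s) in desktop}
    return on_both_webs | via_sdk

desktop = {"tracker-a.example", "tracker-b.example", "tracker-c.example"}
mobile_web = {"tracker-b.example", "tracker-d.example"}
app_sdks = {"sdk-c", "sdk-e"}
sdk_company_tracker = {"sdk-c": "tracker-c.example"}  # same company (hypothetical)
found = cross_device_trackers(desktop, mobile_web, app_sdks, sdk_company_tracker)
```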

7.2 Cross-Device Tracking Analysis

As shown in Figure 10, websites accessed from desktop and mobile devices contained a respective total of 9,732 and 3,243 unique third party trackers. 2,571 trackers were on both desktop and mobile websites; Brookman et al. [11] found 861 such trackers. Out of the 153 SDKs from AppBrain we found 81 in our dataset.13 From these sets of third party trackers we identified 124 cross-device trackers: 118 trackers that appeared on both mobile and desktop websites and 6 SDKs that are associated with a desktop tracker as well. We found that the 124 cross-device trackers belong to 87 different companies. It appears that 22 follow a deterministic approach, 39 use probabilistic techniques, and 26 leverage both.

Tracking of the Average User in our Dataset. On average each user is tracked across devices on his or her desktop in 67% of all desktop website visits. We measured a similar average for mobile web visits with 64%. These

13 The app tracker count includes affiliated companies' SDKs. Thus, for example, the Facebook SDK inside the Instagram app is counted as one tracker.

Company (Type) | Desk | Mob | Apps
Google Analytics (D) | 58% | 43.6% | 5.1%
Google Display (D, P) | 51.6% | 41.7% | 13.1%
Facebook (D) | 27.8% | 27.7% | 20.3%
Atlas (Facebook) (D, P) | 7.8% | 2.4% | N/A*
Facebook & Atlas (D, P) | 27.9% | 29.1% | 20.3%
Twitter (D) | 11.9% | 6.6% | 0.7%
comScore (D, P) | 11.3% | 15.1% | 1.7%
LinkedIn (Microsoft) (D) | 4.9% | 1.9% | N/A*
Rubicon Project (P) | 4.6% | 5.8% | N/A*
Tapad (P) | 1.1% | 1.9% | N/A
Rubicon & Tapad (P) | 5.4% | 6.7% | N/A*
Advertising.com (AOL) (P) | 4% | 3.5% | N/A
Lotame (D, P) | 2.7% | 3.8% | N/A*
Skimlinks (D, P) | 1.1% | 1.6% | N/A
Lotame & Skimlinks (D, P) | 3.5% | 5% | N/A*
Drawbridge (P) | 1.2% | 1.7% | N/A
BlueCava (P) | 0.2% | 0.5% | N/A
Smaato (D, P) | 0.2% | 2% | 0.1%

Table 5: Cross-device companies' (D = deterministic, P = probabilistic, according to their websites' claims) average coverage of websites and apps for the users in our dataset (n = 107). Some of the companies either do not seem to offer an SDK for app integration (N/A) or we did not analyze their SDK as it was not contained in our initial set of SDKs from AppBrain (N/A*). The full list, including the tracking server domains, is attached in Appendix B.

high percentages illustrate that cross-device tracking is a broadly occurring phenomenon. Table 5 shows the reach of individual companies. Google Analytics, Google Display, and Facebook can capture at least 20% of an average user's online traffic across devices. This percentage is about the same as Roesner et al. [72] reported a few years ago for the tracking of individual devices. It is particularly noteworthy that the companies with the broadest reach take a deterministic approach, which means that their cross-device tracking is also very accurate. Figure 9 shows the tracking coverage for the 87 cross-device companies we identified.
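The identification of cross-device trackers and the coverage numbers above reduce to simple set operations over the crawled tracker observations. A minimal sketch with toy data (function names and inputs are our own; the real inputs are the desktop-web, mobile-web, and SDK observations from the crawl):

```python
# Sketch of the set logic behind Figure 10 and the coverage percentages.

def cross_device_trackers(desktop, mobile_web, sdk_to_desktop_tracker):
    """Trackers seen on both desktop and mobile websites, plus SDKs
    that are associated with a desktop tracker."""
    both_webs = desktop & mobile_web
    matched_sdks = {sdk for sdk, tracker in sdk_to_desktop_tracker.items()
                    if tracker in desktop}
    return both_webs | matched_sdks

def coverage(visits, tracker):
    """Fraction of a user's site visits on which `tracker` was present.
    `visits` is a list of per-site tracker sets."""
    hits = sum(1 for trackers_on_site in visits if tracker in trackers_on_site)
    return hits / len(visits)
```

With the real data, `cross_device_trackers` yields the 118 shared web trackers plus the 6 matched SDKs, and averaging `coverage` over users gives per-company numbers like those in Table 5.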

Partnerships between various cross-device companies extend their reach. For example, Atlas receives user data from Facebook [8] to track users deterministically. However, Atlas

1402 26th USENIX Security Symposium USENIX Association


Figure 11: The ten domains with the highest number of cross-device companies on their desktop websites (out of 1,829 total domains): ew.com, observer.com, latimes.com, bust.com, ft.com, globo.com, biography.com, ted.com, uol.com.br, and amny.com; bars show desktop web and mobile web tracker counts per domain. It can be observed that they tend to have higher concentrations of cross-device companies on their mobile sites as well.

Domain | Desk | Mob | Rank-Country
ew.com | 52 | 4 | 466-US
observer.com | 39 | 8 | 1,191-US
latimes.com | 35 | 19 | 133-US
bust.com | 34 | 21 | 25,690-US
ft.com | 31 | 22 | 166-UK
globo.com | 30 | 9 | 5-BR
biography.com | 29 | 12 | 1,141-US
ted.com | 26 | 13 | 635-US
uol.com.br | 26 | 8 | N/A
amny.com | 25 | 15 | N/A
gameofthrones.wikia.com | 14 | 35 | 45-US
androidauthority.com | 11 | 34 | 570-IN
food.com | 26 | 31 | 600-US
sacbee.com | 0 | 31 | 2,985-US
sfgate.com | 19 | 30 | 310-US
philly.com | 15 | 29 | 908-US
southcoasttoday.com | 17 | 28 | N/A
nypost.com | 17 | 28 | 154-US
nytimes.com | 18 | 28 | 48-US
jalopnik.com | 15 | 27 | 782-US

Table 6: Domains with the highest cross-device company counts out of 1,829 domains whose URL occurred in both mobile and desktop data in our CDT dataset. With a total of 57 trackers (31 mobile web and 26 desktop web) food.com had the highest count overall.

is intended to serve advertisements outside of Facebook's reach and shares the data it collects with Facebook as well [8]. Particularly, as shown in Table 5, Atlas' cross-device trackers extend Facebook's mobile web reach from 27.7% to 29.1%. As another example, the partnership between Lotame and Skimlinks [57], which we actually observed in our initial experiment (§ 3), also extends their respective reach. In those cases the relationships between companies need to be accounted for to accurately determine their full coverage.

Domains with Cross-device Company Concentration. It appears that media websites, in particular websites of newspapers, contain the largest concentration of trackers from cross-device companies. Table 6 shows the top ten domains—separated for desktop and mobile websites—on which we found the highest number of trackers from the 87 identified cross-device companies. Coincidentally, it turns out that the website of the LA Times was a good selection for our case study (§ 3), as it had trackers from 35 cross-device companies on its desktop website.

Beyond the concentration of cross-device companies' trackers in the media category, it is also striking that many websites hosting those trackers are fairly popular sites. Table 6 shows the Quantcast country rank according to each site's traffic [70]. The placement of cross-device trackers on popular sites exposes them to large audiences. However, as can also be observed in Table 6, the shown domains contain a maximum number of trackers from cross-device companies on either their mobile or desktop sites, but not on both. This finding holds in general. While there is a tendency for domains that host many cross-device companies on their desktop site to also host many on their mobile site, we could not find any statistically significant correlation in this regard. Figure 11 shows the distribution.
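The weak desktop/mobile association can be spot-checked on the numbers in Table 6. Pearson's r is one reasonable choice of statistic; the text does not specify which correlation test we ran, so the following is an illustrative computation on the ten top-desktop rows only (a subset selected on desktop counts, not the full 1,829-domain test):

```python
# Pearson correlation between desktop and mobile cross-device company
# counts, computed from scratch on the top-ten desktop rows of Table 6.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

desk = [52, 39, 35, 34, 31, 30, 29, 26, 26, 25]   # Table 6, desktop counts
mob  = [ 4,  8, 19, 21, 22,  9, 12, 13,  8, 15]   # Table 6, mobile counts
r = pearson_r(desk, mob)  # a value in [-1, 1]; far from +1 here
```

On this biased subset r is nowhere near +1, consistent with the absence of a significant correlation reported above.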

8 Does Self-Regulation Work?

The FTC recommends that cross-device companies should be transparent about their data practices [32]. While there are no specific statutes or regulations for cross-device tracking in the U.S., the field is subject to self-regulation, most notably by the Digital Advertising Alliance (DAA) and the Network Advertising Initiative. The DAA requires its member cross-device companies to disclose "the fact that data collected from a particular browser or device may be used with another computer or device that is linked to the browser or device on which such data was collected" [21]. In order to examine the level of compliance with this transparency requirement, we randomly selected 40 DAA member ad networks that advertised their cross-device capabilities on their websites and analyzed their privacy policies.

We found that 23 disclosed their cross-device tracking activities while 17 omitted them. After contacting the latter, we received a response from seven. Two pointed us to documents that were linked from the policy that indeed contained

USENIX Association 26th USENIX Security Symposium 1403

Page 15: A Privacy Analysis of Cross-device Tracking

compliant descriptions. A representative from another cross-device company wrote that their cross-device functionality is not yet fully rolled out to clients, and three others announced that they will change their policy (one of those still has to follow through). Another representative simply claimed that the company is "not violating anything." Without contacting us, five further cross-device companies simply changed their policies, of which four became compliant. Finally, there was no reaction or policy change from five. As of June 9, 2017, we count a total of eight instances of non-compliance.
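The bookkeeping behind these tallies can be cross-checked. The grouping below is our reading of the paragraph (in particular, treating the not-yet-rolled-out functionality as not a violation is an assumption); with it, the outcomes of the 17 initially non-disclosing policies add up to the reported eight instances of non-compliance:

```python
# Tally of the DAA disclosure review: 40 policies, 23 compliant at first
# reading, 17 without a cross-device disclosure. The grouping of the 17
# follow-up outcomes is our interpretation of the text.
outcomes = {
    "linked documents were compliant":     2,  # responded
    "functionality not yet rolled out":    1,  # responded (assumed compliant)
    "announced change, followed through":  2,  # responded
    "announced change, still pending":     1,  # responded
    "claimed no violation":                1,  # responded
    "silent change, now compliant":        4,
    "silent change, still non-compliant":  1,
    "no reaction":                         5,
}

responded = 2 + 1 + 2 + 1 + 1           # seven responses, as reported
non_compliant = (outcomes["announced change, still pending"]
                 + outcomes["claimed no violation"]
                 + outcomes["silent change, still non-compliant"]
                 + outcomes["no reaction"])
```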

Overall, it appears that there is a lack of transparency when it comes to the disclosure of cross-device tracking. At this point, the DAA guidance does not seem to be enforced rigorously. While it may be true that the majority of consumers will not take the time to understand the tracking practices described in privacy policies,14 we think that it is still a worthwhile endeavor for cross-device companies to properly disclose their practices, particularly for audit and enforcement purposes, for signaling trustworthiness to the marketplace, and for building an environment of rules and norms in privacy disclosure.

9 Conclusion

Cross-device tracking is an emerging tracking paradigm that challenges current notions of privacy. This study is intended as a broad overview of selected privacy topics in mainstream cross-device technologies. In a brief case study we demonstrated how cross-device tracking can be observed with statistical confidence by means of an information flow experiment. Using our own cross-device tracking dataset, we designed a cross-device tracking algorithm and evaluated relevant features and parameter settings grounded in a review of publicly available information on the practices of cross-device companies. For some predictive tasks it appears that those companies can learn more about users than from individual device data alone. As the penetration of cross-device tracking on the Internet already appears relatively high, it is all the more important that companies active in this field are transparent about their practices.

Going forward, we hope that the various privacy implications of cross-device tracking technologies will be studied further. In this regard, proprietary research is substantially ahead of current efforts in academia. While a few major points are known—for example, that IP addresses are a crucial feature for correlating devices—many important details on how cross-device companies operate remain opaque. To shed more light on the subject, we publicize our dataset together with the software that we developed for further exploration.
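To make the role of IP addresses concrete, here is a minimal sketch of how shared IPs can propose candidate cross-device pairs. This is our own illustration, not the matching algorithm of this paper or of any commercial tracker:

```python
# Sketch: propose candidate device pairs from co-occurring IP addresses,
# the feature highlighted above as crucial for correlating devices.
from collections import defaultdict
from itertools import combinations

def candidate_pairs(logs):
    """logs: iterable of (device_id, ip) observations.
    Returns {frozenset({dev_a, dev_b}): number of shared IPs}."""
    devices_by_ip = defaultdict(set)
    for device, ip in logs:
        devices_by_ip[ip].add(device)
    pairs = defaultdict(int)
    for devices in devices_by_ip.values():
        # Every pair of devices seen behind the same IP is a candidate.
        for a, b in combinations(sorted(devices), 2):
            pairs[frozenset((a, b))] += 1
    return dict(pairs)
```

A probabilistic matcher would then score such candidates with further features (user agents, browsing times, visited domains) instead of relying on IP co-occurrence alone.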

14 Using tracking protection software and ad blockers is a much more efficient approach from a user perspective. Thus, when evaluating cross-device tracking in terms of a threat model, the most effective defense would be to block tracking. In this regard, the defenses against cross-device tracking are the same as the defenses against the tracking of individual devices.

As cross-device tracking continues to mature and become an integral part of tracking on the Internet, we believe that a comprehensive view including legal and business considerations is helpful. Establishing an enforceable self-regulatory framework for companies to be transparent about their practices will help to protect consumer privacy and allow cross-device companies to conduct their businesses responsibly.

Ultimately, cross-device tracking is part of a larger trend: the Internet of Things (IoT). In this regard, we see cross-device tracking as an early harbinger of the increasing inter-connectivity of devices. Increasingly, buildings, cars, appliances, and other things are connected to the Internet and interact with other online devices. However, the development and deployment of privacy solutions has to keep pace with the emerging IoT landscape. Ensuring transparency and practicable control mechanisms for information that traverses device boundaries and permeates between the online and offline worlds is a critical element. Given standardized interfaces [43], perhaps an intelligent personal privacy assistant that is connected to all services and devices of a person could be a solution.

Acknowledgment

We would like to thank the anonymous reviewers as well as our shepherd Alina Oprea for their helpful comments during the paper review phase. We further thank Clement Canonne, Paul Blaer, Daisy Nguyen, and Anupam Das. Tony Jebara is supported in part by DARPA (N66001-15-C-4032) and NSF (III-1526914, IIS-1451500). We also gratefully acknowledge the Comcast Research Grant for our user study. The views and conclusions contained herein are our own.

References

[1] Acar, G., Eubank, C., Englehardt, S., Juarez, M., Narayanan, A., and Diaz, C. The web never forgets: Persistent tracking mechanisms in the wild. In CCS 2014, ACM.

[2] Acar, G., Juarez, M., Nikiforakis, N., Diaz, C., Gurses, S., Piessens, F., and Preneel, B. FPDetective: Dusting the web for fingerprinters. In CCS 2013, ACM.

[3] Adar, E., Teevan, J., and Dumais, S. T. Large scale analysis of web revisitation patterns. In CHI 2008, ACM.

[4] Alexa. The top 500 sites on the web. http://www.alexa.com/topsites/countries/US. Accessed: June 29, 2017.

[5] Alexa. The top 500 sites on the web. http://www.alexa.com/topsites/category. Accessed: June 29, 2017.

[6] AppBrain. Android library statistics. http://www.appbrain.com/stats/libraries/. Accessed: June 29, 2017.

[7] Arp, D., Quiring, E., Wressnegger, C., and Rieck, K. Privacy threats through ultrasonic side channels on mobile devices. In EuroS&P 2017, IEEE Computer Society.

[8] Atlas. Privacy policy. https://atlassolutions.com/privacy-policy/, Apr. 2015. Accessed: June 29, 2017.

[9] BlueCava, Inc. http://bluecava.com/. Accessed: June 29, 2017.

[10] Book, T., and Wallach, D. S. An empirical study of mobile ad targeting. CoRR abs/1502.06577 (2015).

[11] Brookman, J., Rouge, P., Alva, A., and Yeung, C. Cross-device tracking: Measurement and disclosures. In PoPETs 2017, De Gruyter.

[12] Cai, X., Nithyanand, R., and Johnson, R. CS-BuFLO: A congestion sensitive website fingerprinting defense. In WPES 2014, ACM.

[13] Cai, X., Nithyanand, R., Wang, T., Johnson, R., and Goldberg, I. A systematic approach to developing and evaluating website fingerprinting defenses. In CCS 2014, ACM.

[14] Cao, X., Huang, W., and Yu, Y. Recovering cross-device connections via mining IP footprints with ensemble learning. In IEEE International Conference on Data Mining Workshop, ICDMW 2015.

[15] Cao, Y., Li, S., and Wijmans, E. (Cross-)browser fingerprinting via OS and hardware level features. In NDSS 2016, Internet Society.

[16] Chao, A., and Shen, T.-J. Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample. Environmental and Ecological Statistics 10, 4 (2003), 429–443.

[17] Cherubin, G., Hayes, J., and Juarez, M. Website fingerprinting defenses at the application layer. In PoPETs 2017, De Gruyter.

[18] Criteo SA. Building the cross device graph at Criteo. http://labs.criteo.com/2016/06/building-cross-device-graph-criteo/. Accessed: June 29, 2017.

[19] Das, A., Borisov, N., and Caesar, M. Tracking mobile web users through motion sensors: Attacks and defenses. In NDSS 2016, Internet Society.

[20] Dearman, D., and Pierce, J. S. It's on my other computer!: Computing with multiple devices. In CHI 2008, ACM.

[21] Digital Advertising Alliance. Application of the self-regulatory principles of transparency and control to data used across devices. http://www.aboutads.info/sites/default/files/DAA_Cross-Device_Guidance-Final.pdf, Nov. 2015. Accessed: June 29, 2017.

[22] Drawbridge, Inc. http://www.drawbrid.ge/. Accessed: June 29, 2017.

[23] Drawbridge, Inc. Drawbridge challenges scientific community to better the accuracy of its cross-device consumer graph. https://drawbridge.com/news/p/drawbridge-challenges-scientific-community-to-better-the-accuracy-of-its-cross-device-consumer-graph. Accessed: June 29, 2017.

[24] Dstillery, Inc. A tale of two crosswalks. http://dstillery.com/a-tale-of-two-crosswalks/. Accessed: June 29, 2017.

[25] Eckersley, P. How unique is your web browser? In PETS 2010, Springer-Verlag.

[26] Englehardt, S., and Narayanan, A. Online tracking: A 1-million-site measurement and analysis. In CCS 2016, ACM.

[27] Englehardt, S., Reisman, D., Eubank, C., Zimmerman, P., Mayer, J., Narayanan, A., and Felten, E. W. Cookies that give you away: The surveillance implications of web tracking. In WWW 2015, International World Wide Web Conferences Steering Committee.

[28] Experian Ltd. Device recognition by AdTruth. http://www.experian.co.uk/marketing-services/products/adtruth-device-recognition.html. Accessed: June 29, 2017.

[29] Federal Trade Commission. FTC cross-device tracking workshop. https://www.ftc.gov/news-events/events-calendar/2015/11/cross-device-tracking, Nov. 2015. Accessed: June 29, 2017.

[30] Federal Trade Commission. FTC cross-device tracking workshop, segment 1, transcript. https://www.ftc.gov/system/files/documents/videos/cross-device-tracking-part-1/ftc_cross-device_tracking_workshop_-_transcript_segment_1.pdf, Nov. 2015. Accessed: June 29, 2017.

[31] Federal Trade Commission. FTC issues warning letters to app developers using 'Silverpush' code. https://www.ftc.gov/news-events/press-releases/2016/03/ftc-issues-warning-letters-app-developers-using-silverpush-code, Mar. 2016. Accessed: June 29, 2017.

[32] Federal Trade Commission. Cross-device tracking. An FTC staff report. https://www.ftc.gov/reports/cross-device-tracking-federal-trade-commission-staff-report-january-2017, Jan. 2017. Accessed: June 29, 2017.

[33] Gesellschaft für Konsumforschung. Finding simplicity in a multi-device world. https://blog.gfk.com/2014/03/finding-simplicity-in-a-multi-device-world/, Mar. 2014. Accessed: June 29, 2017.

[34] Google Display Network. Where ads might appear in the Display Network. https://support.google.com/adwords/answer/2404191?hl=en. Accessed: June 29, 2017.

[35] Google, Inc. General ad categories. https://support.google.com/adsense/answer/3016459?hl=en. Accessed: June 29, 2017.

[36] Google, Inc. Topics used for personalized ads. https://support.google.com/ads/answer/2842480?hl=en. Accessed: June 29, 2017.

[37] Google, Inc. The new multi-screen world study. https://www.thinkwithgoogle.com/research-studies/the-new-multi-screen-world-study.html, Aug. 2012. Accessed: June 29, 2017.

[38] Google Play Store. https://play.google.com/store/apps?hl=en. Accessed: June 29, 2017.

[39] Gulyas, G. G., Acs, G., and Castelluccia, C. Near-optimal fingerprinting with constraints. In PoPETs 2016, De Gruyter.

[40] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. The WEKA data mining software: An update. SIGKDD Explor. Newsl. 11, 1 (Nov. 2009), 10–18.

[41] Hayes, J., and Danezis, G. k-fingerprinting: A robust scalable website fingerprinting technique. In USENIX Security 2016, USENIX Association.

[42] Ind.ie. Better tracker blocker. https://better.fyi/trackers/. Accessed: June 29, 2017.

[43] INFSO D.4 Networked Enterprise & RFID; INFSO G.2 Micro & Nanosystems; RFID Working Group of the European Technology Platform on Smart Systems Integration (EPoSS). Internet of things in 2020. http://www.smart-systems-integration.org/public/documents/publications/Internet-of-Things_in_2020_EC-EPoSS_Workshop_Report_2008_v3.pdf, Sept. 2008. Accessed: June 29, 2017.

[44] Juarez, M., Afroz, S., Acar, G., Diaz, C., and Greenstadt, R. A critical evaluation of website fingerprinting attacks. In CCS 2014, ACM.

[45] Kaggle, Inc. ICDM 2015: Drawbridge cross-device connections. https://www.kaggle.com/c/icdm-2015-drawbridge-cross-device-connections/data. Accessed: June 29, 2017.

[46] Kamvar, M., Kellar, M., Patel, R., and Xu, Y. Computers and iPhones and mobile phones, oh my!: A logs-based comparison of search users on different devices. In WWW 2009, International World Wide Web Conferences Steering Committee.

[47] Kane, S. K., Karlson, A. K., Meyers, B. R., Johns, P., Jacobs, A., and Smith, G. Exploring cross-device web use on PCs and mobile devices. Springer-Verlag, 2009, pp. 722–735.

[48] Kejela, G., and Rong, C. Cross-device consumer identification. In IEEE International Conference on Data Mining Workshop, ICDMW 2015.

[49] Kihn, M. Cross-device identity: A data scientist speaks. http://blogs.gartner.com/martin-kihn/cross-device-identity-a-data-scientist-speaks/, Oct. 2016. Accessed: June 29, 2017.

[50] Kim, M. S., Liu, J., Wang, X., and Yang, W. Connecting devices to cookies via filtering, feature engineering, and boosting. In IEEE International Conference on Data Mining Workshop, ICDMW 2015.

[51] Kohno, T., Broido, A., and Claffy, K. Remote physical device fingerprinting. IEEE Transactions on Dependable and Secure Computing 2, 2 (2005).

[52] Kurtz, A., Gascon, H., Becker, T., Rieck, K., and Freiling, F. C. Fingerprinting mobile devices using personalized configurations. In PoPETs 2016, De Gruyter.

[53] Landry, M., S, S. R., and Chong, R. Multi-layer classification: ICDM 2015 Drawbridge cross-device connections competition. In IEEE International Conference on Data Mining Workshop, ICDMW 2015.

[54] Lecuyer, M., Ducoffe, G., Lan, F., Papancea, A., Petsios, T., Spahn, R., Chaintreau, A., and Geambasu, R. XRay: Enhancing the web's transparency with differential correlation. In USENIX Security 2014, USENIX Association.

[55] Lecuyer, M., Spahn, R., Spiliopolous, Y., Chaintreau, A., Geambasu, R., and Hsu, D. J. Sunlight: Fine-grained targeting detection at scale with statistical confidence. In CCS 2015, ACM.

[56] Lerner, A., Simpson, A. K., Kohno, T., and Roesner, F. Internet Jones and the raiders of the lost trackers: An archaeological study of web tracking from 1996 to 2016. In USENIX Security 2016, USENIX Association.

[57] Lotame Solutions, Inc. Skimlinks and Lotame unleash enhanced retail intent data. https://www.lotame.com/resource/skimlinks-lotame-dmp/. Accessed: June 29, 2017.

[58] Matthews, T., Liao, K., Turner, A., Berkovich, M., Reeder, R., and Consolvo, S. "She'll just grab any device that's closer": A study of everyday device & account sharing in households. In CHI 2016, ACM.

[59] Mavroudis, V., Hao, S., Fratantonio, Y., Maggi, F., Vigna, G., and Kruegel, C. On the privacy and security of the ultrasound ecosystem. In PoPETs 2017, De Gruyter.

[60] Meng, W., Ding, R., Chung, S. P., Han, S., and Lee, W. The price of free: Privacy leakage in personalized mobile in-apps ads. In NDSS 2016, Internet Society.

[61] Meng, W., Xing, X., Sheth, A., Weinsberg, U., and Lee, W. Your online interests: Pwned! A pollution attack against targeted advertising. In CCS 2014, ACM.

[62] Morales, R. D. Cross-device tracking: Matching devices and cookies. In IEEE International Conference on Data Mining Workshop, ICDMW 2015.

[63] Mozilla. Lightbeam for Firefox. https://www.mozilla.org/en-US/lightbeam/. Accessed: June 29, 2017.

[64] MyBrowserAddon. User-agent switcher. http://mybrowseraddon.com/useragent-switcher.html. Accessed: June 29, 2017.

[65] Nikiforakis, N., Joosen, W., and Livshits, B. PriVaricator: Deceiving fingerprinters with little white lies. In WWW 2015, International World Wide Web Conferences Steering Committee.

[66] Nikiforakis, N., Kapravelos, A., Joosen, W., Kruegel, C., Piessens, F., and Vigna, G. Cookieless monster: Exploring the ecosystem of web-based device fingerprinting. In S&P 2013, IEEE Computer Society.

[67] Olejnik, L., Castelluccia, C., and Janc, A. On the uniqueness of web browsing history patterns. Annales des Télécommunications 69, 1–2 (2014), 63–74.

[68] PageFair. Adblocking goes mobile. https://pagefair.com/downloads/2016/05/Adblocking-Goes-Mobile.pdf. Accessed: June 29, 2017.

[69] Panchenko, A., Lanze, F., Zinnen, A., Henze, M., Pennekamp, J., Wehrle, K., and Engel, T. Website fingerprinting at internet scale. In NDSS 2016, Internet Society.

[70] Quantcast. Top sites. https://www.quantcast.com/top-sites. Accessed: June 29, 2017.

[71] Renov, O., and Anand, T. R. Machine learning approach to identify users across their digital devices. In IEEE International Conference on Data Mining Workshop, ICDMW 2015.

[72] Roesner, F., Kohno, T., and Wetherall, D. Detecting and defending against third-party tracking on the web. In NSDI 2012, USENIX Association.

[73] Schiff, A. 2016 edition: A marketer's guide to cross-device identity. https://adexchanger.com/data-exchanges/2016-edition-marketers-guide-cross-device-identity/, Feb. 2016. Accessed: June 29, 2017.

[74] SeleniumHQ. SeleniumHQ browser automation. http://www.seleniumhq.org/. Accessed: June 29, 2017.

[75] Selsaas, L. R., Agrawal, B., Rong, C., and Wiktorski, T. AFFM: Auto feature engineering in field-aware factorization machines for predictive analytics. In IEEE International Conference on Data Mining Workshop, ICDMW 2015.

[76] Tapad, Inc. http://www.tapad.com/. Accessed: June 29, 2017.

[77] Tapad, Inc. Nielsen confirms Tapad cross-device accuracy at 91.2%. http://www.tapad.com/news/blog/nielsen-confirms-tapad-cross-device-accuracy-at-91-2, Dec. 2014. Accessed: June 29, 2017.

[78] Tauscher, L., and Greenberg, S. How people revisit web pages. Int. J. Hum.-Comput. Stud. 47, 1 (July 1997), 97–137.

[79] Tossell, C., Kortum, P., Rahmati, A., Shepard, C., and Zhong, L. Characterizing web use on smartphones. In CHI 2012, ACM.

[80] Traasdahl, A., Liodden, D., and Chang, V. Managing associations between device identifiers, May 16, 2013. US Patent App. 13/677,110.

[81] Tschantz, M. C., Datta, A., Datta, A., and Wing, J. M. A methodology for information flow experiments. In CSF 2015, IEEE Computer Society.

[82] Walthers, J. Learning to rank for cross-device identification. In IEEE International Conference on Data Mining Workshop, ICDMW 2015.

[83] Wang, C., and Pu, H. Uniquely identifying a network-connected entity, May 7, 2013. US Patent 8,438,184.

[84] Yahoo! Inc. Personas. https://web.archive.org/web/20160728012832/https://developer.yahoo.com/flurry/docs/analytics/lexicon/personas/. Accessed: June 29, 2017.

[85] Zarras, A., Kapravelos, A., Stringhini, G., Holz, T., Kruegel, C., and Vigna, G. The dark alleys of Madison Avenue: Understanding malicious advertisements. In IMC 2014, ACM.


Page 18: A Privacy Analysis of Cross-device Tracking

A Cross-device Tracking Dataset

Device Fingerprints (n=234)
User Agent; Browser Vendor; Browser Engine; Plugins Installed; Operating System; Time Zone; Screen (Color Depth, etc.); System Language; Adobe Flash Version; Microsoft Silverlight Version; JavaScript Enabled; 1st Party HTTP Cookies Enabled; 3rd Party HTTP Cookies Enabled; Do Not Track Enabled; Touchscreen; Internet Connection Type; Latency; Fonts Installed; Local Storage Enabled; Session Storage Enabled; HTTP Accept Headers

App and Browsing Histories (n=233)
IP Address; Browser Vendor; Date; Time; Time Zone; Browser Tab ID; Referrer URL; URL/App Package ID; URL Title; 3rd Party Trackers/SDKs

Interest Questionnaires (n=126)
Arts and Entertainment (68%); Food and Drink (64%); Computers and Electronics (63%); Science (62%); News (60%); Books and Literature (55%); Jobs and Education (52%); Games (43%); Travel (40%); Law and Government (37%); Shopping (36%); Hobbies and Leisure (34%); People and Society (34%); Beauty and Fitness (33%); Internet and Telecom (33%); Sports (29%); Online Communities (24%); Finance (23%); Pets and Animals (23%); Business and Industrial (21%); World Localities (15%); Reference (13%); Autos and Vehicles (11%); Home and Garden (11%); Real Estate (4%)

Persona Questionnaires (n=126)
Music Lovers (47%); Movie Lovers (46%); Food and Dining Lovers (40%); Singles (39%); Bookworms (33%); Entertainment Enthusiasts (31%); Tech and Gadget Enthusiasts (31%); Casual and Social Gamers (30%); News and Magazine Readers (23%); Leisure Travelers (21%); Sports Fans (21%); Health and Fitness Enthusiasts (20%); Mobile Payment Makers (19%); Value Shoppers (18%); Parenting and Education (15%); Pet Owners (14%); Business Professionals (13%); American Football Fans (11%); Hardcore Gamers (11%); Photo and Video Enthusiasts (11%); Fashionistas (10%); Personal Finance Geeks (10%); Avid Runners (7%); Flight Intenders (6%); Social Influencers (6%); Catalog Shoppers (5%); Auto Enthusiasts (3%); Business Travelers (3%); Small Business Owners (3%); Home Design Enthusiasts (2%); Real Estate Followers (2%); High Net Individuals (1%); Mothers (1%); Home and Garden Pros (0%); New Mothers (0%); Slots Players (0%)

Age Groups (n=126)
18–20 (18%); 21–25 (51%); 26–30 (21%); 31–35 (6%); over 35 (3%)

Gender (n=126)
Women (34%); Men (66%)

B Cross-device Trackers

Company (Type) | Desk Web | Mob Web | Mob Apps | Tracking domains/SDKs
33Across (P) | 0.1 | 0.4 | N/A | 33across.com
Adbrain (P) | 0.4 | 1 | N/A | adbrn.com
AddThis (Oracle) (P) | 3.4 | 1.6 | N/A* | addthis.com, addthisedge.com
Adelphic (D, P) | 0.1 | 0.7 | N/A | ipredictive.com
Adform (D, P) | 0.4 | 1.3 | N/A* | adform.net, adformdsp.net
Adobe Marketing Cloud (D) | 5.4 | 3.9 | N/A* | 2o7.net, adobetag.com, omtrdc.net
AdRoll (D) | 2 | 1.7 | N/A | adroll.com
Advertising.com (AOL) (P) | 4 | 3.5 | N/A | advertising.com
Amobee (D) | 0 | 0 | N/A* | amgdgt.com
AOL ONE (P) | 3.7 | 4.6 | N/A* | adap.tv, adtech.de, adtechus.com, aol.com, atwola.com, jumptap.com
AppNexus (P) | 9.2 | 8.9 | N/A* | adnxs.com
Arbor (D) | 0.1 | 0.4 | N/A | pippio.com
Atlas (Facebook) (D, P) | 7.8 | 2.4 | N/A* | atdmt.com
AudienceScience (D, P) | 0.9 | 2.4 | N/A | revsci.net
Baidu (P) | 0.3 | 0 | N/A | baidu.com
Bidtellect (P) | 0.1 | 0.5 | N/A | bttrack.com
Bing ads (Microsoft) (D) | 3.2 | 1.6 | N/A | bing.com, msn.com
BlueCava (P) | 0.2 | 0.5 | N/A | bluecava.com
BlueKai (Oracle) (D, P) | 3.8 | 4.8 | N/A* | bkrtx.com, bluekai.com
BrightRoll (Yahoo) (D, P) | 0.8 | 2.6 | N/A | btrll.com
Cardlytics (P) | 0.2 | 0 | N/A | cardlytics.com
Casale Media (P) | 3.8 | 4.3 | N/A | indexww.com, casalemedia.com
ChoiceStream (D, P) | 0.1 | 0 | N/A | choicestream.com
Clearstream TV (P) | 0.1 | 0 | N/A | clrstm.com
comScore (D, P) | 11.3 | 15.1 | 1.7 | comscore.com, scorecardresearch.com, zqtk.net, comScore SDK
Connexity (P) | 0.1 | 0.4 | N/A | connexity.net
Crimtan (D, P) | 0 | 0.3 | N/A | ctnsnet.com
Criteo (D) | 5.3 | 5.5 | N/A | criteo.com, criteo.net
Cross Pixel Media (D) | 0.3 | 0.2 | N/A | crsspxl.com
Datalogix (Oracle) (P) | 1.6 | 3.3 | N/A | nexac.com
DataXu (D, P) | 1.1 | 2.5 | N/A | w55c.net
Datonics (P) | 0.2 | 0.5 | N/A | pro-market.net
Deep Forest Media (Rakuten) (P) | 0.3 | 0.4 | N/A | dpclk.com
Demandbase (D) | 0.2 | 0 | N/A | company-target.com
DistroScale (P) | 0.1 | 0 | N/A | jsrdn.com
DoubleClick (Google) (D, P) | 50.2 | 41.1 | 13.1 | 2mdn.net, dmtry.com, doubleclick.net, AdMob SDK
Drawbridge (P) | 1.2 | 1.7 | N/A | adsymptotic.com
Dstillery (P) | 0.3 | 1.3 | N/A | media6degrees.com
engage:BDR (P) | 0 | 0.1 | N/A | bnmla.com
Ensighten (D) | 1.3 | 1.2 | N/A | ensighten.com
eXelate (Nielsen) (D, P) | 1.4 | 2.9 | N/A* | exelator.com
Eyereturn Marketing (D) | 0 | 0.3 | N/A | eyereturn.com
Eyeview (P) | 0.1 | 0.4 | N/A | eyeviewads.com
Facebook (D) | 27.8 | 27.7 | 20.3 | facebook.com, facebook.net, fb.me, Facebook SDK
FreeWheel (Comcast) (P) | 0.3 | 0.6 | N/A | fwmrm.net
Gigya (D) | 0.6 | 0.8 | N/A | gigya.com
Google Analytics (D) | 58 | 43.6 | 5.1 | google-analytics.com, Google Analytics SDK
Google Display Network (D, P) | 51.6 | 41.7 | 13.1 | 2mdn.net, adsense.com, blogger.com, dmtry.com, doubleclick.net, googleadservices.com, youtube.com, AdMob SDK
IgnitionOne (P) | 1 | 0.4 | N/A | netmng.com
Interstate (D) | 0 | 0 | N/A | interstateanalytics.com
IXI Services (Equifax) (D, P) | 1.6 | 3.7 | N/A | ixiaa.com
Kenshoo (D, P) | 0.1 | 0.4 | N/A | xg4ken.com
Krux (D, P) | 3.5 | 5.2 | N/A | krxd.net
LinkedIn (D) | 4.9 | 1.9 | N/A* | bizographics.com, linkedin.com
Lotame (D, P) | 2.7 | 3.8 | N/A* | crwdcntrl.net
Magnetic (P) | 0.6 | 0.6 | N/A | domdex.com
MaxPoint (D) | 0.1 | 0.6 | N/A | mxptint.net
MediaMath (P) | 5 | 4.9 | N/A | mathtag.com
Moat (D, P) | 6.8 | 6.7 | N/A | moatads.com
Neustar (D, P) | 3.7 | 6.1 | N/A | adadvisor.net, agkn.com
Nielsen (D, P) | 5.5 | 5.3 | N/A* | imrworldwide.com
Optimizely (D) | 3.4 | 3.6 | N/A* | optimizely.com
Perfect Audience (P, D) | 0.3 | 0.4 | N/A* | prfct.co
PubMatic (P) | 2.7 | 3.3 | N/A* | pubmatic.com
Quantcast (P) | 9.7 | 8.5 | N/A | quantserve.com
RadiumOne (D) | 0.4 | 0.9 | N/A* | gwallet.com
Resonate (P) | 0 | 0.3 | N/A | reson8.com
Rocket Fuel (P) | 1.8 | 3 | N/A | rfihub.com
Rubicon Project (P) | 4.6 | 5.8 | N/A* | chango.com, rubiconproject.com
RUN (P) | 0.1 | 0.3 | N/A | rundsp.com
Signal (D) | 1.1 | 0.8 | N/A | thebrighttag.com
Sizmek (P) | 3.9 | 4 | N/A | peer39.com, peer39.net, serving-sys.com
Skimlinks (D, P) | 1.1 | 1.6 | N/A | skimresources.com, redirectingat.com
Smaato (D, P) | 0.2 | 2 | 0.1 | smaato.net, Smaato SDK
Smart AdServer (P) | 0.4 | 1.1 | N/A | smartadserver.com
Sonobi (P) | 1.3 | 2.4 | N/A | sonobi.com
SpotX (D, P) | 0.9 | 3.1 | N/A* | spotxchange.com
Tapad (P) | 1.1 | 1.9 | N/A | tapad.com
Tealium (D) | 1.9 | 2 | N/A | tiqcdn.com
The Trade Desk (P, D) | 4.4 | 6.6 | N/A | adsrvr.org
Turn (P) | 1.1 | 2.7 | N/A | turn.com
Twitter (D) | 11.9 | 6.6 | 0.7 | ads-twitter.com, twitter.com, Twitter SDK
Undertone (P) | 0.2 | 0.4 | N/A | legolas-media.com, undertone.com
Vindico (Time) (D, P) | 0.2 | 0.4 | N/A* | vindicosuite.com
Weborama (P) | 0 | 0 | N/A* | weborama.fr, weborama.io
Yahoo (D) | 5 | 6.5 | N/A* | yahoo.com
Yieldbot (P) | 0.8 | 1.6 | N/A | yldbt.com
