
Browser Tracking, Business Risk? DATASCI W231: Legal and Ethical Concerns - Final Paper

Christopher J. Llop

I. Introduction
II. Types of Tracking
  A. Cookies
  B. Evercookies
  C. Cookie Syncing
  D. Canvas Fingerprinting
  E. Search History
  F. Data Brokers
III. Data and Trackers
  A. Data Collectors
  B. Data Collected
  C. Data Flows
IV. Regulation
  A. Government Regulation
  B. Self-Regulation
V. Business Risk
  A. Recap: Summing Up
  B. Proof of Data Collection
  C. Discussion of Business Harm
    1. Business Risk to a Company involved in Litigation
    2. Business Risk to a Company with Extensive R&D
  D. Reflection
VI. Mitigation
  A. Training
  B. Use of Browser Add-Ins
  C. Clearing Cookies, Search and Other History
  D. Incognito Mode
  E. Opt-Out / Do Not Track
VII. Conclusions


I. Introduction

When we log onto and browse the internet, the machine we physically interact with can connect to billions of other devices. Each web page we visit or interaction we initiate is actually a conversation between our device and one or more machines that also physically exist and are operated by another party for the purpose of exchanging information. 1

Many of these interactions occur seamlessly and, while we are browsing the internet, it is easy to have only a vague idea of the location or ownership of the machines that we communicate with. Top-level domains such as ".com", ".uk", or ".org" are built into website names and give a general idea of who the primary entity we are interacting with may be: a for-profit company, a foreign company, or a non-profit organization.

However, in the modern web, communication extends far beyond a simple two-party back and forth between our computer and the specific web site we visit. Each page is loaded with social media add-ins, advertisements, analytics, and an assortment of other widgets. When a page is loaded, each of these components communicates with its respective owner, easily leading to dozens of interactions for a single webpage to load.

In the process of communicating, these parties whom we did not explicitly attempt to communicate with, known as third parties, receive information from our computer as we ask them for data. They respond in kind and can place files on our machine or take other routes to identify our browser as we interact with the same third party in various ways across the different sites they are embedded within.

Through identification, these parties are able to build a history of our activities on the web and to use this history for their business purposes. A marketing company can track how many advertisements for a particular nail polish they have shown you and change up the offers to keep you from developing negative opinions through "product fatigue". Alternatively, they can notice you don't shop for nail polish at all and offer you a nice mustache trimmer to go with your "woodland musk" aftershave. Our data itself becomes a commodity, often limited by the privacy policy of the third party instead of the privacy policy of the website we intended to communicate with in the first place.

1 “How Many Things Are Currently Connected To The "Internet of Things" (IoT)?” Forbes, 2013. http://www.forbes.com/sites/quora/2013/01/07/how-many-things-are-currently-connected-to-the-internet-of-things-iot/

While the popular press has made people generally aware that their online activities are being tracked, much less attention has been paid to the fact that these same tracking mechanisms are in place while we are working from the office. In an age where internet research is a common component of white-collar jobs, it seems reasonable to wonder to what extent the work we are doing professionally could be tracked and understood through the data streams we generate from the office. Such tracking could, theoretically, create a security risk if data regarding confidential clients, products, or analyses could be estimated with any degree of certainty by observing patterns of browser use.

This paper explores this idea further. First, Sections II through IV describe the state of browser tracking in the modern web to provide context in which to discuss business risk. Section II surveys the common types of tracking that occur on the internet and the scope of such tracking as measured in the literature. Section III discusses the parties doing the tracking, the types of data they can collect, and who they may share this data with. Section IV looks to regulation to see what limits governments and other organizations place on data collection and sharing. Next, Section V considers how these components may lead to different risks for businesses than for individuals, and Section VI discusses potential methods to limit the amount we are tracked in the workplace. Finally, Section VII offers several conclusions and suggestions.

II. Types of Tracking

As we use the web, there are numerous ways that our actions are tracked and conveyed to other parties. This section surveys some common and emerging methods used to identify our web activities and store data. Common to these methods is the ability to uniquely identify an individual, if not specifically by name or personal information, then by a constant identifier that is associated with a particular browser or machine. 2

2 Many of the facts presented in these sections come from two sources: Acar et al., “The Web Never Forgets: Persistent Tracking Mechanisms in the Wild.” 2014. https://securehomes.esat.kuleuven.be/~gacar/persistent/the_web_never_forgets.pdf and Roesner et al., “Detecting and Defending Against Third-Party Tracking on the Web.” 2012. http://www.franziroesner.com/pdf/webtracking-NSDI2012.pdf

A. Cookies

A cookie is a small file placed on your computer by the servers running the websites that you visit. Many useful features of the modern web require cookies to work properly. In each cookie, the server will store three pieces of information: the name of the server that placed the cookie, the date and time when the cookie expires (is deleted), and a unique string of letters and/or numbers that can be used to identify your web browser. While this string does not identify you by personal information, it is unique to you.

When you communicate with a server, your web browser automatically checks for cookies from the server and will send the information inside the cookie to the server along with any page requests. The server then can compare the unique id in your cookie with their internal records to store information about you and decide what data to send back to you. This enables you to log into a site, fill up a shopping cart, or set your own preferences. The cookie provides your identification so the server knows how to respond. 3

There are two kinds of cookies:

Session cookies last for the length of time your browser is open. When you close the browser, they vanish. A session cookie enables you to log in to a site temporarily, or maintain a shopping cart without logging in.

Permanent cookies persist in your browser's cookie folder until they are manually deleted or their expiration date is reached. A permanent cookie allows you to stay logged in between browser sessions, or to set customization such as a permanent language setting.
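The cookie mechanics described above can be sketched with Python's standard `http.cookies` module. This is only an illustration under stated assumptions: the cookie name `uid`, the domain `tracker.example`, and the helper names are hypothetical, and a random UUID stands in for whatever identifier a real server would assign. A cookie issued without a lifetime behaves as a session cookie, while one with `Max-Age` persists until it expires.

```python
from http.cookies import SimpleCookie
from typing import Optional
import uuid


def make_set_cookie(domain: str, max_age_seconds: Optional[int] = None) -> str:
    """Build a Set-Cookie header value carrying a random browser identifier."""
    cookie = SimpleCookie()
    cookie["uid"] = uuid.uuid4().hex      # hypothetical cookie name and ID scheme
    cookie["uid"]["domain"] = domain      # the server that placed the cookie
    cookie["uid"]["path"] = "/"
    if max_age_seconds is not None:
        # A lifetime makes this a permanent cookie; omitting it yields
        # a session cookie that vanishes when the browser closes.
        cookie["uid"]["max-age"] = max_age_seconds
    return cookie["uid"].OutputString()


def read_uid(cookie_header: str) -> Optional[str]:
    """Extract the identifier from the Cookie header the browser sends back."""
    parsed = SimpleCookie()
    parsed.load(cookie_header)
    return parsed["uid"].value if "uid" in parsed else None


permanent = make_set_cookie("tracker.example", max_age_seconds=365 * 24 * 3600)
session = make_set_cookie("tracker.example")
```

On a later request the browser returns only the `uid=...` pair, which `read_uid` maps back to the server's internal record for that browser.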

When you visit a website, cookies can be set by either the first party or one of potentially many third parties:

The first party is limited to the owner of the web domain you are visiting. The first party is typically transparent to you as you use the web: you are purposefully attempting to access their pages and content.

3 “What is a cookie?” All About Cookies. http://www.allaboutcookies.org/cookies/

Third parties, or non-first-party servers, have code embedded in the website but are not the direct owner of the site. These third parties serve advertisements, collect analytics, link to social media, and perform other services.

Both first and third parties can set and retrieve cookie information while the page is loading. Because third-party components are distributed widely across the internet, they are able to recognize the unique identification in their cookie any time you access a page with their code. Each of these requests reveals your current location on the web; in aggregate, they pass along information about your favorite sites, browsing patterns, and even search terms (for example, when the search terms are included directly in the URL you visit). Third parties store this data in their databases as an asset to their firm, updating their records and gaining a better picture of what you do each time they interact with you.
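To make the accumulation concrete, the following is a rough sketch of how a third party might log the pages reported alongside a cookie identifier. The class name, the example URLs, and the assumption that search terms travel in a `q` query parameter are all hypothetical; real trackers differ in what they record and how.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


class TrackerLog:
    """Toy record of page visits keyed by the tracker's cookie identifier."""

    def __init__(self):
        # uid -> list of (site, search terms found in the URL)
        self.history = defaultdict(list)

    def record(self, uid: str, page_url: str) -> None:
        parts = urlparse(page_url)
        # Assumption: search terms, if any, appear in a "q" query parameter.
        terms = parse_qs(parts.query).get("q", [])
        self.history[uid].append((parts.netloc, terms))

    def sites_seen(self, uid: str):
        """Distinct sites on which this identifier has been observed."""
        return sorted({site for site, _ in self.history[uid]})


log = TrackerLog()
log.record("abc123", "https://news.example/story?id=1")
log.record("abc123", "https://search.example/results?q=patent+litigation")
print(log.sites_seen("abc123"))
```

Each embedded request adds one row, so the profile grows with every page that carries the tracker's code.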

Both first and third parties can place web beacons, also known as pixel tags, in their content. These beacons are invisible, single-pixel gif images that must be requested separately from the server. In the process of requesting these images, web beacons work in conjunction with cookies to link information such as the type of browser you are using and the amount of time spent on a page with the unique ID stored in a cookie. 4

There are two primary ways to prevent third-party tracking. First, you can update your browser settings to refuse to place third-party cookies. This is not always effective at preventing tracking. If pop-ups are not blocked, a third-party web component can force open a window where they are the first party, effectively getting around the cookie ban. Alternatively, if the organization also provides a first-party service, as Facebook does, their cookies will often already be associated with your browser. In this situation, because the cookies were set when Facebook was a first party, Facebook is also able to access this cookie every time they are present on a page as the third party, regardless of whether or not you are logged in. 5

The second option is to manually delete all of the cookies in your browser. This will remove both the cookies you want and those you don't. Saved logins, settings, and other preferences can disappear. Attempts to avoid this inconvenience can prevent people from deleting cookies in the first place.

4 “Web Beacons and Other Tools.” All About Cookies. http://www.allaboutcookies.org/web-beacons/index.html

5 Roesner et al., “Detecting and Defending Against Third-Party Tracking on the Web.” 2012. http://www.franziroesner.com/pdf/webtracking-NSDI2012.pdf

Third parties can be at work even if there are no distinctly visible components on the page. As DoubleClick, a predictive advertising company owned by Google, states in their privacy policy: "When the server delivers the ad content, it also sends a cookie. But a page doesn’t have to show DoubleClick ads for this to happen; it just needs to include DoubleClick ad tags, which might load a click tracker or impression pixel instead." 6

The amount of data that can be retrieved varies depending on how prevalent the third party is online. In a recent study that browsed 3000 top sites while watching the information sent to a list of 730 trackers, it was found that two trackers could recover more than 40% of a user’s browsing history and 11 could recover more than 10%. However, these numbers increase when basic cookies are combined with additional tracking methods. 7

B. Evercookies

Once a user deletes their cookies, the text file and its unique identifier assigned to the user are removed. The next time a server interacts with the browser, the server will detect that no cookies are present and a new cookie will be placed. This cookie will have a new unique identifier, effectively appearing to the server like a new person and removing the ability to continue to track the same individual over time.

Evercookies are cookies that can respawn once they are manually removed from a user’s browser. In doing so, they repopulate the original unique identifier and allow a given site to link together tracking information even if cookies have been deleted. To accomplish this, websites store data in additional locations outside of the cookies folder. There are various storage locations a server can use, such as Flash cookies, localStorage, sessionStorage, and ETags. These locations are less transparent to users and thus may be more difficult to clear.
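The respawning behavior can be illustrated with a toy model. This is a simplified simulation, not how any particular evercookie library is written: the `Browser` class, the store names, and the `visit` function are hypothetical stand-ins for a real browser's cookie jar and its other storage locations.

```python
import uuid


class Browser:
    """Toy browser with a cookie jar plus other, less visible stores."""

    def __init__(self):
        self.cookies = {}
        # Hypothetical stand-ins for Flash cookies, localStorage, and ETags.
        self.other_stores = {"flash": {}, "localStorage": {}, "etag": {}}

    def clear_cookies(self):
        """What a typical user does: only the cookie jar is emptied."""
        self.cookies.clear()


def visit(browser: Browser, key: str = "uid") -> str:
    """Return the identifier the site sees, respawning it if any copy survives."""
    copies = [browser.cookies.get(key)]
    copies += [store.get(key) for store in browser.other_stores.values()]
    uid = next((c for c in copies if c), None) or uuid.uuid4().hex
    # Rewrite the identifier everywhere so that any one copy can restore the rest.
    browser.cookies[key] = uid
    for store in browser.other_stores.values():
        store[key] = uid
    return uid


browser = Browser()
first_id = visit(browser)
browser.clear_cookies()
respawned_id = visit(browser)   # same identifier, restored from another store
```

Only clearing every storage location at once breaks the link, which is why partial cleanup leaves the original identifier intact.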

Flash cookies are particularly notorious because they allow cookies to move between browsers. Flash, a technology that allows browsers to render multimedia, is common across all browsers. By storing a cookie in Flash, a tracker is able to respawn the cookie from Flash to any other browser on the computer. This effectively links together the unique identifiers given to each browser, and in turn lets the company merge together records that appeared independent on the back end. Flash has had so many security issues that popular web browsers have started to remove Flash compatibility. 8

6 “Ad Targeting: DoubleClick Cookies.” Google. https://support.google.com/adsense/answer/2839090?hl=en

7 Acar et al., “The Web Never Forgets: Persistent Tracking Mechanisms in the Wild.” 2014. https://securehomes.esat.kuleuven.be/~gacar/persistent/the_web_never_forgets.pdf

Recent browsers attempt to combat the problem by allowing users to clear various storage locations while also clearing cookies. However, this action can require background knowledge and, at times, navigation to additional settings pages. Because there are a variety of locations in which evercookies can be stored, it can also be hard to be sure that tools have removed them. In a recent study focused on detecting web tracking, the authors were unable to identify how 18 of the cookies they witnessed managed to respawn themselves over time.

Evercookies have been particularly contentious in the United States, where a class action lawsuit settlement in 2013 resulted in KISSmetrics paying over $500,000 for their use of respawning ETag and Flash cookies without user consent. However, an analysis of the most popular domains on the web found that a majority of the sites using supercookies today are based in China and Russia, putting them firmly outside of U.S. jurisdiction. In an age where information can travel seamlessly across borders, data collected by these parties can be held as an asset and used in transfers with other companies. 9, 10

C. Cookie Syncing

While the methods previously described allow the third parties embedded on the pages you interact with to watch your internet behavior, it is important to note that these pages typically allow multiple parties to track your activity at one time. This enables various third parties to combine their information via cookie syncing, creating a crosswalk between their unique identifiers to gather a larger picture of your web activity.

There is a natural incentive and business reason for the various parties tracking your activity to combine resources. By gathering a larger picture of an individual's interests, each party is better able to predict and monetize their interactions with the user. A recent study of 730 trackers across 3000 top sites investigated instances where different third parties clearly had access to the unique identifiers that other third parties placed in the browser. While it was found that, alone, only two trackers could recover more than 40% of a user’s browsing history and only 11 could recover more than 10%, after accounting for cookie syncing, 101 domains could reconstruct 50% of browsing activity, and 161 could reconstruct over 40%. Even when third-party cookies were disabled, 44 parties could identify 40% of browsing history. 11

8 “Opinion: When Chrome, YouTube and Firefox drop it like it's hot, Flash is a dead plugin walking.” Phys.org, 2015. http://phys.org/news/2015-07-opinion-chrome-youtube-firefox-hot.html

9 “KISSmetrics Finalizes Supercookies Settlement.” MediaPost, 2013. http://www.mediapost.com/publications/article/191409/kissmetrics-finalizes-supercookies-settlement.html

10 Acar et al., “The Web Never Forgets: Persistent Tracking Mechanisms in the Wild.” 2014. https://securehomes.esat.kuleuven.be/~gacar/persistent/the_web_never_forgets.pdf

This outlines the unique privacy risk that comes from cookie syncing. As long as one of the parties has access to a unique identifier at a specific point in time, be it through first-party cookie access, an evercookie, or another source, communication with other third parties can help tracking systems recover a fuller picture of internet activity.
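The effect of a sync can be shown with a small, hypothetical example. Here two trackers each hold a partial history under their own identifier, and a sync table (the "crosswalk") records which pair of identifiers belongs to the same browser. The identifiers, site names, and function names are invented for illustration, and the coverage figures are not drawn from the study cited above.

```python
def merge_histories(history_a, history_b, sync_table):
    """Combine two trackers' records using a crosswalk of their identifiers.

    sync_table maps tracker A's id to tracker B's id for the same browser.
    """
    merged = {}
    for id_a, id_b in sync_table.items():
        sites = set(history_a.get(id_a, ())) | set(history_b.get(id_b, ()))
        merged[(id_a, id_b)] = sorted(sites)
    return merged


def coverage(known_sites, visited_sites):
    """Fraction of the sites actually visited that a tracker knows about."""
    visited = set(visited_sites)
    return len(set(known_sites) & visited) / len(visited)


# Hypothetical data: what the user really visited, and what each tracker saw.
visited = ["news.example", "shop.example", "law.example", "docs.example", "mail.example"]
history_a = {"A1": ["news.example", "shop.example"]}
history_b = {"B9": ["shop.example", "law.example", "docs.example"]}

merged = merge_histories(history_a, history_b, {"A1": "B9"})
alone = coverage(history_a["A1"], visited)            # tracker A alone: 2 of 5 sites
synced = coverage(merged[("A1", "B9")], visited)      # after syncing: 4 of 5 sites
```

Neither tracker alone sees most of the history, but a single shared identifier pair is enough to pool what both have observed.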

These numbers only represent the linking of data that could be observed over the web. In addition to these observable risks, a pair of trackers could run back-end database merges to attempt to create an even larger map of users over time. Predictive analytics based on behavior could be used to "guess" which users may be the same despite cleared cookies. Furthermore, new, innovative browser identification techniques have recently been observed in the wild and are able to track and sync information without using cookies at all.

D. Canvas Fingerprinting

While cookies are relatively well documented and understood, there is business incentive for companies that engage in web tracking to develop new and innovative methods to track without cookies. This has recently been observed through a process known as canvas fingerprinting.

To create a canvas fingerprint, a server sends a request to your machine to render an image out of a series of letters, numbers, and symbols at a certain size and in certain colors using the browser's Canvas API. This image, when rendered, serves as a unique identifier. It can be converted back into a stream of data, hashed into a value, then sent back to the third party.
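The final hashing step can be sketched as follows. The rendering itself happens in the browser through the Canvas API and cannot be reproduced here, so the pixel bytes below are made-up placeholders for two devices' output, and the choice of SHA-256 is an assumption rather than the hash any particular tracker is known to use.

```python
import hashlib


def canvas_fingerprint(pixel_bytes: bytes) -> str:
    """Hash the rendered image data into a compact identifier.

    SHA-256 is an assumption; any stable hash function would serve.
    """
    return hashlib.sha256(pixel_bytes).hexdigest()


# Hypothetical RGBA pixel data standing in for the same text rendered on two
# machines; a one-value difference mimics a subtle rendering difference.
device_one = bytes([10, 20, 30, 255] * 4)
device_two = bytes([10, 20, 31, 255] * 4)

fp_one = canvas_fingerprint(device_one)
fp_two = canvas_fingerprint(device_two)
```

The same device produces the same hash on every visit, while even a one-value rendering difference yields an unrelated hash, which is what lets the value act as an identifier.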

Each browser on any machine will render this image slightly differently from others due to device, technology, software, and browser nuances. It is estimated that only 1,000 people share any given canvas fingerprint, and by combining the fingerprint with other easily obtainable metrics, it is possible to come up with an almost certainly unique identifier. Each time a tracker interacts with your browser, it can ask the Canvas API to perform this function and send the data back to link your records over time. It is impossible to disable the Canvas API without removing a large amount of native web functionality.

11 Acar et al., “The Web Never Forgets: Persistent Tracking Mechanisms in the Wild.” 2014. https://securehomes.esat.kuleuven.be/~gacar/persistent/the_web_never_forgets.pdf

Canvas fingerprinting has only recently been observed in practice, and one specific company, addthis.com, is responsible for 95% of the canvas fingerprints found in a recent survey of top sites. However, the addthis.com web tool is involved in very popular social media sharing plugins, and "according to a recent ComScore report, AddThis “solutions” reaches 97.2% of the total Internet population in the United States and get 103 billion monthly page views." 12

These fingerprints, which never change as long as you are using the same browser on the same machine, can be used to link the cookies relating to a specific user over time despite any amount of cookie clearing or mitigation techniques.

It is unclear how many other advanced "fingerprinting" recognition systems exist, but the economic incentive is clearly in place for companies to develop and implement such methods.

E. Search History

In addition to third-party trackers that collect information as we travel across webpages, the sites we directly interact with are constantly collecting and storing the information we give them. Search engines are one of the most prevalent examples of this large-scale storage. Sites such as Google, Bing, and Yahoo store each search, associated with a unique identifier or your relevant account. 13

This can be useful: by paying attention to your searches and which results you click on, these services can provide you with results more relevant to your particular interests. In many situations, two people sitting side by side will see slightly different search results because the search engine has determined that the individuals have different preferences, whether from their location, past history, or other collected data (e.g., from social media). This data also allows companies to offer more relevant advertising.

12 “COMSCORE RANKS ADDTHIS #1 IN DISTRIBUTED CONTENT IN THE UNITED STATES.” AddThis, 2013. https://www.addthis.com/press/comscoreranksaddthisnumber1indistributedcontentintheunitedstates#.VdvTAPlVhBc

13 “What Your Search History Says About You (And How to Shut It Up).” Huffington Post, 2013. http://www.huffingtonpost.com/megan-carpentier/what-your-search-history-_b_4179728.html

Over time, these companies can collect years' worth of search records from individuals, often tied to the IP addresses where we work or to our personal accounts. Some of these companies will let you view and clear your history, for example, at https://history.google.com/history/. Because "googling" is so ubiquitous, it is easy to forget that Google is a for-profit company that maintains a database of what is searched for. It is important for businesses to think through the implications of their search terms being stored in the databases of other companies.

In addition to primary search engines, the FTC advised in 2011 that cookies can allow third parties to observe search terms. Historically, search engines would submit search terms via unencrypted channels where they could be observed by whatever search result site the user clicked on. This was useful, as it allowed a site to understand what search terms people were typing prior to visiting the site. However, users of search may not have been aware that their terms were being passed along to websites and potentially embedded third parties. 14

In 2013, Google moved to encrypt all searches through their site. In 2015, Bing followed suit. When you go to these sites you can see that you are automatically redirected to "https" instead of "http", indicating that encryption is being used. While this prevents searches on major engines from being naively tracked, many smaller search engines may not yet have switched to secure search. Users should be aware of whether their search terms are encrypted or not. 15, 16

F. Data Brokers

In addition to the tracking of our interactions, there exists a suite of companies known as "Data Brokers" who constantly scan, mine, and save data that individuals post to the web. When possible, they can scrape personal information such as addresses and purchase histories, and they can easily establish baseline profiles of individuals from social media sites such as Facebook and LinkedIn. 17

14 “Cookies: Leaving a Trail on the Web.” FTC, 2011. http://www.consumer.ftc.gov/articles/0042-cookies-leaving-trail-web

15 “Goodbye, Keyword Data: Google Moves Entirely to Secure Search.” Search Engine Watch, 2013. http://searchenginewatch.com/sew/news/2296351/goodbye-keyword-data-google-moves-entirely-to-secure-search

16 “Bing Moving to Encrypt Search Traffic by Default.” Bing, 2015. https://blogs.bing.com/webmaster/2015/06/15/bing-moving-to-encrypt-search-traffic-by-default/

While the activities of data brokers are largely outside the scope of this paper, it is important to note that, because of their activities, a large amount of personal information may be available to buy and sell. To the extent that browsing history and search terms can be mapped to personal identities, it could be possible to leverage data brokers to fill in gaps, for example, to identify people who work for the same company and might be working together on a project. A 2014 CBS report described just that kind of matching: "The website doesn't require users to give their real name. But the IP address and the computer ID number are recorded and it is not difficult for data brokers to match that information with other online identifiers. There are firms that specialize in doing it." 18

A company named SafeShepherd offers a service to track down, identify, and remove your personal information from data brokers' websites. On average, they claim to find their customers' personal information contained in 11 data broker databases. 19

III. Data and Trackers

With the mechanisms by which data is collected better understood, it is important to understand what data can be reasonably collected and who can gain access to it. The FTC notes that "consumers face a landscape of virtually ubiquitous collection of their data." This section will explore the data collectors, the data collected, and the flows by which that data can travel. 20

17 “Everything We Know About What Data Brokers Know About You.” ProPublica, 2014. https://www.propublica.org/article/everything-we-know-about-what-data-brokers-know-about-you

18 “The Data Brokers: Selling your personal information.” CBS, 2014. http://www.cbsnews.com/news/the-data-brokers-selling-your-personal-information/

19 “Here Are 20 Companies Who Sell Your Data (& How To Stop Them).” ReadWrite, 2012. http://readwrite.com/2012/04/26/here-are-20-companies-who-sell-your-data-how-to-stop-them

20 “Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Business and Policymakers.” FTC, 2012. https://www.ftc.gov/sites/default/files/documents/reports/federal-trade-commission-report-protecting-consumer-privacy-era-rapid-change-recommendations/120326privacyreport.pdf

A. Data Collectors

The most prevalent non-governmental players in browser tracking appear to be ad agencies. These organizations will piece together as much of the browsing history of an individual as possible so they can segment users into groups and sell access to those groups as targeted advertisements. In short, organizations use ad exchanges to allow marketers to bid on access to a particular audience, defined based on the browsing activities collected. Tracking is also used for a score of tactical activities, for example, to prevent "ad fatigue" by not overplaying an ad to one specific user. 21, 22

Companies involved include advertising-specific organizations, such as QuantCast and comScore, and also the advertising arms of many large tech companies. For example, Google owns one of the most prominent, DoubleClick, while Yahoo owns YieldManager and Facebook owns Atlas. While Google, specifically, assures users that it will not merge browsing information from DoubleClick with personal information from user accounts, such practices would need to be investigated in privacy policies on a case-by-case basis.

Other agencies, such as AddThis.com, do not serve advertisements themselves but instead act as a middleman, leveraging the browser tracking capabilities of their third-party plugins to sell extended data to companies in the advertising field.

In addition to advertising, it is important to note the ease with which any company with a relevant third-party add-in could track browser history. A 2014 study that examined the top 3000 sites found 730 distinct trackers on the web. These parties originate from countries around the globe.

B. Data Collected

In order to establish the breadth of data collected, I reviewed the privacy policies of several third parties who track browsing activity. In this process, I simply selected a convenience sample of several large actors in this space. This review was not comprehensive and may not reflect the activities of all parties, particularly the long tail of smaller trackers. 23, 24, 25, 26

21 “Your Online Choices FAQ.” Your Online Choices. http://www.youronlinechoices.com/uk/faqs

22 “The evolution of online display advertising.” Iabuk. https://www.youtube.com/watch?v=1C0n_9DOlwE#

Trackers reviewed generally split the data they collect into two categories: Personally Identifiable Information (PII) and non-personally identifiable information (non-PII). These sites handle each kind of information differently, and, in some jurisdictions, companies face legal boundaries on the ways they can use PII. Sites generally stored this data for anywhere from one to five years.

The definition of PII varied by site, but was usually clearly spelled out. In general, it includes names, physical addresses, and similar information that can be used directly and immediately to locate a specific individual. Tracking sites collect this data mainly from those who log in to services they provide, and often may not collect this information while you browse the web unless you directly use their services.

However, the non-PII category is broad and could potentially be used to identify an individual regardless. Non-PII commonly collected included: website views, specific page views, information on whether a site is visited repeatedly, browser details, IP addresses, MAC addresses, location data, search information, and even distinct device identifiers. This data is typically called "log data" and is regarded as a primary data asset for business purposes.

While these types of data are not considered "personally identifiable", they still present risks. As AOL found out when they released a large amount of search data for scientific research, it is all too easy for search history and internet browsing history to be used to identify specific people, and, in the business context, what kinds of activity they are up to. People log in to their social media profiles, search for themselves and their companies, and engage in a host of other, highly individualized activities. 27

IP addresses and device identifiers have been particularly salient in privacy discussions. One privacy policy in particular called out that "some jurisdictions" consider IP addresses to be PII, and many discussions regarding the "internet of things" have begun to question if it is really fair to say that the device ID of a device that is clearly owned by only one person can really be considered not "personally identifiable".

23 “QuantCast Privacy Policy.” QuantCast, 2014. https://www.quantcast.com/privacy/

24 “Privacy and Data Practices.” AddThis, 2014. http://www.addthis.com/privacy/privacy-policy

25 “DoubleClick Cookies.” DoubleClick. https://support.google.com/adsense/answer/2839090?hl=en

26 “Privacy Policy and Patent Notice.” ScoreCardResearch, 2015. https://www.scorecardresearch.com/privacy.aspx

27 “AOL: This Was a Screw Up.” TechCrunch, 2006. http://techcrunch.com/2006/08/07/aol-this-was-a-screw-up/

From a business perspective, this provides an interesting set of considerations. If individuals are engaged in sensitive research topics, it could be possible to link personal information about their company or job description to the types of research they are engaged in. Once one person at a certain company is identified, the shared IP address could be used to identify others.
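To make the shared-IP concern concrete, the following is a minimal sketch in Python using entirely made-up log records. The field names (`cookie_id`, `ip`, `event`) and all values are hypothetical and do not reflect any tracker's actual schema. It only illustrates the idea that once one cookie ID has been tied to a person, other cookie IDs seen from the same IP address can be grouped with it.

```python
from collections import defaultdict

# Hypothetical log records; every field name and value is invented for illustration.
LOG = [
    {"cookie_id": "a1", "ip": "203.0.113.7", "event": "login:social/jane-doe-acme"},
    {"cookie_id": "a1", "ip": "203.0.113.7", "event": "search:widget patent"},
    {"cookie_id": "b2", "ip": "203.0.113.7", "event": "search:widget supplier"},
    {"cookie_id": "c3", "ip": "198.51.100.9", "event": "search:weather"},
]


def cookies_sharing_ip(log, known_cookie_id):
    """Return other cookie IDs seen from any IP address used by known_cookie_id."""
    ips_by_cookie = defaultdict(set)
    cookies_by_ip = defaultdict(set)
    for record in log:
        ips_by_cookie[record["cookie_id"]].add(record["ip"])
        cookies_by_ip[record["ip"]].add(record["cookie_id"])

    linked = set()
    for ip in ips_by_cookie[known_cookie_id]:
        linked |= cookies_by_ip[ip]
    linked.discard(known_cookie_id)
    return linked


print(sorted(cookies_sharing_ip(LOG, "a1")))
```

In this toy data, the cookie that logged in to a self-identifying account shares an address with one other cookie, which is exactly the kind of link a company may wish to avoid exposing.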

While some companies reviewed explicitly stated that they will not engage in activities that attempt to link individuals with their browsing history, in an ecosystem with so many players, reviewing each of hundreds of privacy policies seems impossible. Adding complexity, it is almost impossible for the naive user to identify which parties are tracking a page before the page is visited. Even if you wanted to read the privacy policy first, it is difficult to know whose privacy policy should be investigated. Many websites that embed third parties do not themselves know. They simply punt on the question and state that, for communications with third parties, the privacy policy of the third party applies.

It is also important to note that information which "seems" de-identified at one point in time can easily be re-identified later if technology improves. For example, recent advances in science have shown that previously "de-identified" genetic samples could in fact be traced back to individuals. As the number of freely available public datasets increases, more and more avenues for re-identification exist. What is de-identified today may not be so tomorrow.

In short, a lot of data can be collected. In any specific situation, it is exceedingly difficult to know how you are being tracked. It is therefore safest to assume that the entire set of data may be collected at any time and linked together by the parties involved.

C. Data Flows

The data collection practices and privacy policies of websites don't matter if the data can simply flow to other parties who have no obligation to keep the data safe. The privacy policies reviewed almost uniformly stated that data could be shared with "partners" for business purposes, implicitly including practices such as cookie syncing. Very little mention was made of how these partners would be required to treat the data.


Reading between the lines, the policies imply "a large tracker may act as a data broker and sell user histories for a fee". Treatment of data as such an asset is also particularly relevant in a business acquisition, where there are many moving pieces and the company being acquired may have very little stake in how the next people in line treat the data. This is especially true in bankruptcy proceedings where the impetus is to sell whatever can be sold for as much as possible, as quickly as possible. One bankruptcy lawyer highlighted this risk, stating that "[data issues haven't] been litigated much, and there isn’t really good case law". 28

The more parties that obtain the data, the greater the risk that some party with access will use it for nefarious purposes. If the data stream itself can be re-identified, then any individual and corporate risks travel with it. Each additional copy also leaves the data more exposed to hacking. As we have seen in several high-profile government and corporate breaches, no dataset can be completely secure. The more copies, the more risk.

Finally, all privacy policies stated that data collected will be turned over to legal authorities if requested. While this should not be a problem for most businesses, it does take control of the information out of the company's hands.

IV. Regulation

Data gathered through online tracking are largely unregulated by government agencies. Instead, several self-regulation associations have been set up, with hundreds of tracking companies subscribing to each. While self-regulation may be better than no regulation and offers some benefits, it has the potential to set up perverse incentives for data privacy.

Even if government regulation were to be effective, policies that vary by jurisdiction would only be so helpful because data can quickly move across borders. Of the privacy policies reviewed, some made explicit mention that data would be transferred across borders without a clear picture of how that would occur. Others explicitly stated where the data would be stored, creating more certainty.

28 “How Safe is Your Information When a Company Goes Bankrupt.” Dallas News, 2015. http://www.dallasnews.com/business/headlines/20150404-how-safe-is-your-information-when-a-company-goes-bankrupt.ece

A. Government Regulation

The FTC has commented on data security and privacy since the 1990s. In 1998, the FTC reviewed commercial websites' privacy disclosures and released a report outlining "fair information practice principles" (FIPPs). The report concluded that "despite the Commission's three-year privacy initiative... the vast majority of online businesses have yet to adopt even the most fundamental fair information practice." 29

In 2000, the FTC recommended legislation that would require ad networks to comply with the FIPPs to protect users from online tracking. However, Congress ultimately did not enact the recommended legislation. The FTC continued to hold town hall meetings and propose principles throughout the 2000s until, in 2010, it proposed a regulatory framework for consumer data privacy and a "Do Not Track" mechanism. 30

Interestingly, this "Do Not Track" mechanism would itself require the use of a cookie. The FTC suggested that: “[t]he most practical method of providing uniform choice for online behavioral advertising would likely involve placing a setting similar to a persistent cookie on a consumer’s browser and conveying that setting to sites that the browser visits, to signal whether or not the consumer wants to be tracked or receive targeted advertisements”. Still, the FTC's recommended legislation has not been codified into law. 31

Two other attempts at regulation arose in 2011. First, Representative Jackie Speier introduced the "Do Not Track Me Online Act of 2011", which would have authorized the FTC to promulgate regulations requiring online advertisers to allow individuals to opt out of tracking. The legislation also would have authorized random audits of companies to ensure they complied with opt-out requests. This legislation was not enacted and died in the 112th Congress. 32

29 “PRIVACY ONLINE: A REPORT TO CONGRESS.” FTC, 1998. https://www.ftc.gov/sites/default/files/documents/reports/privacy-online-report-congress/priv-23a.pdf
30 “ONLINE PROFILING, A REPORT TO CONGRESS.” FTC, 2000. http://www.steptoe.com/assets/attachments/934.pdf
31 “Protecting Consumer Privacy in an Era of Rapid Change.” FTC, 2010. https://www.ftc.gov/sites/default/files/documents/reports/federal-trade-commission-bureau-consumer-protection-preliminary-ftc-staff-report-protecting-consumer/101201privacyreport.pdf
32 “H.R. 654 (112th): Do Not Track Me Online Act” GovTrack. https://www.govtrack.us/congress/bills/112/hr654

Around the same time, Senators John McCain and John Kerry introduced the "Commercial Privacy Bill of Rights Act of 2011". Again, this bill would have empowered the FTC to create regulations regarding the storing of PII, including an opt-in for "sensitive PII". This bill did not include the proposed "Do Not Track" mechanism and was criticized as not being protective enough. The bill was referred to the Senate Committee on Commerce, Science, and Transportation, and a check of the record indicates no update since 2011. 33 34

B. Self-Regulation

Despite the lack of clear regulation, numerous self-regulating initiatives have popped up in the advertising space. In reviewing privacy policies, several organizations in particular were referenced repeatedly: the Network Advertising Initiative (NAI), the Digital Advertising Alliance Self-Regulatory Program (DAA), the European Digital Advertising Alliance (EDAA) Self-Regulatory Principles, and TRUSTe's Trusted Data Collection Program.

Of these, I reviewed the NAI's policies in more depth to understand the types of protections that these organizations provide. The NAI "is composed of nearly 100 member companies" and "imposes notice, choice, accountability, data security, and use limitation requirements on NAI member companies", specifically in regard to advertising activities. The NAI argues that self-regulation is needed because it is the only way to ensure that the rules protecting consumers keep up with the pace of technological change: governments cannot possibly move fast enough to protect consumers, and so industry groups must do so instead. 35

One of the main limitations placed by the NAI on member companies is a requirement that they allow users to opt out of tracking. Notably, the NAI periodically checks that the opt-out capabilities of its member sites are actually working. If the NAI finds that they are not, it will work with the organization to rectify the problem and even suggests it may refer the party to the FTC for further investigation. 36 37

33 “Bill Text 112th Congress (2011-2012) S.799.IS.” The Library of Congress, 2011. http://thomas.loc.gov/cgi-bin/query/z?c112:S.799:
34 “Bill Summary & Status 112th Congress (2011-2012) S.799.” The Library of Congress, 2011. http://thomas.loc.gov/cgi-bin/bdquery/z?d112:s.00799:
35 “2015 Update to the NAI Code of Conduct.” Network Advertising Initiative, 2015. https://www.networkadvertising.org/sites/default/files/NAI_Code15encr.pdf

However, this opt-out in itself presents several challenges. First, in order to opt out, an individual must enable third-party cookies so that the organization can place a marker that the user has opted out. This requirement to allow third-party cookies interferes with a user's ability to block third-party cookies from any other site, particularly those that do not offer an opt-out and do not abide by any self-policing policies.
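The first challenge can be illustrated with a toy model. The sketch below is a simplification written for this discussion, not a description of how any real browser or tracker is implemented; the `Browser` class, the `optout` cookie name, and the `tracker.example` domain are all hypothetical.

```python
class Browser:
    """Toy browser model; not how any real browser is implemented."""

    def __init__(self, allow_third_party_cookies):
        self.allow_third_party_cookies = allow_third_party_cookies
        self.third_party_cookies = {}  # tracker domain -> {cookie name: value}

    def set_third_party_cookie(self, domain, name, value):
        """Store the cookie only if third-party cookies are allowed."""
        if not self.allow_third_party_cookies:
            return False
        self.third_party_cookies.setdefault(domain, {})[name] = value
        return True

    def clear_cookies(self):
        self.third_party_cookies.clear()

    def has_opted_out(self, domain):
        return self.third_party_cookies.get(domain, {}).get("optout") == "1"


TRACKER = "tracker.example"  # hypothetical tracker domain

blocking = Browser(allow_third_party_cookies=False)
blocking.set_third_party_cookie(TRACKER, "optout", "1")
print(blocking.has_opted_out(TRACKER))    # False: the marker could not be stored

permissive = Browser(allow_third_party_cookies=True)
permissive.set_third_party_cookie(TRACKER, "optout", "1")
print(permissive.has_opted_out(TRACKER))  # True
permissive.clear_cookies()
print(permissive.has_opted_out(TRACKER))  # False: clearing cookies erased the opt-out
```

Under these assumptions, a user who blocks third-party cookies cannot record an opt-out at all, and a user who allows them loses the opt-out whenever cookies are cleared.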

Second, in order to opt out, a user must first identify the tracking organization, find an opt-out mechanism, then implement the solution. While some websites have been created that allow a user to opt out of multiple tracking services at once, these are by no means comprehensive. Users still must identify how they are being tracked before they can respond. 38

In addition to opt-out protection, the NAI places restrictions on the ways that member organizations can transfer the data they collect. Specifically, "members shall contractually require that any unaffiliated parties to which they provide PII... adhere to the provisions of this Code concerning PII.... [and] require that all parties to whom they provide non-PII collected across web domains...[to] not attempt to merge such non-PII with PII... or to re-identify the individual for Interest-Based Advertising purposes".

While this does appear to provide some protection, it is not immediately clear what protections are provided for second-order transfers. That is, as long as a non-NAI-compliant party follows the policy for PII, it appears as though it can resell non-PII at will without first checking the credentials of the buyer. Furthermore, the restrictions explicitly address use "for Interest-Based Advertising purposes", and could be interpreted to place no restrictions on how the information can be bought and sold for any other purpose.

Finally, while self-regulation and related certificates are designed to provide a signal to consumers that their privacy is respected, in the context of third-party advertising and data tracking it is not clear that consumers have an option to use this certification as a signal to "vote with their feet" and leave services that do not have adequate protections. Even if a user wanted to avoid all organizations that were not a part of self-regulation agreements, it is not clear how a user could easily identify and avoid such institutions without looking at the source code of the websites they visit. Such signals could perhaps be used by web developers instead, should they take it upon themselves to only allow third parties with certain certifications to be embedded in their sites.

36 “About The NAI.” Network Advertising Initiative, 2015. https://www.networkadvertising.org/about-nai/about-nai
37 “Understanding Online Advertising.” Network Advertising Initiative, 2015. https://www.networkadvertising.org/faq
38 “Your Ad Choices.” YourOnlineChoices, 2015. http://www.youronlinechoices.com/uk/your-ad-choices

V. Business Risk

A. Recap: Summing Up

In a 2012 Law360 interview, FTC Commissioner Julie Brill stated that she "[is] concerned about how a consumer’s browsing history may be used in a way that can unfairly cause her harm." While Commissioner Brill is focused on consumer privacy, these same forces appear to have access to business browsing data. 39

In short, it seems as though potentially hundreds of parties could be privy to anywhere from 10-50% of browsing history, linked to unique identifiers such as IP addresses, MAC addresses, and device IDs. While the major players seem to self-regulate to an extent, there are too many parties operating to know, in advance, what tracking mechanisms might be on any given site and what the privacy policies of that particular combination of parties might be. The data that is tracked can then be shared and sold in the course of business, and this development is new enough that it has not yet been heavily litigated. In short, the government has not put specific rules in place, and industry-run self-policing seems to be the major form of regulation.

Once browsing history and search terms have been collected, they can potentially be re-identified, either by analyzing the data stream itself or by combining the data stream with other resources, such as the data sold by private brokers. Re-identification can create a risk to businesses if a nefarious organization is able to track the activities in a certain business unit or among a certain group of individuals by a shared IP address or other connecting data. In addition, as data is shared, the potential for breaches through hacking and other means can only increase.

39 “Privacy And The FTC: Insights From Commissioner Brill.” Law360, 2012. http://www.law360.com/articles/385881/privacy-and-the-ftc-insights-from-commissioner-brill

B. Proof of Data Collection

All of these concerns regarding browser tracking and data collection only matter if such tracking actually occurs in the workplace. In order to move beyond theory, I conducted research into how I personally was being tracked in the office, and asked coworkers a few short survey questions to see how they were being tracked as well.

As part of the European Interactive Digital Advertising Alliance (EDAA), a web tool has been developed to check for the presence of tracking cookies set by the various organizations that are part of the EDAA. When one accesses this tool at http://www.youronlinechoices.com/uk/your-ad-choices, the website checks the browser for these cookies to notify an individual that they are being tracked, and offers an option to opt out using third-party cookies.

Using my work computer, I visited this site using two separate browsers. On Firefox, I found that 53 of 106 organizations (50.0%) were already tracking my activity. On Chrome, I found 58 of 106 (54.7%). I then cleared my cookies on Chrome and confirmed that the trackers were gone. Four hours of internet use later, I checked again and found that 46 trackers, or 79% of the original 58, had already respawned on my machine. One week later (and after performing research on trackers for this paper), there were a total of 60.
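As a quick check on this arithmetic, the short Python sketch below recomputes the shares directly from the counts reported above. The variable and function names are my own; only the counts come from the observations.

```python
def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


TOTAL_ORGANIZATIONS = 106  # organizations listed by the tool, as reported above
firefox_active = 53
chrome_active = 58
chrome_respawned = 46      # active again four hours after clearing cookies

print(f"Firefox: {share(firefox_active, TOTAL_ORGANIZATIONS):.1f}% of listed organizations")
print(f"Chrome: {share(chrome_active, TOTAL_ORGANIZATIONS):.1f}% of listed organizations")
print(f"Respawned on Chrome: {share(chrome_respawned, chrome_active):.0f}% of prior trackers")
```

Note that the respawn rate is measured against the 58 trackers previously found on Chrome, not against the full list of 106.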

Anecdotally, the advertisements I saw on YouTube also distinctly changed after I reviewed privacy policies from tracking companies: I have seen multiple adverts for QuantCast on my work machine since visiting their website.

To see if this occurrence was unique to my habits, I administered a survey to a convenience sample of my coworkers (N=23). In the survey, I asked each individual to visit this page and report back the estimated percentage of trackers that were active. Of the 19 respondents who answered this question, 10 reported that more than 50% of the sites listed were tracking their behavior. Only 5 reported that fewer than 10% of the trackers were active. It appears that browser tracking is occurring within our company and would pick up activities in the usual course of business.


C. Discussion of Business Harm

With a better understanding of what data is collected from individuals browsing the internet at work, and a basic study showing that this data is, in fact, collected, we turn our attention to discussion of business risk.

1. Business Risk to a Company involved in Litigation

As an economic consultant, a portion of my work goes to support experts testifying in court. I will often conduct research on behalf of the expert as I examine the confidential record of evidence alongside publicly available documents.

In order for risk to come from my browsing activities, a company with an interest in collecting my data would first need to identify my browsing as browsing they are interested in, then be able to use that data to draw conclusions about my work.

In the case of litigation support, it would be particularly hard to identify the individuals involved in a specific case. To start, the engaged consulting firm often has not been made public by the law firm that hired it for support. To identify my company through browsing or search records alone, an opposing party would first need to obtain the records of the law firm that hired us, then make an educated guess as to which of a handful of firms has been engaged.

From there, the company would still have no idea which individuals within my firm were assigned to this particular task. As such, even if they could obtain browsing records from my company, it would be difficult to separate the signal from the noise of all the cases going on. One tactic would be to look for individuals searching for the company itself, as that would indicate an interest. Even in these cases, we sometimes work on multiple engagements for one client, so it would be hard to attribute research to specific workstreams.

Additionally, even if a company was able to obtain the full browsing or search history for the teams working on a specific piece of litigation, only a partial story of the analyses could be woven together. Many parts of litigation rely on confidential documents identified through the legal discovery process. Generally, these contain much more nuanced information and thus are more heavily relied upon than public sources. It is likely that the "key" information would be missing from the online record. Even if key information existed, in the best-case scenario it would only give the company a head start in its analyses. In litigation, the final work product is turned over with adequate time for the opposing side to analyze and respond.

These challenges might be avoided if a large search engine, third-party ad network, or other data broker were itself involved in litigation. According to many of their privacy policies, these organizations can use the data they collect to protect themselves. It would not be a stretch to assume protection in litigation would count as one of these permitted uses. In these situations, the organization would simply need to guess which consulting firms might be working against it, then check records inside its own databases to see if it can find anything useful. Challenges would still exist: for example, the need for coordination between internal legal teams and the data scientists able to query the database and predict which records may be of interest.

Another interesting risk would arise if a court were to rule that the browsing history of economic consultants is "discoverable", that is, subject to being turned over to the court. I believe this concern is unrealistic, as browsing history obtained from a third party could not be traced to specific individuals with enough certainty to be admissible as evidence. Additionally, such records would mingle personal with non-personal browsing habits in an inappropriate fashion. Instead, if the court were interested in such records, it would have the power to demand that a history be actively kept firsthand by each party involved in the case.

2. Business Risk to a Company with Extensive R&D

While the risks to an economic consulting firm seem remote, it may be easier to see the risks to a company engaged in highly sensitive product R&D. In these situations, it might be easier for a third party to target specific individuals by looking up who is involved in R&D via LinkedIn profiles or other data sources made available by brokers. From there, the third party could attempt to purchase browsing records from that specific company (perhaps by IP address), and then try to identify the unique cookie IDs of the target individual. Such unique identification could be done by monitoring search terms or looking to see if the individual has logged in to a self-identifying personal account or social network.


From there, browsing history could be looked at to understand the types of products or news stories that are internally interesting to the company. While this would not provide access to confidential internal documents, it might be enough to indicate vague details of a new product or to glean useful insights.

Again, such an endeavor could be undertaken much more easily by a large search engine, third-party network, or other data broker that can leverage the records it already has in-house. Such companies would likely target their own competitors (with similar skills), unless they decide to sell corporate espionage as a service. Such a business proposition seems unlikely, as it would instantly raise red flags in the industry.

D. Reflection

In short, in the litigation context, unless a company has firsthand access to the records and the right to use them in self-defense, attempting to track down browsing and search history for the purpose of legal defense would likely be costly, cumbersome, and a poor use of resources. That said, special care should likely be given to lawsuits involving companies such as Google, which could more easily query their own databases at potentially little cost.

Regardless of the ease with which the information could be used to create harm, a separate issue does arise. Even if harm is unlikely, there is no good business reason for a company to leak information about its employees' browsing habits if the leak can easily be stopped. While there currently may not be much risk from these data streams, there is no guarantee that some new innovation in analytic capabilities will not change the cost-benefit equation.

As such, the next section of this paper explores several mitigation techniques.

VI. Mitigation

At a glance, there is no "business reason" to allow predictive advertising or to allow our data to be tracked. Once data is stored externally and transferred, there is not much that can be done to demand the data back or to limit the ability to use it. Regardless of whether we think there is a current business risk, if basic protections or methods to stop the flow of browsing and search data exist, they should be undertaken.

While no technique is perfect at ensuring no risk, we will explore several commonly mentioned techniques.

A. Training

Any good solution begins with training the workforce to better understand how they are tracked so that they have an incentive to prevent it. The survey conducted as part of this research indicates that there is room for better training. For example, 74% of respondents did not know if third-party cookies were enabled in their web browser. Only 30% felt confident that they knew the types of data that could be tracked by third parties on the web, and only 9% felt they understood the rules that govern how tracking organizations can transfer the data obtained.

All of these metrics could be increased by clearer communication of how data exchanges work while browsing the internet and how it might be relevant to confidential activity. Employees could be encouraged to pick options or install plugins to their browsers that make it less likely for their activities to be tracked. These options could be vetted by the company's information technology department, providing a firm recommendation instead of relying on each employee to come to their own determination of what technologies actually help. Companies could develop position statements as to how they hope employees conduct web activities.

B. Use of Browser Add-Ins

Both the literature and my own survey indicate that browser add-ins may be one of the most effective ways to block online tracking. In my poll of coworkers, those who indicated they used one of several add-ins at work reported much less tracking. Nearly every individual who reported that fewer than 10% of the 106 measured tracking cookies were currently stored in their browser also reported use of such add-ins. Similarly, of these users, 57% reported that fewer than 10% of tracking sites were following them and only 1 user indicated that more than 50% of the tracking sites were found. 40

40 Acar et al., “The Web Never Forgets: Persistent Tracking Mechanisms in the Wild.” 2014. https://securehomes.esat.kuleuven.be/~gacar/persistent/the_web_never_forgets.pdf



Popular add-ins listed included AdBlock, Ghostery, uBlock Origin, FlashBlock, and ForgetMe. Notably, several individuals reported using such technology at home but not in the office. This indicates that individuals may be more inclined to worry about privacy and browser tracking issues at home than at the office. Such behavior could be easily addressed by mentioning these tools in training given to new hires, so that individuals know that the company is interested in limiting the amount that employees are tracked online.

These tools are particularly powerful in that they can block both first and third parties from placing cookies onto a machine by explicitly preventing certain servers from loading content on a page. While many new web browsers allow a user to block third-party cookies altogether (in fact, new versions of Firefox include this behavior by default), this is not helpful in situations where one company can expect to have both first- and third-party access to a user. Companies can encourage employees to block third-party cookies, but by itself this method would not be completely effective. 41
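The server-blocking approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the logic of any particular add-in: the blocklist, the domains, and the `is_blocked` helper are all hypothetical, and real add-ins rely on much larger curated lists and richer matching rules.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; real add-ins ship much larger, curated lists.
BLOCKED_DOMAINS = {"tracker.example", "ads.example"}


def is_blocked(url, blocked_domains=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


# Requests a hypothetical page would make while loading.
page_requests = [
    "https://news.example/article.html",
    "https://cdn.tracker.example/pixel.gif",
    "https://ads.example/banner.js",
]
allowed = [u for u in page_requests if not is_blocked(u)]
print(allowed)
```

Because the request to a blocked server is never made, that server has no opportunity to set or read a cookie, regardless of whether it would be acting as a first or third party.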

If even more security is desired, Tor Browser is the only tool on the market with the ability to block Canvas Fingerprinting and other methods of unique browser identification without cookies. Because the Canvas API used in these fingerprinting requests is also a common part of web content, Tor Browser needs to request permission from the user every time a website attempts to use the browser's Canvas API. It is then the responsibility of the user to deny Canvas API requests that are not intentional, placing a significant burden on the user.

In today's marketplace, almost every web browser comes standard with a pop-up blocker. Users should keep these features enabled, as blocking pop-ups stops third-party sites from forcing themselves into a first-party position where they can place cookies from a pop-up window.

C. Clearing Cookies, Search and Other History

Clearing user data can be an effective way to limit the data tracked by organizations, but it will not be perfect. In my survey of coworkers, 79% reported clearing their work cookies "never" or "almost never". Training and/or corporate reminders could prompt employees to clear their cookies more often, but it should be noted that, through Evercookies, some trackers will still be able to piece together browsing information. Modern browsers have, however, become better at allowing users to clear other storage vectors along with cookies. Companies could provide users with a set of instructions for clearing the locations where Evercookies are typically stored.

41 "Firefox & Blocking 3rd-Party Cookies: What It Means For Affiliate Marketing." Marketing Land, 2013. http://marketingland.com/firefox-blocking-3rd-party-cookies-what-it-means-for-affiliate-marketing-36485

In order to limit tracking by search engines, employees should be encouraged to log out of personal accounts (or to use a separate browser) when conducting work searches. Google, in particular, also offers an option to delete your search history after the fact via its https://history.google.com/history/ portal. Users should be aware that, even once the direct connection to a user's identity is removed, search terms are still stored in the search engine's database, likely with the requesting IP address. This could be enough information to identify the company for which the employee works. Individuals should be mindful of their search terms and seek to generalize searches to get meaningful results without directly revealing confidential information. 42 43

As mentioned previously, companies such as SafeShepherd can be hired to investigate and remove the data stored by data brokers. 44

D. Incognito Mode

"Incognito" and similar "Private Browsing" modes have become popular in recent years. In short, these browsers offer to give the user some degree of privacy while searching the web. It is important for individuals to understand what these modes can and cannot do so they can tailor their behavior accordingly.

First, while private browsing mechanisms do create a separate session "unique" from general browsing, these sessions are still continuous, individual sessions. Tracking cookies can be placed by the parties a user interacts with during the private session, and these cookies in turn can track users as they travel across pages in private browsing mode.

42 "How to protect your privacy on Google." USA Today, 2013. http://www.usatoday.com/story/tech/columnist/komando/2013/05/17/google-maps-duckduckgo-web-history/2155759/
43 "Six Tips to Protect Your Search Privacy." Electronic Frontier Foundation, 2006. https://www.eff.org/wp/six-tips-protect-your-search-privacy
44 "SafeShepherd." https://www.safeshepherd.com/

Similarly, and surprisingly, Evercookies can still become lodged in Incognito mode, where they can respawn themselves automatically every time a new Incognito session is launched. This can be seen by visiting the Evercookie tool located at http://samy.pl/evercookie/. This website will place Evercookies in several of your browser's storage vectors, then attempt to show you how they are reactivated after cookies are cleared. By using this tool in Incognito mode, then closing and relaunching the mode, it is easy to see that Evercookies can persist and track an individual across Incognito sessions.
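The respawning behavior described above can be sketched in a few lines of Python. This is a simplified illustration rather than the actual Evercookie code: the three storage vector names are illustrative stand-ins, and the sketch models only reactivation after cookies are cleared, not how any vector survives between Incognito sessions.

```python
class Browser:
    """Toy browser with several independent storage vectors (assumed names)."""

    def __init__(self):
        self.storage = {
            "http_cookie": None,
            "local_storage": None,
            "flash_lso": None,
        }

    def clear_cookies(self):
        # A typical "clear cookies" action only touches the HTTP cookie store.
        self.storage["http_cookie"] = None


def evercookie_visit(browser, new_id):
    """Return the ID the site sees, rewriting it into every storage vector."""
    surviving = [value for value in browser.storage.values() if value is not None]
    # Reuse any copy that survived; only hand out new_id to a clean browser.
    user_id = surviving[0] if surviving else new_id
    for vector in browser.storage:
        browser.storage[vector] = user_id
    return user_id


browser = Browser()
first = evercookie_visit(browser, "user-123")
browser.clear_cookies()
second = evercookie_visit(browser, "user-456")
```

Because at least one copy of the identifier survives the clearing step, the second visit is recognized under the original ID and the cleared cookie is restored.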

Individuals should also be aware that systems they log into can still identify them while they are using private browsing. For example, Google searches performed while logged into a Google account will be associated with the user's search history regardless of the browser mode. From the server's perspective, it simply matters that the user is logged in. 45
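A minimal Python sketch of this server-side view follows. The session table, token, and account name are hypothetical, and the `private_mode` parameter exists only to make the point visible: it is assumed here that the server receives no such signal, so the parameter is deliberately unused.

```python
# Hypothetical server-side state: session token -> account name.
SESSIONS = {"token-abc": "alice@example.com"}
SEARCH_HISTORY = {}


def handle_search(session_token, query, private_mode):
    """Record a search against the logged-in account, if any.

    private_mode is included only for illustration and has no effect on
    the outcome; the login state alone decides whether the query is stored.
    """
    account = SESSIONS.get(session_token)
    if account is not None:
        SEARCH_HISTORY.setdefault(account, []).append(query)
    return account


handle_search("token-abc", "merger due diligence checklist", private_mode=True)
handle_search(None, "anonymous query", private_mode=True)
```

Only the search carrying a valid session token is attached to an account, whether or not the browser was in a private mode.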

Finally, private browsing does nothing to prevent an internet service provider such as Comcast or RCN from tracking the activity on their networks.

E. Opt-Out / Do Not Track

Despite good intentions, both the literature and personal experience suggest that opting out and signing up for "do not track" efforts do little to prevent browser tracking. One study of cookie syncing found that "with all cookies allowed, the impact of Do Not Track... only reduced the number of domains involved in synchronization by 2.9% and the number of IDs being synced by 2.6%," and later elaborated that "due to lack of industry enforcement, Do Not Track provides little practical protection against trackers." 46

The same paper explained that "we did not observe any website that stopped collecting canvas fingerprints due to opt-out" and that "most companies offering or honoring opt outs... do not promise to stop tracking when a user opts out, but only behavioral advertising." In short, tracking companies often seem to ignore their own opt-out promises. While members of coalitions such as the NAI are supposedly policed by those coalitions, this oversight clearly does not hold the majority of third parties to account.
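Part of the reason Do Not Track depends on enforcement is that it is only a request header (`DNT: 1`) sent by the browser; nothing in the mechanism itself compels the receiving party to act on it. The Python sketch below illustrates this under stated assumptions: the function names, the `honors_dnt` flag, and the user-agent string are hypothetical and do not describe any particular tracker.

```python
def build_request_headers(do_not_track: bool) -> dict:
    """Headers a browser might send; DNT: 1 signals the user's preference."""
    headers = {"User-Agent": "ExampleBrowser/1.0"}
    if do_not_track:
        headers["DNT"] = "1"
    return headers


def tracker_sets_cookie(headers: dict, honors_dnt: bool) -> bool:
    """Hypothetical third-party logic: honoring the header is voluntary."""
    if honors_dnt and headers.get("DNT") == "1":
        return False
    return True


headers = build_request_headers(do_not_track=True)
```

A tracker that chooses to honor the header skips the cookie, while one that does not behaves exactly as if the header had never been sent.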

45 "What Is the Advantage of Using an Incognito Window in Google Chrome?" OpposingViews. http://science.opposingviews.com/advantage-using-incognito-window-google-chrome-18496.html
46 Acar et al., "The Web Never Forgets: Persistent Tracking Mechanisms in the Wild." 2014. https://securehomes.esat.kuleuven.be/~gacar/persistent/the_web_never_forgets.pdf

Even if opt-out were effective, it suffers from an underlying theoretical flaw. In order to opt out of tracking by one particular website, a user must allow all third-party cookies in their browser so that the opt-out mechanism can identify whom to ignore. By allowing all third-party cookies, a user directly enables other third parties who do not offer opt-out to engage in tracking behaviors. In short: it is impossible to opt out of one site without enabling tracking from the hundreds of other parties on the web.
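This trade-off can be sketched in Python. The domains and the `load_page` function are hypothetical, and the model covers only cookie-based tracking; techniques such as canvas fingerprinting are outside its scope.

```python
def load_page(third_parties, allow_third_party_cookies, opted_out):
    """Return the set of third parties able to track on this page load.

    third_parties maps a domain to True if it offers (and honors) an opt-out.
    opted_out is the set of domains whose opt-out cookie the user tried to set.
    """
    if not allow_third_party_cookies:
        # With third-party cookies blocked, opt-out cookies cannot be stored
        # either, but cookie-based tracking is blocked for every third party.
        return set()
    tracking = set()
    for domain, offers_opt_out in third_parties.items():
        if offers_opt_out and domain in opted_out:
            continue  # opt-out cookie is present and respected
        tracking.add(domain)
    return tracking


third_parties = {
    "ads-a.example": True,   # offers an opt-out
    "ads-b.example": False,  # no opt-out offered
    "ads-c.example": False,  # no opt-out offered
}
with_opt_out = load_page(third_parties, True, {"ads-a.example"})
cookies_blocked = load_page(third_parties, False, {"ads-a.example"})
```

Opting out of the one cooperative party still leaves the other two free to set cookies, whereas blocking third-party cookies outright stops all three in this model.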

The failure of opt-out policies is one place where the FTC has had success in enforcement. In 2011, Chitika settled FTC charges that its opt-out actually lasted for only 10 days, deceiving consumers. 47
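The effect of a short opt-out lifetime is simple date arithmetic, sketched below in Python. The dates and the ten-year comparison lifetime are assumptions chosen for illustration; only the 10-day figure comes from the case described above.

```python
from datetime import datetime, timedelta


def opt_out_active(set_at: datetime, lifetime_days: int, now: datetime) -> bool:
    """True while the opt-out cookie has not yet expired."""
    return now < set_at + timedelta(days=lifetime_days)


opted_out_on = datetime(2011, 1, 1)             # hypothetical opt-out date
check_on = opted_out_on + timedelta(days=30)    # one month later

short_lived = opt_out_active(opted_out_on, 10, check_on)     # 10-day opt-out
long_lived = opt_out_active(opted_out_on, 3650, check_on)    # assumed 10-year opt-out
```

A month after opting out, the 10-day cookie has already expired and tracking would resume without any further action by the user.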

VII. Conclusions

Our activities online are systematically tracked, cataloged, and used as a business asset by the third parties on pages we browse. This process occurs while we browse in our private lives, and also while we browse in the office. While direct harm to businesses stemming from this data gathering seems unlikely, the simple truth is that there is no business justification for employers to allow this "data leakage" to occur when much of it can easily be prevented.

Third parties who collect this data are often restricted not by laws and regulations, but by their own decisions regarding how they believe they should treat customer privacy. It is not clear to what extent the first parties who embed third parties pay attention to these promises. Several privacy policies examined explicitly stated that the first party takes no ownership over what the embedded third party does with data. Even when one third party takes steps to protect user privacy, other third parties embedded within the same page may not.

47 "FTC Puts an End to Tactics of Online Advertising Company That Deceived Consumers Who Wanted to 'Opt Out' from Targeted Ads." FTC, 2011. https://www.ftc.gov/news-events/press-releases/2011/03/ftc-puts-end-tactics-online-advertising-company-deceived

Companies should be aware of how data flows passively out of their systems, should actively decide when and if they care, and should take action if needed. Several pieces of "low hanging fruit" have been identified, including minor changes in training practices and the quick installation of browser add-ins that help stop tracking. While these routes are not perfect, they are likely adequate to move the existing "unlikely" risk into the realm of "negligible" risk.

Finally, it is in the interest of all parties to better understand whose "role" it is to protect users and businesses while they browse the web. Arguments could be made for regulators, web browsers, or individuals themselves to hold the ultimate responsibility. At the same time, third-party tracking does provide legitimate business benefits to advertisers, and as such it may play a role in enabling much of the free content that users consume on the web. Future work could study the balance between "personal information" exchanged for "free content" on the web.

Future work could also investigate whether the prevalence of user tracking is in part a "market failure" due to the constant information asymmetry between tracker and browser. That is, if users were made aware of exactly how much they were being tracked while they were being tracked, they might make different decisions regarding the sites that they visit and the companies they interact with. This could then push first-party content generators to rethink their partnerships and the privacy policies of the third parties they embed.
