Privacy Implications of Online Data Collection
Lorrie Faith Cranor
AT&T Labs-Research
http://www.research.att.com/~lorrie/
DIMACS Workshop
2
Recent headlines
Doubleclick shelves plan to tag Web surfers
Clinton Issues Privacy Warning To Technology Leaders
Websites Pull Back From Doubleclick
Senators Raise Privacy Issue In AOL-Time Warner Hearing
Activists charge DoubleClick double cross
3
Online profiling in the comics!
Cathy March 1, 2000
4
How do they get my data?
Browsers advertise
  IP address, domain name, organization, referring page
  platform: O/S, browser
  which information is requested
Information available to
  end servers
  local system administrators
  other third parties (e.g., doubleclick.com)
Cookies, Web bugs, advertising networks
5
Browsers like to chatter
A typical HTTP request
  GET http://www.amazon.com/ HTTP/1.0
  User-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4m)
  Host: www.amazon.com
  Referer: http://www.alcoholics-anonymous.org/
  Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
  Cookie: session-id-time=868867200; session-id=6828-2461327-649945; group_discount_cookie=F
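To make the "chatter" concrete, here is a minimal sketch that splits the request from this slide into the fields a server can read. The function names `parse_request` and `parse_cookies` are illustrative, not part of any library, and the parsing is deliberately simplified (no header folding, no repeated headers).

```python
# Sketch only: a simplified parser for the HTTP request shown on the slide.
REQUEST = (
    "GET http://www.amazon.com/ HTTP/1.0\r\n"
    "User-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4m)\r\n"
    "Host: www.amazon.com\r\n"
    "Referer: http://www.alcoholics-anonymous.org/\r\n"
    "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\n"
    "Cookie: session-id-time=868867200; session-id=6828-2461327-649945; "
    "group_discount_cookie=F\r\n"
    "\r\n"
)


def parse_request(raw):
    """Return (method, target, version, headers) for a raw HTTP request."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


def parse_cookies(value):
    """Split a Cookie header value into a name -> value dict."""
    cookies = {}
    for pair in value.split(";"):
        name, _, val = pair.strip().partition("=")
        if name:
            cookies[name] = val
    return cookies


method, target, version, headers = parse_request(REQUEST)
cookies = parse_cookies(headers.get("cookie", ""))
print("Platform and browser:", headers["user-agent"])
print("Previous page:", headers["referer"])
print("Returning visitor id:", cookies["session-id"])
```

Without the user doing anything, the server learns the operating system, the browser, the page the user came from, and an identifier that links this visit to earlier ones.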
6
Servers record what they hear
Server logs
  store host, time, date, requested URL, referrer
  ppp.bu.edu - - [09/Dec/1996:20:33:22 -0500] "GET /cgi-bin/wwwais?hemoglobin+gene HTTP/1.0" 200 527
  affiliation: Boston University, probably working from home, probably student or faculty in biology
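The inference on this slide can be reproduced mechanically. The sketch below parses the log entry with a regular expression, assuming it follows the Common Log Format; the entry is normalized for that purpose (straight quotes, `GET` in upper case, a `-0500` time zone, and a space before `HTTP/1.0`). The name `parse_log_line` is illustrative.

```python
import re
from datetime import datetime
from urllib.parse import unquote_plus

# The slide's log entry, normalized to Common Log Format (an assumption).
LOG_LINE = ('ppp.bu.edu - - [09/Dec/1996:20:33:22 -0500] '
            '"GET /cgi-bin/wwwais?hemoglobin+gene HTTP/1.0" 200 527')

CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Split one log entry into named fields and decode the search terms."""
    match = CLF.match(line)
    if match is None:
        raise ValueError("not a Common Log Format line")
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    path, _, query = entry["url"].partition("?")
    entry["path"] = path
    entry["search_terms"] = unquote_plus(query).split()
    return entry


entry = parse_log_line(LOG_LINE)
print(entry["host"])          # dial-up host in the bu.edu domain
print(entry["time"])          # when the request was made
print(entry["search_terms"])  # what the visitor searched for
```

The host name suggests the affiliation and the dial-up connection, and the query string suggests the field of interest.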
7
What about cookies?
Cookies can be useful
  used like a staple to attach multiple parts of a form together
  used to identify you when you return to a web site so you don’t have to remember a password
  used to help web sites understand how people use them
Cookies can be harmful
  used to profile users and track their activities, especially across web sites
8
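[Diagram: YOU visit a search engine and search for medical information; an ad on the page sets a cookie. YOU then visit a book store and buy a book; an ad on that page reads the cookie. The ad company can get your name and address from the book order and link them to your search.]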
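The linking step in the diagram can be sketched as a small simulation. Everything here is hypothetical: the classes `AdNetwork` and `Browser`, the domains `search.example`, `books.example`, and `ads.example`, and the URLs. The sketch assumes the ad company sees the address of the page each ad is embedded in, as the Referer header would provide.

```python
import itertools


class AdNetwork:
    """Hypothetical third-party ad server embedded on many sites."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.profiles = {}  # cookie value -> pages on which an ad was served

    def serve_ad(self, cookie, referer):
        # First contact: no cookie yet, so assign one ("set cookie").
        if cookie is None:
            cookie = "visitor-%d" % next(self._next_id)
        # Later contacts: the same cookie comes back ("read cookie").
        self.profiles.setdefault(cookie, []).append(referer)
        return cookie


class Browser:
    """Keeps one cookie per ad domain and returns it on every ad request."""

    def __init__(self):
        self.cookie_jar = {}

    def visit(self, page_url, ad_domain, ad_network):
        cookie = self.cookie_jar.get(ad_domain)
        self.cookie_jar[ad_domain] = ad_network.serve_ad(cookie, page_url)


def demo():
    ads = AdNetwork()
    you = Browser()
    you.visit("http://search.example/find?q=medical+information",
              "ads.example", ads)
    you.visit("http://books.example/order?name=Tom+Jones&address=here+there",
              "ads.example", ads)
    return ads.profiles


for visitor, pages in demo().items():
    print(visitor, pages)
```

Both visits end up under one cookie value, so the search and the book order are joined in a single profile even though the two sites never exchanged data.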
9
Referer log problems
GET methods result in values in URL
These URLs are sent in the referer header to next host
Example:
  http://www.merchant.com/cgi_bin/order?name=Tom+Jones&address=here+there&credit+card=234876923234&PIN=1234& -> index.html
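A short sketch of what the next host can recover from such a referer value, using Python's standard `urllib.parse` module. The function name `leaked_fields` is illustrative; the URL is the one from the slide.

```python
from urllib.parse import urlsplit, parse_qsl

# Referer value received by the next host, taken from the slide's example.
REFERER = ("http://www.merchant.com/cgi_bin/order?"
           "name=Tom+Jones&address=here+there"
           "&credit+card=234876923234&PIN=1234&")


def leaked_fields(referer):
    """Decode the form fields carried in a referer URL's query string."""
    query = urlsplit(referer).query
    return dict(parse_qsl(query))


for field, value in leaked_fields(REFERER).items():
    print(field, "=", value)
```

Forms submitted with POST keep their values out of the URL, which is one reason order forms should not use GET.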
10
What DoubleClick knows…
… about Richard M. Smith
Personal data:
  My Email address
  My full name
  My mailing address (street, city, state, and Zip code)
  My phone number
Transactional data:
  Names of VHS movies I am interested in buying
  Details of a plane trip
  Search phrases used at search engines
  Health conditions
11
No clicks required
“It was not necessary for me to click on the banner ads for information to be sent to DoubleClick servers.”
– Richard M. Smith
http://www.tiac.net/users/smiths/privacy/banads.htm
12
DoubleClick examples
AltaVista Yellow Pages – Complete home address (Fixed January 2000)
Banner ad URL: http://live.av.com/scripts/search.dll?ep=7&gca=address&orderby=distance&sstreet=172+mason+terr&scity=brookline&sstate=MA&szip=02446&scountry=USA&query=sinsa&qname=&sic=&ck=&userid=130782922&userpw=.&uh=130782922,0,&ccity=brookline&cstate=MA&ver=hb1.2.2
Travelocity – Email address
Referring URL: http://dps1.travelocity.com/[email protected]
13
Merging online and offline data
In mid-February DoubleClick announced plans to merge “anonymous” online data with personal information obtained from offline databases
By the first week in March the plans were put on hold
14
Public concern
April 1997 Louis Harris Poll of Internet users
  5% say they have been the victim of an invasion of privacy while on the Internet
  53% say they are concerned that information about which sites they visit will be linked to their email address and disclosed without their knowledge
See also “Beyond Concern” study: http://www.research.att.com/projects/privacystudy/
15
International issues
European Union Data Directive prohibits secondary uses of data without informed consent
  Creating personally-identifiable online profiles will have to be opt-in in most cases
  Upfront notice must be given when data is collected – no web bugs
  No transfer of data to non-EU countries unless there is adequate privacy protection
16
Children's issues
Children’s Online Privacy Protection Act (COPPA) requires parental consent before collecting personally-identifiable data from children online
17
Subpoenas
Data on online activities is increasingly of interest in civil and criminal cases
The only way to avoid subpoenas is to not have data
Your files on your computer in your home have much greater legal protection than your files stored on a server on the network
18
Privacy concerns
Data is often collected silently
  Web allows lots of data to be collected easily, cheaply, unobtrusively and automatically
  Individuals not given meaningful choice
Data from many sources may be merged
  Even non-identifiable data can become identifiable when merged
Data collected for business purposes may be used in civil and criminal proceedings
19
Some solutions
Privacy policies
Voluntary guidelines and codes of conduct
Seal programs
Infomediaries
Technologies for facilitating notice and choice
  P3P
20
P3P1.0 – A First Step
Offers an easy way for web sites to communicate about their privacy policies in a standard machine-readable format
  Can be deployed using existing web servers
This will enable users to use tools that:
  Display symbols, play sounds, or provide snapshots of sites’ policies
  Display symbols or prompts after comparing policies with user preferences
21
P3P is a Partial Solution
P3P1.0 helps users understand privacy policies but is not a complete solution
  Seal programs and regulations help ensure that sites comply with their policies
  Anonymity tools reduce the amount of information revealed while browsing
  Encryption tools secure data in transit and storage
  Laws and codes of practice provide a baseline level for acceptable policies
22
Implementing a P3P 1.0 Server
Formulate privacy policy
Translate privacy policy into P3P format
Place P3P policy on web site
  One policy for entire site or multiple policies for different parts of the site
Associate policy with web resources:
  Configure server to insert P3P header with link to P3P policy; or
  Insert link to P3P policy in HTML content
23
A simple HTTP transaction
Request web page (to Web Server)
  GET /x.html HTTP/1.1
  Host: foo.com
  . . .
Send web page (from Web Server)
  HTTP/1.1 200 OK
  Content-Type: text/html
  . . .
24
A simple HTTP transaction
With P3P 1.0 added
Request web page (to Web Server)
  GET /x.html HTTP/1.1
  Host: foo.com
  . . .
Send web page (from Web Server)
  HTTP/1.1 200 OK
  Opt: http://www.w3.org/2000/P3Pv1/; ns=11
  11-Policy: http://foo.com/p3p.xml
  Content-Type: text/html
  . . .
Request P3P Policy (to Web Server)
  GET /p3p.xml HTTP/1.1
  Host: foo.com
  . . .
Send P3P Policy (from Web Server)
  HTTP/1.1 200 OK
  . . .
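A client's first step in this exchange is to find the policy link in the response headers. The sketch below assumes the response carries an `Opt` header that names the P3P extension and declares a two-digit prefix (`ns=11`), followed by a `11-Policy` header holding the policy URL; that reading of the slide's header is an assumption, and the function name `find_policy_url` is illustrative.

```python
# Sketch only: locate the P3P policy URL announced in a response's headers.
P3P_EXTENSION = "http://www.w3.org/2000/P3Pv1"

RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Opt: http://www.w3.org/2000/P3Pv1/; ns=11\r\n"
    "11-Policy: http://foo.com/p3p.xml\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>...</html>"
)


def find_policy_url(raw_response, extension=P3P_EXTENSION):
    """Return the policy URL from the headers, or None if none is declared."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    opt = headers.get("opt")
    if opt is None:
        return None
    uri, _, params = opt.partition(";")
    if uri.strip().strip('"').rstrip("/") != extension:
        return None

    # The ns parameter gives the prefix used by the extension's own headers.
    prefix = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "ns":
            prefix = value.strip()
    if not prefix:
        return None
    return headers.get(prefix + "-policy")


print(find_policy_url(RESPONSE))
```

The client would then issue an ordinary GET for the returned URL, which is the second request in the exchange.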
25
Implementing a P3P1.0 Client
Client can be implemented as browser, proxy, plug-in, part of an electronic wallet, Java applet, JavaScript, etc.
  Can be entirely server side
Look for link to P3P policy and fetch policy with HTTP GET request
Parse policy and take appropriate action
  Display symbol, play sound, prompt user, etc.
  Action can optionally be based on user preferences
  Action can optionally allow data to be automatically filled into form or transferred from electronic wallet
26
Some P3P Client Ideas
Symbols for how data is used
  complete transaction
  R&D
  customization
  marketing
Symbols to indicate whether data is shared
Symbols to indicate site has privacy seal
Symbols to indicate compliance with laws and regulations
  complies with German law
  complies with German law if user gives informed consent
  does not comply with German law
Symbols to indicate match/mismatch with user preferences
  information about cause of mismatch on mouse-over
27
P3P Policies
Machine-readable (XML) version of web site privacy policies
Use P3P Vocabulary to express data practices
Use P3P Base Data Set to express type of data collected
Capture common elements of privacy policies but may not express everything (sites may provide further explanation in human-readable policies)
28
The P3P Vocabulary
Who is collecting data?
What data is collected?
For what purpose will data be used?
Is there an ability to opt in to or opt out of some data uses?
Who are the data recipients (anyone beyond the data collector)?
To what information does the data collector provide access?
What is the data retention policy?
How will disputes about the policy be resolved?
Where is the human-readable privacy policy?
29
Example Privacy Policy
TheCoolCatalog of 123 Main Street, Bethesda, MD 20814, USA, makes the following statement for the Web page at http://www.TheCoolCatalog.com/catalog/. We have a privacy seal from PrivacySeal.org. Our privacy policy is posted at http://www.TheCoolCatalog.com/PrivacyPractice.html. We do not provide access capabilities to information we have about you.
We use cookies and collect your gender, information about your clothing preferences, and (optionally) your home address to customize our entry catalog pages and for our own research and product development. We retain this information indefinitely.
We also maintain server logs that include information about visits to the http://www.TheCoolCatalog.com/catalog/ page, and the types of browsers our visitors use. We use this information in order to maintain and improve our web site. We retain this information indefinitely.
30
P3P/XML Encoding
<POLICY xmlns="http://www.w3.org/2000/P3Pv1"
        entity="TheCoolCatalog, 123 Main Street, Bethesda, MD 20814, USA">
  <DISPUTES-GROUP>
    <DISPUTES resolution-type="independent"
              service="http://www.PrivacySeal.org"
              description="PrivacySeal.org"
              image="http://www.PrivacySeal.org/Logo.gif"/>
  </DISPUTES-GROUP>
  <DISCLOSURE discuri="http://www.TheCoolCatalog.com/PrivacyPractice.html" access="none"/>
  <STATEMENT>
    <CONSEQUENCE-GROUP>
      <CONSEQUENCE>a site with clothes you would appreciate</CONSEQUENCE>
    </CONSEQUENCE-GROUP>
    <RECIPIENT><ours/></RECIPIENT>
    <PURPOSE><custom/><develop/></PURPOSE>
    <RETENTION><indefinitely/></RETENTION>
    <DATA-GROUP>
      <DATA name="dynamic.cookies" category="state"/>
      <DATA name="dynamic.miscdata" category="preference"/>
      <DATA name="user.gender"/>
      <DATA name="user.home." optional="yes"/>
    </DATA-GROUP>
  </STATEMENT>
  <STATEMENT>
    <RECIPIENT><ours/></RECIPIENT>
    <PURPOSE><admin/><develop/></PURPOSE>
    <RETENTION><indefinitely/></RETENTION>
    <DATA-GROUP>
      <DATA name="dynamic.clickstream.server"/>
      <DATA name="dynamic.http.useragent"/>
    </DATA-GROUP>
  </STATEMENT>
</POLICY>
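As a rough illustration of the "parse policy and take appropriate action" step, the sketch below reads this encoding with Python's standard `xml.etree.ElementTree` and compares it with a user preference. The names `summarize` and `mismatches`, and the form of the preference (a set of purposes the user does not accept), are assumptions made for illustration; real P3P clients may express preferences differently.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2000/P3Pv1}"

# The TheCoolCatalog policy from the slide, with straight quotes throughout.
POLICY_XML = """
<POLICY xmlns="http://www.w3.org/2000/P3Pv1"
        entity="TheCoolCatalog, 123 Main Street, Bethesda, MD 20814, USA">
  <DISPUTES-GROUP>
    <DISPUTES resolution-type="independent"
              service="http://www.PrivacySeal.org"
              description="PrivacySeal.org"
              image="http://www.PrivacySeal.org/Logo.gif"/>
  </DISPUTES-GROUP>
  <DISCLOSURE discuri="http://www.TheCoolCatalog.com/PrivacyPractice.html"
              access="none"/>
  <STATEMENT>
    <CONSEQUENCE-GROUP>
      <CONSEQUENCE>a site with clothes you would appreciate</CONSEQUENCE>
    </CONSEQUENCE-GROUP>
    <RECIPIENT><ours/></RECIPIENT>
    <PURPOSE><custom/><develop/></PURPOSE>
    <RETENTION><indefinitely/></RETENTION>
    <DATA-GROUP>
      <DATA name="dynamic.cookies" category="state"/>
      <DATA name="dynamic.miscdata" category="preference"/>
      <DATA name="user.gender"/>
      <DATA name="user.home." optional="yes"/>
    </DATA-GROUP>
  </STATEMENT>
  <STATEMENT>
    <RECIPIENT><ours/></RECIPIENT>
    <PURPOSE><admin/><develop/></PURPOSE>
    <RETENTION><indefinitely/></RETENTION>
    <DATA-GROUP>
      <DATA name="dynamic.clickstream.server"/>
      <DATA name="dynamic.http.useragent"/>
    </DATA-GROUP>
  </STATEMENT>
</POLICY>
"""


def local(tag):
    """Strip the P3P namespace from an element tag."""
    return tag[len(NS):] if tag.startswith(NS) else tag


def summarize(policy_xml):
    """Reduce a policy to who collects what, why, and for how long."""
    root = ET.fromstring(policy_xml.strip())
    statements = []
    for st in root.findall(NS + "STATEMENT"):
        statements.append({
            "purposes": [local(e.tag) for e in st.find(NS + "PURPOSE")],
            "recipients": [local(e.tag) for e in st.find(NS + "RECIPIENT")],
            "retention": [local(e.tag) for e in st.find(NS + "RETENTION")],
            "data": [d.get("name") for d in st.iter(NS + "DATA")],
        })
    return {
        "entity": root.get("entity"),
        "access": root.find(NS + "DISCLOSURE").get("access"),
        "statements": statements,
    }


def mismatches(summary, unwanted_purposes):
    """List (data element, purpose) pairs that conflict with the preference."""
    found = []
    for st in summary["statements"]:
        for purpose in st["purposes"]:
            if purpose in unwanted_purposes:
                found.extend((name, purpose) for name in st["data"])
    return found


summary = summarize(POLICY_XML)
for name, purpose in mismatches(summary, {"custom"}):
    print("mismatch:", name, "is used for", purpose)
```

A client could use the returned pairs to choose which symbol to display or to explain the cause of a mismatch to the user.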
31
PrivacyBank.Com
[Screenshot: PrivacyBank bookmark]
32
Infomediary example: PrivacyBank
[Screenshot: PrivacyBank bookmark]
33
Challenge
Data is useful for research, targeting potential customers, building relationships with customers, etc.
Privacy laws make data collection more difficult
Data collectors have personal privacy concerns too
How can we collect data in ways that reduce privacy concerns while keeping the data useful for research and business?