identity vs reputation


Upload: spacemonkeymike

Post on 13-Jan-2015


DESCRIPTION

A CMSG white paper on user privacy


Identity vs Reputation

What You Will Learn

This paper covers individual identities on the WWW and how tracking users’ interactions can improve their experience without sacrificing their privacy.

Introduction

Web users have become increasingly savvy about protecting their identity and privacy. At the same time, web site operators have become savvy about amassing large amounts of customer data and finding trends to customize user experiences and offerings. To be successful at this, web site operators need to respect the privacy needs of their users while collecting the information they need to improve their business. Violation of users’ privacy can result in the loss of the customer as well as government intervention. This paper provides one approach to meeting these seemingly conflicting goals.

Definitions

Defining the terms around identity and privacy is of critical importance.

Online Identity: A person’s distinct individual online persona. It usually doesn’t include any Personally Identifiable Information (PII) but consists of a shell – including a nickname and an avatar.

Authentication: The process in which a person’s identity is confirmed online using a verifiable source to admit them into an online community or website.

Verifiable Source: A verifiable source may be as simple as providing an email address, or may be as significant as providing a credit card number.

Authorization: The process by which a person becomes approved to enter a website or program, usually with a user name and/or password.

Personally Identifiable Information (PII): A term used in privacy and legal fields that refers to any information that can identify a person as a specific individual, such as name, postal or email address, phone number, occupation, or personal interests. It does not include web pages viewed or links clicked on, web search terms, time spent on a site, response to advertisements, or system settings such as the browser used, speed of connection and zip code.

Sensitive Personal Information (SPI): Any information that would permit access to a person’s financial account, including account number, credit or debit card number, in combination with any required security code, access code or password.


Why Privacy is Important / Why it Matters

The rise of web-based applications has made it easy for companies to determine information about their customers, ranging from basic demographics to personal preferences. While this information can be gathered explicitly through surveys and forms, it can also be inferred from the user’s actions. The results may benefit the user in the end, but the method may make them uncomfortable and cause them to leave the website. The ultimate challenge is balancing the needs of both parties. In the end, though, privacy is measured by the end consumer’s reaction to their experience.

What The User Reveals

To meet the requirements of the Children’s Online Privacy Protection Act (COPPA) of 1998, sites may ask users to enter their birthday to verify their eligibility to access certain content. Users are comfortable revealing this and other basic demographics in order to access many of their favorite sites. They do, however, make a conscious decision to limit what they reveal on a site to what they feel is necessary for the experience. When prompted for information that they feel is unnecessary, users will typically provide incorrect information, such as a false birthday or gender.

At the same time, when it comes to social networking sites such as LinkedIn and Facebook, there is a social norm which causes people to reveal much more accurate information. When there are personal relationships involved, people feel compelled to provide their real birth date or gender information. When pictures can be uploaded, the accuracy of the basic information increases even more since deceptions are more likely to be uncovered.

Beyond the basic information, the accuracy of what users reveal about themselves is much more impacted by social status and peer pressure than anything else. Stereotypes can be readily found in individual profiles: for example, men expressing interest in action movies and sports, college students talking about parties, and women liking romantic movies.

Another area that causes concern for individuals is what they reveal from a financial perspective. As a result, they often provide false information. Beyond the basic PII information, users may misrepresent their financial status to boost their self-esteem or to assert themselves as a member of a particular group. Ironically, this is one piece of information that companies are most interested in to ensure that they target the right product to the right user.

The most useful information is what a user does when online. Some of the obvious examples are purchasing choices that are indicative of gender, such as a purse or a wallet. More subtle ones come from participation in groups that have an obvious bias, such as a retirees’ discussion group, or a visit to an ecological travel site. These actions, when combined with photos that a person may have on their Flickr account or messages posted to online discussion groups, can provide a more complete understanding of an individual.

What is significant here is that the information doesn’t have to come from the user directly. The ability of Facebook users to tag a photo with all the people in it means that this information can be made available without the user taking any action.


Historical Mistakes

The stakes are huge for companies to get identity and privacy right. Over the past few years, a number of high-profile incidents where PII or SPI was accidentally revealed to the public have been broadly publicized.

In 2006, America Online (AOL) released the records of 20 million search queries entered by approximately 650,000 of its users over a three-month period. While the users were not personally identified, per se, their searches contained a wealth of PII. Within only a few days, New York Times journalists had determined the identities of many of the searchers and, with permission, revealed the identity of one of the users [1]. That user, a 62-year-old Georgia woman, had conducted over 300 searches that were traced back to her, some of which were embarrassing to her. The AOL incident was devastating to the company.

Similarly in 2006, Netflix released over 100 million movie ratings made by 500,000 of the company’s subscribers. To protect its customers’ privacy, the data was made anonymous by removing any personal details. Only a few weeks later, Arvind Narayanan and Vitaly Shmatikov announced that they had de-anonymized the data by comparing it with publicly available ratings on the Internet Movie Database (IMDb) [2].
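The reasoning behind this kind of de-anonymization can be illustrated with a toy linkage example. The sketch below is a deliberate simplification, not the method Narayanan and Shmatikov used: it assumes exact matches on (title, rating) pairs, and every name, record, and threshold in it is invented for illustration.

```python
# Toy linkage attack: match "anonymized" rating records to public profiles
# by counting shared (title, rating) pairs. All data below is invented.

def best_match(anon_ratings, public_profiles, min_overlap=3):
    """Return the public name whose ratings overlap most with anon_ratings,
    or None if no profile shares at least min_overlap (title, rating) pairs."""
    best_name, best_score = None, 0
    for name, ratings in public_profiles.items():
        score = len(anon_ratings & ratings)  # size of the set intersection
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_overlap else None

# Records with personal details removed (hypothetical).
anonymized = {
    "subscriber_0417": {("Film A", 5), ("Film B", 2), ("Film C", 4), ("Film D", 1)},
    "subscriber_0932": {("Film A", 3), ("Film E", 5)},
}

# Ratings posted under a public name elsewhere (hypothetical).
public = {
    "alice_reviews": {("Film A", 5), ("Film B", 2), ("Film C", 4)},
    "bob_reviews": {("Film E", 4), ("Film F", 2)},
}

for record_id, ratings in anonymized.items():
    print(record_id, "->", best_match(ratings, public))
```

Even in this small example, three matching ratings are enough to tie one "anonymous" record to a public name, which is why removing personal details alone does not make behavioral data anonymous.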

Most recently, Facebook faced an uproar of criticism over its Beacon advertising program, which pulls information from external websites and shares that information with Facebook users’ friends. Controversy over privacy swiftly followed Beacon’s launch because the mechanism to opt in to or out of the program was not clear. Fortunately for Facebook, the concern over Beacon did not doom the program. In fact, it continues to operate today, but with a higher level of control given to end users to permit the sharing of their information.

Customer Benefits

While data gathering primarily provides feedback to advertisers and content providers about trends and product interest, it also provides a significant benefit to all users. When users express similar interests, content providers can respond by creating new products or modifying old products to meet the newly discovered interests.

This is most evident in the local grocery store, which pays extremely close attention to the aggregate buying habits of its customers in order to ensure that the right products are always on the shelves. No company wants to create a product for a single user or even track the habits of one person. Companies are looking for a solution that will maximize profits through the largest audience possible.

At the same time, users expect a more personalized, intuitive experience. Users see a real improvement in their experience when they perceive value in return for what they reveal. Users are willing to let Amazon track their purchasing habits because they get better recommendations as a result. They provide accurate ratings to Netflix in order to improve the quality of the movies that it suggests to them. The key to all of this is making it obvious to the user that they are the ones benefiting.

With this in mind, it is important for web site operators to remember that personal data belongs to the end user. If the user perceives sufficient value in providing the information, they will readily reveal it. Users who are forced to reveal information before they are ready will either provide inaccurate information or go elsewhere. In either case the loss falls only on the web site, which ends up with polluted trend data or a lost customer. A better approach is to let the user clearly retain their privacy, stay with the site, and accept lower-quality recommendations.

Eventually, when the user hears about or otherwise realizes the value of sharing, they will gladly provide accurate information. In return, these users expect that information to be kept private. Users react when this trust is broken, and when that reaction becomes an uproar, the government gets involved and creates new laws to ensure that privacy is protected.

[Figure: Customer Driven Quality Improvement Process. Labels: Customer Listening, Define Goals, Best Practices, Metrics, Monitor Results.]


Privacy Laws and Reactions

Privacy laws in the United States and across the globe are inconsistent and continue to evolve. In contrast to the European Union, in the United States there is no over-arching privacy law in place. Instead, the United States takes a more laissez-faire approach that targets specific sectors, relying on a combination of legislation, regulation, and self-regulation. For example, U.S. laws are in place to address medical privacy, financial institution privacy and children’s privacy.

The EU has a comprehensive law [4] reflecting the EU’s philosophy that while data processing is beneficial, an individual’s fundamental privacy rights must be protected. Many consider the EU to have the most restrictive privacy laws of any jurisdiction worldwide. Importantly, the EU regulations are implemented by each individual member state, which has led to different interpretations and governing regulations.

While privacy has historically been given low priority in Asia, economic concerns, in particular the desire to establish consumer trust in online commerce, have driven a surge of interest in privacy there (see “Asia: the new Thought Leader in Privacy?”). In 2004, the Asia Pacific Economic Cooperation group (APEC) approved a set of non-binding privacy principles to assist governments in passing comprehensive privacy legislation. In contrast, Central American nations tend to take a sectoral approach to privacy laws [5] [6].

Privacy laws and regulations continue to react to the marketplace, with new technologies and processes leading to more stringent regulation. For instance, the recent emergence of behavioral targeting has raised the ire of privacy regulators. Service providers, along with two companies, Phorm in the UK and NebuAd in the US, have recently found themselves embroiled in controversy over plans to target customers with advertisements based on their prior web surfing behaviors [7] [8]. Both companies planned to install deep packet inspection equipment on ISP networks that would monitor subscribers’ online activities, build behavioral profiles, and sell the profiles to advertisers, who could use them to deliver targeted ads.

Privacy regulators in the EU and United States questioned whether the companies obtained informed consent from end users. BT deployed its system without the knowledge of affected users. European Union Communications Commissioner Viviane Reding voiced her concern that the practice breached the UK’s Privacy and Electronic Communications Regulations 2003 (PECR), which implement European directives on wiretapping, saying “[i]t is very clear in E.U. directives that unless someone specifically gives authorization (to track consumer activity on the Web) then you don’t have the right to do that.”

Charter Communications notified affected customers, who could opt out of the program. However, public interest groups claimed the opt-out system did not prevent users’ activities from being monitored. Two members of the United States House of Representatives wrote a letter to Charter expressing their concern that “[a]ny service to which a subscriber does not affirmatively subscribe and that can result in the collection of information about the web-related habits and interests of a subscriber, and achieves any of these results without the ‘prior written or electronic consent of the subscriber,’ raises substantial questions related to Section 631 [of the Communications Act].”

Behavioral targeting has advanced over the years to provide a much more complete view of users’ behaviors. While the behavioral targeting industry has attempted to educate consumers on the benefits of having content tailored to individuals, there are still many concerns over transparency, the ability to opt out easily, and how opt-out data is discarded. As a result, regulators and lawmakers have proposed legislation and regulation to address the privacy concerns around behavioral targeting.


Stratification

It is possible for the industry to meet all of these regulations, the desires of a company, and the needs of the user by taking a layered approach to the information about a user and the collective actions of a community. To accomplish this, the concept of who a user is can be broken down into three levels.

1. Identity: used for authentication and authorization
2. Profile (or Persona): used to describe an individual
3. Uniquity: unique identifier used to collect actions

Identity

At the highest level is a user’s identity. This is how a user proves that they are who they say they are. It is often represented as a combination of a userid and a password, but it can also be authenticated through identification cards, biometrics, certificates, encryption keys, or other security mechanisms. To a user, this is the most valuable thing that they have, because if someone else gets it from them, the user stands to lose a lot. Given their value, these authentication credentials are a common target of phishing attacks.

This identity is often shared among multiple web sites, particularly when the identity a site depends on by default is an email address to which the site sends a verification message, a process that requires no third-party involvement. OpenID technologies allow a user to use a common identity to access multiple sites. One weakness of using an email address and password combination to authenticate a user is that a compromise of any one site may lead to the user’s identity being compromised on multiple sites.

To a web site operator, the identity portion has very little value other than to authenticate the user. However, protecting it requires attention to security of not just the data but the actual mechanisms of authentication. This is necessary to give confidence to the end user that their personal information can’t be compromised.
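One common way to limit the damage of a compromised site is to store only a salted, slow hash of each password rather than the password itself. The paper does not prescribe a mechanism, so the sketch below is an assumption: it uses PBKDF2 from the Python standard library, and the iteration count and function names are illustrative choices only.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for the deployment hardware

def make_verifier(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the password itself."""
    salt = os.urandom(16)  # a fresh random salt for every stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = make_verifier("correct horse battery staple")
print(check_password("correct horse battery staple", salt, digest))
print(check_password("wrong guess", salt, digest))
```

Because each site would use its own random salt, a stolen record from one site cannot be compared directly against records from another, although it does not stop an attacker who obtains the password itself through phishing.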

Profile

Given the identity, the user has access to their profile(s), where all of their PII and information about their friends, interests, groups, and preferences is stored. This information is still valuable to the user: if it is hacked, someone can impersonate the user, with the amount of risk depending on how much SPI is taken.

It is important to note that the relationship between an Identity and a Profile is one way. Given an Identity, it is possible to determine a profile, but starting from a profile does not yield the credentials that the user gave to create it. This one-way relationship also means that a user may have multiple profiles based on the situation that they are in. For example, the user may choose to have a different public name or picture on a team’s fan site versus when they are on a cooking-related site.
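One possible way to realize this one-way relationship is to derive the profile key from the identity with a keyed hash, so that the profile store never holds the identity or its credentials. The sketch below is only an illustration under that assumption; the secret, the identifiers, and the function names are all hypothetical and do not come from the paper.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice it would live in a key store.
SERVER_KEY = b"example-only-secret"

def profile_key(identity_id: str, context: str) -> str:
    """Derive a per-context profile key from an identity with a keyed hash."""
    message = f"{identity_id}|{context}".encode("utf-8")
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()

profiles = {}  # profile key -> profile data; no credentials are stored here

def save_profile(identity_id, context, nickname, avatar):
    profiles[profile_key(identity_id, context)] = {"nickname": nickname, "avatar": avatar}

def load_profile(identity_id, context):
    return profiles.get(profile_key(identity_id, context))

# One identity, two profiles for two different situations.
save_profile("user-1001", "sports-fan-site", "GoalGetter", "ball.png")
save_profile("user-1001", "cooking-site", "SousChef", "whisk.png")

print(load_profile("user-1001", "cooking-site")["nickname"])  # SousChef
```

Starting from the identity, the matching profile can be looked up, but someone holding only the profile table sees keyed hashes and cannot work back to the identity without the server secret.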

The amount of information that they reveal in their profile can vary from site to site based on the user’s perceived value from the site. Furthermore, as a user creates multiple profiles for the different sites that they want to participate in, it is incumbent on the user to keep them in sync.

Uniquity

At the lowest level we propose the concept of uniquity, which represents a collection of a user’s actions. It is important that it does not contain any PII. What it does contain is a collection of actions that an anonymous user has taken. As with the relationship between the Identity and the Profile, you cannot get back to the profile from uniquity. To a user, this has the least value. If someone gets a copy of the uniquity, the best that can be done is to imitate a random user.

Implementing this requires attention to detail in ensuring that all of the PII is completely separated out from the actions and that the same one-way relationship is established. It also means that algorithms should operate on the clearly observed behaviors instead of the public face that the user has put up.
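A minimal sketch of this separation follows, assuming a keyed hash for the one-way link and a fixed list of PII field names. The key, the field list, and the record layout are all hypothetical choices made for illustration.

```python
import hashlib
import hmac

UNIQUITY_KEY = b"example-only-uniquity-secret"  # hypothetical secret
PII_FIELDS = {"name", "email", "phone", "postal_address"}  # assumed field list

def uniquity_id(profile_id: str) -> str:
    """One-way identifier: the same profile always maps to the same value."""
    return hmac.new(UNIQUITY_KEY, profile_id.encode("utf-8"), hashlib.sha256).hexdigest()

def to_action_record(profile_id: str, event: dict) -> dict:
    """Strip PII fields from an event and key it by the uniquity id."""
    record = {k: v for k, v in event.items() if k not in PII_FIELDS}
    record["uniquity"] = uniquity_id(profile_id)
    return record

event = {"name": "Pat Example", "email": "pat@example.com",
         "action": "viewed", "item": "eco-travel-guide"}
record = to_action_record("profile-42", event)
print(sorted(record))  # ['action', 'item', 'uniquity']
```

Dropping named fields is only the first step. As the AOL incident described earlier shows, the actions themselves, such as free-text search terms, can still carry PII and need their own review before they are stored.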

Large scale trend analysis of uniquity reveals interests at large. A web site operator or a content producer can get a clear understanding of the interests of the community without breaching the privacy of those individuals. These trends even allow them to take into consideration those users who chose to remain anonymous.
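As a rough illustration of such trend analysis, the sketch below counts how many distinct uniquity identifiers touched each item. The minimum group size is an added assumption, not something the paper specifies; it simply keeps very small groups out of the reported trends. The record layout and sample data are likewise invented.

```python
from collections import Counter

MIN_GROUP_SIZE = 3  # assumed threshold, not taken from the paper

def interest_trends(action_records, min_group_size=MIN_GROUP_SIZE):
    """Count distinct uniquity ids per item, dropping groups that are too small."""
    # De-duplicate so that repeat actions by one user count only once per item.
    seen = {(r["uniquity"], r["item"]) for r in action_records}
    counts = Counter(item for _, item in seen)
    return {item: n for item, n in counts.items() if n >= min_group_size}

records = [
    {"uniquity": "u1", "item": "eco-travel"},
    {"uniquity": "u2", "item": "eco-travel"},
    {"uniquity": "u3", "item": "eco-travel"},
    {"uniquity": "u1", "item": "eco-travel"},   # repeat visit, counted once
    {"uniquity": "u4", "item": "rare-stamps"},  # single user, filtered out
]

print(interest_trends(records))  # {'eco-travel': 3}
```

The result describes what the community as a whole is interested in and never needs a profile or identity as input.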

Analyzing uniquity, which ignores identity and profiles, benefits the publisher while protecting users’ privacy, and gives users the clear benefit of recommendations targeted at them. If users choose to remain anonymous, the quality of these recommendations is limited to the current session that they are in and the minimal information that they have chosen to provide.

For the web site operator, this approach offers more than the benefit of complying with government regulations: it also makes it easier to provide a custom experience that users come back to.

Conclusion

By carefully separating out the levels of information that are stored for a user, it is possible to meet even the strictest of government regulations while offering a clear value to the end user.

© 2008 Cisco Systems, Inc. All rights reserved.
