big data - a critical appraisal

Big Data: A critical appraisal

Thomas [email protected]

Bart [email protected]

Outline

The Wonders of Big Data

The Perils of Big Data

User Experiments

A Note on Privacy

The Wonders of Big Data
How Big Data will put the personal back in e-commerce

Large vs small datasets

Everything is significant!

Data from most/all of your customers
More than just an educated guess
This is what really happens!

Large datasets can improve business intelligence


The Netflix challenge

Recommendations seen as Netflix’s strongest asset

2006-2009

$1M prize if 10% better than Netflix’s Cinematch

Data: 18k movies, 500k users, 100M ratings

The Netflix challenge

Netflix’s rationale:
“Improve our ability to connect people to the movies they love”
Improve recommendations = improve satisfaction and retention
Small R&D team, slow progress
$1M will pay for itself

Based on Padhraic Smyth’s report at http://www.ics.uci.edu/~smyth/courses/cs277/slides/netflix_overview.pdf

Matrix approximation

Distinguish noise from signal: variance and eigenvalues

Singular value decomposition:
Ratings(m×n) = U(m×n) E(n×n) V(n×n)

Rank-k approximation:
Ratings(m×n) ≈ U(m×k) E(k×k) V(k×n)

[Figure: the Ratings matrix (m users × n movies) approximated by the product of U (m users × k), E (k × k) and V (k × n movies)]
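The rank-k approximation can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: the toy ratings matrix is made up and fully filled in, whereas real rating data is mostly missing values and needs a factorization method that handles them. The function name `rank_k_approximation` is ours, not from the slides.

```python
import numpy as np

def rank_k_approximation(ratings: np.ndarray, k: int) -> np.ndarray:
    """Return the rank-k approximation U_k E_k V_k of a dense ratings matrix."""
    # Thin SVD: u is (m, r), s holds r singular values (largest first),
    # vt is (r, n), with r = min(m, n).
    u, s, vt = np.linalg.svd(ratings, full_matrices=False)
    # Keep only the k largest singular values and the matching vectors.
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# Made-up toy data: 4 users x 3 movies (not Netflix data).
ratings = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

approx = rank_k_approximation(ratings, k=2)
print(np.round(approx, 2))
```

The small singular values that are dropped are treated as noise; the k that are kept carry the signal.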

[Figure: plot of V with k=2, from Koren et al. 2009. Labeled regions: independent, quirky, critically acclaimed; mainstream, formulaic; lowbrow comedies, horror, male or adolescent audience; drama, serious comedy, strong female lead]


Bias is information

[Smyth 2010]

Take-aways

Matrix decomposition
Meaningful movie categories!
For example: lowbrow, quirky, indie, strong female lead

Older movies are rated higher
So ...?
Should we recommend older movies more often or less often?
Why are they rated higher?

The Perils of Big Data
How overfitting and a lack of domain knowledge can lead to suboptimal solutions

What about random?

“We were demonstrating our new recommender to a client. They were amazed by how well it predicted their preferences!”

“Later we found out that we forgot to activate the algorithm: the system was giving completely random recommendations.”

Tradeoffs

Model complexity

“Our winning entries consist of more than 100 different predictor sets” [Koren et al. 2009]

Only 10% better than Netflix
Why?

Intrinsic noise
Example: children watch cartoons, Mum is recommended cartoons
Should Netflix implement a “switch user” feature?
Domain knowledge!

More gotchas

Obvious truisms and correlation fallacies
Still present in large datasets
Domain knowledge!

Overfitting: simple models that make sense vs complex models that fit the data
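A minimal sketch of the overfitting trade-off, assuming synthetic data: points are drawn from a straight line plus noise, then fitted with a simple (degree 1) and a complex (degree 8) polynomial. The data, the degrees and the helper names are illustrative choices, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_line(n):
    """Sample n points from y = 2x plus Gaussian noise (made-up ground truth)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(scale=0.5, size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = noisy_line(15)   # small sample used for fitting
x_test, y_test = noisy_line(200)    # fresh sample used for checking

results = {}
for degree in (1, 8):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

The complex model can never do worse on the training points, because it contains the simple model as a special case. Whether it also does better on the fresh sample is the question that matters, and with noisy data it typically does not.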

User Experiments
How user evaluations can be used to create meaningful experiences

Offline evaluations

Calibration/Evaluation
Gather rating data
Remove 10% of the ratings of each user
Optimize the algorithm to predict those 10%

Execution
Predict the rating of unknown items
Recommend items with highest predicted rating
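The calibration step above can be sketched as follows. This is a hedged illustration, not the procedure of any particular system: the ratings are random made-up data, the "algorithm" is a trivial per-user mean, and RMSE is assumed as the error measure. All function names are ours.

```python
import math
import random
from collections import defaultdict

def split_per_user(ratings, holdout_fraction=0.1, seed=0):
    """Hold out a fraction of each user's ratings as a test set."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((user, item, rating))
    train, test = [], []
    for user_ratings in by_user.values():
        rng.shuffle(user_ratings)
        # Users with a single rating keep it in the training set.
        n_test = 0
        if len(user_ratings) > 1:
            n_test = max(1, round(holdout_fraction * len(user_ratings)))
        test.extend(user_ratings[:n_test])
        train.extend(user_ratings[n_test:])
    return train, test

def user_mean_predictor(train):
    """Baseline 'algorithm': predict each user's mean training rating."""
    totals = defaultdict(lambda: [0.0, 0])
    for user, _, rating in train:
        totals[user][0] += rating
        totals[user][1] += 1
    global_mean = sum(r for _, _, r in train) / len(train)

    def predict(user, item):
        total, count = totals.get(user, (0.0, 0))
        return total / count if count else global_mean

    return predict

def rmse(predict, test):
    """Root mean squared error of the predictions on the held-out ratings."""
    return math.sqrt(sum((predict(u, i) - r) ** 2 for u, i, r in test) / len(test))

# Made-up data: 5 users who each rated 20 items on a 1-5 scale.
data_rng = random.Random(42)
ratings = [(u, i, data_rng.randint(1, 5)) for u in range(5) for i in range(20)]

train, test = split_per_user(ratings)
error = rmse(user_mean_predictor(train), test)
print(f"{len(train)} training ratings, {len(test)} held out, RMSE {error:.2f}")
```

Any candidate algorithm can be swapped in for `user_mean_predictor`; the one with the lowest held-out error wins the offline evaluation.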

Offline evaluations

Problems
Offline evaluations may not give the same outcome as online evaluations (Cosley et al., 2002; McNee et al., 2002)
A higher rating does not mean a good recommendation (McNee et al., 2006)
The algorithm counts for only 5% of the relevance of a recommender system (Francisco Martin, 2009)

Solutions
Test with real users (A/B testing)
Consider other behaviors (consumption, retention)
A/B test other aspects (interaction, presentation)

http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html

Online evaluations

Testing a recommender against a random videoclip system (A/B test)
Expectation: Consumption will increase
Reality: The number of clicked clips and total viewing time went down!

Insight: Recommender is more effective
More clips watched from beginning to end
Users browse less, consume more
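One way to check a difference like "more clips watched from beginning to end" in an A/B test is a two-proportion z-test, sketched below. The counts are invented for illustration, and the test assumes independent observations, which clips watched by the same user are not; treat it as a first approximation rather than the analysis used in the study.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: clips watched to the end / clips started.
z, p_value = two_proportion_z_test(
    successes_a=300, n_a=1000,   # condition A: random clips
    successes_b=360, n_b=900,    # condition B: recommender
)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

Note that in these made-up numbers the recommender condition has fewer clips started but a higher completion rate, which mirrors the pattern described on the slide.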

Behavior vs Questionnaires

Behavior is hard to interpret
Relationship between behavior and satisfaction is not always trivial

Questionnaires are a better predictor of long-term retention
With behavior only, you will need to run for a long time

Questionnaire data is more robust
Fewer participants needed

A guide to user experiments

“Is my system good?”
What does good mean?
We need to define measures

“Does my system score high on this satisfaction scale?”
What does high mean?
We need to compare it against something

“Does my system score higher than this other system?”
Say we find that it scores higher on satisfaction... why does it?
Apply the concept of ceteris paribus

http://bit.ly/recsys2011short http://bit.ly/recsystutorialhandout

An example…

We compared three recommender systems
Three different algorithms

System effectiveness scale:
The system has no real benefit for me.
I would recommend the system to others.
The system is useful.
I can save time using the system.
I can find better TV programs without the help of the system.


An example…

The mediating variables tell the entire story

An example…

A Note on Privacy
How to avoid this looming danger of our Big Data future

Personalization… with control

Privacy concerns

Second Netflix challenge

Anonymized dataset

Lawsuit from Californian closeted lesbian Mum

Netflix withdraws their second challenge

http://arstechnica.com/tech-policy/2012/07/class-action-lawsuit-settlement-forces-netflix-privacy-changes/

Privacy directive

Transparency
“companies should provide clear descriptions of [...] why they need the data, how they will use it”
Informed consent

Control
“companies should offer consumers clear and simple choices [...] about personal data collection, use, and disclosure”
User empowerment

Transparency Paradox

Control Paradox

“bewildering tangle of options” (New York Times, 2010)

“labyrinthian controls” (U.S. Consumer Magazine, 2012)

Researchers asked: “what do your privacy settings mean?”
86% of Facebook users got it wrong!

Control Paradox

Introducing an “extreme” sharing option
Nothing - City - Block
Add the option Exact

Expected: Some will choose Exact instead of Block

Unexpected: Sharing increases across the board!

http://bit.ly/chi2013privacy

[Figure: the sharing options N (Nothing), C (City), B (Block) and E (Exact) plotted by privacy and benefits]

Bounded rationality

[Figure: options A, B, C, D; values 25%, 37%, 53%, 0%]


Idea: nudging

People do not always choose what is best for them

Idea: use defaults to “nudge” users in the right direction

What is the right direction?

“More information = better, e.g. for personalization”
Techniques to increase disclosure cause reactance in the more privacy-minded users

“Privacy is an absolute right”
More difficult for less privacy-minded users to enjoy the benefits that disclosure would provide


It depends on the user!

“What is best for consumers depends upon characteristics of the consumer

An outcome that maximizes consumer welfare may be suboptimal for some consumers in a context where there is heterogeneity in preferences” (Smith, Goldstein & Johnson, 2009)

Privacy Adaptation Procedure

Idea: Personalize users’ privacy settings!
Automatic defaults in line with “disclosure profile”
Using big data to improve big data privacy

Relieves some of the burden of the privacy decision:
The right privacy-related information
The right amount of control

“Realistic empowerment”

http://bit.ly/privdim
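As a rough illustration of "automatic defaults in line with a disclosure profile", the sketch below sets each default to the majority choice of users who share a profile. Everything here is hypothetical: the item names, the decisions, and the majority rule itself are assumptions made for the example, and the slides do not specify how profiles are derived or how defaults are chosen.

```python
from collections import Counter

def default_settings(profile_decisions):
    """Map each item to the most common decision within one disclosure profile."""
    defaults = {}
    for item, decisions in profile_decisions.items():
        # most_common(1) returns [(decision, count)] for the top decision.
        defaults[item] = Counter(decisions).most_common(1)[0][0]
    return defaults

# Hypothetical past decisions of users in one profile (True = disclose).
decisions = {
    "location": [False, False, True],
    "age": [True, True, False],
    "browsing_history": [False, False, False],
}

defaults = default_settings(decisions)
print(defaults)
# {'location': False, 'age': True, 'browsing_history': False}
```

A new user assigned to this profile would start from these defaults and could still change any of them.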

Conclusions

The Wonders of Big Data

Big Data can be used to create powerful personalized e-commerce experiences

The Perils of Big Data

Big Data solutions will only work if the developers have an adequate amount of domain knowledge

User Experiments

Big Data solutions need to be tested on real users, with a focus on user experience

A Note on Privacy

Big Data can raise privacy concerns, but it can at the same time be used to alleviate these concerns


Questions?
