Big Data: A Critical Appraisal
Invited talk by Bart Knijnenburg and Thomas Debeauvais at the IIBA OC dinner meeting
Outline
The Wonders of Big Data
The Perils of Big Data
User Experiments
A Note on Privacy
The Wonders of Big Data
How Big Data will put the personal back in e-commerce
Large vs small datasets
Everything is significant!
Data from most/all of your customers
More than just an educated guess
This is what really happens!
Large datasets can improve business intelligence
The Netflix challenge
Recommendations seen as Netflix’s strongest asset
2006-2009
$1M prize if 10% better than Netflix’s Cinematch
Data: 18k movies, 500k users, 100M ratings
The Netflix challenge
Netflix’s rationale:
“Improve our ability to connect people to the movies they love”
Improve recommendations = improve satisfaction and retention
Small R&D team, slow progress
$1M will pay for itself
Based on Padhraic Smyth’s report at http://www.ics.uci.edu/~smyth/courses/cs277/slides/netflix_overview.pdf
Matrix approximation
Distinguish noise from signal: variance and eigenvalues
Singular value decomposition
Ratings(m*n) = U(m*n) E(n*n) V(n*n)
Rank-k approximation
Ratings(m*n) ≈ U(m*k) E(k*k) V(k*n)
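The rank-k approximation above can be sketched with NumPy. This is a minimal illustration, not the method used by the challenge entrants: the ratings values are made up, the function name `rank_k_approximation` is hypothetical, and a plain SVD assumes a dense matrix with no missing ratings, which real rating data does not have.

```python
import numpy as np

# Toy ratings matrix (made-up values): 4 users (rows) x 3 movies (columns).
ratings = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

def rank_k_approximation(matrix, k):
    """Return the rank-k approximation of `matrix` via truncated SVD."""
    # full_matrices=False yields U (m x r), s (r,), Vt (r x n), r = min(m, n).
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

approx = rank_k_approximation(ratings, k=2)
# Frobenius norm of what the rank-2 approximation leaves out.
error = np.linalg.norm(ratings - approx)
```

Keeping the largest singular values retains the directions with the most variance (the signal); the discarded ones are treated as noise.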
[Figure: Ratings (m users × n movies) = U (m users × k) · E (k × k) · V (k × n movies)]
Plot of V with k=2 [Koren et al. 2009]
[Figure labels: independent, quirky, critically acclaimed; mainstream, formulaic; lowbrow comedies, horror, male or adolescent audience; drama, serious comedy, strong female lead]
Bias is information
[Smyth 2010]
Take-aways
Matrix decomposition
Meaningful movie categories! For example: lowbrow, quirky, indie, strong female lead
Older movies are rated higher
So ...? Should we recommend older movies more often or less often? Why are they rated higher?
The Perils of Big Data
How overfitting and a lack of domain knowledge can lead to suboptimal solutions
What about random?
“We were demonstrating our new recommender to a client. They were amazed by how well it predicted their preferences!”
“Later we found out that we forgot to activate the algorithm: the system was giving completely random recommendations.”
Tradeoffs
Model complexity
“Our winning entries consist of more than 100 different predictor sets” [Koren et al 2009]
Only 10% better than Netflix
Why?
Intrinsic noise
Example: children watch cartoons, Mum is recommended cartoons
Should Netflix implement a “switch user” feature?
Domain knowledge!
More gotchas
Obvious truisms and correlation fallacies
Still present in large datasets
Domain knowledge!
Overfitting: simple models that make sense vs complex models that fit the data
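The overfitting tradeoff can be illustrated with a small sketch on synthetic data. Everything here is an assumption made for illustration: the data is a noisy straight line generated on the spot, and the "simple" and "complex" models are polynomials of degree 1 and 9.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def noisy_line(x):
    """Synthetic ground truth: a straight line plus Gaussian noise."""
    return 2.0 * x + rng.normal(scale=0.3, size=x.size)

x_train = np.linspace(0.0, 1.0, 12)
y_train = noisy_line(x_train)
x_test = np.linspace(0.02, 0.98, 50)
y_test = noisy_line(x_test)

def rmse(model, x, y):
    """Root mean squared error of `model` on the points (x, y)."""
    return float(np.sqrt(np.mean((model(x) - y) ** 2)))

results = {}
for name, degree in [("simple", 1), ("complex", 9)]:
    # Least-squares polynomial fit on the training points only.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[name] = {
        "train": rmse(model, x_train, y_train),
        "test": rmse(model, x_test, y_test),
    }
```

The complex model always fits the training points at least as well as the simple one, because it can represent every straight line and more. Comparing `results["simple"]["test"]` with `results["complex"]["test"]` shows whether that extra fit carries over to unseen data; with noisy data like this it usually does not.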
User Experiments
How user evaluations can be used to create meaningful experiences
Offline evaluations
Calibration/Evaluation
Gather rating data
Remove 10% of the ratings of each user
Optimize the algorithm to predict those 10%
Execution
Predict the rating of unknown items
Recommend items with highest predicted rating
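The calibration step above can be sketched as a per-user hold-out followed by an error measure. This is a rough sketch under stated assumptions: the ratings are made up, all function names are hypothetical, RMSE is assumed as the error measure, and the predictor is a trivial stand-in (each user's mean rating) rather than a real recommender algorithm.

```python
import math
import random

def split_per_user(ratings, holdout_fraction=0.1, seed=0):
    """Hold out a fraction of each user's ratings (at least one per user).

    Assumes every user has at least two ratings, so the training set
    is never empty.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for user, items in ratings.items():
        item_ids = sorted(items)
        rng.shuffle(item_ids)
        n_test = max(1, round(holdout_fraction * len(item_ids)))
        test[user] = {i: items[i] for i in item_ids[:n_test]}
        train[user] = {i: items[i] for i in item_ids[n_test:]}
    return train, test

def predict_user_mean(train, user, item):
    """Stand-in predictor: the user's mean training rating (ignores the item)."""
    values = list(train[user].values())
    return sum(values) / len(values)

def rmse(train, test, predict):
    """Root mean squared error of `predict` on the held-out ratings."""
    errors = [
        (predict(train, user, item) - rating) ** 2
        for user, items in test.items()
        for item, rating in items.items()
    ]
    return math.sqrt(sum(errors) / len(errors))

# Made-up ratings: user -> {movie: stars}
ratings = {
    "u1": {f"m{i}": 1 + (i % 5) for i in range(20)},
    "u2": {f"m{i}": 5 - (i % 3) for i in range(10)},
}
train, test = split_per_user(ratings)
score = rmse(train, test, predict_user_mean)
```

Optimizing the algorithm then means changing the predictor until `score` stops decreasing, which measures rating prediction and nothing else.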
Offline evaluations
Problems
Offline evaluations may not give the same outcome as online evaluations (Cosley et al., 2002; McNee et al., 2002)
Higher rating does not mean good recommendation (McNee et al., 2006)
The algorithm counts for only 5% of the relevance of a recommender system (Francisco Martin, 2009)
Solutions
Test with real users (A/B testing)
Consider other behaviors (consumption, retention)
A/B test other aspects (interaction, presentation)
http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
Online evaluations
Testing a recommender against a random videoclip system (A/B test)
Expectation: Consumption will increase
Reality: The number of clicked clips and total viewing time went down!
Insight: Recommender is more effective
More clips watched from beginning to end
Users browse less, consume more
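One way to check a difference such as "more clips watched from beginning to end" is a two-proportion z-test on completion rates. The sketch below is an assumption about how such a comparison could be run, not the analysis used in the talk; the counts are invented and the function name is hypothetical.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts: clips watched to the end out of clips started.
# A = random system, B = recommender.
z, p_value = two_proportion_z_test(success_a=300, n_a=1000,
                                   success_b=360, n_b=900)
```

In this made-up example B starts fewer clips than A but finishes a larger share of them, the pattern the slide describes; a raw click count alone would have pointed the other way.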
Behavior vs Questionnaires
Behavior is hard to interpret
Relationship between behavior and satisfaction is not always trivial
Questionnaires are a better predictor of long-term retention
With behavior only, you will need to run for a long time
Questionnaire data is more robust
Fewer participants needed
A guide to user experiments
“Is my system good?”
What does good mean? We need to define measures
“Does my system score high on this satisfaction scale?”
What does high mean? We need to compare it against something
“Does my system score higher than this other system?”
Say we find that it scores higher on satisfaction... why does it?
Apply the concept of ceteris paribus
http://bit.ly/recsys2011short http://bit.ly/recsystutorialhandout
An example…
We compared three recommender systems
Three different algorithms
System effectiveness scale:
The system has no real benefit for me.
I would recommend the system to others.
The system is useful.
I can save time using the system.
I can find better TV programs without the help of the system.
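Two of the five items above are negatively worded (the first and the last), so a simple way to turn one participant's answers into a single score is to reverse-code those items and average. The slides do not say how responses were scored; the 5-point agreement scale, the averaging, and the function name below are all assumptions for illustration.

```python
# Zero-based positions of the negatively worded items (items 1 and 5).
REVERSED_ITEMS = {0, 4}

def effectiveness_score(responses, scale_max=5):
    """Average the five item responses after reverse-coding negative items.

    `responses` holds one answer per item, from 1 (disagree) to
    `scale_max` (agree). The 5-point range is an assumption.
    """
    if len(responses) != 5:
        raise ValueError("expected one response per scale item (5 items)")
    adjusted = [
        (scale_max + 1 - r) if i in REVERSED_ITEMS else r
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)

# A participant who disagrees with both negative items and agrees with the rest.
score = effectiveness_score([1, 5, 5, 5, 1])
```

Without the reverse-coding step, strong agreement with "The system has no real benefit for me" would raise the effectiveness score instead of lowering it.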
An example…
The mediating variables tell the entire story
An example…
A Note on Privacy
How to avoid this looming danger of our Big Data future
Personalization… with control
Privacy concerns
Second Netflix challenge
Anonymized dataset
Lawsuit from a closeted lesbian mother in California
Netflix withdraws its second challenge
http://arstechnica.com/tech-policy/2012/07/class-action-lawsuit-settlement-forces-netflix-privacy-changes/
Privacy directive
Transparency
“companies should provide clear descriptions of [...] why they need the data, how they will use it”
Informed consent
Control
“companies should offer consumers clear and simple choices [...] about personal data collection, use, and disclosure”
User empowerment
Transparency Paradox
Control Paradox
“bewildering tangle of options” (New York Times, 2010)
“labyrinthian controls” (U.S. Consumer Magazine, 2012)
Researchers asked: “what do your privacy settings mean?”
86% of Facebook users got it wrong!
Control Paradox
Introducing an “extreme” sharing option
Nothing - City - Block
Add the option Exact
Expected: Some will choose Exact instead of Block
Unexpected: Sharing increases across the board!
http://bit.ly/chi2013privacy
[Figure: privacy vs. benefits for the sharing options N (Nothing), C (City), B (Block), E (Exact)]
Bounded rationality
[Figure: four options A, B, C, D, labeled 25%, 37%, 53%, 0%, each with a question mark]
Idea: nudging
People do not always choose what is best for them
Idea: use defaults to “nudge” users in the right direction
What is the right direction?
“More information = better, e.g. for personalization”
Techniques to increase disclosure cause reactance in the more privacy-minded users
“Privacy is an absolute right”
More difficult for less privacy-minded users to enjoy the benefits that disclosure would provide
It depends on the user!
“What is best for consumers depends upon characteristics of the consumer
An outcome that maximizes consumer welfare may be suboptimal for some consumers in a context where there is heterogeneity in preferences” (Smith, Goldstein & Johnson, 2009)
Privacy Adaptation Procedure
Idea: Personalize users’ privacy settings!
Automatic defaults in line with “disclosure profile”
Using big data to improve big data privacy
Relieves some of the burden of the privacy decision:
The right privacy-related information
The right amount of control
“Realistic empowerment”
http://bit.ly/privdim
Conclusions
The Wonders of Big Data
Big Data can be used to create powerful personalized e-commerce experiences
The Perils of Big Data
Big Data solutions will only work if the developers have an adequate amount of domain knowledge
User Experiments
Big Data solutions need to be tested on real users, with a focus on user experience
A Note on Privacy
Big Data can raise privacy concerns, but it can at the same time be used to alleviate these concerns
Questions?