The MovieLens Datasets: History and Context
![Page 1: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/1.jpg)
The MovieLens Datasets:
History and Context
Max Harper (presenter)
Joe Konstan
![Page 2: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/2.jpg)
http://tiis.acm.org/iui16/
![Page 3: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/3.jpg)
MovieLens: 5-star movie ratings
userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
1,32,3.5,1112484819
1,47,3.5,1112484727
1,50,3.5,1112484580
1,112,3.5,1094785740
1,151,4.0,1094785734
1,223,4.0,1112485573
1,253,4.0,1112484940
...
138493,69644,3.0,1260209457
138493,70286,5.0,1258126944
138493,71619,2.5,1255811136
web site: dataset:
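The sample rows above use a plain CSV layout, so a standard CSV parser can read them. The sketch below assumes the `timestamp` column holds Unix epoch seconds in UTC, as the dataset README files describe. The `Rating` type and the `parse_ratings` helper are illustrative names, not part of any MovieLens tooling.

```python
import csv
import io
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    """One <user, movie, rating, timestamp> row (illustrative type)."""
    user_id: int
    movie_id: int
    rating: float
    rated_at: datetime


def parse_ratings(text: str) -> list[Rating]:
    """Parse CSV text that starts with a userId,movieId,rating,timestamp header."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Rating(
            user_id=int(row["userId"]),
            movie_id=int(row["movieId"]),
            rating=float(row["rating"]),
            # Assumption: timestamps are seconds since the Unix epoch (UTC).
            rated_at=datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc),
        )
        for row in reader
    ]


# Three of the rows shown on the slide.
SAMPLE = """userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
138493,71619,2.5,1255811136
"""

ratings = parse_ratings(SAMPLE)
for r in ratings:
    print(r.user_id, r.movie_id, r.rating, r.rated_at.isoformat())
```

Converting to timezone-aware datetimes up front avoids ambiguity when ratings are later grouped by day or year.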
![Page 4: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/4.jpg)
ratings data is interesting, intuitive, and pervasive
![Page 5: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/5.jpg)
dataset impact
» 140,000 downloads in 2014
» a search for “movielens” yields
• 6,020 results in Google Books
• 8,920 results in Google Scholar
![Page 6: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/6.jpg)
dataset uses
» research
» technical: programming books + blogs
» educational (including a MOOC)
» industrial R&D, demos
![Page 7: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/7.jpg)
overview
» MovieLens datasets overview
» dataset stability, system change
![Page 8: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/8.jpg)
<user, movie, rating, timestamp>
![Page 9: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/9.jpg)
<user, movie, rating, timestamp>
<Max, Toy Story, 4.0, 2010-12-01 12:00:00>
![Page 10: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/10.jpg)
MovieLens benchmark datasets
| Name | Dates | Users | Movies | Ratings | Density |
|---|---|---|---|---|---|
| ML 100K | ’97 – ’98 | 943 | 1,682 | 100,000 | 6.30% |
| ML 1M | ’00 – ’03 | 6,040 | 3,706 | 1,000,209 | 4.47% |
| ML 10M | ’95 – ’09 | 69,878 | 10,681 | 10,000,054 | 1.34% |
| ML 20M | ’95 – ’15 | 138,493 | 27,278 | 20,000,263 | 0.54% |
designed for replicability
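The density column in the table above can be read as the share of all possible user/movie pairs that actually carry a rating. The sketch below assumes that definition, ratings / (users × movies); the `density` helper and the `BENCHMARKS` dictionary are illustrative names, and the counts are copied from the table.

```python
def density(users: int, movies: int, ratings: int) -> float:
    """Fraction of the user x movie matrix that contains a rating."""
    return ratings / (users * movies)


# (users, movies, ratings) as listed for each benchmark dataset.
BENCHMARKS = {
    "ML 100K": (943, 1_682, 100_000),
    "ML 1M": (6_040, 3_706, 1_000_209),
    "ML 10M": (69_878, 10_681, 10_000_054),
    "ML 20M": (138_493, 27_278, 20_000_263),
}

for name, (users, movies, ratings) in BENCHMARKS.items():
    print(f"{name}: {density(users, movies, ratings):.2%}")
```

Under this definition the first three rows reproduce the listed values. ML 20M comes out at about 0.53% rather than 0.54%, so the slide may rely on a slightly different movie count (for example, only movies that received a rating); that explanation is a guess, not something the slides state.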
![Page 11: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/11.jpg)
MovieLens latest datasets
| Name | Dates | Users | Movies | Ratings | Density |
|---|---|---|---|---|---|
| ML Latest | ’95 – ’16 | 247,753 | 34,208 | 22,884,377 | 0.3% |
| ML Latest Small | ’96 – ’16 | 668 | 10,329 | 105,339 | 1.5% |
designed for recency
![Page 12: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/12.jpg)
overview
» MovieLens datasets overview
» dataset stability, system change
![Page 13: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/13.jpg)
tension: datasets vs. system
» ideal (pure) vs. actual (it’s complex)
» systems want to change
• stay current, constant improvements
• A/B tests, beta testing, and other experiments
» context changes
• devices, competing sites, changing user base
![Page 14: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/14.jpg)
![Page 15: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/15.jpg)
![Page 16: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/16.jpg)
![Page 17: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/17.jpg)
![Page 18: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/18.jpg)
![Page 19: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/19.jpg)
some key changes
» core flow of browse/search
» rating widget
» recommender
» new user experience
» …
![Page 20: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/20.jpg)
history of experiments
» both online field experiments and online lab experiments
» created temporary and permanent changes, changed pattern of use
![Page 21: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/21.jpg)
![Page 22: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/22.jpg)
in the paper
» the story of MovieLens (1997 origins)
• lessons learned from running a “real” system in a research lab
• lots of fun descriptive stats/charts
» best practices for dataset researchers
• limitations
• alternatives
![Page 23: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/23.jpg)
people who made this possible
» John Riedl
» Istvan Albert, Al Borchers, Dan Cosley, Brent J. Dahlen, Rich Davies, Michael Ekstrand, Dan Frankowski, Nathaniel Good, Jon Herlocker, Daniel Kluver, Shyong (Tony) Lam, Michael Ludwig, Sean McNee, Chad Salvatore, Shilad Sen, and Loren Terveen
» MovieLens users
![Page 24: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/24.jpg)
in ACM Transactions on Interactive Intelligent Systems, Dec. 2015
» feedback? contact us: [email protected]
presented by Max Harper, Research Scientist, University of Minnesota, [email protected]
written with Joe Konstan, Distinguished McKnight University Professor, University of Minnesota, [email protected]
This material is based on work supported by the National Science Foundation under grants DGE-9554517, IIS-9613960, IIS-9734442, IIS-9978717, EIA-9986042, IIS-0102229, IIS-0324851, IIS-0534420, IIS-0808692, IIS-0964695, IIS-0968483, IIS-1017697, IIS-1210863. This project was also supported by the University of Minnesota’s Undergraduate Research Opportunities Program and by grants and/or gifts from Net Perceptions, Inc., CFK Productions, and Google.
The MovieLens Datasets:
History and Context
![Page 25: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/25.jpg)
![Page 26: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/26.jpg)
version 0 (1997) version 4 (2014)
![Page 27: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/27.jpg)
one solution
» document change, include with datasets
![Page 28: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/28.jpg)
key dataset limitations (1/2)
» system UI and recommender changes
» bias towards “successful” users
» possible bias towards users with tolerance for “research quality” design
» timestamps do not reflect time of consumption
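The timestamp limitation is visible in the sample rows shown earlier in the deck: the nine ratings from user 1 fall into short bursts, which suggests they record when ratings were entered rather than when the movies were watched. The sketch below groups those timestamps into sessions. The 30-minute gap threshold and the `split_sessions` helper are illustrative assumptions, not a method from the paper.

```python
def split_sessions(timestamps: list[int], max_gap_seconds: int = 1800) -> list[list[int]]:
    """Group timestamps into sessions separated by gaps longer than max_gap_seconds."""
    sessions: list[list[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap_seconds:
            sessions[-1].append(ts)  # close enough to the previous rating
        else:
            sessions.append([ts])  # gap too large: start a new session
    return sessions


# Timestamps of the nine user 1 ratings from the sample rows.
USER_1_TIMESTAMPS = [
    1112486027, 1112484676, 1112484819, 1112484727, 1112484580,
    1094785740, 1094785734, 1112485573, 1112484940,
]

sessions = split_sessions(USER_1_TIMESTAMPS)
for session in sessions:
    print(len(session), "ratings within", session[-1] - session[0], "seconds")
```

With this threshold the nine ratings collapse into two sessions: two ratings entered 6 seconds apart, and seven entered within about 24 minutes, far less time than it takes to watch the movies.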
![Page 29: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/29.jpg)
key dataset limitations (2/2)
» recommender systems research community attitudes
• implicit behaviors > ratings?
• dataset-only research increasingly discouraged
![Page 30: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/30.jpg)
![Page 31: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/31.jpg)
MovieLens system evolution
key changes and experiments
![Page 32: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/32.jpg)
alternative datasets
| Name | Domain | Rating Scale | Ratings | Density |
|---|---|---|---|---|
| Book-Crossing | books | 0 - 10 | 1.1m | 0.003% |
| EachMovie | movies | 0 - 14 | 2.7m | 2.872% |
| Jester (dataset1) | jokes | -10 - 10 | 4.1m | 57.463% |
| Amazon | many | 1 - 5 | 82.8m | < 0.001% |
| Netflix Prize | movies | 1 - 5 | 100.5m | 1.178% |
| Yahoo Music (C15) | music (various) | 0 - 100 | 262.8m | 0.042% |
![Page 33: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/33.jpg)
EachMovie
![Page 34: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/34.jpg)
lessons from running MovieLens
» lessons from startups apply (it’s hard, fail fast)
» continual work, not one-time effort
» encourage code quality through good social coding conventions
» invest in tools that allow users to help
![Page 35: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/35.jpg)
dataset uses
» recommender systems research
» recommender systems MOOC
• http://coursera.org/learn/recommender-systems
» code examples (popular press, blogs)
» higher education
» commercial – internal testing
![Page 36: The MovieLens Datasets: History and Context](https://reader031.vdocument.in/reader031/viewer/2022021923/5a6d88d27f8b9a0a428b5a83/html5/thumbnails/36.jpg)