The plista Dataset
ACM RecSys 2013, Hong Kong
Authors: Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias
Speaker: Brodt, Torben
International News Recommender Systems Workshop and Challenge
October 13th, 2013
Introduction and Motivation
● Context: News Article Recommendation
● Do we need another recommendation data set? We have ...
● What features are those data sets missing?
● What requirements do news articles entail for recommendation?
● Features that had not been available in existing data sets:
○ contextual features: device, operating system, browser, etc.
○ cross-domain features: 13 different news providers included
○ different interaction types: interactions with recommendations (clicks), as well as news items (impressions)
○ content features: headline, URL, images, text snippets, etc.
● Additional requirements for recommending news articles:
○ real-time → recommendations must be provided within a short time interval (< 200 ms)
○ changing relevancy → items’ relevancy decreases with time
○ dynamics → new news items are being continuously added
● Requirements inherent to existing recommender systems:
○ sparsity → users typically read only a few news articles
○ cold start → systems refrain from requesting users to create profiles; this results in a majority of small user profiles
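One way to picture the "changing relevancy" and "dynamics" requirements is a popularity score in which older impressions count for less. The sketch below is only an illustration: the exponential decay, the one-hour half-life, and the `(item_id, timestamp)` event format are assumptions, not something the slides prescribe.

```python
import math
from collections import defaultdict


def decayed_popularity(events, now, half_life_s=3600.0):
    """Score items by impressions, weighting each impression by its age.

    events: iterable of (item_id, timestamp_s) pairs (hypothetical format).
    half_life_s: assumed half-life; an impression this old counts as 0.5.
    """
    scores = defaultdict(float)
    for item_id, ts in events:
        age = max(0.0, now - ts)
        scores[item_id] += math.exp(-math.log(2) * age / half_life_s)
    return dict(scores)


def top_n(scores, n=3):
    """Return the n highest-scoring item ids, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Two old impressions of one article versus a single fresh impression of another.
events = [("old", 0), ("old", 0), ("new", 7200)]
scores = decayed_popularity(events, now=7200)
print(top_n(scores))
```

Because newly created items enter the ranking with their first impression, a score of this kind needs no separate handling for items added after the recommender has started.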
{ // json
"type": "impression",
"context": {
"simple": {
"27": 418, // publisher
"14": 31721, // widget
...
},
"lists": {
"10": [100, 101] // channel
}
...
}
API specs hosted at http://orp.plista.com
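As a rough illustration of how such a message could be read, the sketch below parses a cleaned-up copy of the example and replaces the numeric context keys with the names annotated on the slide. The cleaned-up nesting and the helper `decode_context` are assumptions made for this example; the authoritative key list is in the API specs.

```python
import json

# Cleaned-up version of the slide's example: comments and "..." elisions
# removed so that it is valid JSON. Only fields shown on the slide appear.
RAW = """
{
  "type": "impression",
  "context": {
    "simple": {"27": 418, "14": 31721},
    "lists": {"10": [100, 101]}
  }
}
"""

# Numeric keys as annotated on the slide; all other keys are left untouched.
SIMPLE_KEYS = {"27": "publisher", "14": "widget"}
LIST_KEYS = {"10": "channel"}


def decode_context(message):
    """Return the context with the known numeric keys replaced by names."""
    context = message.get("context", {})
    decoded = {}
    for key, value in context.get("simple", {}).items():
        decoded[SIMPLE_KEYS.get(key, key)] = value
    for key, value in context.get("lists", {}).items():
        decoded[LIST_KEYS.get(key, key)] = value
    return decoded


message = json.loads(RAW)
print(message["type"], decode_context(message))
```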
Dataset characteristics
● object types
○ impressions → users reading news articles
○ clicks → users clicking recommendations
○ creates → news articles being created
○ updates → news articles being updated
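A first look at the data might simply count how many messages of each object type occur. The sketch below assumes one JSON message per line and a top-level "type" field; only the value "impression" appears on the slides, so the other type strings used in the sample are guesses.

```python
import json
from collections import Counter


def count_object_types(lines):
    """Count messages per "type" field, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        counts[json.loads(line).get("type", "unknown")] += 1
    return counts


sample = [
    '{"type": "impression"}',
    '{"type": "impression"}',
    '{"type": "click"}',   # assumed type string
    '{"type": "create"}',  # assumed type string
    '',
]
print(count_object_types(sample))
```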
Dataset usage
● Evaluation based on Click-Through-Rate (CTR)
● ~ 84 million impressions
● ~ 1 million clicks
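For orientation, the two rounded figures above imply an overall CTR of a little over one percent. The short calculation below assumes CTR is taken as clicks divided by impressions over the whole data set; the slides do not spell out the exact definition used in the evaluation.

```python
def click_through_rate(clicks, impressions):
    """Return clicks / impressions; impressions must be positive."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Rounded figures from the slide: ~1 million clicks, ~84 million impressions.
ctr = click_through_rate(1_000_000, 84_000_000)
print(f"{ctr:.2%}")  # prints 1.19%
```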
● evaluating recommenders across news portals
● 10–36% user overlap between different news portals
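User overlap between two portals could be measured in several ways, and the slides do not say which one produced the figures above. The sketch below uses the Jaccard index (shared users divided by all users of either portal) purely as an example; the `(user_id, portal_id)` event format is also an assumption.

```python
from collections import defaultdict
from itertools import combinations


def users_by_portal(events):
    """Group user ids by portal from (user_id, portal_id) pairs."""
    users = defaultdict(set)
    for user_id, portal_id in events:
        users[portal_id].add(user_id)
    return users


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(events):
    """Return {(portal_a, portal_b): overlap} for every pair of portals."""
    users = users_by_portal(events)
    return {
        (p, q): jaccard(users[p], users[q])
        for p, q in combinations(sorted(users), 2)
    }


events = [(1, "a"), (2, "a"), (2, "b"), (3, "b")]
print(pairwise_overlap(events))
```

Since users are identified by session IDs, any overlap computed this way describes IDs rather than people.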
● news portal comparisons
● do we observe similar user behaviour on news portals offering similar content?
● evaluating contextual recommendation algorithms
● sensitive to
○ weekday
○ hour of day
○ ...
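To check whether behaviour depends on weekday and hour of day, one could group events by those two features and compare the CTR per group. The sketch below assumes events arrive as `(unix_timestamp_s, kind)` pairs and uses UTC for simplicity; both are assumptions, and for German news portals a local time zone would probably be the better choice.

```python
from collections import defaultdict
from datetime import datetime, timezone


def ctr_by_weekday_hour(events):
    """Return {(weekday, hour): CTR}, with weekday 0 = Monday.

    events: iterable of (unix_timestamp_s, kind) pairs, where kind is
    "impression" or "click" (hypothetical format). Other kinds are ignored,
    and groups without impressions are left out.
    """
    counts = defaultdict(lambda: {"impression": 0, "click": 0})
    for ts, kind in events:
        if kind not in ("impression", "click"):
            continue
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[(moment.weekday(), moment.hour)][kind] += 1
    return {
        key: c["click"] / c["impression"]
        for key, c in counts.items()
        if c["impression"] > 0
    }


# Timestamp 0 is Thursday, 1970-01-01 00:00 UTC, i.e. weekday 3, hour 0.
events = [(0, "impression"), (60, "impression"), (120, "click"), (3600, "impression")]
print(ctr_by_weekday_hour(events))
```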
When using the data set you may consider…
● … we identify users by session IDs
○ individual users may have several IDs
○ users sharing their device might be mapped to one ID
● … interactions (clicks, impressions) and content dynamics (creates, updates) differ between news portals
● … contents are restricted to German
● … preferences are represented on a binary scale (user read article, user clicked recommendation)
● … clicking on recommendations might not reveal the actual relevancy of an item
Conclusions
● we introduced a new data set intended to support recommender systems research
● we outlined novel features which existing data sets lacked
● we presented scenarios which can be evaluated using the data set
● we pointed to critical aspects which ought to be considered when working with the data set
Summary
● news articles
○ of ~13 publishers
● transactional data
○ Impressions
○ Clicks
● contextual data
○ of ~50 attributes
● cross domain application
The plista Dataset
@inproceedings{Kille:2013,
  title = {The plista Dataset},
  author = {Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias},
  booktitle = {NRS'13: Proceedings of the International Workshop and Challenge on News Recommender Systems},
  year = {2013},
  month = {10},
  location = {Hong Kong, China},
  publisher = {ACM},
  pages = {14--21}
}