sigir 2013 bars keynote - the search for the best live recommender system
The Search for the Best Live Recommender System
Torben Brodt, plista GmbH
Keynote, SIGIR Conference 2013, Dublin
BARS Workshop - Benchmarking Adaptive Retrieval and Recommender Systems
August 1st, 2013
recommendations
where
● news websites
● below the article

different types
● content
● advertising
quality is win-win
● happy user
● happy advertiser
● happy publisher
● happy plista*

* the company I work for
some years ago
one recommender
● collaborative filtering (a minimal sketch follows this list)
○ well-known algorithm
○ more data means more knowledge
● parameter tuning
○ time
○ trust
○ mainstream
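As a refresher, a minimal sketch of the item-to-item flavour of collaborative filtering; the click histories are made up and this is not plista's implementation:

from collections import Counter, defaultdict

# made-up click histories: user -> set of article ids
histories = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c"},
}

# count how often two articles were read by the same user
cooc = defaultdict(Counter)
for items in histories.values():
    for x in items:
        for y in items:
            if x != y:
                cooc[x][y] += 1

def recommend(article, k=2):
    # articles most often read together with `article`
    return [item for item, _ in cooc[article].most_common(k)]

print(recommend("a"))  # ['b', 'c']

More users reading the same pairs of articles sharpens these counts, which is the "more data means more knowledge" point.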
one recommender = good result
2008
● finished studies
● publication
● plista was born

today
● 5k recs/second
● many publishers
netflix prize
" use as many recommenders as possible! "
more recommenders
lost in serendipity
● we have one score
● lucky success? unlucky loss?
● we needed to keep track of the different recommenders
success: 0.31 %
how to measure success
number of
● clicks
● orders
● engagements
● time on site
● money
(scale: BAD -> GOOD)
evaluation technology
● features
○ SUM
○ INCR
● big data (!!)
● real time
● in memory
evaluation technology
impressions
collaborative filtering  500 (+1)
most popular             500
text similarity          500

ZINCRBY impressions 1 collaborative_filtering
ZREVRANGEBYSCORE impressions +inf -inf WITHSCORES
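The same bookkeeping through a client library, here redis-py as an assumed example (the slides only show raw Redis commands):

import redis

r = redis.Redis()

# count one impression for the recommender that served it
# (redis-py >= 3.0: amount comes before the member)
r.zincrby("impressions", 1, "collaborative_filtering")

# recommenders ranked by impression count, highest first
for member, score in r.zrevrangebyscore("impressions", "+inf", "-inf", withscores=True):
    print(member.decode(), int(score))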
evaluation technology
impressions
collaborative filtering  500
most popular             500
text similarity          500

clicks
collaborative filtering  100
most popular             10
...                      1

needs division: CTR = clicks / impressions

ZREVRANGEBYSCORE clicks +inf -inf WITHSCORES
ZREVRANGEBYSCORE impressions +inf -inf WITHSCORES
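A sketch of the division step with the slide's counts; the third recommender is elided on the slide, so it is left out here:

# counts from the slide
impressions = {"collaborative_filtering": 500, "most_popular": 500}
clicks = {"collaborative_filtering": 100, "most_popular": 10}

ctr = {rec: clicks.get(rec, 0) / imps for rec, imps in impressions.items()}
for rec, rate in sorted(ctr.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{rec}: {rate:.2%}")  # 20.00% vs 2.00%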
evaluation results
● CF is "always" the best recommender
● but "always" is just the average over all contexts
let's check the context!
[chart: t = time, s = success]
evaluation context
● our context is limited to the web
● we have URL + HTTP headers (see the sketch below)
○ user agent -> device
○ IP address -> geolocation
○ time -> weekday
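A toy sketch of turning those raw signals into context attributes; the device check is deliberately naive, and a real setup would resolve the IP through a GeoIP database:

from datetime import datetime, timezone

def context_from_request(user_agent: str, ip: str, ts: float) -> dict:
    return {
        "device": "mobile" if "Mobile" in user_agent else "desktop",  # user agent -> device
        "geo": ip,  # IP address -> geolocation (GeoIP lookup in practice)
        "weekday": datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%A").lower(),  # time -> weekday
    }

print(context_from_request("Mozilla/5.0 (iPhone) Mobile", "93.184.216.34", 1375315200))
# {'device': 'mobile', 'geo': '93.184.216.34', 'weekday': 'thursday'}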
evaluation context
we use ~60 context attributes
publisher = welt.de
collaborative filtering  689 (+1)
most popular             420
text similarity          135

weekday = sunday
collaborative filtering  400 (+1)
most popular             200
...                      100

category = archive
text similarity          200
collaborative filtering  10 (+1)
...                      5
evaluation context
ZUNIONSTORE clk 3 p:welt.de:clk w:sunday:clk c:archive:clk WEIGHTS 4 1 1
ZREVRANGEBYSCORE clk +inf -inf WITHSCORES

ZUNIONSTORE imp 3 p:welt.de:imp w:sunday:imp c:archive:imp WEIGHTS 4 1 1
ZREVRANGEBYSCORE imp +inf -inf WITHSCORES
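The same weighted union through redis-py (an assumed client); the key names and the weight of 4 for the publisher context come from the slide:

import redis

r = redis.Redis()

# combine per-context counts; the dict values become the WEIGHTS
r.zunionstore("clk", {"p:welt.de:clk": 4, "w:sunday:clk": 1, "c:archive:clk": 1})
r.zunionstore("imp", {"p:welt.de:imp": 4, "w:sunday:imp": 1, "c:archive:imp": 1})

clicks = dict(r.zrevrangebyscore("clk", "+inf", "-inf", withscores=True))
imps = dict(r.zrevrangebyscore("imp", "+inf", "-inf", withscores=True))

# context-weighted CTR per recommender; the best one serves this request
ctr = {rec: clicks.get(rec, 0.0) / score for rec, score in imps.items() if score}
print(max(ctr, key=ctr.get))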
evaluation context
recap
● added a 3rd dimension: context

result
● better for news: Collaborative Filtering
● better for content: Text Similarity

[chart: t = time, s = success, c = context]
now breathe!
what did we get?
● possibly many recommenders
● know how to measure success
● technology to see success
now breathe!
what is the link to the workshop?
“.. novel, personalization-centric benchmarking approaches to evaluate adaptive retrieval and recommender systems”
● Functional: focus on user-centered utility metrics
● Non-functional: scalability and reactivity
the ensemble
● realtime evaluation technology exists
● to choose the best algorithm for the current context we need to learn
○ multi-armed bayesian bandit
multi-armed bandit
● temporary success?
● the No. 1 recommender is getting most of the traffic
● local minima?
● interested? look for Ted Dunning + Bayesian Bandit (a minimal sketch follows)
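A minimal Thompson-sampling sketch of the bayesian bandit idea above; the click/impression counts are illustrative, not plista's numbers:

import random

# observed clicks / impressions per recommender (illustrative)
stats = {
    "collaborative_filtering": (100, 500),
    "most_popular": (10, 500),
    "text_similarity": (1, 500),
}

def choose_recommender():
    # draw a plausible CTR from each Beta posterior, play the best draw
    draws = {
        rec: random.betavariate(clicks + 1, imps - clicks + 1)
        for rec, (clicks, imps) in stats.items()
    }
    return max(draws, key=draws.get)

print(choose_recommender())  # mostly collaborative_filtering, but not always

The current No. 1 gets most of the traffic (exploitation) while the sampling noise keeps trying the others (exploration), which is what guards against temporary success and local minima.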
the ensemble = better results
● new total / avg is much better
● thx bandit
● thx ensemble

[chart: t = time, s = success]
trial and error
● minimal pre-testing
● no risk if a recommender crashes
● "bad" code might find its context
collaboration
● now plista developers can try ideas
● and allow researchers to do the same
big pool of algorithms
the ensemble is able to choose
a researcher has an idea
.. needs to start a server
... probably hosted by a university, plista, or any cloud provider?
.. API implementation
"message bus"● event notifications
○ impression○ click
● error notifications● item updates
train model from it
plista API
API research
{ // json
  "type": "impression",
  "context": {
    "simple": {
      "27": 418,    // publisher
      "14": 31721,  // widget
      ...
    },
    "lists": {
      "10": [100, 101]  // channel
    }
    ...
  }
}
.. package content
api specs hosted at https://sites.google.com/site/newsrec2013/
long term URL to be announced
plista API
API research
Context + Kind
.. reply to recommendation requests
{ // json
  "recs": {
    "int": {
      "3": [13010630, 84799192]  // 3 refers to content recommendations
    }
    ...
  }
}
generated by researchers, to be shown to real users
api specs hosted at https://sites.google.com/site/newsrec2013/
long term URL to be announced
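A minimal sketch of what the researcher's server might look like, speaking the JSON shapes from the two slides above. The single endpoint, the port, and dispatching on the "type" field are assumptions; the authoritative contract is in the API specs linked above:

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ResearcherHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        if body.get("type") == "impression":
            # event notification: train the model from its context
            publisher = body["context"]["simple"].get("27")  # 27 = publisher
            print("impression on publisher", publisher)
            self.send_response(200)
            self.end_headers()
        else:
            # recommendation request: answer with item ids
            reply = {"recs": {"int": {"3": [13010630, 84799192]}}}  # 3 = content recs
            payload = json.dumps(reply).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)

HTTPServer(("", 8080), ResearcherHandler).serve_forever()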
[diagram: recs flow from the researcher through the API to the real user]
quality is win-win #2
● happy user
● happy researcher
● happy plista

research can profit from
● real user feedback
● a real benchmark
[diagram: recs flow from the researcher through plista to the real user]
quick and fast
● no movies!
● news articles go out of date fast!
● visitors need the recs NOW
● => handle the data very fast

src: http://en.wikipedia.org/wiki/Flash_(comics)
"send quickly" technologies
● fast web server
● fast network protocol
● fast message queue (e.g. Apache Kafka; sketch below)
● fast storage
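For the message-queue leg, a minimal producer sketch with Apache Kafka via the kafka-python client (the client choice, broker address, and topic name are assumptions):

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode(),
)

# hand the event to the queue instead of blocking the web request
producer.send("events", {"type": "click", "item": 13010630})
producer.flush()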
"learn quickly" technologies
● use common frameworks
src http://en.wikipedia.org/wiki/Pac-Man
comparison to plista
"real-time features feel better in a real-time world"
we don't need batch! see http://goo.gl/AJntul
our setup
● PHP, it's easy
● Redis, it's fast
● R, it's well known
Overview
Questions?
Torben
http://goo.gl/pvXm5 (Blog)
http://lnkd.in/MUXXuv
xing.com/profile/Torben_Brodt
www.plista.com
News Recommender Challenge
https://sites.google.com/site/newsrec2013/
#sigir2013 #bars2013 @torbenbrodt @plista @BARSws