department of computer science & engineering university of ...€¦ · user eric has to decide...

RecentAdvancesinRecommenderSystemsandFutureDirec5ons

GeorgeKarypis DepartmentofComputerScience&EngineeringUniversityofMinnesota

1

OVERVIEWOFRECOMMENDERSYSTEMS

2

RecommenderSystems

Recommendersystemshaveemergedasakeyenablingtechnologyforecommerce.

–  Virtualexpertsthatarekeenlyawareofacustomer’spreferences.–  Theyareusedtofiltervastamountsofdatainordertoiden5fyinforma5onthatisrelevanttoauser.

Recommender system Recommendation

3

Datasources

•  Informa5onabouttheitems:–  Productdescrip5ons,productspecifica5ons,ar5clecontent,media

aRributes,etc.

•  Informa5onabouttheusers:–  Profiledatasuchasdemographic,social-economic,andbehavioral

informa5on.–  Social/trustnetworks;eitherimplicitorexplicit.

•  Transac5onalinforma5on:–  Historyofuser-itempurchases.–  Productra5ngs,productreviews.–  Implicitvs.explicitinforma5on.

4

Problems&approaches

•  Ra5ngpredic5on–  Predictthera5ngthatauserwillgivetoa

par5cularitem.

•  Top-Nrecommenda5on–  Iden5fyasetofitemsthattheuserwilllike.

•  Cold-startrecommenda5ons–  Computerecommenda5onsforpreviously

unseenusersand/oritems.

•  Grouprecommenda5ons–  Computerecommenda5onsthatsa5sfya

groupofusers.

•  Assortmentrecommenda5ons–  Recommendanautoma5callyconstructedassortmentofitems:e.g.,vaca5onpackage,ouYit,playlist.

•  Contextawarerecommenda5ons.

•  Non-personalizedmethods–  Rule-basedsystems,popularity-based

models,methodsbasedonglobalmodels.

•  Methodsrelyingonauser’shistoricalprofile–  Predic5vemodelses5matedusingauser’s

individualpreferences.

•  Collabora5vefilteringmethods–  Approachesthatleverageinforma5onfrom

otherusersanditems.

Recommenda7onproblems Recommenda7onapproaches

5

Collabora5veFiltering(CF)

§  Derivesrecommenda5onsbyexploi5ngthe“wisdomofthecrowd”.

§  Itcanbeviewedasaninstanceofmul5-tasklearningandtransferlearning.

§  Itisthemostprominentapproachtoday:–  Usedbylarge,commerciale-commercesites.–  Well-understood,variousalgorithmsandvaria5onsexist.–  Applicableinmanydomains(book,movies,DVDs,..).–  Itcanoperatebothinacontentagnos5candcontent

awareseang.–  Itcanhandlebothexplicitandimplicitpreference

informa5on.

6

OverviewofCFmethods

•  User-,item-,andgraph-basedneighborhoodmethods.–  Lazylearners.

•  Methodsthatusevariousmachinelearningapproachestolearnapredic5vemodelfromhistoricaldata.–  Latent-spacemodelsbasedonmatrix/tensorcomple5on.–  Linearandnon-linearmul5-regressionmodels.–  Probabilis5cmodels.–  Auto-encoder-basedneuralnetworks.–  ….

7

Item-basednearest-neighbor(1)

Thebasictechnique:–  GivenEric(“ac5veuser”)andTitanic(“ac5veitem”)thatErichasnotyet

seen.–  Findasetofitems(nearestneighbors)thatEricalreadyratedsuchthat

otheruserslikedeachoftheminasimilarfashionastheylikedTitanic.

–  UseEric’sra5ngsontheseitemstopredicthowmuchEricwilllikeTitanic.

Fortop-N,dothisforeveryunrateditemandreturntheNitemswiththehighestpredictedra5ng.

2 A Comprehensive Survey of Neighborhood-Based Recommendation Methods 43

by their “interestingness” to u. If the test set is built by randomly selecting, for eachuser u, a single item iu of Iu, the performance of L can be evaluated with the AverageReciprocal Hit-Rank (ARHR):

ARHR.L/ D 1

jUjX

u2U

1

rank.iu;L.u//; (2.5)

where rank.iu;L.u// is the rank of item iu in L.u/, equal to1 if iu 62 L.u/. A moreextensive description of evaluation measures for recommender systems can be foundin Chap. 8 of this book.

2.3 Neighborhood-Based Recommendation

Recommender systems based on neighborhood automate the common principle thatsimilar users prefer similar items, and similar items are preferred by similar users.To illustrate this, consider the following example based on the ratings of Fig. 2.1.

Example 2.1. User Eric has to decide whether or not to rent the movie “Titanic”that he has not yet seen. He knows that Lucy has very similar tastes when it comesto movies, as both of them hated “The Matrix” and loved “Forrest Gump”, so heasks her opinion on this movie. On the other hand, Eric finds out he and Diane havedifferent tastes, Diane likes action movies while he does not, and he discards heropinion or considers the opposite in his decision.

2.3.1 User-Based Rating Prediction

User-based neighborhood recommendation methods predict the rating rui of a useru for a new item i using the ratings given to i by users most similar to u, callednearest-neighbors. Suppose we have for each user v ¤ u a value wuv representingthe preference similarity between u and v (how this similarity can be computed will

TheTitanic

Die ForrestWall-E

Matrix Hard Gump

John 5 1 2 2Lucy 1 5 2 5 5Eric 2 ? 3 5 4

Diane 4 3 5 3

Fig. 2.1 A “toy example” showing the ratings of four users for five movies

Keyassump5ons:•  Itemsbelonginto(overlapping)groupsthatelicitsimilarlikes/dislikes.

• Users’preferencesremainstableandconsistentover5me.

8

Item-basednearest-neighbor(2)

•  Item-basedmethodspre-computeandstorethekmostsimilarotheritemsforeachitem.

•  Thepredic5onscoresarees5matedusingonlythosemostsimilaritems.

•  Similari5es:cosine,Jaccard,correla5oncoeeficient,condi5onalprobability,…

9

rui = f(ru:, s:i) ;ĞŐ rui = ru: s:iͿ

ǁŚĞƌĞ

R Rnm ŝƐ ƚŚĞ ŵĂƚƌŝǆ ŽĨ ŚŝƐƚŽƌŝĐĂů ƌĂƟŶŐƐ ĂŶĚ

S Rmm ŝƐ ƚŚĞ ŝƚĞŵͲŝƚĞŵ ƐŝŵŝůĂƌŝƚǇ ŵĂƚƌŝǆ ƐƚŽƌŝŶŐ ƚŚĞ kŚŝŐŚĞƐƚ ƐŝŵŝůĂƌŝƟĞƐ ĂůŽŶŐ ĞĂĐŚ ƌŽǁ

Low-rankmatrixapproxima5on

Thera5ngmatrixcanbeapproximatedastheproductoftwolow-rankmatrices:

rui = puq0i

Thislow-rankdecomposi5oncanbeusedtocomputethera5ngsforunseenitemsas:

10

Interpreta5onoflatentfactors

Thereisalowdimensionalfeaturespace(whosedimensionalityistherankofthedecomposi5on)onwhichbothusersanditemscanbeembeddedin.Latent variable view

Geared towards females

Geared towards males

serious

escapist

The PrincessDiaries

The Lion King

Braveheart

Lethal Weapon

Independence Day

AmadeusThe Color Purple

Dumb and Dumber

Ocean’s 11

Sense and Sensibility

Gus

Dave

Latent factor models

•  Theitemvectorcanbethoughtofascapturinghowmuchofeachofthelatentfeaturestheitempossess.

•  Theuservectorcanbethoughtofascapturingtheuser’spreferencefortheselatentfeatures.

11

Latentfactorsviamatrixcomple5on

•  Es5matethelatentfactormatricesbasedonlyontheobservedentriesofthematrix.

•  Thisisanon-convexop5miza5onproblemandcanconvergetoalocalminima.

>Ğƚ ďĞ ƚŚĞ ƐĞƚ ŽĨ ŽďƐĞƌǀĞĚ ĞŶƚƌŝĞƐ ŝŶ ƚŚĞ ƌĂƟŶŐ ŵĂƚƌŝǆ RdŚĞ P ĂŶĚ Q ĨĂĐƚŽƌƐ ĂƌĞ ĞƐƟŵĂƚĞĚ ďǇ ƐŽůǀŝŶŐ ƚŚĞ ĨŽůůŽǁŝŶŐŽƉƟŵŝǌĂƟŽŶ ƉƌŽďůĞŵ

ŵŝŶŝŵŝǌĞP,Q

=

(u,i)

(rui puqi)

2.

12

Improvingaccuracyoflatentfactormodels

Theperformanceofthera5ngpredic5oncanbeimprovedby

–  Explicitlymodelingglobal,user,anditembiases•  Biases:thebaseline/expectedra5ngsforallitemsbyallusers,fortheitemsratedbyauser,andfortheusersforagivenitem.

–  Reducingmodelover-fiangviaregulariza5on:

ŵŝŶŝŵŝǌĞP,Q,µ,b∗

=!

(u,i)∈Ω

(rui−µ− bu− bi− puq′i)

2+λ(µ+ ||b∗||22+ ||P ||2F + ||Q||2F ).

dŚĞ ĞƐƟŵĂƚĞĚ ƌĂƟŶŐ ĨŽƌ ƵƐĞƌ u ŽŶ ŝƚĞŵ i ŝƐ ŐŝǀĞŶ ďǇrui = µ + bu + bi + pq

13

Recenttrends

•  Theboundariesbetweentradi5onalneighborandlatentfactormodelshavebecomelessseparated.

•  Neighborschemesrelyonthelatentspacetocomputeitem-itemsimilari5es.

•  Latentfactoriza5onmethodsrelyonsimilaritemstoderivealatentrepresenta5onofauser.

R ≈ PQ′, Ɛŝŵ(i, j) = qiq′j .

14

rui = µ+ bu + bi +

⎛

⎝ 1

|U|α∑

j∈Uruiqj

⎞

⎠ q′i.

ASMALLDETOUR

15

•  Chemicalgene5csisapromisingapproachforstudyingbiologicalsystems.

•  Ithasanumberofkeyadvantagesoverapproachesbasedonmoleculargene5cs:–  smallmoleculescanworkrapidly,–  theirac5onisreversible,–  canmodulatesinglefunc5onsofmul5-func5onproteins,–  candisruptprotein-proteininterac5ons,and–  ifthetargetispharmaceu5calrelevant,itcanleadtothediscoveryofnewdrugs.

Chemical Gene,cs (Genomics): The research field that is designed to discover and synthesizeprotein-binding small organicmolecules that can alter the func5on of all the proteins and usethemtostudybiologicalsystems.

(Na5onalIns5tuteofGeneralMedicalSciences)

Chemicalgene5cs(genomics)

16

RS&CG–Datasimilari5es

Target-ligandac7vitymatrix User-itemra7ngmatrix

Target-intrinsicinforma5on•  sequence,structure,homology,etc.

User-intrinsicinforma5on•  profile,socialnetwork,etc.

Ligand-intrinsicinforma5on•  chemicalstructure,pharmacophores,etc.

Item-intrinsicinforma5on•  descrip5ons,content,specifica5on,aRributes,etc.

Ac5vityinforma5on•  IC50,dose-response,etc.

Transac5oninforma5on•  ra5ng,view,review,etc.

17

RS&CG–Similarproblems

•  Ac5vitypredic5on=>Ra5ngpredic5on

•  Secondaryscreeninglibrarydesign=>Top-Nrecommenda5on

•  Librarydesignforanoveltarget=>Cold-startrecommenda5on

•  Denovocompounddesign=>Assortmentrecommenda5on

18

RS&CG–Similarprinciples

•  Ligandbindingisaprocessthatinvolves:–  Structureofprotein’sbindingsiteandthestructureoftheligand.–  Non-covalentinterac5onsbetweentheatomsoftheprotein’s

bindingsiteandthatoftheligand.

•  Asaresult:–  Thesameligandwillbindtosimilartargets.–  Similarligandswillbindtothesametarget.–  Similarligandswillbindtosimilartargets.

•  Thesearethesameprinciplesandassump5onsbehindrecommendersystems.

19

RS&CG–Similarmethods

•  Target-specificStructure-Ac5vityRela5onship(SAR)models.–  Thebiologicalac5vityofachemical

compoundismathema5callyexpressedasafunc5onofitschemicalstructure.

•  Chemogenomicapproaches.–  Proteinsofthesamefamilytendto

bindtoligandswithcertaincommoncharacteris5cs.

•  Mul5-tasklearningapproaches.–  Modelsthates5materela5ons

betweenprotein-andligand-derivedfeatures.

•  Personalizedcontent-basedrecommendersystems.–  Theuser’spasttransac5onsare

usedtoderivecontent-basedmodelsforlike/dislikepredic5on.

•  Cluster-basedrecommendersystems.–  Content-basedmodelsforsimilar

groupsofusers.

•  Collabora5vefilteringapproacheswithsideinforma5on.

20

RECENTDEVELOPMENTSINITEM-BASEDAPPROACHES

21

Es5ma5ngthesimilaritymatrix

Goal:Insteadofusingapre-determinedfunc5ontocomputetheitem-itemsimilari5es,“learn”themdirectlyfromthedata.

Learningproblemformula5on:Es5matealinearmodeltopredicteachitembasedonthehistoricalinforma5onoftheotheritems.

>Ğƚ wi ďĞ ƚŚĞ ůŝŶĞĂƌ ŵŽĚĞů ĨŽƌ ŝƚĞŵ i ůĞƚ R¬i ďĞ ƚŚĞ n (m 1)ƐƵďͲŵĂƚƌŝǆ ŽĨ R ŽďƚĂŝŶĞĚ ďǇ ƌĞŵŽǀŝŶŐ ĐŽůƵŵŶ i ;ŝĞ ƚŚĞ ŚŝƐƚŽƌŝĐĂůŝŶĨŽƌŵĂƟŽŶ ĨŽƌ ŝƚĞŵ iͿ ĂŶĚ ůĞƚ r:i ďĞ ƚŚĞ iƚŚ ĐŽůƵŵŶ ŽĨ RdŚĞ wi ŝƐ ĞƐƟŵĂƚĞĚ ĂƐ

ŵŝŶŝŵŝǌĞwi

1

2||r:i R¬iwi||2 + ƌĞŐ(wi)

.

dŚŝƐ ŵŽĚĞů ďĞĐŽŵĞƐ ƚŚĞ iƚŚ ĐŽůƵŵŶ ŽĨ ƚŚĞ ŝƚĞŵͲŝƚĞŵ ƐŝŵŝůĂƌŝƚǇ ŵĂͲƚƌŝǆ S

r:i

R

22

SLIM–SparseLinearMethod

•  Themodel(S)ises5matedas:

•  Goodperformanceisopenachievedwith50-200non-zerospercolumn.–  Lowstoragerequirementsandlowrecommenda5on5me.

•  Themodelcanbees5matedefficiently:–  EachcolumnofScanbecomputedindependently.–  Thesolu5onofacolumncan“warmstart”othersimilarcolumns.–  Regulariza5onspacecanbeexploredefficiently.–  Itemneighborscanbeusedforini5al“feature”selec5onandrestrictsparsity

structure.

ThisisaStructuralEqua5onModel(SEM)withnoexogenousvariables.Itcanalsobeviewedasanetworkinferencemodel.

ŵŝŶŝŵŝǌĞS

1

2||R RS||2F +

2||S||2F + ||S||1

ƐƵďũĞĐƚ ƚŽ S 0

ĚŝĂŐ(S) = 0

23

SLIM–Performance

HitRate@10

AverageReciprocalHitRate

Hit Rate (HR) @ N =tp

N

2 A Comprehensive Survey of Neighborhood-Based Recommendation Methods 43

by their “interestingness” to u. If the test set is built by randomly selecting, for eachuser u, a single item iu of Iu, the performance of L can be evaluated with the AverageReciprocal Hit-Rank (ARHR):

ARHR.L/ D 1

jUjX

u2U

1

rank.iu;L.u//; (2.5)

where rank.iu;L.u// is the rank of item iu in L.u/, equal to1 if iu 62 L.u/. A moreextensive description of evaluation measures for recommender systems can be foundin Chap. 8 of this book.

2.3 Neighborhood-Based Recommendation

Recommender systems based on neighborhood automate the common principle thatsimilar users prefer similar items, and similar items are preferred by similar users.To illustrate this, consider the following example based on the ratings of Fig. 2.1.

Example 2.1. User Eric has to decide whether or not to rent the movie “Titanic”that he has not yet seen. He knows that Lucy has very similar tastes when it comesto movies, as both of them hated “The Matrix” and loved “Forrest Gump”, so heasks her opinion on this movie. On the other hand, Eric finds out he and Diane havedifferent tastes, Diane likes action movies while he does not, and he discards heropinion or considers the opposite in his decision.

2.3.1 User-Based Rating Prediction

User-based neighborhood recommendation methods predict the rating rui of a useru for a new item i using the ratings given to i by users most similar to u, callednearest-neighbors. Suppose we have for each user v ¤ u a value wuv representingthe preference similarity between u and v (how this similarity can be computed will

TheTitanic

Die ForrestWall-E

Matrix Hard Gump

John 5 1 2 2Lucy 1 5 2 5 5Eric 2 ? 3 5 4

Diane 4 3 5 3

Fig. 2.1 A “toy example” showing the ratings of four users for five movies

24

SLIM–Long-tailperformance

Ra5ngDistribu5oninML10M

HR@10inML10MLongTail

ARHRinML10MLongTail

25

SLIMextensions

•  Low-rankconstraintsonSasanalternatewaytocontrolmodelcomplexity.–  Eitherviatheproductoftwolowrankmatricesorbyminimizing

thenuclearnormofS.•  Incorpora5onofitemsideinforma5on.•  Incorpora5onofdifferentcontexts.•  Incorpora5onoftemporalinforma5on.•  Higher-orderregressionmodels.•  FusionofglobalandlocalSLIMmodels.

ŵŝŶŝŵŝǌĞS

1

2||R RS||2F +

2||S||2F + ||S||1

ƐƵďũĞĐƚ ƚŽ S 0

ĚŝĂŐ(S) = 0

26

Factoredsimilaritymatrix

•  SLIM(andtradi5onalitem-itemapproaches)cannotcompute/learnrela5onsbetweenpairsofitemsthatarenotco-rated.–  Thees5matedsimilari5esforsuchpairofitemswill

alwaysbe0.

•  Theycannotproducemeaningfulrecommenda5onsthatrelyontransi5vitywithintheitem-itemsimilaritygraph.

•  Canleadtopoorrecommenda5ons,especiallyforsparserdatasets,inwhichtherearefewpairsofco-rateditems.

r:i

R

27

FISM–FactoredSLIM

0

10

20

30

40

50

60

Sparse(-1) Sparser(-2) Sparsest(-3)

Percen

tageIm

provem

ent

ML100K NeUlix Yahoo

Performanceoverbestcompe5ngmethodamong(SLIM,itemKNN,PureSVD,BPRMF,BPRkNN)

rui =X

j2Ru

rujpjqTi

ŵŝŶŝŵŝǌĞP,Q

!1

2||R−RPQT ||2F +

β

2(||P ||2F + ||Q||2F )

"

Sincetheoverallproblemisnotconvex,op5miza5onandsearchovertheregulariza5onspacebecomesconsiderablymoreexpensive.ScalableapproachesrelyonSGD.

28

Learningfromhigher-orderrela5ons

l  Ifacustomerbuysacertaingroupofitems,theyaremore(orless)likelytobuysomeotheritems.

l  Thejointdistribu5onofasetofitemscanbedifferentfromthedistribu5onsoftheindividualitemsintheset.

l  Higherordermodelsareneededtocapturesuchrela5ons.

29

HOSLIM—Higher-orderSLIM

•  KeyIdea:Usefrequentitemsetstocapturehigherorderrela5ons.•  Two-stepapproach:

–  Givenauser-itempurchasematrixR,findallfrequentitemsetsandcreateauser-itemsetmatrixRf.

–  LearntwosimilaritymatricesSandSfthatcaptureitem-itemanditemset-itemrela5ons.

•  Predictnewitemsbycombininginforma5onfromSandSf.

Modeles5ma5on:

Predic5on: rui = ru: s:i + rfu: s

f:i

ŵŝŶŝŵŝǌĞS

1

2||R RS RfSf ||2F +

2(||S||2F + ||Sf ||2F ) + (||S||1 + ||Sf ||1)

ƐƵďũĞĐƚ ƚŽ S 0, Sf 0,

ĚŝĂŐ(S) = 0, ĂŶĚ sfji = 0 ĨŽƌ i Ij

30

HOSLIMperformance

Considerablegainscanbeobtainedfordatasetsthathavesuchcharacteris5cs.

31

Onemodeldoesnotfitall

•  Initem-basedmethods,personaliza5onoccursbasedontheitemsthattheuserhaspreviouslyactedon.–  The“recommenda5ons”thattheseitemstriggerarenotspecifictoeachuserbutareglobal.

•  BeRerperformancecanpoten5allybeachievedbyhavinguser-specificitem-basedmodels.

32

GLSLIM–FusionofglobalandlocalSLIMmodels

ŵŝŶŝŵŝǌĞS,S1,...,SK ,c,g

1

2

ui

(rui rui)2 +

2(||S||2F +

k

||Sk||2F ) + (||S||1 +

k

||Sk||1)

ƐƵďũĞĐƚ ƚŽ S 0, Sk 0 ĨŽƌ k = 1, . . . , K,

ĚŝĂŐ(S) = 0, ĚŝĂŐ(Sk) = 0 ĨŽƌ k = 1, . . . , K,

u, 0 gu 1, ĂŶĚu, cu 1, . . . , K.

globalmodel

localmodel

user’smembership

rui =!

j∈Ru

ruj"gusji + (1− gu)s

cuji

#

Op5mizedusingalternateop5miza5onbetweensolvingforthevariousSLIMmodelsandsolvingfortheclusteringandusermembership.

33

S1

S2

S


15%

11%

1%

7%

18%

12%

6%8%

0%

5%

10%

15%

20%

groceries(65) ml10m(20) flixter(5) jester(60)

ImprovementofGL-SLIMoverSLIM

HR ARHR

34


Smartini5aliza5onreducesthenumberofitera5onsbutdoesnotsignificantlyimpactthefinalsolu5on.

35

WRAPPINGUP

36

Currentstateoftheart

•  Ra5ngpredic5on–  Factoriza5on-basedapproachesarehighlyeffec5ve:

• Goodpredic5onperformance,fasttraining5me,theycanincorporatesideinforma5on,diverseobjec5ves,etc.

•  Top-Nrecommenda5onmethods–  Noclearwinningstrategyhasemerged.

•  Item-basedmethodsoutperformsignificantlymoresophis5catedmethods.

–  Thereissignificantongoingresearchonrankinglossfunc5ons.

37

Scalingup

•  Es5mateonlyitemfactorsfromasubsetofusers&computeuserfactorsonthefly.

•  Warm-startthesearchovertheregulariza5onspace:–  ForMFuseSVDtoeliminatebadlocalminima.–  ForSLIMuseprevioussolu5onsand/orsolu5onsforsimilaritems.

•  Sampletheusers.Thegoalistogetareliablesetofitem-basedmodels:–  Itemfactorsoritem-itemmodels.

38

Futuredirec5ons

•  Deeppersonaliza5on–  Context,loca5on,5me,etc.

•  Behaviorsteering–  Acquiringtaste,modelingstate,futurebenefits,etc.

•  Contentcrea5on–  Packagerecommenda5on,ar5clegenera5on,etc.

•  Liveevalua5ons– OpenA/Btes5ngplaYorms.

39

Finalwords…

•  Recommendersystemshaveextensiveapplica5ons.–  Bothcommercial,scien5fic,andsocietalapplica5ons.

•  Therearealreadyhigh-qualitysopwareimplementa5onsofmanyofthesealgorithms.–  Theycanbeusedasis,orusedtoquicklyexperimentwithnewmodelingapproachesanddatasources.

•  Mul5-tasklearningunderliesmanyofthelearningmethods.–  Averyac5veareaofresearchwithbroadapplica5ons.

•  Thefieldisripefornewmethodologicaladvancesthatwillgetustothenextlevel.

40

Acknowledgments

Students(current&former)§  XiaNing§  EvangeliaChristakopoulou§  SantoshKabbur§  AsmaaElBadrawy§  AgoritsaPolyzou

Funding§  NSF,ARO,NIH,PayPal,Samsung,Intel.References:

Recommender Systems Handbook, 2015 http://www.springer.com/us/book/9781489976369

Papers at hRp://www.cs.umn.edu/~karypis

41

department of computer science & engineering university of ...€¦ · user eric has to decide...

Documents