


HERS: Modeling Influential Contexts with Heterogeneous Relations for Sparse and Cold-start Recommendation

Liang Hu1,2, Songlei Jian1,3, Longbing Cao1, Zhiping Gu4, Qingkui Chen2, Artak Amirbekyan5
1University of Technology Sydney, Australia

2University of Shanghai for Science and Technology, China, 3National University of Defense Technology, China
4Shanghai Technical Institute of Electronics Information, China, 5Commonwealth Bank of Australia

[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

Classic recommender systems face challenges in addressing the data sparsity and cold-start problems by modeling only the user-item relation. An essential direction is to incorporate and understand additional heterogeneous relations, e.g., user-user and item-item relations, since each user-item interaction is often influenced by other users and items, which form the user's/item's influential contexts. This induces important yet challenging issues, including modeling heterogeneous relations, interactions, and the strength of the influence from users/items in the influential contexts. To this end, we design Influential-Context Aggregation Units (ICAUs) to aggregate the user-user/item-item relations within a given context into influential context embeddings. Accordingly, we propose a Heterogeneous relations-Embedded Recommender System (HERS) based on ICAUs to model and interpret the underlying motivation of user-item interactions by considering user-user and item-item influences. The experiments on two real-world datasets show the highly improved recommendation quality achieved by HERS and its superiority in handling the cold-start problem. In addition, we demonstrate the interpretability of modeling influential contexts in explaining the recommendation results.

Introduction

Recommender systems (RSs) are essentially embedded with heterogeneous relations: user-user couplings, item-item couplings, and user-item couplings (Cao 2015), which we call Heterogeneous relations-Embedded Recommender Systems (HERSs). It is increasingly recognized that modeling such multiple heterogeneous relations is essential for understanding the non-IID nature and characteristics of RSs (Cao 2016) and for addressing challenges such as social, group-based, context-based, cross-domain, sparse, cold-start, dynamic and deep recommendation (Tang, Hu, and Liu 2013; Hu et al. 2014; Hu et al. 2017b; Hu et al. 2016; Do and Cao 2018a; Do and Cao 2018b; Zhang et al. 2018).

Figure 1 illustrates the concept and motivation of an HERS containing user-user relations (e.g., friendship), item-item relations (e.g., the same category), and user-item relations (e.g., users' purchases of or feedback on items). Given a user ut in the user social network, her/his selections are often influenced by friends (Friedkin and Johnsen 2011). Moreover,

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: An HERS consists of three heterogeneous relations: user-user, item-item, and user-item. Each user's choice is relevant to the corresponding user's and item's influential contexts.

the influences from friends' friends may transitively influence ut's choice. Cut signifies the influential context w.r.t. ut, which outlines ut's influential neighbors and the relationships between them. The users in Cut have considerable influence on ut's selection. Similarly, user selection on an item it is also influenced by it's relevant items, which form it's influential context Cit. For example, we can infer that ut more probably selects it (Android watch) than is (Apple watch) if we consider the corresponding influential contexts Cut, Cit and Cis, i.e., the choices of ut's friends and the compatibility of electronic products. This example shows that the influential contexts of users and items indicate how a user's choice of items is made, thus making recommendation more accurate and interpretable. The viewpoint of HERSs and influential contexts can further address the sparsity of user-item interactions and both the user and item cold-start problems by referring to the influential users and items in the contexts.

Modeling an HERS with influential contexts involves various challenges. In this work, we focus on two major ones: (1) How to model a user's and an item's influential contexts behind an observed user-item interaction? (2) How to learn the strength of influence from different users or items in different contexts? These two questions are important and challenging because the influential contexts differ across user-item interactions, and the influence of each user/item within one context also differs.


Some relevant work includes social RSs (Tang, Hu, and Liu 2013; Jiang et al. 2014; Hu et al. 2017a), which combine collaborative filtering (CF) and social network analysis (Vasuki et al. 2010; Tang et al. 2012) by co-factorizing the rating matrix and the social-relation matrix (Ma et al. 2008; Jamali and Ester 2010) or by adding a regularization term according to social closeness (Ma et al. 2011) in matrix factorization (MF). However, they only consider user relationships and ignore the influence of relevant items; they also model only first-order influences from neighbors and fail to consider higher-order influences from indirect users. Factorization machine (FM) (Rendle 2012) represents multiple relations with a design matrix (Rendle 2013). However, using a single design matrix for all relations implicitly assumes the homogeneity of these relations. FM is unable to model the strengths of influence from the same users/items in different influential contexts w.r.t. different target users/items. Recent work on non-IID RSs includes modeling user/item couplings in explicit user/item metadata and latent user/item factors by coupled MF (Li, Xu, and Cao 2015), coupled Poisson factorization (Do and Cao 2018a), and the deep model CoupledCF (Zhang et al. 2018).

In this paper, we design Influential-Context Aggregation Units (ICAUs) to aggregate all user/item influences in a context into an embedding, namely the Influential Context Embedding (ICE). Taking ICAUs as the building blocks, we construct an HERS that considers both the user's and the item's influential contexts when making recommendations. The main contributions of this work include:

• An HERS framework is proposed to model the heterogeneous relations in recommendation tasks by considering both the user's and the item's influential contexts.

• As the building block of the HERS framework, ICAUs are designed to aggregate all users'/items' influences in a context into an ICE as the representation of this influential context.

• The ICAU-based HERS provides interpretability of recommended results in terms of measuring the strength of influence from relevant users and items.

We conduct extensive experiments on two real datasets with heterogeneous relations. The results show the effectiveness of our method in recommendation quality and its superiority in handling both user and item cold-start and sparsity problems. In addition, we visualize the learned influential contexts to interpret the recommendations.

Related Work

We mainly review work that incorporates the user-user, user-item and item-item coupling relationships in RSs (Cao 2016). With regard to the user-user relation, classic RSs consider social relationships by extending CF models: modeling social network relations (Tang, Hu, and Liu 2013); SoRec (Ma et al. 2008), which co-factorizes the user-item matrix and the user-user social relation matrix by sharing a common user latent factor matrix; and SocialMF (Jamali and Ester 2010) and SoReg (Ma et al. 2011), which regularize the latent factor vector of a target user via the latent factor vectors of their trusters. However, social RSs only incorporate additional information from the user side and ignore the relevance between items.

Several recent works jointly model multiple relations. Collective Matrix Factorization (CMF) (Singh and Gordon 2008) represents these relations as a collection of coupled matrices that share latent factors along the coupled dimensions. Then, CMF conducts co-factorization on these coupled matrices to learn the latent factor matrices. Factorization Machine (FM) (Rendle 2012) factorizes the interactions between each pair of features via a design matrix to represent multiple relations (Rendle 2013). However, the above models assume the relations are homogeneous and ignore the heterogeneous interactions within and between users and items. These factorization methods only model the first-order influences from direct neighbors and fail to consider higher-order influences in a context.

More recent work models both explicit and implicit user-user, item-item, and user-item couplings (Cao 2015; Cao 2016). Coupled Matrix Factorization (Li, Xu, and Cao 2015) involves user couplings and item couplings in MF. Coupled Poisson Factorization (Do and Cao 2018a) integrates user/item metadata and their relations into Poisson Factorization for large and sparse recommendation. The dynamic matrix factorization model mGDMF (Do and Cao 2018b) involves a conjugate Gamma-Poisson model that incorporates metadata influence to effectively and efficiently model massive, sparse and dynamic recommendations. Deep neural models have also been incorporated into RSs: (Wang et al. 2017) extended Neural Collaborative Filtering (NCF) (He et al. 2017) to cross-domain social recommendations, and the Neural Social Collaborative Ranking (NSCR) model based on NCF (Wang et al. 2017) seamlessly integrates user-item interactions from the information domain and user-user relationships from the social domain. The Neural Factorization Machine (NFM) extends FM with neural networks by adding multiple hidden layers to learn non-linear interactions. CoupledCF (Zhang et al. 2018) learns explicit and implicit user-item couplings for deep collaborative filtering. However, all these methods only model pairwise interactions instead of all influences in the influential contexts. Moreover, they cannot tell the strength of influence from each user or item.


Problem Formulation and Model Architecture

In this section, we formulate the influences of multiple heterogeneous relations in RSs, and then present the framework for modeling influential user/item contexts.

Problem Formulation

In this paper, we aim to build an HERS which exploits the information learned from the user-user, user-item and item-item


Figure 2: The architecture of HERS for modeling user-item interaction with the user's and item's influential contexts

relations for more effective recommendation. The architecture of the proposed HERS model is illustrated in Figure 2.

Let u and U = {u1, u2, · · ·, u|U|} denote a user and the whole user set. RU denotes the user-user relation. Let i and I = {i1, i2, · · ·, i|I|} denote an item and the whole item set. RI denotes the item-item relation. The user-item interactions are denoted as RUI = {yu,i | u ∈ U, i ∈ I}, and yu,i could be explicit feedback, e.g., ratings, or implicit feedback, e.g., clicks. Generally, each user-item interaction yu,i is not only decided by user u and item i but also by other users and items in the influential context (Cao 2016). Hence, we formally define a user's and an item's influential contexts to model the user-item interactions.

Definition 1. User Influential Context (UIC): Given a target user u, the UIC denotes Cu = {Uu, Ru}, where Uu = {u, Ucu} consists of target user u and all influential users relevant to u, and Ru denotes the user relationships over Uu.

Definition 2. Item Influential Context (IIC): Given a target item i, the IIC denotes Ci = {Ii, Ri}, where Ii = {i, Ici} consists of target item i and all influential items relevant to i, and Ri denotes all the item relationships over Ii.

Interaction Score Decomposition: Each user-item interaction yu,i can be measured by a score function s in terms of the UIC Cu and the IIC Ci; formally, s : s(Cu, Ci, yu,i) → s⟨Cu,Ci⟩. According to Definitions 1 and 2, the overall interaction score s⟨Cu,Ci⟩ can be decomposed into four scores:

s⟨Cu,Ci⟩ = λ1 s⟨u,i⟩ + λ2 s⟨u,Ici⟩ + λ3 s⟨Ucu,i⟩ + λ4 s⟨Ucu,Ici⟩ (1)

where s⟨u,i⟩ scores user u's preference on item i; s⟨u,Ici⟩ scores u's preference on the influential items Ici; s⟨Ucu,i⟩ scores the relevant users' preference on i; and s⟨Ucu,Ici⟩ scores the subsidiary preference between influential users and influential items. λ1, λ2, λ3, λ4 are the scale parameters for weighing these scores.

Model Architecture

The architecture of HERS with the user's and item's influential contexts is illustrated in Figure 2, which consists of five components: User Representer EU, UIC Aggregator AU, Item Representer EI, IIC Aggregator AI and User-item Interaction Scorer SUI. Given a target user ut with the corresponding UIC Cut, and a target item it with the corresponding IIC Cit:

• User Representer EU: it maps target user ut and its influential users in the UIC to the corresponding user embeddings, i.e., EU(Uut) → Eut where Eut = {et, e1, · · ·, eM}.

• Item Representer EI: it maps target item it and its influential items in the IIC to the corresponding item embeddings, i.e., EI(Iit) → Eit where Eit = {vt, v1, · · ·, vN}.

• UIC Aggregator AU: it learns a representation rUt for the influential context Cut, namely the influential context embedding (ICE). Formally, we have AU(Cut, Eut) → rUt.

• IIC Aggregator AI: it learns it's ICE by aggregating the influential context Cit, that is, AI(Cit, Eit) → rIt.

• User-item Interaction Scorer SUI: it learns to score the interaction strength between the target user-item pair ⟨ut, it⟩ in terms of the user ICE rUt and the item ICE rIt, namely SUI(rUt, rIt, yut,it) → s⟨Cu,Ci⟩ (cf. Eq. 1).

This architecture can be implemented with many concrete methods, e.g., a mixture model. In this paper, we implement it with neural networks, which have proved highly effective and efficient in recent years.

Neural ICAU-based HERS Model

In this section, we present the details of a neural model that implements the HERS architecture illustrated in Figure 2. Specifically, we design the neural ICAUs to learn ICEs as the core component of this HERS model.

Influential-Context Aggregation Unit

The ICAUs aim to aggregate the user embeddings Eut or the item embeddings Eit in a context into an ICE according to the strength of influence from each user or item. Figure 3 demonstrates an ICAU for aggregating user embeddings Eut, which consists of a two-stage aggregation: S1 and S2.

S1: This stage outputs the subsidiary influence embedding ct through an aggregation function h(·) over the influential users' embeddings:

{α1, · · ·, αK} = a(e1, · · ·, eK) (2)
ct = h(e1, · · ·, eK | α1, · · ·, αK) (3)

where αi denotes the influential strength modeled by a function a(·).

S2: This stage generates the ICE by aggregating the subsidiary influence embedding ct and the target embedding et through a gate function f(·):

g = f(ct, et) (4)
rt = g ct + (1 − g) et (5)

In an ICAU, h and f could be any linear or non-linear functions. In this work, we implement h by multilayer neural networks in terms of the attention mechanism, and f with a gate neural network. Note that ICAUs can be used in cascade to aggregate higher-order ICEs, as presented in the next subsection.
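The two stages above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: a toy dot-product attention stands in for a(·)/h(·) and a toy sigmoid gate for f(·) (the paper's actual attention and gate networks are given in the next subsections), and the parameters w_a and W_g are hypothetical.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def icau(E, e_t, w_a, W_g):
    """Two-stage Influential-Context Aggregation Unit (sketch).

    E   : (K, L) embeddings of the K influential neighbors
    e_t : (L,)   embedding of the target user/item
    w_a : (L,)   toy attention parameters standing in for a(.)
    W_g : (2L,)  toy gate parameters standing in for f(.)
    """
    # S1: attention weights alpha_1..alpha_K, then aggregate to c_t (Eqs. 2-3)
    alpha = softmax(E @ w_a)          # influential strengths
    c_t = alpha @ E                   # subsidiary influence embedding

    # S2: gate g blends c_t with the target embedding e_t (Eqs. 4-5)
    g = 1.0 / (1.0 + np.exp(-(np.concatenate([c_t, e_t]) @ W_g)))
    r_t = g * c_t + (1.0 - g) * e_t   # influential context embedding (ICE)
    return r_t, alpha, g

rng = np.random.default_rng(0)
K, L = 5, 8
r, alpha, g = icau(rng.normal(size=(K, L)), rng.normal(size=L),
                   rng.normal(size=L), rng.normal(size=2 * L))
assert r.shape == (L,) and abs(alpha.sum() - 1.0) < 1e-9 and 0.0 < g < 1.0
```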


Figure 3: Influential-Context Aggregation Unit (ICAU): A two-stage aggregation model to construct the ICE

User's Influential Context Embedding

Given a user influential context Cut = {Uut, Rut} (cf. Definition 1), Uut consists of the target user ut, her/his first-order influential neighbors {ut,m} (1 ≤ m ≤ M), and each ut,m's neighbors {ut,m,k} (1 ≤ k ≤ Km), i.e., the second-order influential neighbors of ut, according to the relationships Rut.

To learn the ICE from the UIC Cut, we first employ the User Representer EU to extract the user embeddings for all users in Uut. The embedding for a first-order neighbor ut,m is denoted as et,m, and the embedding for a second-order neighbor ut,m,k is denoted as et,m,k. To generate the ICE for Cut, we construct an ICAU-based two-level tree-like model as the UIC Aggregator AU to recursively embed both second-order and first-order neighbors' influences. Firstly, the second-level ICAUs generate the ICE rt,m w.r.t. each first-order neighbor ut,m. Then, the first-level ICAU recursively generates the target ICE rUt by aggregating these first-order ICEs {rt,m}. More details are presented as follows.

The Second-Level ICAUs: Taking a first-order influential neighbor ut,m of target user ut as an example, we present how to implement an ICAU with neural networks for generating the ICE rt,m w.r.t. ut,m in two stages.

S1: We adopt the attention mechanism to model the influential strength of each neighbor ut,m,k of ut,m. Specifically, we construct a three-layer attention neural network to weight the influence according to the user embedding et,m,k. First, et,m,k is projected into hidden units by a tanh layer to capture nonlinear interactions:

ht,m,k = tanh(W(1) et,m,k + b) (6)

where the weight matrix W(1) ∈ R^{L×L}; we omit the bias term b in the following equations for concision.

Then, the normalized influence for each neighbor is scored by the softmax function, softmax(xk) = e^{xk} / Σj e^{xj}:

αUt,m,k = softmax(isrθ(W(2) ht,m,k)) (7)

where the weight matrix W(2) ∈ R^{1×L} and isrθ is an inverse square root unit which is defined as follows:

isrθ(x) = x / √(1 + θx²) (8)

where θ is the parameter which decides the range of isrθ(·). A large θ makes the upper and lower bounds close to 0; as a result, the softmax tends to output uniform weights. On the contrary, with a small θ the softmax tends to output a single large weight.

Further, the subsidiary influence embedding ct,m is aggregated from the influential neighbors' embeddings {et,m,k} according to their influence strengths {αUt,m,k}:

ct,m = Σ_{k=1..Km} αUt,m,k et,m,k (9)

S2: The ICE rt,m w.r.t. the first-order user ut,m is calculated as follows:

rt,m = gt,m ct,m + (1 − gt,m) et,m (10)

where gt,m measures the influence strength from the second-order neighbors. The influential gate gt,m is modeled by a gate neural network:

gt,m = σ(isrθ(W(3) tanh(W(4) ct,m + W(5) et,m))) (11)

where σ(z) = 1/(1 + e^{−z}), W(4), W(5) ∈ R^{L×L} and W(3) ∈ R^{1×L}.

The First-Level ICAU: When we obtain the ICEs {rt,m} w.r.t. all first-order influential neighbors, another ICAU at the first level is used to learn the target ICE:

cUt = Σ_{m=1..M} αUt,m rt,m (12)

where αUt,m denotes the influence strength w.r.t. the first-order ICE rt,m, which is calculated by another three-layer attention network of the same form as Eqs. 6 and 7. Then, we get the ICE rUt w.r.t. the target user ut:

rUt = gUt cUt + (1 − gUt) et (13)

where gUt is learned by a gate neural network with the same structure as Eq. 11.
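The two-level tree aggregation of Eqs. 9-13 can be sketched as below. This is a minimal NumPy illustration under stated assumptions: toy dot-product attention and toy sigmoid gates stand in for the attention and gate networks of Eqs. 6-7 and 11, and the parameter vector w is hypothetical.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
L, M, K = 8, 3, 4                       # embedding size, first-/second-order neighbor counts
e_t = rng.normal(size=L)                # target user u_t
E1 = rng.normal(size=(M, L))            # first-order neighbor embeddings e_{t,m}
E2 = rng.normal(size=(M, K, L))         # second-order neighbor embeddings e_{t,m,k}
w = rng.normal(size=L)                  # toy attention parameters (stand-in for Eqs. 6-7)

# Second-level ICAUs: one ICE r_{t,m} per first-order neighbor (Eqs. 9-11)
R1 = np.empty((M, L))
for m in range(M):
    alpha = softmax(E2[m] @ w)          # influence of each second-order neighbor
    c = alpha @ E2[m]                   # subsidiary influence embedding c_{t,m} (Eq. 9)
    g = sigmoid(c @ E1[m])              # toy gate (stand-in for Eq. 11)
    R1[m] = g * c + (1 - g) * E1[m]     # Eq. 10

# First-level ICAU: aggregate first-order ICEs into the target ICE r^U_t (Eqs. 12-13)
alpha1 = softmax(R1 @ w)
c_u = alpha1 @ R1                       # Eq. 12
g_u = sigmoid(c_u @ e_t)
r_u = g_u * c_u + (1 - g_u) * e_t       # Eq. 13
assert r_u.shape == (L,)
```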

Item's Influential Context Embedding

Given an IIC Ci = {Ii, Ri}, Ri is often built from item relevance. Different from the indirect influence in user-user relation modeling, a user normally only considers the items directly relevant to the target item when making a choice. Therefore, we only consider the first-order influential neighbors of a target item for modeling the IIC. Given the neighbors {it,n} (1 ≤ n ≤ N) w.r.t. target item it, the corresponding embeddings vt and {vt,n} (1 ≤ n ≤ N) are retrieved by the Item Representer EI. We then use an ICAU to learn the item ICE. The subsidiary influence embedding cIt is calculated as:

cIt = Σ_{n=1..N} αIt,n vt,n (14)

where the influential strength αIt,n of vt,n is calculated by a three-layer attention network as in the UIC Aggregator:

αIt,n = softmax(isrθ(W(6) tanh(W(7) vt,n))) (15)

where W(6) ∈ R^{1×L} and W(7) ∈ R^{L×L}.


Subsequently, the ICE rIt w.r.t. it is obtained by aggregating the subsidiary influence embedding cIt and the target item embedding vt:

rIt = gIt cIt + (1 − gIt) vt (16)

where the influential gate gIt is also learned by a three-layer gate neural network as in Figure 3:

gIt = σ(isrθ(W(8) tanh(W(9) cIt + W(10) vt))) (17)

where W(9), W(10) ∈ R^{L×L} and W(8) ∈ R^{1×L}.

User-item Interaction Ranking

Each user-item connection denotes a user's selection of an item. User-item connections can be regarded as one-class preference data (Hu et al. 2016) which cannot differentiate user preferences. To handle the one-class problem, we treat learning on the user-item interactions as a ranking problem (Rendle et al. 2009). Given a user ut, we construct a contrastive item pair to specify the preference order. A positive item ip is one which has an observed connection to ut w.r.t. the user-item relation, i.e., a user-selected item, while a pseudo-negative item in refers to one without a connection to ut. Then, we have the preference order ⟨ut, ip⟩ ≻ ⟨ut, in⟩. Accordingly, we have S⟨ut,ip⟩ ≥ S⟨ut,in⟩ where S⟨ut,i⟩ denotes the preference score on item i given by the inner product of rUt and rIt:

S⟨ut,it⟩ = rUt⊤ rIt (18)

Then, we use the max-margin loss (LeCun et al. 2006) to optimize the ranking order over pairs:

L⟨ut,ip⟩≻⟨ut,in⟩ = max{0, m − S⟨ut,ip⟩ + S⟨ut,in⟩} (19)

where m = 10 is set as the maximum margin in this paper.

Remark for Scoring Model: if we expand Eq. 18, we

obtain the following form by using Eqs. 13 and 16:

S⟨ut,it⟩ = (gUt cUt + (1 − gUt) et)⊤ (gIt cIt + (1 − gIt) vt)
= (1 − gUt)(1 − gIt) et⊤vt + gUt(1 − gIt) cUt⊤vt + (1 − gUt) gIt et⊤cIt + gUt gIt cUt⊤cIt (20)

According to Eq. 1, we find that (i) s⟨u,i⟩ is modeled by et⊤vt; (ii) s⟨u,Ici⟩ is modeled by et⊤cIt; (iii) s⟨Ucu,i⟩ is modeled by cUt⊤vt; and (iv) s⟨Ucu,Ici⟩ is modeled by cUt⊤cIt. Correspondingly, λ1 = (1 − gUt)(1 − gIt), λ2 = (1 − gUt)gIt, λ3 = gUt(1 − gIt) and λ4 = gUt gIt are learned to weigh these scores. Therefore, these expanded terms provide an insight into how the influences in the UIC and the IIC are embedded in our model to affect the final recommendation.
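Eq. 20 is a direct algebraic expansion of the inner product in Eq. 18, which can be checked numerically with random toy vectors and arbitrary gate values:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 16
e_t, v_t = rng.normal(size=L), rng.normal(size=L)     # target user/item embeddings
c_u, c_i = rng.normal(size=L), rng.normal(size=L)     # subsidiary influence embeddings
g_u, g_i = 0.3, 0.7                                   # gate values in (0, 1)

r_u = g_u * c_u + (1 - g_u) * e_t                     # Eq. 13
r_i = g_i * c_i + (1 - g_i) * v_t                     # Eq. 16
score = r_u @ r_i                                     # Eq. 18

# Eq. 20: the same score as a weighted sum of four pairwise scores,
# with the weights determined entirely by the two gates
expanded = ((1 - g_u) * (1 - g_i) * (e_t @ v_t)       # s<u,i> term
            + (1 - g_u) * g_i * (e_t @ c_i)           # s<u,I_i^c> term
            + g_u * (1 - g_i) * (c_u @ v_t)           # s<U_u^c,i> term
            + g_u * g_i * (c_u @ c_i))                # s<U_u^c,I_i^c> term
assert np.isclose(score, expanded)
```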

Training Procedure

For each user selection ⟨ut, ip⟩, we can construct a triplet ⟨ut, ip, in⟩ to optimize the ranking loss L⟨ut,ip⟩≻⟨ut,in⟩. Then the loss of a mini-batch B for training is given as:

L = (1/|B|) Σ_{⟨ut,ip,in⟩∈B} L⟨ut,ip⟩≻⟨ut,in⟩ (21)
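A minimal NumPy sketch of the mini-batch ranking loss (Eqs. 18, 19 and 21), assuming the user and item ICEs have already been computed; this is an illustration, not the paper's Keras implementation:

```python
import numpy as np

def batch_ranking_loss(r_u, r_pos, r_neg, m=10.0):
    """Mean max-margin ranking loss over a mini-batch of triplets <u_t, i_p, i_n>.

    r_u, r_pos, r_neg : (B, L) ICEs for users, positive items, pseudo-negative items
    m                 : maximum margin (set to 10 in the paper)
    """
    s_pos = np.sum(r_u * r_pos, axis=1)                  # S<u_t, i_p>, Eq. 18
    s_neg = np.sum(r_u * r_neg, axis=1)                  # S<u_t, i_n>
    return np.mean(np.maximum(0.0, m - s_pos + s_neg))   # Eqs. 19 and 21

rng = np.random.default_rng(3)
B, L = 4, 8
loss = batch_ranking_loss(rng.normal(size=(B, L)),
                          rng.normal(size=(B, L)),
                          rng.normal(size=(B, L)))
assert loss >= 0.0
```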

Table 1: Statistics of the datasets: Delicious and Lastfm

Dataset    Property        User-user   Item-item   User-item
Delicious  #Entity         1,892       17,632      1,892+17,632
           #Link           25,434      199,827     104,799
           #Link/#Entity   13.44       22.66       5.37
           Sparsity        0.0071      0.0006      0.0031
Lastfm     #Entity         1,867       69,226      1,867+69,226
           #Link           15,328      682,314     92,834
           #Link/#Entity   8.24        15.75       3.03
           Sparsity        0.0044      0.0001      0.0007

To learn the parameters, we adopt a gradient descent-based algorithm over ∂L/∂W w.r.t. each weight matrix W in our model. We implement our model using Keras (Chollet and others 2015) with the TensorFlow GPU version as the backend. We use Adam (Kingma and Ba 2014) as the gradient optimizer and set the mini-batch size to 200. The code is available at: https://github.com/rainmilk/aaai19hers.

Experiments

In this section, we conduct experiments on two real datasets to compare the recommendation quality of our approach with other state-of-the-art recommendation methods.

Data Preparation

Most public datasets for recommendation only involve the user-item relation, so it is not easy to find a dataset containing all of the user-user, user-item and item-item relations. Fortunately, two datasets, Delicious1 and Lastfm2, provided by RecSys Challenge 2011 (Cantador, Brusilovsky, and Kuflik 2011), satisfy our requirement.

The Delicious dataset contains social networking, bookmarking, and tagging information from the Delicious social bookmarking system. Contact relationships, identified between users when they are mutual fans in Delicious, are used as the user-user relation. The user-item relationships are constructed from users and their bookmarked items. The item-item relationships are built on the common tags between items: given a target item, we assign links to the top 10 items which share the most common tags with it.
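This tag-based item-item construction can be sketched as follows; the `item_tags` mapping and its contents are hypothetical stand-ins for the actual Delicious bookmark tags:

```python
from collections import Counter

def build_item_links(item_tags, top_k=10):
    """Link each item to the top_k items sharing the most common tags (sketch)."""
    links = {}
    for item, tags in item_tags.items():
        overlap = Counter()
        for other, other_tags in item_tags.items():
            if other != item:
                shared = len(tags & other_tags)
                if shared > 0:
                    overlap[other] = shared
        # keep only the top_k most strongly overlapping items
        links[item] = [other for other, _ in overlap.most_common(top_k)]
    return links

# Toy example with hypothetical bookmark tags
item_tags = {
    "a": {"python", "web", "tutorial"},
    "b": {"python", "web"},
    "c": {"python"},
    "d": {"cooking"},
}
links = build_item_links(item_tags, top_k=10)
assert links["a"] == ["b", "c"]   # b shares 2 tags, c shares 1, d shares none
assert links["d"] == []
```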

The Lastfm dataset contains social networking, tagging, and music artist information from the Last.fm online music system. The friendships between users are used as the user-user relation. The items are artists, who are connected with users if these users have listened to the artists' music. These listening relationships between users and artists serve as the user-item relation. The item-item relationships are built through the tags using the same method as for Delicious. The statistics of the datasets are summarized in Table 1, which contains the number of entities (i.e., users, items), the number of links (i.e., user-user relationships, item-item relationships, user-item interactions), the average number of links per entity, and the sparsity of each type of relation.

1 http://www.delicious.com
2 http://www.last.fm


Table 2: Item recommendation for test users of Delicious and Lastfm

           |             Delicious             |              Lastfm
Method     | MAP@5  MAP@20  nDCG@5  nDCG@20    | MAP@5  MAP@20  nDCG@5  nDCG@20
-----------+-----------------------------------+-------------------------------
BPR-MF     | 0.4157 0.3225  0.4318  0.3744     | 0.5154 0.4586  0.6252  0.6334
SoRec      | 0.4174 0.3390  0.4476  0.3965     | 0.5350 0.4775  0.6412  0.6457
Social MF  | 0.4181 0.3409  0.4520  0.4017     | 0.5489 0.4907  0.6544  0.6575
SoReg      | 0.4239 0.3444  0.4577  0.4056     | 0.5495 0.4878  0.6548  0.6541
CMF        | 0.4375 0.3507  0.4739  0.4158     | 0.5530 0.4928  0.6549  0.6749
FM         | 0.4246 0.3363  0.4522  0.3896     | 0.5366 0.4837  0.6453  0.6723
NFM        | 0.4565 0.3754  0.4924  0.4347     | 0.5462 0.4885  0.6516  0.6702
ICAU-HERS  | 0.5477 0.4200  0.6064  0.5273     | 0.5865 0.5302  0.6913  0.7021

As both datasets are very sparse, the additional information leveraged from the social networks and the item-item relations will benefit the recommendation.

Experimental Settings

To evaluate the ranking accuracy, we adopt two metrics, Mean Average Precision (MAP) and normalized Discounted Cumulative Gain (nDCG), to measure the quality of preference ranking and top-N recommendation.
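For concreteness, the two metrics can be sketched for a single ranked list under binary relevance. This is a generic textbook formulation, not the paper's evaluation code; function names and the cutoff handling are illustrative assumptions.

```python
import math

def ap_at_k(ranked, relevant, k):
    """Average precision at cutoff k for one ranked list (binary relevance)."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at each hit position
    return score / min(len(relevant), k) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain at cutoff k (binary relevance)."""
    dcg = sum(1.0 / math.log2(r + 1)
              for r, item in enumerate(ranked[:k], start=1) if item in relevant)
    ideal = sum(1.0 / math.log2(r + 1)
                for r in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0
```

MAP@K is then the mean of `ap_at_k` over all test users; nDCG@K is averaged analogously.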

Comparison Methods: The following representative methods are selected for comparison:

• BPR-MF (Rendle et al. 2009): it applies BPR-based optimization to the user-item relation via MF.

• SoRec (Ma et al. 2008): it jointly factorizes the social-relation matrix and a user-item interaction matrix.

• SocialMF (Jamali and Ester 2010): it adds regularization into MF according to social information.

• SoReg (Ma et al. 2011): it leverages social relationships to regularize the users' latent factors.

• CMF (Singh and Gordon 2008): it is adopted to co-factorize the user-user, user-item and item-item relations.

• FM (Rendle 2012): it embeds features into a latent space and models the interactions between each pair of features. We integrate the user-user, user-item and item-item relations as the features.

• NFM (He and Chua 2017): a deep neural network-based FM with multiple hidden layers to learn non-linear interactions.

• ICAU-HERS: ICAU-based HERS proposed in this paper.

Parameter Settings: The lengths of the user/item embeddings and the context embeddings are set to 128. To accelerate the model, we choose the top 10 first-order neighbors for each target user, 10 second-order neighbors for each first-order neighbor, and 10 neighbors for each item in the item-item relation, all ranked by the influence weights (cf. Eq. 7). The θ in Eq. 19 is set to 16.

Recommendation Performance

We construct the testing set by holding out 20% of the user-item interactions as the ground truth. For each held-out test sample, we randomly draw ten noisy samples to test whether each method can rank the true test sample at a top position among the noisy samples. The remaining data of the user-item relation is used for training, together with the user-user and item-item relations.
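This sampled ranking protocol can be sketched as follows. The scoring interface `score_fn(user, item)`, the function name, and the candidate-pool handling are assumptions for illustration; the paper's actual harness may differ.

```python
import random

def sampled_rank(score_fn, user, true_item, all_items, observed,
                 n_noise=10, rng=None):
    """Rank a held-out (user, true_item) pair against n_noise noisy items.

    score_fn(user, item): the model's predicted preference (assumed interface).
    observed: items the user interacted with in training (excluded from noise).
    Returns the 1-based rank of the true item among the sampled candidates.
    """
    rng = rng or random.Random()
    pool = [i for i in all_items if i != true_item and i not in observed]
    candidates = rng.sample(pool, n_noise) + [true_item]
    # Sort candidates by predicted score, best first.
    scored = sorted(candidates, key=lambda i: score_fn(user, i), reverse=True)
    return scored.index(true_item) + 1
```

Metrics such as MAP@K and nDCG@K are then computed from these per-sample ranks.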

Overall Comparison: Table 2 reports MAP and nDCG at top 5 and top 20 over all testing users. Among all methods, ICAU-HERS achieves the best performance in terms of all metrics on both datasets. Specifically, in terms of MAP@5, ICAU-HERS demonstrates approximately 20% improvement over the second-place method NFM on Delicious and 6.1% improvement over the second-place method CMF on Lastfm.

Effect of User-user Relation Modeling: BPR-MF suffers from data sparsity more easily than the other comparison methods because it cannot borrow information from the social relationships; as a result, it achieves the worst performance. In comparison, the other methods benefit from the information in the user social network. In particular, ICAU-HERS outperforms all other methods, which can be attributed to ICAU precisely weighing the influence from different users and aggregating high-order influences.

Effect of Item-item Relation Modeling: CMF, FM and NFM consider the item-item relation, which makes their recommendation more effective than that of the methods using only social relations. Compared with the MF- and FM-based methods, ICAU-HERS demonstrates its superiority in integrating heterogeneously relational recommendation data with influence-aggregation modeling.

Recommendation for Cold-start Users and Items

The cold-start problem is ubiquitous in RSs and can be categorized into cold-start users and cold-start items. In the user cold-start problem, we recommend items to new users, and the ranks of the recommended items serve as the evaluation criteria. In the item cold-start problem, we recommend new items to different users, and the ranks of the recommended users are used to evaluate the performance. In this section, we compare the ability of ICAU-HERS against the comparison methods in handling the user and item cold-start problems, respectively.

Cold-start Users: To test the recommendation for cold-start users, we randomly remove 20% of the users and all their links from the user-item relation to form the training set. BPR-MF only considers the user-item relation, so it cannot be


Figure 4: Item recommendation for cold-start users of Delicious and Lastfm. (Bar charts over MAP@5 and nDCG@5 comparing SoRec, Social MF, SoReg, CMF, FM, NFM and ICAU-HERS on (a) Delicious and (b) Lastfm.)

Figure 5: User recommendation for cold-start items of Delicious and Lastfm. (Bar charts over MAP@5 and nDCG@5 comparing CMF, FM, NFM and ICAU-HERS on (a) Delicious and (b) Lastfm.)

used in the cold-start scenario. The performance compared with the other methods is shown in Figure 4. ICAU-HERS significantly outperforms the other methods on both datasets. On Delicious, in terms of MAP@5, the relative improvement of ICAU-HERS over SoRec, Social MF, SoReg, CMF, FM and NFM is 42.3%, 32.2%, 29.4%, 21.6%, 19.7% and 12.6%, respectively. On Lastfm, in terms of MAP@5, the relative improvement over the same methods is 55.1%, 58.2%, 36.6%, 54.2%, 30.7% and 26.1%, respectively. Eq. 20 gives insight into why ICAU-HERS can handle this cold-start problem more effectively: the ICEs of users embed information from their neighbors in the influential context, even when no historical user selection is observed.

Cold-start Items: To cope with cold-start items, we randomly remove 20% of the items and all their connected edges from the user-item relation. For each testing item, we rank the predictive scores over users. The MAP and nDCG results are shown in Figure 5. Since SoRec, Social MF and SoReg only model the user-user relation, they cannot handle cold-start items. Compared with CMF, FM and NFM, ICAU-HERS achieves the best performance on both datasets. Note that the improvement of ICAU-HERS on cold-start items is not as significant as that for cold-start users. This can be interpreted as the impact from other users on a user's item selection being higher than the impact from relevant items. Nevertheless, the embedded influential item context can still provide useful information when no selection is available for a new item.

Figure 6: The visualization of influential contexts of a sampled user selection on an artist in the Lastfm dataset. The artists in the item network are labeled by their names and the anonymous users in the user network are labeled with their IDs. The thickness of edges specifies the significance of influence.

Visualization and Interpretation

To interpret the influence from influential users and items when a user makes an item selection, we randomly choose a user and her/his selection on an artist (User 41 with Raised Fist) from the Lastfm dataset as a case study. We visualize the influential contexts w.r.t. the target user and the target item in Figure 6 by differentiating the influence with different edge thicknesses. We find that the influence from different users/items varies considerably; for example, User 1808 and User 184 have the common neighbor User 1626, but the influence of User 1626 on User 184 is much larger than that on User 1808. In the item network, we observe similar cases in the influential context w.r.t. Raised Fist. Therefore, these influential contexts can explain how the target users and the target items are influenced by the influential users and items to form the connection.

Conclusion

We propose a framework for modeling influential contexts in recommender systems with the user-user, item-item and user-item relations, and further implement a neural ICAU-based HERS model. In particular, this HERS can effectively learn the ICEs from users' and items' influential contexts through ICAUs. Extensive experiments show that ICAU-HERS outperforms other state-of-the-art methods and is effective in handling the user/item cold-start and sparsity problems. In addition, we demonstrate the interpretability of ICAU-HERS with a real case study.

ICAU is a general influence embedding model which, apart from recommender systems, can be applied to other domains with heterogeneous networks, such as user group behavior analysis and biological interaction networks. Moreover, it is easy to incorporate the content information of each entity to better interpret the influence propagation (Jian et al. 2019).


Acknowledgements

The paper is partially supported by the National Natural Science Foundation of China (61572325), the Shanghai Key Programs of Science and Technology (16DZ1203603), and Shanghai Engineering Research Center Projects (GCZX14014 and C14001).

References

[Cantador, Brusilovsky, and Kuflik 2011] Cantador, I.; Brusilovsky, P.; and Kuflik, T. 2011. 2nd workshop on information heterogeneity and fusion in recommender systems (HetRec 2011). In HetRec'2011.

[Cao 2015] Cao, L. 2015. Coupling learning of complex interactions. Information Processing & Management 51(2):167–186.

[Cao 2016] Cao, L. 2016. Non-IID recommender systems: A review and framework of recommendation paradigm shifting. Engineering 2(2):212–224.

[Chollet and others 2015] Chollet, F., et al. 2015. Keras. https://github.com/fchollet/keras.

[Do and Cao 2018a] Do, T. D. T., and Cao, L. 2018a. Coupled Poisson factorization integrated with user/item metadata for modeling popular and sparse ratings in scalable recommendation. In AAAI'2018, 2918–2925.

[Do and Cao 2018b] Do, T. D. T., and Cao, L. 2018b. Gamma-Poisson dynamic matrix factorization embedded with metadata influence. In NIPS'2018.

[Friedkin and Johnsen 2011] Friedkin, N. E., and Johnsen, E. C. 2011. Social influence network theory: A sociological examination of small group dynamics, volume 33. Cambridge University Press.

[He and Chua 2017] He, X., and Chua, T.-S. 2017. Neural factorization machines for sparse predictive analytics. In SIGIR'2017, 355–364.

[He et al. 2017] He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; and Chua, T.-S. 2017. Neural collaborative filtering. In WWW'2017, 173–182.

[Hu et al. 2014] Hu, L.; Cao, J.; Xu, G.; Cao, L.; Gu, Z.; and Cao, W. 2014. Deep modeling of group preferences for group-based recommendation. In AAAI'2014, 1861–1867.

[Hu et al. 2016] Hu, L.; Cao, L.; Cao, J.; Gu, Z.; Xu, G.; and Yang, D. 2016. Learning informative priors from heterogeneous domains to improve recommendation in cold-start user domains. ACM Transactions on Information Systems (TOIS) 35(2):13.

[Hu et al. 2017a] Hu, L.; Cao, L.; Cao, J.; Gu, Z.; Xu, G.; and Wang, J. 2017a. Improving the quality of recommendations for users and items in the tail of distribution. ACM Transactions on Information Systems (TOIS) 35(3):25.

[Hu et al. 2017b] Hu, L.; Cao, L.; Wang, S.; Xu, G.; Cao, J.; and Gu, Z. 2017b. Diversifying personalized recommendation with user-session context. In IJCAI'2017, 1858–1864.

[Jamali and Ester 2010] Jamali, M., and Ester, M. 2010. A matrix factorization technique with trust propagation for recommendation in social networks. In RecSys'2010, 135–142. ACM.

[Jian et al. 2019] Jian, S.; Hu, L.; Cao, L.; Lu, K.; and Gao, H. 2019. Evolutionarily learning multi-aspect interactions and influences from network structure and node content. In AAAI'2019.

[Jiang et al. 2014] Jiang, M.; Cui, P.; Wang, F.; Zhu, W.; and Yang, S. 2014. Scalable recommendation with social contextual information. IEEE Transactions on Knowledge and Data Engineering 26(11):2789–2802.

[Kingma and Ba 2014] Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[LeCun et al. 2006] LeCun, Y.; Chopra, S.; Hadsell, R.; Ranzato, M.; and Huang, F. 2006. A tutorial on energy-based learning. Predicting Structured Data 1:0.

[Li, Xu, and Cao 2015] Li, F.; Xu, G.; and Cao, L. 2015. Coupled matrix factorization within non-IID context. In PAKDD'2015, 707–719.

[Ma et al. 2008] Ma, H.; Yang, H.; Lyu, M. R.; and King, I. 2008. SoRec: social recommendation using probabilistic matrix factorization. In CIKM'2008, 931–940. ACM.

[Ma et al. 2011] Ma, H.; Zhou, D.; Liu, C.; Lyu, M. R.; and King, I. 2011. Recommender systems with social regularization. In WSDM'2011, 287–296. ACM.

[Rendle et al. 2009] Rendle, S.; Freudenthaler, C.; Gantner, Z.; and Schmidt-Thieme, L. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI'2009, 452–461.

[Rendle 2012] Rendle, S. 2012. Factorization machines with libFM. ACM Transactions on Intelligent Systems and Technology (TIST) 3(3):57.

[Rendle 2013] Rendle, S. 2013. Scaling factorization machines to relational data. In Proceedings of the VLDB Endowment, volume 6, 337–348. VLDB Endowment.

[Singh and Gordon 2008] Singh, A. P., and Gordon, G. J. 2008. Relational learning via collective matrix factorization. In KDD'2008, 650–658.

[Tang et al. 2012] Tang, J.; Gao, H.; Liu, H.; and Das Sarma, A. 2012. eTrust: Understanding trust evolution in an online world. In KDD'2012, 253–261.

[Tang, Hu, and Liu 2013] Tang, J.; Hu, X.; and Liu, H. 2013. Social recommendation: a review. Social Network Analysis and Mining 3(4):1113–1133.

[Vasuki et al. 2010] Vasuki, V.; Natarajan, N.; Lu, Z.; and Dhillon, I. S. 2010. Affiliation recommendation using auxiliary networks. In RecSys'2010, 103–110.

[Wang et al. 2017] Wang, X.; He, X.; Nie, L.; and Chua, T.-S. 2017. Item silk road: Recommending items from information domains to social users. In SIGIR'2017, 185–194. ACM.

[Zhang et al. 2018] Zhang, Q.; Cao, L.; Zhu, C.; Li, Z.; and Sun, J. 2018. CoupledCF: Learning explicit and implicit user-item couplings in recommendation for deep collaborative filtering. In IJCAI'2018, 3662–3668.