Efficient Similarity Computation for Collaborative Filtering in Dynamic Environments (ACM Recommender Systems)




Contribution

We introduce a novel algorithm, called “Dynamic Index”, for efficiently computing all exact pairwise similarities in a collection of sparse, high-dimensional vectors, which are typical for recommender systems. The algorithm learns incrementally, making it suitable for real-time environments.

We further exploit non-recommendable items to improve the computational efficiency of our method.

By presenting our algorithm in a MapReduce-inspired formulation, it is easily parallelised and scalable.

The algorithm is generally applicable, but we focus on neighbourhood-based collaborative filtering.

Efficient Similarity Computation for Collaborative Filtering in Dynamic Environments

Olivier Jeunen (1), Koen Verstrepen (2) and Bart Goethals (1,2,3)

(1) Adrem Data Lab, University of Antwerp, Antwerp, Belgium
(2) Froomle, Antwerp, Belgium
(3) Faculty of Information Technology, Monash University, Melbourne, Australia

Fig. 1: Runtime in seconds for a baseline tuned towards sparse data and our Dynamic Index algorithm (top plot). Runtime in seconds for our Dynamic Index algorithm, for a varying number of virtual cores (bottom plot).

Fig. 2: Runtime in seconds for Dynamic Index, when items only remain recommendable for δ time, along with the number of recommendable items that are left.

Intuition

In the case of binary implicit feedback, the cosine similarity between items can be computed as the number of users who have seen both items, normalised by their counts:

cos(i, j) = |U_i ∩ U_j| / (√|U_i| · √|U_j|)

As such, we compute |U_i ∩ U_j| and |U_i| incrementally in a matrix M and vector N, in a single pass over the data. Simple, unordered inverted indices ensure fast updates when new data arrives.
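As a concrete sketch of this intuition (a hypothetical toy example, not from the poster), the counts in M and N can be accumulated in a single pass and turned into cosine scores on demand:

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each user's set of seen items (binary implicit feedback).
pageviews = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
}

M = defaultdict(int)  # M[(i, j)] = |U_i ∩ U_j|: users who saw both i and j
N = defaultdict(int)  # N[i] = |U_i|: users who saw item i

for items in pageviews.values():
    for i in items:
        N[i] += 1
    for i, j in combinations(sorted(items), 2):
        M[(i, j)] += 1
        M[(j, i)] += 1

def cos(i, j):
    # cos(i, j) = |U_i ∩ U_j| / (sqrt(|U_i|) * sqrt(|U_j|))
    return M[(i, j)] / (sqrt(N[i]) * sqrt(N[j]))
```

Here items “a” and “b” co-occur for two of the three users, so `cos("a", "b")` evaluates to 2 / (√2 · √3).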

Recommendability

For many different use cases, the set of recommendable items at a given time is a much smaller subset of the full item collection (due to recency, stock, business rules…). We exploit this imbalance by only keeping up-to-date scores for recommendable items. The set of recommendable items at time t is denoted by R_t.

Algorithm 1 Dynamic Index
Input: A set of pageviews P_t, a set of recommendable items R_t.
Output: A matrix of item intersections M, a vector of items’ l1-norms N, an inverted index of users to rec. items L_r, an inverted index of users to non-rec. items L_n.
1:  M ← 0, N ← 0, ∀u ∈ U : L_r[u] ← ∅, L_n[u] ← ∅
2:  for (u, i, s) ∈ P_t do
3:      for j ∈ L_r[u] do
4:          M_{i,j} += 1
5:      if i ∈ R_t then
6:          for j ∈ L_n[u] do
7:              M_{i,j} += 1
8:          L_r[u] ← L_r[u] ∪ {i}
9:          N_i += 1
10:     else
11:         L_n[u] ← L_n[u] ∪ {i}
12: return M, N, L_r, L_n
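The pseudocode transcribes almost line for line into Python. The sketch below assumes pageviews arrive as (user, item, score) triples and uses dicts of sets for the two inverted indices; it is an illustrative transcription, not the authors’ implementation:

```python
from collections import defaultdict

def dynamic_index(pageviews, recommendable):
    """Single-pass sketch of Algorithm 1 (Dynamic Index)."""
    M = defaultdict(int)    # item-intersection counts, M[(i, j)] = |U_i ∩ U_j|
    N = defaultdict(int)    # item counts, N[i] = |U_i|
    L_r = defaultdict(set)  # inverted index: user -> recommendable items seen
    L_n = defaultdict(set)  # inverted index: user -> non-recommendable items seen

    for u, i, _s in pageviews:
        # Lines 3-4: co-occurrences with recommendable items u has already seen.
        for j in L_r[u]:
            M[(i, j)] += 1
        if i in recommendable:
            # Lines 6-7: catch up on u's previously seen non-recommendable items.
            for j in L_n[u]:
                M[(i, j)] += 1
            L_r[u].add(i)
            N[i] += 1
        else:
            L_n[u].add(i)
    return M, N, L_r, L_n
```

Note that counts (rows of M and entries of N) are only maintained for items in the recommendable set; views of non-recommendable items are merely indexed in L_n, which is what makes the update cheap when R_t is small.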


Parallelisation

For two models ℳ and ℳ′ computed on data stemming from disjoint sets of users, the combined model is simply the sum of their item co-occurrence matrices and count vectors. As such, we can easily parallelise Dynamic Index in MapReduce fashion.
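A minimal sketch of that merge step (hypothetical helper name, assuming each model is an (M, N) pair of count dicts as above):

```python
from collections import defaultdict

def merge(model_a, model_b):
    """Combine two models built on disjoint user sets by entrywise summation."""
    M_a, N_a = model_a
    M_b, N_b = model_b
    M = defaultdict(int, M_a)  # start from a copy of the first model's counts
    N = defaultdict(int, N_a)
    for pair, count in M_b.items():
        M[pair] += count       # sum the item co-occurrence matrices
    for item, count in N_b.items():
        N[item] += count       # sum the item count vectors
    return M, N
```

In MapReduce terms, each mapper runs Dynamic Index over its own user partition and the reducer folds the partial models together with this summation.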

Conclusion

o Dynamic Index is especially fast for highly sparse data
o Incrementally computable, easily parallelisable
o Future work includes looking into other similarity functions (Jaccard, PMI, Pearson), and extensions for non-binary data.

Source code: