


Current Challenges and Visions in Music Recommender Systems Research

Markus Schedl
Johannes Kepler University
Linz, Austria
markus.schedl@jku.at

Hamed Zamani
University of Massachusetts Amherst
Amherst, USA

Ching-Wei Chen
Spotify USA Inc.
New York City, USA

Yashar Deldjoo
Politecnico di Milano
Milan, Italy

Mehdi Elahi
Free University of Bozen-Bolzano
Bolzano, Italy

ABSTRACT
Music recommender systems (MRS) have experienced a boom in recent years, thanks to the emergence and success of online streaming services, which nowadays make almost all of the world's music available at the user's fingertips. While today's MRS considerably help users to find interesting music in these huge catalogs, MRS research is still facing substantial challenges. In particular, when it comes to building, incorporating, and evaluating recommendation strategies that integrate information beyond simple user–item interactions or content-based descriptors, and instead dig deep into the very essence of listener needs, preferences, and intentions, MRS research becomes a major endeavor and related publications remain sparse.

The purpose of this trends and survey article is twofold. We first identify and shed light on what we believe are the most pressing challenges MRS research is facing, from both academic and industry perspectives. We review the state of the art towards solving these challenges and discuss its limitations. Second, we detail possible future directions and visions we contemplate for the further evolution of the field. The article should therefore serve two purposes: giving the interested reader an overview of current challenges in MRS research and providing guidance for young researchers by identifying interesting, yet under-researched, directions in the field.

KEYWORDS
music recommender systems; challenges; automatic playlist continuation; user-centric computing

1 INTRODUCTION
Research in music recommender systems (MRS) has recently experienced a substantial gain in interest both in academia and industry [162]. Thanks to music streaming services like Spotify, Pandora, or Apple Music, music aficionados are nowadays given access to

This research was supported in part by the Center for Intelligent Information Retrieval. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsors.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Conference'17, Washington, DC, USA
© 2017 Copyright held by the owner/author(s). 978-x-xxxx-xxxx-x/YY/MM...$15.00
DOI: 10.1145/nnnnnnn.nnnnnnn

tens of millions of music pieces. By filtering this abundance of music items, thereby limiting choice overload [21], MRS are often very successful at suggesting songs that fit their users' preferences. However, such systems are still far from perfect and frequently produce unsatisfactory recommendations. This is partly because users' tastes and musical needs depend on a multitude of factors that are not considered in sufficient depth by current MRS approaches, which are typically centered on the core concept of user–item interactions or, sometimes, content-based item descriptors. In contrast, we argue that satisfying the users' musical entertainment needs requires taking into account intrinsic, extrinsic, and contextual aspects of the listeners [3], as well as richer interaction information. For instance, the personality and emotional state of the listeners (intrinsic) [72, 147] as well as their activity (extrinsic) [76, 184] are known to influence musical tastes and needs. So are users' contextual factors, including weather conditions, social surroundings, or places of interest [3, 100]. Also, the composition and annotation of a music playlist or a listening session reveals information about which songs go well together or are suited for a certain occasion [126, 194]. Therefore, researchers and designers of MRS should reconsider their users in a holistic way in order to build systems tailored to the specificities of each user.

Against this background, in this trends and survey article, we elaborate on what we believe to be among the most pressing current challenges in MRS research, by discussing the respective state of the art and its restrictions (Section 2). Since we cannot treat all challenges exhaustively, we focus on cold start, automatic playlist continuation, and the evaluation of MRS. While these problems are to some extent prevalent in other recommendation domains too, certain characteristics of music pose particular challenges in these contexts. Among them are the short duration of items (compared to movies), the high emotional connotation of music, and users' acceptance of duplicate recommendations. In the second part, we present our visions for future directions in MRS research (Section 3). More precisely, we elaborate on the topics of psychologically-inspired music recommendation (considering human personality and emotion), situation-aware music recommendation, and culture-aware music recommendation. We conclude this article with a summary and an identification of possible starting points for the interested researcher to face the discussed challenges (Section 4).

arXiv:1710.03208v2 [cs.IR] 21 Mar 2018


The composition of the authors allows us to take academic as well as industrial perspectives, which are both reflected in this article. Furthermore, we would like to highlight that particularly the ideas presented as Challenge 2: Automatic playlist continuation in Section 2 play an important role in the task definition, organization, and execution of the ACM Recommender Systems Challenge 2018,1 which focuses on this use case. This article may therefore also serve as an entry point for potential participants in this challenge.

2 GRAND CHALLENGES
In the following, we identify and detail a selection of the grand challenges which we believe the research field of music recommender systems is currently facing, i.e., overcoming the cold start problem, automatic playlist continuation, and properly evaluating music recommender systems. We review the state of the art of the respective tasks and its current limitations.

2.1 Particularities of music recommendation
Before digging deeper into these challenges, we would first like to highlight the major aspects that make music recommendation a particular endeavor and distinguish it from recommending other items, such as movies, books, or products. These aspects have been adapted and extended from a tutorial on music recommender systems [161], co-presented by one of the authors at the ACM Recommender Systems 2017 conference.2

Duration of items: In traditional movie recommendation, the items of interest have a typical duration of 90 minutes or more. In book recommendation, the consumption time is commonly even much longer. In contrast, the duration of music items usually ranges between 3 and 5 minutes (except maybe for classical music). Because of this, music items may be considered more disposable.

Magnitude of items: The size of common commercial music catalogs is in the range of tens of millions of music pieces, while movie streaming services have to deal with much smaller catalog sizes, typically thousands up to tens of thousands of movies and series.3 Scalability is therefore a much more important issue in music recommendation than in movie recommendation.

Sequential consumption: Unlike movies, music pieces are most frequently consumed sequentially, several in a row, i.e., in a listening session or playlist. This yields a number of challenges for a MRS, which relate to identifying the right arrangement of items in a recommendation list.

Recommendation of previously recommended items: Recommending the same music piece again, at a later point in time, may be appreciated by the user of a MRS, in contrast to a movie or product recommender, where repeated recommendations are usually not preferred.

Consumption behavior: Music is often consumed passively, in the background. While this is not a problem per se, it can affect preference elicitation. In particular when using implicit feedback

1 http://www.recsyschallenge.com/2018
2 http://www.cp.jku.at/tutorials/mrs recsys 2017
3 Spotify reports about 30 million songs in 2017 (https://press.spotify.com/at/about); Amazon's advanced search for books reports 10 million hardcover and 30 million paperback books in 2017 (https://www.amazon.com/Advanced-Search-Books/b?node=241582011); whereas Netflix, in contrast, offers about 5,500 movies and TV series as of 2016 (http://time.com/4272360/the-number-of-movies-on-netflix-is-dropping-fast).

to infer listener preferences, the fact that a listener is not paying attention to the music (and therefore, e.g., not skipping a song) might be wrongly interpreted as a positive signal.

Listening intent and purpose: Music serves various purposes for people and hence shapes their intent to listen to it. This should be taken into account when building a MRS. In extensive literature and empirical studies, Schafer et al. [155] distilled three fundamental intents of music listening out of 129 distinct music uses and functions: self-awareness, social relatedness, and arousal and mood regulation. Self-awareness is considered a very private relationship with music listening. The self-awareness dimension "helps people think about who they are, who they would like to be, and how to cut their own path" [154]. Social relatedness [153] describes the use of music to feel close to friends and to express identity and values to others. Mood regulation is concerned with managing emotions, a critical issue for the well-being of humans [78, 110, 176]. In fact, several studies found that mood and emotion regulation is the most important reason why people listen to music [19, 97, 122, 155], for which reason we discuss the particular role emotions play when listening to music separately below.

Emotions: Music is known to evoke very strong emotions.4 This is a mutual relationship, though, since the emotions of users also affect musical preferences [18, 78, 144]. Due to this strong relationship between music and emotions, the problem of automatically describing music in terms of emotion words is an active research area, commonly referred to as music emotion recognition (MER), e.g., [15, 103, 187]. Even though MER can be used to tag music with emotion terms, integrating this information into MRS is a highly complicated task, for three reasons. First, MER approaches commonly neglect the distinction between intended emotion (i.e., the emotion the composer, songwriter, or performer had in mind when creating or performing the piece), perceived emotion (i.e., the emotion recognized while listening), and induced emotion, which is felt by the listener. Second, the preference for a certain kind of emotionally laden music piece depends on whether the user wants to enhance or to modulate her mood. Third, emotional changes often occur within the same music piece, whereas tags are commonly extracted for the whole piece. Matching music and listeners in terms of emotions therefore requires modeling the listener's musical preference as a time-dependent function of their emotional experiences, also considering the intended purpose (mood enhancement or regulation). This is a highly challenging task and usually neglected in current MRS, for which reason we discuss emotion-aware MRS as one of the main future directions in MRS research, cf. Section 3.1.

Listening context: Situational or contextual aspects [16, 49] have a strong influence on music preference, consumption, and interaction behavior. For instance, a listener will likely create a different playlist when preparing for a romantic dinner than when warming up with friends to go out on a Friday night [76]. The most frequently considered types of context include location (e.g., listening at the workplace, when commuting, or relaxing at home) [100]

4 Please note that the terms "emotion" and "mood" have different meanings in psychology, whereas they are commonly used as synonyms in music information retrieval (MIR) and recommender systems research. In psychology, "emotion" refers to a short-time reaction to a particular stimulus, whereas "mood" refers to a longer lasting state without relation to a specific stimulus.


and time (typically categorized into, e.g., morning, afternoon, and evening) [32]. Context may, in addition, also relate to the listener's activity [184], the weather [140], or the use of different listening devices, e.g., earplugs on a smartphone vs. a hi-fi stereo at home [76], to name a few. Since music listening is also a highly social activity, investigating the social context of listeners is crucial to understanding their listening preferences and behavior [46, 134]. The importance of considering such contextual factors in MRS research is acknowledged by discussing situation-aware MRS as a trending research direction, cf. Section 3.2.
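One common way such contextual factors can be exploited is contextual pre-filtering: restricting the interaction data to events whose context matches the target situation before computing recommendations. The sketch below is illustrative only (the event records, factor names, and values are made up, and the cited works may use different context-handling strategies):

```python
# Contextual pre-filtering: keep only listening events whose context
# matches the current situation, then recommend from the filtered data.
# Events and context labels below are hypothetical.
events = [
    {"user": "u1", "track": "t1", "activity": "commuting", "time": "morning"},
    {"user": "u1", "track": "t2", "activity": "relaxing",  "time": "evening"},
    {"user": "u2", "track": "t1", "activity": "commuting", "time": "morning"},
    {"user": "u2", "track": "t3", "activity": "workout",   "time": "morning"},
]

def prefilter(events, **context):
    """Keep events that match every given contextual factor."""
    return [e for e in events
            if all(e.get(k) == v for k, v in context.items())]

morning_commute = prefilter(events, activity="commuting", time="morning")
print([e["track"] for e in morning_commute])  # ['t1', 't1']
```

Any standard recommender (CF or CB) can then be trained on the filtered events, yielding situation-specific suggestions.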

2.2 Challenge 1: Cold start problem
Problem definition: One of the major problems of recommender systems in general [65, 151], and music recommender systems in particular [99, 119], is the cold start problem: when a new user registers with the system or a new item is added to the catalog, the system does not have sufficient data associated with these items/users. In such a case, the system cannot properly recommend existing items to a new user (new user problem) or recommend a new item to the existing users (new item problem) [4, 63, 99, 164].

Another sub-problem of cold start is the sparsity problem, which refers to the fact that the number of given ratings is much lower than the number of possible ratings; this is particularly likely when the number of users and items is large. The share of missing entries among all possible ratings is called sparsity. High sparsity translates into low rating coverage, since most users tend to rate only a tiny fraction of items. The effect is that recommendations often become unreliable [99]. Typical values of sparsity are quite close to 100% in most real-world recommender systems. In the music domain, this is a particularly substantial problem. Dror et al. [52], for instance, analyzed the Yahoo! Music dataset, which as of the time of writing represents the largest music recommendation dataset. They report a sparsity of 99.96%. For comparison, the Netflix dataset of movies has a sparsity of "only" 98.82%.5
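As a concrete illustration, sparsity follows directly from the matrix dimensions and the number of observed ratings (the numbers below are toy values, not those of the datasets just discussed):

```python
# Sparsity of a user-item rating matrix: the fraction of possible
# ratings that are missing. Toy numbers for illustration only.
def sparsity(num_ratings: int, num_users: int, num_items: int) -> float:
    possible = num_users * num_items
    return 1.0 - num_ratings / possible

# e.g., 1,000 users x 500 items with 5,900 observed ratings
print(round(sparsity(5_900, 1_000, 500) * 100, 2))  # 98.82
```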

State of the art: A number of approaches have already been proposed to tackle the cold start problem in the music recommendation domain, foremost content-based approaches, hybridization, cross-domain recommendation, and active learning.

Content-based recommendation (CB) algorithms do not require ratings of users other than the target user. Therefore, as long as some pieces of information about the user's own preferences are available, such techniques can be used in cold start scenarios. Furthermore, in the most severe case, when a new item is added to the catalog, content-based methods still enable recommendations, because they can extract features from the new item and use them to make recommendations. It is noteworthy that while collaborative filtering (CF) systems have cold start problems both for new users and new items, content-based systems have cold start problems only for new users [6].

As for the new item problem, a standard approach is to extract a number of features that define the acoustic properties of the audio signal and use content-based learning of the user interest (user profile learning) in order to effect recommendations. Feature extraction is typically done automatically, but can also be effected manually by musical experts, as in the case of Pandora's Music Genome Project.6 Pandora uses up to 450 specific descriptors per song, such as "aggressive female vocalist", "prominent backup vocals", "abstract lyrics", or "use of unusual harmonies".7 Regardless of whether the feature extraction process is performed automatically or manually, this approach is advantageous not only for addressing the new item problem but also because an accurate feature representation can be highly predictive of users' tastes and interests, which can be leveraged in the subsequent information filtering stage [6]. An advantage of music over video is that music features are confined to a single audio channel, whereas videos add audio and visual channels, which introduces a level of complexity to the content analysis of videos, explored individually or multimodally in different research works [47, 48, 60, 128].

5 Note that Dror et al.'s analysis was conducted in 2011. Even though the general character (rating matrices for music items being sparser than those for movie items) remained the same, the actual numbers for today's catalogs are likely slightly different.
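A minimal sketch of user profile learning over content features (hypothetical toy feature vectors, not a method from the cited works): the user profile is the mean of the feature vectors of liked songs, and candidate songs, including brand-new ones, are ranked by cosine similarity to that profile.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rows: audio feature vectors per song (toy 4-D features; real systems
# use descriptors of e.g. timbre, rhythm, and pitch).
catalog = np.array([
    [0.9, 0.1, 0.0, 0.2],   # song 0
    [0.8, 0.2, 0.1, 0.1],   # song 1 (acoustically close to song 0)
    [0.0, 0.9, 0.8, 0.0],   # song 2 (very different)
])

liked = [0]                            # the target user liked song 0
profile = catalog[liked].mean(axis=0)  # user profile = mean of liked features

# Rank the remaining (possibly brand-new) songs by similarity to the profile.
candidates = [i for i in range(len(catalog)) if i not in liked]
ranked = sorted(candidates, key=lambda i: cosine(profile, catalog[i]), reverse=True)
print(ranked)  # [1, 2]: song 1 ranks above song 2
```

Because the ranking only needs the new item's features, this works even when the item has never been rated by anyone.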

Automatic feature extraction from audio signals can be done in two main manners: (1) by extracting a feature vector from each item individually, independent of other items, or (2) by considering the cross-relation between items in the training dataset. The difference is that in (1) the same process is performed in the training and testing phases of the system, and the extracted feature vectors can be used off-the-shelf in the subsequent processing stage; for example, they can be used to compute similarities between items in a one-to-one fashion at testing time. In contrast, in (2) a model is first built from all features extracted in the training phase, whose main role is to map the features into a new (acoustic) space in which the similarities between items are better represented and exploited. An example of approach (1) is the block-level feature framework [167, 168], which creates a feature vector of about 10,000 dimensions, independently for each song in the given music collection. This vector describes aspects such as spectral patterns, recurring beats, and correlations between frequency bands. An example of strategy (2) is to create a low-dimensional i-vector representation from the Mel-frequency cepstral coefficients (MFCCs), which model musical timbre to some extent [57]. To this end, a universal background model is created from the MFCC vectors of the whole music collection, using a Gaussian mixture model (GMM). Performing factor analysis on a representation of the GMM eventually yields i-vectors.
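The structural difference between (1) and (2) can be sketched as follows. This is a deliberately simplified stand-in: random vectors replace real MFCC-like features, and a plain PCA via SVD replaces the GMM/factor-analysis i-vector pipeline described above.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 8))        # toy per-song feature vectors

# (1) Item-independent: each song's vector is used off-the-shelf,
#     e.g., for one-to-one similarity computation at test time.
def independent_features(song):
    return song                           # no collection-level model involved

# (2) Collection-level model: learn a mapping from ALL training features.
#     Here a simple PCA via SVD stands in for the GMM-based i-vector
#     extraction; both map items into a learned low-dimensional space.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:3]                       # keep a 3-D acoustic space

def mapped_features(song):
    return components @ (song - mean)     # project into the learned space

new_song = rng.normal(size=8)
print(independent_features(new_song).shape)  # (8,)
print(mapped_features(new_song).shape)       # (3,)
```

The key point is that in (2) the representation of any single song depends on the whole training collection, whereas in (1) it does not.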

In scenarios where some form of semantic labels, e.g., genres or musical instruments, is available, it is possible to build models that learn the intermediate mapping between low-level audio features and semantic representations using machine learning techniques, and subsequently use the learned models for prediction. A good point of reference for such semantics-based approaches can be found in [20, 37].

An alternative technique to tackle the new item problem is hybridization. A review of different hybrid and ensemble recommender systems can be found in [7, 27]. In [51] the authors propose a music recommender system which combines an acoustic CB and an item-based CF recommender. For the content-based component, it computes acoustic features including spectral properties,

6 http://www.pandora.com/about/mgp
7 http://enacademic.com/dic.nsf/enwiki/3224302


timbre, rhythm, and pitch. The content-based component then assists the collaborative filtering recommender in tackling the cold start problem, since the features of the former are automatically derived via audio content analysis.

The solution proposed in [189] is a hybrid recommender system that combines CF and acoustic CB strategies, also by feature hybridization. However, in this work the feature-level hybridization is not performed in the original feature domain. Instead, a set of latent variables referred to as conceptual genre are introduced, whose role is to provide a common shared feature space for the two recommenders and enable hybridization. The weights associated with the latent variables reflect the musical taste of the target user and are learned during the training stage.

In [169] the authors propose a hybrid recommender system incorporating item–item CF and acoustic CB based on similarity metric learning. The proposed metric learning is an optimization model that aims to learn the weights associated with the audio content features (when combined in a linear fashion) so that a degree of consistency between the CF-based similarity and the acoustic CB similarity measure is established. The optimization problem can be solved using quadratic programming techniques.
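The idea of aligning a weighted content-based similarity with a CF similarity can be sketched numerically. This is not the formulation of [169]: the data is synthetic, the per-pair similarity is a simple weighted feature product, and ordinary least squares stands in for quadratic programming.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_feats = 30, 5
X = rng.normal(size=(n_items, n_feats))   # audio content features per item

# Toy CF similarities generated from hidden weights, so the sketch has a
# recoverable solution: s_cf(i, j) = sum_k w_k * x_ik * x_jk
true_w = np.array([2.0, 0.5, 1.0, 0.0, 1.5])
pairs = [(i, j) for i in range(n_items) for j in range(i + 1, n_items)]
A = np.array([X[i] * X[j] for i, j in pairs])  # per-pair feature products
s_cf = A @ true_w

# Learn feature weights so the weighted CB similarity matches the
# CF similarity (least squares as a stand-in for quadratic programming).
w, *_ = np.linalg.lstsq(A, s_cf, rcond=None)
print(np.allclose(w, true_w))  # True: the hidden weights are recovered
```

With real data the CF similarities are noisy and constraints (e.g., nonnegative weights) make quadratic programming the more natural tool, but the objective is the same consistency between the two similarity measures.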

Another solution to cold start are cross-domain recommendation techniques, which aim at improving recommendations in one domain (here music) by making use of information about the user's preferences in an auxiliary domain [29, 68]. Hence, knowledge of the user's preferences is transferred from an auxiliary domain to the music domain, resulting in a more complete and accurate user model. Similarly, it is also possible to integrate additional pieces of information about the (new) users that are not directly related to music, such as their personality, in order to improve the estimation of the user's music preferences. Several studies conducted on user personality characteristics support the conjecture that it may be useful to exploit this information in music recommender systems [70, 74, 87, 130, 147]. For a more detailed literature review of cross-domain recommendation, we refer to [30, 69, 102].

In addition to the aforementioned approaches, active learning has shown promising results in dealing with the cold start problem in single-domain [61, 146] or cross-domain [136, 192] recommendation scenarios. Active learning addresses this problem at its origin by identifying and eliciting (high-quality) data that can represent the preferences of users better than what they provide themselves. Such a system therefore interactively requests specific user feedback to maximize the improvement of system performance.

Limitations: The state-of-the-art approaches elaborated on above are restricted by certain limitations. When using content-based filtering, for instance, almost all existing approaches rely on a number of predefined audio features that have been used over and over again, including spectral features, MFCCs, and a great number of derivatives [106]. However, doing so assumes that (all) these features are predictive of the user's music taste, while in practice it has been shown that the acoustic properties that are important for the perception of music are highly subjective [132]. Furthermore, listeners' different tastes and levels of interest in different pieces of music influence the perception of item similarity [158]. This subjectiveness calls for CB recommenders that incorporate personalization into their mathematical model. For example, in [66]

the authors propose a hybrid (CB+CF) recommender model, namely regression-based latent factor models (RLFM). In [5] the authors propose a user-specific feature-based similarity model (UFSM), which defines a similarity function for each user, leading to a high degree of personalization. Although it is not designed specifically for the music domain, the authors of [5] provide an interesting literature review of similar user-specific models.

While hybridization can therefore alleviate the cold start problem to a certain extent, as seen in the examples above, the respective approaches are often complex, computationally expensive, and lack transparency [28]. In particular, the results of hybrids employing latent factor models are typically hard for humans to understand.

A major problem with cross-domain recommender systems is their need for data that connects two or more target domains, e.g., books, movies, and music [30]. In order for such approaches to work properly, items, users, or both therefore need to overlap to a certain degree [41]. In the absence of such overlap, relationships between the domains must be established otherwise, e.g., by inferring semantic relationships between items in different domains or by assuming similar rating patterns of users in the involved domains. However, whether the respective approaches are capable of transferring knowledge between domains is disputed [40]. A related issue in cross-domain recommendation is the lack of established datasets with clear definitions of domains and recommendation scenarios [102]. Because of this, the majority of existing work on cross-domain RS transforms some type of conventional recommendation dataset to suit its needs.

Finally, active learning techniques also suffer from a number of issues. First of all, typical active learning techniques ask a user to rate the items that the system has predicted to be interesting for them, i.e., the items with the highest predicted ratings. This is indeed a default strategy in recommender systems for eliciting ratings, since users tend to rate what has been recommended to them. Even when users browse the item catalog, they are more likely to rate items which they like or are interested in, rather than items they dislike or are indifferent to. Indeed, it has been shown that doing so creates a strong bias in the collected rating data, as the database gets populated disproportionately with high ratings. This in turn may substantially influence the prediction algorithm and decrease the recommendation accuracy [64].
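This elicitation bias is easy to reproduce in a toy simulation (purely illustrative numbers, not data from the cited study): eliciting ratings only for items the system already predicts the user will like inflates the mean of the collected ratings.

```python
import numpy as np

rng = np.random.default_rng(7)
true_ratings = rng.integers(1, 6, size=10_000)   # true 1-5 ratings per (user, item)

# Random elicitation: ask about uniformly chosen items.
random_sample = rng.choice(true_ratings, size=1_000, replace=False)

# "Rate what we recommended": elicitation concentrates on items the
# system already predicts the user will like (here: true rating >= 4).
liked = true_ratings[true_ratings >= 4]
biased_sample = rng.choice(liked, size=1_000)

print(random_sample.mean())  # close to 3.0, the true mean
print(biased_sample.mean())  # close to 4.5: collected data skews high
```

A learner trained on the biased sample sees a rating distribution very different from the true one, which is exactly the effect described above.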

Moreover, not all active learning strategies are necessarily personalized. Users differ greatly in the amount of information they have about the items, in their preferences, and in the way they make decisions. Hence, it is clearly inefficient to request all users to rate the same set of items, because many users may have very limited knowledge of the catalog, be unfamiliar with many items, and will therefore not provide ratings for them. Properly designed active learning techniques should take this into account and propose different items to different users to rate. This can be highly beneficial and increase the chance of acquiring ratings of higher quality [58].

Moreover, the traditional interaction model designed for active learning in recommender systems supports building the initial profile of a user mainly during the sign-up process. This is done by generating a user profile by requesting the user to rate a set of selected items [31]. On the other hand, users must also be able to update their profile by providing more ratings whenever they are willing to. This requires the system to adopt a conversational interaction model [31], e.g., by exploiting novel interactive design elements in the user interface [39], such as explanations that describe the benefits of providing more ratings and motivate the user to do so.

Finally, it is important to note that in an up-and-running recommender system, ratings are given by users not only when requested by the system (active learning) but also when a user voluntarily explores the item catalog and rates some familiar items (natural acquisition of ratings) [31, 62, 64, 127, 146]. While this can have a huge impact on the performance of the system, it has been mostly ignored by the majority of research works in the field of active learning for recommender systems. Indeed, almost all research works have been based on the rather unrealistic assumption that the only source for collecting new ratings is system requests. It is therefore crucial to study active learning techniques in recommender systems under a more realistic scenario that better pictures how the system evolves over time when ratings are provided by users [143, 146].

2.3 Challenge 2: Automatic playlist continuation

Problem definition: In its most generic definition, a playlist is simply a sequence of tracks intended to be listened to together. The task of automatic playlist generation (APG) then refers to the automated creation of these sequences of tracks. In this context, the ordering of songs in the playlist to generate is often highlighted as a characteristic of APG, which makes it a highly complex endeavor. Some authors have therefore proposed approaches based on Markov chains to model the transitions between songs in playlists, e.g., [33, 125]. While these approaches have been shown to outperform approaches agnostic of the song order in terms of log-likelihood, recent research has found little evidence that the exact order of songs actually matters to users [177], while the ensemble of songs in a playlist [181] and direct song-to-song transitions [93] do matter.

Considered a variation of APG, the task of automatic playlist continuation (APC) consists of adding one or more tracks to a playlist in a way that fits the target characteristics of the original playlist. This has benefits for both the listening and the creation of playlists: users can enjoy listening to continuous sessions beyond the end of a finite-length playlist, while also finding it easier to create longer, more compelling playlists without needing extensive musical familiarity.

A large part of the APC task is to accurately infer the intended purpose of a given playlist. This is challenging not only because of the broad range of these intended purposes (when they even exist), but also because of the diversity in the underlying features or characteristics that might be needed to infer those purposes.

Related to Challenge 1, an extreme cold start scenario for this task is where a playlist is created with some metadata (e.g., the title of the playlist), but no song has been added to it yet. This problem can be cast as an ad-hoc information retrieval task, where the goal is to rank songs in response to a user-provided metadata query.
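As a concrete illustration, such a metadata query can be matched against textual song metadata with standard term weighting. The following is a minimal, self-contained TF-IDF sketch; the catalog, its field contents, and the function names are illustrative and not taken from any system discussed here.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents (id -> text) against a query by a simple TF-IDF score."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(docs)
    # document frequency of each term across the catalog
    df = Counter(t for toks in tokenized.values() for t in set(toks))

    def idf(term):
        # smoothed inverse document frequency
        return math.log((n + 1) / (df.get(term, 0) + 1)) + 1.0

    scores = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        scores[d] = sum(tf[t] * idf(t) for t in query.lower().split())
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical catalog: song id -> concatenated title/artist/tag metadata.
songs = {
    "s1": "summer beach party dance pop",
    "s2": "rainy day acoustic chill",
    "s3": "party anthems dance floor hits",
}
# Rank songs for a playlist created only with the title "party dance".
ranking = tfidf_rank("party dance", songs)
```

Production systems would of course use a proper retrieval model (e.g., BM25) and richer metadata, but the cast-as-retrieval structure is the same.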

The APC task can also potentially benefit from user profiling, e.g., making use of previous playlists and the long-term listening history of the user. We call this personalized playlist continuation.

According to a study carried out in 2016 by the Music Business Association8 as part of their Music Biz Consumer Insights program,9 playlists accounted for 31% of music listening time among listeners in the USA, more than albums (22%), but less than single tracks (46%). Other studies, conducted by MIDiA,10 show that 55% of streaming music service subscribers create music playlists, with some streaming services such as Spotify currently hosting over 2 billion playlists.11 A 2017 study conducted by Nielsen12 found that 58% of users in the USA create their own playlists and 32% share them with others. Studies like these suggest a growing importance of playlists as a mode of music consumption, and as such, the study of APG and APC has never been more relevant.

State of the art: APG has been studied ever since digital multimedia transmission made huge catalogs of music available to users. Bonnin and Jannach provide a comprehensive survey of this field in [22]. In it, the authors frame the APG task as the creation of a sequence of tracks that fulfills some “target characteristics” of a playlist, given some “background knowledge” of the characteristics of the catalog of tracks from which the playlist tracks are drawn. Existing APG systems tackle both of these problems in many different ways.

In early approaches [10, 11, 135], the target characteristics of the playlist are specified as multiple explicit constraints, which include musical attributes or metadata such as artist, tempo, and style. In others, the target characteristics are a single seed track [121] or a start and an end track [10, 33, 75]. Other approaches create a circular playlist that comprises all tracks in a given music collection, in such a way that consecutive songs are as similar as possible [105, 142]. In yet other works, playlists are created based on the context of the listener, either as a single source [157] or in combination with content-based similarity [36, 149].

A common approach to build the background knowledge of the music catalog for playlist generation is to use machine learning techniques to extract that knowledge from manually curated playlists. The assumption here is that the curators of these playlists encode rich latent information about which tracks go together to create a satisfying listening experience for an intended purpose. Some proposed APG and APC systems are trained on playlists from sources such as online radio stations [33, 123], online playlist websites [126, 181], and music streaming services [141]. In the study by Pichl et al. [141], the names of playlists on Spotify were analyzed to create contextual clusters, which were then used to improve recommendations.

An approach that specifically addresses song ordering within playlists is the use of generative models trained on hand-curated playlists. McFee and Lanckriet [125] represent songs by metadata, familiarity, and audio content features, adopting ideas from statistical natural language processing. They train various Markov chains to model transitions between songs. Similarly, Chen et al. [33] propose a logistic Markov embedding to model song transitions.

8 https://musicbiz.org/news/playlists-overtake-albums-listenership-says-loop-study
9 https://musicbiz.org/resources/tools/music-biz-consumer-insights/consumer-insights-portal
10 https://www.midiaresearch.com/blog/announcing-midias-state-of-the-streaming-nation-2-report
11 https://press.spotify.com/us/about
12 http://www.nielsen.com/us/en/insights/reports/2017/music-360-2017-highlights.html


This is similar to matrix decomposition methods and results in an embedding of songs in Euclidean space. In contrast to McFee and Lanckriet’s model, Chen et al.’s model does not use any audio features.
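To illustrate the general idea behind such transition models (not the actual logistic Markov embedding of Chen et al. or the feature-based chains of McFee and Lanckriet), a maximum-likelihood first-order Markov chain over song transitions can be sketched as follows; the playlist corpus is hypothetical:

```python
from collections import defaultdict

def train_transition_model(playlists):
    """Estimate first-order transition probabilities P(next | current)
    by counting consecutive song pairs in a corpus of playlists."""
    counts = defaultdict(lambda: defaultdict(int))
    for pl in playlists:
        for cur, nxt in zip(pl, pl[1:]):
            counts[cur][nxt] += 1
    model = {}
    for cur, successors in counts.items():
        total = sum(successors.values())
        model[cur] = {s: c / total for s, c in successors.items()}
    return model

def continue_playlist(model, last_song):
    """Greedy continuation: most probable successor of the last song."""
    if last_song not in model:
        return None  # cold start: no observed transitions for this song
    return max(model[last_song], key=model[last_song].get)

# Hypothetical corpus of hand-curated playlists (song ids).
playlists = [["a", "b", "c"], ["a", "b", "d"], ["x", "b", "c"]]
model = train_transition_model(playlists)
```

The cited works replace the raw count table with learned song representations precisely to generalize to transitions never observed in training, which this plain count-based sketch cannot do.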

Limitations: While some work on automatic playlist continuation highlights the special characteristic of playlists, i.e., their sequential order, it is not well understood to which extent and in which cases taking into account the order of tracks in playlists helps create better models for recommendation. For instance, Vall et al. [181] recently demonstrated on two datasets of hand-curated playlists that the song order seems to be negligible for accurate playlist continuation when many popular songs are present. On the other hand, the authors argue that order does matter when creating playlists with tracks from the long tail. Another study, by McFee and Lanckriet [126], also suggests that transition effects play an important role in modeling playlist continuity. This is in line with a study presented by Kamehkhosh et al. [93], in which users identified song order as the second to last most important criterion for playlist quality.13 In another recent user study [177] conducted by Tintarev et al., the authors found that many participants did not care about the order of tracks in recommended playlists; sometimes they did not even notice that there was a particular order. However, this study was restricted to 20 participants who used the Discover Weekly service of Spotify.14

Another challenge for APC is evaluation, in other words, how to assess the quality of a playlist. Evaluation in general is discussed in more detail in the next section, but there are specific questions around the evaluation of playlists that should be pointed out here. As Bonnin and Jannach [22] put it, the ultimate criterion is user satisfaction, but that is not easy to measure. In [125], McFee and Lanckriet categorize the main approaches to APG evaluation as human evaluation, semantic cohesion, and sequence prediction. Human evaluation comes closest to measuring user satisfaction directly, but suffers from problems of scale and reproducibility. Semantic cohesion as a quality metric is easily measurable and reproducible, but assumes that users prefer playlists whose tracks are similar along a particular semantic dimension, which may not always be true; see for instance the studies carried out by Slaney and White [172] and by Lee [115]. Sequence prediction casts APC as an information retrieval task, but in the domain of music, an inaccurate prediction need not be a bad recommendation, which again leads to a potential disconnect between this metric and the ultimate criterion of user satisfaction.

Investigating which factors are potentially important for a positive user perception of a playlist, Lee conducted a qualitative user study [115] on playlists that had been automatically created based on content-based similarity. They made several interesting observations. A concern frequently raised by participants was that of consecutive songs being too similar, and a general lack of variety. However, different people had different interpretations of variety, e.g., variety in genres or styles vs. different artists in the playlist. Similarly, different criteria were mentioned when listeners judged the coherence of songs in a playlist, including lyrical content, tempo, and mood. When creating playlists, participants mentioned that similar lyrics, a common theme (e.g., music to listen to on the train), story (e.g., music for Independence Day), or era (e.g., rock music from the 1980s) are important, and that tracks not complying negatively affect the flow of the playlist. These aspects are extended by the responses of participants in a study conducted by Cunningham et al. [43], who further identified the following categories of playlists: same artist, genre, style, or orchestration; playlists for a certain event or activity (e.g., party or holiday); romance (e.g., love songs or breakup songs); playlists intended to send a message to their recipient (e.g., protest songs); and challenges or puzzles (e.g., cover songs liked more than the original or songs whose title contains a question mark).

13 The ranking of criteria (from most to least important) was: homogeneity, artist diversity, transition, popularity, lyrics, order, and freshness.
14 https://www.spotify.com/discoverweekly

Lee also found that personal preferences play a major role. In fact, already a single song that is very much liked or hated by a listener can have a strong influence on how they judge the entire playlist [115]. This seems particularly true if it is a highly disliked song [45]. Furthermore, a good mix of familiar and unknown songs was often mentioned as an important requirement for a good playlist. Supporting the discovery of interesting new songs, still contextualized by familiar ones, increases the likelihood of a serendipitous encounter in a playlist [160, 193]. Finally, participants also reported that their familiarity with a playlist’s genre or theme influenced their judgment of its quality. In general, listeners were more picky about playlists whose tracks they were familiar with or liked a lot.

Supported by the studies summarized above, we argue that the question of what makes a great playlist is highly subjective and further depends on the intent of the creator or listener. Important criteria when creating or judging a playlist include track similarity/coherence and variety/diversity, but also the user’s personal preferences and familiarity with the tracks, as well as the intention of the playlist creator. Unfortunately, current automatic approaches to playlist continuation are agnostic of the underlying psychological and sociological factors that influence which songs users choose to include in a playlist. Since knowing about such factors is vital to understand the intent of the playlist creator, we believe that algorithmic methods for APC need to holistically learn such aspects from manually created playlists and integrate respective intent models. However, we are aware that in today’s era, where billions of playlists are shared by users of online streaming services,15 a large-scale analysis of psychological and sociological background factors is impossible. Nevertheless, in the absence of explicit information about user intent, a possible starting point for creating intent models might be the metadata associated with user-generated playlists, such as title or description. To foster this kind of research, the playlists provided in the dataset for the ACM Recommender Systems Challenge 2018 include playlist titles.16

2.4 Challenge 3: Evaluating music recommender systems

15 https://press.spotify.com/us/about
16 https://recsys-challenge.spotify.com

Problem definition: Having its roots in machine learning (cf. rating prediction) and information retrieval (cf. “retrieving” items based on implicit “queries” given by user preferences), the field of recommender systems originally adopted evaluation metrics from these neighboring fields. In fact, accuracy and related quantitative measures, such as precision, recall, or error measures (between predicted and true ratings), are still the most commonly employed criteria to judge the recommendation quality of a recommender system [12, 79]. In addition, novel measures that are tailored to the recommendation problem have emerged in recent years. These so-called beyond-accuracy measures [98] address the particularities of recommender systems and gauge, for instance, the utility, novelty, or serendipity of an item. However, a major problem with these kinds of measures is that they integrate factors that are hard to describe mathematically, for instance, the aspect of surprise in the case of serendipity measures. For this reason, there sometimes exist several different definitions to quantify the same beyond-accuracy aspect.

State of the art: In the following, we discuss the performance measures most frequently reported when evaluating recommender systems. An overview is given in Table 1. They can be roughly categorized into accuracy-related measures, such as prediction error (e.g., MAE and RMSE) or standard IR measures (e.g., precision and recall), and beyond-accuracy measures, such as diversity, novelty, and serendipity. Furthermore, while some of the metrics quantify the ability of recommender systems to find good items, e.g., precision and recall, others consider the ranking of items and therefore assess the system’s ability to position good recommendations at the top of the recommendation list, e.g., MAP, NDCG, or MPR.

Mean absolute error (MAE) is one of the most common metrics for evaluating the prediction power of recommender algorithms. It computes the average absolute deviation between the predicted ratings and the actual ratings provided by users [82]. Indeed, MAE indicates how close the rating predictions generated by an MRS are to the real user ratings. MAE is computed as follows:

$$\mathrm{MAE} = \frac{1}{|T|} \sum_{r_{u,i} \in T} \left| r_{u,i} - \hat{r}_{u,i} \right| \qquad (1)$$

where $r_{u,i}$ and $\hat{r}_{u,i}$ respectively denote the actual and the predicted rating of item $i$ for user $u$. MAE sums the absolute prediction errors over all ratings in a test set $T$.

Root mean square error (RMSE) is a similar metric, computed as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{r_{u,i} \in T} \left( r_{u,i} - \hat{r}_{u,i} \right)^2} \qquad (2)$$

It extends MAE in that the error term is squared, which penalizes larger differences between predicted and true ratings more than smaller ones. This is motivated by the assumption that, for instance, a rating prediction of 1 when the true rating is 4 is much more severe than a prediction of 3 for the same item.
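Eqs. (1) and (2) can be sketched directly over a toy test set of (true, predicted) rating pairs; the data below is purely illustrative:

```python
import math

def mae(rating_pairs):
    """Mean absolute error over (true, predicted) rating pairs, as in Eq. (1)."""
    return sum(abs(t - p) for t, p in rating_pairs) / len(rating_pairs)

def rmse(rating_pairs):
    """Root mean square error, as in Eq. (2); squaring penalizes large errors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in rating_pairs) / len(rating_pairs))

# Toy test set: (true rating, predicted rating) pairs.
pairs = [(4, 1), (3, 3), (5, 4)]
```

On this data, the single large error (4 vs. 1) pushes RMSE well above MAE, illustrating the penalization argument above.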

Precision at top K recommendations (P@K) is a common metric that measures the accuracy of the system in recommending relevant items. In order to compute P@K, for each user, the top K recommended items whose ratings also appear in the test set T are considered. This metric was originally designed for binary relevance judgments. Therefore, if relevance information is available at different levels, such as on a five-point Likert scale, the labels should be binarized, e.g., by considering ratings greater than or equal to 4 (out of 5) as relevant. For each user $u$, $P_u@K$ is computed as follows:

$$P_u@K = \frac{|L_u \cap \hat{L}_u|}{|\hat{L}_u|} \qquad (3)$$

where $L_u$ is the set of relevant items for user $u$ in the test set $T$ and $\hat{L}_u$ denotes the recommended set containing the $K$ items in $T$ with the highest predicted ratings for user $u$. The overall P@K is then computed by averaging $P_u@K$ over all users in the test set.
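For a single user, Eq. (3) reduces to a few lines; the item ids and binarized relevance judgments below are illustrative:

```python
def precision_at_k(recommended, relevant, k):
    """P_u@K as in Eq. (3): fraction of the top-k recommended items
    that are relevant. `recommended` is ranked best-first;
    `relevant` is the set of relevant item ids for this user."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / len(top_k)

# Illustrative ranking and relevance set for one user.
recs = ["s1", "s2", "s3", "s4"]
rel = {"s1", "s3"}
```

Averaging this value over all users yields the overall P@K.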

Mean average precision at top K recommendations (MAP@K) is a rank-based metric that computes the overall precision of the system at different lengths of the recommendation list. MAP is computed as the arithmetic mean of the average precision over the entire set of users in the test set. Average precision for the top K recommendations (AP@K) is defined as follows:

$$AP@K = \frac{1}{N} \sum_{i=1}^{K} P@i \cdot rel(i) \qquad (4)$$

where $rel(i)$ is an indicator signaling whether the $i$-th recommended item is relevant, i.e., $rel(i) = 1$, or not, i.e., $rel(i) = 0$, and $N$ is the total number of relevant items. Note that MAP implicitly incorporates recall, because it also considers the relevant items not in the recommendation list.17
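Eq. (4) and the averaging over users can be sketched as follows; the single-user data is illustrative:

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@K as in Eq. (4): sum of P@i at each relevant rank i,
    normalized by the total number of relevant items N."""
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i        # P@i at a relevant position
    return score / len(relevant)     # divide by N = |relevant|

def map_at_k(all_recs, all_rels, k):
    """MAP@K: arithmetic mean of AP@K over all users."""
    return sum(average_precision_at_k(r, rel, k)
               for r, rel in zip(all_recs, all_rels)) / len(all_recs)

# One user: relevant items appear at ranks 1 and 3.
recs = ["s1", "s2", "s3"]
rel = {"s1", "s3"}
```

With relevant items at ranks 1 and 3, AP@3 = (1/2)(1/1 + 2/3) = 5/6, rewarding the system for placing a relevant item at the very top.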

Recall at top K recommendations (R@K) is presented here for the sake of completeness, even though it is not a crucial measure from a consumer’s perspective. Indeed, the listener is typically not interested in being recommended all or a large number of relevant items, but rather in having good recommendations at the top of the recommendation list. For a user $u$, $R_u@K$ is defined as:

$$R_u@K = \frac{|L_u \cap \hat{L}_u|}{|L_u|} \qquad (5)$$

where $L_u$ is the set of relevant items of user $u$ in the test set $T$ and $\hat{L}_u$ denotes the recommended set containing the $K$ items in $T$ with the highest predicted ratings for user $u$. The overall R@K is calculated by averaging $R_u@K$ over all users in the test set.

Normalized discounted cumulative gain (NDCG) is a measure for the ranking quality of the recommendations. This metric was originally proposed to evaluate the effectiveness of information retrieval systems [94] and is nowadays also frequently used for evaluating music recommender systems [120, 139, 185]. Assume that the recommendations for user $u$ are sorted according to the predicted rating values in descending order. $DCG_u$ is then defined as follows:

$$DCG_u = \sum_{i=1}^{N} \frac{r_{u,i}}{\log_2(i+1)} \qquad (6)$$

17 We should note that in the recommender systems community, another variation of average precision has recently been gaining popularity, formally defined by $AP@K = \frac{1}{\min(K,N)} \sum_{i=1}^{K} P@i \cdot rel(i)$, in which $N$ is the total number of relevant items and $K$ is the size of the recommendation list. The motivation behind the minimization term is to prevent AP scores from being unfairly suppressed when the number of recommendations is too low to capture all the relevant items. This variation of MAP was popularized by Kaggle competitions [1] about recommender systems and has been used in several other research works, see for example [9, 124].


Measure                                          Abbreviation  Type            Ranking-aware
Mean absolute error                              MAE           error/accuracy  No
Root mean square error                           RMSE          error/accuracy  No
Precision at top K recommendations               P@K           accuracy        No
Recall at top K recommendations                  R@K           accuracy        No
Mean average precision at top K recommendations  MAP@K         accuracy        Yes
Normalized discounted cumulative gain            NDCG          accuracy        Yes
Half-life utility                                HLU           accuracy        Yes
Mean percentile rank                             MPR           accuracy        Yes
Spread                                           —             beyond          No
Coverage                                         —             beyond          No
Novelty                                          —             beyond          No
Serendipity                                      —             beyond          No
Diversity                                        —             beyond          No

Table 1: Evaluation measures commonly used for recommender systems.

where $r_{u,i}$ is the true rating (as found in test set $T$) for the item ranked at position $i$ for user $u$, and $N$ is the length of the recommendation list. Since the rating distribution depends on the users’ behavior, the DCG values of different users are not directly comparable. Therefore, the cumulative gain for each user should be normalized. This is done by computing the ideal DCG for user $u$, denoted $IDCG_u$, which is the $DCG_u$ value of the best possible ranking, obtained by ordering the items by true ratings in descending order. The normalized discounted cumulative gain for user $u$ is then calculated as:

$$NDCG_u = \frac{DCG_u}{IDCG_u} \qquad (7)$$

Finally, the overall normalized discounted cumulative gain NDCG is computed by averaging $NDCG_u$ over the entire set of users.
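Eqs. (6) and (7) can be sketched for a single user as follows; the true ratings, listed in the order the system ranked the items, are illustrative:

```python
import math

def dcg(ratings):
    """DCG as in Eq. (6): each rating discounted by log2(rank + 1)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(ratings, start=1))

def ndcg(ranked_ratings):
    """NDCG_u as in Eq. (7): DCG of the predicted ranking divided by the
    ideal DCG, obtained by sorting true ratings in descending order."""
    idcg = dcg(sorted(ranked_ratings, reverse=True))
    return dcg(ranked_ratings) / idcg if idcg > 0 else 0.0

# True ratings of the items, in the order the system ranked them.
ranked_ratings = [3, 5, 4]
```

An ideally ordered list yields NDCG = 1; any misordering, such as placing the 3-star item first here, pushes the value below 1.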

In the following, we present further common quantitative evaluation metrics which have been particularly designed or adopted to assess recommender system performance, even though some of them have their origin in information retrieval and machine learning. The first two (HLU and MPR) still belong to the category of accuracy-related measures, while the subsequent ones capture beyond-accuracy aspects.

Half-life utility (HLU) measures the utility of a recommendation list for a user under the assumption that the likelihood of viewing/choosing a recommended item decays exponentially with the item’s position in the ranking [25, 137]. Formally, HLU for user $u$ is defined as:

$$HLU_u = \sum_{i=1}^{N} \frac{\max(r_{u,i} - d, 0)}{2^{(rank_{u,i}-1)/(h-1)}} \qquad (8)$$

where $r_{u,i}$ and $rank_{u,i}$ denote the rating and the rank of item $i$ for user $u$, respectively, in the recommendation list of length $N$; $d$ represents a default rating (e.g., the average rating) and $h$ is the half-life, i.e., the rank in the list at which there is a 50% chance that the user will eventually listen to the item. $HLU_u$ can be further normalized by the maximum utility (similar to NDCG), and the final HLU is the average over the half-life utilities obtained for all users in the test set. A larger HLU corresponds to better recommendation performance.
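A minimal sketch of Eq. (8) for one user; the default rating d, half-life h, and the (rating, rank) pairs below are illustrative choices, not values prescribed by the cited works:

```python
def half_life_utility(ratings_with_ranks, d=3.0, h=5.0):
    """HLU_u as in Eq. (8): the utility max(r - d, 0) of an item is halved
    every h - 1 rank positions; d is an assumed default ("neutral") rating."""
    return sum(max(r - d, 0.0) / 2 ** ((rank - 1) / (h - 1))
               for r, rank in ratings_with_ranks)

# Illustrative (true rating, rank in recommendation list) pairs for one user.
items = [(5, 1), (4, 2), (2, 3)]
```

Note how the below-default rating (2) contributes nothing, while the top-ranked 5-star item contributes its full utility of 2.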

Mean percentile rank (MPR) estimates the users’ satisfaction with items in the recommendation list and is computed as the average of the percentile ranks of the test items within the ranked list of recommended items for each user [90]. The percentile rank of an item is the percentage of items whose position in the recommendation list is equal to or lower than the position of the item itself. Formally, the percentile rank $PR_u$ for user $u$ is defined as:

$$PR_u = \frac{\sum_{r_{u,i} \in T} r_{u,i} \cdot rank_{u,i}}{\sum_{r_{u,i} \in T} r_{u,i}} \qquad (9)$$

where $r_{u,i}$ is the true rating (as found in test set $T$) for item $i$ rated by user $u$ and $rank_{u,i}$ is the percentile rank of item $i$ within the ordered list of recommendations for user $u$. MPR is then the arithmetic mean of the individual $PR_u$ values over all users. A randomly ordered recommendation list has an expected MPR value of 50%. A smaller MPR value is therefore assumed to correspond to superior recommendation performance.
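Eq. (9) can be sketched as follows. Percentile-rank conventions vary; this sketch assumes the common one where the top of the list has percentile rank 0% and the bottom approaches 100%, and all names and data are illustrative:

```python
def percentile_rank(position, list_length):
    """Percentile rank of a 1-based position: 0% at the top of the list,
    100% at the bottom (one common convention; others exist)."""
    if list_length <= 1:
        return 0.0
    return 100.0 * (position - 1) / (list_length - 1)

def user_percentile_rank(test_ratings, positions, list_length):
    """PR_u as in Eq. (9): rating-weighted mean percentile rank of the
    user's test items within their recommendation list.
    `test_ratings` maps item id -> true rating; `positions` maps
    item id -> 1-based position in the recommendation list."""
    num = sum(r * percentile_rank(positions[i], list_length)
              for i, r in test_ratings.items())
    return num / sum(test_ratings.values())
```

Because the highly rated item sits at the top, the weighted PR stays well below the 50% expected of a random ordering.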

Spread measures how well the recommender algorithm can spread its attention across a larger set of items [104]. In more detail, spread is the entropy of the distribution of the items recommended to the users in the test set. It is formally defined as:

$$\mathrm{spread} = -\sum_{i \in I} P(i) \log P(i) \qquad (10)$$

where $I$ represents the entirety of items in the dataset and $P(i) = count(i) / \sum_{i' \in I} count(i')$, with $count(i)$ denoting the total number of times item $i$ shows up in the recommendation lists. It may be infeasible to expect an algorithm to achieve perfect spread (i.e., recommending each item an equal number of times) while avoiding irrelevant recommendations or unfulfillable rating requests. Accordingly, moderate spread values are usually preferable.
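Eq. (10) amounts to the entropy of item occurrences pooled over all users' recommendation lists; a minimal sketch with illustrative data:

```python
import math
from collections import Counter

def spread(recommendation_lists):
    """Spread as in Eq. (10): entropy of the distribution of items
    appearing across all users' recommendation lists."""
    counts = Counter(item for lst in recommendation_lists for item in lst)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Recommending only one item to everyone gives spread 0, while a uniform distribution over items maximizes it, matching the intuition that moderate values balance concentration against irrelevance.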

Coverage of a recommender system is defined as the proportion of items for which the system is capable of generating recommendations [82]:

$$\mathrm{coverage} = \frac{|\hat{T}|}{|T|} \qquad (11)$$

where $|T|$ is the size of the test set and $|\hat{T}|$ is the number of ratings in $T$ for which the system can predict a value. This is particularly important in cold start situations, when recommender systems are not able to accurately predict the ratings of new users or new items, and hence obtain low coverage. Recommender systems with lower coverage are therefore limited in the number of items they can recommend. A simple remedy for low coverage is to implement some default recommendation strategy for an unknown user–item entry, for example, taking the average rating an item received from all users as an estimate of its rating. This may come at the price of accuracy, so the trade-off between coverage and accuracy needs to be considered in the evaluation process [8].

Novelty measures the ability of a recommender system to recommend items that the user did not know about before [2]. A recommendation list may be accurate, but if it contains a lot of items that are not novel to the user, it is not necessarily a useful list [193]. While novelty should be defined on an individual user level, considering the actual freshness of the recommended items, it is common to use the self-information of the recommended items relative to their global popularity:

$$\mathrm{novelty} = \frac{1}{|U|} \sum_{u \in U} \sum_{i \in L_u} \frac{-\log_2 pop_i}{N} \qquad (12)$$

where $pop_i$ is the popularity of item $i$, measured as the percentage of users who rated $i$, and $L_u$ is the recommendation list of the top $N$ recommendations for user $u$ [193, 195]. The above definition assumes that the likelihood of the user selecting a previously unknown item is proportional to its global popularity and is used as an approximation of novelty. In order to obtain more accurate information about novelty or freshness, explicit user feedback is needed, in particular since the user might have listened to an item through other channels before. It is often assumed that users prefer recommendation lists with more novel items. However, if the presented items are too novel, the user is unlikely to have any knowledge of them, nor to be able to understand or rate them. Therefore, moderate values indicate better performance [104].
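Eq. (12) can be sketched as follows; the popularity table is hypothetical:

```python
import math

def novelty(recommendation_lists, popularity):
    """Novelty as in Eq. (12): mean self-information -log2(pop_i) of the
    recommended items, averaged per list and then over users."""
    total = 0.0
    for lst in recommendation_lists:
        total += sum(-math.log2(popularity[i]) for i in lst) / len(lst)
    return total / len(recommendation_lists)

# Hypothetical popularity: fraction of users who rated each item.
pop = {"hit": 0.5, "rare": 0.125}
```

A long-tail item rated by 12.5% of users carries three times the self-information (3 bits) of one rated by half of them (1 bit), so lists of rare items score higher.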

Serendipity aims at evaluating an MRS based on how relevant and surprising its recommendations are. While the need for serendipity is commonly agreed upon [83], the question of how to measure the degree of serendipity of a recommendation list is controversial. This particularly holds for the question of whether the factor of surprise implies that items must be novel to the user [98]. On a general level, the serendipity of a recommendation list $L_u$ provided to a user $u$ can be defined as:

$$\mathrm{serendipity}(L_u) = \frac{|L_u^{unexp} \cap L_u^{useful}|}{|L_u|} \qquad (13)$$

where $L_u^{unexp}$ and $L_u^{useful}$ denote the subsets of $L_u$ that contain, respectively, recommendations unexpected to and useful for the user. The usefulness of an item is commonly assessed by explicitly asking users or taking user ratings as a proxy [98]. The unexpectedness of an item is typically quantified by some measure of distance from expected items, i.e., items that are similar to the items already rated by the user. In the context of MRS, Zhang et al. [193] propose an “unserendipity” measure that is defined as the average similarity between the items in the user’s listening history and the new recommendations. Similarity between two items is in this case calculated by an adapted cosine measure that integrates co-liking information, i.e., the number of users who like both items. Lower values are assumed to correspond to more surprising recommendations, since they indicate that recommendations deviate from the user’s traditional behavior [193].
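Given judgments of which recommended items were unexpected and which were useful (however those judgments are obtained), Eq. (13) itself is a simple set ratio; this sketch assumes both sets are subsets of the recommendation list, and all data is illustrative:

```python
def serendipity(recommended, unexpected, useful):
    """Serendipity as in Eq. (13): share of the recommendation list that
    is both unexpected to and useful for the user."""
    rec = set(recommended)
    return len(rec & set(unexpected) & set(useful)) / len(recommended)
```

The hard part in practice is, as discussed above, producing the `unexpected` and `useful` sets, not this final computation.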

Diversity is another beyond-accuracy measure, already discussed in the limitations part of Challenge 1. It gauges the extent to which recommended items are different from each other, where difference can relate to various aspects, e.g., musical style, artist, lyrics, or instrumentation, just to name a few. Like serendipity, diversity can be defined in several ways. One of the most common is to compute the pairwise distance between all items in the recommendation set, either averaged [196] or summed [173]. In the former case, the diversity of a recommendation list $L$ is calculated as follows:

$$\mathrm{diversity}(L) = \frac{\sum_{i \in L} \sum_{j \in L \setminus \{i\}} dist_{i,j}}{|L| \cdot (|L| - 1)} \qquad (14)$$

where $dist_{i,j}$ is some distance function defined between items $i$ and $j$. Common choices are inverse cosine similarity [150], inverse Pearson correlation [183], and Hamming distance [101].
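Eq. (14) with an averaged pairwise distance can be sketched as follows; the tag-tuple track representation and the choice of Hamming distance are illustrative (the cited works also use cosine- or Pearson-based distances):

```python
def diversity(items, dist):
    """Average pairwise distance over a recommendation list, as in Eq. (14)."""
    n = len(items)
    total = sum(dist(items[i], items[j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

def hamming(a, b):
    """Toy distance: number of differing attributes between two tag tuples."""
    return sum(x != y for x, y in zip(a, b))

# Illustrative tracks described by (genre, era) tags.
tracks = [("rock", "1980s"), ("rock", "1990s"), ("jazz", "1990s")]
```

A list of identical tracks scores 0, and the score grows as the recommended tracks differ along more attributes.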

When it comes to evaluating playlist recommendation, where the goal is to assess the capability of the recommender to provide proper transitions between subsequent songs, conventional error or accuracy metrics may not be able to capture this property. There is hence a need for sequence-aware evaluation measures. For example, consider the scenario where a user who likes both classical and rock music is recommended a rock song right after she has listened to a classical piece. Even though both music styles are in agreement with her taste, the transition between songs plays an important role in user satisfaction. In such a situation, given a currently playing song and several equally likely good options to be played next, an RS may be inclined to rank songs based on their popularity. Hence, other metrics, such as average log-likelihood, have been proposed to better model the transitions [34, 35]. In this regard, when the goal is to suggest a sequence of items, alternative multi-metric evaluation approaches are required to take multiple quality factors into consideration. Such evaluation metrics can consider the ranking order of the recommendations, or the internal coherence or diversity of the recommended list as a whole. In many scenarios, the adoption of such quality metrics leads to a trade-off with accuracy, which should be balanced by the RS algorithm [145].

Limitations: As of today, the vast majority of evaluation approaches in recommender systems research focuses on quantitative measures, either accuracy-like or beyond-accuracy, which are often computed in offline studies. Doing so has the advantage of facilitating the reproducibility of evaluation results. However, limiting the evaluation to quantitative measures means forgoing another important factor, which is user experience. In other words, in the absence of user-centric evaluations, it is difficult to extend the claims to the more important


objective of the recommender system under evaluation, i.e., giving users a pleasant and useful personalized experience [107].

Despite the acknowledged need for more user-centric evaluation strategies [158], the human factor, i.e., the user or, in the case of MRS, the listener, is still far too often neglected or not properly addressed. For instance, while there exist quantitative objective measures for serendipity and diversity, as discussed above, perceived serendipity and diversity can differ highly from the measured ones [182], as they are subjective, user-specific concepts. This illustrates that even beyond-accuracy measures cannot fully capture real user satisfaction with a recommender system. On the other hand, approaches that address user experience (UX) can be investigated to evaluate recommender systems. For example, an MRS can be evaluated based on user engagement, which provides a restricted view of UX that concentrates on the judgment of product quality during interaction [80, 118, 133]. User satisfaction, user engagement, and more generally user experience are commonly assessed through user studies [14, 116, 117].

Addressing both objective and subjective evaluation criteria, Knijnenburg et al. [108] propose a holistic framework for user-centric evaluation of recommender systems. Figure 1 provides an overview of its components. The objective system aspects (OSA) are considered unbiased factors of the RS, including aspects of the user interface, computing time of the algorithm, or the number of items shown to the user. They are typically easy to specify or compute. The OSA influence the subjective system aspects (SSA), which are caused by momentary, primary evaluative feelings while interacting with the system [81]. This results in a different perception of the system by different users. SSA are therefore highly individual aspects and are typically assessed by user questionnaires. Examples of SSA include the general appeal of the system, usability, and perceived recommendation diversity or novelty. The aspect of experience (EXP) describes the user's attitude towards the system and is commonly also investigated via questionnaires. It addresses the user's perception of the interaction with the system. The experience is highly influenced by the other components, which means that changing any of the other components likely results in a change of EXP aspects. Experience can be broken down into the evaluation of the system, the decision process, and the final decisions made, i.e., the outcome. The interaction (INT) aspects describe the observable behavior of the user, such as time spent viewing an item, as well as clicking or purchasing behavior. In a music context, examples further include liking a song or adding it to a playlist. Interaction aspects therefore belong to the objective measures and are usually determined via logging by the system. Finally, Knijnenburg et al.'s framework includes personal characteristics (PC) and situational characteristics (SC), which influence the user experience.
PC include aspects that do not exist without the user, such as user demographics, knowledge, or perceived control, while SC include aspects of the interaction context, such as when and where the system is used, or situation-specific trust or privacy concerns. Knijnenburg et al. [108] also propose a questionnaire to assess the factors defined in their framework, for instance, perceived recommendation quality, perceived system effectiveness, perceived recommendation variety, choice satisfaction, intention to provide feedback, general trust in technology, and system-specific privacy concerns.

While this framework is a generic one, tailoring it to MRS would allow for user-centric evaluation thereof. Especially the aspects of personal and situational characteristics should be adapted to the particularities of music listeners and listening situations, respectively, cf. Section 2.1. To this end, researchers in MRS should consider the aspects relevant for the perception of and preference for music, and their implications on MRS, which have been identified in several studies, e.g., [44, 113, 114, 158, 159]. In addition to the general ones mentioned by Knijnenburg et al., of great importance in the music domain seem to be psychological factors, including affect and personality, social influence, musical training and experience, and physiological condition.

We believe that carefully and holistically evaluating MRS by means of accuracy and beyond-accuracy measures, objective and subjective measures, in offline and online experiments, would lead to a better understanding of listeners' needs and requirements vis-à-vis MRS, and eventually to a considerable improvement of current MRS.

3 FUTURE DIRECTIONS AND VISIONS

While the challenges identified in the previous section are already being researched intensely, in the following we provide a more forward-looking analysis and discuss some MRS-related trending topics, which we expect to be influential for the next generation of MRS. All of them have in common that they aim to create more personalized recommendations. More precisely, we first outline how psychological constructs such as personality and emotion could be integrated into MRS. Subsequently, we address situation-aware MRS and argue for the need for multifaceted user models that describe contextual and situational preferences. To round off, we discuss the influence of users' cultural background on recommendation preferences, which needs to be considered when building culture-aware MRS.

3.1 Psychologically-inspired music recommendation

Personality and emotion are important psychological constructs. While personality characteristics are stable and predictable traits that shape human behavior, emotions are short-term affective responses to a particular stimulus [179]. Both have been shown to influence music taste [72, 154, 159] and user requirements for MRS [70, 74]. However, in the context of (music) recommender systems, personality and emotion do not yet play a major role. Given the strong evidence that both influence listening preferences [147, 159] and the recent emergence of approaches to accurately predict them from user-generated data [111, 170], we believe that psychologically-inspired MRS is an upcoming research area.

3.1.1 Personality: In psychology research, personality is often defined as a "consistent behavior pattern and interpersonal processes originating within the individual" [26]. This definition accounts for the individual differences in people's emotional, interpersonal, experiential, attitudinal, and motivational styles [96]. Several prior works have studied the relation between decision making and personality factors. In [147], as an example, it has been shown that personality can influence the human decision making process as well as tastes and interests. Due to this direct relation, people with similar personality factors are very likely to share similar interests and tastes.


Figure 1: Evaluation framework of the user experience for recommender systems, according to [108].

Earlier studies on user personality characteristics support the potential benefits that personality information could have in recommender systems [23, 24, 59, 86, 88, 178, 180]. As a known example, psychological studies [147] have shown that extraverted people are likely to prefer upbeat and conventional music. Accordingly, a personality-based MRS could use this information to better predict which songs are more likely than others to please extraverted listeners [87]. Another example of potential usage is to exploit personality information in order to compute similarity among users and hence identify like-minded users [178]. This similarity information could then be integrated into a neighborhood-based collaborative filtering approach.
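A minimal sketch of the latter idea, assuming users are described by Big Five trait vectors (the profile values below are invented for illustration):

```python
import numpy as np

# Hypothetical Big Five profiles (openness, conscientiousness,
# extraversion, agreeableness, neuroticism), each scored in [1, 5].
profiles = {
    "alice": np.array([4.2, 3.1, 4.8, 3.9, 2.0]),
    "bob":   np.array([4.0, 3.3, 4.6, 3.7, 2.2]),
    "carol": np.array([1.5, 4.5, 1.8, 2.9, 4.1]),
}

def personality_similarity(u, v):
    """Cosine similarity between two users' trait vectors; users with
    similar personalities yield values close to 1 and could serve as
    neighbors in a collaborative filtering scheme."""
    a, b = profiles[u], profiles[v]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Such a score could also be blended with a rating-based similarity before selecting neighbors, e.g., sim(u, v) = λ · sim_rating(u, v) + (1 − λ) · sim_personality(u, v).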

In order to use personality information in a recommender system, the system first has to elicit this information from the users, which can be done either explicitly or implicitly. In the former case, the system can ask the user to complete a personality questionnaire using one of the personality evaluation inventories, e.g., the Ten-Item Personality Inventory [77] or the Big Five Inventory [95]. In the latter case, the system can learn the personality by tracking and observing users' behavioral patterns, for instance, liking behavior on Facebook [111] or the application of filters to images posted on Instagram [170]. Not too surprisingly, it has been shown that systems that explicitly elicit personality characteristics achieve superior recommendation outcomes, e.g., in terms of user satisfaction, ease of use,

and prediction accuracy [53]. On the downside, however, many users are not willing to fill in long questionnaires before being able to use the RS. A way to alleviate this problem is to ask users only the most informative questions of a personality instrument [163]. Which questions are most informative, though, first needs to be determined based on existing user data and depends on the recommendation domain. Other studies showed that users are to some extent willing to provide further information in return for better recommendation quality [175].

Personality information can be used in various ways, particularly to generate recommendations when traditional rating or consumption data is missing. Alternatively, personality traits can be seen as additional features that extend the user profile; they can be used to identify similar users in neighborhood-based recommender systems or be directly fed into extended matrix factorization models [68].
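The second option can be sketched as follows. This is not the specific model of [68] but a generic illustration in which a trait vector is projected into the latent factor space; all parameters below are random placeholders standing in for learned values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_factors = 8

# Hypothetical parameters of an extended matrix factorization model:
# besides the usual latent vectors, a matrix W (learned jointly with
# the factors in a real system) projects a user's Big Five trait
# vector f_u into the latent space.
p_u = rng.normal(size=n_factors)            # collaborative user factors
q_i = rng.normal(size=n_factors)            # item factors
W = rng.normal(size=(n_factors, 5)) * 0.1   # trait-to-latent projection
f_u = np.array([4.2, 3.1, 4.8, 3.9, 2.0])   # Big Five scores in [1, 5]

def predict(p_u, f_u, q_i, W):
    """Predicted preference: the item vector is scored against the sum
    of collaborative and trait-derived factors. For a cold-start user,
    p_u can be all zeros and the trait term alone drives the score."""
    return float((p_u + W @ f_u) @ q_i)
```

The appeal of this formulation is precisely the cold-start case: a brand-new user with a completed personality questionnaire already receives non-trivial, personalized scores.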

3.1.2 Emotion: The emotional state of the MRS user has a strong impact on his or her short-term musical preferences [99]. Vice versa, music has a strong influence on our emotional state. It therefore does not come as a surprise that emotion regulation was identified as one of the main reasons why people listen to music [122, 155]. As an example, people may listen to completely different musical genres or styles when they are sad compared to when they are happy. Indeed, prior research on music psychology discovered


that people may choose the type of music which moderates their emotional condition [109]. More recent findings show that music may be chosen mainly to augment the emotional situation perceived by the listener [131]. In order to build emotion-aware MRS, it is therefore necessary to (i) infer the emotional state the listener is in, (ii) infer emotional concepts from the music itself, and (iii) understand how these two interrelate. These three tasks are detailed below.

Eliciting the emotional state of the listener: Similar to personality traits, the emotional state of a user can be elicited explicitly or implicitly. In the former case, the user is typically presented with one of the various categorical models (emotions are described by distinct emotion words such as happiness, sadness, anger, or fear) [85, 191] or dimensional models (emotions are described by scores with respect to two or three dimensions, e.g., valence and arousal) [152]. For a more detailed elaboration on emotion models in the context of music, we refer to [159, 186]. The implicit acquisition of emotional states can be effected, for instance, by analyzing user-generated text [50], speech [67], or facial expressions in video [56].
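The two model families can also be bridged computationally. The sketch below maps a dimensional valence-arousal reading, e.g., from sensors or rating sliders, to the nearest categorical emotion word; the coordinates are rough illustrative placements, not values from the cited literature.

```python
# Hypothetical placements of categorical emotion words in the
# two-dimensional valence-arousal space of dimensional models.
EMOTION_VA = {
    "happiness": ( 0.8,  0.5),
    "sadness":   (-0.7, -0.4),
    "anger":     (-0.6,  0.8),
    "fear":      (-0.8,  0.7),
}

def closest_emotion(valence, arousal):
    """Return the categorical emotion word whose coordinates are
    nearest (squared Euclidean distance) to the given reading."""
    return min(
        EMOTION_VA,
        key=lambda e: (EMOTION_VA[e][0] - valence) ** 2
                    + (EMOTION_VA[e][1] - arousal) ** 2,
    )
```

Such a mapping lets an MRS whose internal logic uses emotion words consume implicit dimensional estimates, and vice versa.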

Emotion tagging in music: The music piece itself can be regarded as emotion-laden content and in turn can be described by emotion words. The task of automatically assigning such emotion words to a music piece is an active research area, often referred to as music emotion recognition (MER), e.g., [15, 92, 103, 187, 188, 191]. How to integrate emotion terms created by MER tools into an MRS is, however, not an easy task, for several reasons. First, early MER approaches usually neglected the distinction between intended emotion, perceived emotion, and induced or felt emotion, cf. Section 2.1. Current MER approaches focus on perceived or induced emotions. However, musical content contains various characteristics that affect the emotional state of the listener, such as lyrics, rhythm, and harmony, and the way they affect the emotional state is highly subjective. This is so even though research has detected a few general rules; for instance, a musical piece in a major key is typically perceived as brighter and happier than one in a minor key, and a piece in rapid tempo is perceived as more exciting or tense than a slow one [112].
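The general rules just mentioned can be caricatured in a few lines. This toy heuristic (thresholds and scaling are arbitrary assumptions) only illustrates the direction of the effects; actual MER systems learn such mappings from many audio, lyric, and context features.

```python
def heuristic_valence_arousal(mode, tempo_bpm):
    """Toy illustration of the rules cited above: major mode tends to
    be perceived as happier (higher valence), fast tempo as more
    exciting or tense (higher arousal). Both outputs lie in [-1, 1]."""
    valence = 0.5 if mode == "major" else -0.5
    # Arousal grows with tempo around a 100 BPM pivot (assumed values).
    arousal = min(1.0, max(-1.0, (tempo_bpm - 100) / 80.0))
    return valence, arousal
```

Even this caricature makes the subjectivity problem visible: two listeners can disagree with either rule for a given piece, which is why perceived and induced emotion must be modeled per listener.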

Connecting listener emotions and music emotion tags: Current emotion-based MRS typically consider emotional scores as contextual factors that characterize the situation the user is experiencing. Hence, these recommender systems exploit emotions in order to pre-filter the preferences of users or post-filter the generated recommendations. Unfortunately, this neglects the psychological background, in particular the subjective and complex interrelationships between expressed, perceived, and induced emotions [159], which is of special importance in the music domain, as music is known to evoke stronger emotions than, for instance, products [161]. It has also been shown that personality influences which kind of emotionally laden music is preferred by listeners in which emotional state [72]. Therefore, even if automated MER approaches were able to accurately predict the perceived or induced emotion of a given music piece, in the absence of deep psychological listener profiles, matching emotion annotations of items and listeners may not yield satisfying recommendations. This is because how people judge music and which kind of music they prefer depends to a

large extent on their current psychological and cognitive states. We hence believe that the field of MRS should embrace psychological theories, elicit the respective user-specific traits, and integrate them into recommender systems in order to build decent emotion-aware MRS.
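The post-filtering strategy mentioned above can be sketched as follows, assuming a hypothetical MER step has already tagged each candidate item with an emotion word:

```python
def postfilter_by_emotion(candidates, item_emotions, target_emotion, boost=0.3):
    """Re-rank a base recommender's output toward the listener's
    current emotional state.

    candidates:     list of (item_id, base_score) pairs
    item_emotions:  dict mapping item_id -> emotion tag (from MER)
    target_emotion: the listener's current (elicited) emotion
    boost:          assumed additive bonus for matching items
    """
    rescored = [
        (item, score + boost if item_emotions.get(item) == target_emotion else score)
        for item, score in candidates
    ]
    return sorted(rescored, key=lambda x: x[1], reverse=True)
```

As argued above, such a simple match of item and listener emotion tags ignores the expressed/perceived/induced distinction; a decent emotion-aware MRS would condition the boost on the individual listener's psychological profile instead of using one global constant.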

3.2 Situation-aware music recommendation

Most existing music recommender systems make recommendations solely based on a set of user-specific and item-specific signals. However, in real-world scenarios, many other signals are available, which can be used to further improve recommendation performance. A large subset of these additional signals are situational signals. In more detail, the music preference of a user depends on the situation at the moment of recommendation.18 Location is an example of a situational signal; for instance, the music preference of a user differs between libraries and gyms [36]. Therefore, considering location as a situation-specific signal could lead to substantial improvements in recommendation performance. Time of day is another situational signal that could be used for recommendation; for instance, the music a user would like to listen to in the morning differs from that at night [42]. One situational signal of particular importance in the music domain is social context, since music tastes and consumption behaviors are deeply rooted in users' social identities and mutually affect each other [46, 134]. For instance, it is very likely that a user prefers different music when being alone than when meeting friends. Such social factors should therefore be considered when building situation-aware MRS. Other situational signals that are sometimes exploited include the user's current activity [184], the weather [140], the user's mood [129], and the day of the week [84]. Regarding time, there is another factor to consider: most music that was considered trendy years ago is now considered old. This implies that ratings for the same song or artist might differ strongly, not only between users, but in general as a function of time. To incorporate such aspects in MRS, it is crucial to record a timestamp for all ratings.
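The timestamping point can be made concrete with a simple decay scheme: a recommender could weight each rating by a factor that fades with the rating's age, so stale feedback contributes less to the model (the half-life below is an arbitrary assumed parameter).

```python
def decayed_weight(rating_ts, now_ts, half_life_days=365.0):
    """Weight in (0, 1] for a rating given its Unix timestamp; the
    weight halves every half_life_days (exponential time decay)."""
    age_days = (now_ts - rating_ts) / 86400.0  # seconds -> days
    return 0.5 ** (age_days / half_life_days)
```

With such weights, a rating given a year ago counts half as much as one given today, which softens exactly the trend drift described above.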

It is worth noting that situational features have proven to be strong signals for improving retrieval performance in search engines [17, 190]. Therefore, we believe that researching and building situation-aware music recommender systems should be one central topic in MRS research.

While several situation-aware MRS already exist, e.g., [13, 36, 91, 100, 157, 184], they commonly exploit only one or very few situational signals, or are restricted to a certain usage context, e.g., music consumption in a car or in a tourist scenario. Those systems that take a more comprehensive view and consider a variety of different signals, on the other hand, suffer from a low number of data instances or users, rendering it very hard to build accurate context models [76]. What is still missing, in our opinion, are (commercial) systems that integrate a variety of situational signals at very large scale in order to truly understand listeners' needs and intents in any given situation and recommend music accordingly. While we are aware that data availability and privacy

18 Please note that music taste is a relatively stable characteristic, while music preferences vary depending on the context and listening intent.


concerns counteract the realization of such systems at a large commercial scale, we believe that MRS will eventually integrate decent multifaceted user models inferred from contextual and situational factors.

3.3 Culture-aware music recommendation

While most humans share an inclination to listen to music, independent of their location or cultural background, the way music is performed, perceived, and interpreted evolves in a culture-specific manner. However, research in MRS seems to be agnostic of this fact. In music information retrieval (MIR) research, on the other hand, cultural aspects have been studied to some extent in recent years, after preceding (and still ongoing) criticism of the predominance of Western music in this community. Arguably the most comprehensive culture-specific research in this domain has been conducted as part of the CompMusic project,19 in which five non-Western music traditions have been analyzed in detail in order to advance the automatic description of music by emphasizing cultural specificity. The analyzed music traditions include Indian Hindustani and Carnatic [54], Turkish Makam [55], Arab-Andalusian [174], and Beijing Opera [148]. However, the project's focus was on music creation, content analysis, and ethnomusicological aspects rather than on the music consumption side [38, 165, 166]. Recently, analyzing content-based audio features describing rhythm, timbre, harmony, and melody for a corpus of a large variety of world and folk music with given country information, Panteli et al. found distinct acoustic patterns in the music created in individual countries [138]. They also identified geographical and cultural proximities reflected in music features, by looking at outliers and misclassifications in a classification experiment using country as the target class. For instance, Vietnamese music was often confused with Chinese and Japanese music, and South African with Botswanan.

In contrast to this meanwhile quite extensive body of work on culture-specific analysis of music traditions, little effort has been made to analyze cultural differences and patterns in music consumption behavior, which is, as we believe, a crucial step toward building culture-aware MRS. The few studies investigating such cultural differences include [89], in which Hu and Lee found differences in the perception of moods between American and Chinese listeners. By analyzing the music listening behavior of users from 49 countries, Ferwerda et al. found relationships between music listening diversity and Hofstede's cultural dimensions [71, 73]. Skowron et al. used the same dimensions to predict genre preferences of listeners with different cultural backgrounds [171]. Schedl analyzed a large corpus of listening histories created by Last.fm users in 47 countries and identified distinct preference patterns [156]. Further analyses revealed the countries closest to what can be considered the global mainstream (e.g., the Netherlands, UK, and Belgium) and the countries farthest from it (e.g., China, Iran, and Slovakia). However, all of these works define culture in terms of country borders, which often makes sense, but is sometimes also problematic, for instance in countries with large minorities of inhabitants with a different cultural background.

In our opinion, when building MRS, the analysis of cultural patterns of music consumption behavior, the subsequent creation of

19 http://compmusic.upf.edu

respective cultural listener models, and their integration into recommender systems are vital steps to improve the personalization and serendipity of recommendations. Culture should be defined on various levels though, not only by country borders. Other examples include a joint historical background, speaking the same language, sharing the same beliefs or religion, and differences between urban and rural cultures. Another aspect that relates to culture is a temporal one, since certain cultural trends, e.g., what defines the "youth culture", are highly dynamic in a temporal and geographical sense. We believe that MRS which are aware of such cross-cultural differences and similarities in music perception and taste, and are able to recommend music a listener in the same or another culture may like, would substantially benefit both users and providers of MRS.
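As a sketch of what a country-level cultural listener model could build on, the snippet below measures cultural proximity as a distance over Hofstede's six dimensions, in the spirit of the studies cited above; the scores shown are approximate illustrative values, not an authoritative source.

```python
import math

# Illustrative (approximate) scores on Hofstede's six dimensions:
# power distance, individualism, masculinity, uncertainty avoidance,
# long-term orientation, indulgence.
HOFSTEDE = {
    "NL": [38, 80, 14, 53, 67, 68],
    "UK": [35, 89, 66, 35, 51, 69],
    "CN": [80, 20, 66, 30, 87, 24],
}

def cultural_distance(c1, c2):
    """Euclidean distance over the six dimensions; a culture-aware MRS
    could weight listening data from culturally closer countries more
    heavily when recommending to users in a given country."""
    return math.dist(HOFSTEDE[c1], HOFSTEDE[c2])
```

Consistent with the caveat above, country is only a proxy: finer-grained cultural models (language, religion, urban vs. rural) would require replacing the country keys with the corresponding cultural groups.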

4 CONCLUSIONS

In this trends and survey paper, we identified several grand challenges the research field of music recommender systems (MRS) is facing. These are, among others, in the focus of current research in the area of MRS. We discussed (1) the cold start problem for items and users, with its particularities in the music domain, (2) the challenge of automatic playlist continuation, which is gaining importance due to the recently emerged user demand for being recommended musical experiences rather than single tracks [161], and (3) the challenge of holistically evaluating music recommender systems, in particular, capturing aspects beyond accuracy.

In addition to the grand challenges, which are currently highly researched, we also presented a visionary outlook on what we believe to be the most interesting future research directions in MRS. In particular, we discussed (1) psychologically-inspired MRS, which consider factors such as listeners' emotion and personality in the recommendation process, (2) situation-aware MRS, which holistically model contextual and environmental aspects of the music consumption process, infer listener needs and intents, and eventually integrate these models at large scale into the recommendation process, and (3) culture-aware MRS, which exploit the fact that music taste highly depends on the cultural background of the listener, where culture can be defined in manifold ways, including historical, political, linguistic, or religious similarities.

We hope that this article has helped to pinpoint major challenges, highlight recent trends, and identify interesting research questions in the area of music recommender systems. Believing that research addressing the discussed challenges and trends will pave the way for the next generation of music recommender systems, we look forward to exciting, innovative approaches and systems that improve user satisfaction and experience, rather than just accuracy measures.

5 ACKNOWLEDGEMENTS

We would like to thank all researchers in the fields of recommender systems, information retrieval, music research, and multimedia, with whom we have had the pleasure to discuss and collaborate in recent years, and who in turn influenced and helped shape this article. Special thanks go to Peter Knees and Fabien Gouyon for the fruitful discussions while preparing the ACM Recommender Systems 2017 tutorial on music recommender systems. In addition, we would like


to thank the reviewers of our manuscript, who provided useful and constructive comments that improved the original draft and turned it into what it is now. We would also like to thank Eelco Wiechert for providing additional pointers to relevant literature. Furthermore, the many personal discussions with actual users of MRS unveiled important shortcomings of current approaches, which in turn were considered in this article.

REFERENCES[1] Kaggle O�cial Homepage. h�ps://www.kaggle.com [last accessed March 11,

2018].[2] Panagiotis Adamopoulos and Alexander Tuzhilin. 2015. On unexpectedness in

recommender systems: Or how to be�er expect the unexpected. ACM Transac-tions on Intelligent Systems and Technology (TIST) 5, 4 (2015), 54.

[3] Gediminas Adomavicius, Bamshad Mobasher, Francesco Ricci, and AlexanderTuzhilin. 2011. Context-aware recommender systems. AI Magazine 32 (2011),67–80. Issue 3.

[4] Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the Next Gener-ation of Recommender Systems: A Survey of the State-of-the-Art and PossibleExtensions. IEEE Trans. on Knowl. and Data Eng. 17, 6 (June 2005), 734–749. DOI:h�p://dx.doi.org/10.1109/TKDE.2005.99

[5] Deepak Agarwal and Bee-Chung Chen. 2009. Regression-based latent factormodels. In Proceedings of the 15th ACM SIGKDD international conference onKnowledge discovery and data mining. ACM, 19–28.

[6] Charu C Aggarwal. 2016. Content-based recommender systems. In RecommenderSystems. Springer, 139–166.

[7] Charu C Aggarwal. 2016. Ensemble-based and hybrid recommender systems. InRecommender Systems. Springer, 199–224.

[8] Charu C Aggarwal. 2016. Evaluating Recommender Systems. In RecommenderSystems. Springer, 225–254.

[9] Fabio Aiolli. 2013. E�cient top-n recommendation for very large scale binaryrated datasets. In Proceedings of the 7th ACM conference on Recommender systems.ACM, 273–280.

[10] Masoud Alghoniemy and Ahmed Tew�k. 2001. A Network Flow Model forPlaylist Generation. In Proceedings of the IEEE International Conference on Multi-media and Expo (ICME). Tokyo, Japan.

[11] Masoud Alghoniemy and Ahmed H Tew�k. 2000. User-de�ned music sequenceretrieval. In Proceedings of the eighth ACM international conference on Multimedia.ACM, 356–358.

[12] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. 2011. Modern Information Re-trieval – �e Concepts and Technology Behind Search (2nd ed.). Addison-Wesley,Pearson, Harlow, England.

[13] Linas Baltrunas, Marius Kaminskas, Bernd Ludwig, Omar Moling, FrancescoRicci, Karl-Heinz Luke, and Roland Schwaiger. 2011. InCarMusic: Context-Aware Music Recommendations in a Car. In International Conference on ElectronicCommerce and Web Technologies (EC-Web). Toulouse, France.

[14] Luke Barrington, Reid Oda, and Gert RG Lanckriet. Smarter than Genius? HumanEvaluation of Music Recommender Systems. Citeseer.

[15] Mathieu Barthet, Gyorgy Fazekas, and Mark Sandler. 2012. Multidisciplinary Per-spectives on Music Emotion Recognition: Implications for Content and Context-Based Models. In Proceedings of International Symposium on Computer MusicModelling and Retrieval. 492–507.

[16] Christine Bauer and Alexander Novotny. 2017. A consolidated view of contextfor intelligent systems. Journal of Ambient Intelligence and Smart Environments9, 4 (2017), 377–393. DOI:h�p://dx.doi.org/10.3233/ais-170445

[17] Paul N. Benne�, Filip Radlinski, Ryen W. White, and Emine Yilmaz. 2011. In-ferring and Using Location Metadata to Personalize Web Search. In Proceedingsof the 34th International ACM SIGIR Conference on Research and Developmentin Information Retrieval (SIGIR ’11). ACM, New York, NY, USA, 135–144. DOI:h�p://dx.doi.org/10.1145/2009916.2009938

[18] Ehud Bodner, Iulian Iancu, Avi Gilboa, Amiram Sarel, Avi Mazor, and DoritAmir. 2007. Finding words for emotions: �e reactions of patients with majordepressive disorder towards various musical excerpts. �e Arts in Psychotherapy34, 2 (2007), 142–150.

[19] Diana Boer and Ronald Fischer. 2010. Towards a holistic model of functionsof music listening across cultures: A culturally decentred qualitative approach.Psychology of Music 40, 2 (2010), 179–200.

[20] Dmitry Bogdanov, Martın Haro, Ferdinand Fuhrmann, Anna Xambo, EmiliaGomez, and Perfecto Herrera. 2013. Semantic Audio Content-based Music Rec-ommendation and Visualization Based on User Preference Examples. InformationProcessing & Management 49, 1 (2013), 13–33.

[21] Dirk Bollen, Bart P. Knijnenburg, Martijn C. Willemsen, and Mark Graus. 2010.Understanding Choice Overload in Recommender Systems. In Proceedings of the4th ACM Conference on Recommender Systems. Barcelona, Spain.

[22] Geo�ray Bonnin and Dietmar Jannach. 2015. Automated generation of musicplaylists: Survey and experiments. ACM Computing Surveys (CSUR) 47, 2 (2015),26.

[23] Ma�hias Braunhofer, Mehdi Elahi, and Francesco Ricci. 2014. Techniques for cold-starting context-aware mobile recommender systems for tourism. IntelligenzaArti�ciale 8, 2 (2014), 129–143. DOI:h�p://dx.doi.org/10.3233/IA-140069

[24] Ma�hias Braunhofer, Mehdi Elahi, and Francesco Ricci. 2015. User Personalityand the New User Problem in a Context-Aware Point of Interest RecommenderSystem. In Information and Communication Technologies in Tourism 2015, IisTussyadiah and Alessandro Inversini (Eds.). Springer International Publishing,Cham, 537–549.

[25] John S. Breese, David Heckerman, and Carl Kadie. 1998. Empirical analysisof predictive algorithms for collaborative �ltering. In Proceedings of the 14thConference on Uncertainty in Arti�cial Intelligence. Morgan Kaufmann PublishersInc, 43–52.

[26] Jerry M. Burger. 2010. Personality. Wadsworth Publishing, Belmont, CA., USA.[27] Robin Burke. 2002. Hybrid recommender systems: Survey and experiments. User

modeling and user-adapted interaction 12, 4 (2002), 331–370.[28] Robin Burke. 2007. HybridWeb Recommender Systems. Springer Berlin Heidelberg,

Berlin, Heidelberg, 377–408. DOI:h�p://dx.doi.org/10.1007/978-3-540-72079-912

[29] Ivan Cantador and Paolo Cremonesi. 2014. Tutorial on Cross-domain Recom-mender Systems. In Proceedings of the 8th ACM Conference on RecommenderSystems (RecSys’14). ACM, New York, NY, USA, 401–402. DOI:h�p://dx.doi.org/10.1145/2645710.2645777

[30] Ivan Cantador, Ignacio Fernandez-Tobıas, Shlomo Berkovsky, and Paolo Cre-monesi. 2015. Cross-Domain Recommender Systems. Springer US, Boston, MA,919–959. DOI:h�p://dx.doi.org/10.1007/978-1-4899-7637-6 27

[31] Giuseppe Carenini, Jocelyin Smith, and David Poole. 2003. Towards more con-versational and collaborative recommender systems. In Proceedings of the 8thinternational conference on Intelligent user interfaces (IUI ’03). ACM, New York,NY, USA, 12–18. DOI:h�p://dx.doi.org/10.1145/604045.604052

[32] Toni Cebrian, Marc Planaguma, Paulo Villegas, and Xavier Amatriain. 2010.Music Recommendations with Temporal Context Awareness. In Proceedings ofthe 4th ACM Conference on Recommender Systems (RecSys). Barcelona, Spain.

[33] Shuo Chen, Josh L. Moore, Douglas Turnbull, and Thorsten Joachims. 2012. Playlist Prediction via Metric Embedding. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12). ACM, New York, NY, USA, 714–722. DOI:http://dx.doi.org/10.1145/2339530.2339643

[34] Shuo Chen, Josh L. Moore, Douglas Turnbull, and Thorsten Joachims. 2012. Playlist prediction via metric embedding. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 714–722.

[35] Shuo Chen, Jiexun Xu, and Thorsten Joachims. 2013. Multi-space probabilistic sequence modeling. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 865–873.

[36] Zhiyong Cheng and Jialie Shen. 2014. Just-for-Me: An Adaptive Personalization System for Location-Aware Social Music Recommendation. In Proceedings of the 4th ACM International Conference on Multimedia Retrieval (ICMR). Glasgow, UK.

[37] Zhiyong Cheng and Jialie Shen. 2016. On effective location-aware music recommendation. ACM Transactions on Information Systems (TOIS) 34, 2 (2016), 13.

[38] Olmo Cornelis, Joren Six, Andre Holzapfel, and Marc Leman. 2013. Evaluation and Recommendation of Pulse and Tempo Annotation in Ethnic Music. Journal of New Music Research 42, 2 (2013), 131–149. DOI:http://dx.doi.org/10.1080/09298215.2013.812123

[39] Paolo Cremonesi, Mehdi Elahi, and Franca Garzotto. 2017. User interface patterns in recommendation-empowered content intensive multimedia applications. Multimedia Tools and Applications 76, 4 (01 Feb 2017), 5275–5309. DOI:http://dx.doi.org/10.1007/s11042-016-3946-5

[40] Paolo Cremonesi and Massimo Quadrana. 2014. Cross-domain Recommendations Without Overlapping Data: Myth or Reality?. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys '14). ACM, New York, NY, USA, 297–300. DOI:http://dx.doi.org/10.1145/2645710.2645769

[41] P. Cremonesi, A. Tripodi, and R. Turrin. 2011. Cross-Domain Recommender Systems. In IEEE 11th International Conference on Data Mining Workshops. 496–503. DOI:http://dx.doi.org/10.1109/ICDMW.2011.57

[42] Stuart Cunningham, Stephen Caulder, and Vic Grout. 2008. Saturday Night or Fever? Context-Aware Music Playlists. In Proceedings of the 3rd International Audio Mostly Conference: Sound in Motion. Piteå, Sweden.

[43] Sally Jo Cunningham, David Bainbridge, and Annette Falconer. 2006. 'More of an Art than a Science': Supporting the Creation of Playlists and Mixes. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR). Victoria, BC, Canada.

[44] Sally Jo Cunningham, David Bainbridge, and Dana McKay. 2007. Finding New Music: A Diary Study of Everyday Encounters with Novel Songs. In Proceedings of the 8th International Conference on Music Information Retrieval. Vienna, Austria, 83–88.

[45] Sally Jo Cunningham, J. Stephen Downie, and David Bainbridge. 2005. "The Pain, The Pain": Modelling Music Information Behavior And The Songs We Hate. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005). London, UK, 474–477.

[46] S. J. Cunningham and D. M. Nichols. 2009. Exploring Social Music Behaviour: An Investigation of Music Selection at Parties. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009). Kobe, Japan.

[47] Yashar Deldjoo, Paolo Cremonesi, Markus Schedl, and Massimo Quadrana. 2017. The effect of different video summarization models on the quality of video recommendation based on low-level visual features. In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing. ACM, 20.

[48] Yashar Deldjoo, Mehdi Elahi, Paolo Cremonesi, Franca Garzotto, Pietro Piazzolla, and Massimo Quadrana. 2016. Content-Based Video Recommendation System Based on Stylistic Visual Features. Journal on Data Semantics (2016), 1–15. DOI:http://dx.doi.org/10.1007/s13740-016-0060-9

[49] Anind K. Dey. 2001. Understanding and using context. Personal and Ubiquitous Computing 5, 1 (2001), 4–7. DOI:http://dx.doi.org/10.1007/s007790170019

[50] L. Dey, M. U. Asad, N. Afroz, and R. P. D. Nath. 2014. Emotion extraction from real time chat messenger. In 2014 International Conference on Informatics, Electronics & Vision (ICIEV). 1–5. DOI:http://dx.doi.org/10.1109/ICIEV.2014.6850785

[51] Justin Donaldson. 2007. A Hybrid Social-acoustic Recommendation System for Popular Music. In Proceedings of the ACM Conference on Recommender Systems (RecSys). Minneapolis, MN, USA, 4.

[52] Gideon Dror, Noam Koenigstein, Yehuda Koren, and Markus Weimer. 2011. The Yahoo! music dataset and KDD-Cup'11. In Proceedings of the 2011 International Conference on KDD Cup 2011 - Volume 18. JMLR.org, 3–18.

[53] Greg Dunn, Jurgen Wiersema, Jaap Ham, and Lora Aroyo. 2009. Evaluating Interface Variants on Personality Acquisition for Recommender Systems. In Proceedings of the 17th International Conference on User Modeling, Adaptation, and Personalization: formerly UM and AH (UMAP '09). Springer-Verlag, Berlin, Heidelberg, 259–270.

[54] Shrey Dutta and Hema A. Murthy. 2014. Discovering Typical Motifs of a Raga from One-Liners of Songs in Carnatic Music. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 397–402.

[55] Georgi Dzhambazov, Ajay Srinivasamurthy, Sertan Senturk, and Xavier Serra. 2016. On the Use of Note Onsets for Improved Lyrics-to-audio Alignment in Turkish Makam Music. In 17th International Society for Music Information Retrieval Conference (ISMIR 2016). New York, USA.

[56] Samira Ebrahimi Kahou, Vincent Michalski, Kishore Konda, Roland Memisevic, and Christopher Pal. 2015. Recurrent Neural Networks for Emotion Recognition in Video. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (ICMI '15). ACM, New York, NY, USA, 467–474. DOI:http://dx.doi.org/10.1145/2818346.2830596

[57] Hamid Eghbal-zadeh, Bernhard Lehner, Markus Schedl, and Gerhard Widmer. 2015. I-Vectors for Timbre-Based Music Similarity and Music Artist Classification. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR). Malaga, Spain.

[58] Mehdi Elahi. 2011. Adaptive active learning in recommender systems. User Modeling, Adaption and Personalization (2011), 414–417.

[59] Mehdi Elahi, Matthias Braunhofer, Francesco Ricci, and Marko Tkalcic. 2013. Personality-based active learning for collaborative filtering recommender systems. In AI*IA 2013: Advances in Artificial Intelligence. Springer International Publishing, 360–371. DOI:http://dx.doi.org/10.1007/978-3-319-03524-6_31

[60] Mehdi Elahi, Yashar Deldjoo, Farshad Bakhshandegan Moghaddam, Leonardo Cella, Stefano Cereda, and Paolo Cremonesi. 2017. Exploring the Semantic Gap for Movie Recommendations. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 326–330.

[61] Mehdi Elahi, Valdemaras Repsys, and Francesco Ricci. 2011. Rating Elicitation Strategies for Collaborative Filtering. In EC-Web (Lecture Notes in Business Information Processing), Christian Huemer and Thomas Setzer (Eds.), Vol. 85. Springer, 160–171. DOI:http://dx.doi.org/10.1007/978-3-642-23014-1_14

[62] Mehdi Elahi, Francesco Ricci, and Neil Rubens. 2012. Adapting to natural rating acquisition with combined active learning strategies. In ISMIS'12: Proceedings of the 20th International Conference on Foundations of Intelligent Systems. Springer-Verlag, Berlin, Heidelberg, 254–263.

[63] Mehdi Elahi, Francesco Ricci, and Neil Rubens. 2014. Active Learning in Collaborative Filtering Recommender Systems. In E-Commerce and Web Technologies, Martin Hepp and Yigal Hoffner (Eds.). Lecture Notes in Business Information Processing, Vol. 188. Springer International Publishing, 113–124. DOI:http://dx.doi.org/10.1007/978-3-319-10491-1_12

[64] Mehdi Elahi, Francesco Ricci, and Neil Rubens. 2014. Active Learning Strategies for Rating Elicitation in Collaborative Filtering: A System-wide Perspective. ACM Transactions on Intelligent Systems and Technology 5, 1, Article 13 (Jan. 2014), 33 pages. DOI:http://dx.doi.org/10.1145/2542182.2542195

[65] Mehdi Elahi, Francesco Ricci, and Neil Rubens. 2016. A survey of active learning in collaborative filtering recommender systems. Computer Science Review 20 (2016), 29–50.

[66] Asmaa Elbadrawy and George Karypis. 2015. User-specific feature-based similarity models for top-n recommendation of new items. ACM Transactions on Intelligent Systems and Technology (TIST) 6, 3 (2015), 33.

[67] Mehmet Erdal, Markus Kachele, and Friedhelm Schwenker. 2016. Emotion Recognition in Speech with Deep Learning Architectures. Springer International Publishing, Cham, 298–311. DOI:http://dx.doi.org/10.1007/978-3-319-46182-3_25

[68] Ignacio Fernández-Tobías, Matthias Braunhofer, Mehdi Elahi, Francesco Ricci, and Ivan Cantador. 2016. Alleviating the New User Problem in Collaborative Filtering by Exploiting Personality Information. User Modeling and User-Adapted Interaction (UMUAI) 26, Personality in Personalized Systems (2016). DOI:http://dx.doi.org/10.1007/s11257-016-9172-z

[69] Ignacio Fernández-Tobías, Ivan Cantador, Marius Kaminskas, and Francesco Ricci. 2012. Cross-domain recommender systems: A survey of the state of the art. In Spanish Conference on Information Retrieval. 24.

[70] Bruce Ferwerda, Mark Graus, Andreu Vall, Marko Tkalcic, and Markus Schedl. 2016. The Influence of Users' Personality Traits on Satisfaction and Attractiveness of Diversified Recommendation Lists. In Proceedings of the 4th Workshop on Emotions and Personality in Personalized Services (EMPIRE 2016). Boston, USA.

[71] Bruce Ferwerda and Markus Schedl. 2016. Investigating the Relationship Between Diversity in Music Consumption Behavior and Cultural Dimensions: A Cross-country Analysis. In Workshop on Surprise, Opposition, and Obstruction in Adaptive and Personalized Systems.

[72] Bruce Ferwerda, Markus Schedl, and Marko Tkalcic. 2015. Personality & Emotional States: Understanding Users' Music Listening Needs. In Extended Proceedings of the 23rd International Conference on User Modeling, Adaptation and Personalization (UMAP). Dublin, Ireland.

[73] Bruce Ferwerda, Andreu Vall, Marko Tkalcic, and Markus Schedl. 2016. Exploring Music Diversity Needs Across Countries. In Proc. UMAP.

[74] Bruce Ferwerda, Emily Yang, Markus Schedl, and Marko Tkalcic. 2015. Personality Traits Predict Music Taxonomy Preferences. In ACM CHI '15 Extended Abstracts on Human Factors in Computing Systems. Seoul, Republic of Korea.

[75] Arthur Flexer, Dominik Schnitzer, Martin Gasser, and Gerhard Widmer. 2008. Playlist Generation Using Start and End Songs. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR). Philadelphia, PA, USA.

[76] Michael Gillhofer and Markus Schedl. 2015. Iron Maiden While Jogging, Debussy for Dinner? - An Analysis of Music Listening Behavior in Context. In Proceedings of the 21st International Conference on MultiMedia Modeling (MMM). Sydney, Australia.

[77] Samuel D. Gosling, Peter J. Rentfrow, and William B. Swann Jr. 2003. A very brief measure of the Big-Five personality domains. Journal of Research in Personality 37, 6 (2003), 504–528.

[78] J. Gross. 2007. Emotion Regulation: Conceptual and Empirical Foundations. In Handbook of Emotion Regulation (2nd ed.), J. Gross (Ed.). The Guilford Press, New York, 1–19.

[79] Asela Gunawardana and Guy Shani. 2015. Evaluating Recommender Systems. In Recommender Systems Handbook (2nd ed.), Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor (Eds.). Springer, Chapter 8, 256–308.

[80] Jennefer Hart, Alistair G. Sutcliffe, and Antonella di Angeli. 2012. Evaluating User Engagement Theory. In CHI Conference on Human Factors in Computing Systems. Paper presented in Workshop 'Theories behind UX Research and How They Are Used in Practice', 6 May 2012.

[81] Marc Hassenzahl. 2005. The Thing and I: Understanding the Relationship Between User and Product. Springer Netherlands, Dordrecht, 31–42. DOI:http://dx.doi.org/10.1007/1-4020-2967-5_4

[82] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. 2004. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 22, 1 (2004), 5–53. DOI:http://dx.doi.org/10.1145/963770.963772

[83] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. 2004. Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information Systems 22, 1 (January 2004), 5–53.

[84] P. Herrera, Z. Resa, and M. Sordo. 2010. Rocking around the clock eight days a week: an exploration of temporal patterns of music listening. In Proceedings of the ACM Conference on Recommender Systems: Workshop on Music Recommendation and Discovery (WOMRAD 2010). 7–10.

[85] K. Hevner. 1935. Expression in Music: A Discussion of Experimental Studies and Theories. Psychological Review 42 (March 1935). Issue 2.

[86] Rong Hu and Pearl Pu. 2009. A comparative user study on rating vs. personality quiz based preference elicitation methods. In Proceedings of the 14th International Conference on Intelligent User Interfaces (IUI '09). ACM, New York, NY, USA, 367–372. DOI:http://dx.doi.org/10.1145/1502650.1502702

[87] Rong Hu and Pearl Pu. 2010. A Study on User Perception of Personality-Based Recommender Systems. In UMAP (2010-06-29) (Lecture Notes in Computer Science), Paul De Bra, Alfred Kobsa, and David N. Chin (Eds.), Vol. 6075. Springer, 291–302.

[88] Rong Hu and Pearl Pu. 2011. Enhancing collaborative filtering systems with personality information. In Proceedings of the Fifth ACM Conference on Recommender Systems (RecSys '11). ACM, New York, NY, USA, 197–204. DOI:http://dx.doi.org/10.1145/2043932.2043969

[89] Xiao Hu and Jin Ha Lee. 2012. A Cross-cultural Study of Music Mood Perception Between American and Chinese Listeners. In Proc. ISMIR.

[90] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In Proceedings of the 8th IEEE International Conference on Data Mining. IEEE, 263–272.

[91] Yajie Hu and Mitsunori Ogihara. 2011. NextOne Player: A Music Recommendation System Based on User Behavior. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011). Miami, FL, USA.

[92] A. Huq, J.P. Bello, and R. Rowe. 2010. Automated Music Emotion Recognition: A Systematic Evaluation. Journal of New Music Research 39, 3 (November 2010), 227–244.

[93] Iman Kamehkhosh, Dietmar Jannach, and Geoffray Bonnin. 2018. How Automated Recommendations Affect the Playlist Creation Behavior of Users. In Joint Proceedings of the 23rd ACM Conference on Intelligent User Interfaces (ACM IUI 2018) Workshops: Intelligent Music Interfaces for Listening and Creation (MILC). Tokyo, Japan.

[94] Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20, 4 (2002), 422–446. DOI:http://dx.doi.org/10.1145/582415.582418

[95] Oliver John and Sanjay Srivastava. 1999. The Big Five Trait Taxonomy: History, Measurement, and Theoretical Perspectives. In Handbook of Personality: Theory and Research (second ed.), Lawrence A. Pervin and Oliver P. John (Eds.). Guilford Press, New York, 102–138.

[96] Oliver P. John and Sanjay Srivastava. 1999. The Big Five trait taxonomy: History, measurement, and theoretical perspectives. In Handbook of Personality: Theory and Research. Vol. 2, 102–138.

[97] Patrik N. Juslin and John Sloboda. 2011. Handbook of Music and Emotion: Theory, Research, Applications. OUP Oxford.

[98] Marius Kaminskas and Derek Bridge. 2016. Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems. ACM Trans. Interact. Intell. Syst. 7, 1, Article 2 (Dec. 2016), 42 pages. DOI:http://dx.doi.org/10.1145/2926720

[99] Marius Kaminskas and Francesco Ricci. 2012. Contextual music information retrieval and recommendation: State of the art and challenges. Computer Science Review 6, 2 (2012), 89–119.

[100] Marius Kaminskas, Francesco Ricci, and Markus Schedl. 2013. Location-aware Music Recommendation Using Auto-Tagging and Hybrid Matching. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys). Hong Kong, China.

[101] John Paul Kelly and Derek Bridge. 2006. Enhancing the diversity of conversational collaborative recommendations: a comparison. Artificial Intelligence Review 25, 1 (01 Apr 2006), 79–95. DOI:http://dx.doi.org/10.1007/s10462-007-9023-8

[102] Muhammad Murad Khan, Roliana Ibrahim, and Imran Ghani. 2017. Cross Domain Recommender Systems: A Systematic Literature Review. ACM Computing Surveys (CSUR) 50, 3 (2017), 36.

[103] Youngmoo E. Kim, Erik M. Schmidt, Raymond Migneco, Brandon G. Morton, Patrick Richardson, Jeffrey Scott, Jacquelin Speck, and Douglas Turnbull. 2010. Music Emotion Recognition: A State of the Art Review. In Proceedings of the International Society for Music Information Retrieval Conference.

[104] Daniel Kluver and Joseph A. Konstan. 2014. Evaluating recommender behavior for new users. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 121–128. DOI:http://dx.doi.org/10.1145/2645710.2645742

[105] Peter Knees, Tim Pohle, Markus Schedl, and Gerhard Widmer. 2006. Combining Audio-based Similarity with Web-based Data to Accelerate Automatic Music Playlist Generation. In Proceedings of the 8th ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR). Santa Barbara, CA, USA.

[106] P. Knees and M. Schedl. 2016. Music Similarity and Retrieval: An Introduction to Audio- and Web-based Strategies. Springer Berlin Heidelberg. https://books.google.it/books?id=MdRhjwEACAAJ

[107] Bart P. Knijnenburg and Martijn C. Willemsen. 2015. Evaluating recommender systems with user experiments. In Recommender Systems Handbook. Springer, 309–352.

[108] Bart P. Knijnenburg, Martijn C. Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (2012), 441–504.

[109] Vladimir J. Konecni. 1982. Social interaction and musical preference. The Psychology of Music (1982), 497–516.

[110] S. L. Koole. 2009. The psychology of emotion regulation: An integrative review. Cognition and Emotion 23 (2009), 4–41.

[111] Michal Kosinski, David Stillwell, and Thore Graepel. 2013. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences 110, 15 (April 2013), 5802–5805.

[112] Fang-Fei Kuo, Meng-Fen Chiang, Man-Kwan Shan, and Suh-Yin Lee. 2005. Emotion-based music recommendation by association discovery from film music. In Proceedings of the 13th Annual ACM International Conference on Multimedia. ACM, 507–510.

[113] Audrey Laplante. 2014. Improving Music Recommender Systems: What We Can Learn From Research on Music Tastes?. In 15th International Society for Music Information Retrieval Conference. Taipei, Taiwan.

[114] Audrey Laplante and J. Stephen Downie. 2006. Everyday Life Music Information-Seeking Behaviour of Young Adults. In Proceedings of the 7th International Conference on Music Information Retrieval. Victoria (BC), Canada.

[115] Jin Ha Lee. 2011. How Similar is Too Similar? Exploring Users' Perceptions of Similarity in Playlist Evaluation. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011). Miami, FL, USA.

[116] Jin Ha Lee, Hyerim Cho, and Yea-Seul Kim. 2016. Users' music information needs and behaviors: Design implications for music information retrieval systems. Journal of the Association for Information Science and Technology 67, 6 (2016), 1301–1330.

[117] Jin Ha Lee, Rachel Wishkoski, Lara Aase, Perry Meas, and Chris Hubbles. 2017. Understanding users of cloud music services: Selection factors, management and access behavior, and perceptions. Journal of the Association for Information Science and Technology 68, 5 (2017), 1186–1200.

[118] Janette Lehmann, Mounia Lalmas, Elad Yom-Tov, and Georges Dupret. 2012. Models of User Engagement. In Proceedings of the 20th International Conference on User Modeling, Adaptation, and Personalization (UMAP '12). Springer-Verlag, Berlin, Heidelberg, 164–175. DOI:http://dx.doi.org/10.1007/978-3-642-31454-4_14

[119] Qing Li, Sung Hyon Myaeng, Dong Hai Guan, and Byeong Man Kim. 2005. A probabilistic model for music recommendation considering audio features. In Asia Information Retrieval Symposium. Springer, 72–83.

[120] Nathan N. Liu and Qiang Yang. 2008. EigenRank: a ranking-oriented approach to collaborative filtering. In SIGIR '08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, USA, 83–90. DOI:http://dx.doi.org/10.1145/1390334.1390351

[121] Beth Logan. 2002. Content-based Playlist Generation: Exploratory Experiments. In Proceedings of the 3rd International Symposium on Music Information Retrieval (ISMIR). Paris, France.

[122] Adam J. Lonsdale and Adrian C. North. 2011. Why do we listen to music? A uses and gratifications analysis. British Journal of Psychology 102, 1 (February 2011), 108–134.

[123] Francois Maillet, Douglas Eck, Guillaume Desjardins, Paul Lamere, and others. 2009. Steerable Playlist Generation by Learning Song Similarity from Radio Station Playlists. In ISMIR. 345–350.

[124] Brian McFee, Thierry Bertin-Mahieux, Daniel P.W. Ellis, and Gert R.G. Lanckriet. 2012. The million song dataset challenge. In Proceedings of the 21st International Conference on World Wide Web. ACM, 909–916.

[125] Brian McFee and Gert Lanckriet. 2011. The Natural Language of Playlists. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011). Miami, FL, USA.

[126] Brian McFee and Gert Lanckriet. 2012. Hypergraph Models of Playlist Dialects. In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR). Porto, Portugal.

[127] Sean M. McNee, Shyong K. Lam, Joseph A. Konstan, and John Riedl. 2003. Interfaces for eliciting new user preferences in recommender systems. In Proceedings of the 9th International Conference on User Modeling (UM'03). Springer-Verlag, Berlin, Heidelberg, 178–187. http://dl.acm.org/citation.cfm?id=1759957.1759988

[128] Tao Mei, Bo Yang, Xian-Sheng Hua, and Shipeng Li. 2011. Contextual video recommendation by multimodal relevance and user feedback. ACM Transactions on Information Systems (TOIS) 29, 2 (2011), 10.

[129] A.C. North and D.J. Hargreaves. 1996. Situational influences on reported musical preference. Psychomusicology: Music, Mind and Brain 15, 1-2 (1996), 30–45.

[130] Adrian North and David Hargreaves. 2008. The Social and Applied Psychology of Music. Oxford University Press.

[131] Adrian C. North and David J. Hargreaves. 1996. Situational influences on reported musical preference. Psychomusicology: A Journal of Research in Music Cognition 15, 1-2 (1996), 30.

[132] Alberto Novello, Martin F. McKinney, and Armin Kohlrausch. 2006. Perceptual Evaluation of Music Similarity. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR). Victoria, BC, Canada.

[133] Heather L. O'Brien and Elaine G. Toms. 2010. The Development and Evaluation of a Survey to Measure User Engagement. J. Am. Soc. Inf. Sci. Technol. 61, 1 (Jan. 2010), 50–69. DOI:http://dx.doi.org/10.1002/asi.v61:1

[134] Kenton O'Hara and Barry Brown (Eds.). 2006. Consuming Music Together: Social and Collaborative Aspects of Music Consumption Technologies. Computer Supported Cooperative Work, Vol. 35. Springer Netherlands.

[135] F. Pachet, P. Roy, and D. Cazaly. 1999. A combinatorial approach to content-based music selection. In Multimedia Computing and Systems, 1999. IEEE International Conference on, Vol. 1. IEEE, 457–462.

[136] Roberto Pagano, Massimo Quadrana, Mehdi Elahi, and Paolo Cremonesi. 2017. Toward Active Learning in Cross-domain Recommender Systems. CoRR abs/1701.02021 (2017). arXiv:1701.02021 http://arxiv.org/abs/1701.02021

[137] Rong Pan, Yunhong Zhou, Bin Cao, Nathan Nan Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-class collaborative filtering. In Proceedings of the 8th IEEE International Conference on Data Mining. IEEE, 502–511.

[138] Maria Panteli, Emmanouil Benetos, and Simon Dixon. 2016. Learning a Feature Space for Similarity in World Music. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016). New York, NY, USA.

[139] Seung-Taek Park and Wei Chu. 2009. Pairwise preference regression for cold-start recommendation. In Proceedings of the Third ACM Conference on Recommender Systems (RecSys '09). ACM, New York, NY, USA, 21–28. DOI:http://dx.doi.org/10.1145/1639714.1639720

[140] T.F. Pettijohn, G.M. Williams, and T.C. Carter. 2010. Music for the Seasons: Seasonal Music Preferences in College Students. Current Psychology (2010), 1–18.

[141] Martin Pichl, Eva Zangerle, and Gunther Specht. 2015. Towards a context-aware music recommendation approach: What is hidden in the playlist name?. In Data Mining Workshop (ICDMW), 2015 IEEE International Conference on. IEEE, 1360–1365.

[142] Tim Pohle, Peter Knees, Markus Schedl, Elias Pampalk, and Gerhard Widmer. 2007. "Reinventing the Wheel": A Novel Approach to Music Player Interfaces. IEEE Transactions on Multimedia 9 (2007), 567–575. Issue 3.

[143] Pearl Pu, Li Chen, and Rong Hu. 2012. Evaluating recommender systems from the user's perspective: survey of the state of the art. User Modeling and User-Adapted Interaction 22, 4-5 (2012), 317–355. DOI:http://dx.doi.org/10.1007/s11257-011-9115-7

[144] Marko Punkanen, Tuomas Eerola, and Jaakko Erkkila. 2011. Biased emotional recognition in depression: Perception of emotions in music by depressed patients. Journal of Affective Disorders 130, 1–2 (2011), 118–126.

[145] Massimo Quadrana, Paolo Cremonesi, and Dietmar Jannach. 2018. Sequence-Aware Recommender Systems. arXiv preprint arXiv:1802.08452 (2018).

[146] Al Mamunur Rashid, George Karypis, and John Riedl. 2008. Learning preferences of new users in recommender systems: an information theoretic approach. SIGKDD Explor. Newsl. 10 (December 2008), 90–100. Issue 2. DOI:http://dx.doi.org/10.1145/1540276.1540302

[147] Peter J. Rentfrow and Samuel D. Gosling. 2003. The do re mi's of everyday life: The structure and personality correlates of music preferences. Journal of Personality and Social Psychology 84, 6 (2003), 1236–1256.

[148] Rafael Caro Repetto and Xavier Serra. 2014. Creating a Corpus of Jingju (Beijing Opera) Music and Possibilities for Melodic Analysis. In 15th International Society for Music Information Retrieval Conference. Taipei, Taiwan, 313–318.

[149] Gordon Reynolds, Dan Barry, Ted Burke, and Eugene Coyle. 2007. Towards a Personal Automatic Music Playlist Generation Algorithm: The Need for Contextual Information. In Proceedings of the 2nd International Audio Mostly Conference: Interaction with Sound. Ilmenau, Germany, 84–89.

[150] Marco Tulio Ribeiro, Anisio Lacerda, Adriano Veloso, and Nivio Ziviani. 2012. Pareto-efficient Hybridization for Multi-objective Recommender Systems. In Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys '12). ACM, New York, NY, USA, 19–26. DOI:http://dx.doi.org/10.1145/2365952.2365962

[151] Neil Rubens, Mehdi Elahi, Masashi Sugiyama, and Dain Kaplan. 2015. Active Learning in Recommender Systems. In Recommender Systems Handbook, Chapter 24: Recommending Active Learning. Springer US, 809–846.

[152] James A. Russell. 1980. A Circumplex Model of Affect. Journal of Personality and Social Psychology 39, 6 (1980), 1161–1178.

[153] Thomas Schäfer, Fee Auerswald, Ina Kristin Bajorat, Nika Ergemlidze, Katharina Frille, Jonas Gehrigk, Anastasia Gusakova, Bernadette Kaiser, Rosi Anna Patzold, Ana Sanahuja, Sema Sari, Anna Schramm, Caroline Walter, and Tanja Wilker. 2016. The effect of social feedback on music preference. Musicae Scientiae 20, 2 (2016), 263–268. DOI:http://dx.doi.org/10.1177/1029864915622054

[154] Thomas Schäfer and Claudia Mehlhorn. 2017. Can Personality Traits Predict Musical Style Preferences? A Meta-Analysis. Personality and Individual Differences 116 (2017), 265–273. DOI:https://doi.org/10.1016/j.paid.2017.04.061

[155] Thomas Schäfer, Peter Sedlmeier, Christine Städtler, and David Huron. 2013. The psychological functions of music listening. Frontiers in Psychology 4, 511 (2013), 1–34.

[156] Markus Schedl. 2017. Investigating country-specific music preferences and music recommendation algorithms with the LFM-1b dataset. International Journal of Multimedia Information Retrieval 6, 1 (2017), 71–84. DOI:http://dx.doi.org/10.1007/s13735-017-0118-y

[157] Markus Schedl, Georg Breitschopf, and Bogdan Ionescu. 2014. Mobile Music Genius: Reggae at the Beach, Metal on a Friday Night?. In Proceedings of the 4th ACM International Conference on Multimedia Retrieval (ICMR). Glasgow, UK.

[158] Markus Schedl, Arthur Flexer, and Julián Urbano. 2013. The Neglected User in Music Information Retrieval Research. Journal of Intelligent Information Systems (July 2013).

[159] Markus Schedl, Emilia Gómez, Erika S. Trent, Marko Tkalcic, Hamid Eghbal-Zadeh, and Agustín Martorell. 2017. On the Interrelation between Listener Characteristics and the Perception of Emotions in Classical Orchestra Music. IEEE Transactions on Affective Computing (2017).

[160] Markus Schedl, David Hauger, and Dominik Schnitzer. 2012. A Model for Serendipitous Music Retrieval. In Proceedings of the 2nd Workshop on Context-awareness in Retrieval and Recommendation (CaRR). Lisbon, Portugal.

[161] Markus Schedl, Peter Knees, and Fabien Gouyon. 2017. New Paths in Music Recommender Systems Research. In Proceedings of the 11th ACM Conference on Recommender Systems (RecSys 2017). Como, Italy.

[162] Markus Schedl, Peter Knees, Brian McFee, Dmitry Bogdanov, and Marius Kaminskas. 2015. Music Recommender Systems. In Recommender Systems Handbook (2nd ed.), Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor (Eds.). Springer, Chapter 13, 453–492.

[163] Markus Schedl, Mark Melenhorst, Cynthia C.S. Liem, Agustín Martorell, Óscar Mayor, and Marko Tkalcic. 2016. A Personality-based Adaptive System for Visualizing Classical Music Performances. In Proceedings of the 7th ACM Multimedia Systems Conference (MMSys). Klagenfurt, Austria.

[164] Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, and David M. Pennock. 2002. Methods and metrics for cold-start recommendations. In SIGIR '02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, USA, 253–260. DOI:http://dx.doi.org/10.1145/564376.564421

[165] Xavier Serra. 2014. Computational Approaches to the Art Music Traditions of India and Turkey.

[166] Xavier Serra. 2014. Creating Research Corpora for the Computational Study of Music: the case of the CompMusic Project. In AES 53rd International Conference on Semantic Audio. AES, London, UK, 1–9.

[167] Klaus Seyerlehner, Markus Schedl, Tim Pohle, and Peter Knees. 2010. Using Block-Level Features for Genre Classification, Tag Classification and Music Similarity Estimation. In Extended Abstract to the Music Information Retrieval Evaluation eXchange (MIREX 2010) / 11th International Society for Music Information Retrieval Conference (ISMIR 2010). Utrecht, the Netherlands.

[168] Klaus Seyerlehner, Gerhard Widmer, Markus Schedl, and Peter Knees. 2010. Automatic Music Tag Classification based on Block-Level Features. In Proceedings of the 7th Sound and Music Computing Conference (SMC). Barcelona, Spain.

[169] Bo Shao, Dingding Wang, Tao Li, and Mitsunori Ogihara. 2009. Music Recommendation Based on Acoustic Features and User Access Patterns. IEEE Transactions on Audio, Speech, and Language Processing 17, 8 (2009), 1602–1611.

[170] Marcin Skowron, Bruce Ferwerda, Marko Tkalcic, and Markus Schedl. 2016. Fusing Social Media Cues: Personality Prediction from Twitter and Instagram. In Proceedings of the 25th International World Wide Web Conference (WWW). Montreal, Canada.

[171] Marcin Skowron, F. Lemmerich, Bruce Ferwerda, and Markus Schedl. 2017. Predicting Genre Preferences from Cultural and Socio-economic Factors for Music Retrieval. In Proceedings of the European Conference on Information Retrieval (ECIR).

[172] Malcolm Slaney and William White. 2006. Measuring playlist diversity for recommendation systems. In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia. ACM, 77–82.

[173] Barry Smyth and Paul McClave. 2001. Similarity vs. Diversity. In Proceedings of the 4th International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development (ICCBR ’01). Springer-Verlag, London, UK, 347–361. http://dl.acm.org/citation.cfm?id=646268.758890

[174] Mohamed Sordo, Amin Chaachoo, and Xavier Serra. 2014. Creating Corpora for Computational Research in Arab-Andalusian Music. In 1st International Workshop on Digital Libraries for Musicology. London, UK, 1–3. DOI: http://dx.doi.org/10.1145/2660168.2660182

[175] Kirsten Swearingen and Rashmi Sinha. 2001. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, Vol. 13. 1–11.

[176] M. Tamir. 2011. The maturing field of emotion regulation. Emotion Review 3 (2011), 3–7.

[177] Nava Tintarev, Christoph Lofi, and Cynthia C.S. Liem. 2017. Sequences of Diverse Song Recommendations: An Exploratory Study in a Commercial System. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization (UMAP ’17). ACM, New York, NY, USA, 391–392. DOI: http://dx.doi.org/10.1145/3079628.3079633

[178] Marko Tkalcic, Andrej Kosir, and Jurij Tasic. 2013. The LDOS-PerAff-1 corpus of facial-expression video clips with affective, personality and user-interaction metadata. Journal on Multimodal User Interfaces 7, 1-2 (2013), 143–155. DOI: http://dx.doi.org/10.1007/s12193-012-0107-7

[179] Marko Tkalcic, Daniele Quercia, and Sabine Graf. 2016. Preface to the special issue on personality in personalized systems. User Modeling and User-Adapted Interaction 26, 2 (June 2016), 103–107. DOI: http://dx.doi.org/10.1007/s11257-016-9175-9

[180] Alexandra Uitdenbogerd and R. van Schyndel. 2002. A review of factors affecting music recommender success. In ISMIR 2002, 3rd International Conference on Music Information Retrieval. IRCAM-Centre Pompidou, 204–208.

[181] Andreu Vall, Massimo Quadrana, Markus Schedl, Gerhard Widmer, and Paolo Cremonesi. 2017. The Importance of Song Context in Music Playlists. In Proceedings of the Poster Track of the 11th ACM Conference on Recommender Systems (RecSys). Como, Italy.

[182] Saul Vargas, Linas Baltrunas, Alexandros Karatzoglou, and Pablo Castells. 2014. Coverage, Redundancy and Size-awareness in Genre Diversity for Recommender Systems. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys ’14). ACM, New York, NY, USA, 209–216. DOI: http://dx.doi.org/10.1145/2645710.2645743

[183] Saul Vargas and Pablo Castells. 2011. Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems. In Proceedings of the 5th ACM Conference on Recommender Systems (RecSys). Chicago, IL, USA, 8.

[184] Xinxi Wang, David Rosenblum, and Ye Wang. 2012. Context-aware Mobile Music Recommendation for Daily Activities. In Proceedings of the 20th ACM International Conference on Multimedia. ACM, Nara, Japan, 99–108.

[185] Markus Weimer, Alexandros Karatzoglou, and Alex Smola. 2008. Adaptive collaborative filtering. In RecSys ’08: Proceedings of the 2008 ACM Conference on Recommender Systems. ACM, New York, NY, USA, 275–282. DOI: http://dx.doi.org/10.1145/1454008.1454050

[186] Yi-Hsuan Yang and Homer H. Chen. 2011. Music Emotion Recognition. CRC Press.

[187] Y.-H. Yang and H.-H. Chen. 2012. Machine recognition of music emotion: A review. ACM Transactions on Intelligent Systems and Technology 3, 4 (2012).

[188] Yi-Hsuan Yang and Homer H. Chen. 2013. Machine Recognition of Music Emotion: A Review. Transactions on Intelligent Systems and Technology 3, 3 (May 2013).

[189] Kazuyoshi Yoshii, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno. 2006. Hybrid Collaborative and Content-based Music Recommendation Using Probabilistic Model with Latent User Preferences. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR).

[190] Hamed Zamani, Michael Bendersky, Xuanhui Wang, and Mingyang Zhang. 2017. Situational Context for Ranking in Personal Search. In Proceedings of the 26th International Conference on World Wide Web (WWW ’17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1531–1540. DOI: http://dx.doi.org/10.1145/3038912.3052648

[191] Marcel Zentner, Didier Grandjean, and Klaus R. Scherer. 2008. Emotions evoked by the sound of music: characterization, classification, and measurement. Emotion 8, 4 (2008), 494.

[192] Zihan Zhang, Xiaoming Jin, Lianghao Li, Guiguang Ding, and Qiang Yang. 2016. Multi-Domain Active Learning for Recommendation. In AAAI. 2358–2364.

[193] Yuan Cao Zhang, Diarmuid Ó Séaghdha, Daniele Quercia, and Tamas Jambor. 2012. Auralist: Introducing Serendipity into Music Recommendation. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM). Seattle, WA, USA.

[194] Elena Zheleva, John Guiver, Eduarda Mendes Rodrigues, and Natasa Milic-Frayling. 2010. Statistical Models of Music-listening Sessions in Social Media. In Proceedings of the 19th International Conference on World Wide Web (WWW). Raleigh, NC, USA, 1019–1028.

[195] Tao Zhou, Zoltan Kuscsik, Jian-Guo Liu, Matus Medo, Joseph Rushton Wakeling, and Yi-Cheng Zhang. 2010. Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences 107, 10 (2010), 4511–4515.

[196] Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. 2005. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on the World Wide Web. ACM, 22–32.