data science-2013-heekim
Post on 07-Dec-2014
A Unified Music Recommender System Using Users’ Listening Habits and Semantics of Tags
Hyon Hee Kim
Department of Statistics and Information Science,
Dongduk Women’s University
Outline
• Motivation & Objectives
• Overview of the System
• Generation of User Profiles
• A Unified Music Recommendation
• Performance Evaluation
• Related Work
• Conclusions and Future Work
Motivation (1/3)
• In a Social Music Site
– Music recommendation is essential.
– Music recommendation differs from recommendation of other products.
• Explicit information: rating system
• Implicit information: the number of plays
• Listening habits-based User Profiling
– Cold Start Problem
• New users with little information
• New items with only a few ratings
– Data Sparsity Problem
• The available data is very small compared to the number of music items to be covered
Motivation (2/3)
• Collaborative Tagging
– A tool for users to represent their preferences about web resources
– Users add freely chosen keywords to web resources
– Tag data can be used for user profiling in personalized recommender systems
• Tag-based User Profiling
– Tags are easily added, even without listening to the music
– Tags are semantically meaningful
[Figure: example tags: classic rock, british, pop, rock]
Motivation (3/3)
• In the case of last.fm
• Factual Tags
– 85% of tags
– genre, region, instrumentation
• Emotional Tags
– 10% of tags
– opinion, sentiment, mood
• Personal Tags
– 5% of tags
– to organize, to browse, etc.
Objectives
• A Novel Approach to Music Recommendation
– Combining listening habits and semantics of tags
• Using a Tag Ontology and an Emotion Ontology
– UniTag: resolving semantic ambiguity of tags
– UniEmotion: assigning weighted values to the emotional tags
→ Semantically Enhanced Music Recommendation
Overview of the System
Outline
• Motivation & Objectives
• Overview of the System
• Tag-based User Profiling – Preprocessing of tags
– Algorithms for generating user profiles
– Preliminary experimental results
• A Unified Music Recommendation
• Performance Evaluation
• Related Work
• Conclusions and Future Work
Preprocessing of Tags (1/3)
• A tag has no predefined vocabulary or term hierarchy
• Problems of tag data
– Synonymy
• Different words represent the same meaning
• E.g., hiphop, hip-hop, hip hop / R & B, Rhythm and Blues, Blues
– Polysemy
• A single word carries multiple meanings
• E.g., French => French rock, French pop, French artist
– Spelling variants
• Misspellings
• Foreign languages
Preprocessing of Tags (2/3)
• Tag Ontology
– Tags, users, items
• UniTag Ontology
– uniTag:Users
• uniTag:userID, uniTag:hasAdded, uniTag:hasAddedTo
– uniTag:Items
• uniTag:itemID
– uniTag:Tags
• uniTag:tagID, uniTag:tagName, uniTag:RTag, uniTag:subTag
• uniTag:Rtags {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}
• uniTag:classifiedAs, uniTag:isKindOf, uniTag:istheSameAs, uniTag:tagVariation
Preprocessing of Tags (3/3)
• Rules for reasoning prefix
– French rock, progressive rock, post rock => rock
Tag(?t) ^ tagPrefix(?t, ?p) ^ Prefix(?p) ^ subTag(?t, ?s) ^ Rtags(?s) -> classifiedAs(?t, ?s)
• Rules for reasoning expert knowledge
– Soul => rhythm and blues, rhythm and blues => blues, then Soul => blues
Tag(?t) ^ isKindOf(?t, ?A) ^ isKindOf(?A, ?B) -> isKindOf(?t, ?B)
• Rules for reasoning synonym
– Hip-hop, hiphop => hip hop
Tag(?t) ^ tagVariation(?t, ?R) ^ istheSameAs(?t, ?s) -> tagVariation(?s, ?R)
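The three rule types can be illustrated outside an ontology reasoner with a small Python sketch. This is not the UniTag implementation: the lookup tables `SAME_AS` and `IS_KIND_OF` are hypothetical stand-ins for ontology assertions, and synonyms are mapped to the spelling "hiphop" only so that the result matches the representative tag list.

```python
# Representative tags as listed in the UniTag ontology slide.
RTAGS = {"rock", "hiphop", "electronic", "metal", "jazz",
         "rap", "funk", "folk", "blues", "reggae"}

# Hypothetical assertions; a real system would read these from the ontology.
SAME_AS = {"hip-hop": "hiphop", "hip hop": "hiphop"}
IS_KIND_OF = {"soul": "rhythm and blues", "rhythm and blues": "blues"}


def find_rtag(tag):
    """Return the representative tag for a raw tag, or None if no rule applies."""
    t = tag.strip().lower()
    t = SAME_AS.get(t, t)                      # synonym rule
    seen = set()
    while t in IS_KIND_OF and t not in seen:   # transitive isKindOf rule
        seen.add(t)
        t = IS_KIND_OF[t]
    if t in RTAGS:
        return t
    words = t.split()
    last = words[-1] if words else ""          # prefix rule: "french rock" -> "rock"
    return last if last in RTAGS else None
```

For example, `find_rtag("French rock")` yields `"rock"` through the prefix rule and `find_rtag("Soul")` yields `"blues"` through two isKindOf steps.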
Algorithm for Generating User Profiles (1/2)
Algorithm 1. Generation of a Tag-based Profile
Input: set of representative tags Tr, set of a user's tags Tu
Output: frequency of each representative tag for the user, FTr
var RTags[] = {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}
var tagFrequency[] = {0, ..., 0}
var RTag = null
while ∃ next tag t in Tu do
  RTag = FindRTag(t)
  for i = 0 to |RTags| - 1 do
    if RTag == RTags[i] then
      tagFrequency[i] = tagFrequency[i] + 1
  endfor
endwhile
return tagFrequency
User   rock  hiphop  electronic  metal  jazz  rap  funk  folk  blues  reggae
user1     6       2           2      3     2    4     3     1      1       1
user2     5       0           0      0     0    0     0     0      1       0
user3     2       2           1      1     1    1     2     0      0       1
user4    10       1           0      1     2    0     2     3      3       1
user5     1       4           0      0     0    4     1     0      0       0
Table 1. An example of tag-based profiles
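A tag-based profile like a row of Table 1 can be computed with a few lines of Python. This is a sketch under assumptions: `last_word_rtag` is a hypothetical stand-in for FindRTag that only looks at the last word of a tag, and the sample tags are made up for illustration.

```python
from collections import Counter

RTAGS = ["rock", "hiphop", "electronic", "metal", "jazz",
         "rap", "funk", "folk", "blues", "reggae"]


def last_word_rtag(tag):
    """Toy stand-in for FindRTag: use the last word of the tag."""
    words = tag.lower().split()
    return words[-1] if words else None


def tag_profile(user_tags, find_rtag=last_word_rtag):
    """Count how often each representative tag occurs among a user's tags."""
    counts = Counter()
    for tag in user_tags:
        rtag = find_rtag(tag)
        if rtag in RTAGS:          # tags with no representative tag are skipped
            counts[rtag] += 1
    return [counts[r] for r in RTAGS]


profile = tag_profile(["rock", "post rock", "jazz", "unknown"])
```

The result is a vector with one count per representative tag, in the column order of Table 1.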
Algorithm for Generating User Profiles (2/2)
Algorithm 2. Generation of a Track-based Profile
Input: set of tracks of a user TRu, set of representative tags Tr
Output: number of the user's tracks for each representative musical genre, Tn
var RTags[] = {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}
var numTrack[] = {0, ..., 0}
var RTrack = null
while ∃ next track t in TRu do
  RTrack = FindGenre(t)
  for i = 0 to |RTags| - 1 do
    if RTrack == RTags[i] then
      numTrack[i] = numTrack[i] + 1
  endfor
endwhile
return numTrack
User   rock  hiphop  electronic  metal  jazz  rap  funk  folk  blues  reggae
User1    65     176           5      4     0  168     0     3      0       0
User2   411       8          11    109     3    5     8     1      0       0
User3   157       7          11     10     6    2     1    39      4       2
User4   257      20           9     18     2    5     0     9      0       0
User5   110     277          15      8     6   85    10     3      2       7
Table 2. An example of track-based profiles
Preliminary Experimental Results (1/3)
• 1,000-user data set from Last.fm
– Users, tags, music items
• Standardization
– To remove the effect of extreme preference values
• K-Means clustering algorithm
– Canopy Clustering
– 6 centroid points and 6 clusters
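The standardization step can be sketched as a z-score transformation. The slides do not say how standardization was done, so standardizing each genre column (rather than each user row) and using the population standard deviation are assumptions; the sample rows are the first three columns of Table 1.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each column: subtract the column mean, divide by the column SD."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sigmas)]
        for row in rows
    ]


# rock, hiphop, electronic counts of user1..user5 from Table 1
rows = [[6, 2, 2], [5, 0, 0], [2, 2, 1], [10, 1, 0], [1, 4, 0]]
z = standardize_columns(rows)
```

After this step every column has mean 0 and standard deviation 1, so a genre with large raw counts such as rock no longer dominates the distance computation in clustering.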
Preliminary Experimental Results (2/3)
              X1      X2      X3      X4      X5      X6      X7      X8      X9     X10
Cluster1   0.241   1.472   0.626   0.130   1.267   1.621   2.168   0.274   1.078   0.381
Cluster2   2.171   0.032   0.517   3.052   0.011  -0.030   0.328   1.533   1.245   0.162
Cluster3  -0.206  -0.273  -0.517  -0.178  -0.180  -0.294  -0.233  -0.171  -0.204  -0.136
Cluster4  -0.341   0.660  -0.459  -0.284  -0.208   1.178  -0.179  -0.321  -0.166   0.273
Cluster5  -0.074  -0.155   1.320  -0.230  -0.115  -0.261  -0.209  -0.070  -0.172  -0.071
Cluster6   2.815   7.640   5.168  -0.136   9.254   6.135   7.000   4.286   4.421   5.254
Table 3. Values of Centers of Tag-based Profiles
              X1      X2      X3      X4      X5      X6      X7      X8      X9     X10
Cluster1  -0.411   0.495   0.406  -0.338   1.565   0.131   1.632  -0.135   0.147   0.812
Cluster2   0.200  -0.444   0.007  -0.341   0.907  -0.468  -0.288   2.617   1.097   0.020
Cluster3  -0.897   1.651  -0.539  -0.442  -0.213   1.836   0.059  -0.507  -0.415   0.034
Cluster4   1.925  -0.590  -0.404   0.852  -0.264  -0.491   0.655  -0.002   2.850  -0.108
Cluster5   0.914  -0.557  -0.216   0.794  -0.296  -0.511  -0.297   0.014  -0.157  -0.147
Cluster6  -0.472  -0.327   0.380  -0.373  -0.184  -0.371  -0.241  -0.205  -0.300  -0.093
Table 4. Values of Centers of Track-based Profiles
Preliminary Experimental Results (3/3)
• Clustering Validity
– Inter-cluster distances
– Distances between all pairs of centroids using cosine distance measure
– T-test
• Mean of inter-cluster distances of tag-based profiles
• Mean of inter-cluster distances of track-based profiles
                      N   Mean    Std Dev  t     p-value
Tag-based profiles    15  0.8325  0.6834   2.55  0.0165
Track-based profiles  15  0.3785  0.0885
Table 5. T-test result for the means of inter-cluster distances
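The validity check can be retraced in Python. This is a sketch, not the original analysis: cosine distance is taken as 1 minus cosine similarity (an assumption), and the t statistic is recomputed from the summary values in Table 5. Six centroids give 15 pairs, which matches N = 15 in the table, and with equal group sizes the pooled and unpooled forms of the statistic coincide.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def inter_cluster_distances(centroids):
    """Cosine distances between all unordered pairs of centroids."""
    return [cosine_distance(a, b) for a, b in combinations(centroids, 2)]


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from means, standard deviations and sizes."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Summary values from Table 5 (tag-based vs. track-based profiles)
t = t_from_summary(0.8325, 0.6834, 15, 0.3785, 0.0885, 15)
```

The recomputed statistic rounds to the 2.55 reported in Table 5.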
Outline
• Motivation & Objectives
• Overview of the System
• Generation of User Profiles
• A Unified Music Recommendation – UniEmotion Ontology
– Generation of User Profiles
– Music Recommendation Algorithm
• Performance Evaluation
• Related Work
• Conclusions and Future Work
UniEmotion Ontology (1/5)
[Plutchik’s model]
UniEmotion Ontology (2/5)
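• Definition of the intensity of emotional tags
• SentiWordNet, http://sentiwordnet.isti.cnr.it/
[Figure: SentiWordNet score examples (P: positive, O: objective, N: negative); the scored words are not recoverable]
P: 0.625, O: 0.25, N: 0.125
P: 0.375, O: 0.625, N: 0
P: 1.0, O: 0, N: 0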
UniEmotion Ontology (3/5)
• Intensity of emotional tags
– Strong
• Positive value >= 0.75 or Negative value >= 0.75
– Middle
• 0.25 <= Positive value < 0.75 or
• 0.25 <= Negative value < 0.75
– Weak
• Positive value < 0.25 and Negative value < 0.25
UniEmotion Ontology (4/5)
• Assigning the weights to the tags
– Factual tags: 1
– Positive tags
• Strong: 2.5
• Middle: 2
• Weak: 1.5
– Negative tags
• Strong: -2.5
• Middle: -2
• Weak: -1.5
• Final score of an item => sum of the weights
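The intensity thresholds and weights combine into an item score as sketched below. The function names and the way a tag's polarity is passed in as a `kind` string are assumptions made for illustration; in the slides, polarity comes from the UniEmotion classes and the positive and negative values come from SentiWordNet.

```python
WEIGHTS = {"strong": 2.5, "middle": 2.0, "weak": 1.5}


def intensity(positive, negative):
    """Map SentiWordNet-style positive/negative values to an intensity level."""
    strongest = max(positive, negative)
    if strongest >= 0.75:
        return "strong"
    if strongest >= 0.25:
        return "middle"
    return "weak"


def tag_weight(kind, positive=0.0, negative=0.0):
    """Weight of one tag; kind is 'factual', 'positive' or 'negative'."""
    if kind == "factual":
        return 1.0
    if kind not in ("positive", "negative"):
        raise ValueError(f"unknown tag kind: {kind}")
    weight = WEIGHTS[intensity(positive, negative)]
    return weight if kind == "positive" else -weight


def item_score(tags):
    """Final score of an item: the sum of the weights of its tags."""
    return sum(tag_weight(*tag) for tag in tags)


# One factual tag, one strong positive tag, one middle negative tag
score = item_score([("factual",), ("positive", 1.0, 0.0), ("negative", 0.0, 0.5)])
```

Here the score is 1 + 2.5 - 2 = 1.5.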
UniEmotion Ontology (5/5)
• Two classes
– UniEmotion:Positive
• Emotional tags belonging to the positive emotional categories
• trust, surprise, anticipation, and happiness
– UniEmotion:Negative
• Emotional tags belonging to the negative emotional categories
• disgust, anger, fear, and sadness
• Two properties
– UniEmotion:Intensity
• Specifying the intensity of tags
– UniEmotion:Weight
• Specifying the weight of tags
Generation of User Profiles (1/2)
1. Listening habits-based User Profiles
– U1 = {u1, u2, …, um}, I1 = {i1, i2, …, in}
– <u, i, n>
• n: number of plays
2. Tag score-based User Profiles
– U2 = {u1, u2, …, um}, I2 = {i1, i2, …, in}
– <u, i, s>
• s: score of the tags, assigned by the UniEmotion ontology
3. Hybrid User Profiles
– U3 = {u1, u2, …, um}, I3 = I1 ∩ I2
– <u, i, m>
• m = α * n + (1 - α) * s; α = 0.5
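The hybrid profile of one user can be sketched as follows. The dictionary layout (item to value) and the item names are assumptions; the slides do not say whether play counts and tag scores are rescaled before mixing, so none is applied here.

```python
def hybrid_profile(plays, tag_scores, alpha=0.5):
    """Combine play counts and tag scores on the items present in both profiles."""
    common = plays.keys() & tag_scores.keys()          # I3 = I1 ∩ I2
    return {
        item: alpha * plays[item] + (1 - alpha) * tag_scores[item]
        for item in common
    }


# Hypothetical items: only "a" appears in both profiles
m = hybrid_profile({"a": 10, "b": 4}, {"a": 2.5, "c": 1.0})
```

With α = 0.5, item "a" gets 0.5 * 10 + 0.5 * 2.5 = 6.25, and items "b" and "c" are dropped because they are missing from one of the two profiles.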
Generation of User Profiles (2/2)
[Figure: 1. Listening habits-based user profiles; 2. Tag score-based user profiles; 3. Hybrid user profiles]
Music Recommendation Algorithm (1/2)
• Finding Similar Users
– Pearson Correlation Similarity
• Calculating scores of items
– Considering the similar users' ratings
• Recommending top n items
Music Recommendation Algorithm (2/2)
Input: a set of user profiles UP
Output: a set of recommended items RI
1. For all yi ∈ U
     Compute a similarity s between x and yi.
2. Sort by similarity
3. Select top n neighbors
4. [step lost in extraction]
5. For all [set lost in extraction]
     Compute a similarity t between x and [symbol lost in extraction]
     For all [set lost in extraction]
       preference += t * pref
6. Rank by preference
7. Select top n items
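A user-based recommendation with Pearson correlation can be sketched in plain Python. This is not the Apache Mahout code used in the system: the profile layout (user to item to preference), the restriction to positively correlated neighbors, and the unnormalized sum are assumptions chosen to mirror the step "preference += t * pref".

```python
from math import sqrt


def pearson(p, q):
    """Pearson correlation over the items two profiles have in common."""
    common = sorted(p.keys() & q.keys())
    if len(common) < 2:
        return 0.0
    xs = [p[i] for i in common]
    ys = [q[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs)) * sqrt(sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def recommend(profiles, target, n_neighbors=2, n_items=3):
    """Rank items unseen by the target user by similarity-weighted preference."""
    x = profiles[target]
    sims = sorted(
        ((pearson(x, prof), user) for user, prof in profiles.items() if user != target),
        reverse=True,
    )
    neighbors = [(s, u) for s, u in sims[:n_neighbors] if s > 0]
    scores = {}
    for s, u in neighbors:
        for item, pref in profiles[u].items():
            if item not in x:
                scores[item] = scores.get(item, 0.0) + s * pref
    return sorted(scores, key=lambda i: (-scores[i], i))[:n_items]


# Hypothetical profiles: y agrees with x, z disagrees
profiles = {
    "x": {"a": 5, "b": 3, "c": 1},
    "y": {"a": 4, "b": 3, "c": 2, "d": 5},
    "z": {"a": 1, "b": 3, "c": 5, "e": 5},
}
top = recommend(profiles, "x")
```

In this toy data only user y is positively correlated with x, so the single recommendation is y's unseen item "d".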
Performance Evaluation
• Implementation Environment: Apache Web Server
– User database: MySQL 5.0
– Listening habits collector, tag score generator: PHP
– Recommendation engine: Apache Mahout
– UniTag and UniEmotion ontology: JDK 6.0
• Experimental Data
– Data on 1,000 users from last.fm [http://mir.dcs.gla.ac.uk/]
– Containing 18,700 artists and 12,600 tags
– 70% training data, 30% test data
Performance Evaluation
• Evaluation Model
– Recommended items
• Items which users are interested in (True Positive, TP)
• Items which users are not interested in (False Positive, FP)
– Items which are not recommended
• Items which users are interested in (False Negative, FN)
• Items which users are not interested in (True Negative, TN)
– Precision P = TP / (TP + FP)
• # of correct recommendations / # of all recommended items
– Recall R = TP / (TP + FN)
• # of correct recommendations / # of preferred items
– F-measure F = 2 * P * R / (P + R)
• Harmonic mean of precision and recall
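The three measures can be computed from a recommendation list and the set of items a user is actually interested in. The function name and the sample items are assumptions for illustration.

```python
def evaluate(recommended, relevant):
    """Return (precision, recall, F-measure) for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)      # recommended and interesting
    fp = len(recommended - relevant)      # recommended but not interesting
    fn = len(relevant - recommended)      # interesting but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# 4 recommended items, 3 relevant items, 2 of them in common
p, r, f = evaluate(["a", "b", "c", "d"], ["a", "b", "e"])
```

Here TP = 2, FP = 2 and FN = 1, so precision is 2/4, recall is 2/3 and the F-measure is 4/7.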
Experimental Results (1/3)
• Precisions
[Figure: precision by number of similar users and by number of recommended items. A: listening habits-based approach, B: tag-based approach, C: hybrid approach]
Experimental Results (2/3)
• Recalls
[Figure: recall by number of similar users and by number of recommended items. A: listening habits-based approach, B: tag-based approach, C: hybrid approach]
Experimental Results (3/3)
• F-measure
[Figure: F-measure by number of similar users and by number of recommended items. A: listening habits-based approach, B: tag-based approach, C: hybrid approach]
Statistical Validation
• One-way ANOVA on three groups
– Method 1: listening habits-based approach
– Method 2: tag-based approach
– Method 3: hybrid approach
• Tukey Multiple Comparison Test
– Asymmetric distributions
• Log transformation
– Different letters mark groups that differ significantly
Method                  1              2              3              F
Mean of log(prec)       -3.962 B       -4.036 B       -2.879 A       34.27***
Mean Precision (SD)     0.020 (0.006)  0.020 (0.009)  0.068 (0.040)
N                       24             24             24
<Table 1. Test for precision> ***: p<0.001
Method                  1              2              3              F
Mean of log(recall)     -3.285 B       -4.099 C       -2.635 A       26.80***
Mean Recall (SD)        0.044 (0.023)  0.019 (0.010)  0.093 (0.056)
N                       24             24             24
<Table 2. Test for recall> ***: p<0.001
Method                  1              2              3              F
Mean of log(F-measure)  -3.748 B       -4.117 C       -2.894 A       41.31***
Mean F-measure (SD)     0.024 (0.006)  0.018 (0.008)  0.06 (0.034)
N                       24             24             24
<Table 3. Test for F-measure> ***: p<0.001
Related Work
• MusicBox
– A personalized music recommender system based on social tags
– 3-order tensor model
– The method improves the recommendation quality
• Foafing the Music
– Collecting music information in a semantic web environment
– User information, music information, concert information
– Recommendation of similar music items
• OntoEmotions
– An ontology of emotional categories covering the basic emotions
– ArsMeteo art portal
– New relations can be inferred by reasoning on the ontology of emotions
Conclusions
• Solution to Cold Start Problem
– It takes time to collect users' listening habits.
– Adding tags is easy.
– Tags work like word-of-mouth.
• Performance Enhancement
– Precision, Recall, F-measure
– Hybrid approach > listening habits-based approach, tag-based approach
Future Work
• Elaborating UniEmotion Ontology
– Emerging Internet slang
• Item Selection
– Product network analysis considering tags
– Analyzing short descriptions