Modelling Users’ Profiles and Interests Based on Cross-Folksonomy Analysis @ HT2009
Post on 28-Aug-2014
TAGora: Semiotic Dynamics of Online Social Communities EU-IST-2006-034721
Modelling Users’ Profiles and Interests based on
Cross-Folksonomy Analysis
Martin Szomszor
University of Southampton
Outline
• Introduction and Motivation
– Why is your folksonomy interaction useful?
– How could it be exploited?
• Making Sense of Folksonomies
– Distributed Contact Networks
– Tag Filtering / Tag Senses
• Profiles of Interests
• Future Work
– Disambiguation
– Building Better Profiles of Interests
Introduction
delicious.com
http://slashdot.org/
http://news.bbc.co.uk/
Dream Theater
Metallica
Rush
Increasing number of online identities
• A recent Ofcom study found that UK adults have on average 1.6 profiles; 39% of those who have a profile have at least two
– [Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use.
• In the future, people will maintain an increasing number of online identities to meet different information sharing tasks and to connect with different communities
delicious.com
Tag Clouds
Profile of Interests
The Big Picture
delicious.com
delicious.com
Profiles could be exported to other sites to improve recommendation quality
Profile of Interests
Personalisation
Profiles could be used to support personalised searching
Better user experience
Consolidation and Integration
currency
travel
hotels
cuba
http://dbpedia.org/resource/Cuba
cuba
holiday
2008
http://dbpedia.org/resource/Travel
http://dbpedia.org/resource/Holiday
http://dbpedia.org/resource/Category:Tourism
Tagging Variation
[1] Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.
Raw Tags
Filtered Tags
Disconnected Identities
fan of
contact friend
friend
#me
Delicious Last.fm Flickr Facebook
Identity Integration Tag Integration
Tagging Semantics
…
FOAF DBpedia + Wordnet
Making Sense of Folksonomies
1. Contact Integration
SNS Contact Integration
#me
Consolidated Contact View
• Recommend new connections
http://tagora.ecs.soton.ac.uk/delicious/martinszomszor
http://tagora.ecs.soton.ac.uk/flickr/7214044@N08
http://tagora.ecs.soton.ac.uk/lastfm/mszomszor
http://tagora.ecs.soton.ac.uk/facebook/613077109
http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/ht2009/foaf/4
<owl#sameAs>
<http://tagora.ecs.soton.ac.uk/facebook/613077109>
  <http://tagora.ecs.soton.ac.uk/schemas/facebook#hasFriend>
    <http://tagora.ecs.soton.ac.uk/facebook/1006466985>,
    <http://tagora.ecs.soton.ac.uk/facebook/684541156>,
    …
    <http://tagora.ecs.soton.ac.uk/facebook/1043367866>;
FOAF Representation of SNS Accounts
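One way a consolidated contact view could be assembled is as a union over the accounts that owl:sameAs ties to one person. This is only a minimal sketch, assuming in-memory dictionaries in place of the FOAF graph; the function name, the shortened identifiers and the Last.fm contact are hypothetical, and the slides do not say how the view is actually computed.

```python
from typing import Dict, Set


def consolidated_contacts(person: str,
                          same_as: Dict[str, Set[str]],
                          contacts: Dict[str, Set[str]]) -> Set[str]:
    """Union of the contacts of every account linked to `person` by owl:sameAs."""
    merged: Set[str] = set()
    for account in same_as.get(person, set()):
        merged |= contacts.get(account, set())
    return merged


# Hypothetical sample data; identifiers are shortened forms of the slide URIs.
same_as = {"foaf/4": {"facebook/613077109", "lastfm/mszomszor"}}
contacts = {
    "facebook/613077109": {"facebook/1006466985", "facebook/684541156"},
    "lastfm/mszomszor": {"lastfm/example_friend"},  # invented contact
}

view = consolidated_contacts("foaf/4", same_as, contacts)
print(sorted(view))
```

Contacts that appear in the merged view but not on a given network are natural candidates when recommending new connections.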
2. Tag Integration
Folksonomy Integration: Tag Heterogeneity
Web2.0 != Web_2.0
Folksonomy Integration: Tag Heterogeneity
Web2.0 isFilteredTo Web_2.0
Tag Filtering
• Find canonical form for each tag:
– Use DBpedia entry labels as reference
• Compound terms separated by _
– second-life, second+life, second.life -> second_life
• Concatenated / camel case terms are expanded
– secondlife, SecondLife -> second_life
• International characters normalised:
– Caf%C3%A9 -> Cafe
• Recommend spelling corrections
– resaerch -> didYouMean research
• Follow unambiguous redirections:
– Humor, Funny -> Humour
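The filtering rules above can be sketched as a lookup against reference labels. This is a minimal illustration, not the TAGora implementation: it assumes a small in-memory list of DBpedia-style labels, lower-cases the canonical form, and leaves out spelling correction and redirects. All function names are hypothetical.

```python
import re
import unicodedata
from urllib.parse import unquote


def strip_accents(text: str) -> str:
    """Drop combining marks so that 'Café' becomes 'Cafe'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_key(text: str) -> str:
    """Comparison key: URL-decoded, accent-free, lower case, alphanumerics only."""
    return re.sub(r"[^a-z0-9]", "", strip_accents(unquote(text)).lower())


def build_label_index(labels):
    """Map each label's comparison key to its lower-cased canonical form."""
    return {match_key(label): strip_accents(unquote(label)).lower()
            for label in labels}


def canonicalise(tag: str, label_index) -> str:
    """Return the canonical form of a tag, preferring a known reference label."""
    key = match_key(tag)
    if key in label_index:
        return label_index[key]
    # Fallback when no label matches: expand camel case, unify separators.
    text = strip_accents(unquote(tag))
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", text)
    text = re.sub(r"[-+.\s]+", "_", text)
    return text.lower()


# Assumed reference labels, standing in for DBpedia entry labels.
index = build_label_index(["Second_Life", "Web_2.0", "Café"])
for raw in ["second-life", "SecondLife", "secondlife", "web2.0", "Caf%C3%A9"]:
    print(raw, "->", canonicalise(raw, index))
```

Matching on a separator-free key is what lets a concatenated tag such as secondlife reach second_life, which no purely local rewrite rule could recover.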
[Figure: Tagging ontology]
Namespaces: http://tagora.ecs.soton.ac.uk/schemas/tagging#, http://www.w3.org/2001/XMLSchema#
Classes: Tagger, Post, Resource, Tag, UserTag, DomainTag, GlobalTag, TagSegment, FinalTagSegment, CooccurrenceInfo
Properties ((f) = functional property): usesTag, hasPost, taggedResource, taggedOn, hasTagSequence (f), hasNextSegment (f), tagUsed (f), hasGlobalTag, hasDomainTag, isFilteredTo, hasUserFrequency, hasGlobalFrequency, hasDomainFrequency, rdfs:label, hasCooccurrenceInfo, hasCooccurrenceFrequency, cooccurringTag
Datatypes: xsd:integer, xsd:string, xsd:datetime
Linked Data View
Finding Syntactic Variations
sparql$ select ?x where {?x <http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo> <http://tagora.ecs.soton.ac.uk/tag/web_2.0>}
┌─────────────────────────────────────────────┐
│ ?x                                          │
├─────────────────────────────────────────────┤
│ <http://tagora.ecs.soton.ac.uk/tag/web2.0>  │
│ <http://tagora.ecs.soton.ac.uk/tag/web2>    │
│ <http://tagora.ecs.soton.ac.uk/tag/web_2.0> │
│ <http://tagora.ecs.soton.ac.uk/tag/web_20>  │
│ <http://tagora.ecs.soton.ac.uk/tag/web20>   │
└─────────────────────────────────────────────┘
sparql$ select * where {?x <http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo> <http://tagora.ecs.soton.ac.uk/tag/second_life>}
┌───────────────────────────────────────────────────┐
│ ?x                                                │
├───────────────────────────────────────────────────┤
│ <http://tagora.ecs.soton.ac.uk/tag/second_Life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/second.life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/SecondLife>    │
│ <http://tagora.ecs.soton.ac.uk/tag/Second_Life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/second%20life> │
│ <http://tagora.ecs.soton.ac.uk/tag/SECOND_LIFE>   │
│ <http://tagora.ecs.soton.ac.uk/tag/second_life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/secondlife>    │
└───────────────────────────────────────────────────┘
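Both queries are inverse lookups over isFilteredTo: given a canonical tag, find every raw tag that filters to it. The same lookup can be sketched over a plain list of triples. This assumes an in-memory list rather than a SPARQL endpoint, uses a few rows taken from the results above, and the helper name is hypothetical.

```python
from collections import defaultdict

IS_FILTERED_TO = "http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo"
TAG = "http://tagora.ecs.soton.ac.uk/tag/"

# A few (subject, predicate, object) rows taken from the query results.
triples = [
    (TAG + "web2.0", IS_FILTERED_TO, TAG + "web_2.0"),
    (TAG + "web20", IS_FILTERED_TO, TAG + "web_2.0"),
    (TAG + "SecondLife", IS_FILTERED_TO, TAG + "second_life"),
]


def variants_by_canonical(rows):
    """Group raw tag URIs under the canonical tag they are filtered to."""
    index = defaultdict(set)
    for subject, predicate, obj in rows:
        if predicate == IS_FILTERED_TO:
            index[obj].add(subject)
    return index


index = variants_by_canonical(triples)
print(sorted(index[TAG + "web_2.0"]))
```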
Tag Senses
• What are the possible meanings for a tag?
• We use two reference sets:
– DBpedia
• Concepts
– Wordnet
• Synsets
Disambiguation Ontology
[Figure: Disambiguation ontology]
Namespaces: http://tagora.ecs.soton.ac.uk/schemas/tagging#, http://www.w3.org/2001/XMLSchema#, http://tagora.ecs.soton.ac.uk/schemas/dbpedia#, http://tagora.ecs.soton.ac.uk/schemas/disambiguation#, http://www.w3.org/2006/03/wn/wn20/schema/
Classes: Tag, Resource, DbpediaSenseInfo, WordSense
Properties ((f) = functional property): senseWeight, dbpediaSense, hasDbpediaSenseInfo, didYouMean, hasWordnetSense
Datatypes: xsd:float
DBpedia Extraction
• Extract triples from XML dump
– Calculate normalised title string
• Caf%C3%A9 -> cafe
– Calculate concatenated title string
• Second_life -> secondlife
– Extract disambiguation term from title
• Orange_(fruit)
– Identify compound labels
• Second_Life -> Second, Life
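The four title-derived values above can be sketched in one function. This is a minimal illustration under assumptions: the disambiguation term is taken to be a trailing parenthesised suffix, it is excluded from the other three values, and the function and key names are hypothetical rather than the extractor's own.

```python
import re
import unicodedata
from urllib.parse import unquote


def title_features(title: str) -> dict:
    """Derive label variants from a DBpedia-style page title."""
    decoded = unquote(title)
    # A trailing "_(term)" is treated as the disambiguation term.
    match = re.fullmatch(r"(.+)_\((.+)\)", decoded)
    base, term = (match.group(1), match.group(2)) if match else (decoded, None)
    # Remove accents, then lower-case, to get the normalised title string.
    ascii_base = "".join(
        ch for ch in unicodedata.normalize("NFKD", base)
        if not unicodedata.combining(ch)
    )
    normalised = ascii_base.lower()
    return {
        "normalised": normalised,
        "concatenated": re.sub(r"[^a-z0-9]", "", normalised),
        "disambiguation_term": term,
        "compound_labels": [part for part in base.split("_") if part],
    }


for example in ["Caf%C3%A9", "Second_life", "Orange_(fruit)", "Second_Life"]:
    print(example, title_features(example))
```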
DBpedia Extraction
• Number of incoming links
• Extract page redirects
• Extract disambiguation links
– Find primary disambiguation (e.g. Apple)
DBpedia Extraction
• Parse wiki text and extract terms:
– Terms filtered using stop words (with some wiki specific additions)
– Store term frequencies
– Store number of distinct terms in page
– Store total term frequency
• Can associate a vector of terms and weights to each possible sense
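The stored quantities listed above can be sketched with a simple counter. The tokeniser, the stop-word list (including the wiki-specific entries "ref" and "thumb") and the function name are assumptions made for illustration; the slides do not specify them.

```python
import re
from collections import Counter

# Assumed stop words; "ref" and "thumb" stand in for wiki-specific additions.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "ref", "thumb"}


def term_statistics(wiki_text: str, stop_words=STOP_WORDS) -> dict:
    """Count the non-stop-word terms of a page and summarise them."""
    tokens = re.findall(r"[a-z0-9]+", wiki_text.lower())
    frequencies = Counter(t for t in tokens if t not in stop_words)
    return {
        "term_frequencies": frequencies,
        "distinct_terms": len(frequencies),
        "total_term_frequency": sum(frequencies.values()),
    }


stats = term_statistics("The apple is the fruit of the apple tree.")
print(stats)
```

The term frequencies, divided by the total, give one possible weighting for the per-sense vector mentioned in the last bullet.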
[Figure: DBpedia extraction ontology]
Classes: Resource, CompoundLabelSequence, FinalCompoundLabelSequence, TermFrequencyPair
Properties ((f) = functional property): isa, hasLabel, hasNormalisedLabel, hasConcatenatedLabel, hasDisambiguationTerm, hasDisambiguation, hasPrimaryDisambiguation, hasCompoundLabelSequence (f), hasNextLabelSequence (f), hasCompoundLabel (f), hasTermFrequencyPair, hasTerm, hasTermFrequency, hasTotalTermFrequency, hasTotalTerms
Datatypes: xsd:integer, xsd:string
Profiles of Interests
[2] Szomszor, M., Alani, H., Cantador, I., O'Hara, K. and Shadbolt, N. (2008). Semantic Modelling of User Interests based on Cross-Folksonomy Analysis. In: 7th International Semantic Web Conference (ISWC), October 26th - 30th, Karlsruhe, Germany.
Global Category View
• What are the differences in the interests that are learnt from each domain?

Delicious                          Flickr
Wikipedia Category   Total Freq    Wikipedia Category   Total Freq
Design               69,215        Travel               51,674
Blogs                68,319        Australia            51,617
Music                45,063        London               46,623
Photography          41,356        Festivals            42,504
Tools                35,795        Music                40,943
Video                34,318        Cats                 38,230
Arts                 29,966        Holidays             37,610
Software             28,746        Family               37,100
Maps                 26,912        Japan                36,513
Teaching             22,120        Concerts             35,374
Games                21,549        Surnames             34,947
How-to               19,533        Washington           33,924
Technology           18,032        Given Names          32,843
News                 17,737        Dogs                 32,206
Humor                15,816        Birthdays            22,290
Future Work
• Given a set of possible senses, how can we choose the best match?
• Folksonomy data can provide contextual information:
– User tag-cloud
– Cooccurrence Network
– User Cooccurrence Network
• Can abstract this information as a vector of terms and weights (context)
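Once both the context and each candidate sense are vectors of terms and weights, one common way to pick a sense is cosine similarity. The slides do not state which measure is intended, so this is only a sketch under that assumption. Apart from "mac" (35) and "fruit" (41), which appear on a later slide, the vector entries and the function names are invented.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_sense(context: dict, senses: dict) -> str:
    """Return the sense whose term vector is closest to the context vector."""
    return max(senses, key=lambda name: cosine(context, senses[name]))


# Sense vectors: only "mac": 35 and "fruit": 41 come from the slides.
senses = {
    "Apple_Inc.": {"mac": 35, "computer": 20},
    "Apple": {"fruit": 41, "tree": 12},
}
# Hypothetical context, e.g. tags co-occurring with "apple" in a user's cloud.
context = {"mac": 3, "osx": 2, "software": 1}

print(best_sense(context, senses))
```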
Disambiguating Flickr Images
Building Better Profiles
• What tags correspond to interests?
– Locations and topics are useful, but other terms are not
• TF / IDF Approach
– It’s not that useful to find out we are all interested in HTML
• Making use of the Category hierarchy
– If I’m interested in Facebook, Flickr, Last.fm, Delicious, etc, I can extrapolate the interest Online_Social_Networks
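The TF / IDF point can be illustrated by treating each user's tag cloud as a document: a tag that every user has receives zero weight, while a distinctive one stands out. This is a minimal sketch with invented users and counts and one standard TF-IDF variant; the category hierarchy step is not covered.

```python
import math
from collections import Counter


def tf_idf(user_tags: dict) -> dict:
    """Weight each user's tags by relative frequency times inverse user frequency."""
    n_users = len(user_tags)
    # Number of users who have used each tag at least once.
    user_freq = Counter(tag for tags in user_tags.values() for tag in set(tags))
    weights = {}
    for user, tags in user_tags.items():
        total = sum(tags.values())
        weights[user] = {
            tag: (count / total) * math.log(n_users / user_freq[tag])
            for tag, count in tags.items()
        }
    return weights


# Hypothetical tag counts for two users.
user_tags = {
    "alice": {"html": 10, "cuba": 5},
    "bob": {"html": 8, "metallica": 6},
}
weights = tf_idf(user_tags)
print(weights)
```

Here html, shared by both users, scores zero, which matches the observation that a universally held interest says little about any one profile.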
[Figure: RDF graph of two candidate senses for the tag "apple"; edge labels: dbpedia:hasDbpediaSenseInfo, dbpedia:sense, dbpedia:senseWeight, dbpedia:hasTermFrequencyPair, dbpedia:hasTerm, dbpedia:hasTermFrequency, owl:sameAs]
Tag: http://tagora.ecs.soton.ac.uk/tag/apple
Sense 0: http://tagora.ecs.soton.ac.uk/tag/apple/sense-info/0
– Resource: http://tagora.ecs.soton.ac.uk/dbpedia/resource/Apple_Inc.
– Sense weight: 0.30628910807
– Term frequency pair _:b9510f00000000a5: “mac”, 35
Sense 1: http://tagora.ecs.soton.ac.uk/tag/apple/sense-info/1
– Resource: http://tagora.ecs.soton.ac.uk/dbpedia/resource/Apple
– Sense weight: 0.248912928
– Term frequency pair _:b9510f00000000a5: “fruit”, 41