modelling users’ profiles and interests based on cross-folksonomy analysis @ ht2009

Post on 28-Aug-2014



Invited talk at the ACM Hypertext Conference 2009, Turin, Italy


TAGora: Semiotic Dynamics of Online Social Communities EU-IST-2006-034721

Modelling Users’ Profiles and Interests based on

Cross-Folksonomy Analysis

Martin Szomszor, University of Southampton

Outline
• Introduction and Motivation
  – Why is your folksonomy interaction useful?
  – How could it be exploited?
• Making Sense of Folksonomies
  – Distributed Contact Networks
  – Tag Filtering / Tag Senses
• Profiles of Interests
• Future Work
  – Disambiguation
  – Building Better Profiles of Interests

Introduction

delicious.com

http://slashdot.org/

http://news.bbc.co.uk/

Dream Theater

Metallica

Rush

Increasing number of online identities

• A recent Ofcom study found that UK adults have on average 1.6 profiles; 39% of those who have a profile have at least 2
  – [Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use.

• In the future, people will maintain an increasing number of online identities to meet different information sharing tasks and to connect with different communities

delicious.com

Tag Clouds

Profile of Interests

The Big Picture

delicious.com

delicious.com

Profiles could be exported to other sites to improve recommendation quality

Profile of Interests

Personalisation

Profiles could be used to support personalised searching

Better user experience

Consolidation and Integration

currency

travel

hotels

cuba

http://dbpedia.org/resource/Cuba

cuba

holiday

2008

http://dbpedia.org/resource/Travel

http://dbpedia.org/resource/Holiday

http://dbpedia.org/resource/Category:Tourism

Tagging Variation

[1] Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.

Raw Tags

Filtered Tags

Disconnected Identities

fan of

contact friend

friend

#me

Delicious Last.fm Flickr Facebook

Identity Integration Tag Integration

Tagging Semantics

FOAF DBpedia + Wordnet

Making Sense of Folksonomies


1. Contact Integration


SNS Contact Integration

#me

Consolidated Contact View

• Recommend new connections

http://tagora.ecs.soton.ac.uk/delicious/martinszomszor

http://tagora.ecs.soton.ac.uk/flickr/7214044@N08

http://tagora.ecs.soton.ac.uk/lastfm/mszomszor

http://tagora.ecs.soton.ac.uk/facebook/613077109

http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/ht2009/foaf/4

<owl#sameAs> <http://tagora.ecs.soton.ac.uk/facebook/613077109>
<http://tagora.ecs.soton.ac.uk/schemas/facebook#hasFriend>
    <http://tagora.ecs.soton.ac.uk/facebook/1006466985>,
    <http://tagora.ecs.soton.ac.uk/facebook/684541156>,
    …
    <http://tagora.ecs.soton.ac.uk/facebook/1043367866>;

FOAF Representation of SNS Accounts


2. Tag Integration

Folksonomy Integration: Tag Heterogeneity

Web2.0 != Web_2.0

Web2.0 isFilteredTo Web_2.0

Tag Filtering
• Find canonical form for each tag:
  – Use DBpedia entry labels as reference
• Compound terms separated by _
  – second-life, second+life, second.life -> second_life
• Concatenated / camel case terms are expanded
  – secondlife, SecondLife -> second_life
• International characters normalised:
  – Caf%C3%A9 -> Cafe
• Recommend spelling corrections
  – resaerch -> didYouMean research
• Follow unambiguous redirections:
  – Humor, Funny -> Humour
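The first three filtering rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TAGora implementation: the function names, the lower-casing of the result, and the tiny reference label set are all hypothetical. Spelling correction and redirect following need the full DBpedia data and are left out.

```python
import re
import unicodedata
from urllib.parse import unquote


def canonical_tag(raw: str) -> str:
    """Map a raw tag to a lower-case, underscore-separated candidate form."""
    # Decode percent-escapes such as Caf%C3%A9, then strip accents.
    text = unicodedata.normalize("NFKD", unquote(raw))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Expand camel case: SecondLife -> Second_Life.
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", text)
    # Treat -, +, . and whitespace before a letter as compound separators.
    text = re.sub(r"[\s\-+.]+(?=[A-Za-z])", "_", text)
    # Lower-casing is an assumption made for this sketch.
    return text.lower()


def build_index(labels):
    """Index reference labels by their concatenated form (underscores removed)."""
    return {label.replace("_", ""): label for label in labels}


def filter_tag(raw: str, index: dict) -> str:
    """Return the reference label matching the tag, or the candidate form if none."""
    candidate = canonical_tag(raw)
    return index.get(candidate.replace("_", ""), candidate)


# Hypothetical reference set standing in for DBpedia entry labels.
index = build_index(["second_life"])
for raw in ["second-life", "second+life", "second.life", "secondlife", "SecondLife"]:
    print(raw, "->", filter_tag(raw, index))
```

Matching on the concatenated form is what lets a tag such as secondlife, which has no separator to split on, still resolve to a reference label.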

[Figure: tagging ontology diagram. Namespaces: http://tagora.ecs.soton.ac.uk/schemas/tagging# and http://www.w3.org/2001/XMLSchema#. Legend: (f) = functional property; arrows distinguish properties from subclass relations.
Classes: Tagger, Tag, UserTag, DomainTag, GlobalTag, Post, Resource, TagSegment, FinalTagSegment, CooccurrenceInfo.
Properties: usesTag, hasPost, taggedResource, taggedOn, hasTagSequence (f), hasNextSegment (f), tagUsed (f), isFilteredTo, hasGlobalTag, hasDomainTag, hasUserFrequency, hasDomainFrequency, hasGlobalFrequency, hasCooccurrenceInfo, hasCooccurrenceFrequency, cooccurringTag, rdfs:label.
Datatypes: xsd:integer, xsd:string, xsd:datetime.]

Linked Data View


Finding Syntactic Variations

sparql$ select ?x where {?x <http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo> <http://tagora.ecs.soton.ac.uk/tag/web_2.0>}
┌─────────────────────────────────────────────┐
│ ?x                                          │
├─────────────────────────────────────────────┤
│ <http://tagora.ecs.soton.ac.uk/tag/web2.0>  │
│ <http://tagora.ecs.soton.ac.uk/tag/web2>    │
│ <http://tagora.ecs.soton.ac.uk/tag/web_2.0> │
│ <http://tagora.ecs.soton.ac.uk/tag/web_20>  │
│ <http://tagora.ecs.soton.ac.uk/tag/web20>   │
└─────────────────────────────────────────────┘

sparql$ select * where {?x <http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo> <http://tagora.ecs.soton.ac.uk/tag/second_life>}
┌───────────────────────────────────────────────────┐
│ ?x                                                │
├───────────────────────────────────────────────────┤
│ <http://tagora.ecs.soton.ac.uk/tag/second_Life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/second.life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/SecondLife>    │
│ <http://tagora.ecs.soton.ac.uk/tag/Second_Life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/second%20life> │
│ <http://tagora.ecs.soton.ac.uk/tag/SECOND_LIFE>   │
│ <http://tagora.ecs.soton.ac.uk/tag/second_life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/secondlife>    │
└───────────────────────────────────────────────────┘
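Both queries use the same triple pattern: every subject ?x whose isFilteredTo value is a given canonical tag. As a rough illustration, the same pattern can be matched over an in-memory list of triples without a SPARQL engine. The helper name variants_of and the choice of sample triples are assumptions of this sketch; the URIs are copied from the results above.

```python
TAGGING = "http://tagora.ecs.soton.ac.uk/schemas/tagging#"
TAG = "http://tagora.ecs.soton.ac.uk/tag/"

# A few (subject, predicate, object) triples taken from the query results.
TRIPLES = [
    (TAG + "web2.0", TAGGING + "isFilteredTo", TAG + "web_2.0"),
    (TAG + "web20", TAGGING + "isFilteredTo", TAG + "web_2.0"),
    (TAG + "SecondLife", TAGGING + "isFilteredTo", TAG + "second_life"),
    (TAG + "secondlife", TAGGING + "isFilteredTo", TAG + "second_life"),
]


def variants_of(canonical, triples):
    """Return every subject ?x matching the pattern: ?x isFilteredTo <canonical>."""
    predicate = TAGGING + "isFilteredTo"
    return sorted(s for s, p, o in triples if p == predicate and o == canonical)


for uri in variants_of(TAG + "web_2.0", TRIPLES):
    print(uri)
```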

Tag Senses
• What are the possible meanings for a tag?
• We use two reference sets:
  – DBpedia
    • Concepts
  – Wordnet
    • Synsets

Disambiguation Ontology

[Figure: disambiguation ontology diagram. Namespaces: http://tagora.ecs.soton.ac.uk/schemas/tagging#, http://tagora.ecs.soton.ac.uk/schemas/dbpedia#, http://tagora.ecs.soton.ac.uk/schemas/disambiguation#, http://www.w3.org/2006/03/wn/wn20/schema/ and http://www.w3.org/2001/XMLSchema#. Legend: (f) = functional property; arrows distinguish properties from subclass relations.
Classes: Tag, Resource, DbpediaSenseInfo, WordSense.
Properties: hasDbpediaSenseInfo, dbpediaSense, senseWeight (xsd:float), didYouMean, hasWordnetSense.]

DBpedia Extraction
• Extract triples from XML dump
  – Calculate normalised title string
    • Caf%C3%A9 -> cafe
  – Calculate concatenated title string
    • Second_life -> secondlife
  – Extract disambiguation term from title
    • Orange_(fruit)
  – Identify compound labels
    • Second_Life -> Second, Life
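The four title-derived values listed above can be illustrated with a small function. This is a sketch under assumptions rather than the extraction code used in the project: the function name, the dictionary keys, and the rule that a disambiguation term is a trailing parenthesised suffix are all chosen here for illustration.

```python
import re
import unicodedata
from urllib.parse import unquote


def describe_title(title: str) -> dict:
    """Derive label variants from a DBpedia page title (illustrative only)."""
    decoded = unquote(title)
    # Assumed rule: a disambiguation term is a trailing "_(...)" suffix.
    match = re.fullmatch(r"(.*?)_\((.+)\)", decoded)
    if match:
        label, disambiguation = match.group(1), match.group(2)
    else:
        label, disambiguation = decoded, None
    # Normalise: strip accents, then lower-case.
    stripped = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in stripped if not unicodedata.combining(ch))
    normalised = stripped.lower()
    return {
        "normalised": normalised,
        "concatenated": normalised.replace("_", ""),
        "disambiguation_term": disambiguation,
        "compound_labels": label.split("_"),
    }


for title in ["Caf%C3%A9", "Second_life", "Orange_(fruit)", "Second_Life"]:
    print(title, describe_title(title))
```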

DBpedia Extraction
• Number of incoming links
• Extract page redirects
• Extract disambiguation links
  – Find primary disambiguation (e.g. Apple)

DBpedia Extraction
• Parse wiki text and extract terms:
  – Terms filtered using stop words (with some wiki-specific additions)
  – Store term frequencies
  – Store number of distinct terms in page
  – Store total term frequency
• Can associate a vector of terms and weights to each possible sense
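A minimal sketch of this term extraction step follows. The stop-word list, the tokenisation rule, and the choice of relative frequency as the weight are assumptions made for the example; the slides do not specify them.

```python
import re
from collections import Counter

# Hypothetical stop-word list; "ref" and "infobox" stand in for wiki-specific additions.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "ref", "infobox"}


def term_statistics(wiki_text: str) -> dict:
    """Count the terms of a page after stop-word filtering."""
    tokens = re.findall(r"[a-z]+", wiki_text.lower())
    frequencies = Counter(t for t in tokens if t not in STOP_WORDS)
    return {
        "term_frequencies": frequencies,
        "distinct_terms": len(frequencies),
        "total_term_frequency": sum(frequencies.values()),
    }


def weighted_vector(stats: dict) -> dict:
    """Turn raw counts into weights (relative frequency, an assumed weighting)."""
    total = stats["total_term_frequency"]
    if total == 0:
        return {}
    return {term: count / total for term, count in stats["term_frequencies"].items()}


stats = term_statistics("The Mac is a computer. The Mac runs software.")
print(stats["distinct_terms"], stats["total_term_frequency"])
print(weighted_vector(stats))
```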

[Figure: ontology diagram for the extracted DBpedia data. Legend: (f) = functional property.
Classes: Resource, CompoundLabelSequence, FinalCompoundLabelSequence, TermFrequencyPair.
Properties: isa, hasLabel, hasNormalisedLabel, hasConcatenatedLabel, hasDisambiguationTerm, hasDisambiguation, hasPrimaryDisambiguation, hasCompoundLabelSequence (f), hasNextLabelSequence (f), hasCompoundLabel (f), hasTermFrequencyPair, hasTerm, hasTermFrequency, hasTotalTermFrequency, hasTotalTerms.
Datatypes: xsd:integer, xsd:string.]

Profiles of Interests

[2] Szomszor, M., Alani, H., Cantador, I., O'Hara, K. and Shadbolt, N. (2008). Semantic Modelling of User Interests based on Cross-Folksonomy Analysis. In: 7th International Semantic Web Conference (ISWC), October 26th - 30th, Karlsruhe, Germany.

Global Category View
• What are the differences in the interests that are learnt from each domain?

Delicious                          Flickr
Wikipedia Category   Total Freq    Wikipedia Category   Total Freq
Design               69,215        Travel               51,674
Blogs                68,319        Australia            51,617
Music                45,063        London               46,623
Photography          41,356        Festivals            42,504
Tools                35,795        Music                40,943
Video                34,318        Cats                 38,230
Arts                 29,966        Holidays             37,610
Software             28,746        Family               37,100
Maps                 26,912        Japan                36,513
Teaching             22,120        Concerts             35,374
Games                21,549        Surnames             34,947
How-to               19,533        Washington           33,924
Technology           18,032        Given Names          32,843
News                 17,737        Dogs                 32,206
Humor                15,816        Birthdays            22,290

Future Work
• Given a set of possible senses, how can we choose the best match?
• Folksonomy data can provide contextual information:
  – User tag-cloud
  – Cooccurrence Network
  – User Cooccurrence Network
• Can abstract this information as a vector of terms and weights (context)
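One plausible way to compare a context vector with the term vector of each candidate sense is cosine similarity. The talk leaves the matching method open, so the measure, the function names, and most of the numbers below are assumptions; only the pairs "mac" 35 and "fruit" 41 appear in the slides.

```python
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_sense(context: dict, senses: dict) -> str:
    """Pick the sense whose term vector is closest to the tag's context."""
    return max(senses, key=lambda name: cosine(context, senses[name]))


# "iphone" and "tree" and their counts are hypothetical filler terms.
senses = {
    "Apple_Inc.": {"mac": 35, "iphone": 20},
    "Apple": {"fruit": 41, "tree": 12},
}
# Hypothetical context built from a user's tag cloud.
context = {"mac": 3, "osx": 2}
print(best_sense(context, senses))
```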

Disambiguating Flickr Images

Building Better Profiles
• What tags correspond to interests?
  – Locations and topics are useful, but other terms are not
• TF / IDF Approach
  – It’s not that useful to find out we are all interested in HTML
• Making use of the Category hierarchy
  – If I’m interested in Facebook, Flickr, Last.fm, Delicious, etc., I can extrapolate the interest Online_Social_Networks
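Both ideas can be sketched briefly. The TF / IDF variant shown here (raw count times the log of inverse user frequency) and the flat child-to-parent category map are assumptions made for the example, as are the user names and tag lists.

```python
import math
from collections import Counter


def tf_idf(user: str, tags_by_user: dict) -> dict:
    """Weight a user's tags by how rare they are across all users."""
    n_users = len(tags_by_user)
    # Number of users who used each tag at least once.
    user_frequency = Counter(tag for tags in tags_by_user.values() for tag in set(tags))
    counts = Counter(tags_by_user[user])
    return {tag: count * math.log(n_users / user_frequency[tag])
            for tag, count in counts.items()}


def roll_up(interests: dict, parent_category: dict) -> dict:
    """Sum interest weights into their parent category when one is known."""
    totals = Counter()
    for interest, weight in interests.items():
        totals[parent_category.get(interest, interest)] += weight
    return dict(totals)


# Hypothetical data: a tag used by everyone gets weight 0.
tags_by_user = {"alice": ["html", "cuba", "cuba"], "bob": ["html", "music"]}
print(tf_idf("alice", tags_by_user))

# Hypothetical category map for the example on the slide.
parents = {"Facebook": "Online_Social_Networks", "Flickr": "Online_Social_Networks"}
print(roll_up({"Facebook": 1.0, "Flickr": 2.0, "Cuba": 0.5}, parents))
```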

[Figure: RDF graph of the sense information for the tag http://tagora.ecs.soton.ac.uk/tag/apple, which has two dbpedia:hasDbpediaSenseInfo links.
Sense info http://tagora.ecs.soton.ac.uk/tag/apple/sense-info/0: dbpedia:sense http://tagora.ecs.soton.ac.uk/dbpedia/resource/Apple_Inc. (with an owl:sameAs link); dbpedia:senseWeight 0.30628910807; dbpedia:hasTermFrequencyPair _:b9510f00000000a5 with dbpedia:hasTerm “mac” and dbpedia:hasTermFrequency 35.
Sense info http://tagora.ecs.soton.ac.uk/tag/apple/sense-info/1: dbpedia:sense http://tagora.ecs.soton.ac.uk/dbpedia/resource/Apple (with an owl:sameAs link); dbpedia:senseWeight 0.248912928; dbpedia:hasTermFrequencyPair _:b9510f00000000a5 with dbpedia:hasTerm “fruit” and dbpedia:hasTermFrequency 41.]
