The Social Fabric of Semantics - SemTech 2010
DESCRIPTION
Vocabulary construction is critical to the success of semantic technologies. Can we learn from communities where practical vocabularies have emerged?
TRANSCRIPT
The Social Fabric of Semantics
Jamie Taylor, Ph.D.
http://rdf.freebase.com/ns/en.jamie_taylor
Explicit Semantics in Surprising Places
microformats
HTML5 MicroData
Open Graph Protocol
RDFa
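As one concrete case of explicit semantics hiding in ordinary pages, the Open Graph Protocol puts its vocabulary in `<meta property="og:..." content="...">` tags. The sketch below collects those pairs with Python's standard `html.parser`; it is a simplified reader (a repeated property such as several `og:image` tags keeps only the last value), and the sample page is made up.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta property="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and attributes.get("content") is not None:
            # Simplification: a repeated property keeps only its last value.
            self.properties[prop] = attributes["content"]


def open_graph(html):
    """Return a dict of the og:* properties found in an HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Made-up sample page.
PAGE = """
<html><head>
  <meta name="description" content="Not part of the Open Graph vocabulary">
  <meta property="og:title" content="The Million Dollar Hotel" />
  <meta property="og:type" content="video.movie" />
</head><body></body></html>
"""

print(open_graph(PAGE))
```

Self-closing `<meta ... />` tags still reach `handle_starttag`, because `HTMLParser`'s default `handle_startendtag` forwards to it.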
We have overlooked the human “stack”
The Crisis of Vocabulary
Much formal analysis of knowledge representation
Little guidance on what actually works
Figure: entities connected by education, nationality, contained-by, contains, member-of, event, albums and label links
The arrangement of entities in a graph is not predetermined by a higher being
Figure: the same kinds of links (contains, contained-by, event, member-of, nationality, education, albums) arranged differently
Vocabulary is a social process
Semantics: To communicate meaning, resulting in an action
Or at least, so that Blue Guy can write code that responds to the graph in a way consistent with Red Guy's expectations
Vocabulary
"All the types of things you can say about something"
http://rdf.freebase.com/ns/en.paul_david_hewson
Figure: the entity above linked to Alison Hewson, EDUN, Mount Temple Comprehensive School, May 10, 1960, U2, Million Dollar Hotel, End of Violence, Elevation Partners, Show 8 and Dublin through the links spouse, date of birth, founder, producer, performer, education, born in and member of
Semantics are in the Links
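To make "the semantics are in the links" concrete, here is a minimal sketch that stores part of the figure as subject/link/object triples and lets consuming code ("Blue Guy") act on the link names chosen by the publisher ("Red Guy"). The pairings are our reading of the figure, and only unambiguous ones are included; the producer and performer links are left out. The function names and templates are hypothetical.

```python
# Subject is the Freebase id from the slide; pairings are read off the figure.
BONO = "http://rdf.freebase.com/ns/en.paul_david_hewson"

TRIPLES = [
    (BONO, "spouse", "Alison Hewson"),
    (BONO, "date of birth", "May 10, 1960"),
    (BONO, "born in", "Dublin"),
    (BONO, "member of", "U2"),
    (BONO, "education", "Mount Temple Comprehensive School"),
    (BONO, "founder", "Elevation Partners"),
    (BONO, "founder", "EDUN"),
]


def objects(triples, subject, predicate):
    """Follow one kind of link out of a subject."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Blue Guy's code: behavior is keyed on the link names Red Guy chose.
TEMPLATES = {
    "born in": "was born in {}",
    "member of": "is a member of {}",
    "founder": "founded {}",
}


def describe(triples, subject):
    """Render each outgoing link; unknown links fall back to 'link: value'."""
    return [TEMPLATES.get(p, p + ": {}").format(o) for s, p, o in triples if s == subject]


print(objects(TRIPLES, BONO, "founder"))
for line in describe(TRIPLES, BONO):
    print(line)
```

If Red Guy had used a different link name for "founder", Blue Guy's template would silently stop matching, which is the vocabulary problem in miniature.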
Do you understand the words that are coming out of my mouth?
The Twitter Vocabulary
@
#
Short URLs
Pivot on @
Pivot on Short URL
Pivot on #
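The three pivots can be sketched as building an index from each @mention, #tag or short URL to the tweets that contain it. This is an illustration under simple assumptions: the regular expressions are rough approximations of Twitter's entity rules, and the tweets, handles and links are placeholders.

```python
import re
from collections import defaultdict

# Rough approximations of Twitter's entity syntax.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
SHORT_URL = re.compile(r"https?://\S+")


def pivot(tweets, pattern, normalize=lambda key: key):
    """Index tweets by every distinct (normalized) key the pattern finds in them."""
    index = defaultdict(list)
    for tweet in tweets:
        for key in {normalize(match) for match in pattern.findall(tweet)}:
            index[key].append(tweet)
    return dict(index)


# Made-up tweets with placeholder handles, tags and links.
TWEETS = [
    "Slides from @alice on vocabularies http://example.com/x7Q #semtech",
    "@bob agreed, #SemTech was great",
    "Worth reading http://example.com/x7Q",
]

by_tag = pivot(TWEETS, HASHTAG, str.lower)   # treat tags as case-insensitive
by_user = pivot(TWEETS, MENTION, str.lower)
by_link = pivot(TWEETS, SHORT_URL)           # short-URL paths may be case-sensitive
print(sorted(by_tag), sorted(by_user), sorted(by_link))
```

Each index is a pivot: pick a key and you get every tweet that shares it, which is how @, # and short URLs turn individual messages into conversations, topics and shared links.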
#
Broadcast: $U(n) = n$
Telephone (Metcalfe's Law): $U(n) = n^2$
Group Network Formation (Reed's Law): $U(n) = 2^n$
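A quick way to see how fast the three growth laws diverge is to tabulate them. The sketch below uses the slide's formulas directly; the extra `groups_of_two_or_more` helper (our addition) counts the subgroups with at least two members exactly, which equals $2^n - n - 1$ and shows where Reed's $2^n$ comes from.

```python
from math import comb


def broadcast_value(n):
    """Broadcast: value grows linearly with the audience, U(n) = n."""
    return n


def metcalfe_value(n):
    """Metcalfe's Law: value tracks pairwise links, approximated as U(n) = n**2."""
    return n ** 2


def reed_value(n):
    """Reed's Law: value tracks the number of possible groups, U(n) = 2**n."""
    return 2 ** n


def groups_of_two_or_more(n):
    """Exact count of subgroups with at least two members: sum of C(n, k) for k >= 2."""
    return sum(comb(n, k) for k in range(2, n + 1))


for n in (2, 5, 10, 20):
    print(n, broadcast_value(n), metcalfe_value(n), reed_value(n), groups_of_two_or_more(n))
```

At $n = 20$ the broadcast value is 20, the Metcalfe value is 400 and the Reed value is over a million, which is why group-forming conventions such as #tags matter so much.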
Reed's Law
Figure: value vs. N for the curves $N$, $N^2$ and $2^N$
Figure: the same curves, labeled Broadcast, Email and Chatrooms
Figure: the same curves, labeled Tweets, #tags and Feeds
#tags are a USER invention!
Figure: the same curves, labeled Folksonomy, ??? and Ontology
Twannotations
Tweets have "type"
Name/Value Structure
What's the vocabulary?
• Anything you want
• Lead by example
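The "type" plus name/value idea can be sketched with plain dictionaries. The payload shape here is an assumption for illustration, not Twitter's documented Annotations format: each annotation maps one type name to a dict of attributes. Counting which (type, attribute) pairs people actually use is one way to watch an open vocabulary converge when users "lead by example".

```python
from collections import Counter, defaultdict

# Hypothetical payload shape: each annotation maps one type name to
# a dict of name/value attributes. All data is made up.
TWEETS = [
    {"text": "Watching this tonight",
     "annotations": [{"movie": {"title": "The Million Dollar Hotel"}}]},
    {"text": "Loved it",
     "annotations": [{"movie": {"title": "The Million Dollar Hotel", "rating": "4/5"}}]},
    {"text": "On repeat",
     "annotations": [{"album": {"artist": "U2"}}]},
    {"text": "No annotations on this one"},
]


def pivot_on_type(tweets):
    """Group tweet texts by annotation type."""
    by_type = defaultdict(list)
    for tweet in tweets:
        for annotation in tweet.get("annotations", []):
            for type_name in annotation:
                by_type[type_name].append(tweet["text"])
    return dict(by_type)


def vocabulary_usage(tweets):
    """Count how often each (type, attribute name) pair is used."""
    usage = Counter()
    for tweet in tweets:
        for annotation in tweet.get("annotations", []):
            for type_name, attributes in annotation.items():
                usage.update((type_name, name) for name in attributes)
    return usage


print(pivot_on_type(TWEETS))
print(vocabulary_usage(TWEETS).most_common())
```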
Vocabulary and Visibility
Pros: Feedback, Incentive, Training, Convergence
Cons: Usage for side effects
Lessons from everyday vocabulary
Figure: Wikipedia Word Frequency, frequency (0 to 20,000,000) against rank (0 to 120)
Data from Victor S. Grishchenko
Zipf’s Law
Plot by Victor Grishchenko
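Zipf's Law says a word's frequency is roughly proportional to $1/\text{rank}^s$ with $s$ close to 1. The sketch below ranks words by count and estimates $s$ as the negative least-squares slope of log(count) against log(rank). The tokenizer is a deliberately crude assumption (lowercase letters and apostrophes only), and `zipf_exponent` needs at least two distinct ranks.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_exponent(counts):
    """Estimate s in count ~ C / rank**s from counts sorted by rank (needs >= 2 counts).

    Fits a least-squares line to log(count) vs. log(rank); the slope is about -s.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


print(rank_frequency("the cat and the hat and the bat")[:2])
# Ideal Zipf counts (1200 / rank) give an exponent of about 1.
print(zipf_exponent([1200, 600, 400, 300, 240, 200]))
```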
Zipf’s Explanation
Law of Least Effort:
Use a few common words to communicate the main concept
Use a few rare words to disambiguate concepts
Satisficing
535,393 Categories
2k French Films
17 films
Schema Principle #1
Use Types Liberally:
Use a few large, encompassing Types to provide general information
Use several smaller, fine-grained Types to provide detailed information
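Principle #1 can be illustrated with co-typing: every entity carries one or two broad types plus whichever fine-grained types apply, and queries combine them. The entities and data below are made up, and the type IDs only imitate Freebase's `/domain/type` naming.

```python
from collections import Counter

# Made-up entities; type IDs imitate Freebase-style /domain/type naming.
ENTITIES = {
    "person_a": {"/people/person", "/music/artist", "/film/producer"},
    "person_b": {"/people/person", "/music/artist"},
    "person_c": {"/people/person"},
    "city_d": {"/location/location"},
}


def with_types(entities, *required):
    """Entities carrying every one of the required types."""
    return sorted(name for name, types in entities.items() if set(required) <= types)


def type_sizes(entities):
    """How many entities use each type: broad types are big, fine-grained ones small."""
    return Counter(t for types in entities.values() for t in types)


print(with_types(ENTITIES, "/people/person"))                  # broad: general information
print(with_types(ENTITIES, "/music/artist", "/film/producer"))  # narrow: disambiguates
print(type_sizes(ENTITIES).most_common())
```

The broad type answers "what kind of thing is this?" for almost everything, while small types are added only where they say something useful, echoing the common-word/rare-word split in Zipf's explanation.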
The Freebase Commons
American football · Anime/Manga · Architecture · Astronomy · Automotive · Aviation · Awards · Baseball · Basketball · Bicycles · Biology · Boats · Broadcast · Business · Celebrities · Chemistry · Comics · Common · Computers · Conferences · Cricket · Data World · Digicams · Education · Engineering · Event · Clothing and Textiles · Fictional Universes · Film · Food & Drink · Freebase · Games · Geology · Government · Hobbies and Interests · Ice Hockey · Influence · Internet · Language · Law · Library · Location · Martial Arts · Measurement Unit · Media Common · Medicine · Metaweb Types · Meteorology · Military · Music · Olympics · Opera · Organization · People · Geography · Projects · Protected Places · Publishing · Radio · Rail · Religion · Royalty · Soccer · Spaceflight · Sports · Symbols · Tennis · Theater · Time · Transportation · Travel · TV · Video Games · Visual Art
Top-level domains
schema = vocabulary
Ontologies you design will be too complicated because almost all people will use a small subset of it
Ontologies you design will be too simple because there will be a long tail of users who will want to express something you didn’t cover
--Colin Evans (Metaweb)
Solution:
• Provide a core
• Let the community tune the specifics to their needs
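One way to read "provide a core, let the community tune the specifics" as a schema design is layering: a small centrally maintained core plus community extensions that may add properties but not redefine core ones. Everything in this sketch is hypothetical: the type and property names, the community names and the no-redefinition rule are illustrative assumptions, not Freebase's actual mechanism.

```python
# Hypothetical layered schema; all names are illustrative only.
CORE = {"/tv/tv_program": {"name", "genre", "country_of_origin"}}

EXTENSIONS = {
    "anime_fans": {"/tv/tv_program": {"animation_studio", "source_manga"}},
    "soap_fans": {"/tv/tv_program": {"broadcast_slot"}},
}


def find_clashes(core, extension):
    """Properties an extension tries to redefine in the core (to be rejected)."""
    clashes = {}
    for type_id, props in extension.items():
        overlap = props & core.get(type_id, set())
        if overlap:
            clashes[type_id] = overlap
    return clashes


def effective_properties(core, extensions, type_id, communities):
    """Core properties plus whatever the chosen communities have added."""
    props = set(core.get(type_id, set()))
    for community in communities:
        props |= extensions.get(community, {}).get(type_id, set())
    return props


print(sorted(effective_properties(CORE, EXTENSIONS, "/tv/tv_program", ["anime_fans"])))
```

Most users only ever see the core, which keeps it small; the long tail lives in extensions that do not burden everyone else.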
What is a Politician?
Schema Principle #2
Avoid Types which "carve out" categories of things
"Original TV Program"
• Is a TV Program
• Isn't an adaptation of a film
• Isn't an adaptation of a book
• Isn't an adaptation of a play
• Wasn't spun off from another TV Program
• Hasn't spun off any other TV Programs
"Original TV Program"
[{
  "name": null,
  "type": "/tv/tv_program",
  "b:type": { "id": "/media_common/adaptation", "optional": "forbidden" },
  "spun_off_from": [{ "id": null, "optional": "forbidden" }],
  "spin_offs": [{ "id": null, "optional": "forbidden" }]
}]
Show as Two Views
not an MQL query
Principle #2 Corollary
Strive for bright lines between Types
• Let queries and simple types do the work
• Better data quality that is easier to maintain
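Instead of a carved-out "Original TV Program" type, the same set can be derived on demand from simple types and links. The sketch below applies the MQL query's constraints (a TV program, not co-typed as an adaptation, no spun-off-from or spin-off links) to made-up in-memory records; it is plain Python, not MQL and not a Freebase API.

```python
# Made-up records; "types" imitates Freebase co-typing, links are name lists.
PROGRAMS = [
    {"name": "Show A", "types": {"/tv/tv_program"}, "spun_off_from": [], "spin_offs": []},
    {"name": "Show B", "types": {"/tv/tv_program", "/media_common/adaptation"},
     "spun_off_from": [], "spin_offs": []},
    {"name": "Show C", "types": {"/tv/tv_program"}, "spun_off_from": ["Show D"], "spin_offs": []},
    {"name": "Show D", "types": {"/tv/tv_program"}, "spun_off_from": [], "spin_offs": ["Show C"]},
]


def original_tv_programs(programs):
    """Same constraints as the MQL query: a TV program, not an adaptation, no spin-off links."""
    return [
        p["name"]
        for p in programs
        if "/tv/tv_program" in p["types"]
        and "/media_common/adaptation" not in p["types"]
        and not p["spun_off_from"]
        and not p["spin_offs"]
    ]


print(original_tv_programs(PROGRAMS))
```

Because "original" is computed rather than asserted, nobody has to remember to retype Show D when Show C is added; the view simply stops including it.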
What are you sitting on?
Chair
Furniture
Folding Chair
Natural Category
Added Features?
What does one look like?
Eleanor Rosch
HTML5 MicroData
Open Graph Protocol
#
Addendum
Social Network Analysis Resources
Wikipedia
Jon Kleinberg
http://www.cs.cornell.edu/home/kleinber
Kwak et al. WWW2010
http://an.kaist.ac.kr/traces/WWW2010.html
Modeling Resources
McGuinness & Noy's Ontologies 101
Attend when possible!
http://ksl.stanford.edu/people/dlm/papers/ontology101
Toward Principles for the Design of Ontologies Used for Knowledge Sharing
http://tomgruber.org/writing/onto-design.htm
Allemang & Hendler
Semantic Web for the Working Ontologist