Semantic Search on the Public Web with Creative Commons
2006.03.07
Mike Linksvayer
Billion$ (0)
Let's get the hype out of the way...
Billion$ (1)
Let's get the hype out of the way...
Billion$ (2)
Let's get the hype out of the way...
Billion$ (3)
This calls for a mashup...
Billion$ (4)
Billion$ (5)
Fortunately CC's founders thought of that from the beginning...
Billion$ (6)
Billion$ (7)
About Creative Commons
Core Licensing Suite: creator/licensor chooses license options
Every Creative Commons license allows the world to copy and distribute a work, provided that the licensee credits the creator/licensor.
In addition, the creator/licensor may apply the following conditions:
• NonCommercial
• No Derivatives
• ShareAlike
Simple License Generator
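The license options combine mechanically into one of six licenses. A minimal Python sketch of that mapping, assuming the http://creativecommons.org/licenses/<code>/<version>/ URL pattern used in the metadata example below; the function name and the option values are hypothetical, not the generator's actual code:

```python
# Hypothetical sketch of what a simple license chooser computes; assumes the
# http://creativecommons.org/licenses/<code>/<version>/ URL pattern.
VALID_DERIVATIVES = ("yes", "no", "sharealike")


def license_url(noncommercial: bool, derivatives: str, version: str = "2.5") -> str:
    """Map license options to a Creative Commons license URL.

    derivatives is "yes" (derivatives allowed), "no" (No Derivatives) or
    "sharealike" (derivatives allowed under the same license). Using one
    three-valued option makes No Derivatives and ShareAlike mutually exclusive.
    """
    if derivatives not in VALID_DERIVATIVES:
        raise ValueError(f"derivatives must be one of {VALID_DERIVATIVES}")
    parts = ["by"]  # every license requires attribution
    if noncommercial:
        parts.append("nc")
    if derivatives == "no":
        parts.append("nd")
    elif derivatives == "sharealike":
        parts.append("sa")
    return f"http://creativecommons.org/licenses/{'-'.join(parts)}/{version}/"


print(license_url(noncommercial=True, derivatives="sharealike"))
```

Choosing NonCommercial plus ShareAlike yields the by-nc-sa license used in the metadata example.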
Internet Archive
Free Hosting for CC works
http://www.archive.org/
Creative Commons Metadata
Creative Commons Metadata Example
<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://example.com/article.html">
    <dc:title>An Example Article</dc:title>
    <dc:date>2003-10-01</dc:date>
    <dc:type rdf:resource="http://purl.org/dc/dcmitype/Text" />
    <license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/" />
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by-nc-sa/2.5/">
    <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <permits rdf:resource="http://web.resource.org/cc/Distribution" />
    <requires rdf:resource="http://web.resource.org/cc/Notice" />
    <requires rdf:resource="http://web.resource.org/cc/Attribution" />
    <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
    <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
    <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
  </License>
</rdf:RDF>
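Because the license terms are spelled out as permits/requires/prohibits, a consumer can read them without knowing license URLs in advance. A minimal sketch using Python's standard xml.etree.ElementTree, assuming one Work per RDF block; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
CC = "{http://web.resource.org/cc/}"

# Trimmed copy of the example above (dc:title, dc:date, dc:type omitted).
EXAMPLE_RDF = """<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://example.com/article.html">
    <license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/" />
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by-nc-sa/2.5/">
    <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <permits rdf:resource="http://web.resource.org/cc/Distribution" />
    <requires rdf:resource="http://web.resource.org/cc/Notice" />
    <requires rdf:resource="http://web.resource.org/cc/Attribution" />
    <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
    <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
    <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
  </License>
</rdf:RDF>"""


def license_terms(rdf_xml: str) -> dict:
    """Return the Work's license URL plus its permits/requires/prohibits terms."""
    root = ET.fromstring(rdf_xml)
    work = root.find(f"{CC}Work")
    license_el = work.find(f"{CC}license") if work is not None else None
    if license_el is None:
        return {}
    url = license_el.get(f"{RDF}resource")
    terms = {"license": url, "permits": [], "requires": [], "prohibits": []}
    for lic in root.findall(f"{CC}License"):
        if lic.get(f"{RDF}about") != url:
            continue  # describes some other license
        for kind in ("permits", "requires", "prohibits"):
            for el in lic.findall(f"{CC}{kind}"):
                # Keep only the last path segment, e.g. ".../cc/Attribution" -> "Attribution"
                terms[kind].append(el.get(f"{RDF}resource").rsplit("/", 1)[-1])
    return terms


print(license_terms(EXAMPLE_RDF))
```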
Rights Description Use Cases
Discovery
Expression
Commerce
Management (1)
Rights Description vs. Rights Management (2)
Copy/use promotion vs. copy/use protection
Encourage fans vs. discourage casual pirates
Resource management vs. customer management
Web content model vs. 20th-century content model
Not mutually exclusive in theory.
Why Semantic Web?
Small organization, no central registration for every license
Decentralization: let a thousand search engines bloom; web as API
Existing RDF tools could take advantage of CC RDF
Why RDF-in-HTML comments? (yuck)
Considered:
• Robots.txt-like file
• HTML meta tags
• LINK to external RDF file
RDF-in-HTML comments wins because:
• Metadata colocated with human-visible HTML, so licensors need only a single copy & paste
• Full power of RDF
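On the consuming side, a crawler has to dig the RDF back out of the comments. A minimal sketch, assuming the RDF block is the entire body of a comment and starts with <rdf:RDF; the regex approach and function name are illustrative assumptions, not CC's crawler code:

```python
import re
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
CC = "{http://web.resource.org/cc/}"
# Non-greedy, and DOTALL so a comment may span several lines.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def embedded_licenses(html: str) -> list:
    """Return license URLs declared in RDF blocks hidden inside HTML comments."""
    found = []
    for body in COMMENT.findall(html):
        body = body.strip()
        if not body.startswith("<rdf:RDF"):
            continue  # an ordinary comment, not metadata
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            continue  # malformed metadata: skip it rather than abort the crawl
        for el in root.iter(f"{CC}license"):
            url = el.get(f"{RDF}resource")
            if url:
                found.append(url)
    return found


PAGE = """<html><body>
<p>Some article text.</p>
<!-- navigation starts here -->
<!--
<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="">
    <license rdf:resource="http://creativecommons.org/licenses/by/2.5/" />
  </Work>
</rdf:RDF>
-->
</body></html>"""

print(embedded_licenses(PAGE))
```

Browsers ignore the comment, which is exactly why the licensor's single copy & paste works without disturbing the visible page.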
CC Search History I
PostgreSQL/tsearch2/Python prototype (early 2004)
Sloooowwwww, but did what a prototype should do
CC Search History II
CC-Nutch (late 2004)
Nutch aims to be an open source search engine comparable to commercial web-scale search engines
Built on top of the Lucene full-text index
CC plugin only ~500 lines of code (not counting UI or CC-required additions to Nutch core)
http://search.creativecommons.org uses Nutch; >1m CC-licensed pages indexed
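Conceptually, a license-aware search engine stores license metadata next to the full-text index so a query can be restricted to licensed pages. A toy Python sketch of that idea; it is not Nutch's or Lucene's API, and the class, method names and example URLs are hypothetical:

```python
from collections import defaultdict


class LicenseAwareIndex:
    """Toy inverted index that also stores each page's license URL (or None)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of page URLs
        self.licenses = {}                # page URL -> license URL or None

    def add(self, url, text, license_url=None):
        self.licenses[url] = license_url
        for term in text.lower().split():
            self.postings[term].add(url)

    def search(self, query, license_prefix=None):
        """AND-query over terms, optionally keeping only pages whose license
        URL starts with license_prefix."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        if license_prefix is not None:
            hits = {u for u in hits
                    if (self.licenses[u] or "").startswith(license_prefix)}
        return sorted(hits)


idx = LicenseAwareIndex()
idx.add("http://a.example/song", "free music remix",
        "http://creativecommons.org/licenses/by/2.5/")
idx.add("http://b.example/song", "music remix")  # no license metadata found
print(idx.search("music remix", "http://creativecommons.org/licenses/"))
```

The same filter-by-metadata idea is what lets a large engine expose a CC-licensed subset of its index.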
CC Search History III
Yahoo! Search for Creative Commons (early 2005)
Search CC-licensed subset of Yahoo!’s index (~15m* pages)
*very rough guesstimate
CC Search History IV
Google CC search (November 2005)
Search CC-licensed subset of Google’s index (~45m* pages)
*very rough guesstimate
CC Search History V (the future)
Better metadata formats
Image and Video search
Derivatives search
Content commerce search
“Live” web search
“Management” (desktop, workgroup)
Semantic mashups
Future CC metadata formats
“Semantic XHTML” AKA “lowercase semantic web” AKA “microformats” (now):
<a rel="license" href="http://creativecommons.org/licenses/by/2.5/">
RDF/A AKA XHTML2 metadata (in working group)
GRDDL (Gleaning Resource Descriptions from Dialects of Languages)
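With the microformat approach, the license is just an ordinary link annotated with rel="license", so any HTML parser can find it. A minimal sketch with Python's standard html.parser; treating <link> elements the same as <a> is an assumption, and the class and function names are hypothetical:

```python
from html.parser import HTMLParser


class RelLicenseParser(HTMLParser):
    """Collect href values of <a> and <link> elements whose rel includes "license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        # rel is a space-separated list of tokens, e.g. rel="license nofollow"
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "license" in rels and href:
            self.licenses.append(href)


def rel_licenses(html):
    parser = RelLicenseParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


snippet = ('<p>This work is licensed under a <a rel="license" '
           'href="http://creativecommons.org/licenses/by/2.5/">'
           'Creative Commons License</a>.</p>')
print(rel_licenses(snippet))
```

Compared with RDF-in-comments, the metadata here is the visible link itself, so it cannot drift out of sync with what readers see.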
Image and Video searchImage and Video search
Searching for Derivative Works
Creative Commons (0)
Derivatives search
RDF/XML snippet:
<dc:source rdf:resource="http://ccmixter.org/media/files/victor/3385"/>
Query like a Yahoo! link: search or a Technorati Cosmos search:
source:http://ccmixter.org/media/files/victor/3385
“Who sampled this” as the new “who linked to this”
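Answering a source: query amounts to inverting the dc:source links, much as link: search inverts hyperlinks. A minimal sketch, assuming each crawled page yields one RDF block; the remix URLs and helper names are hypothetical:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"
ORIGINAL = "http://ccmixter.org/media/files/victor/3385"


def sample_rdf(about, source):
    """Build a minimal (hypothetical) RDF block saying `about` derives from `source`."""
    return (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f'<rdf:Description rdf:about="{about}">'
        f'<dc:source rdf:resource="{source}"/>'
        '</rdf:Description></rdf:RDF>'
    )


def build_source_index(pages):
    """pages: iterable of (page URL, RDF/XML). Returns source URL -> sorted derivatives."""
    index = defaultdict(set)
    for page_url, rdf_xml in pages:
        for src in ET.fromstring(rdf_xml).iter(f"{DC}source"):
            source_url = src.get(f"{RDF}resource")
            if source_url:
                index[source_url].add(page_url)
    return {src: sorted(urls) for src, urls in index.items()}


def answer(query, index):
    """Handle a query of the form 'source:<url>'."""
    prefix = "source:"
    if not query.startswith(prefix):
        raise ValueError("expected a source: query")
    return index.get(query[len(prefix):], [])


PAGES = [
    ("http://remix.example/a", sample_rdf("http://remix.example/a", ORIGINAL)),
    ("http://remix.example/b", sample_rdf("http://remix.example/b", ORIGINAL)),
    ("http://remix.example/c", sample_rdf("http://remix.example/c", "http://other.example/x")),
]
index = build_source_index(PAGES)
print(answer("source:" + ORIGINAL, index))
```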
Content commerce search
Transaction costs should be low even if rights are reserved
Commercial terms and other commerce described by metadata associated with a work
Find me work I can use at a price I can pay for:
• usage rights
• warranty/paper trail (even if rights not reserved)
Reintermediate consumer and creator
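The slides do not define a commerce metadata format, so the following is only a hypothetical sketch of the "find me work I can use at a price I can pay for" query: each record carries whether its CC license is NonCommercial and an assumed price for a separate commercial grant. The Offer record, its fields and the example URLs are all invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    """Hypothetical search record; the slides do not define a commerce schema."""
    work: str
    noncommercial: bool                        # license prohibits CommercialUse
    commercial_price: Optional[float] = None   # hypothetical price of a separate
                                               # commercial grant; None = not offered


def find_usable(offers, commercial_use, budget):
    """Return (work, cost) pairs usable for the intended purpose within
    budget, cheapest first."""
    matches = []
    for offer in offers:
        if not commercial_use or not offer.noncommercial:
            cost = 0.0  # the CC license already permits this use
        elif offer.commercial_price is not None:
            cost = offer.commercial_price  # buy the reserved right
        else:
            continue  # rights reserved and no price advertised
        if cost <= budget:
            matches.append((offer.work, cost))
    return sorted(matches, key=lambda m: (m[1], m[0]))


offers = [
    Offer("http://a.example/song", noncommercial=False),
    Offer("http://b.example/photo", noncommercial=True, commercial_price=25.0),
    Offer("http://c.example/clip", noncommercial=True),
]
print(find_usable(offers, commercial_use=True, budget=30.0))
```

Works whose license already covers the intended use cost nothing; a reserved right is only useful to the searcher if its price is published as machine-readable metadata.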
“Live” web search (feeds)
Feeds are explicitly metadata-rich (unlike the typical web page)
Existing blog search ignores metadata
Web search will become more like blog search, and vice versa?
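Feeds can already carry license metadata per channel or per item. A minimal sketch, assuming the RSS 2.0 creativeCommons module (element creativeCommons:license) and that an item without its own license inherits the channel's; the feed content and function name are hypothetical:

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS 2.0 Creative Commons module (creativeCommons:license).
CC_RSS = "{http://backend.userland.com/creativeCommonsRssModule}"

FEED = """<rss version="2.0"
  xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
<channel>
  <title>Example blog</title>
  <creativeCommons:license>http://creativecommons.org/licenses/by/2.5/</creativeCommons:license>
  <item><link>http://blog.example/1</link></item>
  <item>
    <link>http://blog.example/2</link>
    <creativeCommons:license>http://creativecommons.org/licenses/by-nc/2.5/</creativeCommons:license>
  </item>
</channel>
</rss>"""


def item_licenses(rss_xml):
    """Return (item link, license URL) pairs; an item without its own
    license inherits the channel-level license, if any."""
    channel = ET.fromstring(rss_xml).find("channel")
    default = channel.findtext(f"{CC_RSS}license")
    return [
        (item.findtext("link"), item.findtext(f"{CC_RSS}license", default))
        for item in channel.findall("item")
    ]


for link, lic in item_licenses(FEED):
    print(link, lic)
```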
“Management” (desktop, workgroup)
Desktop search (OS-level)
Content creation and media player integration
XMP
Semantic Wikis
Semantic mashups
Issues for Semantic Search on the Public Web
Metadata quality
Trust
Scalability
Usability
Compatibility
Critical mass
State-of-the-art IR works very well: high expectations!
Questions, feedback, flames:
http://developer.creativecommons.org