![Page 1: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/1.jpg)
Free your Data: Instant Gratification with the Semantic Web
David Karger
![Page 2: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/2.jpg)
Why everyone should be their own database administrator,
UI designer, application developer, and
web site builder, and how they can
David Karger
![Page 3: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/3.jpg)
A Semantic Web Vision
• Autonomous computational agents perform sophisticated information tasks on behalf of their human users
• Use data that is annotated with rich semantics
– Ontologies that explain precisely what the data means
– Schema annotations that explain how to align multiple ontologies
– Rules that explain how new data can be formally derived from existing data
– Inference systems that put it all together
– Lots of logicians and AI researchers developing tools
• This vision is frightening
– Involves solving problems that have bedeviled AI for decades
– Often used to attack the semantic web
– Or to argue to slow down deployment
* “we can’t put up that data until we have an ontology!”
![Page 4: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/4.jpg)
Aim Lower: the Semimantic Web
• Not “make computers help” but “make them not hinder”
– “First, do no harm”
• Create a tiny bit of structure:
– Name objects (with URLs)
– Record named relations between them
– No semantics on relations
– No schemas
– No inference
• This is both
– Technically simple
– Immediately useful
• You should do it
– And you can right now
![Page 5: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/5.jpg)
Why Applications?
• Typical user tasks require interaction with multiple pieces of information
– Display
– Explore
– Query
– Manipulate
• Applications bring together the data, specialized views, and operations necessary to perform tasks
![Page 6: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/6.jpg)
![Page 7: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/7.jpg)
![Page 8: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/8.jpg)
• Irrelevant info
– Distracting
– Covers up more important info
• Artist
– Of dance, not music
– ID3v2 added “Composer”
– shown in wrong place
• No “difficulty” field
– Place in comment field
– Uses field up
– Where put “tempo”?
• Menu of genre choices
– My genre (of dance, not music) missing
– ID3v2 lets user add
![Page 9: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/9.jpg)
Summary of Problems
• Application has fixed idea of “right” data
– Both properties and values for them
• And right way to display that data
• User wants to “stretch” the app to their needs
– Cannot hide irrelevant data
– Cannot incorporate new kinds of data
– Cannot change how data is presented
• Perhaps just use generic comment field?
– Add what you want
– Format how you want
![Page 10: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/10.jpg)
• Properties have structure
– Used for layout
– And for browsing
![Page 11: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/11.jpg)
Sometimes, one application isn’t enough
• Applications inappropriately partition task
– Because task wasn’t planned for in application design
• No application has all the necessary data, operations
– Need to launch several to do task
• Each includes unneeded data, operations
– Clutter distracts from what you need to see
• Can’t work with data “across” application boundaries
– Can’t record or view data connections
– Have to find it again in second application
– Or enter it manually a second time
* Type budget numbers on Post-its to move to other application
![Page 12: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/12.jpg)
![Page 13: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/13.jpg)
![Page 14: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/14.jpg)
![Page 15: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/15.jpg)
![Page 16: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/16.jpg)
![Page 17: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/17.jpg)
Why?
• Building applications is hard
– Done by expert few for the many
– They determine which data, views, operations are useful
• Applications are “mass produced”
– Everybody gets the same one
– And only built for large markets
– Word processor, email, photo album, …
• Problem: different people want different applications
– Basket weaving, UFO sightings, junkyard management
– Want to work with unusual information
– Want to see, navigate, manipulate it “their way”
• Developers can’t afford to build these boutique applications
![Page 18: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/18.jpg)
What about the Web?
• Anything can get a URL
• Anything can go in a page, linked to anything
– Common to “schematize on the fly”, making lists of interesting properties/values
• Support for orienteering
– Scan list of choices
– Pick the one that seems to lead in the right direction
– Fact: people orienteer even when there’s an easy query that is faster
– On web, never bounce off an application boundary
![Page 19: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/19.jpg)
Downside
• Hard to author
– Especially if I want to record lots of complex data
• Hard to manipulate, do complex queries
– HTML loses meaning of data
– Can’t “switch to tabular view”
• That’s why web sites are backed by databases
– Data is kept structured to support complex queries
– Templating engines convert to human readable presentation
• End users aren’t going to manage this kind of web site
• Gives powerful operations, but only “inside” web site
– User may discover need to cross site boundaries
– Like applications, web sites create (possibly wrong) data partitions
– So all the problems with applications apply here too
![Page 20: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/20.jpg)
![Page 21: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/21.jpg)
![Page 22: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/22.jpg)
![Page 23: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/23.jpg)
![Page 24: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/24.jpg)
Not just music
• Scientific research generates masses of data
– E.g. Bioinformatics
• Others want to access that data
• Big standards bodies meet to decide on community standard formats and systems under which everyone will distribute data
• When a scientist wants to try or report something new, or needs data from outside the community, they are stuck.
![Page 25: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/25.jpg)
Information Wants to be Free
• Applications and Web Sites make assumptions about how their data will be used
• Those assumptions are hard-coded into the interaction with the data
• But no developer can predict all uses of the data
• Fixed interfaces prevent data repurposing
• Solution: give direct access to the data
• Just set up a SQL server?
– (A long-running screed of the DB community)
![Page 26: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/26.jpg)
But it Can’t be Just about the Data
• People need to look at the data
– (unless we figure out those autonomous agents…)
• And need to create it in the first place
• Apps and template-driven web sites give us nice interfaces for interacting with the data they manage
• But if we use them we can’t repurpose the data
• And what interface can we use for the repurposed data?
• Web needed a server (of data) and a client (to show it)
• How do we make viewing, authoring, and repurposing arbitrary data as easy as viewing and authoring web pages?
– Without knowing precisely what data people will want to view or how they will want to view it?
![Page 27: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/27.jpg)
Example: Piggy Bank
• I need data from more than one web site
• And I need to look at it differently than any web site
• What is minimum necessary support?
• Piggy Bank: A Firefox plugin for navigating structured data
![Page 28: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/28.jpg)
• Find some movies
![Page 29: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/29.jpg)
• Free that data
![Page 30: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/30.jpg)
• Show it a different way
![Page 31: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/31.jpg)
• Combine it with other sources
![Page 32: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/32.jpg)
Mash Ups?
• Developer decides to integrate data from multiple sites
• Writes programmatic “scrapers” – reverse the web site’s templating process to recover data
• Combines resulting data structures
• Presents using their own template driven web site
– Thus guilty of same sin as the one they are fighting
– I only get the mash-ups a programmer decides to create
• Piggy Bank lets end users do their own mashing
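The idea of a scraper as a template run backwards can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TEMPLATE` string, the field names, and the second movie listing are hypothetical, and a real site's markup would usually call for an HTML parser rather than a single regular expression.

```python
import re

# Hypothetical template the site is assumed to use for each movie listing.
TEMPLATE = '<li class="movie"><b>{title}</b> at <i>{venue}</i>, {time}</li>'

# The scraper is the template run backwards: each slot becomes a named group.
PATTERN = re.compile(
    r'<li class="movie"><b>(?P<title>.*?)</b> at <i>(?P<venue>.*?)</i>, '
    r'(?P<time>.*?)</li>'
)


def scrape(html):
    """Recover one dict of named fields per listing found in the page."""
    return [m.groupdict() for m in PATTERN.finditer(html)]


# Simulate the site: structured data goes in, presentation-only HTML comes out.
rows = [
    {"title": "Superman", "venue": "Loew's", "time": "8PM"},
    {"title": "Batman", "venue": "Loew's", "time": "10PM"},
]
page = "<ul>" + "".join(TEMPLATE.format(**r) for r in rows) + "</ul>"

# The scraper recovers exactly the rows that the template consumed.
print(scrape(page) == rows)
```

Once the rows are recovered as named fields, they can be combined with data from other sites instead of staying locked inside one site's presentation.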
![Page 33: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/33.jpg)
Data Model
![Page 34: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/34.jpg)
RDF
• W3C standard
• Minimum data model
– URL for arbitrary objects
– Arbitrary named links between two objects
– No schemas
• Much like the web, except
– URLs need not be web pages
– Machine readable “anchor text” in links
• Yet Powerful
– Relations are natural/universal
– Represent a semantic network
[Figure: example semantic network. Node labels: Loew’s, Superman, Kendall Sq., 8PM, Movie, Theater. Link labels: title, venue, location, time, type.]
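A minimal sketch of this data model, assuming Python and a plain in-memory set of triples. The `ex:` identifiers and the `match` helper are hypothetical names used only for illustration, and the way the nodes are wired together is a guess at the slide's diagram rather than a transcription of it.

```python
# Each statement is a (subject, link name, object) triple. No schema is
# declared anywhere: a new link name can be used the moment it is needed.
# All identifiers below are illustrative, not real URLs.
triples = {
    ("ex:movie1", "type", "Movie"),
    ("ex:movie1", "title", "Superman"),
    ("ex:movie1", "venue", "ex:loews"),
    ("ex:movie1", "time", "8PM"),
    ("ex:loews", "type", "Theater"),
    ("ex:loews", "location", "Kendall Sq."),
}


def match(store, subject=None, link=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (link is None or t[1] == link)
        and (obj is None or t[2] == obj)
    )


# Follow two links: where is the movie titled "Superman" playing?
movie = match(triples, link="title", obj="Superman")[0][0]
venue = match(triples, subject=movie, link="venue")[0][2]
where = match(triples, subject=venue, link="location")[0][2]
print(where)
```

Adding a new kind of fact, such as a ticket price, would only mean adding another triple with a new link name; nothing else has to change.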
![Page 35: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/35.jpg)
Are we done?
• Is RDF the only answer?
– SQL/Tuples, XML can represent same info
– So any would do
– And user shouldn’t have to know which we’ve chosen
– But RDF is easiest to create sloppily, incrementally
* So best suited to let enthusiasts create some
– And imposes fewest requirements to be “compatible”
• Is RDF the whole answer?
– Still unclear how to interact with it
![Page 36: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/36.jpg)
Visualization
![Page 37: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/37.jpg)
Lenses
• If data is amorphous, monolithic UI won’t do
– Can’t know in advance what kind of data we’ll need to display
– Or what user will want to do with that data
• Let each type come with “view prescription”
– “To display a document, show its title, author, and abstract”
– “To display a person, show his name and affiliation”
– Specifies properties to show, and “decoration” (fonts, layouts)
• After you get the data, assemble lenses to show it
– (recursively)
• Lenses are described in RDF
– So they can be collected, repurposed like any other data
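The “view prescription” idea can be sketched as follows, assuming Python dictionaries in place of RDF. The `lenses` table, the `render` function, and the sample objects are hypothetical, and the sketch covers only which properties to show, not the “decoration”.

```python
# A lens is data, not code: for each type, the list of properties to show.
# Types, property names and the sample objects are all illustrative.
lenses = {
    "Document": ["title", "author", "abstract"],
    "Person": ["name", "affiliation"],
}

objects = {
    "ex:paper1": {"type": "Document", "title": "Free your Data",
                  "author": "ex:karger", "abstract": "Aim lower."},
    "ex:karger": {"type": "Person", "name": "David Karger",
                  "affiliation": "MIT"},
}


def render(value, depth=0):
    """Render a value as indented text lines, applying lenses recursively."""
    pad = "  " * depth
    obj = objects.get(value)
    if obj is None:                      # a plain literal: show it as-is
        return [pad + str(value)]
    lines = []
    for prop in lenses.get(obj["type"], []):
        if prop not in obj:
            continue                     # missing data is simply skipped
        lines.append(f"{pad}{prop}:")
        lines.extend(render(obj[prop], depth + 1))
    return lines


print("\n".join(render("ex:paper1")))
```

Because the document's author is itself an object with a lens, the Person lens is applied inside the Document lens. This sketch does not guard against cycles in the data, which a real implementation would need to handle.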
![Page 38: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/38.jpg)
Fresnel
dsp:publicationLens rdf:type :Lens;
    :classLensDomain ow:Publication;
    :group gr:group;
    :purpose :defaultLens;
    :showProperties ( dc:description dc:identifier dc:creator dc:contributor
                      dc:date dc:subject dc:type dc:publisher dc:rights ) .

dsp:rightsFomat rdf:type :Format;
    :group gr:group;
    :propertyFormatDomain dc:rights;
    :propertyStyle "dspace-rights" .
![Page 39: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/39.jpg)
Benefits
• Data collected from anywhere can be viewed together
– Each piece of data with its own lens
• Lenses are described, not programmed
– Enthusiasts can write their own
– (especially if we give them wysiwyg tools)
– No need to build a template driven web site
– Just edit, publish some lenses
![Page 40: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/40.jpg)
Manipulation
![Page 41: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/41.jpg)
Application Development by End Users
• People want applications to manipulate their data
• But applications only manipulate developer’s data
• So let end users build their own
• Use lenses, but refract in both directions
– Lenses describe how to map data to presentation
– Invert, interpret manipulation of presentation as manipulation of data
* (extend lenses to talk about click, drag, drop)
• Operations represented as web services
– Internal and remote operations
– Receive RDF data and act on it
![Page 42: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/42.jpg)
![Page 43: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/43.jpg)
The Big Picture
![Page 44: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/44.jpg)
Sufficient for Nice Applications?
• Application design is impoverished
– Divide up the screen
– Put an object in each piece
– Show properties of each object
– With pretty formatting
– Put operations in menus
– And add some toolbars to save time
• This application “vocabulary” is limited enough
– to be described instead of programmed
– so it can be edited by end users
![Page 45: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/45.jpg)
Workspace Designer
• Editing mode for applications
• Define regions of screen
– By splitting existing regions
• Resize Regions
• Specify content of each region
– Object to be shown (drag and drop object)
– Lens to use to show object (menu of relevant lenses)
– Operations to make available on object (drag operations)
![Page 46: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/46.jpg)
Writing a Brain Research Paper
![Page 47: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/47.jpg)
Adding “Things to Do” Region
![Page 48: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/48.jpg)
Revised Application
![Page 49: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/49.jpg)
Lens Designer
• Specify how a particular object can be shown
• Similar to workspace designer
– Lens is “workspace” for viewed object
• Subdivide canvas
• Specify property to show in each region
• Specify lens for value of each property
![Page 50: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/50.jpg)
Drug Discovery Dashboard
http://www.w3.org/2005/04/swls/BioDash
Topic: GSK3beta Topic
Target: GSK3beta
Disease: DiabetesT2
Alt Dis: Alzheimers
Cmpd: SB44121
CE: DBP
Team: GSK3 Team
Person: John
Related Set
Path: WNT
![Page 51: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/51.jpg)
Bridging Chemistry and Molecular Biology
• Lenses can aggregate, accentuate, or even analyze new result sets
• Behind the lens, the data can be persistently stored as RDF-OWL
• Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references
![Page 52: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/52.jpg)
Pathway Polymorphisms
• Merge directly onto pathway graph
• Identify targets with lowest chance of genetic variance
• Predict parts of pathways with highest functional variability
• Map genetic influence to potential pathway elements
• Select mechanisms of action that are minimally impacted by polymorphisms
Non-synonymous polymorphisms from db-SNP
![Page 53: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/53.jpg)
Clinical Dashboard
• Gene Expression Data
• Additional relations and aspects can be defined: Mendelian Inheritance in Man
Diseased Tissue
Links to OMIM (RDF)
![Page 54: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/54.jpg)
Bar View Lens for Gene Expression
![Page 55: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/55.jpg)
ClinDash: Clinical Trials Browser
Clinical Obs
Expression Data
Subjects
• Values can be normalized across all measurables (rows)
• Samples can be aligned to their subjects using RDF rules
• Clustering can now be done over all measurables (rows) and types
![Page 56: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/56.jpg)
Shattering Applications
• Specific lenses may be too complex for end users to create
• But end users can
– Assemble these lenses into “applications”
– Decide at which data these lenses point
• Current application developers can build those views
– Much more modular
– Instead of building whole application, just build a lens and add to pool
– Repurposable lenses for repurposable data
• Simpler views can be built by non programmers
– Embedding the complex lenses as subparts
![Page 57: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/57.jpg)
Sharing
![Page 58: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/58.jpg)
Semantic Bank
• Tools directly collect and manipulate RDF
– So sharing just requires publishing the RDF back
• Semantic Bank is just a big RDF repository
– GET a resource to fetch the (XML encoding of) RDF about it
– Similarly, upload an XML encoding of the RDF:

    POST /semantic-bank/foo?command=upload&format=rdfxml HTTP/1.1
    Host: bank.example.org
    Content-Length: 317

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      <rdf:Description rdf:about="http://www.example.org/ns#item12345">
        <rdfs:label>An Example</rdfs:label>
        <rdf:type rdf:resource="http://www.example.org/ns#Thing"/>
      </rdf:Description>
    </rdf:RDF>
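
The upload request above could be assembled with Python's standard library roughly as follows. This is a sketch, not the Semantic Bank client: the `http://` scheme and the `Content-Type` header are assumptions not shown on the slide, the host is the slide's placeholder, and the request is built but deliberately not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The payload from the slide, re-indented.
body = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://www.example.org/ns#item12345">
    <rdfs:label>An Example</rdfs:label>
    <rdf:type rdf:resource="http://www.example.org/ns#Thing"/>
  </rdf:Description>
</rdf:RDF>""".encode("utf-8")

# Check locally that the payload is well-formed XML before sending anything.
root = ET.fromstring(body)
about = root[0].get(f"{{{RDF_NS}}}about")

# Build (but do not send) the upload request. The URL uses the placeholder
# host and path from the slide, not a real service; the Content-Type value
# is an assumption.
request = urllib.request.Request(
    "http://bank.example.org/semantic-bank/foo?command=upload&format=rdfxml",
    data=body,
    method="POST",
    headers={"Content-Type": "application/rdf+xml"},
)

print(request.get_method(), request.full_url)
print(about)
# To actually upload, one would call urllib.request.urlopen(request);
# urllib fills in Content-Length from the body at that point.
```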
![Page 59: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/59.jpg)
Getting There
![Page 60: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/60.jpg)
What’s wrong?
• It seems obvious: RDF lets anyone
– Ignore web site and application boundaries
– Gather data they need
– Define their own new attributes and relationships
– Look at it the way that they need
– Manipulate it
– Publish it back for others to use it, without having to manage a web site
• So why don’t we already have it?
![Page 61: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/61.jpg)
Cost of Getting Started?
• Web:
– Download/run a web server (hardest part, happens only once)
– Download a web browser
– Write a web page
• Semantic Web
– Install database, define schemas
– Add middleware layer
– Create templating engines
– Develop ontologies, data import protocols
– …
• Semimantic web
– Post some RDF (written in N3) to a semantic bank
– Install Piggy Bank
![Page 62: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/62.jpg)
Absence of Schemas?
• What good is it to put up RDF without explaining all the properties?
• What happens when different people put up “mismatched” data with different (explicit or implicit) schemas?
• What if there are multiple URLs for the same thing, with inconsistent statements about them?
• How can I use data I collected from somewhere else, if it doesn’t have the same schema as mine?
• But designing schemas is hard
– Requires big committees, lots of meetings, deliberation, buy-in
![Page 63: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/63.jpg)
Data First, Schema later (if ever)
• Need for schemas is a fallacy, blocking progress
• Each site is likely consistent with itself
• And will likely “go with the crowd” and be consistent with others
• If not, let users (not machines) translate
– Mapping properties to properties
– As needed, from site to site
* (or site to personal repository)
– Typically only need to blend a few sites
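A user-made property mapping of this kind can be sketched in Python as a simple renaming table. Everything here is hypothetical: the two sites' property names, the identifiers, and the `translate` helper are invented for illustration, with triples held as plain tuples.

```python
# Two sites describe the same kind of thing with different link names.
# All identifiers and property names here are illustrative.
site_a = {("a:m1", "title", "Superman"), ("a:m1", "venue", "Loew's")}
site_b = {("b:77", "name", "Superman"), ("b:77", "theater", "Loew's"),
          ("b:77", "rating", "PG")}

# The user's hand-made mapping from site B's properties to site A's.
# Only the properties the user cares about need an entry.
b_to_a = {"name": "title", "theater": "venue"}


def translate(triples, mapping):
    """Rename mapped properties; leave unmapped ones untouched."""
    return {(s, mapping.get(p, p), o) for (s, p, o) in triples}


merged = site_a | translate(site_b, b_to_a)

# One query over "title" now finds the entries from both sites.
titles = sorted(s for (s, p, o) in merged if p == "title" and o == "Superman")
print(titles)
```

The mapping is written once per pair of sites and only for the properties in use, which is why no global schema is needed to blend a few sources.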
![Page 64: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/64.jpg)
![Page 65: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/65.jpg)
There’s no RDF?
• Database backed servers can easily expose RDF, if they want to
– E.g., citeseer.csail.mit.edu
– Import into Piggy Bank
– Browse, query, search in interesting ways
– Maintain collections of references
• If server won’t cooperate, scrape
– Piggy Bank has a scraper repository
– One person writes scraper, everyone uses
– Or, one scrapes and publishes to semantic bank, others get from bank
– Also unsupervised machine learning approaches
![Page 66: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/66.jpg)
Clogs and Plogs
• Much blogging is about recycling content
• Clogs (Content Blogs) can manually merge data
– Blogger locates sources of data that ought to be in their schema
– Invests work to align properties and instances
– Publishes resulting single (schema unified) blob of data
– No front end
• Plogs (Presentation Blogs) display data
– Develop interesting lenses
– Point them at clogger content
– Someone else’s back end
• Separate front and back ends into different web sites
![Page 67: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/67.jpg)
Chicken and Egg
• RDF-aware clients useless without data, and vice versa
• What can prime the pump?
![Page 68: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/68.jpg)
Research Projects
• Many of our projects generate interesting data
• Then present through one interface
– E.g., NLP, speech
• Instead, post it to the semantic bank
– Others will find new uses for the data
• Other projects consume data
– Get it from the bank
• Let’s talk…
![Page 69: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/69.jpg)
Conclusion
• We have the tools to separate data from presentation
– RDF repositories
– Lenses to display arbitrary data in arbitrary combination
• Doing so would offer substantial benefits
– Application barriers go away
– Anyone can create interesting content
– People can repurpose it to their own specific needs
• Semantic Web can be lightweight
– Low cost of deployment
– Immediate benefit
– All we need do is ignore semantics
![Page 70: Free your Data: Instant Gratification with the Semantic Web David Karger](https://reader038.vdocument.in/reader038/viewer/2022103005/56649e105503460f94afbce8/html5/thumbnails/70.jpg)
• Haystack.csail.mit.edu
• Simile.mit.edu