How the Web can change social science research (including yours)
A presentation for a group of PhD students from the Leibniz Institutes (section B, social sciences) to discuss how they could use the Web, and even better the Web of Data, as an instrument in their research.
Frank van Harmelen
Computer Science Department, VU University Amsterdam
Creative Commons License: allowed to share & remix, but must attribute & non-commercial
Using the Web (of Data) for e-science in the social sciences
Health warning: computer scientist!
This talk is about:
• using the Web as an observational instrument
• using the Web of Data as an even better observational instrument
• using the Web of Data as a data-sharing platform
This talk is not about:
• social science about e-science (e.g. the Oxford research center)
• high-performance computing (that's just boring infrastructure; let the computer scientists deal with that)
• online social experiments (crowdsourcing, social games, Mechanical Turk, etc.)
Who are you?
Who is using large computerised data sets?
Who is using data extracted from the Web?
Who is using Semantic Web data?
This talk is about using the Web & the Web of Data as an observational instrument & as a sharing platform
Through:
• a whole bunch of realistic examples
• a sketch of the technology
Message = yes, you can do this too!
Philosophical confession: I take a strongly positivistic stance
Revolution ahead?
Effects of observation instruments
Example: political science
Question: Is the content of party-political programmes and election speeches predictive of government coalition attempts?
Data:
• all party manifestos
• half a year of all Dutch newspapers
Example: communication science
Question: Can we predict the social network at time T(n) from the content at time T(n-1)?
Data:
• discussions from the online forum nl.politiek
• 21,000 participants talking about 19 Dutch political parties during 259 weeks
Example: science dynamics
Question: Is thematic co-occurrence in year Y(n) predictive of co-authoring in year Y(n+1)?
Data: a 5-year conference series, 1,000 papers/year, 3,000 authors/year
AmCAT3: Keyword search
This works… sort of…
Methods:
• web scraping
• natural language analysis (parsing, stemming, synonyms, homonyms)
• identity resolution
Required:
• physical interoperability
• syntactic interoperability
• semantic interoperability
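To see why keyword search "works… sort of", here is a minimal Python sketch of keyword counting with a crude stemmer and a synonym table. It is not AmCAT's implementation: the synonym table, the suffix list and the function names are hypothetical. The last call shows the homonym problem: "labour" in the sense of childbirth is still counted as a mention of the Labour party.

```python
import re

# Hypothetical synonym table: surface form -> canonical keyword.
# PvdA is the Dutch Labour party, CDA the Christian Democrats.
SYNONYMS = {"pvda": "labour", "cda": "christian-democrats"}


def stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def keyword_counts(text: str, keywords: set[str]) -> dict[str, int]:
    """Count keyword hits after lower-casing, synonym mapping and stemming."""
    counts = {k: 0 for k in keywords}
    for token in re.findall(r"[a-z]+", text.lower()):
        term = SYNONYMS.get(token, stem(token))
        if term in counts:
            counts[term] += 1
    return counts


print(keyword_counts("Labour and the PvdA debated taxes; the tax debate continues.",
                     {"labour", "tax"}))
# Homonym problem: "labour" (childbirth) is counted as if it were the party.
print(keyword_counts("She went into labour at noon.", {"labour"}))
```

Every step that makes the counts better (real stemming, synonym lists, disambiguating homonyms, resolving who "Frank" is) needs extra knowledge that plain text does not carry.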
Web of Data to the rescue
General idea of the Web of Data (a.k.a. “Semantic Web”)
1. Make data available on the Web in machine-understandable form (formalised)
2. Structure the data and meta-data in ontologies
Warning: technical content coming up
Bluffer’s Guide to RDF
• Express relations between things
• Results in a labelled network (“graph”)
• All labels are actually web addresses (URIs)
• You can “ping” any label and find out more
• Bits of the graph can live at physically different locations & have different owners
[Figure: a small RDF graph with the nodes Frank, x, y and MIT, edges labelled AuthorOf and publishedBy, and the parts of a statement marked Subject, Predicate, Object]
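The labelled graph above can be mimicked in a few lines of Python: a graph is just a set of (subject, predicate, object) triples. This is a sketch, not an RDF library; the http://example.org/ URIs are hypothetical, and the exact edges of the figure are assumed (Frank is the author of x and y, and y is published by MIT).

```python
from typing import NamedTuple

EX = "http://example.org/"  # hypothetical namespace for this sketch


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The figure's graph, as assumed here: Frank authored x and y; y is published by MIT.
graph = {
    Triple(EX + "Frank", EX + "authorOf", EX + "x"),
    Triple(EX + "Frank", EX + "authorOf", EX + "y"),
    Triple(EX + "y", EX + "publishedBy", EX + "MIT"),
}


def objects(g: set[Triple], subject: str, predicate: str) -> set[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {t.obj for t in g if t.subject == subject and t.predicate == predicate}


def outgoing(g: set[Triple], node: str) -> set[tuple[str, str]]:
    """The labelled edges leaving a node, as (predicate, object) pairs."""
    return {(t.predicate, t.obj) for t in g if t.subject == node}


print(objects(graph, EX + "Frank", EX + "authorOf"))
print(outgoing(graph, EX + "y"))
```

On the real Web of Data, "pinging" a label means dereferencing its URI over HTTP; here the lookup is only local.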
Bluffer’s Guide to RDF Schema
• Types for subjects & objects & predicates
• Types organised in a hierarchy
• Inheritance of properties
[Figure: the same graph with types added: author, book, publisher, person, artifact, man]
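Type hierarchies and inheritance can be sketched as two of the standard RDFS entailment rules (rdfs11: subclass transitivity; rdfs9: types propagate up rdfs:subClassOf), applied until nothing new is derived. The rdf:type and rdfs:subClassOf URIs are the W3C ones; the example classes (author as a subclass of person, book as a subclass of artifact) are assumptions for illustration, and a real reasoner implements many more rules.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace

Triple = tuple[str, str, str]


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Apply two RDFS entailment rules until no new triple is derived.

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    closed = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closed if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, RDF_TYPE, b)
                for x, p, a in closed if p == RDF_TYPE
                for a2, b in sub if a == a2}
        if new <= closed:
            return closed
        closed |= new


data = {
    (EX + "Frank", RDF_TYPE, EX + "author"),
    (EX + "author", SUBCLASS, EX + "person"),  # assumed hierarchy
    (EX + "y", RDF_TYPE, EX + "book"),
    (EX + "book", SUBCLASS, EX + "artifact"),  # assumed hierarchy
}
inferred = rdfs_closure(data) - data
print(sorted(inferred))
```

The inferred triples say that Frank is a person and y is an artifact, although neither fact was stated explicitly.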
Ontologies (= hierarchical conceptual vocabularies)
• Identify the key concepts in a domain
• Identify a vocabulary for these concepts
• Identify relations between these concepts
• Make these precise enough so that they can be shared between
  • humans and humans
  • humans and machines
  • machines and machines
Biomedical ontologies (a few...)
• MeSH: Medical Subject Headings, National Library of Medicine; 22,000 descriptions
• EMTREE: commercial (Elsevier), drugs and diseases; 45,000 terms, 190,000 synonyms
• UMLS: integrates 100 different vocabularies
• SNOMED: 200,000 concepts, College of American Pathologists
• Gene Ontology: 15,000 terms in molecular biology
• NCI Cancer Ontology: 17,000 classes (about 1M definitions)
On the Web of Data, anyone can link anything to anything: a single triple such as [<x> IsOfType <T>] can connect resources that have different owners & locations.
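Because every label is a URI, combining data from different owners can be as simple as taking the union of their triples: statements about the same URI automatically meet. A minimal sketch with hypothetical URIs, where a bibliography source and an institute each publish one fact about Frank. Real RDF merging also renames blank nodes apart, which this sketch ignores.

```python
FRANK = "http://example.org/people/Frank"  # hypothetical URI both owners agree on
VOCAB = "http://example.org/vocab/"        # hypothetical vocabulary

# Published by one owner, e.g. a bibliography service.
bibliography = {(FRANK, VOCAB + "authorOf", "http://example.org/papers/x")}
# Published independently by another owner, e.g. an institute.
institute = {(FRANK, VOCAB + "worksAt", "http://example.org/org/VU")}

# Merging graphs without blank nodes is plain set union.
merged = bibliography | institute


def describe(g: set[tuple[str, str, str]], node: str) -> list[tuple[str, str]]:
    """Everything the graph says about a node, as sorted (predicate, object) pairs."""
    return sorted((p, o) for s, p, o in g if s == node)


for predicate, obj in describe(merged, FRANK):
    print(predicate, obj)
```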
SPARQL: Bluffer’s Guide
SELECT ?country_name ?population
WHERE { ?country a type:LandlockedCountries ;
                 rdfs:label ?country_name ;
                 prop:populationEstimate ?population .
        FILTER (?population > 15000000) }

SELECT ?name ?img ?hp ?loc
WHERE { ?a a mo:MusicArtist ;
           foaf:name ?name .
        OPTIONAL { ?a foaf:homepage ?hp } }
# ?img and ?loc stay unbound unless further patterns bind them
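Queries like these are sent to a SPARQL endpoint over plain HTTP. The sketch below builds, but does not send, a request in the SPARQL 1.1 Protocol's query-via-GET form (the query travels in a `query` URL parameter), using only Python's standard library. The DBpedia endpoint URL and the prefix bindings are assumptions, and the `type:LandlockedCountries` class may no longer exist in current DBpedia releases.

```python
import urllib.parse
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed endpoint; substitute your own

QUERY = """PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?country_name ?population WHERE {
  ?country a type:LandlockedCountries ;
           rdfs:label ?country_name ;
           prop:populationEstimate ?population .
  FILTER (?population > 15000000)
}"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url[:60])
```

To actually run it, `json.load(urllib.request.urlopen(req))` returns the standard JSON results, with one entry per answer under `results` → `bindings`.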
Example: science dynamics
Faculteit der Exacte Wetenschappen
Meet Julie
PhD student, “institutional influences on collaboration patterns in interdisciplinary research”
Julie needs data
DBLP: RDF & RDF Schema
DBLP query: from 2 weeks to 15 minutes
SELECT ?author ?affiliation ?uriAffiliation
WHERE { GRAPH <$graph> {
  { <$article> swrc:author ?author .
    OPTIONAL { ?author swrc:affiliation ?uriAffiliation . }
    OPTIONAL { ?author swc:affiliation ?affiliation . } }
  UNION { <$article> foaf:maker ?author .
    OPTIONAL { ?author swrc:affiliation ?uriAffiliation . }
    OPTIONAL { ?author swc:affiliation ?affiliation . } }
  UNION { <$article> dc:creator ?author .
    OPTIONAL { ?author swrc:affiliation ?uriAffiliation . }
    OPTIONAL { ?author swc:affiliation ?affiliation . } }
} }
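The `$graph` and `$article` placeholders in Julie's query look like template variables that the calling program fills in. A minimal sketch of that step with Python's `string.Template`, which happens to use the same `$name` syntax. The query is written a little more compactly than on the slide: the two OPTIONAL patterns are moved out of the UNION, which returns the same answers. The prefix bindings are assumptions (the usual namespaces for these vocabularies), and the graph and article URIs are hypothetical.

```python
from string import Template

# Prefix bindings are assumptions (the usual namespaces for these vocabularies).
DBLP_AUTHORS = Template("""\
PREFIX swrc: <http://swrc.ontoware.org/ontology#>
PREFIX swc:  <http://data.semanticweb.org/ns/swc/ontology#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
SELECT ?author ?affiliation ?uriAffiliation
WHERE { GRAPH <$graph> {
  { <$article> swrc:author ?author . }
  UNION { <$article> foaf:maker ?author . }
  UNION { <$article> dc:creator ?author . }
  OPTIONAL { ?author swrc:affiliation ?uriAffiliation . }
  OPTIONAL { ?author swc:affiliation ?affiliation . }
} }""")


def dblp_author_query(graph_uri: str, article_uri: str) -> str:
    """Fill in the $graph and $article placeholders (KeyError if one is missing)."""
    return DBLP_AUTHORS.substitute(graph=graph_uri, article=article_uri)


# Hypothetical graph and article URIs:
print(dblp_author_query("http://example.org/dblp",
                        "http://example.org/dblp/article/1"))
```

Running the filled-in query once per article is what turns weeks of manual collection into a short script.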
Example: Dutch census data (1795–1971)
40,745,554,078 triples, semantically rich
Who’s doing it?
The World Bank is also doing it!
http://data.worldbank.org/: 7,000 indicators from World Bank data sets.
The US government is also doing it! http://data.gov/: 390,000 data sets
Compare foreign aid budgets
Does tax influence smokers?
Compare campaign money
already many billions of facts & rules
Everybody’s doing it!
• encyclopedia
• geographic names (millions)
• names of artists & artworks (10,000s)
• scientific bibliographies
• hierarchical dictionaries (UK, FR, NL)
• life-science databases
• any CD ever recorded (almost)
• basic facts on every country on the planet
• common sense rules & facts (100,000s)
May ’09 estimate: > 4.2 billion triples + 140 million interlinks
It gets bigger every month
25 billion facts & relations…
And many more:
• Reuters
• New York Times
• EU (EUROSTAT, others)
• BBC
• Facebook
• …
So how good is this observational instrument?
• studies on validity (e.g. in science dynamics)
• methods for provenance & trust
• methods for attribution & citation
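One common way to track provenance on the Web of Data is to record, for each triple, which named graph (source) asserted it, i.e. to work with quads instead of triples. A minimal sketch of that idea, with hypothetical source names and shortened `ex:` identifiers; it is one possible approach, not necessarily the methods the slide refers to.

```python
from collections import defaultdict

# Each statement is a quad: (subject, predicate, object, source graph).
# Identifiers are hypothetical, shortened with an "ex:" prefix.
quads = [
    ("ex:Frank", "ex:worksAt", "ex:VU", "ex:graph/institute"),
    ("ex:Frank", "ex:worksAt", "ex:VU", "ex:graph/scraped-web"),
    ("ex:Frank", "ex:worksAt", "ex:MIT", "ex:graph/scraped-web"),
]


def provenance(qs):
    """Map each triple to the set of sources that assert it."""
    sources = defaultdict(set)
    for s, p, o, g in qs:
        sources[(s, p, o)].add(g)
    return dict(sources)


def trusted_triples(qs, trusted):
    """Keep only the triples asserted by at least one trusted source."""
    return {(s, p, o) for s, p, o, g in qs if g in trusted}


for triple, srcs in provenance(quads).items():
    print(triple, "asserted by", sorted(srcs))
```

Filtering on trusted sources drops the conflicting "works at MIT" claim, which only the scraped source makes.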
For real?
“ use the power of information to explore social and economic life on Earth ”
€1 billion over 10 years
Phew…
Take-home message
• use the Web & the Web of Data to obtain your data
• use the Web of Data to share your data
• yes, you can do this too!
• collaborate with computer scientists
• reflect on deeper consequences for the social sciences (methodological, theoretical, etc.)
Acknowledgements
I’ve freely used material from the work of Shenghui Wang, Paul Groth, Julie Birkholz, Wouter van Atteveldt, Laurens van Rietveld, Rinke Hoekstra and many in the Semantic Web community.