Data-gov @ RPI
Li Ding, Jim Hendler and Deborah McGuinness
Tetherless World Constellation, Rensselaer Polytechnic Institute
July 27, 2010
The Data-gov project is headed by Professors Jim Hendler and Deborah McGuinness and led by Li Ding. Student team members include: Dominic DiFranzo, Sarah Magidson, James Michaelis, Alvaro Graves, Adam Bell, Jin Guang Zheng, Xian Li, Tim Lebo, Gregory Todd Williams, Peter Coons, Zhenning Shangguan, Devin Gaffney, William Cooper, Brian Zaik, and Johanna Flores.
Raw Government Data Now
Timeline (2009 to 2010):
• January 21, 2009: “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” --- President Obama
• May 21, 2009: data.gov online
• June 30, 2009: “Putting Government Data online”
• December 8, 2009: “Open Government Directive” released
• January 19, 2010: data.gov.uk online
• May 21, 2010: data.gov relaunch with semantic web featured
Semantic Web featured at data.gov
• leveraged contributions from the Tetherless World Constellation at RPI
• published 6.4 billion triples (almost doubling the LOD cloud, to 13 billion triples in total)
• hosted triple store (Virtuoso) and open source RDF mashups
http://www.data.gov/semantic/
http://www.data.gov/semantic/data/alpha
Data-gov Wiki: Portal for Innovations at RPI
The Data-gov Wiki explores and teaches the use of semantic web technologies, especially linked data, in producing, processing, and using government data from data.gov.
The Data-gov Wiki is run by the Tetherless World Constellation at RPI.
• 40+ Demos
• 400+ Datasets
• Tutorials & Videos
Synopsis
• Open Data: available for public use
• Linked Data: easy to integrate
• Visualization: easy to understand data
• Mashups: enrich meaning of data
• Provenance: make mashups accountable
A Typical Mashup: CASTNET
Data sources:
• Data.gov: CASTNET Ozone (CSV)
• epa.gov: CASTNET Site (CSV)
Mashup types: Data Mashup, Visualization Mashup, Web Application Mashup
Tools: Exhibit Visualization API
Steps:
1. Convert raw dataset into linkable RDF
2. Query multiple RDF datasets via SPARQL endpoint
3. Drill down for details
4. Surf to EPA applications
Created by Dominic DiFranzo, PhD student at RPI, http://www.data.gov/semantic/Castnet/html/exhibit
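The first step of the mashup, converting a raw CSV dataset into linkable RDF, can be sketched roughly as follows. This is a minimal illustration and not the converter used by the Data-gov project: the `http://example.org/castnet/` namespace, the column names, and the sample rows are all invented for the example, and every value is emitted as a plain string literal in N-Triples.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, chosen only for illustration.
BASE = "http://example.org/castnet/"

# Invented sample rows; these are NOT real CASTNET measurements.
SAMPLE_CSV = """site_id,state,ozone_ppb
S001,NY,41
S002,PA,35
"""


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text: str, key: str) -> list[str]:
    """Make one subject URI per row and one triple per remaining column."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}site/{quote(row[key], safe='')}>"
        for column, value in row.items():
            if column == key or not value:
                continue  # skip the key column and empty cells
            predicate = f"<{BASE}vocab/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    for line in csv_to_ntriples(SAMPLE_CSV, key="site_id"):
        print(line)
```

Because both CSV files would share the same site identifier, giving each site a stable URI is what lets the ozone and site datasets be joined later in a single SPARQL query.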
Mashup: AGI vs Medicare Claims
Created by Peter Coons, http://data-gov.tw.rpi.edu/demo/stable/demo-1356-1623-health-claim-vs-income.html
[Spatial Mashup] Data.gov (AGI + Medicare Claims + Population)
Mashup: US and UK Foreign Aid
Country | Aid | Major aid from US | Major aid from UK
Pakistan | US > UK | Economic/Security Assistance, | Health,
India | UK > US | Child Survival and Health | Health,
Created by James Michaelis, PhD student at RPI, http://data-gov.tw.rpi.edu/demo/linked/aidviz-1554-10030.html
Data Sources:
[Spatial Mashup] Data.gov (USAID) + Data.gov.uk (DFID)
Social Mashup: US Wildland Fire
Wildland fire (NIFC)
Budget on wildfire, “DOI” and “USDA” (OMB)
Category:Wildfires In The United States
Created by Li Ding, researcher at RPI, http://data-gov.tw.rpi.edu/demo/stable/demo-1187-40x-wildfire-budget.html
[Temporal Mashup] Data.gov (statistics + budget) + Wikipedia (famous fires)
Mashup: White House Visitor Search
Example: “POTUS” links to dbpedia:Barack_Obama
Created by Dominic DiFranzo, http://data-gov.tw.rpi.edu/demo/stable/white-house-visitor/top100-visitees.php
[Person Mashup (via Data-gov Wiki)] Data.gov (statistics) + Wikipedia (personal profiles)
Diagram labels: whitehouse, Data-gov Wiki, Wikipedia
Mashup: USPS Spending and News
Created by Sarah Magidson, http://data-gov.tw.rpi.edu/demo/linked/demo-401-usps-news.html
[Temporal Mashup] Data.gov (spending and budget) + User-contributed Data (news)
Mashup: Supreme Court Justices
Created by Xian Li, http://data-gov.tw.rpi.edu/demo/stable/supremeCourt/demo-10016-portal.html
[Person Mashup] Data.gov (budget) + SCDB (Voting History) + Wikipedia (personal profiles)
More Mashups: Using Web Tools
SPARQL results (XML) can be converted into other formats (e.g. JSON, CSV) and used as input to other Web tools: Yahoo Pipes, IBM Many Eyes, Microsoft Web n-gram Service, …
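As a rough sketch of that conversion, the snippet below parses a document in the W3C SPARQL Query Results XML Format and re-emits the rows as CSV and as a simple JSON list. The sample result set, the variable names, and the helper names `parse_results`, `to_csv`, and `to_json` are assumptions made for illustration. The JSON produced here is a plain list of row objects, not the official SPARQL JSON results format.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Invented sample result set, for illustration only.
SAMPLE_XML = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="agency"/><variable name="count"/></head>
  <results>
    <result>
      <binding name="agency"><literal>Agency A</literal></binding>
      <binding name="count"><literal>3</literal></binding>
    </result>
    <result>
      <binding name="agency"><uri>http://example.org/agency/b</uri></binding>
    </result>
  </results>
</sparql>
"""


def parse_results(xml_text):
    """Return (variable names, list of row dicts); unbound variables become ''."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = dict.fromkeys(variables, "")
        for binding in result.findall("sr:binding", NS):
            term = binding[0]  # the <uri>, <literal>, or <bnode> child
            row[binding.get("name")] = term.text or ""
        rows.append(row)
    return variables, rows


def to_csv(variables, rows):
    """Serialize rows as CSV with one column per query variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=variables, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows as a JSON list of objects."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    names, data = parse_results(SAMPLE_XML)
    print(to_csv(names, data))
    print(to_json(data))
```

Flattening each term to its text drops datatype and language tags, which is usually acceptable for spreadsheet-style tools but loses information a downstream RDF consumer might need.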
More Mashups: Provenance
• Critical to accountability
• Demo => Dataset => Agency
– Where do the data come from?
• Agency => Dataset => Comments
– Support users’ feedback
Diagram labels: Demo, Dataset, Agency
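The two provenance paths on this slide can be sketched as lookups over a small link table. This is only an illustration of the idea: the identifiers (`demo-a`, `dataset-1`, `agency-x`), the sample comment, and the function names are hypothetical placeholders, not records or APIs from the Data-gov Wiki.

```python
from collections import defaultdict

# Hypothetical provenance links; all identifiers are placeholders.
DEMO_USES = {"demo-a": ["dataset-1", "dataset-2"], "demo-b": ["dataset-2"]}
DATASET_AGENCY = {"dataset-1": "agency-x", "dataset-2": "agency-y"}
DATASET_COMMENTS = {"dataset-2": ["Column units are unclear."]}


def agencies_for_demo(demo):
    """Demo => Dataset => Agency: which agencies does a demo's data come from?"""
    return sorted({DATASET_AGENCY[d] for d in DEMO_USES.get(demo, [])})


def comments_for_agency(agency):
    """Agency => Dataset => Comments: gather user feedback per dataset."""
    feedback = defaultdict(list)
    for dataset, owner in DATASET_AGENCY.items():
        if owner == agency:
            feedback[dataset].extend(DATASET_COMMENTS.get(dataset, []))
    return dict(feedback)


if __name__ == "__main__":
    print(agencies_for_demo("demo-a"))
    print(comments_for_agency("agency-y"))
```

In an RDF setting the same two traversals would more naturally be SPARQL queries over provenance triples; plain dictionaries are used here only to keep the sketch self-contained.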
Conclusions
• Now
– 6.4 billion triples from data.gov
– “data + visualization + mashup” is powerful
– Low-cost solutions available for education
• Future
– Development
  • More raw data, data catalog, links, RDFa
  • More tools, esp. Web visualization, SPARQL endpoint
  • More demos and applications in different domains
– Research
  • Integration: link, search, social contribution, …
  • Provenance: source, versions, trust, …
  • Usability: scalability, quality, …