Transcript
Page 1: Data.gov Wiki: A Semantic Web Approach to Government Data · esp. visualization – To support web developers via machine friendly data access and web services. ... value> pair) 5

Data.gov Wiki: A Semantic Web Approach to Government Data

Li Ding, Dominic DiFranzo, Sarah Magidson, Alvaro Graves, James R. Michaelis, Xian Li, Deborah L. McGuinness, Jim Hendler

Tetherless World Constellation

Nov 2, 2009

Page 2:

Synergy

• Government: data is out there “as is”
• Loop: gov data and linked data
• Loop: gov data and web developers
• Loop: gov data and end users

Page 3:

Government Data on the Web

Page 4:

Objectives

• Investigate the role of the Semantic Web in producing, processing and utilizing government datasets
– To enrich the value of data via normalizing, linking and information extraction
– To realize the value of data via applications, esp. visualization
– To support web developers via machine-friendly data access and web services

Page 5:

Semantic Web Architecture for Government Data

[Architecture diagram; the labels recoverable from the slide are listed below.]

• Stages: Convert Data, Link & Enrich Data, View & Use Data
• Components: Data Processors (Web Services & Analyzers), SPARQL Web Service, XSLT Service, Diff Service, RSS Generator, Link Annotator, SPARQL End Point, Sem Wiki
• Data and formats: GOV data (RDF), Linked Data, RDF/XML
• Views and outputs: Google Viz, MIT Exhibit, RSS 1.0, tagCloud, CSV, XSL…, Tabulator

Li Ding, Dominic DiFranzo, Sarah Magidson, and Jim Hendler · Tetherless World Constellation · Rensselaer Polytechnic Institute · Aug 7 2009 · http://data-gov.tw.rpi.edu/

Page 6:

The Landscape

Page 7:

The catalog data

Page 8:

Data-gov Cloud (Aug 2009)

[Figure: dataset cloud; the labels recoverable from the slide are listed below.]

• (#10) Residential Energy Consumption Survey
• (#401) Budget Authority and offsetting receipts 1976-2014
• (#403) Governmental Receipts 1962-2014
• (#402) Outlays and offsetting receipts 1962-2014
• (#249) 2006 Toxics Release Inventory
• (#90) 2005-2007 ACS PUMS Housing
• (#191) 2005 Toxics Release Inventory
• (#91) 2005-2007 ACS PUMS Population
• (#34) Worldwide M1+ Earthquakes past 7 days
• (#9) CASTNET Visibility
• (#397) 2007 Toxics Release Inventory
• (#8) CASTNET Ozone
• (@10001) CASTNET sites

Category labels: Budget, Population, Energy and Utilities, Geography and Environment

Page 9:

Data-gov Cloud (Oct 2009)

Li Ding and Jim Hendler · Tetherless World Constellation · Rensselaer Polytechnic Institute · Oct 2009 · http://data-gov.tw.rpi.edu/

[Figure: dataset cloud; the labels recoverable from the slide are listed below.]

• US-COMMUNITY (2005-2007)
• CASTNET (1990 – Present)
• RECS (2005)
• GOV-BUDGET (1962-2014)
• TOXIC-RELEASE (2005-2008)
• EARTHQUAKE (Present)
• STATE-LIB (2006-2007)
• PUBLIC-LIB (1992-2006)
• MED-COST (1994-2009)
• LABOR-STAT (19xx-Present)
• DATA-GOV-CATALOG (present)
• USAspending (2008-2010)

Category labels: Government, Community, Services, Environment

Other labels: CASTNET sites, RECS code, US agency, US location, Linked Data, GeoNames

Page 10:

More statistics

Page 11:

Demos

Page 12:

Data.gov + epa.gov

Page 13:

Gov Data + Corporate Data + User Data

Page 14:

Computing Difference of Revisions
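
The slide gives only a title, so the following is a minimal sketch of one way to compare two revisions of an RDF dataset, not the method behind the demo. It assumes each revision is available as a collection of (subject, predicate, object) tuples with no blank nodes; the function name `diff_revisions` and the sample triples are hypothetical.

```python
def diff_revisions(old_triples, new_triples):
    """Compare two revisions given as iterables of (s, p, o) tuples.

    Returns the triples present only in the new revision ("added")
    and the triples present only in the old revision ("removed").
    """
    old_set, new_set = set(old_triples), set(new_triples)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
    }


# Hypothetical sample data: one value changes between revisions.
revision_1 = [
    ("ex:site1", "ex:ozone", "41"),
    ("ex:site2", "ex:ozone", "37"),
]
revision_2 = [
    ("ex:site1", "ex:ozone", "41"),
    ("ex:site2", "ex:ozone", "39"),
]

changes = diff_revisions(revision_1, revision_2)
print(changes)
```

A changed value shows up as one removed triple plus one added triple with the same subject and predicate. Blank nodes would need extra matching logic that this sketch leaves out.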

Page 15:

More demos?

• http://data-gov.tw.rpi.edu/wiki/demos

Page 16:

Technical Issues

Page 17:

Issues in Data.gov

• Duplicated Datasets - Some datasets are part of another dataset.

– Dataset 140 (2005 Toxics Release Inventory data for the state of California (EPA)) is a subset of Dataset 191.

• Formatting Issues - The format of some datasets is not friendly to machine processing.

– Dataset 37 (Lower Colorado River Daily Average Water Elevations and Releases (US Bureau of Reclamation)).

– Dataset 335 (National Longitudinal Surveys (US Bureau of Labor Statistics)) tells you how to order data from the government.

• Access Point Issues - Some access points are interactive web pages, which are not friendly to machine access.

– Dataset 330 (Local Area Unemployment Statistics (US Bureau of Labor Statistics)).

Sarah

Page 18:

Linking Data

1. link similar datasets by reusing property namespace

2. link to rdfs:label (via rdfs:subPropertyOf) using semantic wiki

3. link to DBpedia (via owl:sameAs) using Wikipedia widget

4. link instances (via common <property, literal-value> pair)

5. link government data with web data (via time and location)

6. link revisions of government data (via knowledge provenance)
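
As an illustration of item 4, here is a minimal sketch of linking instances that share a common <property, literal-value> pair. It assumes triples are plain (subject, predicate, object) tuples; the function name `link_by_shared_value`, the `ex:` property names, and the row identifiers are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from itertools import combinations


def link_by_shared_value(triples, prop):
    """Return sorted (subject_a, subject_b) pairs whose `prop` has the same literal value."""
    subjects_by_value = defaultdict(set)
    for subject, predicate, value in triples:
        if predicate == prop:
            subjects_by_value[value].add(subject)

    links = []
    for subjects in subjects_by_value.values():
        # Every pair of distinct subjects with the same value is a candidate link.
        links.extend(combinations(sorted(subjects), 2))
    return sorted(links)


# Hypothetical rows from two datasets that both record a state code.
triples = [
    ("dataset191:row1", "ex:state", "CA"),
    ("dataset140:row7", "ex:state", "CA"),
    ("dataset191:row2", "ex:state", "NY"),
    ("dataset140:row7", "ex:year", "2005"),
]

links = link_by_shared_value(triples, "ex:state")
print(links)
```

Matching on a single shared literal produces candidate links only; as the next slide notes for name mapping, ambiguous matches may still need manual disambiguation.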

Page 19:

Semantic mapping: AI + CI

need manual disambiguation!

Map to Wikipedia/DBpedia Name

Page 20:

RDF => SPARQL => Web

• We use SPARQL to bridge Web developers and Semantic Web data.

• A triple store is used to handle RDF datasets with multiple millions of triples.
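
To make the bridge concrete, here is a minimal sketch of how a web developer might prepare a SPARQL query for an HTTP endpoint. It follows the standard SPARQL protocol convention of passing the query text in a `query` parameter on a GET request. The endpoint URL is a placeholder, not the project's actual endpoint, and the request is only built, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Placeholder endpoint; substitute the address of a real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Build (but do not send) a GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)

# Round-trip check: the encoded URL decodes back to the original query text.
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(request.full_url)
print(decoded == QUERY)
```

Passing `request` to `urllib.request.urlopen` would send it; the JSON response could then be handed to a visualization library on the web page.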

Page 21:

Conclusion

• semantic web enabled portal for linked government data
• 5 billion triples from data.gov
• hosts apps, demos & services
• provide education services
• integrates web users’ contributions

