
Page 1: 1 Build EPA’s Synaptica in the Cloud: An Enterprise Vocabulary Catalog for Data.gov/semantic Brand Niemann U.S. EPA July 8, 2010

1

Build EPA’s Synaptica in the Cloud: An Enterprise Vocabulary Catalog for Data.gov/semantic

Brand Niemann
U.S. EPA

July 8, 2010
http://semanticommunity.net

Disclaimer: These slides do not reflect the views of the U.S. Environmental Protection Agency and do not constitute endorsement by the EPA of the standards or products mentioned.


2

Overview

• The Challenge
• EPA’s Synaptica Program
• The Expert and His Advice
• The Cloud Tools
• The Inspiration
• The Data Sources
• Other Sources of Data
• The Process
• The Results
• Comments
• Acknowledgements
• References


3

The Challenge

• EPA's Synaptica (Vocabulary Catalog) is a closed system for terminology services that contains valuable vocabularies that need to be used in Semantic Web applications. Semantic Web applications require harmonized enterprise vocabularies that are referenced by well-defined web addresses (URIs or URLs) and used in Semantic Web data models expressed in RDF Schema (RDFS). EPA's Synaptica is part of EPA's System of Registries (SoR), which is also moving towards a Semantic Web application (URIs and metadata).
– See jahendler: "Good Data.gov meeting with @georgethomas and others - lots of cool stuff in #semweb space - starting to focus on URIs and metadata"

• Data science and data forensics take a systematic approach to understanding and auditing data resources so non-experts can use those data resources more easily and confidently.

http://epadata.wik.is/EPA's_Synaptica#The_Challenge


4

EPA’s Synaptica Program

• 1. Go to the System of Registries at http://www.epa.gov/sor
• 2. Click on "Login for EPA and Partners" (on the left in the blue navigation bar) and provide your EPA Portal User ID and password (same as your Novell login).
– Do not use your Synaptica User ID/password for this login.
• 3. Once you are logged in, click on Terminology Services
• 4. Go to the "Manage Terminology" tab and select "Access Terminology Tool"
• 5. Click "Launch Terminology Services Tool"
• 6. Click "Continue to this Web Site (not recommended)"
• 7. Log in to Synaptica using your User ID and password

http://epadata.wik.is/EPA's_Synaptica#EPA's_Synaptica_Program


5

EPA’s Synaptica Program

http://www.epa.gov/sor


6

EPA’s Synaptica Program

https://iaspub.epa.gov/sor_internet/registry/sysofreg/login/login.do


7

EPA’s Synaptica Program

https://iaspub.epa.gov/sor_extranet/registry/sysofreg/home/overview/home.do


8

EPA’s Synaptica Program

https://iaspub.epa.gov/sor_extranet/registry/termreg/home/overview/home.do


9

EPA’s Synaptica Program

https://iaspub.epa.gov/sor_extranet/registry/termreg/manageterminology/accessterminologytool/


10

EPA’s Synaptica Program

https://etss.epa.gov/home/homepage.asp


11

EPA’s Synaptica Program

https://etss.epa.gov/home/homepage.asp?vtvpid=1000041


12

EPA’s Synaptica Program

https://etss.epa.gov/tools/XMLUploadForm.asp?ext=ALL

Show all importable files (.XLS, .CSV, .TXT, .XML)

See next two slides


13

EPA’s Synaptica Program

https://etss.epa.gov/incoming/AG%20101%20load%20file.xls


14

EPA’s Synaptica Program

https://etss.epa.gov/incoming/Aquatic%20Biodiversity%20Glossary_Term_LOAD%2020091116.csv


15

EPA’s Synaptica Program

https://etss.epa.gov/tools/XMLUploadForm.asp?ext=XML

ZThes, RDF/SKOS and RDF/OWL formatted XML files (.XML)

None!
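The extension filter on the upload form (ext=ALL versus ext=XML) can be mimicked offline. The following is a minimal Python sketch, assuming only the two load-file URLs shown on the preceding slides; the helper names `file_name` and `files_with_extension` are hypothetical and are not part of Synaptica.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# The two load files shown on the preceding slides (the real manager may list more).
LOAD_FILE_URLS = [
    "https://etss.epa.gov/incoming/AG%20101%20load%20file.xls",
    "https://etss.epa.gov/incoming/Aquatic%20Biodiversity%20Glossary_Term_LOAD%2020091116.csv",
]


def file_name(url: str) -> str:
    """Return the percent-decoded file name at the end of a URL path."""
    return PurePosixPath(unquote(urlparse(url).path)).name


def files_with_extension(urls, ext):
    """Hypothetical helper: mimic the ext= filter; "ALL" keeps every file."""
    names = [file_name(u) for u in urls]
    if ext.upper() == "ALL":
        return names
    suffix = "." + ext.upper()
    return [n for n in names if n.upper().endswith(suffix)]


if __name__ == "__main__":
    print(files_with_extension(LOAD_FILE_URLS, "ALL"))
    print(files_with_extension(LOAD_FILE_URLS, "XML"))
```

For these two files the XML filter returns an empty list, which matches the "None!" observation on this slide.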


16

The Expert and His Advice

• Edward Tufte Presidential appointment announced by White House, March 5, 2010.

• Tufte comment on iPhone interface design: Better to have users looking over material adjacent in space within our eyespan rather than stacked in time. This is especially the case for statistical data, where the fundamental analytical task is to make comparisons. Also see page 159 in the book reference below.
– http://epadata.wik.is/EPA's_Synaptica#References

http://epadata.wik.is/EPA's_Synaptica#The_Expert_and_His_Advice


17

The Cloud Tools

http://cloud.mindtouch.com/


18

The Cloud Tools

http://epadata.wik.is/EPA's_Synaptica#The_Cloud_Tools


19

The Cloud Tools

http://spotfire.tibco.com/


20

The Cloud Tools

http://ondemand.spotfire.com/public/Help/index.htm


22

The Data Sources

http://epadata.wik.is/EPA's_Synaptica/File_Import_Manager


23

Other Sources of Data

http://epadata.wik.is/System_of_Registries/Environmental_Terminology_System_and_Services


24

The Process

• Linked Data Design Principles:
– Use URIs as names for things
– Use HTTP URIs so that people can look up those names
– When someone looks up a URI, provide useful information, using the standards (RDF and SPARQL)
– Include links to other URIs so that they can discover more things

• A nice summary of the 5 star scheme:
– make your stuff available on the web (whatever format)
– make it available as structured data (e.g. excel instead of image scan of a table)
– non-proprietary format (e.g. csv instead of excel)
– use URLs to identify things, so that people can point at your stuff
– link your data to other people’s data to provide context

Source: http://www.w3.org/DesignIssues/LinkedData

http://epadata.wik.is/EPA's_Synaptica#The_Process
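One way to picture the Linked Data principles applied to a vocabulary term is sketched below in Python. This is an illustration only, not the method used in these slides: the base address `http://example.org/vocabulary/`, the sample term, its placeholder definition, and the function names are all assumptions. The SKOS property URIs are the standard W3C ones, and the output is plain N-Triples text.

```python
from urllib.parse import quote

# Assumption: a hypothetical base address; the slides do not define one.
BASE_URI = "http://example.org/vocabulary/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def term_uri(label: str) -> str:
    """Mint an HTTP URI that names a term (principles 1 and 2)."""
    return BASE_URI + quote(label.strip().replace(" ", "_"), safe="")


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def term_triples(label, definition, related=()):
    """Describe a term in RDF (principle 3) and link it to others (principle 4)."""
    subject = f"<{term_uri(label)}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .",
        f'{subject} <{SKOS}prefLabel> "{escape_literal(label)}"@en .',
        f'{subject} <{SKOS}definition> "{escape_literal(definition)}"@en .',
    ]
    for other in related:
        lines.append(f"{subject} <{SKOS}related> <{term_uri(other)}> .")
    return lines


if __name__ == "__main__":
    # Placeholder term and definition, for illustration only.
    for line in term_triples("Example Term", "Placeholder definition.",
                             related=["Another Term"]):
        print(line)
```

Serving these triples at the minted address, rather than only generating them, is what would complete the "look up a URI" principle; that step is outside this sketch.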


25

The Process

• The Basic Steps:
– Inventory Data Sources and Plan Application
– Prepare and Import Data and Metadata
– Implement Layout and Analytics
– Add Bookmarks and Create Data Stories
– Publish and Test in Web Player
– Get Feedback and Improve

• First create visualizations, faceted search (filters), and analytics for each individual data source and then look for relationships between the data sources.

http://epadata.wik.is/EPA's_Synaptica#The_Process
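The "faceted search (filters)" idea for a single data source can be sketched outside Spotfire. The Python below is a rough illustration under stated assumptions: the sample rows and the column names `term`, `vocabulary`, and `status` are hypothetical and do not reflect Synaptica's load-file layout or Spotfire's own filtering.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; column names are assumptions, not Synaptica's format.
SAMPLE_CSV = """term,vocabulary,status
Example Term A,Glossary One,Approved
Example Term B,Glossary One,Draft
Example Term C,Glossary Two,Approved
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per term."""
    return list(csv.DictReader(io.StringIO(text)))


def facet_counts(rows, column):
    """Count how many rows fall under each value of one facet column."""
    return Counter(row[column] for row in rows)


def apply_filters(rows, **selected):
    """Keep only rows that match every selected facet value."""
    return [r for r in rows if all(r[c] == v for c, v in selected.items())]


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    print(facet_counts(rows, "vocabulary"))
    print(apply_filters(rows, vocabulary="Glossary One", status="Approved"))
```

Running the same counts over a second source that shares a column is one simple way to start looking for relationships between data sources.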


26

The Results

• Reproduced EPA’s Synaptica File Manager in the Wiki.

• Dragged and dropped EPA’s Synaptica File Manager files into Spotfire to make a selection for Spotfire.
– See right-hand side.

http://epadata.wik.is/EPA's_Synaptica#The_Results


27

The Results

http://epadata.wik.is/EPA's_Synaptica#The_Results Spotfire on PC


28

The Results

http://epadata.wik.is/EPA's_Synaptica#The_Results Spotfire on PC: Search


29

The Results

http://epadata.wik.is/EPA's_Synaptica#The_Results Spotfire Web Player


30

Comments

• The initial objective was to see how fast one could re-create EPA's Synaptica (Vocabulary Catalog) using Spotfire's Multiple Visualizations, making it an open system for terminology services that contains valuable vocabularies that need to be used in Semantic Web applications (URIs and metadata). Now it is ready to integrate with other vocabularies for Data.gov/semantic.

• Please use the Add Comment feature at the bottom of this wiki page to provide feedback and suggest additional analyses you would like to see. To use the Add Comment feature you first need to register by providing your email address. Your privacy will be respected and your email address will not be available to others or used for any other purpose. You can also download the Spotfire File (<1 MB) from this Wiki and a 30-day free evaluation copy from http://spotfire.tibco.com/ and reuse these analyses, add your own data to this file or new Spotfire files that you create. Have fun and give us your feedback!

http://epadata.wik.is/EPA's_Synaptica#Comments


31

Acknowledgements

• The author gratefully acknowledges Dean Allemang, Cory Casanave, Sean Connors, Mills Davis, Li Ding, David Eng, Lee Feigenbaum, Aaron Fulkerson, Jim Hendler, Ralph Hodgson, Kevin Kirby, Kevin Jackson, Bob Marcus, John McMahon, Richard Murphy, Brand Niemann, Jr., Barry Nussbaum, Matthew Phoenix, Tony Shaw, Jeff Stein, George Strawn, George Thomas, Pete Tseronis, and Edward Tufte.

http://epadata.wik.is/EPA's_Synaptica#Acknowledgements


32

References

• Brand L. Niemann, Put Your Desktop in the Cloud to Support the Open Government Directive and Data.gov/semantic, April 19, 2010, Semantic Universe.

• Brand L. Niemann, Build Your Own Data.gov (Spotfire) and EPA Microsite (Spotfire) with Semantics and Statistics in the Cloud, May 15, 2010. Slides.

• Brand L. Niemann, Build Your Community Health Information "Design for America" Using Mindtouch and Spotfire, May 17, 2010. Slides.

• Brand L. Niemann, Build EPA’s CASTNET In the Cloud, May 21 and 30, 2010. Slides, Mindtouch, and Spotfire.

• Brand L. Niemann, Build Your Own Data.gov/semantic with Mindtouch and Spotfire in the Cloud: The White House Visitor Database, May 22, 2010. Slides. See Data.gov takes the 'Mumsy' test, FCW, May 26, 2010.

• Brand L. Niemann, Build EPA's Facility Registry System (FRS) and Locational Reference Database with Mindtouch and Spotfire in the Cloud: Virginia, June 1, 2010.

• Brand L. Niemann, Build the UK’s COINS in the Data Science Library Cloud. Mindtouch and Slides. June 9, 2010.

• Brand L. Niemann, Build EPA's Envirofacts in the Cloud: Virginia FRS, NPL, and TRI. Mindtouch and Slides. June 14, 2010.

• Brand L. Niemann, Build the SemTech 2010 in the Cloud (Mindtouch and Spotfire), No Slides, July 2, 2010.

• Brand L. Niemann, Build the White House Staff Salaries in the Cloud (Mindtouch and Spotfire). Slides, July 3, 2010.

http://epadata.wik.is/EPA's_Synaptica#References