Data Science for EPA Big Data Analytics: Oregon Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community http://semanticommunity.info/ http://www.meetup.com/Virginia-Big-Data-Meetup/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup April 21, 2015


Page 1: Data Science for EPA Big Data Analytics: Oregon Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community

1

Data Science for EPA Big Data Analytics: Oregon Data

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist

Semantic Community
http://semanticommunity.info/

http://www.meetup.com/Virginia-Big-Data-Meetup/ http://www.meetup.com/Federal-Big-Data-Working-Group/

http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

April 21, 2015

Page 2

Data Science Process and Questions

• How was the data collected?
  – Dr. Joan Aron, Subject Matter Expert, helped me find it.
• Where is the data stored?
  – In the Oregon Geospatial Data Clearinghouse, and in MindTouch, Excel, and Spotfire.
• What are the data results?
  – The Oregon Geospatial Data Clearinghouse is an excellent source for data science work, as these slides show.
• Why should we believe the data results?
  – The Oregon Geospatial Data Clearinghouse has high-quality data, and I am a competent data scientist.

Page 3

Oregon.gov: Home Page

http://www.oregon.gov/pages/index.aspx

Data

Page 4

Oregon.gov: Search for Data

http://www.oregon.gov/Pages/index.aspx#search?q=Data

My Note: Let’s look at:
• Oregon Geospatial Data Clearinghouse
• Oregon | Open Data
• Oregon Spatial Data Library: Department of Administrative Services

Page 5

Oregon Geospatial Data: Clearinghouse

http://www.oregon.gov/DAS/pages/irmd/geo/sdlibrary.aspx

Page 6

Oregon Geospatial Data: Alphalist

http://www.oregon.gov/DAS/CIO/GEO/pages/alphalist.aspx

My Note: This list could serve as a data catalog in a spreadsheet.
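
A minimal sketch of that idea, assuming the Alphalist is an HTML page whose data sets appear as `<a>` links. The sample markup and file names below are hypothetical, not copied from the page. Python's standard `html.parser` collects each link's text and URL, and `csv` writes them out as a catalog that Excel can open.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (link text, href) pairs from <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open <a> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.rows.append({"title": title, "url": self._href})
            self._href = None


def html_to_catalog_csv(html: str) -> str:
    """Return a CSV catalog (title,url) built from the links in an HTML page."""
    parser = LinkCollector()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample markup; the real Alphalist page layout may differ.
sample = """
<ul>
  <li><a href="/geo/forested_lands.zip">Forested Lands (Shape)</a></li>
  <li><a href="/geo/forest_types_1914.zip">Forest Types (1914) (Shape)</a></li>
</ul>
"""
print(html_to_catalog_csv(sample))
```

Saving the returned text to a `.csv` file gives a one-row-per-data-set catalog that can then be sorted, filtered, or loaded into Spotfire.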

Page 7

Oregon: Open Data

https://data.oregon.gov/

Page 8

Oregon: Open Data Catalog

https://data.oregon.gov/browse

My Note: I would like the entire catalog in a spreadsheet. I already made one for the GeoData catalog! The metrics suggest that only a small number of the 1,732 listed items are downloadable files.
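
The download metric can be checked by tallying catalog entries by type. This sketch assumes the catalog has been exported to CSV with a column named `type` and that the value `file` marks a downloadable file. The column name, its values, and the sample rows are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical catalog export; the "type" column and its values are
# assumptions, not a documented data.oregon.gov format.
catalog_csv = """name,type
Dataset A,dataset
Dataset B,file
Dataset C,map
Dataset D,file
Dataset E,href
"""


def count_by_type(text: str) -> Counter:
    """Count catalog entries per asset type."""
    return Counter(row["type"] for row in csv.DictReader(io.StringIO(text)))


counts = count_by_type(catalog_csv)
total = sum(counts.values())
downloadable = counts["file"]  # assumption: "file" marks a downloadable file
print(f"{downloadable} of {total} entries are downloadable files")
```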

Page 9

Oregon Spatial Data Library: Department of Administrative Services

http://spatialdata.oregonexplorer.info/geoportal/catalog/main/home.page

My Note: This appears to be the same as the Alphalist.

Page 11

Data Science for Oregon Data: MindTouch Knowledge Base Find

My Note: Find Forestry (44)
1. First 4: Joan Aron’s
2. Next 3:
   • Forested Lands (Shape)
   • Forest Ownership (western Oregon) (Shape)
   • Forest Types (1914) (Shape)
3. Next 3:
   • Wildfires: Communities At Risk Data, 2005. Oregon Department of Forestry (ODF) (22)
   • Classified Forestland - Urban Interface (SB360) (Shape)
   • Value: Forest (GRID)

Data Science for EPA Big Data Analytics Oregon Data

My Note: Shape (112)
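
The search above amounts to a keyword filter plus a count of the format label in parentheses at the end of each title. This sketch uses the titles listed in the note. The keyword `forest` and the regular expression are illustrative choices, and the code does not reproduce the MindTouch search or its counts.

```python
import re
from collections import Counter

# Trailing "(Format)" at the end of a catalog title, e.g. "(Shape)" or "(GRID)".
FORMAT_SUFFIX = re.compile(r"\(([^()]+)\)\s*$")


def find_entries(titles, keyword):
    """Return titles that contain keyword (case-insensitive)."""
    k = keyword.lower()
    return [t for t in titles if k in t.lower()]


def format_of(title):
    """Return the trailing parenthesized format label, or None if absent."""
    m = FORMAT_SUFFIX.search(title)
    return m.group(1) if m else None


titles = [
    "Forested Lands (Shape)",
    "Forest Ownership (western Oregon) (Shape)",
    "Forest Types (1914) (Shape)",
    "Classified Forestland - Urban Interface (SB360) (Shape)",
    "Value: Forest (GRID)",
]
hits = find_entries(titles, "forest")
print(Counter(format_of(t) for t in hits))
```

Anchoring the pattern at the end of the title is what keeps labels such as `(1914)` or `(SB360)` from being mistaken for the format.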

Page 12

Data Science for Oregon Data: Spreadsheet Knowledge Base Find

OregonData.xlsx

Page 13

FME Workbench: OWRI GDB-to-SHP

http://www.safe.com/
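
The slides use FME Workbench for this conversion. As a swapped-in open-source alternative (not the author's workflow), GDAL's `ogr2ogr` can write each layer of a file geodatabase as a shapefile in an output directory. The sketch below only builds and runs that command. The file names `OWRI.gdb` and `owri_shp` are hypothetical, and shapefile limits such as 10-character field names still apply.

```python
import shutil
import subprocess


def gdb_to_shp_command(gdb_path: str, out_dir: str) -> list[str]:
    """Build an ogr2ogr command that writes every layer of a file
    geodatabase as a shapefile inside out_dir."""
    return ["ogr2ogr", "-f", "ESRI Shapefile", out_dir, gdb_path]


def convert(gdb_path: str, out_dir: str) -> None:
    """Run the conversion, failing early if GDAL's ogr2ogr is not installed."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr (GDAL) is not on PATH")
    subprocess.run(gdb_to_shp_command(gdb_path, out_dir), check=True)


# Hypothetical file names; substitute the downloaded geodatabase.
print(gdb_to_shp_command("OWRI.gdb", "owri_shp"))
```

Passing the command as a list (rather than one shell string) avoids quoting problems with the space in `ESRI Shapefile`.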

Page 15

Data Science for Oregon Data: Spotfire GeoData Alpha List Data Set

My Note: This Excellent Catalog Needed to Be a Searchable Linked Data Set!

Web Player

Page 16

Data Science for Oregon Data: Spotfire OWRI GDB-to-Shape Files

My Note: One mapper (Ashley Seim) has 8914 rows of points! I cannot get the OWSR polygons to display.
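
Finding that one mapper accounts for a large share of the rows is a group-and-count over the attribute table. In this sketch the records and the field name `mapper` are hypothetical stand-ins for the shapefile's attribute table, which would first need to be read with a shapefile library.

```python
from collections import Counter

# Toy point records; "mapper" is a hypothetical stand-in for whatever
# attribute column holds the mapper's name in the real shapefile.
points = [
    {"project_id": 1, "mapper": "A"},
    {"project_id": 2, "mapper": "B"},
    {"project_id": 3, "mapper": "A"},
    {"project_id": 4, "mapper": "A"},
]


def rows_per_value(records, field):
    """Count records for each distinct value of field, most frequent first."""
    return Counter(r[field] for r in records).most_common()


print(rows_per_value(points, "mapper"))  # [('A', 3), ('B', 1)]
```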

Web Player

Page 18

Data Science for Oregon Data: Spotfire Forestland Shape File

My Note: The VEGNAME for most of the total area is not specified!
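
The unspecified share can be computed by summing polygon area where `VEGNAME` is blank or missing and dividing by the total area. The area field name `ACRES` and the sample polygons are hypothetical; the real attribute table may store area under a different name or unit.

```python
def unspecified_area_share(records, name_field="VEGNAME", area_field="ACRES"):
    """Fraction of total area whose vegetation name is blank or missing.
    Returns 0.0 for an empty input. ACRES is a hypothetical field name."""
    total = 0.0
    unspecified = 0.0
    for r in records:
        area = float(r[area_field])
        total += area
        name = r.get(name_field)
        if name is None or not str(name).strip():
            unspecified += area
    return unspecified / total if total else 0.0


# Hypothetical sample polygons for illustration only.
polygons = [
    {"VEGNAME": "", "ACRES": 600.0},
    {"VEGNAME": "Douglas-fir", "ACRES": 250.0},
    {"VEGNAME": None, "ACRES": 150.0},
]
print(f"{unspecified_area_share(polygons):.0%} of the area has no VEGNAME")
```

Weighting by area rather than counting rows matters here: a few large unnamed polygons can dominate the total even if most rows have a name.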

Web Player

Page 19

Data Science for Oregon Data: Spotfire West Forestown Shape File

My Note: See the Dynamic Linking Between Visualizations!

Web Player

Page 20

Conclusions and Recommendations

• Subject matter expert Dr. Joan Aron provides a use case (Oregon forestry) and suggested links to geospatial data sets.

• Oregon.gov has an excellent Open Data portal, Geospatial Data Clearinghouse, and Spatial Data Library, but no catalog data set.

• An Oregon Geospatial Catalog Data Set was created to aid in the selection of specific data sets and their metadata.

• MindTouch, Spreadsheet, and Spotfire Knowledge Bases were created to support the Data Science Data Publication for this use case.

• An Open Data Catalog Data Set could be used with the Oregon Geospatial Catalog Data Set to produce more Data Science Use Cases.
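
One way to combine the two catalogs is to match entries on a normalized title. This sketch assumes both catalogs are lists of records with a `title` field and that titles agree once a trailing format label such as `(Shape)` is dropped. The sample entries are hypothetical, and a shared identifier would be a more reliable key than titles.

```python
import re


def title_key(title: str) -> str:
    """Normalize a title for matching: drop one trailing "(Format)" label,
    lowercase it, and collapse runs of whitespace."""
    title = re.sub(r"\s*\([^()]*\)\s*$", "", title)
    return " ".join(title.lower().split())


def join_catalogs(open_data, geo):
    """Return (open_data_entry, geo_entry) pairs whose normalized titles match.
    If two geo entries share a key, the later one wins."""
    geo_by_key = {title_key(g["title"]): g for g in geo}
    pairs = []
    for entry in open_data:
        match = geo_by_key.get(title_key(entry["title"]))
        if match is not None:
            pairs.append((entry, match))
    return pairs


# Hypothetical sample entries for illustration only.
open_data = [{"title": "Forested Lands"}, {"title": "Water Quality Samples"}]
geo = [{"title": "Forested Lands (Shape)"}, {"title": "Value: Forest (GRID)"}]

for od, gd in join_catalogs(open_data, geo):
    print(od["title"], "<->", gd["title"])
```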

Page 22

Oregon’s 2012 Integrated Report – Submitted to EPA for Review and Action

Page 23

FME Workbench: 2012 Assessment GDB-to-SHP

http://www.safe.com/

Page 28

Conclusions and Recommendations

• The Oregon Land Use GIS Data and the GIS data for Oregon’s 2012 Integrated Report – Submitted to EPA for Review and Action have been visualized in Spotfire.
  – See Data Science Data Publication (in process): Data Science for EPA Big Data Analytics and Oregon Data
• Data Science for Oregon Harmful Algal Bloom (HAB) Data has begun: data are being extracted from the Web (Word & Excel) and PDF and integrated for Spotfire analytics and visualizations.
  – See Data Science Data Publication (in process): Data Science for EPA Big Data Analytics and Oregon HAB Data

Page 30

Data Science for Oregon HAB Data: PDF Report to Excel Knowledge Base

Oregon HAB Data

Page 36

Conclusions and Recommendations

• HAB Strategy Report Appendices B and C tables (PDF) and Web downloads (Word & Excel) were extracted and integrated for Spotfire analytics and visualizations.

• The Spotfire interactive graphics support Exploratory Data Analysis.

• The data sets require semantic harmonization of Basin Names for further integration.
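
Harmonizing basin names typically means normalizing spelling and then mapping variants to one canonical name. In this sketch the alias table and the sample inputs are hypothetical; names that do not match return `None` so they can be reviewed by hand rather than guessed.

```python
import re
from typing import Optional

# Hypothetical alias table: normalized variant spellings mapped to one
# canonical basin name. A real table would be built from the data sets.
ALIASES = {
    "mid coast": "Mid Coast",
    "midcoast": "Mid Coast",
    "south coast": "South Coast",
    "willamette": "Willamette",
}


def normalize(name: str) -> str:
    """Lowercase, treat hyphens as spaces, drop a trailing word 'basin',
    and collapse whitespace."""
    s = name.lower().replace("-", " ").strip()
    s = re.sub(r"\bbasin\b\s*$", "", s)
    return " ".join(s.split())


def harmonize(name: str) -> Optional[str]:
    """Return the canonical basin name, or None if it is not in the table."""
    return ALIASES.get(normalize(name))


for raw in ["Mid-Coast Basin", "WILLAMETTE", "Midcoast", "Tenmile"]:
    print(raw, "->", harmonize(raw))
```

Once every source carries the same canonical name, the harmonized column can serve as the key for relating the HAB tables to each other.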

Page 37

Oregon HAB Tenmile Lakes-Spotfire Manage Relations