Data Science for EPA EnviroAtlas 2015, Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community, October 19, 2015


Data Science for EPA EnviroAtlas 2015

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community

Data Science
Data Science for EPA EnviroAtlas
Data Science for EPA Big Data Analytics

October 19, 2015


Overview

• Previous:
  • EPA Data Science Products (while at EPA circa 2008-2010)
  • EPA/NASA Climate-Environmental Data Analytics & A Redesigned, Open Data.gov Meetup (May 6, 2014)
  • Data Science for EPA EnviroAtlas (June 6, 2014)
  • Data Science for EPA Big Data Analytics (April 17, 2015)
  • President's Chief Data Scientist and EPA Big Data Analytics Meetup (April 20, 2015)
  • Uncovering Hydraulic Fracturing Trends and Data with TIBCO Spotfire (September 1, 2015)
  • Sensing Our Air: The Quest for Big Data About Our Air Quality (October 19, 2015)
• Present:
  • Translating Big Data into Big Climate Ideas (April 2015)
  • Joan Aron Comments and Questions


Translating Big Data into Big Climate Ideas

• By Brian R. Pickard (U.S. EPA, Landscape Ecology Division), Jeremy Baynes (Contributor), Megan Mehaffey (Contributor), Anne C. Neale (Contributor), Solutions Vol. 6, Issue 1, April 2015.

• In Brief: Climate change has emerged as the significant environmental challenge of the 21st century. Therefore, understanding our changing world has forced researchers from many different fields of science to join together to tackle complicated research questions. The climate change research community now faces the daunting task of disseminating massive amounts of information about possible future climates under differing scenarios to a broad audience. They also need to make the data readily accessible so that it can be used by scientists in other research fields. One potential solution for distribution and communication of the climate scenario information may be through the EnviroAtlas, a new geospatial application developed by the United States Environmental Protection Agency and its partners. This interactive mapping tool allows users to access and explore climate change modeling information in easily understandable formats while providing a range of information on different ecosystem goods and services, or the benefits people receive from nature. By incorporating future scenarios such as land use and climate change within EnviroAtlas, we can evaluate specific components of complex ecosystems within the context of forecasted futures. Linking climate change impacts to ecosystem services, such as clean air and water, allows for opportunities to demonstrate how climate change will impact ecosystems, societies, and human health.

http://www.thesolutionsjournal.org/node/237304


Joan Aron Comments and Suggestions 1

• I just sent EnviroAtlas a question about finding the climate data discussed in that Solutions article. I could not find climate data.
• According to the Solutions article, NASA provided climate data to use EnviroAtlas as a platform for disseminating climate data for ecological applications. I'm trying to build on that platform to leverage the NASA and EPA investment for water quality decision-making related to ecosystem function.
• It would also be nice to connect to other applications like your Precision Farming work. If we can link your agricultural info to EnviroAtlas, that could be interesting since agricultural runoff contributes a lot of pollution and is a focus for pollution control.
• The characterization of agriculture is also important since USDA covers crops, forests, ranching and CAFOs.


Response from Anne Neale, EnviroAtlas Project Lead, US EPA, RTP, NC

• Many apologies that you were not able to find the data we described. We are sorry and embarrassed that the data are not available as promised in the article. When we published the article, we felt sure that we would have the data available by the time the article was published. We ran into some unforeseen complications and although we have all the data and software available on our development server, we have not been able to push that to our public facing site yet. We anticipate that the issue will be resolved soon and we can let you know as soon as the data are accessible.
• On July 22, 2015, the EnviroAtlas team will be hosting an informational meeting in downtown Portland, OR to showcase newly available data. Register here to join us!

http://enviroatlas.epa.gov/enviroatlas/Status/index.html


http://enviroatlas.epa.gov/enviroatlas/Datadownload/index.html


Disclaimer 1

• The data for this download Esri File GeoDataBase (FGDB) contains many data tables and feature classes. Varying numbers of attributes within each data table were used to map the layers in the EnviroAtlas Interactive Map. We encourage users to evaluate the associated metadata for the tables and feature classes through the EnviroAtlas Interactive Map, the XML files included in this zip file, or the EPA’s Environmental Dataset Gateway (EDG).
• EnviroAtlas is an interactive web-based decision support tool designed to provide information about ecosystems and the benefits people receive from those ecosystems. This application represents the first peer-reviewed public release. EnviroAtlas will continue to evolve over the coming years with periodic public releases. We welcome and request your feedback on all aspects of the website and tools.


Disclaimer 2

• EnviroAtlas is designed for use by government, professional, academic, and community users, as well as members of the public. EnviroAtlas does not require special software, technical expertise, or a scientific background. However, it is the responsibility of the user to read and evaluate dataset limitations, restrictions, and intended use. To the best of our knowledge, the data and information on this website are accurate, but no warranty expressed or implied is made regarding the accuracy or utility of the data for general or scientific purposes, nor shall the act of distribution constitute any such warranty. All modeled geographic data are, by their nature, imperfect and the data provided in this Atlas should not be taken as absolute truth but as the best approximation of that truth based on best available data. For site-specific data, EnviroAtlas data will not replace "boots-on-the-ground measurements" or local knowledge. Neither EPA, EPA contractors, nor any other organizations cooperating with EPA assume any responsibility for damages or other liabilities related to the accuracy, availability, use, or misuse of the information provided on this website. EPA reserves the right to change information at any time without public notice. Any errors or omissions should be reported to the EnviroAtlas Team.

• We are always happy to hear your feedback and use that feedback for future enhancements.
• I understand and agree to the terms and conditions of this disclaimer.


National: 34 Files, 29.8 MB


The crop acres without pollinator habitat data were developed using the USDA National Agricultural Statistics Service cropland data (http://www.nass.usda.gov/research/cropland/SARS1a.htm) merged with the National Land Cover Database 2006 dataset (http://www.mrlc.gov/nlcd2006_data.php). The maximum pollinator flight distance was set based on literature (see J.H. Cane's 2001 article in Ecology and Society, "Habitat Fragmentation and Native Bees: a Premature Verdict?"), which set the average home range radius of pollinators at 2.8 km. Euclidean distance from each crop pixel was used to select the pixels without nearby habitat, and the selected pixels were then summed by 12-digit HUC. A comprehensive list of metric creation steps is included in the Processing Steps.
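The selection-and-sum logic described above can be sketched on a toy grid. This is a minimal illustration, not the EnviroAtlas processing chain: the land-cover codes ("C" crop, "H" habitat, "O" other), the function name, the 1 km cell size, and the strict "farther than 2.8 km" comparison are all assumptions made for the example. Only the 2.8 km radius and the sum-by-HUC step come from the text.

```python
import math
from collections import defaultdict

ACRES_PER_KM2 = 247.105  # standard area conversion


def crop_acres_without_habitat(landcover, huc_ids, cell_km, max_km=2.8):
    """Sum crop area (acres) per HUC for crop cells with no habitat within max_km.

    landcover: rows of codes, "C" = crop, "H" = habitat (hypothetical coding).
    huc_ids:   grid of HUC identifiers with the same shape as landcover.
    """
    habitat = [(r, c)
               for r, row in enumerate(landcover)
               for c, code in enumerate(row) if code == "H"]
    cell_acres = cell_km * cell_km * ACRES_PER_KM2
    totals = defaultdict(float)
    for r, row in enumerate(landcover):
        for c, code in enumerate(row):
            if code != "C":
                continue
            # Euclidean distance (km) from this crop cell to the nearest habitat cell
            nearest = min((math.hypot(r - hr, c - hc) * cell_km
                           for hr, hc in habitat), default=math.inf)
            if nearest > max_km:  # assumption: strictly beyond the radius
                totals[huc_ids[r][c]] += cell_acres
    return dict(totals)


# Toy 2 x 5 grid with 1 km cells; HUC labels "A" and "B" are placeholders.
landcover = ["HCCCC",
             "OCCCC"]
huc_ids = [["A", "A", "A", "B", "B"],
           ["A", "A", "A", "B", "B"]]
result = crop_acres_without_habitat(landcover, huc_ids, cell_km=1.0)
```

In this toy grid only the four crop cells in HUC "B" lie more than 2.8 km from the single habitat cell. The brute-force nearest-neighbour search is fine for a sketch; a real raster at NLCD resolution would need a proper distance transform.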


Portland: 22 Files, 656 MB


Joan Aron Comments and Suggestions 2

• I need to have more metadata to understand the net flux of pollutants leaving agricultural fields and the locations where the flux applies. There may be multiple sources of data in the same watershed.
• I am especially interested in three geographical areas for the examples:
  • Around Coos (Oregon) HUC 17100304 - Tenmile Lakes should be nearby
  • Around Lower Maumee (Ohio) HUC 04100009 - Maumee watershed produces flux of nutrients to Lake Erie at Toledo
  • Around Tensas (Louisiana) HUC 08050003 - near Mississippi River
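Because hydrologic unit codes are hierarchical, a 12-digit HUC nests inside the 8-digit HUC formed by its first eight digits. The sketch below shows one possible way to roll 12-digit values up to the three 8-digit units listed above. The 12-digit codes and the numeric values are made-up placeholders, and the function name is hypothetical; only the three 8-digit codes and their names come from the slide.

```python
from collections import defaultdict

# 8-digit HUCs named on the slide. Kept as strings so leading zeros survive.
HUC8_OF_INTEREST = {
    "17100304": "Coos (Oregon)",
    "04100009": "Lower Maumee (Ohio)",
    "08050003": "Tensas (Louisiana)",
}


def rollup_to_huc8(values_by_huc12):
    """Sum per-HUC12 values into their parent HUC8 (the first 8 digits)."""
    totals = defaultdict(float)
    for huc12, value in values_by_huc12.items():
        if len(huc12) != 12 or not huc12.isdigit():
            raise ValueError(f"not a 12-digit HUC: {huc12!r}")
        totals[huc12[:8]] += value
    return dict(totals)


# Placeholder HUC12 codes and values, for illustration only.
sample = {
    "171003040101": 10.0,
    "171003040102": 5.0,
    "041000090301": 7.5,
    "080500030205": 2.0,
    "010100020101": 99.0,  # outside the three areas of interest
}
rolled = rollup_to_huc8(sample)
selected = {HUC8_OF_INTEREST[h]: v
            for h, v in rolled.items() if h in HUC8_OF_INTEREST}
```

Storing the codes as strings rather than integers matters here: 04100009 would lose its leading zero as a number and no longer match by prefix.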


Response

• I think I found the best watershed and pollution data, which I am working with for my precision ag online course:
  • http://nugis.ipni.net/About%20NuGIS/
• NuGIS covers all counties, all watersheds, and all HUC8, but the data is not complete for all parameters.


Coos, Oregon


Oregon