Data Science for RDA Climate Change Data Challenge and Meetup. Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community. Data Science for RDA Climate Change Data Challenge, September 28, 2015.

Upload: stuart-hubbard

Post on 24-Dec-2015

Page 1: Data Science for RDA Climate Change Data Challenge and Meetup Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data

1

Data Science for RDA Climate Change Data Challenge and Meetup

Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist

Semantic Community

Data Science for RDA Climate Change Data Challenge

September 28, 2015

2

NSF Graduate Data Science Workshop & Community Building, August 5-7, Seattle

• This NSF-sponsored, 2.5-day workshop, held August 5-7 on the University of Washington campus in Seattle, will bring together 100 graduate students from diverse domain sciences and engineering with data scientists from industry and academia to discuss and collaborate on Big Data / Data Science challenges.

• In addition to keynote presentations from high-profile speakers, the participants will present posters covering their own research and will work collaboratively to begin solving some of the Grand Challenge problems facing Data Enabled Science & Engineering disciplines.

• After the workshop, the output from the collaborative teams will be published in an open access environment. Through the shared work at the workshop and beyond, the participants will form lasting, collaborative relationships with their peers, senior academic partners, and industry participants, including those from Amazon, Google, and Microsoft.

• The workshop Grand Challenge topics will be selected from the highest scoring white paper submissions. During the workshop, attendees will form teams to work on the Grand Challenges.

• The authors of the very highest scoring white papers will be invited to give lightning talks of a few slides during the plenary session to describe their challenges or methods.

http://depts.washington.edu/dswkshp/

3

Purpose

• I think we will hold a meetup (or a series of meetups like this one) to support the NSF Data Science / Big Data Community, using the RDA Climate Change Data Challenge, climate.data.gov, and U.S. Climate Resilience Toolkit data sets that I am preparing to jump-start our meetup members and other data science meetup participants.

• Data Sets:
  • RDA Climate Data Challenge: only 17 of 64 could be used so far.
  • NTRD: 36 shapefiles (problem reading the largest file).
  • Climate.Data.gov: 16 of 38 used so far.
  • U.S. Climate Resilience Toolkit: 63 data sets used in 80 case studies. "Using Climate Data, Satellite Imagery, and Local Knowledge to Prevent Famine" uses 6 data sets (the maximum for any case study), so it would be the best one for integrating multiple data sets.
  • National Climate Assessment: 2,377 data sets, in addition to the 36 data tables I extracted from the report itself.

See: Spreadsheet
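The data audit counts above can be tallied with a short script. A minimal Python sketch, assuming the counts are typed in by hand from this slide; `AUDIT`, `usage_rate`, and `audit_report` are hypothetical names, not part of any existing tool, and only the sources with both a "used" and a "total" count are included:

```python
# Counts copied from the "Data Sets" list on this slide: (usable so far, total).
# Hypothetical structure for illustration only.
AUDIT = {
    "RDA Climate Data Challenge": (17, 64),
    "Climate.Data.gov": (16, 38),
}


def usage_rate(used: int, total: int) -> float:
    """Return the share of data sets used so far, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def audit_report(audit: dict) -> list:
    """Build one summary line per source, e.g. 'Source: 17/64 (26.6%)'."""
    lines = []
    for source, (used, total) in audit.items():
        lines.append(f"{source}: {used}/{total} ({usage_rate(used, total):.1f}%)")
    return lines


if __name__ == "__main__":
    for line in audit_report(AUDIT):
        print(line)
```

Run as is, this prints "RDA Climate Data Challenge: 17/64 (26.6%)" and "Climate.Data.gov: 16/38 (42.1%)", which makes the remaining audit work easy to track as more data sets become usable.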

4

Data Science for RDA Climate Change Data Challenge and Meetup

• Goal 1: Digital Catalog - Done
• Goal 2: Data Audit - Done
• Goal 3: Individual Data Sets in Spotfire - Done (RDA and NTRD)
• Goal 4: Integration/Applications - IN PROCESS (see right box)
• Goal 5: Meetups/Data Science Publication/MOOCs - IN PROCESS (see right box)

• An additional goal is to integrate climate.data.gov and the U.S. Climate Resilience Toolkit into one "seamless" system, which we will call "a Data Science Data Publication".

• This will be my challenge submission and experimentation day demo for the 6th RDA Plenary in Paris on September 23-25, and it will support the NSF Meetup of Data Science Meetups on November 6-7 in Washington, DC.

• Our own Meetup of Data Science Meetups, in preparation for the November 6-7 meetup, is tentatively planned for September 28th.
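One way to start on the integration goal is to merge the two catalogs into a single listing keyed on a normalized title, recording which source(s) each data set comes from. A minimal sketch in plain Python; the record fields (`title`, `url`), the sample titles, and the title-normalization rule are all assumptions for illustration, not the actual structure of either site's catalog:

```python
import re
from typing import Dict, List


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_catalogs(catalogs: Dict[str, List[dict]]) -> Dict[str, dict]:
    """Merge several catalogs into one index keyed on normalized title.

    Each merged entry keeps the first title seen, every URL, and the
    list of source catalogs that contain it.
    """
    merged: Dict[str, dict] = {}
    for source, records in catalogs.items():
        for record in records:
            key = normalize_title(record["title"])
            entry = merged.setdefault(
                key, {"title": record["title"], "urls": [], "sources": []}
            )
            entry["urls"].append(record["url"])
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return merged


if __name__ == "__main__":
    # Hypothetical sample records; real catalog exports will have other fields.
    catalogs = {
        "climate.data.gov": [
            {"title": "Example Sea Level Data", "url": "https://example.org/a"},
            {"title": "Example Drought Data", "url": "https://example.org/b"},
        ],
        "toolkit": [
            {"title": "Example Sea-Level Data", "url": "https://example.org/c"},
        ],
    }
    for entry in merge_catalogs(catalogs).values():
        print(entry["title"], entry["sources"])
```

Matching on titles is crude; where both catalogs share identifiers or landing-page URLs, keying on those instead would likely give fewer false matches.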

5

NSF Big Data Hubs and Data Science Meetups

• Initial Schedule:
  • Data Science Call I: June 12, 2015
  • Data Science Call II: June 18, 2015
  • In-person Meetup Workshop: Washington, DC, November 6-7, 2015

• Big Data Regional Innovation Hubs (Accelerating the Big Data Innovation Ecosystem):
  • Midwest
  • Northeast
  • South
  • West

• Initial Ideas:
  • Data Science YouTube channel or podcast
  • Angie's List for Data Scientists
  • Gathering groups working around the same domain, i.e., connecting people working on different global climate challenges

• Groups Participating:
  • Bayes Impact (San Francisco, CA; Non-profit)
  • Big Data Utah (Salt Lake City, Utah; Collaboration)
  • Boston Predictive Analytics (Boston, MA; Meetup)
  • Data Community DC (Washington, DC; Meetup)
  • Data Science ATL (Atlanta, GA; Meetup)
  • Data Science for Social Good (Chicago, IL; Fellowship Program)
  • DataKind (New York, NY; Nonprofit)
  • District Data Lab (Washington, DC; Meetup)
  • NYC Data Science (New York, NY; Meetup)
  • SF Data Mining (San Francisco, CA; Meetup)
  • Data Science Chicago (Chicago, IL; Meetup)
  • Data Science MD (Baltimore, MD; Meetup)
  • U.S. Ignite (Nation-wide; Communities Non-profit)
  • Analytics Club (Boston, MA; Meetup)
  • Data Science for Social Good Atlanta (Atlanta, GA; Fellowship Program)

https://bdhub.info/
http://data-science.meetup.com/
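The "gathering groups" idea can begin with the participant list itself. A small sketch that groups the organizations listed above by location, for example to see which groups could meet in person; the tuple layout and the `group_by_location` helper are assumptions, and the entries are transcribed from this slide:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (name, location, type), transcribed from the "Groups Participating" list.
GROUPS: List[Tuple[str, str, str]] = [
    ("Bayes Impact", "San Francisco, CA", "Non-profit"),
    ("Big Data Utah", "Salt Lake City, Utah", "Collaboration"),
    ("Boston Predictive Analytics", "Boston, MA", "Meetup"),
    ("Data Community DC", "Washington, DC", "Meetup"),
    ("Data Science ATL", "Atlanta, GA", "Meetup"),
    ("Data Science for Social Good", "Chicago, IL", "Fellowship Program"),
    ("DataKind", "New York, NY", "Nonprofit"),
    ("District Data Lab", "Washington, DC", "Meetup"),
    ("NYC Data Science", "New York, NY", "Meetup"),
    ("SF Data Mining", "San Francisco, CA", "Meetup"),
    ("Data Science Chicago", "Chicago, IL", "Meetup"),
    ("Data Science MD", "Baltimore, MD", "Meetup"),
    ("U.S. Ignite", "Nation-wide", "Communities Non-profit"),
    ("Analytics Club", "Boston, MA", "Meetup"),
    ("Data Science for Social Good Atlanta", "Atlanta, GA", "Fellowship Program"),
]


def group_by_location(groups: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Map each location to the alphabetically sorted names of groups based there."""
    by_location: Dict[str, List[str]] = defaultdict(list)
    for name, location, _kind in groups:
        by_location[location].append(name)
    return {loc: sorted(names) for loc, names in by_location.items()}


if __name__ == "__main__":
    for location, names in sorted(group_by_location(GROUPS).items()):
        print(f"{location}: {', '.join(names)}")
```

The same grouping could be done on a "domain" column instead of location once each group's focus area is recorded.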

6

http://data-science.meetup.com/

My Note: Start to join these groups and invite them to the September 28th Meetup.

7

Data Mining Data.gov and U.S. Climate Resilience Toolkit

• http://www.data.gov/climate/: Themes, Data, Resources, Challenges, FAQ, Contact Climate, Other?

• http://toolkit.climate.gov/: Get Started, Taking Action, Tools, Topics, Expertise, About, Contact, Funding Opportunities, FAQ

8

http://www.data.gov/climate/

9

Spreadsheet

My Note: Requested and received a spreadsheet of the 547 data sets and one of all 100,000+ data sets, so I can integrate the catalog with the actual data sets.
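Integrating the catalog with the actual data sets means following each catalog entry to its downloadable files. If the catalog is pulled programmatically rather than as a spreadsheet, a minimal sketch could look like the following. It assumes the standard CKAN Action API `package_search` response shape (Data.gov's catalog has historically run on CKAN, though the endpoint may have changed since 2015); the sample JSON is made up, and only the `result.results[].resources[].format/url` fields are used:

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

# Assumed endpoint: the standard CKAN Action API path on catalog.data.gov.
CATALOG_API = "https://catalog.data.gov/api/3/action/package_search"


def search_url(query: str, rows: int = 10) -> str:
    """Build a CKAN package_search URL (not fetched here)."""
    return f"{CATALOG_API}?{urlencode({'q': query, 'rows': rows})}"


def csv_resources(response_text: str) -> List[Tuple[str, str]]:
    """Return (data set title, CSV URL) pairs from a package_search response."""
    response = json.loads(response_text)
    pairs = []
    for package in response["result"]["results"]:
        for resource in package.get("resources", []):
            # Formats are free text in CKAN, so compare case-insensitively.
            fmt = (resource.get("format") or "").strip().upper()
            if fmt == "CSV":
                pairs.append((package["title"], resource["url"]))
    return pairs


if __name__ == "__main__":
    # Made-up response in the CKAN shape; real responses carry many more fields.
    sample = json.dumps({
        "success": True,
        "result": {
            "count": 2,
            "results": [
                {"title": "Example Temperature Series",
                 "resources": [{"format": "CSV", "url": "https://example.org/t.csv"},
                               {"format": "HTML", "url": "https://example.org/t"}]},
                {"title": "Example Map Service",
                 "resources": [{"format": "WMS", "url": "https://example.org/wms"}]},
            ],
        },
    })
    print(search_url("climate"))
    print(csv_resources(sample))
```

Catalog entries with no CSV (or other tabular) resource are the ones where a search finds a description of the data but not the data itself.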

10

Spreadsheet

My Note: See the data imported and filtered in Spotfire on the next slide.

11

My Note: First example in the next tab (in process).

12

http://toolkit.climate.gov/

13

http://toolkit.climate.gov/help/partners

14

Expertise

My Note: From map popups to MindTouch to spreadsheet to Spotfire.
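The "map popups to spreadsheet" step amounts to flattening one record per map marker into one spreadsheet row. A minimal sketch using Python's standard `csv` module; the field names (`name`, `organization`, `topics`, `location`) and the sample record are hypothetical, since the actual popup fields are not listed here:

```python
import csv
import io
from typing import Iterable

# Hypothetical column layout for the exported spreadsheet.
FIELDS = ["name", "organization", "topics", "location"]


def popups_to_csv(popups: Iterable[dict]) -> str:
    """Write popup records as CSV text that a spreadsheet or Spotfire can import.

    Missing fields become empty cells; fields not in FIELDS are dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for popup in popups:
        row = dict(popup)
        # Multi-valued fields become one semicolon-separated cell.
        if isinstance(row.get("topics"), list):
            row["topics"] = "; ".join(row["topics"])
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{
        "name": "Example Expert",
        "organization": "Example Agency",
        "topics": ["Coastal", "Water"],
        "location": "Example City",
        "popup_id": 1,  # extra field, ignored by the writer
    }]
    print(popups_to_csv(sample))
```

Writing plain CSV keeps the intermediate step tool-neutral: the same file opens in a spreadsheet and imports into Spotfire.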

16

http://toolkit.climate.gov/training-courses

17

Spreadsheet

My Note: These can be filtered in the spreadsheet and in Spotfire.

18

My Note: Filter by Type of Training and/or Difficulty Scale.
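Filtering by type of training and/or difficulty, with either filter optional, can be sketched as below. The course records, the field names `type` and `difficulty`, and their values are hypothetical placeholders, not the toolkit's actual categories:

```python
from typing import Iterable, List, Optional


def filter_courses(
    courses: Iterable[dict],
    training_type: Optional[str] = None,
    difficulty: Optional[str] = None,
) -> List[dict]:
    """Keep courses matching every filter that is given; None means 'any'."""
    selected = []
    for course in courses:
        if training_type is not None and course.get("type") != training_type:
            continue
        if difficulty is not None and course.get("difficulty") != difficulty:
            continue
        selected.append(course)
    return selected


if __name__ == "__main__":
    # Hypothetical course records.
    courses = [
        {"title": "Course A", "type": "Online course", "difficulty": "Beginner"},
        {"title": "Course B", "type": "Webinar", "difficulty": "Beginner"},
        {"title": "Course C", "type": "Online course", "difficulty": "Advanced"},
    ]
    print([c["title"] for c in filter_courses(courses, training_type="Online course")])
    print([c["title"] for c in filter_courses(courses, "Online course", "Beginner")])
```

Passing one argument filters on that field alone; passing both narrows to courses that satisfy both, which mirrors applying two filters at once in a spreadsheet or Spotfire.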

19

Climate Explorer—Visualizing Climate Data in Maps and Graphs

• Climate Explorer is a research application built to support the U.S. Climate Resilience Toolkit. The tool offers interactive visualizations for exploring maps and data related to the toolkit's Taking Action case studies.

• Map layers in the tool represent geographic information available through climate.data.gov. Each layer's source and metadata can be accessed through its information icon. Climate Explorer graphs display 1981-2010 U.S. Climate Normals for temperature and precipitation, overlain with daily observations from the Global Historical Climatology Network-Daily (GHCN-D) database. Please note that GHCN-D data have been checked for obvious inaccuracies, but they have not been adjusted to account for the influences of historical changes in instrumentation and observing practices. GHCN-D data are useful for comparing weather and climate, but for long-term climate change analyses, we recommend the National Climatic Data Center's Climate at a Glance.

Climate Explorer
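The comparison Climate Explorer draws (daily GHCN-D observations against a climate normal) can be reproduced outside the tool, e.g. before loading the results into Spotfire. A minimal sketch that parses one record of a GHCN-Daily `.dly` file using its fixed-width layout (11-character station ID, year, month, 4-character element, then 31 eight-character day groups of VALUE, MFLAG, QFLAG, SFLAG, with -9999 for missing) and reports each day's departure from a normal. The single monthly `normal` value, the station ID, and the observations below are made up; real 1981-2010 normals are published per station and per day:

```python
from typing import Dict, List, Optional, Tuple

MISSING = -9999


def parse_dly_line(line: str) -> dict:
    """Parse one fixed-width GHCN-Daily record (one station, month, element)."""
    values: Dict[int, Optional[float]] = {}
    for day in range(1, 32):
        start = 21 + (day - 1) * 8
        raw = int(line[start:start + 5])
        qflag = line[start + 6]
        # Treat missing values and values that failed a QC check as absent.
        if raw == MISSING or qflag != " ":
            values[day] = None
        else:
            # TMAX, TMIN and PRCP are stored in tenths (of deg C or mm).
            values[day] = raw / 10.0
    return {
        "station": line[0:11],
        "year": int(line[11:15]),
        "month": int(line[15:17]),
        "element": line[17:21],
        "values": values,
    }


def departures(record: dict, normal: float) -> Dict[int, float]:
    """Difference between each valid daily value and a single normal value."""
    return {
        day: round(value - normal, 1)
        for day, value in record["values"].items()
        if value is not None
    }


def make_line(station: str, year: int, month: int, element: str,
              daily: List[Tuple[int, str]]) -> str:
    """Build a synthetic .dly record for testing; MFLAG and SFLAG are blank."""
    head = f"{station:<11}{year:04d}{month:02d}{element:<4}"
    return head + "".join(f"{value:>5d} {qflag:1} " for value, qflag in daily)


if __name__ == "__main__":
    # Synthetic July record: days 1-2 valid, day 3 flagged "X" (failed bounds check).
    daily = [(MISSING, " ")] * 31
    daily[0] = (312, " ")
    daily[1] = (298, " ")
    daily[2] = (405, "X")
    line = make_line("USC00000001", 2015, 7, "TMAX", daily)
    record = parse_dly_line(line)
    print(departures(record, normal=29.0))  # {1: 2.2, 2: 0.8}
```

Dropping QC-flagged values follows the caveat above: GHCN-D is checked for obvious errors but not homogenized, so departures like these suit weather-versus-climate comparisons rather than long-term trend analysis.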

20

http://toolkit.climate.gov/climate-explorer/

My Note: This is like the Spotfire analysis of the NTRD that I just did! I can reproduce these in Spotfire.

21

http://toolkit.climate.gov/crt-search

22

http://toolkit.climate.gov/crt-search?query=*&resource=18

My Note: You find the word "datasets" but not the data! Spreadsheets and Spotfire show you the data (e.g., CSV)!
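The point of this note, that a CSV file shows the data itself rather than a description of it, is easy to demonstrate: a few lines of standard-library Python list a CSV file's columns, row count, and first rows. The `preview_csv` helper and the sample content are hypothetical; in practice the text would come from a downloaded file:

```python
import csv
import io
from typing import List, Tuple


def preview_csv(text: str, n: int = 3) -> Tuple[List[str], int, List[List[str]]]:
    """Return (column names, number of data rows, first n data rows)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])  # empty list if the file is empty
    rows = list(reader)
    return header, len(rows), rows[:n]


if __name__ == "__main__":
    # Made-up sample content.
    sample = (
        "date,tmax_c\n"
        "2015-07-01,31.2\n"
        "2015-07-02,29.8\n"
        "2015-07-03,30.4\n"
        "2015-07-04,28.9\n"
    )
    columns, count, head = preview_csv(sample, n=2)
    print(columns, count)  # ['date', 'tmax_c'] 4
    for row in head:
        print(row)
```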

23

24

Conclusions and Recommendations

• In support of the NSF Data Science / Big Data Community and the Research Data Alliance (RDA), Semantic Community has prepared four multi-data-set collections, drawn from the RDA Climate Change Data Challenge, the U.S. National Transportation Atlas Database (NTRD), Climate.data.gov, and the U.S. Climate Resilience Toolkit. They are intended to jump-start members of the Federal Big Data Working Group Meetup and other data science meetup participants for our September 28th Meetup of Data Science Meetups, in preparation for the NSF Meetup of Data Science Meetups on November 6-7, 2015.

• All of the information is available as a Data FAIRPort (Findable, Accessible, Interoperable, and Reusable) in a Data Science Commons or Hub, as a community service. Suggestions and feedback are welcome.