Data Science Publication for NSF Polar Cyberinfrastructure

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

November 4, 2014

2

Preface

• Some prep work is already underway (if you scour the Open Science Codefest site you will find some) to prepare some datasets of relevance to the Polar community. We will provide some of this prepared data to interested parties ahead of the workshop in the next few weeks in case folks want to start hacking early. We will tweet under the hash tag: #nsfpolardatavis

Source: Chris Mattmann

3

Overview

• Build the knowledge base (in MindTouch) and spreadsheet (in Excel) first, which then makes it easier to “storify” the results in the Spotfire (data browser) application.

• Follow the Cross-Industry Standard Process for Data Mining (CRISP-DM):
  – 1 Business Understanding (of the Hackathon),
  – 2 Data Understanding (by mining the Sessions),
  – 3 Data Preparation (by screen scraping and downloading),
  – 4 Modeling (enough data for statistical significance?),
  – 5 Evaluation (How collected? Where stored? What results? Believe them?), and
  – 6 Deployment (Story and Demo).

• The documentation will be in the form of the Data Science Publication for NSF Polar Cyberinfrastructure.

• My goal is to see if I can integrate and federate these multiple data sources.
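As a rough illustration of what "integrate and federate" could mean at the spreadsheet stage, the sketch below merges several CSV extracts into one table and tags each row with its origin. It is only a minimal sketch: the helper names `federate` and `union_columns`, the source labels, the column names, and the placeholder rows are all hypothetical and do not come from the workshop data.

```python
import csv
import io


def federate(sources):
    """Merge several CSV texts into one list of rows, tagging each row with its source."""
    merged = []
    for source_name, csv_text in sources.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            row["source"] = source_name  # remember where the row came from
            merged.append(row)
    return merged


def union_columns(rows):
    """Return every column name seen across the rows, in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns


# Placeholder extracts (hypothetical values, not real measurements).
sources = {
    "source_a": "site,year\nA1,2012\n",
    "source_b": "site,depth_m\nB1,0.5\nB2,0.7\n",
}

rows = federate(sources)
columns = union_columns(rows)

# Write one combined sheet; cells missing from a source are left empty.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Keeping a `source` column is one simple way to let a data browser filter or color by origin after the sources have been combined.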

4

Data Science for Business: Data Mining Process

Source: Data Science for Business: Chapter 2, 2014

5

Data Science for NSF Polar Cyberinfrastructure: Knowledge Base

Data Science for NSF Polar Cyberinfrastructure

6

Possible Data Sets

• Scour the Open Science Codefest:
  – https://github.com/NCEAS/open-science-codefest/issues/26

• The Polar Data Catalogue (YES):
  – https://polardata.ca/

• BCO-DMO (YES):
  – http://www.bco-dmo.org/

• Polar Hub (NO):
  – http://polar.geodacenter.org/polarhub/

• The AMRC at University of Wisconsin-Madison (YES):
  – ftp://amrc.ssec.wisc.edu/pub/requests/DVPC/
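The candidate list above can be captured as a small inventory sheet before any data are downloaded. The following is a minimal sketch, assuming the YES/NO flags mark whether usable data could be obtained from each source (the slide does not define them); the field names and the `to_csv` helper are hypothetical.

```python
import csv
import io

# Names, URLs, and flags copied from the slide; the Codefest entry carries no flag.
CANDIDATES = [
    {"name": "Open Science Codefest",
     "url": "https://github.com/NCEAS/open-science-codefest/issues/26",
     "status": ""},
    {"name": "Polar Data Catalogue",
     "url": "https://polardata.ca/",
     "status": "YES"},
    {"name": "BCO-DMO",
     "url": "http://www.bco-dmo.org/",
     "status": "YES"},
    {"name": "Polar Hub",
     "url": "http://polar.geodacenter.org/polarhub/",
     "status": "NO"},
    {"name": "AMRC at University of Wisconsin-Madison",
     "url": "ftp://amrc.ssec.wisc.edu/pub/requests/DVPC/",
     "status": "YES"},
]


def to_csv(rows):
    """Serialize the inventory rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "url", "status"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


usable = [row for row in CANDIDATES if row["status"] == "YES"]
inventory_csv = to_csv(CANDIDATES)

print(inventory_csv)
print([row["name"] for row in usable])
```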

7

Open Science Codefest: NASA/NSF/NSIDC Data Sets

• NASA Antarctic Master Directory
  – A master directory for Antarctic data sets
    • http://gcmd.gsfc.nasa.gov/KeywordSearch/Keywords.do?Portal=amd&KeywordPath=Parameters|CRYOSPHERE&MetadataType=0&lbnode=mdlb2

• NSF ACADIS Gateway
  – NSF data repository for arctic/polar data
    • https://www.aoncadis.org/home.htm

• NSIDC Arctic Data Explorer
  – National Snow and Ice Data Center repository
    • http://nsidc.org/acadis/search/

Source: Link to presentation given at Open Science Codefest

8

Polar Data Catalogue: Home Page

https://polardata.ca/

9

Polar Data Catalogue: Collections

https://polardata.ca/pdcsearch/

10

Polar Data Catalogue: Search

https://polardata.ca/pdcsearch/

11

Polar Data Catalogue: Canadian Lake Ice Database

https://polardata.ca/pdcsearch/PDC_Metadata_Data_Download.ccin?action=downloadPDCData&ccin_ref_number=1821&fileLoc=/pdc/ccin/1821/lakeice/CCIN1821_20030925_CID_BDCG_Ver2003_1.zip

11 MB MDB

12

Polar Data Catalogue: Sea Ice Thickness in Southern Beaufort Sea

https://polardata.ca/pdcsearch/PDC_Metadata_Data_Download.ccin?action=downloadPDCData&ccin_ref_number=11470&fileLoc=/pdc/brea/11470/CCIN11470_20130827_GIS_DATA_Sea_Ice_Thickness_2012.zip

Downloaded 5 files: 4 text and 1 ZIP (shapefile), 1.3 MB
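Both Polar Data Catalogue download links carry their metadata in the query string, which makes them easy to log in the spreadsheet. The sketch below pulls the reference number and file name out of the two URLs shown on these slides using only the Python standard library. The `describe_download` helper is hypothetical, and nothing here contacts the server or assumes anything about how it handles the request.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

# The two download links quoted on the slides.
DOWNLOAD_URLS = [
    "https://polardata.ca/pdcsearch/PDC_Metadata_Data_Download.ccin"
    "?action=downloadPDCData&ccin_ref_number=1821"
    "&fileLoc=/pdc/ccin/1821/lakeice/CCIN1821_20030925_CID_BDCG_Ver2003_1.zip",
    "https://polardata.ca/pdcsearch/PDC_Metadata_Data_Download.ccin"
    "?action=downloadPDCData&ccin_ref_number=11470"
    "&fileLoc=/pdc/brea/11470/CCIN11470_20130827_GIS_DATA_Sea_Ice_Thickness_2012.zip",
]


def describe_download(url):
    """Extract the action, reference number, and file name from a download URL."""
    query = parse_qs(urlsplit(url).query)  # each value is a list of strings
    file_loc = query["fileLoc"][0]
    return {
        "action": query["action"][0],
        "ccin_ref_number": query["ccin_ref_number"][0],
        "file_name": posixpath.basename(file_loc),
    }


downloads = [describe_download(url) for url in DOWNLOAD_URLS]
for item in downloads:
    print(item)
```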

13

Polar Data Catalogue: Spreadsheet

http://semanticommunity.info/@api/deki/files/31201/NSFPolarCI.xlsx?origin=mt-web

Canadian Lake Ice Database
Sea Ice Thickness in Southern Beaufort Sea

14

BCO-DMO

http://www.bco-dmo.org/

Tutorial PDF

15

Data Access Tutorial 2014 OCB PI Summer Workshop

• How to Submit Data

• Data access: TEXT-BASED SEARCH scenario 1:
  – You have a general idea of what you are looking for.

• Data access: MAP BROWSE scenario 2:
  – You are interested in data from a particular geographic region.

• Data access: MAP KEYWORD SEARCH scenario 3:
  – You are interested in data of a particular type from a particular geographic area.

• Data access: MAP SEMANTIC SEARCH scenario 4:
  – You have an idea what you are looking for, but you do not know the Program, Project, or Deployment name.

• Glossary of Terms

• Acknowledgments

• Follow BCO-DMO

http://www.bco-dmo.org/files/bcodmo/OCB-Tutorial.pdf

My Question: Could Spotfire do all of this?
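To make the first three access scenarios concrete, the sketch below expresses them as simple filters over a list of dataset records: a text match, a bounding-box match, and the two combined. It is only an illustration of the kind of filtering any data browser would need to reproduce. The `Dataset` record, its fields, and the example entries are hypothetical and are not BCO-DMO data or its API; the semantic search of scenario 4 is not attempted here.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    title: str
    keywords: tuple
    lat: float  # degrees north
    lon: float  # degrees east


def text_search(datasets, term):
    """Scenario 1: keep datasets whose title contains the term."""
    term = term.lower()
    return [d for d in datasets if term in d.title.lower()]


def map_browse(datasets, south, north, west, east):
    """Scenario 2: keep datasets inside a bounding box (no antimeridian handling)."""
    return [d for d in datasets
            if south <= d.lat <= north and west <= d.lon <= east]


def map_keyword_search(datasets, keyword, south, north, west, east):
    """Scenario 3: keep datasets in the box that also carry the keyword."""
    keyword = keyword.lower()
    in_region = map_browse(datasets, south, north, west, east)
    return [d for d in in_region
            if keyword in (k.lower() for k in d.keywords)]


# Hypothetical catalog entries for illustration only.
catalog = [
    Dataset("Example chlorophyll casts", ("chlorophyll",), -64.8, -64.0),
    Dataset("Example nutrient bottles", ("nutrients",), 41.5, -70.7),
    Dataset("Example chlorophyll transect", ("chlorophyll",), 32.0, -64.5),
]

southern = map_keyword_search(catalog, "chlorophyll",
                              south=-90, north=-60, west=-180, east=180)
print([d.title for d in southern])
```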

16

BCO-DMO Datasets

http://www.bco-dmo.org/datasets

17

BCO-DMO MapServer Geospatial Interface

http://mapservice.bco-dmo.org/mapserver/maps-ol/index.php

18

Polar Hub: A Global Hub for Polar Data Discovery

http://polar.geodacenter.org/polarhub/

My Question: Where is the Data?

19

The AMRC at University of Wisconsin-Madison

Name                 Size     Date Modified          Note
[parent directory]
Ant_IR_area/                  9/30/14, 8:18:00 PM
Ant_IR_netCDF/                9/30/14, 8:37:00 PM
AWS_dat_MAY_2014/             9/30/14, 3:23:00 PM    Text Files: Spotfire?
AWS_q10_MAY_2014/             9/30/14, 3:24:00 PM    Text Files: Spotfire?
AWS_q1h_MAY_2014/             9/30/14, 3:25:00 PM    Text Files: Spotfire?
AWS_q3h_MAY_2014/             9/30/14, 3:27:00 PM    Text Files: Spotfire?
AWS_r_MAY_2014/               9/30/14, 3:22:00 PM    Text Files: Spotfire?
readme.txt           4.6 kB   10/6/14, 6:37:00 PM

Index of /pub/requests/DVPC/

ftp://amrc.ssec.wisc.edu/pub/requests/DVPC/
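A directory listing like this one can be turned into structured records for the spreadsheet. The sketch below is a minimal example, assuming the "Date Modified" strings follow the month/day/two-digit-year, 12-hour format shown on the slide. The entries are copied from the slide, while the `parse_modified` helper and the `AWS_` prefix filter are illustrative choices, not part of the AMRC service.

```python
from datetime import datetime

# (name, date modified) pairs copied from the listing on the slide.
LISTING = [
    ("Ant_IR_area/", "9/30/14, 8:18:00 PM"),
    ("Ant_IR_netCDF/", "9/30/14, 8:37:00 PM"),
    ("AWS_dat_MAY_2014/", "9/30/14, 3:23:00 PM"),
    ("AWS_q10_MAY_2014/", "9/30/14, 3:24:00 PM"),
    ("AWS_q1h_MAY_2014/", "9/30/14, 3:25:00 PM"),
    ("AWS_q3h_MAY_2014/", "9/30/14, 3:27:00 PM"),
    ("AWS_r_MAY_2014/", "9/30/14, 3:22:00 PM"),
    ("readme.txt", "10/6/14, 6:37:00 PM"),
]


def parse_modified(text):
    """Parse a listing timestamp such as '9/30/14, 8:18:00 PM'."""
    return datetime.strptime(text, "%m/%d/%y, %I:%M:%S %p")


records = [
    {"name": name,
     "is_dir": name.endswith("/"),
     "modified": parse_modified(stamp)}
    for name, stamp in LISTING
]

# The AWS_* directories are the ones flagged on the slide as text files.
aws_dirs = [r["name"] for r in records
            if r["is_dir"] and r["name"].startswith("AWS_")]
latest = max(records, key=lambda r: r["modified"])

print(aws_dirs)
print(latest["name"], latest["modified"].isoformat())
```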

20

Data Science for NSF Polar Cyberinfrastructure: Spreadsheet Knowledge Base

http://semanticommunity.info/@api/deki/files/31201/NSFPolarCI.xlsx?origin=mt-web