1

Data Science for the NOAA Chief Data Officer

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist

Semantic Community
http://semanticommunity.info/

http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

October 24, 2014

2

Data Science for NOAA Big Data
• Build Knowledge Base:
  – NOAA RFI and Big Data Industry Day Content
    • Original and New RFI
    • Emails and Press
  – Government Data Hubs:
    • Complete catalogue of publicly-available Commerce data sets (DoC 22,365 and NOAA 3,560)
    • Application Programming Interfaces (APIs) (7 agencies and NOAA has 11)
  – Discovered 55,602 Data Sets at data.NOAA.gov!
    • Prototype under active development. Availability and completeness are not guaranteed.
    • Also a data catalog with 532 data sets at NOAA Climate.gov.
• Build Spotfire Knowledge Base and Data Ecosystem:
  – Knowledge Base Indices and Data Set Examples:
    • NOAA Patents and NWS Current Warnings
    • Environmental Research Division's Data Access Program RESTful Web Services

3

NOAA Big Data Industry Day
• My Comment: Maybe this is more complicated than it needs to be:
  – Just appoint a Chief Data Officer, as I recently advised a senior Commerce official to do (the announcement followed soon after). This was also my recommendation to Congress in 2012; and/or
  – Just form a partnership like we did at EPA when I was their data architect/data standards person: http://www.exchangenetwork.net/about/why-we-exist/
• My Questions: Focused on the role of data science and data scientists in the NOAA Big Data Program, as follows:
  – Looking at your data assets, I see about 22,000 data sets at Data.gov, about 55,000 in your pilot data catalog, and 3 data hubs at the Open Data Policy GitHub site, which raises four questions:
    • Which is most authoritative?
    • Do you want help with building more data hubs?
    • Who will do the work to make the many different data formats interoperable so data integration is possible?
    • Who will produce the data science data publications called for by OSTP as the “new data currency”?
• My Comment: The latter is what we are doing for the OSTP NITRD NSF RFI
  – See: Data Science for the National Big Data R & D Initiative

4

NOAA Site Map

http://www.noaa.gov/sitemap.html

Start with the NOAA Site Map, looking for publications and data.

5

NOAA Publication Sources

http://www.lib.noaa.gov/noaainfo/pubsource.html

Found NOAA Central Library
But Not Data Science Publications

6

NOAA Climate.gov: Site Map

http://www.climate.gov/sitemap

Found NOAA Climate.gov, and specifically Maps & Data, to be the best content for building a Data Science Publication for the NOAA Chief Data Officer & the Public.

7

NOAA Climate.gov: Maps & Data

http://www.climate.gov/maps-data

8

NOAA Climate.gov: Global Climate Dashboard

http://www.climate.gov/maps-data

9

NOAA Climate.gov: Integrated Map Application

http://gis.ncdc.noaa.gov/map/viewer/#app=clim&cfg=cdo&theme=indices&layers=01

10

NOAA Climate.gov: Data Catalog

http://www.climate.gov/datasearch/

Data Sets: 532
Applications: 1 (See Next Slide)
Data: 35
Uncategorized: 491!
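The counts on this slide can be checked with simple arithmetic. The short Python sketch below uses only the four numbers reported above; the variable names are illustrative. It shows what share of the catalog is uncategorized and whether the three listed categories add up to the stated total. The slide does not say how any remainder is classified.

```python
# Counts exactly as reported on the slide.
total_listed = 532
by_type = {"Applications": 1, "Data": 35, "Uncategorized": 491}

# Sum of the three categories, and whatever the total leaves unexplained.
categorized_sum = sum(by_type.values())
unaccounted = total_listed - categorized_sum

# Share of the whole catalog that carries no content type.
uncategorized_share = 100 * by_type["Uncategorized"] / total_listed

print(f"Sum of the three categories: {categorized_sum}")
print(f"Listed in the total but in none of the three: {unaccounted}")
print(f"Uncategorized share of the catalog: {uncategorized_share:.1f}%")
```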

11

NOAA Climate.gov: Data Catalog Application

http://www.climate.gov/datasearch/?fq=%7B%21tag%3Dmct%7Dmetadata_content_type%3Aapplication

Web Services: Home Page! (See Data and Publications)
Data Access: See Dashboard
Metadata: XML
Details: Metadata (See Next Slide)
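The catalog URL on this slide carries its filter in a percent-encoded `fq` parameter. The Python sketch below decodes it with the standard library and rebuilds the same URL from the decoded text. The decoded value looks like a Solr-style filter query, but that is an inference from its syntax and not something the slide states. The helper names `decode_filter` and `build_filter_url` are hypothetical, and nothing here contacts the site.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# URL copied from the slide.
CATALOG_URL = (
    "http://www.climate.gov/datasearch/"
    "?fq=%7B%21tag%3Dmct%7Dmetadata_content_type%3Aapplication"
)


def decode_filter(url: str) -> str:
    """Return the percent-decoded value of the first `fq` parameter."""
    return parse_qs(urlsplit(url).query)["fq"][0]


def build_filter_url(base: str, content_type: str) -> str:
    """Rebuild a catalog URL for one metadata_content_type value.

    Hypothetical helper: it assumes other content types use the same
    filter pattern as the one shown on the slide.
    """
    fq = "{!tag=mct}metadata_content_type:" + content_type
    query = urlencode({"fq": fq})
    return base + "?" + query


decoded = decode_filter(CATALOG_URL)
rebuilt = build_filter_url("http://www.climate.gov/datasearch/", "application")

print(decoded)
print(rebuilt == CATALOG_URL)
```

Decoding the filter makes it easier to see which metadata field the catalog uses to separate applications from data sets.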

13

NOAA Climate.gov: Great Lakes Water Level Dashboard

http://www.glerl.noaa.gov/data/dashboard/GLWLD.html

Download Data (See Next Slide)

14

NOAA Climate.gov: Great Lakes Water Level Dashboard Data

http://www.glerl.noaa.gov/data/dashboard/data/

15

NOAA Climate.gov: Great Lakes Water Level Dashboard Data Ecosystem

• GLDData:
  – Data:
    • hydroIO (11 folders):
      – basinWideData (9 spreadsheets)
      – clouds (10 spreadsheets)
      – evap (10 spreadsheets and one folder with 2 text files)
      – flows (20 spreadsheets)
      – nbs (40 spreadsheets and one folder with 5 spreadsheets)
      – PME (10 spreadsheets, one folder with 20 spreadsheets, and a ZIP file with 20 spreadsheets)
      – precip (60 spreadsheets and two folders with 38 and 5 spreadsheets)
      – runoff (20 spreadsheets)
      – sourceSpreadsheets (18 spreadsheets)
      – temps (103 spreadsheets, one folder with 20 spreadsheets, and one folder with 20 spreadsheets and 19 text files)
      – wind (10 spreadsheets)
    • Ice (7 spreadsheets)
    • Levels (5 folders)
    • longTermForecasts (51 spreadsheets)
    • monthlyForecasts (14 spreadsheets, 4 text files, and 1 folder with 15 spreadsheets)
    • paleoRecon (4 spreadsheets)
  – Info:
    • 17 Google Chrome Files

http://www.glerl.noaa.gov/data/dashboard/GLDData.zip
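An inventory like the one above can be produced mechanically instead of by hand. The Python sketch below counts the files in a ZIP archive by folder and file extension using only the standard library. The function name `inventory_zip` is hypothetical, and the demo builds a tiny in-memory archive with made-up file names and extensions, because the real layout of GLDData.zip is not reproduced here. To run it on the real download, pass the path of the saved ZIP file instead.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def inventory_zip(zip_source) -> Counter:
    """Count archive members per (folder, extension) pair.

    `zip_source` may be a file path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            path = PurePosixPath(info.filename)
            extension = path.suffix.lower() or "(none)"
            counts[(str(path.parent), extension)] += 1
    return counts


def _demo_archive() -> io.BytesIO:
    """Build a small stand-in archive; names and extensions are invented."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("GLDData/data/ice/a.csv", "placeholder")
        archive.writestr("GLDData/data/ice/b.csv", "placeholder")
        archive.writestr("GLDData/data/hydroIO/wind/w.txt", "placeholder")
    buffer.seek(0)
    return buffer


demo_counts = inventory_zip(_demo_archive())
for (folder, extension), n in sorted(demo_counts.items()):
    print(f"{folder}\t{extension}\t{n}")
```

A table of this kind is a first step toward matching each file to its metadata, which the inventory alone does not do.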

19

Data Science Data Publications for the NOAA Chief Data Officer: Some Observations

• Finding NOAA Data and Scientific Publications from the NOAA Web Site Map is not obvious;

• Most of NOAA's data assets are very large files to be downloaded and are embedded in application tools;

• NOAA Publication Sources that distribute data and publications are maintained by the Library, whose staff are probably not data scientists;

• The new NOAA Climate.gov site has the best content for Data Science Publications, but building them from that content presented some difficulties;

• The NOAA Climate.gov Data Catalog with 532 data sets is not available as a data set itself and contains only one application (Great Lakes Dashboard) and 35 data sets, with the rest being uncategorized;

• The Great Lakes Dashboard complete data set can be downloaded as a ZIP file, but contains about 500 individual files that need to be inventoried and matched to their metadata to be used;

• The NOAA Climate.gov Dashboard is difficult to understand and use with 15 separate indicators of climate change and variability whose data is mostly in text files, except for four that are in spreadsheets.

20

Some Conclusions and Next Steps
• After the recent NOAA Big Data RFI Industry Day, I provided my suggestions to David McClure, Lead Analyst, Open Government Data Services, Office of the Chief Information Officer, NOAA.
• That experience led me to think: what would I do if I were the NOAA Chief Data Officer? What questions would I ask and want answered?
  – What are NOAA's data assets?
  – How can NOAA content be made big data by treating all of its content as data?
  – How can data science help NOAA's Big Data effort and the Chief Data Officer?
  – What have the NOAA RFIs accomplished, or what will they accomplish, that I could use in my work going forward?
• I have answered the four questions by building Data Science Data Publications for the NOAA Chief Data Officer to help him when he is appointed.

• There is still more that I can and will do to help support a NOAA Chief Data Officer.