Data Science for NIST Big Data Framework Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://www.meetup.com/Virginia-Big-Data-Meetup http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup May 21, 2015

Post on 17-Jan-2016

1

Data Science for NIST Big Data Framework

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist

Semantic Community
http://semanticommunity.info/

http://www.meetup.com/Federal-Big-Data-Working-Group/
http://www.meetup.com/Virginia-Big-Data-Meetup

http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

May 21, 2015

2

Introduction

• NIST is seeking feedback on the Version 1 draft of the NIST Big Data Interoperability Framework. Once public comments are received, compiled, and addressed by the NBD-PWG, and reviewed and approved by the NIST internal editorial board, Version 1 of Volume 1 through Volume 7 will be published as final. Three versions are planned, with Versions 2 and 3 building on the first.

• My Comment: I complimented the NIST Team on excellent work over a long period of time and told them that I asked the 700+ members of our Federal Big Data Working Group Meetup to review the DRAFT documents and provide comments. I said I think this will take us longer than the May 21st deadline, and we plan to do a Meetup on this in July. We are looking especially for the 6 Use Cases that have data sets, according to a recent email we saw from the NIST Big Data Workgroup participants.

3

Federal Big Data Working Group Meetup

http://www.meetup.com/Federal-Big-Data-Working-Group/events/222458479/

4

NIST Requests Comments on NIST Big Data Interoperability Framework

http://bigdatawg.nist.gov/V1_output_docs.php

5

NIST Big Data Interoperability Framework: Seven Volumes

• The NIST Big Data Interoperability Framework consists of seven volumes, each of which addresses a specific key topic, resulting from the work of the NBD-PWG. The seven volumes are as follows:
– Volume 1, Definitions
– Volume 2, Taxonomies
– Volume 3, Use Cases and General Requirements
– Volume 4, Security and Privacy
– Volume 5, Architectures White Paper Survey
– Volume 6, Reference Architecture
– Volume 7, Standards Roadmap

• My Comment: Volumes 1 and 2 support the Knowledge Base, Volume 3 supports the Data Science Data Publication, and Volumes 1-7 all support the Massive Open Online Course (MOOC).
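
The volume-to-deliverable mapping in the comment above can be written down as a small lookup table. This is only a sketch: the volume titles and the mapping come from this slide, while the names VOLUMES, DELIVERABLE_SOURCES, and deliverables_for are hypothetical and are not part of the NIST documents or of any Meetup tooling.

```python
# Illustrative only: these names are assumptions, not part of the NIST documents.
VOLUMES = {
    1: "Definitions",
    2: "Taxonomies",
    3: "Use Cases and General Requirements",
    4: "Security and Privacy",
    5: "Architectures White Paper Survey",
    6: "Reference Architecture",
    7: "Standards Roadmap",
}

# Which volumes feed which deliverable, as stated in the comment on this slide.
DELIVERABLE_SOURCES = {
    "Knowledge Base": [1, 2],
    "Data Science Data Publication": [3],
    "MOOC": list(range(1, 8)),  # Volumes 1-7 all support the MOOC
}


def deliverables_for(volume: int) -> list[str]:
    """Return the deliverables that a given volume number supports."""
    return [name for name, vols in DELIVERABLE_SOURCES.items() if volume in vols]


if __name__ == "__main__":
    for number, title in VOLUMES.items():
        supported = ", ".join(deliverables_for(number))
        print(f"Volume {number}, {title}: {supported}")
```

Looking up a volume this way makes it easy to see that Volumes 4 through 7 feed only the MOOC, while Volumes 1 through 3 each feed two deliverables.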

6

NIST Big Data Interoperability Framework: Three Stages

• The NIST Big Data Interoperability Framework will be released in three versions, which correspond to the three stages of the NBD-PWG work. The three stages aim to achieve the following:
– Stage 1: Identify the high-level Big Data reference architecture key components, which are technology, infrastructure, and vendor agnostic.
– Stage 2: Define general interfaces between the NIST Big Data Reference Architecture (NBDRA) components.
– Stage 3: Validate the NBDRA by building Big Data general applications through the general interfaces.

• My Comment: The Federal Big Data Working Group Meetup is creating an interface (Stage 2) and applications (Stage 3) by doing Data Science for NIST Big Data Framework!

7

Purpose

• While I have started a Comment Template for detailed comments, my focus is to use the excellent content for the Federal Big Data Working Group Meetup as follows:
– Build a Knowledge Base (especially using the Definitions and Taxonomies).
– Build a Data Science Data Publication (especially using Use Case & Requirements).
– Build a MOOC (Massive Open Online Course) (using the above and Security and Privacy, Architecture White Paper Survey, Reference Architecture, and Standards Roadmap).

8

Data Mining Standard Process

• Data Science for NIST Big Data Framework will be done by Data Mining following the six-step CRISP-DM standard:
– CRISP-DM Step 1: Business (Organizational) Understanding
– CRISP-DM Step 2: Data Understanding
– CRISP-DM Step 3: Data Preparation
– CRISP-DM Step 4: Modeling
– CRISP-DM Step 5: Evaluation
– CRISP-DM Step 6: Deployment

Data Mining

9

Method and Results

• The method and results are documented in the Slides and Spotfire Dashboard. The Knowledge Base Index and selected tables will be documented in the NIST Big Data Spreadsheet.

• The Meetup date and agenda will be announced soon.

10

Data Mining Standard Results

• CRISP-DM Step 1: Business (Organizational) Understanding:
– Knowledge Base: 7 Word Documents to MindTouch

• CRISP-DM Step 2: Data Understanding:
– MindTouch Index to Spreadsheet

• CRISP-DM Step 3: Data Preparation:
– Report Tables and Use Case Data Sets

• CRISP-DM Step 4: Modeling:
– Spotfire Exploratory Data Analysis

• CRISP-DM Step 5: Evaluation:
– Data Science Answer to Four Questions

• CRISP-DM Step 6: Deployment:
– Data Science Data Publication and MOOC
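
The step-to-result pairing above can be kept as an ordered record, which makes it easy to check that no CRISP-DM step is skipped. The following is a minimal sketch: the step names and results are taken from this slide, while CrispDmStep, STEPS, and run_in_order are hypothetical names introduced only for illustration. It does not call MindTouch, Spotfire, or any other tool named here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrispDmStep:
    """One CRISP-DM step and the result this project produced for it."""
    number: int
    name: str
    result: str


# Step names and results as listed on the slide.
STEPS = [
    CrispDmStep(1, "Business (Organizational) Understanding",
                "Knowledge Base: 7 Word Documents to MindTouch"),
    CrispDmStep(2, "Data Understanding", "MindTouch Index to Spreadsheet"),
    CrispDmStep(3, "Data Preparation", "Report Tables and Use Case Data Sets"),
    CrispDmStep(4, "Modeling", "Spotfire Exploratory Data Analysis"),
    CrispDmStep(5, "Evaluation", "Data Science Answer to Four Questions"),
    CrispDmStep(6, "Deployment", "Data Science Data Publication and MOOC"),
]


def run_in_order(steps):
    """Yield one summary line per step, raising if a step number is missing."""
    expected = 1
    for step in sorted(steps, key=lambda s: s.number):
        if step.number != expected:
            raise ValueError(f"expected step {expected}, found step {step.number}")
        yield f"CRISP-DM Step {step.number}: {step.name} -> {step.result}"
        expected += 1


if __name__ == "__main__":
    for line in run_in_order(STEPS):
        print(line)
```

Sorting before checking means the steps can be listed in any order, but a gap (for example, leaving out Data Preparation) is reported instead of being silently passed over.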

11

Data Science for NIST Big Data Framework: MindTouch Knowledge Base Index

Data Science for NIST Big Data Framework
NIST Big Data Framework

13

Data Science for NIST Big Data Framework: Spreadsheet Knowledge Base: Find

NIST Big Data Spreadsheet

14

Data Science for NIST Big Data Framework: Spreadsheet Knowledge Base: Other

NIST Big Data Spreadsheet.

Report Tables and Use Case Data Sets

20

Conclusions and Recommendations

• The Version 1 DRAFT NIST Big Data Interoperability Framework (7 volumes) has been reviewed for detailed comments and repurposed by the Federal Big Data Working Group Meetup.

• A Knowledge Base, Data Science Data Publication, and Massive Open Online Course (MOOC) have been created from the excellent content using the CRISP-DM Data Mining Standard.

• The methods and results are documented to aid the NIST Big Data Work Group and Federal Big Data Working Group Meetup in future activities.

• The Federal Big Data Working Group Meetup is creating an interface (Stage 2) and applications (Stage 3) by doing Data Science for NIST Big Data Framework!

• The Federal Big Data Working Group Meetup is focused on Use Cases with Government Data and Workforce Education of Data Scientists and Chief Data Officers.