TRANSCRIPT
1
Data Science for NIST Big Data Framework
Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://www.meetup.com/Virginia-Big-Data-Meetup
http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
May 21, 2015
2
Introduction
• NIST is seeking feedback on the Version 1 draft of the NIST Big Data Interoperability Framework. Once public comments are received, compiled, and addressed by the NBD-PWG, and reviewed and approved by the NIST internal editorial board, Version 1 of Volume 1 through Volume 7 will be published as final. Three versions are planned, with Versions 2 and 3 building on the first.
• My Comment: I complimented the NIST Team on excellent work over a long period of time and told them that I asked the 700+ members of our Federal Big Data Working Group Meetup to review the DRAFT documents and provide comments. I said I think this will take us longer than the May 21st deadline, and we plan to hold a Meetup on this in July. We are looking especially for the 6 Use Cases that have data sets, according to a recent email we saw from the NIST Big Data Workgroup participants.
3
Federal Big Data Working Group Meetup
http://www.meetup.com/Federal-Big-Data-Working-Group/events/222458479/
4
NIST Requests Comments on NIST Big Data Interoperability Framework
http://bigdatawg.nist.gov/V1_output_docs.php
5
NIST Big Data Interoperability Framework: Seven Volumes
• The NIST Big Data Interoperability Framework consists of seven volumes, each of which addresses a specific key topic, resulting from the work of the NBD-PWG. The seven volumes are as follows:
– Volume 1, Definitions
– Volume 2, Taxonomies
– Volume 3, Use Cases and General Requirements
– Volume 4, Security and Privacy
– Volume 5, Architectures White Paper Survey
– Volume 6, Reference Architecture
– Volume 7, Standards Roadmap
• My Comment: Volumes 1 and 2 support the Knowledge Base, Volume 3 supports the Data Science Data Publication, and Volumes 1-7 all support the Massive Open Online Course (MOOC).
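The volume-to-deliverable mapping in the comment above can be written down as a small lookup table. This is only an illustrative sketch: the volume titles and the mapping come from this slide, while the names `VOLUMES`, `DELIVERABLES`, `volumes_for`, and `deliverables_for` are hypothetical and not part of any NIST or Meetup tooling.

```python
# Volume titles as listed on this slide.
VOLUMES = {
    1: "Definitions",
    2: "Taxonomies",
    3: "Use Cases and General Requirements",
    4: "Security and Privacy",
    5: "Architectures White Paper Survey",
    6: "Reference Architecture",
    7: "Standards Roadmap",
}

# Which volumes feed which deliverable, per the comment on this slide.
# The dictionary and function names are illustrative assumptions.
DELIVERABLES = {
    "Knowledge Base": [1, 2],
    "Data Science Data Publication": [3],
    "MOOC": list(VOLUMES),  # all seven volumes
}


def volumes_for(deliverable):
    """Return the titles of the volumes that support a deliverable."""
    return [VOLUMES[n] for n in DELIVERABLES[deliverable]]


def deliverables_for(volume_number):
    """Return the deliverables that draw on a given volume."""
    return [name for name, vols in DELIVERABLES.items() if volume_number in vols]


if __name__ == "__main__":
    for name in DELIVERABLES:
        print(name, "<-", "; ".join(volumes_for(name)))
```

Looking up a volume in the other direction, for example `deliverables_for(3)`, shows that Volume 3 feeds both the Data Science Data Publication and the MOOC.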
6
NIST Big Data Interoperability Framework: Three Stages
• The NIST Big Data Interoperability Framework will be released in three versions, which correspond to the three stages of the NBD-PWG work. The three stages aim to achieve the following:
– Stage 1: Identify the high-level Big Data reference architecture key components, which are technology, infrastructure, and vendor agnostic.
– Stage 2: Define general interfaces between the NIST Big Data Reference Architecture (NBDRA) components.
– Stage 3: Validate the NBDRA by building Big Data general applications through the general interfaces.
• My Comment: The Federal Big Data Working Group Meetup is creating an interface (Stage 2) and applications (Stage 3) by doing Data Science for NIST Big Data Framework!
7
Purpose
• While I have started a Comment Template for detailed comments, my focus is to use the excellent content for the Federal Big Data Working Group Meetup as follows:
– Build a Knowledge Base (especially using the Definitions and Taxonomies).
– Build a Data Science Data Publication (especially using the Use Cases and General Requirements).
– Build a MOOC (Massive Open Online Course) (using the above and Security and Privacy, Architectures White Paper Survey, Reference Architecture, and Standards Roadmap).
8
Data Mining Standard Process
• Data Science for NIST Big Data Framework will be done by Data Mining following the six-step CRISP-DM standard:
– CRISP-DM Step 1: Business (Organizational) Understanding
– CRISP-DM Step 2: Data Understanding
– CRISP-DM Step 3: Data Preparation
– CRISP-DM Step 4: Modeling
– CRISP-DM Step 5: Evaluation
– CRISP-DM Step 6: Deployment
Data Mining
9
Method and Results
• The method and results are documented in the Slides and Spotfire Dashboard. The Knowledge Base Index and selected tables will be documented in the NIST Big Data Spreadsheet.
• The Meetup date and agenda will be announced soon.
10
Data Mining Standard Results
• CRISP-DM Step 1: Business (Organizational) Understanding:
– Knowledge Base: 7 Word Documents to MindTouch
• CRISP-DM Step 2: Data Understanding:
– MindTouch Index to Spreadsheet
• CRISP-DM Step 3: Data Preparation:
– Report Tables and Use Case Data Sets
• CRISP-DM Step 4: Modeling:
– Spotfire Exploratory Data Analysis
• CRISP-DM Step 5: Evaluation:
– Data Science Answer to Four Questions
• CRISP-DM Step 6: Deployment:
– Data Science Data Publication and MOOC
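The step-to-result pairing above can be kept as an ordered checklist. The sketch below is illustrative only: the step names and results are copied from this slide, while the `Step` type and the helper names `result_for` and `checklist` are hypothetical.

```python
from typing import NamedTuple


class Step(NamedTuple):
    number: int
    name: str
    result: str


# Steps and results exactly as listed on this slide.
CRISP_DM = [
    Step(1, "Business (Organizational) Understanding",
         "Knowledge Base: 7 Word Documents to MindTouch"),
    Step(2, "Data Understanding", "MindTouch Index to Spreadsheet"),
    Step(3, "Data Preparation", "Report Tables and Use Case Data Sets"),
    Step(4, "Modeling", "Spotfire Exploratory Data Analysis"),
    Step(5, "Evaluation", "Data Science Answer to Four Questions"),
    Step(6, "Deployment", "Data Science Data Publication and MOOC"),
]


def result_for(number):
    """Return the result recorded for a CRISP-DM step number."""
    for step in CRISP_DM:
        if step.number == number:
            return step.result
    raise KeyError(f"no CRISP-DM step {number}")


def checklist():
    """Format the steps as one line each, in order."""
    return [f"Step {s.number}: {s.name} -> {s.result}" for s in CRISP_DM]


if __name__ == "__main__":
    print("\n".join(checklist()))
```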
11
Data Science for NIST Big Data Framework: MindTouch Knowledge Base Index
Data Science for NIST Big Data Framework NIST Big Data Framework
12
Data Science for NIST Big Data Framework: MindTouch Knowledge Base Find
Data Science for NIST Big Data Framework NIST Big Data Framework
Google Chrome Find: Data sets
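The browser Find shown here is a plain text search over the knowledge base page. A rough equivalent over exported text might look like the sketch below. It assumes the index is available as a list of strings; the `find` helper is hypothetical, and the sample lines are taken from this deck rather than from the actual MindTouch index.

```python
def find(lines, term):
    """Case-insensitive substring search; returns (line_number, text) pairs."""
    needle = term.casefold()
    return [
        (number, text)
        for number, text in enumerate(lines, start=1)
        if needle in text.casefold()
    ]


# Sample lines quoted from this deck, used only to exercise the search.
sample = [
    "Volume 3, Use Cases and General Requirements",
    "We are looking especially for the 6 Use Cases that have data sets",
    "Report Tables and Use Case Data Sets",
]

if __name__ == "__main__":
    for number, text in find(sample, "data sets"):
        print(number, text)
```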
13
Data Science for NIST Big Data Framework: Spreadsheet Knowledge Base: Find
NIST Big Data Spreadsheet
14
Data Science for NIST Big Data Framework: Spreadsheet Knowledge Base: Other
NIST Big Data Spreadsheet
Report Tables and Use Case Data Sets
15
Data Science for NIST Big Data Framework: Spotfire Cover Page
Web Player
16
Data Science for NIST Big Data Framework: Spotfire Tab 1
Web Player
17
Data Science for NIST Big Data Framework: Spotfire Tab 2
Web Player
18
Data Science for NIST Big Data Framework: Spotfire Tab 3
Web Player
19
Data Science for NIST Big Data Framework: Spotfire Tab 4
Web Player
20
Conclusions and Recommendations
• The Version 1 DRAFT NIST Big Data Interoperability Framework (7 volumes) has been reviewed for detailed comments and repurposed by the Federal Big Data Working Group Meetup.
• A Knowledge Base, Data Science Data Publication, and Massive Open Online Course (MOOC) have been created from the excellent content using the CRISP-DM data mining standard.
• The methods and results are documented to aid the NIST Big Data Work Group and Federal Big Data Working Group Meetup in future activities.
• The Federal Big Data Working Group Meetup is creating an interface (Stage 2) and applications (Stage 3) by doing Data Science for NIST Big Data Framework!
• The Federal Big Data Working Group Meetup is focused on Use Cases with Government Data and Workforce Education of Data Scientists and Chief Data Officers.