
Page 1: Dr. Brand Niemann Director and Senior Data Scientist Semantic Community


Challenges and Solutions for Big Data in the Public Sector:

DGI’s 3rd Annual Government Big Data Conference, October 9, Ronald Reagan Building, Washington, DC

Dr. Brand Niemann, Director and Senior Data Scientist

Semantic Community, http://semanticommunity.info/

http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

October 9, 2014

Page 2

Overview

• Related Presentations:
  – COM.BigData Conference (Keynote and Panel), August 4-6, Washington, DC, and
  – IEEE 2014 Big Data Conference (Paper and NIST Big Data Workshop), October 27-30, Washington, DC.

• Moderator:
  – Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community, and Co-organizer, Federal Big Data Working Group Meetup

• Panelists:
  – Dr. Tom Rindflesch, Information Research Specialist at the Cognitive Science Branch, National Institutes of Health (NIH): Semantic Medline (Ontology, Cray Graph Appliance, and Relational Databases)
  – Dr. Kirk Borne, Professor of Astrophysics and Computational Science, George Mason University: NSF Big Data Project of the Decade: LSST

http://www.digitalgovernment.com/Events/Conferences/Government-Big-Data-Conference--Expo.shtml

Page 3

Mission Statement

• Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;

• Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data," so big data = all your content;

• Working Group: Data Science Teams composed of Federal Government and non-Federal Government experts producing big data products (How was the data collected? Where is it stored? What are the results? Does the data story persuade?); and

• Meetup: The world's largest network of local groups, helping people revitalize their local communities and self-organize around the world, much like the MOOCs (Massive Open Online Courses) being considered by the White House to reduce the cost of higher education.

Co-organizers: Brand Niemann and Katherine Goodier

Page 4

Decisions = Science + Art: The Challenger Accident

• Richard Feynman's famous conclusion to his report on the shuttle Challenger accident, which arose again in the Columbia accident, is "For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled."

• -- Edward Tufte

The Challenger: An Information Disaster

Note: These charts appear in Edward Tufte’s book, Visual Explanations: Images and Quantities, Evidence and Narrative.

Page 5

Fourth Paradigm and Fourth Question

• The Fourth Paradigm of Science (1):
  – First Paradigm. Observation, descriptions of natural phenomena, and experimentation.
  – Second Paradigm. Theoretical science, such as Newton’s laws of motion and Maxwell’s equations.
  – Third Paradigm. Simulation and modelling, such as in astronomy.
  – Fourth Paradigm. Data-intensive science that exploits large volumes of data in new ways for scientific exploration, such as the International Virtual Observatory Alliance in astronomy.

• The Fourth Question of Big Data for Science (2):
  – How was the data collected?
  – Where is the data stored?
  – What are the data results?
  – Does the data story persuade?

(1) Bell, G., Hey, T., & Szalay, A. (2009). Beyond the data deluge. Science, 323, 6 March 2009, pp. 1297-1298.

(2) de Waard, A. (2014). About Stories, that Persuade With Data. Federal Big Data Working Group Meetup, 20 May 2014, 41 slides.

President Obama Discovers Big Data in 2009

Page 6

Symposium on Predictive Analytics for Defense and Government, November 18-19, Washington, DC

[Diagram labels: Big Data Analytics; Data Science; All Content (Structured and Unstructured); Mining and Discovery; Results and Decisions; Data Ecosystem (Data Set 1 ... Data Set N); Performance, Content, Network, and Data; Descriptive, Prescriptive. Examples: Microscope and Telescope: Szalay (JHU); Data FAIRport: Strawn (NITRD); Data Commons: Bourne (NIH); Data Publications in a Data Browser: Semantic Community; Data Science Central.]

Page 7

NIH Data Commons

Dr. Phil Bourne (7/30/2014): Rules, Credit/Not Money, & More Offline

http://semanticommunity.info/Data_Science/Data_Science_for_RDA#Slide_50_The_Power_of_the_Commons

My Note: Registries, Repositories, Clearinghouses, Portals, GitHubs, Data Commons, & Data FAIRports to MindTouch and Spotfire

Page 8

Best Practices for Data: A Biologist’s View

BestPracticesForData_PhilipBourne.pdf

Page 9

Examples of Data Publications in Data Browsers for Senior Government People

Person | Interest | Data Publication in Data Browser | Example
Dr. John Holdren | Climate Change | Data Publication in Data Browser | Climate Change Assessment
Dr. George Strawn | Research Objects as Digital Objects | Data Publication in Data Browser | VIVO
Dr. Farnam Jahanian | NSF Big Data Publications | Data Publication in Data Browser | NSF Big Data
Dr. Phil Bourne | Data Culture at NIH | Data Publication in Data Browser | Bourne Research & NIH
Dan Kaufman and Paul Cohen | Big Mechanism for Cancer | Data Publication in Data Browser | DARPA Contract
Bryan Sivak | Hack-a-Thon | Data Publication in Data Browser | HHS IDEALAB
Todd Park | Code-a-Palooza | Data Publication in Data Browser | Health Datapalooza V
Brian Lee | Health United States 2013 | Data Publication in Data Browser | Centers for Disease Control & Prevention Report
The Honorable Kathleen Sebelius | Dynamic Case Management | Data Publication in Data Browser | HealthCare.gov Web Site

Data Source: Semantic Community NSF Big Data Proposal

Page 11

Data Science for JHU/NSF DIBBs Project: Knowledge Bases

Data Science for JHU DIBBs Project SDSS.xlsx

Data Science Data Publication: Table of Contents Is an Ontology!

Data Science Publication Index: Index Is Linked Open Data!

Page 12

Data Science for JHU/NSF DIBBs Project: Analytics & Visualizations

Spotfire Content, Network, and Data Analytics: Spotfire Is a Microscope and a Telescope for 77 TB!

Web Player

Data Science Data Publications in a Data Browser

Page 14

October 6th Meetup Agenda

• 6:30 p.m. Welcome and Introduction: Report on Recent Meeting with Dr. Taha Kass-Hout, FDA’s First Chief Health Informatics Officer (CHIO), and FDA Data Science Data Publication Tutorial:
  – Interest in our Meetup on OpenFDA, July 7th
  – Keynote at AFCEA Bethesda’s Health IT Day, December 2nd

• 7:00 p.m. Brooke Aker, Big Data Lens, Predictive Analytics for OpenFDA and Other Examples

• 7:45 p.m. Brief Member Introductions and Inter-American Development Bank Open Data Portal Examples

• 8:30 p.m. Open Discussion

• 8:45 p.m. Networking

• 9:00 p.m. Depart