Federal Big Data Working Group Meetup, Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community, http://semanticommunity.info/, http://www.meetup.com/Federal-Big-Data-Working-Group/, http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup, September 8, 2014


Page 1

Federal Big Data Working Group Meetup

Dr. Brand Niemann, Director and Senior Data Scientist

Semantic Community, http://semanticommunity.info/

http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

September 8, 2014

Page 2

Mission Statement
• Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;
• Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content;
• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products (How was the data collected, Where is it stored, What are the results, and Does the data story persuade?); and
• Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House to reduce the cost of higher education.

Co-organizers: Brand Niemann and Katherine Goodier

Page 3

What Are We Doing?
• Leadership of the Semantic Data Science Team that produced Semantic MEDLINE running on the Yarc Data Graph Appliance.
• Founding and co-organizing of the Federal Big Data Working Group Meetup.
• A graduate class prepared for GMU entitled “Practical Data Science for Data Scientists”.
• Using the Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000) to build a Data Science Knowledge Base.
• Mining of the Data Science and Digital Earth scientific journals for the CODATA International Workshop on Big Data for International Scientific Programmes, June 8-9, in Beijing.

• Participation in the Data FAIRport (Findable, Accessible, Interoperable, and Reusable) with “Data Publication in Data Browsers”.

• Providing data stories that persuade and presentation materials for public education conferences like the COM.BigData Conference, August 4-6, in Washington, DC.

Page 4

How Are We Doing It?
• Federating Use Cases: Data Science (Brand Niemann); Environmental and Earth Science (Joan Aron); and Astronomy (Kirk Borne)
• Federating Data Publications: Structured Scientific Content (papers, journals, books, reports, etc.); Data FAIRports (Findable, Accessible, Interoperable); and Reusable Data Stories That Persuade (Claims and Evidence)

• Federating Solutions & Technologies: Hand-Crafted by Individuals and Teams (Mary Galvin, STEM); Data Mining Standards and Products (Brand Niemann, Data Publications in Data Browsers); Machine Processing (Fredrik Salvesen, Semantic Data Publications on Yarc Data Graph Appliance); Reading and Reasoning (Katherine Goodier and Chuck Rehberg, Semantic Insights on Elsevier Content Text Mining); and Data Curation at Scale (Alan Wagner, Tamr on 1000s of Spreadsheets)

Page 5

Data FAIRport

http://datafairport.org/
http://semanticommunity.info/Data_Science/Euretos_BRAIN

Final Report, Interview, and Joint Hackathons Started

Page 6

NIH Data Commons
Dr. Phil Bourne (7/30/2014): Rules, Credit/Not Money, & More Offline

http://semanticommunity.info/Data_Science/Data_Science_for_RDA#Slide_50_The_Power_of_the_Commons
My Note: Registries, Repositories, Clearinghouses, Portals, GitHubs, Data Commons, & Data FAIRports to MindTouch and Spotfire

Page 7

Fourth Paradigm and Fourth Question

• The Fourth Paradigm of Science (1):
– First Paradigm. Observation, descriptions of natural phenomena, and experimentation.
– Second Paradigm. Theoretical science, such as Newton’s laws of motion and Maxwell’s equations.
– Third Paradigm. Simulation and modelling, such as in astronomy.
– Fourth Paradigm. Data-intensive science that exploits large volumes of data in new ways for scientific exploration, such as the International Virtual Observatory Alliance in astronomy.

• The Fourth Question of Big Data for Science (2):
– How was the data collected?
– Where is the data stored?
– What are the data results?
– Does the data story persuade?

(1) Bell, G., Hey, T., & Szalay, A. (2009). Beyond the data deluge. Science, 323, 6 March 2009, pp. 1297-1298.

(2) de Waard, A. (2014). About Stories, that Persuade With Data. Federal Big Data Working Group Meetup, 20 May 2014, 41 slides.

Page 8

August 11th: Silver Line Metro, More Ontology Experts (Baclawski, Guerino, Morosoff, & Goodier)

• How Was the Meetup?
– If you could perfect your meeting A/V, it would be even more awesome! Nevertheless, lots on enterprise ontology on the way for archivists everywhere, including LOC, SharePoint experts, etc.
– Another good meeting with more pieces of the Big Data puzzle being placed on the table and related to each other.
– Fuse was a bust, but the presentations were great! Thanks Brand and Katherine!
– Brand was a Star at the Comstar conference!

• We Listen and Respond:
– We are involved in two research collaborations (one with Columbia Univ. and the other with Harvard) that are investigating the use of Semantic MEDLINE and literature-based discovery to elucidate statistical correlations found in the EHR. (Tom Rindflesch)

http://www.meetup.com/Federal-Big-Data-Working-Group/events/199040882/

Page 9

Current Activities
• Ongoing analytics with OpenFDA data for Dr. Taha Kass-Hout, FDA’s first Chief Health Informatics Officer (CHIO), August 11:
– Keynote at AFCEA Bethesda’s Health IT Day, December 2, Bethesda North Marriott Hotel and Conference Center.
• Follow-up meeting with Bob Chadduck and Fouad Ramia on the September 8th Joint Meetup with NSF, and with Professor Alex Szalay on the Joint Meetup on the JHU DIBBs Project, August 19th:
– Big Data Science for Astronomy use case (ontology, graph computing and SciDB.org) with Professor Borne.
• Follow-up meeting with Professor Jens Pohl and Peter Morosoff, August 20:

– Your pioneering work in trying to convince various branches of the Government to gain control of their data and exploit the information that can be extracted from the data is quite remarkable and certainly inspirational.

• The Federal Trade Commission, Big Data: A Tool for Inclusion or Exclusion?, September 15, Washington, DC:
– This workshop is free and open to the public. Registration will begin at 8:00 a.m. A live webcast of the workshop will also be available on the day of the event. The submission deadline for pre-workshop comments is August 15, 2014, but the comment period will be held open until October 15, 2014.

• 2014 IEEE International Conference on Big Data, October 27-30, Washington, DC:
– Submitted paper and NIST workshop proposals.

Page 10

DGI’s Annual Big Data Conference, October 9, Reagan Building, Washington, DC

• Session title: Challenges and Solutions for Big Data in the Public Sector

• Moderator: Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community, and Co-organizer, Federal Big Data Working Group Meetup

• Panelists:
– Dr. Kirk Borne, Professor of Astrophysics and Computational Science, George Mason University
– Dr. Tom Rindflesch, Information Research Specialist at the Cognitive Science Branch, National Institutes of Health (NIH)

http://www.digitalgovernment.com/Events/Conferences/Government-Big-Data-Conference--Expo.shtml

Page 11

DRAFT Future Meetups

• October 6: Wolfram Language (Invited) and Michael Daconta, Build a Knowledge Base with my (experimental) software EzKb

• November 3: Georgetown Massive Data Institute (Invited)

• December 1: NSF GEO/EarthCube and ICER (Integrative and Collaborative Education & Research)

Page 12

FASTER
• Faster Administration of Science and Technology Education and Research (FASTER) Community of Practice (CoP):
– FASTER’s goal is to enhance collaboration and accelerate agencies’ adoption of advanced IT capabilities developed by Government-sponsored IT research. FASTER hosts Expedition and Emerging Technology workshops as well as monthly meetings with invited guest speakers to achieve this goal.

– NITRD created FASTER for Federal agency CIOs and/or their advanced technology specialists. FASTER seeks to accelerate deployment of promising research technologies; share protocol information, standards, and best practices; and coordinate and disseminate technology assessment and testbed results. The Federal CIO Council, under the leadership of the Office of Management and Budget (OMB), coordinates the use of IT systems. NITRD coordinates federally supported IT research under the leadership of OSTP (with OMB participation). FASTER, supported by the NITRD NCO, communicates with OMB and the Federal CIO Council concerning IT R&D matters that are of general interest to Federal agencies.

– FASTER is responding to the Open Government Directive by using the technologies of the Social Data Web (e.g., Linked Open Data and the Semantic Web).


Page 13

NSF Strategic Plan Knowledge Base

http://semanticommunity.info/Data_Science/NSF_Strategic_Plan

The FBDWG Meetup is doing the NSF Strategic Plan with Linked Open Data and the Semantic Web!

Page 14

Agenda
• Joint Meetup for NSF Data Scientists, Data Infrastructure, and Data Publication
• Agenda:
– 6:30 p.m. Welcome and Introduction, FASTER Co-chairs, Robert Chadduck (NSF) and Dr. Robert Bohn (NIST)
– 6:45 p.m. Big Data and the NITRD: NSF Strategic Plan for Big Data and Open Research Data Publications, Dr. George Strawn, NITRD Director (Slides)
– 7:10 p.m. Brief Member Introductions
– 7:15 p.m. NSF Strategic Plan Knowledge Base, Dr. Brand Niemann, Federal Big Data Working Group Meetup (Slides)
– 7:45 p.m. Finding Funding for Research Topics on NSF Website (a simple example of “finding”), 04-AUG-2014, Dr. Chuck Rehberg, CTO, Semantic Insights™, a Division of Trigent Software, Inc. (Slides)
– 8:30 p.m. Open Discussion
– 8:45 p.m. Networking
– 9:00 p.m. Depart