

Data Science for MyFamilySearch.org and FamilyTree DNA

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community

http://semanticommunity.info/
http://www.meetup.com/Virginia-Big-Data-Meetup/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

February 16, 2015


Introduction

• Welcome:
  – Federal Big Data Working Group Meetup
  – Virginia Big Data Meetup
  – Lotico Northern Virginia Semantic Web
  – Other?
• Data Science for the National Big Data R and D Initiative, February 2, 2015:
  – NITRD Big Data Chronology (2012-present) and NITRD-GU Big Data Workshop: Dr. Moore agrees with IBM Watson that human curation is generally underappreciated and is the secret sauce in Big Data successes.
  – Wendy Wigen’s Slides: Summary of RFIs and Dr. Sudarsan Rachuri, NIST, Smart Manufacturing Systems Design and Analysis.
  – Calvin Andrus, CIA (Data Science: An Introduction) would "like to see more science in data science."
  – Jim Burke: Conference call-in and online slides were very useful. I appreciate the extra mile efforts, and great, informative conversations.


Federal Big Data Working Group Meetup

• Federal: Supports the Federal Big Data Initiative, but not endorsed by the Federal Government or its Agencies;

• Big Data: Supports the Federal Digital Government Strategy which is "treating all content as data", so big data = all your content;

• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products (see Possible Team Presentations below); and

• Meetup: The world's largest network of local groups, which aims to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House.


The Profit and Data Enterprises

• Marcus Lemonis (born November 16, 1973) is a Lebanese-born American businessman, investor, television personality and philanthropist. He is currently the chairman and CEO of Camping World and Good Sam Enterprises, and the star of The Profit, a CNBC reality show about saving small businesses through People, Process, and Products.
  – http://en.wikipedia.org/wiki/Marcus_Lemonis
• The Federal Big Data Working Group Meetup is also about helping government agencies develop:
  – People – Data Scientists
  – Process – Data Infrastructure
  – Products – Data Publications
• Some examples:
  – EPA
  – FDA
  – NOAA
  – HHS
  – Eastern Foundry
• And provide MOOCs (Massive Open Online Courses) for training and networking.


Calendar

• NITRD FASTER Bigdata at NSF, February 17, 2015:
  – Dr. McHenry will discuss Brown Dog: A search engine for the other 99 percent (of data). Brown Dog seeks to develop a service that will make un-curated data accessible to scientists.
• Mission Source Consulting Launch Party, February 28:
  – Steven M. Hanmer, 12:00 PM to 4:00 PM, Eastern Foundry, 2011 Crystal Drive, Suite 400, http://www.missionsourceconsulting.com
• Data Science for Big Data Application and Analytics MOOC, March 2, 2015
• 5th Annual Government Big Data Forum, March 12, 2015
• USDA CIO and ACDO on Open Data Plan and Roundtable, March 16, 2015
• Government Technology & Innovation Incubator for Big Data Analytics II, TBA. Week of March 23, Need Sponsor
• Data Science for HealthData.gov Developers & Family Caregivers, April 6, 2015
• The Wharton DC Alumni Innovation Summit, April 28-29, 2015
• Data Science for Natural Medicines and Epigenetics (in planning), May 4, 2015


Agenda

• 6:30 p.m. Welcome and Introduction (New Tutorial and Mentoring) Story, Slides for RootsTech 2015 Developer Challenge: Big Data from Everywhere for Families and Community Service, February 12–14, 2015 in Salt Lake City, Utah
• 7:10 p.m. Brief Member Introductions
• 7:15 p.m. Data Science for MyFamilySearch.org: Story, Slides, and Tutorial
• 7:45 p.m. National Geographic Genographic Project and Big Data, Syed Ali, Data Scientist, Analytics Led Intelligence Slides. See FamilyTree DNA and National Geographic Genographic DNA test for deep ancestry
• 8:30 p.m. Open Discussion
• 8:45 p.m. Networking
• 9:00 p.m. Depart

http://www.meetup.com/Federal-Big-Data-Working-Group/events/220271343/


Overview

• January 13, 2015: Family Search Launches New App Gallery (more than 50 apps)

• February 12–14, 2015: RootsTech 2015 Developer Challenge in Salt Lake City, Utah

• My Entry: Big Data from Everywhere for Families and Community Service

• My Partner Work: Data Science for MyFamilySearch.org

• Syed Ali’s App: National Geographic Genographic Project and Big Data

• You could be a partner and develop apps (e.g. A Billion Person Family Tree with MongoDB by Randall Wilson, Family Tree of Data: Provenance and Neo4, etc.)


FamilySearch.org

• “FamilySearch is a great resource, but FamilySearch alone can’t do everything. That is why we work with partners to provide complementary tools and resources and why the FamilySearch App Gallery is so important,” said Dennis Brimhall, FamilySearch CEO.

• “We’ve had partners for many years, and now we want to make it easier for our patrons to know about them and to find the apps they need.”


MyTableBox of MyFamily Tree

http://semanticommunity.info/MyFamilySearch.org#MyTableBox_of_MyFamily_Tree


Person Template for Brand Lee Niemann

http://semanticommunity.info/MyFamilySearch.org#Person_Template_for_Brand_Lee_Niemann


Mini-Tutorial: Sony Camcorder and Camtasia Video to YouTube Video

• How is the data collected?
  – Sony Camcorder and PowerPoint Slides.
• Where is the data stored?
  – Hard drive and DVD in MP4 format.
• What are the results?
  – MP4 files converted and uploaded to YouTube.
• Why should we believe the results?
  – Because I and others have done it successfully many times.