Big Data for One Big Family
DESCRIPTION
Presentation by Matt Asay (MongoDB) at the FamilySearch Developer Conference (2014), talking about how big data applies to family history.
TRANSCRIPT
![Page 1: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/1.jpg)
MongoDB Inc. Proprietary and Confidential
Big Data for One Big Family
Matt Asay, VP, Community, MongoDB
![Page 2: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/2.jpg)
2
What Genealogy Was: Neat and Tidy Data
![Page 3: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/3.jpg)
3
Genealogy = Family Stories
![Page 4: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/4.jpg)
4
Stories Aren’t Told in Spreadsheets
![Page 5: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/5.jpg)
5
They’re Increasingly Told Like This
![Page 6: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/6.jpg)
6
Modern, “Big” Data Is Messy
![Page 7: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/7.jpg)
7
Data Now Looks Like This
![Page 8: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/8.jpg)
8
It Looks Like People
![Page 9: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/9.jpg)
The Big Data Unknown
![Page 10: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/10.jpg)
10
Who’s Embracing Big Data?
Source: Gartner
![Page 11: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/11.jpg)
11
Top Big Data Challenges?
Translation? Most struggle to know what Big Data is, how to manage it and who can manage it
Source: Gartner
![Page 12: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/12.jpg)
12
Big Data: Volume Matters
• More than 90% of today’s data was created in the last 2 years
• Moore’s Law for data: volume doubles at regular intervals
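Read together, the two bullets on this slide imply a rough doubling time. The following is a minimal sketch, assuming steady exponential growth (an assumption made for illustration, not a claim from the slides); the function name `doubling_time_years` is hypothetical.

```python
import math


def doubling_time_years(recent_share: float, window_years: float) -> float:
    """Doubling time implied if `recent_share` of all data was created in
    the last `window_years`, assuming steady exponential growth."""
    if not 0.0 < recent_share < 1.0:
        raise ValueError("recent_share must be strictly between 0 and 1")
    # If 90% of the data is new, the total grew by a factor of 1 / (1 - 0.9).
    growth_factor = 1.0 / (1.0 - recent_share)
    return window_years * math.log(2) / math.log(growth_factor)


# Slide figures: 90% of data created in the last 2 years.
months = 12 * doubling_time_years(0.90, 2.0)
print(f"Implied doubling time: about {months:.1f} months")
```

Under that assumption, a tenfold increase in two years corresponds to a doubling roughly every seven months.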
![Page 13: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/13.jpg)
13
Big(ger) Is the New Normal
![Page 14: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/14.jpg)
14
Volume Is Not Really the Problem
“Of Gartner's "3Vs" of big data (volume, velocity, variety), the variety of data sources is seen by our clients as both the greatest challenge and the greatest opportunity.”
- Forrester, 2014
What are the primary data issues driving you to consider Big Data?*
• Data Variety (68%): diverse, streaming or new data types
• Data Volume (15%): greater than 100TB
• Other Data (17%): less than 100TB
* From Big Data Executive Summary of 50+ execs from F100, gov orgs
![Page 15: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/15.jpg)
15
Compounding the Confusion
![Page 16: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/16.jpg)
16
We Hire for Machines but…
Source: Kdnuggets 2014
![Page 17: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/17.jpg)
17
Time to Rethink the Solution
![Page 18: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/18.jpg)
18
NoSQL Born for Unstructured Data
Chart: NoSQL Data Types (multiples allowed), axis from 0% to 60%. Categories: Log data, Free-form text, Web or mobile content, Social media data, Geospatial data, Transactions, Mobile device data, Web sessions or caching data, Sensor data, Email/documents, Machine data, Images, Video, Audio
Source: Gartner, 2014
![Page 19: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/19.jpg)
Innovation As Iteration
![Page 20: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/20.jpg)
“I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison
![Page 21: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/21.jpg)
21
Back in 1970…Cars Were Great!
![Page 22: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/22.jpg)
22
So Were Computers!
![Page 23: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/23.jpg)
23
Including the Relational Database
![Page 24: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/24.jpg)
24
Lots of Great Innovations Since 1970
![Page 25: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/25.jpg)
25
Legacy Data Infrastructure Makes Development Hard
Application, Object Relational Mapping, Relational Database
Code, XML Config, DB Schema
![Page 26: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/26.jpg)
26
And Even Harder To Iterate
New Table
New Table
New Column
Name Pet Phone Email
New Column
3 months later…
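The contrast on this slide can be sketched in a few lines. This is an illustrative example only, using Python's built-in `sqlite3` module as a stand-in relational database and plain dictionaries as stand-in documents; the table name `person` and the sample values are hypothetical.

```python
import sqlite3

# Relational route: the table starts with two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT)")
conn.execute("INSERT INTO person VALUES ('Ada', '555-0100')")

# Each new attribute requires a schema change before any row can use it.
conn.execute("ALTER TABLE person ADD COLUMN email TEXT")
conn.execute("ALTER TABLE person ADD COLUMN pet TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]

# Document route: a new field is added only to the records that have it.
people = [{"name": "Ada", "phone": "555-0100"}]
people.append({"name": "Grace", "email": "grace@example.com", "pet": "cat"})
fields = sorted({key for doc in people for key in doc})

print(columns)
print(fields)
conn.close()
```

In a production relational system the schema change is typically coordinated with ORM mappings and application code, which is where the delay suggested by the slide tends to come from.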
![Page 27: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/27.jpg)
27
Scale and Flexibility Drive Choices
Chart: What motivated you to use a NoSQL database over traditional alternatives? (multiples allowed), axis from 0% to 90%. Categories: Scalability, Schema flexibility, Ease of development, Cost, Availability of cloud deployment options
Source: Gartner, 2014
![Page 28: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/28.jpg)
28
NoSQL Drives Agility
RDBMS
MongoDB
{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health", plan : "PPO Plus" },
    { type : "Dental", plan : "Standard" }
  ]
}
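To make the comparison concrete, here is a rough sketch of what happens when the document on this slide is split into the parent and child rows a relational schema would typically need. It uses plain Python dictionaries rather than a MongoDB driver; the helper `to_relational_rows` and the `employee_id` key are hypothetical names introduced for illustration.

```python
# The slide's document as a Python dict. The _id is kept as a plain string
# stand-in for the ObjectId shown on the slide.
employee = {
    "_id": "4c4ba5e5e8aabf3",
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def to_relational_rows(doc):
    """Split one document into a parent row and one child row per benefit."""
    parent = {key: value for key, value in doc.items() if key != "benefits"}
    children = [{"employee_id": doc["_id"], **benefit} for benefit in doc["benefits"]]
    return parent, children


employee_row, benefit_rows = to_relational_rows(employee)
print(employee_row)
print(benefit_rows)
```

The document keeps the benefits nested with the employee, while the relational form needs a second table and a join key to express the same record.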
![Page 29: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/29.jpg)
29
Optimize for (Developer) Iteration
1985 2013
Infrastructure Cost
Engineer Cost
![Page 30: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/30.jpg)
30
So…Use Open Source
![Page 31: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/31.jpg)
31
Big Data != Big Upfront Payment
![Page 32: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/32.jpg)
32
Shouldn’t Be Penalized for Success
“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”
IBM Press Release 28 Aug, 2012
![Page 33: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/33.jpg)
33
Cloud Fosters Experimentation
Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on. You need an environment that is flexible and allows you to quickly respond to changing big data requirements. Your resource mix is continually evolving - if you buy infrastructure it's almost immediately irrelevant to your business because it's frozen in time. It's solving a problem you may not have or care about any more.
- Matt Wood, GM of Data Science, Amazon Web Services
![Page 34: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/34.jpg)
34
NoDoop: Not Only Hadoop
Source: Silicon Angle, 2012
![Page 35: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/35.jpg)
35
The Data Scientist Is You
“Organizations already have people who know their own data better than mystical data scientists…. Learning Hadoop is easier than learning the company’s business.”
(Gartner, 2012)
![Page 36: Big Data for One Big Family](https://reader033.vdocument.in/reader033/viewer/2022060117/5587a256d8b42a1e368b4690/html5/thumbnails/36.jpg)
@mjasay