masterworks talk on big data and the implications of petascale science
![Page 1: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/1.jpg)
Big Data and Biology: The implications of petascale science
Deepak Singh
![Page 2: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/2.jpg)
![Page 3: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/3.jpg)
Via Reavel under a CC-BY-NC-ND license
![Page 4: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/4.jpg)
![Page 5: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/5.jpg)
![Page 6: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/6.jpg)
life science industry
![Page 7: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/7.jpg)
Credit: Bosco Ho
![Page 8: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/8.jpg)
![Page 9: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/9.jpg)
By ~Prescott under a CC-BY-NC license
![Page 10: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/10.jpg)
![Page 11: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/11.jpg)
data
![Page 12: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/12.jpg)
Image: Wikipedia
![Page 13: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/13.jpg)
biology
![Page 14: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/14.jpg)
big data
![Page 15: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/15.jpg)
Source: http://www.nature.com/news/specials/bigdata/index.html
![Page 16: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/16.jpg)
Image: Matt Wood
![Page 17: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/17.jpg)
Human genome
Image: Matt Wood
![Page 18: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/18.jpg)
![Page 19: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/19.jpg)
not just sequencing
![Page 20: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/20.jpg)
![Page 22: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/22.jpg)
![Page 23: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/23.jpg)
![Page 24: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/24.jpg)
more data
![Page 25: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/25.jpg)
![Page 26: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/26.jpg)
![Page 27: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/27.jpg)
Image: Matt Wood
![Page 28: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/28.jpg)
all hell breaks loose
![Page 29: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/29.jpg)
~100 TB/Week
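As a rough sense of what a weekly rate like this adds up to, the sketch below converts terabytes per week into petabytes per year. It assumes a sustained rate over 52 weeks and decimal units (1 PB = 1000 TB); both are assumptions for illustration, not figures from the talk.

```python
TB_PER_PB = 1000  # decimal units, an assumption


def yearly_petabytes(tb_per_week: float, weeks_per_year: int = 52) -> float:
    """Convert a sustained weekly data rate in TB into PB per year."""
    return tb_per_week * weeks_per_year / TB_PER_PB


# A sustained ~100 TB/week, as quoted on the slide.
print(f"{yearly_petabytes(100):.1f} PB/year")
```

A rate that is only reached part of the year would give a proportionally smaller total.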
![Page 30: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/30.jpg)
~100 TB/Week
>2 PB/Year
![Page 31: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/31.jpg)
![Page 32: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/32.jpg)
![Page 33: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/33.jpg)
![Page 34: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/34.jpg)
![Page 35: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/35.jpg)
![Page 36: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/36.jpg)
![Page 37: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/37.jpg)
![Page 38: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/38.jpg)
years
![Page 39: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/39.jpg)
weeks
![Page 40: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/40.jpg)
days
![Page 41: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/41.jpg)
days
![Page 42: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/42.jpg)
days
minutes?
![Page 43: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/43.jpg)
gigabytes
![Page 44: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/44.jpg)
terabytes
![Page 45: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/45.jpg)
petabytes
![Page 46: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/46.jpg)
exabytes?
![Page 47: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/47.jpg)
really fast
![Page 48: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/48.jpg)
Image: http://www.broadinstitute.org/~apleite/photos.html
![Page 49: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/49.jpg)
single lab
![Page 50: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/50.jpg)
Image: Chris Dagdigian
![Page 51: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/51.jpg)
![Page 52: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/52.jpg)
![Page 53: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/53.jpg)
![Page 54: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/54.jpg)
![Page 55: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/55.jpg)
implications of scale
![Page 56: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/56.jpg)
data management
![Page 57: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/57.jpg)
data processing
![Page 58: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/58.jpg)
data sharing
![Page 59: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/59.jpg)
![Page 60: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/60.jpg)
fundamental concepts
![Page 61: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/61.jpg)
1. architecting for scale
![Page 62: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/62.jpg)
![Page 63: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/63.jpg)
“Everything fails, all the time” -- Werner Vogels
![Page 64: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/64.jpg)
![Page 65: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/65.jpg)
“Things will crash. Deal with it” -- Jeff Dean
![Page 66: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/66.jpg)
![Page 67: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/67.jpg)
“Remember everything fails” -- Randy Shoup
![Page 68: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/68.jpg)
fun with numbers
![Page 69: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/69.jpg)
datacenter availability
![Page 70: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/70.jpg)
Source: Uptime Institute
![Page 71: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/71.jpg)
Tier I: 28.8 hours annual downtime (99.67% availability)
Tier II: 22.0 hrs annual downtime (99.75% availability)
Tier III: 1.6 hrs annual downtime (99.98% availability)
Tier IV: 0.8 hrs annual downtime (99.99% availability)
Source: Uptime Institute
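The relationship between an availability percentage and annual downtime can be sketched as follows. This is a minimal illustration, assuming a 365-day year; the function name and the `TIERS` table are made up for the example, and the percentages are the ones quoted on the slide.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours per year a site is unavailable at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Availability percentages as quoted on the slide.
TIERS = {"Tier I": 99.67, "Tier II": 99.75, "Tier III": 99.98, "Tier IV": 99.99}

for tier, percent in TIERS.items():
    print(f"{tier}: {annual_downtime_hours(percent):.1f} hours/year")
```

The computed values land close to the slide's downtime figures; small differences are expected because the percentages are rounded to two decimals.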
![Page 72: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/72.jpg)
cooling systems go down
![Page 73: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/73.jpg)
power units fail
![Page 74: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/74.jpg)
2-4% of servers will die annually
Source: Jeff Dean, LADIS 2009
![Page 75: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/75.jpg)
1-5% of disk drives will die every year
Source: Jeff Dean, LADIS 2009
![Page 76: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/76.jpg)
2.3% AFR in population of 13,250
3.3% AFR in population of 22,400
4.2% AFR in population of 246,000
Source: James Hamilton
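To see what these annualized failure rates (AFR) mean operationally, the sketch below turns each rate and population into an expected number of failures per year and per day. It assumes failures are spread evenly over the year, which is a simplification; the function name is hypothetical.

```python
def expected_annual_failures(population: int, afr_percent: float) -> float:
    """Expected failures per year for a population at a given AFR."""
    return population * afr_percent / 100


# (population, AFR in percent) pairs as quoted on the slide.
OBSERVATIONS = [(13_250, 2.3), (22_400, 3.3), (246_000, 4.2)]

for population, afr in OBSERVATIONS:
    failures = expected_annual_failures(population, afr)
    print(
        f"{population:>7,} units at {afr}% AFR: "
        f"~{failures:,.0f} failures/year, ~{failures / 365:.1f} per day"
    )
```

At the largest population, failures become a daily event rather than an exception, which is the point the preceding slides are making.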
![Page 77: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/77.jpg)
software breaks
![Page 78: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/78.jpg)
human errors
![Page 79: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/79.jpg)
human errors
~20% admin issues have unintended consequences
Source: James Hamilton
![Page 80: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/80.jpg)
achieving scalability and availability
![Page 81: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/81.jpg)
partitioning
![Page 82: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/82.jpg)
redundancy
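One way to see why redundancy helps: if replicas fail independently, the service is down only when all of them are down at once. The sketch below computes the combined availability under that independence assumption, which real systems only approximate (shared power, network, or software bugs correlate failures). The function name is hypothetical.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of a service that is up while any one replica is up.

    Assumes replica failures are independent (a strong simplification).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a fraction between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1 - (1 - single) ** replicas


for n in (1, 2, 3):
    print(f"{n} replica(s) at 99% each: {combined_availability(0.99, n):.6f}")
```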
![Page 83: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/83.jpg)
recovery oriented computing
Source: http://perspectives.mvdirona.com/, http://roc.cs.berkeley.edu/
![Page 84: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/84.jpg)
assume sw/hw failure
![Page 85: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/85.jpg)
design apps to be resilient
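A common building block for resilient applications is retrying a failed call with exponential backoff and jitter. The sketch below is one minimal way to do it, not a method taken from the talk; `call_with_retries`, its parameters, and the `flaky` demo function are all hypothetical names.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on any exception with exponential backoff.

    The delay doubles after each failed attempt, plus random jitter so that
    many clients do not retry in lockstep. The last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


# Demo: an operation that fails twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(call_with_retries(flaky, sleep=lambda seconds: None))
```

Retrying is only safe when the operation is idempotent or the caller can otherwise tolerate it running more than once.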
![Page 86: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/86.jpg)
automation
![Page 87: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/87.jpg)
![Page 88: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/88.jpg)
Compute: Amazon Elastic Compute Cloud (EC2)
- Elastic Load Balancing
- Auto Scaling
Storage: Amazon Simple Storage Service (S3)
- AWS Import/Export
Your Custom Applications and Services
Content Delivery: Amazon CloudFront
Messaging: Amazon Simple Queue Service (SQS)
Payments: Amazon Flexible Payments Service (FPS)
On-Demand Workforce: Amazon Mechanical Turk
Parallel Processing: Amazon Elastic MapReduce
Monitoring: Amazon CloudWatch
Management: AWS Management Console
Tools: AWS Toolkit for Eclipse
Isolated Networks: Amazon Virtual Private Cloud
Database: Amazon RDS and SimpleDB
![Page 89: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/89.jpg)
Amazon S3
![Page 90: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/90.jpg)
durable
![Page 91: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/91.jpg)
available
![Page 92: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/92.jpg)
![Page 93: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/93.jpg)
Amazon EC2
![Page 94: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/94.jpg)
highly scalable
![Page 95: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/95.jpg)
3000 CPU’s for one firm’s risk management application
[Chart: Number of EC2 Instances by day, Wednesday 4/22/2009 through Tuesday 4/28/2009; marked levels: 3000 and 300; annotation: 300 CPU’s on weekends]
![Page 96: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/96.jpg)
highly available systems
![Page 97: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/97.jpg)
dynamic
![Page 98: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/98.jpg)
fault tolerant
![Page 99: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/99.jpg)
US East Region
Availability Zone A
Availability Zone B
Availability Zone C
Availability Zone D
![Page 100: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/100.jpg)
2. one size does not fit all
![Page 101: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/101.jpg)
2. one size does not fit all data
![Page 102: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/102.jpg)
many data types
![Page 103: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/103.jpg)
structured data
![Page 104: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/104.jpg)
using the right data store
![Page 105: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/105.jpg)
(a) feature first
![Page 106: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/106.jpg)
RDBMS
Oracle, SQL Server, DB2, MySQL, Postgres
![Page 107: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/107.jpg)
Source: http://www.bioinformaticszen.com/
![Page 108: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/108.jpg)
Source: http://www.bioinformaticszen.com/
![Page 109: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/109.jpg)
Source: http://www.bioinformaticszen.com/
![Page 110: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/110.jpg)
use a bigger computer
![Page 111: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/111.jpg)
remove joins
![Page 112: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/112.jpg)
scaling limits
![Page 113: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/113.jpg)
(b) scale first
![Page 114: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/114.jpg)
scale is highest priority
![Page 115: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/115.jpg)
single RDBMS incapable
![Page 116: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/116.jpg)
solution 1: data sharding
![Page 117: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/117.jpg)
10’s
![Page 118: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/118.jpg)
100’s
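Data sharding can be sketched as a routing function that maps each record key to one of N databases. The example below uses a stable hash with modulo placement, which is one simple scheme among several; the function name and the sample keys are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so is not stable across machines.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route a few sample keys across 10 shards.
for sample_id in ("sample-0001", "sample-0002", "sample-0003"):
    print(sample_id, "-> shard", shard_for(sample_id, 10))
```

With modulo placement, changing the number of shards moves most keys to a different shard, which is part of why growing from tens to hundreds of shards gets operationally painful.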
![Page 119: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/119.jpg)
solution 2: scalable key-value store
![Page 120: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/120.jpg)
scale is design point
MongoDB, Project Voldemort, Cassandra, HBase, BigTable, Amazon SimpleDB, Dynamo
![Page 121: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/121.jpg)
(c) simple structured storage
![Page 122: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/122.jpg)
![Page 123: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/123.jpg)
simple
fast
low ops cost
BerkeleyDB, Tokyo Cabinet, Amazon SimpleDB
![Page 124: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/124.jpg)
(d) purpose optimized stores
![Page 125: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/125.jpg)
![Page 126: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/126.jpg)
data warehousing
stream processing
Aster Data, Vertica, Netezza, Greenplum, VoltDB, StreamBase
![Page 127: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/127.jpg)
what about files?
![Page 128: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/128.jpg)
cluster file systems
Lustre, GlusterFS
![Page 129: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/129.jpg)
distributed file systems
HDFS, GFS
![Page 130: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/130.jpg)
distributed object store
Amazon S3, Dynomite
![Page 131: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/131.jpg)
![Page 132: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/132.jpg)
![Page 133: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/133.jpg)
![Page 134: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/134.jpg)
3. processing big data
![Page 135: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/135.jpg)
disk read/writes
slow & expensive
![Page 136: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/136.jpg)
data processing
fast & cheap
![Page 137: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/137.jpg)
distribute the data
parallel reads
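The idea of splitting data into parts and reading them in parallel can be sketched on a single machine with a thread pool. This is only an illustration: in a real deployment the parts would sit on different disks or nodes, and the file names and helper functions here are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_lines(path: Path) -> int:
    """Read one part of the data set and return its line count."""
    with path.open("rb") as handle:
        return sum(1 for _ in handle)


def parallel_line_count(paths, max_workers: int = 4) -> int:
    """Read all parts concurrently and combine the per-part results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(count_lines, paths))


# Demo: write four small parts to a temporary directory, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for index in range(4):
        part = Path(tmp) / f"part-{index}.txt"
        part.write_text("ACGT\n" * (index + 1))
        parts.append(part)
    print(parallel_line_count(parts))
```

Threads help here only to the extent that the work is I/O bound; the larger gain comes from the parts living on independent storage.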
![Page 138: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/138.jpg)
![Page 139: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/139.jpg)
data processing for the cloud
![Page 140: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/140.jpg)
distributed file system (HDFS)
![Page 141: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/141.jpg)
map/reduce
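A minimal single-process sketch of the map/reduce pattern, assuming a toy job that counts bases across a few short reads. The function names and the sample reads are hypothetical; a real Hadoop job would run the map and reduce steps in parallel across many machines, with the framework doing the grouping in between.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (base, 1) per base."""
    for base in record:
        yield base, 1

def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine the values collected for one key into a single result."""
    return key, sum(values)

def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

reads = ["ACGT", "AACC", "GGTA"]   # hypothetical toy input
base_counts = run_job(reads)
print(base_counts)
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread over as many workers as there are records and keys.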
![Page 142: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/142.jpg)
Via Cloudera under a Creative Commons License
![Page 143: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/143.jpg)
Via Cloudera under a Creative Commons License
![Page 144: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/144.jpg)
fault tolerance
![Page 145: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/145.jpg)
massive scalability
![Page 146: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/146.jpg)
petabyte scale
![Page 147: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/147.jpg)
![Page 148: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/148.jpg)
![Page 149: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/149.jpg)
hosted hadoop service
![Page 150: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/150.jpg)
hadoop easy and simple
![Page 151: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/151.jpg)
[Diagram: Amazon Elastic MapReduce. Input data is placed in an input S3 bucket on Amazon S3; the application is deployed through the web console or command line tools; Hadoop runs on Amazon EC2 instances over the input dataset and writes the output results to an output S3 bucket; at the end the user is notified and gets the results.]
![Page 152: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/152.jpg)
back to the science
![Page 153: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/153.jpg)
basic informatics workflow
![Page 154: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/154.jpg)
![Page 155: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/155.jpg)
![Page 156: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/156.jpg)
![Page 157: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/157.jpg)
![Page 158: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/158.jpg)
Via Christolakis under a CC-BY-NC-ND license
![Page 159: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/159.jpg)
Via Argonne National Labs under a CC-BY-SA license
![Page 160: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/160.jpg)
Via Argonne National Labs under a CC-BY-SA license
killer app
![Page 161: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/161.jpg)
getting the data
![Page 162: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/162.jpg)
Register projects
Register samples
Sample prep
Sequencing
Analysis
These slides cover work presented by Matt Wood at various conferences
![Page 163: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/163.jpg)
Image: Matt Wood
![Page 164: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/164.jpg)
constant change
![Page 165: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/165.jpg)
flexible data capture
![Page 166: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/166.jpg)
virtual fields
![Page 167: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/167.jpg)
no schema
![Page 168: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/168.jpg)
![Page 169: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/169.jpg)
specify at run time
![Page 170: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/170.jpg)
specify at run time (bootstrapping)
![Page 171: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/171.jpg)
Sample: Name, Organism, Concentration
Source: Matt Wood
![Page 172: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/172.jpg)
Source: Matt Wood
![Page 173: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/173.jpg)
key value pairs
![Page 174: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/174.jpg)
![Page 175: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/175.jpg)
change happens
![Page 176: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/176.jpg)
Sample V1: Name, Organism, Concentration
Sample V2: Name, Organism, Concentration, Origin, Quality metric
Source: Matt Wood
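A small sketch of how schemaless key-value records can absorb a change like the one from V1 to V2. The sample values, the `describe` helper, and the lower-case field names are hypothetical; the slides do not say which store or code was used.

```python
# Hypothetical sample records kept as plain key-value pairs, with no fixed schema.
sample_v1 = {"name": "S001", "organism": "E. coli", "concentration": 12.5}
sample_v2 = {"name": "S002", "organism": "H. sapiens", "concentration": 8.0,
             "origin": "lab A", "quality_metric": 0.97}

def describe(sample, fields):
    """Return the requested fields; fields a record lacks show up as 'n/a'."""
    return {field: sample.get(field, "n/a") for field in fields}

# The field list is supplied at run time rather than baked into a table definition.
fields_v2 = ["name", "organism", "concentration", "origin", "quality_metric"]
rows = [describe(s, fields_v2) for s in (sample_v1, sample_v2)]
for row in rows:
    print(row)
```

Old V1 records stay readable after the new fields appear, and no migration step is needed before V2 records can be stored.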
![Page 177: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/177.jpg)
Source: Matt Wood
![Page 178: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/178.jpg)
high throughput
![Page 179: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/179.jpg)
lots of pipelines
![Page 180: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/180.jpg)
scaling projects/pipelines?
![Page 181: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/181.jpg)
lots of apps
![Page 182: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/182.jpg)
loosely coupled
![Page 183: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/183.jpg)
automation
![Page 184: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/184.jpg)
scale operationally
![Page 185: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/185.jpg)
be agile
![Page 186: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/186.jpg)
now what?
![Page 187: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/187.jpg)
Via asklar under a CC-BY license
![Page 188: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/188.jpg)
Via Argonne National Labs under a CC-BY-SA license
![Page 189: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/189.jpg)
many data types
![Page 190: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/190.jpg)
changing data types
![Page 191: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/191.jpg)
Shaq Image: Keith Allison under a CC-BY-SA license
![Page 192: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/192.jpg)
Shaq Image: Keith Allison under a CC-BY-SA license
![Page 193: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/193.jpg)
Shaq Image: Keith Allison under a CC-BY-SA license
![Page 194: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/194.jpg)
Shaq Image: Keith Allison under a CC-BY-SA license
![Page 195: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/195.jpg)
Shaq Image: Keith Allison under a CC-BY-SA license
![Page 196: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/196.jpg)
?
![Page 197: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/197.jpg)
![Page 198: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/198.jpg)
lots and lots and lots and lots and lots and lots of data and lots and lots of lots of data
![Page 199: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/199.jpg)
By bitterlysweet under a CC-BY-NC-ND license
![Page 200: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/200.jpg)
Source: http://bit.ly/anderson-bigdata
![Page 201: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/201.jpg)
Chris Anderson doesn’t understand science
![Page 202: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/202.jpg)
“more is different”
![Page 203: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/203.jpg)
few data points
![Page 204: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/204.jpg)
elaborate models
![Page 205: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/205.jpg)
the unreasonable effectiveness of data
Source: “The Unreasonable Effectiveness of Data”, Alon Halevy, Peter Norvig, and Fernando Pereira
![Page 206: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/206.jpg)
simple models, lots of data
![Page 207: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/207.jpg)
![Page 208: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/208.jpg)
information platform
![Page 209: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/209.jpg)
![Page 210: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/210.jpg)
information platforms at scale
![Page 211: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/211.jpg)
one organization
![Page 212: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/212.jpg)
4 TB added daily (compressed)
![Page 213: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/213.jpg)
135 TB of data scanned daily (compressed)
![Page 214: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/214.jpg)
15 PB total data capacity
![Page 215: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/215.jpg)
???
![Page 216: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/216.jpg)
Facebook data from Ashish Thusoo’s HadoopWorld 2009 talk
![Page 217: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/217.jpg)
not always that big
![Page 218: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/218.jpg)
can we learn any lessons?
Source: “Information Platforms and the Rise of the Data Scientist”, Jeff Hammerbacher in Beautiful Data
![Page 219: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/219.jpg)
analytics platform
![Page 220: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/220.jpg)
Data warehouse
![Page 221: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/221.jpg)
A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.
![Page 222: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/222.jpg)
![Page 223: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/223.jpg)
![Page 224: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/224.jpg)
![Page 225: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/225.jpg)
ETL
![Page 226: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/226.jpg)
extract
![Page 227: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/227.jpg)
transform
![Page 228: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/228.jpg)
load
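The three ETL steps can be sketched in a few lines, assuming a tiny CSV export as the source and an in-memory SQLite table as a stand-in for the warehouse. The column names and values are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the second row is missing its concentration.
RAW = """sample,organism,concentration_ng_ul
S001,E. coli,12.5
S002,H. sapiens,
S003,E. coli,8.0
"""

def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no concentration and convert types."""
    return [(r["sample"], r["organism"], float(r["concentration_ng_ul"]))
            for r in rows if r["concentration_ng_ul"]]

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute("CREATE TABLE samples (sample TEXT, organism TEXT, concentration REAL)")
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
loaded = conn.execute("SELECT COUNT(*), AVG(concentration) FROM samples").fetchone()
print(loaded)
```

The fixed `CREATE TABLE` statement is the part that becomes painful when data types keep changing, which is the limitation the following slides point at.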
![Page 229: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/229.jpg)
Via asklar under a CC-BY license
![Page 230: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/230.jpg)
1 TB
![Page 231: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/231.jpg)
MySQL --> Oracle
![Page 232: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/232.jpg)
more data
![Page 233: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/233.jpg)
more data types
![Page 234: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/234.jpg)
changing data types
![Page 235: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/235.jpg)
limit data warehouse
![Page 236: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/236.jpg)
too limited
![Page 237: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/237.jpg)
how do you scale and adapt?
![Page 238: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/238.jpg)
100’s of TBs
![Page 239: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/239.jpg)
1000’s of jobs
![Page 240: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/240.jpg)
back to the science
![Page 241: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/241.jpg)
back in the day
![Page 242: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/242.jpg)
small data sets
![Page 243: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/243.jpg)
flat files
![Page 244: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/244.jpg)
[Diagram: flat files file1, file2, ..., fileN spread across directories folder1, folder2, ..., folderN]
![Page 245: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/245.jpg)
shared file system
![Page 246: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/246.jpg)
RDBMS
![Page 247: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/247.jpg)
Image: Wikimedia Commons
![Page 248: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/248.jpg)
![Page 249: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/249.jpg)
![Page 250: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/250.jpg)
Image: Chris Dagdigian
![Page 251: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/251.jpg)
need to process
![Page 252: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/252.jpg)
need to analyze
![Page 253: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/253.jpg)
100’s of TBs
![Page 254: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/254.jpg)
1000’s of jobs
![Page 255: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/255.jpg)
Facebook data from Ashish Thusoo’s HadoopWorld 2009 talk
![Page 256: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/256.jpg)
![Page 257: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/257.jpg)
ETL
![Page 258: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/258.jpg)
Via asklar under a CC-BY license
![Page 259: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/259.jpg)
data mining & analytics
![Page 260: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/260.jpg)
Via Argonne National Labs under a CC-BY-SA license
![Page 261: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/261.jpg)
analysts are not programmers
![Page 262: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/262.jpg)
not savvy with map/reduce
![Page 263: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/263.jpg)
apache hive
http://hadoop.apache.org/hive/
![Page 264: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/264.jpg)
manage & query data
![Page 265: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/265.jpg)
manage & query data on top of Hadoop
![Page 266: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/266.jpg)
work by @peteskomoroch
![Page 267: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/267.jpg)
![Page 268: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/268.jpg)
![Page 270: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/270.jpg)
![Page 271: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/271.jpg)
apache pig
http://hadoop.apache.org/pig/
![Page 272: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/272.jpg)
[Diagram: Amazon Elastic MapReduce. Input data is placed in an input S3 bucket on Amazon S3; the application is deployed through the web console or command line tools; Hadoop runs on Amazon EC2 instances over the input dataset and writes the output results to an output S3 bucket; at the end the user is notified and gets the results.]
![Page 273: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/273.jpg)
hadoop and bioinformatics
![Page 274: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/274.jpg)
High Throughput Sequence Analysis
Mike Schatz, University of Maryland
![Page 275: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/275.jpg)
Short Read Mapping
![Page 276: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/276.jpg)
Seed & Extend
Good alignments must have significant exact alignment
Minimal exact alignment length = l/(k+1)
![Page 277: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/277.jpg)
Seed & Extend
Good alignments must have significant exact alignment
Minimal exact alignment length = l/(k+1)
Expensive to scale
![Page 278: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/278.jpg)
Seed & Extend
Good alignments must have significant exact alignment
Minimal exact alignment length = l/(k+1)
Expensive to scale
![Page 279: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/279.jpg)
Seed & Extend
Good alignments must have significant exact alignment
Minimal exact alignment length = l/(k+1)
Expensive to scale
Need parallelization framework
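The bound l/(k+1) follows from the pigeonhole principle. This sketch reads l as the read length and k as the number of allowed differences, which the slide does not spell out. If a read is cut into k + 1 non-overlapping pieces, at most k of them can contain a difference, so at least one piece must match exactly. The helper names and the example read are hypothetical.

```python
def min_seed_length(read_length, max_differences):
    """Pigeonhole bound: some exact stretch has length >= l // (k + 1)."""
    return read_length // (max_differences + 1)

def split_into_seeds(read, max_differences):
    """Cut the read into k + 1 non-overlapping seeds; at least one must match exactly.

    Bases left over when the length is not divisible by k + 1 are ignored here.
    """
    size = min_seed_length(len(read), max_differences)
    return [read[i * size:(i + 1) * size] for i in range(max_differences + 1)]

read = "ACGTACGTAC"                    # hypothetical 10 bp read
seeds = split_into_seeds(read, 1)      # allow 1 difference -> 2 seeds of 5 bp
print(min_seed_length(len(read), 1), seeds)
```

Shorter seeds (longer reads split more ways, or more allowed differences) produce many more candidate locations to check, which is where the cost of scaling comes from.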
![Page 280: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/280.jpg)
CloudBurst
Catalog k-mers, collect seeds, end-to-end alignment
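The three steps named on the slide can be illustrated with a single-process toy. This is not CloudBurst's implementation, which distributes the work with Hadoop; the function names, the toy reference, and the reads are hypothetical, and only mismatches (no insertions or deletions) are counted.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield (k-mer, offset) for every position in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], i

def find_alignments(reference, reads, seed_len, max_mismatches):
    # Catalog k-mers: index every seed-length substring of the reference.
    index = defaultdict(list)
    for kmer, pos in kmers(reference, seed_len):
        index[kmer].append(pos)

    hits = set()
    for name, read in reads.items():
        # Collect seeds: read k-mers that also occur in the reference.
        for kmer, offset in kmers(read, seed_len):
            for pos in index.get(kmer, []):
                start = pos - offset
                if start < 0 or start + len(read) > len(reference):
                    continue
                # End-to-end alignment: count mismatches over the whole read.
                window = reference[start:start + len(read)]
                mismatches = sum(a != b for a, b in zip(read, window))
                if mismatches <= max_mismatches:
                    hits.add((name, start, mismatches))
    return sorted(hits)

reference = "TTACGTACGGATCCA"          # hypothetical toy reference
reads = {"r1": "ACGTACGG", "r2": "CGGTTCCA"}
alignments = find_alignments(reference, reads, seed_len=4, max_mismatches=1)
print(alignments)
```

In a MapReduce setting the k-mer is a natural key: the map step emits k-mers from both reads and reference, and all occurrences of the same k-mer meet in one reducer.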
![Page 281: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/281.jpg)
http://cloudburst-bio.sourceforge.net; Bioinformatics 2009 25: 1363-1369
![Page 282: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/282.jpg)
![Page 283: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/283.jpg)
Bowtie: Ultrafast short read aligner
Langmead B, Trapnell C, Pop M, Salzberg SL. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10 (3): R25.
![Page 284: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/284.jpg)
SOAPsnp: Consensus alignment and SNP calling
Ruiqiang Li, Yingrui Li, Xiaodong Fang, et al. (2009) "SNP detection for massively parallel whole-genome resequencing" Genome Res
![Page 285: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/285.jpg)
Crossbow: Rapid whole genome SNP analysis
Ben Langmead
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
![Page 286: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/286.jpg)
![Page 287: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/287.jpg)
Preprocessed reads
![Page 288: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/288.jpg)
Preprocessed reads
Map: Bowtie
![Page 289: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/289.jpg)
Preprocessed reads
Map: Bowtie
Sort: Bin and partition
![Page 290: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/290.jpg)
Preprocessed reads
Map: Bowtie
Sort: Bin and partition
Reduce: SOAPsnp
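The stages above follow the MapReduce pattern. The toy Python sketch below mirrors only the data flow: `toy_align` and `toy_call_snps` are hypothetical stand-ins for the alignment and SNP-calling steps (they do not call Bowtie or SOAPsnp), and the bin size and depth threshold are arbitrary assumptions.

```python
from collections import Counter, defaultdict

BIN_SIZE = 8  # assumed partition width in bases, for illustration only


def toy_align(read, reference):
    """Map stand-in: best ungapped placement by Hamming distance."""
    best_pos = min(
        range(len(reference) - len(read) + 1),
        key=lambda p: sum(
            a != b for a, b in zip(read, reference[p:p + len(read)])
        ),
    )
    return best_pos, read


def bin_and_partition(alignments):
    """Sort stand-in: key each aligned base by genome bin, sort within bins."""
    bins = defaultdict(list)
    for pos, read in alignments:
        for i, base in enumerate(read):
            bins[(pos + i) // BIN_SIZE].append((pos + i, base))
    return {b: sorted(v) for b, v in sorted(bins.items())}


def toy_call_snps(pileup, reference, min_depth=2):
    """Reduce stand-in: report sites whose majority base differs from the reference."""
    columns = defaultdict(Counter)
    for pos, base in pileup:
        columns[pos][base] += 1
    snps = []
    for pos in sorted(columns):
        base, count = columns[pos].most_common(1)[0]
        depth = sum(columns[pos].values())
        if depth >= min_depth and count > depth / 2 and base != reference[pos]:
            snps.append((pos, reference[pos], base))
    return snps


def run_pipeline(reads, reference):
    alignments = [toy_align(r, reference) for r in reads]  # Map
    partitions = bin_and_partition(alignments)             # Sort
    snps = []
    for pileup in partitions.values():                     # Reduce
        snps.extend(toy_call_snps(pileup, reference))
    return snps
```

`run_pipeline(reads, reference)` returns `(position, reference base, observed base)` tuples. Each bin is reduced independently, which is what lets the sort step hand different genome regions to different workers.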
![Page 291: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/291.jpg)
Crossbow condenses over 1,000 hours of resequencing computation into a few hours without requiring the user to own or operate a computer cluster
![Page 292: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/292.jpg)
Comparing Genomes
![Page 293: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/293.jpg)
Estimating relative evolutionary rates from sequence comparisons:
Identification of probable orthologs
A B C D E
S. cerevisiae C. elegans
species tree
gene tree
Admissible comparisons: A or B vs. D; C vs. E
Inadmissible comparisons: A or B vs. E; C vs. D
![Page 294: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/294.jpg)
Estimating relative evolutionary rates from sequence comparisons:
A B C D E
S. cerevisiae C. elegans
species tree
gene tree
1. Orthologs found using the Reciprocal smallest distance algorithm
2. Build alignment between two orthologs
>Sequence C
MSGRTILASTIAKPFQEEVTKAVKQLNFT-----PKLVGLLSNEDPAAKMYANWTGKTCESLGFKYEL-…
>Sequence E
MSGRTILASKVAETFNTEIINNVEEYKKTHNGQGPLLVGFLANNDPAAKMYATWTQKTSESMGFRYDL…
3. Estimate distance given a substitution matrix
Substitution matrix (figure): rows and columns Phe, Ala, Pro, Leu, Thr; lower-triangle entries µπ
![Page 295: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/295.jpg)
RSD algorithm summary
Genome I, Genome J
Ib vs. a, b, c (Genome J): Align sequences & Calculate distances: D=0.2, D=0.3, D=0.1
Jc vs. a, b, c (Genome I): Align sequences & Calculate distances: D=1.2, D=0.1, D=0.9
Orthologs: Ib - Jc, D = 0.1
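The reciprocal test can be sketched in a few lines of Python. The distance tables reuse the D values shown on the slide, but the assignment of each value to a specific pair is an assumption, and `table_distance` is a hypothetical lookup standing in for the align-and-estimate step. Candidate filtering and thresholds are omitted.

```python
# Assumed pairing of the slide's D values with sequence pairs (illustrative).
FORWARD = {("Ib", "Ja"): 0.2, ("Ib", "Jb"): 0.3, ("Ib", "Jc"): 0.1}
REVERSE = {("Jc", "Ia"): 1.2, ("Jc", "Ib"): 0.1, ("Jc", "Ic"): 0.9}


def table_distance(a, b):
    """Stand-in for aligning two sequences and estimating their distance."""
    return {**FORWARD, **REVERSE}[(a, b)]


def closest(query, candidates, distance):
    """Return (candidate, D) with the smallest distance to the query."""
    return min(
        ((c, distance(query, c)) for c in candidates),
        key=lambda pair: pair[1],
    )


def reciprocal_smallest_distance(query, genome_i, genome_j, distance):
    """Return (query, partner, D) if the closest hit points back to the query."""
    partner, d_forward = closest(query, genome_j, distance)
    back, _ = closest(partner, genome_i, distance)
    if back == query:
        return query, partner, d_forward
    return None


result = reciprocal_smallest_distance(
    "Ib", ["Ia", "Ib", "Ic"], ["Ja", "Jb", "Jc"], table_distance
)
print(result)  # ('Ib', 'Jc', 0.1)
```

The pair is reported only when the smallest-distance hit in Genome J also has the query as its smallest-distance hit in Genome I.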
![Page 296: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/296.jpg)
Prof. Dennis Wall
Harvard Medical School
![Page 297: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/297.jpg)
Roundup is a database of orthologs and their evolutionary distances. To get started, click browse. Alternatively, you can read our documentation here.
Good luck, researchers!
![Page 298: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/298.jpg)
massive computational demand
![Page 299: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/299.jpg)
1000 genomes = 5,994,000 processes = 23,976,000 hours
![Page 300: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/300.jpg)
2737 years
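These figures can be reproduced with simple arithmetic. The sketch below assumes an all-against-all comparison, with one unordered pair for every two genomes. The factors of 12 processes per pair and 4 hours per process are not stated on the slides; they are inferred here only because they reproduce the quoted totals.

```python
from math import comb

genomes = 1000
pairs = comb(genomes, 2)       # 499,500 unordered genome pairs
processes_per_pair = 12        # assumption inferred from the slide totals
hours_per_process = 4          # assumption inferred from the slide totals

processes = pairs * processes_per_pair
hours = processes * hours_per_process
years = hours / (24 * 365)

print(f"{processes:,} processes, {hours:,} hours, about {years:.0f} years")
# 5,994,000 processes, 23,976,000 hours, about 2737 years
```

The pair count grows quadratically with the number of genomes, so doubling the genome count roughly quadruples the compute time.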
![Page 301: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/301.jpg)
![Page 302: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/302.jpg)
compared 50+ genomes
![Page 303: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/303.jpg)
trends in data sharing
![Page 304: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/304.jpg)
data motion is hard
![Page 305: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/305.jpg)
cloud services are a viable dataspace
![Page 306: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/306.jpg)
share data
![Page 307: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/307.jpg)
share applications
![Page 308: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/308.jpg)
![Page 309: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/309.jpg)
share results
![Page 310: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/310.jpg)
http://aws.amazon.com/publicdatasets/
![Page 311: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/311.jpg)
![Page 312: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/312.jpg)
Data Platform
App Platform
![Page 313: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/313.jpg)
Data Platform
App Platform
![Page 314: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/314.jpg)
Scalable Data Platform
Services
APIs
Getters Filters Savers
WORK
![Page 315: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/315.jpg)
to conclude
![Page 316: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/316.jpg)
big data
![Page 317: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/317.jpg)
change thinking
![Page 318: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/318.jpg)
data management
data processing
data sharing
![Page 319: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/319.jpg)
think distributed
![Page 320: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/320.jpg)
new software architectures
![Page 321: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/321.jpg)
new computing paradigms
![Page 322: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/322.jpg)
cloud services
![Page 323: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/323.jpg)
the cloud works
![Page 324: Masterworks talk on Big Data and the implications of petascale science](https://reader038.vdocument.in/reader038/viewer/2022103109/5463b9d3af795927598b5ec6/html5/thumbnails/324.jpg)
[email protected] Twitter: @mndoci
Presentation ideas from @mza, James Hamilton, and @lessig
Thank you!