from the benchtop to the datacenter: it and converged infrastructure in life sciences

83
From the Benchtop to the Datacenter IT and Converged Infrastructure in Life Science Research 1 Leverage Big Data ’14, San Diego, CA.; May 2014 Thursday, May 22, 14

Upload: arieberman

Post on 25-Jun-2015

1.029 views

Category:

Technology


2 download

DESCRIPTION

Talk given at the Leverage Big Data '14 Event in May 2014. Big data has arrived in the life science research domain, and it has caught researchers and IT professionals alike off-guard. Workstations, Excel, and even small clusters under people's desks are no longer sufficient to meet the data storage and processing needs of modern biological research techniques. Data is being produced cheaply and rapidly at unprecedented rates in academic, commercial and clinical laboratories while budgets in those spaces continue to be slashed. Despite the reduced budgets, it is predicted that 25% of all researchers will require HPC to analyze their data in the coming year. Research organizations are starting to realize they have to run to catch up, or face failure in the wake of old-school IT infrastructures and policies. IT organizations have been forced to get creative and build amazing infrastructures for pennies, or fail in the wake of the user pressure being generated from the laboratories. Converged infrastructure is the present and the future for biomedical, clinical, and life sciences research. In this talk, I'll cover the IT challenges in life sciences, how and where they are being met, and talk about the near-future trends in IT infrastructure, services, and informatics and how they will affect medical discoveries in the next 5-10 years.

TRANSCRIPT

Page 1: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

From the Benchtop to the DatacenterIT and Converged Infrastructure in Life Science Research

1Leverage Big Data ’14, San Diego, CA.; May 2014

Thursday, May 22, 14

Page 2: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Ari E. Berman, Ph.D.

Who am I?

2

Director of Government Services, Principal Investigator

I’m a fallen scientist - Ph.D. Molecular Biology, Neuroscience, Bioinformatics

I’m an HPC/Infrastructure geek - 14 years

I help enable science!Thursday, May 22, 14

Page 3: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

3

BioTeam

‣ Independent consulting shop

‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done

‣ Infrastructure, Informatics, Software Development, Cross-disciplinary Assessments

‣ 11+ years bridging the “gap” between science, IT & high performance computing

‣ Our wide-ranging work is what gets us invited to speak at events like this ...

Thursday, May 22, 14

Page 4: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What do we do?BioTeam

4

Laboratory Knowledge

Thursday, May 22, 14

Page 5: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What do we do?BioTeam

4

Laboratory Knowledge

Thursday, May 22, 14

Page 6: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What do we do?BioTeam

4

Laboratory Knowledge

Thursday, May 22, 14

Page 7: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What do we do?BioTeam

4

Laboratory Knowledge

Thursday, May 22, 14

Page 8: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What do we do?BioTeam

4

Laboratory Knowledge

Thursday, May 22, 14

Page 9: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What do we do?BioTeam

4

Laboratory Knowledge

Thursday, May 22, 14

Page 10: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What do we do?BioTeam

4

Laboratory Knowledge

Converged Solution

Thursday, May 22, 14

Page 11: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What do we do?BioTeam

4

Laboratory Knowledge

Converged Solution

Thursday, May 22, 14

Page 12: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Mostly work in Life SciencesOur domain coverage

• Government• Universities• Big pharma• Biotech• Private institutes• Diagnostic startups• Oil and Gas• Geospatial• Hollywood Animation• Law Enforcement

5Thursday, May 22, 14

Page 13: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

A ton of .gov work on a massive scaleMy Recent Work...

‣ National Institutes of Health

‣ US Department of Agriculture (USDA)

‣ Navy‣ Centers for Disease

Control

6Thursday, May 22, 14

Page 14: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

7

OK, so why are we here talking to you?

Thursday, May 22, 14

Page 15: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

We have a unique perspective across much of life sciences

We’ve noticed a few things

‣ Big Data has arrived in Life Sciences

‣ Data is being generated at unprecedented rates

‣ Research and Biomedical Orgs were caught off guard

‣ IT running to catch up, limited budgets

‣ Money is tight, Orgs reluctant to invest in Bio-IT

8

25% of all Life Scientists will require HPC in 2015!Thursday, May 22, 14

Page 16: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

It’s being made harder by lack of supportResearch is hard

‣ Scientists are getting frustrated

‣ Stubborn, fight for what they need

‣ They will build a cluster under their desk if it gets the job done

‣ In general they will win against IT

9Thursday, May 22, 14

Page 17: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

10

It’s a risky time to be doing Bio-IT

Thursday, May 22, 14

Page 18: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

11

Big Picture / Meta Issue

‣ HUGE revolution in the rate at which lab platforms are being redesigned, improved & refreshed

‣ IT not a part of the conversation, running to catch up

Thursday, May 22, 14

Page 19: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Science progressing way faster than IT can refresh/change

The Central Problem Is ...

‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure

• Bench science is changing month-to-month ...• ... while our IT infrastructure only gets refreshed every

2-7 years

‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)

12Thursday, May 22, 14

Page 20: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

The Central Problem Is ...

‣ The easy period is over

‣ 5 years ago - under the desk solutions worked

‣ This doesn’t work any more!

13Thursday, May 22, 14

Page 21: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

14

The new normal for informatics

Thursday, May 22, 14

Page 22: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

And a related problem ...

‣ Easy to acquire vast amounts of data cheaply and easily

‣ Growth rate of data creation exceeds rate of storage improvements

‣ Not just a storage lifecycle problem.

• This data *moves* and often needs to be shared among multiple entities and providers

• ... ideally without punching holes in your firewall or consuming all available internet bandwidth

15Thursday, May 22, 14

Page 23: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

If you get it wrong ...

‣ Lost opportunity

‣ Missing capability

‣ Frustrated & very vocal scientific staff

‣ Problems in recruiting, retention, publication & product development

16Thursday, May 22, 14

Page 24: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

17

It’s a risky time to be doing Bio-IT

11

What are the drivers in Bio-IT today?

Thursday, May 22, 14

Page 25: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

18

Genomics: Next Generation Sequencing (NGS)

Thursday, May 22, 14

Page 26: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

It’s like the hard drive of life

19

The big deal about DNA

‣ DNA is the template of life

‣ DNA is read --> RNA

‣ RNA is read --> Proteins

‣ Proteins are the functional machinery that make life possible

‣ Understanding the template = understanding basis for disease

Thursday, May 22, 14

Page 27: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Sequencing by SynthesisHow does NGS work?

20Thursday, May 22, 14

Page 28: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Reference assembly, variant callingHow does NGS work?

21Thursday, May 22, 14

Page 29: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Reference assembly, variant callingHow does NGS work?

21Thursday, May 22, 14

Page 30: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Reference assembly, variant callingHow does NGS work?

21Thursday, May 22, 14

Page 31: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Gateway to personalized medicineThe Human Genome

‣ 3.2 Gbp

‣ 23 chromosomes

‣ ~21,000 genes

‣ Over 55M known variations

22Thursday, May 22, 14

Page 32: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

...and why NGS is the primary driver

23

The Problem...

‣ Sequencers are now relatively cheap and fast

‣ Some can generate a human genome in 18 hours, for $2,000

‣ Everyone is doing it

‣ Can generate 3TB of data in that time

‣ First genome took 13 years and $2.7B to complete

Thursday, May 22, 14

Page 33: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

24

Other Methodologies Not Far Behind

Thursday, May 22, 14

Page 34: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

High-throughput Imaging

‣ Robotics screening millions of compounds on live cells 24/7

• Not as much data as genomics in volume, but just as complex

• Data volumes in the 10’s TB/week

‣ Confocal Imaging• Scanning 100’s of tissue sections/

week, each with 10’s of scans, each with 20-40 layers and multiple florescent channels

• Data volumes in the 1’s - 10’s TB/week

25Thursday, May 22, 14

Page 35: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

High-power, dense detector MRI scanners in use 24/7 at large research hospitals

High-res medical imaging

‣ Creating 3D models of brains, comparing large datasets

‣ Using those models to perform detailed neurosurgery with real-time analytic feedback from supercomputer in the OR (cool stuff)

‣ Also generates 10’s of TB/week

26Thursday, May 22, 14

Page 36: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

27

This is a huge problem

‣ Causing a literal deluge of data, in the 10’s of Petabytes

‣ NIH generating 1.5PB of data/month

‣ First real case in life science where 100Gb networking might really be needed

‣ But, not enough storage or compute

Thursday, May 22, 14

Page 37: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

28

And the problem is getting even bigger

Thursday, May 22, 14

Page 38: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

We have them allFile & Data Types

‣ Massive text files

‣ Massive binary files

‣ Flatfile ‘databases’

‣ Spreadsheets everywhere

‣ Directories w/ 6 million files

‣ Large files: 600GB+

‣ Small files: 30kb or smaller

29Thursday, May 22, 14

Page 39: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

30

Application characteristics‣ Mostly SMP/threaded apps

performance bound by IO and/or RAM

‣ Hundreds of apps, codes & toolkits

‣ 1TB - 2TB RAM “High Memory” nodes becoming essential

‣ Lots of Perl/Python/R

‣ MPI is rare• Well written MPI is even rarer

‣ Few MPI apps actually benefit from expensive low-latency interconnects*

• *Chemistry, modeling and structure work is the exception

Thursday, May 22, 14

Page 40: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Why, giant meta-analyses, of course

31

What to do with all that data?

‣ Typical problem across all of big data: how do you use it?

‣ In life sciences: no real standards of data formats

‣ Data scattered all over, despite push for Data Commons

‣ Not always accessible

‣ Combining the data if you have it all is a real challenge

Thursday, May 22, 14

Page 41: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Scientists don’t like to share (really!)A Compounding Problem...

‣ The fear: • if someone sees data before it

is published, they might steal it and publish it themselves (getting scooped)

‣ Causes:• Long time to publication• Outdated methods of

assigning scientific credit• Not properly incentivized

32Thursday, May 22, 14

Page 42: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Sharing requiredA Problem for Data Commons

‣ Data piling up

‣ Bad network infrastructures

‣ Few central analytics platforms

‣ Wild-west file formats/algorithms

‣ No sharing33

Thursday, May 22, 14

Page 43: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Sharing requiredA Problem for Data Commons

‣ Data piling up

‣ Bad network infrastructures

‣ Few central analytics platforms

‣ Wild-west file formats/algorithms

‣ No sharing33

Hyperscale analytics will only work if the data is accessible!

Thursday, May 22, 14

Page 44: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

34

But wait! There’s more!

Thursday, May 22, 14

Page 45: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

35

Local IT inhibiting research

‣ IT originally developed for business administration

‣ Tight policies, security at the expense of performance

‣ IT determines resources, not users

‣ Result: lack of institutional investment and lack of talent

‣ Doesn’t work for science

Thursday, May 22, 14

Page 46: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

IT staff become a part of the research projectsSuccessful Bio-IT orgs

‣ When IT and scientists at the table together, mutual understanding happens

‣ IT is driven to innovate to accommodate science

‣ Scientists better understand resourcing

‣ Roadblocks are removed, science is enabled

36Thursday, May 22, 14

Page 47: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Up to a two line subtitle, generally used to describe the takeaway for the slide

37

How are these issues being solved?

Thursday, May 22, 14

Page 48: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Compute related design patterns largely static

38

Core Compute

‣ Linux compute clusters are the baseline compute platform

‣ Even our lab instruments know how to submit jobs to common HPC cluster schedulers

‣ Compute is not hard. It’s a commodity that is easy to acquire & deploy in 2014

Thursday, May 22, 14

Page 49: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Defensive hedge against Big Data / HDFS

39

Compute: Local Disk is Back

‣ We’ve started to see organizations move away from blade servers and 1U pizza box enclosures for HPC

‣ The “new normal” may be 4U enclosures with massive local disk spindles - not occupied, just available

‣ Why? Hadoop & Big Data

‣ This is a defensive hedge against future HDFS or similar requirements

• Remember the ‘meta’ problem - science is changing far faster than we can refresh IT. This is a defensive future-proofing play.

‣ Hardcore Hadoop rigs sometimes operate at 1:1 ratio between core count and disk count

Thursday, May 22, 14

Page 50: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

New and refreshed HPC systems running many node types

40

Compute: Huge trend in ‘diversity’

‣ Accelerated trend since at least 2012 ...• HPC compute resources no longer homogenous;

many types and flavors now deployed in single HPC stacks

‣ Newer clusters mix-and-match to match the known use cases:

• GPU nodes for compute

• GPU nodes for visualization• Large memory nodes (512GB +)• Very Large memory nodes (1TB +)

• ‘Fat’ nodes with many CPU cores• ‘Thin’ nodes with super-fast CPUs• Analytic nodes with SSD, FusionIO, flash or large

local disk for ‘big data’ tasks

Thursday, May 22, 14

Page 51: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Most use Amazon Web ServicesBig push for cloud use

‣ Many Orgs are pushing for cloud

‣ Unsupported scientists end up using cloud

‣ It’s fast, flexible, affordable, if done right

‣ If done wrong, way more expensive than local compute

‣ Biggest problem: getting data to it!

41Thursday, May 22, 14

Page 52: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

42

Storage & Data Management‣ LifeSci core requirement:

• Shared, simultaneous read/write access across many instruments, desktops & HPC silos

‣ NAS = “easiest” option • Scale Out NAS products are the

mainstream standard

‣ Parallel & Distributed storage for edge cases and large organizations with known performance needs

• Becoming much more common: GPFS has taken hold in LifeSci

Thursday, May 22, 14

Page 53: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

43

Storage & Data Management

‣ Storage & data mgmt. is the #1 infrastructure headache in life science environments

‣ Most labs need “peta capable” storage due to unpredictable future

• Only a small % will actually hit 1PB• Often forced to trade away performance

in order to obtain capacity

‣ Object stores, ZFS and commodity “Nexentastor-style” methods are making significant inroads

Thursday, May 22, 14

Page 54: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

44

Data Movement & Data Sharing

‣ Peta-scale data movement needs

• Within an organization• To/from collaborators• To/from suppliers• To/from public data repos

‣ Peta-scale data sharing needs• Collaborators and partners may be

all over the world

Thursday, May 22, 14

Page 55: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Physical & Network

45

We Have Both Ingest Problems

‣ Significant physical ingest occurring in Life Science

• Standard media: naked SATA drives shipped via Fedex

‣ Cliche example:• 30 genomes outsourced means 30

drives will soon be sitting in your mail pile

‣ Organizations often use similar methods to freight data between buildings and among geographic sites

Thursday, May 22, 14

Page 56: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

46

Physical Ingest Just Plain Nasty

‣ Most common high-speed network: FedEx

‣ Easy to talk about in theory

‣ Seems “easy” to scientists and even IT at first glance

‣ Really really nasty in practice• Incredibly time consuming• Significant operational burden• Easy to do badly / lose data

Thursday, May 22, 14

Page 57: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

47

Networking

‣ Major 2014 focus

‣ May surpass storage as our #1 infrastructure headache

‣ Why?• Petascale storage meaningless

if you can’t access/move it• 10-Gig, 40-Gig and 100-Gig

networking will force significant changes elsewhere in the ‘bio-IT’ infrastructure

Thursday, May 22, 14

Page 58: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

48

Network: Speed @ Core and Edge

‣ Remember 2004 when research storage requirements started to dwarf what the enterprise was using?

‣ Same thing is happening now for networking

‣ Research core, edge and top-of-rack networking speeds may exceed what the rest of the organization has standardized on

Thursday, May 22, 14

Page 59: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Network: ‘ScienceDMZ’

‣ Very fast “low-friction” network links and paths with security policy and enforcement specific to scientific workflows

‣ “ScienceDMZ” concept is real and necessary

‣ BioTeam will be building them in 2014 and beyond

‣ Central premise:• Legacy firewall, network and security methods architected

for “many small data flows” use cases• Not built to handle smaller #s of massive data flows

• Also very hard to deploy ‘traditional’ security gear on 10Gigabit and faster networks

‣ More details, background & documents at http://fasterdata.es.net/science-dmz/

49

Background traffic or

competing bursts

DTN traffic with wire-speed

bursts

10GE

10GE

10GE

Thursday, May 22, 14

Page 60: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

50

Simple Science DMZ:

Image source: “The Science DMZ: Introduction & Architecture” -- esnet

Thursday, May 22, 14

Page 61: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

51

What does it all mean?

Thursday, May 22, 14

Page 62: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What just happened?

52

Laboratory Knowledge

Thursday, May 22, 14

Page 63: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What just happened?

52

Laboratory Knowledge

Thursday, May 22, 14

Page 64: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What just happened?

52

Laboratory Knowledge

Thursday, May 22, 14

Page 65: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What just happened?

52

Laboratory Knowledge

Thursday, May 22, 14

Page 66: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What just happened?

52

Laboratory Knowledge

Thursday, May 22, 14

Page 67: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

What just happened?

52

Laboratory Knowledge

Thursday, May 22, 14

Page 68: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Converged Infrastructure

53

The meta issue

‣ Individual technologies and their general successful use are fine

‣ Unless they all work together as a unified solution, it all means nothing

‣ Creating an end-to-end solution based on the use case (science!): converged infrastructure

Thursday, May 22, 14

Page 69: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

It’s what we doConvergence

54

Laboratory Knowledge

Thursday, May 22, 14

Page 70: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

It’s what we doConvergence

54

Laboratory Knowledge

Converged Solution

Thursday, May 22, 14

Page 71: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

It’s what we doConvergence

54

Laboratory Knowledge

Converged Solution

Thursday, May 22, 14

Page 72: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

55

What’s the future look like?

Thursday, May 22, 14

Page 73: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Small desktop appliances; gateways

56

Local Compute Appliances

‣ BioTeam is developing appliances: affordable compute for labs without IT resources

‣ More of these will make their way into the field

‣ Act at gateways to larger resources

‣ Pre-packaged with best-practices, low IT burden

Thursday, May 22, 14

Page 74: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Also known as hybrid cloudsHybrid HPC

‣ Relatively new idea• small local footprint• large, dynamic, scalable, orchestrated

public cloud component

‣ DevOps is key to making this work

‣ High-speed network to public cloud required

‣ Software interface layer acting as the mediator between local and public resources

‣ Good for tight budgets, has to be done right to work

‣ Not many working examples yet57

Thursday, May 22, 14

Page 75: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Commodity Science DMZsScientific Networks

‣ As data grows, need clean/fast networks

‣ Security needs to be addressed, but performance is key

‣ Internet2 enables direct, clean, long-distance, affordable fast networking

58Thursday, May 22, 14

Page 76: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Reduce guessing game on scientists’ partCommoditized Infrastructure

‣ Building blocks matched to use cases

‣ Blocks designed as converged infrastructure

‣ Let scientists get back to the science

‣ Make the infrastructure serve the science

59Thursday, May 22, 14

Page 77: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Central storage of knowledge with computeData Commons

‣ Common structure for data storage and indexing (a cloud?)

‣ Associated compute for analytics

‣ Development platform for application development (PaaS)

‣ Make discovery more possible

60Thursday, May 22, 14

Page 78: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Data Integration Platform!

Source 1! Source 2! Source 3!

Analytics/Data Retrieval!

Don’t move data, move computeData Integration Systems

‣ PaaS is a meta-index and development platform

‣ Index knows where the data is

‣ System reads data as needed from remote sources

‣ Requires good networks61

Thursday, May 22, 14

Page 79: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Not so distant future!Fully converged laboratories

62

Laboratory Knowledge

Converged Solution

Thursday, May 22, 14

Page 80: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Not so distant future!Fully converged laboratories

62

Laboratory Knowledge

Converged Solution

New Bottleneck!

Thursday, May 22, 14

Page 81: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Not so distant future!Fully converged laboratories

63

Laboratory

Thursday, May 22, 14

Page 82: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

Not so distant future!Fully converged laboratories

63

Laboratory

Knowledge

Thursday, May 22, 14

Page 83: From the Benchtop to the Datacenter: IT and Converged Infrastructure in Life Sciences

64

end; Thanks!Slides can be found at http://www.slideshare.com/arieberman

Thursday, May 22, 14