rpi talk foster september 2011

www.ci.anl.gov www.ci.uchicago.edu Accelerating data-intensive science by outsourcing the mundane Ian Foster

Upload: ian-foster

Post on 10-May-2015


DESCRIPTION

A talk at the RPI-NSF Workshop on Multiscale Modeling of Complex Data, September 12, 2011, Troy NY, USA.

We have made much progress over the past decade toward effectively harnessing the collective power of IT resources distributed across the globe. In fields such as high-energy physics, astronomy, and climate, thousands benefit daily from tools that manage and analyze large quantities of data produced and consumed by large collaborative teams.

But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that far more--ultimately most?--researchers will soon require capabilities not so different from those used by these big-science teams. How is the general population of researchers and institutions to meet these needs? Must every lab be filled with computers loaded with sophisticated software, and every researcher become an information technology (IT) specialist? Can we possibly afford to equip our labs in this way, and where would we find the experts to operate them?

Consumers and businesses face similar challenges, and industry has responded by moving IT out of homes and offices to so-called cloud providers (e.g., GMail, Google Docs, Salesforce), slashing costs and complexity. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity. More importantly, we can free researchers from the burden of managing IT, giving them back their time to focus on research and empowering them to go beyond the scope of what was previously possible.

I describe work we are doing at the Computation Institute to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date and suggest a path towards large-scale delivery of these capabilities.

TRANSCRIPT

Page 1: Rpi talk foster september 2011

www.ci.anl.gov www.ci.uchicago.edu

Accelerating data-intensive science by outsourcing the mundane

Ian Foster

Page 2: Rpi talk foster september 2011


Page 3: Rpi talk foster september 2011

The data deluge

1,330 molecular biology databases listed in Nucleic Acids Research (96 in Jan 2001)

Genomic sequencing output x2 every 9 months

>300 public centers

100,000 TB

MACHO et al.: 1 TB

Palomar: 3 TB

2MASS: 10 TB

GALEX: 30 TB

Sloan: 40 TB

Pan-STARRS: 40,000 TB

Climate model intercomparison project (CMIP) of the IPCC

2004: 36 TB

2012: 2,300 TB

Page 4: Rpi talk foster september 2011


Big science has achieved big successes

All build on NSF OCI (& DOE)-supported Globus Toolkit software

LIGO: 1 PB data in last science run, distributed worldwide

ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs

OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010

Robust production solutions
Substantial teams and expense
Sustained, multi-year effort
Application-specific solutions, built on common technology

Page 5: Rpi talk foster september 2011


But small science is struggling

More data, more complex data
Ad-hoc solutions
Inadequate software, hardware
Data plan mandates

Page 6: Rpi talk foster september 2011


Medium-scale science struggles too!

• Dark Energy Survey receives 100,000 files each night in Illinois

• They transmit files to Texas for analysis … then move results back to Illinois

• Process must be reliable, routine, and efficient

• The cyberinfrastructure team is not large

Image credit: Roger Smith/NOAO/AURA/NSF

Blanco 4m on Cerro Tololo

Page 7: Rpi talk foster september 2011


The challenge of staying competitive

"Well, in our country," said Alice … "you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing.”

"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"

Page 8: Rpi talk foster september 2011


Current approaches are unsustainable

• Small laboratories
– PI, postdoc, technician, grad students
– Estimate 5,000 across US university community
– Average ill-spent/unmet need of 0.5 FTE/lab?

• Medium-scale projects
– Multiple PIs, a few software engineers
– Estimate 500 across US university community
– Average ill-spent/unmet need of 3 FTE/project?

• Total 4000 FTE: at ~$100K/FTE => $400M/yr
Plus computers, storage, opportunity costs, …
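The total on this slide can be reproduced with a few lines of arithmetic. The sketch below only restates the slide's own rough estimates (5,000 labs at 0.5 FTE, 500 projects at 3 FTE, about $100K per FTE); the variable and function names are illustrative, and none of the inputs are measured data.

```python
# Back-of-the-envelope check of the slide's estimate.
# Every input is one of the slide's rough guesses, not a measurement.
COST_PER_FTE = 100_000  # dollars per FTE-year (the slide's "~$100K/FTE")

groups = {
    "small laboratories": {"count": 5_000, "fte_each": 0.5},
    "medium-scale projects": {"count": 500, "fte_each": 3.0},
}


def total_fte(groups):
    """Sum the ill-spent/unmet FTEs over all groups."""
    return sum(g["count"] * g["fte_each"] for g in groups.values())


def annual_cost(groups, cost_per_fte=COST_PER_FTE):
    """Convert the FTE total into dollars per year."""
    return total_fte(groups) * cost_per_fte


print(total_fte(groups))    # 4000.0 FTE
print(annual_cost(groups))  # 400000000.0 dollars per year, i.e. $400M/yr
```

Small laboratories account for 2,500 of the 4,000 FTE and medium-scale projects for the remaining 1,500, so under these assumptions neither group dominates the estimate.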

Page 9: Rpi talk foster september 2011


And don’t forget administrative costs

42% of the time spent by an average PI on a federally funded research project was reported to be expended on administrative tasks related to that project rather than on research — Federal Demonstration Partnership faculty burden survey, 2007

Page 10: Rpi talk foster september 2011


You can run a company from a coffee shop

Page 11: Rpi talk foster september 2011


Because businesses outsource their IT

Web presence
Email (hosted Exchange)
Calendar
Telephony (hosted VOIP)
Human resources and payroll
Accounting
Customer relationship mgmt

Software as a Service

(SaaS)

Page 12: Rpi talk foster september 2011


And often their large-scale computing too

Web presence
Email (hosted Exchange)
Calendar
Telephony (hosted VOIP)
Human resources and payroll
Accounting
Customer relationship mgmt
Data analytics
Content distribution

Infrastructure as a Service

(IaaS)

Software as a Service

(SaaS)

Page 13: Rpi talk foster september 2011


Let’s rethink how we provide research IT

Accelerate discovery and innovation worldwide by providing research IT as a service

Leverage software-as-a-service to
• provide millions of researchers with unprecedented access to powerful tools;
• enable a massive shortening of cycle times in time-consuming research processes; and
• reduce research IT costs dramatically via economies of scale

Page 14: Rpi talk foster september 2011


Time-consuming tasks in science

• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• …


Page 16: Rpi talk foster september 2011


A B

Data movement can be surprisingly difficult

Page 17: Rpi talk foster september 2011


A B

Discover endpoints, determine available protocols, negotiate firewalls, configure software, manage space, determine required credentials, configure protocols, detect and respond to failures, determine expected performance, determine actual performance, identify, diagnose, and correct network misconfigurations, integrate with file systems, …

Data movement can be surprisingly difficult

It took 2 weeks and much help from many people to move 10 TB between California and Tennessee.

(2007 BES report)
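To put the quoted anecdote in network terms, the sketch below converts "10 TB in 2 weeks" into an average transfer rate. It assumes decimal units (1 TB = 10**12 bytes) and treats two weeks as exactly 14 days of continuous transfer; the function name is illustrative.

```python
def average_throughput_mbps(terabytes, days):
    """Average rate in megabits per second.

    Assumes decimal units (1 TB = 10**12 bytes) and a transfer
    that runs continuously for the whole period.
    """
    bits = terabytes * 10**12 * 8
    seconds = days * 24 * 60 * 60
    return bits / seconds / 10**6


rate = average_throughput_mbps(10, 14)
print(f"{rate:.1f} Mbit/s")  # 66.1 Mbit/s
```

Under these assumptions the effective rate is a small fraction of what a 1 Gbit/s link could carry, which is consistent with the slide's point that the difficulty lay in the surrounding tasks rather than in raw bandwidth.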

Page 18: Rpi talk foster september 2011


Globus Online’s SaaS/Web 2.0 architecture

Fire-and-forget data movement
Automatic fault recovery
High performance
No client software install
Across multiple security domains

Web interface

HTTP REST interface
POST https://transfer.api.globusonline.org/v0.10/transfer <transfer-doc>

Command line interface
ls alcf#dtn:/
scp alcf#dtn:/myfile \
    nersc#dtn:/myfile

GridFTP servers
FTP servers

Other protocols: HTTP, WebDAV, SRM, …

Globus Connect on local computers

(Hosted on)

(Operate)

OpenID
OAuth
Shibboleth
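The command-line example names files as `endpoint:/path`, for instance `alcf#dtn:/myfile`. The sketch below shows one way such a name could be split and turned into a request body for the REST interface. The slide does not show the contents of `<transfer-doc>`, so the dictionary layout and every field name here are assumptions made for illustration, not the actual Globus Online format; no network call is made.

```python
import json


def split_endpoint_path(spec):
    """Split 'alcf#dtn:/myfile' into ('alcf#dtn', '/myfile')."""
    endpoint, sep, path = spec.partition(":")
    if not sep or "#" not in endpoint or not path.startswith("/"):
        raise ValueError(f"expected name#endpoint:/path, got {spec!r}")
    return endpoint, path


def build_transfer_doc(source, destination):
    """Build a hypothetical transfer document.

    The field names are illustrative only; the real <transfer-doc>
    schema is not given on the slide.
    """
    src_endpoint, src_path = split_endpoint_path(source)
    dst_endpoint, dst_path = split_endpoint_path(destination)
    return {
        "source_endpoint": src_endpoint,
        "destination_endpoint": dst_endpoint,
        "items": [{"source_path": src_path, "destination_path": dst_path}],
    }


doc = build_transfer_doc("alcf#dtn:/myfile", "nersc#dtn:/myfile")
print(json.dumps(doc, indent=2))
```

Keeping the endpoint name separate from the path mirrors the slide's design: the hosted service resolves endpoints and credentials, so the user only states what should move where.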

Page 19: Rpi talk foster september 2011


Example application: UC sequencing facility

Sequencing instrument

Mac using Globus Connect

iBi File Server

iBi general-purpose compute cluster

Sequencing-specific compute cluster

Mount drive

Delivery of data to customer

Page 20: Rpi talk foster september 2011


Statistics and user feedback

• Launched November 2010
>1700 users registered
>500 TB user data moved
>30 million user files moved
>150 endpoints registered

• Widely used on TeraGrid/XSEDE; other centers & facilities; internationally

• >20x faster than SCP
• Faster than hand-tuned

“Last time I needed to fetch 100,000 files from NERSC, a graduate student babysat the process for a month.”

“I expected to spend four weeks writing code to manage my data transfers; with Globus Online, I was up and running in five minutes.”

“Transferred 28 MB in 20 minutes instead of 61 hours. Makes these global climate simulations manageable.”
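Two of the figures on this slide can be turned into ratios. The sketch below computes the speed-up implied by the last quote (61 hours versus 20 minutes) and a rough average file size from the usage statistics. The statistics are lower bounds (">500 TB", ">30 million files"), so the file size is only indicative; decimal units are assumed and the function names are illustrative.

```python
def speedup(before_hours, after_minutes):
    """How many times faster the second transfer was."""
    return before_hours * 60 / after_minutes


def average_file_size_mb(total_tb, n_files):
    """Rough mean file size in MB, assuming 1 TB = 10**12 bytes."""
    return total_tb * 10**12 / n_files / 10**6


print(speedup(61, 20))  # 183.0
print(round(average_file_size_mb(500, 30_000_000), 1))  # 16.7
```

The 183x figure for this one user is well above the ">20x faster than SCP" stated in the bullets, which suggests the bullet is a conservative general claim rather than a best case.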

Page 21: Rpi talk foster september 2011


Moving 586 Terabytes in two weeks

Page 22: Rpi talk foster september 2011


Monitoring provides deep visibility

Page 23: Rpi talk foster september 2011

[Figure: transfer sizes on an axis labeled Kilobyte, Megabyte, Gigabyte, Terabyte]

20 Terabytes in less than one day

20 Gigabytes in more than two days

Page 24: Rpi talk foster september 2011


Common research data management steps

• Dark Energy Survey
• Galaxy genomics
• LIGO observatory
• SBGrid structural biology consortium
• NCAR climate data applications
• Land use change; economics

Page 25: Rpi talk foster september 2011


We have choices of where to compute

• Campus systems
– First target for many researchers

• XSEDE supercomputers
– 220,000 cores, peer-reviewed awards
– Optimized for scientific computing

• Open Science Grid
– 60,000 cores; high throughput

• Commercial cloud providers
– Instant access for small tasks
– Expensive for big projects

Users insist that they need everything connected

Page 26: Rpi talk foster september 2011


Towards “research IT as a service”

Page 27: Rpi talk foster september 2011


Research data management as a service

• GO-User
– Credentials and other profile information

• GO-Transfer
– Data movement

• GO-Team
– Group membership

• GO-Collaborate
– Connect to collaborative tools: Jira, Confluence, …

• GO-Store
– Access to campus, cloud, XSEDE storage

• GO-Catalog
– On-demand metadata catalogs

• GO-Compute
– Access to computers

• GO-Galaxy
– Share, create, run workflows

Today

Fall

Prototype

Page 28: Rpi talk foster september 2011


SaaS services in action: The XSEDE vision

XUAS

Page 29: Rpi talk foster september 2011


Data analysis as a service: Early steps

Securely and reliably:
1. Assemble code
2. Find computers
3. Deploy code
4. Run program
5. Access data
6. Store data
7. Record workflow
8. Reuse workflow

[3, 4]

VM image
App code
Workflow
Galaxy
Condor

Data store

[5, 6]

We have built such systems for biological, environmental, and economics researchers

[1, 2]

[7, 8]
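Steps 7 and 8 (record workflow, reuse workflow) can be illustrated with a toy recorder that remembers each processing step as it runs and can later replay the same steps on new data. This is a minimal sketch of the idea only; the class and method names are hypothetical, and it does not represent how Galaxy, Condor, or the systems mentioned on the slide are implemented.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class RecordedWorkflow:
    """Remembers the steps applied to data so they can be replayed later."""

    steps: List[Tuple[str, Callable[[Any], Any]]] = field(default_factory=list)

    def run_step(self, name, func, data):
        """Run one step now and record it for later reuse (step 7)."""
        self.steps.append((name, func))
        return func(data)

    def replay(self, data):
        """Reapply every recorded step, in order, to new input (step 8)."""
        for _name, func in self.steps:
            data = func(data)
        return data

    def step_names(self):
        """List the recorded step names in execution order."""
        return [name for name, _func in self.steps]


wf = RecordedWorkflow()
cleaned = wf.run_step("drop negatives", lambda xs: [x for x in xs if x >= 0], [3, -1, 4])
total = wf.run_step("sum", sum, cleaned)
print(total)                   # 7
print(wf.replay([10, -5, 2]))  # 12
print(wf.step_names())         # ['drop negatives', 'sum']
```

A production system would also have to record code versions, input locations, and the machines used (steps 1 to 6), which is where the VM image and data store in the diagram come in.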

Page 30: Rpi talk foster september 2011


SaaS economics: A quick tutorial

• Lower per-user cost (x10?) via aggregation onto common infrastructure
– $400M/yr → $40M/yr?

• Initial “cost trough” due to fixed costs

• Per-user revenue permits positive return to scale

• Further reduce per-user cost over time

[Figure: $ plotted against Time, starting from 0]

X10 reduction in per-user cost:
$50K → $5K/yr per lab
$300K → $30K/yr per project
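The per-lab and per-project figures follow from the talk's earlier estimates (0.5 FTE per lab, 3 FTE per project, about $100K per FTE) divided by the hoped-for factor of ten. The sketch below checks that arithmetic and then illustrates the "cost trough" with a toy cumulative cash-flow series. The fixed cost, per-user margin, and user counts in the trough example are invented placeholders, not figures from the talk.

```python
COST_PER_FTE = 100_000  # dollars per FTE-year, the talk's rough figure
REDUCTION = 10          # the slide's hoped-for "x10?"


def per_unit_cost(fte_each, reduction=1):
    """Annual cost of one lab or project, optionally after the reduction."""
    return fte_each * COST_PER_FTE / reduction


lab_now, lab_saas = per_unit_cost(0.5), per_unit_cost(0.5, REDUCTION)
project_now, project_saas = per_unit_cost(3), per_unit_cost(3, REDUCTION)
total_saas = 5_000 * lab_saas + 500 * project_saas
print(lab_now, lab_saas)          # 50000.0 5000.0
print(project_now, project_saas)  # 300000.0 30000.0
print(total_saas)                 # 40000000.0


def cumulative_net(fixed_cost, margin_per_user, users_per_year):
    """Cumulative net position per year: starts negative, then recovers."""
    net = -fixed_cost
    history = []
    for users in users_per_year:
        net += margin_per_user * users
        history.append(net)
    return history


# Hypothetical inputs chosen only to show the shape of the trough.
print(cumulative_net(10_000_000, 1_000, [1_000, 3_000, 6_000, 10_000]))
# [-9000000, -6000000, 0, 10000000]
```

The second series shows why per-user revenue matters: with a fixed up-front cost, the service stays in deficit until enough users are aggregated, after which each additional user improves the return.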

Page 31: Rpi talk foster september 2011


A national cyberinfrastructure strategy?

[Figure: diagram of many laboratories (L) and a few projects (P)]

Research data management
Collaboration, computation
Research administration

• To provide more capability for more people at less cost …

• Create infrastructure
– Robust and universal
– Economies of scale
– Positive returns to scale

• Via the creative use of
– Aggregation (“cloud”)
– Federation (“grid”)

Small and medium laboratories and projects

aaS

Page 32: Rpi talk foster september 2011


Acknowledgments

• Colleagues at UChicago and Argonne
– Steve Tuecke, Ravi Madduri, Kyle Chard, Tanu Malik, and others listed at www.globusonline.org/about/goteam/

• Carl Kesselman and other colleagues at other institutions

• Participants in the recent ICiS workshop on “Human-Computer Symbiosis: 50 Years On”

• NSF OCI and MPS; DOE ASCR; and NIH for support

Page 33: Rpi talk foster september 2011


For more information

• www.globusonline.org; @globusonline on Twitter

• Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing (May/June):70-73, 2011.

• Allen, B., Bresnahan, J., Childers, L., Foster, I., Kandaswamy, G., Kettimuthu, R., Kordas, J., Link, M., Martin, S., Pickett, K. and Tuecke, S. Globus Online: Radical Simplification of Data Movement via SaaS. Communications of the ACM, 2011.

Page 34: Rpi talk foster september 2011


Thank you!

[email protected]

www.globusonline.org

@globusonline