
Page 1: The Cloud, “Big Data” and Supporting a Major Ground Data System on $5 a Day Douglas Hughes Jet Propulsion Laboratory, California Institute of Technology

The Cloud, “Big Data” and Supporting a Major Ground Data System on $5 a Day

Douglas Hughes, Jet Propulsion Laboratory, California Institute of Technology

August 15, 2011

Copyright 2011 California Institute of Technology

Government Sponsorship Acknowledged

Page 2

The difference between a rut and a grave is the depth.

Gerald Burrill, Episcopal Bishop of Chicago

Page 3

A Word from Vivek Kundra

“When evaluating options for new IT deployments, OMB will require that agencies default to cloud-based solutions whenever a secure, reliable, cost-effective cloud option exists.”*

*25 Point Implementation Plan to Reform Federal Information Technology Management, Vivek Kundra, U.S. Chief Information Officer, December 9, 2010

Any hosting we do must meet the secure, reliable, and cost-effective criteria, but we will also consider the following points from the plan that apply particularly to the Cloud:

• Economical: pay as you go
• Flexible: capacity on demand, as much as is demanded
• Fast: rapid provisioning of a wide variety of IT services

Page 4

Framing the Larger Challenge

• Declining NASA budgets, capital investment, and project budgets

• Increasing data return from on-board instruments and expansion of data products on the ground

• Focus on maximizing science value and minimizing everything else

• Extended missions, reprocessing and campaigns

• Ground data system (GDS) cost (proposal and mission lifetime)

Page 5

Embracing a Project Challenge

• First, win the proposal, or you have nothing

• The GDS development, sustainment, and operating costs need to be minimized, or you won’t win the proposal

• Forecast/guess/plan the GDS capability you’ll need 5 years from now and through the mission life, and what IT infrastructure services will be there to support it, or be prepared to deliver less or eat into the project budget

Page 6

Reference Mission Data Profile (Processing & Distribution Only)

Polar-orbiting, land-mass observation S/C; 2-year primary mission with 3-year extended mission

• S/C → RGS: 14 orbits/day (102.8 min/orbit), 16.3 GB/orbit
• RGS → GDS: 14 orbits/day, 16.3 GB/orbit, 83 TB/year
• GDS → PI: L1B 60 GB/orbit, L2 60 GB/orbit, 615 TB/year
• GDS → DAAC: L0/L1/L2 at EOM (end of mission)
• Potential for 3 PB delivered to PI
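The annual figures on this slide follow directly from the per-orbit volumes. A minimal Python sketch of that arithmetic, assuming decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB), a 365-day year, and the full 5-year mission (2-year primary plus 3-year extended) for the delivered-to-PI estimate:

```python
ORBITS_PER_DAY = 14
MINUTES_PER_ORBIT = 102.8
L0_GB_PER_ORBIT = 16.3
PRODUCT_GB_PER_ORBIT = {"L1B": 60.0, "L2": 60.0}
DAYS_PER_YEAR = 365        # assumption: calendar year, no leap day
MISSION_YEARS = 2 + 3      # primary + extended mission


def tb_per_year(gb_per_orbit: float) -> float:
    """Convert a per-orbit volume in GB to TB/year (decimal units)."""
    return gb_per_orbit * ORBITS_PER_DAY * DAYS_PER_YEAR / 1000


l0_tb_year = tb_per_year(L0_GB_PER_ORBIT)
products_tb_year = tb_per_year(sum(PRODUCT_GB_PER_ORBIT.values()))
delivered_pb = products_tb_year * MISSION_YEARS / 1000

print(f"Orbit minutes per day: {ORBITS_PER_DAY * MINUTES_PER_ORBIT:.1f}")
print(f"L0 ingest:    {l0_tb_year:.1f} TB/year")
print(f"L1B + L2 out: {products_tb_year:.1f} TB/year")
print(f"Delivered to PI over {MISSION_YEARS} years: {delivered_pb:.2f} PB")
```

Fourteen orbits of 102.8 minutes cover 1,439.2 minutes, essentially one full day. The sketch gives about 83.3 TB/year of L0 and about 613 TB/year of L1B + L2; the slide's 615 TB/year presumably reflects slightly different rounding or day-count assumptions. Either way, five years of products comes to roughly 3 PB, consistent with the slide.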

Page 7

Now with Cloud Goodness

• RGS → GDS: 14 orbits/day, 16.3 GB/orbit, 83 TB/year
• L0, L1B, L2 processing
• GDS → PI: L1B 60 GB/orbit, L2 60 GB/orbit, 615 TB/year
• JPL has the life-of-mission archive

Page 8

Processing Profile: Benchmark Server

• Intel, 2 × 3 GHz, quad-core; 32 GB RAM; internal SAS disks

Single-Orbit Metrics*: Processing Times

*Simplified, serial processing

Step      Seconds/orbit  Minutes/day  Minutes/month
L0→L1A            19.29         4.50         136.89
L1A→L1B           47.48        11.08         336.96
L1B→L2            24.69         5.76         178.58
Total             91.45        21.34         652.42

Activity     Seconds  Link
Ingest L0     900.00  OC-3
L0→L1A         19.29
L1A→L1B        47.48
Deliver L1B 3,325.00  OC-3
L1B→L2         24.69
Deliver L2  3,325.00  OC-3
Cleanup       180.00
Total       7,821.45  (130.36 minutes)
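The timeline total matters because it exceeds the orbit period: a serial ingest, process, deliver, and clean-up chain for one orbit (about 130 minutes) takes longer than the 102.8-minute orbit, so more than one orbit's work has to be in progress at once to keep pace with downlinks. The sketch below recomputes the totals from the table and derives the effective link rate implied by the OC-3 transfer times. It assumes decimal gigabytes (10^9 bytes), the 16.3 GB L0 and 60 GB per-product sizes from the reference mission profile, and the nominal OC-3 line rate of 155.52 Mbit/s; the variable names are illustrative only.

```python
import math

ORBIT_SECONDS = 102.8 * 60       # orbit period from the mission profile
OC3_LINE_RATE_MBPS = 155.52      # nominal SONET OC-3 line rate

# (activity, seconds, GB moved over OC-3 or None) from the single-orbit timeline
TIMELINE = [
    ("Ingest L0", 900.00, 16.3),
    ("L0->L1A", 19.29, None),
    ("L1A->L1B", 47.48, None),
    ("Deliver L1B", 3325.00, 60.0),
    ("L1B->L2", 24.69, None),
    ("Deliver L2", 3325.00, 60.0),
    ("Cleanup", 180.00, None),
]

total_s = sum(seconds for _, seconds, _ in TIMELINE)
print(f"Serial time per orbit: {total_s:,.2f} s ({total_s / 60:.2f} min)")

# Orbits that must be in progress concurrently for the chain to keep up
overlap = math.ceil(total_s / ORBIT_SECONDS)
print(f"Orbit period: {ORBIT_SECONDS:,.0f} s -> at least {overlap} orbits in flight")

# Effective throughput of each OC-3 transfer: GB -> megabits (decimal) / seconds
rates = {name: gb * 8e3 / seconds for name, seconds, gb in TIMELINE if gb is not None}
for name, mbps in rates.items():
    print(f"{name:12s} {mbps:6.1f} Mbit/s effective "
          f"({mbps / OC3_LINE_RATE_MBPS:.0%} of OC-3 line rate)")
```

The effective rates come out at roughly 144 to 145 Mbit/s, about 93% of the OC-3 line rate, so the transfer times in the table appear to assume a nearly saturated link. The computed sum (7,821.46 s) differs from the slide's 7,821.45 s only through rounding of the individual entries.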

Page 9

Internal & Cloud Cost Profile

• Internal cost: $3.9k/month
– Based on JPL IT hosting charges
– 2 dedicated servers (leased)
– 1 VM (from internal Cloud)
– 2 TB FC partitions
– Startup time: 30–45 days until ready for developers
– Only commitment: 2 years on servers
– Flexibility: limited by acquisition and internal resource limits

• Cloud cost: $5.5k/month
– Based on the AWS calculator
– 3 High-Memory Double Extra Large VMs
– 4 Large VMs for file handling
– 2 TB FC LUNs
– Data transfer in (6.91 TB) and out* (51.25 TB): the largest single cost component
– Startup time: ~1–2 hours until ready for developers
– Commitment: N/A
– Flexibility: on demand

*AWS and JPL are looking at ways to reduce this cost, and by this project’s launch date (2016) it is expected to be significantly lower.

It is important to understand your costs under each scenario for the full duration of the project.
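One way to follow that advice is to tabulate total spend per scenario against how long the resources are actually needed. A minimal sketch using the monthly figures above; the 60-month horizon is the 2-year primary plus 3-year extended mission, and treating the *entire* $3.9k/month internal charge as locked in for the 2-year server commitment is an assumption made here for illustration (the slide only says the servers carry that commitment). The `Scenario` type and `total_cost` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    monthly_cost: float       # USD per month, from the cost profile slide
    min_commit_months: int    # months paid for even if the work ends sooner


# Assumption: the whole internal charge is held to the 2-year server commitment.
INTERNAL = Scenario("Internal (JPL hosting)", 3_900, 24)
CLOUD = Scenario("Cloud (AWS estimate)", 5_500, 0)


def total_cost(s: Scenario, months_used: int) -> float:
    """Total spend if the resources are needed for `months_used` months."""
    return s.monthly_cost * max(months_used, s.min_commit_months)


for months in (3, 12, 24, 60):   # short prototype ... full 5-year mission
    internal = total_cost(INTERNAL, months)
    cloud = total_cost(CLOUD, months)
    cheaper = INTERNAL.name if internal < cloud else CLOUD.name
    print(f"{months:2d} months: internal ${internal:>9,.0f}  "
          f"cloud ${cloud:>9,.0f}  -> {cheaper}")
```

Under these assumptions the cloud option is cheaper for work lasting less than about 17 months ($5,500 × 17 = $93,500 versus the $93,600 internal minimum), while internal hosting is cheaper over the full 60-month mission ($234,000 versus $330,000). The sketch ignores the 30–45 day internal startup time and the expected drop in data-transfer charges, both of which would shift the comparison toward the cloud.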

Page 10

This is not your data center…

but you can “rent” its capacity and services, as much as you need and when you need it.

Image from http://www.sentecpro.com/lush_green_server.jpg

Page 11

Can Cloud Resources Help and How?

• Re-processing & Campaigns
– Algorithm updates often require re-processing substantial amounts of data
– Ample processing resources are available, in both number and size

• Turn VMs on when you need them, but definitely turn them off when you don’t
– Re-process data using Cloud resources

• Does not disturb the normal pipeline and allows the system to “keep up” with ongoing downlinks

• Distribution
– Integrated content distribution can speed delivery of science data products to consumers with no traffic crossing your Center’s border
– Cost based on volume

• Augmenting Internal Resources
– Recovering from an internal outage (“catching up”)
– “Keeping up” when additional internal resources are not available

• Prototype and Development
– Rapid ramp-up
– Use and delete
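For re-processing, the per-orbit benchmark gives a quick way to size a temporary fleet of VMs. A minimal sketch, assuming each orbit's L0→L2 chain is an independent job, each cloud VM processes one orbit at a time at the benchmark rate of 91.45 s/orbit, and data staging and transfer time are ignored; the `vms_needed` helper and the 72-hour deadline are hypothetical illustrations, not figures from the project.

```python
import math

SECONDS_PER_ORBIT = 91.45   # L0->L1A->L1B->L2, serial, benchmark server
ORBITS_PER_DAY = 14


def vms_needed(days_of_data: float, deadline_hours: float) -> int:
    """VMs needed to re-process `days_of_data` within `deadline_hours`.

    Hypothetical sizing rule: total serial compute time divided by the
    time budget per VM, rounded up.
    """
    orbits = math.ceil(days_of_data * ORBITS_PER_DAY)
    cpu_seconds = orbits * SECONDS_PER_ORBIT
    return math.ceil(cpu_seconds / (deadline_hours * 3600))


# Re-process the full 5-year mission (1,825 days) within a 72-hour window
print(vms_needed(5 * 365, 72))
```

The full mission is 25,550 orbits, or about 649 hours (roughly 27 days) of serial processing on a single benchmark-class server; under the assumptions above, ten VMs finish it within three days, after which they can be shut down so they stop accruing cost.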

Page 12

Security

• Sensitivity of data and algorithms can dictate sourcing including which Cloud service can be used

– ITAR/EAR, IP, competition-sensitive etc.

• Secure images can be built for the Cloud, even a public Cloud

• Application security
– Auditing and logging
– Authentication and authorization
– Data communication and data protection
– JPL IT Security Cloud Application Security Guideline

• Periodic scanning
– System and application

• Monitoring and Alerting

• Patching and Updating

• Cloud IT Security Plan

Page 13

Moving Forward

• Learn from Others
– Industry
– NASA
– JPL Cloud Working Group
– “Lessons Learned”

• Learn by Doing
– One good experiment…

• Start Small

• Consider “mixed mode”

• Put the appropriate workload in the appropriate Cloud
– JPL has developed a Cloud Application Suitability Model (CASM) to guide these decisions
– Internal Cloud, Virtual Private Cloud (VPC), Public Cloud

• Measure everything

• A Cloud Brokering Service could eventually provide access to multiple Clouds, resulting in favorable pricing and a wider choice of services from a single interface

Page 14

• Thank you!

• Questions?

• Contact: Douglas Hughes– [email protected]– (818) 354-1186