NEON
TNC2010, May 31, Vilnius
Maarten Koopmans for UNINETT Sigma
Who
• ING Group -2002
• SURFnet 2002-6
• ICTU (govt) 2006-8
• vrijheid.net 2008-
• qtask.com
• ibeamsystems.com
• Uninett Sigma
Different mindsets
NEON Goals
state of the art of cloud computing;
cost of moving and running non-HPC jobs on a cloud computing environment;
how to do this in practice;
a list of identified risks/benefits on a short/long perspective.
Areas
Shortlists
#8: Why deliver?
Resource         Cost in Medium DC      Cost in Very Large DC     Ratio
                 (≈ 1000 servers)       (≈ 50,000 servers)
Network          $95 / Mbps / month     $13 / Mbps / month        7.1x
Storage          $2.20 / GB / month     $0.40 / GB / month        5.7x
Administration   ≈140 servers/admin     >1000 servers/admin       7.1x
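To make the per-unit rates above concrete, here is a minimal sketch that prices one workload in both data center sizes. The rates are copied from the table; the workload (100 Mbps, 10,000 GB) and the function name `monthly_cost` are assumptions for illustration only.

```python
# Rates copied from the table above, in USD per unit per month.
RATES = {
    "medium": {"network_mbps": 95.00, "storage_gb": 2.20},
    "very_large": {"network_mbps": 13.00, "storage_gb": 0.40},
}


def monthly_cost(dc, mbps, storage_gb):
    """Network plus storage cost per month for one data center size."""
    rate = RATES[dc]
    return mbps * rate["network_mbps"] + storage_gb * rate["storage_gb"]


if __name__ == "__main__":
    # Hypothetical workload, not taken from the talk.
    for dc in ("medium", "very_large"):
        print(f"{dc}: ${monthly_cost(dc, 100, 10_000):,.2f} per month")
```

Administration cost is left out because the table gives it as servers per admin, not as a price.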
#7 Design to fail
#6 USP: “root” - really?
#5 Can you trust it?
“Cloud computing is about gracefully losing control while maintaining accountability even if the operational responsibility falls upon one or more third parties.”
#4.5 “Core infra”
Computing
Storage
Queues
#4 Public clouds lead
• Spot instances
• Elastic load balancing
• Virtual Private Cloud
• Elastic MapReduce
• CloudFront
• SQS
• SimpleDB
• CloudWatch
• Auto Scaling
• RDS
• .....
#3: Management
#2 Keep an eye on Apache
ZooKeeper
#1 The USERS are key
Why again?
Let’s zoom in on storage
Storage: requirements
• No client needed to access the data
• Transparent versioning
• Transparent encryption, for both transport and storage
• AAI integration
• Allow sharing of resources
Current Cloud storage
API based - complex for end users
No AAI integration at all
So...
AAI: enrollment
AAI
Cloud-backed storage
initial request
access granted, token returned (rotating?)
authenticate user
user authenticated
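The slide leaves open whether the returned token rotates. One possible reading, sketched below, is a token derived from the user and the current time window, so it changes every few minutes without server-side state. Everything here (the 300 second period, the HMAC construction, the function names) is an assumption, not the design presented in the talk.

```python
import hashlib
import hmac
import time
from typing import Optional

ROTATION_SECONDS = 300  # assumed rotation period, not from the talk


def issue_token(secret: bytes, user_id: str, now: Optional[float] = None) -> str:
    """Derive a token bound to the user and the current time window."""
    current = time.time() if now is None else now
    window = int(current // ROTATION_SECONDS)
    message = f"{user_id}:{window}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_token(secret: bytes, user_id: str, token: str,
                 now: Optional[float] = None) -> bool:
    """Accept tokens from the current or the previous window."""
    current = time.time() if now is None else now
    for age in (0, 1):
        candidate = issue_token(secret, user_id, current - age * ROTATION_SECONDS)
        if hmac.compare_digest(candidate, token):
            return True
    return False
```

Accepting the previous window as well avoids rejecting a client whose request straddles a rotation boundary.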
“Just WebDAV”
WebDAV daemon
Encryption
Resource naming
Versioning
Storage cloud
Continue
Map resource to hash code
version = 1
Stream data, metered, through encryption into the cloud
Metering
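The upload steps above can be sketched roughly as follows: map the resource path to a hash, assign the next version number (1 for a new resource), and count bytes per user while the data streams into storage. The class `VersionedStore`, the choice of SHA-256 for the hash code, and the in-memory dictionaries standing in for the storage cloud are all assumptions; encryption is a pluggable function that defaults to a no-op here.

```python
import hashlib


class VersionedStore:
    """In-memory stand-in for the storage cloud (hypothetical)."""

    def __init__(self):
        self.blobs = {}    # (resource_key, version) -> stored bytes
        self.latest = {}   # resource_key -> most recent version number
        self.metered = {}  # user -> plaintext bytes uploaded

    @staticmethod
    def resource_key(path):
        """Map a resource path to a hash code (SHA-256 is an assumption)."""
        return hashlib.sha256(path.encode("utf-8")).hexdigest()

    def put(self, user, path, chunks, encrypt=lambda data: data):
        """Stream chunks into storage, metering and encrypting each one."""
        key = self.resource_key(path)
        version = self.latest.get(key, 0) + 1  # first upload gets version 1
        stored = bytearray()
        for chunk in chunks:
            self.metered[user] = self.metered.get(user, 0) + len(chunk)
            stored += encrypt(chunk)
        self.blobs[(key, version)] = bytes(stored)
        self.latest[key] = version
        return key, version
```

Uploading the same path twice yields versions 1 and 2 under the same key, which is what makes the versioning transparent to the WebDAV client.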
...Locking...
WebDAV daemon
Locking
Resource naming
Versioning
Return lock
Map resource to hash code
Get reference to most recent version
Acquire lock
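A minimal sketch of the lock step, assuming an exclusive lock per resource key with a timeout. The talk names Apache ZooKeeper for locking; the in-memory `LockTable` below is only a single-process stand-in to show the acquire and return flow, and its names and the 60 second default are hypothetical.

```python
import time
import uuid


class LockTable:
    """Single-process stand-in for a distributed lock service."""

    def __init__(self):
        self._locks = {}  # resource_key -> (token, owner, expires_at)

    def acquire(self, resource_key, owner, timeout=60.0, now=None):
        """Return a lock token, or None if the resource is still locked."""
        now = time.time() if now is None else now
        held = self._locks.get(resource_key)
        if held is not None and held[2] > now:
            return None
        token = "opaquelocktoken:" + str(uuid.uuid4())
        self._locks[resource_key] = (token, owner, now + timeout)
        return token

    def release(self, resource_key, token):
        """Drop the lock only if the caller presents the matching token."""
        held = self._locks.get(resource_key)
        if held is not None and held[0] == token:
            del self._locks[resource_key]
            return True
        return False
```

Expiring locks matter in practice because WebDAV clients that lock heavily do not always unlock when they disconnect.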
...downloading...
WebDAV daemon
Decryption
Resource naming
Versioning
Storage cloud
Continue
Get most recent version = 1
Stream data, metered, via decryption from the cloud
Metering
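The download path can be sketched as two chained generators: one that meters the bytes coming from the cloud and one that decrypts them chunk by chunk, so the whole object never has to sit in memory. The cipher below is a toy XOR keystream built from SHA-256 purely so the example runs with the standard library; it is NOT secure and is not the cipher used by the service. All function names are hypothetical.

```python
import hashlib


def keystream(key):
    """Toy keystream: SHA-256 over key + counter. Illustration only."""
    counter = 0
    while True:
        yield from hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1


def xor_stream(chunks, key):
    """XOR each chunk with the keystream; the same call encrypts and decrypts."""
    stream = keystream(key)
    for chunk in chunks:
        yield bytes(byte ^ next(stream) for byte in chunk)


def metered(chunks, meter, user):
    """Pass chunks through unchanged while counting bytes per user."""
    for chunk in chunks:
        meter[user] = meter.get(user, 0) + len(chunk)
        yield chunk


def download(stored_chunks, key, meter, user):
    """Meter the stored bytes, then decrypt them as they stream past."""
    return b"".join(xor_stream(metered(stored_chunks, meter, user), key))
```

Because every stream is encrypted or decrypted separately, the work is CPU bound per connection, which is where concurrency support in the language helps.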
Stand on the shoulders of giants
• WebDAV widely deployed, lots of 3rd party clients.
• Service on top of Java VM
• Scala (integration language)
• Cloud access libraries (often Java based)
• Apache ZooKeeper (configuration management, locking)
• Apache Cassandra or HBase (metering)
• AAI integration components
• ...
Lessons learnt so far:
• WebDAV is a nice start for client-less access to file based resources.
• CPU intensive due to the encryption per "stream". A language (model) with concurrency support is a big plus. This breaks the trend of asynchronous I/O based network services.
• Stand on the shoulders of giants: Apache ZooKeeper, BookKeeper, the JVM, Scala language, libraries for cloud access
• OS X requires DAV level 2 and does a lot of locking. But: from 10.5.x onwards it also does HTTP/1.1 chunked encoding; that broke a lot of servers.
• Windows works best with digest authentication.
• Linux seems to be most forgiving and least demanding.
• All clients support SSL.
• WebDAV's XML is relatively simple but the usage may differ per client type.
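For the chunked encoding point: a server that expects a Content-Length will misread a body sent as HTTP/1.1 chunks. A minimal sketch of what decoding such a body involves is shown below; it assumes the complete body is already in memory, ignores trailers, and the function name `decode_chunked` is hypothetical.

```python
def decode_chunked(body):
    """Decode an HTTP/1.1 chunked body held fully in memory."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" fields.
        size_field = body[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk; any trailers are ignored in this sketch
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


if __name__ == "__main__":
    print(decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"))
```

A real WebDAV daemon would decode incrementally from the socket so that large uploads can stream straight into encryption.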
Cloud management
Computing
• Torque/PBS on AWS via RightScale
• OSGi on Eucalyptus
• MPI
• R
• Challenge: Matlab, BLAST etc. - how to deal with licensing?