Cloud Computing for Education and Research: Customized cloud platform for computing on your terms!
CSUPERB symposium, Jan 3rd 2013
• Nirav Merchant
• Andre Mercer
Topic Coverage
• Introduction to cloud computing
• Challenges and Features
• iPlant Atmosphere
• iPlant Data Store
• Designing course work and training material using Atmosphere and Data Store
• Using Atmosphere + Data Store (hands on)
• Explore use of these resources on your own and ask questions!
Survey Results
• 12 responses
• I would like to use cloud-based resources for: teaching (1), research (10), both (10)
• I have used cloud resources before: Yes (1), No (11)
• I have tuned the content, to the best of my ability, to what participants have requested
• I promised not to use much jargon; please stop me and ask questions ANY time
• We have designed iPlant to be consistent with the pillars of CIF21 (NSF's Cyberinfrastructure Framework for 21st Century Science and Engineering): High Performance Computing; Data and Data Analysis; Virtual Organization; Learning and Workforce
The iPlant Collaborative: Cyberinfrastructure Philosophy
[Figure: range of users served, from typical end users to computational users and TeraGrid/XSEDE]
The iPlant Collaborative: Cyberinfrastructure for the Plant Sciences
• For a challenge as broad as “plant science,” focusing on specific applications/tools means chasing a moving target, and is never enough.
• Most important to build a platform that can support diverse and constantly evolving needs. “Cyberinfrastructure” is, in fact, infrastructure. The platform can lift all the apps, not select winners and losers.
“The useful lifetime of our analysis toolchains is now 6 months” (Matthew Trunnell, Broad Institute)
• The iPlant CI is designed as infrastructure.
• This means it is a platform upon which other projects can build.
• Use of the iPlant infrastructure can take one of several forms: storage, computation, hosting, web services, scalability.
The iPlant Collaborative: Ways to access iPlant
• Atmosphere: For cloud infrastructure
• iPlant Data Store: All data, large and small
• The Discovery Environment: Integrated web apps
• MyPlant: Social networking
• DNA Subway: Annotation and more
• Standalone apps: TNRS, TreeViewer, PhytoBisque, etc.
• The API: For programmers embedding iPlant CI capabilities
• Command line for experts (through TeraGrid/XSEDE)
Cloud Computing
• Not a singular technology component
• Not a black box or alien technology
• Not an “elixir of scalability”, a “panacea for Big Data”, etc.
• It cannot keep growing and scaling without planning (and architecting your application)
• Unfortunate victim of marketing hype
• Further complicated by jargon and TLAs (three-letter acronyms): private cloud, community cloud, hybrid cloud …
What is cloud computing?
http://geekandpoke.typepad.com/geekandpoke/2009/03/let-the-clouds-make-your-life-easier.html
Cloud Computing
• Amazingly flexible technology
• It’s a platform composed of many uniquely flexible components (more later)
• Allows us to create “purpose-built appliances”
• Allows us to finally “script our infrastructure”
• Allows mixing and matching of the components you need to do your science
• Opens up many new avenues and approaches for teaching topics that require complex (preconfigured) software tools and data
Often overheard
I do my analysis using the “cloud”
It’s the close equivalent of saying: I do my research using “science”
Cloud Computing Zen
• Don’t get frustrated…
 – This is cutting (bleeding) edge technology
 – There will be plenty of WTF#$@ moments
• Be patient…
 – Instructions/infrastructure keep changing (software versions)
• Be flexible…
 – There will be unanticipated issues along the way
• Be constructive…
 – Use the wiki and forums, and share knowledge
 – Make everyone’s experience better
• Be creative…
 – There is more than one way to do it (TIMTOWTDI)
iPlant URLs you should know
• wiki.iplantcollaborative.org
• forum.iplantcollaborative.org
• ask.iplantcollaborative.org
• www.iplantcollaborative.org
Impromptu survey
• How many of you use the command line?
• How many of you are Windows, Mac, or Linux users?
• How many of you use HPC (or know what HPC is)?
• What campus resources do you use to teach computing-based workshops/courses?
Atmosphere: motivation
• Standalone GUI-based applications are frequently required for analysis
• GUI apps are not easily transformed into web apps
• Need to handle complex software dependencies (e.g. a specific BioPerl version and R modules)
• Users need full control of their software stack (occasional sudo access)
• Need to share desktops/applications for collaborative analysis (remote collaborators)
• Availability of next-gen MapReduce-based algorithms (currently we have limited support)
SaaS: Software as a Service (e.g. clustering/assembly is a service)
IaaS: Infrastructure as a Service (get computer time with a credit card and a Web interface, like EC2)
PaaS: Platform as a Service, i.e. IaaS plus core software capabilities on which you build SaaS (e.g. Hadoop/MapReduce is a platform)
Cyberinfrastructure Is “Research as a Service”
http://salsahpc.indiana.edu
As a Service models
[Figure: trade-off diagram labeled “More Pain”, “More Flexibility” and “Productivity”]
But where do I start?
• Searching for “cloud computing” related terms is not very helpful (you will most likely be bombarded by commercials and advertisements in the first few hits!)
• NIST (National Institute of Standards and Technology): Cloud Computing Synopsis and Recommendations, Special Publication 800-146, May 2012. http://www.nist.gov/customcf/get_pdf.cfm?pub_id=911075
What it is
Challenges of existing platforms
• Amazon Web Services (AWS): http://aws.amazon.com/
• Flexible and scalable
• High level of expertise required for configuration
• Fairly challenging for biologists to master all the steps
• Limited lifecycle management (cost, time)
Steps to get started!
What is Atmosphere?
• Self-service cloud infrastructure
• Designed to make the underlying cloud infrastructure easy to use for novice users
• Built on open-source Eucalyptus (OpenStack)
• Fully integrated with iPlant authentication, storage and HPC capabilities
• Enables users to build custom images/appliances and share them with the community
• Cross-platform desktop access to GUI applications in the cloud (using VNC)
• Provides easy web-based access to resources
Who is this tutorial designed for?
• Users wanting to launch preconfigured images in Atmosphere (like an app store)
• Developers, for application distribution
• Prototyping/testing new software/modules
• Tailored software training setups (custom workshops/laboratory courses, etc.)
• Extending the compute capabilities of existing applications, i.e. utilizing the iPlant API
• API-compatible implementation of the Amazon EC2/S3 interfaces
• Virtualizes the execution environment for applications and services
• Up to 12-core / 48 GB instances
• Access to Cloud Storage + EBS
• Run server, CloudBurst and desktop use cases. Big data and the desktop are co-local again!
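Because the cloud underneath Atmosphere exposes an EC2-compatible API, launching an instance can in principle be scripted instead of clicked, which is what “script our infrastructure” means in practice. The sketch below only builds the query string for an EC2 `RunInstances` request; the `Action`, `ImageId`, `InstanceType`, `MinCount` and `MaxCount` parameter names come from the EC2 Query API. It assumes you hold EC2-style credentials for the cloud; the image ID `emi-12345678` is hypothetical, and request signing, the `Version` parameter and the endpoint URL are deliberately left out.

```python
from urllib.parse import urlencode


def build_run_instances_query(image_id: str, instance_type: str = "m1.small",
                              count: int = 1) -> str:
    """Return the unsigned query string for an EC2-style RunInstances call.

    A real request also needs Version, authentication parameters and a
    signature; those are intentionally omitted from this sketch.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    params = {
        "Action": "RunInstances",
        "ImageId": image_id,            # e.g. a Eucalyptus machine image (EMI) ID
        "InstanceType": instance_type,  # the instance size, e.g. m1.small
        "MinCount": str(count),         # ask for exactly `count` instances
        "MaxCount": str(count),
    }
    # Sort by parameter name so the output is deterministic.
    return urlencode(sorted(params.items()))


if __name__ == "__main__":
    # Hypothetical image ID, for illustration only.
    print(build_run_instances_query("emi-12345678"))
```

In practice a client library or the euca2ools command-line tools would sign and send the request; the point is only that a launch reduces to a handful of named parameters, so it can live in a script alongside the rest of a course setup.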
More than 60 hosted applications in Atmosphere today, with users from USDA, the Forest Service, database providers, etc.
(30 more for postdocs and grad students in training classes)
The iPlant Collaborative, Project Atmosphere™: Custom Cloud Computing
Atmosphere: Collaboration
iPlant Data Store
Lifecycle
Users of Atmosphere for teaching
• Workshops:
 – Frontiers and Techniques in Plant Sciences, CSHL 2011, 2012
 – Genotyping by Sequencing, Cornell Computational Biology
• Graduate/undergraduate course work:
 – BCB 660, Volker Brendel and Amy Toth, Fall 2011, Iowa State University
 – ISTA 420/520, Nirav Merchant & Eric Lyons, Fall 2012, Univ. of Arizona
 – Intro. Bioinformatics, Anne Lorraine, Fall 2012, Univ. of North Carolina
• Popular community-contributed images:
 – PhytoMorph (Nate Miller, U. Wisconsin)
 – Twig2Genome (Haibao Tang, JCVI)
 – Julin Maloof, UC Davis*
Courses Using Atmosphere
Asian Wild Rice Distribution
The Research
• Genetic studies documented geographic subdivision of Asian wild rice (Oryza rufipogon), the progenitor of cultivated Asian rice.
 – Cause unknown.
• Use species distribution modeling (SDM) to examine environmental factors associated with the spatial and temporal distribution of O. rufipogon.
• Compare the estimated distribution during the Last Glacial Maximum (LGM) to genetic data.
Problem
• Analysis requires large datasets
Results
• Present distribution of O. rufipogon (Fig. A).
• The projected paleodistribution at the LGM was separated into disconnected eastern and western ranges (Fig. B).
 – Consistent with the current geographic pattern of genetic variation, with two genetic groups that intergrade (Fig. D).
• Annual precipitation contributes most to the SDM estimates.
• SDM projections for the year 2080 indicate an increasing probability of presence and range expansion (Fig. C).
 – Indicates that global warming is less of a threat to this endangered species than other human-mediated factors.
Scalable science
• 325 records of O. rufipogon sample locations from two sources.
iPlant enabled Huang and Schaal to successfully pursue this research.
(A) present, (B) Last Glacial Maximum, (C) Future 2080, (D) Genetic variation.
iPlant Workshop at BSA, July 2011
• Pu Huang (Washington U.) attended.
• Learned about Atmosphere, iPlant’s cloud computing platform.
P. Huang and B.A. Schaal, Am. J. Botany 99(11), 2012.
Hands On Lab
Atmosphere Login
• Visit http://www.iplantcollaborative.org/
• Next, click the Atmosphere Login image (about mid-page).
• Click the Login button and enter your iPlant username and password.
Atmosphere Intro screen
Click the Launch New Instance button.
1. Search for NGS Viewers v3 08/20/2012 (an image) and select the purple icon.
2. Give it a name and select the instance size (choose m1.small).
 • As you select different sizes, you will notice the projected resource usage change.
3. When ready, press the Launch Instance button.
Understanding Instance Metrics
• After an instance has launched, you can view information about it.
• Resource Usage Metrics
 – My Resource Usage, shown at the top of the screen, indicates how much of your quota of CPUs and GB of memory is being used by your running instances.
• Instance Details
 – The Instance Details tab displays important information about the instance, including the ID assigned to it at launch, the name of the image it is using, its unique EMI ID, the instance size, the launch date, and the IP address, which you will need when logging in to the instance.
• Instance Metrics
 – Instance Metrics let you drill down into the resources consumed by the running instance.
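The My Resource Usage display is simple bookkeeping: each running instance consumes the CPUs and memory of its size, and a new launch fits only if the totals stay within your quota. A minimal sketch of that arithmetic, assuming a hypothetical size table (the CPU and memory figures below are placeholders, not Atmosphere's actual instance sizes):

```python
from dataclasses import dataclass

# HYPOTHETICAL size table: size name -> (CPUs, GB of memory).
# Placeholder numbers, not Atmosphere's published instance sizes.
SIZES = {
    "m1.small": (1, 2),
    "m1.medium": (2, 4),
    "m1.large": (4, 8),
}


@dataclass
class Quota:
    cpus: int
    mem_gb: int


def usage(running: list[str]) -> tuple[int, int]:
    """Total (CPUs, GB) consumed by the sizes of the running instances."""
    cpus = sum(SIZES[size][0] for size in running)
    mem = sum(SIZES[size][1] for size in running)
    return cpus, mem


def can_launch(size: str, running: list[str], quota: Quota) -> bool:
    """True if one more instance of `size` still fits within the quota."""
    cpus, mem = usage(running)
    extra_cpus, extra_mem = SIZES[size]
    return cpus + extra_cpus <= quota.cpus and mem + extra_mem <= quota.mem_gb


if __name__ == "__main__":
    quota = Quota(cpus=4, mem_gb=8)
    running = ["m1.small", "m1.medium"]
    print(usage(running))                          # (3, 6)
    print(can_launch("m1.small", running, quota))  # True
    print(can_launch("m1.large", running, quota))  # False
```

The same check, applied before launch, is what the launch dialog reflects as you try different sizes.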
Logging into an Instance
• Via SSH: if the Shell tab is disabled, you can log in to your instance using the SSH client for your operating system.
• In your terminal window type: $ ssh your_iplant_username@instance_ip_address
Use your own iPlant username and the IP address shown in the Instance Details tab. Enter your iPlant password and you should be logged in to your instance.
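When several teams each run an instance, it can help to script the login step. The sketch below builds the same `ssh your_iplant_username@instance_ip_address` command from Python and validates the IP address first. It assumes an `ssh` client is on your PATH; the username `jdoe` and the address `192.0.2.10` (a reserved documentation address) are placeholders.

```python
import ipaddress
import shlex
import subprocess


def ssh_command(username: str, instance_ip: str) -> list[str]:
    """Build the argument list for `ssh username@instance_ip`.

    Raises ValueError if instance_ip is not a valid IP address.
    """
    ipaddress.ip_address(instance_ip)  # validation only; raises ValueError if malformed
    return ["ssh", f"{username}@{instance_ip}"]


def connect(username: str, instance_ip: str) -> int:
    """Open an interactive SSH session; ssh itself prompts for the password."""
    return subprocess.call(ssh_command(username, instance_ip))


if __name__ == "__main__":
    # Hypothetical username and documentation-range IP address.
    print(shlex.join(ssh_command("jdoe", "192.0.2.10")))  # ssh jdoe@192.0.2.10
```

Passing an argument list to `subprocess` (rather than a shell string) avoids quoting problems if a username ever contains unusual characters.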
Terminating an Instance
• Click the instance to terminate in the My Instances list.
• Either:
 – Click the Terminate Instance icon in your My Instances list, or
 – Click the Terminate Instance button on the Data tab.
• Click OK in the warning message.
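Termination is a one-way transition, which is why the interface asks for confirmation. A simplified sketch of the launch-to-terminate lifecycle, using state names borrowed from Amazon EC2's documented instance states (EC2's stopping/stopped states are left out); Atmosphere's own internal states may differ, and the `Instance` class is hypothetical. In EC2-style clouds an instance's local disk is normally discarded on termination, so results worth keeping should be saved to the Data Store first.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SHUTTING_DOWN = "shutting-down"
    TERMINATED = "terminated"


# Simplified launch-to-terminate path; EC2 also has stopping/stopped,
# which this sketch omits.
ALLOWED = {
    State.PENDING: {State.RUNNING},
    State.RUNNING: {State.SHUTTING_DOWN},
    State.SHUTTING_DOWN: {State.TERMINATED},
    State.TERMINATED: set(),  # final: a terminated instance cannot come back
}


class Instance:
    """Hypothetical model of one cloud instance's lifecycle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.PENDING  # every instance starts out pending

    def advance(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state

    def terminate(self) -> None:
        """What confirming the terminate warning amounts to in this model."""
        self.advance(State.SHUTTING_DOWN)
        self.advance(State.TERMINATED)


if __name__ == "__main__":
    vm = Instance("team-1")
    vm.advance(State.RUNNING)
    vm.terminate()
    print(vm.state.value)  # terminated
```

Encoding the allowed transitions in one table keeps the rules in a single place, so an illegal request (such as restarting a terminated instance) fails loudly instead of silently.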
Requesting More Resources
• Enter the amount of resources you are requesting.
• Enter the justification for the request.
• Click the Request Resources button (right side of the page).
 – Your request will be reviewed, and you will receive a response within 2 working days.
Reporting an Instance Problem
• Select the instance you are having problems with.
• Click Report Instance.
• Fill out the Instance Error form.
• When finished, press the Report this Instance button.
Logging in via VNC
• Airport VNC runs a built-in Java VNC viewer in your web browser, inside the Atmosphere Airport interface, and therefore requires Java. This is the more common option.
• Select the VNC tab. If prompted, allow the Java applet to run.
• In the VNC Server field, enter the IP address of your instance followed by :1 (this should already be auto-populated). Press Connect.
Enter your username and password.
You have now successfully logged in via VNC.
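The `:1` appended to the IP address is a VNC display number. By the standard VNC (RFB) convention, display N is served on TCP port 5900 + N, so `:1` means port 5901. Knowing this helps when using a standalone VNC client or checking whether a firewall is blocking the connection. A minimal sketch, assuming an IPv4 address or hostname; the address `192.0.2.10` is a placeholder, and the `host::port` form that some VNC clients accept is not handled.

```python
import socket

VNC_BASE_PORT = 5900  # RFB convention: display N listens on TCP port 5900 + N


def vnc_target(server: str) -> tuple[str, int]:
    """Turn 'host:display' (e.g. '192.0.2.10:1') into (host, TCP port).

    A missing display number is treated as display 0.
    """
    host, sep, display = server.rpartition(":")
    if not sep:
        return server, VNC_BASE_PORT
    return host, VNC_BASE_PORT + int(display)


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `vnc_target("192.0.2.10:1")` gives `("192.0.2.10", 5901)`, and `port_open("192.0.2.10", 5901)` then reports whether anything is listening on that port from where you are.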
Terminating a VNC session
• You can terminate a VNC Viewer session either from the VNC tab in Airport or from the VNC Viewer application window.
• To terminate the session from Airport, click the 'X' in the My Instances list or on the VNC tab.
Hands-on exercise
• Launching an instance (one per team)
• Connecting to it (VNC and SSH) using the web browser and VNC client software
• Bringing data from iDS (iPlant Data Store) to Atmosphere (use iDrop or icommands)
• Launching an application
• Installing a new application (optional)
• Saving data back to iDS
• Collaborating with other users (sharing your session)
• Terminating the instance when you are done
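The data-movement steps of the exercise (bringing data from iDS onto the instance and saving results back) can be scripted with the iRODS icommands `iget` and `iput`. A minimal sketch, assuming the icommands are installed on the instance and already initialised with `iinit`; the `/iplant/home/<username>` path layout and the username `jdoe` are assumptions for illustration.

```python
import shutil
import subprocess
from pathlib import PurePosixPath

# ASSUMPTION: Data Store home collections live under /iplant/home/<username>.
DATA_STORE_HOME = "/iplant/home"


def home_path(username: str, *parts: str) -> str:
    """Build a Data Store path such as /iplant/home/jdoe/results/out.txt."""
    return str(PurePosixPath(DATA_STORE_HOME, username, *parts))


def iget_cmd(remote: str, local: str = ".") -> list[str]:
    """Command that copies a Data Store object onto the instance."""
    return ["iget", remote, local]


def iput_cmd(local: str, remote: str) -> list[str]:
    """Command that saves a local file back to the Data Store."""
    return ["iput", local, remote]


def run(cmd: list[str]) -> None:
    """Run an icommand, failing loudly if the tool is missing or returns an error."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(
            f"{cmd[0]} not found on PATH; install the icommands and run iinit first")
    subprocess.run(cmd, check=True)

# Example session (hypothetical user and file names):
#   run(iget_cmd(home_path("jdoe", "reads.fastq")))
#   ... run the analysis ...
#   run(iput_cmd("results.txt", home_path("jdoe", "results.txt")))
```

Using `check=True` means a failed transfer raises an exception instead of letting a class script continue to the terminate step with results unsaved.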