cloud computing lec7 v1...– bulk synchronous parallel model [valiant, 1990][jakovits et al, hpcs...
TRANSCRIPT
Basics of Cloud Computing – Lecture 7
More AWS, Serverless Computing
and Cloud Research
Satish Srirama
Outline
• More Amazon Web Services
• More on serverless computing
• Cloud based Research @ Mobile & Cloud Lab
3/27/2018 Satish Srirama 2/34
Cloud Providers and Services – we
already discussed• Amazon Web Services
– Amazon EC2
– Amazon S3
– Amazon EBS
– Amazon Elastic Load Balancing
– Amazon Auto Scale
– Amazon CloudWatch
• Eucalyptus
• OpenStack
• SciCloud
• Management providers– AWS Management Console
– OpenStack Horizon
– RightScale
• PaaS– Google AppEngine
– Windows Azure
3/27/2018 Satish Srirama 3/34
MORE AWS
3/27/2018 Satish Srirama 4
AWS we discuss
• AWS Management Console
• AWS Identity and Access Management
• AWS CloudFormation
• Amazon Elastic MapReduce
3/27/2018 Satish Srirama 5/34
AWS Management Console
• Hope some of you have started using Amazon accounts
• You can manage your complete Amazon account with management console– AMI Management
– Instance Management
– Security Group Management
– Elastic IP Management
– Elastic Block Store
– Key Pair management etc.
• Have different panes for different services
3/27/2018 Satish Srirama 6/34
AWS Management Console - screenshot
https://console.aws.amazon.com/
AWS Identity and Access Management
(IAM)
• How can an enterprise or group of people use a single credit card?
• Manage IAM users
– Create new users and manage them
– Create groups
• Manage permissions
– Creating policies
• Manage credentials
– Create and assign temporary security credentials
3/27/2018 Satish Srirama 8/34
IAM policy
• Example policy giving access to complete EC2
http://aws.amazon.com/iam/
3/27/2018 Satish Srirama 9/34
AWS CloudFormation
• Provides an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion
• It is based on templates model– Templates describe the AWS resources, the associated
dependencies, and runtime parameters to run an app.
– The templates describe stacks, which are set of software and hardware resources.
– Something similar to CloudML and RightScale server templates
• Hides several details– How the AWS services need to be provisioned
– Subtleties of how to make those dependencies work.
3/27/2018 Satish Srirama 10/34
AWS CloudFormation
• Amazon provides several pre-built templates to start common apps as:
– WordPress (blog)
– LAMP stack
– Gollum (wiki used by GitHub)
– …
• There is no additional charge for AWS CloudFormation. You pay for AWS resources (e.g. EC2 instances, Elastic Load Balancers, etc.) http://aws.amazon.com/cloudformation/
3/27/2018 Satish Srirama 11/34
Amazon Elastic MapReduce
• Last lecture already discussed it as “Data Processing as a Service”
• Web interface and command-line tools for running Hadoop jobs on EC2
• Data stored in Amazon S3
• Monitors job and shuts machines after use
• Running a job– Upload job jar & input data to S3
– Create the cluster
– Create a Job Flow as steps
– Wait for the completion and examine the results
3/27/2018
http://aws.amazon.com/elasticmapreduce/
Satish Srirama 12/34
Other interesting AWS
• Amazon Relational Database Service
– Provides access to the capabilities of familiar database engines
– MySQL, Oracle or Microsoft SQL Server
• NoSQL databases
– Simple DB
– DynamoDB
• AWS Snowball & Snowmobile
– Exabyte-scale data transfer services used to move extremely large amounts of data to AWS
3/27/2018 Satish Srirama 13/34
A BIT MORE ON SERVERLESS
COMPUTING
3/27/2018 Satish Srirama 14
3/27/2018 Satish Srirama 15
Serverless computing - continued
• Newer workloads are a better fit for event driven programming– Execute application logic in response to database triggers
– Execute app logic in response to sensor data
– Execute app logic in response to scheduled tasks etc.
• Applications are charged by compute time (millisecond) rather than by reserved resources
• Greater linkage between cloud resources used and business operations executed
• Serverless in a nutshell– Event-action platforms to execute code in response to
events
3/27/2018 Satish Srirama 16/34
Current platforms for serverless
3/27/2018 Satish Srirama 17/34
Apache OpenWhisk
• Initiated by IBM but now an Apache project
• Open source cloud platform
• Serverless deployment and operations model
• Optimal utilization and granular pricing
• Scales on a per-request basis
• Supports JS, Swift, Python, Java, Docker
3/27/2018 Satish Srirama 18/34
Triggers, actions, rules (and packages)
• Services or data sources define the events
they emit as triggers
• Developers associate the actions to handle
the events via rules
• Packages are a shared
collection of Triggers and
Actions
3/27/2018 Satish Srirama 19/34
Triggers and Rules
• Trigger examples
– changes to database records
– IoT sensor readings that exceed a certain
temperature
– new code commits to a GitHub repository
– simple HTTP requests from web or mobile apps
• Rule is an association of a trigger to an action
– Many to many mapping
3/27/2018 Satish Srirama 20/34
Actions
• They can be
– small snippets of JavaScript or Swift code
– custom binary code embedded in a Docker container
• Instantly deployed and executed whenever a trigger fires
• It is also possible to directly invoke an action by using the OpenWhisk API, CLI, or iOS SDK
• A set of actions can also be chained
3/27/2018 Satish Srirama
CLI – Command line interface
21/34
Actions - continued
• Create an action : wsk action create firstactionHello-Python.py
• Invoke an action : wsk action invoke firstaction --result
• Update an action : wsk action update firstactionHello-Python.py
3/27/2018 Satish Srirama 22/34
System overview
3/27/2018 Satish Srirama
https://github.com/apache/incubator-openwhisk/blob/master/docs/about.md
23/34
The internal flow of processing
For more information read
https://github.com/apache/incubator-openwhisk/blob/master/docs/about.md
CLOUD BASED RESEARCH @
MOBILE & CLOUD LAB
3/27/2018 Satish Srirama 25
Scientific Computing on the Cloud
• Public clouds provide very convenient access to computing resources– On-demand and in real-time
– As long as you can afford them
• High performance computing (HPC) on cloud– Virtualization and communication latencies are major
hindrances [Srirama et al, SPJ 2011; Batrashev et al, HPCS 2011]
• Things have improved significantly over the years
• Containerization is a good alternative
– Research at scale• Cost-to-value of experiments
3/27/2018 Satish Srirama 26/34
Communication pattern of Cluster vs
Cloud
• Cloud has huge troubles with communication/transmission latencies– Virtualization technology is the culprit
• Performance Comparison of virtual machines and Linux containers (e.g. Docker)
3/27/2018 Satish Srirama
[Srirama et al, SPJ 2011]
27/34
Migrating Scientific Workflows to the
Cloud
• Workflow can be represented as weighted directed acyclic graph (DAG)
• Partitioning the workflow into groups with graph partitioning techniques [Srirama and Viil, HPCC 2014]
– Such that the sum of the weights of the edges connecting to vertices in different groups is minimized
– Utilized Metis’ multilevel k-way partitioning
• Scheduling the workflows with tools like Pegasus– Considered peer-to-peer file manager (Mule) for Pegasus
• A framework is developed to help the scientist with the procedure [Viil and Srirama, JSC 2018]
Adapting Computing Problems to
Cloud• Reducing the algorithms to cloud computing frameworks
such as MapReduce [Srirama et al, FGCS 2012]
• Designed a classification on how the algorithms can be adapted to MR– Algorithm � single MapReduce job
• Monte Carlo, RSA breaking
– Algorithm � n MapReduce jobs• CLARA (Clustering), Matrix Multiplication
– Each iteration in algorithm � single MapReduce job• PAM (Clustering)
– Each iteration in algorithm � n MapReduce jobs • Conjugate Gradient
• Applicable especially for Hadoop MapReduce
3/27/2018 Satish Srirama 29/34
Issues with Hadoop MapReduce
• It is designed and suitable for:
– Data processing tasks
– Embarrassingly parallel tasks
• Has serious issues with iterative algorithms
– Long „start up“ and „clean up“ times ~17 seconds
– No way to keep important data in memory between MapReduce job executions
– At each iteration, all data is read again from HDFS and written back there at the end
– Results in a significant overhead in every iteration
3/27/2018 Satish Srirama 30/34
Alternative Approaches
• Restructuring algorithms into non-iterative versions
– CLARA instead of PAM [Jakovits & Srirama, Nordicloud 2013]
• Alternative MapReduce implementations that are designed to handle iterative algorithms [Jakovits and Srirama,
HPCS 2014]
– E.g. Twister, HaLoop, Spark [Distributed Data Processing on the Cloud - LTAT.06.005
(Fall 2018)]
• Alternative distributed computing models
– Bulk Synchronous Parallel model [Valiant, 1990] [Jakovits et al, HPCS
2013]
– Building a fault-tolerant BSP framework (NEWT) [Kromonov et
al, HPCS 2014]
3/27/2018 Satish Srirama 31/34
This week in lab
• Continue working with Google AppEngine
3/27/2018 Satish Srirama 32/34
Next Lecture
• Continue with Research on cloud
3/27/2018 Satish Srirama 33/34
References
• Check Amazon videos and webinars at http://aws.amazon.com/resources/webinars/
• Mike Roberts, “Serverless Architectures”, https://martinfowler.com/articles/serverless.html
• Abel Avram, “FaaS, PaaS, and the Benefits of the Serverless Architecture”, https://www.infoq.com/news/2016/06/faas-serverless-architecture
• Apache OpenWhisk - https://github.com/apache/incubator-openwhisk/blob/master/docs/about.md
• List of Publications - Satish Narayana Srirama - http://math.ut.ee/~srirama/publications.html
• [Srirama et al, FGCS 2012] S. N. Srirama, P. Jakovits, E. Vainikko: Adapting Scientific Computing Problems to Clouds using MapReduce, Future Generation Computer Systems Journal, 28(1):184-192, 2012. Elsevier press. DOI 10.1016/j.future.2011.05.025.
• [Jakovits and Srirama, HPCS 2014] P. Jakovits, S. N. Srirama: Evaluating MapReduce Frameworks for Iterative Scientific Computing Applications, The 2014 (12th) International Conference on High Performance Computing & Simulation (HPCS 2014), July 21-25, 2014, pp. 226-233. IEEE.
• [Kromonov et al, HPCS 2014] I. Kromonov, P. Jakovits, S. N. Srirama: NEWT - A resilient BSP framework for iterative algorithms on Hadoop YARN, The 2014 (12th) International Conference on High Performance Computing & Simulation (HPCS 2014), July 21-25, 2014, pp. 251-259. IEEE.
• [Srirama and Viil, HPCC 2014] S. N. Srirama, J. Viil: Migrating Scientific Workflows to the Cloud: Through Graph-partitioning, Scheduling and Peer-to-Peer Data Sharing, 16th Int. Conf. on High Performance and Communications (HPCC 2014) workshops, August 20-22, 2014, pp. 1137-1144. IEEE.
• [Viil and Srirama, JSC 2018] J. Viil, S. N. Srirama: Framework for Automated Partitioning and Execution of Scientific Workflows in the Cloud, The Journal of Supercomputing, ISSN: 0920-8542, 2018. Springer. DOI: 10.1007/s11227-018-2296-7 (In Print)
• [Jakovits and Srirama, Nordicloud 2013] P. Jakovits, S. N. Srirama: Clustering on the Cloud: Reducing CLARA to MapReduce, 2nd Nordic Symposium on Cloud Computing & Internet Technologies (NordiCloud 2013), September 02-03, 2013, pp. 64-71. ACM.
• [Jakovits et al, HPCS 2013] P. Jakovits, S. N. Srirama, I. Kromonov: Viability of the Bulk Synchronous Parallel Model for Science on Cloud, The 2013 (11th) International Conference on High Performance Computing & Simulation (HPCS 2013), July 01-05, 2013, pp. 41-48. IEEE.
• [Srirama et al, SPJ 2011] S. N. Srirama, O. Batrashev, P. Jakovits, E. Vainikko: Scalability of Parallel Scientific Applications on the Cloud, Scientific Programming Journal, Special Issue on Science-driven Cloud Computing, 19(2-3):91-105, 2011. IOS Press. DOI 10.3233/SPR-2011-0320.
• [Batrashev et al, HPCS 2011] O. Batrashev, S. N. Srirama, E. Vainikko: Benchmarking DOUG on the Cloud, The 2011 International Conference on High Performance Computing & Simulation (HPCS 2011), 2011, pp. 677-685. IEEE.
• [Valiant, 1990] L. G. Valiant: A bridging model for parallel computation, Commun. ACM, vol. 33, no. 8, pp. 103–111, Aug. 1990.