Basics of Cloud Computing – Lecture 7

More AWS, Serverless Computing and Cloud Research

Satish Srirama

Outline

• More Amazon Web Services

• More on serverless computing

• Cloud based Research @ Mobile & Cloud Lab

3/27/2018 Satish Srirama 2/34

Cloud Providers and Services – we already discussed

• Amazon Web Services

– Amazon EC2

– Amazon S3

– Amazon EBS

– Amazon Elastic Load Balancing

– Amazon Auto Scale

– Amazon CloudWatch

• Eucalyptus

• OpenStack

• SciCloud

• Management providers

– AWS Management Console

– OpenStack Horizon

– RightScale

• PaaS

– Google AppEngine

– Windows Azure


MORE AWS


AWS services we discuss

• AWS Management Console

• AWS Identity and Access Management

• AWS CloudFormation

• Amazon Elastic MapReduce


AWS Management Console

• Hope some of you have started using Amazon accounts

• You can manage your complete Amazon account with the management console

– AMI Management

– Instance Management

– Security Group Management

– Elastic IP Management

– Elastic Block Store

– Key Pair management etc.

• It has different panes for different services


AWS Management Console - screenshot

https://console.aws.amazon.com/

AWS Identity and Access Management (IAM)

• How can an enterprise or group of people use a single credit card?

• Manage IAM users

– Create new users and manage them

– Create groups

• Manage permissions

– Creating policies

• Manage credentials

– Create and assign temporary security credentials


IAM policy

• Example policy giving access to complete EC2

http://aws.amazon.com/iam/
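The example policy on this slide was an image and is not reproduced in the text. As a minimal sketch of what such a policy commonly looks like, the snippet below builds an IAM policy document that allows every EC2 action on every resource. The function name is hypothetical; the `Version`, `Statement`, `Effect`, `Action` and `Resource` keys are the standard IAM policy elements.

```python
import json


def full_ec2_access_policy():
    """Return a policy document allowing all EC2 actions on all resources."""
    return {
        # Version of the policy language, not a date chosen by the author.
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ec2:*",   # every EC2 API action
                "Resource": "*",     # every resource
            }
        ],
    }


# Serialize to the JSON text that would be attached to a user or group.
policy_json = json.dumps(full_ec2_access_policy(), indent=2)
print(policy_json)
```

Attaching such a document to a group rather than to individual users keeps permissions manageable when many people share one account.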


AWS CloudFormation

• Provides an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion

• It is based on a template model

– Templates describe the AWS resources, the associated dependencies, and runtime parameters to run an app.

– The templates describe stacks, which are sets of software and hardware resources.

– Something similar to CloudML and RightScale server templates

• Hides several details

– How the AWS services need to be provisioned

– Subtleties of how to make those dependencies work.
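To make the template idea above concrete, here is a minimal sketch of a template with one runtime parameter and one resource, built as a Python dictionary and printed as JSON. The logical names `InstanceTypeParam` and `WebServer`, the default instance type and the placeholder image ID are assumptions for illustration, not values from the lecture.

```python
import json


def minimal_stack_template(image_id):
    """Build a one-instance template; image_id must be supplied by the caller."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: one EC2 instance with a configurable type",
        # Runtime parameters: values chosen when the stack is created.
        "Parameters": {
            "InstanceTypeParam": {"Type": "String", "Default": "t2.micro"}
        },
        # Resources: what the stack should contain.
        "Resources": {
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    # Ref expresses a dependency on the parameter above.
                    "InstanceType": {"Ref": "InstanceTypeParam"},
                },
            }
        },
    }


# "ami-PLACEHOLDER" is not a real image ID; replace it before use.
template = minimal_stack_template("ami-PLACEHOLDER")
print(json.dumps(template, indent=2))
```

The template only declares what should exist; the order in which resources are provisioned is derived from references such as `Ref`.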


AWS CloudFormation

• Amazon provides several pre-built templates for starting common apps such as:

– WordPress (blog)

– LAMP stack

– Gollum (wiki used by GitHub)

– …

• There is no additional charge for AWS CloudFormation. You pay only for the AWS resources used (e.g. EC2 instances, Elastic Load Balancers, etc.)

http://aws.amazon.com/cloudformation/


Amazon Elastic MapReduce

• The last lecture already discussed it as “Data Processing as a Service”

• Web interface and command-line tools for running Hadoop jobs on EC2

• Data stored in Amazon S3

• Monitors the job and shuts down the machines after use

• Running a job

– Upload job jar & input data to S3

– Create the cluster

– Create a Job Flow as steps

– Wait for the completion and examine the results
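The steps above can be sketched as a plain description of a job flow. The snippet below only builds a dictionary and does not contact AWS. The field names follow the shape of the EMR job flow request as best recalled and should be checked against the EMR documentation; the bucket, jar and prefix names are hypothetical.

```python
def build_job_flow(bucket, jar_key, input_prefix, output_prefix, workers=2):
    """Describe a one-step Hadoop job whose jar and data live in S3."""
    s3 = "s3://" + bucket + "/"
    return {
        "Name": "lecture-demo-job-flow",
        "Instances": {
            "InstanceCount": workers + 1,          # one master plus the workers
            "KeepJobFlowAliveWhenNoSteps": False,  # shut down after the last step
        },
        "Steps": [
            {
                "Name": "run-uploaded-jar",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": s3 + jar_key,  # the job jar uploaded beforehand
                    "Args": [s3 + input_prefix, s3 + output_prefix],
                },
            }
        ],
    }


# All names below are made up for illustration.
request = build_job_flow("my-example-bucket", "jobs/wordcount.jar",
                         "input/", "output/")
print(request["Steps"][0]["HadoopJarStep"])
```

Keeping the input and output in S3 is what allows the cluster to be thrown away once the steps have finished.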


http://aws.amazon.com/elasticmapreduce/


Other interesting AWS services

• Amazon Relational Database Service

– Provides access to the capabilities of familiar database engines

– MySQL, Oracle or Microsoft SQL Server

• NoSQL databases

– SimpleDB

– DynamoDB

• AWS Snowball & Snowmobile

– Exabyte-scale data transfer services used to move extremely large amounts of data to AWS


A BIT MORE ON SERVERLESS COMPUTING


Serverless computing - continued

• Newer workloads are a better fit for event driven programming

– Execute application logic in response to database triggers

– Execute app logic in response to sensor data

– Execute app logic in response to scheduled tasks etc.

• Applications are charged by compute time (millisecond) rather than by reserved resources

• Greater linkage between cloud resources used and business operations executed

• Serverless in a nutshell

– Event-action platforms to execute code in response to events
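The difference between charging by compute time and charging by reserved resources can be illustrated with a little arithmetic. All prices and workload figures below are made up for the example and are not any provider's actual rates.

```python
def serverless_cost(invocations, avg_duration_ms, price_per_ms):
    """Pay only for the milliseconds during which code actually runs."""
    return invocations * avg_duration_ms * price_per_ms


def reserved_cost(hours, price_per_hour):
    """Pay for the whole period the instance is reserved, busy or idle."""
    return hours * price_per_hour


# Hypothetical workload: one million short requests in a 30-day month.
pay_per_use = serverless_cost(invocations=1_000_000,
                              avg_duration_ms=200,
                              price_per_ms=0.00000002)
always_on = reserved_cost(hours=720, price_per_hour=0.01)

print("pay-per-use:", round(pay_per_use, 2))
print("always-on:  ", round(always_on, 2))
```

With these assumed numbers the event-driven version is cheaper because the code is idle most of the month; a workload that keeps a machine busy around the clock could reverse the comparison.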


Current platforms for serverless


Apache OpenWhisk

• Initiated by IBM but now an Apache project

• Open source cloud platform

• Serverless deployment and operations model

• Optimal utilization and granular pricing

• Scales on a per-request basis

• Supports JS, Swift, Python, Java, Docker


Triggers, actions, rules (and packages)

• Services or data sources define the events they emit as triggers

• Developers associate the actions to handle the events via rules

• Packages are a shared collection of Triggers and Actions


Triggers and Rules

• Trigger examples

– changes to database records

– IoT sensor readings that exceed a certain temperature

– new code commits to a GitHub repository

– simple HTTP requests from web or mobile apps

• A rule is an association of a trigger with an action

– Many-to-many mapping
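The many-to-many relation between triggers and actions can be pictured with a small in-memory model. This is a local illustration only, not the OpenWhisk implementation; the class, method, trigger and action names are all hypothetical.

```python
from collections import defaultdict


class EventActionPlatform:
    """Toy event-action platform: rules connect trigger names to actions."""

    def __init__(self):
        self.actions = {}               # action name -> callable
        self.rules = defaultdict(list)  # trigger name -> list of action names

    def create_action(self, name, func):
        self.actions[name] = func

    def create_rule(self, trigger, action):
        self.rules[trigger].append(action)

    def fire(self, trigger, event):
        """Run every action bound to the trigger and collect the results."""
        bound = self.rules.get(trigger, [])
        return {name: self.actions[name](event) for name in bound}


platform = EventActionPlatform()
platform.create_action("log", lambda e: {"logged": e})
platform.create_action("alert", lambda e: {"alert": e["temperature"] > 30})

# One trigger bound to two actions, and one action bound to two triggers.
platform.create_rule("sensor-reading", "log")
platform.create_rule("sensor-reading", "alert")
platform.create_rule("db-change", "log")

results = platform.fire("sensor-reading", {"temperature": 35})
print(results)
```

Because rules are separate objects, an action can be reused for a new event source without changing its code.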


Actions

• They can be

– small snippets of JavaScript or Swift code

– custom binary code embedded in a Docker container

• Instantly deployed and executed whenever a trigger fires

• It is also possible to directly invoke an action by using the OpenWhisk API, CLI, or iOS SDK

• A set of actions can also be chained


CLI – Command line interface
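Chaining, as mentioned in the last bullet above, means that the result of one action becomes the input of the next. The sketch below simulates this locally with plain functions that take and return dictionaries; the function names are hypothetical and nothing here calls the OpenWhisk runtime.

```python
def split_words(args):
    """First action: split the input text into words."""
    return {"words": args["text"].split()}


def count_words(args):
    """Second action: count the words produced by the previous action."""
    return {"count": len(args["words"])}


def run_sequence(actions, args):
    """Feed each action's output dictionary into the next action."""
    result = args
    for action in actions:
        result = action(result)
    return result


output = run_sequence([split_words, count_words],
                      {"text": "hello serverless world"})
print(output)
```

Small single-purpose actions of this kind can be recombined into different chains without being rewritten.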


Actions - continued

• Create an action : wsk action create firstaction Hello-Python.py

• Invoke an action : wsk action invoke firstaction --result

• Update an action : wsk action update firstaction Hello-Python.py
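The contents of Hello-Python.py are not shown in the slides. A minimal sketch of what such a file could contain is given below, assuming the usual OpenWhisk convention that a Python action exposes a `main` function which receives a dictionary of parameters and returns a dictionary. The `name` parameter and the greeting text are made up for illustration.

```python
def main(args):
    """Entry point of the action: dictionary in, dictionary out."""
    name = args.get("name", "stranger")
    return {"greeting": "Hello " + name + "!"}


if __name__ == "__main__":
    # Local check without any serverless platform.
    print(main({"name": "OpenWhisk"}))
```

Since the action is an ordinary function, it can be tested locally before it is created or updated with the commands above.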


System overview


https://github.com/apache/incubator-openwhisk/blob/master/docs/about.md


The internal flow of processing

For more information read

https://github.com/apache/incubator-openwhisk/blob/master/docs/about.md

CLOUD BASED RESEARCH @ MOBILE & CLOUD LAB


Scientific Computing on the Cloud

• Public clouds provide very convenient access to computing resources

– On-demand and in real-time

– As long as you can afford them

• High performance computing (HPC) on cloud

– Virtualization and communication latencies are major hindrances [Srirama et al, SPJ 2011; Batrashev et al, HPCS 2011]

• Things have improved significantly over the years

• Containerization is a good alternative

– Research at scale

• Cost-to-value of experiments


Communication pattern of Cluster vs Cloud

• Cloud has huge troubles with communication/transmission latencies

– Virtualization technology is the culprit

• Performance comparison of virtual machines and Linux containers (e.g. Docker)


[Srirama et al, SPJ 2011]


Migrating Scientific Workflows to the Cloud

• A workflow can be represented as a weighted directed acyclic graph (DAG)

• Partitioning the workflow into groups with graph partitioning techniques [Srirama and Viil, HPCC 2014]

– Such that the sum of the weights of the edges connecting to vertices in different groups is minimized

– Utilized Metis’ multilevel k-way partitioning

• Scheduling the workflows with tools like Pegasus

– Considered peer-to-peer file manager (Mule) for Pegasus

• A framework was developed to help scientists with the procedure [Viil and Srirama, JSC 2018]
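The partitioning objective described above, minimizing the total weight of edges that cross between groups, can be illustrated on a tiny graph. The sketch below uses exhaustive search over balanced two-way splits, which is only feasible for a handful of tasks; it is not the multilevel k-way algorithm of Metis. The task names and edge weights (imagined as data volumes passed between tasks) are made up.

```python
from itertools import combinations


def cut_weight(edges, group_of):
    """Sum the weights of edges whose endpoints lie in different groups."""
    return sum(w for (u, v), w in edges.items() if group_of[u] != group_of[v])


def best_bisection(nodes, edges):
    """Try every balanced 2-way split and keep the one with the lowest cut."""
    best = None
    half = len(nodes) // 2
    for left in combinations(nodes, half):
        group_of = {n: (0 if n in left else 1) for n in nodes}
        weight = cut_weight(edges, group_of)
        if best is None or weight < best[0]:
            best = (weight, group_of)
    return best


# Hypothetical workflow: heavy transfers A->B and C->D, light ones elsewhere.
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 10, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 10}

weight, groups = best_bisection(nodes, edges)
print(weight, groups)
```

Tasks that exchange a lot of data end up in the same group, so the expensive transfers stay inside one machine or cluster and only the light edges cross the network.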

Adapting Computing Problems to Cloud

• Reducing the algorithms to cloud computing frameworks such as MapReduce [Srirama et al, FGCS 2012]

• Designed a classification on how the algorithms can be adapted to MR

– Algorithm → single MapReduce job

• Monte Carlo, RSA breaking

– Algorithm → n MapReduce jobs

• CLARA (Clustering), Matrix Multiplication

– Each iteration in algorithm → single MapReduce job

• PAM (Clustering)

– Each iteration in algorithm → n MapReduce jobs

• Conjugate Gradient

• Applicable especially for Hadoop MapReduce
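As an illustration of the first class (an algorithm that maps to a single MapReduce job), the sketch below estimates pi with a Monte Carlo method using Python's built-in `map` and `functools.reduce` in place of a Hadoop cluster. The task sizes and seeds are arbitrary choices for the example, and this is not the code used in the cited paper.

```python
import random
from functools import reduce


def map_task(task):
    """Map: draw random points and count those inside the unit quarter circle."""
    seed, samples = task
    rng = random.Random(seed)  # independent, reproducible stream per task
    hits = sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return (hits, samples)


def reduce_task(a, b):
    """Reduce: add up hit counts and sample counts from two partial results."""
    return (a[0] + b[0], a[1] + b[1])


# Eight independent map tasks; on a cluster each could run on its own node.
tasks = [(seed, 20_000) for seed in range(8)]
hits, total = reduce(reduce_task, map(map_task, tasks))
pi_estimate = 4.0 * hits / total
print(pi_estimate)
```

The map tasks share no state and the reduce step is a simple sum, which is why this kind of algorithm fits a single job with no iteration.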


Issues with Hadoop MapReduce

• It is designed and suitable for:

– Data processing tasks

– Embarrassingly parallel tasks

• Has serious issues with iterative algorithms

– Long “start up” and “clean up” times (~17 seconds)

– No way to keep important data in memory between MapReduce job executions

– At each iteration, all data is read again from HDFS and written back there at the end

– Results in a significant overhead in every iteration
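The effect of the fixed per-job cost on iterative algorithms can be estimated with simple arithmetic. The sketch below uses the ~17 seconds quoted above as the per-job start-up and clean-up time and ignores the additional HDFS read and write cost; the iteration counts are hypothetical.

```python
def iteration_overhead_seconds(iterations, jobs_per_iteration=1,
                               per_job_overhead=17.0):
    """Fixed start-up/clean-up time paid across all iterations."""
    return iterations * jobs_per_iteration * per_job_overhead


# An algorithm needing 100 iterations, one MapReduce job per iteration.
overhead = iteration_overhead_seconds(100)
print(overhead / 60.0, "minutes of pure overhead")

# The same algorithm if each iteration needs three MapReduce jobs.
print(iteration_overhead_seconds(100, jobs_per_iteration=3) / 60.0, "minutes")
```

The overhead grows linearly with the number of jobs regardless of how little work each iteration performs, which is what makes short iterations especially inefficient.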


Alternative Approaches

• Restructuring algorithms into non-iterative versions

– CLARA instead of PAM [Jakovits and Srirama, Nordicloud 2013]

• Alternative MapReduce implementations that are designed to handle iterative algorithms [Jakovits and Srirama, HPCS 2014]

– E.g. Twister, HaLoop, Spark [Distributed Data Processing on the Cloud - LTAT.06.005 (Fall 2018)]

• Alternative distributed computing models

– Bulk Synchronous Parallel model [Valiant, 1990] [Jakovits et al, HPCS 2013]

– Building a fault-tolerant BSP framework (NEWT) [Kromonov et al, HPCS 2014]
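In the Bulk Synchronous Parallel model a computation proceeds in supersteps, each consisting of local computation, communication, and a barrier synchronization. The sketch below simulates this with threads: workers arranged in a ring pass their current value to the right neighbour until every worker knows the global maximum. It is a toy illustration, not the NEWT framework; the function name and the ring layout are assumptions, and the input list must be non-empty.

```python
import threading


def bsp_max(values):
    """Compute the global maximum on a ring of workers in n-1 supersteps."""
    n = len(values)
    local = list(values)      # local[r] is the private state of worker r
    inbox = [None] * n        # inbox[r] holds the message sent to worker r
    barrier = threading.Barrier(n)

    def worker(rank):
        for _ in range(n - 1):
            # Communication: send the current value to the right neighbour.
            inbox[(rank + 1) % n] = local[rank]
            barrier.wait()    # all messages of this superstep are delivered
            # Local computation: combine own state with the received message.
            local[rank] = max(local[rank], inbox[rank])
            barrier.wait()    # nobody overwrites an inbox that is still unread

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local


print(bsp_max([3, 9, 2, 7]))
```

Unlike a chain of MapReduce jobs, the workers keep their state in memory between supersteps, so an iterative algorithm pays no per-iteration start-up or reload cost.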


This week in lab

• Continue working with Google AppEngine


Next Lecture

• Continue with Research on cloud


References

• Check Amazon videos and webinars at http://aws.amazon.com/resources/webinars/

• Mike Roberts, “Serverless Architectures”, https://martinfowler.com/articles/serverless.html

• Abel Avram, “FaaS, PaaS, and the Benefits of the Serverless Architecture”, https://www.infoq.com/news/2016/06/faas-serverless-architecture

• Apache OpenWhisk - https://github.com/apache/incubator-openwhisk/blob/master/docs/about.md

• List of Publications - Satish Narayana Srirama - http://math.ut.ee/~srirama/publications.html

• [Srirama et al, FGCS 2012] S. N. Srirama, P. Jakovits, E. Vainikko: Adapting Scientific Computing Problems to Clouds using MapReduce, Future Generation Computer Systems Journal, 28(1):184-192, 2012. Elsevier press. DOI 10.1016/j.future.2011.05.025.

• [Jakovits and Srirama, HPCS 2014] P. Jakovits, S. N. Srirama: Evaluating MapReduce Frameworks for Iterative Scientific Computing Applications, The 2014 (12th) International Conference on High Performance Computing & Simulation (HPCS 2014), July 21-25, 2014, pp. 226-233. IEEE.

• [Kromonov et al, HPCS 2014] I. Kromonov, P. Jakovits, S. N. Srirama: NEWT - A resilient BSP framework for iterative algorithms on Hadoop YARN, The 2014 (12th) International Conference on High Performance Computing & Simulation (HPCS 2014), July 21-25, 2014, pp. 251-259. IEEE.

• [Srirama and Viil, HPCC 2014] S. N. Srirama, J. Viil: Migrating Scientific Workflows to the Cloud: Through Graph-partitioning, Scheduling and Peer-to-Peer Data Sharing, 16th Int. Conf. on High Performance Computing and Communications (HPCC 2014) workshops, August 20-22, 2014, pp. 1137-1144. IEEE.

• [Viil and Srirama, JSC 2018] J. Viil, S. N. Srirama: Framework for Automated Partitioning and Execution of Scientific Workflows in the Cloud, The Journal of Supercomputing, ISSN: 0920-8542, 2018. Springer. DOI: 10.1007/s11227-018-2296-7 (In Print)

• [Jakovits and Srirama, Nordicloud 2013] P. Jakovits, S. N. Srirama: Clustering on the Cloud: Reducing CLARA to MapReduce, 2nd Nordic Symposium on Cloud Computing & Internet Technologies (NordiCloud 2013), September 02-03, 2013, pp. 64-71. ACM.

• [Jakovits et al, HPCS 2013] P. Jakovits, S. N. Srirama, I. Kromonov: Viability of the Bulk Synchronous Parallel Model for Science on Cloud, The 2013 (11th) International Conference on High Performance Computing & Simulation (HPCS 2013), July 01-05, 2013, pp. 41-48. IEEE.

• [Srirama et al, SPJ 2011] S. N. Srirama, O. Batrashev, P. Jakovits, E. Vainikko: Scalability of Parallel Scientific Applications on the Cloud, Scientific Programming Journal, Special Issue on Science-driven Cloud Computing, 19(2-3):91-105, 2011. IOS Press. DOI 10.3233/SPR-2011-0320.

• [Batrashev et al, HPCS 2011] O. Batrashev, S. N. Srirama, E. Vainikko: Benchmarking DOUG on the Cloud, The 2011 International Conference on High Performance Computing & Simulation (HPCS 2011), 2011, pp. 677-685. IEEE.

• [Valiant, 1990] L. G. Valiant: A bridging model for parallel computation, Commun. ACM, vol. 33, no. 8, pp. 103–111, Aug. 1990.
