A Programming Model and Middleware for High Throughput Serverless Computing Applications
Authors: Alfonso Pérez, Germán Moltó, Miguel Caballer, Amanda Calatrava
Instituto de Instrumentación para Imagen Molecular (I3M)
Group 2: David Perez, David Foster, Rashad Hatchett, Sameer Puri
Outline
What is the proposed High Throughput Computing Programming Model and how does it enable legacy code to be run on serverless platforms?
What is the SCAR framework and how does it simplify and automate the application deployment process?
Test cases and results
Introduction
Commercial large-scale computing infrastructure and container-based technologies hold great potential for scientific applications
Serverless computing can be highly parallelized which greatly benefits certain data-processing applications
Problems
1. Effectively utilizing serverless computing resources requires certain expertise
   a. Complicated configuration and usage can deter average users
2. Not all legacy code can be easily repurposed to fit the FaaS model
   a. Functionality has to be broken down into a set of event-driven functions, which imposes an extra development cost and prevents the migration of some applications
3. Cloud providers impose restrictions on language runtimes
   a. If your app is written in an unsupported language, too bad :(
The BIG Problem
There are groups of users and legacy applications that can’t take advantage of the benefits serverless computing has to offer
Research Questions
How can we simplify the application deployment process and allow even average users to easily run existing code on FaaS platforms?
How can we give more High Throughput Computing applications such as medical image processors and other highly-parallel scientific workloads access to the scalability, power, and parallelism of serverless computing?
The more people that have access to the resources available, the better.
Background / Related Work
Analysis of serverless computing offerings -- McGrath et al. [24] and Gannon [14]
Web service tools for serverless computing -- Up [4] and OpenLambda [20]
Case studies of data analytics over serverless platforms -- Glikson [15]
Legacy code challenges and lack of serverless patterns -- Baldini et al. [6]
Methods to define serverless functions for AWS Lambda -- (AWS SAM) [3]
Summary
The authors introduce a new serverless programming model combined with SCAR middleware.
With their model they address the issues of:
i) the difficulty of running legacy code on serverless platforms
ii) the lack of patterns for building serverless solutions
SCAR Framework
SCAR stands for Serverless Container Aware Architectures.
It requires roughly 36 MB of RAM and 16 MB of disk space.
Not all containers can be used. The docker image has a size limitation (220 MB is pushing it).
SCAR Framework (2)
SCAR is divided into two parts, a client and a supervisor.
The client is a Python script which:
i) validates user input information;
ii) creates the deployment package, which includes udocker (a tool to execute containers without root privileges);
iii) creates the Lambda function containing the SCAR supervisor;
SCAR Framework (3)
iv) provides access to the logs generated by invocations of the Lambda function;
v) provides means for the user to manage the lifecycle of the Lambda function (init, list, run, delete)
vi) manages the configuration to trigger events from an S3 bucket to a Lambda function
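To make item vi) concrete, here is a minimal sketch of what an S3-to-Lambda trigger configuration can look like. It is not taken from SCAR's source code: the function name `build_s3_trigger_config`, the example ARN, and the `input/` prefix are all hypothetical, and whether SCAR builds the configuration this way is an assumption.

```python
from typing import Any, Dict


def build_s3_trigger_config(function_arn: str, prefix: str = "input/") -> Dict[str, Any]:
    """Build an S3 bucket notification configuration that invokes a Lambda
    function whenever an object is created under the given key prefix."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                # Fire on every kind of object creation (PUT, POST, copy, ...).
                "Events": ["s3:ObjectCreated:*"],
                # Only react to keys that start with the chosen prefix.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    arn = "arn:aws:lambda:us-east-1:123456789012:function:scar-example"
    print(build_s3_trigger_config(arn))
```

A dictionary of this shape is what boto3's `put_bucket_notification_configuration` accepts as its `NotificationConfiguration` argument. The Lambda function also needs a permission that lets S3 invoke it, which this sketch does not cover.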
SCAR supervisor
The supervisor is the code that runs as the Lambda function, using the Python runtime.
The supervisor will:
i) retrieve the Docker image from Docker Hub using udocker into the temporary space allocated to the Lambda function;
ii) create the container out of the Docker image and set the appropriate execution mode for udocker;
SCAR supervisor (2)
iii) if triggered from S3, manage the staging of input data into the container and the staging of output results back into S3;
iv) pass the environment variables down to the container;
v) pass the generated output to CloudWatch Logs.
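The supervisor steps iii) to v) can be sketched as follows. This is an illustration under stated assumptions, not SCAR's actual supervisor: the `download` and `upload` callables stand in for S3 transfers, `command` stands in for the udocker invocation, and the variable names `INPUT_FILE` and `OUTPUT_DIR` and the `output/` key prefix are hypothetical. Only the event layout (`Records[0].s3.bucket.name` and `Records[0].s3.object.key`) follows the standard S3 event notification format.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, List, Sequence, Tuple
from urllib.parse import unquote_plus


def parse_s3_event(event: dict) -> Tuple[str, str]:
    """Extract the bucket name and object key from an S3 event notification."""
    record = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded (for example, spaces become '+').
    return record["bucket"]["name"], unquote_plus(record["object"]["key"])


def supervise(
    event: dict,
    command: Sequence[str],
    download: Callable[[str, str, str], None],
    upload: Callable[[str, str, str], None],
) -> List[str]:
    """Stage the input in, run the command, and stage the outputs back out."""
    bucket, key = parse_s3_event(event)
    with tempfile.TemporaryDirectory() as workdir:
        input_path = os.path.join(workdir, os.path.basename(key))
        output_dir = os.path.join(workdir, "output")
        os.makedirs(output_dir)
        download(bucket, key, input_path)  # stage in

        # Pass environment variables down to the executed command.
        env = dict(os.environ, INPUT_FILE=input_path, OUTPUT_DIR=output_dir)
        result = subprocess.run(command, env=env, capture_output=True, text=True)

        # Inside Lambda, anything written to stdout ends up in CloudWatch Logs.
        print(result.stdout, end="")

        uploaded = []
        for name in sorted(os.listdir(output_dir)):  # stage out
            out_key = f"output/{name}"
            upload(os.path.join(output_dir, name), bucket, out_key)
            uploaded.append(out_key)
        return uploaded


if __name__ == "__main__":
    def fake_download(bucket: str, key: str, path: str) -> None:
        with open(path, "w") as handle:
            handle.write("example input")

    def fake_upload(path: str, bucket: str, key: str) -> None:
        print(f"would upload {path} to s3://{bucket}/{key}")

    demo_event = {"Records": [{"s3": {"bucket": {"name": "demo-bucket"},
                                      "object": {"key": "input/image.png"}}}]}
    # Stand-in for the container run: copy the input into the output folder.
    copy_cmd = [sys.executable, "-c",
                "import os, shutil; shutil.copy(os.environ['INPUT_FILE'], "
                "os.path.join(os.environ['OUTPUT_DIR'], 'result.png'))"]
    print(supervise(demo_event, copy_cmd, fake_download, fake_upload))
```

Injecting the transfer functions keeps the sketch runnable without AWS credentials. In a real deployment they would wrap an S3 client, and a non-zero exit code from the command would need explicit handling.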
1. Using the SCAR client, the user creates the Lambda function, specifying the Docker image and the script.
2. Using the SCAR client, the user invokes the Lambda function by specifying the S3 bucket and a folder with files.
3. The SCAR client automatically performs the invocations of the Lambda function.
4/5. The SCAR supervisor deployed inside the Lambda function retrieves the input file from the S3 bucket.
6. Each Lambda invocation, through the SCAR supervisor, executes the container and runs the specified script.
7. The last step of the execution consists of transferring the output files from the Lambda function to the S3 bucket.
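The fan-out in steps 2 and 3 can be sketched as one invocation per input file. The assumptions here are that each file maps to exactly one invocation and that a thread pool is used on the client side; the slides do not say how the SCAR client issues its calls. `invoke_for_each_file` and `fake_invoke` are hypothetical names, and `fake_invoke` stands in for a real Lambda invocation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def invoke_for_each_file(
    keys: Iterable[str],
    invoke: Callable[[str], str],
    max_workers: int = 32,
) -> Dict[str, str]:
    """Call `invoke` once per object key, in parallel, and collect the results."""
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input keys.
        results = list(pool.map(invoke, key_list))
    return dict(zip(key_list, results))


if __name__ == "__main__":
    def fake_invoke(key: str) -> str:
        # Stand-in for a call that triggers the Lambda function for one file.
        return f"processed:{key}"

    files = [f"input/image_{i}.png" for i in range(5)]
    for key, outcome in invoke_for_each_file(files, fake_invoke).items():
        print(key, "->", outcome)
```

Because every file is handled by its own invocation, throughput is bounded by the provider's concurrency limits rather than by the cores of a single machine.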
Key Contributions
The authors' proposed model in combination with the SCAR middleware...
1. Makes deploying legacy applications easier because it removes the requirement to rewrite code to fit the FaaS model
2. Allows a greater range of applications to take advantage of cloud resources
3. Simplifies the process enough so that average users can utilize it
Testing Approach
Image processing workload with (1, 10, 100, 1000) images
Video analysis workload with (1, 10, 100, 1000) keyframes
Remedy cold start by warming up containers
Collect execution times for each experiment
Perform a cost analysis of the 1000-image processing workload
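A timing harness for this kind of experiment could look like the sketch below. It assumes that a single discarded warm-up call is enough to avoid cold starts, which is a simplification; the slides do not describe how the authors warmed up the containers. `time_batches` and `run_batch` are hypothetical names, and `run_batch(n)` is expected to process n items and return when all are done.

```python
import time
from typing import Callable, Dict, Sequence


def time_batches(
    run_batch: Callable[[int], None],
    batch_sizes: Sequence[int] = (1, 10, 100, 1000),
    warm_up: bool = True,
) -> Dict[int, float]:
    """Measure the wall-clock time in seconds for each batch size."""
    if warm_up:
        # Discarded call so that the timed runs do not include a cold start.
        run_batch(1)
    timings: Dict[int, float] = {}
    for size in batch_sizes:
        start = time.perf_counter()
        run_batch(size)
        timings[size] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    def simulated_batch(size: int) -> None:
        # Stand-in workload: a tiny fixed delay per item.
        time.sleep(0.0001 * size)

    for size, seconds in time_batches(simulated_batch).items():
        print(f"{size:>5} items: {seconds:.4f} s")
```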
Testing Environments
Hardware             CPU                        RAM
Local PC             4                          8 GB
EC2 (c5.large)       2 (virtual)                4 GB
EC2 (c5.18xlarge)    72 (virtual)               144 GB
Lambda               2 (virtual) per instance   3008 MB
Image Processing Results
Video Analysis Results
Cost Comparison
Evaluation
c5.18xlarge yields solid performance due to the number of cores available, but Lambda wins out on batches of 100 or more
For this 1000 image job, Lambda was:
● ~721x faster than c5.large and ~20x faster than c5.18xlarge
● ~2.86x more expensive than a c5.large and ~2.89x more expensive than a c5.18xlarge
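One way to weigh these two ratios against each other is to divide the speedup by the cost increase. This "speedup per unit of extra cost" figure is an illustrative reading of the numbers above, not a metric reported in the paper, and `speedup_per_cost` is a hypothetical helper.

```python
from typing import Dict, Tuple


def speedup_per_cost(speedup: float, cost_ratio: float) -> float:
    """Return how much speedup is gained per unit of relative cost."""
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return speedup / cost_ratio


# (speedup of Lambda, cost of Lambda relative to the instance), from the slide.
COMPARISONS: Dict[str, Tuple[float, float]] = {
    "c5.large": (721.0, 2.86),
    "c5.18xlarge": (20.0, 2.89),
}

if __name__ == "__main__":
    for instance, (speedup, cost_ratio) in COMPARISONS.items():
        ratio = speedup_per_cost(speedup, cost_ratio)
        print(f"Lambda vs {instance}: {ratio:.1f}x speedup per 1x of cost")
```

On this reading, paying roughly 2.9x more buys a far larger gain against the c5.large than against the c5.18xlarge, which is consistent with the slide's point that the large instance already performs well thanks to its core count.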
Conclusions
AWS Lambda offers easy parallelism in combination with the SCAR middleware which traditional IaaS can’t match
The AWS Lambda approach is more expensive than EC2 instances, but is easier to configure and launch, especially when used in combination with SCAR
The 1000-image job required none of the additional configuration or services that EC2 would have needed, so the higher cost can be seen as a tradeoff
Critique: Strengths
The work required to set up Lambdas, buckets, and API Gateway endpoints is very condensed
Usage of containers allows users to create customized runtime environments, effectively removing the limitation imposed by cloud providers
The immediate access to huge amounts of scalability, the performance increase, and relatively low cost increases are compelling
Critique: Weaknesses
Made mention of some open-source frameworks but never detailed their use
Subject to the memory/execution time restrictions of the FaaS provider
If the containers for the services were made available, the test conditions would be reproducible
No quantitative analysis done to assess the ease of use
Critique: Evaluation
The paper does a decent job outlining the usage of the proposed model and middleware, but could have a better analysis of its effectiveness
The paper has no test case showcasing the ability to run an application that cannot be repurposed into lambda, which the paper does address as a problem it aims to solve
Identify GAPS
Doesn’t address vendor lock-in
Limited to applications in containers
Requires familiarity with parallel computing
Questions?