![Page 1: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/1.jpg)
Serverless Platforms
Diptanu Choudhury@diptanu
1
QCon New York
![Page 2: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/2.jpg)
2
![Page 3: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/3.jpg)
The GOES Mission
A lineage of geostationary satellites
Current mission is called GOES-R
ABI views earth with 16 different spectral bands
![Page 4: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/4.jpg)
GOES-R Products
• Atmospheric events
• Fire control
• Vegetation Monitoring
• NASA makes most of the data available to research community
![Page 5: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/5.jpg)
5Product Development Workflow
![Page 6: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/6.jpg)
6Product Development Workflow
Iterative Experimentationbased Model Development
Deployment of Models in Production and Image Processing
Data serving and
Collaboration APIs
![Page 7: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/7.jpg)
7The organization
Science Engineering
• Plans mission objectives
• Develop models and products
• Conducts research and drives collaboration
• Operates infrastructure to run models
• Design systems to scale to petabytes of data and 10sof 1000s of cores.
• Design real time services for services to consume data
• Dev-Infra to improve productivity of researchers
![Page 8: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/8.jpg)
Infrastructure
• Cluster Schedulers that can scale to 10s of 1000s of cores
• Storage services that can scale to petabytes of data
• Interactive batch processing systems
• Services to stream real time information
• APIs to access data from various transformation jobs
![Page 9: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/9.jpg)
9
Job Management System
Authorization/Authentication
Service
Identity Management
Telemetry System
Compute Node
Compute Node
Object Store
Jobs
![Page 10: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/10.jpg)
10
![Page 11: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/11.jpg)
Infrastructure Stack of Circa 2017
Locking and Co-ordination Management
Cluster Schedulers
Block Storage Service Discovery
Telemetry Services
Object Store
MicroservicesBatch Processing
LBaaS
Identity Management
Tracing Services
Applications
The Platform
![Page 12: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/12.jpg)
Platforms as a Service
Locking and Co-ordination Management
Cluster Schedulers
Block Storage Service Discovery
Telemetry Services
Object Store
MicroservicesBatch Processing
LBaaS
Identity Management
Tracing Services
Applications
Release Management
Build Infrastructure
Configuration Management
![Page 13: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/13.jpg)
13
Serverless Architecture
![Page 14: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/14.jpg)
Serverless Architecture
Function ControllerRequests
Application Logic
Application Container
Payload
![Page 15: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/15.jpg)
Serverless Architecture
• Incremental evolution over PAAS
• Serverless from the perspective of users
• Clear separation of concerns between infrastructure engineering and application developers
![Page 16: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/16.jpg)
Serverless Platform
• Optimistically concurrent Cluster Scheduler
• Application Container
• Function Controller
• RPC Middleware
• Function Invocation shims in data sources
• Packaging and distribution service
![Page 17: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/17.jpg)
Cluster Scheduler
• Low latency and high throughput scheduling
• No head of line blocking
• Optimistically concurrent
• Scale up to a large number of containers
![Page 18: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/18.jpg)
18
![Page 19: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/19.jpg)
19
THE PATH TO PRODUCTION FOR A SCHEDULER IS FILLED WITH PAIN
CHOOSE AN EXISTING SCHEDULER IF YOU CAN
![Page 20: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/20.jpg)
Event Driven Scheduler
Evaluations ~= State Change Event
![Page 21: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/21.jpg)
Data Model
![Page 22: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/22.jpg)
22
External Event
Evaluation Creation
Evaluation Queuing
Evaluation Processing
Optimistic Coordination
State Updates
![Page 23: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/23.jpg)
Templated Jobs
• Can be used for low throughput use cases
• Expensive to start new processes
• Bounded by network operations of scheduler
![Page 24: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/24.jpg)
Variability
• Sharing resources
• Daemons
• Global resources
• Maintainance activities
• Queuing
• Power Limits
![Page 25: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/25.jpg)
25Source: Tale at Scale
![Page 26: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/26.jpg)
Reduce Variability
• Differentiate service classes
• Break down long running functions into small ones.
• Minimize background activity
![Page 27: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/27.jpg)
Application Containers
• Application containers range from language sandboxes to full blown containers
• Reduced surface area helps in making the UX better
• Application containers have function invocation middleware which are packaged during build
![Page 28: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/28.jpg)
Function Controller
• Maintains the network topology of functions
• Optimizes for low latency and not cost
• Works with the scheduler to scale containers
• Works with telemetry system to scale underlying clusters
![Page 29: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/29.jpg)
Software Load Balancer
• Service discovery aware LB
• Reactive Socket protocol for delivery of request payloads to function containers
• Doesn't effect container life cycles
• Drops requests which doesn't have any containers associated
![Page 30: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/30.jpg)
Reactive Sockets
• RSocket is a means for asynchronous communication between functions and data sources
• Messages are multiplexed over a single connection
• Flexible interaction models
• Provides mechanisms for back pressure and request cancellation
![Page 31: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/31.jpg)
RPC Middleware
• RPC implementation moved into the platform
• Hedged requests to reduce tail latency
• Tied requests to reduce mean and tail latency
• Heavy use of back pressure
![Page 32: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/32.jpg)
Cluster Topology
• Scheduler chooses the cluster topology for containers
• Function Controller can change the topology based on real time latency
• Functions which are chained needs to be placed as close as possible
![Page 33: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/33.jpg)
Caching
• Various forms of caching
• Depends on life cycle management of functions
• External caches like Redis, Memcached works best
• Hard to use in-memory clustered caches like Groupcache
![Page 34: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/34.jpg)
External Data Sources
• Functions can read from external data sources just like traditional applications
• Avoid polling external data sources
• Data Sources like Kafka could co-operate with functions by a shim which calls functions with data mutations
![Page 35: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/35.jpg)
Build and Distribution
• Packaging is part of the server less experience
• Packaging pipeline should transform a function to an application container.
• Unit Testing should be much easier with Functions than they were for traditional applications
• Performance of the platform can be improved by caching containers on compute nodes
![Page 36: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/36.jpg)
Operations in the world of Serverless
• Reduced amount of operations for developers
• Metrics related to various concerns of the platform should be very targetted.
![Page 37: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/37.jpg)
Application metrics
• Latency at the API Gateway level
• Throughput of events processed
• Latency distribution and planning.
![Page 38: Serverless Platforms - QCon New York...Infrastructure • Cluster Schedulers that can scale to 10s of 1000s of cores • Storage services that can scale to petabytes of data • Interactive](https://reader033.vdocument.in/reader033/viewer/2022042218/5ec462b5754feb7aad16f837/html5/thumbnails/38.jpg)
Platform Metrics
• Latency of function invocation
• Throughput of events dispatched to the platform
• Number of active functions