![Page 1: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/1.jpg)
Building Serverless Data Infrastructure in the AWS Cloud
Ryan Plant (@ryan_plant)
November 10, 2017
![Page 2: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/2.jpg)
![Page 3: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/3.jpg)
![Page 4: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/4.jpg)
WHAT WE’LL COVER
The New Data Economy
Reference Architecture
Using the AWS Cloud
![Page 5: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/5.jpg)
The world’s most valuable resource is no longer oil, but data…
May 6th, 2017
![Page 6: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/6.jpg)
Data => Revenue (but extraction, refinement, packaging, and distribution needed)
![Page 7: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/7.jpg)
DW
Traditional Data Warehousing
Volume, variety, and velocity…
Advanced analytics…
Artificial intelligence…
“What got us here won’t (entirely) get us there…”
Mostly proprietary…
Costly and complex to scale…
![Page 8: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/8.jpg)
Next Generation Data Infrastructure
(i.e. the “data lake”)
![Page 9: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/9.jpg)
James “Data Lake” Dixon
![Page 10: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/10.jpg)
If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state…
![Page 11: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/11.jpg)
From Data Warehouses to Lakes
A data pond, lake, or ocean is not a product; it’s an architecture… (and architecture is a principled and pattern-oriented approach to building systems)
Any and all data…
Any source and format…
Any time…
![Page 12: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/12.jpg)
![Page 13: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/13.jpg)
APPS & SOURCES
STORAGE AND PROCESSING LAYER: Ingestion, Storage, Catalog, Processing, Analytics & Artificial Intelligence
SERVING LAYER: Models & Marts, API, Search
DATA OPS: Security, Config, Telemetry, Cost Mgmt
![Page 14: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/14.jpg)
DATA OPS: Security, Config, Telemetry, Cost Mgmt
SERVING LAYER: Models & Marts, API, Search
APPS & SOURCES
STORAGE AND PROCESSING LAYER: Ingestion, Storage, Catalog, Processing, Analytics & Artificial Intelligence
![Page 15: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/15.jpg)
Data Ingestion Pipelines
Sources: SERVICES, MONOLITHS
Ingestion paths:
• Change Data Capture (CDC)
• STREAMS
• MESSAGING
• FILE EXTRACTS
Write modes: append, PUT
STORAGE: source data aggregated, stored indefinitely; many supported formats
Security: segregation & encryption
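The two write modes on this slide can be sketched without any cloud service. The example below is only an illustration: it uses a local directory as a stand-in for object storage, and the names `append_record`, `put_extract`, and the `{source}-raw` directory naming are assumptions made for the sketch, not something the slides prescribe.

```python
import json
import tempfile
from pathlib import Path


def append_record(root: Path, source: str, record: dict) -> Path:
    """Append one record as a JSON line (stream, message, or CDC style write)."""
    path = root / f"{source}-raw" / "records.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path


def put_extract(root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store a whole file extract in one write (PUT style)."""
    path = root / f"{source}-raw" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return path


def demo() -> list:
    """Write two change events and one file extract, then read the events back."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        append_record(root, "orders", {"id": 1, "op": "insert"})
        path = append_record(root, "orders", {"id": 1, "op": "update"})
        put_extract(root, "billing", "2017-11-10.csv", b"id,amount\n1,9.99\n")
        return path.read_text(encoding="utf-8").splitlines()
```

Nothing is overwritten on the append path, which matches the "stored indefinitely" note: both the insert and the later update for the same id stay in the raw area.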
![Page 16: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/16.jpg)
Storage and Catalog
Ingestion (data) into STORAGE: RAW, REFINED
Catalog
• Register source and schema
• Data attribute inventory
• Relationships and dependencies
• Etc…
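As a rough sketch of the three catalog duties listed above, the following in-memory catalog registers datasets with a schema, builds an attribute inventory, and walks dependencies. The class and method names (`DatasetEntry`, `Catalog`, `attributes`, `lineage`) are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    source: str
    schema: dict  # attribute name -> type name
    depends_on: list = field(default_factory=list)


class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        """Register source and schema; upstream datasets must already exist."""
        for parent in entry.depends_on:
            if parent not in self._entries:
                raise KeyError(f"unknown upstream dataset: {parent}")
        self._entries[entry.name] = entry

    def attributes(self):
        """Attribute inventory: attribute name -> sorted dataset names carrying it."""
        inventory = {}
        for entry in self._entries.values():
            for attr in entry.schema:
                inventory.setdefault(attr, []).append(entry.name)
        return {attr: sorted(names) for attr, names in inventory.items()}

    def lineage(self, name):
        """All upstream datasets of `name`, nearest first."""
        seen = []
        queue = list(self._entries[name].depends_on)
        while queue:
            parent = queue.pop(0)
            if parent not in seen:
                seen.append(parent)
                queue.extend(self._entries[parent].depends_on)
        return seen
```

A raw dataset would be registered first, and each refined dataset would name it in `depends_on`, so the catalog can answer which datasets carry a given attribute and where a refined view came from.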
![Page 17: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/17.jpg)
Raw to Refined Processing Pipelines
Diagram labels: Ingestion (data), STORAGE (RAW, REFINED), Catalog, Processing Pipelines, ALL DATA, C1 C2 C3 C..n, X Y Z
• Preserve RAW data; enrich only
• Apply transforms to create new, REFINED datasets (e.g. customer partitioned views)
• Catalog new datasets
• Enable new use cases:
  • Reporting/Analytical views
  • Machine/Deep Learning
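A minimal sketch of the "preserve RAW, create REFINED" rule, assuming each raw record is a dict with `customer`, `quantity`, and `unit_price` fields (these field names and the derived `total` are assumptions for illustration only):

```python
from collections import defaultdict


def refine_by_customer(raw_records):
    """Build customer-partitioned views without modifying the raw records."""
    views = defaultdict(list)
    for record in raw_records:
        enriched = dict(record)  # copy, so the RAW record stays untouched
        enriched["total"] = record["quantity"] * record["unit_price"]
        views[record["customer"]].append(enriched)
    return dict(views)
```

The transform returns new datasets keyed by customer; the input list is left exactly as it was ingested, so any later pipeline can still start from the same raw data.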
![Page 18: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/18.jpg)
Analytics and AI
Diagram labels: Ingestion (data), STORAGE (RAW, REFINED), Catalog, Processing Pipelines, ALL DATA, C1 C2 C3 C..n, X Y Z, Analytics and Artificial Intelligence
![Page 19: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/19.jpg)
DATA OPS: Security, Config, Telemetry, Cost Mgmt
APPS & SOURCES
STORAGE AND PROCESSING LAYER: Ingestion, Storage, Catalog, Processing, Analytics & Artificial Intelligence
SERVING LAYER: Models & Marts, API, Search
![Page 20: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/20.jpg)
Curation and Serving
Diagram labels: Ingestion (data), STORAGE (RAW, REFINED), Catalog, Processing Pipelines, ALL DATA, C1 C2 C3 C..n, X Y Z, Analytics and Artificial Intelligence, Models and Marts, Search
![Page 21: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/21.jpg)
Diagram labels: Ingestion (data), STORAGE (RAW, REFINED), Catalog, Processing Pipelines, ALL DATA, C1 C2 C3 C..n, X Y Z, Analytics and Artificial Intelligence, Models and Marts, Search, API
![Page 22: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/22.jpg)
![Page 23: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/23.jpg)
![Page 24: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/24.jpg)
Lots of software, hardware, etc.
![Page 25: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/25.jpg)
TRADITIONAL INVESTMENT IN NEXT GENERATION DATA
![Page 26: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/26.jpg)
CAPITAL AND RISK BARRIERS
acquire/write and maintain software
procure, install, and maintain hardware
get commercial real estate license
![Page 27: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/27.jpg)
![Page 28: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/28.jpg)
PUBLIC CLOUD ECONOMIES OF SCALE
![Page 29: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/29.jpg)
CLOUD OPTIMIZATION
Infrastructure as a Service
Someone else’s hardware and real estate
Your software, your (virtual) servers
Platform as a Service
Someone else’s software, servers, hardware and real estate
Your custom application software
Software as a Service
Someone else’s application software, you provide the data
(everything else doesn’t matter)
Cycle Time, Capital Optimization, Differentiation Focus
High
Higher
Highest
![Page 30: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/30.jpg)
Go Serverless! (as much as possible)
![Page 31: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/31.jpg)
Principles for event-driven, reactive data infrastructure primed for serverless architectures
• everything is an event: messages, log entries, file I/Os, clock alarms, etc.
• listen for events: trigger a handler with an event
• stateless event handling: avoid state, persist as event source, handoff as soon as possible
• automation through orchestration and coordination
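These principles can be sketched in a few lines of plain Python. The `EventBus` class, the event type names, and the two handlers below are hypothetical; the point is only that handlers keep no state, every event is persisted to an append-only log, and each handler hands off by returning follow-up events.

```python
from collections import defaultdict


class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)
        self.log = []  # append-only record of every event (the event source)

    def listen(self, event_type, handler):
        """Register a handler to be triggered by events of `event_type`."""
        self._handlers[event_type].append(handler)

    def publish(self, event):
        """Persist the event, then trigger handlers until no follow-ups remain."""
        self.log.append(event)
        pending = [event]
        while pending:
            current = pending.pop(0)
            for handler in self._handlers[current["type"]]:
                # Handlers are stateless: event in, follow-up events out.
                for follow_up in handler(current):
                    self.log.append(follow_up)
                    pending.append(follow_up)


def on_file_created(event):
    return [{"type": "file.validated", "key": event["key"]}]


def on_file_validated(event):
    refined = event["key"].replace("-raw", "-refined")
    return [{"type": "file.refined", "key": refined}]
```

Chaining handlers through events rather than direct calls is what lets an orchestrator, rather than the handlers themselves, own the overall flow.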
![Page 32: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/32.jpg)
Ingestion, Storage
Event triggers: SQS, SNS, Kinesis, DynamoDB/RDS (or to S3 direct)
Event handlers (AWS Lambda):
• y = f (x)
• y = f (x, y)
• y = f ([x, y])
AWS Step Functions (coordinated state)
AWS S3 (ready):
• /{source}-raw/{key}/YYYY-MM-DD
• /{source}-refined/{key}/YYYY-MM-DD
lifecycle policies: AWS Glacier (archival)
KMS (encryption)
IAM + Directory (access control)
CloudWatch/Trail
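A hedged sketch of the raw and refined key layout together with a Lambda-style handler of the y = f (x) kind. It assumes `{source}-raw` is a key prefix (it could equally be a bucket name), drops the leading slash, and reads only the documented `Records[].s3.object.key` field of an S3 event notification. The function names are made up for this example, and the handler only computes target keys; it does not call S3.

```python
from datetime import date
from urllib.parse import unquote_plus


def raw_key(source, key, day):
    """Build a key following {source}-raw/{key}/YYYY-MM-DD."""
    return f"{source}-raw/{key}/{day.isoformat()}"


def refined_key(raw):
    """Map a raw key to its counterpart under {source}-refined/."""
    source_part, rest = raw.split("/", 1)
    if not source_part.endswith("-raw"):
        raise ValueError(f"not a raw key: {raw}")
    return f"{source_part[:-len('-raw')]}-refined/{rest}"


def on_object_created(event, context):
    """Lambda-style entry point: no state is kept between invocations."""
    targets = []
    for record in event["Records"]:
        # S3 event notifications URL-encode object keys.
        key = unquote_plus(record["s3"]["object"]["key"])
        targets.append(refined_key(key))
    return targets
```

In a real deployment the handler would go on to read the raw object and write the refined one; that part is left out here because it depends on the transform.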
![Page 33: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/33.jpg)
AWS Glue (serverless ETL/ELT)
Catalog: source crawlers, classifiers, metadata
Processing Pipelines: trigger, jobs and job runner, doSomething(…) {…}
Diagram labels: Sources, Ingestion, Storage, To Targets
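One way to start a Glue job from code is the boto3 Glue client's `start_job_run` call. The sketch below takes the client as a parameter so it can be exercised without AWS; the job name, the argument names `--source_path` and `--target_path`, and the `RecordingGlue` stand-in are all assumptions for illustration.

```python
class RecordingGlue:
    """Stand-in exposing the one client method used below, for dry runs."""

    def __init__(self):
        self.calls = []

    def start_job_run(self, **kwargs):
        self.calls.append(kwargs)
        return {"JobRunId": f"jr_{len(self.calls)}"}


def run_refine_job(glue, job_name, source_path, target_path):
    """Start one run of an existing Glue job and return its run id."""
    response = glue.start_job_run(
        JobName=job_name,
        # Glue job arguments are passed as "--name": "value" pairs.
        Arguments={"--source_path": source_path, "--target_path": target_path},
    )
    return response["JobRunId"]
```

Against a real account the first argument would be `boto3.client("glue")`, and the named job would have to exist already.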
![Page 34: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/34.jpg)
Processing Pipelines
AWS Glue (serverless ETL/ELT)
AWS EMR (Managed Hadoop)
Streaming: Kinesis
Batch: AWS Batch
Diagram labels: Sources & Targets, Ingestion, Storage, Catalog, Serving Layer
![Page 35: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/35.jpg)
Serving Layer
AWS ElasticSearch (managed ES)
AWS RedShift Spectrum (Parallel DW)
AWS Athena (Ad-hoc Query)
AWS Glue (serverless ETL/ELT)
Diagram labels: Sources, Ingestion, Storage, Catalog, Processing Pipelines
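An ad-hoc query over refined data could be submitted through the boto3 Athena client's `start_query_execution` call. This is a sketch under assumptions: the table, the `customer`, `total`, and `dt` columns, and the `RecordingAthena` stand-in are invented for the example, and the client is injected so the code runs without AWS.

```python
import re
from datetime import date

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def revenue_query(table, day):
    """Build a per-customer revenue query for one day (dt assumed to be a DATE column)."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return (
        "SELECT customer, SUM(total) AS revenue "
        f"FROM {table} WHERE dt = DATE '{day.isoformat()}' "
        "GROUP BY customer"
    )


def submit(athena, database, sql, output_location):
    """Submit the query; Athena writes results under output_location."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


class RecordingAthena:
    """Stand-in exposing the one client method used above, for dry runs."""

    def __init__(self):
        self.calls = []

    def start_query_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"QueryExecutionId": f"q-{len(self.calls)}"}
```

With real credentials the first argument to `submit` would be `boto3.client("athena")`; the call returns immediately and the query itself runs asynchronously.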
![Page 36: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/36.jpg)
Serving Layer
AWS API Gateway (serverless APIs)
AWS QuickSight (visualization)
AWS Cognito (Web/Mobile Identity and SSO)
Diagram labels: Sources, Ingestion, Storage, Catalog, Processing Pipelines
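A serverless API in front of a data mart can be as small as one Lambda function behind API Gateway. The sketch below assumes the Lambda proxy integration, where the function receives `queryStringParameters` and returns `statusCode`, `headers`, and a string `body`. The in-memory `MART` dictionary and the function name `serve_mart` are placeholders for whatever store and naming a real deployment would use.

```python
import json

# Placeholder for a curated mart; a real handler would query a data store.
MART = {
    "C1": {"customer": "C1", "revenue": 13},
    "C2": {"customer": "C2", "revenue": 7},
}

JSON_HEADERS = {"Content-Type": "application/json"}


def serve_mart(event, context):
    """Return one customer's mart row, or 404 if the customer is unknown."""
    # API Gateway sends null when the request has no query string.
    params = event.get("queryStringParameters") or {}
    customer = params.get("customer")
    if customer not in MART:
        return {
            "statusCode": 404,
            "headers": JSON_HEADERS,
            "body": json.dumps({"error": "unknown customer"}),
        }
    return {
        "statusCode": 200,
        "headers": JSON_HEADERS,
        "body": json.dumps(MART[customer]),
    }
```

The handler keeps no state of its own, so it fits the event-driven principles from earlier in the deck.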
![Page 37: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/37.jpg)
![Page 38: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/38.jpg)
![Page 39: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/39.jpg)
CLOUD OPTIMIZATION
Infrastructure as a Service
Someone else’s hardware and real estate
Your software, your (virtual) servers
Platform as a Service
Someone else’s software, servers, hardware and real estate
Your custom application software
Software as a Service
Someone else’s application software, you provide the data
(everything else doesn’t matter)
You are likely here…
Aim here…
TBD
Opportunity!
Public Cloud R&D Investment
![Page 40: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/40.jpg)
SERVERLESS: USE CAUTION
The floor is wet (and is constantly getting mopped!)
The edges are sharp:
• Development, Test, Debug tools and experience
• Configuration and Deployment challenges
• Variable, non-deterministic performance
Extremely new (but inevitable) paradigm…
![Page 41: Building Serverless Data Infrastructure in the AWS Cloud](https://reader034.vdocument.in/reader034/viewer/2022051521/5a6478fc7f8b9a40568b4651/html5/thumbnails/41.jpg)