Experiences with Serverless Big Data
TRANSCRIPT
Munich, 17.10.16
Markus Schmidberger, Head of Data Service
Experiences with Serverless Big Data
AWS Meetup – Munich 2016
glomex – A company of ProSiebenSat.1 Media SE
Key Components of our Data Service
• Content Discovery: Find the most relevant content for our customers and their users.
• Real-Time Monitoring: Enable our development teams to serve our content to our users in the best quality possible.
• Analytics: Provide our teams access to the data to enable data-driven development of new features and products.
Micro-Service Architecture
Figure: Data Platform - MicroService Layout
Stages: INGEST, STORE, PROCESS & ANALYSE, VISUALIZE & SERVE
Components shown:
• AdProxy Log Import Service
• Player Feedback Import Service
• VAS Log Import Service
• CDN Log Import Service
• External Data Import Service
• Content Import Service
• Metadata Service
• Data Management Service
• Data Quality Service
• Data Platform Monitoring Service
• Technical Monitoring Service
• Dev / Ops Analytics Service
• Data Science Analytics Service
• KPI & Analytics Service
• Content Discovery Service
• Data Layer, Data API, Data Lake, Content API
• Data Platform Access, Portal, Team, Real-Time Dashboards, Data Science UI
Other labels: CDN files, data stream, other modules
Lambda Architecture
Batch Processing
• KPIs for MES
• MES Billing
Data Science
• CDN Bills
• Data Insights
Real-time
• Real-time player monitoring
• Internal dashboard
Lambda Architecture
Graphic provided by http://lambda-architecture.net
≠ AWS Lambda
AWS Lambda
Flow: Amazon S3 (new object uploaded) → Notification → AWS Lambda processes the object → Amazon S3, Amazon DynamoDB
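The flow on this slide (an upload to Amazon S3 triggers a notification, and AWS Lambda processes the object) can be sketched as a small handler. This is a minimal sketch, not the code used in the talk: `extract_s3_objects` and `process_object` are hypothetical names, and the actual processing and the write to S3 or DynamoDB are left as a placeholder.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification (spaces as '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def process_object(bucket, key):
    """Hypothetical placeholder for the real work (parse, transform, store)."""
    return {"bucket": bucket, "key": key, "status": "processed"}


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda for each S3 notification."""
    return [process_object(b, k) for b, k in extract_s3_objects(event)]
```

Keeping the event parsing in its own function makes it possible to test the handler locally with a hand-written event dictionary.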
AWS Lambda Execution
Be serverless for ETL
Be serverless and serve data
AWS Lambda, AWS Lambda, Amazon API Gateway, Amazon S3
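Serving data through Amazon API Gateway with a Lambda function behind it could look roughly like the sketch below. It assumes the Lambda proxy integration, where the function returns a dictionary with `statusCode`, `headers` and `body`. The `load_kpis` helper, the `publisher_id` path parameter and the sample values are hypothetical and only stand in for a lookup of precomputed results (for example from Amazon S3).

```python
import json


def load_kpis(publisher_id):
    """Hypothetical stand-in for reading precomputed results (e.g. from Amazon S3)."""
    # Illustrative values only, not figures from the talk.
    sample = {"demo": {"plays": 1200, "ad_impressions": 800}}
    return sample.get(publisher_id)


def lambda_handler(event, context):
    """Answer an API Gateway (Lambda proxy integration) request with JSON."""
    params = event.get("pathParameters") or {}
    data = load_kpis(params.get("publisher_id"))
    if data is None:
        status, body = 404, {"error": "unknown publisher"}
    else:
        status, body = 200, data
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```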
Be serverless for Recommendations
Recommendation Pipeline
Publisher’s URL, Recommender System
Search, Download, Page content
Content, Playlist
Be serverless everywhere
• Read data from Kinesis Firehose / S3
• Server downtime / scheduler
• Load to Elasticsearch
• Clean Elasticsearch and Redshift
• Advanced Redshift monitoring
• EBS snapshots
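Housekeeping tasks such as cleaning up Elasticsearch lend themselves to a scheduled Lambda function. The sketch below only shows the selection logic and makes several assumptions: indices are created daily with a hypothetical name pattern `<prefix>YYYY.MM.DD`, the list of index names is passed in via the event, and the actual delete call is omitted.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days=30, prefix="clickstream-"):
    """Return names of daily indices older than keep_days.

    Assumes a hypothetical naming scheme '<prefix>YYYY.MM.DD'.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # skip names that do not carry a date suffix
        if day < cutoff:
            expired.append(name)
    return expired


def lambda_handler(event, context):
    """Scheduled entry point; the actual delete call is left out on purpose."""
    names = event.get("indices", [])
    return {"to_delete": expired_indices(names, date.today())}
```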
Agile DevOps
Continuous Delivery of Micro-services – "Automate all the things"
Agile Cloud Deployment
Glomex Cloud Deployment Tools
Kumo: AWS CloudFormation
Ramuda: AWS Lambda
Yugen: AWS API Gateway
Tenkai: AWS CodeDeploy
• In cooperation with Operations Team
• Used by other teams
• Simplify cloud deployments drastically
• Slack and monitoring integration
Agile DevOps
Cross-functional team responsible for pushing components to production themselves
AWS Lambda Limits
AWS Lambda timeout: 5 min
AWS Lambda temp disk: 512 MB
• How to process an 800 MB gzipped log file?
• How to split compressed gzip files?
• Splitter using Amazon SQS and Amazon EC2 Spot Instances
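A gzip stream generally has to be decompressed from the start, so a large file cannot simply be cut at byte offsets. One way to split it, sketched below under assumptions, is to stream through the file once on a machine without the Lambda limits (the slide mentions EC2 Spot Instances) and write line-based chunks that are each small enough for a Lambda function. The function name, chunk naming and chunk size are hypothetical, and the Amazon SQS wiring is not shown.

```python
import gzip
import os


def split_gzip(src_path, out_dir, lines_per_chunk=1_000_000):
    """Stream a gzip file and write it back as smaller gzip chunks."""
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    try:
        with gzip.open(src_path, "rb") as src:
            for line_no, line in enumerate(src):
                # Start a new chunk file every lines_per_chunk lines.
                if line_no % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, "chunk-%05d.gz" % len(chunk_paths))
                    chunk_paths.append(path)
                    out = gzip.open(path, "wb")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Because the input is read line by line, memory use stays small regardless of the file size; only the chunk size needs to be tuned to the Lambda timeout and temp disk limits.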
More AWS Lambda limits
• Lambda function deployment package size (.zip/.jar file): 50 MB
• Total size of all the deployment packages that can be uploaded per region: 75 GB
• CreateLogGroup: 500 log groups / account / region
• Lambda functions have to be wired
• Be aware of retries: 3, not configurable
• Traceback and error output available via CloudWatch Logs: https://github.com/jorgebastida/awslogs
• No local development environment
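Since a failed invocation may be retried, the same event can reach a function more than once, and processing should tolerate that. The sketch below illustrates one common approach, remembering which events were already handled. It is an assumption about how to cope with retries, not the method from the talk; the `seen` set stands in for a durable store, because in-memory state is not shared between Lambda containers.

```python
def handle_once(event_id, work, seen):
    """Run work() only the first time event_id is seen; report duplicates.

    `seen` is a set here; a real deployment would need a durable store.
    """
    if event_id in seen:
        return "skipped"
    work()
    # Mark as done only after the work succeeded, so a failed
    # attempt is picked up again by the next retry.
    seen.add(event_id)
    return "processed"
```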
More Facts
• 20 GB per day click-stream data IN (player, vas, adproxy)
• 5 billion click-stream records processed per day
• ~100 ms data freshness to S3
• 25 GB per day as zipped CDN log files
• 300 million CDN records processed per day
• < 1 min data freshness to API
More Facts
• 600 rec/sec processing time
• 1 $ / hour cost for 25 GB/day CDN processing
• 6 parallel AWS Lambda functions
• 2.3 min average run-time of AWS Lambda
• Charts: AWS Lambda duration, Redshift CPU
• ~ 89% - 97.8% accuracy
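The CDN figures from the slides can be put into perspective with some back-of-the-envelope arithmetic. The sketch below assumes that the 1 $ / hour applies around the clock and that the records arrive evenly over the day; neither is stated on the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60

CDN_RECORDS_PER_DAY = 300_000_000  # from the slides
CDN_GB_PER_DAY = 25                # from the slides
COST_PER_HOUR_USD = 1.0            # from the slides


def average_rate(records_per_day):
    """Average records per second if the load were spread evenly over a day."""
    return records_per_day / SECONDS_PER_DAY


def cost_per_gb(cost_per_hour, gb_per_day):
    """Cost per GB, assuming the hourly cost is incurred 24 hours a day."""
    return cost_per_hour * 24 / gb_per_day


print(round(average_rate(CDN_RECORDS_PER_DAY)))                 # 3472
print(round(cost_per_gb(COST_PER_HOUR_USD, CDN_GB_PER_DAY), 2))  # 0.96
```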
Key Takeaways
Lambda Architecture
• Enrich your traditional, batch-driven BI workflow with real-time analytics
• Use Lambda Architecture as a guiding principle and adapt it to your needs
Key Takeaways
1. Analyze
2. Take Actions
3. Automate
Key Takeaways
• AWS managed services provide a robust way to run complex big data infrastructures
• Follow best practices provided by AWS and the community
• Focus on feature development and robust pipelines, not on infrastructure management
Munich, 17.10.16
Markus Schmidberger, Head of Data Service
@cloudHPC, [email protected]
We are hiring…
• Data Engineers
• Product Owner
• Big Data Product Line Manager