aws re:invent 2016| gam401 | riot games: standardizing application deployments using amazon ecs and...
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Adam Rozumek
November 28, 2016
Riot Games: Standardizing
Application Deployments Using
Amazon ECS and Terraform
GAM401
RIOT GAMES | GAM 401Standardizing Application
Deployments Using Amazon ECS
and Terraform
SYSTEMS ENGINEER
ADAM ROZUMEK
INTRODUCTIONSWHO AM I?
AMAZON ECS. TERRAFORM. VIDEO GAMES.
WHAT TO EXPECT
ECS CONSOLIDATION WINS & LESSONS
Adapting existing deployments & infrastructure
TERRAFORM ENCAPSULATION STRATS
Object Oriented Development Operations
MULTI-GAME MODULAR SERVICE DESIGN
A different kind of scaling
1
2
3
AMAZON ECS. TERRAFORM. VIDEO GAMES.
WHAT TO EXPECT
ECS CONSOLIDATION WINS & LESSONS
Adapting existing deployments & infrastructure
TERRAFORM ENCAPSULATION STRATS
Object Oriented Development Operations
MULTI-GAME MODULAR SERVICE DESIGN
A different kind of scaling
1
2
3
AMAZON ECS. TERRAFORM. VIDEO GAMES.
WHAT TO EXPECT
ECS CONSOLIDATION WINS & LESSONS
Adapting existing deployments & infrastructure
TERRAFORM ENCAPSULATION STRATS
Object Oriented Development Operations
MULTI-GAME MODULAR SERVICE DESIGN
A different kind of scaling
1
2
3
AMAZON ECS. TERRAFORM. VIDEO GAMES.
WHAT TO EXPECT
ECS CONSOLIDATION WINS & LESSONS
Adapting existing deployments & infrastructure
TERRAFORM ENCAPSULATION STRATS
Object Oriented Development Operations
MULTI-GAME MODULAR SERVICE DESIGN
A different kind of scaling
1
2
3
7.5MILLION
PEAK CONCURRENT
PLAYERS
100MILLION
MONTHLY ACTIVE
PLAYERS
MORE THAN
27MILLION
DAILY ACTIVE
PLAYERS
MORE THAN MORE THAN
2016 LEAGUE OF LEGENDS STATS
DATA PRODUCTS & SERVICESOUR MISSION
Empower teams at Riot to make timely, data-informed products by maintaining a
scalable and reliable data platform
AWS re:Invent 2015 | (GAM303) Riot Games: Migrating Mountains of Big Data to AWS
Sean Maloney
PROBLEM
PROBLEMSCALING TOTAL OWNERSHIP ON AWS
Total ownership
We want to empower developers to:
• Provision their own infrastructure
• Execute their own deployments
• Monitor their own metrics
PROBLEMSCALING TOTAL OWNERSHIP ON AWS
Resource attribution
• Who owns these EBS volumes?
• What applications depend on these security groups?
• Can these AMIs be deleted?
PROBLEMSCALING TOTAL OWNERSHIP ON AWS
Security
PROBLEMSCALING TOTAL OWNERSHIP ON AWS
Security
• Auditing is important, but it’s reactive
• Operational time sink
Security Monkey AWS Trusted Advisor
PROBLEMSCALING TOTAL OWNERSHIP ON AWS
Enforcing conventions
• Common syntax for tags
• Simplify reservations
• Standardize networks
AWS re:Invent 2014 | (GAM304) How Riot Games re:Invented Their AWS Model
Jonathan McCaffrey
Marty Chong
PROBLEMSCALING TOTAL OWNERSHIP ON AWS
Enforcing conventions
• Common syntax for tags
• Simplify reservations
• Standardize networks
AWS re:Invent 2014 | (GAM304) How Riot Games re:Invented Their AWS Model
Jonathan McCaffrey
Marty Chong
PROBLEMSCALING TOTAL OWNERSHIP ON AWS
Amazon EC2
Container Service Terraform
CONTAINERS
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
• Dockerfiles capture application dependencies
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
• Dockerfiles capture application dependencies
• Common use cases have great community support
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
• Dockerfiles capture application dependencies
• Common use cases have great community support
• Profit from our own engineering community
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
Embrace the abstraction
• Plan for failure at all levels
• Avoid manual intervention whenever possible
• It’s ephemeral all the way down
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
Scheduling is hard
We need to:
• Quickly and “fairly” run tasks
• Prevent resource conflicts
• Provide reasonable fault tolerance
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
We need AWS hardware
• Enable total ownership of the AWS resources backing our containers
• Avoid the security, resource attribution, and convention degradation pitfalls
HISTORY
E
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
COS
First iteration (2013)
• Traditional ASGs
• AMIs configured to launch a single Docker container
• Managed by Netflix’s Asgard
E
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
COS
First iteration (2013)
• Easy to adopt
• Familiar AWS concepts
E
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
COS
First iteration (2013)
• No increased resource utilization
• No elasticity improvements
CONTAINERS STANDARDIZED APPLICATION UNITS… ON AWS
Second iteration (2013)
• Third party container orchestration
• Production scale had unexpected challenges
CONTAINERS STANDARDIZED APPLICATION UNITS… ON ECS!
Third iteration: Amazon EC2 Container Service
• ECS AMI provides all necessary software
• Designed with AWS integration in mind
• Free!
AMAZON EC2 CONTAINER SERVICEKEY FEATURE TIMELINE
Nov 2014
ECS ANNOUNCEDRe:Invent 2014
AMAZON EC2 CONTAINER SERVICEKEY FEATURE TIMELINE
Nov 2014 April 2015 Dec 2015 August 2016
ECS ANNOUNCEDRe:Invent 2014
ECS GENERALLY
AVAILABLE
ECR & NEW REGIONSECS becomes available in the final
missing region we need (Frankfurt)
for our globally deployed
applications
AMAZON EC2 CONTAINER SERVICEKEY FEATURE TIMELINE
Nov 2014 April 2015 Dec 2015 May 2016 July 2016 August 2016
ECS ANNOUNCEDre:Invent 2014
ECS GENERALLY
AVAILABLE
ECR & NEW REGIONSECS becomes available in the final
missing region we need (Frankfurt)
for our globally deployed
applications
SERVICE
SCALINGAutomatic task count
scaling based on
CloudWatch metrics
introduced
AMAZON EC2 CONTAINER SERVICEKEY FEATURE TIMELINE
Nov 2014 April 2015 Dec 2015 May 2016 July 2016 August 2016
ECS ANNOUNCEDRe:Invent 2014
ECS GENERALLY
AVAILABLE
ECR & NEW REGIONSECS becomes available in the final
missing region we need (Frankfurt)
for our globally deployed
applications
SERVICE
SCALINGAutomatic task count
scaling based on
CloudWatch metrics
introduced
TASK SPECIFIC
IAM ROLESUnlocked a lot of cluster
sharing potential
AMAZON EC2 CONTAINER SERVICEKEY FEATURE TIMELINE
Nov 2014 April 2015 Dec 2015 May 2016 July 2016 August 2016
ECS ANNOUNCEDre:Invent 2014
ECS GENERALLY
AVAILABLE
ECR & NEW REGIONSECS becomes available in the final
missing region we need (Frankfurt)
for our globally deployed
applications
SERVICE
SCALINGAutomatic task count
scaling based on
CloudWatch metrics
introduced
TASK SPECIFIC
IAM ROLESUnlocked a lot of cluster
sharing potential
APPLICATION
LOAD BALANCERSSeveral key improvements
for ECS
BEYOND ECSINFRASTRUCTURE AS CODE
• At scale, orchestrating infrastructure in a consistent, reproducible way is key
BEYOND ECSINFRASTRUCTURE AS CODE
• At scale, orchestrating infrastructure in a consistent, reproducible way is key
Total ownership
TERRAFORM
TERRAFORMINFRASTRUCTURE AS OBJECT-ORIENTED CODE
• Intelligent & flexible flow control
TERRAFORMINFRASTRUCTURE AS OBJECT-ORIENTED CODE
• Intelligent & flexible flow control
• Configuration drift managed
TERRAFORMINFRASTRUCTURE AS OBJECT-ORIENTED CODE
• Intelligent & flexible flow control
• Configuration drift managed
• AWS resources documented
TERRAFORMINFRASTRUCTURE AS OBJECT-ORIENTED CODE
• Intelligent & flexible flow control
• Configuration drift managed
• AWS resources documented
• Share common use cases
VPC NATGateway
Route 53Hosted Zone
RouteTables
VPNGateway
VPC InternetGateway
Application Subnets Tools
Instances
Instances
Instances
Availability
Zone C
Availability
Zone B
Availability
Zone A
TERRAFORMINFRASTRUCTURE AS OBJECT-ORIENTED CODE
TERRAFORMINFRASTRUCTURE AS OBJECT-ORIENTED CODE
Simplified infrastructure definitions
CASE STUDIES
ECS CLUSTERTERRAFORM BUILDING BLOCKS
VPC NATGateway
Route 53Hosted Zone
RouteTables
VPNGateway
VPC InternetGateway
Application Subnets Tools
Instances
Instances
Instances
Availability
Zone C
Availability
Zone B
Availability
Zone A
ECS Cluster
Auto Scaling Group
Security Group
Security Group
Security Group
Instance Instance Instance Instance Instance Instance
Launch Configuration User DataIAM Role CloudWatch
Alarms
ECS CLUSTERTERRAFORM BUILDING BLOCKS
Module 1
Module 2
Module 3
ECS CLUSTERTERRAFORM BUILDING BLOCKS
MICROSERVICESWITHOUT SERVICE ENDPOINTS
ECS Cluster
Autoscaling Group
Security Group
Security Group
Security Group
Instance Instance Instance Instance Instance Instance
Launch Configuration User DataIAM Role CloudWatch
Alarm
ECS Service Task Definition
CloudWatch Alarms IAM Role
MICROSERVICESWITH SERVICE ENDPOINTS
ECS Cluster
ECS Service Task Definition
CloudWatch Alarms IAM Role
Application Load Balancer
Monitoring
CloudWatch Alarms SNS Topics
Security Group
Security Group
Listeners
Target GroupsRoute 53
PERSISTENT DATALOSE ECS HOSTS WITHOUT LOSING DATA
ECS Cluster
ECS Service
Application Load Balancer
Monitoring
CloudWatch Alarms SNS Topics
Security Group
Security Group
Listeners
Target GroupsRoute 53
Attachment Group
EBS EBS EBS
Elastic Network Interface
Elastic IP
LESSONS LEARNED
LESSONSWHAT WORKED FOR US
• Socialize your templates & take advantage of the community!
• Terraform community modules
• Terraform AWS blog
LESSONSWHAT WORKED FOR US
• Break apart your stacks
ECS Cluster
LESSONSWHAT WORKED FOR US
• Break apart your stacks, but don’t overdo it
ECS Service
ALB
SNS Topics
Route 53
IAM Role
Task Definition
CloudWatch Alarms
LESSONSWHAT WORKED FOR US
• Break apart your stacks, but don’t overdo it
• Use modules to reduce code duplication
CloudWatch Alarms SNS Topics
ECS Cluster ALB
LESSONSWHAT WORKED FOR US
• Use remote state files
LESSONSWHAT WORKED FOR US
• Be liberal with your cluster provisioning
• Don’t risk resource contention in production
• With good orchestration, additional clusters != additional operational overhead
LESSONSWHAT WORKED FOR US
• Tag everything all of the time
• Keep your tags organized in your Terraform templates
• Have top level variables that get applied to every resource
• Create a tag for every dimension that is useful to your business
LESSONSWHAT WORKED FOR US
• Centralize your logs
or
Amazon
CloudWatch
Logs
LESSONSWHAT WORKED FOR US
• Stay up to date with release blogs and application updates
• ECS updates on the AWS blog
• Terraform’s open source GitHub repository changelog
LESSONSWHAT WORKED FOR US
• Capture your AMI generation process
LESSONSWHAT WORKED FOR US
• Profile your memory requirements, monitor for scheduling issues
LESSONSWHAT WORKED FOR US
• Take advantage of flow control in Terraform
• Use the Terraform graph tool to visualize your dependencies
RIOT ENGINEERING BLOGTHESE PROBLEMS AND MORE
https://engineering.riotgames.com/
Riot engineering
How we use data
http://na.leagueoflegends.com/en/tag/insights
JOIN US!
ENJOY NOMS.
DRINK DRINKS.
PLAY GAMES.
Tuesday
November 29th, 2016
6:30 – 10:30PM
Paiza Lounge
The Venetian
36th Floor
Thank you!
Remember to complete
your evaluations!