(NET409) How Twilio Migrated Its Services from EC2-Classic to EC2-VPC
TRANSCRIPT
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
@Sumbry
Director of Cloud Services
Twilio.com
October, 2015
NET409
Movin' On Up to the VPC: How Twilio Migrated Its Infrastructure from EC2-Classic to EC2-VPC
Purpose of this talk
- Learn about Twilio
- Review legacy infrastructure
- Why EC2-VPC?
- How we built the Twilio Cloud
- How we migrated
- Internal tools developed
- Lessons learned
What Is a Twilio?
- A global communications company
- A real-time communications API
- Used by over 500,000 developers
- Requires low-latency resilient infrastructure
- Has lots of infrastructure on EC2-Classic
Who are Twilio customers?
Legacy Twilio
What did Twilio look like yesterday?
- Twilio has used AWS since 2008
- Three products
- All infrastructure located in us-east-1
- Hundreds of instances
- Shared private network in 10.0.0.0/8 (10/8)
- Non-consecutive EIPs
Before global
What is going global?
- Launched outside US
- Global provisioning
- Route traffic between regions
- Low-latency communications
- Global service discovery
The network after global
Problems with going global
- Overlapping 10/8 networks
- Point-to-point proxies are not ideal
- Routing around failovers
- Need low-latency connectivity
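The overlapping-networks problem can be checked mechanically before any routing is attempted. The sketch below is illustrative only: the region names and address plans are hypothetical, not Twilio's actual allocation. It uses Python's standard `ipaddress` module to flag every pair of networks whose address ranges collide; two regions that both use all of 10/8 always collide.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plans: two regions that both drew their private
# space from 10.0.0.0/8, and one numbered from 172.16.0.0/16.
REGION_CIDRS = {
    "us-east-1": "10.0.0.0/8",
    "eu-west-1": "10.0.0.0/8",
    "ap-southeast-1": "172.16.0.0/16",
}


def find_overlaps(cidrs):
    """Return each pair of names whose CIDR blocks share any addresses."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    for a, b in find_overlaps(REGION_CIDRS):
        print(f"{a} overlaps {b}: hosts cannot be routed to each other directly")
```

Any pair this reports needs either renumbering or a proxy/NAT layer in between, which is the situation the bullets above describe.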
Why EC2-VPC?
What is EC2-VPC?
EC2-VPC is the next major revision of the EC2 platform:
- Software Defined Network
- Elastic Network Interfaces
- HVM and SR-IOV
What is a software defined network?
- Define your own network
- VPC and subnet routing tables
- Network Access Control Lists
- Provision networks like virtual machines
- Protects data-in-transit
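VPC route tables select the most specific matching route (longest prefix match). The sketch below models that selection for a single route table. The routes and target labels are hypothetical placeholders, not real AWS resource IDs, and the sketch omits everything except the lookup itself.

```python
import ipaddress

# Hypothetical VPC route table as (destination CIDR, target) pairs.
# Target names are illustrative labels, not real AWS resource IDs.
ROUTE_TABLE = [
    ("10.1.0.0/16", "local"),             # the VPC's own CIDR
    ("10.2.0.0/16", "vpn-to-dc2"),        # another virtual data center
    ("10.2.5.0/24", "router-eni-a"),      # more specific override
    ("0.0.0.0/0", "internet-gateway"),    # default route
]


def lookup(dest_ip, table=ROUTE_TABLE):
    """Return the target of the most specific route containing dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    candidates = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in table
        if addr in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        raise LookupError(f"no route to {dest_ip}")
    # Longest prefix match: the route with the largest prefix length wins.
    _, target = max(candidates, key=lambda c: c[0].prefixlen)
    return target
```

With this table, `lookup("10.2.5.9")` resolves to `router-eni-a` even though `10.2.0.0/16` also matches, because the /24 is more specific.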
What are elastic network interfaces?
- Elastic (public) IPs and private IPs
- Multiple private IP addresses per interface
- Multiple ENIs per instance
- Security groups follow an ENI
- Each ENI has its own MAC address
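Because private IPs belong to an ENI, not to an instance, a service address can be moved to a standby host. This is a minimal sketch, assuming a boto3 EC2 client. `assign_private_ip_addresses` with `AllowReassignment=True` is a real EC2 API call, but the function name `move_service_ip` and the IDs are hypothetical. The operating system on the standby instance still has to configure the new address on its interface.

```python
def move_service_ip(ec2, service_ip, standby_eni_id):
    """Move a secondary private IP onto the standby instance's ENI.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    AllowReassignment lets the address move even though another
    interface currently holds it.
    """
    return ec2.assign_private_ip_addresses(
        NetworkInterfaceId=standby_eni_id,
        PrivateIpAddresses=[service_ip],
        AllowReassignment=True,
    )
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to test with a stub.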
What are HVM instances?
- Hardware Virtual Machine instances
- PCI Express speeds to the network adapter
- Low-latency access to the network adapter
- Up to 10 Gbps network speeds
Why move to EC2-VPC?
- SDN solves overlapping 10/8 networks
- Route tables eliminate proxies
- Routing around failovers is an API call
- HVM solves the low-latency connectivity problem
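"Routing around failovers is an API call" can be illustrated with EC2's `ReplaceRoute`. This is a sketch, assuming a boto3 EC2 client and router instances reachable through ENIs. `replace_route` and its keyword arguments are the real boto3 API. The function name, the ordered candidate list, and the `is_healthy` callback are hypothetical stand-ins for whatever health checking is actually used.

```python
def fail_over_route(ec2, route_table_id, dest_cidr, router_enis, is_healthy):
    """Repoint dest_cidr at the first healthy router ENI.

    `ec2` is expected to be a boto3 EC2 client (boto3.client("ec2")).
    `router_enis` is an ordered list of candidate ENI IDs and
    `is_healthy` is a caller-supplied health check function.
    """
    for eni_id in router_enis:
        if is_healthy(eni_id):
            # ReplaceRoute swaps the target of an existing route in place.
            ec2.replace_route(
                RouteTableId=route_table_id,
                DestinationCidrBlock=dest_cidr,
                NetworkInterfaceId=eni_id,
            )
            return eni_id
    raise RuntimeError(f"no healthy router for {dest_cidr}")
```

Replacing an existing route rather than deleting and re-creating it avoids a window in which the destination has no route at all.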
The Twilio Cloud
What is the Twilio Cloud?
- Iteration 2.0 of our infrastructure
- Addresses many EC2-Classic limitations
- Connectivity between data centers
- Automatic failover and redundancy
- Provider agnostic
What does the Twilio Cloud look like?
What about routing?
We built it, did they come?
We solved all the previous issues, but no one used it:
- Twilio Cloud was isolated from EC2-Classic
- Existing services had no migration path
Data center migration
Why is a migration like moving data centers?
- Separate infrastructure from EC2-Classic
- Need to migrate all your compute
- Zero downtime
The networks
What problems do we need to solve?
- Move an instance from Classic to VPC
- Network connectivity
- Instance discoverability
- No service interruptions
Steps to migrate a service
- Classic deploy
- VPC deploy
- Kill Classic
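The three migration stages (Classic deploy, VPC deploy, Kill Classic) behave like a small state machine, and one of the lessons from this talk is that the tooling must also run backwards. The sketch below is hypothetical: the `Stage` names and helper functions are illustrative, not Twilio's tooling. It shows which hosts a service should advertise at each stage and lets a migration step forward or roll back one stage at a time.

```python
from enum import Enum


class Stage(Enum):
    CLASSIC = 1   # only EC2-Classic hosts serve traffic
    BOTH = 2      # Classic and VPC hosts registered side by side
    VPC = 3       # Classic hosts removed ("kill Classic")


def serving_hosts(stage, classic_hosts, vpc_hosts):
    """Hosts that should be registered for the service at a given stage."""
    if stage is Stage.CLASSIC:
        return list(classic_hosts)
    if stage is Stage.BOTH:
        return list(classic_hosts) + list(vpc_hosts)
    return list(vpc_hosts)


def step(stage, forward=True):
    """Advance or roll back one stage; the BOTH stage is never skipped."""
    delta = 1 if forward else -1
    value = min(max(stage.value + delta, Stage.CLASSIC.value), Stage.VPC.value)
    return Stage(value)
```

Keeping the overlapping BOTH stage in both directions is what makes a rollback possible without downtime.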
Wait - you just invented a bunch of stuff …
- Bridge EC2-Classic and VPC?
- Global Service Discovery?
- Multiple Service Deployments?
- WTF!
Migration tools
What are the tools for migrating to EC2-VPC?
We modified existing internal tools:
- IP Tunnel Manager / ClassicLink
- Global Service Discovery
- HAProxy Distributed Load-Balancing
- Config-Renderer
What is IP Tunnel Manager / ClassicLink?
ClassicLink allows you to link your EC2-Classic instance to a VPC in the same account and region. It provides network connectivity between EC2-Classic and EC2-VPC instances.
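Linking instances with ClassicLink takes two API calls: enable ClassicLink on the VPC, then attach each Classic instance. This is a sketch, assuming a boto3 EC2 client. `enable_vpc_classic_link` and `attach_classic_link_vpc` are boto3 EC2 client methods, while the helper name and IDs are hypothetical. AWS has since retired both EC2-Classic and ClassicLink, so this applies only to accounts of that era.

```python
def link_classic_instances(ec2, vpc_id, instance_ids, vpc_security_group_ids):
    """Enable ClassicLink on a VPC, then link each Classic instance to it.

    `ec2` is expected to be a boto3 EC2 client (boto3.client("ec2")).
    """
    ec2.enable_vpc_classic_link(VpcId=vpc_id)
    for instance_id in instance_ids:
        ec2.attach_classic_link_vpc(
            InstanceId=instance_id,
            VpcId=vpc_id,
            # VPC security groups that govern traffic over the link.
            Groups=list(vpc_security_group_ids),
        )
```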
What is Global Service Discovery?
GSD stores IP addresses for any service in the cluster and serves them on demand.
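The talk does not describe how Global Service Discovery is implemented. The class below is a hypothetical in-memory stand-in that captures the idea: instances register (and periodically re-register as a heartbeat), and lookups return only addresses whose heartbeat is still fresh. The TTL-based expiry is an assumption made for illustration.

```python
import time
from collections import defaultdict


class ServiceRegistry:
    """Hypothetical in-memory stand-in for a service discovery store."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # service name -> {ip: time of last heartbeat}
        self._entries = defaultdict(dict)

    def register(self, service, ip):
        """Add or refresh an instance; call periodically as a heartbeat."""
        self._entries[service][ip] = self.clock()

    def deregister(self, service, ip):
        """Remove an instance immediately, e.g. on clean shutdown."""
        self._entries[service].pop(ip, None)

    def lookup(self, service):
        """Return live IPs for a service, dropping expired heartbeats."""
        now = self.clock()
        live = {
            ip: seen
            for ip, seen in self._entries[service].items()
            if now - seen <= self.ttl
        }
        self._entries[service] = live
        return sorted(live)
```

Injecting the clock keeps expiry deterministic in tests; in production the default monotonic clock would be used.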
What is distributed load balancing?
Every instance in the cluster runs its own copy of HAProxy, which load-balances requests to downstream services.
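Each local HAProxy spreads requests across the healthy backends of a service. The class below is a hypothetical illustration of round-robin selection that skips backends marked down, roughly what `balance roundrobin` with health checks achieves. It is not HAProxy's implementation and not a replacement for it.

```python
class LocalBalancer:
    """Hypothetical round-robin picker that skips backends marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._next = 0

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def pick(self):
        """Return the next healthy backend in rotation."""
        n = len(self.backends)
        for _ in range(n):
            backend = self.backends[self._next % n]
            self._next += 1
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy backends")
```

Running this logic on every instance, instead of in one central balancer, removes the shared proxy tier as a single point of failure.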
What is Config-Renderer?
Config-Renderer renders configuration files filled with data from Global Service Discovery, like HAProxy configs!
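A minimal sketch of what rendering an HAProxy config from discovery data could look like, assuming each service gets a frontend bound on localhost and a backend listing every discovered IP. The function name, ports, and naming scheme are hypothetical. The emitted `frontend`, `bind`, `default_backend`, `backend`, `balance roundrobin`, and `server ... check` lines are standard HAProxy syntax. A complete configuration would also need `global` and `defaults` sections.

```python
import ipaddress


def render_haproxy(service, ips, port, local_port):
    """Render a local frontend plus a backend listing every discovered IP."""
    lines = [
        f"frontend {service}-local",
        f"    bind 127.0.0.1:{local_port}",
        f"    default_backend {service}",
        "",
        f"backend {service}",
        "    balance roundrobin",
    ]
    # Sort numerically so the output is stable between renders.
    for i, ip in enumerate(sorted(ips, key=ipaddress.ip_address)):
        # `check` turns on HAProxy health checks for this server.
        lines.append(f"    server {service}-{i} {ip}:{port} check")
    return "\n".join(lines) + "\n"
```

Because the output is deterministic, a renderer can compare it with the file on disk and reload HAProxy only when the service's membership actually changed.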
What about deploying services?
Our internal provisioning tool, called BoxConfig, lets us deploy services with the click of a button.
How does it all work?
Unix philosophy
We use lots of small tools and combine them:
- Twilio Cloud to route
- ClassicLink to bridge
- HAProxy for distributed load-balancing
- Global Service Discovery for IP info
- Config-Renderer to write HAProxy configs
- BoxConfig to deploy
In conclusion
Where are you today?
- The Twilio Cloud is live today
- Routes traffic through nine virtual data centers
- Over 100 IPsec mesh links
- Automatic region failover thanks to EIGRP
- 35% of Twilio infrastructure is in EC2-VPC
- We can complete the migration in 2015
What are some lessons learned?
- Properly subnet your VPC. You have one shot.
- No need to do a giant migration all at once.
- Tools need to work both ways in case you screw up.
- Less complexity always wins.
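"Properly subnet your VPC" can be planned up front with the standard `ipaddress` module. This sketch, under assumed tier names, availability zones, and prefix sizes (all hypothetical), carves one equal subnet per (tier, AZ) pair from the start of the VPC CIDR and leaves the remainder free for growth. It fails loudly if the VPC is too small. Keep in mind that AWS reserves five addresses in every subnet.

```python
import ipaddress


def plan_subnets(vpc_cidr, azs, tiers, new_prefix):
    """Carve a VPC CIDR into equal subnets, one per (tier, AZ) pair."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)
    plan = {}
    for tier in tiers:
        for az in azs:
            try:
                plan[(tier, az)] = next(blocks)
            except StopIteration:
                needed = len(tiers) * len(azs)
                raise ValueError(
                    f"{vpc_cidr} is too small for {needed} /{new_prefix} subnets"
                ) from None
    return plan
```

For example, splitting `10.1.0.0/16` into /20 subnets for two tiers across two AZs uses four of the sixteen available /20 blocks, leaving room for more tiers or zones later.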
Thank you!