Posted on 15-Jul-2015

Scaling the Platform for your Startup

Andreas Chatzakis, AWS Solutions Architecture
Peter Mounce, Senior Software Developer at JUST EAT

15th April 2015, AWS London Summit

Why are you here?

•  Building the technology platform for your startup
•  You want to prepare for success
•  Learn about design patterns & scalability
•  A pragmatic approach for startups

Priorities for startups

•  Racing within a window of opportunity
•  Small team with no legacy
•  Focus on solving a problem
•  Avoid over-engineering & re-engineering
•  Reduce risk of failure when you go viral

A scalable architecture

•  Can support growth in users, traffic, data size
•  Without practical limits
•  Without a drop in performance
•  Seamlessly: just by adding more resources
•  Efficiently: in terms of cost per user

Day 1 – Dev & private beta

Single host: THE server (e.g. Apache, MySQL), launched from a server image (AMI), reachable at www.example.com through an Elastic IP, with DNS on the Amazon Route 53 DNS service.

Day 2 - Public beta

We need a bigger server

•  Add larger & faster storage (EBS)
•  Use the right instance type
•  Easy to change instance sizes
•  Not our long-term strategy
•  Will eventually hit a limit
•  No fault tolerance

Separating web and DB

•  More capacity
•  Scale each tier individually
•  Tailor instance for each tier
   –  Instance type
   –  Storage
•  Security
   –  Security groups
   –  DB in a private VPC subnet

But how do I choose what DB technology I need?

SQL? NoSQL?

Why start with a Relational DB?

•  SQL is versatile & feature-rich
•  Lots of existing code, tools, knowledge
•  Clear patterns for scalability (for read-heavy apps)
•  Reality: eventually you will have a polyglot data layer
   –  There will be workloads where NoSQL is a better fit
   –  Use the right tool for each workload

Key Insight: Relational Databases are Complex

•  Our experience running Amazon.com taught us that relational databases can be a pain to manage and operate with high availability

•  Poorly managed relational databases are a leading cause of lost sleep and downtime in the IT world!

•  Especially for startups with small teams

Relational databases: MySQL, Aurora, PostgreSQL, Oracle, SQL Server

Amazon RDS and Amazon Aurora: fully managed, zero admin

Improving efficiency

Offload static content
•  Amazon S3: highly available hosting that scales
   –  Static files (JavaScript, CSS, images)
   –  User uploads
•  S3 URLs: serve directly from S3
•  Let the web server focus on dynamic content

Amazon CloudFront
•  Worldwide network of edge locations
•  Cache on the edge
   –  Reduce latency
   –  Reduce load on origin servers
   –  Static and dynamic content
   –  Even a few seconds of caching for popular content can have a huge impact
•  Connection optimizations
   –  Optimize transfer route
   –  Reuse connections
   –  Benefits even non-cacheable content
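A rough way to see the effect of short edge TTLs is to count how many requests actually reach the origin. The Python sketch below is a simulation, not CloudFront itself: `TTLCache` and `origin_hits` are hypothetical helpers, and it assumes one popular URL requested at a steady 100 requests per second for one minute.

```python
from typing import Callable, Dict, Tuple


class TTLCache:
    """Minimal time-to-live cache standing in for one edge location (illustrative only)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str, now: float, fetch: Callable[[str], str]) -> str:
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh copy served from the edge
        value = fetch(key)  # miss or expired entry: go back to the origin
        self.entries[key] = (now + self.ttl, value)
        return value


def origin_hits(ttl_seconds: float, requests_per_second: int, duration_seconds: int) -> int:
    """Count origin fetches for one popular URL requested at a steady rate."""
    hits = 0

    def fetch_from_origin(key: str) -> str:
        nonlocal hits
        hits += 1
        return f"content of {key}"

    cache = TTLCache(ttl_seconds)
    for i in range(requests_per_second * duration_seconds):
        now = i / requests_per_second  # evenly spaced request timestamps
        cache.get("/popular-page", now, fetch_from_origin)
    return hits


if __name__ == "__main__":
    for ttl in (0, 1, 5):
        print(f"TTL={ttl}s -> {origin_hits(ttl, 100, 60)} origin requests out of 6000")
```

Under those assumptions the origin sees 6000 requests with no caching, 60 with a 1-second TTL and 12 with a 5-second TTL. Real traffic is burstier and spread over many edge locations, so treat the numbers as illustrative only.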

CloudFront for static & dynamic content

Diagram: Amazon Route 53 points to a single CloudFront distribution. Static content (css/*, js/*, Images/*) is served from the S3 bucket; dynamic content (the Default (*) behavior) comes from the EC2 instance(s).
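The diagram's routing can be modelled as an ordered list of cache behaviors, each mapping a path pattern to an origin, with the default (*) behavior last. In this Python sketch the origin names are hypothetical and the patterns are lowercased. CloudFront path patterns are case-sensitive, which `fnmatchcase` mirrors; note that CloudFront only supports the `*` and `?` wildcards, a subset of what `fnmatch` accepts.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# Hypothetical cache behaviors mirroring the diagram: ordered (path pattern, origin) pairs.
# The last entry plays the role of the Default (*) behavior.
CACHE_BEHAVIORS: List[Tuple[str, str]] = [
    ("css/*", "s3-static-assets"),
    ("js/*", "s3-static-assets"),
    ("images/*", "s3-static-assets"),
    ("*", "ec2-web-servers"),
]


def pick_origin(path: str, behaviors: List[Tuple[str, str]] = CACHE_BEHAVIORS) -> str:
    """Return the origin of the first behavior whose pattern matches the request path."""
    relative = path.lstrip("/")  # patterns here are written without a leading slash
    for pattern, origin in behaviors:
        if fnmatchcase(relative, pattern):  # case-sensitive match, first match wins
            return origin
    raise LookupError(f"no behavior matches {path}")  # unreachable while '*' is last


if __name__ == "__main__":
    for p in ("/css/site.css", "/images/logo.png", "/restaurants/42/menu"):
        print(p, "->", pick_origin(p))
```

Because evaluation stops at the first match, the order of the list matters: putting "*" first would send everything, including static files, to the web servers.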

Database caching
•  Faster response from RAM
•  Reduce load on database

Application server flow, with Amazon ElastiCache in front of the RDS database:
1. If the data is in the cache, return the result
2. If not in cache, read from the DB
3. And store it in the cache
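The three steps above are the cache-aside (lazy loading) pattern. A minimal Python sketch, assuming a plain dict in place of ElastiCache and a caller-supplied `load_from_db` function in place of the RDS query; the class name, the TTL default and the `invalidate` helper are illustrative, not an AWS API.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Cache-aside (lazy loading) in front of a slower data store.

    `cache` stands in for ElastiCache and `load_from_db` for a query against RDS;
    both are hypothetical placeholders for this sketch.
    """

    def __init__(self, load_from_db: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache: Dict[str, Tuple[float, dict]] = {}  # key -> (expires_at, row)

    def get(self, key: str) -> Optional[dict]:
        now = self.clock()
        cached = self.cache.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]                          # 1. in cache: return the result
        row = self.load_from_db(key)                  # 2. not in cache: read from the DB
        if row is not None:
            self.cache[key] = (now + self.ttl, row)   # 3. and store it in the cache
        return row

    def invalidate(self, key: str) -> None:
        """Drop a key after a write so the next read reloads fresh data."""
        self.cache.pop(key, None)
```

The TTL bounds how stale a cached row can get; calling `invalidate` after a write is one common way to keep reads fresher without waiting for expiry.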

Amazon ElastiCache: in-memory cache
•  Simple to deploy
•  Managed
   –  Automatically replaces failed nodes
   –  Patch management
•  Elastic
•  Compatible with Memcached and Redis

Day 3 – Paying customers

High Availability

Diagram, built up in steps: www.example.com on the Amazon Route 53 DNS service, Amazon CloudFront and an S3 bucket for static assets; in Availability Zone a, a web server, the RDS DB instance and ElastiCache node 1. A second web server is then added in Availability Zone b, and Elastic Load Balancing is placed in front of both web servers.

Elastic Load Balancing

•  Managed load balancing service
•  Fault tolerant
•  Health checks
•  Distributes traffic across AZs
•  Elastic: automatically scales its capacity
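To make "health checks" and "distributes traffic" concrete, here is a toy round-robin balancer in Python that skips targets failing their health check. It is only an illustration: ELB's actual routing algorithm, health-check intervals and thresholds are not modelled, and the target names are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


class RoundRobinBalancer:
    """Toy load balancer: rotate across targets, skipping ones failing health checks."""

    def __init__(self, targets: List[str]) -> None:
        self.targets = list(targets)
        self.healthy: Dict[str, bool] = {t: True for t in self.targets}
        self._ring = cycle(self.targets)  # endless rotation over all registered targets

    def record_health_check(self, target: str, passed: bool) -> None:
        """Store the latest health-check result for a target."""
        self.healthy[target] = passed

    def next_target(self) -> str:
        """Return the next healthy target, trying each target at most once."""
        for _ in range(len(self.targets)):
            candidate = next(self._ring)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy targets")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web-az-a", "web-az-b"])
    lb.record_health_check("web-az-a", False)  # e.g. AZ a's web server stops responding
    print([lb.next_target() for _ in range(3)])
```

With one target per Availability Zone, losing a web server (or a whole AZ) just shifts traffic to the remaining healthy target instead of failing requests.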

High Availability

Diagram: the same two-AZ setup behind Elastic Load Balancing, now with an RDS DB standby in Availability Zone b for the RDS DB instance in Availability Zone a.

Data layer HA

Diagram: the RDS DB instance (Availability Zone a) is paired with an RDS DB standby (Availability Zone b), and ElastiCache node 2 in Availability Zone b joins ElastiCache node 1 in Availability Zone a.

User sessions
•  Problem: often stored on local disk (not shared)
•  Quick fix: ELB session stickiness
•  Solution: DynamoDB

Diagram: two web servers behind Elastic Load Balancing; a user logged in on one web server shows as logged out when a request lands on the other.
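The logged in / logged out effect is easy to reproduce with a toy model. In this Python sketch (a hypothetical `WebServer` class, with a plain dict standing in for a DynamoDB table), a user logs in on server A and the next request, without stickiness, lands on server B.

```python
import uuid
from typing import Dict, Optional


class WebServer:
    """Hypothetical web server that keeps sessions in whatever store it is given."""

    def __init__(self, name: str, session_store: Dict[str, dict]) -> None:
        self.name = name
        self.sessions = session_store

    def login(self, user: str) -> str:
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {"user": user}
        return session_id

    def current_user(self, session_id: str) -> Optional[str]:
        session = self.sessions.get(session_id)
        return session["user"] if session else None


def second_request_user(shared: bool) -> Optional[str]:
    """Log in on server A, then send the next request to server B (no stickiness)."""
    if shared:
        store: Dict[str, dict] = {}  # one store for both servers, like a DynamoDB table
        a, b = WebServer("a", store), WebServer("b", store)
    else:
        a, b = WebServer("a", {}), WebServer("b", {})  # local, per-server sessions
    session_id = a.login("alice")
    return b.current_user(session_id)
```

Only the shared store keeps the user logged in, which is why moving sessions out of the web servers lets any instance serve any request (and lets instances come and go).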

Amazon DynamoDB

•  Managed document and key-value store
•  Simple to launch and scale
   –  To millions of IOPS
   –  Both reads and writes
•  Consistent, fast performance
•  Durable: perfect for storage of session data

https://github.com/aws/aws-dynamodb-session-tomcat

http://docs.aws.amazon.com/aws-sdk-php/guide/latest/feature-dynamodb-session-handler.html

Day 4 – Let’s go viral!

Replace guesswork with elastic IT

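Chart, startups pre-AWS vs. the AWS Cloud: with traditional capacity, demand that outgrows capacity means unhappy customers, and capacity that sits above demand means wasted $$$. In the AWS Cloud, capacity follows demand.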

Scaling the web tier

Diagram: starting from the two-AZ architecture (Elastic Load Balancing, RDS DB instance and standby, ElastiCache nodes 1 and 2, S3 bucket for static assets, www.example.com on Amazon Route 53), the web tier grows from two to four web servers behind Elastic Load Balancing.

Automatic resizing of compute clusters based on demand

•  Control: define minimum and maximum instance pool sizes and when scaling and cool down occur.
•  Integrated with Amazon CloudWatch: use metrics gathered by CloudWatch to drive scaling.
•  Instance types: run Auto Scaling for On-Demand and Spot Instances. Compatible with VPC.

aws autoscaling create-auto-scaling-group --auto-scaling-group-name MyGroup --launch-configuration-name MyConfig --min-size 4 --max-size 200 --availability-zones us-west-2c us-west-2b

Auto Scaling: Amazon CloudWatch triggers an auto-scaling policy.
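The control loop behind a CloudWatch-triggered policy can be sketched as: read a metric, step the desired capacity up or down, clamp it to the group's min/max, and ignore further triggers during a cooldown. This Python model is a simplification, not the Auto Scaling service's algorithm; the 70%/30% CPU thresholds, the step of 2 and the 300-second cooldown are assumptions, while min 4 / max 200 come from the CLI example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingGroup:
    """Toy model of a threshold-based scaling policy (not the Auto Scaling service's logic)."""
    min_size: int
    max_size: int
    desired: int
    cooldown_seconds: float = 300.0  # assumed cooldown
    last_scaled_at: Optional[float] = None

    def on_metric(self, cpu_percent: float, now: float,
                  scale_out_above: float = 70.0, scale_in_below: float = 30.0,
                  step: int = 2) -> int:
        """Apply one CPU reading (as an alarm might) and return the desired capacity."""
        if self.last_scaled_at is not None and now - self.last_scaled_at < self.cooldown_seconds:
            return self.desired  # still cooling down from the last change
        if cpu_percent > scale_out_above:
            target = self.desired + step  # scale out
        elif cpu_percent < scale_in_below:
            target = self.desired - step  # scale in
        else:
            return self.desired  # within the comfortable band
        target = max(self.min_size, min(self.max_size, target))  # respect min/max pool sizes
        if target != self.desired:
            self.desired = target
            self.last_scaled_at = now
        return self.desired


if __name__ == "__main__":
    group = ScalingGroup(min_size=4, max_size=200, desired=4)
    for t, cpu in [(0, 85.0), (60, 90.0), (400, 90.0), (800, 10.0)]:
        print(t, cpu, group.on_metric(cpu, now=t))
```

The cooldown is what stops a single spike from triggering several scale-outs before the new instances have had time to absorb load.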

Prerequisite: decompose into small, loosely coupled, stateless building blocks

What does this mean in practice?

•  Only store transient data on local disk
•  Needs to persist beyond a single HTTP request?
   –  Then store it elsewhere:
      –  User uploads: Amazon S3
      –  User sessions: Amazon DynamoDB
      –  Application data: Amazon RDS

Having decomposed into small, loosely coupled, stateless building blocks, you can now scale out with ease.

Having done that, you can also scale back in with ease.

Take the shortcut

•  While this architecture is simple, you still need to deal with:
   –  Configuration details
   –  Deploying code to multiple instances
   –  Maintaining multiple environments (Dev, Test, Prod)
   –  Maintaining different versions of the application

•  Solution: Use AWS Elastic Beanstalk

AWS Elastic Beanstalk (EB)
•  Easily deploy, monitor, and scale three-tier web applications and services
•  Infrastructure provisioned and managed by EB
•  You maintain control
•  Preconfigured application containers
•  Easily customizable
•  Support for these platforms:

Loose coupling with SQS

Instead of tight coupling between front end and back end:
•  Place tasks into Amazon Simple Queue Service (SQS)
•  SQS: a buffer that protects backend systems
•  Process asynchronously, at their own pace
•  Remove delay from latency-sensitive paths

Diagram: the front end EC2 instance puts messages into SQS; the back end EC2 instance gets messages from SQS.
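The put / get flow relies on SQS hiding, not deleting, a message when a consumer receives it; the consumer deletes it only after finishing, so a crashed worker's message reappears once the visibility timeout expires. `ToyQueue` and `process_one` below are hypothetical in-memory stand-ins written in Python to show that behavior; they are not the SQS API, and unlike this sketch a standard SQS queue does not guarantee ordering.

```python
import itertools
from typing import Callable, Dict, Optional, Tuple


class ToyQueue:
    """In-memory stand-in for an SQS queue with a visibility timeout (illustration only)."""

    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: Dict[int, str] = {}           # message id -> body
        self._invisible_until: Dict[int, float] = {}  # message id -> time it is visible again
        self._ids = itertools.count()

    def put(self, body: str) -> int:
        """Front end: enqueue a task and return its id."""
        message_id = next(self._ids)
        self._messages[message_id] = body
        return message_id

    def get(self, now: float) -> Optional[Tuple[int, str]]:
        """Back end: receive one visible message and hide it for the visibility timeout."""
        for message_id, body in self._messages.items():
            if self._invisible_until.get(message_id, 0.0) <= now:
                self._invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id: int) -> None:
        """Remove a message for good once it has been processed."""
        self._messages.pop(message_id, None)
        self._invisible_until.pop(message_id, None)


def process_one(queue: ToyQueue, handler: Callable[[str], None], now: float) -> bool:
    """Receive one message, handle it, and delete it only after the handler succeeds."""
    received = queue.get(now)
    if received is None:
        return False
    message_id, body = received
    handler(body)  # if this raises, the message is not deleted and reappears after the timeout
    queue.delete(message_id)
    return True
```

Deleting only after success gives at-least-once processing, so back-end handlers should tolerate seeing the same task twice.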

Day 5 – Add more features

•  Mobile: Push Notifications, Mobile Analytics, Cognito, Cognito Sync
•  Analytics: Kinesis, Data Pipeline, Redshift, EMR
•  Network: VPC, Direct Connect, Route 53
•  Storage: EBS, S3, Glacier, CloudFront
•  Database: DynamoDB, RDS, ElastiCache
•  Deployment & Management: Elastic Beanstalk, OpsWorks, CloudFormation, CodeDeploy, CodePipeline, CodeCommit
•  Security & Administration: CloudWatch, Config, CloudTrail, IAM, Directory, KMS
•  Application: SQS, SWF, AppStream, Elastic Transcoder, SES, CloudSearch, SNS
•  Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
•  Compute: EC2, ELB, Auto Scaling, Lambda, ECS

Your applications sit on top of these services, which run on the AWS Global Infrastructure.

AWS building blocks
•  Inherently scalable & highly available (automated): Elastic Load Balancing, Amazon CloudFront, Amazon Route 53, Amazon S3, Amazon SQS, Amazon SES, Amazon CloudSearch, AWS Lambda, …
•  Scalable & highly available (configurable): Amazon DynamoDB, Amazon Redshift, Amazon RDS, Amazon ElastiCache, …
•  Scalable & highly available with the right architecture: Amazon EC2, Amazon VPC

Stay focused as you scale your team

Chart: with on-premise infrastructure, 70% of effort goes into managing all of the "undifferentiated heavy lifting" and 30% into your business. With AWS cloud-based infrastructure, 30% goes into configuring your cloud assets and 70% becomes more time to focus on your business.

Day 6 – Growing fast

Scaling Relational DBs

•  Increase RDS instance specs
   –  Larger instance type
   –  More storage / more PIOPS
•  Read Replicas (Master – Slave)
   –  Scale out beyond capacity of single DB instance
   –  Available in Amazon RDS for MySQL, PostgreSQL and Amazon Aurora
   –  Writes => master
   –  Replication lag
   –  Reads with tolerance to stale data => read replica (slave)
   –  Reads with strong consistency requirements => master
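The routing rules above (writes and strongly consistent reads to the master, stale-tolerant reads to a replica) can be captured in a small Python helper. `ReplicaRouter` and the endpoint names are hypothetical; a real application would hand out connections from per-endpoint pools rather than endpoint strings.

```python
from itertools import cycle
from typing import List


class ReplicaRouter:
    """Pick a database endpoint per query, following the read-replica rules."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        self._replicas = cycle(replicas) if replicas else None  # spread reads round-robin

    def endpoint_for(self, is_write: bool, needs_strong_consistency: bool = False) -> str:
        if is_write or needs_strong_consistency or self._replicas is None:
            return self.master  # writes, and reads that cannot see stale data
        return next(self._replicas)  # reads tolerant of replication lag


if __name__ == "__main__":
    router = ReplicaRouter("db-master", ["db-replica-1", "db-replica-2"])
    print(router.endpoint_for(is_write=True))
    print(router.endpoint_for(is_write=False))
    print(router.endpoint_for(is_write=False, needs_strong_consistency=True))
```

A typical strong-consistency case is reading back a record the same user just wrote, where replication lag could otherwise make the change appear to be missing.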

Scaling the DB

Diagram: with four web servers behind Elastic Load Balancing, the RDS DB instance and its standby are joined first by one RDS read replica, then by a second one.

What if your app is write-heavy?

Challenge: you will eventually hit the write throughput or storage limit of the master node.

Solutions:
•  Federation (splitting into multiple DBs based on function)
•  Sharding (splitting one data set across multiple hosts)

Database federation
•  Divide tables into smaller autonomous databases
•  Harder to do cross-function queries
•  Won’t help with single huge functions/tables

Diagram: separate Forums DB, Users DB and Products DB.
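The cross-function query problem shows up as soon as the tables live in different databases: a single SQL JOIN can no longer span them, so the application has to query each database and combine the results itself. In this Python sketch two separate in-memory SQLite connections stand in for the Users DB and the Forums DB; the schemas and rows are invented for illustration.

```python
import sqlite3
from typing import List, Tuple


def build_federated_dbs() -> Tuple[sqlite3.Connection, sqlite3.Connection]:
    """Create two separate databases standing in for the Users DB and the Forums DB."""
    users_db = sqlite3.connect(":memory:")
    users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

    forums_db = sqlite3.connect(":memory:")
    forums_db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)")
    forums_db.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(10, 1, "Hello"), (11, 2, "Scaling tips"), (12, 1, "Follow-up")],
    )
    return users_db, forums_db


def posts_with_authors(users_db: sqlite3.Connection,
                       forums_db: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Application-side 'join': query each database, then stitch the rows together."""
    posts = forums_db.execute("SELECT user_id, title FROM posts ORDER BY id").fetchall()
    if not posts:
        return []
    user_ids = sorted({user_id for user_id, _ in posts})
    placeholders = ",".join("?" for _ in user_ids)
    names = dict(users_db.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids).fetchall())
    return [(title, names.get(user_id, "<unknown>")) for user_id, title in posts]
```

The extra round trip and the loss of foreign-key enforcement between the two databases are the price of letting each functional database scale on its own.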

Sharded horizontal scaling
•  Store a subset of rows in each database shard
•  More complex at the application layer
•  No practical limit on scalability
•  Operational complexity

User     ShardID
002345   A
002346   B
002347   C
002348   B
002349   A

Diagram: Shard A, Shard B, Shard C.
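The User/ShardID table is a lookup (directory) approach to sharding: the application consults the directory to find which shard holds a user's rows. This Python sketch uses the five IDs from the table, with three in-memory SQLite databases standing in for shard hosts; the class name, schema and stored names are hypothetical. Hashing the user ID is a common alternative that is not shown here.

```python
import sqlite3
from typing import Dict, Optional

# Lookup table from the slide: user ID -> shard ID.
SHARD_DIRECTORY: Dict[str, str] = {
    "002345": "A", "002346": "B", "002347": "C", "002348": "B", "002349": "A",
}


class ShardedUserStore:
    """Directory-based sharding: each shard database holds a subset of user rows."""

    def __init__(self, directory: Dict[str, str]) -> None:
        self.directory = directory
        self.shards: Dict[str, sqlite3.Connection] = {}
        for shard_id in sorted(set(directory.values())):
            conn = sqlite3.connect(":memory:")  # stands in for a separate shard host
            conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
            self.shards[shard_id] = conn

    def _shard_for(self, user_id: str) -> sqlite3.Connection:
        shard_id = self.directory[user_id]  # KeyError for users not yet assigned a shard
        return self.shards[shard_id]

    def save(self, user_id: str, name: str) -> None:
        conn = self._shard_for(user_id)
        conn.execute("INSERT OR REPLACE INTO users VALUES (?, ?)", (user_id, name))
        conn.commit()

    def load(self, user_id: str) -> Optional[str]:
        row = self._shard_for(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None

    def rows_per_shard(self) -> Dict[str, int]:
        return {sid: conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
                for sid, conn in self.shards.items()}
```

Every data access goes through `_shard_for`, which is where the extra application-layer complexity lives; the directory itself also has to be stored and kept highly available.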

NoSQL data stores

•  Trade query & integrity features of Relational DBs for
   –  More flexible data model
   –  Horizontal scalability & predictable performance

DynamoDB: provisioned read/write performance per table

Massive and Seamless Scale
•  Distributed system that can scale both reads and writes
   –  Sharding + Replicas
•  Automatic partitioning, as:
   –  Data set size grows
   –  Provisioned capacity increases

Summary

Diagram: www.example.com on the Amazon Route 53 DNS service (labelled "No limit"), Elastic Load Balancing across Availability Zones a and b, an RDS DB instance with its RDS DB standby and four RDS read replicas, ElastiCache nodes 1 to 4, DynamoDB, an S3 bucket for static assets, plus CloudSearch, Lambda, SES and SQS.

A quick review
•  Keep it simple and stateless
•  Make use of managed self-scaling services
•  Multi-AZ and Auto Scale your EC2 infrastructure
•  Use the right DB for each workload
•  Cache data at multiple levels
•  Simplify operations with deployment tools

Next steps?

READ!
•  aws.amazon.com/documentation
•  aws.amazon.com/architecture
•  aws.amazon.com/start-ups

ASK FOR HELP!
•  forums.aws.amazon.com
•  aws.amazon.com/support

Performance testing @ JUST EAT (Or: DoS yourself every night in production to prove you can take it)

@justeat_tech + @petemounce http://tech.just-eat.com

Please wait while I start my DoS attack... (Demo - start fake load, show dashboards)

The problem with performance tests & continuous delivery

●  Don’t want to sacrifice continuous delivery & decoupled teams
●  Don’t want performance to suffer

All the usual problems:
●  Bottleneck through a single environment
●  Individual tests take too long

Why?

Continuously test
●  performance
●  capacity

If we find a problem Thursday night:
1.  don’t run fake load over the weekend
2.  enjoy the weekend as normal
3.  fix it next week at leisure

Gamble!

OH: “We deploy tens of small changes a day. I bet we won’t break production...”

OH: “Let’s just do it in production with fake traffic at the same time as customers!”

Not that much of a gamble, really. We have tight feedback loops at this point.

Engineers are on call, so they are highly invested in not regressing performance.

How?

Pick scenarios we care about

Pick data variations to exercise

Add header(s) to discriminate fake load vs customer load

Run it every night during peak time

If no alerts fire, we’re good
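Tagging fake load with a request header lets dashboards and alerts separate synthetic traffic from customer traffic. A minimal Python sketch; the header name `X-Fake-Load` and its value are hypothetical, not the header JUST EAT uses, and the counter stands in for whatever metrics system feeds the dashboards.

```python
from collections import Counter
from typing import Mapping

FAKE_LOAD_HEADER = "X-Fake-Load"  # hypothetical header name


def traffic_kind(headers: Mapping[str, str]) -> str:
    """Classify a request as synthetic or customer traffic from its headers."""
    # HTTP header names are case-insensitive, so normalise them before comparing.
    normalised = {name.lower(): value for name, value in headers.items()}
    return "fake" if normalised.get(FAKE_LOAD_HEADER.lower()) == "true" else "customer"


class RequestMetrics:
    """Count requests per traffic kind so dashboards can show both series separately."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, headers: Mapping[str, str]) -> str:
        kind = traffic_kind(headers)
        self.counts[kind] += 1
        return kind
```

With the two series kept apart, a nightly run can push total load towards peak while customer-facing metrics and alerts remain readable on their own.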

What did we gain?
●  Continuous confidence in capacity
●  Continuous confidence in dealing with spikes
●  Performance as a 1st-class concern
●  Tests become independent of environments’ data

(Remind me to stop my DoS attack now) (Demo - stop fake load, show dashboards)

Yes, we’re recruiting too. http://tech.just-eat.com/jobs
