Scaling the Platform for your Startup

Andreas Chatzakis, AWS Solutions Architecture
Peter Mounce, Senior Software Developer at JUST EAT

15th April 2015, AWS London Summit

Why are you here?

•  Building the technology platform for your startup
•  You want to prepare for success
•  Learn about design patterns & scalability
•  A pragmatic approach for startups

Priorities for startups

•  Racing within a window of opportunity
•  Small team with no legacy
•  Focus on solving a problem
•  Avoid over-engineering & re-engineering
•  Reduce risk of failure when you go viral

A scalable architecture

•  Can support growth in users, traffic, data size
•  Without practical limits
•  Without a drop in performance
•  Seamlessly - just by adding more resources
•  Efficiently - in terms of cost per user

Day 1 – Dev & private beta

Single host

THE server (e.g. Apache, MySQL)

Elastic IP www.example.com

Amazon Route 53 DNS service

Server Image (AMI)

Day 2 - Public beta

We need a bigger server

•  Add larger & faster storage (EBS)
•  Use the right instance type
•  Easy to change instance sizes
•  Not our long-term strategy
•  Will hit a ceiling eventually (the largest instance size)
•  No fault tolerance

Separating web and DB

•  More capacity
•  Scale each tier individually
•  Tailor instance for each tier
   –  Instance type
   –  Storage
•  Security
   –  Security groups
   –  DB in a private VPC subnet

But how do I choose what DB technology I need?

SQL? NoSQL?

Why start with a Relational DB?

•  SQL is versatile & feature-rich
•  Lots of existing code, tools, knowledge
•  Clear patterns to scalability (for read-heavy apps)
•  Reality: eventually you will have a polyglot data layer
   –  There will be workloads where NoSQL is a better fit
   –  Use the right tool for each workload

Key Insight: Relational Databases are Complex

•  Our experience running Amazon.com taught us that relational databases can be a pain to manage and operate with high availability

•  Poorly managed relational databases are a leading cause of lost sleep and downtime in the IT world!

•  Especially for startups with small teams

•  Relational databases: MySQL, Amazon Aurora, PostgreSQL, Oracle, SQL Server
•  Fully managed; zero admin

Improving efficiency

Offload static content
•  Amazon S3: highly available hosting that scales
   –  Static files (JavaScript, CSS, images)
   –  User uploads
•  S3 URLs – serve directly from S3
•  Let the web server focus on dynamic content

Amazon CloudFront
•  Worldwide network of edge locations
•  Cache on the edge
   –  Reduce latency
   –  Reduce load on origin servers
   –  Static and dynamic content
   –  Even a few seconds of caching for popular content can have a huge impact
•  Connection optimizations
   –  Optimize transfer route
   –  Reuse connections
   –  Benefits even non-cacheable content

CloudFront for static & dynamic content

[Diagram: Amazon Route 53 points at a CloudFront distribution. Path patterns css/*, js/* and Images/* go to the S3 bucket (static content); Default (*) goes to the EC2 instance(s) (dynamic content).]
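The path-pattern routing in the diagram can be sketched in a few lines of Python. This is only an illustration of the idea, not CloudFront's implementation: the origin names `s3-bucket` and `ec2-instances` are made up for the example, and the sketch assumes first-match-wins ordering with case-sensitive wildcard matching.

```python
from fnmatch import fnmatchcase

# Ordered (path pattern, origin) pairs taken from the slide; origin names are hypothetical.
BEHAVIORS = [
    ("css/*", "s3-bucket"),
    ("js/*", "s3-bucket"),
    ("Images/*", "s3-bucket"),
]
DEFAULT_ORIGIN = "ec2-instances"  # the Default (*) behavior


def origin_for(path: str) -> str:
    """Return the origin that should serve `path` (first matching pattern wins)."""
    relative = path.lstrip("/")
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(relative, pattern):  # case-sensitive wildcard match
            return origin
    return DEFAULT_ORIGIN


static_origin = origin_for("/css/site.css")
dynamic_origin = origin_for("/checkout")
```

Note that in this sketch `/images/logo.png` (lower-case) falls through to the default origin, because the pattern on the slide is `Images/*`.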

Database caching
•  Faster response from RAM
•  Reduce load on database

[Diagram: Application server, Amazon ElastiCache, RDS database]

1. If data in cache, return result
2. If not in cache, read from DB
3. And store in cache
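The three steps above are commonly called the cache-aside pattern. A minimal sketch follows, assuming a plain dictionary as a stand-in for an ElastiCache client and a callable as a stand-in for an RDS query; the class name `CacheAside` and the counter `db_reads` exist only for this illustration.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Cache-aside reads: check the cache, fall back to the DB, then populate the cache."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._load_from_db = load_from_db   # stand-in for an RDS query
        self._cache: Dict[str, str] = {}    # stand-in for an ElastiCache client
        self.db_reads = 0                   # counts how often the DB was hit

    def get(self, key: str) -> Optional[str]:
        # 1. If data in cache, return result.
        if key in self._cache:
            return self._cache[key]
        # 2. If not in cache, read from DB.
        self.db_reads += 1
        value = self._load_from_db(key)
        # 3. And store in cache (only when the row exists).
        if value is not None:
            self._cache[key] = value
        return value


fake_db = {"user:1": "Alice"}
reader = CacheAside(fake_db.get)
first = reader.get("user:1")    # miss: goes to the DB
second = reader.get("user:1")   # hit: served from the cache
```

A real deployment would also need an expiry or invalidation strategy so cached rows do not go stale; that is left out here.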

Amazon ElastiCache: in-memory cache
•  Simple to deploy
•  Managed
   –  Automatically replaces failed nodes
   –  Patch management
•  Elastic
•  Compatible (Memcached and Redis protocols)

Day 3 – Paying customers

High Availability

[Diagram sequence: Availability Zone a holds the web server, RDS DB instance and ElastiCache node 1, behind www.example.com and Amazon CloudFront, with an S3 bucket for static assets. Availability Zone b is added, then Elastic Load Balancing, then an RDS DB standby.]

Elastic Load Balancing
•  Managed load balancing service
•  Fault tolerant
•  Health checks
•  Distributes traffic across AZs
•  Elastic – automatically scales its capacity

Data layer HA

[Diagram sequence: RDS DB instance and ElastiCache node 1 in Availability Zone a; RDS DB standby in Availability Zone b; ElastiCache node 2 is then added.]

User sessions
•  Problem: often stored on local disk (not shared)
•  Quick fix: ELB session stickiness
•  Solution: DynamoDB

[Diagram: web server, with users shown as logged in / logged out]

Amazon DynamoDB
•  Managed document and key-value store
•  Simple to launch and scale
   –  To millions of IOPS
   –  Both reads and writes
•  Consistent, fast performance
•  Durable: perfect for storage of session data

https://github.com/aws/aws-dynamodb-session-tomcat

http://docs.aws.amazon.com/aws-sdk-php/guide/latest/feature-dynamodb-session-handler.html

Day 4 – Let’s go viral!

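Replace guesswork with elastic IT

[Chart: demand vs. capacity. Startups pre-AWS (traditional capacity): unhappy customers when demand exceeds capacity, wasted $$$ when capacity exceeds demand. AWS Cloud: capacity follows demand.]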

Scaling the web tier

[Diagram sequence: RDS DB instance and ElastiCache node 1 in Availability Zone a, RDS DB standby and ElastiCache node 2 in Availability Zone b, behind www.example.com; additional web servers are added step by step.]

Automatic resizing of compute clusters based on demand

Feature: Details
•  Control: Define minimum and maximum instance pool sizes and when scaling and cool down occurs.
•  Integrated with Amazon CloudWatch: Use metrics gathered by CloudWatch to drive scaling.
•  Instance types: Run Auto Scaling for On-Demand and Spot Instances. Compatible with VPC.

aws autoscaling create-auto-scaling-group --auto-scaling-group-name MyGroup --launch-configuration-name MyConfig --min-size 4 --max-size 200 --availability-zones us-west-2c us-west-2b

[Diagram: Amazon CloudWatch triggers the Auto Scaling policy]
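To make the control knobs concrete, here is a simplified sketch of the kind of decision a scaling policy encodes. It is not how Auto Scaling is implemented. The pool bounds come from the command above; the CPU thresholds, cool-down period and step size are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_size: int = 4             # from --min-size in the command above
    max_size: int = 200           # from --max-size in the command above
    scale_out_cpu: float = 70.0   # hypothetical threshold (percent)
    scale_in_cpu: float = 30.0    # hypothetical threshold (percent)
    cooldown_s: int = 300         # hypothetical cool-down period (seconds)
    step: int = 2                 # hypothetical adjustment per action

    def desired(self, current: int, avg_cpu: float, since_last_action_s: int) -> int:
        """Return the new group size for one evaluation of the policy."""
        if since_last_action_s < self.cooldown_s:
            return current  # still cooling down: ignore the metric
        if avg_cpu > self.scale_out_cpu:
            target = current + self.step
        elif avg_cpu < self.scale_in_cpu:
            target = current - self.step
        else:
            target = current
        # Never leave the configured pool-size bounds.
        return max(self.min_size, min(self.max_size, target))


policy = ScalingPolicy()
after_spike = policy.desired(current=4, avg_cpu=85.0, since_last_action_s=600)
```

The cool-down check comes first so that a metric spike right after a scaling action does not trigger a second action before the new instances have taken load.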

Prerequisite: decompose into small, loosely coupled, stateless building blocks

What does this mean in practice?

•  Only store transient data on local disk
•  Needs to persist beyond a single HTTP request?
   –  Then store it elsewhere

[Diagram: user uploads go to Amazon S3; user sessions go to Amazon DynamoDB; application data goes to Amazon RDS]

Having done that…
•  Having decomposed into small, loosely coupled, stateless building blocks, you can now scale out with ease
•  You can also scale back with ease

Take the shortcut

•  While this architecture is simple, you still need to deal with:
   –  Configuration details
   –  Deploying code to multiple instances
   –  Maintaining multiple environments (Dev, Test, Prod)
   –  Maintaining different versions of the application
•  Solution: Use AWS Elastic Beanstalk

AWS Elastic Beanstalk (EB)
•  Easily deploy, monitor, and scale three-tier web applications and services
•  Infrastructure provisioned and managed by EB
•  You maintain control
•  Preconfigured application containers
•  Easily customizable
•  Support for these platforms: [platform logos]

Loose coupling with SQS

•  Tight coupling: replace it by placing tasks into Amazon Simple Queue Service (SQS)
•  SQS – buffer that protects backend systems
•  Process asynchronously - at own pace
•  Remove delay from latency-sensitive paths

[Diagram: Front End EC2 Instance, Put Message, SQS queue, Get Message, Back End EC2 Instance]
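The put/get flow in the diagram can be illustrated with an in-process queue. This is a sketch of the decoupling idea only: Python's `queue.Queue` stands in for SQS, the `None` sentinel and the order identifiers are inventions of the example, and SQS-specific behaviour such as visibility timeouts and at-least-once delivery is not modelled.

```python
import queue
import threading

task_queue = queue.Queue()  # in-process stand-in for an SQS queue
processed = []


def front_end(order_ids):
    """Put Message: enqueue the work and return immediately."""
    for order_id in order_ids:
        task_queue.put(order_id)


def back_end():
    """Get Message: drain the queue at the worker's own pace."""
    while True:
        order_id = task_queue.get()
        if order_id is None:  # sentinel used only by this sketch to stop the worker
            task_queue.task_done()
            break
        processed.append(f"processed {order_id}")
        task_queue.task_done()


worker = threading.Thread(target=back_end)
worker.start()
front_end(["order-1", "order-2", "order-3"])
task_queue.put(None)   # tell the worker to stop
task_queue.join()      # wait until every queued item has been handled
worker.join()
```

The front end never waits for the back end to finish a task, which is what removes the delay from the latency-sensitive path.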

Day 5 – Add more features

[Diagram: AWS services overview]

Your Applications
•  Mobile: Push Notifications, Mobile Analytics, Cognito
•  Analytics: Kinesis, Data Pipeline, Redshift, EMR
•  Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
•  Application: SQS, SWF, AppStream, Elastic Transcoder, SES, CloudSearch, SNS
•  Deployment & Management: Elastic Beanstalk, OpsWorks, CloudFormation, CodeDeploy, CodePipeline, CodeCommit
•  Security & Administration: CloudWatch, Config, CloudTrail, IAM, Directory Service, KMS
•  Compute: EC2, ELB, Auto Scaling, Lambda, ECS
•  Storage: EBS, S3, Glacier, CloudFront
•  Database: DynamoDB, RDS, ElastiCache
•  Network: VPC, Direct Connect, Route 53
AWS Global Infrastructure

AWS building blocks

Inherently scalable & highly available (automated):
•  Elastic Load Balancing
•  Amazon CloudFront
•  Amazon Route 53
•  Amazon S3
•  Amazon SQS
•  Amazon SES
•  Amazon CloudSearch
•  AWS Lambda
•  …

Scalable & highly available (configurable):
•  Amazon DynamoDB
•  Amazon Redshift
•  Amazon RDS
•  Amazon ElastiCache
•  …

Scalable & highly available with the right architecture:
•  Amazon EC2
•  Amazon VPC

Stay focused as you scale your team

[Chart: split of effort (30% / 70%) between "Your Business" and infrastructure work. On-premise infrastructure: managing all of the "undifferentiated heavy lifting". AWS cloud-based infrastructure: configuring your cloud assets, with more time to focus on your business.]

Day 6 – Growing fast

Scaling Relational DBs

•  Increase RDS instance specs
   –  Larger instance type
   –  More storage / more PIOPS
•  Read Replicas (Master – Slave)
   –  Scale out beyond capacity of single DB instance
   –  Available in Amazon RDS for MySQL, PostgreSQL and Amazon Aurora
   –  Writes => master
   –  Replication lag
   –  Reads with tolerance to stale data => read replica (slave)
   –  Reads with strong consistency requirements => master

Scaling the DB

[Diagram sequence: web servers behind www.example.com; RDS DB instance and ElastiCache node 1 in Availability Zone a; RDS DB standby and ElastiCache node 2 in Availability Zone b; an RDS read replica is then added.]

What if your app is write-heavy?

Challenge: You will eventually hit the write throughput or storage limit of the master node

Solutions:
•  Federation (splitting into multiple DBs based on function)
•  Sharding (splitting one data set across multiple hosts)

Database federation
•  Divide tables into smaller autonomous databases
•  Harder to do cross-function queries
•  Won’t help with single huge functions/tables

[Diagram: Forums DB, Users DB, Products DB]

Sharded horizontal scaling

•  Store subset of rows into each database shard
•  More complex at the application layer
•  No practical limit on scalability
•  Operational complexity

User     ShardID
002345   A
002346   B
002347   C
002348   B
002349   A

[Diagram: Shard A, Shard B, Shard C]
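The User to ShardID table is a directory-based sharding scheme: the application looks up the shard before it talks to any database. A minimal sketch follows, using the rows from the slide; the host names and the helper names `shard_for`, `host_for` and `group_by_shard` are hypothetical.

```python
from collections import defaultdict

# Lookup table from the slide: user ID -> shard ID.
SHARD_DIRECTORY = {
    "002345": "A",
    "002346": "B",
    "002347": "C",
    "002348": "B",
    "002349": "A",
}

# Hypothetical connection targets, one per shard.
SHARD_HOSTS = {
    "A": "shard-a.example.internal",
    "B": "shard-b.example.internal",
    "C": "shard-c.example.internal",
}


def shard_for(user_id):
    """Return the shard ID holding this user's rows."""
    try:
        return SHARD_DIRECTORY[user_id]
    except KeyError:
        raise LookupError(f"user {user_id} is not assigned to a shard") from None


def host_for(user_id):
    """Return the database host the application should query for this user."""
    return SHARD_HOSTS[shard_for(user_id)]


def group_by_shard(user_ids):
    """Group user IDs by shard so a multi-user query hits each shard once."""
    groups = defaultdict(list)
    for user_id in user_ids:
        groups[shard_for(user_id)].append(user_id)
    return dict(groups)


batches = group_by_shard(["002345", "002346", "002348", "002349"])
```

`group_by_shard` hints at the extra application-layer complexity the slide mentions: any query that spans several users has to be split per shard and the results merged.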

NoSQL data stores

•  Trade query & integrity features of Relational DBs for
   –  More flexible data model
   –  Horizontal scalability & predictable performance

DynamoDB
•  Provisioned read/write performance per table
•  Massive and seamless scale
•  Distributed system that can scale both reads and writes
   –  Sharding + Replicas
•  Automatic partitioning of the table, driven by:
   –  Data set size growth
   –  Provisioned capacity increases

Summary

[Diagram: final architecture. www.example.com is served via the Amazon Route 53 DNS service. Across Availability Zones a and b: RDS DB instance, RDS DB standby, three RDS read replicas and ElastiCache nodes 1 to 4. Alongside: DynamoDB, CloudSearch, Lambda, SES and SQS. Label: "No limit".]

A quick review
•  Keep it simple and stateless
•  Make use of managed self-scaling services
•  Multi-AZ and Auto Scale your EC2 infrastructure
•  Use the right DB for each workload
•  Cache data at multiple levels
•  Simplify operations with deployment tools

Next steps?

READ!
•  aws.amazon.com/documentation
•  aws.amazon.com/architecture
•  aws.amazon.com/start-ups

ASK FOR HELP!
•  forums.aws.amazon.com
•  aws.amazon.com/support

Performance testing @ JUST EAT
(Or: DoS yourself every night in production to prove you can take it)

@justeat_tech + @petemounce http://tech.just-eat.com

Please wait while I start my DoS attack...
(Demo - start fake load, show dashboards)

The problem with performance tests & continuous delivery

●  Don’t want to sacrifice continuous delivery & decoupled teams
●  Don’t want performance to suffer

All the usual problems:
●  Bottleneck through single environment
●  Individual tests take too long

Continuously test
●  performance
●  capacity

If we find a problem Thursday night:
1.  don’t run fake load over the weekend
2.  enjoy weekend as normal
3.  fix it next week at leisure

Gamble!

OH: “We deploy tens of small changes a day. I bet we won’t break production...”

OH: “Let’s just do it in production with fake traffic at the same time as customers!”

Not that much of a gamble, really
●  We have tight feedback loops at this point.
●  Engineers being on call are highly invested in not regressing performance.

Pick scenarios we care about

Pick data variations to exercise

Add header(s) to discriminate fake load vs customer load

Run it every night during peak time

If no alerts fire, we’re good
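One way to picture the header-based discrimination is a small helper that tags metrics by traffic type, so dashboards and alerts can separate the two. This is an illustrative sketch, not JUST EAT's implementation: the header name `X-Fake-Load` and the metric naming scheme are assumptions.

```python
FAKE_LOAD_HEADER = "X-Fake-Load"  # hypothetical header name


def is_fake_load(headers):
    """True when the request carries the fake-load marker (header names are case-insensitive)."""
    return any(name.lower() == FAKE_LOAD_HEADER.lower() for name in headers)


def metric_name(base, headers):
    """Suffix a metric so fake and customer traffic are tracked separately."""
    suffix = "fake" if is_fake_load(headers) else "customer"
    return f"{base}.{suffix}"


fake_metric = metric_name("orders.placed", {"x-fake-load": "1"})
customer_metric = metric_name("orders.placed", {"User-Agent": "browser"})
```

The same check can guard side effects that must never happen for fake traffic, such as charging a card or sending an order to a restaurant.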

What did we gain?
●  Continuous confidence in capacity
●  Continuous confidence in dealing with spikes
●  Performance as a 1st-class concern
●  Tests become independent of environments’ data

(Remind me to stop my DoS attack now)
(Demo - stop fake load, show dashboards)

Yes, we’re recruiting too. http://tech.just-eat.com/jobs
