
Treiner AWS infrastructure

Documentation

Prepared for: Treiner.co

Prepared by: Norman Wong Chiew Look

Date: 3 December 2019

Disclosure: I joined Treiner through the glo project, an entrepreneurship program where students gain experience working as a member of a Melbourne startup. Before joining Treiner I signed a third-party contract with Start Global, under which I own all IP and contributions that I built, developed, or contributed while working with Treiner.


Table of Contents

What could be improved
Prerequisite for working with Cloudformation
Intro to Cloudformation
Generic System Design Pattern
AWS Resources Use Case
Official AWS CLI Documentation
Cloudformation templates
Cloudformation Resource identifiers
Cloudformation concepts
    Create Stack vs Deploy
    Change Set
    Update
    Stack Policy
    Drift Detection
    Conditions
    Delete Stack
    Nested Stack
    Macros
    Custom Resources
    Serverless Application Model (SAM)
    Stackset
Fault-Tolerance, High Availability and Disaster Recovery
Data Security
    Data At-Rest
    Data In-Transit
Current Staging AWS infrastructure
Current Prod AWS infrastructure
AWS Infrastructure Cost
    Staging Infrastructure Cost: On-Demand pricing
    Staging Infrastructure Cost: Savings from Full up-front payment for Reserved Instances
    Staging Infrastructure: Using the minimum number of instances and smallest instance types
    Production Infrastructure Cost: On-Demand pricing
    Production Infrastructure Cost: Savings from Partial upfront payment for Reserved Instances
    Production Infrastructure: Using the minimum number of instances to achieve fault tolerance & high availability, and largest instance types
Current Network Architecture
    Reserved IPs
    StagingVPC: Sydney
    ProdVPC: Sydney
Architecture Design consideration
    Traditional server approach vs containerisation vs serverless approach
    Aurora RDS: Serverless aurora vs traditional aurora
    Elasticache: Redis vs Memcached

What could be improved:

1. Harden the Apache web server.

● https://geekflare.com/apache-web-server-hardening-security/

2. Implement an HSTS header on the Apache web server.

● https://stackoverflow.com/questions/47025426/how-to-implement-http-strict-transport-security-hsts-on-aws-elastic-load-balan
● https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Strict-Transport-Security

3. Test the Redis BackupName parameter to check whether an Elasticache cluster can be created from the cluster backup.

4. Consider buying Reserved Instances for EC2, Elasticache or RDS to save money in the long term. Research how to automatically use reserved instances instead of on-demand instances when CloudFormation deploys the resources.

● The minimum number of reserved instances for high availability and fault tolerance is:
○ Zone 1: 1 web server, 1 RDS and 1 Elasticache instance to act as the primary nodes.
○ Zone 2: 1 web server, 1 RDS and 1 Elasticache instance to act as the read replicas.
● An Aurora cluster storage volume spans multiple availability zones and is shared by all primary and Read Replica nodes in the same cluster. When the primary node fails, a read replica is promoted and a new read replica spins up. You can have up to 15 Aurora Read Replicas in 1 cluster.
● Nodes in Elasticache do not share the same storage; each Read Replica syncs with the primary node's storage volume. Auto-failover has already been enabled in the CloudFormation stack. When the primary node fails, a read replica is promoted and a new read replica spins up.


5. Consider implementing Systems Manager to do all operational work like patching, log aggregation, monitoring aggregation from all resources, and passing variables or secrets.

● The Systems Manager Run Command helps you remotely and securely perform on-demand changes by running Linux shell scripts and Windows PowerShell commands on a targeted EC2 instance with the SSM agent installed. With this, you don't need to spin up a bastion host: https://aws.amazon.com/blogs/infrastructure-and-automation/toward-a-bastion-less-world/

6. Consider whether to implement CloudFront to act as a content delivery network.

● https://aws.amazon.com/blogs/startups/how-to-accelerate-your-wordpress-site-with-amazon-cloudfront/

7. Consider implementing VPC peering between the Staging and Production environments, so a resource in the staging environment can access a resource in the production environment and vice versa.

- https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html


8. Consider implementing a blue-green deployment across the Staging and Prod environments. Instead of having a separate Aurora cluster or Redis cluster for the staging and production environments, the CloudFormation template should be redesigned to share the same databases across Staging and Production.

9. For the staging environment, consider whether to replace the NAT gateway with a NAT instance.

● Setting up a NAT instance: https://www.theguild.nl/cost-saving-with-nat-instances/
● You can save up to 55.58 AUD/month using a 1-year All-upfront Reserved T3a micro EC2 instance.
● Cons: you have to do the security patching of the NAT instance yourself.
● Comparison between a NAT instance and a NAT Gateway: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-comparison.html


10. Consider adding CloudWatch alarms to know when the Elasticache primary node and read replica node have reached maximum memory utilisation or Read/Write throughput and a new read replica needs to be added:

● https://www.bluematador.com/blog/how-to-monitor-amazon-rds-with-cloudwatch
● The same applies to Aurora.
● There is horizontal Auto Scaling at the Read Replica level for Aurora: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Integrating.AutoScaling.html
● There is horizontal Auto Scaling at the shard/cluster level for Redis Elasticache: https://aws.amazon.com/about-aws/whats-new/2017/11/amazon-elasticache-for-redis-introduces-dynamic-addition-and-removal-of-shards-while-continuing-to-serve-workloads/

11. If you are actively contributing to the repository hosting the CloudFormation stacks, consider creating a CI/CD pipeline that automatically validates the YAML syntax (https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-validate-template.html), deploys the CloudFormation templates to the S3 bucket hosting the templates, and then runs a shell script to perform the cloudformation deploy command: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-changesets-create.html

12. To avoid the web servers in the private subnets routing traffic to AWS resources outside the VPC (like SQS, S3, etc.) through the public internet, consider creating a VPC endpoint to connect to the resources that are in the AWS cloud but outside the VPC, then updating the private route table to route traffic to the target resource destination (see the sketch below).
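As a rough illustration of item 12, the AWS CLI sketch below creates a gateway endpoint for S3 (which adds the required route to the chosen private route table for you) and an interface endpoint for SQS. The VPC, route table, subnet and security group IDs are hypothetical placeholders, not values from the Treiner stacks.

# Gateway endpoint for S3: traffic to S3 stays on the AWS network
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --service-name com.amazonaws.ap-southeast-2.s3 \
  --route-table-ids rtb-0123456789abcdef0

# Interface endpoint for SQS: places network interfaces in the private subnets
# so the web servers can reach SQS without going through the NAT gateway
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.ap-southeast-2.sqs \
  --subnet-ids subnet-0aaaaaaaaaaaaaaaa subnet-0bbbbbbbbbbbbbbbb \
  --security-group-ids sg-0123456789abcdef0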

Prerequisite for working with Cloudformation

A CloudFormation template can be in YAML or JSON format. You can convert YAML to JSON and JSON to YAML in the CloudFormation Designer. You can also use CloudFormation Designer to auto-generate a diagram of the stack infrastructure.

1. Setting up Visual Studio Code for CloudFormation: https://hodgkins.io/up-your-cloudformation-game-with-vscode

2. Use the following resources for architecting resources in AWS:

Intro to Cloudformation

- https://aws.amazon.com/cloudformation/

- https://aws.amazon.com/cloudformation/features/


Generic System Design Pattern

- https://github.com/donnemartin/system-design-primer#system-design-topics-start-here

AWS Resources Use Case

- https://tutorialsdojo.com/comparison-of-aws-services/

- https://aws.amazon.com/whitepapers/?whitepapers-main.sort-by=item.additionalFields.sortDate&whitepapers-main.sort-order=desc&awsf.whitepapers-content-category=content-category%23intro%7Ccontent-category%23well-arch-framework&awsf.whitepapers-content-type=content-type%23whitepaper

Official AWS CLI Documentation

- https://docs.aws.amazon.com/cli/latest/reference/cloudformation/index.html#cli-aws-cloudformation

- https://docs.aws.amazon.com/cli/latest/reference/configure/

Cloudformation templates

- https://github.com/awslabs/aws-cloudformation-templates

Cloudformation Resource identifiers

- https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-template-resource-type-ref.html

Cloudformation concepts

Create Stack vs Deploy

- https://stackoverflow.com/questions/49945531/aws-cloudformation-create-stack-vs-deploy

- New Stack? Then, use create-stack

- Existing stack? Then use deploy. It is basically the change set command, but you don't need to write any condition to evaluate whether the stack exists (see the sketch below).
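A minimal sketch of the difference, assuming a hypothetical stack name and template file rather than the actual Treiner stacks:

# Brand-new stack: create-stack fails if the stack already exists
aws cloudformation create-stack \
  --stack-name staging-network \
  --template-body file://network.yaml \
  --capabilities CAPABILITY_NAMED_IAM

# New or existing stack: deploy creates and executes a change set for you
aws cloudformation deploy \
  --stack-name staging-network \
  --template-file network.yaml \
  --capabilities CAPABILITY_NAMED_IAM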

Change Set

- https://theithollow.com/2018/01/22/introduction-aws-cloudformation-change-sets/
- If you already have an existing stack deployed and you have rewritten or made changes to the CloudFormation template, use this command to get a preview before updating. It will check that the CloudFormation template is well-formed before updating the deployed stack configuration.
- You don't need this if you're not changing your stack template.
- If you use this command and your update fails, CloudFormation will roll back only the failed update change (see the sketch below).
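A hedged example of the change-set workflow described above; the stack and change-set names are placeholders:

# Build a change set from the edited template without touching the live stack
aws cloudformation create-change-set \
  --stack-name staging-network \
  --change-set-name preview-subnet-change \
  --template-body file://network.yaml

# Review exactly which resources would be added, modified or replaced
aws cloudformation describe-change-set \
  --stack-name staging-network \
  --change-set-name preview-subnet-change

# Apply the change set once the preview looks right
aws cloudformation execute-change-set \
  --stack-name staging-network \
  --change-set-name preview-subnet-change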

Update

- https://www.alexdebrie.com/posts/understanding-cloudformation-updates/

- https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatereplacepolicy.html
- If you use this command and your update fails, you have to delete the entire stack and recreate it from scratch, because you can't update a failed update.

Stack Policy

- A stack policy can be used to protect stack resources from accidental or mistaken updates and deletes (see the sketch below): https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/protect-stack-resources.html
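A minimal sketch of attaching a stack policy, assuming a hypothetical stack name and a hypothetical logical resource ID for the Aurora cluster:

# Allow all updates except ones that would replace or delete the Aurora cluster
aws cloudformation set-stack-policy \
  --stack-name prod-storage \
  --stack-policy-body '{
    "Statement": [
      {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"},
      {"Effect": "Deny", "Action": ["Update:Replace", "Update:Delete"],
       "Principal": "*", "Resource": "LogicalResourceId/AuroraCluster"}
    ]
  }'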

Drift Detection

- https://dzone.com/articles/introduction-to-aws-cloudformation-drift-detection
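A quick way to run drift detection from the CLI; the stack name is a placeholder:

# Kick off drift detection and remember the detection ID
DRIFT_ID=$(aws cloudformation detect-stack-drift \
  --stack-name prod-storage \
  --query StackDriftDetectionId --output text)

# Check whether the run has finished and what the overall drift status is
aws cloudformation describe-stack-drift-detection-status \
  --stack-drift-detection-id "$DRIFT_ID"

# List the individual resources that no longer match the template
aws cloudformation describe-stack-resource-drifts \
  --stack-name prod-storage \
  --stack-resource-drift-status-filters MODIFIED DELETED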

Conditions

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/conditions-section-structure.html

Delete Stack

- https://docs.aws.amazon.com/cli/latest/reference/cloudformation/delete-stack.html

- This command will delete all resources in the stack, unless DeletionPolicy: Retain is set.
- Reasons why stacks fail to delete resources:
  - Before running the Delete command, use Drift Detection on all stacks first to check that the current infrastructure is consistent with the stack templates. Otherwise, deletion will fail.
  - Some resources have a long creation time, for example Aurora (5-10 minutes), Elasticache (10-30 minutes) and CloudFront (15 minutes - 1 hour). If you run the Delete Stack command while these resources are still in the creation phase, the deletion will fail (see the sketch below).
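A small sketch of deleting a stack and finding out why a deletion failed; the stack name is a placeholder:

# Delete the stack and block until CloudFormation reports completion
aws cloudformation delete-stack --stack-name staging-network
aws cloudformation wait stack-delete-complete --stack-name staging-network

# If the wait fails, list the events for the resources that refused to delete
aws cloudformation describe-stack-events \
  --stack-name staging-network \
  --query "StackEvents[?ResourceStatus=='DELETE_FAILED']"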

Nested Stack

- https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-nested-stacks.html
- https://www.trek10.com/blog/cloudformation-nested-stacks-primer/
- Best practice: only have 1 parent level and 1 child level; introducing too many child levels adds too much complexity and makes debugging difficult.

Macros

● https://www.alexdebrie.com/posts/cloudformation-macros/

Custom Resources

● Allow you to define resources or properties not listed in the CloudFormation resource type reference: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-template-resource-type-ref.html
● https://binx.io/blog/2018/08/25/building-cloudformation-custom-resources-is-plain-and-simple/

Serverless Application Model (SAM)

- A preprocessor that transforms the template into a CloudFormation template at deploy time using an AWS-managed macro (the AWS::Serverless transform).

- https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-macros.html

Stackset

● For consistently deploying the same CloudFormation stack across different regions. Example: deploying the same infrastructure in Sydney, Singapore and India (see the sketch below).
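A hedged sketch of deploying the same template to several regions with a stack set; the stack-set name, template file and account ID are placeholders:

# Register the template as a stack set, then stamp out instances per region
aws cloudformation create-stack-set \
  --stack-set-name baseline-infra \
  --template-body file://baseline.yaml

aws cloudformation create-stack-instances \
  --stack-set-name baseline-infra \
  --accounts 111122223333 \
  --regions ap-southeast-2 ap-southeast-1 ap-south-1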


Note: You can test the stack and the different concepts in the AWS console to understand them before writing a Linux shell script, Python, Node, or whatever scripting language can run on the operating system used by GitLab CI.

Fault-Tolerance, High Availability and Disaster Recovery

In the case that a resource, or an entire AWS data centre hosting all the infrastructure in an availability zone, undergoes a major disaster and becomes unavailable, all resources in the production environment have been configured to automatically fail over to the resources hosted in the second availability zone.

● The application load balancer performs a periodic health check at the instance level and sends a signal to the Auto Scaling group to spin up a new web server when it detects an unhealthy instance. The Auto Scaling group automatically terminates all unhealthy instances.

● An Aurora cluster storage volume spans multiple availability zones and is shared by all primary and Read Replica nodes in the same cluster. When the primary node fails, a read replica is promoted and a new read replica spins up. You can have up to 15 Aurora Read Replicas in 1 cluster. If you want to set up a Read Replica in another region, you can use this as a reference: https://stackoverflow.com/questions/46639969/how-can-we-create-cross-region-rds-read-replica-using-aws-cloud-formation-templa


● Nodes in Elasticache do not share cluster storage or individual storage, but data from the primary node is asynchronously synced to the Read Replicas. Auto-failover has already been enabled in the CloudFormation stack. When the primary node fails, a read replica is promoted and a new read replica spins up in the same region. All Read Replicas originating from a cluster can only exist in the region where the cluster was created, so there is no cross-region read replication for Elasticache.

In the case where we need to restore data to a Redis or Aurora cluster, we can pass the most recent snapshot ID to the AuroraBackup parameter and the snapshot name to the RedisBackup parameter in the CloudFormation storage stack (see the sketch below).
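For illustration, a redeploy of the storage stack with the backup parameters overridden might look like the sketch below. The stack name, template file and snapshot identifiers are hypothetical; only the AuroraBackup and RedisBackup parameter names come from the stack itself.

# Recreate the databases from the most recent snapshots
aws cloudformation deploy \
  --stack-name prod-storage \
  --template-file storage.yaml \
  --parameter-overrides \
      AuroraBackup=rds:prod-aurora-2019-12-03-00-05 \
      RedisBackup=prod-redis-backup-2019-12-03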

In the case where the entire Sydney region goes down, we can upload the CloudFormation stacks to an S3 bucket in another region and redeploy the root stack. Snapshots can be copied into another region, so you can restore the data to a database in another region, but the encryption key (KMS) used by the snapshots is constrained to Sydney and may not be recoverable. You may therefore want to look into cross-region management of KMS keys.

Right now, the Route 53 health check is disabled, but if it is configured to be enabled in the future, Route 53 will automatically set up 15 health checkers.

If you choose the default health check interval of 30 seconds, each of the Route 53 health checkers in data centres around the world will send your endpoint a health check request every 30 seconds. On average, your endpoint will receive a health check request about every two seconds. If you choose an interval of 10 seconds, the endpoint will receive a request more than once per second.

Reference: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/health-checks-creating-values.html

Data Security

Data At-Rest

All Aurora and Redis clusters have been configured to use a KMS key to encrypt data at rest in both the staging and production environments.

Any future usage of S3 for storing images, videos, logs, templates, etc. can be encrypted with KMS.

Data In-Transit

Since the redis-cli does not support TLS/SSL tunnelling, stunnel is used as a TLS/SSL tunnel for the connection between the web server and the Redis cluster (a rough sketch follows): https://www.stunnel.org/
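A rough sketch of the stunnel setup on a web server, assuming a placeholder ElastiCache endpoint and the default Redis port 6379; the config path and package name can differ per distribution.

# Point a local stunnel listener at the TLS-enabled Redis endpoint
sudo tee /etc/stunnel/redis.conf > /dev/null <<'EOF'
[redis-cli]
client = yes
accept = 127.0.0.1:6379
connect = master.example-redis.xxxxxx.apse2.cache.amazonaws.com:6379
EOF
sudo stunnel /etc/stunnel/redis.conf

# redis-cli speaks plaintext to localhost; stunnel wraps it in TLS to the cluster
redis-cli -h 127.0.0.1 -p 6379 PING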

The current Aurora configuration uses the rds-ca-2015 TLS/SSL certificate pre-installed on the Aurora instances to verify and establish a trusted TLS/SSL connection between the EC2 instances and the Aurora cluster. The pre-installed cert expires on March 5, 2020. You'll need to upgrade it to rds-ca-2019 manually for now, until AWS changes the default CA cert to the 2019 edition: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html


Current Staging AWS infrastructure


Current Prod AWS infrastructure


AWS Infrastructure Cost

Cost calculator used: https://cloudcraft.co/

Note: EC2, RDS and Elasticache instances can be reserved for 1 year or 3 years, with the option to pay the full cost up-front or partially up-front.

You can save more money by (a) paying the full up-front cost and/or (b) reserving instances for 3 years instead of 1 year.

Staging Infrastructure Cost: On-Demand pricing


Staging Infrastructure Cost: Savings from Full up-front payment for Reserved Instances


Staging Infrastructure: Using the minimum number of instances and smallest instance types


Production Infrastructure Cost: On-Demand pricing


Production Infrastructure Cost: Savings from Partial upfront payment for Reserved Instances


Production Infrastructure: Using the minimum number of instances to achieve fault tolerance & high availability, and largest instance types


Current Network Architecture

Reserved IPs

In any given subnet, like 10.0.0.0/19 for example, there are always five IP addresses that are reserved. This means that these IPs will not be assigned to any of the instances you spin up.

10.0.0.0 — Network address
10.0.0.1 — Reserved by AWS for the VPC router
10.0.0.2 — Reserved for the DNS server
10.0.0.3 — Reserved by AWS for future use
10.0.31.255 — Network broadcast address

Reference: https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Subnets.html

StagingVPC: Sydney

3 availability zones. Therefore, 2 subnet bits (2^2 = 4: for 3 AZs and 1 spare).

Each availability zone needs a private and a public subnet. Thus, 1 subnet bit (2^1 = 2: for 1 public and 1 private subnet).

Parent network: 10.0.0.0/16:

10.0.0.0/18 — AZ A
10.0.0.0/19 — Private
10.0.32.0/19 — Public


10.0.64.0/18 — AZ B
10.0.64.0/19 — Private
10.0.96.0/19 — Public

10.0.128.0/18 — AZ C
10.0.128.0/19 — Private
10.0.160.0/19 — Public

10.0.192.0/18 — Spare

CIDR network: 10.0.0.0/19

SubnetName       SubnetID      1st Avail IP   Last Avail IP (BC-1)   Broadcast Address
PrivateSubnet1   10.0.0.0      10.0.0.4       10.0.31.254            10.0.31.255
PublicSubnet1    10.0.32.0     10.0.32.4      10.0.63.254            10.0.63.255
PrivateSubnet2   10.0.64.0     10.0.64.4      10.0.95.254            10.0.95.255
PublicSubnet2    10.0.96.0     10.0.96.4      10.0.127.254           10.0.127.255
PrivateSubnet3   10.0.128.0    10.0.128.4     10.0.159.254           10.0.159.255
PublicSubnet3    10.0.160.0    10.0.160.4     10.0.191.254           10.0.191.255

CIDR Range: 10.0.0.0/19
Netmask: 255.255.224.0
(2 ^ (32-19)) - 5 = (2 ^ 13) - 5 = 8192 - 5 = 8187 available IPs for you to use

Route Table

Public


10.0.0.0/16 — Local

0.0.0.0/0  —  Internet Gateway

Internal-only (ie, Protected / Private)

Elastic IP - NAT Gateway

Security Group

Bastion-host:

SSH TCP 22 0.0.0.0/0, ::/0

*Unless Treiner has its own corporate IP range, I recommend starting up a bastion host in the public subnet only when you need it. Otherwise, use Systems Manager for any sort of operational work.

ApplicationLoadBalancers:

HTTP TCP 80 0.0.0.0/0, ::/0

HTTPS TCP 443 0.0.0.0/0, ::/0

Web-Servers:

HTTP TCP 80 Source: ApplicationLoadBalancers

HTTPS TCP 443 Source: ApplicationLoadBalancers

SSH TCP 22 Source: Bastion-host

AuroraRDS:

MYSQL/Aurora TCP 3306 Web-Servers

Redis-elasticache

Custom TCP 6879 Source: Web-Servers

ProdVPC: Sydney

3 availability zones. Therefore, 2 subnet bits (2^2 = 4: for 3 AZs and 1 spare).

Each availability zone needs a private and a public subnet. Thus, 1 subnet bit (2^1 = 2: for 1 public and 1 private subnet).


Parent network: 10.1.0.0/16:

10.1.0.0/18 — AZ A
10.1.0.0/19 — Private
10.1.32.0/19 — Public

10.1.64.0/18 — AZ B
10.1.64.0/19 — Private
10.1.96.0/19 — Public

10.1.128.0/18 — AZ C
10.1.128.0/19 — Private
10.1.160.0/19 — Public

10.1.192.0/18 — Spare

CIDR network: 10.1.0.0/19

SubnetName       SubnetID      1st Avail IP   Last Avail IP (BC-1)   Broadcast Address
PrivateSubnet1   10.1.0.0      10.1.0.4       10.1.31.254            10.1.31.255
PublicSubnet1    10.1.32.0     10.1.32.4      10.1.63.254            10.1.63.255
PrivateSubnet2   10.1.64.0     10.1.64.4      10.1.95.254            10.1.95.255
PublicSubnet2    10.1.96.0     10.1.96.4      10.1.127.254           10.1.127.255
PrivateSubnet3   10.1.128.0    10.1.128.4     10.1.159.254           10.1.159.255
PublicSubnet3    10.1.160.0    10.1.160.4     10.1.191.254           10.1.191.255

CIDR Range: 10.1.0.0/19
Netmask: 255.255.224.0
(2 ^ (32-19)) - 5 = (2 ^ 13) - 5 = 8192 - 5 = 8187 available IPs for you to use

Route Table

Public

10.1.0.0/16 — Local

0.0.0.0/0  —  Internet Gateway

Internal-only (ie, Protected / Private)

Elastic IP - NAT Gateway

Security Group

Bastion-host:

SSH TCP 22 0.0.0.0/0, ::/0

*Unless Treiner has its own corporate IP range, I recommend starting up a bastion host in the public subnet only when you need it. Otherwise, use Systems Manager for any sort of operational work.

ApplicationLoadBalancers:

HTTP TCP 80 0.0.0.0/0, ::/0

HTTPS TCP 443 0.0.0.0/0, ::/0

Web-Servers:

HTTP TCP 80 Source: ApplicationLoadBalancers

HTTPS TCP 443 Source: ApplicationLoadBalancers

SSH TCP 22 Source: Bastion-host

AuroraRDS:

MYSQL/Aurora TCP 3306 Web-Servers

Redis-elasticache

Custom TCP 6879 Source: Web-Servers


Architecture Design consideration

Traditional server approach vs containerisation vs serverless approach

The serverless approach is a paradigm for designing applications natively for the cloud. A completely serverless approach would have been considered if application development were just starting out, since the application could then run on API Gateway and Lambda functions without the need for an EC2 web server.

The traditional server approach is suitable for lifting and shifting an on-premise application, or an application that was already in development but not built natively for the cloud. This is especially suitable for the Laravel app, as it is an MVC monolith that had been in development for 2-4 months before I began designing the AWS architecture.

Since the application was already using Vagrant for consistent development across developers' machines, Docker, which here serves essentially the same purpose as Vagrant, was not considered.

But if deploying Docker is needed in the future, you can use AWS Fargate, which is an AWS-managed service for container orchestration. This means you don't have to manage any servers or do any security patching.

Alternatively, you can use AWS ECS, which is self-managed Docker orchestration on an EC2 cluster. With the current deployment approach, you need to dedicate one EC2 instance entirely to act as a web server or any other server, but with ECS you can have multiple containerised web servers hosted on one EC2 instance that are completely isolated from each other.

Aurora RDS: Serverless aurora vs traditional aurora

Serverless Aurora only supports the aurora-mysql engine up to version 5.6.x, which had an issue with the Laravel application that could only be fixed by an engine upgrade to 5.7.x.

Elasticache: Redis vs Memcached

Redis supports in-memory data persistence and provides two major persistence policies, RDB snapshots and the AOF log, whereas Memcached does not support data persistence operations: https://medium.com/@Alibaba_Cloud/redis-vs-memcached-in-memory-data-storage-systems-3395279b0941

This means that data stored in the Redis cache survives reboot(s) by default, so there will be fewer write requests for writing the user sessions to the Redis cluster and more read requests for retrieving them. It also means we don't have to worry about adding more Redis shards/clusters to gain more write nodes.

Thus, we only need to worry about scaling out at the Redis Read Replica level and not at the shard/cluster level.