DPD:AWS Developer Training

TRANSCRIPT

AWS:ACBJ:DPD

Our new AWS environments from the developer perspective.

- Welcome and thank you for taking time from your busy schedule to listen to me talk for a little while.

- A lot of work has gone into the cloud forklift

- A lot of changes have been made to how ops and developers do things, and little of it has been communicated to non-cloud devs here.

- So I'm going to cover some of it today: the AWS services we're using, how to develop against them, how staging/prod compares with on-prem, how we're deploying now, as well as future plans for infrastructure.

- Feel free to stop me if you have questions along the way, and I will stop periodically for questions at the end of major sections.

AWS:Services

EC2, S3, SQS, ElastiCache, Lambda, CloudFront, RDS, Redshift, CodeDeploy

- AWS provides a lot of services.

- We use a lot of them already.

- Let's cover some of the services we use and how we use them.

AWS:EC2

Elastic Compute Cloud: virtual servers (instances)

- Elastic Compute Cloud

- Virtual Machines, referred to as INSTANCES

- Many instance types, optimized for the type of workload: t2 (burstable), m4 (general purpose), c4 (compute), x1/r3 (memory).

- It's all about cost and aligning instance type to workload.

- Very similar to our on-prem servers, running CentOS, Apache, PHP, Elasticsearch, etc.

AWS:AMI

Amazon machine image

- Amazon Machine Images

- Usually shortened to AMI.

- For us, this is a pre-built Linux server image that all new EC2 instances start out as.

- We create our own, using packer.io (HashiCorp).

- We pre-build our AMIs with a certain version of Apache, PHP, Elasticsearch, Logstash, and a bunch of other software we use.

- The build pipeline can produce AMIs in AWS, as well as Vagrant boxes.

- This pipeline provides us with a much quicker provisioning time, both in Vagrant as well as in Autoscale groups.

- We didn't previously have this sort of automation, outside of some Ansible-based post-install configuration steps when we manually built servers.

AWS:AutoScaling

Automatically scaling EC2

- Auto Scaling Groups

- This sub-service of EC2 allows us to define instances as part of a group, with a known recipe for instance provisioning, called a Launch Config

- This group has a Minimum, Maximum, and Desired number of instances.

- The Desired number can change based on alarms, such as CPU or network utilization, causing more EC2 instances to be launched to handle increased load.

- Scaling can also be timed, if a lot of traffic is expected at a certain time.
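- A minimal sketch of both approaches via the API, using boto3; the group name and numbers are hypothetical, since our real groups and alarms are defined by CloudFormation:

```python
import boto3

autoscaling = boto3.client("autoscaling")
GROUP = "acbj-www-production-asg"  # hypothetical group name

# Simple scaling policy: add two instances when its CloudWatch alarm fires.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName=GROUP,
    PolicyName="scale-out-on-cpu",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=2,
    Cooldown=300,
)
print(policy["PolicyARN"])  # attach this ARN to a CloudWatch alarm action

# Timed scaling: raise the Desired count ahead of an expected traffic spike.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName=GROUP,
    ScheduledActionName="weekday-morning-rush",
    Recurrence="0 11 * * 1-5",  # cron syntax, UTC
    MinSize=4,
    DesiredCapacity=8,
    MaxSize=16,
)
```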

AWS:ElasticLoadBalancing

Dynamic load balancer

- Elastic Load Balancers

- Usually shortened to ELB

- On-prem we use a pair of redundant Netscalers to load-balance our applications.

- They hand out traffic evenly to a group of application servers, the back end.

- Autoscale groups can be linked to Elastic Load Balancers.

- So, when an Autoscale group adds a new EC2 instance to itself, it also gets added to the ELB.

- Recently, Application Load Balancers have been added, which allow for layer 7 routing based on a ruleset.

- ELBs have health checks to make sure the application is performing well enough to receive traffic from users; ours hits keepalive.php.
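- For illustration, a health check like that could be configured through the API roughly like this; the load balancer name and thresholds are assumptions, and in practice this lives in CloudFormation:

```python
import boto3

elb = boto3.client("elb")  # classic Elastic Load Balancing API

elb.configure_health_check(
    LoadBalancerName="acbj-www-production-elb",  # hypothetical name
    HealthCheck={
        "Target": "HTTP:80/keepalive.php",  # the keepalive check mentioned above
        "Interval": 30,           # seconds between checks
        "Timeout": 5,             # seconds to wait for a response
        "UnhealthyThreshold": 2,  # failures before pulling the instance
        "HealthyThreshold": 2,    # successes before adding it back
    },
)
```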

AWS:CloudWatch

Cloud & network monitoring

- CloudWatch collects metrics on all AWS resources by default: EC2, ELB, SQS, etc.

- EC2: CPU, Disk, Network Utilization

- ELB: latency, 400/500s, HealthyHosts

- We also ship our logs to CloudWatch from the application servers, for centralized access.

- There are also alarms, based on metrics or log filters, that can alert us through PagerDuty (a sketch follows this section).

- This service performs a similar function to our existing on-prem Nagios and Cacti services, as well as the logview ELK stack.
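- As a sketch, an alarm on ELB latency that pages us might look like this; the alarm name, threshold, and the SNS topic feeding PagerDuty are assumptions:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="www-production-elb-latency",
    Namespace="AWS/ELB",
    MetricName="Latency",
    Dimensions=[{"Name": "LoadBalancerName", "Value": "acbj-www-production-elb"}],
    Statistic="Average",
    Period=60,                # evaluate one-minute averages
    EvaluationPeriods=5,      # for five consecutive minutes
    Threshold=1.0,            # seconds of back-end latency
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:pagerduty-alerts"],  # hypothetical topic
)
```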

AWS:S3

Simple storage service

- Simple Storage Service

- Highly available object storage platform; 11 nines of durability, globally distributed.

- The unit is a bucket

- Technically flat, but prefixes allow organization by folder

- It is not a filesystem; listing lookups are expensive.

- We're using S3 for PDF delivery, in conjunction with CloudFront, and to store media assets for the medialibrary.

- Multiple storage classes: standard, infrequent access, and reduced redundancy.

- Each succeeding class is cheaper, with usage caveats.

- We're using S3 very much like our on-prem Gluster storage platform.
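- A quick sketch of how code touches S3: upload under a prefix, list by prefix, and hand out an expiring link. The bucket, key, and storage-class choices here are illustrative, not our exact setup:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "acbj-medialibrary-production"  # hypothetical bucket name

# Upload under a prefix; prefixes act like folders even though the bucket is flat.
with open("charlotte.pdf", "rb") as f:
    s3.put_object(
        Bucket=BUCKET,
        Key="pdf-editions/2017/04/charlotte.pdf",
        Body=f,
        StorageClass="STANDARD_IA",  # infrequent-access class, cheaper with caveats
    )

# List by prefix instead of walking the whole bucket; full listings are expensive.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="pdf-editions/2017/04/")

# Expiring link for PDF delivery (valid for one hour).
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": "pdf-editions/2017/04/charlotte.pdf"},
    ExpiresIn=3600,
)
```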

AWS:RDS

Relational database service

- Relational Database Service

- Aurora, MySQL, MariaDB, PostgreSQL, Oracle, SQL Server; managed by AWS.

- Built-in failover and disk scaling when using their own flavor, Aurora.

- AWS manages upgrades and patching

- All of our databases are on RDS

- Snapshot and restore via API

- RDS is replacing our on-prem hand-built and managed cluster of MySQL databases.
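- Snapshot and restore really is just an API call; a hedged sketch with made-up instance identifiers (Aurora uses the cluster-level equivalents):

```python
import boto3

rds = boto3.client("rds")

# Snapshot production...
rds.create_db_snapshot(
    DBInstanceIdentifier="bizjournals-db-production",
    DBSnapshotIdentifier="bizjournals-db-production-20170416",
)

# ...and restore it as a fresh staging instance.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="bizjournals-db-staging",
    DBSnapshotIdentifier="bizjournals-db-production-20170416",
    DBInstanceClass="db.r3.large",  # hypothetical size
)
```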

AWS:ElastiCache

Memcache or redis

- ElastiCache is a managed caching service.

- Protocol-compatible with memcached.

- Also supports Redis.

- When used with the AWS PHP extension, the service can scale to match the workload without code changes.

- ElastiCache has replaced our on-prem memcached servers.

AWS:SQS

Simple queuing service

- Simple Queuing Service

- Replaced our on-prem RabbitMQ cluster

- Scalable and highly available.

- Recently announced FIFO queues with guaranteed ordering and no duplicates (not available in our region yet).

- With those caveats in mind, we recommend including timestamps and unique IDs in SQS messages to deal with duplicate and out-of-order delivery (see the sketch at the end of this section).

- CMS, medialibrary, and track.bizjournals.com use this to scale Elasticsearch upsert writes. BI is using SQS as well.
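- A minimal sketch of that recommendation, with a hypothetical queue URL and message fields: the producer stamps every message, and the consumer ignores anything it has already seen or that is older than what it last applied:

```python
import json
import time
import uuid

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/cms-es-upserts"  # hypothetical

# Producer: include a unique id and timestamp in every message body.
sqs.send_message(
    QueueUrl=QUEUE_URL,
    MessageBody=json.dumps({
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "action": "upsert",
        "document_id": 12345,
    }),
)

# Consumer: skip duplicates and stale messages, then delete what was handled.
seen_ids, last_applied = set(), 0.0
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    body = json.loads(msg["Body"])
    if body["id"] not in seen_ids and body["timestamp"] > last_applied:
        seen_ids.add(body["id"])
        last_applied = body["timestamp"]
        # ...apply the Elasticsearch upsert here...
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```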

AWS:CodeDeploy

Automated & trackable artifact deployment

- Code deployment via API

- Handles reverts if a deployment fails on the first targets

- Has a nice API to track deployments per application and deployment group (which we think of as environment).

- Has hooks into Autoscale groups, so that when a group provisions a new server, it automatically gets the code its group's peers are running, and the instance is only added to the ELB when the deployment is finished.

- CodeDeploy has replaced our rsync-based deployment scripts used on-prem.

- I will cover our usage of this in more depth later in this presentation.

AWS:CloudFront

Content distribution network

- Content Distribution Network

- You can use your own HTTP servers or S3 buckets as origins.

- Supports HTTP2 and HTTPS, as well as RTMP.

- Edge locations all over the world, with geo-based DNS resolution, minimize transfer time when the cache is hit.

- Our PDF editions are delivered this way, straight from an S3 origin, with expiring links called pre-signed URLs.

- assets.bizjournals.com was one of the first AWS services we utilized during the forklift.

- Integral part of our future plans for full-page caching.

- Replaced our previous CDN provider, Akamai.

AWS:Redshift

Data-warehouse solution

- Data warehouse storage, queryable via SQL. Cheaper and more performant for big data sets than plain RDS.

- A lot of third-party BI platforms integrate with it: Domo, Tableau.

- Pushing clickstream data to it from Omniture now.

AWS:Lambda

Serverless compute engine

- Serverless Compute Engine

- Executes blocks of code; supports Java, Python, Node.js, and now C#.

- Limited to 5 minutes of execution, 1.5 GB of memory, 1024 file descriptors, and 1024 process threads.

- Triggered on a schedule, like cron, or by events from S3 or SQS/SNS (queue services).

- Combined with API Gateway, it's a cheap and fast way to create REST services without having to manage servers.

- Example: we copy new and changed medialibrary objects from the production S3 bucket to staging, based on S3 events (see the sketch at the end of this section).

- Example: some RDS staging snapshot restores are done this way.

- Some monitoring automation, such as managing server notification groups in New Relic.
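- A hedged sketch of the S3-copy example above, as a Python Lambda handler; the staging bucket name is made up, and the real function may differ:

```python
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")
STAGING_BUCKET = "acbj-medialibrary-staging"  # hypothetical name

def handler(event, context):
    """Copy new/changed medialibrary objects from production to staging.

    Triggered by ObjectCreated events on the production bucket.
    """
    for record in event["Records"]:
        source_bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # event keys are URL-encoded
        s3.copy_object(
            Bucket=STAGING_BUCKET,
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
```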

AWS:CloudFormation

Infrastructure as code

- Infrastructure as Code

- We can define our infrastructure with code, more specifically as a JSON document, in a parameterized way.

- The basic building blocks of a CloudFormation document are parameters and resources.

- Some example parameters are: instance type and size, environment name, keypair name, AMI, etc.

- Some example resources are: an ELB, an Autoscale Group (with tags built from parameters), CloudWatch Alarms to watch system metrics, etc.

- This service allows us to quickly create replicated clusters of application servers, in a particular environment, with a particular AMI, via API call.
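- That API call might look roughly like this; the template URL and parameter names are assumptions, not our actual template:

```python
import boto3

cloudformation = boto3.client("cloudformation")

cloudformation.create_stack(
    StackName="www-staging",
    TemplateURL="https://s3.amazonaws.com/acbj-cloudformation/app-cluster.json",  # hypothetical
    Parameters=[
        {"ParameterKey": "Environment", "ParameterValue": "staging"},
        {"ParameterKey": "InstanceType", "ParameterValue": "c4.large"},
        {"ParameterKey": "AmiId", "ParameterValue": "ami-0123456789abcdef0"},
        {"ParameterKey": "KeyPairName", "ParameterValue": "Development"},
    ],
    Capabilities=["CAPABILITY_IAM"],  # needed when the template creates IAM roles
)
```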

AWS:Route53

Domain name system

- Amazon's DNS system.

- Managed via API.

- Can resolve based on geographic location, or service response time.

- Can slowly transition between two destinations, using weighting, maybe to test a new infrastructure size.

- DNS changes are much quicker now than in the past, when IT owned this.
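- A sketch of that weighted transition via the API; the hosted zone ID, record name, and ELB hostnames are placeholders:

```python
import boto3

route53 = boto3.client("route53")

# Send 10% of traffic to the new cluster, 90% to the current one.
route53.change_resource_record_sets(
    HostedZoneId="Z123EXAMPLE",  # hypothetical zone id
    ChangeBatch={"Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": {
            "Name": "www.example-zone.com.", "Type": "CNAME",
            "SetIdentifier": "current-cluster", "Weight": 90, "TTL": 60,
            "ResourceRecords": [{"Value": "current-elb.us-east-1.elb.amazonaws.com"}],
        }},
        {"Action": "UPSERT", "ResourceRecordSet": {
            "Name": "www.example-zone.com.", "Type": "CNAME",
            "SetIdentifier": "new-cluster", "Weight": 10, "TTL": 60,
            "ResourceRecords": [{"Value": "new-elb.us-east-1.elb.amazonaws.com"}],
        }},
    ]},
)
```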

AWS:VPC

Virtual private cloud

- Our private network environment in AWS

- We have it split into internal subnets, 10.220.x.x, and public subnets, where the IPs are dynamic.

- We're keeping ELBs only in the public subnets; all the app servers and databases are in the private subnets.

- Security Groups are very much like firewall rules: IPs and ports, but also other SGs. acbj.production can always talk to itself.

- We have a VPN from ACBJ offices into the private subnets in our VPC, for your dev access.

- All requests out of our VPC go through NAT Gateway services that always have known IPs, for whitelisting with partner organizations.

AWS Services

Questions?

How do you use these services (in development)?

- Development code needs to access these services

- Production code needs to access these services

- OPS needs a way to track who uses what and to protect us all from ourselves

- Additionally, Devs need to be able to get into AWS dev instances via SSH.

Developer Workflow

Vagrant with AWS keys / Shared AWS dev instances / SSH access

- Let's talk about your workflows with these services and resources:

- Vagrant boxes and AWS Service Keys

- Shared AWS Dev EC2 Instances

- SSH Access to EC2 instances

Vagrant

Matches the AMI software stack / Fewer bugs in production / Keys to access AWS services

- Vagrant allows development in a sandbox with the exact same application stack as our production infrastructure.

- The new Vagrant box is built using the same pipeline that we use to build AMIs.

- Less chance of environment-related errors making it to production.

- Less time you have to spend playing with supporting software as a dev, more time coding.

- Less time spent by ops supporting your random environment; in fact, we won't anymore.

Vagrant AWS Keys

Services, not servers

Install as root and vagrant / Never in code

- To use AWS services in development, you will need AWS access keys. These are keys to services, not servers; server/instance access I will cover in a few slides.

- Take a look at the provided cheatsheet for an example key installation: Vagrant AWS Key Install.

- The keys come in two parts: an access key ID and a secret access key.

- Each developer has a keypair.

- Your keypair needs to be installed as both the root and the vagrant user on the Vagrant box.

- EC2 instances authenticate against AWS Services another way, so NEVER put these keys in your code.

- Also, don't share your keys; they all have the same permissions, so there's no reason to.

Shared Dev Servers in AWS

Manually created for now / Plans to automate as an API / SSH access

- Manually created for now; magento, media, jenkins.

- We plan to automate the process so that in the future you can provision one for yourself via API call.

- Connecting to these servers via ssh is done via a shared ssh key: Development.pem

- Shared, so everyone uses the same key.

- Production and staging infrastructure uses a different ssh key

- See cheatsheet for an example of how you connect via .pem keys

- If you need a copy let us know.

AWS:IAM

Identity and access management

Roles, Policies & Users/Groups

- Identity and Access Management.

- AWS's access control system, naturally another API.

- Very granular, and via CloudTrail, allows us to audit all service access at the user level.

- Integrated with most AWS services that you will interact with.

- Big concepts are POLICIES, ROLES, USERS, and GROUPS

IAM:Policies

- The heart of access control is the policy.

- These are JSON documents granting certain ACTIONS on certain RESOURCES.

- ARN: Amazon Resource Name.

- See this example granting access to an S3 bucket (a sketch follows this section).

- AWS Services have a default DENY (firewalls/security groups too)
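- The slide's S3 example isn't captured in this transcript, so here is a hedged reconstruction of that kind of policy, attached to an EC2 role via the API; the role, policy, and bucket names are made up:

```python
import json

import boto3

iam = boto3.client("iam")

# Allow read/write on a single bucket, identified by its ARN.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::acbj-medialibrary-production",    # the bucket itself
            "arn:aws:s3:::acbj-medialibrary-production/*",  # objects inside it
        ],
    }],
}

# Anything not explicitly allowed stays denied by default.
iam.put_role_policy(
    RoleName="medialibrary-production-ec2-role",   # hypothetical role
    PolicyName="medialibrary-s3-access",
    PolicyDocument=json.dumps(policy),
)
```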

IAM:Users vs. Roles

Policies are assigned to both for access to services

EC2: Roles / Users: Dev Environments

- These policies are then applied to roles and users or groups

- EC2 instances have Roles, which have policies

- Users/groups just have policies

Application POV

- When an application using the AWS SDK needs access to a service, it follows this flow

- First, it looks within its application environment for configured keys

- Then it looks to see if it's on an EC2 instance, and if that instance's roles permit the access.

- Reiterate: NO KEYS IN CODE! It's a security concern in the case of compromise, and it hinders our ability to rotate or decommission keys when employees leave.
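- Our application code is PHP, but the credential chain works the same in every AWS SDK; an illustrative Python sketch with a made-up bucket and key:

```python
import boto3

# No keys in code. The SDK resolves credentials on its own:
# your installed access keys in Vagrant, the instance's IAM role on EC2.
s3 = boto3.client("s3")

# Works unchanged in both places, with zero credential handling in the app.
obj = s3.get_object(Bucket="acbj-medialibrary-production", Key="some/object/key")
```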

Access and Workflow

Questions?


Cloud vs. On-Premises

On-premises: static servers, single datacenter, hand/script-built servers, slow server provisioning, trustable storage

AWS: autoscaling, multi-datacenter, infrastructure as code, quick provisioning, disposable instances

- Now let's compare our on-prem vs. cloud production and staging environments.

- Multi-datacenter is built in, for more resiliency.

- Instances are disposable and autoscaled, which has a particular set of implications.

- We can more easily and quickly set up replicable environments.

- Monitoring and log access have changed a lot.

On-Premises Architecture

- Our previous setup

- 20 static servers, some bare metal, some VMs.

- Serving multiple applications on the same servers, sharing the same database cluster, behind a single loadbalancer.

- Problems with a single app affected all of them.

- Network capacity was something to be considered; see the previous point.

AWS/Cloud Architectures

- This is a diagram of our www.bizjournals.com infrastructure in AWS.

- Each application has its own servers and load balancer.

- Grey blocks are CloudFormation-defined infrastructure, parameterized for what changes, such as instance type, AMI, and application environment.

- Upon instance creation, scripts on the instance read the instance metadata from the EC2 API and then write the application environment to a file that our Apache configs and the code can access (a sketch follows this section).

- Based on this environment definition, the code then knows which datastores to access, so production data isn't changed by staging apps and vice versa.

- Apps are less coupled to each other's performance, which means more uptime generally.

- Autoscaling means clusters respond to increased usage and load.
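- A minimal sketch of the environment-detection step mentioned above; the tag name, file path, and the use of instance tags are assumptions about how such a script could work:

```python
import urllib.request

import boto3

# The instance metadata service answers on a link-local address.
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
).read().decode()

# Look up this instance's Environment tag (set by CloudFormation).
ec2 = boto3.client("ec2")
tags = ec2.describe_tags(Filters=[
    {"Name": "resource-id", "Values": [instance_id]},
    {"Name": "key", "Values": ["Environment"]},
])
environment = tags["Tags"][0]["Value"]  # e.g. "production" or "staging"

# Written once at boot; Apache config and application code both read it.
with open("/etc/app_environment", "w") as f:
    f.write(environment)
```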

Environmental Separation

Keeping staging and production separate

but...

also keeping them as similar as possible

- We still have environmental separation, obviously, and we've made it better.

- Our infrastructure needs to be as similar as possible between staging and production, only differing in what environment gets passed to the application.

- Our datastores and other supporting services, whether a database or an S3 bucket, also need to be as similar as possible between environments, but still separate.

- So we have some naming standards that help with this automation, and with visibility during troubleshooting scenarios.

- AWS services provision really quickly and scale easily, so there's less cost in managing multiple versions of everything, a big win.

- We're building our infrastructure with code, which also makes it cheap and easy to provision multiple controllable environments.

- These ideas are some of the biggest wins of the move to AWS.

Naming Standards: Datastores

<app>-<datastore>.bizj-<environment>.com

cms-elasticsearch.bizj-production.com

Exception to the rule (naturally):

bizjournals-db-(read|write).bizj-production.com

- See the cheatsheet for examples of these naming schemes.

- No one version of anything; always environment-specific versions.

- Speeds up troubleshooting when you need to verify data between environments.

- Datastores: MySQL, Elasticsearch, Solr, etc.

- Keep an eye out for aberrations, and let's try to fix them.

Naming Standards: AWS Services

acbj-<app>-<environment>

s3://acbj-medialibrary-production

- See the cheatsheet for examples of these naming schemes.

- No one version of anything; always environment-specific versions.

- AWS services: SQS, S3, Lambda, etc.

AutoScaling, Disposable Instances

AWS shuts down instances / Autoscaling adds instances / Don't store app state on the filesystem / Use a datastore for app state

Think about failure

Databases might fail over mid-transaction.

Instances might fail before writing state.

Think about failure more...

Monitoring & Logging

Instances come and go

You can't trust their filesystems?

What about monitoring them, and logs?

Monitoring

Server monitoring is agent-based, so new instances provision themselves.

Application performance monitoring is also agent-based.

Synthetics are manually created to measure client-side performance.

Application health, front-end, back-end, and all other critical monitoring is in New Relic, accessible to everyone.

- On-prem, instances were manually configured in a centralized polling server, which gathered data and alerted on certain data conditions.

- Now instances are built with agents that turn on and then push data to a centralized system for alerting.

- Servers: CPU, mem, disk, etc

- APM: back-end code transactions and performance

- Synthetics: real-browser testing, much like Selenium, written in Node, so you can do asserts and manage headers and so forth.

Logging

Logs get shipped to Cloudwatch

See CHEAT_SHEET.md for details on access.

Access from the command line for developers, so grep and all the regular tools work.
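Command-line access is what the cheat sheet covers; the same logs are also reachable through the API, for example (the log group name and filter pattern here are hypothetical):

```python
import boto3

logs = boto3.client("logs")

# Roughly the API equivalent of grepping an Apache error log.
resp = logs.filter_log_events(
    logGroupName="www-production/apache-error",  # hypothetical group
    filterPattern="PHP Fatal",
    limit=50,
)
for event in resp["events"]:
    print(event["message"])
```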

Questions?

How are we deploying now?

Bob does it, right?

No more rsync.

New scripts, using CodeDeploy

New approach to what gets deployed

Artifact

Application code

Configs: Apache, Elasticsearch, logrotate

The exact same artifact in every environment

- This is a new idea for us, but backed by industry best practices for stored builds.

- We build an artifact, containing the code itself and all the software and systems configuration that support that code.

- These parts are turned into an artifact, which can run in any environment.

- When CodeDeploy receives the API call, it references the artifact location in S3.
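- As an illustration, that API call could look like this; the application, group, bucket, and artifact names are placeholders:

```python
import boto3

codedeploy = boto3.client("codedeploy")

# Deploy an artifact stored in S3 to one deployment group (environment).
codedeploy.create_deployment(
    applicationName="www",
    deploymentGroupName="production",
    revision={
        "revisionType": "S3",
        "s3Location": {
            "bucket": "acbj-deploy-artifacts",
            "key": "www/www-build-142.tar.gz",
            "bundleType": "tgz",
        },
    },
)
```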

DPD Codedeploy Process

- Then, the CodeDeploy agent runs on every instance in that deployment group


Codedeploy Implications

Deployments are a little slower

They can run in parallel

Trackable: cause/initiator, app, env

Revertible without dev help

Whats next?

Working on automated dev environment creation

Research other AWS services yourself and talk to your dev architect about them.

Questions?

- We plan to implement an API or script that allows the dynamic launching of new environments that expire.

- Feel free to investigate other AWS services and mention them to your team's dev architect for consideration.

- Any questions?