summit - automate best practices and operational health...

Heitor Lessa, Solutions Architect @ AWS

Stephen Gran, Senior Technical Architect @ Piksel

June 28th

Automate best practices and operational health for your AWS resources with Trusted Advisor and AWS Health

What to expect from this session

• Learn about Trusted Advisor best practices and how to safely automate them in your environment.

• Get familiar with AWS Health and the Personal Health Dashboard (PHD).

• Learn how to automate remediation actions and customize Health alerts.

What’s in your AWS account(s)?

[Diagram: Availability Zone #1 containing www.example.com, Elastic Load Balancing, a web/app server EC2 instance in Auto Scaling Group #1, and a database]

As you expand and change, entropy starts increasing:

Too much complexity! Time to optimize!

So what is Trusted Advisor (TA)?

AWS Trusted Advisor provides best practices (or checks) in four categories: cost optimization, security, fault tolerance, and performance improvement.

Red (action recommended)

Yellow (investigation recommended)

Green (no problem detected)

AWS Trusted Advisor

Over 50 million recommendations provided to AWS customers have resulted in more than $500M in cost savings for Trusted Advisor users.

How does it work?

“We estimate an average 33 percent monthly savings on our total AWS spend.” - Amit Vora, CTO for Hungama

How did Trusted Advisor help Hungama? It highlighted the following three areas:

• Underutilized EC2 Instances

• Amazon EC2 Reserved Instances

• Underutilized EBS Volumes

Case Study – Hungama Digital Media

Building Automation

Using Trusted Advisor as a Web Service

[Diagram: AWS Trusted Advisor → Amazon CloudWatch Events → AWS Lambda (actions on AWS resources) and notifications]
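A minimal sketch of the Lambda side of this flow. It assumes the CloudWatch Events payload for a Trusted Advisor check refresh has `source` set to "aws.trustedadvisor" and a `detail` object with `check-name`, `status` ("WARN" for yellow, "ERROR" for red, "OK" for green), and `resource_id`; verify these field names against real events in your account. `route_event`, `notify_only`, and the `HANDLERS` mapping are hypothetical names.

```python
"""Route Trusted Advisor check events (delivered by CloudWatch Events) to handlers.

Assumed event shape (verify against real events in your account):
  {"source": "aws.trustedadvisor",
   "detail": {"check-name": ..., "status": "WARN" | "ERROR" | "OK",
              "resource_id": ..., "check-item-detail": {...}}}
"""
from typing import Callable, Dict, Optional

Handler = Callable[[dict], str]


def notify_only(detail: dict) -> str:
    # Placeholder action (hypothetical): describe what would be reported.
    return f"notify: {detail.get('check-name')} -> {detail.get('resource_id')}"


# Hypothetical mapping from Trusted Advisor check name to an action function.
HANDLERS: Dict[str, Handler] = {
    "Low Utilization Amazon EC2 Instances": notify_only,
}


def route_event(event: dict) -> Optional[str]:
    """Return the handler's result, or None if the event should be ignored."""
    if event.get("source") != "aws.trustedadvisor":
        return None
    detail = event.get("detail", {})
    # Only act on red (ERROR) or yellow (WARN) results, never on green (OK).
    if detail.get("status") not in ("WARN", "ERROR"):
        return None
    handler = HANDLERS.get(detail.get("check-name", ""))
    return handler(detail) if handler else None


def lambda_handler(event, context):
    # Standard Lambda entry point; context is unused in this sketch.
    return {"result": route_event(event)}
```

Keeping the check-to-action mapping explicit means a new Trusted Advisor check does nothing until someone deliberately wires it to an action.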

With (not so) great automation come great risks

Production databases/instances could be considered idle.

- Low traffic period.

- A different system resource (e.g., memory) might be in use.


Show me the money!

Turn idle instances off based on Trusted Advisor and Tags
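Because an instance that looks idle to Trusted Advisor can still be production, one way to reduce the risk described above is to act only on instances that have explicitly opted in through tags. This is a sketch under assumptions. The tag conventions `AutoStop=true` (opt-in) and `Environment=production` (never touch) are hypothetical, `flagged_ids` would come from the "Low Utilization Amazon EC2 Instances" check, and `tags_by_id` would come from EC2's tag data. `ec2_client` stands in for `boto3.client("ec2")`.

```python
"""Decide which Trusted Advisor-flagged idle instances may be stopped, using tags.

Hypothetical conventions (not part of Trusted Advisor itself):
  - An instance is eligible only if it carries the opt-in tag AutoStop=true.
  - Instances tagged Environment=production are never stopped automatically.
"""
from typing import Dict, List

OPT_IN_KEY, OPT_IN_VALUE = "AutoStop", "true"
PROTECTED_KEY, PROTECTED_VALUE = "Environment", "production"


def tags_to_dict(tags: List[dict]) -> Dict[str, str]:
    # EC2 returns tags as [{"Key": ..., "Value": ...}, ...].
    return {t["Key"]: t["Value"] for t in tags}


def instances_to_stop(flagged_ids: List[str],
                      tags_by_id: Dict[str, List[dict]]) -> List[str]:
    """Return the subset of flagged instance IDs that are safe to stop."""
    safe = []
    for instance_id in flagged_ids:
        tags = tags_to_dict(tags_by_id.get(instance_id, []))
        if tags.get(PROTECTED_KEY, "").lower() == PROTECTED_VALUE:
            continue  # never touch production, even if it looks idle
        if tags.get(OPT_IN_KEY, "").lower() == OPT_IN_VALUE:
            safe.append(instance_id)
    return safe


def stop_idle(ec2_client, flagged_ids, tags_by_id, dry_run=True):
    """Stop eligible instances unless dry_run; returns the targeted IDs."""
    targets = instances_to_stop(flagged_ids, tags_by_id)
    if targets and not dry_run:
        ec2_client.stop_instances(InstanceIds=targets)
    return targets
```

Defaulting to `dry_run=True` means the first deployment only reports what it would stop. Untagged instances are never touched.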

Examples available on GitHub: https://github.com/aws/Trusted-Advisor-Tools

Trusted Advisor Best Practices: https://aws.amazon.com/premiumsupport/trustedadvisor/best-practices/

AWS Health and Personal Health Dashboard (PHD)

AWS service health, notifications, and automation

[Diagram: PHD and Amazon CloudWatch Events]

AWS Health and Personal Health Dashboard

• Visibility and transparency into your resources

• Custom notifications and automated actions

• Remediation guidance and knowledge articles

Increased transparency and visibility

- The Service Health Dashboard is too generic
- Increased transparency into the underlying infrastructure
- Remediation guidance for faster time-to-resolution
- AWS Health API for easy integration
- Custom notifications with predictable delivery
- Automated actions for auto-remediation
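To illustrate the "AWS Health API for easy integration" point, here is a sketch that pulls open events with `describe_events` and follows `nextToken` across pages. It assumes `health_client` is a boto3 Health client, for example `boto3.client("health", region_name="us-east-1")`. Access to the AWS Health API has required a Business or Enterprise support plan, so check your plan. `iter_open_events` and `summarize` are hypothetical helper names.

```python
"""Pull open AWS Health events through the AWS Health API (describe_events)."""
from typing import Iterator, Optional, Sequence


def iter_open_events(health_client,
                     services: Optional[Sequence[str]] = None) -> Iterator[dict]:
    """Yield open events, following nextToken until the last page."""
    event_filter = {"eventStatusCodes": ["open"]}
    if services:
        event_filter["services"] = list(services)
    kwargs = {"filter": event_filter}
    while True:
        page = health_client.describe_events(**kwargs)
        yield from page.get("events", [])
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token


def summarize(event: dict) -> str:
    # Each event carries service, eventTypeCode and, usually, region.
    return f"{event['service']} {event['eventTypeCode']} in {event.get('region', 'global')}"
```

Passing the client in, rather than creating it inside the function, keeps the pagination logic testable with a stub and lets the caller choose credentials and region.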

AWS service integrations

Service-level insights into health:
• All AWS services

Resource and service-level insights into health:
• Amazon EC2
• Amazon EBS
• Amazon SES
• Amazon VPC
• AWS Direct Connect
• Elastic Load Balancing
• Amazon Elasticsearch Service
• Amazon Cognito
• Amazon ElastiCache
• Amazon RDS
• AWS Certificate Manager
• AWS CloudTrail

AWS Personal Health Dashboard

Getting started with the Personal Health Dashboard

- From the AWS Service Health Dashboard
- From the AWS website
- From the AWS Management Console navigation bar alert

How does the Personal Health Dashboard work?

[Diagram: AWS services and resources you use, the AWS Health service, the Personal Health Dashboard, and in-house or third-party monitoring and event management systems]

AWS Health API calls:
• describe-events
• describe-event-details
• describe-affected-entities
• …

Push notifications through Amazon CloudWatch Events:
• Set rules to extract events of interest
• Set targets for rules (Amazon SNS, Amazon SQS, AWS Lambda, Amazon Kinesis)
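For the push path, a CloudWatch Events rule selects AWS Health events with an event pattern. `RULE_PATTERN` below uses the documented "aws.health" source and "AWS Health Event" detail type. The filter on EC2/EBS issues is only an illustrative choice. `matches` is a hypothetical, simplified local re-implementation of exact-value pattern matching for testing a pattern offline. It does not cover the full CloudWatch Events syntax, such as prefix or numeric matching, or array-valued event fields.

```python
"""Sketch of a CloudWatch Events rule pattern that selects AWS Health events."""
import json

RULE_PATTERN = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {
        "service": ["EC2", "EBS"],        # illustrative choice of services
        "eventTypeCategory": ["issue"],   # skip scheduledChange/accountNotification
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """True if every pattern field matches the event (lists = allowed values)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern objects match nested event objects recursively.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def pattern_json() -> str:
    # The JSON string you would supply as the rule's event pattern.
    return json.dumps(RULE_PATTERN)
```

The rule's targets (SNS, SQS, Lambda, Kinesis) then receive only the events that pass the pattern, so filtering happens before any code of yours runs.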

Examples

AWS Health Tools

aws/aws-health-tools

Automated actions in response to AWS Health events

Open source, community driven

Customized alerts in response to AWS Health events

AWS Health Tools - Examples

• Notifications via SMS, Amazon SNS, and Slack

• Respond to incidents: EC2 storage, ELB scaling, pausing CodePipeline stages

Notify Slack via AWS Lambda and Amazon CloudWatch Events

Post alerts from AWS Health to a Slack channel

Includes brief information about the alert received

Quick access to the PHD console

https://git.io/vQspJ
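A sketch of the message-building step, separate from the linked example. It assumes the AWS Health CloudWatch event has a top-level `region` and a `detail` with `service`, `eventTypeCode`, `eventTypeCategory`, and an `eventDescription` list whose entries carry `latestDescription`. The PHD console URL is an assumed link, and `slack_payload` is a hypothetical name. The code only builds the JSON body for a Slack incoming webhook. Posting it to your webhook URL is left out.

```python
"""Build a brief Slack incoming-webhook payload from an AWS Health event."""
import json

# Assumed PHD console link; adjust if your console URL differs.
PHD_URL = "https://phd.aws.amazon.com/phd/home#/dashboard/open-issues"


def slack_payload(event: dict) -> str:
    detail = event.get("detail", {})
    descriptions = detail.get("eventDescription") or [{}]
    summary = descriptions[0].get("latestDescription", "(no description)")
    # Keep the alert brief: first 200 characters of the description,
    # plus a Slack-formatted link (<url|label>) back to the PHD console.
    text = (
        f"*AWS Health {detail.get('eventTypeCategory', 'event')}*: "
        f"{detail.get('service', '?')} {detail.get('eventTypeCode', '?')} "
        f"in {event.get('region', '?')}\n"
        f"{summary[:200]}\n"
        f"<{PHD_URL}|Open PHD console>"
    )
    return json.dumps({"text": text})
```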

Stop or terminate EC2 instances with instance store drive performance degraded

One or more physical storage drives affected

Instance storage performance degradation

Stop/terminate EC2 instances based on tags: https://git.io/vQsVE
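A sketch of the tag-driven decision, not the linked repository's code. It assumes the event's `detail.eventTypeCode` is "AWS_EC2_INSTANCE_STORE_DRIVE_PERFORMANCE_DEGRADED" and that affected instance IDs appear in the event's top-level `resources` list. Verify both against a real event. The `HealthAction` tag key (values "stop" or "terminate") is a hypothetical convention. `ec2_client` stands in for `boto3.client("ec2")`.

```python
"""Group instances named in an instance-store degradation event by tag action."""
from typing import Dict, List

EVENT_CODE = "AWS_EC2_INSTANCE_STORE_DRIVE_PERFORMANCE_DEGRADED"  # assumed code
TAG_KEY = "HealthAction"  # hypothetical tag: "stop" or "terminate"


def plan_actions(event: dict,
                 tags_by_id: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    plan: Dict[str, List[str]] = {"stop": [], "terminate": []}
    if event.get("detail", {}).get("eventTypeCode") != EVENT_CODE:
        return plan
    for instance_id in event.get("resources", []):
        action = tags_by_id.get(instance_id, {}).get(TAG_KEY, "").lower()
        if action in plan:  # untagged instances are left for a human to handle
            plan[action].append(instance_id)
    return plan


def apply_plan(ec2_client, plan: Dict[str, List[str]]) -> None:
    # Skip empty lists so no API call is made without targets.
    if plan["stop"]:
        ec2_client.stop_instances(InstanceIds=plan["stop"])
    if plan["terminate"]:
        ec2_client.terminate_instances(InstanceIds=plan["terminate"])
```

Letting the tag choose between stop and terminate leaves that decision with the instance owner. Some instances can only be terminated, and others hold data worth keeping.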

Disable AWS CodePipeline stage transitions using AWS Lambda and Amazon CloudWatch Events

Temporarily stop future deployments upon alerts

Prevents further stages from possibly failing

Manual intervention to re-enable after investigation: https://git.io/vQsp1

CodePipeline stage transition disabled
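A minimal sketch of the "pause deployments" action using CodePipeline's `disable_stage_transition` call, assuming `codepipeline_client` is `boto3.client("codepipeline")`. The pipeline and stage names are hypothetical. The reason-string rules (at most 300 characters, a restricted character set without underscores) are taken from the DisableStageTransition API reference as best recalled. The sanitizing step is defensive, so check the current limits.

```python
"""Disable the inbound transition into a CodePipeline stage on an AWS Health alert."""
import re

PIPELINE_NAME = "my-app-pipeline"  # hypothetical
STAGE_NAME = "Production"          # hypothetical


def pause_stage(codepipeline_client, health_event: dict) -> dict:
    code = health_event.get("detail", {}).get("eventTypeCode", "unknown")
    reason = f"Paused by AWS Health alert {code}"
    # Replace characters the reason field may reject (e.g. "_") and cap length.
    reason = re.sub(r"[^a-zA-Z0-9!@ ().*?-]", "-", reason)[:300]
    kwargs = {
        "pipelineName": PIPELINE_NAME,
        "stageName": STAGE_NAME,
        "transitionType": "Inbound",  # block new executions entering the stage
        "reason": reason,
    }
    codepipeline_client.disable_stage_transition(**kwargs)
    return kwargs
```

Re-enabling stays manual, as the slide describes. After investigating, an operator calls `enable_stage_transition` (or uses the console) for the same pipeline, stage, and transition type.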

Piksel

Stephen Gran, Senior Technical Architect

Piksel – SaaS platform for video delivery

Piksel - Challenges

Maximize uptime and operational resources

Minimize costs

Piksel - Automating Operational Health

ELB health check

Auto Scaling groups

CloudWatch metrics

Piksel – Some examples and their metrics

API server

Transcode worker

Container fleet
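The talk does not say which metrics Piksel used for each example. As one illustration of scaling a queue-driven fleet such as transcode workers, here is a sketch of "backlog per worker" sizing. The queue-depth input, the target backlog, and the min/max bounds are all hypothetical parameters. A CloudWatch metric or alarm would feed the queue depth, and an Auto Scaling group would apply the result.

```python
"""Backlog-per-worker sizing for a queue-driven worker fleet (illustrative only)."""
import math


def desired_workers(queue_depth: int, target_backlog_per_worker: int,
                    min_workers: int = 1, max_workers: int = 50) -> int:
    """Workers needed so each handles about target_backlog_per_worker jobs."""
    if target_backlog_per_worker <= 0:
        raise ValueError("target_backlog_per_worker must be positive")
    needed = math.ceil(queue_depth / target_backlog_per_worker)
    # Clamp to the fleet's configured bounds.
    return max(min_workers, min(max_workers, needed))
```

For example, 95 queued jobs with a target of 10 per worker gives 10 workers. The bounds keep an empty queue from scaling to zero and a flood of jobs from scaling without limit.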

Piksel - Harder example

EC2 Health checks

Automatic disk attachment

CloudWatch auto recovery
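A sketch of the auto recovery piece. It builds a CloudWatch alarm on the EC2 `StatusCheckFailed_System` metric whose action is the EC2 recover action ARN (`arn:aws:automate:<region>:ec2:recover`). It assumes `cloudwatch_client` is `boto3.client("cloudwatch", region_name=region)`. The alarm name, 60-second period, and two evaluation periods are illustrative choices, not Piksel's settings.

```python
"""Create a CloudWatch alarm that triggers EC2 auto recovery on system check failure."""


def recover_alarm_kwargs(instance_id: str, region: str) -> dict:
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # illustrative naming
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # EC2 recover action for the instance's region.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(cloudwatch_client, instance_id: str, region: str) -> dict:
    kwargs = recover_alarm_kwargs(instance_id, region)
    cloudwatch_client.put_metric_alarm(**kwargs)
    return kwargs
```

Recovery keeps the instance ID, private IP addresses, and EBS volumes. This is why it pairs well with automatic disk attachment: the replacement host comes back as the same instance.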

Results

Lower transcode times

Higher confidence in the platform

Lower TCO - Staff and AWS spend

Piksel – Cost Savings

Turn off unused environments

Scheduled daily deletion/creation

Resilience as a bonus
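A sketch of the scheduling decision behind "turn off unused environments". The weekday 07:00-19:00 UTC window is a hypothetical example, not Piksel's schedule. A scheduled trigger, for instance a CloudWatch Events schedule invoking Lambda, could call `reconcile` and then create or delete the environment. Pass timezone-aware datetimes. `astimezone` treats naive values as local time.

```python
"""Decide whether a non-production environment should exist at a given time."""
from datetime import datetime, time, timezone

START, END = time(7, 0), time(19, 0)  # hypothetical working-hours window (UTC)


def should_exist(now: datetime) -> bool:
    now = now.astimezone(timezone.utc)
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and START <= now.time() < END


def reconcile(now: datetime, exists: bool) -> str:
    """Return the action for the scheduler: "create", "delete", or "none"."""
    wanted = should_exist(now)
    if wanted and not exists:
        return "create"
    if not wanted and exists:
        return "delete"
    return "none"
```

Comparing the desired state with the actual state, instead of blindly running "create" at 07:00 and "delete" at 19:00, makes the job safe to re-run if an invocation fails or fires twice.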

It’s really, really hard to do what AWS does

More engineering capacity

Thank you!!

We are hiring!!

Minimize operational overhead

Improve platform resilience

Gain engineering excellence

[Diagram: Automate best practices with AWS Trusted Advisor, AWS Health, Amazon CloudWatch Events, and AWS Lambda]

Thank you!
