summit - automate best practices and operational health...
Posted on 11-Jun-2018
TRANSCRIPT
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Heitor Lessa, Solutions Architect @ AWS
Stephen Gran, Senior Technical Architect @ Piksel
June 28th
Automate best practices and operational health for your AWS resources
with Trusted Advisor and AWS Health
What to expect from this session
• Learn about Trusted Advisor best practices and how to safely automate them in your environment.
• Get familiar with AWS Health and the Personal Health Dashboard (PHD).
• Learn how to automate remediation actions and customize Health alerts.
What’s in your AWS account(s)?
[Diagram: www.example.com served through Elastic Load Balancing to web app server EC2 instances in Autoscaling Group #1, plus a database, all in Availability Zone #1.]
So what is Trusted Advisor (TA)?
AWS Trusted Advisor provides best practices (or checks) in four categories: cost optimization, security, fault tolerance, and performance improvement.
• Red (action recommended)
• Yellow (investigation recommended)
• Green (no problem detected)
AWS Trusted Advisor: over 50 million recommendations provided to AWS customers have resulted in $500m+ in cost savings for Trusted Advisor users.
“We estimate an average 33 percent monthly savings on our total AWS spend.” - Amit Vora, CTO, Hungama
How did Trusted Advisor help Hungama? It highlighted the following three areas:
• Underutilized EC2 Instances
• Amazon EC2 Reserved Instances
• Underutilized EBS Volumes
Case Study – Hungama Digital Media
Using Trusted Advisor as a Web Service
[Diagram: AWS Trusted Advisor → Amazon CloudWatch Events → AWS Lambda → actions on AWS resources and notifications.]
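To make the Trusted Advisor → CloudWatch Events → Lambda flow concrete, here is a minimal sketch of a Lambda function that decides what to do with one Trusted Advisor event. It assumes the event carries `source: "aws.trustedadvisor"` and a `detail` object with `check-name`, `status` (`OK`/`WARN`/`ERROR`) and `resource_id`; treat those field names as assumptions to verify against your own events. The `CHECK_ACTIONS` mapping and action labels are hypothetical. The function only plans an action and does not call AWS, so it can be run as a dry run.

```python
# Hypothetical mapping from Trusted Advisor check name to a remediation label.
CHECK_ACTIONS = {
    "Low Utilization Amazon EC2 Instances": "stop-instance",
    "Underutilized Amazon EBS Volumes": "snapshot-and-delete-volume",
}


def plan_action(event: dict) -> dict:
    """Return the action a Lambda function would take for one TA event.

    Nothing is executed here: the function only decides, so it is safe
    to run before wiring it to real AWS calls.
    """
    if event.get("source") != "aws.trustedadvisor":
        return {"action": "ignore", "reason": "not a Trusted Advisor event"}
    detail = event.get("detail", {})
    check = detail.get("check-name", "")
    status = detail.get("status", "")
    # Only flagged items (yellow/red) need remediation; OK results are skipped.
    if status not in ("WARN", "ERROR"):
        return {"action": "ignore", "reason": f"status {status or 'missing'}"}
    # Checks without an automated action fall back to notification only.
    action = CHECK_ACTIONS.get(check, "notify-only")
    return {"action": action, "resource": detail.get("resource_id"), "check": check}


def lambda_handler(event, context):
    # Standard Lambda entry point; delegates to the pure planning function.
    return plan_action(event)
```

Keeping the decision in a pure function makes it easy to test with sample events before letting it touch real resources.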
With (not so) great automation come great risks
Production databases/instances could be flagged as idle when they are not:
- The check may have sampled a low-traffic period.
- A different system resource (e.g. memory), which the check may not measure, might be in use.
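One way to reduce these risks is to run a guard before any automated action. The sketch below assumes a hypothetical opt-in tagging convention (`AutoRemediate=true`, `Environment=production`) and an optional memory-utilization figure that you would have to collect yourself, since EC2 does not publish memory metrics by default. It is an illustration, not the approach used in the Trusted-Advisor-Tools repository.

```python
from typing import Optional, Tuple

# Hypothetical tag conventions; adapt to your own tagging policy.
OPT_IN_TAG = "AutoRemediate"   # must be "true" for automation to act
ENV_TAG = "Environment"        # production values are always skipped


def safe_to_remediate(
    tags: dict,
    memory_utilization_pct: Optional[float] = None,
    memory_threshold_pct: float = 10.0,
) -> Tuple[bool, str]:
    """Return (allowed, reason) before acting on a resource TA flagged as idle."""
    # Opt-in: untagged resources are never touched automatically.
    if tags.get(OPT_IN_TAG, "").lower() != "true":
        return False, "resource not opted in"
    # Production is excluded even if someone opted it in by mistake.
    if tags.get(ENV_TAG, "").lower() in ("prod", "production"):
        return False, "production resource"
    # Low CPU/network does not prove idleness; check a second signal if available.
    if memory_utilization_pct is not None and memory_utilization_pct > memory_threshold_pct:
        return False, f"memory in use ({memory_utilization_pct}%)"
    return True, "ok"
```

An opt-in default means a missing tag fails safe: the worst case is a missed saving, not an outage.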
Examples available on GitHub: https://github.com/aws/Trusted-Advisor-Tools
Trusted Advisor Best Practices: https://aws.amazon.com/premiumsupport/trustedadvisor/best-practices/
AWS Health and Personal Health Dashboard
• Visibility and transparency into your resources
• Custom notifications and automated actions
• Remediation guidance and knowledge articles
Increased transparency and visibility
- Service Health Dashboard is too generic
- Increased transparency into underlying infrastructure
- Remediation guidance for faster time-to-resolution
- AWS Health API for easy integration
- Custom notifications with predictable delivery
- Automated actions for auto-remediation
AWS service integrations
- Service-level insights into health: all AWS services
- Resource and service-level insights into health: Amazon EC2, Amazon EBS, Amazon SES, Amazon VPC, AWS Direct Connect, Elastic Load Balancing, Amazon Elasticsearch Service, Amazon Cognito, Amazon ElastiCache, Amazon RDS, AWS Certificate Manager, AWS CloudTrail
Getting started with the Personal Health Dashboard
- From the AWS Service Health Dashboard
- From the AWS website
- From the AWS Management Console navigation bar alert
[Diagram: the AWS services and resources you use report to the AWS Health service, which feeds the Personal Health Dashboard, the AWS Health API, and push notifications through CloudWatch Events for in-house or third-party monitoring and event management systems.]
API
• describe-events
• describe-event-details
• describe-affected-entities
• …
Push notifications through CloudWatch Events
• Set rules to extract events of interest
• Set targets for rules (Amazon SNS, Amazon SQS, AWS Lambda, Amazon Kinesis)
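A CloudWatch Events rule selects events with a JSON event pattern. The sketch below is a simplified, local re-implementation of that matching, written only to illustrate the idea: it handles exact-value lists and nested objects, but not content filters such as prefix or numeric matching, so it is not a substitute for the service. The example pattern assumes AWS Health events use `source: "aws.health"`, `detail-type: "AWS Health Event"`, and `detail.service` / `detail.eventTypeCategory` fields.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified CloudWatch Events pattern matching (exact values only).

    Every pattern key must exist in the event; a list in the pattern means
    "any of these values", and a nested dict recurses into the event.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # Event arrays match when any element is among the allowed values.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


# Example rule: only EC2 issues reported by AWS Health.
EC2_ISSUES = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {"service": ["EC2"], "eventTypeCategory": ["issue"]},
}
```

In practice the same `EC2_ISSUES` pattern would be attached to a rule, with an SNS topic, SQS queue, Lambda function, or Kinesis stream as its target.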
AWS Health Tools
aws/aws-health-tools
Automated actions in response to AWS Health events
Open source, community driven
Customized alerts in response to AWS Health events
AWS Health Tools - Examples
Notifications via SMS, SNS, Slack
Respond to incidents: EC2 storage, ELB scaling, pause CodePipeline stages
aws/aws-health-tools
Notify Slack via AWS Lambda and Amazon CloudWatch Events
Post alerts from AWS Health to a Slack channel
Includes brief info about the alert received
Quick access to the PHD console
https://git.io/vQspJ
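The Slack example boils down to formatting an AWS Health event and POSTing it to a Slack incoming webhook. The sketch below is not the code from aws-health-tools; it assumes the event fields `region`, `detail.service`, `detail.eventTypeCode` and `detail.eventDescription[].latestDescription`, and uses `https://phd.aws.amazon.com/phd/home` as an assumed console link. The webhook URL is something you create in Slack.

```python
import json
import urllib.request

PHD_CONSOLE_URL = "https://phd.aws.amazon.com/phd/home"  # assumed console link


def build_slack_message(event: dict) -> dict:
    """Format an AWS Health event as a Slack incoming-webhook payload."""
    detail = event.get("detail", {})
    descriptions = detail.get("eventDescription", [])
    # Use the first description; events may carry one per language.
    summary = descriptions[0].get("latestDescription", "") if descriptions else ""
    text = (
        f"*AWS Health alert* ({event.get('region', 'unknown region')})\n"
        f"Service: {detail.get('service', '?')}\n"
        f"Event: {detail.get('eventTypeCode', '?')}\n"
        f"{summary[:300]}\n"  # keep the alert brief
        f"<{PHD_CONSOLE_URL}|Open the Personal Health Dashboard>"
    )
    return {"text": text}


def post_to_slack(webhook_url: str, message: dict) -> int:
    """POST the payload to a Slack incoming webhook; returns the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A Lambda handler would call `post_to_slack(WEBHOOK_URL, build_slack_message(event))`, ideally reading the webhook URL from configuration rather than hard-coding it.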
Stop or terminate EC2 instances with instance store drive performance degraded
One or more physical storage drives affected
Instance storage performance degradation
Stop/Terminate EC2 Instances based on tags
https://git.io/vQsVE
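Choosing between stop and terminate per instance can be driven by tags. This sketch assumes the Health event lists affected instances under `detail.affectedEntities`, each with an `entityValue` (instance ID) and a `tags` map, and it uses a hypothetical `HealthAction` tag whose value is `stop` or `terminate`. Untagged instances are deliberately left alone. Only `apply_plan` calls AWS, through the boto3 `stop_instances` and `terminate_instances` calls.

```python
from collections import defaultdict

ACTION_TAG = "HealthAction"  # hypothetical tag key; values "stop" or "terminate"


def plan_instance_actions(event: dict) -> dict:
    """Group affected instance IDs by the action requested in their tags."""
    plan = defaultdict(list)
    for entity in event.get("detail", {}).get("affectedEntities", []):
        instance_id = entity.get("entityValue", "")
        action = entity.get("tags", {}).get(ACTION_TAG, "").lower()
        # Untagged or unrecognised instances are skipped on purpose.
        if instance_id.startswith("i-") and action in ("stop", "terminate"):
            plan[action].append(instance_id)
    return dict(plan)


def apply_plan(plan: dict, region: str) -> None:
    """Execute the plan with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the planning logic runs without boto3

    ec2 = boto3.client("ec2", region_name=region)
    if plan.get("stop"):
        ec2.stop_instances(InstanceIds=plan["stop"])
    if plan.get("terminate"):
        ec2.terminate_instances(InstanceIds=plan["terminate"])
```

Separating planning from execution lets you log or review the plan before anything is stopped or terminated.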
Disable AWS CodePipeline Stage Transition using AWS Lambda and Amazon CloudWatch Events
Stop future deployments temporarily upon alerts
Prevents further stages from potentially failing
Requires manual intervention to re-enable after investigation
https://git.io/vQsp1
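Pausing a pipeline maps onto CodePipeline's `DisableStageTransition` operation (boto3: `disable_stage_transition` with `pipelineName`, `stageName`, `transitionType` and `reason`). The sketch below builds the request and, as an assumption, strips characters that we believe the `reason` field rejects (check the API reference for the exact allowed set). The pipeline and stage names in any call are hypothetical. Re-enabling is left to a human, who would call `enable_stage_transition` after investigating.

```python
import re

# Characters we believe the "reason" field rejects (assumption; verify in the API reference).
_REASON_DISALLOWED = re.compile(r"[^a-zA-Z0-9!@ ().*?\-]")


def build_disable_request(pipeline: str, stage: str, event_type_code: str) -> dict:
    """Keyword arguments for codepipeline.disable_stage_transition()."""
    # Underscores and other disallowed characters are replaced by spaces.
    reason = _REASON_DISALLOWED.sub(" ", f"Paused by AWS Health event {event_type_code}")
    return {
        "pipelineName": pipeline,
        "stageName": stage,
        "transitionType": "Inbound",  # block new executions from entering the stage
        "reason": reason[:300],       # the field has a length limit
    }


def disable_transition(request: dict, region: str) -> None:
    import boto3  # lazy import; requires AWS credentials at call time

    boto3.client("codepipeline", region_name=region).disable_stage_transition(**request)
```

Blocking the inbound transition stops new revisions from reaching the stage while leaving earlier stages untouched.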
Piksel – Cost Savings
Turn off unused environments
Scheduled daily deletion/creation
Resilience as a bonus
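Scheduled daily deletion and creation can be driven by two CloudWatch Events schedule rules that invoke a Lambda function. The sketch below is not Piksel's implementation: the working hours, the cron expressions and the `reconcile` helper are all hypothetical. CloudWatch Events cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year) and are evaluated in UTC.

```python
from datetime import datetime

# Hypothetical working hours (UTC) during which non-production environments exist.
WORK_DAYS = range(0, 5)     # Monday=0 .. Friday=4
WORK_HOURS = range(7, 19)   # 07:00 to 18:59 UTC

# Matching CloudWatch Events schedule expressions (evaluated in UTC).
CREATE_SCHEDULE = "cron(0 7 ? * MON-FRI *)"
DELETE_SCHEDULE = "cron(0 19 ? * MON-FRI *)"


def desired_state(now: datetime) -> str:
    """Return "up" if the environment should exist at `now`, else "down"."""
    if now.weekday() in WORK_DAYS and now.hour in WORK_HOURS:
        return "up"
    return "down"


def reconcile(now: datetime, exists: bool) -> str:
    """Decide what a scheduled Lambda should do: "create", "delete" or "none"."""
    want = desired_state(now)
    if want == "up" and not exists:
        return "create"
    if want == "down" and exists:
        return "delete"
    return "none"
```

Reconciling against the desired state, rather than blindly creating or deleting, makes a missed or repeated trigger harmless; recreating environments from scratch every day also exercises the rebuild path, which is where the resilience bonus comes from.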
Recap
Minimize operational overhead
Improve platform resilience
Gain engineering excellence
[Diagram: AWS Trusted Advisor and AWS Health → Amazon CloudWatch Events → AWS Lambda: automate best practices with open source (OSS) tools.]