(DVO312) Sony: Building At-Scale Services with AWS Elastic Beanstalk

Post on 13-Jan-2017


© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Sumio Okada, Engineer, Sony

Shinya Kawaguchi, Engineer, Sony

October 2015

DVO 312

Building At-Scale Services with AWS Elastic Beanstalk

Build a Cloud-native Authentication and Profile Management Platform on AWS

What to expect from the session

You will learn how to use AWS Elastic Beanstalk:

• As a platform to easily build customized web applications at scale on AWS.

• To seamlessly build cloud-native applications with other AWS services.

Agenda

• Introduction

• Architecture

• Implementation

• Conclusion

Introduction

Who are we?

We provide cloud solutions for Sony products and applications.

TV Side View

Smart Tennis Sensor Smart B-Trainer

Play Memories Online

Previous platform

An incident

Previous platform

• Built on top of IaaS

• Self managed ‘base services’

• Monolithic system

Motivation of rebuild

• Agility

• Robustness

• Efficiency

Achievement - agility

Item | Before | After
Deployment time | Half a day | 40 min.
Zero downtime release | (not shown) | (not shown)
Release trouble rate | 30% | 0%
Release interval | Bi-weekly | NA (on demand)

Achievement - robustness

Item | Before | After
Access surges impact | Unstable or down | No impact
IaaS trouble impact | Service damage | No impact
Emergency operation | Auto recover/healing
Related service down | Affecting an entire system | Minimum impact

Achievement - efficiency

Item | Before | After
Config management | Manual | Git (Infrastructure as Code)
Infra for management | 7+ self-managed services | 0
Scaling | Not flexible | Auto Scaling

Architecture

Auth & Profile

Mutually independent microservices

[Diagram: the authentication and profile management system (Auth & Profile) shown with its frontend, backend, service providers, and third-party authentication services.]

System overview

Authentication and profile management system

[Diagram in two builds: region us-west2 with AZ-1 and AZ-2, each containing public and private subnets. Labelled components: Route53, HA, API, NAT, S3, Data Pipeline, EC2, Batch, Resource, Config, Log, Backup, Profile DB, DynamoDB, service providers, and third-party authentication services. Arrow labels: API Call, DynamoDB/S3.]

System overview - CloudFormation

Base layer

[Diagram: CloudFormation covers the base layer in us-west2: public and private subnets in AZ-1 and AZ-2, NAT, HA, S3, and the Profile DB on DynamoDB.]

System overview - Elastic Beanstalk

Application layer

[Diagram: Elastic Beanstalk covers the API instances, drawn on the same base layer.]

Continuous delivery system

Delivery system without self-managed infrastructure

[Diagram: a numbered pipeline (steps 1 to 8) from the code repository through the Development, QA, and Production environments. Labelled steps: Push Code, Kick off, 3 Build, 4 Unit Test, 5 Push Image, 6 Provision & Deploy, 7 Sanity Test, Result. In QA: 5 Get Image, 5 Integration Test.]

Throttling and Circuit Breaker

Self-defense for robustness

[Diagram: the APIs are protected by throttling and a circuit breaker; third-party authentication services are among the external dependencies.]

Zero-management infrastructure

[Diagram: services shown are EC2, CloudWatch and CloudWatch Logs, SNS, S3, Lambda, and Redshift. Stage labels: Targets, Monitoring, Notification / Communication, Log Analysis. Flow labels: Logs, Metrics, Import.]

Implementation

Authentication & Profile Management Platform

Implementation - motivation

Reproducible

Scalable

Highly available and fault tolerant

Secure and robust

Transparent


Infrastructure as code

• Automated operations

• Version control

• Continuous delivery

Infrastructure as code

• Versioning:

• CloudFormation templates

• Elastic Beanstalk configuration files (.ebextensions/*.config)

• Application/environment configuration files

• Automation scripts


Auto Scaling based on custom metric

• Custom Metric via Data Pipeline

[Diagram: ELB metrics in CloudWatch are processed by Data Pipeline into a custom metric (successful response rate per instance); alarms on that metric act on the Auto Scaling group of app instances.]

Auto Scaling based on custom metric

• Custom scaling policies via .ebextensions

Resources:
  AutoScalingScaleOutPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName: { "Ref" : "AWSEBAutoScalingGroup" }
      ScalingAdjustment: 2
  AutoScalingScaleOutAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: { "Fn::GetOptionSetting" : { "OptionName" : "AutoScalingMetricNamespace" } }
      MetricName: { "Fn::GetOptionSetting" : { "OptionName" : "AutoScalingMetricName" } }
      Dimensions: [ { "Name" : "LoadBalancerName", "Value" : { "Ref" : "AWSEBLoadBalancer" } } ]
      ...
      AlarmActions: [ { "Ref" : "AutoScalingScaleOutPolicy" } ]

Auto Scaling based on custom metric

Disable default scaling policies via .ebextensions

Resources:
  AWSEBCloudwatchAlarmHigh:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmActions: [ { "Ref" : "AWS::NoValue" } ]
  AWSEBCloudwatchAlarmLow:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmActions: [ { "Ref" : "AWS::NoValue" } ]


High availability for application

• Zero downtime deployment

• Auto healing based on deep health check

• Disk space shortage prevention

Zero downtime deployment

• Rolling deployments

• Update application instances one by one

[Diagram in two builds: an Auto Scaling group of three app instances, each in its own batch. First all three are Working; then one instance is Updating while the other two keep Working.]

Zero downtime deployment

• Rolling deployments via .ebextensions

option_settings:
  "aws:elasticbeanstalk:command":
    BatchSizeType: Fixed
    BatchSize: 1

Zero downtime deployment

Conflict between rolling deployments and scaling out

• Taken care of by Elastic Beanstalk

Zero downtime deployment

• Rolling updates

• Dynamic batch size

• Keep the number of in-service instances

[Diagram in three builds: an Auto Scaling group with MinSize 2 and MaxSize 10 runs four Working app instances (the count was increased by scaling out), split into two batches. Two New instances are Launching while all four App instances keep Working. Later, one batch shows two New instances Working and two App instances Terminating, while the other batch shows two App instances Working and two New instances Launching.]

Zero downtime deployment

• Rolling updates via .ebextensions

option_settings:
  "aws:autoscaling:updatepolicy:rollingupdate":
    RollingUpdateEnabled: true
    MaxBatchSize: <num of running instances> / 2          # e.g. 2
    MinInstancesInService: <num of running instances>     # e.g. 4
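
The two placeholders above depend on how many instances are running at update time. A minimal sketch of that arithmetic follows; the function name rolling_update_params is hypothetical, and clamping the batch size to at least 1 is an assumption that the slide does not state.

```shell
#!/bin/bash
# Hypothetical helper: derive the rolling update values shown above
# from the current number of running instances.
rolling_update_params() {
  local running=$1
  # MaxBatchSize is half of the running instances (integer division).
  local max_batch=$(( running / 2 ))
  # Assumption: a batch must contain at least one instance.
  if (( max_batch < 1 )); then
    max_batch=1
  fi
  # MinInstancesInService equals the running count, so capacity is kept.
  echo "MaxBatchSize=${max_batch} MinInstancesInService=${running}"
}

rolling_update_params 4   # prints: MaxBatchSize=2 MinInstancesInService=4
```

With four running instances this reproduces the example values in the comments above (2 and 4).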


Zero downtime deployment

Tradeoff

• Rolling deployments/updates

Definite app version switching

Low tolerance to deployment failure (rolling deployments)

• CNAME swap

High tolerance to deployment failure

DNS propagation


Auto healing based on deep health check

• Deep health check

• Accuracy of system time

• Accessibility to main database (DynamoDB)

Auto healing based on deep health check

• Deep health check configuration via .ebextensions

option_settings:
  "aws:elasticbeanstalk:application":
    "Application Healthcheck URL": /1/status
  "aws:elb:healthcheck":
    Interval: 15
    Timeout: 10
    HealthyThreshold: 3
    UnhealthyThreshold: 3

Auto healing based on deep health check

• Auto healing configuration via .ebextensions

Resources:
  AWSEBAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      HealthCheckType: ELB


Auto healing based on deep health check

Rolling deployments with auto healing configuration

Problem

• Unexpected instance termination caused by Elastic Beanstalk

Workaround

• Suspend HealthCheck process in AWSEBAutoScalingGroup during rolling deployments

Disk space shortage prevention

• Docker image local cache size

[Chart: disk usage (0% to 100%) across rolling deployments 1, 2, … n, split into System, Docker Image Local Cache, and Free; annotated "Pulling new layers".]

Disk space shortage prevention

• Remove unused Docker images via .ebextensions

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/99_01_remove-unused-docker-images.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/bash
      docker images | grep -v "aws_beanstalk/" | grep -v "REPOSITORY" \
        | xargs -I {} /bin/bash -c '
          repository=$(echo "{}" | awk "{ print \$1 }")
          tag=$(echo "{}" | awk "{ print \$2 }")
          image_id=$(echo "{}" | awk "{ print \$3 }")
          docker rmi $image_id || docker tag $image_id $repository:$tag || true
        ' || true

Disk space shortage prevention

• Docker container log size

  • Container logs captured by Elastic Beanstalk: rotated

    • /var/log/eb-docker/containers/eb-current-app/*-stdouterr.log

  • Original container logs: keep growing in size

    • /var/lib/docker/containers/<cid>/<cid>-json.log

Disk space shortage prevention

• Docker container logs truncation via .ebextensions

files:
  "/etc/cron.hourly/cron.logtruncate.docker.json.log.conf":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/sh
      # truncate docker container logs here.
      # see appendix for the actual script implementation.
      ...

High availability for NAT

• NAT instance in AutoScalingGroup

• Periodic route table monitoring

NAT instance in AutoScalingGroup

[Diagram sequence: one AWS Region with AZ-1 and AZ-2, each with a public subnet and a private subnet for apps, connected to the Internet. Resources are tagged NetworkSegment NAT-A or NAT-B. Each Auto Scaling group has MinSize 1 and MaxSize 1.]

• Static resources created via CloudFormation

• Dynamic NAT instances: AutoScalingGroup launches new NAT instance (NAT instances Pending, each with a public IP).

• Dynamic NAT instance configuration via cloud-init: disable SRC/DST check, assign Elastic IP, etc. (NAT instances Running, each with an Elastic IP).

• Route table lookup: new NAT instance looks up route tables based on tag.

• Dynamic route configuration: route tables are tagged RoutingStatus OK.

Periodic route table monitoring

[Diagram sequence: one AWS Region with AZ-1 and AZ-2, each with a public subnet holding a NAT instance and a private subnet holding app instances. Route tables are tagged NetworkSegment NAT-A or NAT-B and carry a RoutingStatus tag and a 0.0.0.0/0 route.]

• Running normally: NAT instances periodically monitor route tables located in different AZs. Both route tables show RoutingStatus OK and 0.0.0.0/0 Active.

• Black hole route detection: one NAT instance is Terminated and the 0.0.0.0/0 route for NAT-A becomes Black Hole. The healthy NAT instance detects the blackhole internet route.

• Outbound traffic takeover: the healthy NAT instance takes over outbound traffic to the internet. The NAT-A route table shows RoutingStatus TakenOver and 0.0.0.0/0 Active.

• Outbound traffic takeover (continued): AutoScalingGroup launches new NAT instance (Pending).

• Route table lookup: new NAT instance (Running) looks up route tables based on tag.

• Outbound traffic recovery: new NAT instance recovers internet route. Both route tables show RoutingStatus OK and 0.0.0.0/0 Active.

Periodic route table monitoring

Network capacity planning for NAT instances

• Need to consider total amount of outbound traffic coming from application instances across Availability Zones


Source IP address whitelisting

• Without whitelisting

[Diagram: AWSEBLoadBalancer in front of three app instances, with AWSEBLoadBalancerSecurityGroup (labelled "No Inbound Rules") applied by Elastic Beanstalk; clients x.x.x.1, x.x.x.5, and x.x.x.6.]

Source IP address whitelisting

• With whitelisting

[Diagram: AWSEBLoadBalancer in front of three app instances; clients x.x.x.1, x.x.x.5, and x.x.x.6. AWSEBLoadBalancerSecurityGroup has no inbound rules and is tagged IPWhitelistGroup DefaultGroup. Four extra security groups, ip-whitelist-group1-1 to ip-whitelist-group1-4, are tagged IPWhitelistGroup Group1 and applied via script. Their rules (HTTPS TCP 443 x.x.x.1/32, x.x.x.2/32, x.x.x.3/32, x.x.x.4/32, …) are added via script from configuration files.]

Max 200 (4*50) rules are available
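
The 4*50 limit above implies that a whitelist has to be split across four security groups of at most 50 rules each. The deck does not show the script that adds the rules, so the following is only a sketch of the grouping step; assign_whitelist_groups and the two constants are hypothetical names, and the CIDRs are documentation addresses.

```shell
#!/bin/bash
RULES_PER_GROUP=50   # rules per security group, from the slide (4*50)
MAX_GROUPS=4         # whitelist security groups per load balancer, from the slide

# Hypothetical helper: read one CIDR per line on stdin and print
# "<group-index> <cidr>", filling group 1 first, then 2, and so on.
assign_whitelist_groups() {
  local i=0 cidr
  while IFS= read -r cidr; do
    # Skip empty lines in the input.
    if [ -z "$cidr" ]; then
      continue
    fi
    # Refuse lists that do not fit into 4 groups of 50 rules.
    if (( i >= RULES_PER_GROUP * MAX_GROUPS )); then
      echo "too many CIDRs: limit is $(( RULES_PER_GROUP * MAX_GROUPS ))" >&2
      return 1
    fi
    echo "$(( i / RULES_PER_GROUP + 1 )) ${cidr}"
    i=$(( i + 1 ))
  done
}

printf '%s\n' 192.0.2.1/32 192.0.2.2/32 | assign_whitelist_groups
```

Each output line could then drive one rule-creation call against the matching ip-whitelist-group1-N security group.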

Source IP address whitelisting

• Tagging built-in resources via .ebextensions

Resources:
  AWSEBLoadBalancer:
    Type: AWS::ElasticLoadBalancing::LoadBalancer
    Properties:
      Tags:
        - { Key: IPWhitelistGroup, Value: Group1 }
  AWSEBLoadBalancerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "Load Balancer Security Group"
      VpcId: { "Fn::GetOptionSetting" : { "OptionName" : "VPCId" } }
      Tags:
        - { Key: IPWhitelistGroup, Value: DefaultGroup }

Source IP address whitelisting

Fill required properties in security group for ELB via .ebextensions

Resources:
  AWSEBLoadBalancerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "Load Balancer Security Group"
      VpcId: { "Fn::GetOptionSetting" : { "OptionName" : "VPCId" } }
      Tags:
        - { Key: IPWhitelistGroup, Value: DefaultGroup }

Specifying GroupDescription and VpcId is also required in order to modify the AWSEBLoadBalancerSecurityGroup resource via .ebextensions.

Connection/request throttling

• Throttling per client (source IP address)

• Throttling per remote user (internal service)

[Diagram in two builds: the app runs in a Docker container on Amazon Linux and exposes APIs to external services and internal services; third-party authentication services are also shown. Callers marked "Over Limit" are throttled.]

Connection/request throttling

• nginx configuration file installation via .ebextensions

files:
  "/etc/nginx/throttling/limit-zone-def.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      # include in http context
      limit_conn_zone $http_x_forwarded_for zone=conn_perclient:10m;
      limit_conn_zone $hostname zone=conn_total:1m;
      limit_conn_status 429;
      limit_req_zone $remote_user zone=req_perservice:10m rate=150r/s;
      limit_req_zone $hostname zone=req_total:1m rate=200r/s;
      limit_req_status 429;

Connection/request throttling

• nginx configuration file installation via .ebextensions

files:
  "/etc/nginx/throttling/limit-per.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      # include in location context
      limit_conn conn_perclient 75;
      limit_req zone=req_perservice burst=300 nodelay;

Connection/request throttling

• nginx configuration file installation via .ebextensions

files:
  "/etc/nginx/throttling/limit-total.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      # include in location context
      limit_conn conn_total 300;
      limit_req zone=req_total burst=400 nodelay;
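
To get a feel for what burst=400 nodelay means, the sketch below counts how many requests are accepted when they all arrive at the same instant. It is a simplified model, not nginx code: it ignores the refill governed by rate=200r/s, and the function name accepted_in_instant_burst is hypothetical.

```shell
#!/bin/bash
# Simplified model of limit_req with nodelay for requests that arrive
# at the same instant (no time passes, so the rate never drains the excess).
accepted_in_instant_burst() {
  local burst=$1 total=$2
  local excess=0 accepted=0 n
  for (( n = 0; n < total; n++ )); do
    # A request is rejected once the accumulated excess is above the burst.
    # nginx would answer it with limit_req_status (429 in the config above).
    if (( excess > burst )); then
      continue
    fi
    accepted=$(( accepted + 1 ))
    excess=$(( excess + 1 ))
  done
  echo "$accepted"
}

accepted_in_instant_burst 400 1000   # req_total zone: prints 401
accepted_in_instant_burst 300 1000   # req_perservice zone: prints 301
```

In this model one request passes at the base rate and the burst allowance admits the rest, so the remaining requests in the spike are rejected immediately instead of being queued.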

Connection/request throttling

• nginx configuration script (.ebextensions/nginx-conf.sh)

#!/bin/bash
EB_CONFIG_HTTP_PORT=$(/opt/elasticbeanstalk/bin/get-config container -k instance_port)
cat > /etc/nginx/sites-available/nginx-docker-proxy.conf <<EOF
...
include throttling/limit-zone-def.conf;
server {
    listen $EB_CONFIG_HTTP_PORT;
    location / {
        ...
        include throttling/limit-per.conf;
        include throttling/limit-total.conf;
    }
    location ~ /.+?/status {
        ...
        include throttling/limit-per.conf;
    }
}
EOF
rm -f /etc/nginx/sites-enabled/*
ln -sf /etc/nginx/sites-available/nginx-docker-proxy.conf /etc/nginx/sites-enabled/

Connection/request throttling

• nginx configuration via .ebextensions

container_commands:
  nginx-conf-for-throttling:
    command: 'bash .ebextensions/nginx-conf.sh'

Connection/request throttling

Tradeoff

Advantages taken from throttling

Low compatibility

External Services

Internal Services

Circuit Breaker

• Proxy object for each external service

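[Diagram: inside the app (Docker container on Amazon Linux, serving the APIs), each external service, including third-party authentication services, has its own proxy object. Three breakers are Closed and one is Open; a call through the open breaker results in immediate failure.]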
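
The breaker states in the diagram can be sketched as a small state machine. This is an illustration in shell under stated assumptions, not the platform's implementation (the appendix points to gobreaker, which is written in Go): the names cb_call, CB_MAX_FAILURES, and CB_OPEN_SECONDS, the threshold values, and the return code 99 are all hypothetical.

```shell
#!/bin/bash
# One breaker per external service; this sketch keeps a single one.
CB_STATE=closed        # closed -> open -> half-open -> closed
CB_FAILURES=0          # failures counted since the last success
CB_OPENED_AT=0         # epoch seconds when the breaker opened
CB_MAX_FAILURES=3      # assumed threshold before opening
CB_OPEN_SECONDS=30     # assumed wait before allowing a trial call

# cb_call <command> [args...]
# Returns the command's status, or 99 when the breaker rejects the call.
cb_call() {
  local now status=0
  now=$(date +%s)
  if [ "$CB_STATE" = open ]; then
    if (( now - CB_OPENED_AT < CB_OPEN_SECONDS )); then
      return 99            # immediate failure: the service is not called
    fi
    CB_STATE=half-open     # wait is over: allow one trial call
  fi
  "$@" || status=$?
  if (( status == 0 )); then
    CB_STATE=closed
    CB_FAILURES=0
    return 0
  fi
  CB_FAILURES=$(( CB_FAILURES + 1 ))
  # A failed trial call, or too many failures, opens the breaker.
  if [ "$CB_STATE" = half-open ] || (( CB_FAILURES >= CB_MAX_FAILURES )); then
    CB_STATE=open
    CB_OPENED_AT=$now
  fi
  return "$status"
}

# Usage: wrap any command that talks to the external service.
cb_call true && echo "state after success: $CB_STATE"
```

cb_call has to run in the current shell (not inside a command substitution), otherwise the state variables are lost between calls.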


Comprehensive log monitoring

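[Diagram: App and NAT instances are the targets. Services shown: CloudWatch and CloudWatch Logs, SNS, S3, Lambda, and Redshift. Stage labels: Targets, Monitoring, Notification / Communication, Log Analysis. Flow labels: Logs, Metrics, Import.]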

Comprehensive log monitoring

• LogGroup creation via .ebextensions

Resources:
  CWLSyslogMessagesLogGroup:
    Type: "AWS::Logs::LogGroup"
    DependsOn: AWSEBBeanstalkMetadata
    Properties:
      LogGroupName: { "Fn::Join" : [ "-", [ { "Ref" : "AWSEBEnvironmentName" }, "syslog-messages" ] ] }
      RetentionInDays: 14

Comprehensive log monitoring

• CloudWatch Logs agent config file via .ebextensions

Resources:
  AWSEBAutoScalingGroup:
    Metadata:
      "AWS::CloudFormation::Init":
        CWLogsAgentConfigSetup:
          files:
            "/tmp/cwlogs/conf.d/core-logs.conf":
              content : |
                [/var/log/messages]
                file = /var/log/messages
                log_group_name = `{ "Ref" : "CWLSyslogMessagesLogGroup" }`
                log_stream_name = {instance_id}
                datetime_format = %b %d %H:%M:%S

Searchable log retention

[Diagram in two builds: App and NAT instances are the targets. Services shown: CloudWatch and CloudWatch Logs, SNS, S3, Lambda, and Redshift. Stage labels: Targets, Monitoring, Notification / Communication, Log Analysis. Flow labels: Logs, Metrics, Import. The second build adds the settings flush_interval 60s and flush_at_shutdown true.]

Searchable log retention

• td-agent configuration via .ebextensions

files:
  "/etc/sysconfig/td-agent":
    mode: "000644"
    owner: root
    group: root
    content: |
      # Run as root user
      TD_AGENT_ARGS="/usr/sbin/td-agent --group td-agent --log /var/log/td-agent/td-agent.log --use-v1-config \
      --suppress-repeated-stacktrace"
      DAEMON_ARGS="--user root"
commands:
  01-prepare-installer:
    command: ... # Install td-agent installation script to /tmp/td-agent/install-td-agent-v2.sh
  02-run-installer-td-agent:
    command: bash /tmp/td-agent/install-td-agent-v2.sh
  03-setup-configuration:
    command: ... # Configure log sources for td-agent
  04-restart-td-agent:
    command: service td-agent restart

Searchable log retention

• Enable ELB to upload access logs to Amazon S3

Resources:
  AWSEBLoadBalancer:
    Type: AWS::ElasticLoadBalancing::LoadBalancer
    Properties:
      AccessLoggingPolicy:
        S3BucketName: { "Fn::GetOptionSetting" : { "OptionName" : "LogsBucketName" } }
        S3BucketPrefix: "elb"
        Enabled: true
        EmitInterval: 5 # minutes

Conclusion

Challenges and expectations

• Compatibility

• Ease of operation test

Trouble-free eight months in production with Elastic Beanstalk

• Flexibility: Satisfy customization needs

• Reliability: No major problems

• Simplicity: Simplified DevOps

Thank you!

Question and answer

Remember to complete your evaluations!

Appendix

Sony open source software

• gobreaker

• Go implementation of circuit breaker

• Available on GitHub

• https://github.com/sony/gobreaker

• Feel free to submit pull requests and raise issues on the GitHub project

Sony open source software

• Sonyflake

• Go implementation of distributed unique ID generator

• Available on GitHub

• https://github.com/sony/sonyflake

• Small utility for AWS (VPC) included

• Example running on EB provided

• Feel free to submit pull requests and raise issues on the GitHub project

Articles

• Continuous Delivery with Golang and Docker

• https://circleci.com/stories/sony

References

• Advanced network automation

• (ARC401) Black-Belt Networking for the Cloud Ninja | AWS re:Invent 2014

• Docker container log rotation

• https://github.com/docker/docker/issues/7333

• https://docs.docker.com/reference/logging/overview/

Auto Scaling design

Scale out timing chart

[Timing chart, in the case of HealthCheckType: ELB. Rows: EC2 State (Pending, Running), ELB Instance State (Out of Service, In Service), Auto Scaling Timers (Health Check Grace Period, Cooldown Period of the scale out policy). Events and phases: Execute Policy, Register Instance, App Startup, Deployment, ELB Determination, In Service Dead Line, Resume Auto Scaling.]

Auto Scaling design

Scale out timing parameters

[Same timing chart, in the case of HealthCheckType: ELB, annotated with values:]

• ELB Determination: 45 (HealthCheck Interval 15 x HealthyThreshold 3)

• Health Check Grace Period: 600

• Margin: 300

• Cooldown Period (scale out policy): 900

• Also shown: 300 avg. and 300 (likely the App Startup / Deployment time and the Margin for Balancing & Metric)
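
The numbers on this chart can be checked with simple arithmetic. Only "ELB Determination = HealthCheck Interval x HealthyThreshold" is stated on the slide; how the other values combine is an assumed reading of the chart, and the variable names are hypothetical. Values are taken to be seconds.

```shell
#!/bin/bash
HEALTHCHECK_INTERVAL=15   # aws:elb:healthcheck Interval
HEALTHY_THRESHOLD=3       # aws:elb:healthcheck HealthyThreshold
DEPLOY_AVG=300            # assumption: "300 avg." is app startup / deployment
GRACE_PERIOD=600          # Health Check Grace Period
BALANCING_MARGIN=300      # assumption: Margin for Balancing & Metric
COOLDOWN=900              # Cooldown Period (scale out policy)

# Stated on the slide: time for the ELB to mark a new instance healthy.
elb_determination=$(( HEALTHCHECK_INTERVAL * HEALTHY_THRESHOLD ))
# Assumed reading: an instance is in service after deployment plus ELB checks.
in_service_estimate=$(( DEPLOY_AVG + elb_determination ))

echo "ELB determination: ${elb_determination}"          # 45
echo "Estimated time to in service: ${in_service_estimate}"   # 345

# The grace period should cover the estimate with some margin.
if (( in_service_estimate <= GRACE_PERIOD )); then
  echo "grace period covers startup"
fi
# Assumed reading: cooldown = grace period + margin for balancing and metrics.
if (( GRACE_PERIOD + BALANCING_MARGIN == COOLDOWN )); then
  echo "cooldown = grace period + margin"
fi
```

Under this reading, a new scale-out is held back until the previous instances are in service and their effect is visible in the metric.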

Examples

• Elastic IP association via cloud-init

#!/bin/bash
REGION=$1
EIP_ALLOCATION_ID=$2
INSTANCE_ID=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id)
while true; do
    INSTANCE_STATUS=$(aws --region "${REGION}" --output text \
        ec2 describe-instance-status \
        --instance-ids "${INSTANCE_ID}" \
        --filters Name=instance-state-name,Values=running)
    if [[ $? = 0 && "${INSTANCE_STATUS}" != "" ]]; then
        aws --region "${REGION}" --output text \
            ec2 associate-address --instance-id "${INSTANCE_ID}" \
            --allocation-id "${EIP_ALLOCATION_ID}" && break
    fi
    sleep 5s
done

Examples

• Elastic IP association via cloud-init

• associate-address command fails if the instance is still in the pending state

• Need to wait for the instance to reach the running state before executing the associate-address command

Examples

• Connection draining

Keep accepting requests (10~20s)

ConnectionDrainingTimeout

Examples

• Connection draining via .ebextensions

option_settings:
  "aws:elb:policies":
    ConnectionDrainingEnabled: true
    ConnectionDrainingTimeout: 80 # 20 + 60 seconds

Examples

• Docker container log truncation

#!/bin/bash
# bash is required for the ${cid::12} substring expansion below.
cidfile=$(/opt/elasticbeanstalk/bin/get-config container -k app_deploy_file)
[ ! -r "${cidfile}" ] && exit 0
cid=$(cat "${cidfile}")
scid=${cid::12}
dockerlog="/var/lib/docker/containers/${cid}/${cid}-json.log"
[ ! -w "${dockerlog}" ] && exit 0
# The eb-log file made by Elastic Beanstalk.
eblog="/var/log/eb-docker/containers/eb-current-app/${scid}-stdouterr.log"
# PID of docker logs command related to the Container-ID.
logspids=$(ps aux | grep "docker logs -f ${scid}" | grep -v grep | awk '{print $2}')
for logspid in ${logspids}
do
    # Count FD of docker logs related to the eb-log file.
    eblogfd=$(lsof -p ${logspid} | grep "${eblog}" | wc -l)
    # Expect to be redirected stdout and stderr to the eb-log file.
    [ ! ${eblogfd} -eq 2 ] && continue
    # Now, can truncate the docker-log file.
    cat /dev/null > ${dockerlog}
    break
done

Examples

• Run ntpd in slew mode via .ebextensions

files:
  "/etc/sysconfig/ntpd":
    mode: "000644"
    owner: root
    group: root
    content: |
      OPTIONS="-g -x"
commands:
  "ntpd-service-restart":
    command: service ntpd restart

Examples

• Scaling event notification via .ebextensions

Resources:
  AWSEBAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      HealthCheckType: ELB
      NotificationConfiguration:
        TopicARN: { "Fn::GetOptionSetting" : { "OptionName" : "ASGTopicArn" } }
        NotificationTypes:
          - autoscaling:EC2_INSTANCE_LAUNCH
          - autoscaling:EC2_INSTANCE_LAUNCH_ERROR
          - autoscaling:EC2_INSTANCE_TERMINATE
          - autoscaling:EC2_INSTANCE_TERMINATE_ERROR

Examples

• td-agent installation script

#!/usr/bin/env bash
# Enterprise Linux 7 (releasever is '7')
# add GPG key
rpm --import http://packages.treasuredata.com/GPG-KEY-td-agent
# add treasure data repository to yum
cat > /etc/yum.repos.d/td.repo <<EOF
[treasuredata]
name=TreasureData
baseurl=http://packages.treasuredata.com/2/redhat/7/\$basearch
gpgcheck=1
gpgkey=http://packages.treasuredata.com/GPG-KEY-td-agent
EOF
# install the toolbelt
yum install -y td-agent-2.1.5-1
# install plugins
/opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-tail_path -v "=0.0.3"
/opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-forest -v "=0.3.0"
/opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-add -v "=0.0.3"
# this plugin will be no longer required in next td-agent version.
/opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-s3 -v "=0.5.7"
# enable service
chkconfig td-agent on
