Gaming DevOps - Eduardo Saito

Post on 08-May-2015

Category:

Technology

Alert workflow in Gaming DevOps

Eduardo Saito

Director of Engineering - Server Operations

GREE International

November 2013

Traditional Alert workflow

NOC

Ops

Dev

SME (Network, DBA,…)

Alert workflow – previous

Ops Dev

Critical

Ops: Where’s the runbook for this?

Ops: App bug or system issue?

Ops: Who’s the developer of this game? Phone #?

Ops: I can’t find the developer… who’s his manager?

Critical

Non-Critical

Alert workflow 2.0

Ops Dev

Critical

Ops: Where’s the runbook for this?

Ops: App bug or system issue?

Ops: Who’s the developer of this game? Phone #?

Ops: I can’t find the developer… who’s his manager?

Alert Workflow 3.0 - current

Ops

Dev, Project X, Server

Dev, Project Y, Client, Android

Dev, …

Each alert goes directly to the team that can resolve it!
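The routing idea on this slide can be sketched as a lookup from project and component to an on-call rotation. This is only an illustration, not GREE's actual tooling: the rotation names, the `Alert` fields, and the fallback to Ops for unmatched alerts are all assumptions made for the example.

```python
from typing import NamedTuple


class Alert(NamedTuple):
    project: str    # e.g. "Project X" (name taken from the slide)
    component: str  # "server" or "client"
    platform: str   # "android", "ios", or "" for server-side alerts
    summary: str


# Hypothetical routing table: (project, component, platform) -> on-call rotation.
ROUTES = {
    ("Project X", "server", ""): "dev-project-x-server",
    ("Project Y", "client", "android"): "dev-project-y-client-android",
}

FALLBACK = "ops"  # assumption: alerts with no matching route go to Ops


def route(alert: Alert) -> str:
    """Return the on-call rotation that should receive this alert."""
    key = (alert.project, alert.component, alert.platform)
    return ROUTES.get(key, FALLBACK)
```

Keeping the table as plain data means adding a game or a platform is a one-line change rather than a new branch in the code.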

Alerts go to the person who can resolve them

Type         | Scope                               | Checked by       | Who to page?
ELB          | Load balancer health-check          | ELB              | No one – email alert only
System-level | Check CPU / disk / memory / network | Pingdom / Nagios | Ops team
App-level    | Application issues / bugs           | Pingdom          | Dev and Ops teams
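The table above amounts to a small paging policy keyed by alert type. A minimal sketch of that policy as data follows; the enum, the `notify` function, and the shape of its return value are hypothetical and only mirror the three rows of the table.

```python
from enum import Enum


class AlertType(Enum):
    ELB = "ELB"
    SYSTEM = "System-level"
    APP = "App-level"


# Paging policy copied from the table: alert type -> teams to page.
# An empty tuple means nobody is paged and the alert goes out by email only.
PAGING_POLICY = {
    AlertType.ELB: (),
    AlertType.SYSTEM: ("Ops",),
    AlertType.APP: ("Dev", "Ops"),
}


def notify(alert_type: AlertType) -> dict:
    """Describe who gets paged for an alert of the given type."""
    teams = PAGING_POLICY[alert_type]
    return {"page": list(teams), "email_only": not teams}
```

For example, `notify(AlertType.APP)` pages both Dev and Ops, while `notify(AlertType.ELB)` pages no one and marks the alert as email-only.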

App-level alerts can be triggered by issues in:

•  Server-side
•  Client-side
    •  iOS
    •  Android

Dev and Ops are responsible

Team      | On-call
Ops       | 8
Dev       | 32, from 20 games (server-side or client-side, Android or iOS)
Analytics | 5

Big display dashboard = quick status

IM Bot = better communication

Skype Bot informs the game channel that an alert was triggered

Ops and Dev receive the alert and troubleshoot

Skype Bot detects that the issue is resolved and sends the all-clear
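The bot behaviour described on these slides (announce when an alert triggers, announce again when it clears) can be sketched as a small state tracker. The transport is deliberately left abstract: `send` is a hypothetical callable standing in for whatever chat integration is used, and the channel name, check name, and message wording are made up for the example.

```python
from typing import Callable, Dict


class AlertBot:
    """Posts a message when a check changes state, and stays quiet otherwise."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._send = send                    # send(channel, text); hypothetical transport
        self._failing: Dict[str, bool] = {}  # last known state per check

    def observe(self, channel: str, check: str, ok: bool) -> None:
        was_failing = self._failing.get(check, False)
        if not ok and not was_failing:
            self._send(channel, f"ALERT: {check} triggered")
        elif ok and was_failing:
            self._send(channel, f"ALL CLEAR: {check} resolved")
        self._failing[check] = not ok


posted = []
bot = AlertBot(lambda channel, text: posted.append((channel, text)))
bot.observe("project-x", "login-api", ok=False)  # posts the alert
bot.observe("project-x", "login-api", ok=False)  # still failing, no repeat
bot.observe("project-x", "login-api", ok=True)   # posts the all-clear
```

Posting only on state changes keeps the game channel readable: one message when the problem starts and one when it ends, however many times the check runs in between.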

Thank You!

eduardo.saito@gree.net

We’re hiring! Vancouver and San Francisco

http://gree-corp.com/jobs
