How Hootsuite Manages Its Growing Microservice Landscape - Adam Arsenault

Post on 07-Jan-2017

Service Explosion

How Hootsuite Manages Its Growing Microservice Landscape

Adam Arsenault

Specialist Software Developer - Mobile Web and APIs

@Adam_Arsenault

What We’ll Talk About

● Road to SOA

● Service Graph

● Voltron

● Demo

● Lessons Learned

Road to SOA

● Founded in 2008

● PHP Monolith

● SOA started in 2013

● Hyper growth

● Continuous Integration

● ~20 services and counting

“Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations” - Melvin Conway

Stable Teams

● Engagement

● Mobile

● Publisher

● Analytics

● Platform

● Tools

● Labs

KABOOM!!!!

Ex. 1 - Integration Failures

● Dev merges changes

● Changes go to staging

● Integration tests fail

● Release pipeline frozen

KABOOM!!!!

Ex. 2 - Production Downtime

● Service goes down in production

● On call / teams affected get notifications

● Sift through flood of notifications to figure out what’s broken

KABOOM!!!!

VISIBILITY

The Service Graph

[Diagram, built up across several slides: App at the top of a graph of services S1 through S7]

The API

Dependency

Something that a service needs to function properly.

Types:

1. Internal

2. Traversable

[Diagram: App with its dependencies Cache, db, and S1]
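The two dependency types can be pictured as a small registry in which every dependency carries a status check. This is only an illustrative sketch: `Dependency`, `registry`, and the URL are hypothetical names, not code from the talk. It also assumes that an internal dependency is something the service checks itself (a cache or a database) and that a traversable dependency is another service that exposes its own status endpoints.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Dependency:
    name: str
    check: Callable[[], str]        # returns "OK" or "CRIT - <message>"
    base_url: Optional[str] = None  # assumption: set only for traversable dependencies

    @property
    def kind(self) -> str:
        # A dependency with a base URL can be traversed into; the rest are internal.
        return "traversable" if self.base_url else "internal"


# Hypothetical registry mirroring the diagram: a cache, a db, and service S1.
registry = {
    dep.name: dep
    for dep in [
        Dependency("cache", check=lambda: "OK"),
        Dependency("db", check=lambda: "OK"),
        Dependency("s1", check=lambda: "OK", base_url="http://s1.example.internal"),
    ]
}
```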

/status/about

Returns metadata about the service or app, such as version, description, maintainers, and links to documentation, and reports the status of each individual dependency.
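A response of this kind might be assembled roughly as follows. The field names (`name`, `version`, `dependencies`) and the JSON format are assumptions made for illustration; the slides do not show the actual payload.

```python
import json
from typing import Callable, Dict


def about(metadata: Dict[str, object], checks: Dict[str, Callable[[], str]]) -> str:
    """Combine static service metadata with the current status of each dependency."""
    body = dict(metadata)
    # Run every registered check and report its result under a hypothetical key.
    body["dependencies"] = {name: check() for name, check in checks.items()}
    return json.dumps(body, indent=2, sort_keys=True)


# Hypothetical service metadata and a single stubbed dependency check.
print(about({"name": "example-service", "version": "1.2.3"}, {"db": lambda: "OK"}))
```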

/status/:dependency

A status endpoint configured for each dependency, at '/status/:dependency'

Examples:

● '/status/service-core'

● '/status/db'

OK

/status/aggregate

Returns the overall status by checking all registered status checks and giving a simple response.

Examples:

● OK

● CRIT - error message

OK
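The aggregate check can be sketched as a loop over the registered checks that collapses their results into one line. The function names are hypothetical, and the way several failures are joined into a single message is an assumption; only the `OK` and `CRIT - error message` shapes come from the slide.

```python
from typing import Callable, Dict

CRIT_PREFIX = "CRIT - "


def run_check(check: Callable[[], str]) -> str:
    """Run one status check; an exception counts as a critical failure."""
    try:
        return check()
    except Exception as exc:
        return f"{CRIT_PREFIX}{exc}"


def aggregate(checks: Dict[str, Callable[[], str]]) -> str:
    """Return "OK" if every check passes, else one CRIT line naming each failure."""
    failures = []
    for name, check in checks.items():
        result = run_check(check)
        if result != "OK":
            # Strip the prefix so it appears only once in the combined message.
            message = result[len(CRIT_PREFIX):] if result.startswith(CRIT_PREFIX) else result
            failures.append(f"{name}: {message}")
    return "OK" if not failures else CRIT_PREFIX + "; ".join(failures)
```

Catching exceptions inside `run_check` keeps one misbehaving check from hiding the results of the others.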

/status/traverse

Enables service graph traversal and execution of an "action" at the last level of traversal.

[Diagram: App at the top of a graph of services S1 through S4]
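Traversal can be sketched as following a path of dependency names hop by hop and running the requested action only at the final node. Everything here is hypothetical: the edges in `GRAPH`, the action names, and the in-memory recursion, which stands in for what would presumably be a request to the next service's own traverse endpoint.

```python
from typing import Callable, Dict, List

# Hypothetical graph: each node lists the dependencies it can traverse to.
GRAPH: Dict[str, List[str]] = {
    "app": ["s1", "s2"],
    "s1": ["s3"],
    "s2": ["s4"],
    "s3": [],
    "s4": [],
}

# Hypothetical actions that can run at the last level of traversal.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "aggregate": lambda node: "OK",
    "dependencies": lambda node: ",".join(GRAPH[node]),
}


def traverse(node: str, path: List[str], action: str) -> str:
    """Follow `path` from `node`, then execute `action` at the last node reached."""
    if not path:
        return ACTIONS[action](node)
    next_hop, rest = path[0], path[1:]
    if next_hop not in GRAPH[node]:
        return f"CRIT - {node} has no dependency named {next_hop}"
    # In a real system this hop would be a call to next_hop's traverse endpoint.
    return traverse(next_hop, rest, action)
```

For example, `traverse("app", ["s1", "s3"], "aggregate")` walks from the app through S1 to S3 and runs the aggregate action there.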

Usage

Monitor

OK / CRIT

Debug

[Diagram: S1 with its dependencies db and S2]

Explore and Learn

[Diagram: App at the top of a graph of services S1 through S7]

Document

Monitoring Strategy

Status of single machine | Overall status of application and services

Alerts / notifications | Troubleshoot by drilling down

Technologies

Architecture

[Diagram: Browser 1 ... Browser N; a Play App containing a Status Poller Actor and WS Actor 1 ... WS Actor N]
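The shape of this architecture, one poller fanning the same status snapshot out to one handler per browser, can be sketched as below. This is not the Play and actor implementation from the talk: plain `asyncio` queues stand in for the WS actors, and `StatusPoller`, `subscribe`, and the stubbed checks are hypothetical names.

```python
import asyncio
from typing import Callable, Dict, List


class StatusPoller:
    """Polls status checks and fans each snapshot out to every subscriber."""

    def __init__(self, checks: Dict[str, Callable[[], str]]) -> None:
        self.checks = checks
        self.subscribers: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        # One queue per connected browser, standing in for a WS actor.
        queue: asyncio.Queue = asyncio.Queue()
        self.subscribers.append(queue)
        return queue

    def poll_once(self) -> Dict[str, str]:
        snapshot = {name: check() for name, check in self.checks.items()}
        for queue in self.subscribers:
            queue.put_nowait(snapshot)  # every subscriber receives the same snapshot
        return snapshot

    async def run(self, interval: float, rounds: int) -> None:
        for _ in range(rounds):
            self.poll_once()
            await asyncio.sleep(interval)


async def demo() -> List[Dict[str, str]]:
    poller = StatusPoller({"s1": lambda: "OK", "db": lambda: "CRIT - timeout"})
    browser_1, browser_n = poller.subscribe(), poller.subscribe()
    await poller.run(interval=0.01, rounds=1)
    return [await browser_1.get(), await browser_n.get()]
```

Polling once and broadcasting the result means the services are checked at the same rate no matter how many browsers are connected, and every browser sees the same view.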

DEMO

“When there is a production issue, I see lots of people go to Voltron to perform diagnostics on what might be wrong” - Geordie Henderson, VP Software Development

“Voltron is often the first to tell us when snowflake is down” - Brandon Okert, Junior Software Developer - Publisher

“When a critical service goes down, everything starts alerting and reporting problems, but Voltron gets through the noise by letting you drill down” - Michael Reid, Senior Software Developer - Platform

“We suspected the connection between dashboard and Billing Service was broken, but Voltron told us the communication channel was okay.” - Martin Jung, Software Developer - Mobile Web and APIs

Lessons Learned

Visibility Empowers

• Productivity

• Happiness

SOA Tools Early

• Automate all the things

• Identify problems early and fix them

• 10x factor

Make Checking Status Easy

• Standardize

• Add to a service framework

• Share common status checks

WebSockets for Real Time

• Synchronized views

• Performance

Future Work

Analytics

Real Time Graph View

Open Source

Thank you! Questions?

Adam Arsenault

Specialist Software Developer - Mobile Web and APIs

@Adam_Arsenault
