Service Explosion
HOW HOOTSUITE MANAGES ITS GROWING MICROSERVICE LANDSCAPE
Adam Arsenault
Specialist Software Developer - Mobile Web and APIs
@Adam_Arsenault
What We’ll Talk About
● Road to SOA
● Service Graph
● Voltron
● Demo
● Lessons Learned
Road to SOA
[Diagram: App with services S1, S2, S3, S4]
● Founded in 2008
● PHP Monolith
● SOA started in 2013
● Hyper growth
● Continuous Integration
● ~20 services and counting
“Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations” - Melvin Conway
Stable Teams
● Engagement
● Mobile
● Publisher
● Analytics
● Platform
● Tools
● Labs
KABOOM!!!!
Ex. 1 - Integration Failures
● Dev merges changes
● Changes go to staging
● Integration tests fail
● Release pipeline frozen
KABOOM!!!!
Ex. 2 - Production Downtime
● Service goes down in production
● On-call engineers and affected teams get notifications
● Sift through flood of notifications to figure out what’s broken
KABOOM!!!!
VISIBILITY
The Service Graph
[Diagram, built up over several slides: App, then services S1 to S4, then services S5 to S7]
The API
Dependency
Something that a service needs to function properly.
Types:
1. Internal
2. Traversable
[Diagram: App with dependencies Cache, db, and S1]
/status/about
Returns metadata about the service or app, such as version, description, maintainers, and links to documentation, along with the status of each individual dependency.
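As a rough illustration of the kind of payload such an endpoint might return, here is a minimal Python sketch. The field names, the `build_about` helper, and all sample values are assumptions made for illustration; the talk does not specify the actual response schema.

```python
import json


def build_about(name, version, description, maintainers, docs_url, checks):
    """Assemble a hypothetical /status/about payload.

    `checks` maps a dependency name to a zero-argument callable that
    returns a status string such as "OK" or "CRIT - <message>".
    """
    return {
        "name": name,
        "version": version,
        "description": description,
        "maintainers": maintainers,
        "links": {"docs": docs_url},
        # Run every registered check so the caller sees each dependency's status.
        "dependencies": {dep: check() for dep, check in checks.items()},
    }


# Made-up sample values, for illustration only.
about = build_about(
    name="example-service",
    version="0.1.0",
    description="Example service used to illustrate the about payload",
    maintainers=["team@example.com"],
    docs_url="https://example.com/docs",
    checks={
        "db": lambda: "OK",
        "cache": lambda: "CRIT - connection refused",
    },
)

print(json.dumps(about, indent=2))
```

Keeping the per-dependency statuses inside the same payload as the metadata means one request is enough to see both what a service is and how its dependencies are doing.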
/status/:dependency
A configured status endpoint at '/status/:dependency'.
Examples:
● '/status/service-core'
● '/status/db'
OK
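One way to picture a per-dependency endpoint is as a lookup into a registry of named status checks. The sketch below assumes that design; `register_check`, `handle_status`, and the "no check registered" message are hypothetical names and behavior, not the talk's actual implementation.

```python
from typing import Callable, Dict

# Hypothetical registry: dependency name -> check returning a status string.
checks: Dict[str, Callable[[], str]] = {}


def register_check(name: str, check: Callable[[], str]) -> None:
    """Register a status check under the name used in '/status/<name>'."""
    checks[name] = check


def handle_status(path: str) -> str:
    """Resolve '/status/<dependency>' to the result of its registered check."""
    prefix = "/status/"
    if not path.startswith(prefix):
        return "CRIT - not a status path"
    dependency = path[len(prefix):]
    check = checks.get(dependency)
    if check is None:
        return f"CRIT - no check registered for '{dependency}'"
    try:
        return check()
    except Exception as exc:
        # A check that raises is reported as critical instead of crashing.
        return f"CRIT - {exc}"


register_check("db", lambda: "OK")
register_check("service-core", lambda: "OK")

print(handle_status("/status/db"))
print(handle_status("/status/unknown"))
```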
/status/aggregate
Returns the overall status by checking all registered status checks and giving a simple response.
Examples:
● OK
● CRIT - error message
OK
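The aggregation step can be sketched as follows, assuming each registered check is a callable that raises an exception on failure. The `aggregate` function and the way error messages are joined are illustrative assumptions; only the "OK" and "CRIT - error message" response shapes come from the slide.

```python
def aggregate(checks):
    """Run every registered check and collapse the results into one line.

    `checks` maps a dependency name to a zero-argument callable that
    returns normally when healthy and raises an exception when not.
    """
    errors = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    if not errors:
        return "OK"
    return "CRIT - " + "; ".join(errors)


def healthy():
    """A check that passes."""
    return None


def failing_db():
    """A check that fails, standing in for an unreachable database."""
    raise ConnectionError("connection refused")


all_good = aggregate({"db": healthy, "cache": healthy})
one_down = aggregate({"db": failing_db, "cache": healthy})

print(all_good)
print(one_down)
```

A single-line answer like this is easy for an existing monitoring tool to consume, while the per-dependency endpoints remain available for drilling down.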
/status/traverse
Enables service graph traversal and execution of an "action" at the last level of traversal.
[Diagram: App with services S1, S2, S3, S4]
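To make the traversal idea concrete, here is a minimal in-memory sketch. It assumes a request carries a path of dependency names plus an action to run at the final hop. The graph edges, the `traverse` signature, and the action names are all hypothetical; in a real deployment each hop would presumably be a network call to the next service's own traverse endpoint.

```python
# Hypothetical service graph; the edges are made up for illustration.
SERVICES = {
    "App": {"deps": ["S1", "S2"], "status": "OK"},
    "S1": {"deps": ["S3", "S4"], "status": "OK"},
    "S2": {"deps": [], "status": "OK"},
    "S3": {"deps": [], "status": "CRIT - disk full"},
    "S4": {"deps": [], "status": "OK"},
}

# Actions that can be executed at the last level of traversal.
ACTIONS = {
    "aggregate": lambda service: service["status"],
    "about": lambda service: {"dependencies": service["deps"]},
}


def traverse(start, path, action):
    """Follow `path` hop by hop from `start`, then run `action` at the end."""
    current = start
    for hop in path:
        # Each hop must be a declared dependency of the current service.
        if hop not in SERVICES[current]["deps"]:
            return f"CRIT - {current} has no dependency '{hop}'"
        current = hop
    return ACTIONS[action](SERVICES[current])


print(traverse("App", ["S1", "S3"], "aggregate"))
print(traverse("App", ["S1"], "about"))
```

Because the caller only ever talks to the first service, it can inspect a service several hops away without needing direct network access to it.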
Usage
Monitor
OK / CRIT
Debug
[Diagram: S1 with dependencies db and S2]
Explore and Learn
[Diagram: App with services S1 to S7]
Document
Monitoring Strategy
Status of single machine | Overall status of application and services
Alerts / notifications | Troubleshoot by drilling down
Technologies
Architecture
Browser 1 ... Browser N
Play App
Status Poller Actor
WS Actor 1 ... WS Actor N
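The fan-out implied by these components, one poller feeding many per-browser websocket actors, can be sketched as below. This is not the Play and actor implementation from the talk: it swaps in Python's `asyncio` queues as a stand-in, and `poll_status`, `status_poller`, and `ws_actor` are hypothetical names.

```python
import asyncio


async def poll_status():
    """Stand-in for fetching a status from a monitored service."""
    return "OK"


async def status_poller(subscribers, rounds, interval=0.01):
    """Poll once per interval and push the same result to every subscriber."""
    for _ in range(rounds):
        status = await poll_status()
        for queue in subscribers:
            queue.put_nowait(status)
        await asyncio.sleep(interval)


async def ws_actor(name, queue, rounds, received):
    """Stand-in for a per-browser websocket actor: record each update."""
    for _ in range(rounds):
        status = await queue.get()
        received.append((name, status))


async def main():
    rounds = 2
    queues = [asyncio.Queue() for _ in range(3)]  # one queue per browser
    received = []
    actors = [
        ws_actor(f"ws-{i + 1}", queue, rounds, received)
        for i, queue in enumerate(queues)
    ]
    await asyncio.gather(status_poller(queues, rounds), *actors)
    return received


updates = asyncio.run(main())
print(updates)
```

With a single poller, the monitored services see one request per interval no matter how many browsers are connected, and every browser receives the same update.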
DEMO
“When there is a production issue, I see lots of people go to Voltron to perform diagnostics on what might be wrong”
Geordie Henderson - VP Software Development
“Voltron is often the first to tell us when snowflake is down”
Brandon Okert - Junior Software Developer, Publisher
“When a critical service goes down, everything starts alerting and reporting problems, but Voltron gets through the noise by letting you drill down”
Michael Reid - Senior Software Developer, Platform
“We suspected the connection between dashboard and Billing Service was broken, but Voltron told us the communication channel was okay.”
Martin Jung - Software Developer, Mobile Web and APIs
Lessons Learned
Visibility Empowers
• Productivity
• Happiness
SOA Tools Early
• Automate all the things
• Identify problems early and fix
• 10x factor
Make Checking Status Easy
• Standardize
• Add to a service framework
• Share common status checks
Websockets for Real time
• Synchronized views
• Performance
Future Work
Analytics
Real Time Graph View
Open Source
Thank you! Questions?
Adam Arsenault
Specialist Software Developer - Mobile Web and APIs
@Adam_Arsenault