redundant devops

Redundant Devops: about reinventing the wheel

Szabolcs Szabolcsi-Toth @_Nec

Senior Engineer

JSConf Budapest 2017

Curator, Organizer

What’s this about?

Metrics Error Logs Logging Secret Store Service Discovery Process Supervision Running Programs Connecting Services

Metrics

Metrics are • time bound, historical

• numeric data

• software, network or hardware property

Metrics are great! • see trends

• mark releases

• notice anomalies: spikes & gaps

• create alerts

Metric delivery • collect (scrape) or push data?

• collect periodically

• put metric data where it can be collected

Tools for metrics

• prometheus

• graphite

Node best practices • put your metrics on an accessible endpoint: /metrics, /status

• there are node libs to automate this: instrument http

• let the metrics tool do scraping, delivery

• watch those nice graphs ☺ check out grafana
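
The endpoint idea above can be sketched with nothing but Node's built-in http module. This is a minimal illustration, not a recommendation over the existing node libs: the counter name `app_http_requests_total`, the helper names and port 3000 are assumptions made for the example. The output follows the Prometheus text exposition format so that a scraper can read it.

```javascript
const http = require('http');

// Hypothetical in-process counter; the metric name is illustrative.
const counters = { app_http_requests_total: 0 };

// Render counters in the Prometheus text exposition format.
function renderMetrics(values) {
  return Object.entries(values)
    .map(([name, value]) => `# TYPE ${name} counter\n${name} ${value}\n`)
    .join('');
}

// Build the server without starting it, so the caller decides when to listen.
function createApp() {
  return http.createServer((req, res) => {
    if (req.url === '/metrics') {
      // The scraper reads this endpoint; the app never pushes anything.
      res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
      res.end(renderMetrics(counters));
      return;
    }
    counters.app_http_requests_total += 1;
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('ok\n');
  });
}

// Usage (port is an assumption): createApp().listen(3000);
```

The app only exposes numbers; collecting and delivering them stays with the metrics tool.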

Key metrics • latency: check for slow queries, create performance tests on them, iterate code, re-test; do not average, use a histogram

• resource usage: slow memory leaks, disk is getting full; predict resource shortage via trends
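
Why a histogram instead of an average: a few very slow requests disappear in a mean, but they show up in the upper buckets. The sketch below is a hand-rolled illustration; the bucket boundaries and function names are assumptions, and a metrics library would normally provide this.

```javascript
// Upper bounds in milliseconds; these boundaries are illustrative assumptions.
const BUCKETS_MS = [10, 50, 100, 500, 1000];

function createHistogram(bounds = BUCKETS_MS) {
  // One counter per bound plus a final overflow bucket for anything slower.
  return { bounds, counts: new Array(bounds.length + 1).fill(0), total: 0 };
}

function observe(hist, valueMs) {
  let i = hist.bounds.findIndex((bound) => valueMs <= bound);
  if (i === -1) i = hist.bounds.length; // slower than the largest bound
  hist.counts[i] += 1;
  hist.total += 1;
}

// Upper bound of the bucket that contains quantile q (0 < q <= 1).
// Infinity means the value fell into the overflow bucket.
function quantileUpperBound(hist, q) {
  if (hist.total === 0) return NaN;
  const target = Math.ceil(q * hist.total);
  let seen = 0;
  for (let i = 0; i < hist.counts.length; i += 1) {
    seen += hist.counts[i];
    if (seen >= target) {
      return i < hist.bounds.length ? hist.bounds[i] : Infinity;
    }
  }
  return Infinity;
}
```

With nine requests at 5 ms and one at 900 ms, the mean is 94.5 ms, which describes none of the requests; the histogram shows the median in the 10 ms bucket and the slowest request in the 1000 ms bucket.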

Sending metrics is not the job of your app

Error logs

Catch errors as fast as possible! • instant alert of production errors

• use while feature testing

• keep an eye on it during releases

• aggregate errors in a single service, see all

• catch before the user

Ideal error reports have • environment of error: build / release / branch / server

• stack trace: exact code location

• custom data: anything that helps identify the problem

Error log delivery • can happen any time, hopefully rare

• push data

• expect the unexpected, handle the unhandled

• never log secrets

• sampling, throttling, timeout: do not let error logging itself kill your app
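
One way to keep error reporting from hurting the app is a simple throttle around the sender. The sketch below assumes a hypothetical `send` function that delivers one report to whichever error service is in use; the limit and window values are arbitrary defaults chosen for the example.

```javascript
// `send` is a hypothetical function that delivers one report to the error service.
function createThrottledReporter(send, { limit = 10, windowMs = 60000, now = Date.now } = {}) {
  let windowStart = now();
  let sentInWindow = 0;

  return function report(error, context = {}) {
    const t = now();
    if (t - windowStart >= windowMs) {
      // A new time window starts: reset the budget.
      windowStart = t;
      sentInWindow = 0;
    }
    if (sentInWindow >= limit) return false; // drop: budget for this window is used up
    sentInWindow += 1;
    try {
      send({ message: error.message, stack: error.stack, context });
    } catch (sendError) {
      // A failing reporter must never take the app down with it.
    }
    return true;
  };
}

// Usage sketch (sendToErrorService is hypothetical):
// const report = createThrottledReporter(sendToErrorService);
// process.on('uncaughtException', (err) => { report(err); process.exit(1); });
```

The `context` argument is the place for environment and custom data; it should never carry secrets.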

Tools & services for error reporting • airbrake

• errbit (airbrake api, open source)

• sentry

• raygun

• rollbar

• …

Integrate, get notified! • pagerduty

• slack / hipchat: chatops, resolve and react within your chat

Logging

Logging vs Error logging • logging is anticipated

• error logs are occasional

Log levels

Log levels, recap • fatal - needs instant intervention, see error logs

• error - inform the user, see error logs

• warn - escalate if happens again

• info - just a step in a regular flow

• debug - full of lines, and traces

Benefits of logging, custom logs • debug

• custom events

• tracking the usage and behaviour of app

• profile, AB test, product development

Logging in node • console.log

• bragi

• debug

• npmlog

• winston

Logging in node - general • has timestamps

• has loglevels

• can be routed to stdout/stderr

• can be formatted

• create or use Correlation ID
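
The properties listed above fit in a few lines. This is a minimal sketch rather than a replacement for the libraries mentioned earlier; the JSON line format, the field names and the `createLogger` helper are assumptions for illustration.

```javascript
// Lower number means more severe; order follows the log level recap.
const LEVELS = { fatal: 0, error: 1, warn: 2, info: 3, debug: 4 };

// One JSON object per line: timestamp, level, message, then extra fields.
function formatLine(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({ time: now.toISOString(), level, message, ...fields });
}

function createLogger({ level = 'info', out = process.stdout, err = process.stderr } = {}) {
  const logger = { level };
  for (const name of Object.keys(LEVELS)) {
    logger[name] = (message, fields) => {
      // Skip anything more verbose than the current level.
      if (LEVELS[name] > LEVELS[logger.level]) return;
      // fatal and error go to stderr, everything else to stdout.
      const stream = LEVELS[name] <= LEVELS.error ? err : out;
      stream.write(formatLine(name, message, fields) + '\n');
    };
  }
  return logger;
}

// Usage: createLogger().info('order created', { cid: 'some-correlation-id' });
```

Because the level is a plain property, it can be changed while the app is running.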

Correlation ID quick guide

[Diagram: the same correlation ID (cID) is passed between services and written to their logs]
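
Passing the ID along can be as simple as reading it from the incoming request and attaching it to every outgoing call. The header name `x-correlation-id` is a common convention but still an assumption here, as are the helper names.

```javascript
const crypto = require('crypto');

// Assumed header name; any name works as long as every service agrees on it.
const CID_HEADER = 'x-correlation-id';

// Reuse the caller's ID when present, otherwise start a new one at the edge.
// Node lowercases incoming header names, so a lowercase lookup is enough.
function correlationIdFrom(headers) {
  const incoming = headers[CID_HEADER];
  return typeof incoming === 'string' && incoming !== '' ? incoming : crypto.randomUUID();
}

// Headers to attach to every outgoing call made while handling this request.
function outgoingHeaders(cid, extra = {}) {
  return { ...extra, [CID_HEADER]: cid };
}
```

Each service also writes the ID into its log lines, so one search in the log store returns the whole path of a request.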

Best practices • just put it to stdout

(docker & kubernetes clearly encourage this)

• let the log collector handle it

• pipe stdout to a file, or whatever you like

• be able to switch to debug mode at runtime: use signals

• never log secrets
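
Switching to debug mode with a signal might look like the sketch below. `logState` and `toggleDebug` are hypothetical names; a real logger would expose its own way to change the level. SIGUSR2 is used because Node.js reserves SIGUSR1 for starting the debugger.

```javascript
// Hypothetical shared logging state; a real logger would expose its own level setting.
const logState = { level: 'info' };

function toggleDebug(state) {
  state.level = state.level === 'debug' ? 'info' : 'debug';
  return state.level;
}

// Flip the level every time the process receives SIGUSR2.
process.on('SIGUSR2', () => {
  toggleDebug(logState);
  process.stdout.write(`log level is now ${logState.level}\n`);
});

// From a shell: kill -USR2 <pid>
```

No restart or redeploy is needed, so a production issue can be inspected with debug logs and switched back afterwards.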

Log collectors • fluentd

• logstash

• syslog-ng

• rsyslog

A good log collector should • read from stdout / file tail

• use your correlation ID

• remove the burden of transferring your logs

Remote logging • Stackdriver (fluentd based)

• Elasticsearch (fluentd based)

Sending logs is not the job of your app

Secret Store

Secrets • passwords / usernames

• db names

• API keys

• private keys

NOT Secret Storage × source code

× private VCS repositories

× config files

× simple database fields

× ENV variables

Benefits • ACL, policies: access a set of secrets via revocable tokens

• centralized key rotation: edit, update all secrets in one place

• single use access, n-use access

• time bound keys

• audit log

• runtime access: no secrets stored on disk

• build-time access

How it works

[Diagram: Version Control provides secret/name to the build server (build time) and to the app server (run time); each sends its token and secret/name to the Secret Store and receives the actual secrets]

• Build time: secrets built in the deployed code

• Run time: secrets were requested on app startup, stored only in memory
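
The run time variant can be sketched as follows. `fetchSecret(token, name)` is a hypothetical client function standing in for whichever secret store is used; it is not the API of any product listed here. The point of the sketch is only the flow: token and secret names go in, values come back, and they live in memory only.

```javascript
// `fetchSecret(token, name)` is a hypothetical async client for the secret store.
async function loadSecrets(fetchSecret, token, names) {
  const entries = await Promise.all(
    names.map(async (name) => [name, await fetchSecret(token, name)])
  );
  // Values stay in this closure: never written to disk, never logged.
  const values = new Map(entries);
  return {
    get(name) {
      if (!values.has(name)) throw new Error(`unknown secret: ${name}`);
      return values.get(name);
    },
  };
}

// Usage on app startup (names are illustrative):
// const secrets = await loadSecrets(fetchSecret, process.env.STORE_TOKEN, ['secret/db-password']);
```

Only the names of the secrets appear in code and config; error messages mention the name, never the value.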

Secret store server • powerful encryption

• has to be unlocked on start

• secrets are totally inaccessible without unlocking

Secret store services • HashiCorp Vault

• Amazon KMS

• Docker Swarm

• Keywhiz

Never store your secrets in your source code

Service Discovery

Service discovery can help • Service Registration: register, and notify other services of the registered one

• Service Discovery: searching for services?

• Monitoring: is a service active and responding?

• Load Balancing: direct traffic to the new service

How it works • can act like a DNS: simple use case, internal network

• can write / create configs: more complex, more control

How it works

[Diagram: the SD agent checks the APP's PORT and PID and updates the service registry; metrics scraping starts and the load balancer (LB) directs traffic to the app]

Service discovery agent • separate task, job, process

• can be configured what to check

• independent of your app
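
A port check of the kind an agent performs can be sketched with Node's net module. The function name `checkPort` and the timeout default are assumptions; real agents such as the ones listed below ship their own configurable checks.

```javascript
const net = require('net');

// Resolves to true if a TCP connection to host:port succeeds within timeoutMs.
function checkPort(host, port, timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const timer = setTimeout(() => finish(false), timeoutMs);

    function finish(up) {
      clearTimeout(timer);
      socket.destroy();
      resolve(up);
    }

    socket.once('connect', () => finish(true));
    socket.once('error', () => finish(false));
  });
}

// Usage from a separate agent process (host and port are illustrative):
// checkPort('127.0.0.1', 3000).then((up) => console.log(up ? 'register' : 'deregister'));
```

The check runs outside the app, so the app needs no knowledge of the registry.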

Service discovery services • Apache Zookeeper

• Netflix Eureka

• HashiCorp Consul

• Doozer

• Etcd (can be used to build service discovery)

Registering services is not the job of your app

Process Supervision

Process supervision • keeping your app working

• based on some property you define: not just process id, but port, ping, http response

• can give up (fail) after a number of restart attempts
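
The decision a supervisor makes on each check can be written as a small pure function. This is an illustrative sketch, not how any particular tool is implemented; `nextAction`, the state shape and the default of three attempts are assumptions.

```javascript
// Decide what to do after one health check.
// `state.restarts` counts consecutive restart attempts since the last healthy check.
function nextAction(state, healthy, maxRestarts = 3) {
  if (healthy) {
    return { action: 'none', restarts: 0 }; // healthy again: reset the counter
  }
  if (state.restarts >= maxRestarts) {
    return { action: 'give-up', restarts: state.restarts }; // stop trying, alert a human
  }
  return { action: 'restart', restarts: state.restarts + 1 };
}
```

The `healthy` input is whatever property was defined for the app: an open port, a ping reply or an expected http response.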

Process supervision in Node-land • PM2

• forever

Process supervision in general • monit: manage any process, small footprint, simple

Using Monit

• Pro: monit can instantly restart your failing service

• Con: you might not know why it was failing

Not using Monit

• Pro: you can debug what actually happened

• Con: MTTR* can be relatively high

*Mean Time To Repair

Running Programs

Simple role • start & stop your app: watch the process itself, handle process state

• send signals to the app: signals can be interpreted as tasks

Running Programs in general • runit

• upstart

• systemd

• Supervisord

• God

• Circus

A good program runner • distribution independent: you can migrate your scripts any time

• easy to configure

monit + runit (or similar) • avoid using auto restart in both: it can create weird race conditions, since they do not know about each other

• use runit to configure app start/stop

• let monit decide when to restart, and have it use runit to do so

Connecting Services

Goals & benefits • decoupling: separate services, loosen up the connection between them

• scaling: scale up easily when needed, scale down after

HTTP based APIs

vs

Message Queues

HTTP based APIs

[Diagram: Service “1” → LOAD BALANCER → Service “2”]

Message Queues

[Diagram: Service “1” → MESSAGE QUEUE → Service “2”]

HTTP based APIs or

Message Queues?

It depends

HTTP APIs

• async / sync • remote • open API

Msg Queues

• async (usually) • grouped, close • low latency
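
The decoupling a queue gives can be shown with a tiny in-process stand-in. This is not a real broker: `InMemoryQueue` is a hypothetical class that only illustrates that the producer never calls the consumer directly, that messages wait until a consumer exists, and that adding consumers spreads the work.

```javascript
// In-process stand-in for a message broker, for illustration only.
class InMemoryQueue {
  constructor() {
    this.messages = [];
    this.consumers = [];
  }

  // Producers only know the queue, not who will handle the message.
  publish(message) {
    this.messages.push(message);
    this.drain();
  }

  // Consumers can join at any time; waiting messages are delivered on arrival.
  subscribe(handler) {
    this.consumers.push(handler);
    this.drain();
  }

  drain() {
    while (this.consumers.length > 0 && this.messages.length > 0) {
      const message = this.messages.shift();
      // Round-robin: take the next consumer and move it to the back of the line.
      const handler = this.consumers.shift();
      this.consumers.push(handler);
      handler(message);
    }
  }
}
```

With an HTTP call, Service “1” needs Service “2” to be reachable at that moment; with a queue, it only needs the queue.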

Lessons learned

Prototype & learn

• use whatever modules and services you like

• get ready to go to live & production environments

• get ready to scale easily

Focus your app

• your app should do its job!

• not sending logs, metrics, notifying service registries or keeping itself running

• keep it simple

Talk to your ops

• they are here to run your app

• can help you a lot

• get on a common ground

• ask the right questions

With many thanks to

Peter Wilcsinszky / @pepov

Ferenc Kovacs / @Tyr43l

https://twitter.com/pepov

https://twitter.com/Tyr43l

Let’s talk! :)

Find me around here, or come visit us in 2 weeks!

JSConf Budapest 2017

Thank you
