Monitoring microservices: Docker, Mesos and Kubernetes visibility at scale
Post on 06-Jan-2017
Me
Alessandro Gallotta, Software Engineer @sysdig
@alex_gallotta
@sysdig
Introducing Sysdig
Capture system events, filter them, run useful scripts
Lua scripting
Open source
Nice curses UI
lsof, netstat, tcpdump, htop, ps, strace and more
Track user activity
Top files/processes/connections by CPU, bytes
Logs, containers, tracers
You name it, we track it
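The "top files/processes/connections" views listed above map onto chisels that ship with open-source sysdig (`topprocs_cpu`, `topfiles_bytes`, `topconns`, `spy_users`). As a rough sketch, a small Python helper could assemble the corresponding command lines. The helper name `chisel_command` and the container name `wordpress1` are illustrative assumptions, not part of the talk, and the code only builds the commands without running them.

```python
import shlex


def chisel_command(chisel, filter_expr=None, container_aware=False):
    """Build the argv list for running a sysdig chisel (nothing is executed)."""
    argv = ["sysdig"]
    if container_aware:
        # -pc switches sysdig to a container-friendly output format
        argv.append("-pc")
    argv += ["-c", chisel]
    if filter_expr:
        # A sysdig filter expression restricts which events the chisel sees
        argv.append(filter_expr)
    return argv


if __name__ == "__main__":
    examples = [
        chisel_command("topprocs_cpu"),    # top processes by CPU
        chisel_command("topfiles_bytes"),  # top files by bytes read/written
        chisel_command("topconns"),        # top network connections
        chisel_command("spy_users"),       # interactive user activity
        # Hypothetical container name, used only for illustration
        chisel_command("topprocs_cpu", "container.name=wordpress1", container_aware=True),
    ]
    for argv in examples:
        print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run` on a host with sysdig installed would execute the chisel; that step is left out so the sketch has no external dependency.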
Design Goals
Production-ready
Simple, lightweight
Rich data
Natural workflow
Native support for containers
Native support for and more
Demo time
Containers are Great
Simple, Scalable, Isolated, Service-oriented, Elastic, Flexible, Separation of concerns
But Some Things Are Becoming More Complex
[Diagram: legacy monolithic app with cache, webserver and database]
But Some Things Are Becoming More Complex
[Diagram: container-based app, with Service1, Service2 and Service3 on six computing nodes]
Two Problems
Problem #1: How Do We Get Data Out of These Guys?
[Diagram: container-based app, with Service1, Service2 and Service3 on six computing nodes]
System, Network, Process, JVM, Response Time, Requests, Errors
Problem #2: How Do We Make Sense of the Data?
[Diagram: container-based app, with Service1, Service2 and Service3 on six computing nodes]
Complexity Calls for Great Monitoring
Isolated, Automated, Orchestration-aware, Simple, Scalable
The Orchestrated Version of This
Complexity Also Calls for Great Troubleshooting
What's the network activity of my Marathon group?
What's using the CPU in the WordPress task?
How the hell does my Mesos task work?!
Where's the bottleneck?
What's the response time of my login service?
What transactions is my Redis service serving?
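Questions like these come down to grouping per-container metrics by orchestration metadata rather than by host. A minimal sketch of that idea follows. The sample records, the label keys (`marathon.group`, `service`) and the function name `aggregate_by_label` are all hypothetical and chosen for illustration; they are not taken from the talk or from any Sysdig API.

```python
from collections import defaultdict

# Hypothetical per-container samples, as an agent on each node might report them
SAMPLES = [
    {"container": "web-1", "host": "node1",
     "labels": {"marathon.group": "/shop/frontend", "service": "login"}, "net_bytes": 1200},
    {"container": "web-2", "host": "node2",
     "labels": {"marathon.group": "/shop/frontend", "service": "login"}, "net_bytes": 800},
    {"container": "redis-1", "host": "node2",
     "labels": {"marathon.group": "/shop/backend", "service": "redis"}, "net_bytes": 500},
    {"container": "debug", "host": "node3", "labels": {}, "net_bytes": 50},
]


def aggregate_by_label(samples, label_key, metric):
    """Sum a metric over all containers that share the same label value."""
    totals = defaultdict(int)
    for sample in samples:
        # Containers without the label land in a catch-all bucket
        group = sample["labels"].get(label_key, "<unlabeled>")
        totals[group] += sample[metric]
    return dict(totals)


if __name__ == "__main__":
    # Network activity per Marathon group, regardless of which node runs the container
    print(aggregate_by_label(SAMPLES, "marathon.group", "net_bytes"))
```

The same function answers the per-service question by switching `label_key` to `"service"`, which is the point of keeping the grouping key separate from the physical host.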
How Do I Get Data Out of These Things: VMs
[Diagram: hypervisor hosting VM1, VM2 and VM3]
Monitoring VMs, Option 1
[Diagram: hypervisor hosting VM1, VM2 and VM3]
Hypervisor-level instrumentation, Amazon CloudWatch
Monitoring VMs, Option 2
[Diagram: hypervisor hosting VM1, VM2 and VM3, with a monitoring agent]
Monitoring Containers
[Diagram: OS hosting Container1, Container2 and Container3]
Monitoring Containers, Option 1
[Diagram: OS hosting Container1, Container2 and Container3, with a monitoring agent]
Not scalable
Not composable
Adds dependencies/size
Kills the concept of one process per container
Monitoring Containers, Option 2
[Diagram: OS hosting Container1, Container2 and Container3]
Container runtime level monitoring
Kernel-level instrumentation
Monitoring Containers, Option 3
[Diagram: OS hosting Container1, Container2 and a monitoring container]
Sysdig Data Collection
[Diagram: kernel hosting Container1 (Docker), Container2 (Docker), Container3 (LXC) and two apps]
Instrumentation through kernel module
[Diagram: same layout, with a sysdig container (Docker) added]
Capture and analysis
Sky cloud is the limit
Correlate data
Scale with your infrastructure
Alerts, notifications, visualization tools
Continuous data collection and retention from production systems
Sysdig Cloud
Sysdig evolution for the cloud
Preserve the premises: production ready, natural workflow, ease of use, 0 to low config needed
Out of the box support
Demo time 2
How About Security?
Did someone log into one of our containers?
Has something been installed in one of the containers?
Have we been hacked?
Were configuration files changed?
An anomaly detection system built on top of the sysdig engine
Falco Architecture
[Diagram: kernel hosting Container1 (Docker), Container2 (rkt), Container3 (LXC) and two apps, plus a rule system container (Docker)]
File activity, Network activity, User activity, Process execution, IPC
Rules Examples
rule: shell_in_container
desc: a shell running in a container
condition: container.id != host and proc.name = bash
output: Shell running in container (user=%user.name container_id=%container.id container_name=%container.name shell=%proc.name parent=%proc.pname)
priority: WARNING
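To make the rule's two moving parts concrete (a boolean condition over event fields, and an output template with `%field` placeholders), here is a minimal Python sketch. It is not how Falco is implemented; the event dictionaries, the `<NA>` fallback and the function names are assumptions made for illustration. The only detail taken from sysdig itself is that `container.id` reads `host` for processes running outside any container.

```python
import re

OUTPUT_TEMPLATE = (
    "Shell running in container (user=%user.name container_id=%container.id "
    "container_name=%container.name shell=%proc.name parent=%proc.pname)"
)


def shell_in_container(event):
    """Mirror of: container.id != host and proc.name = bash"""
    return event["container.id"] != "host" and event["proc.name"] == "bash"


def render_output(template, event):
    """Replace every %class.field token with the matching event value."""
    return re.sub(
        r"%([a-z]+\.[a-z]+)",
        lambda match: str(event.get(match.group(1), "<NA>")),
        template,
    )


if __name__ == "__main__":
    # Hypothetical events: one shell inside a container, one on the host
    events = [
        {"container.id": "3ab7f0e1c2d4", "container.name": "web", "proc.name": "bash",
         "proc.pname": "sshd", "user.name": "root"},
        {"container.id": "host", "container.name": "host", "proc.name": "bash",
         "proc.pname": "sshd", "user.name": "alice"},
    ]
    for event in events:
        if shell_in_container(event):
            print("WARNING", render_output(OUTPUT_TEMPLATE, event))
```

Only the first event triggers the rule; the second is a shell on the host and is ignored.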
Rules Examples
rule: mysqld_spawn_process
desc: mysqld spawning a new process after startup.
condition: spawn_process and proc.name = mysqld and not proc_is_new
output: mysqld spawned new process after startup (user=%user.name command=%proc.cmdline file=%fd.name)
priority: WARNING
Rules Examples
macro: open_connection
condition: syscall.type=connect and evt.dir=< and fd.sockfamily = ip

rule: system_binaries_network_activity
desc: any network connection initiated by system binaries that are not expected to send or receive any network traffic
condition: open_connection and proc.name in (ls, ps, mkdir, ...)
output: Known system binary made network connection (user=%user.name command=%proc.cmdline connection=%fd.name)
priority: WARNING
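The macro lets several rules share one condition fragment. A simple way to picture this is textual substitution of the macro name before the condition is evaluated, sketched below. This is an assumption for illustration and not Falco's actual parser; the function name `expand_macros` is hypothetical, and because the slide truncates the binary list, the example uses only the three names it shows.

```python
import re

# Macro body as given on the slide
MACROS = {
    "open_connection": "syscall.type=connect and evt.dir=< and fd.sockfamily = ip",
}


def expand_macros(condition, macros):
    """Replace each bare macro name with its parenthesized body."""
    for name, body in macros.items():
        # \b keeps longer identifiers that merely contain the name untouched;
        # a function replacement avoids backslash interpretation in the body
        condition = re.sub(
            rf"\b{re.escape(name)}\b",
            lambda match, body=body: f"({body})",
            condition,
        )
    return condition


if __name__ == "__main__":
    rule_condition = "open_connection and proc.name in (ls, ps, mkdir)"
    print(expand_macros(rule_condition, MACROS))
```

Wrapping the body in parentheses keeps operator precedence intact when the macro sits next to `or` or `not` in the enclosing rule.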
Thank You!
www.sysdig.org
www.sysdig.org/falco
@alex_gallotta
@sysdig
github.com/draios
www.sysdig.com