Performance tuning · 2018-10-15
TRANSCRIPT
#ITDEVCONNECTIONS | ITDEVCONNECTIONS.COM
Brief Intro
Who am I?
What do I do now?
What did I do previously?
How did I get in here?
Profiling Your Workload
What is your biggest constraint?
CPU? Disk? Memory? Network?
How do you find out if you don't know already?
Profiling Your Workload
What is your biggest constraint?
“I don’t know.”
So how can we determine that?
We could estimate on the back of an envelope…
“So it’s a database, I know it’s going to be memory hungry, so I’ll go with the largest x1e instance type I can find, I don’t care how much I spend.”
Profiling Your Workload
No one here has ever been asked if they can decrease their AWS spend.
Profiling Your Workload
Better:
“OK, back of the envelope math won’t cut it. Let’s do some benchmarking with artificial load similar to what we think we’ll be seeing in the near future.”
Profiling Your Workload
Best:
“OK, artificial load tests aren’t cutting it. Can we test with our actual workload?”
Profiling Your Workload
So what do I do if I already know my biggest bottleneck is CPU / Disk / Memory / Network?
Profiling Your Workload: CPU
CPU bound?
What are some symptoms of this?
“I moved from a C5.xl (4 vCPUs) to a C5.2xl (8 vCPUs) and performance improved!”
“mpstat -P ALL shows %idle below 5% per core”
“CPU utilization is always high and my developers tell me it's not a bug!”
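The mpstat symptom can be checked without eyeballing every row. What follows is a minimal sketch, not something shown in the talk: the function name busy_cores and the threshold argument are illustrative. It looks up the CPU and %idle columns from the header row, because their positions vary between sysstat versions, and it only reads the "Average:" rows.

```shell
# busy_cores THRESHOLD
# Read `mpstat -P ALL <interval> <count>` output on stdin and print each
# core whose average %idle is below THRESHOLD.
busy_cores() {
    awk -v limit="$1" '
        # Header rows name the columns; remember where CPU and %idle sit.
        /%idle/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   cpu_col = i
                if ($i == "%idle") idle_col = i
            }
            next
        }
        # Only "Average:" rows, and only per-core rows (skip the "all" row).
        $1 == "Average:" && idle_col && $cpu_col ~ /^[0-9]+$/ {
            if ($idle_col + 0 < limit + 0)
                printf "cpu%s idle=%s\n", $cpu_col, $idle_col
        }
    '
}
```

Possible usage, assuming sysstat is installed: LC_ALL=C mpstat -P ALL 1 5 | busy_cores 5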
Profiling Your Workload: CPU
Quick note - “So what does an AWS vCPU actually equate to anyway?”
– Not a discrete CPU core
– Deducing the number of actual cores from the number of hyperthreads
– Does it matter?
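To put a number on the hyperthread question, the core count can be derived from what the guest reports. This is a small sketch that assumes lscpu prints its usual English labels ("CPU(s)" and "Thread(s) per core"); the function name physical_cores is made up for illustration, and the result is only as truthful as what the hypervisor exposes to the guest.

```shell
# physical_cores
# Read `lscpu` output on stdin and print the vCPU count, the threads per
# core, and the implied number of physical cores.
physical_cores() {
    awk -F: '
        $1 == "CPU(s)"             { cpus = $2 + 0 }
        $1 == "Thread(s) per core" { tpc  = $2 + 0 }
        END {
            if (cpus && tpc)
                printf "%d vCPUs, %d threads per core, %d physical cores\n",
                       cpus, tpc, cpus / tpc
        }
    '
}
```

Possible usage: LC_ALL=C lscpu | physical_cores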
Profiling Your Workload: CPU
A quick comparison of C5 instance types:
So what's the difference between a c5.large and a c5.18xlarge?
2 vCPUs vs 72 vCPUs
4 GB of RAM vs 144 GB of RAM
“Maybe” 10 Gbps vs 25 Gbps
$62 a month vs $2,529 a month
Profiling Your Workload: CPU
What is a “burstable” CPU and why should I care?
– Saving money when you're only CPU bound some of the time
– CPU credit mechanism
– What happens when I run out of CPU credits?
– Monitoring CPU credit usage
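The credit mechanism is easier to reason about with arithmetic. One CPU credit is one vCPU running at 100% for one minute, so a rough estimate of how long a burst can last follows from the balance. The sketch below is a simplification under stated assumptions: the function name burst_minutes is hypothetical, the earn rate is something you look up for your instance size, and it assumes every vCPU is pinned at 100% for the whole burst.

```shell
# burst_minutes BALANCE VCPUS EARNED_PER_HOUR
# Estimate the minutes of 100% utilization on all vCPUs before the credit
# balance reaches zero.
burst_minutes() {
    awk -v bal="$1" -v vcpus="$2" -v earn="$3" 'BEGIN {
        drain = vcpus - earn / 60     # net credits spent per minute at full load
        if (drain <= 0) { print "unlimited"; exit }
        printf "%.0f\n", bal / drain
    }'
}
```

For example, burst_minutes 144 1 6 models a single-vCPU instance holding 144 credits and earning 6 per hour. The live balance is published as the CloudWatch metric CPUCreditBalance.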
Profiling Your Workload: Disk
Disk bound?
What are some symptoms of this?
“I keep running out of disk space”
“My storage isn't fast enough!”
“I need (x) IOPs and I'm not getting them!”
“My iowait is really, really high”
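"Really, really high" iowait can be turned into a number by comparing two snapshots of /proc/stat. This is a minimal sketch, assuming the standard layout of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal); the function name iowait_pct is illustrative.

```shell
# iowait_pct BEFORE AFTER
# Each argument is a file holding a copy of /proc/stat. Prints the share of
# CPU time spent in iowait between the two snapshots, as a percentage.
iowait_pct() {
    awk '
        $1 == "cpu" {
            total = 0
            # Fields 2..9 are user..steal; guest time is already counted in user.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            if (FNR == NR) { t0 = total; w0 = $6 }   # first file
            else           { t1 = total; w1 = $6 }   # second file
        }
        END { if (t1 > t0) printf "%.1f\n", 100 * (w1 - w0) / (t1 - t0) }
    ' "$1" "$2"
}
```

Possible usage: cat /proc/stat > /tmp/a; sleep 5; cat /proc/stat > /tmp/b; iowait_pct /tmp/a /tmp/b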
Profiling Your Workload: Disk
A quick summary of the available storage types on AWS:
Instance store aka ephemeral aka “I hate my data”
EBS – magnetic, general purpose (SSD) and provisioned IOPs (I hate money)
EFS – Replacing NFS, for when you need to share data across multiple instances
S3 – Larger, slower, cheaper, and why object store based storage is not to be mistaken for block based storage (S3FS is not your friend)
Glacier – Cheaper S3 storage until you need to read it
Storage Gateway (You should probably be using AWS EFS unless using it for on-prem)
Profiling Your Workload: Disk
So what is instance store storage anyway and why is it terrible?
Local, non-redundant storage, either magnetic hard drives, or SSD / NVMe SSD
“That doesn’t sound… So terrible. Why the fuss?”
“Why shouldn't I trust it? It's so cheap!”
“I hate my data and I don't care about it.”
• The perfect use case for instance store!
Profiling Your Workload: Disk
S3 looks really cheap!
So this S3FS thing lets me mount an S3 bucket as a local volume! This is awesome!
“Hey, I ran fsck on my S3FS volume and now all of my data is gone, can you help me?”
Profiling Your Workload: Disk
So S3 is object based storage and EBS and instance store are block based – how does that matter in my use case anyway? Why should I care?
Advantages of object based storage
Why object based storage isn't a replacement for block based storage
Is there a good use case for object based storage?
Profiling Your Workload: Disk
EBS – incredibly redundant, multiple copies of your data on multiple servers
EBS – Has its own network pipe, doesn't fight the rest of your network traffic
EBS snapshots – my life is saved!
EBS – Larger, faster volumes
EBS elastic volumes (post 2017) – scaling on the fly!
EBS storage types – magnetic, General Purpose (SSD), Provisioned IOPS, Throughput Optimized HDD and Cold HDD
Profiling Your Workload: Disk
EBS – How redundant is it anyway? How safe is my data?
What does five nines (99.999%) of availability actually equate to?
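The five nines question is plain arithmetic: the allowed downtime is the unavailable fraction multiplied by the minutes in a year. A small sketch follows; the function name is illustrative and a 365-day year is assumed.

```shell
# downtime_minutes_per_year AVAILABILITY_PCT
# Print the downtime budget, in minutes per 365-day year, for a given
# availability percentage.
downtime_minutes_per_year() {
    awk -v pct="$1" 'BEGIN { printf "%.2f\n", (100 - pct) / 100 * 365 * 24 * 60 }'
}
```

downtime_minutes_per_year 99.999 gives a little over five minutes a year, while 99.9 gives close to nine hours.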
Profiling Your Workload: Disk
EBS – Why should I go with magnetic storage in this day and age?
Previous generation magnetic vs newer Throughput Optimized HDD or Cold HDD
Cost is a big factor for us! IOPs, less so.
Profiling Your Workload: Disk
EBS – So my DBA keeps telling me we're getting choked by a lack of IOPs – are provisioned IOPs really worth it?
Is money an object?
– No? Provisioned IOPs all day, every day.
– Yes! Time to profile SSD based volume performance and EBS optimized instance types
– Maybe? Switching volume types on the fly if you have the right instance type (C1, C3, CC2, CR1, G2, I2, M1, M3, and R3)
Profiling Your Workload: Disk
EBS – Is it worth it to go with an EBS optimized instance type?
“I know exactly how many IOPs I need and I'm not getting them right now.” YES.
“I know how many IOPs I need, I'm not getting them and I'm not quite ready to set my money on fire with provisioned IOPs yet.” YES.
Profiling Your Workload: Memory
Memory bound?
Symptoms?
“I've been all over this app and it's not a memory leak!”
“Performance increases as we increase the amount of available RAM”
“We enabled swapping to survive and it's terrible”
“CPU utilization isn't terribly high”
“The iowait percentage is reasonably low”
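A quick way to see whether the "we enabled swapping to survive" symptom applies is to read how much swap is actually in use. This sketch relies on the SwapTotal and SwapFree fields of /proc/meminfo; the function name swap_used_kb is illustrative, and the optional file argument exists only so it can be tried against a saved copy.

```shell
# swap_used_kb [FILE]
# Print the amount of swap in use, in kB. Reads /proc/meminfo by default.
swap_used_kb() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free  = $2 }
        END { print total - free }
    ' "${1:-/proc/meminfo}"
}
```

A non-zero figure only says pages were swapped out at some point. To see whether swapping is happening right now, watch the si and so columns of vmstat 1.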
Profiling Your Workload: Network
Network bound?
Symptoms?
“I've looked at iftop / vnstat / iptraf and it's scary”
“We moved from a C4 to a C5 and stopped dropping packets”
“We've researched moving to a CDN”
Profiling Your Workload: Network
Network bound?
Latency vs jitter vs packet loss:
Latency
Jitter
Packet loss
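The three terms can be told apart by computing them from the same set of probes. The sketch below is an illustration, not the speaker's method: it takes latency as the mean round-trip time, jitter as the mean absolute difference between consecutive round-trip times (a simple definition; other definitions exist), and packet loss as the share of probes that got no reply. The function name net_summary is hypothetical.

```shell
# net_summary SENT
# Read one round-trip time in ms per line on stdin (one line per reply
# received) and print mean latency, jitter, and packet loss.
net_summary() {
    awk -v sent="$1" '
        {
            sum += $1
            if (NR > 1) { d = $1 - prev; if (d < 0) d = -d; jit += d }
            prev = $1
        }
        END {
            if (NR == 0 || sent == 0) exit 1
            jitter = (NR > 1) ? jit / (NR - 1) : 0
            printf "latency_ms=%.2f jitter_ms=%.2f loss_pct=%.1f\n",
                   sum / NR, jitter, 100 * (sent - NR) / sent
        }
    '
}
```

Possible usage with Linux ping, where HOST is a placeholder: ping -c 20 HOST | sed -n 's/.*time=\([0-9.]*\) ms.*/\1/p' | net_summary 20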
Profiling Your Workload: Network
Network bound?
Can you use a CDN to solve these problems?
When is a CDN the appropriate solution?
– Global vs local user base
– Large amounts of static resources
Profiling Your Workload: Network
Network bound?
Enhanced AWS networking and you
Verifying cpu0 is not handling 100% of your network load while the other cores sit idle
Intra-VPC bandwidth vs AZ to AZ bandwidth vs region to region bandwidth
When you'd like to exceed the speed of light but are unable to
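One way to check the cpu0 concern is to look at how receive softirqs are spread across cores. This is a sketch under assumptions: it reads the NET_RX row of /proc/softirqs, whose counters are cumulative since boot, so it shows the long-run distribution rather than the current one. The function name net_rx_cpu0_share is illustrative.

```shell
# net_rx_cpu0_share [FILE]
# Print the percentage of NET_RX softirqs handled by CPU0.
# Reads /proc/softirqs by default.
net_rx_cpu0_share() {
    awk '
        $1 == "NET_RX:" {
            for (i = 2; i <= NF; i++) total += $i
            if (total > 0) printf "%.1f\n", 100 * $2 / total
        }
    ' "${1:-/proc/softirqs}"
}
```

A value close to 100 on a multi-core instance suggests receive processing is not being spread out. Taking two readings a few seconds apart and comparing the raw counters gives a view of current behavior.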
Profiling Your Workload
What is the worst acceptable performance for your workload?
Who defines this? Users? Internal demands? Workload demands?
What are the consequences for poor performance?
– User base loss
– Financial loss
– Loss of self respect /s
Profiling Your Workload
Why go with newer instance types?
Migrating from an m4.xl to an m5.xl
I saved money and performance increased, what now?
Auto Scaling
What would happen if you didn’t have auto scaling?
– Upset customers?
– Lost sales?
– Lost reputation?
Consumers are fickle!
Worst case scenario?
Auto Scaling
What would happen if you didn’t have auto scaling?
Performance requires having a service that is available to begin with!
In general, customers can have the biggest impact on performance.
– Fewer than you estimated and you overpay for resources
– More than you estimated and you suffer downtime
– Auto scaling provides a buffer when demand cannot be easily forecast
Auto Scaling
Avoiding churn
• What is churn anyway?
• Why should I care?
Keeping the bills low
• Billing alerts
• A quick note on instance reservations
• A quick note on profiling cost with Cost Explorer
Auto Scaling
Setting appropriate scaling thresholds
– How much work does an instance complete before being terminated?
– Scale up fast, scale down slow
– Time between launch and the instance doing usable work
– Why maintaining a pool of hot spares is a terrible idea
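"Scale up fast, scale down slow" amounts to using a short cooldown before adding capacity and a long one before removing it. The sketch below only illustrates that asymmetry. Every number in it (70% and 30% CPU, 60 and 900 seconds) is a hypothetical placeholder rather than a recommendation from the talk, and the function name scale_decision is made up.

```shell
# scale_decision CPU_PCT SECS_SINCE_LAST_CHANGE
# Print scale-out, scale-in, or hold. Both arguments are whole numbers.
scale_decision() {
    cpu=$1 since=$2
    if   [ "$cpu" -ge 70 ] && [ "$since" -ge 60 ];  then echo "scale-out"   # react quickly
    elif [ "$cpu" -le 30 ] && [ "$since" -ge 900 ]; then echo "scale-in"    # wait much longer
    else echo "hold"
    fi
}
```

The long scale-in wait is what keeps an instance alive long enough to do useful work after its launch time has been paid for, which also limits churn.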
Auto Scaling
Handling spikes in load
– Profiling future events based on past performance
– Do you know when your load will spike?
– Strongly coupled or decoupled?
• Decoupling storage from compute
• Can my workload be decoupled?
Selecting An Instance Family
What AWS instance family should I go with?
Virtualization types on AWS
Prehistory – Entirely virtualized
History – Paravirtualized
Today – HVM and PV-HVM
Devices and SR-IOV
The Future – KVM replacing Xen
Virtualization types on AWS
Prehistory:
Entirely virtualized – CPU, memory, storage, network
Upside? No longer tied to the hardware
Downside? 100% virtualized hardware is slow!
Virtualization types on AWS
History:
Paravirtualized – The guest OS knows the truth!
What are paravirtualized drivers?
What makes them faster?
Virtualization types on AWS
Today:
HVM and PV-HVM – CPU support for virtualization
What kind of speed increase?
Virtualization types on AWS
Today:
Devices and SR-IOV – PCI-E hardware virtualization support
What is “Single Root I/O Virtualization” anyway?
Faster network adapters
Faster (PCI-E) SSD storage - NVMe
Virtualization types on AWS
The Future:
KVM replacing Xen?
What's this “Nitro” architecture anyway?
So what's managing the storage and network layers then?
Bare metal instances on AWS? What year is this?!
Measuring Performance
Calculating efficiency
– Y axis: Cost of compute per vCPU per hr
– X axis: Cost of memory per GB per hr
– Ideal solution: Uppermost right corner
– Not one size fits all – “What about network / disk performance?”
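The two axes can be computed for any instance type from its hourly price, vCPU count, and memory size. This is a minimal sketch: the function name unit_costs is illustrative, and the price in the usage line is a made-up figure, not a quote from AWS.

```shell
# unit_costs HOURLY_PRICE VCPUS MEM_GB
# Print two numbers: dollars per vCPU per hour, then dollars per GB of
# memory per hour.
unit_costs() {
    awk -v p="$1" -v c="$2" -v m="$3" 'BEGIN { printf "%.4f %.4f\n", p / c, p / m }'
}
```

For example, unit_costs 0.08 2 4 describes a hypothetical 2 vCPU, 4 GB instance at $0.08 an hour. Running it across a family gives the points to plot. As the slide notes, neither number says anything about network or disk performance.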
Measuring Performance
Performance to Spend ratio
Performance to Workload completed
Performance per instance?
Performance vs instance type vs instance size
Measuring Performance
Performance to Spend ratio
How fast can I get my work done if I don't care about money?
Measuring Performance
Performance to Workload completed
– The metric you really care about!
– Diminishing returns on performance tuning
– When is “good enough” actually enough?
Measuring Performance
Performance per instance?
– Across instances with the same workload, does performance differ?
– What percentage performance difference is enough to be concerned about?
– What outliers can tell us
– When to investigate vs when to just terminate the instance and move on?
– ASG churn – beware!
Measuring Performance
Performance vs instance type vs instance size
• Not just scaling out, but scaling up
• Do all workloads respond positively to scaling up?
• Scaling instance sizes down when appropriate
Kernel Tuning
What kind of performance increase can I see from kernel tuning?
Quick overview of virtual file systems and kernel space vs user space
How do I tune things?
Reading current values
Setting single values
Setting groups of values – tuned to the rescue
How do I make these changes permanent?
What is context switching and why is it bad?
What things can be tuned?
OK, wow, that's a lot of settings – which should I pay attention to?
Kernel Tuning
What kind of performance increase can I see from kernel tuning?
Depending on your monthly AWS spend, it might be worth it
Kernel Tuning
Linux (UNIX) philosophy of “everything is a file”
Kernel Tuning
“The whole point with "everything is a file" is not that you have some random filename..., but the fact that you can use common tools to operate on different things.”
-Linus Torvalds
Kernel Tuning
“I'm always right. This time I'm just even more right than usual.”
-Also Linus Torvalds
https://www.mail-archive.com/[email protected]/msg83284.html
Kernel Tuning
A very convenient way to convey information about the current state of processes and the kernel itself.
root@here:/proc# cat 4/task/4/status
Name: bash
State: S (sleeping)
Tgid: 4
Pid: 4
PPid: 3
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 4
Groups:
VmPeak: 0 kB
VmSize: 16352 kB
VmLck: 0 kB
VmHWM: 0 kB
VmRSS: 3728 kB
VmData: 0 kB
VmStk: 0 kB
VmExe: 956 kB
VmLib: 0 kB
VmPTE: 0 kB
Threads: 1
SigQ: 0/0
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Cpus_allowed: 00000001
Cpus_allowed_list: 0
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 150
nonvoluntary_ctxt_switches: 545
root@here:/proc#
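Because the status file is plain text, the usual tools can pull single fields out of it. As a sketch, the function below extracts the two context switch counters that appear at the end of the listing; the function name ctxt_switches is illustrative.

```shell
# ctxt_switches STATUS_FILE
# Print the voluntary and involuntary context switch counts from a
# /proc/<pid>/status file.
ctxt_switches() {
    awk '
        $1 == "voluntary_ctxt_switches:"    { v = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { n = $2 }
        END { printf "voluntary=%d nonvoluntary=%d\n", v, n }
    ' "$1"
}
```

Possible usage for the current shell: ctxt_switches /proc/$$/status. A nonvoluntary count that grows quickly suggests the process is often being preempted rather than yielding the CPU on its own.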
Kernel Tuning
How do I read the current value of a setting?
Kernel Tuning
Old fashioned:
root@here:~# cat /proc/sys/vm/swappiness
60
root@here:~#
Newer and improved with sysctl:
root@here:~# sysctl vm.swappiness
vm.swappiness = 60
root@here:~#
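The two ways of reading a value address the same thing: a dotted sysctl key is a path under /proc/sys with the dots turned into slashes. A small sketch of that mapping follows; the function name sysctl_path is illustrative, and the simple translation breaks for keys whose components themselves contain dots, such as VLAN interface names.

```shell
# sysctl_path KEY
# Print the /proc/sys file that backs a dotted sysctl key.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}
```

So cat "$(sysctl_path vm.swappiness)" reads the same value as sysctl vm.swappiness.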
Kernel Tuning
OK, enough already – how do I tune something?
Kernel Tuning
Old fashioned:
root@here:~# cat /proc/sys/vm/swappiness
60
root@here:~# echo "0" > /proc/sys/vm/swappiness
root@here:~# cat /proc/sys/vm/swappiness
0
root@here:~#
Newer and improved with sysctl:
root@here:~# sysctl -w vm.swappiness=0
vm.swappiness = 0
root@here:~#
Kernel Tuning
Changing a single setting on the fly is fine when experimenting, but how do I get my changes to survive a reboot?
Kernel Tuning
Making that change permanent
root@here:~# vim /etc/sysctl.conf
...
# Do not send ICMP redirects (we are not a router)
#net.ipv4.conf.all.send_redirects = 0
#
# Do not accept IP source route packets (we are not a router)
#net.ipv4.conf.all.accept_source_route = 0
#net.ipv6.conf.all.accept_source_route = 0
#
# Log Martian Packets
#net.ipv4.conf.all.log_martians = 1
#
# Minimizing the amount of swapping
vm.swappiness = 20
vm.dirty_ratio = 80
vm.dirty_background_ratio = 5
...
add changes, save and quit :x
...load the sysctl.conf file...
root@here:~# sysctl -p
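An alternative to editing /etc/sysctl.conf by hand, and not something shown on the slide, is a drop-in file under /etc/sysctl.d, which most current distributions read at boot. The sketch below writes such a file. The function name write_sysctl_dropin and the file name 90-tuning are hypothetical, and the directory is a parameter so the function can be tried somewhere harmless first.

```shell
# write_sysctl_dropin DIR NAME SETTING...
# Write each "key = value" SETTING on its own line to DIR/NAME.conf.
write_sysctl_dropin() {
    dir=$1 name=$2
    shift 2
    mkdir -p "$dir" || return 1
    for setting in "$@"; do
        printf '%s\n' "$setting"
    done > "$dir/$name.conf"
}
```

Possible usage as root: write_sysctl_dropin /etc/sysctl.d 90-tuning "vm.swappiness = 20" "vm.dirty_background_ratio = 5", followed by sysctl --system to load every configured file without rebooting.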
Kernel Tuning
So I have a lot of things I want to change, entire groups – this is painful!
Tuned to the rescue!
Kernel Tuning
Using Tuned profiles
Dynamic tuning? Tell me more!
Note: Tuned settings take priority over all other saved settings
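A Tuned profile is a directory containing a tuned.conf file that groups settings by plugin. The sketch below writes a small custom profile that inherits from the stock throughput-performance profile and overrides two sysctl values. The function name write_tuned_profile and the profile name used in the usage line are hypothetical, and the values are only examples.

```shell
# write_tuned_profile BASE_DIR NAME
# Create BASE_DIR/NAME/tuned.conf with a minimal custom profile.
write_tuned_profile() {
    mkdir -p "$1/$2" || return 1
    cat > "$1/$2/tuned.conf" <<'EOF'
[main]
summary=Example profile layered on throughput-performance
include=throughput-performance

[sysctl]
vm.swappiness=20
vm.dirty_background_ratio=5
EOF
}
```

Possible usage as root, assuming a Tuned version that keeps custom profiles directly under /etc/tuned (the location differs between versions): write_tuned_profile /etc/tuned example, then tuned-adm profile example to activate it and tuned-adm active to confirm.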
Kernel Tuning
So what things can be tuned?
root@here:~# sysctl -a
fs.binfmt_misc.status = enabled
fs.binfmt_misc.WSLInterop = enabled
fs.binfmt_misc.WSLInterop = interpreter /init
fs.binfmt_misc.WSLInterop = flags:
fs.binfmt_misc.WSLInterop = offset 0
fs.binfmt_misc.WSLInterop = magic 4d5a
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 8192
kernel.cap_last_cap = 36
kernel.domainname = localdomain
kernel.hostname = here
kernel.keys.root_maxkeys = 1000000
kernel.ostype = Linux
kernel.overflowgid = 65534
kernel.overflowuid = 65534
kernel.ngroups_max = 65536
kernel.pid_max = 32768
kernel.random.entropy_avail = 4096
kernel.random.poolsize = 4096
kernel.randomize_va_space = 2
kernel.sem = 32000 1024000000 500 32000
kernel.threads-max = 32768
kernel.shmmax = 4294967295
kernel.shmmni = 4096
kernel.yama.ptrace_scope = 1
...
root@here:~#
Kernel Tuning
OK. That's a big list! Which ones should I look at in particular?
Kernel Tuning
CPU tunables
– Scheduler class
– Priorities
– Migration latency
– Tasksets and using them when you start digging
Kernel Tuning
Memory tunables
– Virtual memory
– Swappiness
– Overcommit
– OOM behavior and why OOM killer really is your friend
– “Huge pages – are they worth it?” Or “Wow, that made it worse”
– NUMA balancing on larger instances (8 or more vCPUs)
Kernel Tuning
File system tunables
– Page cache flushing behavior
– Things to tune for a given filesystem
– vm.dirty_ratio and why it matters a lot
Kernel Tuning
Storage and I/O tunables
– Read ahead size
– In-flight requests
– I/O scheduler – be careful!
– Volume stripe width – (when using magnetic storage)
– Md chunk size and stripe width (magnetic specific)
– Improving SSD performance with the noop scheduler
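Before changing the I/O scheduler it helps to see which one is active. The kernel lists the available schedulers in /sys/block/<device>/queue/scheduler and marks the active one with square brackets. The sketch below extracts it; the function name active_scheduler is illustrative. On newer multi-queue kernels the list usually offers "none" in place of noop.

```shell
# active_scheduler
# Read the contents of a queue/scheduler file on stdin and print the
# scheduler shown in square brackets.
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}
```

Possible usage, where xvda is a placeholder device name: active_scheduler < /sys/block/xvda/queue/scheduler. Writing a scheduler name into the same file as root switches it on the fly, which is where the "be careful!" applies.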
Kernel Tuning
Networking
– TCP buffer sizes
– TCP backlog
– Device backlog
– TCP reuse (careful on this one!)
– net.ipv4 tunables
– net.ipv6 tunables
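A common starting point for sizing TCP buffers is the bandwidth-delay product: the number of bytes that must be in flight to keep a link full. The sketch below computes it. The function name bdp_bytes is illustrative, and the numbers in the usage line are examples rather than measurements.

```shell
# bdp_bytes MBPS RTT_MS
# Print the bandwidth-delay product in bytes for a link of MBPS megabits
# per second with a round-trip time of RTT_MS milliseconds.
bdp_bytes() {
    awk -v mbps="$1" -v rtt="$2" 'BEGIN { printf "%.0f\n", mbps * 1000000 / 8 * rtt / 1000 }'
}
```

For example, bdp_bytes 10000 1 models a 10 Gbps path with a 1 ms round trip. If the maximum in net.ipv4.tcp_rmem and net.ipv4.tcp_wmem (the third value of each) is well below the result, a single connection is unlikely to fill the link.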
Kernel Tuning
Hypervisor
– Using HVM, not PV, right?
– Kernel clocksource – which is right for your use case?
– Is clock drift an actual problem for me?
Monitoring Performance Changes
You have a dashboard, right?
What things should I be watching?
What tools can I use on the instances themselves?
What are some good remote monitoring tools?
Monitoring Performance Changes
You have a dashboard, right?
So that sounds like a lot of effort – how important is it really?
Monitoring Performance Changes
What things should I be watching?
– Total number of instances
– CPU usage per instance
– Latency spikes
– ASG churn
– Load average
– Network saturation
– Errors
– Blocking I/O
Monitoring Performance Changes
What tools can I use on the instances themselves?
– SystemTap – complex but worth it!
– strace – Sometimes the old ways are the best ways
– vmstat
– pidstat
– sar – Your friend and mine, but don't leave it on 24/7
– Load average (uptime)
– dmesg
– mpstat (especially on larger instance types)
– iostat
– free
– top and its relatives – mtop, htop, etc
– perf
– lsof
Monitoring Performance Changes
What are some good remote monitoring tools?
– Splunk – Of course. You have money, right?
– Prometheus – Roll your own and boy does it scale!
– CloudWatch – Free, painful
– Ye Olde Nagios + Thruk – Reliable!
– ELK stack – You're going to hire someone, right?