multi layer monitoring v1

32
Maintaining Non-Stop Services with Multi Layer Monitoring Lahav Savir System Architect and CEO of Emind Systems [email protected] www.emindsys.com

Upload: lahav-savir

Post on 01-Dec-2014

1.454 views

Category:

Technology


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Multi Layer Monitoring V1

Maintaining Non-Stop Serviceswith Multi Layer Monitoring

Lahav Savir

System Architect and CEO of Emind Systems

[email protected]

www.emindsys.com

Page 2: Multi Layer Monitoring V1

The approach

• Non-stop applications can’t leave on their own

• More complex systems require more monitoring

•• Proactive system monitoring

• Customized monitors

• Monitor each process, component and application separately and all as a whole

• Proactive correction of problems before they become noticeable by your customers.

• Allow application to function at maximum availability• Allow application to function at maximum availability

• SNMP monitoring of application infrastructure

• Alerting of potential problem or situation prior to accuracy

• Visual layered display of the entire data center

2

Page 3: Multi Layer Monitoring V1

Monitoring is not just forMonitoring is not just for

System Administrators

but for Developers as well

4

Page 4: Multi Layer Monitoring V1

The goal for monitoring is to

keep track of the running services 24/7,

find troubles as early as possible andfind troubles as early as possible and

keep you alerted (only when needed…)

Good monitoring infrastructure provides you a

quick and direct troubleshooting abilities via

visual representation of the system status

5

Page 5: Multi Layer Monitoring V1

Multi-Layered Monitoring

ServicesKeeping SLA, End-to-end service,

User experience monitors

ApplicationsApplication proprietary monitors,

Custom counters

Operating SystemsCPU, Memory, Disk, Network, Processes

Unifie

d D

ashboard

6

InfrastructureNetwork connections, network devices,

Chassis, Routers, Load Balancers, Firewalls

Unifie

d D

ashboard

Page 6: Multi Layer Monitoring V1

Visualize the information

• Use Maps & Views

• Visualize the topology

• Visualize the application flow

• Focus on different layers • Focus on different layers

• Layered views (service & host groups)

• Network, Hardware, OS, Application

• Different roles looking for different info

• Graph Service performance

• Transactions, Success rates, Cache

••

• Aggregated view

• Cluster’s average

7

Page 7: Multi Layer Monitoring V1

Why multi layer ?

• Correlated information on one uniform view

• Network throughput, CPU usage, Application usage

• Generate aggregated reports for different machines & layersmachines & layers

• Collect information from all nodes

• Switches, routers, firewalls, load balancers, storage, servers, applications

• Collect different types of data

• Utilization, throughput, concurrency, cache status

• Application performance, error rates• Application performance, error rates

• Objective

• Find the root cause via visuals on the dashboard !

• Be aware of what’s going on

8

Page 8: Multi Layer Monitoring V1

Unifie

d D

ashboard

9

InfrastructureNetwork connections, network devices,

Chassis, Routers, Load Balancers, Firewalls

Unifie

d D

ashboard

Page 9: Multi Layer Monitoring V1

Infrastructure layer

• Hardware redundancy my be dangerous if you don’t keep your eyes on it

• Administrators not always seeing the HW

• Redundant hardware can fool you (until it dies)• Redundant hardware can fool you (until it dies)

• Vendor specific MIBS / Syslog

• Today’s hardware provides detailed status interfaces

• Power supplies, power usage

• Fans & temperature

• Disk controllers & drives• Disk controllers & drives

• Switch ports, interfaces

• Links, connections

10

Page 10: Multi Layer Monitoring V1

HP HW monitoring

11

Page 11: Multi Layer Monitoring V1

Network infrastructure health

Link Quality Devices utilization

Network throughput

12

Network throughput

Page 12: Multi Layer Monitoring V1

Operating SystemsCPU, Memory, Disk, Network, Processes

Unifie

d D

ashboard

13

Unifie

d D

ashboard

Page 13: Multi Layer Monitoring V1

Operating System health

• Monitoring of OS essentials

•CPU, Memory, Disk I/O, Network traffic, I/O, Network traffic, processes, services

• Use cases

•Failure on log cleanup > disks full

14

Page 14: Multi Layer Monitoring V1

ApplicationsApplication proprietary monitors,

Custom counters

Unifie

d D

ashboard

15

Unifie

d D

ashboard

Page 15: Multi Layer Monitoring V1

Applications health

• Transaction counters

• Measure transaction rates

• Measure counters on

input & outputinput & output

•% Success rates

• Success counters for

primary operations

16

Page 16: Multi Layer Monitoring V1

System input/output monitoring

Outlined topology

17

Page 17: Multi Layer Monitoring V1

Applications health

• Queues

• Processing backlog

• Semaphores / throttle usage

• Latency• Latency

• Measure the time it takes to process request / data chunk

• Optimizations

• Measure compression rates

18

Page 18: Multi Layer Monitoring V1

Applications health

• DB synchronizations

• Replication status

• Replication backlog & delays

19

Page 19: Multi Layer Monitoring V1

Applications health

• Cluster & topology monitoring

• Track application topology changes

• Indicate dependencies status

20

Page 20: Multi Layer Monitoring V1

Application topology

Outlined topology TransportsRoutersFront-ends

Firewalls FirewallsLoad

Balancers

Load

Balancers

21

Page 21: Multi Layer Monitoring V1

ServicesKeeping SLA, End-to-end service,

User experience monitors

Unifie

d D

ashboard

22

Unifie

d D

ashboard

Page 22: Multi Layer Monitoring V1

Cluster overall QoS

•Now it’s a cluster, aggregated counters•Want to know what users are experiencing

23

Page 23: Multi Layer Monitoring V1

User Experience & Service QoS

• Simulate user behavior

•Latency•How long it takes to login•How long it takes to send and •How long it takes to send and receive a message

•How long it takes to “check out”•Success rates•What’s the success rate of the user’s operation

•Download speed•Download speed•What’s the download speedfrom different locations

•What’s your Content Delivery Network (CDN) performance

24

Page 24: Multi Layer Monitoring V1

Recommendations

• Build a generic monitoring infrastructure with generic tools and interfaces

• Use embedded SNMP

•• Net-snmp is extendable (also for Windows)

• PROXY – proxy request to other SNMP agent (embedded)proxy -v 2c -c public 127.0.0.1:50910 1.3.6.1.4.1.15867.2000.3.6

1.3.6.1.4.1.15867.2000.3.6

• PASS – STDOUT based subagentpass .1.3.6.1.4.1.15867.2001 /bin/sh

/usr/local/ixi/GenericSubAgent.sh

• EXEC – run a script• EXEC – run a scriptexec .1.3.6.1.4.1.15867.1100.20.10 axs-imap-test-stat /bin/bash

/opt/mas/scripts/imap_tester.sh last_state

• SSH / Telnet

25

Page 25: Multi Layer Monitoring V1

Using Status files

• Perfect for batch operations

• perl, python, php

• Status fileTIMESTAMP:1276203703TIMESTAMP:1276203703

STATUS:0

HOSTNAME:myserver

• Observerif [ $(get_time_delta ${file}) -gt ${max_d_s} ]; then

err "Delta is greater than ${max_delta} hours"

return

FiFi

if [ "$(parse_status_file ${file} STATUS)" != "0" ]; then

err "Last backup status is not 0“

return

Fi

echo ${ok}

27

Page 26: Multi Layer Monitoring V1

Command line based info.

• Command line applications

• DB, Softswitch, Etc.

• Example

• Snmpd.conf• Snmpd.confpass .1.3.6.1.4.1.15867.1.100 /bin/bash /usr/local/emind/replication_status.sh

• Run the script replication_status.sh

•Execute SQL Query - show slave status\G;•Parse the output•Return data to snmpd•Return data to snmpd

28

Page 27: Multi Layer Monitoring V1

Important to remember

• Define your goals, What’s right for you ?

• Don’t over monitor

• Use methodologies and technologies that fit your network and needs, not the other way around.and needs, not the other way around.

• Build generic interfaces

• SNMP

• Simple command line

• No proprietary protocols

• Agnostic to the monitoring tool

29

Page 28: Multi Layer Monitoring V1

Mobile Access

30

Page 29: Multi Layer Monitoring V1

Leading tools – Open source first…

• Nagios

• Very generic, lot’s of public plug-ins

• Easy to tweak and build it at your own style

• Zabix• Zabix

• A complete monitoring solution

• Less customizable

• Cacti

• Graphing (RRD tool)

• Easy to configure

• Lot’s of public templates• Lot’s of public templates

31

Page 30: Multi Layer Monitoring V1

Monitoring on the Cloud

• Nagios + Dynamic Configuration = Dynagios !

• Key features

• Auto provisioning

• Add, Remove, Suspend, Unsuspended• Add, Remove, Suspend, Unsuspended

• Machines are monitored base on their predefined profiles

• Machines can join / leave the monitor (purposely)

• Join on boot

••• Leave on shutdown

• If crash happens alert will raise

32

Page 31: Multi Layer Monitoring V1

Commercial tools

Feature ManageEngine Orion

OS Requirements Windows & Linux Windows

Modular (applications, IP SLA, Yes YesModular (applications, IP SLA, Netflow, Conf.)

Yes Yes

Very easy to deploy Yes Yes

Multi vendor with lot’s of templates

Yes Yes

Maps Yes Yes

Less customizable … but still flexible

Yes Yes

33

flexible

Est. price for 100 devices $10k $15k

Page 32: Multi Layer Monitoring V1

Questions?

34