Avid System Monitor

Ed Harper

November 2010

Avid System Monitoring overview

• Avid System Monitor delivers an enterprise-wide monitoring solution for Avid systems and infrastructure switches

Overview
• Single GUI visibility to the whole infrastructure
• Standards-based polling and event notification: SNMP, IP, HTTP
• Tightly integrated with Avid Health Monitor
• Integrates with enterprise management

Devices Managed
• ISIS 5000 & 7000; System Director
• Interplay; Media Indexer, Look Up Server, Interplay Engine, Capture, ASF services
• SNMP Network Switches (Cisco, Foundry, Force10)

Capabilities
• Real-time statistics, thresholds
• Events, Alarms, Notifications (email)
• Historical statistics
• Surveillance Dashboard
• Flexible reporting tools

Benefits
• Proactive real-time status and statistics
– Identify anomalies, prevent outages
– System-wide diagnostic tools, faster restoration
• Trend analysis

What it is

• A tool to increase system availability by identifying issues in real time
• A tool to help identify potential problems in a system as they are occurring
• A single tool for monitoring all necessary components of the “system”, including Avid gear, network infrastructure, 3rd party devices
• A tool that collects performance data over time so that it can be graphed (and trends identified)
• A tool that will continually evolve to identify known problems within a system (after the knowledge of those problems has been learned during Code Blues, etc.)
• A window into the specific state of the Avid & selected infrastructure system components at a given point in time. It also provides enough flexibility for customers to refine and fine-tune the tool’s outputs once the basic functions are mastered.

Overview

• Avid System Monitor delivers enterprise solution monitoring for Avid systems and infrastructure

– Pro-active system health and status monitoring

– Statistics gathering, graphing and thresholds

– Event logging, intelligent alarm processing and notification

– Dashboard views showing outages and availability

• Simple drill down to isolate issues

– Standards based

• SNMP, HTTP & IP port status

– Avid Monitoring Gateway service installed on Framework (ASF) enabled devices to provide visibility to Avid System Monitor via HTTP
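
The "IP port status" style of check listed above can be illustrated with a plain TCP connect. This is a minimal sketch only, not how Avid System Monitor or OpenNMS implements its pollers; the function name `port_is_open` and the default timeout are assumptions made for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: a successful connect only shows that something is
    listening on the port, not that the service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

A poller would call this on a schedule for each monitored address and record a change in the result as an up or down event.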

Monitoring components

Monitoring Server
• Recommended platform: SR2500
• GUI, SNMP & HTTP collection
• SQL Database
• Java (JDK) Environment
• Avid Service Framework
– Provides time sync

Monitored Node

Agents
• Interplay Engine
• Stream Server
• Capture
• Media Indexer
• Interplay Lookup Service (LUS)
• ISIS 7000 System Director
• ISIS 5000

Real Time Audit

Agentless
• AirSpeed, AirSpeed Multi Stream
• Capture Manager
• DNS, DHCP services
• Time Sync

3rd Party
• Cisco switches
• Foundry switches
• Force10

Monitoring Environment

• Monitored Avid Services & Devices
– Detailed monitoring including status, statistics etc.
• Avid Service Framework (ASF)
– Media Indexer (MI)
– ASF Lookup Service
• Interplay Engine
• Stream Server
• Interplay Capture
• ISIS 5000 & 7000: System Director

• Real-time inventory
– Device up/down status without detailed monitoring
• Workflow Engine, iNews FTS, Workstation Service, Time Sync service, Multicast repeater, LowRes Encoder

• 3rd Party Elements
– Windows services: DNS, DHCP etc.
– Network Switches
• Cisco, Foundry, Force10

Dashboard

• Single screen view with Intelligent grouping of devices & domains

• High level status

– Alarms

– Notifications

– Node Status

– Resource Graphs

• Click on any device group to automatically filter information for selected devices

Events & Alarms

• Extensive Event Logging
– Severity, source etc.
– Acknowledgement
– Search
– Fine-grained event details
– Correlating up/restore

• Alarms
– Flexible rules to allow event aggregation in alarm view to count multiple occurrences of same event
• Severity
• Last time of event
• Count occurrences
• Link to event details
• Option to auto-clean events
– Operator Instructions specific to alarm & device type
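
The event aggregation described above (one alarm row carrying a severity, a last-seen time and an occurrence count) can be sketched as follows. The class and field names are hypothetical, and grouping by (source, event name) is an assumed rule chosen for illustration; the product's actual aggregation rules are configurable and not shown in the slides.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    source: str      # node or service that raised the event
    name: str        # event identifier (hypothetical naming)
    severity: str    # e.g. "Minor", "Major"
    time: datetime


@dataclass
class Alarm:
    source: str
    name: str
    severity: str
    last_time: datetime
    count: int = 1


def aggregate(events: list[Event]) -> dict[tuple[str, str], Alarm]:
    """Collapse repeated events into one alarm per (source, name) key."""
    alarms: dict[tuple[str, str], Alarm] = {}
    for ev in sorted(events, key=lambda e: e.time):
        key = (ev.source, ev.name)
        alarm = alarms.get(key)
        if alarm is None:
            alarms[key] = Alarm(ev.source, ev.name, ev.severity, ev.time)
        else:
            alarm.count += 1
            alarm.last_time = ev.time
            alarm.severity = ev.severity  # keep the most recent severity
    return alarms
```

Keeping the count and last-seen time on the alarm lets an operator see one row for a flapping condition instead of many identical events.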

Notifications

• Flexible notification to email
– Individuals or groups

• Automatic Escalation
– Escalation to a higher-level group if the notification is not acknowledged within a certain time
• Example: a Minor event is sent to the Ops team; if it is unacknowledged for 20 minutes, its priority is raised to Major and a notification is issued to the Management team

• Notification logging, with timestamps including response time
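
The escalation example above can be written out as a small rule. This is a sketch under stated assumptions: the 20-minute window, the "Minor" to "Major" step and the Ops and Management targets come from the slide, while the `Notification` class and both function names are hypothetical and do not reflect the product's configuration format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ESCALATE_AFTER = timedelta(minutes=20)  # window taken from the slide's example


@dataclass
class Notification:
    severity: str
    target: str
    sent_at: datetime
    acknowledged_at: Optional[datetime] = None


def escalate_if_stale(n: Notification, now: datetime) -> Optional[Notification]:
    """Return a Major notification for Management if a Minor one went unanswered."""
    if n.acknowledged_at is not None or n.severity != "Minor":
        return None
    if now - n.sent_at < ESCALATE_AFTER:
        return None
    return Notification(severity="Major", target="Management", sent_at=now)


def response_time(n: Notification) -> Optional[timedelta]:
    """Time between sending and acknowledgement, or None if still open."""
    if n.acknowledged_at is None:
        return None
    return n.acknowledged_at - n.sent_at
```

`response_time` corresponds to the logged response time mentioned in the last bullet.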

Statistics & Charts

• Historical statistics gathering, trending, charts
• Thresholds set to trigger events and notifications on ‘interesting’ conditions
– Specifically tuned to Avid components, based on real-world experience

Threshold Event Notification

• Flexible Threshold engine

– Configurable on any counter in the system

– Extensive pre-programmed thresholds provided in Avid monitoring package

– Simple process to add customer specific threshold

[Chart: Media Indexer, Media Files, with admin-configurable trigger levels]

Threshold Configuration

• Custom configuration of Threshold Event

– Any counter value collected by OpenNMS

– Type; High, Low, Relative Change, Absolute Change

– Datasource; Entity to collect counter data (graph properties)

– Datasource Type; Node or interface

– Datasource Label; String displayed in event

– Value; Threshold value

– Re-arm; Reset/ Cleared value

– Trigger: Number of times the threshold must be broken to create an event
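
How Value, Re-arm and Trigger interact for a "High" threshold can be sketched as a small state machine. This is an illustration, not OpenNMS code: the class name is hypothetical, and the exact comparisons (at or above the value, at or below the re-arm level, consecutive samples) are assumptions made for the sketch.

```python
class HighThreshold:
    """Sketch of a high threshold with a re-arm level and a trigger count."""

    def __init__(self, value: float, rearm: float, trigger: int) -> None:
        self.value = value      # threshold value
        self.rearm = rearm      # level at which the threshold resets
        self.trigger = trigger  # consecutive breaches needed for an event
        self._exceeded = 0      # consecutive samples at or above value
        self._armed = True

    def evaluate(self, sample: float) -> str:
        """Return "triggered", "rearmed" or "none" for one polled sample."""
        if self._armed:
            if sample >= self.value:
                self._exceeded += 1
                if self._exceeded >= self.trigger:
                    self._armed = False
                    self._exceeded = 0
                    return "triggered"
            else:
                self._exceeded = 0  # a good sample breaks the streak
            return "none"
        if sample <= self.rearm:
            self._armed = True
            return "rearmed"
        return "none"
```

With a value of 90, a re-arm of 75 and a trigger of 3, a single spike above 90 produces no event; three breaches in a row do, and no further event is raised until a sample falls to 75 or below.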

Node View

• Single screen dashboard per node

– Current Status

– Availability; system and individual services

– Notifications, Recent Events, Recent Outages

Outages & Availability

• Calculated 30-Day Availability
– Color Coded

• Current Outages
– Node or Service down

• Grouped by Device / Service Type
– Click to drill down
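
A 30-day availability figure of the kind shown on this view can be derived from a list of outage intervals. The sketch below is one plausible calculation under stated assumptions (outages for a single service do not overlap, and an open outage counts up to the present); the function name and the input format are hypothetical, and the slides do not state the formula the product uses.

```python
from datetime import datetime, timedelta


def availability_percent(outages, now, window=timedelta(days=30)):
    """Percent of the window ending at `now` that is not covered by outages.

    outages: iterable of (start, end) datetimes; end=None means still down.
    """
    window_start = now - window
    down = timedelta(0)
    for start, end in outages:
        end = now if end is None else min(end, now)
        start = max(start, window_start)  # clip to the 30-day window
        if end > start:
            down += end - start
    return 100.0 * (1 - down / window)
```

For example, 7.2 hours of downtime inside a 30-day (720-hour) window gives 99% availability.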

Surveillance View: Flexible Grouping

• Current Outages by:
– Device Type
– Workgroup or location

• Grouping by
– Service
– Category
– Simple customization

Node Discovery

• Configure OpenNMS to discover devices and services on specific IP address or range

– Automated capability query of generic IP, SNMP and Avid specific services & device capabilities

– Add device names to nodes for readability if desired

• IP address and DNS names displayed by default

• Automated capabilities scan every 24 hours

Network Switch Monitoring

• SNMP monitoring and statistics gathering for Cisco, Foundry & Force10 infrastructure Zone 2 switches
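
Switch statistics of this kind are typically derived from interface octet counters polled over SNMP. The sketch below shows the arithmetic only, assuming two readings of a 32-bit octet counter (such as IF-MIB `ifInOctets`) and the interface speed in bits per second (such as `ifSpeed`); it performs no SNMP query, the function names are hypothetical, and which counters Avid System Monitor actually collects is not stated in the slides.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter readings, allowing for one wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, if_speed_bps: float) -> float:
    """Average link utilization over the polling interval, in percent."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)
```

On fast links a 32-bit counter can wrap more than once between polls, in which case the result is too low; 64-bit counters or shorter polling intervals avoid that.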

Maps

• OpenNMS provides a mapping tool with device status
– Multiple maps to allow views for LAN, editors etc.
– Link discovery finds node connectivity
• Not all links are shown correctly: ISIS switches are not manageable, so devices appear connected to the adjacent switch

Proving its Value (a real field example)

• Phased Roll-out
– Monitoring SNMP switches (only)

• Customer Reported AirSpeed “Slow Down”
– Avid CS / Systems Engineers queried OpenNMS remotely
– Pulled switch bandwidth utilization
• Switches operating correctly
• Within a few minutes the troubleshooting team moved on to investigate specific devices
– Without OpenNMS, proving switch operation required access and a labor-intensive process of monitoring scripts and driving traffic loads
• Time consuming: ~1 day to prove switches

Faster resolution

Greater customer satisfaction

Example

• Memory utilization on Interplay Media Indexer
• Charts show steady consumption of server RAM during load test
• Performance impacted as memory maxed out
• Thresholds provide notification when x% exceeded

Server & System Requirements

Category                     Requirement
Avid System Monitor Server   Recommended: Intel SR2500 Server
Operating System             Windows 2003
Processor                    2 GHz or better
Memory                       2 GB
Java JDK                     Provided with Avid System Monitor
PostgreSQL Database          Provided with Avid System Monitor
Client Browser
Adobe SVG viewer             Required for Internet Explorer client browser to view map pages (Firefox etc. have an SVG viewer built in)

Pricing, Availability etc

• Delivery
– Value-add offered to customers with Avid Uptime support
• Software download
• Phased roll-out at selected customer Production networks
– Typically switch monitoring

• Pricing
– Avid System Monitor available to Avid Uptime support contract customers
– PSG installation
• PSG engagement required

Summary

• Real-time monitoring of devices, services, networks & infrastructure
– Avid Customer Success
– Customer IT / Admin
• Statistics, thresholds, events and notifications
• Broad Enterprise system support
– Increasing breadth and depth
• Pro-active warnings and notification of potential problems
• Improved time to resolution

Avid Monitoring Solution

[Diagram labels. Components: OpenNMS GUI; ASF Monitoring Gateway; ASF Health Monitor; Media Indexer; Lookup Server; ISIS Engine; System Director; ISIS ISB, ISIS switch; LAN Switches; Interplay SNMP; Interplay Engine, Stream Server, Archive; AirSpeed; AirSpeed MS; ISIS client, Editor. Monitoring levels: Service / IP monitoring; Full Monitoring (events, statistics). Protocols: ICMP (Ping), HTTP/TCP, SNMP data collection, SNMP trap receiver, Avid TCP port monitoring, DNS, time sync; some elements are marked ICMP and SNMP, SNMP only, or ICMP only.]

Failure Modes Monitored

• Avid System Monitor is tuned to identify specific failure modes

– As found in field experience / Code Blue

• Media Indexer
– MI in the HAG with a weight of "0": Indicates an "election issue" which can cause major system slowdown.
– Number of quarantined files growing: Indicates a faulty ingest device creating bad files.
– Different file count between each of the HAG MIs: Indicates an issue with ISIS notifications. Some files will appear offline to some clients.
– Different time on each of the machines in the WG: Can be the cause of lost ISIS notifications (see above).
– MI Heap usage running dangerously high: Indicates your WG file count or client count is causing too much stress on that MI. Eventually, the MI will thrash.
– Number of files added/updated on last full resync, when it's greater than 0. This value is displayed in the Health Monitor, under each storage pane of the MI.

• Interplay Engine
– Time to perform login - should be below 15 seconds: indicates engine slowness
– Number of journal files - should be below 50: indicates journal integration stuck/dead
– Number of deletes - should be below 100 for 5 minute polling intervals during normal production time: indicates deletion during production time
– Number of loaded objects/number of total objects - should be above 30%: indicates engine cache warm-up causing slowness
– Backup running flag - should be off during production time

• Avid Service Framework Lookup Service (LUS)
– For LUS, here are things we could check today via SNMP Gateway. However, these monitor points don't really contribute to most of the problems we see related to ASF. They are the only data points that are available today.
– Monitor Handle Count (either via gateway or MSFT agent) - should be below some threshold (<5000)
– Monitor Thread Count (either via gateway or MSFT agent) - should be below some threshold (<500)
– Monitor Events In Queue (via gateway) - should be less than 50
– Check that a process is bound to port 4160 on the box (don't know how to do that with OpenNMS) - confirms that the LUS process is running
– Monitor Memory Usage (either via gateway or MSFT agent) - should be below some threshold (<200 MB)

• ISIS
– ISIS monitors a number of critical areas and sends an event to the Windows event log when values reach a defined value or threshold. You can configure ISIS to send an email when an error or warning event occurs. You can also configure the System Director to generate an SNMP trap when the event occurs. The top areas include the following:
– Temperature and presence of components such as switches, storage elements, and power supplies.
– Workspace usage thresholds. For example, an Admin can enable warning and error thresholds. If you set the workspace threshold to 90%, ISIS will generate an error event when a workspace reaches 90% full.
– Disk health issues such as disk failed or disk performance degraded based on continuous monitoring.
– Server failover notifications. For example, on a failover system you are notified when the system fails over to the other node.
– Metadata problems. For example: if there is a problem opening a metadata file or if the metadata in a file seems out of date.
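
The Interplay Engine limits listed above can be expressed as a simple rule table. This is a sketch for illustration only: the limits and their meanings are taken from the slide, while the metric names, the sample dictionary format and the `check_engine` function are hypothetical and do not correspond to a real Avid or OpenNMS interface.

```python
ENGINE_LIMITS = {
    # hypothetical metric name: (direction, limit, what a violation indicates)
    "login_seconds": ("below", 15, "engine slowness"),
    "journal_files": ("below", 50, "journal integration stuck/dead"),
    "deletes_per_5min": ("below", 100, "deletion during production time"),
    "loaded_object_ratio": ("above", 0.30, "engine cache warm-up causing slowness"),
}


def check_engine(sample: dict) -> list[str]:
    """Return one finding per limit that the sample violates."""
    findings = []
    for metric, (direction, limit, meaning) in ENGINE_LIMITS.items():
        value = sample[metric]
        ok = value < limit if direction == "below" else value > limit
        if not ok:
            findings.append(f"{metric}={value}: {meaning}")
    if sample.get("backup_running"):
        findings.append("backup_running=True: backup running during production time")
    return findings
```

A sample taken during production hours with a 20-second login time and a running backup would produce two findings, one for each violated rule.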

Monitored Device Matrix

Device / Service Version(s), Notes Inv Mon

Unity ISIS v2.x √

Interplay Engine V2.x √

Media Indexer V2.x √

Interplay Engine V2.x √

Interplay Lookup Service (LUS) V2.x √

AirSpeed √

AirSpeed Multi Stream √

Capture Manager √

Interplay Capture V2.x √

DNS & DHCP services √

Avid Time Sync Service √

3rd Party Network Switches Cisco / Foundry / Force10 √

Windows Services DNS,DHCP, Time, Anti-virus, auto-update etc √

Inv: Real-time inventory; service or server Up/Down

Mon: Full monitoring; detailed alarms, statistics etc.