Italian Conference on Nagios: Michael Medin on Windows Monitoring
Michael Medin's point of view on Windows monitoring, as explained at the Nagios conference in Bolzano, Italy.
CONFERENCE ON NAGIOS & OSS Monitoring, May 20th - Bolzano
Going where no man has gone before
These slides represent the work and opinions of the author and do not constitute official positions of any organization sponsoring the author's work.
This material has not been peer reviewed and is presented here as-is with the permission of the author.
The author assumes no liability for any content or opinion expressed in this presentation and/or use of content herein.
Developer (not system manager)
◦ Not working with Nagios
Accidentally ended up in our NOC
◦ Hated BB
2003: The birth of NSClient++
◦ NSClient sucked (broke Exchange)
◦ NRPE_NT was too hard to use
2004: NSClient++ goes open source
◦ “Just for fun”
2007: The rebirth of NSClient++
◦ A lot of users emailed me
◦ Got a lot of hits on the webpage
◦ Intense development led to 0.3.0!
2010: The future
◦ 0.3.8 out now
◦ 0.4.x in development (beta scheduled for fall 2010)
Agents
◦ An overview of your options
About NSClient++
◦ Quick introduction
Monitoring
◦ Eventlog checking
◦ WMI (Windows Management Instrumentation)
◦ Scripts
◦ Revisiting WMI
Q/A
An overview of the options
What is NSClient?
◦ A (pretty old) program: NSClient (or pNSClient)
◦ A (pretty limited) protocol: check_nt
◦ A (pretty incorrect) concept: “Windows monitoring”
What is it not?
◦ NSClient++!
◦ NSClient++ was written as a replacement for NSClient
◦ But it has evolved a lot since then...
Agent | Age | Protocol | Licence
SNMP | 1990-2008 | SNMP | Proprietary
NSClient | 200x | NSClient | GPL
NRPE_NT | 200x-2006 | NRPE | GPL
NSClient++ | 2004-2010 | NRPE, NSClient, NSCA | GPL
NC_NET | 2004-2009 | NSClient, NSCA | GPL?
OpMonAgent | 2008 | NSClient, NRPE | GPL?
Agentless WMI | recently | N/A | N/A
... | ... | ... | ...
I would use either:
◦ NSClient++
◦ NC_NET
I would not use:
◦ SNMP: too complex to use (and limited on vanilla hardware)
◦ NSClient/NRPE_NT/OpMonAgent: old, outdated and with limited functionality
◦ Agentless WMI: limited functionality (and enforces centralized monitoring)
But...
◦ I am biased, so you might not want to take my word for it...
Protocol | Method | Encryption | Auth | Payload | Args. | Multi Commands
NSClient | Active | No | Yes | Unlimited (1) | Yes (1) | Yes (1)
NRPE | Active | Yes | No | 1024 (2) | Yes | No
NSCA | Passive | Yes | Yes | 512 (2) | Yes | Yes
Future (3) | A/P/* | Yes | Yes | Unlimited | Yes | Yes
(1) The protocol supports it, but check_nt does not.
(2) The NRPE payload can be extended by recompiling check_nrpe and configuring NSClient++ to match.
(3) A future protocol I am thinking of adding to NSClient++.
NSClient++ supports all of them.
I would use:
◦ NRPE (check_nrpe) for active checks (the server queries information)
◦ NSCA for passive checks (the client pushes information)
I would not use:
◦ NSClient (check_nt): limited feature set
Be aware!
◦ None of them are safe (from a security perspective)!
◦ But then... nothing really is...
Quick Introduction
Internals:
◦ C++ using the Win32 API
◦ Around 40,000 lines of code
◦ Actively developed (unfortunately only by me)
◦ Modularized design philosophy
Runs on:
◦ NT4, W2K, XP, W2K3, Vista, W2K8, Windows 7 ...
◦ x86, x64, IA64 (I lack a compiler for that platform, but it works)
Current version:
◦ 0.3.8 (out now, yesterday in fact)
◦ Don't use 0.2.7!
Most features require NRPE or NSCA
Documentation online (wiki)
◦ http://nsclient.org
Not supported by a commercial entity
◦ Donations welcome
◦ Sponsoring available (contact me for details)
Used by a lot of people (I think)
◦ Impossible to estimate any figures
Website has:
◦ Around 10,000-15,000 unique visitors per month
◦ Around 20,000-30,000 downloads per month
Please help out!
◦ Add documentation, report problems, share ideas, thoughts, etc...
Major simplification to the eventlog checker
◦ generated > -2d AND severity = 'error'
Registry checks
Improvements to the file checker
Supports multi-language performance counters
“Automatic” volume support
Improved command line support
Simplified scripting with a new VB helper
◦ Thanks op5!
Many more things…
Rewritten “core” using Boost
◦ Means “proper handling” (and fewer bugs?)
◦ Unix support
◦ Improved multitasking
◦ Etc.
New settings subsystem
◦ Registry, improved ini support, better loader, XML?
Filter-like API (in addition to options)
◦ “warn=any drive > 90% or c: > 80%”
New improved client with improved protocol
Better .NET integration
Better customization support
CEP (Complex Event Processing)?
◦ If anyone wants this, let me know!
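The proposed 0.4.x expression `warn=any drive > 90% or c: > 80%` reads as a single predicate over all drives. A minimal Python sketch of that reading, with made-up usage figures; the final 0.4.x syntax and semantics may well differ, and the function name `warn` is hypothetical:

```python
def warn(usage):
    """usage maps drive letters (e.g. 'c:') to percent used.

    Mirrors "any drive > 90% or c: > 80%": warn if any drive is above 90%,
    or if drive C: alone is above 80%.
    """
    return any(pct > 90 for pct in usage.values()) or usage.get("c:", 0) > 80


print(warn({"c:": 85, "d:": 40}))  # True: C: is above 80%
print(warn({"c:": 50, "d:": 95}))  # True: D: is above 90%
print(warn({"c:": 50, "d:": 70}))  # False
```

The point of the filter-like style is that one expression replaces several separate per-drive options.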
NSClient++ is a command line program!
◦ nsclient++ /start (net start nsclientpp)
◦ nsclient++ /stop (net stop nsclientpp)
◦ nsclient++ /test
Configuration:
◦ notepad nsc.ini
Testing:
1. Locally (nsclient++ /test)
2. From the CLI (check_nrpe ...)
3. From Nagios (add a command)
Works with “anything” (even non-Nagios things)
nsclient++ /test is your friend!
Eventlog checking
In a galaxy far far away...
The good:
◦ Powerful interface!
◦ Simple to use!
◦ Out-of-the-box solution (on which you can expand)!
The bad:
◦ Nothing! Really, I mean it!
But...
◦ …still a bit “experimental”
Syntax is friendly and intuitive
Still experimental
◦ Should work though, so please try it
Based on SQL WHERE clauses
◦ generated > -2d AND severity = 'error'
Automatically detects which version to use
◦ So no filter=newer option
Like SQL “WHERE” clauses:
◦ severity = 'error'
◦ severity = 'error' OR severity = 'warning'
◦ severity = 'error' OR (id = 123 OR id = 345)
◦ severity = 'error' OR id IN (123, 345)
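To make the semantics of these WHERE-style filters concrete, here is a minimal Python sketch of an evaluator for a small subset of the syntax: `=`, `!=`, `<`, `>`, `IN`/`NOT IN` lists, `AND`/`OR` (with `AND` binding tighter, as in SQL) and parentheses, applied to an event represented as a plain dict with lowercase keys. It is a hypothetical illustration of how such expressions evaluate, not NSClient++'s actual C++ parser, which also handles relative times, `like` and more:

```python
import operator
import re

# One token per match: a quoted string, an integer, a word, an operator or punctuation.
TOKEN_RE = re.compile(r"\s*(?:('[^']*')|(-?\d+)|([A-Za-z_]\w*)|(!=|=|<|>)|([(),]))")
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        string, number, word, op, punct = m.groups()
        if string is not None:
            tokens.append(("str", string[1:-1]))
        elif number is not None:
            tokens.append(("num", int(number)))
        elif word is not None:
            tokens.append(("word", word.lower()))  # keywords/fields are case-insensitive
        elif op is not None:
            tokens.append(("op", op))
        else:
            tokens.append(("punct", punct))
        pos = m.end()
    return tokens


# Small factories so each predicate captures its own operands.
def _both(a, b):
    return lambda event: a(event) and b(event)


def _either(a, b):
    return lambda event: a(event) or b(event)


def _compare(field, op, literal):
    fn = OPS[op]
    return lambda event: fn(event[field], literal)


class Parser:
    """Recursive descent: or_expr -> and_expr (OR and_expr)*, and_expr -> atom (AND atom)*."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expect(self, kind, value=None):
        k, v = self.take()
        if k != kind or (value is not None and v != value):
            raise SyntaxError(f"expected {value or kind}, got {v!r}")
        return v

    def parse_or(self):
        left = self.parse_and()
        while self.peek() == ("word", "or"):
            self.take()
            left = _either(left, self.parse_and())
        return left

    def parse_and(self):
        left = self.parse_atom()
        while self.peek() == ("word", "and"):
            self.take()
            left = _both(left, self.parse_atom())
        return left

    def parse_value(self):
        kind, value = self.take()
        if kind not in ("str", "num"):
            raise SyntaxError(f"expected a literal, got {value!r}")
        return value

    def parse_atom(self):
        if self.peek() == ("punct", "("):
            self.take()
            inner = self.parse_or()
            self.expect("punct", ")")
            return inner
        field = self.expect("word")
        kind, value = self.take()
        if kind == "op":  # field <op> literal
            return _compare(field, value, self.parse_value())
        negate = (kind, value) == ("word", "not")
        if negate:
            kind, value = self.take()
        if (kind, value) != ("word", "in"):
            raise SyntaxError(f"expected an operator after {field!r}, got {value!r}")
        self.expect("punct", "(")
        items = [self.parse_value()]
        while self.peek() == ("punct", ","):
            self.take()
            items.append(self.parse_value())
        self.expect("punct", ")")
        return lambda event: (event[field] in items) != negate


def compile_filter(text):
    parser = Parser(tokenize(text))
    predicate = parser.parse_or()
    if parser.i != len(parser.tokens):
        raise SyntaxError(f"trailing input: {parser.peek()[1]!r}")
    return predicate


events = [
    {"severity": "error", "id": 1},
    {"severity": "warning", "id": 345},
    {"severity": "informational", "id": 7},
]
match = compile_filter("severity = 'error' OR id IN (123, 345)")
print([e["id"] for e in events if match(e)])  # [1, 345]
```

Compiling the filter once into a predicate and then running it over every record is also why the last two slide examples are interchangeable: both compile to "severity is error, or id is 123 or 345".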
Key | Description | Values / notes
type | Type of error (Microsoft calls this severity) | error, warning, info, auditSuccess or auditFailure
source | The name of the source of the event | The program that logged the message
generated | Time ago the message was generated | When it happened
written | Time ago the message was written to the log | Don't use
strings | Message contents | Faster
message | Message text | Slower
id | Event id of the log message | Unique together with source
severity | Event severity (I think this is the severity) | success, informational, warning or error
Operator | Safe | Meaning
= | eq | Equality
!= | ne | Not equal
> | gt | Greater than
< | lt | Less than
=> | ge | Greater than or equal
=< | le | Less than or equal
like | | String similarity (substring matching)
Name | Use | Example
convert(...) | Converts from one type to another | Usually not needed, as types are inferred
neg(...) | Negates a value | -1 = neg(1); yesterday = neg(tomorrow); yesterday = -tomorrow
in (...) | Equals any item in a list | id in (1, 3, 4, 5)
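Relative times such as `-2d` in `generated > -2d` can be read as a signed offset from now: an event generated one day ago has `generated = -1d`, which is greater than `-2d`, so it matches. A minimal Python sketch of that reading; the unit suffixes `s`, `m`, `h`, `d` and `w` are an assumption (check the NSClient++ wiki for the real list), and `parse_offset` and `generated` are hypothetical helpers. `neg()` is shown as plain negation:

```python
from datetime import datetime, timedelta

# Assumed unit suffixes; the set NSClient++ accepts may differ.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_offset(text):
    """Turn '-2d' or '90m' into a signed timedelta."""
    text = text.strip()
    number, unit = text[:-1], text[-1]
    if unit not in UNITS:
        raise ValueError(f"unknown unit in {text!r}")
    return timedelta(**{UNITS[unit]: int(number)})


def neg(value):
    """neg(x) is simply -x, so neg(tomorrow) == yesterday."""
    return -value


def generated(event_time, now):
    """'generated' as a signed offset: negative for events in the past."""
    return event_time - now


now = datetime(2010, 5, 20, 12, 0)
limit = parse_offset("-2d")
print(generated(now - timedelta(days=1), now) > limit)  # True: within the last two days
print(generated(now - timedelta(days=3), now) > limit)  # False: too old
print(neg(parse_offset("1d")) == parse_offset("-1d"))  # True
```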
Option | Description
file | The “eventlog file” to open. Use multiple file options to check multiple files.
filter | Defines the filter (there can only be one).
MaxWarn | Maximum hits before a warning state is issued.
MaxCrit | Maximum hits before a critical state is issued.
truncate | Length of the returned data. Since NRPE (and NSClient++) has a limited capacity, this is important; 900 is usually a good value.
syntax | How to format the returned data.
unique | Only “one of each” record is returned (the count used by MaxWarn/MaxCrit is not affected).
descriptions | Needed if you plan on using the %message% syntax option (impacts performance “severely”).
debug=true | Displays a lot more information about the check.
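How `MaxWarn`, `MaxCrit`, `unique` and `truncate` interact can be sketched in a few lines of Python. The sketch assumes that reaching `MaxWarn`/`MaxCrit` hits triggers the state (consistent with `MaxWarn=1` in the `alias_event_log` definition), that `unique` collapses records with the same (source, id) pair without changing the count, and that `truncate` simply cuts the message so it fits the limited NRPE payload (1024 bytes by default). The `summarize` function and its message format are hypothetical, not NSClient++ output:

```python
OK, WARNING, CRITICAL = 0, 1, 2
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def summarize(hits, max_warn, max_crit, unique=False, truncate=None):
    """Return (state, message) for a list of matching event records.

    hits: list of dicts with 'source', 'id' and 'message' keys (hypothetical shape).
    """
    count = len(hits)  # the count is taken before 'unique' is applied
    if count >= max_crit:
        state = CRITICAL
    elif count >= max_warn:
        state = WARNING
    else:
        state = OK

    shown = hits
    if unique:  # keep only the first record of each (source, id) pair
        seen, shown = set(), []
        for hit in hits:
            key = (hit["source"], hit["id"])
            if key not in seen:
                seen.add(key)
                shown.append(hit)

    text = ", ".join(f"{h['source']}({h['id']}): {h['message']}" for h in shown)
    message = f"{STATE_NAMES[state]}: {count} hit(s) {text}".rstrip()
    if truncate is not None:
        message = message[:truncate]
    return state, message


hits = [
    {"source": "Disk", "id": 7, "message": "bad block"},
    {"source": "Disk", "id": 7, "message": "bad block"},
    {"source": "Srv", "id": 2019, "message": "pool exhausted"},
]
print(summarize(hits, max_warn=1, max_crit=3, unique=True, truncate=60))
```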
alias_event_log uses the following definition:
◦ file=application file=system: the files to check
◦ MaxWarn=1 MaxCrit=1: every error is a warning
◦ "filter=generated gt -2d AND severity NOT IN ('success', 'informational')": generated less than 2 days ago and NOT a success or information message
◦ truncate=800 unique descriptions: truncate the returned data and make it look pretty
Filtering is fairly straightforward
The “parser” will do most of the work for you
◦ generated > -2d just works!
Enable debug=true to see what happens
Always, always, always debug in “/test mode”
Check query times to optimize performance
There is a pretty OK guide on the wiki
Start with “everything” and work your way down.
◦ System, Application, etc.
Reasonable start filter:
◦ generated > -2d AND severity NOT IN ('success', 'informational')
You need to customize it for your environment.
A good idea is to use more than one check:
1. Check “all errors”, send to /dev/null
2. Check “my service”, send to admin@server
Don't overdo it (eventlog checking is slow)
WMI - Windows Management Instrumentation
The purpose of WMI is to define a non-proprietary set of environment-independent specifications which allow management information to be shared between management applications.
WMI prescribes enterprise management standards and related technologies that work with existing management standards, such as Desktop Management Interface (DMI) and SNMP.
WMI complements these other standards by providing a uniform model. This model represents the managed environment through which management data from any source can be accessed in a common way.
…yada yada yada…
In short: a bit like SNMP, but modern
◦ Though it is actually more than 10 years old
Everything?
◦ Almost...
There are a lot of objects (tables)
◦ win32 has 450 objects
◦ Various services will add more (AD, SQL Server, ...)
You can:
◦ Read, write and work with “objects”
◦ But only read via the built-in commands of NSClient++
But you cannot:
◦ Check your application (ish)
Built-in commands are dangerous!
◦ No security, allows access to a lot of things!
◦ For instance, you can enumerate the file system
CheckWMI
◦ Checks a result set
◦ Good for: checking whether there are more (or fewer) than n items...
CheckWMIValue
◦ Checks a specific value
◦ Good for: checking whether a value is more or less than n
Custom scripts
◦ For, I think, most things beyond the basics
◦ Also improves security
◦ Good for: everything
WQL (WMI Query Language)
◦ Based upon SQL
◦ Only SELECT … (no UPDATE/INSERT/DELETE/DDL)
“Tables” are called objects in WMI
◦ An object usually corresponds to a logical “type”
Examples:
◦ select LoadPercentage from win32_Processor
Retrieves the system load from the win32_Processor “object”.
◦ select * from win32_Processor
Retrieves everything from the win32_Processor “object”.
Object Description
Win32_Fan Represents the properties of a fan device in the computer system.
Win32_TemperatureProbe Represents the properties of a temperature sensor (electronic thermometer).
Win32_DiskDrive Represents a physical disk drive as seen by a computer running the Windows operating system.
Win32_PhysicalMedia Represents any type of documentation or storage medium.
Win32_TapeDrive Represents a tape drive on a computer system running Windows.
Win32_BaseBoard Represents a baseboard (also known as a motherboard or system board).
Win32_BIOS Represents the attributes of the computer system's basic input or output services (BIOS).
Win32_IDEController Represents the capabilities of an Integrated Drive Electronics (IDE) controller device.
Win32_MemoryArray Represents the properties of the computer system memory array and mapped addresses.
Win32_OnBoardDevice Represents common adapter devices built into the motherboard (system board).
Win32_Processor Represents a device capable of interpreting a sequence of machine instructions on the computer.
Win32_SCSIController Represents a small computer system interface (SCSI) controller on a computer system running Windows.
Win32_USBControllerDevice Relates a USB controller and the CIM_LogicalDevice instances connected to it.
Win32_NetworkAdapter Represents a network adapter on a computer system running Windows.
Win32_Battery Represents a battery connected to the computer system.
Win32_PortableBattery Represents the properties of a portable battery, such as one used for a notebook computer.
Win32_PowerManagementEvent Represents power management events resulting from power state changes.
Win32_UninterruptiblePowerSupply Represents the capabilities and management capacity of an uninterruptible power supply (UPS).
Win32_Printer Represents a device connected to a computer system running Windows that is capable of reproducing a visual image on a medium.
Win32_PrintJob Represents a print job generated by a Windows-based application.
Object Description
Win32_SystemDriver Represents the system driver for a base service.
Win32_Directory Represents a directory entry on a computer system running Windows.
Win32_DiskQuota Tracks disk space usage for NTFS file system volumes.
Win32_LogicalDisk Represents a data source that resolves to an actual local storage device.
Win32_Volume Represents an area of storage on a hard disk.
Win32_PageFileUsage Represents the file used for handling virtual memory file swapping on a computer system running Windows.
Win32_NetworkConnection Represents an active network connection in a Windows environment.
Win32_NTDomain Represents a Windows NT domain.
Win32_PingStatus Represents the values returned by the standard ping command.
Win32_ComputerSystem Represents a computer system operating in a Windows environment.
Win32_OperatingSystem Represents an operating system installed on a computer system running Windows.
Win32_Process Represents a sequence of events on a computer system running Windows.
Win32_ProcessStartup Represents the startup configuration of a Windows-based process.
Win32_ScheduledJob Represents a job scheduled using the Windows NT schedule service.
Win32_BaseService Represents executable objects that are installed in a registry database maintained by the SCM.
Win32_Service Represents a service on a computer system running Windows.
Win32_LogonSession Describes the logon session or sessions associated with a user logged on to Windows 2000 or Windows NT.
Win32_UserAccount Represents information about a user account on a computer system running Windows.
Win32_UserInDomain Association class
Win32_WindowsProductActivation Contains properties and methods related to WPA.
Win32_NTEvent... Yes you can even check the eventlog!
NSClient++ has support for executing WQL queries “as is” and getting the result:
◦ nsclient++ -noboot CheckWMI <query>
Sample use:
◦ nsclient++ -noboot CheckWMI select * from win32_Processor
A better option is:
◦ WMI Administrative Tools
◦ Freely available from Microsoft
1. Checking “values”: is the load above 50%?
◦ Use CheckWMIValue
2. Checking “items”: is the load on more than 3 cores above 50%?
◦ Use CheckWMI
3. Checking “custom things”: is the load above 50% and are fewer than 5 queries running on the database?
◦ Use scripts
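The difference between checking “values” and checking “items” can be shown without WMI at all. In this Python sketch the rows stand in for the result of `select LoadPercentage from win32_Processor` on a four-core machine (the numbers are made up). `check_value` mirrors the idea behind CheckWMIValue (the values are compared against thresholds, so the worst one decides) and `check_items` the idea behind CheckWMI (the number of matching rows decides). Both function names and the strict `>` comparisons are assumptions, not NSClient++'s code:

```python
OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical result set: one row per core.
rows = [
    {"DeviceID": "CPU0", "LoadPercentage": 95},
    {"DeviceID": "CPU1", "LoadPercentage": 60},
    {"DeviceID": "CPU2", "LoadPercentage": 55},
    {"DeviceID": "CPU3", "LoadPercentage": 10},
]


def check_value(rows, column, warn, crit):
    """'Is load above N%?': the worst single value decides the state."""
    worst = max(row[column] for row in rows)
    if worst > crit:
        return CRITICAL
    if worst > warn:
        return WARNING
    return OK


def check_items(rows, predicate, warn_count, crit_count):
    """'Is load on more than N cores above 50%?': the number of rows decides."""
    matching = sum(1 for row in rows if predicate(row))
    if matching > crit_count:
        return CRITICAL
    if matching > warn_count:
        return WARNING
    return OK


def busy(row):
    return row["LoadPercentage"] > 50


print(check_value(rows, "LoadPercentage", warn=80, crit=90))  # 2 (CRITICAL)
print(check_items(rows, busy, warn_count=2, crit_count=3))  # 1 (WARNING)
```

Anything that needs data from more than one source (the third case, load plus database queries) no longer fits either shape, which is where scripts come in.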
Best way to start
Simple to use...
◦ ...if you know your WMI
A sample query:
◦ CheckWMIValue
  "Query=Select * from win32_Processor"
  MaxWarn=80 MaxCrit=90
  Check:CPU=LoadPercentage
  AliasCol=LoadPercentage
  ShowAll=long
◦ (a bit like CheckCPU)
Scripts
External scripts
◦ VB, Perl, Python, ...
◦ .exe files
◦ .NET
◦ ...
Lua
◦ Lua is a simple programming language
◦ Used INSIDE NSClient++
◦ Very powerful, and simple
◦ A fairly new feature, so feel free to suggest things
Modules
◦ Written in C++, VB, .NET, ...
◦ Very powerful, but much “harder”
Configuration:
◦ [modules]
  CheckExternalScripts.dll
  ...
◦ [External Scripts]
  <alias>=<script>
  (<alias> is the command from NRPE; <script> is the command to execute)
  check_es_ok=scripts\ok.bat
◦ [Wrapped Scripts]
  <alias>=<script>
Sample code:
◦ @echo CRITICAL: Everything is not going to be ok!
◦ @exit 2
Exit statuses:
◦ 0=OK, 1=Warning, 2=Critical, 3=Unknown
NSC.ini syntax:
◦ [External Scripts]
  check_bat=scripts\check_test.bat
Or:
◦ [Wrapped Scripts]
  check_test=check_test.bat
Sample code:
◦ Wscript.Echo "Everything might not be ok"
◦ Wscript.Quit(1)
Exit statuses:
◦ 0=OK, 1=Warning, 2=Critical, 3=Unknown
NSC.ini syntax:
◦ [External Scripts]
  check_test=cscript.exe /T:30 /NoLogo scripts\check_test.vbs
Or:
◦ [Wrapped Scripts]
  check_test=check_test.vbs
Sample code:
◦ write-host "OK: Everything is wicked!"
◦ exit 0
Exit statuses:
◦ 0=OK, 1=Warning, 2=Critical, 3=Unknown
NSC.ini syntax:
◦ [External Scripts]
  check_test=cmd /c echo scripts\check_test.ps1; exit($lastexitcode) | powershell.exe -command -
Or:
◦ [Wrapped Scripts]
  check_test=check_test.ps1
This is exactly like writing “regular” Nagios scripts.
Find scripts on:
◦ http://www.monitoringexchange.org
◦ http://exchange.nagios.org
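Since external scripts follow the ordinary Nagios plugin contract (one line of output, optional performance data after a `|`, exit status 0-3), the same pattern can be written in Python, one of the languages listed for external scripts. A minimal sketch; the `<value> <warn> <crit>` interface is hypothetical, and registering it with a line such as `check_py=python scripts\check_test.py` under `[External Scripts]` assumes a Python interpreter on the PATH:

```python
# Nagios plugin exit statuses, as listed on the slides.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]


def evaluate(value, warn, crit):
    """Higher is worse: compare a measured value against two thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def main(argv):
    """Print one status line and return the exit status.

    Expects: <value> <warn> <crit> (a hypothetical interface for this sketch).
    """
    try:
        value, warn, crit = (float(arg) for arg in argv)
    except ValueError:  # wrong number of arguments or non-numeric input
        print("UNKNOWN: usage: check_test.py <value> <warn> <crit>")
        return UNKNOWN
    status = evaluate(value, warn, crit)
    # Performance data follows the '|': label=value;warn;crit
    print(f"{LABELS[status]}: value is {value:g} | value={value:g};{warn:g};{crit:g}")
    return status
```

A real script would end with `sys.exit(main(sys.argv[1:]))` so that NSClient++ (and Nagios) see the status as the process exit code, exactly like `@exit 2` in the batch example.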
Configuration:
◦ [modules]
  LUAScript.dll
  ...
◦ [LUA Scripts]
  <script>
  scripts\test.lua
What, no alias?
◦ Not needed (happens inside the script)
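nscp.print('Loading test script...')
-- Expose the Lua function foo as the command check_foo
nscp.register('check_foo', 'foo')
function foo (command)
  nscp.print(command)
  -- Run the built-in CheckCPU and pass its status and perfdata through
  local code, msg, perf = nscp.execute('CheckCPU', 'time=5', 'MaxCrit=5')
  return code, 'hello from LUA: ' .. msg, perf
end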
The power of Lua scripts comes from:
◦ The ability to run other commands and modify their results
◦ The ability to run “inside” NSClient++
◦ The simplicity of the language
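The pattern in the Lua example (register an alias, run another command, rewrite its message, pass the status and performance data through) is not specific to Lua. Here it is as a small, self-contained Python model; `register`, `execute` and the stand-in `check_cpu` are hypothetical names that only mimic the shape of `nscp.register`/`nscp.execute` and do not talk to NSClient++:

```python
# Hypothetical command table: alias -> handler returning (code, message, perf).
COMMANDS = {}


def register(alias, handler):
    COMMANDS[alias] = handler


def execute(alias, *args):
    return COMMANDS[alias](*args)


def check_cpu(*args):
    """Stand-in for a built-in check; ignores its arguments and reports 12% load."""
    return 0, "CPU load is ok", "load=12%;80;90"


def foo(*args):
    """Run another command and decorate its message, as the Lua script does."""
    code, msg, perf = execute("CheckCPU", "time=5", "MaxCrit=5")
    return code, "hello from Python: " + msg, perf


register("CheckCPU", check_cpu)
register("check_foo", foo)
print(execute("check_foo"))
```

Because the wrapper returns the inner command's status and performance data unchanged, Nagios still alerts and graphs on the original check; only the text is customized.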
Revisiting WMI
' Default settings for your script.
threshold_warning = 50
threshold_critical = 20
' Create the NagiosPlugin object
Set np = New NagiosPlugin
' Define which args should be used
np.add_arg "warning", "warning threshold", 0
np.add_arg "critical", "critical threshold", 0
If Args.Exists("warning") Then threshold_warning = Args("warning")
If Args.Exists("critical") Then threshold_critical = Args("critical")
np.set_thresholds threshold_warning, threshold_critical
Set colInstances = np.simple_WMI_CIMV2(".", "SELECT * FROM Win32_Battery")
For Each objInstance In colInstances
    ' Print status and charge, with the charge as performance data
    WScript.Echo "Battery " & objInstance.Status _
        & " - Charge Remaining = " & objInstance.EstimatedChargeRemaining _
        & "% | charge=" & objInstance.EstimatedChargeRemaining
    ' Keep the worst threshold state seen across all batteries
    return_code = np.escalate_check_threshold(return_code, objInstance.EstimatedChargeRemaining)
Next
np.nagios_exit "", return_code
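The script above relies on the helper's `set_thresholds` and `escalate_check_threshold`, whose exact semantics live in the helper itself. As a neutral reference, here is a Python sketch of the standard Nagios plugin threshold-range syntax (`10`, `10:`, `~:10`, `10:20`, `@10:20`) plus “worst state wins” escalation over several instances. Under these standard rules, “alert when the charge drops below 50%” is written `50:`, not `50`; whether the VB helper interprets `threshold_warning = 50` the same way is an assumption worth checking. The function names are hypothetical:

```python
import math

OK, WARNING, CRITICAL = 0, 1, 2


def parse_range(spec):
    """Parse a Nagios threshold range into (start, end, inside)."""
    inside = spec.startswith("@")  # '@' means: alert when INSIDE the range
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low, high = spec.split(":", 1)
    else:
        low, high = "0", spec  # '10' means the range 0..10
    start = -math.inf if low == "~" else float(low or 0)
    end = math.inf if high == "" else float(high)
    return start, end, inside


def alerts(value, spec):
    """True if the value should raise an alert for this range."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range


def check(value, warning, critical):
    if alerts(value, critical):
        return CRITICAL
    if alerts(value, warning):
        return WARNING
    return OK


def escalate(current, new):
    """Keep the worst state seen so far (CRITICAL > WARNING > OK)."""
    return max(current, new)


charges = [85, 42, 97]  # made-up EstimatedChargeRemaining values
state = OK
for charge in charges:
    state = escalate(state, check(charge, warning="50:", critical="20:"))
print(state)  # 1 (WARNING): one battery is below 50%
```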
Status | Meaning
1 | Other
2 | Unknown
3 | Idle
4 | Printing
5 | WarmUp
6 | Stopped Printing
7 | Offline
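These values match the `PrinterStatus` property of the `Win32_Printer` class. A check built on them has to decide which statuses count as problems; the mapping below is purely an example policy (treating Offline as CRITICAL and Unknown and Stopped Printing as WARNING is an assumption, not something WMI prescribes), and `printer_state` and the printer name are hypothetical:

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

PRINTER_STATUS = {
    1: "Other",
    2: "Unknown",
    3: "Idle",
    4: "Printing",
    5: "WarmUp",
    6: "Stopped Printing",
    7: "Offline",
}

# Example policy only: adjust to what matters in your environment.
POLICY = {2: WARNING, 6: WARNING, 7: CRITICAL}


def printer_state(name, status):
    """Map a PrinterStatus code to a Nagios state and a one-line message."""
    text = PRINTER_STATUS.get(status)
    if text is None:
        return UNKNOWN, f"UNKNOWN: {name} reports unexpected status {status}"
    state = POLICY.get(status, OK)
    label = ["OK", "WARNING", "CRITICAL"][state]
    return state, f"{label}: {name} is {text}"


print(printer_state("printer01", 7))  # (2, 'CRITICAL: printer01 is Offline')
```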
Questions?
http://www.linkedin.com/in/mickem
Information about NSClient++
http://nsclient.org
Slides and examples at:
http://nsclient.org/nscp/conferances/2010/WPN/