IBM Software Group
© 2009 IBM Corporation
ITM – Monitoring Resources Using Remote Agentless Technology
Scott Wallace
October 20, 2009
IBM Software Group | Tivoli software
Agenda
Overview
Planning
Installation / Configuration
Usage Tips
Troubleshooting
Wrap Up
Overview
Agentless monitoring allows you to oversee the IT environment from a set of remote servers.
Some agents already had remote capabilities: VMware VI and SAP
Introduced initially as an offering on OPAL
Agentless Operating System monitoring added as a product in ITM 6.2.1
- Provides operating system monitoring for AIX, HP-UX, Linux, Solaris, and Windows
Monitors using multiple mechanisms: CIM, SNMP, WMI
Agentless Monitoring Architecture
Agent or Agentless
Agent-based technology resides directly on a managed server and collects data based on policy set locally or by the management server
Agentless technology resides primarily on a management server and gets its data via a remote application programming interface (API)
- “Agentless” doesn’t mean nothing is present or running. Some basic operating system function or base application function is running to provide the information as requested over the network
Resources are still being used and services need to be running on the server
Examples include SNMP, CIM, WMI
Agentless OS Monitoring Metric Overview
Key Operating System Metrics Returned
Logical and Physical Disk Utilization
Network Utilization
Virtual and Physical Memory
System Level Information
Aggregate Processor Utilization
Process Availability
Default Situations for
- Disk Utilization
- Memory Utilization
- CPU Utilization
- Network Utilization
Remote Node Capabilities
One agent can represent more than one monitored entity
- Multiple remote systems in one agent
- Each remote node has a unique “Managed System Name” so they can be in different managed system lists for situations
One agent can represent different types of entities
- Windows and Solaris agents can monitor different sets of data on different systems
Multiple instances of an agent may co-reside on the same agent server
Planning
Agentless Monitoring for Operating Systems
KR2 Agentless Monitoring for Windows Operating Systems
KR3 Agentless Monitoring for AIX Operating Systems
KR4 Agentless Monitoring for Linux Operating Systems
KR5 Agentless Monitoring for HP-UX Operating Systems
KR6 Agentless Monitoring for Solaris Operating Systems
http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/topic/com.ibm.itm.doc_6.2.1/welcome.htm
Agentless Monitoring Data Collection and Platforms
Agentless monitors can run from most ITM supported platforms
- Windows (x86 & x64, not IA64)
- x/p/z Linux
- Solaris
- AIX
- HP-UX
Agentless monitors may remotely monitor older versions of listed operating systems and other Linux distributions, depending on capabilities
If you want to use the Windows API data collectors, the Agentless monitor must run on a Windows platform
You may configure different data providers for the Agentless monitors:
Agentless Monitor                      Data Providers
Agentless Monitoring for AIX OS        SNMP v1, v2c, v3
Agentless Monitoring for HP-UX OS      SNMP v1, v2c, v3
Agentless Monitoring for Linux OS      SNMP v1, v2c, v3
Agentless Monitoring for Solaris OS    CIM-XML; SNMP v1, v2c, v3
Agentless Monitoring for Windows OS    Windows APIs (Windows Management Instrumentation (WMI), Performance Monitor (Perfmon), Event Log); SNMP v1, v2c, v3
Deployment Considerations
With Agentless Monitoring, a percentage of preparation time needs to be devoted to verifying the native data emitter configurations:
- Ensuring SNMP daemons are installed, configured, and started (community strings and user/password information verified)
- Exposing MIB branches in SNMP configuration files
- Verifying Windows passwords and user account rights for Windows API collection
- Verifying patch levels for endpoint systems against the User’s Guides
If possible, use tools like snmpwalk, WMIExplorer, and perfmon to verify the metrics are exposed before pointing ITM to the environments
Decide how many remote systems need to be monitored, then identify the systems that will run the agentless agents
Comparing Agent and Agentless Technologies
Service Provider (no ability to put agents in a customer’s environment)
- Agent: Unsuitable
- Agentless: Suitable
Speed of Implementation
- Agent: Varies
- Agentless: High
Agent maintenance (distribution of updates)
- Agent: Time consuming depending on the environment
- Agentless: Fewer points to deploy
Impact to Testing (“locked down” server environments)
- Agent: High
- Agentless: Low
Command and control capabilities (Take Actions easily)
- Agent: High
- Agentless: Low
Granularity and coverage of monitoring metrics
- Agent: Greater access
- Agentless: Dependent on standards
Data Availability (real-time responsiveness)
- Agent: High
- Agentless: Polling lag, network delay
Security
- Agent: Secure communication
- Agentless: Standards dependent
Installation / Configuration
Install
Ensure that the prerequisites are met for the system that you are using for the agent. See the Agentless Agent User Guides for this information
Windows Installer
Select the remote system types that you want to monitor from this Windows host.
Next, select the agents you want to install in the depot for remote deploy.
Linux Installer
Select the operating system type or take the default
Select the remote system types that you want to monitor from this Linux host
Install Application Support
TEMS (hub and remotes)
TEPS
TEP Desktop clients
Warehouse Proxy Agent
Warehouse Summarization Agent
Configuring the Agent – Linux
SNMP – v3
[root@rc2test4 /]# itmcmd config -A r4
Agent configuration started...
Enter instance name (default is: ): SLESv3
Edit "Monitoring Agent for Agentless Linux OS" settings? [ 1=Yes, 2=No ] (default is: 1):
Edit 'SNMP connection' settings? [ 1=Yes, 2=No ] (default is: 1):
Port Number (default is: 161):
SNMP Version [ 1=SNMP Version 1, 2=SNMP Version 2c, 3=SNMP Version 3 ] (default is: 1): 3
Edit 'SNMP Version 3' settings? [ 1=Yes, 2=No ] (default is: 1):
Security Level [ 1=noAuthNoPriv, 2=authNoPriv, 3=authPriv ] (default is: ): 2
User Name (default is: ): snmpuser
Auth Protocol [ 1=MD5, 2=SHA ] (default is: ): 1
Enter Auth Password (default is: ):
Re-type : Auth Password (default is: ):
Priv Protocol [ 1=DES, 2=CBC DES ] (default is: ): 1
Enter Priv Password (default is: ):
Re-type : Priv Password (default is: ):
Edit 'Remote System Details' settings? [ 1=Yes, 2=No ] (default is: 1): 1
No 'Remote System Details' settings available?
Edit 'Remote System Details' settings, [1=Add, 2=Edit, 3=Del, 4=Next, 5=Exit] (default is: 4): 1
Managed System Name (default is: ): rc2SLES
SNMP host (default is: ): 172.17.4.219
'Remote System Details' settings: Managed System Name=rc2SLES
Edit 'Remote System Details' settings, [1=Add, 2=Edit, 3=Del, 4=Next, 5=Exit] (default is: 4): 5
Easy to overlook
Configuring the Agent – Windows
Open Manage Tivoli Enterprise Monitoring Services (MTEMS)
Select the template for the agent type
Fill in the requested information
Tips for Using
Considerations for Using
Agentless monitors return the data from the last background collection interval when a real-time query to the endpoint system times out due to network load or latency
With Historical Collection enabled, the collection for all the remote endpoints will be stored on the Agentless Monitoring Server when storage “at the Agent” is selected
- Ensure the physical system has sufficient disk space and network bandwidth to the Warehouse Proxy Agent when monitoring large numbers of remote systems
With the Agentless Monitoring Server now maintaining connections to hundreds of servers, it becomes a more critical component in the infrastructure than a single agent instance
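The first consideration above (falling back to the last background collection when a real-time query times out) can be sketched as a general pattern. This is an illustration only, not the agent's actual implementation; `collect_with_fallback`, `slow_collector`, and `fast_collector` are hypothetical names introduced for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical worker pool standing in for background data collection.
_executor = ThreadPoolExecutor(max_workers=4)


def collect_with_fallback(collector, cached_value, timeout_seconds):
    """Run collector(); if it does not finish in time, return cached_value.

    Returns a (value, from_cache) tuple.
    """
    future = _executor.submit(collector)
    try:
        return future.result(timeout=timeout_seconds), False
    except FutureTimeout:
        # The collection keeps running in the background; the caller
        # receives the data from the last completed collection instead.
        return cached_value, True


def slow_collector():
    time.sleep(0.5)  # simulates a loaded network or a slow endpoint
    return {"cpu_pct": 42}


def fast_collector():
    return {"cpu_pct": 17}
```

A caller that sees `from_cache` set to `True` knows the values may be up to one collection interval old.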
Agentless Health
Each remote monitor has self-monitoring attribute tables that can be used to monitor the collection process:
Performance Object Status attributes:
- Last collection errors encountered
- Last collection start/finish times
- Last/average collection duration
- Refresh interval
- Number of collections
- Cache hit/miss/hit percent
- Intervals skipped (most useful)
Thread Pool attributes:
- Current/max thread pool size
- Current/average/min/max active threads
- Current/min/max queue length
- Average wait time
- Total jobs
Situations may be created against these attribute groups to notify of collection failures
It is recommended that an Operating System agent be co-deployed to the Agentless Monitoring Server to watch CPU, Memory, and Network utilization of the monitors
Performance Tuning Environment Variables
CDP_DP_CACHE_TTL (default: 60)
  Time in seconds before a query triggers a new data collection; effectively the polling interval.
CDP_DP_THREAD_POOL_SIZE (default: 60)
  The number of threads created to perform background data collections. The thread pool is shared among all attribute groups in all remote nodes in an agent. Recommendation: set this to the number of managed subnodes.
CDP_DP_REFRESH_INTERVAL (default: 60)
  The interval in seconds at which each attribute group cache is updated in the background. Recommendation: set this to the same value as the polling rate (CDP_DP_CACHE_TTL).
CDP_DP_IMPATIENT_COLLECTOR_TIMEOUT (default: 2)
  The number of seconds to wait for a data collection to happen before timing out and returning cached data.
CDP_SNMP_RESPONSE_TIMEOUT (default: 2)
  The number of seconds to wait before each request times out. Each row in an attribute group is a separate request.
CDP_SNMP_MAX_RETRIES (default: 2)
  The number of times to retry sending the SNMP request after a response timeout.
CDP_NT_EVENT_LOG_GET_ALL_ENTRIES_FIRST_TIME (default: NO)
  Configures whether the Windows Event Log data provider should report old log entries on startup, or only new ones.
CDP_NT_EVENT_LOG_CACHE_TIMEOUT (default: 3600)
  Cache lifetime in seconds of an event from the Windows Event Log.
CDP_PURE_EVENT_CACHE_SIZE (default: 100)
  Number of pure events held in cache at any one time. When a query is made, all events in the cache at that time are reported. When the cache is full, the oldest events are removed to make room for new ones.
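The two recommendations in the table above (thread pool size equal to the number of subnodes, refresh interval equal to the cache TTL) can be applied mechanically. The sketch below is a hypothetical helper, not an ITM tool. The worst-case estimate additionally assumes that each row is one request, that requests run one after another, and that every attempt waits out the full timeout; real collections may overlap.

```python
def recommended_tuning(subnode_count, polling_interval_seconds=60):
    """Apply the two recommendations from the tuning table."""
    if subnode_count < 1:
        raise ValueError("subnode_count must be at least 1")
    return {
        "CDP_DP_CACHE_TTL": polling_interval_seconds,
        # Recommendation: same value as the polling rate.
        "CDP_DP_REFRESH_INTERVAL": polling_interval_seconds,
        # Recommendation: one thread per managed subnode.
        "CDP_DP_THREAD_POOL_SIZE": subnode_count,
    }


def worst_case_snmp_wait(rows, response_timeout=2, max_retries=2):
    """Upper bound in seconds for an unresponsive endpoint.

    Assumption: one request per row, sent 1 + max_retries times,
    sequentially, each waiting the full response timeout.
    """
    return rows * response_timeout * (1 + max_retries)


def as_env_lines(settings):
    """Render settings as NAME=value lines, sorted by name."""
    return [f"{name}={value}" for name, value in sorted(settings.items())]
```

With the default timeout and retry values, an attribute group of 10 rows against a dead endpoint could take up to 60 seconds under these assumptions, which is one reason to keep the polling interval well above the timeout settings.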
Troubleshooting
Troubleshooting Overview
General Diagnosis
- Fault Determination
  - Is the data coming through?
  - Is the data incorrect?
Specific Diagnosis
- Agent issues
  - Remote system setup
  - Connectivity
  - Review logs on the agent
- TEP issues
  - Application support: workspaces / data
- TEMS issues
  - Application support: situation issues
Agent Log Files and Trace Settings
Default location:
%CANDLE_HOME%\TMAITM6\logs\<hostname>_<pc>_k<pc>agent_<instance>_<timestamp>-01.log (Windows)
$CANDLE_HOME/logs/<hostname>_<pc>_<instance>_<timestamp>-01.log (UNIX/Linux)
Increase unit traces to isolate the issues
Problem Area                       KBB_RAS1 setting
General Startup/Initialization     ERROR (UNIT:query ALL) when running on Windows
                                   ERROR (UNIT:ct_main ALL) when running on UNIX/Linux
WMI Data Provider                  ERROR (UNIT:WMI ALL)
Perfmon Data Provider              ERROR (UNIT:QueryClass ALL)
SNMP Data Provider                 ERROR (UNIT:SNMP ALL)
Windows Event Log Data Provider    ERROR (UNIT:EventLog ALL) (UNIT:WinLog ALL)
CIM-XML Data Provider              ERROR (UNIT:CIM ALL)
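The trace table above can be captured as a small lookup that composes a KBB_RAS1 value. The dictionary keys are hypothetical labels chosen for this sketch. Combining several problem areas into one string is an assumption extrapolated from the Windows Event Log row, which already lists two UNIT clauses.

```python
# Unit names are taken from the trace table; the keys are illustrative.
TRACE_UNITS = {
    "startup_windows": ["query"],
    "startup_unix": ["ct_main"],
    "wmi": ["WMI"],
    "perfmon": ["QueryClass"],
    "snmp": ["SNMP"],
    "eventlog": ["EventLog", "WinLog"],
    "cim": ["CIM"],
}


def kbb_ras1(*problem_areas):
    """Build a KBB_RAS1 value such as 'ERROR (UNIT:SNMP ALL)'."""
    parts = ["ERROR"]
    for area in problem_areas:
        for unit in TRACE_UNITS[area]:  # KeyError for an unknown area
            clause = f"(UNIT:{unit} ALL)"
            if clause not in parts:
                parts.append(clause)
    return " ".join(parts)
```

For example, `kbb_ras1("snmp")` yields the SNMP row of the table unchanged.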
How can I tell if the endpoint is the problem?
Typical endpoint issues:
Connectivity
- Firewall
  - SNMP needs ports 161 and 162 open.
  - CIM needs ports 5988 and 5989 open.
- TCP Stack
  - Verify TCP/IP connectivity to the remote system using ping, telnet, etc.
- DNS
  - Use nslookup and/or route to verify that the remote system is known to your domain.
SNMP or CIM daemons not running
Incorrect version of SNMPD or CIM
SNMPD not configured correctly (check with snmpget, snmpgetnext, snmpwalk)
  - snmpget -v 1 -c public rc2testSLES sysUpTime.0
  - snmpget -v 3 -u snmpuser -l authNoPriv -a MD5 -A password rc2testSLES sysUpTime.0
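The two snmpget checks above can be generated from the same values that are entered during agent configuration. The following is a minimal sketch that only builds the argument list; `snmpget_args` is a hypothetical helper, and it covers only the community-based and authNoPriv/noAuthNoPriv cases shown on the slide.

```python
def snmpget_args(host, oid, version="1", community=None,
                 user=None, level=None, auth_proto=None, auth_password=None):
    """Return the argv list for a Net-SNMP snmpget check."""
    args = ["snmpget", "-v", version]
    if version in ("1", "2c"):
        if community is None:
            raise ValueError("a community string is required for v1/v2c")
        args += ["-c", community]
    elif version == "3":
        if user is None or level not in ("noAuthNoPriv", "authNoPriv"):
            raise ValueError("this sketch covers noAuthNoPriv and authNoPriv only")
        args += ["-u", user, "-l", level]
        if level == "authNoPriv":
            if not auth_proto or not auth_password:
                raise ValueError("authNoPriv needs a protocol and a password")
            args += ["-a", auth_proto, "-A", auth_password]
    else:
        raise ValueError(f"unsupported SNMP version: {version}")
    return args + [host, oid]
```

On a system with the Net-SNMP tools installed, the returned list can be passed to `subprocess.run`; a timeout or authentication error from snmpget points at the endpoint rather than at ITM.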
How can I tell if the endpoint is the problem?
SNMPD daemon is not running. Check the ITM logs for the following lines:
(2009/10/13,20:37:35.0001-A:snmpqueryclass.cpp,1164,"handle_snmp_response_async") ERROR: decoded PDU is null -- this is a timeout scenario
(2009/10/13,20:37:35.0003-29:snmpqueryclass.cpp,1782,"internalCollectData") Timeout occurred. No response from agent 172.17.4.219.
Password error – SNMP v3
(2009/10/14,05:58:23.0067-6:snmpqueryclass.cpp,1158,"handle_snmp_response_async") Entry
(2009/10/14,05:58:23.0068-6:snmpqueryclass.cpp,1164,"handle_snmp_response_async") ERROR: decoded PDU is null -- this is a timeout scenario
Password working – SNMP v3
(2009/10/14,05:40:01.0017-7:snmpqueryclass.cpp,688,"completeInit") Host: 172.17.4.219, Port: 161, User: snmpuser, Sec Level 1
(2009/10/14,05:40:01.0018-7:snmpqueryclass.cpp,689,"completeInit") Auth password: xxxx, proto: 1, key:
(2009/10/14,05:40:01.0019-7:snmpqueryclass.cpp,690,"completeInit") Priv password: xxxx, proto: 1, key:
(2009/10/14,05:40:01.001A-7:snmpquerymetric.cpp,89,"getOID") Entry
(2009/10/14,05:40:01.001B-7:snmpquerymetric.cpp,91,"getOID") OID=1.3.6.1.2.1.25.2.3.1.1
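When many remote nodes share one agent log, scanning for the timeout lines by eye gets tedious. The sketch below parses lines shaped like the samples above and lists the endpoints that did not respond. The regular expressions are inferred from those samples only, handle IPv4 addresses only, and may not match every trace line an agent can emit.

```python
import re

# (stamp-thread:source,line,"function") message
LOG_LINE = re.compile(
    r'^\((?P<stamp>[^-]+)-(?P<thread>[0-9A-Fa-f]+):'
    r'(?P<source>[^,]+),(?P<line>\d+),"(?P<function>[^"]+)"\)\s*(?P<message>.*)$'
)
NO_RESPONSE = re.compile(r"No response from agent (\d+(?:\.\d+){3})")

SAMPLE = [
    '(2009/10/13,20:37:35.0001-A:snmpqueryclass.cpp,1164,"handle_snmp_response_async") '
    'ERROR: decoded PDU is null -- this is a timeout scenario',
    '(2009/10/13,20:37:35.0003-29:snmpqueryclass.cpp,1782,"internalCollectData") '
    'Timeout occurred. No response from agent 172.17.4.219.',
]


def parse_line(text):
    """Return the fields of one trace line as a dict, or None if no match."""
    match = LOG_LINE.match(text.strip())
    return match.groupdict() if match else None


def timed_out_hosts(lines):
    """Return the distinct endpoints reported as not responding, in order."""
    hosts = []
    for text in lines:
        entry = parse_line(text)
        if entry is None:
            continue
        found = NO_RESPONSE.search(entry["message"])
        if found and found.group(1) not in hosts:
            hosts.append(found.group(1))
    return hosts
```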
Windows Agentless Monitor fails to collect perfmon data
When using the Windows Agentless Monitor (r2), the following errors appear in the log:
(4891C694.0066-1558:queryclass.cpp,1006,"start") Error adding query for class PhysicalDisk.
(4891C694.0067-1558:queryclass.cpp,1007,"start") \\rc2test3.tivlab.raleigh.ibm.com\PhysicalDisk(*)\% Disk Write Time - add returned C0000BB8
Potential problems:
- The counter may simply not exist. Run the typeperf command (or the perfmon GUI) locally on the server from which you are trying to collect metrics to verify that the command comes back cleanly without error.
- The Remote Registry service may not be enabled. A remote collector must have registry access to look up the indexes. Verify the service is enabled and run the typeperf command (or the perfmon GUI) remotely to verify that the command comes back cleanly without error.
- The indexes of counters are corrupt. When a request is made, the string name of the counter is requested. That in turn is matched to an index on the target computer. All the perfmon name-to-index dictionary maps are stored in the registry here:
  HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Perflib\009
  On the failing systems with this problem, the "counter" entry there either has no value or contains garbage (displayed as empty rectangles).
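The local and remote typeperf checks described above differ only in whether the counter path carries a `\\host` prefix. A minimal sketch for building both forms follows; `counter_path` and `typeperf_args` are hypothetical helpers, and the single-sample option (`-sc 1`) is a choice made for this example.

```python
def counter_path(obj, counter, instance="*", host=None):
    """Build a perfmon counter path, optionally qualified with a host."""
    path = "\\" + obj + "(" + instance + ")\\" + counter
    if host:
        # A leading \\host makes typeperf query the remote machine.
        path = "\\\\" + host + path
    return path


def typeperf_args(obj, counter, instance="*", host=None, samples=1):
    """Return the argv list for a typeperf spot check."""
    return ["typeperf", counter_path(obj, counter, instance, host),
            "-sc", str(samples)]
```

Running the local form on the monitored server and the remote form from the agentless monitoring server helps separate a missing counter from a Remote Registry or permissions problem.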
Windows Agentless Monitor fails to collect data
I am trying to run the Windows Agentless Monitor (r2) against one of our machines, but I am getting errors that I don't understand:
(48BF57E9.0006-EF4:wmiqueryclass.cpp,728,"internalCollectData") ::collectData==>Could not connect. Error code = 0x80070005
(48BF57E9.0007-AD4:queryclass.cpp,790,"internalCollectData") Authentication failed against host testSys1 as user itoperations, return code = 1326
Potential problems:
- The user name was not properly specified in the format Domain\User.
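A quick pre-check of the configured user name catches this problem before the agent is started. The sketch below tests for the Domain\User shape and decodes the two standard Windows codes seen in the log above; the function names and the domain used in the usage comment are hypothetical.

```python
# Standard Windows codes that appear in the log excerpt.
KNOWN_CODES = {
    0x80070005: "E_ACCESSDENIED: access is denied",
    1326: "ERROR_LOGON_FAILURE: the user name or password is incorrect",
}


def is_domain_qualified(user_name):
    """True if user_name looks like Domain\\User with both parts present."""
    domain, separator, user = user_name.partition("\\")
    return bool(separator) and bool(domain) and bool(user) and "\\" not in user


def describe(code):
    """Return a short description for a code from the agent log."""
    return KNOWN_CODES.get(code, "unknown code")


# Usage (EXAMPLEDOM is a made-up domain name):
#   is_domain_qualified("itoperations")              -> False
#   is_domain_qualified("EXAMPLEDOM\\itoperations")  -> True
```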
Some workspaces have blank views for Linux
On TEP, the Linux Agentless Monitor (r4) only shows data for the "Network" and "System" navigator items.
Potential problems:
- By default, Red Hat Linux allows connection with the Host Resources MIB and the UCD MIB only through SNMPv3 connections.
- Verify that the following lines are modified or added in the Access Control portion of /etc/snmp/snmpd.conf:
  view systemview included .1.3.6.1.4.1.2021
  view systemview included .1.3.6.1.2.1.25
- Verify the SNMP daemon is running by using the ps -ef command.
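The snmpd.conf check above is easy to script across many endpoints. The sketch below reports which of the two required subtrees (UCD MIB and Host Resources MIB) are missing from the `systemview` view. It matches the OIDs exactly as written on the slide, so a configuration that exposes a broader parent subtree would be reported as missing by this simple check.

```python
REQUIRED_SUBTREES = (".1.3.6.1.4.1.2021", ".1.3.6.1.2.1.25")


def missing_views(conf_text, view_name="systemview"):
    """Return the required subtrees not included in the named view."""
    included = set()
    for raw in conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, tokenize
        if (len(fields) >= 4 and fields[0] == "view"
                and fields[1] == view_name and fields[2] == "included"):
            included.add(fields[3])
    return [oid for oid in REQUIRED_SUBTREES if oid not in included]


# Illustrative snmpd.conf fragments for the example.
BEFORE = """
view systemview included .1.3.6.1.2.1.1
view systemview included .1.3.6.1.2.1.25.1.1
"""
AFTER = BEFORE + """
view systemview included .1.3.6.1.4.1.2021
view systemview included .1.3.6.1.2.1.25   # Host Resources MIB
"""
```

In practice the text would come from reading /etc/snmp/snmpd.conf on the endpoint; remember that snmpd must be restarted after the file is changed.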
Ignore These Errors
You can ignore these:
(48E297DC.0095-17E4:configdata.cpp,65,"getConfigurationProperty") KR2_WMI_WIN_PASSWORD_1 not found in the hash map
(48E297DC.0097-17E4:configdata.cpp,65,"getConfigurationProperty") KR2_WMI_WIN_PASSWORD_DEFAULT not found in the hash map
The configuration does a fall-back lookup for its required parameters:
- Subnode Configuration, then Default Configuration
These errors indicate that the parameters were not overridden in the subnode
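The fall-back lookup described above can be pictured as two dictionary probes, subnode first and default second. This is a sketch of the idea only; the key-suffix scheme (`_1`, `_DEFAULT`) is inferred from the two log lines, and the function mirrors the logged name `getConfigurationProperty` without claiming to match the agent's code.

```python
def get_configuration_property(config, base_name, subnode_id):
    """Look up base_name for a subnode, falling back to the default.

    Returns (value, keys_not_found). Each key in keys_not_found
    corresponds to one 'not found in the hash map' message.
    """
    not_found = []
    for suffix in (str(subnode_id), "DEFAULT"):
        key = f"{base_name}_{suffix}"
        if key in config:
            return config[key], not_found
        not_found.append(key)
    return None, not_found
```

A miss on the subnode key followed by a hit on the default key is the normal case when a setting is shared by all remote nodes.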
Wrap Up
The Agentless Agent technology gives you relatively quick startup and value with limited intrusion on the monitored system!