net flow troubleshooting


Upload: orellana160

Post on 02-Feb-2016


SolarWinds Technical Reference

IT Management Inspired by You - solarwinds.com

Best Practices for Troubleshooting NetFlow

Introduction
NetFlow Overview
Troubleshooting NetFlow Service Status Issues
Troubleshooting NetFlow Source Issues
NetFlow Version 9 Specific Issues
The NTA SQL Server
Troubleshooting Vendor Exporters

This Technical Reference is intended to help the reader identify and correct issues occurring in an Orion NetFlow Traffic Analyzer deployment.

Copyright © 1995-2011 SolarWinds. All rights reserved worldwide. No part of this document may be reproduced by any means nor modified, decompiled, disassembled, published or distributed, in whole or in part, or translated to any electronic medium or other means without the written consent of SolarWinds. All right, title and interest in and to the software and documentation are and shall remain the exclusive property of SolarWinds and its licensors. SolarWinds Orion™, SolarWinds Cirrus™, and SolarWinds Toolset™ are trademarks of SolarWinds, and SolarWinds.net® and the SolarWinds logo are registered trademarks of SolarWinds. All other trademarks contained in this document and in the Software are the property of their respective owners.

SOLARWINDS DISCLAIMS ALL WARRANTIES, CONDITIONS OR OTHER TERMS, EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, ON SOFTWARE AND DOCUMENTATION FURNISHED HEREUNDER INCLUDING WITHOUT LIMITATION THE WARRANTIES OF DESIGN, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL SOLARWINDS, ITS SUPPLIERS OR ITS LICENSORS BE LIABLE FOR ANY DAMAGES, WHETHER ARISING IN TORT, CONTRACT OR ANY OTHER LEGAL THEORY EVEN IF SOLARWINDS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

Microsoft® and Windows 2000® are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

Graph Layout Toolkit and Graph Editor Toolkit © 1992 - 2001 Tom Sawyer Software, Oakland, California. All Rights Reserved.

Portions Copyright © ComponentOne, LLC 1991-2002. All Rights Reserved.

Document Revised: 07/29/2011

Introduction

This paper focuses on NetFlow troubleshooting and contains information on troubleshooting other flow analysis technologies, such as sFlow, J-Flow and IPFIX. Because of the architectural similarities between these flow technologies the troubleshooting techniques found herein apply to all of them. This paper will focus strictly on troubleshooting NetFlow issues with regard to the Orion NetFlow Traffic Analyzer (NTA). For information on the CBQoS feature of NTA see the Orion NetFlow Traffic Analyzer Administrator Guide. For basic information on NetFlow see New to Networks Volume 3 – NetFlow Basic and Deployment Strategies.

NetFlow Overview

The term NetFlow is often used interchangeably for the three major components of NetFlow technology: the NetFlow cache and the NetFlow exporter on the router or switch, and the NetFlow collector used to analyze the flow information. The NetFlow cache actively monitors traffic and exists only on a NetFlow-enabled device. The NetFlow exporter sends completed flow information from the device to a NetFlow collector. Here is what happens, in steps:

1. Users, also known as endpoints, are accessing the information via the central router.

2. As the users’ data flows into a WAN router’s interface, the router creates records about the flows and saves them in the NetFlow cache.

3. As flows expire the WAN router exports the flow information to the NetFlow collector.

This architecture is common across all the previously mentioned flow technologies.
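The cache and export steps above can be sketched in a few lines of Python. This is a toy model for illustration only, not how any router implements its cache: the class names, the `inactive_timeout` value, and the tuple used as a flow key are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Counters kept for one flow while it sits in the cache."""
    packets: int = 0
    octets: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0


class FlowCache:
    """Toy flow cache: aggregates packets into flows and hands back idle ones."""

    def __init__(self, inactive_timeout=15.0):
        # Seconds of silence before a flow is considered finished.
        # The value is illustrative, not taken from any device.
        self.inactive_timeout = inactive_timeout
        self.flows = {}

    def observe(self, key, size, now):
        """Step 2: create or update the record for this packet's flow."""
        record = self.flows.get(key)
        if record is None:
            record = FlowRecord(first_seen=now)
            self.flows[key] = record
        record.packets += 1
        record.octets += size
        record.last_seen = now

    def expire(self, now):
        """Step 3: remove idle flows and return them, ready for export."""
        expired = [(key, record) for key, record in self.flows.items()
                   if now - record.last_seen >= self.inactive_timeout]
        for key, _ in expired:
            del self.flows[key]
        return expired


# Hypothetical flow key: (src IP, dst IP, protocol, src port, dst port, ifIndex, ToS)
cache = FlowCache(inactive_timeout=15.0)
key = ("10.0.0.5", "192.0.2.10", 6, 51000, 443, 2, 0)
cache.observe(key, 1500, now=0.0)
cache.observe(key, 400, now=1.0)
exported = cache.expire(now=20.0)
```

In this model, nothing reaches the collector until a flow has expired, which is why a collector can lag slightly behind the live traffic on the device.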

Before troubleshooting a new installation, be certain that the prerequisites for NetFlow are all in place. These prerequisites include:

The NetFlow Traffic Analyzer module for NPM has been installed without any error messages during the installation.

The target network devices (exporters) have been configured to export NetFlow to the Orion NTA server using the appropriate Orion IP address and port.

The SQL database has sufficient available space in accordance with the Orion NetFlow Traffic Analyzer Administrator Guide.

Problems with NetFlow present themselves at the NetFlow collector, so that is a logical place to begin looking for the source of an issue. The NetFlow sources, NetFlow Collector Status, and Last 25 Traffic Analyzer Events resources on the Orion NetFlow Traffic Analyzer Summary screen are typically the best places to see the overall health of NetFlow. By customizing the NTA summary page you can place these three resources together. This provides the ability to assess the NTA status at a glance, as seen below.

To quickly review this screen, the Last Received NetFlow (top circle) updates each time NetFlow packets are received from a known NetFlow source. The second section shows the status of the NTA receiver; in this case the status is “Up” and the icon is green. If the receiver is not up, NTA cannot receive NetFlow at all. The Last 25 Traffic Analyzer Events resource is a good place to view recent problems with NetFlow.

Probably the most common NetFlow troubleshooting method starts with examining the three resources seen above. The Traffic Analyzer events screen can display events regarding the NetFlow service, unknown NetFlow sources, and many other events specific to NTA. The Events window is a good place to look for obvious problems, while the NetFlow Collector Status indicates obvious problems with the NTA server.

Troubleshooting NetFlow Service Status Issues

If the NetFlow Service fails, this will be seen as shown below. Note the red status indicator next to the down collector.

To troubleshoot an NTA collector service that is not in the Started state, as seen below, open Administrative Tools > Services and check whether the SolarWinds NetFlow Service is started. If it is not, highlight the service and click Start on the right side of the Services pane. Also check whether any other SolarWinds services are not in the Started state, particularly the SolarWinds Orion Module and the SolarWinds Information Service services.

If the service starts and then stops again, there is an underlying reason causing the service to fail. The most common issues that cause the NetFlow service to shut down are included here:

The NTA module is a trial which has expired. Normally this will be seen in a yellow banner at the top of the Orion web console. If you have an expired trial, you can apply a purchased license from the Customer Portal. Follow the directions in the NetFlow Administrator Guide, Chapter 2.

There is an issue with the connection to the SQL database or with the functioning of SQL. Check that the SQL server is up and that it has sufficient CPU and memory available. Also check the disk write queue length. For more information on where to find these metrics see the Managing Orion Performance technical reference.

As a final attempt to reconcile the NetFlow Service being down, run the Configuration Wizard checking all three boxes, Web, Services, and Database, as shown below.

If these steps do not bring the collector back to a stable Up status, and you have an active maintenance contract, you will need to open a ticket with Technical Support.

Troubleshooting NetFlow Source Issues

NetFlow sources are the devices exporting flows to the Orion NTA server.

One commonly forgotten step when adding NetFlow sources is to manage, in Orion NPM, the device and interface that send the flow information. When this step is missed, the following event appears in the NetFlow Events screen.

Correcting this issue is trivial. Click Manage this device in the event message and NPM will guide you through adding the device and interface. If you add the device but fail to add the interface, the following event occurs.

This can be easily corrected by clicking Edit this device and NPM will guide you through adding the interface.

You may notice that flows are not being received or processed properly when all of the resources on a NetFlow page have messages stating that there is no data for the requested time period, as seen below.

There are two typical causes for this.

The NetFlow exporter was not configured to export during the time you have requested.

There is an issue causing the NTA server to not display the flows.

If you believe that the device is correctly exporting flows but NTA is not displaying data, a packet capture should be taken from the NTA server interface. Wireshark is a commonly used free tool for capturing packets and analyzing protocols. For information on installing and using packet capture tools see the documentation for that particular tool.

The packet capture should be run for about 5 minutes to make sure any NetFlow packets are captured. The below capture was run for ten minutes, and then the captured packets were sorted by protocol looking for NetFlow packets. NetFlow is often listed as CFLOW protocol in protocol analyzers.

Note that there are no CFLOW packets captured. Typically this is because of one of the following:

The exporter is not configured to send NetFlow packets to the NTA server.

There is a device between the exporter and the NTA server blocking the flow data.
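If a capture does contain UDP traffic on the export port but the analyzer has not labeled it, the payload can be checked by hand. Both the version 5 and version 9 export headers begin with two 16-bit big-endian fields, the version number and a record count. The sketch below reads only those two fields; the function name is hypothetical, and it assumes the UDP payload has already been extracted from the capture.

```python
import struct


def parse_export_header(payload: bytes):
    """Return (version, count) from the start of a NetFlow v5/v9 UDP payload."""
    if len(payload) < 4:
        raise ValueError("payload too short to hold a NetFlow header")
    # Network byte order, two unsigned 16-bit integers: version, then count.
    version, count = struct.unpack("!HH", payload[:4])
    return version, count


# Build a zeroed 24-byte version 5 header carrying 30 records, then parse it.
sample_v5 = struct.pack("!HHIIIIBBH", 5, 30, 0, 0, 0, 0, 0, 0, 0)
version, count = parse_export_header(sample_v5)
```

A version value other than 5 or 9 in this position suggests that the traffic on the port is not NetFlow, or that the exporter is using a different flow format.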

NetFlow device configurations vary by model, NetFlow version, and IOS version. All configurations implement three crucial functions.

Turn on the NetFlow cache

Apply NetFlow to interfaces

Direct flow exports to the collector (NTA server)

Examine the device for the required configurations according to your manufacturer, model, and software to ensure that each of these functions is in place. For Cisco IOS devices, the following should be in place:

(config)# ip flow-export version {5 | 9}

(config)# ip flow-export destination {collector-ip} {udp-port}

(config-if)# ip flow ingress

(config-if)# ip flow egress

Show commands are very helpful in quickly assessing NetFlow configuration and operation on a Cisco device. We begin by examining the flow exporter on the device using the abbreviated version of the # show ip flow export command.

This screen shows us that two of the three crucial NetFlow configurations are in place. These are:

The main cache is enabled for NetFlow v5.

The exporter is enabled and exporting to 192.168.74.117 and 10.110.6.183 on port 2055.

We can also see that 152979 flow records (flows) have been exported in 5101 UDP packets, with no failed exports. While there are no apparent errors, one item is missing: the NTA server we are using is at IP address 10.110.6.196, which is not listed as an export destination. The 10.110.6.183 address was the old NTA server IP, so the configuration needs to be updated to export to the new NTA server IP. The below commands correct this situation.

Notice that we removed the old IP export address prior to adding the new one. NetFlow only allows two export destinations, so adding the new address without deleting the old will result in the below error.
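The remove-then-add order can be illustrated with a small Python helper that builds the configuration lines for such a change. This is a sketch only: the function name is hypothetical, the generated lines use the classic IOS ip flow-export destination syntax and should be checked against your platform's documentation, and the two-destination limit is the one described above.

```python
MAX_DESTINATIONS = 2  # limit described in the text above


def replace_export_destination(current, old, new, port=2055):
    """Return config lines that swap one export destination for another.

    current: destination IPs now configured on the device.
    old, new: the address to remove and the address to add.
    """
    commands = []
    remaining = list(current)
    if old in remaining:
        # Free a slot first, otherwise the add below would be rejected.
        commands.append(f"no ip flow-export destination {old} {port}")
        remaining.remove(old)
    if len(remaining) >= MAX_DESTINATIONS:
        raise ValueError("both export destinations are in use; remove one first")
    if new not in remaining:
        commands.append(f"ip flow-export destination {new} {port}")
    return commands


commands = replace_export_destination(
    ["192.168.74.117", "10.110.6.183"], "10.110.6.183", "10.110.6.196")
```

With the addresses from this example, the helper returns the removal of 10.110.6.183 followed by the addition of 10.110.6.196, both on port 2055. Passing an old address that is not configured while both destinations are in use raises an error, mirroring the device's refusal to accept a third destination.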

The next configuration to check is to see if NetFlow has been applied to the interface we wish to monitor.

A simple method is to use the #show ip flow interface command, as seen below.

The above screen shows which interfaces have NetFlow applied and the flow direction being monitored. Ingress is defined as data flowing into the interface from the network, while egress is the opposite.

Now that we have corrected the issue with the exporter IP address and have verified that the interfaces are configured, we will look at the packet capture on the NTA server again.

We can see that NetFlow packets are being received at the NTA server interface. Looking at the NTA Summary view seen below we can see that NTA is processing these packets by noting the correct time stamp on the Last Received NetFlow column.

Drilling down to the NTA Interface Details page verifies that NetFlow is being properly received, processed and displayed, as shown below.

NetFlow Version 9 Specific Issues

Cisco implemented version 9 to create a structure for selective flow field exporting. This feature is known as Flexible NetFlow (FNF). Version 5 exporters send seven key fields and a handful of non-key fields. Key fields are used to define a unique flow. Most key fields are read from the packet's IP and transport headers; the source interface index is supplied by the exporting device. The version 5 key fields are:

Source IP

Destination IP

Protocol

Source port

Destination port

Source interface index

Type of Service

The below screen shot shows these key fields along with several non-key version 5 fields.

The above flow is for an ICMP (protocol 1) packet. ICMP does not use a port number, so the exporter sets the port to 0. Version 9 not only allows you to add many additional non-key fields, but also allows you to redefine the key fields. This can cause issues for a NetFlow collector attempting to identify unique flows. NTA processes NetFlow version 9 without issue unless FNF is configured on the exporter and the seven version 5 key fields are not all included in the configuration.

Some devices only support version 9 and this is typically not an issue. To ensure that FNF exports are processed by NTA, be certain to include all seven of the version 5 key fields.
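A quick way to audit an FNF record definition is to compare its key fields against the seven version 5 key fields. The Python sketch below does this with a set difference. The field labels are hypothetical names chosen for the example, not Cisco configuration keywords, so map them to the match statements used on your platform.

```python
# The seven version 5 key fields, using illustrative labels.
V5_KEY_FIELDS = frozenset({
    "source_ip",
    "destination_ip",
    "protocol",
    "source_port",
    "destination_port",
    "input_interface",
    "tos",
})


def missing_key_fields(configured):
    """Return the version 5 key fields absent from an FNF record, sorted."""
    return sorted(V5_KEY_FIELDS - set(configured))


# A record that keys only on addresses and protocol.
gaps = missing_key_fields({"source_ip", "destination_ip", "protocol"})
```

An empty result means that all seven key fields are present; any name returned identifies a field to add to the record before expecting NTA to process the export.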

FNF enabled version 9 exporters send two types of flow information, flow templates and flow records. Both are required for FNF to function. If you have a version 9 exporter and the NetFlow source appears and disappears randomly, this may be due to the flow template expiring on the NTA server.

The below packet capture shows version 9 FNF data being received but no template is available to define the flow fields.

The flow is telling the NTA server to refer to template 265, but the server does not have that template in memory. To fix this issue add the following to the device NetFlow configuration.

(config)#flow-export template timeout-rate 1

This will configure the exporter to send the template every minute.
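The dependency between flow records and templates can be modeled with a small Python sketch. It is a toy collector-side template store, not NTA's implementation: the class name, the `lifetime` value, and the field names are assumptions for illustration.

```python
class TemplateCache:
    """Toy collector-side store of version 9 templates, keyed by template ID."""

    def __init__(self, lifetime=1800.0):
        # Seconds a template stays usable without being re-sent (assumed value).
        self.lifetime = lifetime
        self._templates = {}  # template_id -> (fields, received_at)

    def store(self, template_id, fields, now):
        """Record a template, or refresh it when the exporter re-sends it."""
        self._templates[template_id] = (fields, now)

    def lookup(self, template_id, now):
        """Return the field list, or None if the template is unknown or expired."""
        entry = self._templates.get(template_id)
        if entry is None:
            return None
        fields, received_at = entry
        if now - received_at > self.lifetime:
            del self._templates[template_id]
            return None
        return fields


cache = TemplateCache(lifetime=1800.0)
before = cache.lookup(265, now=0.0)          # flow record arrives before its template
cache.store(265, ["source_ip", "destination_ip"], now=10.0)
during = cache.lookup(265, now=60.0)         # template known, record can be decoded
after = cache.lookup(265, now=2000.0)        # never refreshed, so it has expired
```

In this model a flow record that arrives before its template, or after the template has aged out, cannot be decoded. An exporter that re-sends the template every minute keeps the entry refreshed well inside any reasonable lifetime.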

Some devices re-index their interface index numbers frequently. While Orion and NTA do compensate for interface re-indexing through device discovery, some devices will lose their NTA interface index due to rapid re-indexing. This appears in the NetFlow Events resource, as seen below.

Creating an automatic rediscovery of these devices to run every 15 minutes will rectify this issue. Only include the devices having the re-indexing issues in this discovery job.

The NTA SQL Server

NTA is a very data intensive technology. As such, NTA has the possibility of overtaxing some SQL server implementations. The Managing Orion Performance technical reference contains detailed information on Orion and SQL performance. There are also several options you have to improve SQL performance from the NTA server. These include:

Plan your NetFlow exporters with the network topology in mind. There is not much use in enabling NetFlow on every NetFlow capable device. Data duplication and unnecessary data will only serve to make useful flow information difficult to find.

Use reasonable data retention times. One primary use case for NTA is to quickly identify bottlenecks and the cause behind them. Data overload from unreasonable or unnecessary retention will eventually slow the NTA server.

Make use of the Top Talkers optimization feature in NTA. This feature can greatly enhance NTA performance.

Troubleshooting Vendor Exporters

Troubleshooting J-Flow and sFlow exporters varies greatly depending on the device model, software level, and other factors. Below are useful links for troubleshooting issues on these types of exporters.

Juniper J-Flow

Setting up J-Flow on a J-Series router

http://kb.juniper.net/InfoCenter/index?page=content&id=KB12512

Example working J-Flow configuration [Relevant to J-Series and SRX devices]

http://kb.juniper.net/InfoCenter/index?page=content&id=KB21023

SRX Getting Started - Configure J-Flow

http://kb.juniper.net/InfoCenter/index?page=content&id=KB16677

HP sFlow

HP sFlow Implementation