Network Visualization - The SUNY Technology Conference
Network Visualization Bill Kramp Finger Lakes Community College 2010 SUNY Technology Conference Copyright William Kramp 2010. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise, or to republish, requires written permission from the author.
Visualization Outline Viewing SNMP data Visualizing Latency NetFlow and sFlow Log file analysis Network drawings Google Maps
Network events SUNY traffic
Presenter
Presentation Notes
This presentation will discuss the different ways to visualize network data and log files. It will not discuss how to parse the raw data, since that is unique to each campus. Instead, it will show the different techniques available.
Reasons for visualization Images simpler to understand Can expose anomalies Reveal trends Provide historical references Improve aesthetics of reports
Presenter
Presentation Notes
There are many reasons for visualization; it just depends on what works for you.
Visualization Tools Cacti Smokeping Scrutinizer Sparklines Graphviz Google APIs
Presenter
Presentation Notes
Software used to create the content for this presentation
SNMP Data Visualization Bits, bytes, and packets Errors Disk usage CPU and memory usage Temperature Humidity
Presenter
Presentation Notes
Some of the SNMP variables we can monitor
Latency using Smokeping Polls every 5 minutes Send 20 “pings” at each poll period Not limited to just ICMP pings:
HTTP and HTTPS DNS E-mail (SMTP) LDAP and Radius
Data collected: Distribution of the “ping” latency is recorded Records the number of lost “pings”
Presenter
Presentation Notes
Background information on Smokeping
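The kind of data recorded at each poll can be sketched in a few lines of Python. This is only an illustration of summarizing one round of 20 probes (median latency plus loss count); it is not Smokeping's own code, and the function name `summarize_poll` and the sample latencies are made up for the example.

```python
from statistics import median

def summarize_poll(rtts_ms, sent=20):
    """Summarize one poll round.

    rtts_ms holds the round-trip times (ms) of the replies that came back;
    any probe without a reply counts as lost.
    """
    received = len(rtts_ms)
    lost = sent - received
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent,
        "median_ms": median(rtts_ms) if rtts_ms else None,
        "min_ms": min(rtts_ms) if rtts_ms else None,
        "max_ms": max(rtts_ms) if rtts_ms else None,
    }

# Made-up sample: only 10 of the 20 probes were answered.
sample = [31.2, 33.9, 35.0, 32.8, 34.1, 33.9, 36.4, 33.0, 34.7, 33.5]
summary = summarize_poll(sample, sent=20)
print(summary)
```

Keeping the spread (min, median, max) rather than a single average is what lets a latency graph show both the typical response and the jitter around it.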
FLCC Resource Monitoring
Presenter
Presentation Notes
Collect several hundred graphs of SNMP data at FLCC Publish top 24 SNMP and Latency graphs to a “dashboard” which anybody on campus can access Six of the 24 dashboard graphs shown here Top-right: Accuplacer Middle-Right: Saranac Bottom-right: Angel
Environment Visualization
Presenter
Presentation Notes
Monitor temperature and humidity in Equipment and server rooms at all sites
Victor Campus Center Traffic
Presenter
Presentation Notes
Point out 2M utilization by tunneled T1 circuit. 10M used during the day by security cameras. 8am: people start logging in, and they are logged out by 5pm. Usage then doubles overnight when nobody is in the building. The increased bandwidth usage is caused by the security cameras when it gets dark. The dips about every hour during the night are when the motion-activated lights turn on for some reason.
[1]
THE GOOD, THE BAD, AND THE UGLY
THE GOOD
[2]
Presenter
Presentation Notes
The folks from ITEC in Buffalo
Saranac - June 7, 2010
Presenter
Presentation Notes
HTTP pings for Saranac. Median response time is 33.9 ms with no lost HTTP pings (all green samples).
We use Accuplacer for placement testing of students. Used as a baseline for Angel and Saranac Did have external network issues involving Accuplacer a couple of years ago; Smokeping helped isolate the source.
THE BAD
[3]
Presenter
Presentation Notes
Back in September I was hit up about poor performance with the Internet, specifically Angel
Saranac - Sept. 15, 2009
Presenter
Presentation Notes
Problem reported about slow network response with Angel. Checked Accuplacer and Saranac to see if it was a site-wide problem or just Angel. Current average for Saranac is about 33 ms, so we are much faster now than in September 2009.
Angel - September 15, 2009
Presenter
Presentation Notes
HTTPS pings with a median of 1.4 seconds, but no lost requests (all green). Usage of Angel had significantly increased since the previous year with four campuses, FLCC being one of them.
Angel - Sept. 16, 2009, AM
Presenter
Presentation Notes
Performance was fine overnight, but the problem started to return in the morning after 8am.
AND THE UGLY
[4]
Presenter
Presentation Notes
The red milk crate at least keeps the enclosures up off the ground
Saranac Frontend – 9/23/2009
Presenter
Presentation Notes
During the fall of 2009, ITEC was dealing with a cascade of bottlenecks. Now I was getting complaints about poor response times with both Angel and Library. Everything looked fine on the surface. Inset image is Saranac from September 15, 2009. These are response times for the front-end servers, but not the backend database searches.
Saranac Backend – 9/23/2009
Presenter
Presentation Notes
The median is just under one second, but its ping loss is almost 50%. Scripted backend search with “strings” to monitor actual database search response times.
Saranac Backend – 9/28/2009
Presenter
Presentation Notes
Back to normal five days later. I was informed in January that this backend monitoring was causing performance problems with their database, so SUNYconnect wanted me to stop the monitoring. Developing alternatives to monitor backend response time without incurring problems for SUNYconnect’s database or application server.
Packetshaper Class Data
Presenter
Presentation Notes
Found the MIB for accessing PacketShaper class data which should help determine the actual bandwidth usage of Angel
SNMP Data Collection Tools Cacti, MRTG,
Open source VMware appliances available
Zenoss, GroundWork Monitor Open source with commercial extensions
Orion Network Monitoring (SolarWinds) Commercial
Smokeping - Latency Tool Open source (Integrated with Zenoss) http://oss.oetiker.ch/smokeping/ Probes:
NetFlow Developed by Cisco to document network flows by source and destination IPs and ports. It also identifies the IP protocol and ingress interface.
sFlow Similar to NetFlow in the data it collects, but it samples the traffic statistically and reports the samples.
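The sampling idea can be illustrated with a short Python sketch: keep roughly one packet in N at random, then scale the sampled counts back up to estimate the total. This is a simplified model under stated assumptions, not the sFlow agent algorithm or its wire format; the traffic list, the 1-in-100 rate, and the function names are made up for the example.

```python
import random

def sample_packets(packets, rate, rng):
    """Keep each packet with probability 1/rate (random 1-in-N sampling)."""
    return [p for p in packets if rng.random() < 1.0 / rate]

def estimate_totals(samples, rate):
    """Scale the sampled counts back up to estimate the full traffic."""
    return {
        "packets": len(samples) * rate,
        "bytes": sum(size for _, size in samples) * rate,
    }

rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up traffic: 100,000 packets of 500 bytes each, as (protocol, size).
packets = [("udp" if i % 4 == 0 else "tcp", 500) for i in range(100_000)]

samples = sample_packets(packets, rate=100, rng=rng)
estimate = estimate_totals(samples, rate=100)
print(len(samples), estimate)
```

The estimate is close to, but not exactly, the true totals, which is the trade-off sampling makes in exchange for a much lighter load on the switch or router.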
Presenter
Presentation Notes
FLCC runs one licensed version which retains data beyond 24 hours, and allows inspection of FLOGS, which are the raw flows. Handles both NetFlow and sFlow traffic from my Brocade core routers (sFlow) and Packet Shaper (NetFlow). Supports up to five devices.
Presenter
Presentation Notes
Traffic flow for the top item of previous slide, which is the tunneled T1 shown in an earlier slide with the security camera traffic.
Traffic flows with server
Presenter
Presentation Notes
Top Conversations using Scrutinizer Shows source and destination system name/IP and Port number.
NetFlow and sFlow Tools sFlow Developer Tools
http://www.sflow.org/developers/tools.php New NetFlow Collector (NNFC)
I receive daily e-mails about logins, changes, etc., and the total number of events recorded for each switch and router. Each switch/router is different, and the number of events usually drops on weekends. If I logged into a switch three times, six events would be logged: 3 for logging in and 3 for logging out. If a device rebooted 10 times on that switch, that could generate 30 events. The total events on that switch would be 36 for the day, as an example. Had to keep a mental picture of previous days' activity to know if things were getting better or worse.
Sparklines
Proposed by Edward Rolf Tufte, statistician and Professor Emeritus at Yale
“Small, intense, simple data words” [5]
Provide a visual representation of data without overpowering surrounding text.
Switch & Router Sparklines
Presenter
Presentation Notes
HTML page. Sparklines are small compared to MRTG, Cacti, or Smokeping png files: typical png files are 64 KB, while my Sparkline png files are about 280 bytes each. Shows the past 14 days of event counts, with the 14-day high and low, as well as the last reading (yesterday's count of events). Explanation of line for CDG01-A112-R1-NSW28-FGS-648: peak of 699 events for 14 days, with a low of 3 on the weekends. There were 409 events on June 8, 2010. Some devices have little activity while others show a lot.
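The summary that sits beside each sparkline (14-day high, low, and last reading) is easy to sketch in Python. The example below draws the line with Unicode block characters instead of generating a png, so it is a stand-in for the idea and not the method used for these slides. The daily counts are made up apart from the high of 699, low of 3, and last value of 409 quoted in the notes, and the rule of raising the alert on the latest reading is an assumption.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight heights, lowest to highest

def sparkline(counts):
    """Map each count onto one of eight block characters."""
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid dividing by zero on a flat line
    return "".join(BLOCKS[(c - lo) * (len(BLOCKS) - 1) // span] for c in counts)

def summarize(device, counts, alert_at=2000):
    """Build one table row: sparkline plus high, low, and last reading."""
    return {
        "device": device,
        "spark": sparkline(counts),
        "high": max(counts),
        "low": min(counts),
        "last": counts[-1],
        # Assumption: the alert looks only at the most recent day.
        "alert": counts[-1] >= alert_at,
    }

# Hypothetical 14 days of event counts for one switch.
counts = [412, 530, 3, 5, 699, 450, 388, 420, 401, 4, 3, 377, 395, 409]
row = summarize("CDG01-A112-R1-NSW28-FGS-648", counts)
print(row["device"], row["spark"], row["high"], row["low"], row["last"])
```

Scaling each line to its own minimum and maximum keeps quiet devices readable, at the cost that two lines cannot be compared by height alone; the printed high and low fill that gap.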
Switch & Router Sparklines
Presenter
Presentation Notes
Bad port connections in two different labs (typical end of semester problems) Notice troughs on weekends when systems are never turned on or rebooted Red color code “alert” is triggered at 2000+ events (arbitrary number that works for FLCC)
Switch & Router Sparklines
Presenter
Presentation Notes
Second line shows a gradual increase over two weeks with no dips over the weekends, but has not tripped the 2000-event marker. Wireless AP was doing warm reboots, but otherwise functioning; no complaints from wireless users. No errors on the port, and the controller management didn’t report any problems. Removed power for a clean boot of the wireless AP, which cleared the problem.
Sparkline HTML Table Code <td>CDG01-A112-R1-NSW28-FGS-
Filtering of data to isolate the server using “grep”, and then processing to classify by country codes. Can specify how to group the source and destination IPs by different size masks: 8, 16, 22, 32, etc.
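Grouping addresses by mask size can be sketched with Python's standard `ipaddress` module. This is an illustration of the idea only, not the graph-prep.pl script; the function name `group_by_mask` is hypothetical and the addresses come from the reserved documentation ranges.

```python
import ipaddress
from collections import Counter

def group_by_mask(addresses, prefix_len):
    """Count addresses per enclosing network of the given prefix length."""
    counts = Counter()
    for addr in addresses:
        # strict=False lets a host address stand in for its network.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts

# Made-up addresses from the documentation ranges.
addresses = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.9"]

by_8 = group_by_mask(addresses, 8)
by_24 = group_by_mask(addresses, 24)
by_32 = group_by_mask(addresses, 32)
print(by_8)
print(by_24)
print(by_32)
```

A short mask such as /8 collapses many hosts into one node and keeps a drawing readable; /32 keeps every host separate.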
Graphviz “dot” format
grep "src=172\.19\.12\." dot-20100115.log |./graph-prep.pl -c -e GB -t 32|./dot-prep.pl
Script generated by dot-prep.pl for the Graphviz dot application:
This shows the complete text to create the drawing. A good way to learn the format is to code the small drawings by hand and play with them before scripting.
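For anyone who wants to move from hand-coded drawings to scripting, the sketch below shows how little text a small drawing needs. It is written in Python and is not the dot-prep.pl script; the function name `to_dot`, the host address, and the edge colors are made up for illustration.

```python
def to_dot(edges, name="traffic"):
    """Return Graphviz dot text for a list of (source, destination, color) edges."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]  # draw left to right
    for src, dst, color in edges:
        lines.append(f'  "{src}" -> "{dst}" [color={color}];')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical edges: one internal host talking to two country codes.
edges = [
    ("172.19.12.5", "GB", "black"),  # passed traffic
    ("172.19.12.5", "CA", "red"),    # blocked traffic
]
dot_text = to_dot(edges)
print(dot_text)
```

Saving the output to a file such as traffic.dot and running `dot -Tpng traffic.dot -o traffic.png` renders the drawing.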
Passed and blocked traffic
Source on left
Middle destination subnets aggregated to /8
Outbound traffic
Blocked “violation” traffic colored red
Egress filters for Windows ports, SMTP, TFTP and others.
Presenter
Presentation Notes
Small image of larger file that shows blocked and passed traffic
Skype Activity
UDP connections
Different socket numbers used
Uses countries around the globe
US and Canada networks most used
Presenter
Presentation Notes
Skype traffic. Here is a glimpse of the activity seen from a single PC using hundreds of different UDP ports to systems in countries around the world. “CA” equals Canada. There is a lot of activity even when the person is not using it; the new policy is to turn Skype off when not using it.
Analysis of log files
Source Node      | Event Node                    | Destination Node    | Use Case
Source address   | Destination Address           | Destination Port    | Port scan identification
Source address   | Destination Port              | Destination Address | Horizontal scans on same port
Source address   | Action - blocked or permitted | Destination Port    | Firewall rule set validation
Destination Port | Source address                | Action              | Identify machines that probe firewall rule set
[6]
Presenter
Presentation Notes
Different ways of viewing data to identify network activity Book listed in resources
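The first two use cases can also be checked numerically before drawing anything. The Python sketch below groups log records the same way the node arrangements do: one source reaching many ports on a single destination suggests a port scan, and one source reaching many destinations on a single port suggests a horizontal scan. The thresholds, function names, and records are all made up for illustration.

```python
from collections import defaultdict

def find_port_scans(records, min_ports=5):
    """Source/destination pairs where the source touched many different ports."""
    ports_seen = defaultdict(set)
    for src, dst, dport in records:
        ports_seen[(src, dst)].add(dport)
    return {pair: sorted(ports) for pair, ports in ports_seen.items()
            if len(ports) >= min_ports}

def find_horizontal_scans(records, min_hosts=5):
    """Source/port pairs where the source touched many different destinations."""
    hosts_seen = defaultdict(set)
    for src, dst, dport in records:
        hosts_seen[(src, dport)].add(dst)
    return {pair: sorted(hosts) for pair, hosts in hosts_seen.items()
            if len(hosts) >= min_hosts}

# Made-up records: (source address, destination address, destination port).
records = (
    [("203.0.113.9", "192.0.2.10", p) for p in (21, 22, 23, 25, 80, 443)]
    + [("198.51.100.5", f"192.0.2.{h}", 445) for h in range(1, 7)]
    + [("192.0.2.50", "192.0.2.10", 80)]
)

port_scans = find_port_scans(records)
horizontal_scans = find_horizontal_scans(records)
print(port_scans)
print(horizontal_scans)
```

In a link graph the same patterns show up as a fan of edges from one source node, which is why the choice of event node matters.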
Network Drawings
Presenter
Presentation Notes
http://www.computerhistory.org/internet_history/full_size_images/1969_4-node_map.gif Four original nodes of the ARPANET: SRI (Stanford Research Institute), UC Santa Barbara, UC Los Angeles, Utah
Redundant Brocade Routers
Presenter
Presentation Notes
Two active 10-Gbps interfaces per router. 21 active 1-Gbps SX-optic interfaces per router. 44 1-Gbps copper connections per router.
Presenter
Presentation Notes
Standard Visio drawing. Doesn’t include copper connections Still a busy drawing without many details – hard to add any more switches to it. Information like port settings, room, rack, etc missing from drawing
The final straw
Presenter
Presentation Notes
A development in the past five years has complicated the use of Visio to document the network architecture.
VMware Clusters
Presenter
Presentation Notes
Ten ports per server: redundancy for management, vMotion, and iSCSI, with 4 ports for VM data. Three servers currently in clusters, with room for a fourth. Dual HP LeftHand SANs.
Spreadsheet of connections
Presenter
Presentation Notes
Had to switch to solely documenting the network in a spreadsheet. Difficult to visualize the physical network connections
Graph Visualization (Graphviz)
Presenter
Presentation Notes
Shows network connections between racks, but not any servers or storage area networks (SANs). Could automate some of the process by using Foundry Discovery Protocol, which works just like Cisco CDP. VMware can be configured to support CDP. Visualization of the spreadsheet detected errors in the document, and exposed redundancy that was missing in the network design.
Router rack, Rack 9, Servers
Presenter
Presentation Notes
This drawing shows the connections between racks, and connections to the SAN and servers on the rack with their port numbers.
Visualization can be used to see what would happen if the primary data center is destroyed. Filtering the spreadsheet leaves the relevant data to draw the remaining network closets. Original drawing is very large, with 24 closets plus the Data Center. If the primary data center is destroyed, 15 of the data closets are disabled. The “egrep” command removes references to the data center, SAN and servers (SRV), and some other closets that are not needed. Administration and core offices will remain online. Kirk Anne from Geneseo is presenting on table top DR drills in this room at 2:15 pm today.
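The filtering step can be sketched in Python as the equivalent of an inverted egrep over the connection spreadsheet. The column names (`from`, `to`, `port`), the device labels, and the pattern are all hypothetical, since the real spreadsheet layout is not shown here.

```python
import csv
import io
import re

def surviving_links(csv_text, lost_pattern):
    """Return the rows whose endpoints do not match the 'destroyed' pattern."""
    lost = re.compile(lost_pattern)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if not lost.search(row["from"]) and not lost.search(row["to"])]

# Hypothetical export of the connection spreadsheet.
csv_text = """from,to,port
DC-R1,CLOSET-A,1/1
CLOSET-A,CLOSET-B,1/2
DC-R1,SAN-1,2/1
SRV-3,DC-R1,2/2
CLOSET-B,ADMIN,1/3
"""

# Drop anything in the data center, the SAN, or the servers.
remaining = surviving_links(csv_text, r"^(DC|SAN|SRV)-")
for row in remaining:
    print(row["from"], "->", row["to"], row["port"])
```

The rows that survive can then be handed to whatever step generates the Graphviz drawing, giving a picture of the network that would remain.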
Google Maps Google Maps Google Charts Google Docs Google Data Protocol Thematic Mapping
Presenter
Presentation Notes
The Thematic Mapping API was developed by Bjørn Sandvik. It shows relationships between data and geo-locations.
FLCC Event Visualization Data source:
Canandaigua firewall log files Inbound activity only Anti-Virus events IDS Alerts Period - Month of May 2010 Who were the top three global sources
(countries) for these events?
Presenter
Presentation Notes
US #1 with 7496 events
Presenter
Presentation Notes
Canada #2 with 3835
Presenter
Presentation Notes
South Korea 3rd with 511
Thematic Mapping Google Maps API
Requires key from Google for URL Thematic Mapping API
http://api.thematicmapping.org/tmapi-0.1.js Country Border Coordinates
Small sample of the data used for the Thematic map
SUNY Thematic Prism Map
Presenter
Presentation Notes
Prism map with Google charts. While working on the presentation a couple of months ago, I had the thought to show network traffic by SUNY campus. Gave the task to my student aide at the time, Kyle Bagshaw, to perform a little research: a list of SUNY schools with CIDRs, coordinates, and FAFSA codes. He is now a college employee, hired as a part-time Network Technician.
SUNY Network Traffic Data source:
Canandaigua firewall logs Inbound and outbound connections
10 day sample period: May 24 – May 28, 2010 May 31 – June 4, 2010
Source of SUNY network ranges
Primary source of CIDR’s Google (or other search engines)?
Nope, too much work Doesn’t clearly identify address ranges Was used to identify 3 campus networks
Nmap or ping scans? Nope, unethical and probably illegal
ARIN.net American Registry for Internet Numbers
Presenter
Presentation Notes
Classless Inter-Domain Routing (CIDR)
Presenter
Presentation Notes
Had to scale the network activity to keep things in perspective. Event counts between 0 and 10K were multiplied by 100, colored yellow Event counts between 10K and 1M were multiplied by 10, colored orange Event counts above 1M were not altered, colored red
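The scaling rule can be written as a small Python function. This is a sketch of the rule as described in the notes; how the exact boundary values (10,000 and 1,000,000) are treated is an assumption, and the function name is hypothetical.

```python
def scale_count(count):
    """Return (scaled height, color) for an event count."""
    if count < 10_000:
        return count * 100, "yellow"   # small counts are boosted the most
    if count <= 1_000_000:             # assumption: exactly 1M still gets x10
        return count * 10, "orange"
    return count, "red"                # very large counts are left alone

for n in (561, 50_000, 2_000_000):
    print(n, scale_count(n))
```

The color carries the true order of magnitude, so a tall yellow prism is not mistaken for more traffic than a red one.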
Presenter
Presentation Notes
Next slides zoom in to view SUNY campus traffic by connections
Presenter
Presentation Notes
FLCC is in red System Administration (orange) in Albany ITEC (orange) in Buffalo
Presenter
Presentation Notes
Incorporated the use of Google charts. This shows 47 connections between FLCC and Purchase for the 10-day sample Graph shows the distribution of services for those 47 connections
Presenter
Presentation Notes
Noticed UDP traffic on port 137 between FLCC and Westchester Inbound and outbound traffic is combined, so we can’t tell the direction Accepted and blocked traffic is also counted
Port 137/udp to Westchester # grep "167.206.248.55" local7 |grep "137/udp" Apr 26 10:11:10 date=2010-04-26 time=10:11:10
KML - Keyhole Markup Language, submitted by Google to the Open Geospatial Consortium Third field for coordinates is height – 561 times 100 because the value is under 10,000. Values of 10K to 1M multiplied by 10; no multiplication of values over 1M. Polygon needs to be drawn counter-clockwise
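The ordering and height rules can be sketched in Python. The example builds only the text that goes inside a KML `<coordinates>` element (longitude,latitude,altitude tuples separated by spaces) and uses the shoelace formula to put the ring into counter-clockwise order. The square outline and the function names are made up for illustration, and the rest of the KML document is omitted.

```python
def signed_area(ring):
    """Shoelace formula; positive when (lon, lat) points run counter-clockwise."""
    pairs = zip(ring, ring[1:] + ring[:1])
    return sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs) / 2.0

def kml_coordinates(ring, height):
    """Return 'lon,lat,height' tuples, counter-clockwise and closed."""
    if signed_area(ring) < 0:   # clockwise input, so reverse it
        ring = ring[::-1]
    closed = ring + ring[:1]    # repeat the first point to close the ring
    return " ".join(f"{lon},{lat},{height}" for lon, lat in closed)

# Made-up square outline listed clockwise, with a height of 561 x 100.
outline = [(0, 0), (0, 1), (1, 1), (1, 0)]
coords = kml_coordinates(outline, 561 * 100)
print(coords)
```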
Raw Log File Entry Apr 30 00:00:00 date=2010-04-30 time=00:00:00
Fields used to generate Google maps Service: port number and protocol Source or destination IP Source and destination were handled separately, but “finished” code would handle them in a single pass
Processing Raw Data File grep "type=traffic" /2010/04/30/local7 |grep
Correlation of source IP address to a SUNY campus CIDR with the appropriate FAFSA code assigned. Also sort and count unique FAFSA codes and services 002668 = Alfred University 002711 = Cornell
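The correlation and counting step can be sketched in Python with the standard `ipaddress` module. The CIDR blocks below are reserved documentation ranges standing in for real campus networks, so the pairing of block to FAFSA code is purely hypothetical, as are the function names and log records.

```python
import ipaddress
from collections import Counter

# Hypothetical CIDR-to-FAFSA table; the ranges are documentation addresses,
# not the real campus networks.
CAMPUS_CIDRS = {
    "192.0.2.0/24": "002668",
    "198.51.100.0/24": "002711",
}
NETWORKS = [(ipaddress.ip_network(cidr), code) for cidr, code in CAMPUS_CIDRS.items()]

def fafsa_for(ip):
    """Return the FAFSA code whose CIDR contains ip, or None if there is none."""
    addr = ipaddress.ip_address(ip)
    for net, code in NETWORKS:
        if addr in net:
            return code
    return None

def count_by_campus_and_service(records):
    """Count (FAFSA code, service) pairs, skipping addresses outside the table."""
    counts = Counter()
    for ip, service in records:
        code = fafsa_for(ip)
        if code is not None:
            counts[(code, service)] += 1
    return counts

# Made-up (address, service) pairs as they might be pulled from the log.
records = [
    ("192.0.2.15", "80/tcp"),
    ("192.0.2.99", "80/tcp"),
    ("198.51.100.7", "137/udp"),
    ("203.0.113.1", "25/tcp"),   # not in the table, so ignored
]
counts = count_by_campus_and_service(records)
for (code, service), n in sorted(counts.items()):
    print(code, service, n)
```

A linear scan over the table is fine for a few dozen campuses; a much larger table would call for a prefix tree or similar structure.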
Visualization Resources Security Data Visualization
Conti, G. (2007). No Starch Press
Applied Security Visualization Marty, R. (2009). Addison Wesley
Atlas of Cyberspace Dodge, M. & Kitchin, R. (2001). Addison Wesley
References
[1] Removed from slides
[2] http://www.itec.suny.edu/info/staff.htm
[3] http://www.cnn.com/2009/SHOWBIZ/TV/12/25/charlie.sheen.arrested/
[4] http://serverfault.com/questions/9345/worst-wiring-cabling-youve-seen
[5] http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1
[6] Security Data Visualization. Conti, G. (2007). No Starch Press