1 authors: scott poretsky, quarry technologies shankar rao, qwest communications ray piatt, cable...
TRANSCRIPT
1
Authors:
Scott Poretsky, Quarry Technologies
Shankar Rao, Qwest Communications
Ray Piatt, Cable and Wireless
58th IETF Meeting – Minneapolis
Accelerated Stress Benchmarking
draft-ietf-bmwg-acc-bench-framework-00.txtdraft-ietf-bmwg-acc-bench-term-01.txtdraft-ietf-bmwg-acc-bench-meth-00.txt
2
Changes for Terminology (01 from 00)New Terms
• Benchmark Planes• Traffic Profile• Startup Conditions• Failure Conditions
3
Traffic Profile
• Definition: Characteristics of the Offered Load to the DUT used for the Accelerated Stress Benchmarking.
• Traffic Profile is reported as follows:
Parameter UnitsPacket Size(s) bytesPacket Rate(interface) array of packets per secondNumber of Flows numberEncapsulation(flow) array of encapsulation type
4
Benchmark Planes• Benchmark Planes
– Control Plane– Data Plane– Security Plane– Management Plane
• For each plane define:– Configuration Set– Startup Conditions– Instability Conditions
• Framework document provides explanation and examples for defining the Configuration Sets, Startup Conditions, Instability Conditions
5
Failure Conditions
• Unexpected Packet Loss• Unexpected Session Loss• Misrouted Packets (wrong next-hop)• Invalid Permit/Deny• Unexpected Management Access Denial• Errored Management Value
6
Benchmarks
• Success Threshold• Accelerated-Life Test Duration• PASS when Accelerated-Life Test
Duration=Success Threshold
7
Methodology
1. Initiate Startup Conditions
2. Establish Configuration Sets
3. Monitor all Benchmark Planes
4. Apply Instability Conditions
5. Measure Failure Conditions
6. Report Benchmarks
7. Change Configuration Set and/or Instability Conditions for next iteration
8
Next Steps
• Comments?1. Incorporate input from draft-jones-
opsec-01 into Framework for Accelerated Stress Benchamrking
2. Complete and Post Methodology3. Incorporate changes to Terminology
driven by Methodology4. … then Terminology and Framework
possibly could be ready for Last Call for IESG Review?
9
Backup Slides
10
Stress Test Model
Example Instability Conditions
•Remote Loss of Carrier
•Interface Shutdown Cycling Rate
•BGP Route Flap Rate
•IGP Route Flap Rate
•Better Next-Hop
•LSP Reroute Rate
•>100% link utilization
BGP
OSPF
LDP
PIM-SM
Data Plane Module
Management Plane Module
Security Plane Module
...
Router/SwitchDUT
Control Plane
Module
Example Router Configuration• Data Plane
– Line Cards/Interfaces• Security Plane
– Traffic Forwarding with and without filters
• Control Plane– Routing
BGPIGP
– MPLSTEL2/L3 VPNs
• Management Plane– User Access– SNMP– Logging/Debug– Packet Statistics Collection
Note: All Instability Conditions produce a Convergence Event
11
Security Plane– 200 Filters– Each with 200 rules
BGP
IGP
LDP
PIM-SM
Traffic Forwarding
Manageability
Security Policy
Router/SwitchDUT
MPLS-TE
Control Plane 100 Sessions (20 EBGP, 80 IBGP) 300K selected routes, 2M Route Instances 80 Adjacencies 5K routes (2500 inter-area, 2500 intra-area)
20 Adjacencies 1K FECs
5K Groups joined
1.5K tunnels (250 ingress tunnels, 250 egress tunnels, and 1K mid-point tunnels)
Transit node for 100 targeted peers (LDP over RSVP)
FRR enabled for local link protection of LDP sessions
Management Plane 1K SNMP GETs per minute 1 telnet session open every 10 minutes 1 telnet session closed every 10 minutes Enable Debug and Logging
Example Stress TestData Plane
Line Rate on 100% of interfaces
Internet Mix of Packet Sizes
2K Flows Configure Qos
Instability 200K BGP route flaps every 10 minutes 1K IGP route flaps every minute 100 MPLS-TE reroutes every minute Cycle interfaces - 1 Link Loss every
minute >100% offered load to 5 outbound
interfaces
.
.
.
12
DUT Behavior During Stress
• Route Convergence due to BGP flaps• Route and FEC Convergence due to IGP flaps
– Common cause of slow memory leak• Flap link to force nested-Convergence Events
– Common cause of high CPU utilization -> exacerbated when scaling protocols and routes
– FRR protection may cause failover to backup path • Correct next-hop forwarding with Nested-Convergence
Events• Logging and Debug continues
– Common cause of high CPU utilization when router is not under stress