Performance & Troubleshooting @ ESnet Mary Hester ESnet Science Engagement Lawrence Berkeley National Laboratory


Page 1: Performance & Troubleshooting @ ESnet

Performance & Troubleshooting @ ESnet Mary Hester ESnet Science Engagement Lawrence Berkeley National Laboratory

Page 2: Performance & Troubleshooting @ ESnet

Main Points

•  Troubleshooting tools
•  Troubleshooting methodology
•  Case studies

5/16/16 2

Page 3: Performance & Troubleshooting @ ESnet


Page 4: Performance & Troubleshooting @ ESnet

Public perfSONAR Servers (May 2016)

•  ESnet: 50
   –  mostly 10G, includes a 40G host in Boston
   –  About 50% are now a ‘combined’ throughput/latency host
•  GEANT: 22
•  Internet2: 3
•  Around 1600 publicly registered servers

May 16, 2016 © 2016, http://www.perfsonar.net 4

Page 5: Performance & Troubleshooting @ ESnet

Default perfSONAR Throughput Tool: iperf3

•  iperf3
   –  New implementation of iperf from scratch

•  More at: http://fasterdata.es.net/performance-testing/network-troubleshooting-tools/iperf-and-iperf3/
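As a rough illustration of working with iperf3 results, the sketch below reads the JSON report that iperf3 produces with its `--json` option and pulls out the received throughput and the sender's retransmit count. The function name `summarize_iperf3` and the sample report are made up for this example; the sample is synthetic, not output from a real test, and the retransmit field is only reported on some platforms.

```python
import json


def summarize_iperf3(json_text):
    """Return (received throughput in Gbit/s, sender retransmits or None)."""
    report = json.loads(json_text)
    end = report["end"]
    gbps = end["sum_received"]["bits_per_second"] / 1e9
    # Retransmit counts are not reported on every platform, so tolerate absence.
    retransmits = end.get("sum_sent", {}).get("retransmits")
    return gbps, retransmits


# Synthetic report fragment (assumption): only the fields used above.
sample = json.dumps({
    "end": {
        "sum_sent": {"bits_per_second": 9.41e9, "retransmits": 12},
        "sum_received": {"bits_per_second": 9.40e9},
    }
})

gbps, retransmits = summarize_iperf3(sample)
print(f"{gbps:.2f} Gbit/s, retransmits: {retransmits}")
```

A non-zero retransmit count on an otherwise idle path is often the first hint of packet loss.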

6/2/15 5

Page 6: Performance & Troubleshooting @ ESnet

A small amount of packet loss makes a huge difference in TCP performance

[Figure: TCP throughput plot. Distance categories: Local (LAN), Metro Area, Regional, Continental, International. Series: Measured (TCP Reno), Measured (HTCP), Theoretical (TCP Reno), Measured (no loss).]

With loss, high performance beyond metro distances is essentially impossible
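The slide does not say which model is behind the theoretical TCP Reno curve. A common choice is the Mathis et al. approximation, throughput ≤ (MSS / RTT) × (1 / √p), and the sketch below assumes that model with the constant factor taken as 1. The loss rate and the round-trip times per distance category are hypothetical values picked only to show the trend.

```python
import math


def mathis_bound_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate upper bound on TCP Reno throughput in bits per second."""
    return (mss_bytes * 8 / rtt_s) / math.sqrt(loss_rate)


MSS = 1460   # bytes, typical for a 1500-byte MTU
LOSS = 1e-4  # hypothetical loss rate: 1 packet in 10,000

# Hypothetical round-trip times for each distance category.
for label, rtt_ms in [("Local (LAN)", 1), ("Metro Area", 5), ("Regional", 10),
                      ("Continental", 50), ("International", 100)]:
    bound = mathis_bound_bps(MSS, rtt_ms / 1000, LOSS)
    print(f"{label:15s} RTT {rtt_ms:3d} ms -> {bound / 1e6:8.1f} Mbit/s")
```

Because the bound scales with 1/RTT, the same loss rate that is harmless on a LAN caps a 100 ms path at roughly one hundredth of the LAN figure.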

Page 7: Performance & Troubleshooting @ ESnet

Eli’s Testing Methodology

1.  Segment-to-segment testing is unlikely to be helpful
2.  Run long-distance tests
3.  Testers need to be already deployed when you start troubleshooting


Page 8: Performance & Troubleshooting @ ESnet

Wide Area Testing – Problem Statement

[Diagram: a University Campus and a National Laboratory connected across a WAN, with poor performance between them. Link labels: 10GE, Nx10GE. Measurement hosts: perfSONAR, Border perfSONAR, Science DMZ perfSONAR.]

Page 9: Performance & Troubleshooting @ ESnet

Eli’s Methodology – WAN Troubleshooting

[Diagram: the path between the lab and the campus, with perfSONAR hosts (including Border perfSONAR and Science DMZ perfSONAR) along it and poor end-to-end performance. Latency labels: Lab ~1 msec, ESnet path ~30 msec, Internet2 path ~15 msec, Regional Path ~2 msec, Campus ~1 msec. Link labels: 10GE, Nx10GE, 100GE.]

Page 10: Performance & Troubleshooting @ ESnet

Wide Area Testing – Long Clean Test

[Diagram: the same path and perfSONAR hosts, with a long test overlaid. Test annotations: 48 msec; "Clean, Fast". Latency labels: Lab ~1 msec, ESnet path ~30 msec, Internet2 path ~15 msec, Regional Path ~2 msec, Campus ~1 msec. Link labels: 10GE, Nx10GE, 100GE.]

Page 11: Performance & Troubleshooting @ ESnet

Poorly Performing Tests Illustrate Likely Problem Areas

[Diagram: the same path and perfSONAR hosts, with several tests overlaid. Test annotations: 48 msec, 49 msec, 49 msec; some tests are marked "Clean, Fast" and others "Dirty, Slow". Latency labels: Lab ~1 msec, ESnet path ~30 msec, Internet2 path ~15 msec, Regional Path ~2 msec, Campus ~1 msec. Link labels: 10GE, Nx10GE, 100GE.]
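The reasoning behind comparing clean and dirty long tests can be sketched as a set operation: any segment crossed by a clean test is cleared, and whatever remains on every dirty test is the likely problem area. The segment names and test results below are hypothetical (the slides do not state which tests covered which segments), and the intersection step assumes a single fault.

```python
def suspect_segments(tests):
    """tests: iterable of (segments, is_clean). Returns the set of suspect segments."""
    tests = list(tests)
    # Any segment crossed by a clean, fast test is considered cleared.
    cleared = set()
    for segments, is_clean in tests:
        if is_clean:
            cleared.update(segments)

    # Suspects are uncleared segments shared by every dirty test
    # (single-fault assumption).
    suspects = None
    for segments, is_clean in tests:
        if not is_clean:
            remaining = set(segments) - cleared
            suspects = remaining if suspects is None else suspects & remaining
    return suspects or set()


# Hypothetical results: segment lists and outcomes are illustrative only.
results = [
    (["lab", "esnet", "internet2", "regional", "campus"], False),  # dirty, slow
    (["esnet", "internet2", "regional"], True),                    # clean, fast
    (["lab", "esnet", "internet2", "regional"], True),             # clean, fast
]
print(suspect_segments(results))
```

With these inputs only the campus segment is left uncleared, which is where closer inspection would start.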

Page 12: Performance & Troubleshooting @ ESnet

Troubleshooting Case studies

4/22/16 12

Page 13: Performance & Troubleshooting @ ESnet

Troubleshooting – Host Tuning

•  Long path (~70ms), single stream TCP, 10G cards, tuned hosts
•  Why the nearly 2x uptick? Adjusted net.ipv4.tcp_rmem/wmem maximums (used in auto tuning) to 64M instead of 16M.
•  As the path length/throughput expectation increases, this is a good idea. There are limits (e.g. beware of buffer bloat on short RTTs)
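A quick bandwidth-delay product calculation shows why the buffer maximum matters on this path. The sketch below assumes a 10 Gbit/s link and a 70 ms round-trip time, and reads "16M" and "64M" as MiB. It gives only an upper bound: a single TCP stream cannot exceed window / RTT, and the usable window is somewhat smaller than the configured buffer, so measured gains (such as the nearly 2x reported above) will differ.

```python
MIB = 1024 * 1024


def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return rate_bps * rtt_s / 8


def window_limited_bps(window_bytes, rtt_s):
    """Upper bound on single-stream throughput for a given window size."""
    return window_bytes * 8 / rtt_s


RTT = 0.070   # seconds (~70 ms path)
LINK = 10e9   # bits per second (10G cards)

print(f"BDP: {bdp_bytes(LINK, RTT) / 1e6:.1f} MB")
for size_mib in (16, 64):
    bound = window_limited_bps(size_mib * MIB, RTT)
    print(f"{size_mib} MiB max buffer -> at most {bound / 1e9:.2f} Gbit/s")
```

The path needs about 87.5 MB in flight to run at line rate, so a 16 MiB ceiling caps a single stream well below 10 Gbit/s regardless of how clean the path is.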


Page 14: Performance & Troubleshooting @ ESnet

Troubleshooting – Host Tuning

•  A more complete view – showing the role of MTUs and host tuning (i.e. ‘it’s all related’):


Page 15: Performance & Troubleshooting @ ESnet

Troubleshooting – Host Tuning


Page 16: Performance & Troubleshooting @ ESnet

HIDDEN SLIDE

•  OWAMP shows packet loss increasing as utilization increases

Page 17: Performance & Troubleshooting @ ESnet

Monitoring Transatlantic Links


Page 18: Performance & Troubleshooting @ ESnet

Monitoring Transatlantic Links


Page 19: Performance & Troubleshooting @ ESnet

HIDDEN SLIDE

•  perfSONAR testing can be made more precise – this is what happens when you use larger buffers and the omit flag
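For reference, the two knobs mentioned here map onto iperf3's `--window` option (socket buffer size) and `--omit` option (skip the first N seconds, so TCP slow start is left out of the reported statistics). The sketch below only builds a direct iperf3 command line; perfSONAR normally launches tests through its own scheduler, which is not shown. The host name, durations, and buffer size are hypothetical.

```python
import shlex


def iperf3_command(server, seconds=30, omit=5, window="64M"):
    """Build an iperf3 client command with a larger buffer and an omit period."""
    return [
        "iperf3",
        "--client", server,
        "--time", str(seconds),   # length of the measured test in seconds
        "--omit", str(omit),      # leading seconds excluded from the results
        "--window", window,       # requested socket buffer size
        "--json",                 # machine-readable report
    ]


# "ps.example.org" is a placeholder, not a real test host.
cmd = iperf3_command("ps.example.org")
print(shlex.join(cmd))
```

The operating system may cap the requested buffer at its configured maximum, so the host tuning limits still apply.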

Page 20: Performance & Troubleshooting @ ESnet

ESnet Science Engagement Lawrence Berkeley National Laboratory

Thank you!