1
IEPM/PingER Project
Les Cottrell, SLAC
DoE 2004 PI Network Research Meeting, FNAL, Sep 15-17 '04
www.slac.stanford.edu/grp/scs/net/talk03/scidac-pinger-sep04.ppt
2
Outline
• PingER
  – Purpose etc.
  – Methodology
  – Results
• PingER-NG ≡ IEPM-BW
  – Low network impact bandwidth tool (INCITE)
  – Traceroute viz
  – Topology (INCITE)
3
PingER
• Uses ping to provide lightweight performance monitoring:
  – < 100 bits/s per pair measured
  – No software to install at remote sites
  – Measures loss, RTT, reachability, jitter
• For planning, troubleshooting
• Originally (1990s) for HENP sites
• More recently also to characterize the Digital Divide
  – ICFA/SCIC, Internet2 Hard to Reach Places, WSIS, ICTP/eJDS
4
Methodology
• Use ubiquitous ping
• Every 30 minutes from monitoring site to target:
  – 1 ping to prime caches
  – By default send 11 x 100-byte packets followed by 10 x 1000-byte packets
• Low network impact + no software to install / configure / maintain at remote sites + no passwords / accounts needed = good for developing sites / regions
• Record loss & RTT (+ reorders, duplicates)
• Derive throughput, jitter, unreachability …
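The derivation step can be sketched as follows. This is a minimal illustration, not PingER's actual code: the function names are hypothetical, jitter is assumed here to be the mean absolute difference between successive RTTs, and throughput is assumed to come from the well-known Mathis et al. approximation (rate ~ MSS / (RTT * sqrt(loss))), which the slide itself does not name.

```python
import math
from statistics import mean

def summarize_pings(rtts_ms, sent):
    """Summarize one ping set: rtts_ms holds RTTs (ms) of replies, sent is the count sent."""
    received = len(rtts_ms)
    loss = 1.0 - received / sent
    if received == 0:
        # No replies at all: treat the target as unreachable for this set.
        return {"loss": 1.0, "unreachable": True}
    # Assumption: jitter = mean absolute difference between successive RTTs.
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return {
        "loss": loss,
        "unreachable": False,
        "min_rtt_ms": min(rtts_ms),
        "avg_rtt_ms": mean(rtts_ms),
        "jitter_ms": mean(diffs) if diffs else 0.0,
    }

def mathis_throughput_kbits(mss_bytes, rtt_ms, loss):
    """Mathis et al. estimate: rate ~ (MSS / RTT) * sqrt(3/2) / sqrt(loss)."""
    if loss <= 0 or rtt_ms <= 0:
        return None  # the formula is undefined when no loss was observed
    bits_per_s = (mss_bytes * 8) / (rtt_ms / 1000.0) * math.sqrt(1.5 / loss)
    return bits_per_s / 1000.0

stats = summarize_pings([10, 12, 11], sent=4)
print(stats["loss"], stats["avg_rtt_ms"], stats["jitter_ms"])
```

With a 1460-byte MSS, 100 ms RTT and 1% loss, the estimate is roughly 1.4 Mbits/s, which shows how strongly loss and RTT limit the derived throughput.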
5
Architecture
• Hierarchical vs. full mesh
[Figure: architecture diagram. Labels: WWW; Archive; Monitoring; Remote; SLAC; FNAL; Reports & Data; Cache; Ping; HTTP; 1 monitor host / remote host pair; ~35; ~550.]
6
Coverage
• In last 9 months added:
  – Several sites in Russia (thanks GLORIAD)
  – Many hosts in Africa (5 => 36, now in 27 out of 54 countries)
  – Monitoring sites in Pakistan and Brazil (Sao Paulo and Rio)
• Now monitoring 650 sites in 115 countries
• Working to install monitoring host in Bangalore, India
[Map legend: Monitoring site; Remote site]
7
World View
S.E. Europe, Russia: catching up
Latin Am., Mid East, China: keeping up
India, Africa: falling behind
C. Asia, Russia, S.E. Europe, L. America, M. East, China: 4-5 yrs behind
India, Africa: 7 yrs behind
Important for policy makers
50% improvement/year, ~ factor of 10 in < 6 years
[Figure: TCP throughput measured from N. America to World Regions. Y axis: derived TCP throughput in KBytes/sec (log scale, 1 to 10000). X axis: Jan-95 to Dec-04. Series: China (13), S.E. Europe (21), Europe (150), Canada (27), Russia (17), Edu (141), Latin America (37), India (7), Africa (30), Mid East (16), C. Asia (8), Caucasus (8). From the PingER project, Aug 2004.]
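The "50% improvement/year ~ factor of 10 in < 6 years" annotation is plain compound growth and can be checked in a few lines. The function name below is a hypothetical helper for illustration.

```python
import math

def years_to_factor(annual_growth, factor):
    """Years of compound growth at annual_growth needed to multiply by factor."""
    return math.log(factor) / math.log(1.0 + annual_growth)

years = years_to_factor(0.50, 10)
print(f"{years:.2f} years")  # about 5.68, i.e. under 6 years
```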
8
View from CERN
• Confirms view from N. America
[Figure: TCP throughput from CERN to World Regions. Y axis: derived TCP throughput in Kbits/s (log scale, 1 to 100000). X axis: Feb-98 to Dec-04. Series: Europe, N America, SE Europe, M East, Russia, L America, Africa, China, India. From the PingER project, August 2004.]
9
From Developing Regions
[Figure: TCP throughput measured from Brazil (Sao Paulo) to World Regions. Y axis: derived TCP throughput in KBytes/s (log scale, 10 to 10000). X axis: Jan-04 to Aug-04. Series: Africa, E. Asia, Europe, N. America, Russia, S. America, S. Asia. Callouts: Latin America; Europe; N. America.]
As expected, Brazil to L. America is good
Actually dominated by Brazil to Brazil
To Chile & Uruguay poor, since traffic goes via the US
Novosibirsk
NSK to Moscow used to be OK, but loss went up in Sep. 2003; GLORIAD may help
Big loss increase to Moscow (from < 1% to 2-3%)
[Figure: TCP throughput from Novosibirsk to world regions. Y axis: derived throughput in Kbits/s (log scale, 1 to 10000). X axis: Sep-02 to Aug-04. Series: Africa, Australasia, Balkans, E. Asia, Europe, M. East, N. America, Russia, S. America, S. Asia. Callouts: Moscow; Japan/China; N. America; Novosibirsk.]
10
Technology Achievement Index (TAI)
• TAI captures how well a country is creating and diffusing technology and building a human skills base.
• TAI from UNDP: hdr.undp.org/reports/global/2001/en/pdf/techindex.pdf
TAI top 12
Finland 0.744
US 0.733
Sweden 0.703
Japan 0.698
Korea Rep. of 0.666
Netherlands 0.630
UK 0.606
Canada 0.589
Australia 0.587
Singapore 0.585
Germany 0.583
Norway 0.579
US & Canada off-scale
11
PingER-NG = IEPM-BW
• Need measurement tools for high-performance paths/applications
  – BER of 10^-8 takes > 1 day to see 1 loss
  – Ping losses ≠ TCP losses
• Build infrastructure to
  – Measure with:
    • Iperf (TCP mem-to-mem), GridFTP, bbftp
    • Lightweight packet pair dispersion
  – Evaluate measurement tools
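The BER remark can be made concrete with a rough calculation. The sketch below assumes the default ping schedule from the Methodology slide (11 x 100-byte plus 10 x 1000-byte packets every 30 minutes), counts payload bits in one direction only, and treats every bit error as independent and as costing one packet; these simplifications and the helper name are assumptions, not part of the talk.

```python
def days_to_one_error(ber, bits_per_day):
    """Expected number of days until one bit error at the given bit error rate."""
    return 1.0 / (ber * bits_per_day)

sets_per_day = 24 * 2                          # one ping set every 30 minutes
payload_bytes_per_set = 11 * 100 + 10 * 1000   # default packet sizes and counts
bits_per_day = sets_per_day * payload_bytes_per_set * 8  # payload only, one way

days = days_to_one_error(1e-8, bits_per_day)
print(f"{bits_per_day} bits/day, about {days:.0f} days per expected loss")
```

Under these assumptions a single monitor-remote pair sends only a few megabits per day, so a clean high-performance path would show essentially no ping loss for weeks, which is why heavier tools are needed there.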
12
Low impact bandwidth measurement
• Goals:
  – Make a measurement in < 1 second rather than tens of seconds
  – Inject little network traffic
  – Provide reasonable agreement with more intense methods (e.g. iperf)
• Enables:
  – Measurements of low performance links (e.g. to developing countries)
  – Helps avoid need for scheduling
  – More frequent measurements (minutes vs. hours)
  – Lower impact, more network-friendly
13
Low impact Bandwidth
• Use 20 packet pairs to roughly estimate dynamic bandwidth capacity & cross-traffic (Xtraffic), then Available = Capacity – Xtraffic
  – Capacity from min pair separation; Xtraffic from packet pair dispersion
[Figure: ABwE, SLAC to Caltech, Mar 19, 2004. Traces: dynamic bandwidth capacity (DBC); cross-traffic; available bandwidth = DBC – X-traffic; Iperf.]
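A simplified sketch of the packet-pair idea follows. It is not the ABwE algorithm itself: it assumes that the minimum observed pair separation reflects the bottleneck capacity, and that any extra separation is caused by cross-traffic queued between the two packets of a pair. The function name and dictionary keys are hypothetical.

```python
def packet_pair_estimate(separations_s, packet_bytes):
    """Estimate capacity, cross-traffic and available bandwidth (bits/s).

    separations_s: receiver-side spacing (seconds) of each packet pair.
    packet_bytes:  size of each probe packet.
    """
    bits = packet_bytes * 8
    min_sep = min(separations_s)
    mean_sep = sum(separations_s) / len(separations_s)
    capacity = bits / min_sep  # the least-disturbed pair sets the capacity
    # Assumption: the fraction of spacing beyond the minimum is cross-traffic.
    utilization = (mean_sep - min_sep) / mean_sep
    xtraffic = capacity * utilization
    return {
        "capacity_bps": capacity,
        "xtraffic_bps": xtraffic,
        "available_bps": capacity - xtraffic,
    }

# Four pairs of 1500-byte packets; one pair was spread out by cross-traffic.
est = packet_pair_estimate([12e-6, 12e-6, 24e-6, 12e-6], 1500)
print({k: round(v / 1e6) for k, v in est.items()})  # values in Mbits/s
```

In this model Available = Capacity – Xtraffic reduces to packet bits divided by the mean separation, which is why a handful of pairs is enough for a rough estimate.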
14
Achievable throughput & file transfer
• IEPM-BW
  – High impact (iperf, bbftp, GridFTP …) measurements at 90 ± 15 min intervals
[Figure: IEPM-BW time-series plot. Labels: Select focal area; Fwd route change; Rev route change; Min RTT; Avg RTT; Iperf; bbftp; iperf1; abing.]
15
Anomalous Event Detection
• Too many graphs to scan by hand, need to automate
  – SLAC-Caltech link performance dropped by a factor of 5 for ~1 month before it was noticed; fixed within 4 hours of reporting
• Looking for long-term step-down changes in bandwidth
• Use modified "plateau" algorithm from NLANR
  – Divide data into history & trigger buffer
  – If y < m_h – β * σ_h then trigger, else history (β: sensitivity)
• When trigger buffer fills: if m_t < δ * m_h, then have an event (δ: threshold factor)
Route table Example
• Compact so can see many routes at once
[Figure: annotated screenshot of the route table. Callouts:]
  – History navigation
  – Multiple route changes (due to GEANT), later restored to original route
  – Available bandwidth
  – Raw traceroute logs for debugging
  – Textual summary of traceroutes for email to ISP
  – Description of route numbers with date last seen
  – User readable (web table) routes for this host for this day
  – Route # at start of day, gives idea of route stability
  – Mouseover for hops & RTT
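One way a compact route table could be built is to give each distinct hop sequence a small integer and display only that number per measurement. The sketch below is an assumption about the approach, with hypothetical names and made-up hop labels, not the tool's actual code.

```python
def number_routes(traceroutes):
    """Map each distinct hop sequence to a route number, in order of first sight.

    traceroutes: list of hop sequences (lists of router names or addresses),
                 in time order.
    Returns (numbers, table): one route number per measurement, plus the
    mapping from hop sequence to number.
    """
    table = {}
    numbers = []
    for hops in traceroutes:
        key = tuple(hops)
        if key not in table:
            table[key] = len(table)  # next unused route number
        numbers.append(table[key])
    return numbers, table

def count_changes(numbers):
    """Count transitions between different route numbers (route instability)."""
    return sum(1 for a, b in zip(numbers, numbers[1:]) if a != b)

route_a = ["r1", "r2", "r3"]   # hypothetical hop names
route_b = ["r1", "r9", "r3"]
numbers, table = number_routes([route_a, route_a, route_b, route_a])
print(numbers, count_changes(numbers))
```

A route that changes and is later restored, as in the GEANT callout, shows up as the original number reappearing rather than as a new route.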
17
Another example
[Figure: annotated screenshot of the route table. Callouts:]
  – TCP probe type
  – Host not pingable
  – Intermediate router does not respond
  – ICMP checksum error
  – Level change
  – Get AS information for routes
Topology
• Choose times and hosts and submit request
• Nodes colored by ISP
• Mouseover shows node names
• Click on node to see subroutes
• Click on end node to see its path back
• Also can get raw traceroutes with AS's
[Figure: topology map. Labels: SLAC; ESnet; GEANT; JAnet; CESnet; IN2P3; CLRC; DL; alternate route; hour of day.]
19
Putting it together
[Figure: Bandwidth from SLAC to Supernet.org, June 2, 2004. Y axis: bandwidth in Mbits/s (0 to 1000). X axis: 6/2/04 0:00 to 6/3/04 0:00. Traces: Xtr; Abw; Cap; m_h; m_h – 2σ_h. Annotation: route changes. Route map labels: SLAC; ESnet; CENIC; Abilene; SOX; Supernet.]
m_h = 954 Mbits/s, m_t = 753 Mbits/s
(m_h – m_t) / sqrt((σ_h**2 + σ_t**2) / 2) = 2.4
sensitivity = 2; threshold = 40%; history buffer length = 600; trigger buffer length = 60
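The normalized difference quoted on this slide can be computed as below. The slide gives the two means and the result (2.4) but not the standard deviations, so the value of 83.75 Mbits/s used here is purely illustrative: it is the spread implied if both buffers are assumed to have the same standard deviation. The function name is hypothetical.

```python
import math

def step_score(m_h, m_t, s_h, s_t):
    """Difference of buffer means in units of the pooled standard deviation."""
    return (m_h - m_t) / math.sqrt((s_h ** 2 + s_t ** 2) / 2.0)

m_h, m_t = 954.0, 753.0          # Mbits/s, from the slide
s_assumed = (m_h - m_t) / 2.4    # equal-spread assumption, not from the slide

print(round(step_score(m_h, m_t, s_assumed, s_assumed), 2))
print(f"relative drop: {(m_h - m_t) / m_h:.1%}")
```

The relative drop between the two means is about 21%, which gives a feel for the size of the step that accompanied the route changes.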
20
New features in works (with NIIT)
• Improve new site set-up tools
• Improve management
  – Discover non-working links faster
• Improve access to data and metadata
  – Provide database with lat/long, country etc.
  – Add web services access
• Improve visualization:
  – Provide map with drill down to node information
  – Automate production of long term trend plots for regions
  – More node selection capabilities
• Traceroute measurement and analysis
21
More
• PingER Project
  – http://www-iepm.slac.stanford.edu/pinger/
  – IEEE Communications Magazine on Network Traffic Measurements and Experiments.
• ICFA/SCIC Network Monitoring report, Jan '04
  – http://www.slac.stanford.edu/xorg/icfa/icfa-net-paper-jan04/
• IEPM-BW
  – http://www-iepm.slac.stanford.edu/