![Page 1: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/1.jpg)
Troubleshooting GridFTP flows with XSP and Periscope
Dan Gunter (presenter), Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany
![Page 2: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/2.jpg)
Outline
- Motivation
- Review of perfSONAR
- PerfSONAR issues
- New components to address them
  - UNIS
  - Periscope
  - XSP
  - NLMI
- E2E example with GridFTP
- Visualizations from SC10 demo
- Questions & rotten fruit
2/1/2011 Internet2 Joint Techs 2011. Clemson, SC
![Page 3: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/3.jpg)
Motivating Use-Cases
- Analyzing PBs of experimental data on an HPC cluster
- Offloading or disseminating PBs of simulation output
- Large data transfers

Image source: http://xkcd.com/401/
![Page 4: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/4.jpg)
PerfSONAR Overview
Infrastructure & software for network performance analysis

| Abbr | Name | Purpose |
|------|------|---------|
| LS | Lookup service | Find sources of measurements |
| TopS | Topology Service | Describe network topology |
| MP | Measurement point | Retrieve/publish measurements |
| MA | Measurement archive | Store/publish measurements |
| TS | Transformation service | Aggregate, sample, smooth measurements |

Diagram labels: User or Application; Discovery; Data
![Page 5: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/5.jpg)
Motivating questions
- How can we accurately forecast application performance?
- How can we detect performance anomalies in real-time?
- How can we troubleshoot poor application performance?
  - And improve it!
- ‘Shooting the gap between expectation and reality
![Page 6: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/6.jpg)
PerfSONAR issues
① Data is hard to find
  - Cannot simply ask “which MPs have data for path”
② Slow
  - Lookups across multiple domains
  - Polling for data = RTT_net + Delay_DB + Delay_WS
  - XML serialization/deserialization
③ E2E analysis is difficult
  - No integrated host, application monitoring
  - Analysis/visualization done client-side and not exported
④ Measurement frequency is static
  - Always-on and lack of aggregation encourages large intervals
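The polling cost listed under issue ② can be made concrete with a little arithmetic. The sketch below just adds up the three terms from the slide; the millisecond values and the assumption that domains are polled one after another are illustrative, not figures from the talk.

```python
def poll_latency_ms(rtt_net_ms, delay_db_ms, delay_ws_ms):
    """Latency of one poll, following the slide: RTT_net + Delay_DB + Delay_WS."""
    return rtt_net_ms + delay_db_ms + delay_ws_ms


def total_lookup_ms(per_domain_costs):
    """Assumes domains are polled sequentially, so per-domain latencies add up."""
    return sum(poll_latency_ms(*cost) for cost in per_domain_costs)


# Hypothetical (RTT_net, Delay_DB, Delay_WS) values in ms for three domains.
domains = [(20, 50, 30), (80, 50, 30), (150, 50, 30)]

for cost in domains:
    print(cost, "->", poll_latency_ms(*cost), "ms")
print("sequential total:", total_lookup_ms(domains), "ms")
```

Even with modest per-hop delays, the fixed database and web-service terms are paid again for every domain, which is the motivation for caching closer to the client.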
![Page 7: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/7.jpg)
① Data is hard to find
![Page 8: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/8.jpg)
Unified Network Information Service (UNIS)
- Merges the Topology Service (TopS) & Lookup Service (LS)
- Topology model
  - Tree of nodes at different layers (Network/Node/Port)
  - Relations between arbitrary nodes
  - Node properties
- ‘GIS for networks’
- Relates MPs, MAs to topology
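One way to picture the topology model is as a small tree with cross-links. The following is a minimal sketch, not the UNIS schema: the class name `TopoNode`, its fields, and the `find` helper are assumptions made for illustration, and the identifiers are borrowed from the UNIS example in the extra slides.

```python
from dataclasses import dataclass, field


@dataclass
class TopoNode:
    """One element of the topology tree; layer is 'network', 'node', or 'port'."""
    id: str
    layer: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    relations: list = field(default_factory=list)  # (relation_type, target_id) pairs

    def add(self, child):
        """Attach a child element and return it for chaining."""
        self.children.append(child)
        return child

    def find(self, node_id):
        """Depth-first search of this subtree for an element id."""
        if self.id == node_id:
            return self
        for child in self.children:
            hit = child.find(node_id)
            if hit is not None:
                return hit
        return None


# Network -> Node -> Port hierarchy.
net = TopoNode("urn:ogf:network:domain=ps.es.net", "network")
router = net.add(TopoNode(net.id + ":node=albu-cr1", "node", {"description": "Juniper"}))
phys = router.add(TopoNode(router.id + ":port=ge-5/0/0", "port"))
ip = router.add(TopoNode(router.id + ":port=134.55.40.186", "port"))

# A relation between arbitrary elements: the IP port runs "over" the physical port.
ip.relations.append(("over", phys.id))

print(net.find(ip.id).relations)
```

Keeping relations as plain (type, target id) pairs lets any element point at any other without disturbing the containment tree.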
![Page 9: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/9.jpg)
② Slow
![Page 10: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/10.jpg)
Periscope: Topologically aware cache
- PerfSONAR requests have topological locality
- Pre-fetch and cache relevant perfSONAR information
- New protocols to indicate interesting sub-topologies
- Analysis functions
  - domain-specific transformations, e.g. forecasting
  - visualization (whee!)
- Preserve uniform perfSONAR interface

Diagram labels: User or Application; perfSONAR interface; Periscope; MP/MA; LS; ...
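The idea of exploiting topological locality can be sketched as a cache that, on a miss, also pre-fetches data for the elements adjacent to the one requested. This is an illustration of the technique only, not Periscope's implementation: the class name, the `fetch` callable (standing in for a remote perfSONAR query), and the adjacency map are all hypothetical.

```python
class TopologyAwareCache:
    """Caches per-element data and pre-fetches topological neighbors on a miss."""

    def __init__(self, fetch, neighbors):
        self._fetch = fetch          # callable: element id -> data (stand-in for a remote query)
        self._neighbors = neighbors  # dict: element id -> list of adjacent element ids
        self._store = {}
        self.remote_calls = 0        # counts how often the slow path was taken

    def _load(self, element):
        if element not in self._store:
            self._store[element] = self._fetch(element)
            self.remote_calls += 1

    def get(self, element):
        if element not in self._store:
            self._load(element)
            # Locality assumption: a request for one element is likely to be
            # followed by requests for its neighbors, so fetch them now.
            for adjacent in self._neighbors.get(element, []):
                self._load(adjacent)
        return self._store[element]


# Hypothetical three-hop path A - B - C.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
cache = TopologyAwareCache(lambda e: {"element": e, "utilization": 0.5}, adjacency)

cache.get("A")   # miss: fetches A and pre-fetches B
cache.get("B")   # hit: served from the cache
print(cache.remote_calls)
```

A real deployment would also need expiry and a bound on how far the pre-fetch spreads; both are left out here to keep the sketch short.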
![Page 11: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/11.jpg)
Periscope data representation
- Follow PerfSONAR data model
- But use a simpler, more efficient format
- Many good options:
  - JSON ✔
  - BSON ✔
  - Thrift
  - Avro
  - Protobuf
  - NetLogger
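To give a feel for the JSON option, here is a minimal sketch of a measurement record serialized with Python's standard `json` module. The field names (`subject`, `eventType`, `data`, `ts`, `value`) and the values are assumptions made for illustration; they are not taken from the Periscope schema.

```python
import json

# Hypothetical record: one subject (a port URN) with a short time series.
record = {
    "subject": "urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=ge-5/0/0",
    "eventType": "utilization",
    "data": [
        {"ts": 1296518400, "value": 0.42},
        {"ts": 1296518410, "value": 0.47},
    ],
}

# Compact separators avoid the whitespace json.dumps inserts by default.
encoded = json.dumps(record, separators=(",", ":"))
decoded = json.loads(encoded)

print(len(encoded), "bytes")
print(decoded["data"][0]["value"])
```

The record round-trips through a single call in each direction, with no schema or namespace handling on the client side.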
![Page 12: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/12.jpg)
③ E2E Analysis is Difficult
![Page 13: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/13.jpg)
Missing metrics

Network layers

| OSI Layer | perfSONAR metrics |
|-----------|-------------------|
| Application | X |
| Presentation | X |
| Session | X |
| Transport | bandwidth, delay |
| Network | capacity, bandwidth, delay |
| Data link | availability, loss, errors |
| Physical | availability, errors |

End-to-end components

| E2E Component | perfSONAR metrics |
|---------------|-------------------|
| Disk | X |
| Host / Cluster | X |
| Network | “yes” |
![Page 14: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/14.jpg)
NetLogger Machine Information (NLMI)
- Basic set of host probes, using /proc
  - Host interface statistics
  - TCP settings
  - CPU, memory
  - Disk I/O
- Export data in Periscope data model
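A host-interface probe of this kind can be sketched by parsing the Linux `/proc/net/dev` file. This is not NLMI's code; the function names and the choice of four counters are assumptions, and the sample text below stands in for a real host so the example runs anywhere.

```python
def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into per-interface counters."""
    stats = {}
    for line in text.splitlines()[2:]:        # the first two lines are column headers
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),       # transmit columns start at index 8
            "tx_packets": int(fields[9]),
        }
    return stats


def read_interface_stats(path="/proc/net/dev"):
    """Read live counters; only works on hosts that expose /proc (e.g. Linux)."""
    with open(path) as f:
        return parse_net_dev(f.read())


# Made-up sample in the /proc/net/dev layout.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1200      10    0    0    0     0          0         0     1200      10    0    0    0     0       0          0
  eth0: 9876543    6543    0    0    0     0          0         0  1234567    4321    0    0    0     0       0          0
"""

print(parse_net_dev(SAMPLE)["eth0"])
```

Because the counters are cumulative, a probe would sample them periodically and report the difference between successive readings.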
![Page 15: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/15.jpg)
④ Measurement frequency is static
![Page 16: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/16.jpg)
eXtensible Session Protocol (XSP)
- Establishment, termination, and negotiation of a session between end-user application processes
- Session = stateful layer over multiple other NE’s
- In-band or OOB signaling of control information
  - Other metadata can also be forwarded

Diagram labels: A, B, C; App; Session; TCP; xspd; NE; Metadata
![Page 17: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/17.jpg)
Monitoring GridFTP
- GridFTP’s XIO allows interception of I/O
- New XIO layer can talk to a local xspd
  - Signaling: open/close
  - Performance: aggregated read/write
- NetLogger’s nlcalipers library aggregates reads/writes into periodic summaries

Diagram labels: GridFTP server; XIO layer; XIO/XSP; XIO layer; Disk and Network; operation; xspd; signaling; performance
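Aggregating individual reads and writes into periodic summaries can be sketched as a fixed-window accumulator. This illustrates the general technique only and does not reproduce the nlcalipers API: the class `IOSummarizer`, its methods, and the summary fields are hypothetical.

```python
class IOSummarizer:
    """Collapses individual I/O events into one summary per fixed time window."""

    def __init__(self, interval):
        self.interval = interval      # window length in seconds
        self._window_start = None
        self._count = 0
        self._bytes = 0
        self.summaries = []

    def record(self, timestamp, nbytes):
        """Account for one read or write of nbytes at the given time."""
        if self._window_start is None:
            self._window_start = timestamp
        # Close every window that ended before this event arrived.
        while timestamp >= self._window_start + self.interval:
            self._flush()
            self._window_start += self.interval
        self._count += 1
        self._bytes += nbytes

    def _flush(self):
        if self._count:               # empty windows produce no summary
            self.summaries.append(
                {"start": self._window_start, "count": self._count, "bytes": self._bytes}
            )
        self._count = 0
        self._bytes = 0

    def close(self):
        """Emit the final partial window, e.g. when the transfer closes."""
        self._flush()


s = IOSummarizer(interval=1.0)
for ts, nbytes in [(0.0, 100), (0.4, 200), (1.2, 50), (3.5, 10)]:
    s.record(ts, nbytes)
s.close()

for summary in s.summaries:
    print(summary)
```

Four events become three summaries here; for a transfer issuing thousands of reads and writes per second, the reduction in monitoring traffic is what makes always-on instrumentation affordable.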
![Page 18: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/18.jpg)
Combining XSP, Periscope, NLMI
Diagram labels: GridFTP server (XIO layer, XSP layer, XIO layer) with NLMI host stats, repeated per server; xspd (signaling, XIO performance); Periscope; perfSONAR services; Clients (perfSONAR protocols)
![Page 19: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/19.jpg)
Visualization
![Page 20: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/20.jpg)
Visualization cont.
![Page 21: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/21.jpg)
Conclusions
- Periscope provides a platform for perfSONAR analysis
  - Caching to reduce latency, centralized correlation
- Integration with XSP provides transparent monitoring and awareness of application state
- Still polling perfSONAR, though – Publish/Subscribe?

Guilty parties
- D. Martin Swany, Faculty, UD
- Ezra Kissel, Grad student, UD
- Ahmed El-Hassany, Grad student, UD
- Guilherme Fernandes, Grad student, UD
![Page 23: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/23.jpg)
Extra slides
![Page 24: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/24.jpg)
UNIS example
- topology
  - id : esnet
  - domain
    - id : urn:ogf:network:domain=ps.es.net
    - node
      - _id : urn:ogf:network:domain=ps.es.net:node=albu-cr1
      - name : albu-cr1
      - description : Juniper
      - address
        - type : hostname
        - value : albu-cr1
      - location
        - latitude : +35.08
        - longitude : -106.64
![Page 25: Troubleshooting GridFTP flows with XSP and Periscope Dan Gunter, presenter Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany](https://reader030.vdocument.in/reader030/viewer/2022032806/56649f0c5503460f94c1fff6/html5/thumbnails/25.jpg)
UNIS Example, cont.
<unis:port id="urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=134.55.40.186">
  <unis:address type="ipv4">134.55.40.186</unis:address>
  <unis:address type="hostname">albucr1-sdn-a-albusdn1.es.net</unis:address>
  <unis:relation type="over">
    <unis:portIdRef>urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=ge-5/0/0</unis:portIdRef>
  </unis:relation>
  <unis:portPropertiesBag>
    <nmtl3:portProperties>
      <nmtl3:netmask>255.255.255.252</nmtl3:netmask>
    </nmtl3:portProperties>
  </unis:portPropertiesBag>
</unis:port>
</unis:node>
</unis:domain>
</unis:topology>
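The identifiers in this example follow a `urn:ogf:network:key=value:key=value` pattern, so the position of an element in the hierarchy can be recovered from its id alone. The helper below is a minimal sketch under that assumption; `parse_unis_urn` is a hypothetical name, and values that themselves contain a colon (an IPv6 address, for instance) are not handled.

```python
def parse_unis_urn(urn):
    """Split a 'urn:ogf:network:' identifier into its key=value components."""
    prefix = "urn:ogf:network:"
    if not urn.startswith(prefix):
        raise ValueError("not a urn:ogf:network identifier: %r" % urn)
    parts = {}
    for segment in urn[len(prefix):].split(":"):
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts


port = parse_unis_urn("urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=134.55.40.186")
below = parse_unis_urn("urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=ge-5/0/0")

print(port)
# The "over" relation in the example links two ports on the same node.
print(port["node"] == below["node"])
```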