Post on 28-Dec-2015

Scalable Integrated Performance Analysis of Multi-Gigabit Networks

Ezra Kissel, U. Delaware
Ahmed El-Hassany, Guilherme Fernandes, Martin Swany, Indiana U.
Dan Gunter, Taghrid Samak, LBNL
Jen Schopf, WHOI

What I hope you learn

1. Why we care about bulk data transfer at multi-gigabit rates

2. Why and how detailed monitoring is helpful

3. How dynamic control of monitoring is related to Session Layer protocols

4/16/12 2

Bulk data transfer needs

• Some domains of interest:
  – Climate simulation (Earth System Grid)
  – Genomics (JGI)
  – High-energy physics (Large Hadron Collider)
  – Astronomy (Large Synoptic Survey Telescope)
  – Astrophysics (FLASH)

[Figure labels: "Huge data", "Analysis sites"]

Multi-gigabit rates

• Networks connecting national labs and universities have 10 Gb/s and soon 100 Gb/s capability
  – 1 PB ≈ one day at 100 Gb/s

• Rarely achieved due to bottlenecks:
  – Host: application or disks
  – Campus/local networks
  – Wide area networks

• Hard to tell why, where, or even if there is a problem
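The "one PB in one day at 100 Gb/s" figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 Gb = 10^9 bits, 1 PB = 10^15 bytes) and a fully sustained line rate with no protocol overhead; the function names are illustrative and do not come from the slides.

```python
# Back-of-the-envelope check: data moved in one day at a sustained line rate.
# Assumes decimal units and ignores protocol overhead.
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(rate_gbps: float) -> float:
    """Bytes transferred in 24 hours at rate_gbps gigabits per second."""
    bits_per_second = rate_gbps * 1e9
    return bits_per_second / 8 * SECONDS_PER_DAY


def petabytes_per_day(rate_gbps: float) -> float:
    """Same quantity expressed in petabytes (10**15 bytes)."""
    return bytes_per_day(rate_gbps) / 1e15


for rate in (10, 100):
    print(f"{rate} Gb/s -> {petabytes_per_day(rate):.3f} PB/day")
```

At 100 Gb/s this comes to slightly more than 1 PB per day, and at 10 Gb/s about a tenth of that, so a bottleneck that halves throughput adds roughly a day to a petabyte-scale transfer on a 100 Gb/s path.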

Solution

Monitor all the time

Analyze all the time

... but much more when something interesting is happening

Use analysis results as feedback
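One way to picture this feedback loop is a policy that keeps a coarse reporting interval while throughput looks normal and switches to a fine one when the latest sample drops sharply. The sketch below is only an illustration: the `MonitorPolicy` fields, the default values, and the "drop below a fraction of the recent best" rule are all assumptions, not the mechanism used by the system in these slides.

```python
from dataclasses import dataclass


@dataclass
class MonitorPolicy:
    # All three values are hypothetical defaults chosen for illustration.
    base_interval_s: float = 5.0    # coarse reporting while things look normal
    detail_interval_s: float = 0.5  # fine-grained reporting when something is interesting
    drop_fraction: float = 0.5      # "interesting" = latest rate below this share of recent best


def next_interval(policy: MonitorPolicy, recent_rates: list) -> float:
    """Choose the next reporting interval from recent throughput samples."""
    if len(recent_rates) < 2:
        return policy.base_interval_s
    best_so_far = max(recent_rates[:-1])
    latest = recent_rates[-1]
    interesting = best_so_far > 0 and latest < policy.drop_fraction * best_so_far
    return policy.detail_interval_s if interesting else policy.base_interval_s


policy = MonitorPolicy()
print(next_interval(policy, [9.0, 9.2, 9.1]))  # steady throughput: coarse interval
print(next_interval(policy, [9.0, 9.2, 3.0]))  # sharp drop: detailed interval
```

The point of the design is that the analysis result (here, a simple drop test) drives how much monitoring data is collected next, so the cost of detailed monitoring is paid only when it is likely to be useful.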

System components

• eXtensible Session Protocol (XSP)
  – Associate multiple TCP connections, L2 circuits, as a "session"
  – Provide channels for bi-directional metadata

• NL-Calipers
  – Summarize in situ timings of every read/write

• BLiPP
  – Host and TCP stack info. using XSP channels

• perfSONAR
  – Standard information formats and exchange protocols

Dynamic Session Monitoring

[Figure: workflow diagram. Recoverable labels:]

Actors: User, Network engineer

(1) Start transfer
(2) Open session
(3) NL-Calipers data
(4) Signal TCP
(5) Data

Network engineer: look at the performance

Bottleneck detection

Triangles give "instantaneous" throughput

On fixed intervals, summarize all measurements into mean, min, max, variance for both rate and #bytes

Instrumentation

Analysis: pick lowest mean value as bottleneck, apply t-test
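The instrumentation and analysis steps above can be sketched as follows. The slides do not say which t-test variant is used or what threshold is applied, so this sketch assumes Welch's two-sample t statistic comparing the lowest-mean stage with the next lowest, and a hypothetical cutoff of 2.0. The stage names and sample rates are invented for illustration, and only rate (not byte counts) is summarized here.

```python
import math
from statistics import mean, variance


def summarize(samples):
    """Summarize one interval's measurements: count, mean, min, max, sample variance."""
    return {
        "n": len(samples),
        "mean": mean(samples),
        "min": min(samples),
        "max": max(samples),
        "variance": variance(samples) if len(samples) > 1 else 0.0,
    }


def welch_t(low, high):
    """Welch's t statistic for the difference of means between two summaries."""
    std_err = math.sqrt(low["variance"] / low["n"] + high["variance"] / high["n"])
    diff = high["mean"] - low["mean"]
    if std_err == 0.0:
        return math.inf if diff > 0 else 0.0
    return diff / std_err


def find_bottleneck(rates_by_stage, t_threshold=2.0):
    """Return (stage with lowest mean rate, t statistic, whether t exceeds the cutoff)."""
    if len(rates_by_stage) < 2:
        raise ValueError("need at least two instrumented stages to compare")
    summaries = {name: summarize(s) for name, s in rates_by_stage.items()}
    ranked = sorted(summaries, key=lambda name: summaries[name]["mean"])
    lowest, runner_up = ranked[0], ranked[1]
    t = welch_t(summaries[lowest], summaries[runner_up])
    return lowest, t, t > t_threshold


# Hypothetical per-interval rates in MB/s for two instrumented stages.
interval = {
    "disk_read": [410.0, 395.0, 402.0, 388.0, 405.0],
    "network_write": [930.0, 915.0, 940.0, 925.0, 935.0],
}
stage, t_stat, significant = find_bottleneck(interval)
print(stage, significant)
```

A fixed cutoff stands in for a proper p-value here to keep the sketch dependency-free; a fuller version would look up the t distribution with the Welch-Satterthwaite degrees of freedom.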

TCP throughput

Time series of throughput* for representative TCP experiments: (a) 1 stream memory-to-disk with 100 ms latency, (b) 1 stream memory-to-memory with no latency, (c) 1 stream disk-to-disk with no latency, (d) 4 streams memory-to-disk with 100 ms latency and 1% loss added at 60 seconds.

UDT throughput

Time series of throughput* for representative UDT experiments: (a) 4 streams memory-to-disk with 100 ms latency, (b) 4 streams memory-to-disk with 100 ms latency and 1% loss added at 60 seconds, (c) 4 streams disk-to-disk with 100 ms latency, (d) 4 streams memory-to-memory with 100 ms latency.

Wait, what?

Half as many read()s.

Others return zero, not counted

Variance

Less work being done

Review

• Why we care about bulk data transfer at multi-gigabit rates

• Why and how detailed monitoring is helpful

• How monitoring is related to Session Layer protocols
  – and how that might integrate with a management framework

• Questions?

Related projects

• NetLogger netlogger.lbl.gov
• perfSONAR perfsonar.org
• XSP damsl.cis.udel.edu/
• GENI geni.net
• CEDPS cedps-scidac.org

Topology-aware Monitoring
