Round the World Data Transfer
End of LSR (Land Speed Record) in 10Gbps era
Data Reservoir Project
Mary Inaba
University of Tokyo
What is Data Reservoir?
• Share scientific data over long distances
– Physics, astronomy, earth science, biology
• High-speed data transfer on Long Fat pipe Network
• Easy to use
– File system transparent
Data Reservoir System
[Figure: User Programs, File Servers, IP Switches, and Disk Servers at each site, linked by iSCSI Bulk Transfer over the Global Network]
• Using iSCSI protocol
• Without any modification to applications
History of Data Reservoir and SC Bandwidth Challenge
• 1st Generation (SC02)
– 26 to 26 servers, 1GbE interface
– RTT 200ms, 90% usage of bottleneck OC-12
• 2nd Generation (SC03)
– 32 to 32 servers (too many :-<)
– Aggregated 10Gbps, 24,000km, one and a half round trips between U.S. and Tokyo
• 3rd Generation (SC04, SC05)
– Round the World, 31,248km
– 1 to 1, memory to memory transfer
– Single Stream, Longest Path, Standard MTU
– TCP Throughput Award, Fastest IPv6
• 4th Generation (SC06)
– A pair of machines, disk to disk transfer
– Single 7.2Gbps, dual 8.65Gbps
Once upon a time,
There started an ambitious project
to construct an L2 network
between CERN and Tokyo
via Amsterdam, Canada, and U.S.
Fortunately ( ! ),
our team got a chance to try it ♪
Network
[Figure: network map. Sites: Tokyo, Seattle, Vancouver, Calgary, Minneapolis, Chicago, Pittsburgh, Amsterdam, Geneva (CERN). Networks: WIDE, APAN/JGN II, IEEAF/Tyco/WIDE, CANARIE, Abilene, SURFnet]
3rd Generation Data Reservoir started
Background
• WAN PHY over the world
• Programmable 10GbE NIC is available
Challenge
• How much bandwidth can we use with a single stream?
Struggles during the 1st experiment
Almost no information
– Ping + loopback is the only source
– Different network, different timezone
– The TELEPHONE must be the most important piece of equipment.
Over 7Gbps between Tokyo and CERN
A nice part of this experiment was making a lot of new friends!
We really appreciate their kind advice.
Submission to
Internet2 Land Speed Record
Experiments during the X’mas vacation,
the season with the smallest traffic!
Some Results
SC04 Bandwidth Challenge
• U.S. – Tokyo – U.S. – CERN: 31,248km, RTT 433ms, 7.57Gbps
Xmas Experiment
• Season with the smallest network traffic
• Very, very strict deadline for preparation
• Tokyo – Chicago – Amsterdam – Seattle – Tokyo: 33,979km, RTT 498ms, 7.21Gbps
• Updated the LSR 8 times
Challenge in 2006
To attain 90% of 10Gbps
The difficulty
• WAN PHY (max 9.6Gbps) ⇔ LAN PHY
• The difference is only 4% of 10Gbps, but with RTT = 500ms it amounts to 25MBytes per round trip (TCP can control its transmission rate only at RTT granularity)
Another difficulty
• PCI-X bottleneck → now cleared
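The 25MBytes figure is the rate gap multiplied by the round-trip time. A minimal sketch of that arithmetic, assuming the slide's round numbers (10Gbps LAN PHY, 9.6Gbps WAN PHY, 500ms RTT); the function name is illustrative only:

```python
def excess_bytes_per_rtt(fast_bps, slow_bps, rtt_s):
    """Bytes sent during one RTT beyond what the slower link can carry."""
    return (fast_bps - slow_bps) * rtt_s / 8  # bits -> bytes


# Slide values: 10Gbps LAN PHY vs. 9.6Gbps WAN PHY, RTT = 500ms.
gap = excess_bytes_per_rtt(10e9, 9.6e9, 0.5)
print(f"{gap / 1e6:.0f} MBytes per round trip")
```

A sender that overshoots the WAN PHY rate for one RTT therefore queues on the order of 25MBytes somewhere along the path before TCP can react.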
LSR in 2006 -- New players
• Circuit -- NetIron 40G, NetIron RX-4 in Seattle
• GSO (Generic Segmentation Offload) – Offloading CRC calculation
• Chelsio T310 -- PCI-X 2.0 support, IPG tuning is available
• Iperf modification with sendfile()
• Hardware approach for 10Gbit network -- TAPEE: Network Analyzer
2006 LSR Challenge, again on X’mas
• Around Dec/10: Seattle line test
• Around Dec/20: Round-The-World up
• Dec/31: Submission
• Jan/8/2007: Round-The-World down
Host
• Xeon 5160 * 1
– Woodcrest core
– Dual core
• DDR400 2GB
• Chelsio T310-SR on PCI-Express x8
– There is no longer a bus speed bottleneck
• Linux 2.6.18
Circuit
• Round The World circuit
– 522ms RTT
– Trans Pacific & Trans Atlantic
– WAN PHY & LAN PHY mixed
– Tokyo – [Los Angeles] – Chicago – Amsterdam
– Amsterdam – [Chicago] – Seattle – Tokyo
LSR 200612-2 Network Topology
[Figure: network topology diagram spanning the Pacific Ocean and the Atlantic Ocean.
Sites: Tokyo (T-LEX), Los Angeles, Seattle (Pacific Northwest Gigapop), Chicago (StarLight), NYC (MANLAN), Amsterdam (NetherLight at SARA).
Networks: WIDE, JGN2, IEEAF, CANARIE CA*net 4, SURFnet, TransLight.
Equipment: Age-1 and Age-2 hosts (Intel Xeon), Fujitsu XG800, Force10 E300, Force10 E1200, Foundry RX-4, Foundry NI40G, CISCO 7609, ONS15454, HDXc, GS4000.
Legend: L1 switch, L2 switch, L3 switch, others; links labeled WAN PHY or LAN PHY]
LSR distance
From                            To                              Distance
HND (35°33'08"N 139°46'47"E)    ORD (41°58'43"N 87°54'17"W)     10147 km
ORD (41°58'43"N 87°54'17"W)     AMS (52°18'31"N 04°45'50"E)      6630 km
AMS (52°18'31"N 04°45'50"E)     SEA (47°26'56"N 122°18'34"W)     7864 km
SEA (47°26'56"N 122°18'34"W)    HND (35°33'08"N 139°46'47"E)     7730 km
4 segment path: 32372 km
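The segment distances can be approximated from the airport coordinates with a great-circle calculation. The sketch below uses the haversine formula on a sphere with a mean radius of 6371 km; both the formula and the radius are assumptions, since the slide does not say how its distances were derived, and a spherical model lands a fraction of a percent away from the listed values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from the slide


def dms(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Coordinates as listed in the distance table.
AIRPORTS = {
    "HND": (dms(35, 33, 8, "N"), dms(139, 46, 47, "E")),
    "ORD": (dms(41, 58, 43, "N"), dms(87, 54, 17, "W")),
    "AMS": (dms(52, 18, 31, "N"), dms(4, 45, 50, "E")),
    "SEA": (dms(47, 26, 56, "N"), dms(122, 18, 34, "W")),
}
PATH = ["HND", "ORD", "AMS", "SEA", "HND"]


def path_segments_km(path):
    """Great-circle length of each consecutive hop along the path."""
    return [great_circle_km(AIRPORTS[x], AIRPORTS[y])
            for x, y in zip(path, path[1:])]


segments = path_segments_km(PATH)
for (src, dst), km in zip(zip(PATH, PATH[1:]), segments):
    print(f"{src} -> {dst}: {km:.0f} km")
print(f"total: {sum(segments):.0f} km")
```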
IPG Tuning
• Chelsio T310 has a special function for setting the IPG (Inter Packet Gap)
– Makes it possible to control the Ethernet NIC transmission rate
– Up to 2048 octets (the IEEE standard IPG is 12 octets)
• Fine-grain tuning
– For standard frames, control over 50 ~ 100%
– For 8000B jumbo frames, 80 ~ 100%
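Why the control range depends on frame size can be seen from a simple on-the-wire model: each frame occupies its own octets plus an 8-octet preamble plus the IPG. The sketch below is an illustration under those assumptions (frame sizes of 1518 and 8000 octets on the wire are assumed, and the function name is hypothetical); it is not the T310's actual rate-control formula.

```python
PREAMBLE_OCTETS = 8        # preamble + start-of-frame delimiter
STANDARD_IPG_OCTETS = 12   # IEEE standard inter packet gap
MAX_IPG_OCTETS = 2048      # upper limit quoted on the slide


def relative_rate(frame_octets, ipg_octets):
    """Throughput relative to the same frame size sent with the standard IPG."""
    baseline = frame_octets + PREAMBLE_OCTETS + STANDARD_IPG_OCTETS
    stretched = frame_octets + PREAMBLE_OCTETS + ipg_octets
    return baseline / stretched


for frame in (1518, 8000):  # assumed on-the-wire frame sizes
    low = relative_rate(frame, MAX_IPG_OCTETS)
    print(f"{frame}-octet frames: {low:.0%} to 100% of line rate")
```

With these assumptions the 8000-octet case comes out at roughly 80% for the maximum IPG, in line with the slide. For 1518-octet frames the model gives a somewhat lower floor than the quoted 50%, so the slide's range may rest on different framing assumptions.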
Iperf modification
• We have been using Iperf
• Iperf transmission flow
– Allocate a buffer of several kB
– Initialize the buffer with random data
– while() { write(sock, buffer) }
• This invokes a copy between user and kernel space
Iperf modification (cont’d)
• Advice from Chelsio
– “Use netperf’s sendfile mode to confirm receiver performance”
• Modification
– Iperf-zerocopy transmission flow
  • open(temporary file) → file descriptor fd
  • buffer = mmap(fd)
  • initialize buffer with random data
  • while() { sendfile(sock, fd) }
– sendfile(2) sends data from the kernel
• After some discussion, we concluded that using this version of Iperf meets the LSR rules
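The two transmission flows can be contrasted in a small sketch. This is not the modified Iperf source: it is a Python illustration for Linux that uses `os.sendfile` over a local `socket.socketpair()`, fills the temporary file with a plain write instead of mmap, and uses hypothetical function names throughout.

```python
import os
import socket
import tempfile
import threading


def make_payload_file(size):
    """Create an anonymous temporary file filled with random data."""
    f = tempfile.TemporaryFile()
    f.write(os.urandom(size))
    f.flush()  # make sure the data has left Python's buffer
    return f


def send_with_write(sock, payload, repeats):
    """Standard Iperf-style loop: every send copies the user buffer into the kernel."""
    for _ in range(repeats):
        sock.sendall(payload)
    return len(payload) * repeats


def send_with_sendfile(sock, f, size, repeats):
    """Zero-copy style loop: the kernel sends the file's pages directly."""
    for _ in range(repeats):
        offset = 0
        while offset < size:  # sendfile may send fewer bytes than requested
            offset += os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
    return size * repeats


def drain(sock, expected, result):
    """Receiver side: read until the expected byte count has arrived."""
    got = 0
    while got < expected:
        chunk = sock.recv(65536)
        if not chunk:
            break
        got += len(chunk)
    result.append(got)


def demo(size=64 * 1024, repeats=4):
    """Send the same file several times with sendfile and count received bytes."""
    sender, receiver = socket.socketpair()
    result = []
    t = threading.Thread(target=drain, args=(receiver, size * repeats, result))
    t.start()
    with make_payload_file(size) as f:
        sent = send_with_sendfile(sender, f, size, repeats)
    sender.close()
    t.join()
    receiver.close()
    return sent, result[0]


if __name__ == "__main__":
    print(demo())
```

The sender never touches the payload in user space after the file is created, which is the property the sendfile-based flow relies on.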
New submission
• 7.67Gbps average
– Standard-Iperf
– Peak 8.10Gbps, 20 minutes, No packet loss
• 9.08Gbps average
– Iperf-zerocopy
– Peak 9.11Gbps, 5 hours, No packet loss
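The Land Speed Record history is plotted as a distance-bandwidth product in Pbit m / s. A minimal sketch of that conversion follows, assuming the credited distance is capped at 30,000km; the cap is inferred here from the "10 Gbps * 30,000km" reference line in the history charts, and the function name is illustrative.

```python
MAX_CREDITED_KM = 30_000  # assumed cap, read off the "10 Gbps * 30,000km" line


def distance_bandwidth_pbit_m_s(gbps, path_km, cap_km=MAX_CREDITED_KM):
    """Distance-bandwidth product in Pbit m / s, with the path length capped."""
    credited_m = min(path_km, cap_km) * 1_000
    return gbps * 1e9 * credited_m / 1e15


# 9.08Gbps average over the 32,372km round-the-world path.
print(f"{distance_bandwidth_pbit_m_s(9.08, 32_372):.1f} Pbit m / s")
# 7.21Gbps over the 33,979km Xmas 2004 path.
print(f"{distance_bandwidth_pbit_m_s(7.21, 33_979):.1f} Pbit m / s")
```

Under that assumption the two runs come out near the 272 and 216 Pbit m / s points labeled 2006/12/28 and 2004/12/24 in the history charts, and a full 10Gbps stream would reach the 300 Pbit m / s ceiling.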
History of single-stream IPv4 Land Speed Record
[Figure: distance-bandwidth product (Pbit m / s, log scale from 1 to 1,000) by year, 2000 to 2007, with a reference line at 10 Gbps * 30,000km. Labeled points:
2004/11/9 Data Reservoir project, WIDE project: 149 Pbit m / s
2004/12/24: 216 Pbit m / s
2005/11/10: 240 Pbit m / s
2006/2/20: 264 Pbit m / s]
History of single-stream IPv6 Land Speed Record
[Figure: distance-bandwidth product (Pbit m / s, log scale from 1 to 1,000) by year, 2000 to 2007, with a reference line at 10 Gbps * 30,000km. Labeled points:
2004/10/29 Data Reservoir project, WIDE project: 167 Pbit m / s
2005/11/13 Data Reservoir project, WIDE project: 208 Pbit m / s
2006/12/28 Data Reservoir project, WIDE project: 272 Pbit m / s]