TRANSCRIPT
GNEW’2004 – 15/03/2004
DataTAG project Status & Perspectives
Olivier MARTIN - CERN
GNEW’2004 workshop
15 March 2004, CERN, Geneva
Presentation outline
Project overview
Testbed characteristics and evolution
Major networking achievements
Where are we?
Lambda Grids
Networking testbed requirements
Acknowledgements
Conclusions
DataTAG Mission
EU-US Grid network research:
High Performance Transport protocols
Inter-domain QoS
Advance bandwidth reservation
EU-US Grid Interoperability
Sister project to EU DataGRID
TransAtlantic Grid
http://www.datatag.org
Project partners
Funding agencies
Cooperating Networks
EU collaborators
Brunel University
CERN
CLRC
CNAF
DANTE
INFN
INRIA
NIKHEF
PPARC
UvA
University of Manchester
University of Padova
University of Milano
University of Torino
UCL
US collaborators
ANL
Caltech
Fermilab
FSU
Globus
Indiana
Wisconsin
Northwestern University
UIC
University of Chicago
University of Michigan
SLAC
Starlight
Workplan
WP1: Establishment of a high performance intercontinental Grid testbed (CERN)
WP2: High performance networking (PPARC)
WP3: Bulk data transfer validations and application performance monitoring (UvA)
WP4: Interoperability between Grid domains (INFN)
WP5 & WP6: Dissemination and project management (CERN)
DataTAG/WP4 framework and relationships
[Diagram: integration and interoperability/standardization relationships via HICB/HIJTB, HEP applications and other experiments]
Testbed evolution
The DataTAG testbed evolved from a simple 2.5 Gb/s Layer 3 testbed (Sept. 2002) into an extremely rich multi-vendor 10 Gb/s Layer 2/Layer 3 testbed (Sept. 2003): Alcatel, Chiaro, Cisco, Juniper, PRocket
Exclusive access to the testbed is granted through an advance testbed reservation application
Direct extensions to Amsterdam UvA/SURFnet (10G) & Lyon INRIA/VTHD (2.5G)
Layer 2 extension to INFN/CNAF over GEANT & GARR using Juniper’s CCC
Layer 2 extension to the OptiPuter project at UCSD (University of California San Diego) through Abilene and CENIC under way
1st L2/L3 transatlantic testbed with native 10 Gigabit Ethernet access
[Diagram: DataTAG testbed phase 1 (2.5 Gbps) and phase 2 (10 Gbps), simplified. Chicago and Geneva PoPs equipped with Cisco 7606/7609, Juniper M10 and T320, Alcatel 7770, Extreme S5i, StarLight Cisco 6509 and Force10, plus ONS15454 and Alcatel 1670 optical gear; STM16 circuits to SURFnet (Colt, backup + projects), CESNET and VTHD/INRIA (France Telecom); STM64 (GC) and a 10 Gbps optical wave (T-Systems) across the Atlantic; Layer 2 extension to CNAF over GEANT/GARR; connection to Abilene; Linux PC clusters at both ends. Link types: 1G Ethernet, 2.5G STM16, 10G Ethernet, 10G STM64. Diagram by [email protected], last update 2003-09-09]
DataTAG testbed
Alcatel
Chiaro
Cisco
Juniper
PRocket
Main networking achievements (1)
Internet land speed records have been beaten one after the other by the DataTAG project partners and/or teams closely associated with DataTAG:
ATLAS Canada lightpath experiments during iGRID2002 (Gigabit Ethernet) and Telecom World 2003 (10 Gigabit Ethernet, aka WAN-PHY)
New Internet2 land speed record (I2 LSR) by NIKHEF/Caltech team (SC2002)
FAST, GridDT, HS-TCP, Scalable TCP experiments (DataTAG partners & Caltech)
Intel 10GigE tests between CERN (Geneva) and SLAC (Sunnyvale) (CERN, Caltech, Los Alamos National Laboratory, SLAC): 2.38 Gbps sustained rate, single flow, 1TB in one hour; I2 LSR awarded during Internet2 Spring member meeting (April 2003)
ATLAS Canada Lightpath trials
TRIUMF (Vancouver) & CERN (Geneva) through Amsterdam
[Diagram: CANARIE 2xGbE circuits to StarLight; SURFnet 2xGbE circuits to NetherLight]
“A full Terabyte of real data was transferred at rates equivalent to a full CD (680MB) in under 8 seconds and a DVD in under 1 minute” Wade Hong et al., 09/2002
Subsequent 10GigE WAN-PHY experiments during Telecom World 2003 brought effective data transfer rates below one second per CD!
10GigE Data Transfer Trial
On Feb. 27-28, 2003, a terabyte of data was transferred in 3700 seconds by S. Ravot of Caltech between the Level3 PoP in Sunnyvale near SLAC and CERN, through the TeraGrid router at StarLight, from memory to memory, with a single TCP/IPv4 stream. This achievement translates to an average rate of 2.38 Gbps (using large windows and 9kB “jumbo frames”). This beat the former record by a factor of ~2.5 and used the 2.5 Gb/s link at 99% efficiency.
European Commission
Huge distributed effort, 10-15 highly skilled people monopolized for several weeks!
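To make the arithmetic behind such records easy to reproduce, here is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and a transferred volume of roughly 1.1 TB, both of which are assumptions for illustration rather than figures from the trial report:

```python
# Minimal sketch: average throughput of a memory-to-memory transfer.
# Assumes decimal units (1 TB = 1e12 bytes); the exact volume moved in the
# trial may have differed slightly, so treat the numbers as illustrative.

def average_gbps(bytes_transferred: float, seconds: float) -> float:
    """Average rate in gigabits per second."""
    return bytes_transferred * 8 / seconds / 1e9

if __name__ == "__main__":
    volume = 1.1e12    # bytes, roughly the volume consistent with the quoted 2.38 Gbps
    duration = 3700    # seconds (Feb. 27-28, 2003 trial)
    rate = average_gbps(volume, duration)
    print(f"average rate: {rate:.2f} Gb/s")
    # An STM-16 circuit carries roughly 2.4 Gb/s of usable payload, which is
    # why ~2.38 Gb/s corresponds to the ~99% link efficiency quoted above.
```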
10G DataTAG testbed extension to Telecom World 2003 and Abilene/CENIC
Sponsors: Cisco, HP, Intel, OPI (Geneva’s Office for the Promotion of Industries & Technologies), Services Industriels de Geneve, Telehouse Europe, T-Systems
On September 15, 2003, the DataTAG project was the first transatlantic testbed offering direct 10GigE access using Juniper’s VPN layer2/10GigE emulation.
Main networking achievements (2)
The latest IPv4 & IPv6 I2LSRs were awarded, live from the Internet2 fall member meeting in Indianapolis, to Caltech & CERN during Telecom World 2003:
May 6, 2003: 987 Mb/s single TCP/IPv6 stream
October 1, 2003: 5.44 Gb/s single TCP/IPv4 stream between Geneva and Chicago, i.e. 1.1TB in 26 minutes or one 680MB CD in 1 second
More records have been established by Caltech & CERN since then:
November 6, 2003: 5.64 Gb/s single TCP/IPv4 stream between Geneva and Los Angeles (CENIC PoP) across DataTAG and Abilene
November 11, 2003: 4 Gb/s single TCP/IPv6 stream between Geneva and Phoenix (Arizona) through Los Angeles
February 24, 2004: 6.25 Gb/s with 9 streams for 638 seconds, i.e. half a terabyte transferred between CERN in Geneva and the CENIC PoP in Los Angeles across DataTAG and Abilene
Internet2 land speed record history (IPv4 & IPv6)
[Charts: evolution of the I2LSR for IPv4 and IPv6 from March 2000 to November 2003, expressed in Gigabit/second and in terabit-meters/second]
Impact of a single multi-Gb/s flow on the Abilene backbone
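The second chart uses the I2LSR bandwidth-distance metric, i.e. throughput multiplied by the terrestrial distance between the end points. A hedged sketch of that calculation follows; the ~7000 km path length is an illustrative assumption, not the officially measured distance from the record submissions:

```python
# Sketch of the I2LSR bandwidth-distance metric (terabit-meters per second).
# The path length below is an illustrative assumption, not the officially
# measured distance used in the record submissions.

def lsr_metric_tbm_per_s(throughput_gbps: float, distance_km: float) -> float:
    """Throughput times distance, in terabit-meters per second."""
    return (throughput_gbps * 1e9) * (distance_km * 1e3) / 1e12

if __name__ == "__main__":
    # Example: a 5.44 Gb/s single stream over an assumed ~7000 km Geneva-Chicago path.
    print(f"{lsr_metric_tbm_per_s(5.44, 7000):,.0f} terabit-meters/second")
    # roughly 38,000 terabit-meters/second, on the scale shown in the chart above
```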
Significance of I2LSRs to the Grid?
Essential to establish the feasibility of multi-Gigabit/second single-stream IPv4 & IPv6 data transfers:
Over dedicated testbeds in a first phase
Then across academic & research backbones
Last but not least, across campus networks
Disk to disk rather than memory to memory
Study impact of high performance TCP over disk servers
Next steps:
Above 6 Gb/s expected soon between CERN and Los Angeles (Caltech/CENIC PoP) across DataTAG & Abilene
Goal is to reach 10 Gb/s with new PCI Express buses
Study alternatives to standard TCP (Reno): non-TCP transport (Tsunami, SABUL/VDT), HS-TCP, Scalable TCP, H-TCP, FAST, Grid-DT, Westwood+, etc.
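These single-stream, multi-Gb/s transfers require TCP windows sized to the bandwidth-delay product of the transatlantic path, which is also why losses are so costly for standard Reno. A minimal sketch follows; the 120 ms round-trip time is an assumed, illustrative figure, not a measurement from the project:

```python
# Minimal sketch: TCP window needed to fill a long fat pipe (bandwidth-delay product).
# The 120 ms round-trip time is an assumed, illustrative transatlantic figure.

def window_bytes(throughput_gbps: float, rtt_ms: float) -> float:
    """Bytes in flight needed to sustain the given rate over the given RTT."""
    return throughput_gbps * 1e9 / 8 * (rtt_ms / 1e3)

if __name__ == "__main__":
    for gbps in (2.5, 5.44, 10.0):
        mb = window_bytes(gbps, rtt_ms=120) / 1e6
        print(f"{gbps:>5} Gb/s needs a ~{mb:.0f} MB TCP window")
    # With windows this large, a single loss forces standard Reno TCP into a very
    # long recovery, which is why the alternatives listed above are being studied.
```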
Main networking achievements (3)
QoS
[Diagram: IP QoS test setup over Layer 2 VLANs, with a Juniper M10 providing a 1 GE bottleneck where IP QoS is configured; Assured Forwarding (AF) and Best Effort (BE) traffic classes, Geneva side]
Advance bandwidth reservation
GARA extensions
AAA extensions
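As an illustration of the AF versus BE distinction in the diagram above, the sketch below marks a host socket's traffic with a DiffServ code point. In the DataTAG tests the QoS configuration lived on the routers, so host-side marking and the choice of AF41 are assumptions made purely for illustration:

```python
# Hedged sketch: marking a host's outbound traffic with a DiffServ code point.
# The DataTAG tests configured QoS on the routers (Juniper M10); this host-side
# marking and the AF41 class are illustrative assumptions only.
import socket

AF41_DSCP = 34               # Assured Forwarding class 4, low drop precedence (example)
TOS_VALUE = AF41_DSCP << 2   # DSCP occupies the upper 6 bits of the former TOS byte

def make_af_socket() -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)
    return s

# Best Effort (BE) traffic simply keeps the default TOS value of 0.
```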
Where are we?
The DataTAG project came along at exactly the right time:
Back in late 2000, 2.5 Gb/s looked futuristic; 10GigE, especially host interfaces, did not really exist
However, it was already very clear that the standard TCP stack (Reno/NewReno) was problematic
Much hope was placed on autotuning (Web100/Net100) & ECN/RED-like solutions
Actual bit error rates of transatlantic circuits were over-estimated
Things are in much better shape than expected on over-provisioned R&D backbones such as Abilene, CANARIE and GEANT, but for how long?
One of the strongest demonstrations made by DataTAG is the extreme vulnerability of production R&D backbones in the presence of high performance flows (i.e. 10GigE or even less)
Where are we (cont)?
For many years the Wide Area Network has been the bottleneck; this is no longer the case in many countries, thus making the deployment of data-intensive Grid infrastructures, in principle, possible, e.g. EGEE, the DataGrid successor
Recent I2LSR records show, for the first time ever, that the network can be truly transparent and that throughput is only limited by the end hosts and/or campus network infrastructures
The challenge has shifted from getting adequate bandwidth to deploying adequate LANs and cybersecurity infrastructure, as well as making effective use of them!
Non-trivial transport protocol issues still need to be resolved; the only encouraging sign is that this is now widely recognized, but we are still quite far from converging on a practical solution
Layer1/2/3 networking (1)
Conventional layer 3 technology is no longer fashionable because of:
High associated costs, e.g. 200-300 KUSD for a 10G router interface
Implied use of shared backbones
The use of layer 1 or layer 2 technology is very attractive because it helps to solve a number of problems, e.g.:
The 1500-byte Ethernet frame size (layer 1), as illustrated in the sketch below
Protocol transparency (layer 1 & layer 2)
Minimum functionality, hence, in theory, much lower costs (layer 1 & 2)
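To see why the 1500-byte Ethernet frame size matters at these speeds, here is a small sketch comparing the per-frame rates a host must sustain at 10 Gb/s with standard versus 9000-byte jumbo frames; the 38-byte per-frame overhead figure is a common assumption (preamble, MAC header, FCS, inter-frame gap), not something stated in the slides:

```python
# Sketch: frame rate a host must sustain at 10 Gb/s for standard vs jumbo frames.
# The 38 bytes of per-frame overhead (preamble + MAC header + FCS + inter-frame gap)
# is a commonly used figure, assumed here for illustration.

LINE_RATE_BPS = 10e9
OVERHEAD_BYTES = 38

def frames_per_second(mtu_bytes: int) -> float:
    wire_bytes = mtu_bytes + OVERHEAD_BYTES
    return LINE_RATE_BPS / 8 / wire_bytes

for mtu in (1500, 9000):
    print(f"MTU {mtu:>5}: ~{frames_per_second(mtu):,.0f} frames/s at 10 Gb/s")
# Jumbo frames cut the per-packet processing load by roughly a factor of 6.
```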
Layer1/2/3 networking (2)
« Lambda Grids » are becoming very popular:
Pros:
Circuit-oriented model like the telephone network, hence no need for complex transport protocols
Lower equipment costs (i.e. « in theory » a factor of 2 or 3 per layer)
The concept of a dedicated end-to-end light path is very elegant
Cons:
« End to end » is still very loosely defined, i.e. site to site, cluster to cluster or really host to host
Higher circuit costs, scalability, additional middleware to deal with circuit set up/tear down, etc.
Extending dynamic VLAN functionality to the campus network is a potential nightmare!
« Lambda Grids » What does it mean?
Clearly different things to different people, hence the « apparently easy » consensus!
Conservatively, on-demand « site to site » connectivity:
Where is the innovation? What does it solve in terms of transport protocols? Where are the savings?
Fewer interfaces needed (customer), but more standby/idle circuits needed (provider)
Economics from the service provider vs the customer perspective? Traditionally, switched services have been very expensive: usage vs flat charge, break-even between switched and leased circuits at a few hours/day (see the sketch after this list); why would this change?
In case there are no savings, why bother?
More advanced, cluster to cluster: implies even more active circuits in parallel
Even more advanced, host to host, all optical: is it realistic?
Networking testbed requirements
Multi-vendor: unless a particular research group is specifically interested in the behaviour of TCP in the presence of out-of-order packets, running high performance TCP tests across a Juniper M160 backbone is pretty useless
Achievable IPv6 performance varies widely between different vendors
MPLS & QoS implementations also vary widely
Interoperability
Dynamic: implies manpower & money
Partitionable: reservation application
Reconfigurable: avoid manual recabling, implies an electronic or optical switch/patch panel
Extensible: extensions to other networks, implies collaboration
Not limited to network equipment, must also include high performance servers, high performance disks & NICs
Coordination with other testbeds
Acknowledgements
The project would not have accumulated so many successes without the active participation of our North American colleagues, in particular: Caltech/DoE, University of Illinois/NSF, iVDGL, StarLight, Internet2/Abilene, CANARIE
and our European sponsors and colleagues as well, in particular: the European Union’s IST programme, DANTE/GEANT, GARR, SURFnet, VTHD
The GNEW2004 workshop is yet another example of successful collaboration between Europe and the USA