Copyright AARNet 2005
Massive Data Transfers
George McLaughlin and Mark Prior, AARNet
“Big Science” projects driving networks
• Large Hadron Collider
– Coming on-stream in 2007
– Particle collisions generating terabytes/second of “raw” data at a single, central, well-connected site
– Need to transfer data to global “tier 1” sites. A tier 1 site must have a 10 Gbps path to CERN
– Tier 1 sites need to ensure gigabit capacity to the Tier 2 sites they serve
• Square Kilometre Array
– Coming on-stream in 2010?
– Greater data generator than LHC
– Up to 125 sites at remote locations; data need to be brought together for correlation
– Can’t determine “noise” prior to correlation
– Many logistic issues to be addressed
Billion-dollar, globally funded projects with massive data transfer needs
From very small to very big
Scientists and Network Engineers coming together
• The HEP community and the R&E network community have figured out mechanisms for interaction, probably because HEP is pushing network boundaries
• e.g. the ICFA workshops on HEP, Grid and the Global Digital Divide bring together scientists, network engineers and decision makers, and achieve results
• http://agenda.cern.ch/List.php
What’s been achieved so far
• A new generation of real-time Grid systems is emerging to support worldwide data analysis by the physics community
• Leading role of HEP in developing new systems and paradigms for data-intensive science
• Transformed view and theoretical understanding of TCP as an efficient, scalable protocol with a wide field of use
• Efficient standalone and shared use of 10 Gbps paths of virtually unlimited length; progress towards 100 Gbps networking
• Emergence of a new generation of “hybrid” packet- and circuit-switched networks
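Why long 10 Gbps paths strain TCP: the sender must keep a full bandwidth-delay product of data in flight, far beyond default window sizes. A minimal sketch of the arithmetic, using illustrative figures (10 Gbps and a 300 ms trans-continental round trip are assumptions, not from the slides):

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight
    (sent but unacknowledged) to keep the pipe full."""
    return bandwidth_bps * rtt_seconds / 8  # divide by 8: bits -> bytes

# Illustrative: a 10 Gbps path with a 300 ms round-trip time
window = bdp_bytes(10e9, 0.300)
print(f"Required TCP window: {window / 1e6:.0f} MB")  # ~375 MB
```

A default TCP window of a few tens of kilobytes fills only a tiny fraction of such a path, which is why new protocol work and stack tuning were needed.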
LHC data (simplified)
Per experiment: 40 million collisions per second
• After filtering, 100 collisions of interest per second
• A megabyte of digitised information for each collision = recording rate of 100 Megabytes/sec
• 1 billion collisions recorded = 1 Petabyte/year
Experiments: CMS, LHCb, ATLAS, ALICE
Scale of units:
1 Megabyte (1 MB): a digital photo
1 Gigabyte (1 GB) = 1000 MB: a DVD movie
1 Terabyte (1 TB) = 1000 GB: world annual book production
1 Petabyte (1 PB) = 1000 TB: 10% of the annual production by LHC experiments
1 Exabyte (1 EB) = 1000 PB: world annual information production
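The slide’s figures can be checked with simple arithmetic; all numbers below come from the slide itself:

```python
# Back-of-envelope check of the LHC data figures, per experiment
kept_per_sec = 100            # collisions of interest after filtering
bytes_per_collision = 1e6     # 1 MB of digitised information each

rate = kept_per_sec * bytes_per_collision   # recording rate, bytes/sec
print(rate / 1e6, "MB/s")                   # 100.0 MB/s

recorded_per_year = 1e9       # ~1 billion collisions recorded per year
yearly = recorded_per_year * bytes_per_collision
print(yearly / 1e15, "PB/year")             # 1.0 PB/year
```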
LHC Computing Hierarchy
[Diagram: the tiered LHC computing model. The experiment generates ~PByte/sec into the Online System, which feeds the Tier 0+1 CERN Center (PBs of disk; tape robot) at ~100-1500 MBytes/sec. Tier 1 centers (FNAL, IN2P3, INFN, RAL) connect to CERN at 2.5-10 Gbps; Tier 2 centers connect to Tier 1 at ~2.5-10 Gbps; institute-level Tier 3 sites and Tier 4 workstations connect at 0.1 to 10 Gbps with a physics data cache. CERN/outside resource ratio ~1:2; Tier 0 : (sum of Tier 1) : (sum of Tier 2) ~1:1:1. Tens of petabytes by 2007-8; an exabyte ~5-7 years later.]
Lightpaths for Massive data transfers
[Chart from CANARIE (scale 0-30) comparing Lightpath traffic against IP Peak and IP Average traffic.]
A small number of users with large data transfer needs can use more bandwidth than all other users.
Why?
• Cees de Laat classifies network users into 3 broad groups:
1. Lightweight users: browsing, mailing, home use. Need full Internet routing, one to many.
2. Business applications: multicast, streaming, VPNs, mostly LAN. Need VPN services and full Internet routing, several to several plus uplink.
3. Scientific applications: distributed data processing, all sorts of grids. Need very fat pipes, limited multiple Virtual Organizations, few to few, peer to peer.
Type 3 users: High Energy Physics; astronomers; eVLBI; High Definition multimedia over IP; massive data transfers from experiments running 24x7.
What is the GLIF?
• Global Lambda Integrated Facility (www.glif.is)
• International virtual organization that supports persistent data-intensive scientific research and middleware development
• Provides the ability to create dedicated international point-to-point Gigabit Ethernet circuits for “fixed term” experiments
Huygens Space Probe: a practical example
• The Cassini spacecraft left Earth in October 1997 to travel to Saturn
• On Christmas Day 2004, the Huygens probe separated from Cassini
• Huygens started its descent through the dense atmosphere of Titan on 14 Jan 2005
• Using VLBI, 17 telescopes in Australia, China, Japan and the US were able to accurately position the probe to within a kilometre (Titan is ~1.5 billion kilometres from Earth)
• This required transferring terabytes of data between Australia and the Netherlands
Very Long Baseline Interferometry (VLBI) is a technique where widely separated radio telescopes observe the same region of the sky simultaneously to generate images of cosmic radio sources.
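To get a feel for the scale, here is a rough transfer-time estimate. The slides say only “terabytes”; the 1 TB figure and link speeds below are illustrative assumptions:

```python
def transfer_hours(data_bytes, link_bps):
    """Hours needed to move data_bytes over a link of link_bps,
    assuming the link runs at its nominal rate."""
    return data_bytes * 8 / link_bps / 3600  # bytes -> bits, sec -> hours

tb = 1e12  # 1 TB, illustrative
print(f"1 TB at 1 Gbps:   {transfer_hours(tb, 1e9):.1f} h")   # ~2.2 h
print(f"1 TB at 400 Mbps: {transfer_hours(tb, 400e6):.1f} h") # ~5.6 h
```

Even a single terabyte ties up a gigabit path for hours, which is why a dedicated lightpath, rather than the shared IP network, was used.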
AARNet - CSIRO ATNF contribution
• Created a “dedicated” circuit
• The data from two of the Australian telescopes (Parkes [The Dish] and Mopra) was transferred via light plane to CSIRO Marsfield (Sydney)
• CeNTIE-based fibre from CSIRO Marsfield to the AARNet3 GigaPOP
• SXTransPORT 10G to Seattle
• “Lightpath” to the Joint Institute for VLBI in Europe (JIVE) across CA*net4 and SURFnet optical infrastructure
But………..
• 9 organisations in 4 countries were involved in “making it happen”
• Required extensive human-to-human interaction (mainly emails…….lots of them)
• Although a 1 Gbps path was available, maximum throughput was around 400 Mbps
• Issues with protocols, stack tuning, disk-to-disk transfer, firewalls, different formats, etc.
• Currently scientists and engineers need to test thoroughly before important experiments; not yet “turn up and use”
• Ultimate goal is for the control-plane issues to be transparent to the end-user, who simply presses the “make it happen” icon
Although, time from concept to undertaking the scientific experiment was only 3 weeks……..
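The “stack tuning” issue above typically starts with socket buffers: OS defaults are far smaller than the bandwidth-delay product of a long 1 Gbps path, so TCP cannot fill the pipe. A minimal sketch; the 32 MB figure is an illustrative assumption (roughly 1 Gbps x 250 ms), and the host must also permit it (e.g. net.core.rmem_max / wmem_max on Linux):

```python
import socket

# Illustrative buffer size: ~BDP of a 1 Gbps path with 250 ms RTT
BUF = 32 * 1024 * 1024  # 32 MB

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request large send/receive buffers before connecting;
# the kernel may clamp these to its configured maximums
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)
print("send buffer granted:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
s.close()
```

The granted size often differs from the requested one, which is exactly the kind of per-host tuning the slide says had to be sorted out by hand before each experiment.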
International path for Huygens transfer
EXPReS and the Square Kilometre Array
• SKA will be a bigger data generator than the LHC, but in a remote location
• Australia is one of the countries bidding for the SKA; significant infrastructure challenges
• Also, the European Commission has funded the EXPReS project to link 16 radio telescopes around the world at gigabit speeds
In Conclusion
• Scientists and network engineers working together can exploit the new opportunities that high-capacity networking opens up for “big science”
• Need to solve issues associated with scalability, control plane, and ease of use
• QUESTIONS?