The ROB SDO data system
All of the Sun all of the time: Distributing 1TB/day from the Solar Dynamics Observatory satellite, 24/7 for 5+ years. The ROB SDO data system. David Boyes, Véronique Delouille, Benjamin Mampaey, Tobias Berghoff, Cis Verbeeck, Jean-François Hochedez (Royal Observatory of Belgium).
ROB for BNC 2009 Brussels Slide 1
The ROB SDO data system
All of the Sun all of the time: Distributing 1TB/day from the Solar Dynamics Observatory satellite,
24/7 for 5+ years
ROB for BNC 2009 Brussels Slide 2
The Presentation
The science - studying the Sun
The system - satellite, network, data centres
In practice - getting it all to work
The people - David Boyes, Véronique Delouille, Benjamin Mampaey, Tobias Berghoff, Cis Verbeeck, Jean-François Hochedez (Royal Observatory of Belgium)
ROB for BNC 2009 Brussels Slide 3
Why look at the Sun?
Weather - it affects us but can be forecast
Science - there is still a lot to find out!
– why is the solar corona so hot?
– what drives mass ejections?
– why is there an 11-year cycle?
– … and much more
ROB for BNC 2009 Brussels Slide 4
Why have satellites?
The Earth's atmosphere blocks a lot - UV and above
At these wavelengths the structure of the Sun is revealed
ROB for BNC 2009 Brussels Slide 5
What are the effects on Earth?
Radiation - effects are immediate, goal is to predict them
Particles - arrive in hours or days - we can give warning
This is solar weather forecasting - ROB is a Regional Warning Center
Graphic : NASA
ROB for BNC 2009 Brussels Slide 6
What's up there?
Telescopes to observe the Sun's atmosphere at multiple UV wavelengths (AIA)
Telescopes to measure specific wavelengths to allow calculation of magnetic fields and seismic activity (HMI)
Wide band UV sensor to measure total UV spectrum (EVE)
ROB for BNC 2009 Brussels Slide 7
Some numbers about SDO
A massive increase in data quantity and precision - 1000 to 10000 times as much data as current satellites
Flies in geosynchronous orbit, at an altitude of about 36 000 km
AIA - images at 10 wavelengths from visible to 131Å, one image every 1.25s
HMI – one magnetic image every 45s
EVE – irradiance time series from 10 to 1050Å
Images are 4Kx4K - 32MB per image
A lot of data - more than 1TB/day
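The image size and daily volume quoted above can be checked with a little arithmetic. This is a rough sketch only: it assumes "4K" means 4096 pixels per side and 2 bytes per pixel (which is what reproduces the 32MB figure), counts AIA images alone at the 1.25s cadence, and ignores compression. The function names are illustrative, not part of any SDO software.

```python
SECONDS_PER_DAY = 86400


def image_bytes(width=4096, height=4096, bytes_per_pixel=2):
    """Uncompressed size of one image (assumed 16-bit pixels)."""
    return width * height * bytes_per_pixel


def images_per_day(cadence_s):
    """Number of images produced per day at one image every cadence_s seconds."""
    return SECONDS_PER_DAY / cadence_s


def daily_volume_tb(cadence_s):
    """Uncompressed daily volume in TB (10**12 bytes) for a single image stream."""
    return image_bytes() * images_per_day(cadence_s) / 1e12


if __name__ == "__main__":
    print(f"one image: {image_bytes() / 2**20:.0f} MiB")
    print(f"AIA stream, uncompressed: {daily_volume_tb(1.25):.2f} TB/day")
```

Under these assumptions one image is 32 MiB and the raw AIA stream alone comes to roughly 2.3 TB/day, so the "more than 1TB/day" figure is plausible once the data are held in compact form.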
ROB for BNC 2009 Brussels Slide 8
The challenges of SDO
Huge bandwidth
Lots of data to be made available
Too much data for humans to absorb
ROB for BNC 2009 Brussels Slide 9
The solution
A worldwide network of data stores holding current quarter and popular data
Joined by high-speed network
Pushing a full copy of data to as wide an area as possible in compact form
Software system (netDRMS with internal PostgreSQL) at each data store provides virtual storage for file requests from users
Transparent access to any data, if needed going down to original source data
Local users have the impression they have file access
Web-based mediation interface for remote use
Automatic processing by high performance computing
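The idea of transparent access, falling back to the original source when a data store does not hold a file, can be sketched as a chain of stores. This is a generic illustration only: the `DataStore` class, its fields and the record identifiers are hypothetical and are not the netDRMS interface.

```python
class DataStore:
    """Hypothetical data store: serves local copies, otherwise asks its upstream."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream  # next store towards the original source, or None
        self.files = {}           # record id -> file content held locally

    def get(self, record_id):
        """Return the file, fetching and keeping a local copy if it is not held here."""
        if record_id in self.files:
            return self.files[record_id]
        if self.upstream is None:
            raise KeyError(f"{record_id} not available anywhere in the chain")
        data = self.upstream.get(record_id)
        self.files[record_id] = data  # later requests are served locally
        return data


if __name__ == "__main__":
    source = DataStore("source")
    source.files["example-record"] = b"image bytes"
    local = DataStore("local", upstream=source)
    print(local.get("example-record"))       # fetched from the source
    print("example-record" in local.files)   # now held locally
```

The local user only ever calls `get`, which is the sense in which the store gives the impression of plain file access.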
ROB for BNC 2009 Brussels Slide 10
What's down here?
Ground station - main station at White Sands
Mission Operations Center at Goddard
Joint Science Operations Centers (JSOC) - Stanford and Colorado
Knowledge base – Lockheed, Virtual Observatory - Harvard
Storage at White Sands, JSOC and Data Centres
Compute clusters and data servers at Data Centres
A network of Data Centres...
ROB for BNC 2009 Brussels Slide 11
The Data Centres
ROB for BNC 2009 Brussels Slide 12
What this enables
Many groups working in parallel on the unprecedented flow of data
Simultaneous access and processing of bulk data in many high-performance systems
Online access for forecasters to complete data to refine their techniques
Completely open and low cost access to all data for both researchers with specific interests and for researchers with limited budgets
ROB for BNC 2009 Brussels Slide 13
Does it work?
Yes it does – e.g. two weeks at 320Mb/s from Harvard
ROB for BNC 2009 Brussels Slide 14
Network requirements
Throughput
– One set of data takes around 200Mb/s
– Requires 320Mb/s to handle catch ups
– Practical limit is network chain topology
Availability
– More than five years, probably ten years of operation
– 24/7, 365 days a year
– Must maintain full performance for backbone data system even with subsystem failures
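The throughput figures above can be related to daily volume and to catch-up time after an outage. The sketch below assumes a steady 200Mb/s stream on a link that can carry 320Mb/s, with decimal units (1Mb = 10^6 bits, 1TB = 10^12 bytes); the function names and the six-hour outage are illustrative only.

```python
def mbps_to_tb_per_day(mbps):
    """Convert a sustained rate in Mb/s to TB per day."""
    return mbps * 1e6 * 86400 / 8 / 1e12


def catch_up_hours(outage_hours, stream_mbps=200, link_mbps=320):
    """Hours needed to clear the backlog built up during an outage.

    The backlog grows at stream_mbps while the link is down and is cleared
    with the spare capacity (link_mbps - stream_mbps) once it is back.
    """
    spare = link_mbps - stream_mbps
    if spare <= 0:
        raise ValueError("no spare capacity: the backlog can never be cleared")
    return outage_hours * stream_mbps / spare


if __name__ == "__main__":
    print(f"200 Mb/s sustained = {mbps_to_tb_per_day(200):.2f} TB/day")
    print(f"6 h outage needs {catch_up_hours(6):.1f} h of catch-up")
```

With these numbers a sustained 200Mb/s corresponds to 2.16 TB/day, and every hour of outage costs about 1.7 hours of running at the full 320Mb/s.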
ROB for BNC 2009 Brussels Slide 15
In practice - Bandwidth-Delay product
There are simply a lot of bytes in the cable - this is the Bandwidth-Delay (BD) product
The problem with the TCP protocol is that the buffer size must be >= 2 * bandwidth * delay, and the actual size is adaptive
For example 200Mb/s and 0.1s → 2 * 200Mb/s * 0.1s = 40Mb = 5MB, and you need about twice that for adaptation.
– Standard Linux buffer size is 64K!
Plus you can run into congestion control limits – designed to share traffic fairly!
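The buffer rule above is easy to evaluate for other links. The following is a minimal sketch of the slide's formula (buffer >= 2 * bandwidth * delay); the function name is illustrative, and the factor of two for adaptation headroom is taken from the slide's rule of thumb rather than from any TCP specification.

```python
def tcp_buffer_bytes(bandwidth_bps, delay_s, adaptation_factor=1):
    """Buffer size in bytes from the rule: buffer >= 2 * bandwidth * delay.

    bandwidth_bps is in bits per second and delay_s in seconds, as on the slide.
    adaptation_factor=2 adds the "about twice that" headroom for adaptation.
    """
    return adaptation_factor * 2 * bandwidth_bps * delay_s / 8


if __name__ == "__main__":
    minimum = tcp_buffer_bytes(200e6, 0.1)
    with_headroom = tcp_buffer_bytes(200e6, 0.1, adaptation_factor=2)
    print(f"minimum buffer: {minimum / 1e6:.1f} MB")
    print(f"with headroom:  {with_headroom / 1e6:.1f} MB")
    print(f"ratio to a 64K buffer: {minimum / 65536:.0f}x")
```

For 200Mb/s and 0.1s this gives 5 MB, or 10 MB with headroom, which is roughly 76 times a 64K buffer.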
ROB for BNC 2009 Brussels Slide 16
In practice - BD product
Fixes …
– Use an improved scp (HPN-scp)
– Use multiple sessions
– Use another protocol
– Use a tool which combines these (e.g. GridFTP)
We use multiple sessions in user space
– Raw bandwidth tests used many more than needed
– Production system has tool which interfaces with the data system
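Why multiple sessions help can be sketched numerically: a single TCP session cannot move more than one buffer's worth of data per round trip, so several sessions are run side by side. The sketch below assumes a 64K buffer per session and a 0.2s round-trip time (twice the 0.1s delay used earlier, which is an assumption about how that delay was measured); the helper names and the round-robin file assignment are illustrative and are not the production tool.

```python
import math


def per_session_mbps(window_bytes, rtt_s):
    """Upper bound on one TCP session: one window of data per round trip."""
    return window_bytes * 8 / rtt_s / 1e6


def sessions_needed(target_mbps, window_bytes, rtt_s):
    """Number of parallel sessions needed to reach target_mbps in total."""
    return math.ceil(target_mbps / per_session_mbps(window_bytes, rtt_s))


def assign_round_robin(files, n_sessions):
    """Split a list of files into n_sessions work lists of near-equal length."""
    return [files[i::n_sessions] for i in range(n_sessions)]


if __name__ == "__main__":
    print(f"one session: {per_session_mbps(65536, 0.2):.2f} Mb/s")
    print(f"sessions for 200 Mb/s: {sessions_needed(200, 65536, 0.2)}")
    print(assign_round_robin(["f1", "f2", "f3", "f4", "f5"], 2))
```

Under these assumptions one untuned session tops out at about 2.6Mb/s, so larger buffers and parallel sessions are both needed in practice.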
ROB for BNC 2009 Brussels Slide 17
In practice - routing
Check it - you might be surprised
– Different networks can quite legitimately behave differently
What didn't get noticed with e-mail and web pages can still be a problem
– A delay of the odd few minutes for e-mail doesn't get noticed
– Low speed at 2am probably won't get noticed
ROB for BNC 2009 Brussels Slide 18
In practice - reliability
Can't use terms like guarantee – things will go wrong
Can't be qualitative – this is way beyond normal hardware reliability
– You still need quality, duplication, spares and conservative ratings
Must get quantitative – failure analysis and point of failure identification
– Time to repair (night shifts!) is critical
Must be able to detect failures
– A single failure will not show up in system performance
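Getting quantitative can start from the standard availability relations, sketched below. The formulas are the usual textbook ones (availability = MTBF / (MTBF + MTTR), and 1 - (1 - A)^n for n redundant units); the sketch assumes failures are independent, and the MTBF and repair-time values in the example are hypothetical, not measurements of the ROB system.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one unit from its MTBF and time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(a, copies=2):
    """Availability of `copies` independent units where any one is sufficient."""
    return 1 - (1 - a) ** copies


def downtime_hours_per_year(a):
    """Expected downtime per year for availability a."""
    return (1 - a) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical unit: fails every 10 000 h on average.
    for mttr in (2, 12):  # repaired the same shift vs. waiting for the next day
        single = availability(10_000, mttr)
        pair = redundant_availability(single)
        print(f"MTTR {mttr:>2} h: single {downtime_hours_per_year(single):.2f} h/yr, "
              f"pair {downtime_hours_per_year(pair):.5f} h/yr down")
```

The pair figure only holds while both units are healthy, which is why an undetected single failure quietly removes the benefit of duplication.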
ROB for BNC 2009 Brussels Slide 19
In practice - reliability
This is how the Belnet connections at ROB deliver reliability
Single failures do not affect data flow, regardless of which HA node is active
You must check that no failure has occurred
ROB for BNC 2009 Brussels Slide 20
In practice - last mile
It's here you will have the most problems
– Both ends will need work
– Firewalls
– Routers
– Just where is the cable, really?
– A server is not quite as good as the manufacturer said
Again, situations which might have gone unnoticed will make themselves known
But you are right there to fix them...
ROB for BNC 2009 Brussels Slide 21
Where it's at
The data network is running
The data transfer system is being tested
The system is being documented
So ... it's looking good
ROB for BNC 2009 Brussels Slide 22
Further reading
SDO at the ROB http://wissdom.oma.be
Belnet http://www.belnet.be
SDO http://sdo.gsfc.nasa.gov
ESnet Network Performance Knowledge Base http://fasterdata.es.net
High Performance Enabled SSH/SCP http://www.psc.edu/networking/projects/hpn-ssh
ROB for BNC 2009 Brussels Slide 23
The ROB SDO data system
Thanks for your interest, and success with your projects