![Page 1: Globus GridFTP and RFT: An Overview and New Featureskettimut/talks/NERSC08GridFTPTalk.pdf · What is GridFTP? High-performance, reliable data transfer protocol optimized for high-bandwidth](https://reader030.vdocument.in/reader030/viewer/2022041222/5e0c30e9caa302637730c970/html5/thumbnails/1.jpg)
Globus GridFTP and RFT: An Overview and New Features
Raj Kettimuthu
Argonne National Laboratory and
The University of Chicago
What is GridFTP?
High-performance, reliable data transfer protocol optimized for high-bandwidth wide-area networks
Based on the FTP protocol; defines extensions for high-performance operation and security
We supply a reference implementation:
  Server
  Client tools (globus-url-copy)
  Development libraries
Multiple independent implementations can interoperate
  Fermilab and U. Virginia have home-grown servers that work with ours.
GridFTP
Two-channel protocol, like FTP
Control channel
  Communication link (TCP) over which commands and responses flow
  Low bandwidth; encrypted and integrity-protected by default
Data channel
  Communication link(s) over which the actual data of interest flows
  High bandwidth; authenticated by default; encryption and integrity protection optional
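
Because GridFTP keeps FTP's two-channel design, the data channel endpoint is agreed on over the control channel. The Python sketch below illustrates this with the standard FTP passive-mode reply (the `227` format from RFC 959). The reply string is a made-up example, not output captured from a GridFTP server, and the function name is an assumption for illustration.

```python
import re

def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Decode the data-channel host and port from an FTP '227' passive-mode reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None or not reply.startswith("227"):
        raise ValueError(f"not a passive-mode reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    host = f"{h1}.{h2}.{h3}.{h4}"
    port = p1 * 256 + p2  # the port arrives as two bytes, high byte first
    return host, port

# Hypothetical reply received on the control channel.
reply = "227 Entering Passive Mode (192,0,2,10,195,80)"
host, port = parse_pasv_reply(reply)
print(host, port)
```

The client would then open its data connection(s) to that host and port, while commands and responses keep flowing on the control channel.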
Globus GridFTP
Performance
  Parallel TCP streams
  Non-TCP protocols such as UDT
  Order of magnitude greater
Cluster-to-cluster data movement
  Another order of magnitude
Support for reliable and restartable transfers
Multiple security options
  Anonymous, password, SSH, GSI
Modular and easy to optimize for various storage systems
  HPSS, SRB
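
To make the parallel-stream idea concrete, here is a minimal Python sketch that divides a file into contiguous byte ranges, one per stream. The function name and the even-split policy are assumptions for illustration only; the GridFTP implementation decides for itself how blocks are assigned to streams.

```python
def split_into_ranges(file_size: int, num_streams: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering file_size bytes, one pair per stream."""
    if file_size < 0 or num_streams < 1:
        raise ValueError("file_size must be >= 0 and num_streams >= 1")
    base, extra = divmod(file_size, num_streams)
    ranges = []
    offset = 0
    for i in range(num_streams):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first streams
        ranges.append((offset, length))
        offset += length
    return ranges

# Hypothetical example: a 10-byte file sent over 4 streams.
ranges = split_into_ranges(10, 4)
print(ranges)
```

Each range can be sent on its own connection, so the receiver only needs the offset to place the data correctly regardless of arrival order.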
Cluster-to-Cluster transfers
[Diagram: two control nodes and four data nodes]
Performance
Memory-to-memory transfer between Urbana, IL and San Diego, CA
[Figure: throughput (Gbit/s) versus degree of striping, with one series each for 1, 2, 4, 8, 16, and 32 streams]
Performance
Disk-to-disk transfer between Urbana, IL and San Diego, CA
[Figure: throughput (Gbit/s) versus degree of striping, with one series each for 1, 2, 4, 8, 16, and 32 streams]
Users
The HEP community is basing its entire tiered data movement infrastructure for the LHC Computing Grid on GridFTP
The Southern California Earthquake Center (SCEC), Laser Interferometer Gravitational Wave Observatory (LIGO), Earth System Grid (ESG), Relativistic Heavy Ion Collider (RHIC), and Advanced Photon Source use GridFTP for data movement
The European Space Agency, the Disaster Recovery Center in Japan, and the British Broadcasting Corporation move large volumes of data using GridFTP
GridFTP facilitates an average of more than 2.5 million data transfers every day
New Features
GUI client
SSH security for GridFTP
GridFTP over UDT
Pipelining
Multicasting / Overlay Routing
Scalability
Lotman Storage plugin
Anomaly and bottleneck detection using Netlogger
A GUI client for GridFTP
An alpha version is available at http://www.globus.org/cog/demo/
Java Web Start application
Integrated with myproxy-logon
Certificates can be completely hidden from the user
If certificates are in place, a proxy can be generated through the GUI
Provides support for RFT as well
SSH Security for GridFTP
[Diagram: Client, sshd (port 22, ROOT), and GridFTP Server (USER), connected via ssh stdin/stdout]
SSH Security for GridFTP
Client support for using SSH is automatically enabled
On the server side (where you intend the client to remotely execute a server), run:
  setup-globus-gridftp-sshftp -server
To use SSH as a security mechanism, the user must provide URLs that begin with sshftp:// as arguments:
  globus-url-copy sshftp://<host>:<port>/<filepath> file:/<filepath>
<port> is the port on which sshd listens on the host referred to by <host> (the default value is 22).
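
As a small illustration of the URL form described above, the Python sketch below fills in the `sshftp://` template and builds an argument list for `globus-url-copy`. The helper names, the host name, and the file paths are hypothetical; only the command name and the two URL shapes come from the slide, and no extra flags are assumed.

```python
def sshftp_url(host: str, filepath: str, port: int = 22) -> str:
    """Build an sshftp:// URL; port 22 is the sshd default mentioned on the slide."""
    return f"sshftp://{host}:{port}/{filepath.lstrip('/')}"

def copy_command(host: str, remote_path: str, local_path: str, port: int = 22) -> list[str]:
    """Argument list for copying a remote file to a local path over SSH-secured GridFTP."""
    return [
        "globus-url-copy",
        sshftp_url(host, remote_path, port),
        f"file:/{local_path.lstrip('/')}",
    ]

# Hypothetical host and paths, for illustration only.
cmd = copy_command("gridftp.example.org", "/data/run1.dat", "/tmp/run1.dat")
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run` on a machine where the Globus client tools are installed; the sketch itself only constructs the command.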
GridFTP over UDT
GridFTP uses XIO for network I/O operations
  XIO presents a POSIX-like interface to many different protocol implementations
[Diagram: protocol stacks. Default GridFTP: GSI over TCP. GridFTP over UDT: GSI over UDT]
GridFTP over UDT

| Test | Argonne to NZ throughput (Mbit/s) | Argonne to LA throughput (Mbit/s) |
|---|---|---|
| Iperf, 1 stream | 19.7 | 74.5 |
| Iperf, 8 streams | 40.3 | 117.0 |
| GridFTP mem TCP, 1 stream | 16.4 | 63.8 |
| GridFTP mem TCP, 8 streams | 40.2 | 112.6 |
| GridFTP disk TCP, 1 stream | 16.3 | 59.6 |
| GridFTP disk TCP, 8 streams | 37.4 | 102.4 |
| GridFTP mem UDT | 179.3 | 396.6 |
| GridFTP disk UDT | 178.6 | 428.3 |
| UDT mem | 201.6 | 432.5 |
| UDT disk | 162.5 | 230.0 |
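
To put the numbers above in perspective, this short Python sketch computes how much faster GridFTP over UDT was than GridFTP over 8 TCP streams. It assumes the first value in each row is the Argonne to NZ measurement and the second is Argonne to LA; the dictionary keys and function name are just labels chosen for this example.

```python
# Throughput in Mbit/s, copied from the results above: (Argonne to NZ, Argonne to LA).
throughput = {
    "GridFTP mem TCP, 8 streams": (40.2, 112.6),
    "GridFTP mem UDT": (179.3, 396.6),
    "GridFTP disk TCP, 8 streams": (37.4, 102.4),
    "GridFTP disk UDT": (178.6, 428.3),
}

def speedup(udt_row: str, tcp_row: str) -> tuple[float, ...]:
    """Ratio of UDT to TCP throughput per destination, rounded to one decimal."""
    udt, tcp = throughput[udt_row], throughput[tcp_row]
    return tuple(round(u / t, 1) for u, t in zip(udt, tcp))

mem_speedup = speedup("GridFTP mem UDT", "GridFTP mem TCP, 8 streams")
disk_speedup = speedup("GridFTP disk UDT", "GridFTP disk TCP, 8 streams")
print("memory:", mem_speedup)
print("disk:  ", disk_speedup)
```

On these measurements UDT comes out roughly 3.5 to 5 times faster than 8 parallel TCP streams on both paths.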
Lots of Small Files (LOSF) Problem
Traditional transfer pattern
[Diagram labels: Client, Sender, Receiver, Data]
Pipelining
Allow many outstanding transfer requests
Send next request before previous completes
  Latency is overlapped with the data transfer
Backward compatible
  Wire protocol doesn’t change
  Client side sends commands sooner
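
A rough timing model shows why overlapping latency with data transfer matters for many small files. The Python sketch below is a simplification under stated assumptions: one round trip of request latency per file in the traditional case, only the first round trip exposed in the pipelined case, and a link that is otherwise fully used. The function names and the workload numbers are hypothetical.

```python
def traditional_time(num_files: int, file_bytes: float, rtt_s: float, bandwidth_bps: float) -> float:
    """Each request waits for the previous transfer, so every file pays one round trip."""
    per_file = rtt_s + (file_bytes * 8) / bandwidth_bps
    return num_files * per_file

def pipelined_time(num_files: int, file_bytes: float, rtt_s: float, bandwidth_bps: float) -> float:
    """Requests are sent ahead of time, so only the first round trip is exposed."""
    return rtt_s + num_files * (file_bytes * 8) / bandwidth_bps

# Hypothetical workload: 1000 files of 1 MB each, 100 ms round trip, 1 Gbit/s link.
args = (1000, 1_000_000, 0.1, 1e9)
print(f"traditional: {traditional_time(*args):.1f} s")
print(f"pipelined:   {pipelined_time(*args):.1f} s")
```

In this model the traditional pattern is dominated by the per-file round trips, which is exactly the cost pipelining removes.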
Pipelining
Significant performance improvement for LOSF
[Diagram: side-by-side timelines labeled Traditional and Pipelining, each showing File Request 1 to 3, DATA 1 to 3, and ACK 1 to 3; in the Pipelining timeline the three file requests are issued together]
Multicast / Overlay Routing
Enables GridFTP to transfer a single data set to many locations, or to act as an intermediate routing node
Scalability
Data nodes can be added dynamically: if you need more throughput, add more data nodes
[Diagram: two control nodes and four data nodes]
Storage Plugin
Destination storage might run out of space in the middle of a GridFTP transfer
Lotman: a tool from the University of Wisconsin that manages storage
We developed a plugin for GridFTP to interact with Lotman
Space availability (for individual file transfers) is determined ahead of transfers to Lotman-enabled storage
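
The idea of checking space before each file is transferred can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `StorageLot` and `plan_transfers` are invented for this example and do not reflect Lotman's actual interface or the GridFTP plugin's code.

```python
class StorageLot:
    """Hypothetical stand-in for a managed storage allocation (not Lotman's API)."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.reserved_bytes = 0

    def reserve(self, size_bytes: int) -> bool:
        """Reserve space for one file; refuse if it would not fit."""
        if size_bytes < 0:
            raise ValueError("size_bytes must be non-negative")
        if self.reserved_bytes + size_bytes > self.capacity_bytes:
            return False
        self.reserved_bytes += size_bytes
        return True

def plan_transfers(lot: StorageLot, file_sizes: dict[str, int]) -> tuple[list[str], list[str]]:
    """Check every file before it is sent; return (accepted, rejected) file names."""
    accepted, rejected = [], []
    for name, size in file_sizes.items():
        (accepted if lot.reserve(size) else rejected).append(name)
    return accepted, rejected

# Hypothetical example: a 100-byte allocation and three files.
lot = StorageLot(capacity_bytes=100)
accepted, rejected = plan_transfers(lot, {"a.dat": 60, "b.dat": 50, "c.dat": 40})
print("accepted:", accepted, "rejected:", rejected)
```

Rejecting a file up front avoids the failure mode named on the slide, where the destination fills up partway through a transfer.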
GridFTP with Lotman
[Diagram: Client, GridFTP Server, and Lotman]
Anomaly and Bottleneck Detection using Netlogger
GridFTP server can be instrumented with Netlogger
The server then logs messages that can be post-processed using Netlogger tools
Fine-grained disk and network I/O characteristics can then be visualized and analyzed
Reliable File Transfer Service (RFT)
GridFTP: on-demand transfer service
  Not a queuing service
RFT: a GridFTP client
  Queues requests
  Orchestrates transfers on client’s behalf
  Third-party transfers
  Interacts with many GridFTP servers
  Retries requests on failure
  Recovers from GridFTP and RFT service failures
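
The queue-and-retry behavior described above can be illustrated with a minimal Python sketch. This is not RFT's implementation: the function names, the in-memory queue, the retry limit, and the fake transfer function are all assumptions made for the example. RFT itself is described as keeping its state in a persistent store, which this sketch does not attempt.

```python
from collections import deque

def run_queue(requests, transfer, max_attempts=3):
    """Process queued (source, destination) requests, retrying failed ones.

    `transfer` is a caller-supplied function that raises an exception on failure.
    Returns (completed, failed) lists of requests.
    """
    queue = deque((req, 1) for req in requests)
    completed, failed = [], []
    while queue:
        req, attempt = queue.popleft()
        try:
            transfer(*req)
            completed.append(req)
        except Exception:
            if attempt < max_attempts:
                queue.append((req, attempt + 1))  # put it back for another try
            else:
                failed.append(req)
    return completed, failed

# Fake transfer for demonstration: f1 fails once, f3 always fails.
calls = {}
def fake_transfer(src, dst):
    calls[src] = calls.get(src, 0) + 1
    if src == "siteA:/f1" and calls[src] < 2:
        raise IOError("simulated transient failure")
    if src == "siteA:/f3":
        raise IOError("simulated permanent failure")

requests = [("siteA:/f1", "siteB:/f1"), ("siteA:/f2", "siteB:/f2"), ("siteA:/f3", "siteB:/f3")]
completed, failed = run_queue(requests, fake_transfer)
print("completed:", completed)
print("failed:   ", failed)
```

A transient failure is absorbed by the retry, while a request that keeps failing is reported after the attempt limit instead of blocking the rest of the queue.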
RFT
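[Diagram: RFT Client exchanging SOAP messages and optional notifications with the RFT Service; the RFT Service has a persistent store and control channels (CC) to two GridFTP servers, with a data channel (DC) between the servers]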
RFT - Connection Caching
Control channel connections (and thus the data channels associated with them) are cached for later reuse by the same user
[Diagram: RFT Service with control channels (CC) to two GridFTP servers, and a data channel (DC) between the servers]
RFT - Connection Caching
Reusing connections eliminates authentication overhead on the control and data channels
Measured performance improvement for jobs submitted using Condor-G
For 500 jobs, each requiring file stageIn, stageOut, and cleanup (RFT tasks):
  30% improvement in overall performance
  No timeouts due to overwhelming connection requests to GridFTP servers
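
Connection caching can be sketched as a lookup keyed by user and server. The Python class below is a hypothetical illustration, not RFT's code: the class name, the `(user, host)` key, and the caller-supplied `connect` factory are assumptions, and a real cache would also need to expire idle or broken connections.

```python
class ConnectionCache:
    """Cache control-channel connections keyed by (user, host)."""

    def __init__(self, connect):
        self._connect = connect  # caller-supplied factory: (user, host) -> connection
        self._cache = {}
        self.opened = 0          # number of connections actually created

    def get(self, user: str, host: str):
        key = (user, host)
        if key not in self._cache:
            # Cache miss: pay the connection and authentication cost once.
            self._cache[key] = self._connect(user, host)
            self.opened += 1
        return self._cache[key]

# Demonstration with a fake connection factory: 500 jobs with 3 RFT tasks each,
# all from one user to one server.
cache = ConnectionCache(lambda user, host: object())
for _ in range(500 * 3):
    cache.get("alice", "gridftp.example.org")
print("connections opened:", cache.opened)
```

With the cache, 1500 requests from the same user reuse a single connection, which is the effect that removes repeated authentication and keeps the servers from being flooded with connection requests.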