Data Grid Interactions with Firewalls
Michael Wan, Reagan Moore
{mwan,moore}@sdsc.edu
http://www.npaci.edu/DICE/SRB/
SDSC/UCSD/NPACI
A Quick Overview of SRB Data Grid
• Federated server system
  – Single client sign-on
  – Access to all resources in the federation
  – Data grid owns all files
• Context management
  – MCAT server: metadata catalog
  – Uses a traditional DBMS
• Four logical name spaces
  – Logical resource name (operations on sets of resources)
  – Distinguished user name space
  – Logical file name space
  – Metadata attribute name space (state information)
Federated Servers and Resources
Figure: Federated Data Grids. Data Grid 1 (MCAT1; Server1.1, Server1.2), Data Grid 2 (MCAT2; Server2.1, Server2.2), Data Grid 3 (MCAT3; Server3.1).
Types of Data Loss Risks
• Media corruption
• Vendor systemic failure
• Operational error
• Malicious user
• Natural disaster
• Solutions - replication, firewalls, federation
National Archives Persistent Archive
Figure: three sites, NARA, U Md and SDSC, each running its own MCAT.
• Principal copy stored at NARA with complete metadata catalog
• Replicated copy at U Md for improved access, load balancing and disaster recovery
• Deep archive at SDSC, no user access, but complete copy
BIRN Virtual Data Grid (Source: Mark Ellisman)
• Defines a Distributed Data Handling System
• Integrates Storage Resources in the BIRN network
• Integrates Access to Data, to Computational and Visualization Resources
• Acts as a Virtual Platform for Knowledge-based Data Integration Activities
• Provides a Uniform Interface to Users
Worldwide Universities Network
David De Roure, University of Southampton
http://www.ecs.soton.ac.uk/~dder
• Implement a data grid linking academic universities
• Support collaborative research and education
  – HASTAC: Humanities, Arts, Science and Technology Advanced Collaboratory
– Geo-referenced social science data collections
– Earth Science data collections
• Provide data grid registry to promote federation of international data grids
Foundation of the WUN Grid
• SDSC
• Manchester
• Southampton
• White Rose
• NCSA
• A functioning, general-purpose international Grid
• A hub for federating other data grids
Figure: Manchester-SDSC mirror
Authentication
• User authenticates to a data grid server
  – GSI or challenge response
  – Access controls map constraints between user distinguished names and logical file names
• Data grid server authenticates to remote data grid server
• Remote data grid server authenticates to remote storage repository under data grid ID
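Challenge response, one of the two user authentication options listed above, can be illustrated with a generic HMAC exchange. This is a minimal sketch under stated assumptions, not SRB's actual wire protocol or hashing scheme: the shared-secret table, the user name `srb@sdsc`, and the functions `issue_challenge`, `respond` and `verify` are hypothetical. The point it shows is that the secret never crosses the network, only a keyed digest of a fresh random challenge.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secrets registered per user (SRB's real scheme differs).
USER_SECRETS = {"srb@sdsc": b"correct horse battery staple"}


def issue_challenge() -> bytes:
    """Server side: a fresh random challenge for one login attempt."""
    return secrets.token_bytes(32)


def respond(secret: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(user: str, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response, compare in constant time."""
    secret = USER_SECRETS.get(user)
    if secret is None:
        return False
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    challenge = issue_challenge()
    answer = respond(b"correct horse battery staple", challenge)
    print(verify("srb@sdsc", challenge, answer))       # True
    print(verify("srb@sdsc", challenge, b"\x00" * 32))  # False
```

Because each challenge is random and used once, a captured response cannot simply be replayed against a later challenge.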
Firewall Interactions
• Client behind a firewall
  – Client-initiated parallel I/O
  – Client-initiated bulk file load
• Server behind a firewall
  – Paired servers inside and outside the firewall
  – Server inside the firewall only responds to messages from the outside server
  – Server-initiated parallel I/O
• Federated data grids
  – Need to add metadata to forward messages from a paired front-end server to the back-end server
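One way to picture the paired-server arrangement: a piece of pairing metadata tells the rest of the grid which front-end server to contact for a back-end server, and the back-end server refuses any sender other than its paired front-end. Everything here (`PAIRS`, the example host names, `route`, `BackEndServer`) is hypothetical; it sketches the idea on the slide, not SRB's configuration format.

```python
from dataclasses import dataclass

# Hypothetical pairing metadata: server inside the firewall -> its paired
# front-end server outside the firewall.
PAIRS = {"srb-inside.example.org": "srb-gateway.example.org"}


def route(target: str) -> str:
    """Return the host a remote server must contact to reach `target`.

    Unpaired servers are reached directly; a server behind the firewall is
    reached through its paired front-end.
    """
    return PAIRS.get(target, target)


@dataclass
class BackEndServer:
    name: str

    def handle(self, sender: str, message: str) -> str:
        # The inside server only answers its paired front-end server.
        if sender != PAIRS.get(self.name):
            raise PermissionError(f"{sender} is not the paired front-end of {self.name}")
        return f"{self.name} handled {message!r}"


if __name__ == "__main__":
    inside = BackEndServer("srb-inside.example.org")
    hop = route(inside.name)  # the front-end, not the back-end itself
    print(hop)
    print(inside.handle(hop, "get /home/srb.sdsc/foo"))
```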
Client behind firewall
Figure: Sput issues srbObjCreate and srbObjWrite (steps 1 to 6). SRB server 1 and SRB server 2 each spawn an SRB agent; the MCAT provides 1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control. Labeled flows: peer-to-peer request, server(s) spawning, data transfer to resource R.
Client Initiated Parallel I/O
Figure: Sput -M issues srbObjPut (steps 1 to 8). SRB server 1 and SRB server 2 each spawn an SRB agent; the MCAT provides 1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control. The server returns a socket address, port and cookie; the client connects to the server; data transfer to resource R follows.
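The client-initiated handshake in the diagram (the server returns a socket address, port and cookie; the client connects back and transfers data) can be sketched with plain TCP sockets. In this hypothetical sketch every connection is opened by the client, which is what lets a client behind a firewall still use multiple streams, and the cookie lets the server drop connections that do not belong to the request. The header layout (cookie, offset, length), the stream count, and the names `ParallelReceiver` and `client_put` are assumptions, not SRB's wire format.

```python
import secrets
import socket
import struct
import threading

HEADER = struct.Struct("!16sQQ")  # cookie, byte offset, length (hypothetical framing)


def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    chunks = []
    while n:
        chunk = conn.recv(min(n, 65536))
        if not chunk:
            raise ConnectionError("peer closed connection early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


class ParallelReceiver:
    """Server side: listen, hand out (addr, port, cookie), accept N streams."""

    def __init__(self, size: int, streams: int):
        self.buffer = bytearray(size)
        self.streams = streams
        self.cookie = secrets.token_bytes(16)
        self.sock = socket.create_server(("127.0.0.1", 0))
        self._thread = threading.Thread(target=self._accept_all)
        self._thread.start()

    def contact_info(self):
        host, port = self.sock.getsockname()
        return host, port, self.cookie

    def _accept_all(self):
        workers = []
        for _ in range(self.streams):
            conn, _ = self.sock.accept()
            t = threading.Thread(target=self._serve, args=(conn,))
            t.start()
            workers.append(t)
        for t in workers:
            t.join()
        self.sock.close()

    def _serve(self, conn: socket.socket):
        with conn:
            cookie, offset, length = HEADER.unpack(recv_exact(conn, HEADER.size))
            if cookie != self.cookie:
                return  # not part of this transfer: drop the connection
            self.buffer[offset:offset + length] = recv_exact(conn, length)

    def wait(self) -> bytes:
        self._thread.join()
        return bytes(self.buffer)


def client_put(data: bytes, host: str, port: int, cookie: bytes, streams: int):
    """Client side: open `streams` outbound connections, one byte range each."""
    step = -(-len(data) // streams)  # ceiling division

    def send(i: int):
        offset = i * step
        chunk = data[offset:offset + step]
        with socket.create_connection((host, port)) as s:
            s.sendall(HEADER.pack(cookie, offset, len(chunk)) + chunk)

    threads = [threading.Thread(target=send, args=(i,)) for i in range(streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    payload = bytes(range(256)) * 4000
    rx = ParallelReceiver(len(payload), streams=4)
    host, port, cookie = rx.contact_info()  # "return socket addr., port and cookie"
    client_put(payload, host, port, cookie, streams=4)
    print(rx.wait() == payload)
```

In the server-initiated variant (-m) the direction of the connections is simply reversed: the client listens and the server connects to it.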
Client Initiated - Third Party Data Transfer
Figure: Scp issues srbObjCopy (steps 1 to 6) between SRB servers, each spawning an SRB agent, with the MCAT consulted. Labeled flows: dataPut with socket address, port and cookie; connect to server2; data transfer to resource R.
Client Initiated - Bulk Load Operation
Figure: Sput -b (steps 1 to 6) with SRB server 1, SRB server 2, their SRB agents and the MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control). Labeled steps: query resource, return resource location, bulk register via bulk registration threads, bulk data transfer thread with an 8 Mb buffer, store data in a temp file, unfold temp file onto resource R.
Server behind firewall
Figure: Sput issues srbObjCreate and srbObjWrite (steps 1 to 6). SRB server 1 and SRB server 2 each spawn an SRB agent; the MCAT provides 1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control. Labeled flows: peer-to-peer request, server(s) spawning, data transfer to resource R.
Server Initiated Parallel I/O
Figure: Sput -m issues srbObjPut together with a socket address, port and cookie (steps 1 to 6). SRB server 1 and SRB server 2 each spawn an SRB agent; the MCAT provides 1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control. Labeled flows: peer-to-peer request, connect to client, data transfer to resource R.
Federated Data Grids
Figure: Automating redirection to a server in front of a firewall. A client and three federated data grids: Data Grid 1 (MCAT1; Server1.1, Server1.2), Data Grid 2 (MCAT2; Server2.1, Server2.2), Data Grid 3 (MCAT3; Server3.1).
Container - Archival of Small Files
• Performance issues with storing/retrieving a large number of small files to/from tape
• Container design
  – Physical grouping of small files
  – Implemented with a logical resource
    • A pool of cache resources for the front-end resource
    • An archival resource for the back-end resource
  – Read/write I/O is always done on the cache resource and synced to the archival resource
    • Stage to cache if a cache copy does not exist
    • The entire container is moved between cache and archival and written to tape
• Bulk operation with container - faster
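The container design above (small files packed into one physical object, all I/O on a cache copy, the whole container moved to the archival resource in one operation) can be modelled with two dictionaries standing in for the cache and archival resources. This is a hypothetical sketch: `Container`, `put`, `sync` and `get` only loosely mirror Sput -c, Ssyncont -d and Sget, and the counter `archive_writes` exists to show that many files cost one archival write.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical container: many small files packed into one byte blob."""
    name: str
    cache: dict = field(default_factory=dict)    # stands in for the cache resource
    archive: dict = field(default_factory=dict)  # stands in for the archival resource (tape)
    index: dict = field(default_factory=dict)    # logical file name -> (offset, length)
    archive_writes: int = 0                       # whole-container writes to archive

    def _stage(self) -> bytearray:
        # Stage the whole container to cache if no cache copy exists.
        if self.name not in self.cache:
            self.cache[self.name] = bytearray(self.archive.get(self.name, b""))
        return self.cache[self.name]

    def put(self, logical_name: str, data: bytes) -> None:
        blob = self._stage()
        self.index[logical_name] = (len(blob), len(data))
        blob.extend(data)  # writes always go to the cache copy

    def sync(self, purge: bool = False) -> None:
        # Move the entire container to archival storage in one operation.
        self.archive[self.name] = bytes(self._stage())
        self.archive_writes += 1
        if purge:  # like the -d option: drop the cache copy after syncing
            del self.cache[self.name]

    def get(self, logical_name: str) -> bytes:
        offset, length = self.index[logical_name]
        return bytes(self._stage()[offset:offset + length])


if __name__ == "__main__":
    c = Container("myCont")
    for i in range(1000):
        c.put(f"file{i}.txt", f"small file {i}\n".encode())
    c.sync(purge=True)  # one archival write covers all 1,000 files
    print(c.archive_writes, c.get("file42.txt"))
```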
Examples of Using Containers
• Make a container named “myCont”
  – Smkcont -S cont-sdsc myCont
• Put a file into “myCont”
  – Sput -c myCont myLocalSrcFile mySRBTargFile
• Bulk load a local directory into “myCont”
  – Sbload -c myCont myLocalSrcDir mySRBTargColl
• Sync “myCont” to archival and purge the cache copy
  – Ssyncont -d myCont
• Download a file stored in “myCont”
  – Sget mySRBsrcFile myLocalTargFile
• Slscont: list existing containers and their contents
Summary of Data Transfer modes
• Serial - default mode
• Parallel - for large files
• Bulk - for large number of small files
• Container - archiving small files (to tape)
• Container + bulk - faster archival of small files
Types of Data Transfer
• Local to SRB - Sput, Srsync
• SRB to Local - Sget, Srsync
• SRB to SRB - Scp, Sreplicate, Sbkupsrb, Srsync
  – Third party transfer
    • Server to server data transfer, client not involved
    • Parallel I/O
Other Useful Data Management Scommands
• Srsync, Schksum
  – Data synchronization using checksum values
  – Similar to UNIX's rsync
• Sreplicate, Sbkupsrb
  – Generate multiple copies of data using replicas
  – Replica: multiple copies of the same file
    • Same logical path name, e.g., /home/srb.sdsc/foo
    • Replicas on different resources
    • Each replica has a different replNum
    • Most recently modified flag
Commands Using Checksum
• Registering checksum values into MCAT
  – At the time of upload
    • Sput -k: compute the checksum of the local source file and register it with MCAT
    • Sput -K: checksum verification mode
      – After upload, compute the checksum by reading back the uploaded file
      – Compare it with the checksum generated locally
  – Existing SRB files
    • Schksum: compute and register the checksum if it does not already exist
    • Srsync: if the checksum does not exist
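The -K verification sequence above (checksum the local file, upload it, read the uploaded copy back, compare) and the Schksum rule (register a checksum only if none exists) can be sketched on local files, with a file copy standing in for the upload. MD5 is an assumption, since the slides do not name SRB's checksum algorithm; `upload_with_verify`, `ensure_checksum` and the `catalog` dictionary standing in for MCAT are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path, bufsize: int = 1 << 20) -> str:
    """Checksum a file in fixed-size chunks so large files are not read whole."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def upload_with_verify(src: Path, dst: Path, catalog: dict) -> str:
    """Copy src to dst, read dst back, compare checksums, register on success."""
    local_sum = file_checksum(src)   # like -k: checksum of the local source
    shutil.copyfile(src, dst)        # stands in for the actual upload
    remote_sum = file_checksum(dst)  # like -K: read back the uploaded copy
    if remote_sum != local_sum:
        raise OSError(f"checksum mismatch for {dst}: {remote_sum} != {local_sum}")
    catalog[str(dst)] = local_sum    # register the value (MCAT stand-in)
    return local_sum


def ensure_checksum(path: Path, catalog: dict) -> str:
    """Like Schksum: compute and register a checksum only if none exists."""
    key = str(path)
    if key not in catalog:
        catalog[key] = file_checksum(path)
    return catalog[key]
```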
Srsync command
• Synchronize the data
  – from a local copy to SRB: Srsync myLocalFile s:mySrbFile
  – from an SRB copy to a local file system: Srsync s:mySrbFile myLocalFile
  – between two SRB paths: Srsync s:mySrbFile1 s:mySrbFile2
• Similar to rsync
  – Compare the checksum values of source and target
  – Upload/download source to target if the target does not exist or the checksums differ
  – Save checksum values to MCAT
Srsync command (cont)
• Some Srsync options
  – -r: recursively synchronize a directory/collection
  – -s: use size instead of checksum value to determine synchronization
    • Faster (no checksum computation)
    • Less accurate
  – -m, -M: parallel I/O
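The rsync-like rule described for Srsync (transfer when the target is missing or the checksums differ, or compare sizes instead with -s) fits in a few lines. This hypothetical sketch works on local files; `needs_transfer` and `sync_file` are illustrative names, MD5 is an assumed checksum, and the `catalog` dictionary stands in for saving checksum values to MCAT.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()


def needs_transfer(src: Path, dst: Path, use_size: bool = False) -> bool:
    """Decide whether src must be copied over dst."""
    if not dst.exists():
        return True
    if use_size:  # like -s: no checksum computation, but misses same-size changes
        return src.stat().st_size != dst.stat().st_size
    return md5sum(src) != md5sum(dst)


def sync_file(src: Path, dst: Path, catalog: dict, use_size: bool = False) -> bool:
    """Copy src over dst when needed; returns True if a copy happened."""
    copied = needs_transfer(src, dst, use_size)
    if copied:
        shutil.copyfile(src, dst)
    if not use_size:
        catalog[str(dst)] = md5sum(dst)  # save the checksum (MCAT stand-in)
    return copied
```

The test below shows the "less accurate" trade-off directly: after the target is overwritten with different bytes of the same length, the size check reports nothing to do while the checksum check does not.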
Sreplicate, Sbkupsrb commands
• Generate multiple copies of data using replicas
• Sreplicate - Generate a new replica each time
• Sbkupsrb
  – Backs up the SRB data/collection to the specified backupResource with a replica
  – If an up-to-date replica already exists in the backupResource, nothing is done
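The replica model on these slides (one logical path name, copies on different resources, a distinct replNum per copy, a most-recently-modified flag) and the contrast between Sreplicate and Sbkupsrb can be sketched as follows. The data layout and the names `Replica`, `ReplicaCatalog`, `replicate`, `backup` and `modify` are hypothetical; in SRB this state lives in MCAT. Refreshing a stale copy in place during backup is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    repl_num: int
    resource: str
    up_to_date: bool  # stands in for the "most recently modified" flag


class ReplicaCatalog:
    """Hypothetical catalog: logical path name -> list of replicas."""

    def __init__(self):
        self.replicas: dict[str, list[Replica]] = {}

    def put(self, path: str, resource: str) -> None:
        self.replicas[path] = [Replica(0, resource, True)]

    def replicate(self, path: str, resource: str) -> Replica:
        """Like Sreplicate: always add a new replica with the next replNum."""
        copies = self.replicas[path]
        new = Replica(max(r.repl_num for r in copies) + 1, resource, True)
        copies.append(new)
        return new

    def backup(self, path: str, backup_resource: str) -> bool:
        """Like Sbkupsrb: no-op if an up-to-date replica already exists there."""
        for r in self.replicas[path]:
            if r.resource == backup_resource:
                if r.up_to_date:
                    return False  # nothing will be done
                r.up_to_date = True  # assumption: stale copy refreshed in place
                return True
        self.replicate(path, backup_resource)
        return True

    def modify(self, path: str, repl_num: int) -> None:
        """Writing one replica makes it the most recent and the others stale."""
        for r in self.replicas[path]:
            r.up_to_date = r.repl_num == repl_num
```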
Data and Resource Virtualisation
• Data and collections organisation
  – File logical name space
    • UNIX-like directories (collections) and files (data)
    • Mapping of logical name to physical attributes: host address, physical path
    • UNIX-like API and utilities for making collections (mkdir) and creating data (creat)
• Virtualisation of resources
  – Mapping of a logical resource name to physical attributes: resource location, type
  – Clients use a single logical name to reference a resource
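The two mappings described above, logical file name to host address and physical path, and logical resource name to location and type, amount to catalog lookups. The sketch below keeps the catalog in memory; the resource entry reuses `unix-sdsc` and `srb.sdsc.edu` from the resource listing in this presentation, while the physical path and the functions `mkcoll`, `creat` and `resolve` are hypothetical stand-ins for the mkdir/creat-like operations.

```python
from dataclasses import dataclass
from posixpath import dirname


@dataclass(frozen=True)
class Resource:
    location: str  # host serving the resource
    rtype: str     # e.g. "unix file system"


# Logical resource name -> physical attributes.
RESOURCES = {"unix-sdsc": Resource("srb.sdsc.edu", "unix file system")}

collections = {"/home"}  # logical collections (like directories)
files = {}               # logical file name -> (logical resource, physical path)


def mkcoll(path: str) -> None:
    """Like mkdir: the parent collection must already exist."""
    if dirname(path) not in collections:
        raise FileNotFoundError(dirname(path))
    collections.add(path)


def creat(path: str, resource: str, physical_path: str) -> None:
    """Like creat: register a logical file on a logical resource."""
    if dirname(path) not in collections:
        raise FileNotFoundError(dirname(path))
    if resource not in RESOURCES:
        raise KeyError(f"unknown logical resource {resource}")
    files[path] = (resource, physical_path)


def resolve(path: str) -> tuple[str, str, str]:
    """Map a logical file name to (host, resource type, physical path)."""
    resource, physical_path = files[path]
    res = RESOURCES[resource]
    return res.location, res.rtype, physical_path
```

Clients only ever name `/home/srb.sdsc/foo` and `unix-sdsc`; moving the data means updating the catalog entries, not the names clients use.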
Listing Resources
• SgetR: list configured resources
  SgetR
  --------------------------- RESULTS ------------------------------
  rsrc_name: unix-sdsc
  netprefix: srb.sdsc.edu:NULL:NULL
  rsrc_typ_name: unix file system
  default_path: /misc/srb/srb/SRBVault/?USER.?DOMAIN/?SPLITPATH/TEST.?PATH?DATANAME.?RANDOM.?TIMESEC
  phy_default_path: /misc/srb/srb/SRBVault/?USER.?DOMAIN/?SPLITPATH/TEST.?PATH?DATANAME.?RANDOM.?TIMESEC
  phy_rsrc_name: unix-sdsc
  rsrc_typ_name: unix file system
  rsrc_class_name: permanent
  user_name: srb
  domain_desc: sdsc
  zone_id: sdscdemo
  -----------------------------------------------------------------
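SgetR prints `name: value` pairs between dashed rules. A small parser, hypothetical and based only on the sample listing above, turns that into one dictionary per block. Because `rsrc_typ_name` appears twice in the sample, every key maps to a list of values rather than being overwritten; splitting on the first colon keeps values such as `srb.sdsc.edu:NULL:NULL` intact.

```python
def parse_sgetr(text: str) -> list[dict[str, list[str]]]:
    """Parse SgetR-style output into one dict per dashed-rule block."""
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("-"):
            # A dashed rule opens or closes a block.
            if current:
                records.append(current)
            current = {}
            continue
        if current is None or ":" not in line:
            continue  # skip the command echo and blank lines
        key, _, value = line.partition(":")
        current.setdefault(key.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records


SAMPLE = """SgetR
--------------------------- RESULTS ------------------------------
rsrc_name: unix-sdsc
netprefix: srb.sdsc.edu:NULL:NULL
rsrc_typ_name: unix file system
phy_rsrc_name: unix-sdsc
rsrc_typ_name: unix file system
zone_id: sdscdemo
-----------------------------------------------------------------
"""

if __name__ == "__main__":
    for rec in parse_sgetr(SAMPLE):
        print(rec["rsrc_name"][0], rec["netprefix"][0], rec["zone_id"][0])
```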
Serial Mode Data Transfer
• Simple to implement and use
  – Unix-like API: srbObjCreate, srbObjWrite
• Performance issues
  – 2-hop data transfer
  – Single data stream
  – One file at a time: overhead relatively high for small files
    • MCAT interaction: query and registration
    • Small buffer transfer
• Large files: single hop, multiple data streams
• Small files: single hop, multiple files at a time
Upload a File to an SRB Resource
• Sput -S unix-sdsc localFile srbFile
  – Default data transfer mode: serial
• Sls -l srbFile
  – srb 0 unix-sdsc 2764364 2004-08-21-18.19 % srbFile
Small files Data Transfer (Bulk operation)
• Upload/download a large number of small files
  – One file at a time: relatively high overhead
    • MCAT interaction, small buffer transfer
    • <= 0.5 sec/file for LAN, > 1 sec/file for WAN
• Bulk operation
  – Bulk data transfer
    • Transfer multiple files in a single large buffer (8 Mb)
  – Bulk registration
    • Register a large number of files (1,000) in a single call
  – Multiple threads for transfer and registration
  – Single hop
  – 3-10 times speedup
  – All-or-nothing type operation
  – Specify -b in Sput/Sget
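The bulk mode packs many small files into one large buffer and registers them in batches (8 Mb per buffer and 1,000 files per call on the slide; the sketch reads the buffer size as 8 megabytes), so per-file overhead is paid once per buffer or batch instead of once per file. The sketch shows the packing, the matching "unfold" step on the receiving side, and registration batching. The length-prefixed record layout and the names `pack_buffers`, `unfold` and `batches` are assumptions, not SRB's bulk format, and the threads and all-or-nothing semantics are left out.

```python
import struct
from typing import Iterable, Iterator

RECORD = struct.Struct("!HI")   # name length, data length (hypothetical layout)
BUF_SIZE = 8 * 1024 * 1024      # transfer buffer size, read as 8 megabytes
REG_BATCH = 1000                # files registered per catalog call, as on the slide


def pack_buffers(files: Iterable[tuple[str, bytes]], buf_size: int = BUF_SIZE) -> Iterator[bytes]:
    """Yield buffers holding as many whole (name, data) records as fit."""
    buf = bytearray()
    for name, data in files:
        encoded = name.encode()
        record = RECORD.pack(len(encoded), len(data)) + encoded + data
        if buf and len(buf) + len(record) > buf_size:
            yield bytes(buf)
            buf = bytearray()
        buf += record  # an oversized file still gets a buffer of its own
    if buf:
        yield bytes(buf)


def unfold(buffer: bytes) -> Iterator[tuple[str, bytes]]:
    """Receiver side: split a buffer back into (name, data) pairs."""
    pos = 0
    while pos < len(buffer):
        name_len, data_len = RECORD.unpack_from(buffer, pos)
        pos += RECORD.size
        name = buffer[pos:pos + name_len].decode()
        pos += name_len
        yield name, buffer[pos:pos + data_len]
        pos += data_len


def batches(names: list[str], size: int = REG_BATCH) -> Iterator[list[str]]:
    """Group logical names for bulk registration calls."""
    for i in range(0, len(names), size):
        yield names[i:i + size]
```

With buffers of several megabytes and files of a few kilobytes, thousands of files travel per round trip, which is where the per-file MCAT and small-buffer overhead listed above is amortized.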
Parallel Mode Data Transfer
• For large file transfer
  – Multiple data streams
  – Single-hop data transfer
• Two sub-modes
  – Server initiated
  – Client initiated (for clients behind a firewall)
• Up to 5 times speedup for WAN
• Two simple APIs: srbObjPut and srbObjGet
• Use -m (server initiated) or -M (client initiated) options
• Available to all Scommands involving data transfer
  – As an option: Sput, Sget, Srsync
  – Automatic: Sreplicate, Scp, Sbkupsrb, SsyncD, Ssyncont