Part Four: The LSC DataGrid
Part Four: LSC DataGrid
• A: Data Replication
• B: What is the LSC DataGrid?
• C: The LSCDataFind tool
A: Data Replication
General Principle
Not all pipes are created equal. Neither are all storage locations.
Data Requirements
• Catalog 10^8 files and their locations
• What files are where (possibly at more than one place)
• Across multiple sites within a Grid
• No single point of failure
• No central catalog/server
Data Replication Services: Concepts
• Abstract logical file name (LFN) from physical filename (PFN)
• Maintain a local replica catalog (LRC) mapping from LFNs to PFNs only for local files.
• Maintain a replica location index (RLI) mapping LFNs to other sites’ LRCs for files that aren’t local.
Replica Location Service
Site A (rls://serverA:39281), holding file1 and file2
LRC:
file1 → gsiftp://serverA/file1
file2 → gsiftp://serverA/file2
RLI:
file3 → rls://serverB/file3
file4 → rls://serverB/file4

Site B (rls://serverB:39281), holding file3 and file4
LRC:
file3 → gsiftp://serverB/file3
file4 → gsiftp://serverB/file4
RLI:
file1 → rls://serverA/file1
file2 → rls://serverA/file2
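The two-site picture can be written down as plain data. The following is only a minimal sketch in Python, not the Globus RLS API: the `Site` class and its field names are hypothetical, and the catalog entries are copied from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One RLS server with its two catalogs (hypothetical model, not the RLS API)."""
    rls_url: str
    lrc: dict[str, str] = field(default_factory=dict)  # LFN -> PFN, local files only
    rli: dict[str, str] = field(default_factory=dict)  # LFN -> pointer to a remote RLS


site_a = Site(
    rls_url="rls://serverA:39281",
    lrc={"file1": "gsiftp://serverA/file1", "file2": "gsiftp://serverA/file2"},
    rli={"file3": "rls://serverB/file3", "file4": "rls://serverB/file4"},
)
site_b = Site(
    rls_url="rls://serverB:39281",
    lrc={"file3": "gsiftp://serverB/file3", "file4": "gsiftp://serverB/file4"},
    rli={"file1": "rls://serverA/file1", "file2": "rls://serverA/file2"},
)

# Every file one site indexes remotely is held locally by the other site.
a_index_matches_b = set(site_a.rli) == set(site_b.lrc)
b_index_matches_a = set(site_b.rli) == set(site_a.lrc)
```

Note that neither site stores the other's PFNs. The RLI only records where to ask, which is what removes the need for a central catalog.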
RLS: Replica Location Service
• Globus RLS
• Each RLS server usually runs two catalogs:
  • LRC: Local Replica Catalog
    • Catalog of what files you have (LFNs) and mappings to URL(s) or PFNs
  • RLI: Replica Location Index
    • Catalog of which files (LFNs) other LRCs in your data grid know about
A Site’s LRC
• Each site has an LRC with mappings of LFNs to PFNs
  • usually contains the “local” mappings
  • where files are located at the site
• Example: UWM might have this mapping in its LRC:
H-R-792845521-16.gwf → gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf
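The LFN in this example appears to follow the usual LIGO frame-file naming pattern of observatory, frame type, GPS start time, and duration in seconds. Assuming that pattern holds, a small Python sketch can split an LFN into its parts. The `FrameName` type and `parse_frame_lfn` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class FrameName(NamedTuple):
    observatory: str  # e.g. "H"
    frame_type: str   # e.g. "R"
    gps_start: int    # GPS time of the first sample
    duration: int     # length of the frame in seconds


def parse_frame_lfn(lfn: str) -> FrameName:
    """Split an LFN of the assumed form <obs>-<type>-<gps_start>-<duration>.gwf."""
    stem, _, extension = lfn.rpartition(".")
    if extension != "gwf":
        raise ValueError(f"not a .gwf frame file name: {lfn!r}")
    parts = stem.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected four dash-separated fields: {lfn!r}")
    observatory, frame_type, gps_start, duration = parts
    return FrameName(observatory, frame_type, int(gps_start), int(duration))


frame = parse_frame_lfn("H-R-792845521-16.gwf")
print(frame)
# FrameName(observatory='H', frame_type='R', gps_start=792845521, duration=16)
```

The LFN carries no location information at all, which is why the LRC is needed to map it to a PFN such as the gsiftp URL above.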
LRCs Inform Each Other
The LRC at each site tells remote RLIs which LFNs it has mappings for.
• Example: UWM tells Caltech it has a mapping for H-R-792845521-16.gwf
• So Caltech's RLI has the mapping H-R-792845521-16.gwf → LRC at Milwaukee
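The update from an LRC to remote RLIs can be pictured as follows. This is a rough Python sketch under simplifying assumptions, not how Globus RLS actually transmits updates: the `publish_lfns` function and the in-memory dictionaries are hypothetical stand-ins for the real catalogs.

```python
def publish_lfns(
    lrc: dict[str, str],
    lrc_name: str,
    remote_rlis: list[dict[str, set[str]]],
) -> None:
    """Tell each remote RLI which LFNs this LRC has mappings for.

    Only the LFN and the name of the LRC are sent; the PFNs stay local.
    """
    for rli in remote_rlis:
        for lfn in lrc:
            rli.setdefault(lfn, set()).add(lrc_name)


uwm_lrc = {
    "H-R-792845521-16.gwf":
        "gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf",
}
caltech_rli: dict[str, set[str]] = {}

publish_lfns(uwm_lrc, "LRC at Milwaukee", [caltech_rli])
print(caltech_rli)
# {'H-R-792845521-16.gwf': {'LRC at Milwaukee'}}
```

Each RLI entry holds a set because the same LFN may be registered at more than one site.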
How it Works (Under the Hood)
Ask your local LRC: “Do you know about file X?”
• If yes, you can ask your local LRC for the corresponding URL (PFN).
• If no,
  • Ask your local RLI: “Who do I ask about X?”
  • It will answer, “The RLS server at Site Y.”
  • Ask the LRC at Site Y, “Do you know about file X?”
  • It will return the PFN.
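The lookup steps above can be sketched in a few lines of Python. This is an illustration only: the `resolve` function, the nested-dictionary layout, and the site keys "UWM" and "Caltech" are assumptions made for the example, and the real RLS is queried over the network rather than through in-memory dictionaries.

```python
from typing import Optional

# Hypothetical in-memory stand-in for two RLS servers.
sites = {
    "UWM": {
        "lrc": {
            "H-R-792845521-16.gwf":
                "gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf",
        },
        "rli": {},
    },
    "Caltech": {
        "lrc": {},
        "rli": {"H-R-792845521-16.gwf": "UWM"},
    },
}


def resolve(lfn: str, local_site: str) -> Optional[str]:
    """Return a PFN for lfn, asking the local LRC first, then the local RLI."""
    local = sites[local_site]

    # Step 1: ask the local LRC.
    if lfn in local["lrc"]:
        return local["lrc"][lfn]

    # Step 2: ask the local RLI which site to ask.
    remote_site = local["rli"].get(lfn)
    if remote_site is None:
        return None  # nobody in the grid has told us about this file

    # Step 3: ask the LRC at that site for the PFN.
    return sites[remote_site]["lrc"].get(lfn)


pfn_from_caltech = resolve("H-R-792845521-16.gwf", "Caltech")
pfn_from_uwm = resolve("H-R-792845521-16.gwf", "UWM")
unknown = resolve("no-such-file.gwf", "Caltech")
```

Both successful lookups return the same gsiftp URL. The Caltech query simply takes one extra hop through its RLI.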
SRB: Storage Resource Broker
• http://www.sdsc.edu/srb/
• Distributed data management solution
• Supports management, collaborative (and controlled) sharing, publication, and preservation of distributed data collections
• Provides a rich set of APIs available to higher-level applications
• Provides a management layer on top of a wide variety of storage systems.
SRB
• SRB can be thought of as a:
  • Distributed file system
  • Datagrid management system
  • Digital Library system
  • Semantic Web
SRB as Data Grid Management
• Transparent replication
• Archiving, caching, synchronization, and backups
• Heterogeneous storage
• Container and aggregated data movement
• Bulk data ingestion
• Third-party copy & move
LDR: Lightweight Data Replicator
• http://www.lsc-group.phys.uwm.edu/LDR
• Replicates datasets within a data grid
  • High-speed data transfers with Globus GridFTP
  • Globus RLS with a MySQL backend
  • Metadata stored in a MySQL backend
  • Uses GSI for security
LDR
• Collections of files to be replicated are defined by the LDR administrator as a SQL query
• Priority queue for scheduling replication
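To make the idea of a collection defined by a SQL query plus a priority queue concrete, here is a small Python sketch. It is not LDR's actual schema or scheduler: the `metadata` table, its columns, the priorities, and two of the three file names are hypothetical, and SQLite stands in for the MySQL backend so that the example is self-contained.

```python
import heapq
import sqlite3

# In-memory SQLite database standing in for LDR's MySQL metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    "lfn TEXT PRIMARY KEY, observatory TEXT, frame_type TEXT, gps_start INTEGER)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        ("H-R-792845521-16.gwf", "H", "R", 792845521),
        ("H-R-792845537-16.gwf", "H", "R", 792845537),  # hypothetical file
        ("L-R-792845521-16.gwf", "L", "R", 792845521),  # hypothetical file
    ],
)

# Each collection is a (priority, SQL query) pair; a lower number is more urgent.
collections = [
    (5, "SELECT lfn FROM metadata WHERE observatory = 'L'"),
    (1, "SELECT lfn FROM metadata WHERE observatory = 'H' AND frame_type = 'R'"),
]

# Fill the priority queue with every file selected by each collection.
queue: list[tuple[int, str]] = []
for priority, query in collections:
    for (lfn,) in conn.execute(query):
        heapq.heappush(queue, (priority, lfn))

# Pop files in scheduling order; ties within a priority break on the LFN string.
transfer_order = []
while queue:
    _, lfn = heapq.heappop(queue)
    transfer_order.append(lfn)

print(transfer_order)
# ['H-R-792845521-16.gwf', 'H-R-792845537-16.gwf', 'L-R-792845521-16.gwf']
```

Defining a collection as a query means newly registered files that match are picked up the next time the query runs, without editing a file list by hand.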
B: What is the LSC DataGrid?
What is the LSC DataGrid?
• A collection of LSC computational and storage resources…
• … linked through Grid middleware…
• … into a uniform LSC data analysis environment.
LSC DataGrid Sites
• Tier 1: Caltech
• Tier 2: UWM and PSU
• Tier 3: UT-Brownsville and Salish Kootenai College (SKC)
• Linux clusters at GEO sites Birmingham, Cardiff and the Albert Einstein Institute (AEI)
• LDAS instances at Caltech, MIT, PSU, and UWM
Monitoring the LSC DataGrid
http://watchtower.phys.uwm.edu/ganglia-webfrontend/
Lab 4: LSCDataFind
• In this lab, you’ll:
  • Verify your DataFind configuration
  • Find observatories
  • Find data types
  • Find actual data (wow!)
  • Refine a search
  • Retrieve data you’ve found
Credits
• NSF disclaimer
• Portions of this presentation were adapted from the following sources:
  • GriPhyN Grid Summer Workshop
  • NEESgrid Sysadmin Workshop