GCA Application in STAR GCA Collaboration
Grand Challenge Architecture and its Interface to STAR
Sasha Vaniachine
presenting for the Grand Challenge collaboration
(http://www-rnc.lbl.gov/GC/)
March 27, 2000
STAR MDC3 Analysis Workshop
Outline
• GCA Overview
• STAR Interface:
  – fileCatalog
  – tagDB
  – StGCAClient
• Current Status
• Conclusion
GCA: Grand Challenge Architecture
• An order-optimized prefetch architecture for data retrieval from multilevel storage in a multiuser environment
• Queries select events and specific event components based upon tag attribute ranges
  – query estimates are provided prior to execution
  – collections as queries are also supported
• Because event components are distributed over several files, processing an event requires delivery of a “bundle” of files
• Events are delivered in an order that takes advantage of what is already on disk, with multiuser, policy-based prefetching of further data from tertiary storage
• GCA intercomponent communication is CORBA-based, but physicists are shielded from this layer
Participants
• NERSC/Berkeley Lab
  – L. Bernardo, A. Mueller, H. Nordberg, A. Shoshani, A. Sim, J. Wu
• Argonne
  – D. Malon, E. May, G. Pandola
• Brookhaven Lab
  – B. Gibbard, S. Johnson, J. Porter, T. Wenaus
• Nuclear Science/Berkeley Lab
  – D. Olson, A. Vaniachine, J. Yang, D. Zimmerman
Problem
• There are several
  – Not all data fits on disk ($$)
    • Part of 1 year’s DSTs fit on disk
      – What about last year, 2 years ago?
      – What about hits, raw?
  – Available disk bandwidth means data read into memory must be efficiently used ($$)
    • Don’t read unused portions of the event
    • Don’t read events you don’t need
  – Available tape bandwidth means files read from tape must be shared by many users, and files should not contain unused bytes ($$$$)
  – Facility resources are sufficient only if used efficiently
    • Should operate steady-state (nearly) fully loaded
Bottlenecks
Keep recently accessed data on disk, but manage it so unused data does not waste space.
Try to arrange that 90% of file accesses go to disk and only 10% are retrieved from tape.
Solution Components
• Split event into components across different files so that most bytes read are used
  – Raw, tracks, hits, tags, summary, trigger, …
• Optimize file size so tape bandwidth is not wasted
  – 1 GB files, which means a different # of events in each file
• Coordinate file usage so tape access is shared
  – Users select all files at once
  – System optimizes retrieval and order of processing
• Use disk space & bandwidth efficiently
  – Operate disk as cache in front of tape
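The "disk as cache in front of tape" idea can be illustrated with a small sketch. It assumes a least-recently-used purge policy and a hypothetical `DiskCache` class. The slides do not say which purge policy GCA applies, and a real cache would be sized in bytes rather than in number of files.

```python
from collections import OrderedDict


class DiskCache:
    """Toy disk cache in front of tape (hypothetical; LRU purging is an assumption)."""

    def __init__(self, capacity_files):
        self.capacity = capacity_files
        self.files = OrderedDict()  # file name -> True, least recently used first
        self.disk_hits = 0
        self.tape_reads = 0

    def access(self, name):
        """Return 'disk' if the file is cached, otherwise stage it and return 'tape'."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            self.disk_hits += 1
            return "disk"
        # Not on disk: stage from tape, purging the least recently used file if full.
        self.tape_reads += 1
        if len(self.files) >= self.capacity:
            self.files.popitem(last=False)
        self.files[name] = True
        return "tape"


cache = DiskCache(capacity_files=2)
sources = [cache.access(f) for f in ["a", "b", "a", "c", "b", "a"]]
```

With only two cache slots and this access pattern most reads go to tape, which is the situation that coordinated, shared file access is meant to avoid.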
STAR Event Model
T. Ullrich, Jan. 2000
Analysis of Events
• 1M events = 100GB – 1TB
  – 100 – 1000 files (or more if not optimized)
• Need to coordinate event associations across files
• Probably have filtered some % of events
  – Suppose 25% failed cuts after trigger selection
    • Increase speed by not reading these 25%
• Run several batch jobs for same analysis in parallel to increase throughput
• Start processing with files already on disk without waiting for staging from HPSS
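The file counts quoted above follow from simple arithmetic. The sketch below assumes 1 GB files (the size mentioned under Solution Components) and per-event sizes of 100 kB to 1 MB, which is what the 100GB – 1TB range implies for 1M events. The function name is illustrative only.

```python
GB = 10**9


def files_needed(n_events, bytes_per_event, file_size=GB):
    """Number of fixed-size files needed to hold n_events (ceiling division)."""
    total = n_events * bytes_per_event
    return -(-total // file_size)


n_events = 1_000_000
small = files_needed(n_events, 100_000)    # 100 kB/event, 100 GB in total
large = files_needed(n_events, 1_000_000)  # 1 MB/event, 1 TB in total

# If 25% of the events fail the cuts and are never read, 75% remain to be read.
events_read = n_events * 75 // 100
```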
In the Details
– Range-query language, or query by event list
  • “NLa>700 && run=101007”
  • {e1,r101007;e3,r101007;e7,r101007 …}
  • Select components: dst, geant, …
– Query estimation
  • # events, # files, # files on disk, how long, …
  • Avoid executing incorrect queries
– Order optimization
  • Order of events you get maximizes file sharing and minimizes reads from HPSS
– Policies
  • # of pre-fetch, # queries/user, # active pftp connections, …
  • Tune behavior & performance
– Parallel processing
  • Submitting same query token in several jobs will cause each job to process part of that query
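A minimal sketch of the range query, the query estimate, and the delivery order described above. The tag records, file names, and function names are hypothetical, and the ordering rule (files already on disk first, then grouped by file) is a single-query simplification of what a multiuser optimizer would do.

```python
# Hypothetical tag records: (run, event, NLa, file holding the event's dst component).
TAGS = [
    (101007, 1, 731, "f1"),
    (101007, 2, 800, "f2"),
    (101007, 3, 345, "f1"),
    (101007, 4, 905, "f3"),
    (101008, 5, 950, "f3"),
]
ON_DISK = {"f2"}  # files assumed to be staged on disk already


def select(tags, nla_min, run):
    """Evaluate the range query 'NLa > nla_min && run = run'."""
    return [t for t in tags if t[2] > nla_min and t[0] == run]


def estimate(hits, on_disk):
    """Summarize a query before running it: events, files, files on disk."""
    files = {t[3] for t in hits}
    return {"events": len(hits), "files": len(files),
            "files_on_disk": len(files & on_disk)}


def delivery_order(hits, on_disk):
    """Deliver events whose file is on disk first; group the rest by file."""
    return sorted(hits, key=lambda t: (t[3] not in on_disk, t[3], t[1]))


hits = select(TAGS, 700, 101007)
summary = estimate(hits, ON_DISK)
ordered_events = [t[1] for t in delivery_order(hits, ON_DISK)]
```

Checking `summary` before execution is the cheap step that lets a user catch a mistyped cut before any tape is mounted.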
Organization of Events in Files
[Figure: rows labeled Event Identifiers (Run#, Event#), Event components, and Files, with the files grouped into File bundle 1, File bundle 2, and File bundle 3]
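One way to picture a file bundle is as the set of files that must all be present to process a given event. The sketch below groups events that need exactly the same files. The mapping, file names, and `build_bundles` function are hypothetical illustrations, not the GCA data model.

```python
from collections import defaultdict

# Hypothetical mapping: (run, event) -> file holding each requested component.
EVENT_FILES = {
    (101007, 1): {"dst": "dst_A", "tags": "tags_A"},
    (101007, 2): {"dst": "dst_A", "tags": "tags_A"},
    (101007, 3): {"dst": "dst_B", "tags": "tags_A"},
    (101007, 4): {"dst": "dst_B", "tags": "tags_B"},
}


def build_bundles(event_files, components):
    """Group events by the exact set of files needed to process them."""
    bundles = defaultdict(list)
    for event_id, files in sorted(event_files.items()):
        bundle = frozenset(files[c] for c in components)
        bundles[bundle].append(event_id)
    return dict(bundles)


bundles = build_bundles(EVENT_FILES, ["dst", "tags"])
```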
GCA System Overview
[Figure: system diagram. Labels: Client (several), GCA STACS, Staged event files, Event Tags, (Other) disk-resident event data, Index, HPSS, pftp, fileCatalog]
STACS: STorage Access Coordination System
[Figure: component diagram. Components: Bit-Sliced Index, File Catalog, Policy Module, Query Estimator, Query Monitor, Cache Manager. Arrow labels: Query; Estimate; File Bundles, Event lists; List of file bundles and events; Query Status, CacheMap; Requests for file caching and purging; pftp and file purge commands]
Interfacing GCA to STAR
[Figure: block diagram. Labels: GC System, STAR Software, GCA Interface, database, StIOMaker, fileCatalog, tagDB, QueryMonitor, CacheManager, QueryEstimator, IndexBuilder, gcaClient, FileCatalog, IndexFeeder]
Limiting Dependencies
STAR-specific
• IndexFeeder server
  – IndexFeeder reads the “tag database” so that the GCA “index builder” can create the index
• FileCatalog server
  – FileCatalog queries the “file catalog” database of the experiment to translate fileID to HPSS & disk path
& GCA-dependent
• gcaClient interface
  – Experiment sends queries and gets back filenames through the gcaClient library calls
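The fileID translation performed by the FileCatalog server can be pictured as a keyed lookup. The sketch below is only an illustration: the catalog contents, the paths, and the `translate` function are made up, and a dictionary stands in for the experiment's database.

```python
# Hypothetical catalog rows: fileID -> (HPSS path, disk path). All paths are made up.
CATALOG = {
    17: ("/hpss/example/run101007.dst.root", "/disk/example/run101007.dst.root"),
    18: ("/hpss/example/run101007.tags.root", "/disk/example/run101007.tags.root"),
}


def translate(file_id, catalog=CATALOG):
    """Translate a fileID into its HPSS and disk paths."""
    try:
        hpss_path, disk_path = catalog[file_id]
    except KeyError:
        raise LookupError(f"fileID {file_id} is not in the file catalog") from None
    return {"hpss": hpss_path, "disk": disk_path}


paths = translate(17)
```

Keeping this lookup behind one server is what lets the rest of the system work with fileIDs without knowing the experiment's directory layout.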
Eliminating Dependencies
[Figure: dependency diagram. Labels: StIOMaker, ROOT + STAR Software, <<Interface>> StGCAClient, libGCAClient.so, libStCGAClient.so (implementation), /opt/star/lib, CORBA + GCA software, libOB.so, ROOT]
STAR fileCatalog
• Database of information for files in experiment. File information is added to DB as files are created.
• Source of File information
  – for the experiment
  – for the GCA components (Index, gcaClient,...)
Cataloguing Analysis Workflow
[Figure: workflow diagram. Labels: Job monitoring system, fileCatalog, Job configuration manager]
GCA MDC3 Integration Work
http://www-rnc.lbl.gov/GC/meetings/14mar00/default.htm
14-15 March 2000
Status Today
• MDC3 Index
  – 6 event components: fzd, geant, dst, tags, runco, hist
  – 179 physics tags: StrangeTag, FlowTag, ScaTag
  – 120K events
  – 8K files
• Updated daily...
User Query
ROOT Session:
STAR Tag Database Access
Problem: SELECT NLa>700

ntuple (read all events)
Event #   NLa
1         731
2         800
3         345
4         543
5         567

index (read selected events)
NLa   Event #
345   3
543   4
567   5
731   1
800   2
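The difference between the two tables can be made concrete with a short sketch using the values from the slide. A sorted list searched with `bisect` stands in for the index here; this is an assumption chosen for brevity and not the bit-sliced index that STACS uses.

```python
from bisect import bisect_right

# Values from the slide: event number -> NLa.
NTUPLE = {1: 731, 2: 800, 3: 345, 4: 543, 5: 567}


def select_by_scan(ntuple, threshold):
    """Read every event and test it; returns (selected events, events read)."""
    selected = sorted(ev for ev, nla in ntuple.items() if nla > threshold)
    return selected, len(ntuple)


def build_index(ntuple):
    """Sorted (NLa, event) pairs, matching the index table on the slide."""
    return sorted((nla, ev) for ev, nla in ntuple.items())


def select_by_index(index, threshold):
    """Binary-search the sorted index; only the selected events need reading."""
    keys = [nla for nla, _ in index]
    start = bisect_right(keys, threshold)  # first entry with NLa > threshold
    selected = sorted(ev for _, ev in index[start:])
    return selected, len(selected)


index = build_index(NTUPLE)
scan_result = select_by_scan(NTUPLE, 700)
index_result = select_by_index(index, 700)
```

Both paths select the same events, but the scan touches all five events while the index path touches only the two that pass the cut.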
STAR Tag Structure Definition
Selections like qxa² + qxb² > 0.5 cannot use the index
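A sketch of why such a selection falls outside a per-attribute range index: the condition combines two tags, so it has to be evaluated event by event, and a range cut on either tag alone selects a different set. The tag values below are hypothetical; only the names qxa and qxb come from the slide.

```python
# Hypothetical tag values; qxa and qxb are the tag names from the slide.
EVENTS = [
    {"event": 1, "qxa": 0.6, "qxb": 0.5},
    {"event": 2, "qxa": 0.6, "qxb": 0.1},
    {"event": 3, "qxa": 0.1, "qxb": 0.8},
]


def select_sum_of_squares(events, threshold):
    """Evaluate qxa**2 + qxb**2 > threshold event by event (a full scan)."""
    return [e["event"] for e in events
            if e["qxa"] ** 2 + e["qxb"] ** 2 > threshold]


def select_range(events, attr, low):
    """A single-attribute range cut, the kind a per-attribute index can answer."""
    return [e["event"] for e in events if e[attr] > low]


scan_selected = select_sum_of_squares(EVENTS, 0.5)
range_selected = select_range(EVENTS, "qxa", 0.5)
```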
Conclusion
• GCA developed a system for optimized access to multi-component event data files stored in HPSS.
• General CORBA interfaces are defined for interfacing with the experiment.
• A client component encapsulates interaction with the servers and provides an ODMG-style iterator.
• Has been tested up to 10M events, 7 event components, 250 concurrent queries.
• Is currently being integrated with the STAR experiment ROOT-based I/O analysis system.
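The iterator-style client access mentioned above might look roughly like the following from the analysis side. This is a sketch under stated assumptions: the class and method names (`FileBundleIterator`, `not_done`, `next`) only loosely follow the ODMG iterator style and are not the actual gcaClient API, and a plain list replaces the CORBA calls to the servers.

```python
class FileBundleIterator:
    """Hypothetical client-side iterator over the results of one query."""

    def __init__(self, bundles):
        self._bundles = list(bundles)  # stand-in for results delivered by the servers
        self._pos = 0

    def not_done(self):
        """True while there are file bundles left to process."""
        return self._pos < len(self._bundles)

    def next(self):
        """Return the next (file names, event list) pair and advance."""
        if not self.not_done():
            raise LookupError("query exhausted")
        bundle = self._bundles[self._pos]
        self._pos += 1
        return bundle


it = FileBundleIterator([
    (["dst_A", "tags_A"], [1, 2]),
    (["dst_B", "tags_A"], [3]),
])
processed = []
while it.not_done():
    files, events = it.next()
    processed.extend(events)  # an analysis job would read these events from `files`
```

The analysis loop sees only file names and event lists, which matches the stated goal of shielding physicists from the CORBA layer.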