Hybrid Event Store
ATLAS Software Week, Database Session. David Adams, BNL. March 7, 2002.
Hybrid Event Store
David Adams
BNL
March 7, 2002
ATLAS Software Week
Database Session
Contents
• What does hybrid mean?
• Files and their contents
– File
– Event data object (EDO)
– Object ID
– Event
– Placement category (PC)
– File, PC and EDO associations
– File interface
• Catalogs
• Reading and writing
– Input stream
– Output stream
– Store view
• HES components
• Tasks and schedule
What does “hybrid” mean?
Hybrid merges
• Files that manage event data objects (EDO’s) and references to EDO’s with
• Relational DB’s used to catalog the files and EDO’s.
Files are self-describing
• The data in a file can be traversed without consulting any file catalog.
• References between objects in files can be resolved without consulting file catalogs.
File
File type
• HES supports files of different types (formats).
• Each file type is responsible for providing its own means to write and read data.
File replica
• A file replica contains the same data as the original file.
• The replica may be a simple bitwise copy or
• It may be of a type different from the original.
File (cont)
File ID and names
• Each file carries
– A unique ID
– A unique logical name
– A locally unique physical name
• A file replica carries the ID and logical name of its original.
• Normally any replica may be used in place of the original file.
Event data object (EDO)
What is an EDO?
• Collection of data associated with a particular beam crossing (event ID)
• Typically a homogeneous collection, e.g. tracks, jets or electrons
EDO’s in HES
• HES doesn’t care what is in an EDO.
• HES provides an interface for file types that
– write transient EDO’s to files and
– read them back in from files.
EDO (cont)
EDO ID
• Each EDO is assigned a unique ID.
• The ID specifies the:
– ID of the file that owns the EDO
– Event ID
– EDO type (and version?)
– String key
• Any type-key may appear no more than once for any event ID in any file.
• An EDO is retrieved from a file with its ID.
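A minimal sketch of the EDO ID described above, assuming a plain in-memory stand-in for a file. The class names `EdoId` and `HesFile`, the field names and the example type name are hypothetical and not taken from the HES code; the version field is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdoId:
    """Hypothetical EDO ID with the four fields listed on the slide."""
    file_id: str    # ID of the file that owns the EDO
    event_id: int   # event (beam crossing) ID
    edo_type: str   # EDO type, e.g. "TrackCollection" (version omitted)
    key: str        # string key


class HesFile:
    """Toy in-memory stand-in for an HES file (assumption, not the real API)."""

    def __init__(self, file_id: str):
        self.file_id = file_id
        self._edos = {}   # (event_id, edo_type, key) -> (EdoId, data)

    def put(self, event_id: int, edo_type: str, key: str, data) -> EdoId:
        slot = (event_id, edo_type, key)
        if slot in self._edos:
            # A type-key may appear at most once per event ID in a file.
            raise KeyError(f"duplicate type-key {edo_type}/{key} in event {event_id}")
        edo_id = EdoId(self.file_id, event_id, edo_type, key)
        self._edos[slot] = (edo_id, data)
        return edo_id

    def get(self, edo_id: EdoId):
        """Retrieve EDO data from this file by its ID."""
        stored_id, data = self._edos[(edo_id.event_id, edo_id.edo_type, edo_id.key)]
        if stored_id != edo_id:
            raise KeyError(f"{edo_id} is not held by file {self.file_id}")
        return data


f1 = HesFile("f1")
track_id = f1.put(1, "TrackCollection", "default", [1.0, 2.0])
tracks = f1.get(track_id)
```

Writing the same type-key twice for event 1 in `f1` would raise `KeyError`, which mirrors the uniqueness rule stated above.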
EDO (cont)
EDO reference
• A file holds an EDO by reference if it holds an ID for that EDO but does not hold the data.
• The file owning an EDO must hold that EDO by value, not just by reference.
EDO (cont)
EDO replica
• A file which does not own an EDO may hold a replica.
– The replica has a copy of the EDO data and may be used in place of the original.
– The replica carries the same ID as the original.
• A reference to an EDO may be satisfied by the file owning the EDO or any file carrying a replica.
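The replica rule can be sketched as a lookup that accepts either the owning file or any file carrying a replica. This is an illustration only: the tuple layout of the EDO ID, the `files` dictionary and the function name `resolve_edo` are assumptions made for the example.

```python
def resolve_edo(edo_id, files):
    """Return (file_id, data) from the first file holding edo_id by value.

    `edo_id` is modelled as a tuple (owner_file_id, event_id, type, key);
    `files` maps file ID -> {edo_id: data} for EDO's held by value.
    The owner is tried first, then any other file carrying a replica.
    """
    owner = edo_id[0]
    candidates = [owner] + [fid for fid in files if fid != owner]
    for fid in candidates:
        data = files.get(fid, {}).get(edo_id)
        if data is not None:
            return fid, data
    raise LookupError(f"no accessible file holds {edo_id} by value")


edo7 = ("f3", 2, "JetCollection", "cone")
files = {
    "f1": {edo7: ["jet-a", "jet-b"]},   # replica: same ID, copied data
    "f3": {edo7: ["jet-a", "jet-b"]},   # original, held by the owner
}
from_owner = resolve_edo(edo7, files)
from_replica = resolve_edo(edo7, {"f1": files["f1"]})   # owner not available
```

Because the replica carries the same ID as the original, the caller does not need to know which file ends up satisfying the reference.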
Object ID
Requirement
• EDO’s contain objects
• Objects in one EDO need to reference those in another EDO
– Pointer or reference in the transient world
Solution
• HES defines an object ID:
– ID of the EDO holding the referenced object plus
– Index indicating the location of the referenced object in its EDO
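As an illustration of the object ID, the sketch below pairs an EDO ID with an index and follows it to the referenced object. The names `EdoId`, `ObjectId`, `dereference` and `edo_store` are hypothetical, and an EDO is modelled as a simple list.

```python
from typing import NamedTuple


class EdoId(NamedTuple):
    file_id: str
    event_id: int
    edo_type: str
    key: str


class ObjectId(NamedTuple):
    edo_id: EdoId   # EDO holding the referenced object
    index: int      # location of the object inside that EDO


def dereference(obj_id, edo_store):
    """Follow an object ID; edo_store maps EdoId -> list of objects."""
    return edo_store[obj_id.edo_id][obj_id.index]


track_edo = EdoId("f1", 1, "TrackCollection", "default")
edo_store = {track_edo: ["track0", "track1", "track2"]}
# An electron in another EDO refers to its track through an object ID.
electron = {"energy": 41.5, "track": ObjectId(track_edo, 1)}
matched_track = dereference(electron["track"], edo_store)
```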
Object ID (cont)
Size considerations
• The EDO ID carries a lot of information and is fairly large (~200 bytes).
• There may be very many object ID’s.
• EDO’s in files may store a small EDO index in place of the EDO ID.
– The index is only valid in the context of the EDO.
– Probably 8 bits to allow 256 referenced EDO’s.
– Converted to full EDO ID when the object is converted to transient form.
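A sketch of the index scheme, assuming each persistent EDO keeps its own small table of the EDO ID's it references. The class `EdoIndexTable` and its methods are hypothetical; the only arithmetic used is that an n-bit index distinguishes 2^n entries, so 8 bits give 256.

```python
class EdoIndexTable:
    """Per-EDO table mapping small local indices to full EDO ID's."""

    def __init__(self, bits: int = 8):
        self.capacity = 2 ** bits      # 8 bits -> 256 distinct indices
        self._ids = []                 # local index -> full EDO ID
        self._index_of = {}            # full EDO ID -> local index

    def to_index(self, edo_id) -> int:
        """Return the local index for edo_id, registering it if new."""
        if edo_id not in self._index_of:
            if len(self._ids) >= self.capacity:
                raise OverflowError("too many referenced EDO's for index width")
            self._index_of[edo_id] = len(self._ids)
            self._ids.append(edo_id)
        return self._index_of[edo_id]

    def to_edo_id(self, index: int):
        """Expand a local index back to the full EDO ID (transient form)."""
        return self._ids[index]


table = EdoIndexTable()
track_edo = ("f1", 1, "TrackCollection", "default")
# Persistent object ID's: (local EDO index, object index within that EDO).
persistent = [(table.to_index(track_edo), i) for i in range(3)]
# Converting to transient form restores the full EDO ID.
transient = [(table.to_edo_id(e), i) for e, i in persistent]
```

The large EDO ID is stored once in the table, while each of the many object ID's carries only the small index.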
Event
Events in HES files
• Each file holds data for a specified collection of event ID’s.
• Each EDO in a file is associated with exactly one event ID.
– Adding the type and key to the event ID specifies an EDO.
• “Event holds an EDO” means the EDO is associated with the ID of that event.
– There are no event objects.
Placement category
PC’s in events
• Each EDO in a file (by value or reference) is associated with a named placement category (PC) in an event.
– This is a hint to the file that EDO’s in the same PC are likely to be accessed together.
– Files can share data at the level of a PC.
• Each event in a file is associated with (“holds”) the same collection of PC names (types)
Placement Category (cont)
PC’s in events (cont)
• Each PC holds a collection of EDO ID’s indexed by type-key.
– File may choose to organize EDO data by PC.
• The union of these type-keys (or EDO ID’s) for all PC’s in an event constitutes the view of the event for that file.
– Users may restrict this view to a subset of the PC’s.
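The view of an event can be sketched as the union of the type-key maps of its PC's, optionally restricted to a subset of PC names. The function name `event_view`, the dictionary layout and the example names are assumptions for illustration.

```python
def event_view(pcs, accepted=None):
    """Union of the type-key -> EDO ID maps of the PC's in one event.

    `pcs` maps PC name -> {(edo_type, key): edo_id}. If `accepted` is given,
    only those PC names contribute to the view.
    """
    view = {}
    for name, entries in pcs.items():
        if accepted is not None and name not in accepted:
            continue
        view.update(entries)
    return view


event = {
    "p1": {("TrackCollection", "default"): "edo1",
           ("JetCollection", "cone"): "edo2"},
    "p2": {("ElectronCollection", "default"): "edo3"},
}
full_view = event_view(event)
restricted_view = event_view(event, accepted={"p2"})
```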
Placement Category (cont)
PC type
• Each PC is an instance of a PC type
• The type defines
– the PC name and
– the allowed type-keys (the type-keys in the PC will be a subset of these)
• The file holds the definition of all types that appear in that file
• Each event “holds” one PC of each type
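One possible way to model the relation between a PC type and its instances is shown below. `PcType` and `Pc` are hypothetical names; the only rule enforced is that the type-keys held by a PC are a subset of those allowed by its type.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PcType:
    name: str            # the PC name
    allowed: frozenset   # allowed (edo_type, key) pairs


@dataclass
class Pc:
    pc_type: PcType
    entries: dict = field(default_factory=dict)   # type-key -> EDO ID

    def add(self, type_key, edo_id):
        """Register an EDO ID under a type-key permitted by the PC type."""
        if type_key not in self.pc_type.allowed:
            raise ValueError(
                f"{type_key} not allowed in PC type {self.pc_type.name}"
            )
        self.entries[type_key] = edo_id


tracking = PcType("tracking", frozenset({("TrackCollection", "default"),
                                         ("VertexCollection", "primary")}))
pc = Pc(tracking)
pc.add(("TrackCollection", "default"), "edo1")
```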
Placement Category (cont)
Sharing categories
• The ATLAS DB architecture design distinguishes between “placement categories” and “sharing categories”.
• We have merged the two into PC.
– This was agreed to at an ANL meeting last October and no objections were raised there or subsequently.
– We will go back and make this separation in HES if the need arises.
Placement Category (cont)
PC references
• Any PC in an event in a file may be replaced with a PC reference.
– The referenced PC has the same name and event ID as the PC reference.
– The referenced PC must be held by value
• The file holding the referenced PC must be accessed to construct the view of the event
– (To know which type-keys are included.)
• A PC reference may only be satisfied in the original file (there are no PC replicas).
File, PC and EDO associations
The following figure illustrates some allowed associations between files, PC’s and EDO’s.
• The first event in the first file holds all EDO’s by value.
• The second file holds only references.
– The first PC holds an EDO by reference.
– The second PC is held by reference.
– The EDO reference in the second event may be satisfied by the original EDO in the third file or its replica in the first.
File, PC and EDO associations (cont)
[Figure: Example of possible associations between HES files, placement categories (PC’s) and event data objects (EDO’s). Files f1 and f2 hold events e1 and e2, and file f3 holds event e2, each with PC’s drawn from p1, p2 and p3. EDO’s 7' and 8' in f1 are replicas of EDO’s 7 and 8 in f3.]
File interface
The following figure illustrates the file structure implied by the file interface.
• Ovals on the right indicate data that can be obtained from the file on the left.
– Labels on the line indicate the key required to specify the data.
• Blue indicates data which is not specific to an event.
• Yellow indicates the collection of event ID’s.
• The remaining data is associated with an event ID.
File interface
[Figure: file structure implied by the file interface. Labels recovered from the figure: File, File type, Physical file name, Logical file name, File ID, Stream name, PC type, PC name, EDO type-key, Event ID, PC ID, PC, EDO handle, EDO ID, EDO history, EDO data, ref index, parent index.]
Catalogs
File location catalog
• Also known as replica catalog
• Enables the user to locate the physical file(s) corresponding to a logical file name
• Table columns (crude first pass): Logical file name | Site | Directory | Physical filename
• Expect this to be implemented in the GRID environment
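A rough sketch of the file location catalog as a relational table, using an in-memory SQLite database purely for illustration. The columns follow the four listed above; the table name, the uniqueness constraint and the sample rows are assumptions and do not describe any actual ATLAS or GRID catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE file_location (
           logical_name  TEXT NOT NULL,
           site          TEXT NOT NULL,
           directory     TEXT NOT NULL,
           physical_name TEXT NOT NULL,
           UNIQUE (site, directory, physical_name)
       )"""
)
# Two made-up replicas of the same logical file at different sites.
conn.executemany(
    "INSERT INTO file_location VALUES (?, ?, ?, ?)",
    [
        ("lfn.example.0001", "siteA", "/data/hes", "example_0001.root"),
        ("lfn.example.0001", "siteB", "/store/hes", "example_0001_copy.root"),
    ],
)


def replicas(logical_name):
    """Return (site, full path) for every physical replica of a logical file."""
    rows = conn.execute(
        "SELECT site, directory || '/' || physical_name FROM file_location "
        "WHERE logical_name = ? ORDER BY site",
        (logical_name,),
    )
    return rows.fetchall()


found = replicas("lfn.example.0001")
```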
Catalogs (cont)
File content catalog
• Enables users to locate logical file name based on ID
• Enables users to locate logical files based on stream type, event and production
• Example table columns: Logical file name | File ID | Stream type name | Min event ID | Max event ID | Job ID | Production thread ID | Production environment ID
Catalogs (cont)
Stream catalog
• Specifies which placement category types are included in which stream types.
• Example table columns: Stream type name | PC name
PC catalog
• Specifies which type-keys are included in which PC types.
• Example table columns: PC name | EDO type | EDO key
Catalogs (cont)
EDO catalog
• Enables users to locate the file holding a particular EDO.
• Unlikely this would be created for all data but would be used for subsets such as datasets.
• Original EDO ID relevant for regenerated data
• Example table columns: EDO ID (derived?) | Original EDO ID | EDO type | EDO key | Event ID | PC name | File ID
Input stream
Collection of files to define events
• All files have same stream type
– Stream type = set of PC types
• Any event ID appears at most once
Placement categories
• Specify which PC’s are accepted or omitted
Next event ID
• Can be externally specified
• Stream provides means to generate
Input stream (cont)
Event
• The input data for an event in a stream includes all EDO’s in accepted PC’s for the event ID
• PC’s and PC references are taken from one file
• PC and EDO references can be satisfied in a separate collection of “reference files”
– The event cannot be defined (set of type-keys discovered) if any PC’s cannot be found
– If neither an EDO nor any of its replicas is found, the event is defined but the data for that EDO is inaccessible
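The two failure modes above can be sketched as follows: a missing PC makes the event undefined (an error), while a missing EDO leaves the event defined with that data marked inaccessible. The dictionary layout, the `"ref"` marker and the function name `read_event` are assumptions made for this example, and the search over reference files is a simplification.

```python
def read_event(event_file, event_id, reference_files, accepted=None):
    """Build the input view of one event.

    `event_file` and each entry of `reference_files` are dicts with
      "pcs":  {(event_id, pc_name): {type_key: edo_id}}, or the string
              "ref" in place of the inner dict when the PC is held by reference
      "data": {edo_id: data} for EDO's held by value.
    Returns {type_key: data or None}; None marks inaccessible EDO data.
    """
    view = {}
    for (eid, pc_name), pc in event_file["pcs"].items():
        if eid != event_id or (accepted is not None and pc_name not in accepted):
            continue
        if pc == "ref":
            # A PC reference must be resolved, or the event is undefined.
            for ref in reference_files:
                target = ref["pcs"].get((eid, pc_name))
                if isinstance(target, dict):
                    pc = target
                    break
            else:
                raise LookupError(f"PC {pc_name} not found for event {eid}")
        for type_key, edo_id in pc.items():
            sources = [event_file] + list(reference_files)
            view[type_key] = next(
                (f["data"][edo_id] for f in sources if edo_id in f["data"]), None
            )
    return view


f1 = {"pcs": {(2, "p1"): {"tracks": "edo7"}}, "data": {"edo7": [1, 2]}}
f2 = {"pcs": {(2, "p1"): "ref", (2, "p2"): {"jets": "edo9"}}, "data": {}}
view = read_event(f2, 2, reference_files=[f1])
```

Here `view` contains the track data found through the reference file, while the jets entry is present in the view but has no accessible data.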
Output stream
Type
• Each output stream is of a named type which specifies the included PC types
– Each event added to the stream will include one PC of each type
– Each PC type specifies the allowed type-keys
– User (see view) may choose whether or not to write an EDO of allowed type-key
Output stream (cont)
Files
• Output stream includes a series of files to which data is added for each accepted event
• Stream has policies for
– Deciding when a file is full and opening a new file for the stream
– Providing ID’s and logical and physical names for these files
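A toy version of these policies is sketched below, assuming a file counts as full after a fixed number of events and that ID's and names are derived from a running serial number. Both policies, the class name `OutputStream` and the name formats are invented for illustration.

```python
import itertools


class OutputStream:
    """Toy output stream that opens a new file after max_events events."""

    def __init__(self, stream_type, max_events=2):
        self.stream_type = stream_type
        self.max_events = max_events        # hypothetical "file is full" policy
        self._serial = itertools.count(1)
        self.files = []                     # one dict per file, in creation order
        self._open_new_file()

    def _open_new_file(self):
        n = next(self._serial)
        file_id = f"{self.stream_type}-{n:04d}"   # naming policy (assumed)
        self.files.append({
            "id": file_id,
            "logical_name": f"lfn:{file_id}",
            "physical_name": f"/tmp/{file_id}.dat",
            "events": [],
        })

    def add_event(self, event_id):
        """Add an accepted event, rolling over to a new file when full."""
        if len(self.files[-1]["events"]) >= self.max_events:
            self._open_new_file()
        self.files[-1]["events"].append(event_id)


stream = OutputStream("reco")
for event_id in range(5):
    stream.add_event(event_id)
events_per_file = [f["events"] for f in stream.files]
```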
Store view
Contents
• One or more input streams
• Collection of files to be used for chasing PC and EDO references
• One or more output streams
Event selection
• The view can assign the ID for the next event
– By iterating over a user-defined list or
– By asking one of its streams to generate it
Store view (cont)
Reading the event
• Data extracted using the same event ID for all input streams
• The input event is defined as the union of the input events in each stream
– No type-key may be duplicated
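The union with the no-duplicate rule might look like the sketch below. The function name `merge_input_events` and the data layout are assumptions; each stream's event is modelled as a map from type-key to EDO ID.

```python
def merge_input_events(stream_events):
    """Union of per-stream event views; a duplicated type-key is an error.

    `stream_events` maps stream name -> {type_key: edo_id} for one event ID.
    """
    merged = {}
    origin = {}   # type_key -> stream that supplied it, for error messages
    for stream, view in stream_events.items():
        for type_key, edo_id in view.items():
            if type_key in merged:
                raise ValueError(
                    f"type-key {type_key} appears in both "
                    f"{origin[type_key]} and {stream}"
                )
            merged[type_key] = edo_id
            origin[type_key] = stream
    return merged


input_event = merge_input_events({
    "raw": {("RawHits", "pixel"): "edo1"},
    "reco": {("TrackCollection", "default"): "edo2"},
})
```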
Store view (cont)
Writing the event
• User specifies which streams are accepted for each event
• Event data is written for all accepted streams
• View assigns a stream to own each new EDO that is to be written
• View has policy for deciding for each stream:
– Whether each PC is written by value or reference
– Which EDO’s are written by value
– Which EDO’s are written by reference
HES components
[Figure: HES components: StoreView, InputStream, OutputStream, File, PC, EDO, EDO ID and Object ID (ref).]
Tasks and schedule
Plan:
• To deliver an initial version of HES that
– is sufficient to meet the needs of DC1-2
– and serves as prototype for the LCG common hybrid event store
• Attempt a design that can evolve to meet the long-term goals of both ATLAS and the LCG
• Cooperate with the LCG
– to whatever extent possible in the short term
– fully in the long term
Tasks and schedule (cont)
DC1-2 functionality
• HES core
– Base (ID’s, PC, …)
– File interface
– Simple implementation of input and output streams
– Simple implementation of view
• Athena/StoreGate integration
– See talk for EDM meeting
• ROOT storage type with HES interface
• Sufficient cataloging
Tasks and schedule (cont)
First release
• Deliver June 1, 2002
– In time for users to test and discover any design flaws well in advance of DC1-2
• Effort required is 20 FTE-weeks
– HES: 4 FTE-weeks
– Athena/SG integration: 7 FTE-weeks
– ROOT: 7 FTE-weeks
– ZEBRA: 2 FTE-weeks
– Cataloging: ?
– Plus testing
– 2X contingency implied by work thus far
Tasks and schedule (cont)
Completed to date:
• Design sufficient to begin the first implementation
– See the HES page at http://www.usatlas.bnl.gov/~dladams/hybrid
• HES ID’s, placement category and file interface have been implemented (see HES page)
• ROOT persistency (but not HES interface) is far along
Tasks and schedule (cont)
Resources
• BNL PAS group is focusing on HES core and ROOT
– Outside help is welcome
• Need allocation of priority (and volunteers) to implement Athena/SG integration
– BNL can provide some of this
• Cataloging (RDB) needs to be better understood
– Again BNL would like to be involved but expects to share the effort