analysis tools at d0 ppdg analysis grid computing project, cs 11 caltech meeting lee lueking femilab...
DESCRIPTION
December 19, 2002Lee Lueking - FNAL/CD SAM Simplified Database Schema Files ID Name Format Size # Events Files ID Name Format Size # Events Events ID Event Number Trigger L1 Trigger L2 Trigger L3 Off-line Filter Thumbnail Events ID Event Number Trigger L1 Trigger L2 Trigger L3 Off-line Filter Thumbnail Volume Project Data Tier Physical Data Stream Physical Data Stream Trigger Configuration Trigger Configuration Creation & Processing Info Creation & Processing Info Run Event-File Catalog Event-File Catalog Run Conditions Luminosity Calibration Trigger DB Alignment Run Conditions Luminosity Calibration Trigger DB Alignment Group and User information Group and User information Station Config. & Cache info Station Config. & Cache info File Storage Locations File Storage Locations MC Process & Decay MC Process & Decay SAM schema has over 100 tables There are several other related tablespaces also availableTRANSCRIPT
Analysis Tools at D0
PPDG Analysis Grid Computing Project, CS 11Caltech Meeting
Lee LuekingFemilab Computing Division
December 19, 2002
December 19, 2002 Lee Lueking - FNAL/CD
DZero Analysis Tools SAM dataset tools
– Database Schema– Datasets and Query Dimensions
MC RunJob• Monte Carlo request system
– Physics Groups submit web based requests for MC– Provides prioritization, work assignment, and tracking.
• D0 Framework, D0Tools, D0 Run Time Environment (RTE) • ROOT
– Using ROOT non-intrusive I/O for Dzero EDM (Event Data Model)– ROOT/SAM and ROOT/SAM-Grid (Philippe Canal, up next)
More in this talk
December 19, 2002 Lee Lueking - FNAL/CD
SAM Simplified Database Schema
FilesID
NameFormat
Size# Events
EventsID
Event NumberTrigger L1Trigger L2Trigger L3
Off-line FilterThumbnail
Volume
Project
Data TierPhysical
Data Stream
TriggerConfiguration
Creation &Processing
Info
Run
Event-FileCatalog
Run Conditions
Luminosity
Calibration
Trigger DB
Alignment
Group and User
information
Station Config. &Cache info
File Storage
Locations
MC Process & Decay
•SAM schema has over 100 tables
•There are several other related tablespaces also available
December 19, 2002 Lee Lueking - FNAL/CD
file Run Event Date Trigger Apo App vsn …
file1
file2
file3
file4
file5
filen
Challenge: Transform the complex SAM schema into a form that is user friendly, and avoids badly formed user SQL queries.
Solution: Transform the schema to look like one giant table.
Dimension Name
DataFile
December 19, 2002 Lee Lueking - FNAL/CD
Dimensions, Links, and ChainsMatthew Vranicar (Piocon Technologies), SAM team
• This transformation is done using what we call links and chains.
• A link is a description of how to relate two tables
• A chain is a set of links that connects the desired dimension (column in a table) to the datafile.
• These are stored in the database itself, and loaded in the middle tier server when it starts up.
• A grammar is provided for users to build complex queries employing dimensions.
December 19, 2002 Lee Lueking - FNAL/CD
Dimensions:Examples• There are dozens of dimensions available. • Additional dimensions are easily defined. • Examples of dimensions defined:
– APPL_NAME, APPL_NAME_ANALYZED, CONSUMED_DATE, CONSUMED_STATUS, CONSUMER, CONSUMER_GROUP, CONSUMER_ID, CREATE_DATE, DATASET_DEF_ID, DATASET_DEF_NAME, DATASET_ID, DATASET_VERSION, DATA_FILE_LOCATION_STATUS, DATA_TIER, DATA_TIER_ANALYZED, DELIVERED_STATUS, EVENT_NUMBER, FAMILY, FAMILY_ANALYZED, FILE_ANALYZED, FILE_NAME, FILE_PARTITION, FILE_STATUS, FULL_PATH, LOGICAL_DATASTREAM_NAME, PARAM_TYPE, RUN_ID, RUN_NUMBER, RUN_QUALITY, VERSION, VERSION_ANALYZED, WORK_GRP_NAME , etc., etc., etc.
• __SET__ : Special dimension allowing you to include an existing dataset definition.
December 19, 2002 Lee Lueking - FNAL/CD
Query Syntax and Grammar• Constraint operators:=, !=, >, < >=, <=, like, not like, in,
not in, between, is null, is not null • Sets operators: and, or, minus, (union, intersection to be added)• syntax: --dim="[(]name [conOper] value [setOper name
[conOper] value][)] ..." • Command line examples:
– sam define dataset --defname=dataset_definition_name --group=work_group_name --dim="(run_number 100930 data_tier digitized) minus physical_datastream_name electron+jet"
– sam create dataset --defname=dataset_definition_name
Note: Through an SBIR (Matthew Vranicar, Piocon + Randolf Herber, FNAL CD) are providing additional features. More reliability using tokenizer (flex), and parser (bison) to check the grammar and do () handling. Also, security, user access control, database resource management are being added.
December 19, 2002 Lee Lueking - FNAL/CD
December 19, 2002 Lee Lueking - FNAL/CD
December 19, 2002 Lee Lueking - FNAL/CD
MC_RunJob:What is it?Greg Graham (FNAL/CMS), David Evans (Lancaster University/DZero)
• Python based work flow planner• Metadata language interpreter• Flexible and generic
Dataset1
Pkg 1
Pkg 1
Pkg 1
Pkg 2Pkg 3
Pkg 2
Pkg 2
Pkg 2
Pkg 1Processing Line
Phase Boundaries
DS 2 DS 3
Pkg 3
Dataset4
December 19, 2002 Lee Lueking - FNAL/CD
MC_RunJob:Work Flow Planner
• Chain a set of inputs, processes, and outputs• Set up a local execution environment• Parallel-ize the job according to the local
environment.• Produce metadata to describe each processing
step.• Retrieval/delivery of input/output data.• Cluster scale job management (adapting to work
with SAM-Grid).
December 19, 2002 Lee Lueking - FNAL/CD
MC_RunJob:Metadata Language Interpreter
• Define metadata language:keywords• Convert metadata into jobs: Macro• Generate metadata for processors• Log the metadata for output to declare/store into SAM
Metadata Physics Result
December 19, 2002 Lee Lueking - FNAL/CD
MC_RunJob:Use at DZero
• MC production at remote farms• User MC production on central resources• MC SAM storage using keywords• Run MC software: Gen => GEANT => Digi =>
Trigsim.• Runs MC Reconstruction• Runs MC/Data analyzers: ROOT-tuple/tree makers• Plans to use more broadly for data analysis.
December 19, 2002 Lee Lueking - FNAL/CD
Flexibility
• OO based python– Basic stuff is pretty generic– Essentially one python class and a list of metadata will
allow running an executable in a SAM friendly way. Can complicate it as desired.
– SAM gets the metadata definition from runjob, so when you add new features, telling SAM about it is as simple as typing “sam load keywords”.
December 19, 2002 Lee Lueking - FNAL/CD
For Additional Information
• Dataset Dimensions:– http://d0db.fnal.gov/sam_project_editor
• MC_RunJob: – http://www-clued0.fnal.gov/mc_runjob/mainframe.html
• Monte Carlo Request System:– http://www-d0.fnal.gov/computing/mcprod/mcc.html
• Other D0 stuff:– http://www-d0.fnal.gov/atwork
• SAM/Root:– http://d0db.fnal.gov/sam/doc/userdocs/SamRoot.html