Finding Information: Metadata in ATLAS Elizabeth Gallas – Oxford ATLAS UK: Software Session Lancaster, UK January 9, 2013



Page 1: Finding Information: Metadata in ATLAS

Finding Information: Metadata in ATLAS

Elizabeth Gallas – Oxford

ATLAS UK: Software Session, Lancaster, UK

January 9, 2013

Page 2: Finding Information: Metadata in ATLAS

Jan 2013 E.Gallas- Metadata 2

Outline
- What is "Metadata"?
- Challenges in ATLAS
- Survey some user-oriented systems using Metadata
  - Show utility of collecting metadata into dedicated systems
- Tour of some COMA Reports
  - Features: Runs, Periods, Triggers, Luminosity … metadata
  - New content and newly aggregated quantities
- Describe a few areas: metadata in evolution
  - (Event) Dataset Nomenclature: PhysicsShort, AMI tags
  - Transforms and metadata
  - AMI Hierarchical Search (aka: Dataset Browser)
    - New interface … a different way to find Datasets of interest
    - Aims to help with metadata issues in MC

Summary and Conclusions

Page 3: Finding Information: Metadata in ATLAS

What is Metadata?

Metadata definition:
- Concisely: "data about data"
- More precisely: "data used to describe the context, content or structure of data"

Structural or Descriptive Metadata: used extensively in ATLAS … in fact, every process uses metadata.
- Descriptive examples: Dataset name, Run Number, Channel number in some detector, TWiki Name, Trigger Names, dates/times, DQ Defect, ATLAS Software release number, …
- Structural examples: Number of runs or events or files, data volume, structure of compound objects, …
- Usage examples:
  - Upstream: data taking with the correct calibrations …
  - Downstream: user finding Events of interest … or Luminosity for an event sample

Metadata challenges:
- Data/metadata have grown organically as the experiment evolved
- Size/Scope of ATLAS data … Volume/Diversity of metadata
- Following evolution in Run 1 and trying to anticipate changes for Run 2
- Try to offer a coherent / integrated view to physicists while devising strategic placement for processing and analysis

Page 4: Finding Information: Metadata in ATLAS

ATLAS User Application Overview

Subsystem specific: driven by subsystem-specific needs (using metadata)
- Trigger: wide variety of tools and interfaces
- Geometry DB: Detector Description Browser
- Conditions DB:
  - RunQuery (in-depth Run information from Conditions DB)
  - ATLAS WEB DQ
  - COOL Tag Browser
  - Lumi Data Summary Reports (Luminosity, Beam)
  - GRLs (Good Run List xml), and the Luminosity calculator
  - Beam Spot Summary
- GANGA and PAthena
- Panda / monitor
- DQ2 Client
- Tag Collector: software releases
- ... (not a complete list!)

Dedicated Metadata Catalogs
- TAGs (and TAG Catalog): event level metadata
  - iELSSI and Suite of TAG Services
- AMI: Datasets, processing … other metadata
  - And the AMI Suite of services
- COMA: Run/LB level Conditions and configuration
  - Plus Conditions DB management metadata

Important metadata facilitator: ATLAS Job Transforms

Fundamental areas for every analysis! See specific talks in software tutorials.

COMA is an Oxford-based project.

Page 5: Finding Information: Metadata in ATLAS

COMA @ Oxford

The COMA Project (TWiki: ConditionsMetadata)
- Originally built to support other systems. Evolved into a standalone system with its own interfaces.
- Components:
  - Relational Database (Oracle)
    - Info: copied, refined, reduced, derived
    - Unique content (not found elsewhere): Data Periods, Derived/Aggregated data
  - Interfaces (Reports and Browsers)
- Current efforts:
  - COMA Database content/interfaces growing
    - Aggregating various quantities across Periods, Runs and by Trigger
    - Adding event counts: Stream, Trigger
    - Enhance aspects of MC metadata (LS1)
  - Improve content, functionality, and usability
- Beyond COMA: COMA is part of a general effort to consolidate/relate ATLAS Metadata
  - Strong ties with AMI and TAG DB
  - COMA data/links now found in many ATLAS systems: AMI, TAGs, DataQuality, RunQuery, Muon alignment, Conditions DB
  - Many links from ATLAS TWiki physics pages and personal pages

Ryan Buckingham (4th year), Kate Pachal (2nd year), Dr. Jeff Tseng, Dr. Elizabeth Gallas

Page 6: Finding Information: Metadata in ATLAS

COMA Interfaces Portal

[Screenshot of the Portal page] Most popular at top of this Portal page. (shade: grey) … operational … but no current/active development

Page 7: Finding Information: Metadata in ATLAS

COMA: ATLAS Data Periods … + aggregating new content

- A Data Period is a set of ATLAS Runs grouped for a purpose
  - Defined by Data Preparation Coordinators
  - Used in ATLAS data processing, assessment, and selection …
- Each Period is uniquely defined by a combination of
  - Project name (e.g. 'data10_7TeV')
  - Period name (e.g. 'C1', 'C2', 'C', 'AllYear' …)
- Before 2011, Data Periods were
  - Described on a TWiki page
  - Stored in a file-based system, edited by hand by Data Prep Coordination (experts)
  - Structure evolved over 2010 with experience; this experience was valuable to decide/define the long term solution
- In 2011: Data Periods moved into COMA
  - Coordination/Effort: Data Prep, AMI, COMA experts
  - This made all aspects of Period definitions available programmatically
- Since then, COMA content has grown in many areas
  - Allows for more detailed reports and information to other systems
  - Enables aggregation of LB-wise information by Run, … Period.

Callouts: "Painful to maintain, error prone"; "Simple to enter, check integrity, more robust, available"
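To make "available programmatically" concrete, here is a minimal Python sketch of a Period keyed by (project name, period name), with a run-wise quantity aggregated up to Period level. It is an illustration only, not COMA's schema or API: the class and function names, the run numbers, the event counts, and the assumption that a group such as 'C' contains every period whose name starts with 'C' are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodKey:
    """A Data Period is identified by project name plus period name."""
    project: str  # e.g. "data10_7TeV"
    period: str   # e.g. "C1", "C", "AllYear"


# Hypothetical run numbers and per-run event counts, for illustration only.
PERIOD_RUNS = {
    PeriodKey("data10_7TeV", "C1"): [1001, 1002],
    PeriodKey("data10_7TeV", "C2"): [1003],
}
EVENTS_PER_RUN = {1001: 50, 1002: 70, 1003: 30}


def events_in_period(key: PeriodKey) -> int:
    """Aggregate a run-wise quantity up to the Period level."""
    return sum(EVENTS_PER_RUN[run] for run in PERIOD_RUNS[key])


def events_in_group(project: str, prefix: str) -> int:
    """Aggregate over all periods of a project whose name starts with prefix.

    Treating 'C' as the group of 'C1', 'C2', ... is an assumption of this sketch.
    """
    return sum(
        events_in_period(key)
        for key in PERIOD_RUNS
        if key.project == project and key.period.startswith(prefix)
    )


print(events_in_period(PeriodKey("data10_7TeV", "C1")))
print(events_in_group("data10_7TeV", "C"))
```

Because the key includes the project, a period name such as 'C1' can be reused in another project without ambiguity.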

Page 8: Finding Information: Metadata in ATLAS

Period Menu

Purpose: Overview of all DataPrep-defined Periods, giving links to reports of general info about their Runs.

Choose the Period of interest:
- By Year: e.g. all '2011', or for 'all years'
- By Project: e.g. 'data12_8TeV'
- By Beam Energy or Type: e.g. '7TeV'
- By specific Period or Group: click on the project and then the Period of interest

General feature of COMA Reports: a "highlighted" link opens expanding Help, DocLinks

Page 9: Finding Information: Metadata in ATLAS

[Screenshot annotations]
- Highlight links: show / hide period members
- Members of … are A1-A8
- Links: to COMA, RunQuery, AMI Container production
- Input criteria
- Links in Table column headers: short description of column
- Note: some columns removed using the "customize report" feature (not shown)
- Hover on link: indicates what will happen

Page 10: Finding Information: Metadata in ATLAS

Trigger Intro

- "Event": detector output during a single particle bunch crossing
- "Lots": LHC max particle bunch crossing rate is 31.6 MHz
- "Fewer": a few hundred events per second

"Trigger" is a multi-component selection filter for events:
- ATLAS detector hardware/electronics
  - Many subsystems … TDAQ
- ATLAS software: HLT Release
  - Mostly C++ algorithms collected in a specific ATLAS Software Release executed by the HLT (2nd, 3rd trigger levels)
- Trigger Menu: defines ~500 to 1000 Triggers
  - Every distinct Menu is assigned a unique integer ID, the SMK (Super Master Key)
  - Configurable input to the Trigger hardware and software
  - Specifies what logic or algorithms to execute, including configurable parameters (e.g. thresholds)
  - Assigns each trigger to one or more output Streams
  - Menu (SMK) is FIXED during each Run (not incl. prescales)
  - Each trigger: 3 levels of pass OR fail; each Event either passes or fails each Trigger
- Prescales: blind filter applied by TDAQ when the above Trigger logic does not sufficiently reduce the event output rate
  - Prescales can change during a Run (on LB boundary)
  - Integer identifiers are assigned to sets of prescales: Level 1 and HLT Prescale Keys

An Event is recorded for offline physics analysis if it passes at least one trigger (and its prescale).

"Lots" of "Events" in; "Fewer" but more interesting Events out.
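The recording rule (an event is kept if it passes at least one trigger and that trigger's prescale) can be sketched in a few lines of Python. This is a toy model only: the counter-based 1-in-N prescaler, the trigger names, and the event fields are assumptions made for illustration, and the real TDAQ logic is far richer.

```python
class Prescaler:
    """Blind 1-in-N filter: accepts every N-th call, ignoring event content."""

    def __init__(self, n: int):
        if n < 1:
            raise ValueError("prescale must be >= 1 in this sketch")
        self.n = n
        self.seen = 0

    def accept(self) -> bool:
        self.seen += 1
        return self.seen % self.n == 0


def is_recorded(event, triggers, prescalers) -> bool:
    """True if the event passes at least one trigger and that trigger's prescale."""
    recorded = False
    for name, passes in triggers.items():
        # No early exit: every trigger that fires advances its own counter.
        if passes(event) and prescalers[name].accept():
            recorded = True
    return recorded


# Hypothetical triggers: one selective and unprescaled, one loose and prescaled by 3.
TRIGGERS = {"high_pt": lambda e: e["pt"] > 20, "any": lambda e: True}
PRESCALERS = {"high_pt": Prescaler(1), "any": Prescaler(3)}
EVENTS = [{"pt": pt} for pt in (5, 25, 7, 9, 30, 3)]

DECISIONS = [is_recorded(e, TRIGGERS, PRESCALERS) for e in EVENTS]
print(DECISIONS)
```

The prescale never looks at the event, which is what makes it a "blind" filter: the loose trigger keeps every third event it sees, whatever that event contains.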




Page 11: Finding Information: Metadata in ATLAS

Trigger Metadata: just the tip of the iceberg

Highest level Trigger Configuration Metadata:
- SMK
  - Trigger Chains: EF chain, L2 Chain, L1 Item
  - Names, Versions, Bit Assignments, Streams, ReRun
- LVL1, HLT Prescale Keys:
  - EF, L2, L1 prescales
  - EF, L2 Passthrough values

Details behind Trigger Configuration and what is stored event-wise: need tools from the Trigger Experts
- Understanding trigger execution and info storage
- Algorithms, cuts, multiplicities, bunch groups
- Dead-time veto, BCID / Train / Lumi dependence
- Trigger objects related to trigger decisions
- HLT algorithm Error codes
- Trigger EDM and the Trigger Decision Tool
- How to work with Chain Groups (Trigger 'OR's)

See the trigger related talks in Software Tutorials:

COMA: Stores this metadata. Combines it with Period, Run, Lumi data to provide unique reports.

Page 12: Finding Information: Metadata in ATLAS

COMA Chain Wildcard Reports

L1_2EM*_MU* over all periods


Page 13: Finding Information: Metadata in ATLAS

1. Configuration Summary: shows where this element is configured:
   - Super Master Key(s)
   - Project (Summary)

2. Period Evolution: shows chain/item bit and version evolution for EF_g20_loose chains during Period Runs

3. Activation Summary: shows Runs where this chain is "active"
   - Via prescale
   - Via pass through
   - Via rerun


Page 14: Finding Information: Metadata in ATLAS

COMA Chain Report (EF_e9_tight_e5_tight_Jpsi)

Expand Run-wise Activation … "Physics" EF-L2-L1 signatures
- Active via Prescale: Runs in Data Periods
- Table shows (Run Count):
  - Periods, Link: Run, SMK Reports
  - Level bit assignments; link to Chain/Item Reports (3)
  - Range of Aggregate Prescale while chain is active via prescale in Run; links: COMA Prescale Report (3)

70 Period Runs where this chain is "active"

Page 15: Finding Information: Metadata in ATLAS

COMA Chain Report (EF_e9_tight_e5_tight_Jpsi)

Period Evolution Section … for "Physics" chains in Period Runs
- Separate table for each EF-L2-L1 signature at each beam energy
- Each Table shows:
  - Row-wise: distinct set of bit and chain/item versions
  - Columns:
    - Bit assignments
    - Chain version (links to Trig diff)
    - Chain Report Links
    - Range of AggPS, SMK, Run, Period, Date, HLTRelease

Thanks to Tomasz, Joerg for many useful discussions

Page 16: Finding Information: Metadata in ATLAS

COMA Chain Wildcard Report (input: "EF_g10%")

Purpose: See all the names matching a pattern, or find the exact name from part of the name

Report: Displays chain/item names matching the input string …
- Text size proportional to occurrence in SMK
- In Period Runs and in All Runs
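The wildcard inputs shown in these reports ("L1_2EM*_MU*", "EF_g10%") can be mimicked with Python's standard fnmatch module. This is a sketch of the matching idea only, not of how COMA implements the report: treating "%" as equivalent to "*" is an assumption, and apart from EF_g20_loose and EF_e9_tight_e5_tight_Jpsi (which appear in this talk) the chain names are invented.

```python
from fnmatch import fnmatchcase


def match_chains(pattern, names):
    """Return the names matching a wildcard pattern.

    '%' (SQL LIKE style) is mapped onto '*' (glob style); note that fnmatch
    also gives '?' and '[...]' a special meaning.
    """
    glob = pattern.replace("%", "*")
    return [name for name in names if fnmatchcase(name, glob)]


# Chain/item names; the first, fourth and fifth are made up for illustration.
NAMES = [
    "EF_g10_loose",
    "EF_g20_loose",
    "EF_e9_tight_e5_tight_Jpsi",
    "L1_2EM3_MU6",
    "L1_EM3",
]

print(match_chains("EF_g10%", NAMES))
print(match_chains("L1_2EM*_MU*", NAMES))
```

fnmatchcase is used rather than fnmatch so that matching stays case-sensitive on every platform.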

Page 17: Finding Information: Metadata in ATLAS

Summary and Plans

- COMA: an integral part of ATLAS Metadata infrastructure
  - Essential to ATLAS event-level metadata decoding
  - Ideally placed to provide links and interface to other metadata
  - Special relationship to AMI (and TAG catalog)
  - Launch iELSSI to take a quick look at any Run
- Primary source for "ATLAS Data Periods"
  - Periods in Lumi, DQ, Run Summary, AMI reports come from COMA
- Reports feature "derived" information not available elsewhere
  - Trigger experts recommend COMA Trigger/Prescale reports
- Report usage: from ~200 to over 5000 pages viewed/month
  - Peaked in July as users did final preparations for summer conferences
- Current efforts:
  - COMA Database content growing: watch use cases to identify new areas to focus growth
  - COMA Report and Browser development: keep pace with content, improve functionality and usability
- Beyond COMA: Interface development
  - Connecting COMA to other parts of the infrastructure

Page 18: Finding Information: Metadata in ATLAS

COMA Conclusions

- This is an evolving system … information in the system is growing based on the information available and on use cases
  - Adding more dimensions to the Conditions data, with suitable relationships to facilitate queries
  - Making those criteria available in dynamic, usable interfaces
- We want to ensure the Metadata is complete enough to satisfy use cases while accurately reflecting its limitations
- Interfaces are being constructed to use the selection syntax, criteria, and communication in common use in ATLAS
  - This facilitates cross checks with other systems
- Continuous process: talking with various experts to ensure data integrity, completeness, compatibility with other systems

… Very positive feedback so far … more always welcome …

Page 19: Finding Information: Metadata in ATLAS

COMA multi-Run report: Latest 6 runs

Shows "at a glance": the latest Period Runs with Magnet states, 'ready fraction', link to Stable Beam fill(s), beam information …

Page 20: Finding Information: Metadata in ATLAS

COMA Period Documentation Report: enhanced content

New aggregated information

Page 21: Finding Information: Metadata in ATLAS

COMA Period Documentation Report: enhanced content

New aggregated information

Page 22: Finding Information: Metadata in ATLAS

COMA multi-Run report: Latest 8 runs

Shows "at a glance": the latest Period Runs with Magnet states, 'ready fraction', link to Stable Beam fill(s), beam information …

Page 23: Finding Information: Metadata in ATLAS

Summary and Conclusions

- A lot of progress in many areas using metadata:
  - Transforms, Data Processing, Dataset related metadata
  - Dedicated Metadata Catalogs: AMI, COMA, (TAGs)
- Metadata in ATLAS continues to evolve
  - Naming conventions/rules: important to form a coherent view over datasets, runs, periods, …
  - Increased cooperation between systems, upstream and downstream
  - Use cases continue to expand
  - Improvements in metadata: storage, consistency, delivery, usage
- Challenges ahead
  - Offer coherence at Management and User levels
  - Keep pace with system evolution (such as DDM Rucio, ProdSys, … upgrades) and with analysis pattern evolution and use cases

Page 24: Finding Information: Metadata in ATLAS

Metadata Issues: Dataset Names

- Dataset names are used extensively: storage and operating systems, DDM, ProdSys, metadata repositories
  - But they need to be mnemonic from the user point of view
- Dataset naming rules:
  - Carefully defined by experts, evolved somewhat, have served us well
  - But were last updated in 2010 … needs of ATLAS have grown
  - 2012: Task Force formed to try to amend the rules to address these needs
- Overall length < 231 characters (base directory name): hard limit!
  - If each field is at its field limit, the overall limit is exceeded!
- Many pressures on component lengths … highest areas of concern:
  - "physicsShort": for MC datasets
  - AMI tag: for both data and MC
- Importance of Name: coherence must be understood at all ATLAS levels, from Management to Users … and sometimes limits are good
- Keep a rational balance!!!

Name field callouts: dataNN_* or mcNN_*; ESD, AOD, …; concatenation of configurations

Page 25: Finding Information: Metadata in ATLAS

"PhysicsShort" for MC

- Example of one proposed "Physics Short": MadgraphPythia8_NNPDF21NLOME_AU2NNPDF21LOMPI_SingleTopTChanWelenu_LeptonFilter
  - Rules: "physicsShort field must not exceed 40 characters."
  - This one: 78 characters (and is it really user friendly?)
- This kind of 'growth' is oblivious to the rules and shows how addicted experts/users are to depending entirely on the Dataset Name to identify/find their data
- General frustration: finding the MC needed, TWiki pages, understanding the MC they use, and identifying additional MC samples they need or what exists …
- Jamie Boyd: "General feeling is this level of info should be encoded in AMI rather than the filename – need to follow up with generators group on this"
- Progress in 2012:
  - Commendable effort by MC Coordination: add more metadata to AMI
  - "Simulation Metadata Workshop": held in April 2012
  - Metadata systems need to provide better tools which
    - Better explain/relay the metadata behind the dataset AND
    - Better allow browsing of the datasets and the metadata

Dataset name format: Project.datasetNumber.physicsShort.productionStep.dataType.AMITag[/
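As a rough illustration of how the field limits could be checked mechanically, the Python sketch below splits a name of the form Project.datasetNumber.physicsShort.productionStep.dataType.AMITag and tests the two limits quoted in this talk (physicsShort at most 40 characters, overall length below 231). The function names are hypothetical, and every field of the example name other than the quoted physicsShort is made up for illustration; this is not the official nomenclature validator.

```python
FIELDS = ("project", "datasetNumber", "physicsShort",
          "productionStep", "dataType", "AMITag")
MAX_PHYSICS_SHORT = 40   # limit quoted in the naming rules
MAX_TOTAL_LENGTH = 230   # "overall length < 231 characters"


def split_dataset_name(name):
    """Split a dot-separated dataset name into its six named fields."""
    parts = name.split(".")
    if len(parts) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} dot-separated fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def naming_problems(name):
    """Return a list of human-readable rule violations (empty if none)."""
    fields = split_dataset_name(name)
    problems = []
    n = len(fields["physicsShort"])
    if n > MAX_PHYSICS_SHORT:
        problems.append(
            f"physicsShort is {n} characters (limit {MAX_PHYSICS_SHORT})")
    if len(name) > MAX_TOTAL_LENGTH:
        problems.append(
            f"name is {len(name)} characters (limit {MAX_TOTAL_LENGTH})")
    return problems


PHYSICS_SHORT = ("MadgraphPythia8_NNPDF21NLOME_AU2NNPDF21LOMPI_"
                 "SingleTopTChanWelenu_LeptonFilter")
# Project, dataset number, production step and AMI tag are invented here.
EXAMPLE = f"mc12_8TeV.123456.{PHYSICS_SHORT}.merge.AOD.e1494_s1499"

print(len(PHYSICS_SHORT))
print(naming_problems(EXAMPLE))
```

A check like this only enforces lengths; it says nothing about whether a name is mnemonic, which is the harder half of the problem raised above.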


Page 26: Finding Information: Metadata in ATLAS

How does Metadata Help?

AMI: Portal Page:
- Means of finding datasets to use in analysis using physics metadata predicates
  - Not just dataset names, but also the underlying metadata
- AMI contains a LOT more than just a list of datasets
  - Dataset provenance
  - Files
  - Lost Lumi blocks
  - Links to other applications
  - Nomenclature reference tables
  - Connection to COMA and all its data
- New AMI interface: Dataset 'browsing' (hierarchical search)
  - Now available to users (first version!)
  - Good feedback from users … important for evolution
  - AMI team working on refining this tool based on feedback
  - Adding available metadata coherently: always a challenge

Page 27: Finding Information: Metadata in ATLAS

AMI Dataset Browsing

- User is guided to the AMI catalog specific to the project of interest
  - Information varies according to project
  - Allows users progressive selection to iteratively narrow the result set
- This is a working/evolving example … the major point is: always open to ideas for new interfaces using the wealth of metadata that exists
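The idea of progressive selection can be sketched as repeated filtering of a result set, with a count per metadata value shown at each step so the user knows what the next choice would leave. This Python example is a generic illustration under invented data; the record fields and values are hypothetical and it does not use or describe the AMI API.

```python
from collections import Counter

# Hypothetical dataset records; field names and values are illustrative only.
DATASETS = [
    {"project": "mc12_8TeV", "dataType": "AOD", "generator": "Pythia8"},
    {"project": "mc12_8TeV", "dataType": "ESD", "generator": "Pythia8"},
    {"project": "mc12_8TeV", "dataType": "AOD", "generator": "Herwig"},
    {"project": "data12_8TeV", "dataType": "AOD", "generator": None},
]


def narrow(datasets, **criteria):
    """Keep only the records whose fields equal all the given criteria."""
    return [d for d in datasets
            if all(d.get(field) == value for field, value in criteria.items())]


def facet(datasets, field):
    """Count how many records carry each value of a field."""
    return Counter(d[field] for d in datasets)


step1 = narrow(DATASETS, project="mc12_8TeV")   # pick the project first
print(facet(step1, "dataType"))                 # what could be chosen next
step2 = narrow(step1, dataType="AOD")           # then narrow further
print(facet(step2, "generator"))
```

Each step operates on the previous result set, so the user never has to compose the full query (or the full dataset name) up front.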

Page 28: Finding Information: Metadata in ATLAS

Critical Component: "Transforms" and Metadata

- "ATLAS Transforms": a wrapper to Athena & python job options
  - Thanks to the Transform Group! Graeme Stewart, Stephen Beal, Thomas Gadfort, Harvey Maddocks, Bjorn Sarrazin
  - See Graeme's talk during Software week
- Required, for example, by the ATLAS production system
  - Provides uniform, coherent mechanisms for specifying and executing tasks, even multi-step transforms
- New Transforms:
  - General merging capabilities; also needed for the merging of file-based metadata
  - Provide important computations, such as Event counts
  - Bridges the gap in metadata communication: uniform information transfer to other systems and metadata repositories
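Why merging files also means merging file-based metadata can be seen in a small Python sketch: when several inputs are merged, quantities such as event counts must be summed and run lists combined so that downstream systems see one consistent record. The record layout and function name are hypothetical; this is not the Transforms code.

```python
def merge_file_metadata(files):
    """Combine per-file metadata records into one record for the merged file."""
    merged = {"n_events": 0, "runs": set(), "inputs": []}
    for meta in files:
        merged["n_events"] += meta["n_events"]   # event counts add up
        merged["runs"].update(meta["runs"])      # runs are combined without duplicates
        merged["inputs"].append(meta["name"])    # keep provenance of the merge
    merged["runs"] = sorted(merged["runs"])
    return merged


# Hypothetical input files, for illustration only.
INPUTS = [
    {"name": "file_a", "n_events": 1000, "runs": [2001]},
    {"name": "file_b", "n_events": 250, "runs": [2001, 2002]},
]

MERGED = merge_file_metadata(INPUTS)
print(MERGED)
```

Computing the merged record once, at merge time, is what lets the same numbers be passed on uniformly to other systems and metadata repositories.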

Page 29: Finding Information: Metadata in ATLAS

Summary and Conclusions

- There are significant challenges ahead
  - LS1 planning is well underway, with a longer term view which we hope will handle future data volumes
  - Many major systems need to evolve in major ways
    - Take advantage of accumulated experience and new technology
    - While maintaining operations
  - Maintaining the experts we need!!!
- Metadata in ATLAS continues to evolve
  - Naming conventions/rules: important to form a coherent view over datasets, runs, periods, …
  - Increased cooperation between systems, upstream and downstream
  - Use cases continue to expand
  - Improvements in metadata: storage, consistency, delivery, usage

Page 30: Finding Information: Metadata in ATLAS

2nd issue in Dataset Names: AMI ("Config") Tags

The AMI Tag:
- Definition: composed of concatenated strings encoding processing steps
- Example: r2713_p705 … encodes information about which ATLAS releases, which database releases, which transforms, job configurations, …
- Why is it called the "AMI tag"? AMI provides interfaces for its interpretation
- Rules for AMI tags are also listed in the Nomenclature doc
  - Original specification now also needs revision
  - Max length sometimes exceeds limit (22): multiple factors driving this …
- Highlight some issues to be addressed:
  - Running out of lower case letters
  - Numeric parts … require more characters (99 … 999 … 9999 …?)
  - More processing/merging steps: add more and more fields
- Must find a way to consolidate steps in a managed way
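The structure of an AMI tag can be illustrated with a short Python sketch that splits a tag into its steps and checks it against the length limit of 22 quoted above. It assumes each step is a single lower-case letter followed by digits, which matches the examples in this talk but is not taken from the Nomenclature document; the function names are hypothetical.

```python
import re

STEP = re.compile(r"([a-z])(\d+)")  # assumed step shape: one letter, then digits
MAX_AMI_TAG_LENGTH = 22             # limit quoted in this talk


def parse_ami_tag(tag):
    """Split an AMI tag into (letter, number) pairs, one per processing step."""
    steps = []
    for part in tag.split("_"):
        match = STEP.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognised step {part!r}")
        steps.append((match.group(1), int(match.group(2))))
    return steps


def exceeds_limit(tag):
    """True if the tag is longer than the quoted maximum length."""
    return len(tag) > MAX_AMI_TAG_LENGTH


print(parse_ami_tag("r2713_p705"))
print(exceeds_limit("r2713_p705"))
# A six-step tag quoted elsewhere in this talk:
print(exceeds_limit("e1494_s1499_s1504_r3658_r3549_t85"))
```

Each extra processing or merging step adds a separator plus a field, so the length grows roughly linearly with the number of steps, which is why long chains run into the limit.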

Page 31: Finding Information: Metadata in ATLAS

AMI tags: Evolution

- AMI Tag issues are also being discussed in the Nomenclature Task Force
  - Solveig Albrand: evolving document describing issues/possibilities
- The scope and use of AMI tags has turned out to be much wider than the original design anticipated
- When considering all issues: a major phase change is required
  - A step-wise solution (case by case concatenation of parts of the AMI tag but not others) would be a long term mistake: confusing, wastes developer time, inevitably incomplete
- Example proposal: AMI tag "e1494_s1499_s1504_r3658_r3549_t85" would become "mc1201234_t85", where "mcYYnnnnn" means e1494_s1499_s1504_r3658_r3549 and would be substituted for the AMI tag used to produce merged AOD, the nnnnnth chain for the mcYY data
- This is under discussion. A complete set of rules will be written and proposed for approval by Data Preparation
  - Proposal must include how other systems cope with the change, and take advantage of it
  - Describe the interfaces: help users understand underlying
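One way to picture the example proposal is as a registry that assigns each distinct processing chain a short "mcYYnnnnn" identifier and can expand it again on request. The Python sketch below is purely illustrative of that idea: the proposal was still under discussion, and the class, its numbering from 1, and the choice to keep only the final step outside the short identifier are assumptions drawn from the single example above.

```python
class ChainRegistry:
    """Maps processing chains to short 'mcYYnnnnn' identifiers and back."""

    def __init__(self, year):
        self.year = year        # two-digit campaign year, e.g. 12
        self._by_chain = {}
        self._by_number = {}

    def register(self, chain):
        """Return the short identifier for a chain, assigning one if new."""
        if chain not in self._by_chain:
            number = len(self._by_chain) + 1   # numbering scheme is an assumption
            self._by_chain[chain] = number
            self._by_number[number] = chain
        return f"mc{self.year:02d}{self._by_chain[chain]:05d}"

    def expand(self, short):
        """Recover the full chain from a short identifier."""
        prefix = f"mc{self.year:02d}"
        if not short.startswith(prefix):
            raise ValueError(f"{short!r} does not belong to {prefix}")
        return self._by_number[int(short[len(prefix):])]


def consolidate(tag, registry):
    """Replace all but the last step of an AMI tag by a short chain identifier."""
    chain, last_step = tag.rsplit("_", 1)
    return f"{registry.register(chain)}_{last_step}"


registry = ChainRegistry(12)
short = consolidate("e1494_s1499_s1504_r3658_r3549_t85", registry)
print(short)
print(registry.expand(short.split("_")[0]))
```

The trade-off is visible even in this toy: the name gets shorter, but the meaning of the tag now lives in the registry, so interfaces must make that lookup easy for users and for other systems.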


Page 33: Finding Information: Metadata in ATLAS

Overview of Plans for LS1

Sept 2012: DB coordination asked all database developers their plans for LS1:
- Plans to modify in any way the use of central (Oracle) databases
- Needs to scale up Oracle data sizes and/or load in Run 2
- Intentions to move any activities to Hadoop
  - Foreseen load (data and CPU) for the Hadoop applications
- Requests for: web servers or centrally managed machines

Sub-system plans for change vary widely
- From NONE to major changes in storage
- TOO many to list individually here

Responses/plans collected in TWiki:
- DatabasesLS1Planning (All)
- LS1ConditionsDB
- CompUpgPlanDistriComputing (ADC)
- DCS workshop (PVSS): confId=208712

Some details also in talks, SC Week DB session:

Page 34: Finding Information: Metadata in ATLAS

Metadata tools: now upgrade

- Users need appropriate tools to find, understand, process, analyse the data they need to produce results.
  - Increase in data rate will make this even more critical
- Improve and expand use of metadata tools
  - AMI, COMA, and TAG systems are currently undergoing a lot of growth and evolution out of use cases arising with existing data
- 2012 data volume is forcing changes
  - Heavy Ion processing MUST use TAGs: currently in use
  - Group processing also testing TAG usage
  - TIM workshop: many jobs peeking at files, but reading no events?
    - Better usage of metadata might eliminate the need to provide/access unneeded files
    - Reduce unneeded use of grid resources
- Recent TAG "Brainstorming" (November 2012):
  - Collect feedback from users, experts
  - Identify issues and use cases
  - Parallel efforts:
    - Keep system running while improving existing TAG performance
    - Look into possible use of alternative storage (Hadoop / HBase)

Page 35: Finding Information: Metadata in ATLAS

Cartoon Break: Cycles of Change

- Must recognize: any change is painful for users
  - Disruptive to workflow; immediate interest is to get results out quickly
- Any change must, in parallel, come with the tools they need to "GET OVER IT"
  - It helps the process if we provide MORE of the information they need

Page 36: Finding Information: Metadata in ATLAS

Challenges of building a New World

New/Replacement systems require:
- Motivation: "why do we need that?"
- Vision (long term)
- Resources: developers, infrastructure
- New/improved technology *: knowledge of how/when to use it

Existing data/systems are
- A Blessing
  - Reflect real usage
  - Populate new system with real data
- A Curse
  - Maintain existing operations
  - LOTS of real data
  - Backward compatibility

Carries inherently: Risks (failure) and Rewards (better world)