
LHC data processing challenges: from Detector data to Physics

July 2007

Pere Mató CERN

Pere Mato, CERN/PH

2

The goal is to understand in the most general terms; that’s usually also the simplest.

- A. Eddington

We use experiments to inquire about what “reality” does.

Theory & Parameters

Reality

This talk is about filling this gap.

Pere Mato, CERN/PH

3

Theory

“Clear statement of how the world works, formulated mathematically in terms of equations”

Particle Data Group, Barnett et al

[Annotation on the equations shown: “Additional term goes here”]

Pere Mato, CERN/PH

4

Experiment

[Slide shows a hex dump of raw detector data words, e.g. “0x01e84c10: 0x01e8 0x8848 0x01e8 0x83d8 …”]

1/5000th of an event in the CMS detector
– Get about 100 events per second

Pere Mato, CERN/PH

5

What does the Data Mean?

Digitization:

“Address”: what detector element took the reading

“Value”: What the electronics wrote down

Look up type, calibration info

Look up/calculate spatial position

Check valid, convert to useful units/form

Draw
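To make these steps concrete, here is a toy Python sketch that unpacks a hypothetical 32-bit raw word into an address and a value, looks the address up in an invented calibration table, checks validity and converts the reading into an energy. The word layout and all constants are assumptions for illustration, not any real detector format.

```python
# Toy digitization decoder; the word layout and constants are hypothetical.

# Pretend calibration database: address -> (pedestal, gain in GeV/count, position in cm)
CALIB = {
    0x01A3: (12.0, 0.025, (15.2, -8.4, 101.0)),
    0x01A4: (11.5, 0.026, (15.2, -8.4, 103.5)),
}

def decode(word):
    """Split a 32-bit raw word into detector address (upper 16 bits)
    and ADC value (lower 16 bits)."""
    address = (word >> 16) & 0xFFFF
    value = word & 0xFFFF
    return address, value

def to_hit(word):
    """Turn one raw word into a calibrated hit, or None if the address is unknown."""
    address, value = decode(word)
    if address not in CALIB:                 # validity check
        return None
    pedestal, gain, position = CALIB[address]
    energy = (value - pedestal) * gain       # convert counts to useful units (GeV)
    return {"address": address, "position_cm": position, "energy_GeV": energy}

print(to_hit(0x01A30150))                    # decode one example raw word
```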

Pere Mato, CERN/PH

6

Graphical Representation

Pere Mato, CERN/PH

7

Bridging the Gap

Raw Data

Theory & Parameters

Reality

A small number of general equations, with specific input parameters (perhaps poorly known)

The imperfect measurement of a (set of) interactions in the detector. Very strong selection.

Observables: specific lifetimes, probabilities, masses, branching ratios, interactions, etc.

Events: a unique happening: Run 21007, event 3916, which contains a H -> xx decay

Reconstruction

Analysis

Phenomenology

Pere Mato, CERN/PH

8

Physics Selection at LHC

LEVEL-1 Trigger: hardwired processors (ASIC, FPGA), pipelined, massively parallel

HIGH LEVEL Triggers: farms of processors

Reconstruction & Analysis: Tier-0/1/2 centers

[Diagram: time axis from 10^-9 s to 10^3 s (25 ns, 3 µs, ms, sec, hour, year) with the on-line/off-line boundary and the Giga/Tera/Petabit data-rate scale]

Pere Mato, CERN/PH

9

Trigger

Task: inspect detector information and provide a first decision on whether to keep the event or throw it out

The trigger is a function T(event data & apparatus; physics channels & parameters) that returns ACCEPTED or REJECTED:
– Detector data not (all) promptly available
– Selection function highly complex
– T(...) is evaluated by successive approximations, the TRIGGER LEVELS (possibly with zero dead time)

Pere Mato, CERN/PH

10

Trigger Levels

[Diagram: detectors feed front-end pipelines (read out under Lvl-1), readout buffers (Lvl-2), then a switching network and processor farms (HLT)]

“Traditional”: 3 physical levels

Level-1 trigger: reduce 40 MHz to 10^5 Hz
– This step is always there

Level-2 trigger: from 10^5 Hz to 10^3 Hz
– Not always there if the next level can take the rate and bandwidth

High Level Trigger (HLT)
– Need to get to 10^2 Hz, which is what is affordable to permanently store and process offline (see the sketch below)
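To illustrate the successive-approximation idea, here is a toy Python sketch of a three-level cascade thinning a large sample of fake events; the observables and cut values are invented, and real triggers of course act on detector data rather than random numbers.

```python
import random

def level1(event):
    # Coarse, fast decision (custom hardware in reality): calorimeter energy sum.
    return event["et_sum"] > 120.0

def level2(event):
    # Refined decision on partial readout: require several stiff tracks.
    return event["n_high_pt_tracks"] >= 2

def hlt(event):
    # Full event, offline-quality algorithms running on a processor farm.
    return event["mass_estimate"] > 150.0

def toy_event():
    # Invented observables standing in for real detector data.
    return {
        "et_sum": random.expovariate(1 / 40.0),
        "n_high_pt_tracks": random.choices([0, 1, 2, 3], weights=[70, 20, 8, 2])[0],
        "mass_estimate": random.uniform(0.0, 200.0),
    }

events = (toy_event() for _ in range(1_000_000))
kept = sum(1 for e in events if level1(e) and level2(e) and hlt(e))
print(f"kept {kept} of 1000000 toy events")   # only a small fraction survives
```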

Pere Mato, CERN/PH

11

Trigger and DAQ Challenges

N (channels) ~ O(10^7); ≈ 20 interactions every 25 ns
– need huge number of connections
– need information super-highway

Calorimeter information should correspond to tracker info
– need to synchronize detector elements to (better than) 25 ns

In some cases: detector signal/time of flight > 25 ns
– integrate more than one bunch crossing's worth of information
– need to identify bunch crossing...

Can store data at ≈ 10^2 Hz
– need to reject most interactions

It's On-Line (cannot go back and recover events)
– need to monitor selection

Pere Mato, CERN/PH

12

Trigger/DAQ Summary for LHC

Experiment | No. Levels | Trigger Rates (Hz) | Event Size (Byte) | Readout Bandw. (GB/s) | Filter Out MB/s (Event/s)
ATLAS | 3 | LV-1: 10^5, LV-2: 10^3 | 10^6 | 10 | 100 (10^2)
CMS | 2 | LV-1: 10^5 | 10^6 | 100 | 100 (10^2)
LHCb | 3 | LV-0: 10^6, LV-1: 4x10^4 | 2x10^5 | 4 | 40 (2x10^2)
ALICE | 4 | Pb-Pb: 500, p-p: 10^3 | Pb-Pb: 5x10^7, p-p: 2x10^6 | 5 | Pb-Pb: 1250 (10^2), p-p: 200 (10^2)

Pere Mato, CERN/PH

13

Offline Processing Stages and Datasets

[Diagram: the detector (and, in parallel, event simulation) feeds the event filter (selection & reconstruction); the resulting raw data go through event reconstruction to produce the Event Summary Data (ESD); batch physics analysis of this processed data produces the Analysis Object Data (AOD), extracted by physics topic, which are the input to individual physics analysis]

Pere Mato, CERN/PH

14

Main Software Requirements

The new software being developed by the LHC experiments must cope with the unprecedented conditions and challenges that characterize these experiments (trigger rate, data volumes, etc.)
– The software should not become the limiting factor for the trigger, detector performance and physics reach of these experiments

In spite of its complexity it should be easy to use
– Each one of the ~4000 LHC physicists (including people from remote/isolated countries, physicists who have built the detectors, senior physicists used to older software) should be able to run the software, modify parts of it (reconstruction, ...), analyze the data and extract physics results

Users demand simplicity (i.e. hiding complexity) and stability

Pere Mato, CERN/PH

15

Software Structure

[Diagram: Applications sit on top of the Experiment Framework (Event, Detector Description, Calibration), which builds on specialized domains (Simulation, Data Management, Distributed Analysis), Core Libraries and non-HEP specific software packages]

Every experiment has a framework for basic services and various specialized frameworks: event model, detector description, visualization, persistency, interactivity, simulation, calibration, etc.

General purpose non-HEP libraries

Applications are built on top of the frameworks and implement the required algorithms

Core libraries and services that are widely used and provide basic functionality

Specialized domains that are common among the experiments

Pere Mato, CERN/PH

16

Software Components

Foundation Libraries
– Basic types
– Utility libraries
– System isolation libraries

Mathematical Libraries
– Special functions
– Minimization, random numbers

Data Organization
– Event Data
– Event Metadata (event collections)
– Detector Conditions Data

Data Management Tools
– Object Persistency
– Data Distribution and Replication

Simulation Toolkits
– Event generators
– Detector simulation

Statistical Analysis Tools
– Histograms, N-tuples
– Fitting

Interactivity and User Interfaces
– GUI
– Scripting
– Interactive analysis

Data Visualization and Graphics
– Event and Geometry displays

Distributed Applications
– Parallel processing
– Grid computing

Pere Mato, CERN/PH

17

Programming Languages

Object-Oriented (O-O) programming languages have become the norm for developing the software for HEP experiments

C++ is in use by (almost) all experiments
– Pioneered by BaBar and Run II (D0 and CDF)
– LHC experiments with an initial FORTRAN code base have basically completed the migration to C++

Large common software projects in C++ have been in production for many years already
– ROOT, Geant4, …

FORTRAN is still in use, mainly by the MC generators
– Large development efforts are going into the migration to C++ (Pythia8, Herwig++, Sherpa, …)

Pere Mato, CERN/PH

18

Scripting Languages

Scripting has been an essential component in the HEP analysis software for the last decades
– PAW macros (kumac) in the FORTRAN era
– C++ interpreter (CINT) in the C++ era
– Python recently introduced and gaining momentum

Most of the statistical data analysis and final presentation is done with scripts
– Interactive analysis
– Rapid prototyping to test new ideas
– Driving complex procedures

Scripts are also used to “configure” complex C++ programs developed and used by the LHC experiments
– “Simulation” and “Reconstruction” programs with hundreds or thousands of options to configure (a minimal sketch follows)
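As an illustration of this configuration role, here is a hedged sketch of a Python “job options” style script that could steer a C++ application; all component and option names are invented, not those of any experiment's framework.

```python
# Hypothetical job-options script: Python objects describe the configuration
# that a C++ application reads at start-up.  Names are illustrative only.

class Component:
    """A named block of options for one C++ component."""
    def __init__(self, name, **options):
        self.name = name
        self.options = options

job = [
    Component("EventReader", input_files=["run21007.raw"], max_events=1000),
    Component("TrackReconstruction", min_pt_GeV=0.5, use_field_map=True),
    Component("HistogramWriter", output_file="monitoring.root"),
]

# Dump the configuration in a simple key/value form a C++ program could parse.
for comp in job:
    for key, value in comp.options.items():
        print(f"{comp.name}.{key} = {value!r}")
```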

Pere Mato, CERN/PH

19

Object-Orientation

The object-oriented paradigm has been adopted for HEP software development
– Basically all the code for the new generation of experiments is O-O
– O-O has enabled us to handle higher complexity reasonably well

The migration to O-O was not easy and took longer than expected
– The process was quite long and painful (between 4 and 8 years)
  » The community had to be re-educated in new languages and tools
– C++ is not a simple language
  » Only specialists master it completely
– Mixing interpreted and compiled languages (e.g. C++ and Python) is a workable compromise (see the sketch below)
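One common way of mixing the two languages is to keep the numerical kernel compiled and the steering interpreted; the hedged sketch below calls an assumed C++ routine from Python through the standard ctypes module (the shared library and function are illustrative, not an existing package).

```python
# Sketch of mixing compiled and interpreted code (assumed, illustrative library).
# The C++ side might be compiled with:
#   g++ -O2 -shared -fPIC -o libinvmass.so invmass.cpp
# where invmass.cpp contains:
#   #include <cmath>
#   extern "C" double inv_mass(double e1, double pz1, double e2, double pz2) {
#       double e = e1 + e2, pz = pz1 + pz2;
#       return std::sqrt(e * e - pz * pz);   // toy 1-dimensional invariant mass
#   }

import ctypes

lib = ctypes.CDLL("./libinvmass.so")             # load the compiled library
lib.inv_mass.restype = ctypes.c_double
lib.inv_mass.argtypes = [ctypes.c_double] * 4

# The fast numerical kernel runs compiled; the steering stays in Python.
print(lib.inv_mass(45.0, 30.0, 50.0, -20.0))
```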

Pere Mato, CERN/PH

20

Non-HEP Packages widely used in HEP

Non-HEP specific functionality required by HEP programs can be implemented using existing packages
– Favoring free and open-source software
– About 30 packages are currently in use by the LHC experiments

Here are some examples
– Boost
  » Portable and free C++ source libraries intended to be widely useful and usable across a broad spectrum of applications
– GSL
  » GNU Scientific Library
– Coin3D
  » High-level 3D graphics toolkit for developing cross-platform real-time 3D visualization
– XercesC
  » XML parser written in a portable subset of C++


Pere Mato, CERN/PH

21

HEP Generic Packages (1)

Core Libraries
– Library of basic types (e.g. 3-vector, 4-vector, points, particles, etc.)
– Extensions to the C++ Standard Library
– Mathematical libraries
– Statistics libraries

Utility Libraries
– Operating system isolation libraries
– Component model
– Plugin management
– C++ reflection

Examples: ROOT, CLHEP, etc. (a toy 4-vector sketch follows)
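Since 3- and 4-vectors are the archetypal basic types, here is a toy Python sketch of a Lorentz vector used to compute an invariant mass; it only illustrates the idea and is not the CLHEP or ROOT class interface.

```python
import math

class FourVector:
    """Toy Lorentz vector (E, px, py, pz) in GeV; illustration only."""
    def __init__(self, e, px, py, pz):
        self.e, self.px, self.py, self.pz = e, px, py, pz

    def __add__(self, other):
        return FourVector(self.e + other.e, self.px + other.px,
                          self.py + other.py, self.pz + other.pz)

    def mass(self):
        # Invariant mass: m^2 = E^2 - |p|^2 (natural units, c = 1).
        m2 = self.e**2 - (self.px**2 + self.py**2 + self.pz**2)
        return math.sqrt(max(m2, 0.0))

# Two lepton candidates combined into a dilepton mass.
l1 = FourVector(45.1, 20.0, 30.0, 25.0)
l2 = FourVector(50.3, -22.0, -28.0, 30.0)
print(f"dilepton mass = {(l1 + l2).mass():.1f} GeV")
```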


Pere Mato, CERN/PH

22

HEP Generic Packages (2)

MC Generators
– This is the best example of common code used by all the experiments
  » Well defined functionality and fairly simple interfaces

Detector Simulation
– Presented in the form of toolkits/frameworks (Geant4, FLUKA)
  » The user needs to input the geometry description, primary particles, user actions, etc.

Data Persistency and Management
– To store and manage the data produced by experiments

Data Visualization
– GUI, 2D and 3D graphics

Distributed and Grid Analysis
– To support end-users using the distributed computing resources (PROOF, Ganga, …)


Pere Mato, CERN/PH

23

ROOT I/O

ROOT provides support for object input/output from/to platform-independent files
– The system is designed to be particularly efficient for objects frequently manipulated by physicists: histograms, ntuples, trees and events
– I/O is possible for any user class. Non-intrusive: only the class “dictionary” needs to be defined
– Extensive support for “schema evolution”: class definitions are not immutable over the lifetime of the experiment

The ROOT I/O area is still moving after 10 years
– Recent additions: full STL support, data compression, tree I/O from ASCII, tree indices, etc.

All new experiments rely on ROOT I/O to store their data (see the sketch below)
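As a small usage sketch, assuming a ROOT installation with the PyROOT bindings (imported as the ROOT module), the following writes a histogram and a tiny tree to a file; the file, histogram and branch names are arbitrary.

```python
# Minimal ROOT I/O sketch via PyROOT; requires a ROOT installation.
import ROOT
from array import array

f = ROOT.TFile("toy_events.root", "RECREATE")   # platform-independent ROOT file

h = ROOT.TH1F("h_mass", "toy mass; m [GeV]; events", 50, 0.0, 200.0)

tree = ROOT.TTree("events", "toy event tree")
mass = array("f", [0.0])                        # buffer bound to the branch
tree.Branch("mass", mass, "mass/F")

rng = ROOT.TRandom3(42)
for _ in range(1000):
    mass[0] = rng.Gaus(91.0, 2.5)               # pretend Z-like peak
    h.Fill(mass[0])
    tree.Fill()

f.Write()                                       # objects are streamed to the file
f.Close()
```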

Pere Mato, CERN/PH

24

Persistency Framework

FILES - based on ROOT I/O
– Targeted for complex data structures: event data, analysis data
– Management of object relationships: file catalogues
– Interface to Grid file catalogs and Grid file access

Relational Databases - Oracle, MySQL, SQLite
– Suitable for conditions, calibration, alignment and detector description data, possibly produced by online systems (see the sketch below)
– Complex use cases and requirements, multiple ‘environments’: difficult to satisfy with a single solution
– Isolating applications from the database implementations with a standardized relational database interface
  » facilitates the life of the application developers
  » no change in the application to run in different environments
  » encode “good practices” once for all
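As a hedged illustration of a conditions-style relational table, here is a sketch using Python's built-in sqlite3 module; the schema and numbers are invented, not any experiment's conditions database.

```python
# Toy conditions database: calibration constants valid for a run range.
import sqlite3

con = sqlite3.connect(":memory:")               # use a file path for persistence
con.execute("""CREATE TABLE calib (
                   channel    INTEGER,
                   first_run  INTEGER,
                   last_run   INTEGER,
                   pedestal   REAL,
                   gain       REAL)""")
con.executemany("INSERT INTO calib VALUES (?, ?, ?, ?, ?)",
                [(419, 20000, 20999, 12.0, 0.025),
                 (419, 21000, 21999, 11.7, 0.026)])

def lookup(channel, run):
    """Return the (pedestal, gain) valid for this channel and run."""
    return con.execute("""SELECT pedestal, gain FROM calib
                          WHERE channel = ? AND ? BETWEEN first_run AND last_run""",
                       (channel, run)).fetchone()

print(lookup(419, 21007))    # -> (11.7, 0.026)
```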

Pere Mato, CERN/PH

25

Monte Carlo simulation’s role

[Diagram: the same Reality / Raw Data / Events / Observables / Theory & Parameters boxes as in “Bridging the Gap”, now traversed in the simulation direction]

Calculate expected branching ratios

Randomly pick decay paths, lifetimes, etc for a number of events

Calculate what imperfect detector would have seen for those events

Treat that as real data and reconstruct it

Compare to original to understand efficiency
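A toy Python version of this loop: generate “true” values, smear them through a pretend detector, apply a simple selection standing in for reconstruction, and compare with the generated sample to estimate the efficiency; every number here is invented.

```python
import random

random.seed(1)

def generate():
    # "Truth": pick a decay proper time from an exponential (invented mean, in ps).
    return random.expovariate(1 / 1.5)

def simulate_detector(t_true):
    # Imperfect detector: Gaussian smearing plus a chance of losing the event.
    if random.random() < 0.15:                  # acceptance loss
        return None
    return random.gauss(t_true, 0.25)           # measured proper time

def reconstruct(t_meas):
    # Crude "reconstruction/selection": keep well-measured, positive times.
    return t_meas is not None and t_meas > 0.2

n_gen = 100_000
n_reco = sum(1 for _ in range(n_gen) if reconstruct(simulate_detector(generate())))
print(f"efficiency = {n_reco / n_gen:.3f}")     # reconstructed vs generated
```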

Pere Mato, CERN/PH

26

MC Generators

Many MC generators and tools are available to the experiments, provided by a solid community
– Each experiment chooses the tools most adequate for its physics

Example: ATLAS alone currently uses
– Generators
  » AcerMC: Zbb~, tt~, single top, tt~bb~, Wbb~
  » Alpgen (+ MLM matching): W+jets, Z+jets, QCD multijets
  » Charybdis: black holes
  » HERWIG: QCD multijets, Drell-Yan, SUSY...
  » Hijing: heavy ions, beam-gas...
  » MC@NLO: tt~, Drell-Yan, boson pair production
  » Pythia: QCD multijets, B-physics, Higgs production...
– Decay packages
  » TAUOLA: interfaced to work with Pythia, Herwig and Sherpa
  » PHOTOS: interfaced to work with Pythia, Herwig and Sherpa
  » EvtGen: used in B-physics channels

Pere Mato, CERN/PH

27

Detector Simulation - Geant 4

Geant4 has become an established tool, in production for the majority of LHC experiments during the past two years, and in use in many other HEP experiments and for applications in medical, space and other fields

Ongoing work on physics validation

Good example of common software

LHCb: ~18 million volumes; ALICE: ~3 million volumes

Pere Mato, CERN/PH

28

Analysis - Brief History

1980s: mainframes, batch jobs, histograms back. Painful.

Late 1980s, early 1990s: PAW arrives
– NTUPLEs bring physics to the masses
– Workstations with “large” disks (holding data locally) arrive; looping over data and remaking plots becomes easy

Firmly in the 1990s: laptops arrive
– Physics-in-flight; interactive physics in fact

Late 1990s: ROOT arrives
– All you could do before and more. In C++ this time.
– FORTRAN is still around. The “ROOT-TUPLE” is born
– Side promise: reconstruction and analysis form a continuum

2000s: two categories of analysis physicists: those who can only work off the ROOT-tuple and those who can create/modify it

Mid-2000s: WiFi arrives; physics-in-meeting

Pere Mato, CERN/PH

29

PROOF – Parallel ROOT Facility

PROOF aims to provide the functionality needed to run ROOT data analysis in parallel (see the conceptual sketch below)
– A major upgrade of the PROOF system was started in 2005
– The system is evolving from processing interactive short blocking queries to a system that also supports long-running queries in a stateless client mode
– Currently working with ALICE to deploy it on the CERN Analysis Facility (CAF)
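To illustrate the idea only (this is not the PROOF API), the sketch below splits a list of input files across worker processes with Python's multiprocessing module and merges the partial results, which is roughly the pattern PROOF automates for ROOT trees.

```python
# Conceptual sketch of file-parallel analysis; PROOF does this transparently
# for ROOT trees, here plain Python multiprocessing stands in for it.
from multiprocessing import Pool

FILES = [f"run21007_part{i:03d}.dat" for i in range(16)]   # hypothetical inputs

def analyse(filename):
    """Pretend per-file analysis returning a partial result."""
    # A real worker would open the file and fill histograms/ntuples.
    return {"n_selected": len(filename)}        # dummy partial result

def merge(results):
    return {"n_selected": sum(r["n_selected"] for r in results)}

if __name__ == "__main__":
    with Pool(processes=4) as pool:             # the "farm" of workers
        partial = pool.map(analyse, FILES)      # scatter files to workers
    print(merge(partial))                       # gather and merge
```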

Pere Mato, CERN/PH

30

Experiment Data Processing Frameworks

Experiments develop Software Frameworks
– General architecture of any event-processing application (simulation, trigger, reconstruction, analysis, etc.)
– To achieve coherency and to facilitate software re-use
– Hide technical details from the end-user physicists
– Help the physicists to focus on their physics algorithms

Applications are developed by customizing the Framework
– By the “composition” of elemental Algorithms to form complete applications
– Using third-party components wherever possible and configuring them

ALICE: AliROOT; ATLAS + LHCb: Athena/Gaudi; CMS: moved to a new framework recently


Pere Mato, CERN/PH

31

Example: The GAUDI Framework

User “algorithms” consume event data from the “transient data store” with the help of “services” and “tools” with well defined interfaces and produce new data that is made available to other “algorithms”.

Data can have various representations and “converters” take care of their transformation

The GAUDI framework is used by LHCb, ATLAS, Harp, Glast, BES III

[Diagram of the GAUDI architecture: the Application Manager steers a sequence of Algorithms; Algorithms exchange data through the Transient Event, Detector and Histogram Stores; the Event Data, Detector Data and Histogram Services, together with Persistency Services and Converters, move data between these stores and the Data Files; further components include the Message Service, JobOptions Service, Particle Properties Service, Event Selector and other services]
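To make the pattern concrete, here is a hedged conceptual sketch, in Python rather than the real C++ GAUDI interfaces, of algorithms that communicate only through a per-event transient store driven by an application manager; all class and key names are invented.

```python
# Conceptual sketch of the algorithm / transient-store pattern used by
# frameworks like GAUDI; names are invented for illustration.

class TransientEventStore(dict):
    """Per-event whiteboard: algorithms publish and retrieve data by key."""

class Algorithm:
    def execute(self, store):
        raise NotImplementedError

class HitMaker(Algorithm):
    def execute(self, store):
        store["hits"] = [(1, 4.2), (2, 7.9), (3, 1.1)]      # pretend raw hits

class TrackFinder(Algorithm):
    def execute(self, store):
        # Consume data produced upstream, publish new data downstream.
        store["tracks"] = [h for h in store["hits"] if h[1] > 2.0]

class ApplicationManager:
    """Runs the configured sequence of algorithms over each event."""
    def __init__(self, algorithms):
        self.algorithms = algorithms

    def run(self, n_events):
        for _ in range(n_events):
            store = TransientEventStore()
            for alg in self.algorithms:
                alg.execute(store)
            print(len(store["tracks"]), "tracks found")

ApplicationManager([HitMaker(), TrackFinder()]).run(3)
```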

Pere Mato, CERN/PH

32

Application Software on the Grid

What I have been describing until now is the software that runs on the Grid; it is not the Grid software itself (middleware)

Experiments have developed tools to facilitate the usage of the Grid

Example: GANGA (a sketched session follows below)
– Helps configuring and submitting analysis jobs (Job Wizard)
– Helps users keep track of what they have done
– Hides all technicalities completely
– Provides a palette of possible choices and specialized plug-ins:
  » pre-defined application configurations
  » batch/grid systems, etc.
  » GANGA ‘knows’ what the user is running, so it can really help him/her
– Single desktop for a variety of tasks
– A friendly user interface is essential

[Diagram: the GANGA GUI sits between the GAUDI program (job options, algorithms) and the collective & resource Grid services, returning histograms, monitoring information and results to the user]
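As a rough sketch of the style of interaction, based on Ganga's documented Job/application/backend objects but with the specific values assumed for illustration, submitting a job from the Ganga Python prompt looks something like this:

```python
# Sketched Ganga session (typed at the ganga prompt); exact options will vary.
j = Job()
j.name = "toy_analysis"
j.application = Executable()
j.application.exe = "echo"                 # stand-in for a real analysis executable
j.application.args = ["hello from the Grid (eventually)"]
j.backend = Local()                        # swap for a Grid backend when ready
j.submit()

print(jobs)                                # Ganga keeps track of what has been run
```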

Pere Mato, CERN/PH

33

Summary

The online multi-level trigger is essential to select interesting collisions (1 in 10^5-10^6)
– Level-1: custom hardware, huge fan-in/fan-out problem, fast algorithms on coarse-grained, low-resolution data
– HLT: software/algorithms (offline) on a large processor farm of PCs

Reconstruction and analysis are how we get from raw data to physics papers
– Huge programs (10^7 lines of code) developed by hundreds of physicists
– Unprecedented need for computing resources

The next generation of software for experiments needs to cope with more stringent requirements and new challenging conditions
– The software should not be the limiting factor and should allow the physicists to extract the best physics from the experiment
– The new software is more powerful but at the same time more complex