The Evolution of HEP software 12 September 2013, NEC2013/Varna René Brun/CERN*

The Evolution of HEP software

12 September 2013, NEC2013/Varna

René Brun/CERN*

Plan

In this talk I present the views of somebody involved in some aspects of scientific computing, as seen from a major lab in HEP.

Having been involved in the design and implementation of many systems, my views are necessarily biased by my path in several experiments and the development of some general tools.

I plan to describe the creation and evolution of the main systems that have shaped the current HEP software, with some views for the near future.

Machines

From mainframes to clusters, walls of cores, GRIDs and clouds.

Machine Units (bits)

Word sizes in bits: 16, 32, 36, 48, 56, 60, 64.

Machines shown: PDP-11, NORD-50, BESM-6, CDC, UNIVAC, and many others.

With even more combinations of exponent/mantissa size or byte ordering.

A strong push to develop portable, machine-independent I/O systems.
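The idea behind machine-independent I/O can be sketched in modern terms: the file format, not the host machine, fixes the byte order and the field widths. The example below is only an illustration using Python's standard `struct` module. The record layout (a 32-bit count followed by 64-bit floats) and the function names are assumptions made for this sketch, not the format of any system named in this talk.

```python
import struct

def write_record(values):
    """Encode floats as a big-endian int32 count followed by big-endian float64s."""
    # '>' selects big-endian byte order and standard sizes on every host.
    return struct.pack(f">i{len(values)}d", len(values), *values)

def read_record(buf):
    """Decode a record produced by write_record, whatever machine wrote it."""
    (count,) = struct.unpack_from(">i", buf, 0)
    return list(struct.unpack_from(f">{count}d", buf, 4))

if __name__ == "__main__":
    data = write_record([1.5, -2.0, 3.25])
    print(len(data), read_record(data))
```

Because the format string pins down both byte order and sizes, the same bytes decode identically on little-endian and big-endian hosts.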

User machine interface

General Software in 1973

Software for bubble chambers: Thresh, Grind, Hydra

Histogram tool: SUMX from Berkeley

Simulation with EGS3 (SLAC), MCNP(Oak Ridge)

Small Fortran IV programs (1000 LOC, 50 kbytes)

Punched cards, line printers, pen plotters (GD3)

Small archive libraries (cernlib), lib.a

Software in 1974

First “Large Electronic Experiments”

Data Handling Division == Track Chambers

Well organized software in TC with HYDRA, Thresh, Grind, anarchy elsewhere

HBOOK: from 3 routines to 100, from 3 users to many

First software group in DD

GEANT1 in 1975

Very basic framework to drive a simulation program: reading data cards with FFREAD, step actions with GUSTEP and GUNEXT, applying the magnetic field with GUFLD.

Output (Hist/Digits) was user defined

Histograms with HBOOK

About 2,000 LOC

ZBOOK in 1975

Extraction of the HBOOK memory manager into an independent package.

Creation of banks and data structures anywhere in common blocks

Machine independent I/O, sequential and random

About 5,000 LOC
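As a rough picture of what "banks anywhere in common blocks" means, the sketch below keeps every bank inside one flat array and hands out integer offsets instead of references. It is a toy model only: the class `BankStore`, its two-word header and its method names are hypothetical and do not reproduce the actual ZBOOK calling sequences.

```python
class BankStore:
    """Toy dynamic store: all banks live in one flat array, like a common block."""

    HEADER = 2  # words per bank header: [bank id, number of data words]

    def __init__(self, size):
        self.words = [0] * size  # the flat storage area
        self.free = 0            # index of the first unused word

    def create(self, bank_id, ndata):
        """Reserve a bank and return its offset (its 'link') in the store."""
        need = self.HEADER + ndata
        if self.free + need > len(self.words):
            raise MemoryError("store is full")
        link = self.free
        self.words[link] = bank_id
        self.words[link + 1] = ndata
        self.free += need
        return link

    def data(self, link):
        """Return a copy of the data words of the bank at `link`."""
        start = link + self.HEADER
        return self.words[start:start + self.words[link + 1]]

    def set(self, link, index, value):
        """Store one data word, checking the index against the bank length."""
        if not 0 <= index < self.words[link + 1]:
            raise IndexError("index outside bank")
        self.words[link + self.HEADER + index] = value
```

Since a link is just an offset, the whole store can be written to disk or copied as a single block, which is one reason this style fits machine-independent I/O.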

GEANT2 in 1976

Extension of GEANT1 with more physics (e-showers based on a subset of EGS, multiple scattering, decays, energy loss).

Kinematics, hits/digits data structures in ZBOOK

Used by several SPS experiments (NA3, NA4, NA10, Omega)

About 10,000 LOC

Problems with GEANT2

Very successful small framework.

However, the detector description was user written and defined via “if” statements at tracking time.

This was becoming a hard task for large and constantly evolving detectors (as was the case with NA4 and C. Rubbia).

Many attempts to describe a detector geometry via data cards (a bit like XML), but the main problem was the poor and inefficient detector description in memory.

GEANT3 in 1980

A data structure (ZBOOK tree) describing complex geometries was introduced, then gradually the geometry routines computing distances, etc.

This was a huge step forward implemented first in OPAL, then L3 and ALEPH.

Full electromagnetic showers (first based on EGS, then own developments)
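The step from "if" statements to a geometry held as data can be sketched as a tree of volumes that a generic routine walks to find where a point lies. This is a simplified illustration, not the GEANT3 implementation: the `Volume` class, the axis-aligned boxes in global coordinates and the volume names are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Volume:
    """A named axis-aligned box (global coordinates) with daughter volumes."""
    name: str
    lo: tuple
    hi: tuple
    daughters: list = field(default_factory=list)

    def contains(self, point):
        return all(l <= p < h for l, p, h in zip(self.lo, point, self.hi))

def locate(volume, point):
    """Return the names from `volume` down to the deepest volume holding `point`."""
    if not volume.contains(point):
        return []
    for daughter in volume.daughters:
        path = locate(daughter, point)
        if path:
            return [volume.name] + path
    return [volume.name]

def example_world():
    """Hypothetical two-branch detector used only to exercise locate()."""
    cell = Volume("CELL", (1, 1, 1), (2, 2, 2))
    calo = Volume("CALO", (0, 0, 0), (5, 10, 10), [cell])
    muon = Volume("MUON", (5, 0, 0), (10, 10, 10))
    return Volume("WORLD", (0, 0, 0), (10, 10, 10), [calo, muon])
```

Changing the detector then means editing the tree, not the tracking code: `locate(example_world(), (1.5, 1.5, 1.5))` walks WORLD, CALO and CELL without any detector-specific branch in `locate` itself.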

Systems in 1980

Software layers and their size:

OS & Fortran: 1000 KLOC

Libraries (HBOOK, Naglib, cernlib): 500 KLOC

Experiment software: 100 KLOC

End-user analysis software: 10 KLOC

Hardware: CDC, IBM, Vax780; tapes; RAM 1 MB

GEANT3 with ZEBRA

ZEBRA was very rapidly implemented in 1983.

We introduced ZEBRA in GEANT3 in 1984.

From 1984 to 1993 we introduced plenty of new features in GEANT3: extensions of the geometry, hadronic models with TATINA, GHEISHA and FLUKA, and graphics tools.

In 1998, GEANT3 was interfaced with ROOT via the VMC (Virtual Monte Carlo).

GEANT3 has been used, and is still in use, by many experiments.

PAW

First minimal version in 1984

Attempt to merge with GEP (DESY) in 1985; we took from it the idea of ntuples for storage and analysis. GEP was written in PL/1.

Package growing until 1994 with more and more functions. Column-wise ntuples in 1990.

Users liked it, mainly once the system was frozen in 1994.

Vectorization attempts

From 1985 to 1990 a big effort was invested in vectorizing GEANT3 (work in collaboration with Florida State University) on CRAY Y-MP, CYBER 205 and ETA10.

The minor gains obtained did not justify the big manpower investment. GEANT3 transport was still essentially sequential, and we had a big overhead from vector creation and gather/scatter.

However, this experience, including the failure, was very important for us: it gave many lessons that were useful for the design of GEANT5 many years later.

Parallelism in the 80s & early 90s

Many attempts (all failing) with parallel architectures

Transputers and OCCAM

MPP (CM2, CM5, ELXI, ...) with OpenMP-like software

Too many GLOBAL variables/structures with Fortran common blocks.

RISC architectures or emulators perceived as a cheaper solution in the early 90s.

Then MPPs died with the advent of the Pentium Pro (1994) and farms of PCs or workstations.

1992: CHEP Annecy

Web, web, web, web…………

Attempts to replace/upgrade ZEBRA to support/use F90 modules and structures, but parsing and analysing modules was thought to be too difficult.

With ZEBRA the bank description was within the bank itself (just a few bits). A bank was typically a few integers followed by a dynamic array of floats/doubles.
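A bank that carries its own description can be sketched as a short header that tells the reader how to decode the payload, so no external schema is needed. The layout below (one flag byte, two 16-bit counts, then integers and floats) and the names `pack_bank` and `unpack_bank` are assumptions made for illustration; they are not the ZEBRA bank format.

```python
import struct

# Assumed 1-bit description: 0 = 32-bit floats, 1 = 64-bit floats.
FLOAT_CODES = {0: "f", 1: "d"}
HEADER = ">BHH"  # flag, number of integers, number of floats
HEADER_SIZE = struct.calcsize(HEADER)

def pack_bank(ints, floats, double=True):
    """Encode a few integers followed by an array of floats, with a header."""
    flag = 1 if double else 0
    header = struct.pack(HEADER, flag, len(ints), len(floats))
    body_format = f">{len(ints)}i{len(floats)}{FLOAT_CODES[flag]}"
    return header + struct.pack(body_format, *ints, *floats)

def unpack_bank(buf):
    """Decode a bank using only the description stored in its own header."""
    flag, nints, nfloats = struct.unpack_from(HEADER, buf, 0)
    body_format = f">{nints}i{nfloats}{FLOAT_CODES[flag]}"
    values = struct.unpack_from(body_format, buf, HEADER_SIZE)
    return list(values[:nints]), list(values[nints:])
```

The reader never sees a user-defined type, only counts and a flag. Arbitrary user structures (nested objects, pointers, containers) need far more description than this, which is the parsing challenge the slide refers to.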

We did not realize at the time that parsing user data structures was going to be a big challenge!!

Consequences

In 1993/1994 performance was no longer the main problem.

Our field was invaded by computer scientists.

Program design, object-oriented programming and the move to more sexy languages were becoming a priority.

The “goal” was thought less important than the “how”.

This situation deteriorated even more with the death of the SSC.

1993: Warning Danger

3 “clans” in my group:

1/3 pro F90

1/3 pro C++

1/3 pro commercial products (any language) for graphics, user interfaces, I/O and databases

My proposal to continue with PAW, develop ZOO (ZEBRA Object-Oriented) and GEANT3 geometry in C++ was not accepted.

Evolution vs Revolution

1995: roads for ROOT

The official line was GEANT4 and Objectivity: not much room was left for an alternative product to succeed when you are alone.

The best tactic had to be a mixture of sociology, technicalities and very hard work.

Strong support from PAW and GEANT3 users

Strong support from HP (workstations + manpower)

In November we were ready for a first ROOT show

Java is announced (problem?)

1998: work & smile

RUN II projects at FNAL:

Data Analysis and Visualization

Data Formats and storage

ROOT competing with HistoScope, JAS, LHC++

CHEP98 (September) Chicago

ROOT selected by FNAL, followed by RHIC

Vital decision for ROOT

But official support at CERN only in 2002

ROOT evolution

No time to discuss the creation/evolution of the 110 ROOT shared libs/packages.

ROOT has gradually evolved from a data storage, analysis and visualization system to a more general software environment replacing totally what was known before as CERNLIB.

This has been possible thanks to MANY contributors from experiments, labs or people working on other fields.

ROOT 6, coming soon, includes a new interpreter, Cling, and supports all the C++11 features.

Input/Output: Major Steps

parallel merge

TreeCache

member-wise streaming for STL collections<T*>

member-wise streaming for TClonesArray

automatic streamers from dictionary with StreamerInfos in self-describing files

streamers generated by rootcint

User-written streamers filling TBuffer
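The difference between object-wise and member-wise streaming, two of the steps listed above, can be shown by the order in which values reach the buffer. The sketch below is a plain-Python illustration and not ROOT code: the function names are hypothetical and each track is assumed to be a simple tuple of members.

```python
def objectwise(tracks):
    """Stream each object in turn: all members of track 0, then track 1, ..."""
    out = []
    for track in tracks:
        out.extend(track)
    return out

def memberwise(tracks):
    """Stream each member in turn: member 0 of all tracks, then member 1, ..."""
    out = []
    for column in zip(*tracks):
        out.extend(column)
    return out

if __name__ == "__main__":
    # Assumed members per track: (px, py, charge)
    tracks = [(1.0, 2.0, 1), (3.0, 4.0, -1)]
    print(objectwise(tracks))
    print(memberwise(tracks))
```

Member-wise order places values of the same type and similar magnitude next to each other in the buffer, which generally helps compression and lets a reader fetch one member without touching the others.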

GEANT4 Evolution

GEANT4 is an important software tool for current experiments, with more and more physics improvements and validation procedures.

However, the GEANT4 transport system is no longer suitable for parallel architectures. Too many changes are required.

GEANT5: keep the Geant4 physics and a radically new transport system.

Tools & Libs

Timeline of tools and libraries (figure): hbook, zebra, paw, zbook, hydra, geant1, geant2, geant3, geant4, ROOT 1 to 6, minuit, bos, Geant4+5.

Systems today

Software layers and their size:

OS & compilers: 20 MLOC

Frameworks like ROOT, Geant4: 5 MLOC

Experiment software: 4 MLOC

End-user analysis software: 0.1 MLOC

Hardware: clusters of multi-core machines (10000x8); GRIDs; clouds; networks 10 Gbit/s; disks 10 PB; RAM 16 GB

Systems in 2025 ?

Software layers and their size:

OS & compilers: 40 MLOC

Frameworks like ROOT, Geant5: 10 MLOC

Experiment software: 10 MLOC

End-user analysis software: 0.2 MLOC

Hardware: multi-level parallel machines (10000x1000x1000); GRIDs; clouds on demand; networks 100 Gbit/s, 10 Tbit/s; disks 1000 PB; RAM 10 TB

BUT !!!!!

It looks like the amount of money devoted to computing is not going to increase with the same slope as it did in the past few years.

Moore's law no longer applies to a single processor.

However, Moore's law still looks OK in terms of the amount of computing delivered per $ or €, when REALLY using parallel architectures.

Using these architectures is going to be a big challenge, but we have no choice!!!!

Software and Hardware

GRIDs/Clouds are inherently parallel. However, because the hardware has been relatively cheap, GRIDs have pushed towards job-level parallelism at the expense of parallelism within one job.

It is not clear today which hardware systems will win: supercomputers? walls of cores with accelerators? zillions of ARM-like systems? ...

Our software must be upgraded keeping in mind all these possible solutions. A big challenge!

Expected Directions

Parallelism: today we do not exploit the existing hardware well (0.6 instructions/cycle on average) because our code was designed to be “sequential”. Important gains are foreseen (a factor of 10?), e.g. in detector simulation.

Automatic data caches: many improvements are required to speed up and simplify skimming procedures and data analysis.

Data caches

More effort is required to simplify the analysis of large data sets (typically ROOT Trees).

When zillions of files are distributed across Tier-1/2 sites, automatic, transparent, performant and safe caches become mandatory on Tier-2/3 sites or even laptops.

This must be taken into account in the dilemma: sending jobs to data or vice-versa.

This will require changes in ROOT itself and in the various data handling or parallel file systems.

Parallelism: key points

Minimize the sequential/synchronization parts (Amdahl's law): very difficult

Run the same code (processes) on all cores to optimize the memory use (code and read-only data sharing)

Job-level is better than event-level parallelism for offline systems.

Use the good old principle of data locality to minimize cache misses.

Exploit the vector capabilities but be careful with the new/delete/gather/scatter problem

Reorganize your code to reduce tails
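The first key point can be made quantitative with Amdahl's law: if a fraction p of the run time can be spread over n workers, the speedup is bounded by 1 / ((1 - p) + p / n). The helper below is a small sketch of that formula; the function name and the sample fraction of 0.95 are chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Upper bound on speedup when only `parallel_fraction` of the run time scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

if __name__ == "__main__":
    # Even with 95% of the work parallel, the bound stays below 1 / 0.05 = 20.
    for n in (8, 1000, 1_000_000):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

The bound depends far more on the remaining sequential part than on the number of cores, which is why minimizing synchronization comes first in the list.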

Data Structures & parallelism

Figure: an event data structure (event, vertices, tracks).

C++ pointers are specific to a process.

Copying the structure implies a relocation of all pointers.

I/O is a nightmare.

Updating the structure from a different thread implies a lock/mutex.
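One common way around the pointer problems listed above is to let objects refer to each other by index into a container instead of by address. The slide states the problem, not this remedy, so the sketch below is an assumption-laden illustration: the event layout, field names and helper functions are hypothetical.

```python
import json

def make_event():
    """Event whose tracks refer to vertices by index, not by object reference."""
    return {
        "vertices": [{"z": 0.0}, {"z": 12.5}],
        "tracks": [
            {"pt": 1.2, "vertex": 0},
            {"pt": 0.7, "vertex": 1},
            {"pt": 3.1, "vertex": 0},
        ],
    }

def vertex_of(event, track_index):
    """Follow the index stored in a track to its vertex in the same event."""
    track = event["tracks"][track_index]
    return event["vertices"][track["vertex"]]

def roundtrip(event):
    """Write the event out and read it back, as I/O or another process would."""
    return json.loads(json.dumps(event))
```

After `roundtrip`, the copy is a completely separate set of objects, yet `vertex_of` still resolves correctly, because an index means the same thing in every copy and in every process. No relocation step is needed.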

Data Structures & Locality

Sparse data structures defeat the system memory caches.

Group object elements/collections such that the storage matches the traversal processes.

For example: group the cross-sections for all processes per material instead of all materials per process.
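The cross-section example can be made concrete with index arithmetic on a flat table. At each step a particle sits in one material and needs the cross-sections of every process, so the sketch below compares how far apart those values lie under the two layouts. The function names and the table sizes are assumptions for illustration; Python itself does not expose cache behaviour, only the resulting strides.

```python
def index_per_material(material, process, n_processes):
    """Flat index when all processes of one material are stored together."""
    return material * n_processes + process

def index_per_process(material, process, n_materials):
    """Flat index when all materials of one process are stored together."""
    return process * n_materials + material

def strides_for_one_material(layout, n_materials, n_processes, material=0):
    """Gaps (in array slots) between the cross-sections read in one step."""
    if layout == "per_material":
        slots = [index_per_material(material, p, n_processes)
                 for p in range(n_processes)]
    elif layout == "per_process":
        slots = [index_per_process(material, p, n_materials)
                 for p in range(n_processes)]
    else:
        raise ValueError(f"unknown layout: {layout}")
    return [b - a for a, b in zip(slots, slots[1:])]
```

With 100 materials and 5 processes, the per-material layout gives strides of 1 (one contiguous read), while the per-process layout gives strides of 100, so each value is likely to land on a different cache line.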

Create Vectors & exploit Locality

By making vectors, you optimize the instruction cache (gain >2) and the data cache (gain >2).

By making vectors, you can use the built-in pipeline instructions of existing processors (gain >2).

But there is no point in making vectors if your algorithm is still sequential or badly designed for parallelism, e.g.:

Too many thread synchronization points (Amdahl)

Vector gather/scatter

Conventional Transport

Figure (from the LPCC workshop talk of 11/07/2011, Rene Brun): particles (o) in detector volumes, with labels T1 to T4.

Each particle is tracked step by step through hundreds of volumes.

When all hits for all tracks are in memory, summable digits are computed.

Analogy with car traffic

New Transport Scheme

Figure: particles (o) in detector volumes, with labels T1 to T4.

All particles in the same volume type are transported in parallel.

Particles entering new volumes, or newly generated, are accumulated in the volume basket.

Events for which all hits are available are digitized in parallel.
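The contrast between the conventional scheme and the basket scheme can be sketched with a toy model. Everything specific here is an assumption made for the example: the three-volume linear detector, the fixed energy loss per crossing and the function names. The point is only the control flow: one particle followed to the end, versus all particles of one volume processed together.

```python
from collections import defaultdict

VOLUMES = ["TRACKER", "ECAL", "HCAL"]          # hypothetical linear detector
LOSS = {"TRACKER": 1, "ECAL": 5, "HCAL": 10}   # assumed energy loss per crossing

def step(particle):
    """Cross the current volume; return (hit, next particle state or None)."""
    pid, energy, vol = particle
    deposit = min(energy, LOSS[VOLUMES[vol]])
    hit = (pid, VOLUMES[vol], deposit)
    energy -= deposit
    if energy > 0 and vol + 1 < len(VOLUMES):
        return hit, (pid, energy, vol + 1)
    return hit, None

def transport_per_particle(particles):
    """Conventional scheme: follow each particle through all its volumes."""
    hits = []
    for particle in particles:
        while particle is not None:
            hit, particle = step(particle)
            hits.append(hit)
    return hits

def transport_with_baskets(particles):
    """Basket scheme: process together all particles sitting in one volume."""
    baskets = defaultdict(list)
    for particle in particles:
        baskets[particle[2]].append(particle)
    hits = []
    while baskets:
        vol = min(baskets)           # any non-empty basket would do
        batch = baskets.pop(vol)
        for particle in batch:       # the loop a vector or thread kernel would take over
            hit, nxt = step(particle)
            hits.append(hit)
            if nxt is not None:
                baskets[nxt[2]].append(nxt)
    return hits
```

Both functions produce the same set of hits in a different order. In the basket version the inner loop applies the same volume-specific work to many particles at once, which is what makes it a candidate for vectorization.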

Towards Parallel Software

A long way to go!!

There is no point in just making your code thread-safe. Use of parallel architectures requires a deep rethinking of the algorithms and dataflow.

One such project is GEANT4+5, launched 2 years ago. We are starting to have very nice results, but there is still a long way to go to adapt existing software (or write radically new software) for the emerging parallel systems.

A global effort

Software development is nowadays a world-wide effort with people scattered in many labs developing simulation, production or analysis code.

It remains a very interesting area for new people not scared by big challenges.

I had the fantastic opportunity to work for many decades in the development of many general tools in close cooperation with many people to whom I am very grateful.