1
The Telescope Array Fluorescence Detector Simulation on GPUs
Tareq AbuZayyadUniversity of Utah
GTC 2013San Jose, CA03/21/2013
2
Overview
● A brief introduction:● What we study:
– (Ultra) High energy cosmic rays: Matter, mainly atomic nuclei, that has been accelerated to ~speed-of-light by powerful astrophysical sources. Cosmic rays impinging on the top of the atmosphere generate extensive air showers (of secondary particles)
● Who we are:– The Telescope Array collaboration: Multi-National, more than 100 people
from some 20 institutions. Me, I am a Research Assistant Professor at the University of Utah.
● How we do it:– Surface Detector / Fluorescence Telescopes. The surface array provides
the highest statistics at the highest energies. The fluorescence telescopes provide lower energy reach, calibration for the surface detector, and hybrid observation. In this talk I will not discuss the surface detector!
3
Overview (cont'd)
● The Main Topic:● Fluorescence Detector (FD) Response Simulation:
– Overview: We simulate the development of an extensive air shower in the atmosphere. The light production and propagation to the detector, and the collection and processing of this light by the detector to produce an “event”.
● GPU Based Simulation:– How the simulation is done using CUDA. Design goals, code
implementation and kernel launch configuration.
● GPU Performance Analysis:– Results on accuracy and speedup are shown for a few test cases.
More details if time allows. See backup slides otherwise.
● Summary and Outlook.
4
High Energy Cosmic Rays
5
High Energy Cosmic Rays● Cosmic Rays (CR) flux
extends across many decades in energy.
● Mainly protons (hydrogen nucleus) and nuclei of heavier elements.
● Acceleration up to 1015 eV in supernovae. Sources of highest energy CR still unknown.
● Flux falls off rapidly. At some energy direct observation is no longer feasible.
6
Indirect Observation- Extensive Air Showers
● High energy CR primary particle collides with an “air molecule” near the top of the atmosphere.
● Collision initiates a cascade of lower energy particles moving at the speed of light towards the ground.
● Shower Development is complex. Specialized MC programs are used to simulate the high energy physics processes involved.
7
Indirect Observation- Extensive Air Showers
● While complex, the average shower behavior can be parameterized for use in Detector Response simulations.
● Most of the primary cosmic ray particle's energy ends up in the electromagnetic component.
● Shower electrons excite Nitrogen Molecules in the atmosphere. The molecules give back the excess energy in the form of scintillation light which can be observed by our detectors.
● Cerenkov radiation is also emitted by shower particles and depending on geometry can contribute to the observed signal.
8
The Telescope Array Experiment
9
Telescope Array ExperimentT Abu-Zayyad1, R Aida2, M Allen1, R Azuma3, E Barcikowski1, JW Belz1, T Benno4, DR Bergman1,
SA Blake1, O Brusova1, R Cady1, BG Cheon6, J Chiba7, M Chikawa4, EJ Cho6, LS Cho8, WR Cho8, F Cohen9,K Doura4, C Ebeling1, H Fujii10, T Fujii11, T Fukuda3, M Fukushima9,22, D Gorbunov12, W Hanlon1,
K Hayashi3, Y Hayashi11, N Hayashida9, K Hibino13, K Hiyama9, K Honda2, G Hughes5, T Iguchi3, D Ikeda9, K Ikuta2, SJJ Innemee5, N Inoue14, T Ishii2, R Ishimori3, D Ivanov5, S Iwamoto2, CCH Jui1,
K Kadota15, F Kakimoto3, O Kalashev12, T Kanbe2, H Kang16, K Kasahara17, H Kawai18, S Kawakami11, S Kawana14, E Kido9, BG Kim19, HB Kim6, JH Kim6, JH Kim20, A Kitsugi9, K Kobayashi7, H Koers21,
Y Kondo9, V Kuzmin12, YJ Kwon8, JH Lim16, SI Lim19, S Machida3, K Martens22, J Martineau1, T Matsuda10, T Matsuyama11, JN Matthews1, M Minamino11, K Miyata7, H Miyauchi11, Y Murano3, T Nakamura23,
SW Nam19, T Nonaka9, S Ogio11, M Ohnishi9, H Ohoka9, T Okuda11, A Oshima11, S Ozawa17, IH Park19, D Rodriguez1, SY Roh20, G Rubtsov12, D Ryu20, H Sagawa9, N Sakurai9, LM Scott5, PD Shah1, T Shibata9, H Shimodaira9, BK Shin6, JD Smith1, P Sokolsky1, TJ Sonley1, RW Springer1, BT Stokes5, SR Stratton5,
S Suzuki10, Y Takahashi9, M Takeda9, A Taketa9, M Takita9, Y Tameda3, H Tanaka11, K Tanaka24, M Tanaka10, JR Thomas1, SB Thomas1, GB Thomson1, P Tinyakov12,21, I Tkachev12, H Tokuno9, T Tomida2,
R Torii9, S Troitsky12, Y Tsunesada3, Y Tsuyuguchi2, Y Uchihori25, S Udo13, H Ukai2, B Van Klaveren1, Y Wada14, M Wood1, T Yamakawa9, Y Yamakawa9, H Yamaoka10, J Yang19, S Yoshida18, H Yoshii26, Z Zundel1
1University of Utah, 2University of Yamanashi, 3Tokyo Institute of Technology, 4Kinki University, 5Rutgers University, 6Hanyang University, 7Tokyo University of Science, 8Yonsei University,
9Institute for Cosmic Ray Research, University of Tokyo, 10Institute of Particle and Nuclear Studies, KEK, 11Osaka City University, 12Institute for Nuclear Research of the Russian Academy of Sciences,
13Kanagawa University, 14Saitama University, 15Tokyo City University, 16Pusan National University, 17Waseda University, 18Chiba University 19Ewha Womans University, 20Chungnam National University,
21University Libre de Bruxelles, 22University of Tokyo, 23Kochi University, 24Hiroshima City University, 25National Institute of Radiological Science, Japan, 26Ehime University
10
Telescope Array Experiment
● TA is located in Millard County, Utah, ~200 km southwest of Salt Lake City.
● Surface Detector: 507 scintillation counters 1.2 km spacing. (run 24/7)
● Three Fluorescence Detectors overlooking SD (run only during moonless nights):● Middle Drum (MD)
● Black Rock (BR)
● Long Ridge (LR)
1111
Long Ridge Black Rock Mesa
Middle DrumRefurbishedfrom HiRes
~30km New FDs
6.8 m2 ~1 m2
14 cameras/station256 PMTs/camera
5.2 m2
TA Fluorescence Detectors
Observation started Dec. 2007
Observation started Nov. 2007
Observation started Jun. 2007
256 PMTs/cameraHAMAMATSU R9508
FOV~15x18deg12 cameras/station
12
Middle Drum TA FD Observatory Site (14 Telescopes)
13
Middle Drum TA/TALE FD Observatory Site (14 + 10 Telescopes)
14
Middle Drum TA/TALE Viewing Range
● TAMD + TALE
● 14 lower telescopes make up TA (Middle Drum) Detector.
● 10 higher telescope (new addition) make up the TA-Low Energy extension Detector.
15
Fluorescence Detector Response Simulation
16
Main components of Simulation
● Atmosphere (Medium)● Extensive Air Shower (Light Source)● Detector (Instrument)● Physics (Interactions)● Auxiliary
● RNG● Coordinates System● Event Management● I/O
17
Coordinates System• Defines the origin of a Cartesian system (x, y, z),
as a point on the surface of the Earth (in geodetic coordinates).
• Provides a function to convert (x,y,z) to height (a.s.l.) using a spherical earth approximation.● Also (x,y,z) to height above ground (for aerosols
density calculation)
• Coordinate transformations to/from alternative systems.
• Single instance per simulation (a service class).
18
The Atmosphere● Medium in which shower
develops and deposits it's energy, and where light production occurs.
● Single instance per simulation (a service class).
● Pressure, Density, and Temperature profiles:
● All functions of altitude.
● Also Contains an Ozone Layer and Aerosols near the ground:
● Altitude dependent as well.
19
Physics• Different models (experimental measurements) of
Fluorescence yield, wavelength spectrum and shower dE/dx.● User selectable.
• Functions / data to calculate Cerenkov light production and wavelength spectrum.
• Single instance per simulation (a service class).
20
Extensive Air Shower Parameters
● Primary particle energy:
● Either explicitly set or randomly sampled from a power law.● Determines total # of particles in the shower.
● Shower Geometry (Randomly Sampled):
● Shower Track can be at any distance from the detector, and can have any downward direction.
● Shower Longitudinal Development Curve (Randomly Sampled):
● Parametrized; parameters selected randomly from shower library.
● Shower Lateral extent:
● Parameterized; average shower behavior.
21
EAS Parameters (cont'd)
N x =N max× x−x o
xmax− xo xmax−xo
×e
xmax− x
● Primary particle energy (E)
● Shower Geometry:
● A line segment from the top of the atmosphere, ~47km, to the ground. Rp_vec[3], ut_hat[3]
● Shower Longitudinal Profile:
● Parametrization given by a Gaisser-Hillas function with four parameters (N_max, x_0, x_max, lambda)
● x [g/cm2] is the atmospheric slant depth along the shower track.
● N(x) is the number of charged particles at depth x
● A total of eight free parameters:
● E → N_max
● ut_hat is a unit vector & is normal to Rp_vec.
22
EAS Parameters (cont'd)
r =N
r 2 f s , rr M
● Real showers are 3D.
● Previous slide shows a 1D description of longitudinal development.
● Lateral spread is described by a parameterization based on NKG function:
● r radial distance.
● rM Moliere Radius (air
density dependent)● s: shower age parameter.● f: NKG function
(normalized)
23
Light Production● Air Fluorescence
● Shower Particles excite Nitrogen Molecules in the atmosphere
● the molecules give back the energy in the form of UV light.
24
Light Production (cont'd)
● Cerenkov Radiation.● Shower particles traveling at
speeds faster than the speed of light in air emit Cerenkov radiation.
● We observe direct Cerenkov light.
● We also see scattered light● Scattering by air molecules
● Scattering by Aerosols
25
Light Propagation
● Light Scattering● Air Molecules (Rayleigh)● Aerosols (Empirical
Model)
● Ozone Absorption.● Function of:
● Wavelength● Altitude (density)
26
Light Detection
● Spherical mirror collects photons and focuses them to a PMT cluster.
● Photo-Multiplier Tubes convert photons to photoelectrons (electrical signal)
27
Detector Efficiency
● Mirror Reflectivity● UV Filter Transmission● PMT Quantum
efficiency● are all wavelength
dependent.
● PMT efficiency profile is also accounted for in the simulation.
28
Electronics
● Signal processing for PMT trigger:● RC filter● LC filter + threshold test● Integrate pulse if trigger.
● Record data.● “Mirror Trigger Logic”
for event formation done on CPU.
29
GPU Based Simulation
30
Opportunities for Parallelism
● Simulate a large number O(106) of independent events.● 1 GB of GPU card memory can accommodate ~500 showers
simultaneously. (when using floats)
● A shower is treated as an extended light source.● divided into ~1000 point sources.● 32 wavelength bins.
● Ray-trace collected photons independently.● 256 PMT's (pixels) per telescope; multiple telescopes per event.● Signal processing: ~10000 time bins per pixel
31
Design Goals and Constraints● CUDA Capable GPU may or may not be available on a given system, so CUDA support has to be optional.● To insure the program produces the same results with or without acceleration, as much code as possible is written to run on either: using __host__ __device__ qualifier.
● This is needed most if/when someone updates the code in the future; … chances are they will update one path and not the other! (speaking from experience)
● To get best performance on GeForce GPU's we also require that as much as code as possible is written using C++ templates to support both single and double precision calculations.
32
Basic Code structure
• Can compile on a machine with/without CUDA environment.
• Can compile to run with/without acceleration.
• Essentially all code can run on CPU and GPU.
• As float or double.
• Aside: Can also build a Root dictionary and use interactively in a root session(http://root.cern.ch/)
33
Hide Separate Code Path
• Create wrapper functions for cases where CPU/GPU require different functions:
• CPU code calls as random_uniform()
• GPU code calls as random_uniform(x,a)
• x, a unique per thread
34
Service Classes
• Simulation Parameters:● Coordinates● Atmosphere● Physics● Detector
• Single initialization point (cpu double); copy to cpu float, gpu float / double.
• Accessed through a common interface with a dummy argument to resolve data type.
35
Memory Layout
● PMT signal processing and trigger logic has to be done in serial.
● To achieve efficient memory access pattern on the gpu:● Reverse memory layout for 2D
arrays used for electronics signal processing
● Use macro to select the appropriate class in __host__ __device__ code.
36
Kernel Launch Configuration
• Table shows kernel launch configuration on Fermi card with 1GB of memory.
37
Kernel Launch Configuration
• All kernels were launched using a similar setup.
• All prefer L1 Cache over User managed Shared Memory.
38
GPU Performance Analysis
39
Performance comparison – running time
E (eV) 100 evt (cpu)t in sec.
100 evt (gpu) t in sec.
1000 evt (cpu) t in sec.
1000 evt (gpu) t in sec.
1017 13.91 0.759 (18x) 140.97 2.860 (49x)
1018 18.76 0.875 (21x) 164.59 3.348 (49x)
1019 24.64 1.006 (24x) 196.99 4.230 (47x)
1020 40.72 1.297 (31x) 364.44 6.834 (53x)
• Desktop CPU vs. Desktop GPU
40
Nvidia Virtual Profiler results
• Fraction of time spent in each function.
41
Nvidia Virtual Profiler results
42
Nvidia Virtual Profiler results
43
Nvidia Virtual Profiler results
44
Summary and Outlook
● An entire software suite used for the detector simulation of a real astrophysics experiment is ported to CUDA.
● Verified float precision is adequate for the MC● GPU provides >10x performance of a quad core
CPU.● Next is to do event reconstruction, inverse Monte
Carlo, on the GPU. This would be big!● Need to implement minimization routine on GPU.
● Looking forward for running on Kepler as well.
45
Backup Slides
46
Cuda 5.0
• Unfortunately:● Code would not compile under 5.0! cicc hangs.● After some testing I isolated the problem:
– A function with a large “const array”● Code compiles, but I still need to implement a work-
around for offending function.
• Did you submit a bug report?● I had every intention to, but got sidetracked. Sorry!
• Utah CHPC just added a node with 2 Kepler GPU's, will be testing soon.
47
floats vs. double precision
● NOTE: in the following comparisons I mention the function xyz2h()
● This function calculates the height of a point in (x,y,z) coordinates above sea level. The calculation involves the “Earth Radius”:
● RE >> z causes the calculation to be inaccurate
when using floats.
48
Quick Test Set
• 150 Thrown showers: 1017 eV proton.
• Run once without GPU acceleration, using double precision.
• Run once with GPU acceleration, using single precision math on the GPU.
• Write out intermediate calculation results for comparison.
• Plot results
49
Shower Simulation Step 1: Generate track segments
• Subdivide shower track to 832 segments; (~1-2 g/cm2.
• For each segment calculate:● Height at center of segment.● Atmospheric Density.● Segment length (meters & g/cm2)● Slant depth at center of segment (g/cm2)● ...
• Comparison follows.
50
Track Segments (All floats)
• Each data point corresponds to a track segment in one of the simulated showers which was in MD FOV.
• Height
• Density
• Segment Length
• Slant Depth
51
Track Segments (double xyz2h)
• Each data point corresponds to a track segment in one of the simulated showers which was in MD FOV.
• Height
• Density
• Segment Length
• Slant Depth
52
Track Segments (double xyz2h)• Error on
Slant Depth Calculation (floats) vs Slant Depth calculated with double precision.
53
Track Segments (Kahn Summation)
• Error on Slant Depth Calculation (floats) after use of Kahn Summation to sum up individual track segments length.
54
Track Segments (Kahn Summation)
• Error on Slant Depth Calculation (floats) after use of Kahn Summation to sum up individual track segments length.
55
Shower Simulation Step 2: Generate shower profile
• For each track segment calculate:
● Shower Size Ne(x).
● Shower age.● Moliere radius. (used for lateral spread calculation)● Total amount of Fluorescence light production.● Total amount of Cerenkov light production.
• Comparison follows.
56
Shower profile
• One data point for each track segment:● Shower size● Shower age● Moliere Radius
57
Shower profile (cont'd)
• One data point for each track segment:● Total
Fluorescence ● Total
Cerenkov
58
Shower Simulation Step 3: Generate shower photons
• For each track segment calculate:● Total amount of Fluorescence light; divided into
wavelength bins … currently using FLASH.● Total amount of Cerenkov light; divided into
wavelength bins.● Rayleigh & Aerosols scattered Cerenkov.● Total Cerenkov (Accumulates along shower
development)
• Comparison follows.
59
Shower photons (Cerenkov)
• One data point for each track segment (summed over wavelength):● Accumulated C'kov
● Rayleigh scattered
● Aerosols scattered
● Ozone absorption
60
Shower Simulation Step 4: Generate Mirror photons (pe's)
• Each event in the FOV of the detector can be seen by one or more mirrors (telescopes).
• A telescope will see a portion of the shower (A subset of the shower track segments).
• For each track segment (in telescope FOV) calculate:● Total amount of Fluorescence pe's (flux x mirror_area)● Total amount of Cerenkov pe's including
– Rayleigh & Aerosols scattered Cerenkov.
– Direct Cerenkov.
• Note: This includes atmospheric transmission and telescope response at relevant wavelengths.
• Comparison follows.
61
Mirror photo-electrons
• One data point for each track segment:● Fluorescence
pe's
● Direct Cerenkov
● Rayleigh Scat. Ckov.
● Aerosols Scat. Ckov.
62
Mirror photo-electrons
• One data point for each track segment:● Total # pe's
• End of Physics.
63
Mirror photo-electrons
• One data point for each track segment:● Total # pe's
• Largest difference at slant depth ~<200 g/cm2
64
Mirror photo-electrons
• How much error can we tolerate?● Next step in the simulation is ray-tracing. The
variable “ctotal” is used as the mean of a Poisson distribution:
● int raytrace_npe = random_poisson(ctotal);
• As long as we are within 0.1% (10-3), I highly doubt we can tell the difference.
• We get O(10-5)
65
Mirror photo-electrons
• Needed for ray tracing:
• transformed coordinates:● rvec_x,
rvec_y, rvec_z
● Angle made with mirror axis.
66
More comparisons ...
● Still need to verify● Ray tracing accuracy, including sampling of shower lateral spread.● Random number generator works as expected.● Electronics simulation is correct.
● Harder to do because of the use of different random number generators and therefore different random sequences.
● Can be done by using statistical averages.
● The end result of the simulation (triggered event data) has been checked and the cpu/gpu are in excellent agreement. Still, would be nice to have the Individual checks listed above.