FPGA Co-processor for the ALICE High Level Trigger
Gaute Grastveit
University of Bergen, Norway

H. Helstrup1, J. Lien1, V. Lindenstruth2, C. Loizides5, D. Roehrich3, B. Skaali4, T. Steinbeck2, K. Ullaland3, A. Vestbo3, T. Vik4, A. Wiebalck2
for the ALICE Collaboration

1 Bergen College, Norway
2 Kirchhoff Institute for Physics, University of Heidelberg, Germany
3 Department of Physics, University of Bergen, Norway
4 Department of Physics, University of Oslo, Norway
5 Institute of Nuclear Physics, University of Frankfurt, Germany
ALICE – A Large Ion Collider Experiment
TPC – Time Projection Chamber
Very High Data Rate
Pb-Pb central collisions
Event rate: 200 Hz
Event size: ~75 MByte => 15 GByte/s
Max data rate to tape is 1.25 GByte/s
Compression/selection is needed
Conventional, lossless methods: factor 2
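The numbers on this slide can be checked with a few lines of arithmetic. The sketch below reads the event size as 75 MByte and assumes 1 GByte = 1000 MByte; the function name is hypothetical and only serves the illustration.

```python
# Back-of-the-envelope check of the data-rate figures quoted on the slide.
EVENT_RATE_HZ = 200        # Pb-Pb central collisions
EVENT_SIZE_MB = 75         # approximate event size in MByte
TAPE_RATE_GB_S = 1.25      # maximum data rate to tape in GByte/s
LOSSLESS_FACTOR = 2        # conventional lossless compression


def required_reduction(event_rate_hz, event_size_mb, tape_rate_gb_s):
    """Return (input rate in GByte/s, reduction factor needed to fit on tape)."""
    # Assumption: 1 GByte = 1000 MByte.
    input_rate_gb_s = event_rate_hz * event_size_mb / 1000
    return input_rate_gb_s, input_rate_gb_s / tape_rate_gb_s


input_rate, factor = required_reduction(EVENT_RATE_HZ, EVENT_SIZE_MB, TAPE_RATE_GB_S)
print(f"input rate: {input_rate} GByte/s, required reduction: {factor}x")
print("lossless compression alone sufficient:", LOSSLESS_FACTOR >= factor)
```

Under these assumptions the required reduction is well above the factor 2 of lossless methods, which is why selection and further compression are needed.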
HLT functionality
• Compress
  • Reduce the amount of data required to encode the event as far as possible without losing physics information
• Trigger
  • Accept/reject events on the basis of physics application
• Select
  • Select regions of interest within an event
  • Remove pile-up in p-p
  • ...
Task: reconstruct the tracks of 20,000 charged particles (each producing 150 clusters) in the TPC
Time budget: 5 ms
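To get a feeling for the scale of this task, the following sketch multiplies out the quoted numbers. It is only an illustration: it assumes one processor handling all clusters one after another, which is not how the HLT works, and the function names are hypothetical.

```python
# Scale of the reconstruction task, using the numbers quoted on the slide.
N_TRACKS = 20_000            # charged particles per event
CLUSTERS_PER_TRACK = 150     # clusters produced by each particle
TIME_BUDGET_S = 5e-3         # time budget per event (equals 1 / 200 Hz)


def clusters_per_event(n_tracks, clusters_per_track):
    """Total number of clusters to be found in one event."""
    return n_tracks * clusters_per_track


def serial_time_per_cluster_ns(n_clusters, budget_s):
    """Time available per cluster if a single unit processed them serially."""
    return budget_s / n_clusters * 1e9


n_clusters = clusters_per_event(N_TRACKS, CLUSTERS_PER_TRACK)
print(n_clusters, "clusters per event")
print(f"{serial_time_per_cluster_ns(n_clusters, TIME_BUDGET_S):.2f} ns per cluster if processed serially")
```

A budget of under 2 ns per cluster for a single serial unit illustrates why the data have to be processed in parallel.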
The HLT setup
Data are received in parallel

RCU – Readout Controller Unit
DDL – Detector Data Link
RORC – ReadOut Receiver Card

• PCI kernel in the FPGA
• FPGA will also be utilised for pattern recognition
• Reduces number of CPUs needed

[Diagram: TPC FEE (ALTRO, buffer of 8 events) → RCU → DDL → RORC (receiver buffer > 1000 events) in receiver boards (RcvBd) with PCI and NIC → HLT farm. Link rates shown: 216 x 320 MB/s and 216 x 100 MB/s.]
The HLT FPGA co-processor
• FPGA: APEX 20K400
• Next prototype: Altera Stratix FPGA
  – Large internal memory
  – DSP cores
Two Schemes for Finding Tracks
• Low occupancy (p-p, Pb-Pb outer padrows)
  • Conventional approach with (2D) cluster finder and track follower
• High occupancy (overlapping clusters)
  • Hough transform on raw data
  • Cluster analysis for deconvolution
  • (Kalman filter)
High multiplicity picture
Cluster Finder
The numbers represent Charge (ADC values)
A vertical uninterrupted stack of numbers is called a sequence. The square shows the geometric centre of the sequence.
Neighbouring sequences belong to the same Cluster.
Final mean value: weighted mean (formula terms: scale, value, charge)
[Figure axes: pad, time, charge]
FPGA implementation of a cluster finder - the algorithm
• Calculate the mean for every sequence
• Sequences on adjacent pads with similar means are merged
• Two lists of sequences are used: one for clusters on the previous pad, one for clusters on the current pad
• Clusters are removed from the search range when a match is found or when we know they are finished
• Clusters are inserted in the input range after merging or when we start a new cluster
[Figure: memory of clusters, holding the search range (previous pad) and the input range (current pad), with begin, end and insert pointers]
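The merging scheme described on this slide can be sketched in software as follows. This is a minimal model, not the VHDL implementation: the input layout, the `Cluster` fields and the `MATCH_DISTANCE` value are assumptions made for the example.

```python
from dataclasses import dataclass

MATCH_DISTANCE = 2.0  # hypothetical: max difference between time means for a merge


@dataclass(eq=False)  # eq=False: clusters are compared by identity
class Cluster:
    charge: int = 0         # total charge
    time_sum: float = 0.0   # sum of charge * time
    pad_sum: float = 0.0    # sum of charge * pad
    last_mean: float = 0.0  # time mean of the most recently merged sequence

    @property
    def time(self):
        return self.time_sum / self.charge

    @property
    def pad(self):
        return self.pad_sum / self.charge


def sequence_sums(start_time, charges):
    """Return (total charge, sum of charge * time) for one sequence."""
    total = sum(charges)
    weighted = sum(c * (start_time + i) for i, c in enumerate(charges))
    return total, weighted


def find_clusters(row):
    """row: list of (pad, [(start_time, charges), ...]) sorted by pad number."""
    finished, search, prev_pad = [], [], None
    for pad, sequences in row:
        if prev_pad is not None and pad != prev_pad + 1:
            finished.extend(search)  # skipped pad: nothing can match any more
            search = []
        inputs = []                  # clusters on the current pad
        for start, charges in sequences:
            total, weighted = sequence_sums(start, charges)
            mean = weighted / total
            match = next((c for c in search
                          if abs(c.last_mean - mean) <= MATCH_DISTANCE), None)
            if match is None:
                cluster = Cluster()      # start a new cluster
            else:
                search.remove(match)     # matched: leave the search range
                cluster = match
            cluster.charge += total
            cluster.time_sum += weighted
            cluster.pad_sum += total * pad
            cluster.last_mean = mean
            inputs.append(cluster)       # insert into the input range
        finished.extend(search)          # unmatched on the previous pad: finished
        search, prev_pad = inputs, pad
    finished.extend(search)
    return finished


row = [(10, [(100, [2, 6, 2])]),
       (11, [(99, [1, 4, 8, 4, 1]), (200, [3, 3])]),
       (12, [(100, [2, 4, 2])])]
for c in find_clusters(row):
    print(f"charge={c.charge} pad={c.pad:.2f} time={c.time:.2f}")
```

In this sketch the input range of one pad simply becomes the search range of the next pad, which mirrors the two lists mentioned above; a hardware version would keep both ranges in one memory with moving pointers.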
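Block Diagram, Verification
[Diagram: the testbench reads the file "charges" and drives the top structure: Decoder → (seq) → FIFO (lpm) → (seq) → Merger → (cluster), together with a RAM (lpm). The output is written to the file "VHDL clusters". A C++ model produces the file "C++ clusters", and a C++ program compares the results.]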
Relative Scales
+ Smaller numbers, only multiplies by <11
- Multiplication can't be done until merging takes place
As before, the mean is calculated as a weighted mean (formula terms: scale, value, charge), now with smaller values.

Alternative (absolute):
[Diagram: Decoder → FIFO (lpm) → Pre_Calc (2 mult, 1 add) → Merger]
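The exact formula on this slide is not recoverable from the transcript. As an illustration only, the sketch below assumes that "relative scale" means measuring positions relative to an offset, so that the multiplications involve small numbers, and shows that this gives the same weighted mean as the absolute calculation. Function names and the choice of offset are hypothetical.

```python
def weighted_mean_absolute(values, charges):
    """Charge-weighted mean using absolute positions (large multiplicands)."""
    return sum(v * c for v, c in zip(values, charges)) / sum(charges)


def weighted_mean_relative(values, charges):
    """Same mean, but multiplying only by small offsets from a reference."""
    offset = min(values)                      # assumed reference point
    relative = [v - offset for v in values]   # small numbers
    return offset + sum(r * c for r, c in zip(relative, charges)) / sum(charges)


times = [100, 101, 102]
charges = [2, 6, 2]
print(weighted_mean_absolute(times, charges))
print(weighted_mean_relative(times, charges))
```

The trade-off named on the slide fits this picture: smaller multipliers are cheaper in logic, but a common reference is only known once sequences have been merged.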
Deconvolution
Simplified implementation, almost for free: splits at minima in both directions (time and pad)
[Figure: results with deconvolution off and on]
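Splitting at minima can be illustrated in one dimension. The sketch below is an assumption about how such a split might look, not the firmware logic: it cuts a charge profile at every strict interior local minimum and, as an arbitrary choice, assigns the minimum bin to the following piece.

```python
def split_at_minima(charges):
    """Split a charge profile into pieces at strict interior local minima."""
    pieces, start = [], 0
    for i in range(1, len(charges) - 1):
        # A strict local minimum: lower than both neighbours.
        if charges[i - 1] > charges[i] < charges[i + 1]:
            pieces.append(charges[start:i])
            start = i  # assumption: the minimum bin opens the next piece
    pieces.append(charges[start:])
    return pieces


print(split_at_minima([2, 8, 3, 9, 4]))   # two overlapping peaks
print(split_at_minima([1, 5, 1]))         # a single peak stays whole
```

The same function could be applied along the time direction within a sequence and along the pad direction to the per-pad charges of a cluster. Plateaus are not treated as minima in this sketch.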
Merger
Goals
• Spend few clock cycles per sequence
• Use few logic elements
• High clock speed

Clock cycles spent in the different states (pie chart; values listed in legend order):
idle: 30 %
merge_mult: 11 %
merge_add: 11 %
merge_store: 11 %
send_all: 0 %
send_many: 4 %
send_one: 5 %
calc_dist: 22 %
insert_seq: 6 %

[State diagram of the merger. States: idle, merge_mult, merge_add, merge_store, send_one, send_many, send_all, calc_dist, insert_seq. Transition conditions: new data, empty, next pad, new search range, new row or skip pad, old is above, old is below, within match distance.]
Cluster Finder Performance
• Synthesized on Altera APEX
• Uses 1800 Logic Elements (11%)
• Memory usage: 16*80 + 64*112 = 8448 bits (4%)
• Circuit runs at 33 MHz
Outlook
Implementation of Hough transformation
[Block diagram: Detector Data Link → Data Format Decoder → XYZ Transformer → ABE Transformer → Histogram 1, Histogram 2, ..., Histogram N-1, Histogram N → Find Maxima. The ADC count passes through a 10-to-8 Bit Converter.]
Data representations along the chain: Back Linked List (ALTRO sequences) → TPC coordinates (Padrow, Pad, Time) → Local coordinates (X, Y, Z) → (A, B, E) → Parameter Space (k, phi, eta-index)
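The idea of filling a parameter-space histogram and searching for maxima can be sketched in software. The example below is not the planned firmware: it assumes tracks are circles through the origin, parametrised by curvature kappa and emission angle psi via kappa = 2 sin(phi - psi) / r, and it fills a single histogram with unit weights instead of ADC counts. Bin counts, ranges and all function names are hypothetical.

```python
import math

N_KAPPA, N_PSI = 50, 60          # hypothetical histogram size
KAPPA_MAX = 0.02                 # hypothetical curvature range [-KAPPA_MAX, KAPPA_MAX)
PSI_MIN, PSI_MAX = -0.5, 0.5     # hypothetical emission-angle range in rad


def psi_center(j):
    return PSI_MIN + (j + 0.5) * (PSI_MAX - PSI_MIN) / N_PSI


def kappa_center(i):
    return -KAPPA_MAX + (i + 0.5) * 2.0 * KAPPA_MAX / N_KAPPA


def hough_transform(points):
    """Each point votes once per psi bin for the curvature that fits it."""
    hist = [[0] * N_PSI for _ in range(N_KAPPA)]
    for x, y in points:
        r = math.hypot(x, y)
        phi = math.atan2(y, x)
        for j in range(N_PSI):
            kappa = 2.0 * math.sin(phi - psi_center(j)) / r
            i = math.floor((kappa + KAPPA_MAX) / (2.0 * KAPPA_MAX) * N_KAPPA)
            if 0 <= i < N_KAPPA:
                hist[i][j] += 1
    return hist


def find_maximum(hist):
    """Return (kappa bin, psi bin, count) of the highest histogram entry."""
    count, i, j = max((v, i, j) for i, row in enumerate(hist)
                      for j, v in enumerate(row))
    return i, j, count


def circle_points(kappa, psi, n=20, s_min=80.0, s_max=250.0):
    """Test helper: n points on a circle through the origin, by arc length s."""
    points = []
    for k in range(n):
        s = s_min + k * (s_max - s_min) / (n - 1)
        x = (math.sin(psi + kappa * s) - math.sin(psi)) / kappa
        y = (math.cos(psi) - math.cos(psi + kappa * s)) / kappa
        points.append((x, y))
    return points


true_kappa, true_psi = kappa_center(31), psi_center(36)
i, j, count = find_maximum(hough_transform(circle_points(true_kappa, true_psi)))
print(f"peak at kappa={kappa_center(i):.4f}, psi={psi_center(j):.4f}, votes={count}")
```

Because every point votes independently, the inner loop maps naturally onto parallel hardware, with one histogram per eta slice as suggested by the diagram.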
Conclusion
We have demonstrated the feasibility of a real-time cluster finder implemented in an FPGA.
Firmware implementation of a Hough transform looks promising.
Transparency replacements from now on
ALICE – A Large Ion Collider Experiment
TPC - Time Projection Chamber
18 sectors on each side; each sector is read out in 6 subsectors
Total is ca. 570,000 pads