![Page 1: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/1.jpg)
Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team
A HIGH PERFORMANCE MULTI-CORE FPGA IMPLEMENTATION FOR 2D PIXEL
CLUSTERING FOR THE ATLAS FAST TRACKER (FTK) PROCESSOR
Aristotle University of Thessaloniki
INTERNATIONAL CONFERENCE ON THECHNOLOGY AND INSTRUMENTATION IN PARTICLE PHYSICS ’14,AMSTERDAM, 2-6 JUNE 2014
![Page 2: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/2.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 2
Introduction to 2D Pixel ClusteringHigh Resolution Pixel Detectors are used for a
variety of applications:High Energy Physics (Particle Tracking)Medical Imaging and Biomedical ApplicationsAstrophysics Image Acquisition Security etc.
Most of these applications call for real-time data capture at high input rates
Effective data reduction techniques with minimal loss of information are required
2D Pixel Clustering is a well suited choiceOur implementation is developed for the ATLAS FTK
processor but can be easily adapted to other image processing applications
![Page 3: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/3.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 3
The ATLAS Pixel Detector
r-φ view of pixel detector
Another pixel layer is being added inside the current 3 pixel layers, the Insertable B-Layer (IBL)
IBL
12
r-φ view of barrel region
The spatial resolution of individual pixel modules has been measured at normal incidence at 12 μm
![Page 4: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/4.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 4
FTK Architecture
Data Formatter (DF)
Input Mezzanine (IM)
![Page 5: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/5.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 5
The ATLAS Pixel ModuleEach pixel module has
144 x 328 pixels The pixels are readout by
16 Front End (FE) chipsEach module is comprised
of 2 rows of 8 FEThe pixels are
400 μm x 50 μm or600 μm x 50 μm (FE edges)
G. Aad et al., “ATLAS pixel detector electronics and sensors”,Journal of Instrumentation, Vol. 3, P07007, July 2008.
![Page 6: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/6.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 6
The Clustering ProblemAssociate hits from
same clusterLoop over hit listTime increases with
occupancy & instantaneous luminosity
Non linear execution time
Goal: Keep up with 40 Mhz hit input rate
Total input rate 308 Gb/s (coming from Serial Links)
3 9 71 13 15
4 8 6 1112
2 5 10 140 1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
• The Cluster Hits arrive scrambled• The hits are additionally scrambled by dual column
![Page 7: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/7.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 7
2D-Clustering for the ATLAS FastTracKer ProcessorWhat is needed is a fast 2D-Pixel Clustering
implementation
Generic enough to allow a design space exploration to adapt to the ATLAS detector needs
With a prospect to be used for different image processing applications
![Page 8: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/8.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 8
2D-Pixel Clustering
![Page 9: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/9.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 9
Clustering Modules
![Page 10: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/10.jpg)
Hit Decoder – Issues addressed
The Hit Decoder outputs a robust and aligned data streamFE chip numbering is circularFE chips are readout in the order of their numbering Incoming data is not in sequence:
Hits from FE <8 come in the reverse column orderHits from FE>=8 come in the correct column order
8 9 10
11
12
13
14
15
7 6 5 4 3 2 1 0
ATLAS Pixel Module
10C.-L. Sotiropoulou , TIPP '14 - Amsterdam
![Page 11: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/11.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 11
8 9 10
11
12
13
14
15
7 6 5 4 3 2 1 0
Hit Decoder – Issues addressedATLAS Pixel Module
FIFO (16 words)
LIFO (512 words)
FIFO (16 words)
FE ≥8
Decoder
FE <8
Select Lowest Column34bits 42bits
Hit Decoder
Reg
iste
r
![Page 12: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/12.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 12
Clustering Modules
![Page 13: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/13.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 13
Grid Clustering – Generate Window
Generate a cluster window (e.g. 4x5 pixels) around a reference hit
The reference hit is located on the middle row of the window and Column 1 of the window if it
belongs to an odd column
3 9 71 13 15
4 8 6 1112
0 1 2 3 4 5 6 7 8 9
actual grid size is 8x21
![Page 14: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/14.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 14
3 9 71 13 15
4 8 6 1112
0 1 2 3 4 5 6 7 8 9
actual grid size is 8x21
Grid Clustering – Generate Window
Generate a cluster window (e.g. 4x5 pixels) around a reference hit
The reference hit is located on the middle row of the window and Column 1 of the window if it
belongs to an odd column Column 0 of the window if it
belongs to an even column Hits are read from the input until a hit which belongs to a column beyond the cluster window is identified
The hits that belong to the same columns as the cluster window are stored in a separate circular buffer
![Page 15: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/15.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 15
Grid Clustering – Select and Readout
Two cells in the cluster window are selected as “seeds”(0, mid-row) and (1, mid-row)
3 9 71 13 15
4 8 6 1112
0 1 2 3 4 5 6 7 8 9
Selected grid cells
![Page 16: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/16.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 16
Grid Clustering – Select and Readout
Two cells in the cluster window are selected as “seeds”(0, mid-row) and (1, mid-row)
The “selected” state is propagated to the neighboring hits
3 9 71 13 15
4 8 6 1112
0 1 2 3 4 5 6 7 8 9
![Page 17: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/17.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 17
Grid Clustering – Select and Readout
Two cells in the cluster window are selected as “seeds”(0, mid-row) and (1, mid-row)
The “selected” state is propagated to the neighboring hits
One of the previous hits is now readout
3 9 71 13 15
4 8 6 1112
0 1 2 3 4 5 6 7 8 9
![Page 18: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/18.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 18
Grid Clustering – Select and Readout
Two cells in the cluster window are selected as “seeds”(0, mid-row) and (1, mid-row)
The “selected” state is propagated to the neighboring hits
One of the previous hits is now readout
Selecting and reading out the cluster is executed in parallel
3 9 71 13 15
4 8 6 1112
0 1 2 3 4 5 6 7 8 9
The hits are sent out in their local coordinates with regards to the reference hit
The reference hit is sent out in absolute coordinates in the end cluster word
The hits that do not belong to the cluster are recovered and written to the circular buffer in their absolute coordinates
![Page 19: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/19.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 19
Grid Clustering – Block Diagram It is fundamentally a
control intensive data manipulation module
The pixel data is moved from the grid to the circular buffer and back to the grid until all clusters are identified
This is why execution time is data dependent
![Page 20: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/20.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 20
Clustering Modules
![Page 21: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/21.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 21
Centroid CalculationThe Centroid Calculation step will output the cluster
centroid and also the cluster size in two coordinatesThis is were the actual data reduction takes place
One cluster is reduced to one single wordTo have an accurate centroid calculation the actual
pixel size must be taken into account:Pixels have different sizes in the x or z direction
(400 μm and 600 μm)We transform the pixel coordinates to normalized units of
25 μm for x direction and 5 μm for y direction
The Centroid Calculation is under developmentCurrent version calculates the centroid as the
“center of the cluster bounding box”
![Page 22: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/22.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 22
Centroid CalculationThe bounding box is calculated with respect to
the center of the edge pixels of the cluster (given example is in pixel units, actual implementation is in normalized units of 25 μm for columns and 5 μm for rows)
Min_col = 0Min_col_pix_center = 0.5
Max_col = 4Max_col_pix_center = 3.5
Min_row = 0Min_row_pix_center = 0.5
Max_row= 5Max_row_pixel_center = 4.5
Center_of_bounding_box_column_coordinate = (Min_col_pix_center+Max_col_pix_center)/2 = 2
Center_of_bounding_box_row_coordinate = (Min_row_pix_center+Max_row_pix_center)/2 = 2.5
Bounding Box
Future version will take into account the charge distribution of the cluster and apply it as a correction to the centroid value
0 1 2 3 4 5
0
1
2
3
4
5
Center coordinate
![Page 23: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/23.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 23
2D Pixel Clustering Parallel Version The fundamental characteristic of the 2D-Pixel
Clustering implementation is that different clustering engines can work independently in parallel on different data Increase performance with greater FPGA resource usage
A parallelization strategy was chosen:Instantiate multiple clustering engines (grid clustering
modules) that work independently on data from separate pixel modules
To achieve this data parallelizing (demultiplexing) and data serializing (multiplexing) modules are necessary
![Page 24: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/24.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 24
2D Pixel Clustering Parallel Version
Distributes pixel module data to different engines
Merges the modules of each event
Stores the LVL1ID sequence
![Page 25: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/25.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 25
2D Pixel Clustering Implementation Results
FPGA Area (%) Max Frequency (MHz)
Single Flow 2.5 81.54x Parallel Flow 11.1 80.516x Parallel Flow 38.4 65
Xilinx Spartan-6 LX150T-3
Successful integration on hardware (the FTK_IM board) for both Single Flow and 4 Parallel Engine Flow implementations
Testing input files of events of 80 overlapping proton-proton collisions, which is the maximum LHC luminosity planned until 2022
Output was compared with bit-accurate simulation with excellent accuracy
![Page 26: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/26.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 26
2D Pixel Clustering Testing ResultsThe maximum input data rate is one word per 25 ns (40
MHz)For the Single Flow the average processing time is one
word per 84 ns (post place and route simulation measurements)
84 / 25 = 3.4 clock cycles of 40MHz are required for one data word processing
Therefore it was anticipated that 4 engines would be sufficient for the worst case scenario (Pixel B Layer)
A very important aspect was the data buffering (used FIFOs)A 4x Parallel implementation with small data buffering (FIFOs) failed to meet the requirements
The 4x Parallel implementation with a sufficient buffer size achieved the required target
![Page 27: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/27.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 27
2D-Pixel Clustering Performance Plot
Match!
84ns / word
25ns / word(input rate)
The 4x Parallel Engine Flow fully covers pixel specifications for worst case (Pixel B Layer)
The sample is 100 events where each event has 80 overlapping proton-proton collisions
![Page 28: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/28.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 28
2D-Pixel Clustering - Conclusions A high performance FPGA-based implementation Uses a moving window technique to identify the clusters
Reduces required resources for clustering identification Fully generic
Detection window size, memory size, bus size Can be parallelized
Generic number of parallel engines For the ATLAS FTK processor: data with an input rate of 40MHz Target achieved with a 4x Parallelization More exciting things to be explored…
Next targets (short term) IBL, full centroid calculation implementation (charge distribution)
Next targets (long term) Biomedical Applications etc.FTK Partial system installation late 2015
Full detector coverage 2016
![Page 29: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/29.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 29
FTK Pixel Clustering Team:C.-L. Sotiropoulou, A. Annovi, M. Beretta, S. Gkaitatzis, K. Kordas, S. Nikolaidis, C. Petridou and G. Volpi
Acknowledgements The presented work was supported by Istituto Nazionale di
Fisica Nucleare, and European community FP7 funds (Marie Curie IAPP Project 324318 FTK).
Thank you!
![Page 30: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/30.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 30
Back-Up Slides
![Page 31: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/31.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 31
Clustering – Testing ResultsSuccessful integration with the rest of the
FTK_IM firmware
Testing with ZH nunubb files (80pu)
Big event files (50 events: 7 modules per event) multiple times – this is what each pixel clustering module will receive from the RODs
![Page 32: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/32.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 32
Clustering - Performance Results (Parallel Flow)4 Parallel Engine Flow with Small Buffers
Event Processing Latency (end_event output – end_event input time)
Event number
Time (ns)
End_event input time delayed by backpressureThe actual delay is bigger and increasing with event number
![Page 33: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/33.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 33
Clustering - Performance Results (Parallel Flow)The 4 Parallel Engine Flow with Small Buffers
saturates after 30 events
This is because of the backpressure applied by the Data Merger
We increased the buffering before the Data Merger to reduce this effect
Bigger buffering processing time per word reduced to 25ns which is the 8 Parallel Engine Flow performance without the extra buffering (next slides)
![Page 34: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/34.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 34
Clustering - Performance Results (Parallel Flow)4 Parallel Engine Flow with Big Buffers
Event Processing Latency (end_event output – end_event input time) No saturation and no backpressure seen at the input
Event number
Time (ns)
![Page 35: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/35.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 35
Clustering - Performance Results (Parallel Flow)8 Parallel Engine Flow Event Processing
Latency (end_event output – end_event input time) No saturation and no backpressure seen at the input
Event number
Time (ns)
![Page 36: Calliope-Louisa Sotiropoulou on behalf of the FTK Pixel Clustering Team](https://reader031.vdocument.in/reader031/viewer/2022013004/56815e99550346895dcd29ed/html5/thumbnails/36.jpg)
C.-L. Sotiropoulou , TIPP '14 - Amsterdam 36
Clustering - Performance Results (Parallel Flow)The 8 Parallel Engine Flow fully covers pixel
specifications for 80pu events even without the big buffering (data from 7 pixel modules are easily distributed over 8 engines)
A 25 ns processing time was calculated per data word as a worst case
The 4x Parallel Engine Flow with Big Buffers is the best solution: Smaller device area occupation, smaller architecture fan out, easier place and route