The Fast Tracker Real Time Processor
ACES workshop, 9-11 March 2011
CERN - Geneva, Switzerland
Alberto Annovi for the ATLAS collaboration and FTK group
Istituto Nazionale di Fisica Nucleare, Laboratori Nazionali di Frascati
FTK poster: F. Crescioli
Fast tracking in pixel and SCT det.
Total # of readout channels:
PIXELS: 80 million
SCT: 6 million
+ IBL: 6 million @ 3 cm
A. Annovi - ACES 2011 @ CERN 2
Two time-consuming jobs in tracking: pattern recognition & track fitting
• Pattern recognition – find track candidates with enough Si hits
– 10^9 prestored patterns (roads) simultaneously see the silicon hits as they leave the detector at full speed.
– Based on the Associative Memory chip (content-addressable memory) initially developed for the CDF Silicon Vertex Trigger (SVT).
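The road-finding idea can be illustrated with a small software sketch. This is not the AM chip logic: the toy pattern bank, the name `find_roads` and the `min_layers` threshold are assumptions made for illustration. In hardware every stored pattern compares each incoming hit in parallel; the sketch emulates that with a lookup table.

```python
from collections import defaultdict

def find_roads(patterns, hits, min_layers):
    """Return indices of patterns (roads) matched in at least min_layers layers.

    patterns: list of tuples, one coarse-resolution hit address per layer.
    hits: iterable of (layer, address) pairs, in arrival order.
    """
    # Index the bank: (layer, address) -> patterns that contain that address.
    index = defaultdict(list)
    for pid, pattern in enumerate(patterns):
        for layer, address in enumerate(pattern):
            index[(layer, address)].append(pid)

    # One set of matched layers per pattern, filled as hits stream in.
    matched = defaultdict(set)
    for layer, address in hits:
        for pid in index.get((layer, address), ()):
            matched[pid].add(layer)

    return sorted(pid for pid, layers in matched.items()
                  if len(layers) >= min_layers)


# Toy bank of three 4-layer patterns (hypothetical addresses).
patterns = [(3, 7, 1, 4), (3, 8, 2, 4), (5, 5, 5, 5)]
hits = [(0, 3), (1, 7), (2, 1), (3, 4), (1, 8), (0, 5)]
roads = find_roads(patterns, hits, min_layers=3)
print(roads)  # [0, 1]
```

Requiring fewer matched layers than the total is one way to tolerate a missing hit; the threshold used here is illustrative only.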
• Track fitting – high quality helix parameters and χ^2
– Over a narrow region in the detector, equations linear in the local silicon hit coordinates give resolution nearly as good as a time-consuming helical fit.
– p_i's are the helix parameters and χ^2 components.
– x_j's are the hit coordinates in the silicon layers.
– a_ij & b_i are prestored constants determined from full simulation or real data tracks.
» The range of the linear fit is a “sector” which consists of a single silicon module in each detector layer.
– This is VERY fast in FPGA DSPs.
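The linear fit described above amounts to evaluating p_i = sum_j a_ij x_j + b_i for each output, which is a handful of multiply-accumulate operations. A minimal sketch in plain Python follows; the function name `linear_fit` and the numerical constants are made up for illustration and are not real FTK constants.

```python
def linear_fit(a, b, x):
    """Evaluate p_i = sum_j a[i][j] * x[j] + b[i] for every output i."""
    if len(a) != len(b) or any(len(row) != len(x) for row in a):
        raise ValueError("constant shapes do not match the hit vector")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(a, b)]


# Toy sector with 3 hit coordinates and 2 outputs (illustrative numbers only).
a = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.0]]
b = [0.0, 1.0]
x = [2.0, 4.0, 1.0]
p = linear_fit(a, b, x)
print(p)  # [1.0, 4.0]
```

In a real system there would be one set of constants (a, b) per sector, selected by the modules that contain the hits.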
[Figure: 14D coordinate space, 5D surface]
Nucl. Instrum. Meth. A623:540-542, 2010, doi:10.1016/j.nima.2010.03.063
[Figure labels: 1/2 AM, 1/2 AM]
Divide into more than 2 sectors
8 buses 100MHz/bus
ATLAS Pixels + SCT
Feeding FTK @ 100kHz event rate
Up to 8 Logical Layers: full coverage
Allow a small overlap for full efficiency
• 8 regions, each with 8 sub-regions (η-φ towers)
• Δφ ~22.5°, Δη ~1.25
• bandwidth for up to 3×10^34 cm^-2 s^-1
System overview
• Highly parallel data flow: 64 η-φ towers in 8 core crates and 8-fold parallelism within each tower (for inst. lum. 3×10^34)
• Second stage: extrapolate into stereo SCT layer. Include stereo hits in final fit.
Processing unit (2/tower)
HLT farm
Data Flow in FTK (goals)
• Designed for 100 kHz input event rate and 40 pileup events
– Parameters tuned for 75 pileup events (3×10^34) to provide safety margin
• ~300 S-Link inputs (with IBL), ~400 Gbit/s
• Input hit rate at each Processing Unit: 8 × 100 MHz hits
– 128 PU: max total hit input rate 100 GHz
• Max output roads/board: 800 MHz
– Max total: 100 GHz roads
• Max track fitting rate: 4 GHz/board
– Max total: 500 GHz fits
• Total track rate after 1st step: 640 MHz
– Track fitting performs first data reduction!
• 2nd step final output: ~300 tracks/event @ 3e34
• The full FTK system size is 7 racks
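The aggregate figures above can be cross-checked with simple arithmetic. The sketch below assumes one board per Processing Unit (128 boards in total), which the slide does not state explicitly; the ratio computed at the end is only an illustrative reading of "first data reduction". The totals quoted on the slide (100 GHz, 500 GHz) appear to be rounded values.

```python
# All rates in words per second, taken from the slide.
N_PU = 128                             # processing units
HIT_RATE_PER_PU = 8 * 100_000_000      # 8 buses x 100 MHz
ROAD_RATE_PER_BOARD = 800_000_000      # 800 MHz
FIT_RATE_PER_BOARD = 4_000_000_000     # 4 GHz
TRACK_RATE_AFTER_STEP1 = 640_000_000   # 640 MHz

# Assumption: one board per processing unit.
total_hits = N_PU * HIT_RATE_PER_PU
total_roads = N_PU * ROAD_RATE_PER_BOARD
total_fits = N_PU * FIT_RATE_PER_BOARD

# Ratio between the maximum fit rate and the track rate after the 1st step.
reduction = total_fits / TRACK_RATE_AFTER_STEP1

print(f"hits:  {total_hits / 1e9:.1f} GHz")   # 102.4 GHz
print(f"roads: {total_roads / 1e9:.1f} GHz")  # 102.4 GHz
print(f"fits:  {total_fits / 1e9:.1f} GHz")   # 512.0 GHz
print(f"fit/track ratio: {reduction:.0f}")    # 800
```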
Note: rates are word rates for hits, roads and tracks
FTK input connection
1st output
2nd output
Dual output HOLA on SCT/Pixel RODs
FTK clustering mezzanine on Data Formatter
Duplicate ROD outputs
Up to 4 S-Link inputs (Pixel and SCT)
Clustering 1 or 2 pixel inputs
Clustering time linear with occupancy: scales with luminosity
The algorithm working principle
FPGA replica of pixel matrix
Eta direction -->
1st phase:
The pixel module is a 328x144 matrix.
Replicate a part of it (8x164) in hw matrix.
The matrix identifies hits in the same cluster (local connections).
2nd phase:
Hits in cluster are analyzed (averaged).
Flexibility to choose algorithm!
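The two phases can be mimicked in software as follows. This is a sketch of the principle, not the FPGA implementation: it works on a plain set of hits rather than a hardware replica of the pixel matrix, and it assumes 8-neighbour connectivity, that "top-most" means the smallest row index, and an unweighted average. The name `cluster_hits` is hypothetical.

```python
def cluster_hits(hits):
    """Group pixel hits into clusters of neighbouring pixels and average each.

    hits: iterable of (column, row) pixel coordinates of one module.
    Returns a list of (mean_column, mean_row, size), one entry per cluster.
    """
    remaining = set(hits)
    clusters = []
    while remaining:
        # Seed: left-most hit, top-most among ties (smallest column, then row).
        seed = min(remaining)
        remaining.remove(seed)
        cluster, frontier = [seed], [seed]
        # 1st phase: propagate the selection through local connections.
        while frontier:
            col, row = frontier.pop()
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    neighbour = (col + dc, row + dr)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        cluster.append(neighbour)
                        frontier.append(neighbour)
        # 2nd phase: analyse the cluster, here with a plain average.
        n = len(cluster)
        clusters.append((sum(c for c, _ in cluster) / n,
                         sum(r for _, r in cluster) / n,
                         n))
    return clusters


hits = [(10, 5), (11, 5), (11, 6), (40, 20)]
result = cluster_hits(hits)
print(result)  # one 3-pixel cluster and one single-pixel cluster
```

Each hit is visited once and checked against a fixed number of neighbours, so the work grows linearly with the number of hits in the module.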
Loop over events and pixel modules
select left-most, top-most hit
propagate selection through cluster
Loop over clusters in a module
Average calculator
out
Core logic: hits associated into clusters
high level cluster analysis
high level cluster analysis
2nd pipeline stage
read out cluster
Load all module hits
Nucl. Instrum. Meth. A617:254-257, 2010
Data Formatter block diagram
FPGAs
or high speed fibers
The DF core is one or a few FPGAs distributing ~10 Gbps of data to the appropriate outputs
Processing Unit
SNAP12 link
Pattern recognition & track fitting condensed in a single 9U board + aux card.
Allows for highly parallel architecture.
Pattern recognition with Associative Memory
128 AM chips on a single board
Hit database (DO)
Track fitter
Duplicate Removal (HW)
CDF AMBoard with 4 LAMBs
What we have now: Standard Cell 180 nm
5000 patterns/chip for 6-layer patterns, 2500 patterns/chip for 12-layer patterns
“A VLSI Processor for Fast Track Finding Based on Content Addressable Memories”,
IEEE Transactions on Nuclear Science, Volume 53, Issue 4, Part 2, Aug. 2006 Page(s):2428 - 2433
NEXT: NEW VERSION
For both L1 & L2
• 65 nm technology provides a factor 8 → 20000 patterns/chip
• Full custom cell provides at least a factor 2 → 40000 patterns/chip
• 8 layers instead of 12 provides a factor 1.5 → 60000 patterns/chip
• 1.2 x 1.2 cm^2 2D chip → 80000 patterns/chip
With a 2D chip we gain a factor 30!
1 AMboard: 128 chips → ~10 Mpatterns per board
1 Crate: 16 AMboards → ~160 Mpatterns per crate
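The scaling chain can be reproduced step by step, starting from the 2500 patterns/chip quoted for 12-layer patterns. The sketch below only multiplies the factors given on the slide; the 80000 patterns/chip for the larger chip is taken as quoted rather than derived, and the variable names are illustrative.

```python
patterns = 2500  # 180 nm standard cell, 12-layer patterns (current chip)
steps = [
    ("65 nm technology", 8),
    ("full custom cell", 2),
    ("8 layers instead of 12", 1.5),
]
for name, factor in steps:
    patterns *= factor
    print(f"{name}: {patterns:.0f} patterns/chip")

patterns_target = 80000               # quoted for the 1.2 x 1.2 cm^2 chip
gain = patterns_target / 2500         # 32, quoted as "a factor 30"
per_board = 128 * patterns_target     # 10.24 M, quoted as ~10 Mpatterns
per_crate = 16 * per_board            # 163.84 M, quoted as ~160 Mpatterns

print(f"gain: x{gain:.0f}")
print(f"per board: {per_board / 1e6:.2f} M, per crate: {per_crate / 1e6:.2f} M")
```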
Current prototype under design:
65 nm TSMC, 12 mm^2 MPW run, 100 MHz running clock
8000 patterns/chip, 8 layers each
Layer words of 12 bits + 3 ternary bits
Variable resolution patterns
[Plot: pattern efficiency vs. # of patterns in AM chips (barrel only, 45 degrees); labels: 90%, 65M, 500M]
• Pattern size: r-φ: 24 pixels, 20 SCT strips; z: 36 pixels. <# roads/event @ 3E34> = 342k
• Pattern size (half size): r-φ: 12 pixels, 10 SCT strips; z: 36 pixels. <# roads/event @ 3E34> = 40k
Want this (marked twice on the plot)
Variable resolution AM
finer patterns
coarser pattern
We can use “don’t care” on the least significant bit when we want to match the pattern layer at coarser resolution, or use all the bits to match it at finer resolution
• Patterns with 1 kid are stored at finer precision
• Layers without “don’t care (DC)” can ignore the hits in the “wrong” side of the layer
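The "don't care" mechanism can be sketched with bit masks. This is a software illustration under stated assumptions, not the chip's ternary cell: each layer is represented as a (word, dc_mask) pair, where set bits in `dc_mask` are ignored in the comparison. The function names and the 4-bit words are hypothetical.

```python
def layer_matches(pattern_word, dc_mask, hit_word):
    """True if hit_word equals pattern_word on every bit not set in dc_mask."""
    return ((pattern_word ^ hit_word) & ~dc_mask) == 0


def pattern_matches(pattern, hits):
    """pattern: list of (word, dc_mask) per layer; hits: one hit word per layer."""
    return all(layer_matches(word, mask, hit)
               for (word, mask), hit in zip(pattern, hits))


# Layer stored at finer resolution: all bits must agree.
fine = (0b1010, 0b0000)
# Layer stored at coarser resolution: the least significant bit is "don't care".
coarse = (0b1010, 0b0001)

print(layer_matches(*fine, 0b1011))    # False: differs in the last bit
print(layer_matches(*coarse, 0b1011))  # True: the last bit is ignored
print(pattern_matches([fine, coarse], [0b1010, 0b1011]))  # True
```

A layer with one "don't care" bit accepts two adjacent fine-resolution bins, so a single stored pattern can stand in for several finer ones where the coarser match is acceptable.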
DC
coarser pattern
With 2 “don’t care” bits per layer, gain an effective factor of 5 in patterns
Goal: x30 pattern density but lower power consumption
32 patterns of 8 layers ~ 60 µm x 500 µm ~ 1 or 2 pixels
Conclusions
• Several technological challenges and solutions (in progress)
• Working to define initial FTK system
• Preparing FTK prototypes
• 2012: First prototype test with data
• 2014: Barrel only FTK system
• 2016: Full FTK system
• See FTK poster: F. Crescioli