Some Thoughts on L1 Pixel Trigger. Wu, Jinyuan, Fermilab, April 2006
Introduction
• Preference on detector layout and related pattern recognition issues.
• Experience from Fermilab BTeV.
  – Triplet.
  – Triplet finding.
  – Tiny Triplet Finder.
• Data output rate of the readout chip.
• Triggering on “low PT” features.
Preference on Detector Layout
• If N layers of pixel detector planes are affordable (in terms of material, cost, data volume, power, cooling, etc.), a normally spaced configuration like (b) is preferable.
• Pattern recognition for (b) is more difficult.
• From the BTeV work, pattern recognition for (b) is not as hard as we thought several years ago.
[Figure: detector layouts (a) and (b).]
BTeV Level 1 Vertex Trigger: Finding Triplets and Then …
[Figure: block diagram with the labels: 30 station pixel detector; FPGA segment finders; Switch: sort by crossing number; track/vertex farm (~2500 processors); Merge; Trigger decision to Global Level 1.]
Triplets
• Triplet:
  – Data item with 2 free parameters.
  – # of measurements - # of constraints = 2.
  – A triplet is not necessarily a straight track segment.
  – A triplet may have more than 3 measurements.
• A circular track with a known interaction point is a triplet since it has 2 free parameters. (Otherwise it has 3 parameters.)
• Three layers of nested loops are needed if the process is implemented in software.
• A total of n^3 combinations must be checked (e.g. 5x5x5 = 125).
• In an FPGA, “unrolling” 2 layers of loops may need a large silicon resource without careful planning: O(N^2).
Triplet Finding
[Figure: hits on Plane A, Plane B, and Plane C.]
for (i = 0; i < N_A; i++) {
    for (j = 0; j < N_B; j++) {
        for (k = 0; k < N_C; k++) {
            /* check whether hits (i, j, k) form a triplet */
        }
    }
}
Triplet Finding
• Triplet finding can be done in software or in firmware.
• Tiny Triplet Finder (TTF) is a firmware implementation developed at Fermilab for BTeV.
• Tiny = small silicon usage.
• For more info on TTF, see handout.
Triplet Finding
• Software processes: O(n^3).
• FPGA firmware functions: O(n).
  – O(N^2) implementations: Hough transform, etc.
  – O(N*log(N)) implementation: Tiny Triplet Finder.
Circular Tracks from Collision Point on Cylindrical Detectors
• For a given hit on layer 3, a coincidence between a layer 2 hit and a layer 1 hit that satisfies the coincident map signifies a valid circular track.
• A track segment has 2 free parameters, i.e., it is a triplet.
• The coincident map is invariant under rotation.
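The rotation invariance can be checked numerically. The sketch below is a minimal illustration, not the BTeV firmware. It assumes an ideal circle of radius rho through the collision point at the origin, for which a hit at radius r has azimuth phi0 - q*asin(r/(2*rho)). The function names, the sign convention, and the layer radii in the usage note are hypothetical.

```python
import math

def hit_azimuth(phi0, rho, r, charge=1):
    """Azimuth of the point at radius r on a circle of radius rho that
    passes through the origin with initial direction phi0.
    Valid for r <= 2*rho. The sign convention for charge is an assumption."""
    return phi0 - charge * math.asin(r / (2.0 * rho))

def map_point(phi0, rho, radii, charge=1):
    """Return (phi1 - phi3, phi2 - phi3) for layer radii (r1, r2, r3).
    This pair is the track's position in the coincident map."""
    phi1, phi2, phi3 = (hit_azimuth(phi0, rho, r, charge) for r in radii)
    return phi1 - phi3, phi2 - phi3
```

For example, with assumed radii (4.0, 8.0, 12.0), map_point(0.3, 100.0, radii) and map_point(2.1, 100.0, radii) agree to rounding error, while changing rho moves the point. The map position depends only on curvature, so one map serves every azimuth.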
[Figure: two coincident map plots, with axes running 0 to 100 and 0 to 128. The axis labels are garbled in the source: "1-3)+64" and "2-3)+64".]
Tiny Triplet Finder: Reuse Coincident Logic via Shifting Hit Patterns
[Figure: cylindrical layers C1, C2, C3.]
• One set of coincident logic is implemented.
• For an arbitrary hit on C3, rotate, i.e., shift, the hit patterns for C1 and C2 to search for coincidence.
Tiny Triplet Finder for Circular Tracks
[Figure: block diagram with the labels: *R1/R3; *R2/R3; Bit Array and Shifter (two of each); Bit-wise Coincident Logic; Triplet Map Output To Decoder. The accompanying plot has axes running 0 to 128.]
1. Fill the C1 and C2 bit arrays. (n1 clock cycles)
2. Loop over C3 hits, shift bit arrays and check for coincidence. (n3 clock cycles)
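The two steps can be modeled in software. The sketch below is a simplified model under stated assumptions, not the TTF firmware: every layer is assumed to have the same number of azimuthal bins (the *R1/R3 and *R2/R3 scaling in the figure is not modeled), Python integers stand in for the bit arrays, and the coincident map is a hypothetical list of allowed bin offsets (d1, d2) relative to the C3 hit. The loop over the map runs sequentially here, whereas bit-wise logic would evaluate all entries at once.

```python
N_BINS = 128  # assumed number of azimuthal bins per layer

def fill_bit_array(hits, n=N_BINS):
    """Step 1: set one bit per hit (one hit per clock cycle in firmware)."""
    bits = 0
    for h in hits:
        bits |= 1 << (h % n)
    return bits

def rotate_right(bits, shift, n=N_BINS):
    """Circular shift: bit d of the result is bit (d + shift) mod n of the input."""
    shift %= n
    mask = (1 << n) - 1
    return ((bits >> shift) | (bits << (n - shift))) & mask

def find_triplets(c1_hits, c2_hits, c3_hits, coincident_map, n=N_BINS):
    """Step 2: for each C3 hit, shift the C1 and C2 patterns so the hit sits
    at bin 0, then test the fixed map offsets for coincidence."""
    c1 = fill_bit_array(c1_hits, n)
    c2 = fill_bit_array(c2_hits, n)
    triplets = []
    for p3 in c3_hits:
        r1 = rotate_right(c1, p3, n)
        r2 = rotate_right(c2, p3, n)
        for d1, d2 in coincident_map:
            if (r1 >> (d1 % n)) & 1 and (r2 >> (d2 % n)) & 1:
                triplets.append(((p3 + d1) % n, (p3 + d2) % n, p3))
    return triplets
```

With a toy map such as [(2*c, c) for c in range(-3, 4)], hits at bins 14 (C1), 12 (C2), and 10 (C3) form a triplet, and unrelated hits on C1 and C2 are ignored. The map never changes between C3 hits. Only the shift amount does, which is what lets a single set of coincident logic be reused.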
Data Rate of Readout Chips
• SLHC:
  – 80 MHz
  – 4 hits/(1.28 cm)^2 (hits or clusters? A cluster = 2.5 hits from BTeV, so 4 hits = about 1.6 clusters.)
  – 16 bits/hit
  – 3.125 Gb/cm^2/s; with 8b10b etc., 5 Gb/cm^2/s
• FPIX2 readout chip at Fermilab BTeV:
  – Area: 128 x 0.05 mm x 22 x 0.4 mm = 0.56 cm^2.
  – Output: 6 x 140 Mb/s = 840 Mb/s.
  – 1.5 Gb/cm^2/s.
• If we redesign FPIX:
  – 8 x 320 Mb/s = 2.56 Gb/s
  – 4.57 Gb/cm^2/s
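The rates above follow from simple arithmetic, reproduced in the sketch below. All input numbers come from the slide. The function and constant names are illustrative only.

```python
def slhc_rate_gbps_per_cm2(bx_rate_hz=80e6, hits_per_tile=4,
                           tile_side_cm=1.28, bits_per_hit=16):
    """Required raw data rate per unit area, in Gb/cm^2/s."""
    bits_per_second = bx_rate_hz * hits_per_tile * bits_per_hit
    return bits_per_second / tile_side_cm ** 2 / 1e9

# FPIX2 active area: 128 rows x 0.05 mm by 22 columns x 0.4 mm, in cm^2
FPIX2_AREA_CM2 = (128 * 0.005) * (22 * 0.04)

def chip_rate_gbps_per_cm2(n_links, link_mbps, area_cm2=FPIX2_AREA_CM2):
    """Output capability of a chip per unit area, in Gb/cm^2/s."""
    return n_links * link_mbps / 1e3 / area_cm2

required = slhc_rate_gbps_per_cm2()           # 3.125
with_8b10b = required * 10 / 8                # about 3.9 before other overhead
fpix2 = chip_rate_gbps_per_cm2(6, 140)        # about 1.49
redesigned = chip_rate_gbps_per_cm2(8, 320)   # about 4.55
```

Using the unrounded area of 0.5632 cm^2 gives about 4.55 Gb/cm^2/s for the redesigned chip. The slide's 4.57 comes from the rounded area of 0.56 cm^2.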
FPIX2 Readout Chip
• Pixel size: 50 µm x 400 µm.
• Columns: 22. Rows: 128.
• Outputs: 1, 2, 4 or 6 Cu pairs @ 140 MHz.
[Figure: chip layout showing the core and the periphery.]
Core Organization
• Column-based architecture.
• Three mutually-dependent parts:
  – Core Logic
  – End-of-column Logic
  – Pixel Cells
• Readout order:
  – Hit cell by hit cell in a column.
  – Column by column.
  – Not time ordered.
[Figure: core organization: four columns of pixel cells, each with End of Column Logic, connected to the Core Logic.]
[Figure: pixel cell schematic with the labels: Sensor; Test; Inject; Kill; Vdda; Vref; Thresholds; Flash Latch; Thermometer to Binary Encoder; ADC; Command Interpreter (codes 00, 01, 10, 11 listed with idle, reset, output, listen); 4 pairs of Command Lines; HFastOR; RFastOR; Throttle; Row Address; Read Clock; Read Reset; Token In; Token Reset; Token Out; Resets; Bus Controller.]
Output of FPIX2 for BTeV
[Figure: 24-bit output word formats, bits b23 to b00. Hit word fields: Row, Column, BCO(7:0), ADC, 1. Sync word labels: 1, Status, 000. Invalid codings as printed: "0 0XXX" and "1 1 1 X0".]
• A hit is output using 24 bits, at 140 Mb/s per Cu pair.
• A user protocol is used as shown (not 8B/10B).
• The BCO field takes 8 bits (+ 16 bits = 24 bits).
• To eliminate or reduce the number of bits taken by the BCO, the chip has to be redesigned to output time ordered data. Doable or not? It is possible but not obvious now. FPIX1 was designed with time ordered data, but was slow. Study is needed.
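To make the cost of the BCO field concrete, the sketch below packs and unpacks a 24-bit hit word. The field order (Row, Column, BCO, ADC, marker bit) follows the figure. The field widths are an assumption: 7 + 5 + 8 + 3 + 1 bits, chosen because 128 rows need 7 bits, 22 columns need 5 bits, and the total must be 24. The exact bit positions in the real chip are not given here, so treat the layout as hypothetical.

```python
# Assumed field widths (not stated in the source), most significant field first.
FIELDS = (("row", 7), ("column", 5), ("bco", 8), ("adc", 3), ("marker", 1))
WORD_BITS = sum(width for _, width in FIELDS)  # 24

def pack_hit(row, column, bco, adc):
    """Pack one hit into a 24-bit integer under the assumed layout."""
    values = {"row": row, "column": column, "bco": bco, "adc": adc, "marker": 1}
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack_hit(word):
    """Inverse of pack_hit: return the fields as a dict."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields

BCO_SHARE = 8 / WORD_BITS  # one third of every hit word is the time stamp
```

Under this layout the BCO takes one third of the link bandwidth, which is the motivation for time ordered output: if hits arrived grouped by crossing, the time stamp would not need to be repeated in every word.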
A Model of Trigger/DAQ System
• The readout chips send hit data to the correlation logic module (CLM, ~Correlator/OptoTX [J. Jones]) just outside the detector via copper links.
• The CLM finds triplets and sends the initial angle and momentum of each triplet to L1.
• The L1 system issues trigger commands back.
• The readout chips send the full data of the selected BX to HLT/DAQ via the CLM.
[Figure: four readout chips connect to the correlation logic module over 10 m of Cu; the module connects to L1 and HLT/DAQ over 100 m of fiber (outside of steel?). Signals: Triplet Data, L1 Trigger Commands, Full Data.]
Output of the Readout Chip
• Data volume from the readout chips is large. (Full rate: 3.125 Gb/cm^2/s.)
• Optionally, partial data can be sent to reduce the bandwidth (to about (1/5) * 3.125 Gb/cm^2/s), since the CLM needs only:
  – the coordinate with lower resolution (1/2)
  – of a hit cluster (1/2.5).
• Study on readout chip re-design is needed.
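The 1/5 factor is the product of the two reductions listed above. A minimal check, using only the slide's numbers (the names are illustrative):

```python
FULL_RATE = 3.125  # Gb/cm^2/s, full hit data

def trigger_data_rate(full_rate=FULL_RATE, resolution_factor=1 / 2,
                      hits_per_cluster=2.5):
    """Rate when one lower-resolution coordinate is sent per cluster."""
    return full_rate * resolution_factor / hits_per_cluster

partial_rate = trigger_data_rate()  # 0.625 Gb/cm^2/s, i.e. 1/5 of the full rate
```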
[Figure: four readout chips connected to the correlation logic module over 10 m of Cu, shown with the 24-bit hit word (Row, Column, BCO(7:0), ADC, 1; bits b23 to b00).]
L1 Trigger Commands
• The CLM finds triplets and sends the initial angle and momentum of each triplet to L1, and the L1 system issues multi-bit trigger commands back.
• The readout chips send the full data of the selected BX to HLT/DAQ via the CLM.
• The data volume of the selected BX is relatively small.
• Optionally, the correlation logic module can run one or a few longer algorithms (L1.5?) while the full data flow through. The HLT uses the results when making L2 decisions.
• So the multi-bit trigger command = {BX, L1.5 algorithm ID}: dump data in 1234 and apply algorithm ABCD.
[Figure: four readout chips connect to the correlation logic module over 10 m of Cu; the module connects to L1 and HLT/DAQ over 100 m of fiber (outside of steel?). Signals: Triplet Data, L1 Trigger Commands, Full Data.]
Latency Budget Usage for Triplet Finding Process
• CMS L1 decision time = 6.4 µs; 2 x 0.5 µs of it will be cable delay.
• Filling the C1 and C2 bit arrays takes n1 clock cycles.
• Looping over C3 hits, shifting bit arrays and checking for coincidence take n3 clock cycles + # of pipeline stages (about 10).
• Assume n1 = n3 = 64: latency usage = 64 + 64 + 10 = 138 clock cycles.
• At a 160 MHz (FPGA or ASIC) clock frequency, 138 clock cycles = 138/(160*6.4) = 13% of the 6.4 µs CMS L1 decision time. This is only an example, but it looks OK.
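The budget estimate can be reproduced, and other values of n1, n3, or clock frequency tried, with a few lines. The defaults are the slide's example numbers. The function name is illustrative.

```python
def triplet_latency(n1=64, n3=64, pipeline_stages=10,
                    clock_mhz=160.0, l1_decision_us=6.4):
    """Return (clock cycles, latency in microseconds, fraction of L1 budget)."""
    cycles = n1 + n3 + pipeline_stages
    latency_us = cycles / clock_mhz  # cycles / MHz gives microseconds
    return cycles, latency_us, latency_us / l1_decision_us

cycles, latency_us, fraction = triplet_latency()
# cycles = 138, latency_us = 0.8625, fraction is about 0.135
```

Doubling both hit counts to 128 gives 266 cycles, about 26% of the budget, so the estimate scales roughly linearly with occupancy.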
A Closer View of Latency Budget
• The readout chips send out hit data to the Correlation Logic Module.
• The triplet finding starts after receiving data of the first hit.
• After all hits are transmitted, phase 2 of triplet finding (looping over C3 hits, shifting bit arrays and checking for coincidence) runs.
• Triplet data are sent out after the first triplet is found.
• After the cable delay, L1 starts its processes on receiving the first triplet data.
• After all triplet data are received, the L1 command is issued.
• The L1 command is sent back and executed.
[Figure: readout chips, correlation logic module, and L1 exchanging Triplet Data and L1 Trigger Commands, with a timing diagram labeled: Output Hits; Triplet Finding (1); Triplet Finding (2); Triplet Data Out; Cable (1); Cable (2); L1 Processes; Cable (1); L1.]
Max # of Hits/BX
• 4 hits/(1.28 cm)^2/BX is an average; in some BX, the # of hits may be many times larger.
• The readout chip should drop some hits if the # of hits/BX is too big, or the time to output the hits will be too long. (Note the 6.4 µs L1 latency.)
• Consider 64, 128, 256 hits/(1.28 cm)^2/BX, i.e., x16, x32, x64 of the average: outputting the hits takes 16, 32, 64 BX on a link whose throughput matches the average data rate. The output times, 0.2 µs, 0.4 µs, 0.8 µs, should be OK.
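The output times follow from the 80 MHz crossing rate (12.5 ns per BX) and a link sized for the average of 4 hits per BX. A short sketch with illustrative names:

```python
def output_time(hits_in_bx, avg_hits_per_bx=4, bx_rate_mhz=80.0):
    """Return (BX needed, time in microseconds) to ship one crossing's hits
    over a link whose throughput matches the average hit rate."""
    bx_needed = hits_in_bx / avg_hits_per_bx
    return bx_needed, bx_needed / bx_rate_mhz

times = {hits: output_time(hits) for hits in (64, 128, 256)}
# 64 hits: 16 BX, 0.2 us; 128 hits: 32 BX, 0.4 us; 256 hits: 64 BX, 0.8 us
```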
[Figure: timing diagram labeled: Output Hits; Triplet Finding (1); Triplet Finding (2); Triplet Data Out; Cable (1); Cable (2); L1 Processes; Cable (1); L1.]
Readout Chip Spec
• Should read out time ordered data.
• Should drop data gracefully if # of hits/BX is too big.
• Should drop data gracefully if # of hits/ several BX (a short term average) is too big.
• Should be able to output both brief data for trigger and full data for readout.
• Should store data on chip for 6.4 µs.
• ?
Triggering on “Low PT” Features?
• Many tracking algorithms degrade rapidly when the momentum of the track goes low.
• Circular track triplet finding does not need a high PT assumption, so it does not degrade as rapidly.
• The trigger system discussed is especially suitable if one needs to trigger on low PT features of the event.
• In the CMS 4 T B field, all tracks look “low PT”.
Example: Finding “Soft Jets”
• A simulated event with 200 tracks.
• Flat distributions.
• Min. R = 55 cm
• 8+8 soft tracks are added.
• They are grouped in 2 small initial angle regions, i.e., 2 “soft jets”.
Can you see the “soft jets”?
Can you see the “soft jets” now?
Track Initial Angle Distributions
Summary
• With the experience from Fermilab BTeV on triplet finding, pattern recognition is not a problem. One should feel free to choose the preferred detector layout.
• Data output rates of the current readout chips are close enough to the SLHC requirement. But studies are needed.
• Triggering on “low PT” features is possible. But studies are needed.