Some Thoughts on L1 Pixel Trigger

Wu, Jinyuan

Fermilab

April 2006

Introduction

• Preference on detector layout and related pattern recognition issues.

• Experience from Fermilab BTeV.
– Triplet.
– Triplet finding.
– Tiny Triplet Finder.

• Data output rate of the readout chip.

• Triggering on “low PT” features.

Preference on Detector Layout

• If N layers of pixel detector planes are affordable (in terms of material, cost, data volume, power, cooling, etc.), normally spaced configurations like (b) are preferable.

• Pattern recognition for (b) is more difficult.

• From the BTeV work, the pattern recognition for (b) is not as hard as we thought several years ago.

[Figure: two detector layouts, (a) and (b).]

BTeV and CMS-SLHC Pattern Recognition

Simulated B event in BTeV Silicon Pixel Detector

BTeV Level 1 Vertex Trigger: Finding Triplets and Then …

[Figure: block diagram of the BTeV Level 1 vertex trigger. Labels: 30 station pixel detector; FPGA segment finders; Switch: sort by crossing number; track/vertex farm (~2500 processors); Merge; Trigger decision to Global Level 1.]

Triplets

• Triplet:
– Data item with 2 free parameters.
– # of measurements - # of constraints = 2.
– A triplet is not necessarily a straight track segment.
– A triplet may have more than 3 measurements.

• A circular track with a known interaction point is a triplet since it has 2 free parameters. (Otherwise it has 3 parameters.)

Triplet Finding

• Three layers of nested loops are needed if the process is implemented in software.

• A total of n^3 combinations must be checked (e.g. 5x5x5 = 125).

• In an FPGA, “unrolling” 2 layers of loops may need a large silicon resource, O(N^2), without careful planning.

[Figure: hits on Plane A, Plane B and Plane C.]

for (i = 0; i < N_A; i++) {
    for (j = 0; j < N_B; j++) {
        for (k = 0; k < N_C; k++) {
            /* check whether hits i, j and k form a triplet */
        }
    }
}
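The nested loops above can be sketched in software as follows. This is only an illustration, not the BTeV code: it assumes three equally spaced planes and one coordinate per hit, so that a straight track satisfies a + c = 2b. The function name, the tolerance and the hit values are hypothetical.

```python
from itertools import product


def find_triplets_brute_force(hits_a, hits_b, hits_c, tol=0.5):
    """Check every (A, B, C) hit combination for a straight-line triplet.

    Assumes equally spaced planes, so a straight track satisfies
    a + c == 2 * b within the tolerance `tol` (hypothetical units).
    Returns the triplets found and the number of combinations checked.
    """
    triplets = []
    checked = 0
    # product() is equivalent to the three nested loops over planes A, B, C.
    for a, b, c in product(hits_a, hits_b, hits_c):
        checked += 1
        if abs(a + c - 2 * b) <= tol:
            triplets.append((a, b, c))
    return triplets, checked


# Hypothetical example: 5 hits per plane, 5 real tracks.
hits_a = [10, 22, 35, 47, 60]
hits_b = [12, 20, 38, 45, 61]
hits_c = [14, 18, 41, 43, 62]
triplets, checked = find_triplets_brute_force(hits_a, hits_b, hits_c)
print(len(triplets), "triplets found in", checked, "combinations")
```

With 5 hits per plane all 5x5x5 = 125 combinations are tested, which is the cubic cost that the firmware approach avoids.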

Triplet Finding

• Triplet finding can be done in software or in firmware.

• Tiny Triplet Finder (TTF) is a firmware implementation developed at Fermilab for BTeV.

• Tiny = small silicon usage.

• For more info on TTF, see handout.

Triplet Finding

[Figure: comparison of triplet finding implementations.]

• Software processes: O(n^3).

• FPGA firmware functions: O(n).
– Implementations using Hough transform, etc.: O(N^2).
– Tiny Triplet Finder implementation: O(N*log(N)).

Circular Tracks from Collision Point on Cylindrical Detectors

• For a given hit on layer 3, a coincidence between a layer 2 hit and a layer 1 hit that satisfies the coincidence map signifies a valid circular track.

• A track segment has 2 free parameters, i.e., it is a triplet.

• The coincidence map is invariant under rotation.
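The rotation invariance can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original talk: a circle of signed radius rho through the collision point crosses a cylinder of radius r at azimuth phi0 + asin(r / (2 rho)), where phi0 is the initial direction. The layer radii and function names are hypothetical.

```python
import math


def hit_phi(phi0, rho, r):
    """Azimuth of the hit at radius r for a circle through the origin.

    phi0 is the track direction at the origin and rho is the signed
    radius of curvature; the chord of length r makes an angle
    asin(r / (2 * rho)) with the tangent at the origin.
    """
    return phi0 + math.asin(r / (2.0 * rho))


def phi_differences(phi0, rho, radii=(4.0, 8.0, 12.0)):
    """Return (phi1 - phi3, phi2 - phi3) for three hypothetical layers."""
    p1, p2, p3 = (hit_phi(phi0, rho, r) for r in radii)
    return p1 - p3, p2 - p3


# Same curvature, two different initial directions.
print(phi_differences(0.3, 50.0))
print(phi_differences(1.7, 50.0))
```

Both calls give the same pair of differences, because the differences depend only on the curvature. This is why a single coincidence map, indexed by the angle differences relative to the layer 3 hit, can serve every layer 3 hit.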

[Figure: two plots of the coincidence map. One has axes from 0 to 100; the other has axes from 0 to 128 labeled (φ1-φ3)+64 and (φ2-φ3)+64.]

Tiny Triplet Finder: Reuse Coincidence Logic via Shifting Hit Patterns

[Figure: three detector layers labeled C1, C2 and C3.]

One set of coincidence logic is implemented.

For an arbitrary hit on C3, rotate, i.e., shift, the hit patterns for C1 and C2 to search for coincidence.

Tiny Triplet Finder for Circular Tracks

[Figure: block diagram. Two bit arrays, each followed by a shifter (with factors *R1/R3 and *R2/R3), feed bit-wise coincidence logic; the triplet map output goes to a decoder. An inset plot has axes from 0 to 128.]

1. Fill the C1 and C2 bit arrays. (n1 clock cycles)

2. Loop over C3 hits, shift bit arrays and check for coincidence. (n3 clock cycles)
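The two steps can be mimicked in software to show the idea. This is a minimal sketch, not the firmware: it assumes hits are binned in azimuth with the same number of bins on every layer (so the R1/R3 and R2/R3 scale factors of the shifters become 1), and it uses a hypothetical coincidence map of bin offsets (d1, d2) = (phi1 - phi3, phi2 - phi3). All names and values are illustrative.

```python
N_BINS = 128                 # azimuthal bins per layer (assumed)
MASK = (1 << N_BINS) - 1


def rotate_right(bits, shift):
    """Cyclically rotate an N_BINS-wide bit array right by `shift` bins."""
    shift %= N_BINS
    return ((bits >> shift) | (bits << (N_BINS - shift))) & MASK


def fill_bit_array(hit_bins):
    """Step 1: set one bit per hit (one hit per clock cycle in firmware)."""
    bits = 0
    for b in hit_bins:
        bits |= 1 << (b % N_BINS)
    return bits


def find_triplets(c1_hits, c2_hits, c3_hits, coincidence_map):
    """Step 2: for each C3 hit, shift the C1/C2 patterns and test the map."""
    c1 = fill_bit_array(c1_hits)
    c2 = fill_bit_array(c2_hits)
    triplets = []
    for b3 in c3_hits:
        r1 = rotate_right(c1, b3)   # bit k is now set if phi1 - phi3 == k
        r2 = rotate_right(c2, b3)
        # In firmware these tests are fixed AND gates evaluated in parallel.
        for d1, d2 in coincidence_map:
            if (r1 >> (d1 % N_BINS)) & 1 and (r2 >> (d2 % N_BINS)) & 1:
                triplets.append(((b3 + d1) % N_BINS, (b3 + d2) % N_BINS, b3))
    return triplets


# Hypothetical map for layer radii in the ratio 1:2:3 and small curvature.
example_map = [(-2 * k, -k) for k in range(-4, 5)]
found = find_triplets([10, 50, 100], [12, 49, 90], [14, 48, 70], example_map)
print(found)
```

The outer loop runs once per C3 hit, so the work grows linearly with the number of hits, while the coincidence map is implemented only once and reused after each shift.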

Readout Chip Issues

Data Rate of Readout Chips

• SLHC:
– 80 MHz.
– 4 hits/(1.28 cm)^2. (Hits or clusters? From BTeV, 1 cluster = 2.5 hits, so 4 hits = about 1.6 clusters.)
– 16 bits/hit.
– 3.125 Gb/cm^2/s; with 8b10b etc., 5 Gb/cm^2/s.

• FPIX2 readout chip at Fermilab BTeV:
– Area: 128 x 0.05 mm x 22 x 0.4 mm = 0.56 cm^2.
– Output: 6 x 140 Mb/s = 840 Mb/s.
– 1.5 Gb/cm^2/s.

• If we redesign FPIX:
– 8 x 320 Mb/s = 2.56 Gb/s.
– 4.57 Gb/cm^2/s.
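The rates quoted above follow from simple arithmetic, reproduced here as a check. The helper name is hypothetical; the input numbers are the ones on the slide.

```python
MM_PER_CM = 10.0


def gbps_per_cm2(bits_per_second, area_cm2):
    """Convert a data rate and a sensor area to Gb/cm^2/s."""
    return bits_per_second / area_cm2 / 1e9


# SLHC: 80 MHz crossings, 4 hits per (1.28 cm)^2 per crossing, 16 bits/hit.
slhc_area_cm2 = 1.28 ** 2
slhc_rate = gbps_per_cm2(80e6 * 4 * 16, slhc_area_cm2)
slhc_rate_8b10b = slhc_rate * 10 / 8   # 8b/10b line-coding overhead only

# FPIX2: 128 rows of 0.05 mm by 22 columns of 0.4 mm.
fpix2_area_cm2 = (128 * 0.05 / MM_PER_CM) * (22 * 0.4 / MM_PER_CM)
fpix2_rate = gbps_per_cm2(6 * 140e6, fpix2_area_cm2)
redesign_rate = gbps_per_cm2(8 * 320e6, fpix2_area_cm2)

print(f"SLHC:     {slhc_rate:.3f} Gb/cm^2/s ({slhc_rate_8b10b:.2f} with 8b/10b)")
print(f"FPIX2:    {fpix2_rate:.2f} Gb/cm^2/s over {fpix2_area_cm2:.4f} cm^2")
print(f"Redesign: {redesign_rate:.2f} Gb/cm^2/s")
```

8b/10b coding alone raises 3.125 to about 3.9 Gb/cm^2/s, so the quoted 5 Gb/cm^2/s includes the further, unspecified overhead covered by "etc.". With the unrounded chip area of 0.5632 cm^2 the redesign figure comes out near 4.55 rather than 4.57 Gb/cm^2/s.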

FPIX2 Readout Chip

[Figure: FPIX2 chip layout showing the core and the periphery.]

Pixel Size: 50 µm x 400 µm

Columns: 22

Rows: 128

Outputs: 1, 2, 4 or 6 Cu pairs @ 140 MHz

Core Organization

• Column-based architecture

• Three mutually-dependent parts:
– Core Logic
– End-of-column Logic
– Pixel Cells

• Readout order:
– Hit cell by hit cell in a column.
– Column by column.
– Not time ordered.

[Figure: four columns of four pixel cells, each column ending in End of Column Logic, all connected to the Core Logic.]

Pixel Cell

[Figure: pixel cell schematic. Labels: Sensor, Test, Inject, Kill, Vdda, Vref, Thresholds, Threshold, Flash Latch, Thermometer to Binary Encoder, ADC, Row Address, Command Interpreter (00 - idle, 01 - reset, 10 - output, 11 - listen), 4 pairs of Command Lines, HFastOR, RFastOR, Throttle, Read Clock, Read Reset, Token In, Token Reset, Token Out, Resets, Bus Controller.]

Output of FPIX2 for BTeV

[Figure: 24-bit output word, bits b23 down to b00. Hit word (24 bits): Row, Column, BCO(7:0), ADC, 1. Sync word (24 bits): Sync, Status, 000, 1. Some bit patterns are marked as invalid coding.]

• A hit is output using 24 bits, at 140 Mb/s per Cu pair.

• A user protocol is used as shown (not 8B/10B).

• The BCO field takes 8 bits. (+16 bits = 24 bits)

• To eliminate or reduce the number of bits taken by the BCO, the chip has to be redesigned to output time-ordered data. Doable or not? It is possible but not obvious now. FPIX1 was designed with time-ordered data, but was slow. Study is needed.
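To make the 24-bit budget concrete, here is a sketch of packing and unpacking a hit word. Only the 8-bit BCO field and the 24-bit total come from the slide. The other widths are assumptions chosen to fit: 7 bits for 128 rows, 5 bits for 22 columns, 3 bits for the ADC value and 1 marker bit, packed most significant first in the order the fields are listed. The function names are hypothetical and the real FPIX2 bit assignment may differ.

```python
# (field name, width in bits), most significant field first. ASSUMED layout.
FIELDS = (("row", 7), ("column", 5), ("bco", 8), ("adc", 3), ("mark", 1))


def pack_hit(row, column, bco, adc):
    """Pack one hit into a 24-bit word under the assumed field layout."""
    values = {"row": row, "column": column, "bco": bco, "adc": adc, "mark": 1}
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_hit(word):
    """Split a 24-bit word back into its fields (inverse of pack_hit)."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


word = pack_hit(row=100, column=13, bco=0xA5, adc=5)
print(f"{word:024b}", unpack_hit(word))
```

In this layout the BCO field is one third of every word, which is the cost that time-ordered output would remove or reduce.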

Trigger/DAQ System Model

A Model of Trigger/DAQ System

• The readout chips send hit data via copper links to the correlation logic module (CLM, ~Correlator/OptoTX [J. Jones]) just outside the detector.

• The CLM finds triplets and sends the initial angle and momentum of each triplet to L1.

• The L1 system issues trigger commands back.

• The readout chips send the full data of the selected BX to HLT/DAQ via the CLM.

[Figure: four readout chips connect to the correlation logic module over 10 m of Cu; the module connects to L1 and HLT/DAQ over 100 m of fiber. Triplet data go to L1, L1 trigger commands come back, and full data go to HLT/DAQ. A note asks: "Outside of steel?"]

Output of the Readout Chip

• Data volume from the readout chips is large. (Full rate: 3.125 Gb/cm^2/s.)

• Optionally, partial data can be sent to reduce the bandwidth (to about (1/5) * 3.125 Gb/cm^2/s), since the CLM needs only:
– coordinates at lower resolution (factor 1/2),
– for each hit cluster rather than each hit (factor 1/2.5).

• Study on readout chip re-design is needed.
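The factor of about 1/5 is the product of the two reductions. The short check below assumes that "lower resolution (1/2)" means half as many bits per item and that one word is sent per cluster of 2.5 hits; the function name is hypothetical.

```python
FULL_RATE = 3.125          # Gb/cm^2/s, full hit data
RESOLUTION_FACTOR = 0.5    # assumed: half as many bits per item
HITS_PER_CLUSTER = 2.5     # from BTeV: one cluster is about 2.5 hits


def partial_rate(full_rate, resolution_factor, hits_per_cluster):
    """Trigger-only data rate after both reductions, in Gb/cm^2/s."""
    return full_rate * resolution_factor / hits_per_cluster


trigger_rate = partial_rate(FULL_RATE, RESOLUTION_FACTOR, HITS_PER_CLUSTER)
print(f"{trigger_rate:.3f} Gb/cm^2/s, i.e. {trigger_rate / FULL_RATE:.2f} of full rate")
```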

[Figure: four readout chips linked to the correlation logic module by 10 m of Cu, with the 24-bit hit word format (Row, Column, BCO(7:0), ADC, 1; bits b23 to b00).]

L1 Trigger Commands

• The CLM finds triplets and sends the initial angle and momentum of each triplet to L1, and the L1 system issues multi-bit trigger commands back.

• The readout chips send the full data of the selected BX to HLT/DAQ via the CLM.

• The data volume of the selected BX is relatively small.

• Optionally, the correlation logic module can run one or a few longer algorithms (L1.5?) while the full data flow through. The HLT uses the results when making L2 decisions.

• So the multi-bit trigger command = {BX, L1.5 algorithm ID}: dump data in BX 1234 and apply algorithm ABCD.

[Figure: Trigger/DAQ model diagram. Readout chips, 10 m of Cu, correlation logic module ("Outside of steel?"), 100 m of fiber, L1 and HLT/DAQ, with triplet data, L1 trigger commands and full data paths.]

More Readout Chip Issues, Latency etc.

Latency Budget Usage for Triplet Finding Process

• CMS L1 decision time = 6.4 µs; 2 x 0.5 µs of it will be in cable delay.

• Filling the C1 and C2 bit arrays takes n1 clock cycles.

• Looping over C3 hits, shifting bit arrays and checking for coincidence take n3 clock cycles + # of pipeline stages (about 10).

• Assume n1 = n3 = 64: latency usage = 64 + 64 + 10 = 138 clock cycles.

• At a 160 MHz (FPGA or ASIC) clock frequency, 138 clock cycles = 138/(160*6.4) = 13% (of the 6.4 µs CMS L1 decision time). This is only an example, but looks OK.
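The latency estimate can be reproduced, and varied for other hit counts or clock rates, with a few lines. The function name is hypothetical; the inputs are the example values from the slide.

```python
def triplet_latency(n1, n3, pipeline_stages, clock_hz, l1_budget_s):
    """Return (clock cycles, seconds, fraction of the L1 budget)."""
    cycles = n1 + n3 + pipeline_stages
    seconds = cycles / clock_hz
    return cycles, seconds, seconds / l1_budget_s


cycles, seconds, fraction = triplet_latency(
    n1=64, n3=64, pipeline_stages=10, clock_hz=160e6, l1_budget_s=6.4e-6
)
print(f"{cycles} cycles = {seconds * 1e6:.4f} us = {fraction:.1%} of the budget")
```

The 138 cycles correspond to about 0.86 µs, or roughly 13% of the 6.4 µs decision time, on top of the 2 x 0.5 µs cable delay.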

A Closer View of Latency Budget

• The readout chips send out hit data to the Correlation Logic Module.

• The triplet finding starts after receiving data of the first hit.

• After all hits are transmitted, phase 2 of triplet finding (looping over C3 hits, shifting bit arrays and checking for coincidence) runs.

• Triplet data are sent out after the first triplet is found.

• After the cable delay, the L1 starts its processes upon receiving the first triplet data.

• After all triplet data are received, the L1 command is issued.

• The L1 command is sent back and executed.

[Figure: block diagram (readout chips, correlation logic module, L1, with triplet data and L1 trigger commands) above a timing chart with the stages Output Hits, Triplet Finding (1), Triplet Finding (2), Triplet Data Out, Cable (1), L1 Processes, L1 and Cable (2).]

Max # of Hits/BX

• 4 hits/(1.28 cm)^2/BX is an average; in some BX, the # of hits may be many times larger.

• The readout chip should drop some hits if the # of hits/BX is too big, or the time to output the hits will be too long. (Note the 6.4 µs L1 latency.)

• Consider 64, 128, 256 hits/(1.28 cm)^2/BX, i.e., x16, x32, x64 of the average: outputting the hits takes 16, 32, 64 BX on a link whose throughput matches the average data rate. The output times, 0.2 µs, 0.4 µs and 0.8 µs, should be OK.
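The output times above follow from the 80 MHz crossing rate (12.5 ns per BX) and a link sized for the average of 4 hits per BX. A small check, with hypothetical names:

```python
BX_RATE_HZ = 80e6        # SLHC bunch-crossing rate
AVG_HITS_PER_BX = 4      # link throughput matched to this average


def output_time(hits_in_bx):
    """Return (BX periods, seconds) needed to send out one busy crossing."""
    bx_needed = hits_in_bx / AVG_HITS_PER_BX
    return bx_needed, bx_needed / BX_RATE_HZ


for hits in (64, 128, 256):
    bx_needed, seconds = output_time(hits)
    print(f"{hits} hits: {bx_needed:.0f} BX = {seconds * 1e6:.1f} us")
```

Even the x64 case uses 0.8 µs of the 6.4 µs latency, which is why a cap on hits per BX in this range looks acceptable.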

[Figure: timing chart with the stages Output Hits, Triplet Finding (1), Triplet Finding (2), Triplet Data Out, Cable (1), L1 Processes, L1 and Cable (2).]

Readout Chip Spec

• Should read out time-ordered data.

• Should drop data gracefully if # of hits/BX is too big.

• Should drop data gracefully if # of hits/several BX (a short-term average) is too big.

• Should be able to output both brief data for trigger and full data for readout.

• Should store data on chip for 6.4 µs.

• ?

Extra Possibility: Triggering on Low PT Features

Triggering on “Low PT” Features?

• Many tracking algorithms degrade rapidly when the momentum of the track goes low.

• Circular track triplet finding does not need high PT assumption, so it does not degrade as rapidly.

• The trigger system discussed is especially suitable if one needs to trigger on low PT features of the event.

• In CMS 4T B field, all tracks look to be “low PT”.

Example: Finding “Soft Jets”

• A simulated event with 200 tracks.

• Flat distributions.

• Min. R = 55 cm

• 8+8 soft tracks are added.

• They are grouped in 2 small initial angle regions, i.e., 2 “soft jets”.

Can you see the “soft jets”?

Can you see the “soft jets” now?

Track Initial Angle Distributions

Summary

• With the experience from Fermilab BTeV on triplet finding, pattern recognition is not a problem. One should feel free to choose the preferred detector layout.

• Data output rates of the current readout chips are close enough to the SLHC requirement. But studies are needed.

• Triggering on “low PT” features is possible. But studies are needed.

The End

Thanks

[Figure: two plots, one with axes from 0 to 70 and one with axes from -64 to 64, with curves labeled x3 = +40 cm, x3 = -40 cm and x3 = 0.]