
Page 1: Overview of Upgrade DAQ

Overview of Upgrade DAQ

Niko Neufeld, CERN

1st Upgrade DAQ Mini-workshop, May 27th 2013, CERN

Page 2: Overview of Upgrade DAQ


Introduction – the task

Assuming there are 11448 GBT links for the DAQ to read out (true in a universe containing a pixel VeLo, a reduced OT and a SciFi), and rounding up to 12000 links, the total amount of data is 12000 x 3.2 Gbit/s = 38400 Gbit/s = 4.8 TB/s.

GBT wide-mode is disregarded for this talk. These data need to be collected and filtered. The proposed system should be reliable, operable for 10 years by a small crew, scalable and as cost-effective as possible.
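A minimal back-of-the-envelope check of these numbers (the rounding from 11448 to 12000 links is the slide's own):

```cpp
#include <cstdio>

int main() {
    // ~11448 GBT links, rounded up to 12000, at 3.2 Gbit/s user bandwidth each.
    const double links = 12000;
    const double gbit_per_link = 3.2;
    const double total_gbit = links * gbit_per_link;      // 38400 Gbit/s
    const double total_tb   = total_gbit / 8.0 / 1000.0;  // 4.8 TB/s
    std::printf("%.0f Gbit/s == %.1f TB/s\n", total_gbit, total_tb);
    return 0;
}
```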

Page 3: Overview of Upgrade DAQ


Some constraints

Input links are bundled in units of 24. This is not a magic number, but a good match for current state-of-the-art FPGA technology.
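What the 24-link bundling implies for the number of readout boards (a sketch; the board count is not stated on the slide):

```cpp
#include <cstdio>

int main() {
    // Bundling the 11448 DAQ links in units of 24 per FPGA-based readout board.
    const int links = 11448;
    const int links_per_board = 24;
    const int boards = (links + links_per_board - 1) / links_per_board; // round up
    std::printf("%d boards\n", boards); // 477
    return 0;
}
```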

With foreseeable technology the required system (network + filter farm) does not fit into the existing space, cooling and power available.

CERN has agreed to provide a new dedicated modular data-centre. This has the additional advantage that, in principle, installation (where it makes economic sense) can start before LS2.

Also, the upgraded LHCb events will be small – MEP (multi-event packet) packing will be needed more than ever (the packing factor will increase).

Page 4: Overview of Upgrade DAQ


The topic of this talk:

The technologies and technology trends which we can use

I will limit myself here to the baseline: a farm of x86 servers interconnected with an industry-standard local area network. That is, I do not consider the potential effect of micro-servers (Atom, ARM), GPUs or other co-processors. This does not mean that we exclude their use, but the study of their implications on the DAQ architecture requires dedicated effort and needs (first) to be warranted by demonstrated potential gains (also, we need topics for further workshops).

Which architectures are possible with these technologies and their relative merits

Page 5: Overview of Upgrade DAQ


The good things about our current DAQ – What we want to keep

Simple single-stage data-flow

Smallest possible number of hardware components

It is not easy to imagine taking something away and replacing it with software and/or more work in other elements of the system (a kind of "Nash equilibrium").

Scalability

Page 6: Overview of Upgrade DAQ


What we probably need to change in the Upgrade DAQ

Currently we buffer mostly in the network. It looks like buffering in the servers will be significantly cheaper, and large buffers leave more options for the event-building protocols used.

Currently the packing factor in MEP is determined by the Ethernet MTU (9000 bytes)

With hardware assists in network ASICs the MTU becomes less relevant; however, reducing the message rate becomes even more important at 10, 20 or 30 MHz.
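To see why the packing factor matters at these rates, a minimal sketch of the per-source message rate as a function of the MEP packing factor (packing values are illustrative):

```cpp
#include <cstdio>

int main() {
    // Per-source message rate = trigger rate / MEP packing factor.
    const double trigger_rate_mhz = 30.0; // upper end of the 10/20/30 MHz range
    const int packings[] = {1, 10, 100, 1000};
    for (int packing : packings) {
        std::printf("packing factor %4d -> %8.3f MHz of messages per source\n",
                    packing, trigger_rate_mhz / packing);
    }
    return 0;
}
```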

Page 7: Overview of Upgrade DAQ

Technology

Links, PCs and networks

Page 8: Overview of Upgrade DAQ


The evolution of PCs - 1

PCs used to be relatively modest I/O performers (compared to FPGAs); this has radically changed with PCIe Gen3. The Xeon processor line now has 40 PCIe Gen3 lanes per socket, so a dual-socket system has a theoretical throughput of 1.2 Tbit/s(!).

Tests suggest that we can get quite close to the theoretical limit (using RDMA at least)

This is driven by the need for fast interfaces for co-processors (GPGPUs, Xeon Phi). The CPU will be the bottleneck in the server, not the LAN interconnect – 10 Gb/s is by far sufficient.
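A rough cross-check of the 1.2 Tbit/s figure, assuming it counts both PCIe directions (my assumption; the slide does not say):

```cpp
#include <cstdio>

int main() {
    // PCIe Gen3: 8 GT/s per lane with 128b/130b encoding.
    const double lanes = 40 * 2;                       // 40 lanes/socket, 2 sockets
    const double gbit_per_lane = 8.0 * 128.0 / 130.0;  // ~7.88 Gbit/s payload
    const double one_way = lanes * gbit_per_lane;      // ~630 Gbit/s
    std::printf("%.0f Gbit/s per direction, %.2f Tbit/s both directions\n",
                one_way, 2.0 * one_way / 1000.0);
    return 0;
}
```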

Page 9: Overview of Upgrade DAQ


The evolution of PCs - 2

More integration (Intel roadmap): integrate all important parts of the server into the CPU:

memory controller, PCIe controller, NIC (2 – 3 years?), physical LAN interface (4 – 5 years?)

Aim for high density, because of: short distances (!), efficient cooling, and less real estate needed (rack space).

Page 10: Overview of Upgrade DAQ


Ethernet

The champion of all classes in networking: 10, 40 and 100 Gbit/s are out, and 400 Gbit/s is in preparation. As speed goes up, so does power dissipation; high-density switches come with high integration. The market seems to separate more and more into two camps:

carrier-class: deep buffers, high density, flexible firmware (FPGA, network processors) (Cisco, Juniper, Brocade, Huawei), $$$/port

data-centre: shallow buffers, ASIC-based, ultra-high density, focused on layer 2 and simple layer 3 features, very low latency, $/port (these are often also called Top Of Rack, TOR)

These trends get more pronounced as speed goes up, i.e. from 10 Gb/s to 40 Gb/s.

Page 11: Overview of Upgrade DAQ


The "two" Ethernets

Speed   | Core [USD/port] | TOR [USD/port]
10 Gb/s | 400 – 1000      | 200 – 250
40 Gb/s | 1000 – 4000     | 500 – 900

If possible, use fewer core and more TOR ports; buffering then needs to be done elsewhere.

Prices exclude optics.
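To make the "fewer core, more TOR ports" point concrete, a toy comparison using midpoints of the 10 Gb/s prices above (illustrative numbers only; optics excluded; the 1600-port count is taken from the fat-core design quoted later in this talk):

```cpp
#include <cstdio>

int main() {
    // Midpoints of the slide's 10 Gb/s list prices, USD per port.
    const double core_usd = 700.0; // midpoint of 400 - 1000
    const double tor_usd  = 225.0; // midpoint of 200 - 250
    const int ports = 1600;        // core port count of a classical fat-core design
    std::printf("all-core: %.0f kUSD, all-TOR: %.0f kUSD\n",
                ports * core_usd / 1000.0, ports * tor_usd / 1000.0);
    return 0;
}
```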

Page 12: Overview of Upgrade DAQ


InfiniBand

Driven by a relatively small, agile company: Mellanox. Essentially HPC plus some DB and storage applications. Only competitor: Intel (ex-QLogic). Extremely cost-effective in terms of Gbit/s per $. An open standard, but almost single-vendor – unlikely for a small startup to enter. The software stack (OFED, including RDMA) is also supported by Ethernet (NIC) vendors.

Many recent Mellanox products (as of FDR) are compatible with Ethernet.

Page 13: Overview of Upgrade DAQ


InfiniBand prices

Speed   | Core [USD/port] | TOR [USD/port]
52 Gb/s | 850             | 250

For comparison, Ethernet:

Speed   | Core [USD/port] | TOR [USD/port]
10 Gb/s | 400 – 1000      | 200 – 250
40 Gb/s | 1200 – 4000     | 500 – 900

Worth some R&D at least.

Page 14: Overview of Upgrade DAQ


The devil’s advocate: why not InfiniBand *?

InfiniBand is (almost) a single-vendor technology.

Can the InfiniBand flow control cope with the very bursty traffic typical of a DAQ system, while preserving high bandwidth utilization?

InfiniBand is mostly used in HPC, where hardware is regularly and radically refreshed (unlike IT systems like ours, which are refreshed and grown over time in an evolutionary way). Is there not a big risk that the HPC guys jump onto a new technology and InfiniBand is gone?

The InfiniBand software stack seems awfully complicated compared to the Berkeley socket API (see the sketch below). What about tools, documentation and tutorial material to train engineers, students, etc.?


* Enter the name of your favorite non-favorite here!
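For reference, the Berkeley socket API that the last question compares against really is compact; a minimal blocking TCP sender (host and port are placeholders):

```cpp
// Minimal Berkeley-socket TCP client: connect, send one buffer, close.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(4242);                      // placeholder port
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);  // placeholder host
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0) {
        std::perror("connect");
        return 1;
    }
    const char msg[] = "event fragment";
    send(fd, msg, sizeof msg, 0);
    close(fd);
    return 0;
}
```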

Page 15: Overview of Upgrade DAQ


The evolution of NICs

[Chart: NIC speed evolution, 2008 – 2017, in Gbit/s. Ethernet: 10 → 40 → 100; InfiniBand x4: 32 → 54 → 100; PCIe x8: 32 → 64 → 128. Milestones: PCIe Gen3 available; Mellanox FDR and Mellanox 40 GbE NIC; Chelsio T5 (40 GbE) and Intel 40 GbE expected; EDR (100 Gb/s) HCA expected; 100 GbE NIC expected; PCIe Gen4 expected.]

Page 16: Overview of Upgrade DAQ


The evolution of lane-speed

All modern interconnects (> 10 Gb/s) are multiple-serial. Another aspect of "Moore's law" is the increase of serialiser speed: higher speed reduces the number of lanes (fibres). Cheaper interconnects also require the availability of cheap optics (VCSEL, silicon photonics). VCSEL currently runs better over MMF (OM3, OM4 for 10 Gb/s and above); per metre these fibres are more expensive than SMF. The current lane-speed is 10 Gb/s (the same serialiser generation as 8 Gb/s and 14 Gb/s). The next lane-speed (coming soon and already available on high-end FPGAs) is 25 Gb/s (same generation as 16 Gb/s).

25 Gb/s should be safely established by 2017 (a hint for a GBT v3?).

It is widely believed that the volume of 100 Gb/s deployments and beyond will at least use 25 Gb/s (not 10 Gb/s) – for example InfiniBand EDR (this year!). PCIe may soon be the only volume market for 10 Gb/s VCSELs.
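The lane-count arithmetic behind "higher speed reduces the number of fibres", for the standard link speeds mentioned in this talk (a sketch):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Lanes (fibre pairs) needed = ceil(aggregate link speed / per-lane speed).
    const double totals[] = {40.0, 100.0};  // Gb/s aggregate link speeds
    const double speeds[] = {10.0, 25.0};   // Gb/s per serialiser lane
    for (double total : totals)
        for (double lane : speeds)
            std::printf("%3.0f Gb/s at %2.0f Gb/s per lane -> %.0f lanes\n",
                        total, lane, std::ceil(total / lane));
    return 0; // e.g. 100 Gb/s: 10 lanes (SR10) at 10 Gb/s, 4 lanes (SR4) at 25 Gb/s
}
```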

Page 17: Overview of Upgrade DAQ


The evolution of switches – BBOT (Big Beasts Out There)

Brocade MLX: 768 x 10 GigE

Juniper QFabric: up to 6144 x 10 GigE (not a single-chassis solution)

Mellanox SX6536: 648 x 56 Gb (IB) / 40 GbE ports

Huawei CE12800: 288 x 40 GbE / 1152 x 10 GbE


Page 18: Overview of Upgrade DAQ


Inside the beasts

Many ports does not mean that the interiors are similar. Some are a high-density combination of TOR elements (Mellanox, Juniper QFabric).

Usually priced just so that you can't do it cheaper by cabling up the pizza boxes yourself.

Some are classical fat crossbars (Brocade). Some are in between (Huawei: Clos, but with lots of buffering). Upshot: suitable devices exist for both InfiniBand and Ethernet; cost will decide.

Page 19: Overview of Upgrade DAQ


The eternal copper

Surprisingly (for some), copper cables remain the cheapest option, iff distances are short. A copper interconnect is even planned for InfiniBand EDR (100 Gbit/s). For 10 Gigabit Ethernet the verdict is not yet in, but I am convinced that we will have 10GBaseT on the main-board "for free".

It is not yet clear if there will be (a need for) truly high-density 10 GBaseT line-cards (100 ports+)

Page 20: Overview of Upgrade DAQ


Keep distances short

Multi-lane optics (Ethernet SR4, SR10, InfiniBand QDR) over multi-mode fibres are limited to 100 (OM3) to 150 (OM4) metres. Cable assemblies ("direct-attach" cables) are either:

passive ("copper", "twinax"): very cheap and rather short (max. 4 to 5 m), or

active: still cheaper than discrete optics, but as they use the same components internally they have similar range limitations

For comparison, the price ratio of a 40G QSFP+ copper cable assembly : 40G QSFP+ active cable : 2 x QSFP+ SR4 optics + fibre (30 m) is 1 : 8 : 10. On-chip transceivers (silicon photonics) are expected to be limited to about 70 m (on MMF).

Page 21: Overview of Upgrade DAQ

Architecture

Make the best of available technologies

Page 22: Overview of Upgrade DAQ


Principles

Minimize the number of expensive "core" network ports. Use the most efficient technology for a given connection:

different technologies should be able to co-exist (e.g. fast for event-building, slow for end-nodes). Try to stay open on interconnect technology. Keep distances short.

Exploit economies of scale: try to do what everybody does (but smarter).

Page 23: Overview of Upgrade DAQ


Complete read-out on/to the surface

[Diagram: detector → readout units → DAQ network → compute units, with 100 m of rock between the underground detector and the surface.]

The long distance is covered by Versatile Links (GBT), with a maximum of 303 m from the FE on the detector to the AMC40 input. Measurements with a prototype FE and AMC40 show that this works well with excellent margins. Can use the cheapest commercial links. Easiest operation (accessibility).

Page 24: Overview of Upgrade DAQ

Split read-out system

[Diagram: detector and event-building / HLT0 underground, compute units on the surface, with the DAQ network spanning the 100 m of rock in between.]

In this scenario the event-building is done in UX85A. Part of the HLT can also be run there.

Pros:

fewer links to the surface

Cons: need to completely refurbish D1 and D2. The remaining links to the surface will be very expensive – adding bandwidth (scaling!) becomes expensive.

Since the bulk of the cost is in the connectors, break-points and testing of the fibres, and not in the material itself, the cost savings due to the reduced number of links are not as big as might be hoped.

Page 25: Overview of Upgrade DAQ


Protocols & topologies

Pull, push and barrel-shifting can of course be done on any topology (more in Daniel & Guoming's talk). It is not an ideological discussion between "pushing" and "pulling" – after all, the data always flow from the detector to the farm; the question is which management of the flow is the most efficient. It is rather the switch hardware (in particular the amount of buffer space) and the properties of the low-level (layer-2) protocol that will make the difference.
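As a concrete illustration of one of these flow-management schemes, a minimal sketch of barrel-shifter scheduling: in time slot t, source s sends its fragment to destination (s + t) mod N, so no destination ever receives from two sources at once (the size N is illustrative):

```cpp
#include <cstdio>

int main() {
    // Barrel-shifter schedule: in slot t, source s -> destination (s + t) % N.
    const int N = 4; // illustrative number of sources/destinations
    for (int t = 0; t < N; ++t) {
        std::printf("slot %d:", t);
        for (int s = 0; s < N; ++s)
            std::printf("  src%d->dst%d", s, (s + t) % N);
        std::printf("\n");
    }
    return 0;
}
```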

Page 26: Overview of Upgrade DAQ


Classical fat-core event-builder

About 1600 core ports

Page 27: Overview of Upgrade DAQ


Uniform event-builder network

[Diagram: uniform event-builder network, with 10GBaseT links on one side and 100 Gb/s bi-directional links on the other; these two are not necessarily the same technology.]

Page 28: Overview of Upgrade DAQ


Uniform network

Advantages:

Fewer overall network ports. Technology independence of the event-building network. Allows a simple push architecture with minimal buffering in the AMC40. Allows the most light-weight event building, based on remote DMA.

Challenges:

Lots of I/O in the event-building servers. Need to find the most efficient way to couple the AMC40 to a PC: Ethernet, InfiniBand or PCIe. Need careful task organization on the event-building PCs (task priorities, IRQ handling, etc.).

Page 29: Overview of Upgrade DAQ


Planning

Under the assumption that we need to be ready to take data in 2019. If LS2 shifts, so do we. A system for sub-detector (SD) tests will be ready whenever needed, at minimal investment.

2013: DAQPIPE ready; tests of 40 GigE and IB, PC I/O technology, PCIe performance (see the talks by Umberto and Beat)

2013 – 16: technology following (PC and network)

2014 – 15: large-scale IB and Ethernet tests

2016: TDR & tender preparations

2017: acquisition of a minimal system able to read out every GBT (at some speed); acquisition of the modular data-centre

2018: acquisition and commissioning of the full system, starting with the network; farm as needed


Page 30: Overview of Upgrade DAQ


Conclusions

Technology is on our side. Aim for a simple data-flow (push wherever possible), with as few components as possible. Stay flexible: decide on the core network technology as late as possible. A uniform network currently looks like the best architecture to achieve the above goals.