
CERN Technology Tracking for LHC

PASTA 2002 Edition

Working Group G

Computer Architectures
System Interconnects
Scalable Clusters

Arie Van Praag, CERN IT/ADC

30th September 2002

PASTA 2002 WORKING GROUP G

Introduction

With the rise of clock frequencies from the first Pentium processors at 60 MHz up to 3 GHz in the Pentium 4, PCs now have an excess of power for most daily tasks. For normal office work and home applications, a 300 MHz Pentium was adequate. This brings PC development to a crossroads that changes the orientation of mass production. A split will occur between standard desktops, high-end PCs and Intel-oriented workstations. An increasing part of the market will also be portables and, soon, the so-called “tablets”. Besides the Intel Pentium, other manufacturers produce equivalent processor chips, such as the AMD Athlon and the low-power Crusoe. They all have different architectures that may show different efficiency when running LINUX.

Networking and interconnects are also moving up by an order of magnitude, into the bandwidth range from 10 Gbit/s to 40 Gbit/s. I/O and storage connections will profit from new standards. Network storage has become more reliable and allows bandwidth that cannot be reached with classical file servers.

System Interconnects and I/O facilities

2.1 Busses

The bus that delivers the highest performance is still the SGI XIO bus. Sun’s UPA and its built-in crossbar switches seem to have disappeared, but a large part of their design objectives have been realized in what is now Infiniband. Firewire is unchanged and not very widespread, and USB is now in its second version, increasing the bandwidth to 60 MByte/s. The PCI bus standard is extending rapidly with PCI-X, PCI-X 2 and the serial PCI Express.

2.1.1 IDE developments

Changes in the area of disk connections are mainly in the physical connection. The parallel ATA flat cable will be replaced by a new standard called “Serial-ATA”. The physical connection takes the form of a coaxial cable pair completed with power wires, and multiple disks are connected in a star pattern.

The standard defines three classes:

1500 = 150 MByte/s
3000 = 300 MByte/s
6000 = 600 MByte/s

As the difference is purely in the hardware, no new drivers are needed. Serial-ATA disks and interfaces have been demonstrated, and considerable activity is under way among disk manufacturers and in the silicon industry.
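As an illustration, the class numbers correspond to the line rate in Mbit/s; assuming the 8b/10b encoding used by serial links of this generation (an assumption not stated in the text above, but consistent with the class figures), the payload bandwidths follow directly:

```python
# Sketch: relating the Serial-ATA class numbers to payload bandwidth.
# Assumption: the class number is the line rate in Mbit/s and the link
# uses 8b/10b encoding, so 10 bits on the wire carry 8 bits of data.

def sata_payload_mbyte_per_s(line_rate_mbit: int) -> int:
    """Payload bandwidth in MByte/s for a given line rate in Mbit/s."""
    data_bits_per_s = line_rate_mbit * 8 / 10   # strip 8b/10b overhead
    return int(data_bits_per_s / 8)             # bits -> bytes

for cls in (1500, 3000, 6000):
    print(cls, "->", sata_payload_mbyte_per_s(cls), "MByte/s")
```

This reproduces the 150, 300 and 600 MByte/s figures of the three classes.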

PCI interfaces for Serial-ATA, with or without RAID capability, have been announced, with the first products to be delivered in Q2 2002.

Most probably the fastest classes will first be introduced in large server machines. The lower two classes may appear in 1500-class systems soon, as the thinner cables take away serious cooling problems. The serial cable is also meant to be more robust than the parallel flat cables now in use.

2.1.2 SCSI developments 2002


SCSI is currently at 160 MByte/s, with 320 MByte/s announced. A standard for Serial-SCSI is under discussion that adapts the physical layer of Serial-ATA class 3, giving a bandwidth of 600 MByte/s for SCSI connections. No products have been announced so far; however, the same prognosis may be valid as for Serial-ATA.

2.1.3 iSCSI

During the October 2000 workshop at CERN, Julian Satran from IBM Haifa presented his project for networked storage called iSCSI. Standardization of this TCP/IP extension of SCSI is handled by the IETF, and it is now an accepted standard.

iSCSI has been very well received, and a number of products are available, including iSCSI to Fibre Channel converters for connections to NAS and SAN storage, as well as PCI interfaces. The latter use PCI 64/66 and a TCP/IP offload engine on the interface.

This kind of interface brings network traffic preparation back to a DMA transfer and, as such, offloads the processor and memory. Using this kind of interface for TCP/IP or iSCSI in disk servers can substantially increase I/O bandwidth. However, network latency will at best be slightly shorter. iSCSI has a bright future in large low-end clusters and in database applications.

2.1.4 USB

USB-1 is now a commodity serial I/O bus with a long list of commercial products available.

New is USB-2, announced with some specification corrections and with a bandwidth increase up to 60 MByte/s, while remaining compatible with USB-1. Intel and a number of other chipmakers have adopted USB-2 and incorporated it on many motherboards. Peripherals are also starting to become commercially available.

2.1.5 IEEE 1394b

Introduced by Apple in 1993 as Firewire, the IEEE 1394b standard for high-speed serial I/O is not very widespread. It has been adopted by Apple and can be found on a number of digital video cameras.

2.1.6 PCI developments 2002

PCI 64/66 can be found mostly on server systems and on a limited number of small systems. Implementations of the PCI-X standard from 1999 can be found on some newer server systems, but introduction is slow.

The PCI-X 2 standard is adopted with a maximum clock of:

266 MHz with a theoretical bandwidth of 2.13 GByte/s
533 MHz with a theoretical bandwidth of 4.26 GByte/s

The standard supports DDR and QDR technology.

As with all PCI figures, these include roughly 25 % bus overhead.
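A quick sketch of the arithmetic behind these figures, reading the theoretical numbers as a 64-bit bus (8 bytes per transfer) at the quoted clock, and the ~25 % overhead as reducing the usable payload (one plausible reading of the statement above, not a specification value):

```python
# Sketch: PCI-X 2 theoretical bandwidth, and an estimate of the usable
# payload if ~25 % of the bus capacity is overhead. Assumption: a 64-bit
# bus moves 8 bytes per effective clock.

def pcix2_bandwidth_gbyte(clock_mhz: int, overhead: float = 0.25):
    theoretical = clock_mhz * 1e6 * 8 / 1e9      # GByte/s on the bus
    effective = theoretical * (1 - overhead)     # payload estimate
    return round(theoretical, 2), round(effective, 2)

print(pcix2_bandwidth_gbyte(266))  # theoretical matches the 2.13 GByte/s figure
print(pcix2_bandwidth_gbyte(533))  # theoretical matches the 4.26 GByte/s figure
```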

PCI Express is new: it is a new name for 3GIO, or project Arapahoe (see NGIO). Again, the first applications will be in high-end servers, moving down to smaller systems later. It is questionable whether the highest data rates of PCI-X 2 will ever be found in low-level workstations, as the resulting bandwidth also depends on processor and memory. PCI Express is an internal serial bus, not an interconnect; it has to move the market away from the massively installed and popular PCI standard to be successful.


2.2 New Interconnects

2.2.1 NGIO and Future I/O

NGIO and Future I/O did not make it to a standard but merged into INFINIBAND (see 2.3).

Intel continued the work on NGIO under the project name Arapahoe, resulting in an industrial proposal, 3GIO: a symmetrical bi-directional bus with 2.5 Gbit/s bandwidth and 32- and 64-bit address space. Its implementation is at the chipset level, making it an internal I/O bus. Intel proposes 3GIO to become PCI Express.

This may be a political move by Intel to impose 3GIO against the superior HyperTransport from AMD (see also PCI developments, 2.1.6). (PCI-SIG, the PCI governing organization, was founded by Intel and still depends largely on its financial backing.)

2.2.2 HyperTransport

HyperTransport is proposed by AMD as the future internal bus for its Intel-equivalent processors. It is a 12.8 GByte/s asymmetrical bi-directional bus with a 64-bit address space. The physical connections, called “tunnels”, are 8 or 16 bits wide. The chipset includes a small routing crossbar switch.

The specification foresees a connection to the outside that can be used either to connect peripherals or to link up to 4 or 8 systems into a small cluster with cross-linked memory access. Several independent chipset makers have announced that they support HyperTransport (VIA press release, 1-7-2002).

HyperTransport is theoretically superior to 3GIO from Intel but market acceptance will decide its future.

2.2.3 I2O

This serial bus developed into several industrial buses with applications in control and robotics technology (Fieldbus, CANbus, Profibus, etc.). It is a relatively slow interconnect that may find its place for slow control in experiments.

USB 1 and 2 and IEEE 1394b are, however, all children of the Philips I2O bus.

2.3 LANs as System interconnects

The example of GSN, demonstrating a reliable network in the 10 Gbit/s range, generated several similar interconnects. INFINIBAND copied many details from the GSN standard and has three bandwidth options: 2.5 Gbit/s, 10 Gbit/s and 30 Gbit/s.

The Ethernet community introduced a 10 Gbit/s version as 10GE, and is already working on a future 40 Gbit/s.

In the Ethernet type of connection a second reversal took place: the relatively slow processors could not keep up with ever-faster networks. With multi-GHz processors this is no longer the case; a new reversal will come with 10 Gbit Ethernet. The tendency to move network handling onto the interface will only shift the latency problem to the network connection, not take it away. However, processor and memory efficiency will increase.

Tremendous developments have been seen in wireless networking, with at least three different standards commercially available.

2.3.1 GSN


GSN has developed into a serious network standard and is used in a number of computer centers in the USA and in Europe. It is the only “secure network” (to be understood as “guaranteed data integrity”) in this bandwidth range, and it is the network with the lowest latency. Physical connections are available as copper cable and as parallel fiber cable.

Bridges with conversion to Gigabit Ethernet, Fibre Channel, HIPPI and SONET OC48/SDH16 exist. GSN has its own low-latency protocol, Scheduled Transfer (ST), based on network DMA, but it is also fully TCP/IP capable. A SCSI extension over ST (SST) makes storage access over the network possible (SAN, NAS, etc.).

GSN has been demonstrated, on systems with high-bandwidth memory and high-bandwidth I/O, to reach almost wire speed (780 MByte/s) with processor use under 5 %. GSN is deployed at more than 40 sites, mainly in the USA and some in Europe.

GSN is successfully applied in high-end systems, but low-end servers and workstations currently lack the I/O bandwidth to use the full network bandwidth. PCI 64/66 interfaces have shown that 400 MByte/s is obtainable.

2.3.2 FC, HIPPI, Myrinet

Fibre Channel (FC) has crystallized as dominant in one field: fast hard disks for SAN technology. Today’s standard is 1 Gbit/s, and 2 Gbit/s is also operational. 10 Gbit/s is announced, but given the difficulty of reliably implementing the different protocols at this speed, it will not be around soon.

Even though HIPPI has not seen new technological developments since Serial-HIPPI, it is still a strong contender with constant commercial activity. Myrinet is still a relatively cheap interconnect that has proven reliable up to 1 Gbit/s; versions at 2 Gbit/s and 10 Gbit/s are announced. For cluster computing, Myrinet seems, according to several reports, to have proven communication and latency advantages in combination with MPI.

2.3.3 INFINIBAND

Infiniband is a combination of Fibre Channel and PCI at one end, together with GSN, RIO and some IBM technology. The definition is “a network-capable interconnect”. It has coaxial copper connections for short distances and parallel fiber connections for longer distances. The basic speed is 2.5 Gbit/s, which can be multiplied by 4 and by 12; striping is sequential over the parallel connections. Networking is done with crossbar switches.
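The multiplied link widths can be sketched as follows. The 8b/10b encoding factor is an assumption added here (it is a property of the Infiniband physical layer, not stated in the text above); the 2.5 Gbit/s base rate and the 4x/12x multipliers come from the paragraph above:

```python
# Sketch: INFINIBAND link bandwidth for the 1x, 4x and 12x widths.
# Assumption: 2.5 Gbit/s is the signalling rate per lane, and 8b/10b
# encoding means each lane carries 2.0 Gbit/s of data.

BASE_SIGNAL_GBIT = 2.5

def ib_link(width: int):
    """Return (signalling, data) rates in Gbit/s for a link width."""
    signal = BASE_SIGNAL_GBIT * width
    data = signal * 8 / 10
    return signal, data

for w in (1, 4, 12):
    s, d = ib_link(w)
    print(f"{w}x: {s:g} Gbit/s signalling, {d:g} Gbit/s data")
```

The 1x, 4x and 12x signalling rates reproduce the 2.5, 10 and 30 Gbit/s options mentioned above.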

Infiniband has many proprietary protocols but also supports TCP/IP. One of these protocols is foreseen for system management.

RDMA is a low-latency protocol much the same as ST, also based on network DMA. The fact that every INFINIBAND node can be its own network controller makes maintenance difficult. Intel stopped chip design for INFINIBAND and, in the same move, put forward its own PCI Express as a system interconnect (see also NGIO).

INFINIBAND will survive Intel’s move away from producing specialized silicon, as a very large number of firms adhere to the interest group. But INFINIBAND is a complex system with many protocols, which means it will likely be successful in one or more key applications such as blade servers (as Fibre Channel is in storage) but lose against more widely accepted networking standards elsewhere.


2.3.4 Advances in Ethernet

As PC clock frequencies have gone up to over 2 GHz, the bottleneck of packet building and IP stack control for Gigabit Ethernet has almost disappeared. Several companies have announced network cards with IP stacks as part of the interface. However, while this offloads the processor, it shrinks the transfer latency only marginally.

The 10 Gigabit Ethernet (10GE) standard was accepted on 17 June 2002. The physical connection has many different options and is divided into two classes.

The first uses 4 streams at 2.5 Gbit/s with Coarse Wavelength Division Multiplexing (CWDM), a technology that is not applicable for longer distances and is very difficult to make stable. The second is a set of single-fiber, single-stream connections with different wavelengths, using multimode or single-mode fiber. Distances of up to 80 km are announced.

Part of the standard is the encapsulation of 1500-byte Ethernet frames in 64 KByte SONET/SDH packets in POS mode, making long-distance transport of Ethernet frames over standard communication channels possible. To avoid conflicts when using this feature, Jumbo frames cannot be allowed. After adapting communication hardware, Ethernet frames will move directly over the WAN, with a tenfold reduction of (service) cost.

The 10 Gigabit Ethernet standard is made in such a way that it can easily be adapted to higher speeds. Work on 40 Gigabit Ethernet has started; the introduction of 40GE will mainly be decided by the price-performance relationship of the necessary optical components.

10GE will become successful in the coming years as a backbone interconnect and as a high-end system interconnect. The compatibility with SONET/SDH will make it the ideal WAN interconnect. However, the small Ethernet frames will be a constant source of latency, especially as Jumbo frames have been refused as an official standard. Again, network offload engines may help processor efficiency, but they displace the network latency to the interface.
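The frame-size point can be made concrete with a small calculation of serialization time, i.e. how long one frame occupies the wire. The 9000-byte jumbo size is the commonly used (unofficial) value, assumed here for comparison:

```python
# Sketch: serialization time of one Ethernet frame on the wire, for the
# standard 1500-byte frame and an (unofficial) 9000-byte jumbo frame.
# Smaller frames mean more frames, and more per-frame processing, for
# the same volume of data.

def serialization_us(frame_bytes: int, link_gbit: float) -> float:
    """Time in microseconds to put one frame on a link of given speed."""
    return frame_bytes * 8 / (link_gbit * 1e9) * 1e6

for gbit in (1, 10):
    for frame in (1500, 9000):
        t = serialization_us(frame, gbit)
        print(f"{frame} B at {gbit} Gbit/s: {t:.1f} us")
```

At 10 Gbit/s a 1500-byte frame takes only 1.2 µs on the wire, so per-frame handling overheads in host and interface, rather than the wire itself, dominate the latency budget.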

2.3.5 Wireless LAN’s

The last three years have seen the introduction and commercialization of 802.11b wireless Ethernet, with a theoretical bandwidth of 11 Mbit/s. It uses the free 2.4 GHz band, and interference with other users of this frequency is sometimes a problem. Early security problems are being corrected over time, as encoding algorithms improve.

More recently the wireless Ethernet standard was followed by 802.11a, with a bandwidth of up to 54 Mbit/s, using a frequency band at 5 GHz and implementing the latest 128-bit security algorithms. Both 802.11 variants should be compatible except for speed.

The Bluetooth standard is extending rapidly. It uses the 2.4 GHz band and copies many of the characteristics of 802.11b, but it is also able to use other protocols, for printer connections etc. Its range is limited to about 5 m to 10 m due to the low transmission power.

All three of these systems may be very useful in office applications and may avoid pulling new network cables. However, in places such as a computer center or HEP experiments, the high level of electromagnetic noise may affect reliable operation.

2.4 Middleware

2.4.1 VIA

VIA has not succeeded in imposing itself. However, Microsoft is still working on a version that merges the VIA ideas with ST technology. (Microsoft hired Jim Pinckerton from SGI.)

2.4.2 PVM


2.4.3 MOSIX

MOSIX is software that can transform a cluster of PCs (workstations and servers) to run almost like an SMP. MOSIX can support configurations with large numbers of computers, with minimal scaling overheads to impair performance. A low-end configuration may include several PCs connected by Ethernet, while a larger configuration may include workstations and servers connected by a higher-speed LAN, e.g. Fast Ethernet.

Even if the Hebrew University of Jerusalem puts real effort into this OS extension, the network interconnects will always be at a disadvantage against real SMP systems.

2.4.4 MPI

MPI is successfully applied in clusters that need inter-processor message passing, at NASA Ames, Mountain View, and at Sandia Lab, Albuquerque. Activities are going on to improve the performance of the I/O part of MPI.

2.4.5 Threading

Methods have been developed by Intel and AMD to increase processor efficiency by allowing multiple threads to run pipelined or simultaneously in the processor. Intel will introduce this technology in the XEON version due in Q3 2002, and AMD has announced this feature for the HAMMER series.

Scalable Clusters

The tendency is to see three different technologies continuing in scalable clusters, each with a multitude of smaller partners around one main player: the MIPS group with SGI, the SPARC group with Sun, and the PowerPC group with IBM. In addition, all three propose Intel-based clusters with proprietary or shareware operating systems (LINUX).

3.1. High Level Clusters

Under pressure of the ASCI project in the USA and the Global Simulation project in Japan, the capacity of high-level clusters moves toward or beyond 10 Teraflops. They are the domain of a few manufacturers, such as Compaq with Alpha processors, IBM with PowerPC processors and SGI with MIPS and/or Itanium processors in the USA, and NEC with its own vector-oriented microprocessor in Japan. Some of these systems can be delivered with the LINUX OS.

High-end clusters will continue to exist. They will be used mainly in large simulation programs, weather forecasting and biological computing, where multiple processors need access to a single large data frame.

3.2. Medium Level Clusters

A class of systems built especially to be combined into clusters can be seen as medium-level cluster machines. They consist for a large part of the so-called 1"-high “pizza box” systems, which exist mainly in one- and two-processor configurations, with some four-processor exceptions.

Very high-density clusters can be built with these systems, but power dissipation in a small volume is a problem. This adds to an installation price that is already higher than that of double-processor workstations. Pizza box systems are available from all larger manufacturers and some smaller start-up companies.

Newcomers in this field are the so-called “blade processors”. They can be seen as naked motherboards that plug into a back-panel with power and networking interconnects. To avoid power problems the blades are normally equipped with low-power processors. Production methods can be compared to the miniaturization of portables, together with the use of high-reliability components, which keeps prices high. Blade systems are available or announced from Compaq-HP, Dell and Sun. It may be of interest to compare price-performance if these systems become more common and costs drop near workstation prices. Medium-level clusters are what service providers need, so there is a huge market out there. But they will keep high prices, first due to the high demand and secondly because of the high-reliability components included. Sophisticated cooling needs increase the cost of ownership.

3.3. Low cost Clusters

PC-based clusters continue to grow in popularity as the basic machines get more powerful, with extended instruction sets and faster clock speeds. Memory bandwidth will adapt to increased processor speeds with PC2700, PC3200 and PC4300 DDR RAM, which allows 533 MHz bus speeds. Storage access and latency will improve with the upcoming Serial-ATA standard, which allows throughputs up to 600 MByte/s. Intel announced a new version of the XEON processor that is faster and includes the P4 processor core.

Double-processor systems are relatively widespread now. However, 4-processor systems are only available in the server market, with consequent pricing (DELL, COMPAQ-HP); systems with more than 4 processors are rare and have only a few sources. AMD has entered the market with its own Intel-compatible processor, named Athlon. Many experts say that the AMD Athlon series is up to 20 % more efficient with the LINUX OS than Intel.

The battle is open between AMD and Intel, influencing component prices. It will be interesting to compare both products in a small LINUX cluster. Double-processor workstations will become available for both the latest Athlon and the new P4 XEON.

3.4 64 bit Processors

The Intel Itanium chip has long been plagued with bugs, and its low clock speed makes it perform worse than the P4. Its follow-up, the Itanium 2, a 212-million-transistor chip, will become available mid-July 2002 with a clock speed of 1 GHz. Even with increased clock speeds it performs worse than the latest P4 versions with a 2.5 GHz clock. The Intel 64-bit systems cannot run 32-bit code.

Itanium 2 will be delivered in 3 versions: with a 1 GHz clock and 3 MByte cache, with a 1 GHz clock and 1.5 MByte cache, and at 900 MHz with 1.5 MByte cache. Prices for the processor will range between $1177 and $4227.

AMD demonstrated prototypes of its Hammer series. They are produced using 0.13 μm design rules, with announced clock speeds between 2 GHz and 3.4 GHz and with an 800 MHz HyperTransport as front-end bus that allows systems with up to 8 processors. The AMD Hammer series runs both 64-bit and 32-bit code, as demonstrated with 64-bit LINUX and 32-bit Windows on a single system. The Hammer series “Opteron” is announced with a SPECint 2000 score of 1,202 and an estimated SPECfp 2000 score of 1,170. The silicon for the Hammer series is made in Germany.

3.5 High Level Cluster Management

Except for commercially delivered high-level clusters, there are practically no general high-level cluster management systems available. Compaq uses a separate Myrinet network for management of its clusters; IBM uses special motherboards with two Ethernet interfaces, of which one is separately powered and allows system reset. LINUXNETWORX has announced a cluster management system that works universally for LINUX systems.

3.6 Low Level Cluster Management

No ready-to-use low-level control system is available. Simple low-cost data acquisition and control components can be used to install a software-controlled selective reset generator. A coupling between both levels of management is necessary to select the machine(s) that need to be restarted.


Issues for a Future Cost Optimized Cluster

The PC market is for the moment at a crossroads, which makes it extremely difficult to give an opinion on what will be the best and most cost-effective solution some years from now. The art will be to read the different tendencies, decode what the longer-term future will bring, and deduce from that what will be useful for CERN.

4.1 Tendencies for Cluster Computing

With the new generation of Pentium processors with clock speeds up to 3 GHz, the desktop PC has reached a capacity that is almost excessive for standard desktop applications. Games machines will move to specialized game boxes, with a negative influence on the standard PC market.

Besides the standard PC, the need for high-end machines is limited to some specialized applications such as high-end graphics, CAD, etc., a market share of less than 10 %. A new market trend, seen especially in the student market but also among SOHO users, is the move to portables. After many years of announcements and prototypes, it seems that the tablet computer is finally appearing and is projected to gain a substantial share of the market.

This means that a number of very important decisions have to be made about the future machines to be used in the CERN cluster to obtain the necessary capacity on the most cost-effective basis (see also Fig 1 for 32-bit machines and Fig 2 for 64-bit machines). The number of parameters to choose from is larger than ever before and may influence the number of machines necessary. The price to pay for different processors / manufacturers may bring differences of up to 30 %.

4.1.1 Tendencies for 32 bit machines:

Both Intel and AMD processors are available in double-processor systems. Thus a first choice has to be made between the two, based on tests of which of them handles LINUX more efficiently. On the current basis this decision will influence costs by up to 30 %.
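To illustrate what a 30 % price difference means for a fixed procurement budget (the budget and node prices below are purely hypothetical placeholders, not figures from this report):

```python
# Illustration (hypothetical numbers): how a 30 % per-node price
# difference translates into cluster size for a fixed budget.

def nodes_for_budget(budget: float, node_price: float) -> int:
    """Whole number of nodes purchasable within the budget."""
    return int(budget // node_price)

BUDGET = 1_000_000.0          # hypothetical procurement budget
PRICE_A = 2_000.0             # hypothetical dual-processor node, vendor A
PRICE_B = PRICE_A * 0.70      # vendor B, 30 % cheaper

print(nodes_for_budget(BUDGET, PRICE_A))
print(nodes_for_budget(BUDGET, PRICE_B))
```

With these placeholder prices, the 30 % cheaper option buys roughly 43 % more nodes for the same money, which is why the processor choice weighs so heavily on cluster capacity.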

4.1.2 Desk-Top machines

The development of new desktop machines will stagnate during the coming years, but they will become cheaper. Careful analysis is needed to decide whether CERN wants to continue in this direction.


Fig: 1 The decision field for 32 bit processors


4.1.3 Workstations

This type of machine, also called a super desktop, is normally capable of higher throughput, but reaches it through more elaborate circuitry and a more complex architecture. Also built with professional components for reliability, these systems are normally less cost-effective than double-processor desktops.

4.1.4 Blade systems

Built on a single board and plugged into a special back-plane, blade processors concentrate high computing power in a small volume. To avoid heat buildup they are made of low-power components and run at lower clock speeds. All three manufacturers of such systems warn that even if the price per processor seems low, a crate with a network back-plane and power supply is necessary. Once assembled, these systems turn out to be more expensive than the same capacity built with independent systems. Added to this should be the infrastructure for heat dissipation. Blade systems have no individual hard drive and have to boot from a server. Blade systems are proposed by HP, IBM and some small manufacturers.

4.1.5 Compact Systems

Compact systems are in reality a variant of blade systems, with the difference that they are mechanically more developed. The base is not a single blade but a small plug-in unit that needs only an external power supply to work independently, plus a back panel for I/O connections. In general, compact systems boot from a server. Compact systems can be built into very large systems with up to several thousand processors. Communication between nodes is mostly Infiniband, but I/O to the outside world is of the Ethernet type. Compact systems are made or announced by LinuxNetworX, DELL, Mellanox and some other small manufacturers. They should not be confused with Compact PCI systems, which belong to the embedded-systems category.

4.1.6 Embedded Systems

Embedded systems are mainly made for factory automation and robotics and come in different qualities. They have many processor options but are in general single-processor systems. Built to be part of a larger machine, these machines come bare and need to be integrated into an infrastructure consisting of housing, power and cooling. Many embedded systems have basic Ethernet connections built in, but not Gigabit Ethernet. A class apart among embedded systems are the Compact PCI systems and, in the future, a number of Infiniband systems. But these last two will stay expensive and as such will be less cost-effective. The fact that a costly infrastructure is needed makes them less useful for building large clusters.

4.1.7 Custom Build Systems

A number of medium-sized companies specialize in the production of small series of machines. The customer can make his own choice of case, motherboard, processor(s), storage and I/O connections. Prices, though somewhat higher, are still competitive. For the CERN cluster this would give the possibility to modernize gradually, to rack-mounted systems with our own selected set of components, at a reasonable price-performance ratio.

4.2 Tendencies for 64 bit machines


The class of 64-bit microprocessors has been around for some time, but until recently they were company-oriented, such as the Alpha processor at Compaq, the MIPS processor for SGI and the SPARC chip for Sun. Their use is mainly in high-end servers. This picture will change with the arrival on the market of mass-produced 64-bit microprocessors from Intel and AMD. The architectural approaches of these two machines are different and will largely influence the field of applications.

4.2.1 Intel 64 bit

The Intel ITANIUM is a pure 64-bit engine with a completely new architecture, based on the HP RISC chip but with a new instruction set. Even if older software can be recompiled directly, it must also be adapted to run efficiently. Many manufacturers have announced plans to use Itanium-type chips in future machines once available on the market. According to Intel, these 64-bit systems should be used in high-end servers, where the enormous address space can bring an advantage. Multiprocessor systems are foreseen, but up to now single-processor systems coupled by advanced networking are the only ones available. Even after 5 years of development, the Intel 64-bit processors are still less powerful than the fastest Pentium processors. An effort is being made by both HP and Intel to adapt LINUX to the Itanium type of processors.

4.2.2 AMD 64 bit

The AMD Hammer series takes a completely different approach. Being 64-bit processors with their own instruction set, they are also able to handle the 32-bit Pentium instruction set. Several versions are announced, differing in the way internal connections are handled and in the number of processors that can be handled efficiently in SMP systems. The Hammer series uses the HyperTransport standard as internal interconnect and as such can have up to 8 processors or machines linked up for SMP in a memory-coherent system. AMD announces Hammer-series machines for the desktop with up to 2 processors and for servers with up to 8 processors. The HAMMER series will be made in 130 nm technology, such that clock frequencies will start at 3 GHz. Prototypes of the HAMMER series have been publicly shown, and production is expected in Q1 2003. For example, the Hammer series “Opteron” is measured at a SPECint 2000 score of 1,202 and an estimated SPECfp 2000 score of 1,170, running at a clock speed of 1 GHz. Even though AMD is a US company, the HAMMER series of processors is produced in Dresden, Germany.


Fig 2 The decision field for 64 bit processors


Tendencies for Networking and Interconnects

In the future more high-speed interconnect systems will be built directly into the motherboard, such as:

Gigabit Ethernet
Infiniband
HyperTransport
PCI Express

Network equipment is also getting more modular, with small plug-in modules to adapt to these technologies. It may be good to start thinking now about how to handle the multitude of possibilities that come with these systems.

Another question is whether we will need extensive WAN connections in the future and how to handle them in an easy way. Of course there is 10GE and its sub-standard for handling Ethernet frames in OC192/SDH48 POS mode, which imposes itself.

Another trend is network interfaces with TOEs (TCP Offload Engines). They have the TCP/IP stack, and sometimes iSCSI-capable processors or hardware, on board, and they save processor time and memory bandwidth by relying only on DMA transfers. These developments seem to be of crucial interest for gaining better performance from the CERN disk servers, and at some critical-path network servers in our data acquisition and file-handling clusters.

The following figures give some basic ideas of how future cluster interconnects may look as multi-Gbit networks become commodity interconnects.


(Diagram: Ethernet-connected nodes aggregated over Gigabit Ethernet into a GE fabric, with links to storage and networking and to more clusters, storage areas, etc.)

5-1 The actual situation



[Figure: Gigabit Ethernet nodes feeding 10 Gigabit Ethernet links into a 10 GE fabric, with 10 GE coupled to OC-192/SDH (standard 10 Gbit, 802.3ae, POS capable) for connections to more clusters and storage areas.]

5-2 More Powerful Machines need better Networking. All levels are upgraded

[Figure: a mixed fabric in which Ethernet and Gigabit Ethernet nodes are combined with Infiniband I, II and III stages, 10 Gigabit Ethernet links and a 10 GE fabric, again with 10 GE coupled to OC-192/SDH (802.3ae, POS capable) for the wide area.]

5-3 Mixed Solution including Infiniband


1999 predictions and what to expect for the future

6.01 the distinction between single-box implementations and spatially extended implementations is becoming blurred

Correct: but there is a new tendency to invert this development with the arrival of Blade systems and Compact systems, so we will see both single-box implementations and modular systems. The division will be application oriented: shared memory, message oriented, or individual cluster machines. Most probably all three will develop further.

6.02 since 1996, we have passed from early Gigabit/sec network technology into the Gigabytes/sec era and beyond,

Partly correct: even if Gigabyte/s systems are operational now, they are not widespread and are still expensive. The main applications are found in high-end systems such as ASCI 1 and ASCI 3. Cheap commodity Gigabyte/s networks and interconnects have still to come. There will be a number of developments in hardware, software, or a combination of the two that will make high-speed networking as efficient as a DMA transfer.

6.03 the emergence of the "fully functional PC", by which we mean the arrival of commodity hardware configurations with CPU, I/O and memory performance rivaling that offered by the RISC workstation in terms of price/ performance

Correct, but it is questionable whether this development continues, as the market gets saturated and PCs become overpowered for normal office work.

6.04 the explosive growth of "Open Source" or "Community Source" software, middleware and hardware


[Figure: use of the native channels. Intel integrates Infiniband in the chipset with 3GIO as internal bus; AMD bridges Infiniband from HyperTransport, which has a small crossbar switch integrated in the chipset, supporting up to 8 processors (Athlon/Hammer) behind 10 Gigabit Ethernet and a 10 GE fabric.]

5-4 Use of the Native Channels ( Infiniband and HyperTransport )


This is correct and will continue, but a tendency can be seen to embed proprietary software inside open source material.

6.05 The hardware includes buses, switches and networks, together with low-level connectivity standards (I2O, NGIO, Future IO, etc).

Wrong: none of these standards has been successful. PCI is still the highest-performance I/O, and this will continue with PCI-X and PCI-X2.

6.06 SCSI predicted to go to 320 MByte/s in 2001 and 640 MByte/s in 2003

Wrong: SCSI never reached these speeds, but they are now being predicted again for Serial-SCSI.

6.07 GSN is of definite interest as a cluster interconnect:

Correct, but only for high-end systems that have the necessary internal bandwidth. This may change with PCI-X systems. The same will be valid for other 10 Gbit/s network connections.

6.08 Gigabit Ethernet itself will become the default high-speed commodity LAN technology

This is correct, and a tendency that will continue with the arrival of 10/100/1000 Base-T chips.

6.09 The VIA initiative seems not to be succeeding

Correct:

6.10 PC based clusters, running Linux and/or Windows 2000, will likely dominate the field for low cost systems by the early 2000’s.

Wrong: LINUX dominates the field and, given the price of Microsoft licenses, this will continue to be the case.

6.11 Individual "cheap" cluster nodes will still contain a small number of CPUs (2-4), as today. They will be interconnected by commodity LAN technology such as 100/1000 BaseT, with appropriate protocol support (such as ST at low level and GFS at higher level) to handle shared I/O requirements.

Partly correct: only 2-CPU systems are in the commodity class in terms of price, and only 100 Base-T is used for large-scale interconnects. ST as applied on Ethernet has a data integrity problem, and GFS, even though it has progressed enormously, is not applied at CERN.

6.12 This should produce advances in the state of the art of building and running very large scalable clusters with "industrial" quality levels but "mass market" pricing.

Yes and no: the systems are there, but the pricing is special. The split between the consumer market and servers will widen further.

6.13 Within 5 years, we may see large distributed clusters built from a collection of "moderately" sized and priced nodes interconnected with standardized technologies and running either Windows 2000 or (more likely) open source operating systems containing quite sophisticated system management code from specialized companies. These will provide cheaper and more open alternatives to the large proprietary SMP cluster systems available today from companies like SGI or Sun, but will still be



relatively expensive packages compared with the "low cost" clusters considered above.

Almost correct: what is missing is a generally used and standardized technology, a Windows or LINUX that runs a cluster without the necessary adaptations. Both operating systems are making an effort to become a standard for cluster computing. Windows and UNIX may continue to gain mainly in the commercial market with outsourced maintenance, and LINUX will continue to develop at research and scientific centers.

Exotic Architectures

7.1 Quantum computing

No sensational breakthrough, but slow advances. LANL has demonstrated simple quantum processors capable of executing only a single algorithm.

7.2 Chemical computing

Chemical computing has extended itself with one more category: besides the older resin-oriented semiconductors and biochemical computing, there is now a new class of organic components.

Chemical computers have become more attractive as methods are worked out to use modified commercial ink-jet printers to print circuits on almost any material. This has brought advances in TFT display screens, and some very special applications printed on steel, but is rarely used for logic designs.

Computing with DNA molecules using ternary logic advances slowly. Prof. Shapiro at the Weizmann Institute demonstrated a molecular computer that solves the traveling salesman problem better and faster than a silicon machine; a figure of 10^9 operations/sec is given.

A new form of chemical computing uses organic materials. The original design starts with optical components and small display units called OLEDs (Organic Light Emitting Diodes). OLED devices are printed with modified ink-jet printers on nearly any base material. Research programs have been started to look into building printable electro-optical logic devices that point towards optical computing. More sophisticated production methods are under development, mainly for quantity production and higher precision.

7.3 Neural computing

Neural computing can be split in several directions. One is the replication of neural networks with electronic components in ever larger systems, a field where continuous research is going on. Most neural systems are relatively slow in logic but compensate for this by faster decision making. Another direction is to grow living neurons on gold wires or on silicon chips and in this way create creative thinking machines.
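To make the first direction concrete, the sketch below models the basic unit that such electronic neural hardware replicates: a neuron computing a weighted sum of its inputs followed by a threshold. The weights and threshold are arbitrary illustrative values, not taken from any real system.

```python
# Minimal software model of the neuron that neural hardware replicates:
# a weighted sum of the inputs followed by a hard threshold.

def neuron(inputs, weights, threshold):
    activation = sum(i * w for i, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Two-input neuron wired to behave as an AND gate (illustrative weights).
WEIGHTS = [0.6, 0.6]
table = [neuron([a, b], WEIGHTS, 1.0) for a in (0, 1) for b in (0, 1)]
print(table)  # [0, 0, 0, 1]
```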

7.4 Optical computing

Optical computers have been around as prototypes for a long time. However, no one has ever arrived at a production system. Promising technologies turn up from time to



time in the storage field, such as DVD-type optical disks that store a TByte per disk using holographic storage.

7.5 Nano Technology for logic and mechanics

Silicon “nano-technology” is mainly a kind of mechanical design using silicon technology. It has large applications in sensor technology and is used in computing to make very safe locks. IBM announced the Millipede technology that stores 3 Terabit on a single chip, using an electro quantum-mechanical principle to punch data into an organic layer. The underlying silicon can read the data. The storage is non-volatile and can be erased or rewritten in the same manner. Production is foreseen in 2004.

Much activity is going on in the field of “nano-tubes”. Using specific characteristics of the carbon atom, tiny cylinders with atomic dimensions are formed and used to make basic electronic components that are much smaller than even 0.10 µm technology transistors. Several universities and industrial laboratories have demonstrated transistors and simple circuits such as shift registers and memory cells in this technology. The technology is extremely fast and uses very little power. The problem for the moment is to make the tubes in a controlled order, such that circuits can be made much the same way as in silicon. The developments are promising, but practical applications are at least 5 to 10 years ahead.

The State University of New York at Buffalo has developed a nano-technology that uses nickel atoms for storage, published in the July 1 issue of Physical Review B. If fully developed, a capacity of 1 Terabit/inch² is mentioned.

7.6 Game Consoles

Game consoles are still going very strong, and there are about as many worldwide as PCs. Aimed at a mass market, they are somewhere between a sophisticated toy and a powerful gaming device. The cost is only a fraction of a fully equipped PC, partly because they are stripped of functions not needed for gaming and partly because financial losses on the box are compensated by profit from the software. Most of these machines are extremely specialized in sophisticated graphics executed by the main processor. An exception is the Microsoft X-Box, which is a simplified PC with high-end graphics.

Some of the later machines offer Internet access, but no real networking. As such, it will be difficult to connect this kind of machine to a fully capable cluster, and it remains to be seen whether this kind of processor is successful in a cluster environment. A third negative point is the lack of a universal OS.

It seems an interesting idea to use game machines as cheap cluster processors, but the reduction in flexibility and the specialisation of these machines make it difficult and expensive to do universal cluster computing with them.

7.7 PDA’s

Personal digital assistants and handheld PCs are very popular items. These little machines are driven by MIPS and ARM processors. Most of them use Windows CE or PalmOS, and some run LINUX. The price, depending much on the screen type (color or black and white) and on the number of features, runs from 250 SFr. to over 1000 SFr.

Interconnects, whether I/O or networking, are primitive and slow, and it is questionable whether these machines can effectively be combined into more powerful clusters running a standard OS.



It may be possible to use these little machines as offload engines for non-critical functions; however, the software effort needed will make this less interesting.

7.8 PIMs

Up to now, computing engines have been built from standard processors and standard memories, with the negative point of a long and relatively slow access path between the two. In other words, processors have little memory on board, and large memories are not able to do any computation.

A new technology called PIM, for Processor In Memory, tries to change this by combining memory with on-chip processing power. The DIVA PIM project is a chip built from an 8 Mbit memory combined with a 256 bit wide-word processor.

As such, the total processor is scalable to very large data sets and able to handle small local calculations extremely fast. This fits well with the type of HEP calculation done at CERN and is a development to watch for its usefulness to future LHC computing.
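The wide-word idea can be sketched as follows: one instruction operates on an entire 256 bit row held in the on-chip memory. The eight 32 bit lane layout below is an assumption chosen for illustration, not the actual DIVA datapath.

```python
# Sketch of a PIM-style wide-word operation: a single 'instruction' adds
# two 256-bit memory rows, modelled here as eight 32-bit lanes.

LANES = 8                 # 8 x 32 bit = one 256-bit wide word
MASK = 0xFFFFFFFF         # wrap each lane at 32 bits

def wide_add(row_a, row_b):
    """Element-wise add of two wide-word rows in one logical step."""
    return [(a + b) & MASK for a, b in zip(row_a, row_b)]

row_a = list(range(LANES))          # [0, 1, ..., 7]
row_b = [10] * LANES
print(wide_add(row_a, row_b))       # [10, 11, 12, 13, 14, 15, 16, 17]
```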

Conclusions

The market is developing in three different directions, the SOHO PC, portables and workstations, with a limited place for the high-end PC as used up to now. At the same time, more competition means that for the moment more types of compatible machines are offered than ever before.

Three types of microprocessors crystallize out as good candidates for future LHC cluster computing. It is therefore extremely important to carefully test and compare the 32 bit Intel Pentium 4, the AMD Athlon and the 32/64 bit AMD Hammer series on LINUX code efficiency and on their behavior in the CERN cluster and Grid test bed.

An important factor is also price, as AMD processors can be up to 30 % less costly than their Intel counterparts. For local storage, Serial ATA and DDR-RAM are obvious choices that will penetrate the market place.

For larger quantities it seems to be attractive to use one of the firms that can deliver custom built systems with components as chosen by CERN.

This gives the possibility to move without a price difference from desktop machines to rack-mounted versions, with the advantage that standard functional units can be built and combined to form larger clusters, such that power handling and networking are simplified and control circuitry can be structured.

Blade processors and Compact processors are attractive for their capacity and are sometimes optimized to run LINUX, but will stay at a more expensive level per unit.

Mass storage should move to structured network storage, where SAN technology together with an adapted file system gives the highest flexibility. Simpler technologies such as NAS and JBOD will not bring the bandwidth necessary for LHC. Wide RAID-oriented arrays give the best bandwidth figures; this bandwidth should be available the moment LHC physics starts.
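A back-of-envelope calculation illustrates why wide arrays help: ideal aggregate streaming bandwidth scales with the number of data disks. The 40 MByte/s per-disk rate below is an assumed figure for a 2002-era disk, not a measurement, and controller or filesystem overhead is ignored.

```python
# Rough sketch: ideal aggregate streaming bandwidth of a wide striped
# array scales with the number of data disks (assumed per-disk rate,
# no controller or filesystem overhead).

PER_DISK_MB_S = 40   # assumed sustained rate of one disk, MByte/s

def stripe_bandwidth(n_disks, per_disk=PER_DISK_MB_S):
    """Ideal aggregate bandwidth of an n-disk stripe set, MByte/s."""
    return n_disks * per_disk

for n in (4, 8, 16):
    print(f"{n:2d} disks: ~{stripe_bandwidth(n)} MByte/s")
```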

Even if many reports are published about exotic computing, these new technologies are not ready to take over from silicon, but they may play a role in the long-term future.



Networking and interconnects are in a multi-path development cycle. GSN will mainly be used in the supercomputer environment. Infiniband seems to find its place in the Blade and Compact computer field and in some specialized cluster-computing applications. The general interconnect will remain the TCP/IP-oriented Ethernet type, with the extension to 10 Gigabit Ethernet for backbones, using the new standard that couples 10 GE to SONET/SDH for long-distance communication.

9 Acknowledgement

A word of thanks goes to the people of the IT/ADC group who helped me with their comments in writing this part of the 3rd PASTA report.
