University of Mannheim: ATOLL (ATOmic Low Latency) – a high-performance, low-cost SAN. Patrick R....
TRANSCRIPT
University of Mannheim 1
ATOLL
ATOmic Low Latency
ATOLL: ATOmic Low Latency – A high-performance, low-cost SAN
Patrick R. [email protected]
Computer Architecture Group, University of Mannheim, Germany
Cluster Computing
• Cluster computing is emerging as a new approach to high-performance computing because of its superior price/performance ratio
• The key to cluster computing is a SAN (system area network) that delivers the communication performance normally found in supercomputers
• Several SANs have been developed in recent years: ServerNet, Memory Channel, QsNet, SCI, ATOLL
ATOLL Basic Architecture
ATOLL chip
• 4.5 million transistors
• 0.18 µm CMOS process
• 5.7 x 5.7 mm chip
Figure: ATOLL Top-Level Block Diagram. Blocks: PCI-X bus; PCI-X interface (64 bit/133 MHz); Host Ports 0-3; 4x4 full-duplex Xbar; ATOLL links Link0-Link3.
Fastest and Second Biggest Design of a European University
Optimization for Performance and Cost
Costs in Percent

Component          %
PCB                1.44
Connector HDRA     2.65
Discrete Comp.     1.00
Misc. chips        3.52
ATOLL chip        38.30
Package (BGA)      4.30
Link Cable        38.29
Mechanics          1.56
Soldering          2.55
Test               6.38
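The cost breakdown can be checked with a few lines of Python. This is only a sketch that re-enters the percentages from the slide; the variable names are made up for illustration. It confirms that the shares add up to roughly 100 % and shows how much of the total falls on the two largest items.

```python
# Cost shares in percent, copied from the slide (decimal commas converted).
cost_share = {
    "PCB": 1.44,
    "Connector HDRA": 2.65,
    "Discrete Comp.": 1.00,
    "Misc. chips": 3.52,      # "div. Chips" on the slide
    "ATOLL chip": 38.30,
    "Package (BGA)": 4.30,
    "Link Cable": 38.29,
    "Mechanics": 1.56,
    "Soldering": 2.55,
    "Test": 6.38,
}

total = sum(cost_share.values())

# The two largest contributors and their combined share.
top_two = sorted(cost_share.items(), key=lambda kv: kv[1], reverse=True)[:2]
top_two_share = sum(share for _, share in top_two)

print(f"total: {total:.2f} %")
for name, share in top_two:
    print(f"{name}: {share:.2f} %")
print(f"top two combined: {top_two_share:.2f} %")
```

With these figures the ATOLL chip and the link cable together account for about three quarters of the cost, which is where any further cost optimization would have the most effect.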
ATOLL Performance
Figure: ATOLL Top-Level Block Diagram (repeated from the ATOLL Basic Architecture slide)
DMA-Mode Test
Test system: P3-1000 (ServerWorks), PCI 66 MHz/64 bit, ATOLL @ 245 MHz
Not fully optimized yet
• 533 MB/s write burst rate
• 137 MB/s read burst rate (bridge problem with stop)
240-byte message (figure labels: SW send, SW receive)
• 4 µs + 3.8 µs + 1.2 µs
• Sum: 9 µs
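The numbers on this slide can be put in context with a little arithmetic. The sketch below assumes that a parallel bus moves one full-width word per clock cycle at peak and that "66 MHz" and "133 MHz" stand for the nominal 66.67 MHz and 133.33 MHz clocks; the function and variable names are illustrative only. Under those assumptions the 533 MB/s write burst rate coincides with the theoretical peak of the 64-bit/66 MHz PCI bus in the test system.

```python
def bus_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak in MB/s: one transfer of width_bits per clock cycle."""
    return width_bits / 8 * clock_mhz


# Test system bus: PCI, 64 bit at a nominal 66.67 MHz (assumption).
pci_peak = bus_peak_mb_s(64, 66.67)
# ATOLL host interface: PCI-X, 64 bit at a nominal 133.33 MHz (assumption).
pcix_peak = bus_peak_mb_s(64, 133.33)

# Measured read burst rate from the slide, as a fraction of the PCI peak.
read_fraction = 137 / pci_peak

# Latency components for the 240-byte message, in microseconds.
latency_us = 4 + 3.8 + 1.2

print(f"PCI 64 bit/66 MHz peak:    {pci_peak:.0f} MB/s")
print(f"PCI-X 64 bit/133 MHz peak: {pcix_peak:.0f} MB/s")
print(f"read burst vs. PCI peak:   {read_fraction:.1%}")
print(f"message latency:           {latency_us:.1f} µs")
```

The read burst rate reaches only about a quarter of that peak, which fits the slide's note about a bridge problem.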
ATOLL Performance
A module has been developed in collaboration with the University of Mannheim to evaluate their ATOLL network cards. This experimental hardware delivers the best performance for messages smaller than 10 kB, and matches the 2 Gbps throughput seen with many proprietary solutions like SCI and Myrinet.
ATOLL-Software
Figure: ATOLL software stack. Blocks: User Application; MPI, PVM, TCP/IP; Kernel Driver; ATOLL HW; ATOLL API; ATOLL daemon.
ATOLL daemon
• Controls network startup (clock distribution, routing)
• Supervises NIC at runtime
• Provides routing information
Open-source SW
Future Development
Future of ATOLL Hardware Development
• EXTOLL
• 500 - 1000 MHz clock
• Higher-dimensional crossbar for multidimensional IN structures
• Multithreaded cached host interface
• Memory management support
• Command extension for direct memory operations (put, get, …) => MPI-2
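The last bullet refers to one-sided communication as introduced in MPI-2: the origin reads or writes memory that the target has exposed, and the target does not post a matching receive. The toy model below only illustrates those put/get semantics in plain Python. The `Window` class and its methods are hypothetical and are neither the ATOLL/EXTOLL command set nor the MPI API.

```python
class Window:
    """Toy stand-in for a memory region that a node exposes for remote access."""

    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)

    def put(self, offset: int, data: bytes) -> None:
        """Origin writes data into the target's window; the target takes no action."""
        if offset < 0 or offset + len(data) > len(self.mem):
            raise ValueError("put outside window")
        self.mem[offset:offset + len(data)] = data

    def get(self, offset: int, length: int) -> bytes:
        """Origin reads length bytes from the target's window."""
        if offset < 0 or length < 0 or offset + length > len(self.mem):
            raise ValueError("get outside window")
        return bytes(self.mem[offset:offset + length])


# The "target" exposes 16 bytes; the "origin" writes and reads them directly.
target = Window(16)
target.put(4, b"ATOLL")
fetched = target.get(4, 5)
print(fetched)
```

In hardware, the point of such commands is that the network interface performs the memory access itself, so no software on the target side sits in the data path.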
Chip Photo
Chip Photo
ATOLL Board
Interconnect