
Page 1:

Pavlos M. Mattheakis

Ioannis Papaefstathiou

SIGNIFICANTLY REDUCING MPI INTERCOMMUNICATION LATENCY AND POWER OVERHEAD IN BOTH EMBEDDED AND HPC SYSTEMS

Page 2:

Motivation

• Number of Nodes in HPC Systems Increases Exponentially
  • IBM 2012
• Asynchronous MPI Messages Increase Linearly with the Nodes
  • Rainer Keller et al., "Characteristics of the Unexpected Message Queue of MPI Applications", EuroMPI 2010
• MPI Protocol Implementations Do Not Scale
  • Keith D. Underwood et al., "The Impact of MPI Queue Usage on Message Latency", ICPP 2004

Page 3:

Background: MPI Asynchronous Commands

[Diagram: queued messages (3, 2, 1) compared one by one against a matching request]

Scenario: a node buffers N messages in the UMQ, then executes N receives, each traversing the remaining UMQ:

• N(N+1)/2 MPI Message Comparisons in total
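
A minimal C sketch of the list-based matching this scenario implies, assuming a singly linked UMQ searched front to back (the type and field names are illustrative, not from the paper):

#include <stddef.h>

/* Illustrative UMQ entry: the fields an MPI implementation matches on
 * (sender rank and tag), kept in a singly linked list. */
typedef struct umq_entry {
    int source;
    int tag;
    struct umq_entry *next;
} umq_entry;

/* Walk the UMQ front to back until the first entry matching the posted
 * receive, unlink it and return it.  Each call costs up to the current
 * queue length in comparisons, so the scenario above sums to
 * N + (N-1) + ... + 1 = N(N+1)/2 comparisons. */
static umq_entry *umq_match(umq_entry **head, int source, int tag)
{
    for (umq_entry **p = head; *p != NULL; p = &(*p)->next) {
        if ((*p)->source == source && (*p)->tag == tag) {
            umq_entry *hit = *p;
            *p = hit->next;   /* unlink the matched entry */
            return hit;
        }
    }
    return NULL;              /* no match: the receive is queued instead */
}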

Page 4:

Previous Work

• Offloaded MPI Stack on the NIC Processor
  • Myricom Lanai Z8ES, 2009
  • Quadrics Elan4, 2002
• List Manager Accelerator
  • "Accelerating List Management for MPI", CLUSTER 2005
  • "An MPI Implementation for Multiple Processors Across Multiple FPGAs", FPL 2006
• TCAM-Based Accelerator
  • "A Hardware Acceleration Unit for MPI Queue Processing", IPDPS 2005
• Microcoded Architecture
  • "An architecture to perform NIC based MPI matching", CLUSTER 2007

Page 5:

Contributions

• Novel Scheme for Processing Asynchronous MPI Messages
• Profiled with real-world applications
• Implementation in a state-of-the-art CMOS technology

Page 6:

Novel Scheme

1. Hash the posted receives according to the sender

2. Use an order list for messages with wildcards

3. Keep in SRAM the message fields accessed during search
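
A software sketch of steps 1 and 2, assuming a modulo hash on the sender rank and a FIFO for wildcard receives; all names and the bucket count are illustrative, and the paper's version is a hardware unit that additionally keeps the matched fields in SRAM (step 3):

#include <stdlib.h>

#define ANY_SOURCE  (-1)   /* stands in for MPI_ANY_SOURCE */
#define NUM_BUCKETS 64     /* illustrative; would grow with the node count */

typedef struct recv_entry {
    int source;             /* sender rank, or ANY_SOURCE */
    int tag;
    unsigned long seq;      /* posting order, used to order wildcards */
    struct recv_entry *next;
} recv_entry;

/* Posted-receive queue: one bucket per hashed sender plus a separate
 * order list that holds only the wildcard receives. */
typedef struct {
    recv_entry *bucket[NUM_BUCKETS];
    recv_entry *wildcards;
    unsigned long next_seq;
} prq;

static unsigned hash_sender(int source)
{
    return (unsigned)source % NUM_BUCKETS;
}

/* Step 1: a receive with a concrete sender goes into that sender's bucket;
 * step 2: a wildcard receive goes into the order list instead. */
static void prq_post(prq *q, int source, int tag)
{
    recv_entry *e = malloc(sizeof *e);
    e->source = source;
    e->tag    = tag;
    e->seq    = q->next_seq++;
    e->next   = NULL;

    recv_entry **list = (source == ANY_SOURCE)
                        ? &q->wildcards
                        : &q->bucket[hash_sender(source)];
    while (*list)             /* append at the tail to keep posting order */
        list = &(*list)->next;
    *list = e;
}

An incoming message then only has to be compared against its sender's bucket and the wildcard list, and the seq field lets the matcher honour MPI's posting-order semantics by preferring whichever candidate was posted first.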

Page 7:

Profiling with real-world applications

• Imperas OVPsim Software Used
• OR1K Processors Running a lightweight MPI implementation
• MPI Accelerator modeled in C

Page 8:

Simulation Accuracy

• Average search depth of the PRQ for the IS benchmark was used to evaluate the accuracy of the simulation

Page 9:

Implementation Flow

Untimed Functional C → Synthesizable SystemC → Verilog HDL → Netlist

Module        Frequency   Area (um²)   Power (mW)
Accelerator   1 GHz       95055        39.08

Page 10:

Results – Brightwell’s Benchmark

• Benchmark introduced in “Implications of application usage characteristics for collective communication offload”, IJHPC 2006.

• Send N messages from Node 1 to Node 0 with tags [0, N-1]. Post N receives at Node 0 with tags [0, N-1].

• Measure the time needed to unload the UMQ at Node 0.
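
A hedged MPI sketch of that pattern (the message count, payload size, receive order, and timing points are assumptions; the original benchmark may differ):

#include <mpi.h>
#include <stdio.h>

#define N        1024    /* number of messages; illustrative */
#define SYNC_TAG N       /* tag reserved for the "all sent" marker */

int main(int argc, char **argv)
{
    int rank, payload = 0, recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        /* Node 1: send N small (eagerly delivered) messages with tags
         * 0..N-1, then a marker so node 0 knows the UMQ is loaded. */
        for (int tag = 0; tag < N; tag++)
            MPI_Send(&payload, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        MPI_Send(&payload, 1, MPI_INT, 0, SYNC_TAG, MPI_COMM_WORLD);
    } else if (rank == 0) {
        /* Node 0: wait for the marker; the N tagged messages are then
         * sitting unmatched in the UMQ. */
        MPI_Recv(&recvbuf, 1, MPI_INT, 1, SYNC_TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        /* Time the drain.  Reverse tag order is assumed here so that each
         * receive has to search deep into the buffered queue. */
        double t0 = MPI_Wtime();
        for (int tag = N - 1; tag >= 0; tag--)
            MPI_Recv(&recvbuf, 1, MPI_INT, 1, tag, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        double t1 = MPI_Wtime();
        printf("UMQ drain time: %f s\n", t1 - t0);
    }

    MPI_Finalize();
    return 0;
}

Posting the receives in reverse tag order forces each MPI_Recv to traverse most of the buffered UMQ, which is exactly the queue-search cost the accelerator targets.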

Page 11:

Results – NASPB Suite

• Queue-processing speedup of the MPI processor with hashing scales with the number of processors

• Requires scaling the number of buckets as well
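
As a rough illustration of that bucket-scaling point, the bucket count could be derived from the communicator size at start-up, e.g. rounded up to a power of two (this sizing policy is an assumption, not from the paper):

/* Illustrative bucket sizing: roughly one bucket per potential sender,
 * rounded up to a power of two so the hash reduces to a bit mask. */
static unsigned pick_num_buckets(int comm_size)
{
    unsigned n = 1;
    while ((int)n < comm_size)
        n <<= 1;
    return n;
}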

Page 12:

Conclusions and Future Work

• Novel scheme for reducing the processing time in MPI queues

• Simulated in a Multiprocessor Environment

• Synthesized in a state-of-the-art CMOS technology

• Profiled on real-world applications

• Future Directions

• Further measurements

• Not only Benchmark-Style Applications

• Integration of more MPI tasks

Page 13:

• Special Thanks to

• Dimitrios S. Nikolopoulos, Professor - Queen’s University of Belfast

• Nikos Taburatzis, MSc Student - Technical University of Crete

Questions?