
Page 1

Intel® Cluster Tools

Stephen Blair-Chappell, Technical Consulting Engineer

Intel Compiler Labs

Page 2

1/11/2010

Copyright © 2007, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners

Intel® Software Development Products Overview

Agenda

▪ Introduction

▪ Intel® Software Development Products overview

▪ Cluster Toolkit and components

Page 3

What Are the Biggest Bottlenecks Today in Creating Parallel Applications?

Source: Developing Custom Parallel Computing Applications, Simon Management Group, September 2006

Page 4

Cluster Market Rapidly Growing

Source: *IDC HPC Technical Computing And Cluster Market Update May, 2006

Clusters are now the majority of the HPC market.

Why? Less expensive hardware. Easier implementation.

[Chart: Cluster Market Penetration, 03Q1 to 06Q2, share of Clusters vs. Non-Clustered systems (0% to 100%)]

Page 5

Definition of Clusters

▪ Distributed computing systems whose nodes communicate with each other over an interconnect

▪ Examples of interconnects:

– Gigabit Ethernet

– InfiniBand*

– Myrinet*

– Quadrics*

Page 6

New releases of Intel® Cluster Tools

Intel software tools make clusters easier to program and optimize

– Intel® Cluster Toolkit 3.0

• Bundle with single installer and license

– Intel® MPI Library 3.0

• Automated fabric selection and performance optimizations

– Intel® Trace Analyzer and Collector 7.0

• Trace file comparison and analysis of the effects of code changes on MPI performance

– Intel® Math Kernel Library 9.0 Cluster Edition

• Optimizations for the latest dual- and quad-core processors

– Cluster OpenMP* for Intel® C++ and Fortran compilers

• The first commercially available OpenMP for clusters

• Licensing and pricing suited to a wider range of cluster users

Page 7

Intel® MPI Library 3.0

A high-performance, universal MPI solution enabling applications to run across multiple network fabrics

▪ Features

– Easy to install and configure

– Save development resources and improve application quality

– Job scheduler support: PBS Pro*, Torque*, LSF*, etc.

– Debugger support: IDB, DDT*, gdb, TotalView*

– Based on the widely used ANL MPICH2

▪ What’s New

– Automated fabric selection

– Enhanced process pinning

– Performance optimizations and tuning options

– Full thread support (MPI_THREAD_MULTIPLE)

RIKEN: “Intel’s MPI and Cluster Tools provide us the best cluster development environment.”

Dr. Takahiro Koichi

Computational Astro Physics Laboratory

RIKEN, Japan

Page 8

Intel® Trace Analyzer and Collector 7.0

The world’s best analysis tool for MPI applications

▪ Features

– Increase productivity and cluster application performance

– Very low impact

– Excellent scalability in time and number of processors

– GUI on Linux* and Windows*

▪ What’s New

– Comparison of multiple trace files

– Timeline display for performance counters

– Powerful new aggregation and filtering functions

– Better and faster GUI

– MPI Checking - correctness checking library

EM Software Systems: “Intel Trace Analyzer and Collector have proven to be very valuable tools to help understand FEKO parallel communication patterns and consequently in optimizing the message passing call that result in an extremely well performing electromagnetic ISV cluster application.”

Dr. Ing. Ulrich Jakobus, Technical Director

Page 9

MPI Message Checking Case Study: LSTC LS-DYNA* Transient Finite Element Analysis Application

Images and logo copyright Livermore Software Technology Corporation

“At LSTC we know how difficult MPI programming can be and invest considerable effort into making LS-Dyna robust. Message Checking with Intel Trace Analyzer and Collector identified a very subtle issue before it became a problem, saving us a significant amount of potential future debugging. No other tool of which I am aware has this capability or could have detected this problem.”

Brian Wainscott, Developer, LSTC/LS-Dyna*

Overview: LSTC LS-DYNA* is a general-purpose transient dynamic finite element program capable of simulating complex real-world problems in automobile design, aerospace, manufacturing, and bioengineering.

Solution: Intel Trace Analyzer and Collector helped LSTC to debug its MPI code and create optimized code.

Page 10

Intel® Math Kernel Library Cluster Edition 9.0

A highly optimized math library for applications that demand maximum performance

▪ What’s New

– Optimizations for the new multi-core Intel® Xeon® 5100 and 5300 series processors

– New VML functions

• floor, ceil, round, trunc, hypot, etc.

– New FGMRES iterative sparse solver

– FFTW interfaces for Fortran and C

– New User’s Guide and Linux man pages

ABAQUS: “By adopting the Intel MKL DGEMM libraries, our standard timing improved between 43% and 71%, which is very impressive.”

Matt Dunbar, Software Developer

Page 11

Cluster OpenMP* for Intel® C++ and Fortran Compilers

INNOVATION for OpenMP: The first commercially available OpenMP for clusters

RWTH Aachen University: “RWTH Aachen has used OpenMP to parallelize many of our scientific applications because it is easier to use than MPI and provides comparable performance on large shared-memory machines. We are in the process of evaluating Intel's Cluster OpenMP. We believe that Cluster OpenMP will allow some of our OpenMP applications to run on clustered Intel processors at lower cost and with less effort than either rewriting in MPI or buying additional large SMP machines.”

Dieter an Mey, RWTH Aachen University

Available as an add-on to Intel Compilers!

▪ Features

– Brings the ease of OpenMP to cluster systems

– Run (slightly modified) OpenMP code on a commodity cluster

– Exploit existing SMP OpenMP codes on cheaper clusters

– For suitable programs, OpenMP performance equivalent to an SMP machine with the same number of CPUs

▪ Suitable programs

– Scale with OpenMP on SMP

– Have good data locality

– Use synchronization sparingly

Page 12

Applies easily to the majority of OpenMP* codes

▪ Only one new statement, “sharable”, is required

– Used at the declaration (or allocation) point of variables that are shared between threads

• In many cases the compiler can deduce the need for a sharable qualification and introduce it automatically

– As with OpenMP, the ported code is still a valid serial program

– As an example, we internally took the SPEC OMPM benchmarks (SPEC OpenMP) and “ported” them to use Cluster OpenMP.

• 9 of the 11 ported easily and scaled well. The other 2 would need non-trivial work to scale well. We think this is typical: most OpenMP codes will port easily and harness small clusters well, and with some effort OpenMP can be used in a way that makes this work.

• Only about 2% of source lines needed to be changed.

• The largest code (FMA3D, ~60,000 lines) needed no source code changes at all (a global compiler switch was sufficient to port all the code to clusters).

Page 13

Intel® MPI Library

▪ Linux

▪ Windows

▪ Itanium® 2

▪ Xeon™/EM64T

▪ Pentium® 4

Page 14

What is MPI?

▪ “MPI is a de facto standard for communication among the processes modeling a parallel program on a distributed memory system. Often these programs are mapped to clusters and distributed memory supercomputers.” (from Wikipedia)

▪ Features

– Explicit communication and synchronization

– Explicit distribution of data

– Collective operations

– One-sided communication

– Parallel I/O

▪ Bindings

– C / C++ / Fortran
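To make "collective operations" concrete, the following Python sketch simulates a sum reduction onto rank 0 over a binomial tree, a textbook way to implement a reduce in ⌈log2 P⌉ communication rounds. It models ranks as list entries instead of calling a real MPI library, and it is not a description of how Intel MPI implements its collectives.

```python
def tree_reduce(values):
    """Simulate a binomial-tree sum reduction onto rank 0.

    values[r] is the contribution of rank r. Returns (total, rounds, messages),
    where messages lists (sender, receiver, round) for every simulated send.
    """
    partial = list(values)      # partial[r]: running sum held by rank r
    size = len(partial)
    messages = []
    step, rounds = 1, 0
    while step < size:
        # In each round, every rank that is a multiple of 2*step receives
        # the partial sum of the rank `step` positions to its right.
        for rank in range(0, size, 2 * step):
            partner = rank + step
            if partner < size:
                partial[rank] += partial[partner]
                messages.append((partner, rank, rounds))
        step *= 2
        rounds += 1
    return partial[0], rounds, messages


if __name__ == "__main__":
    total, rounds, msgs = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8])
    print(total, rounds, len(msgs))  # 36 3 7
```

With P ranks the tree needs ⌈log2 P⌉ rounds and P - 1 messages in total, whereas a naive root that receives from every rank in turn performs P - 1 sequential receives.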

Page 15

Intel MPI Library?

[Diagram: Intel® MPI atop an abstract fabric. Applications A to F run on Intel® MPI, which sits on an abstract fabric layer over TCP/IP, shared memory, InfiniBand, Myrinet, Quadrics, and other networks.]

▪ Customers select the interconnect at runtime

▪ ISVs see and support a single interconnect

▪ IHVs create DAPL providers and fabric drivers
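The layering in this diagram can be sketched in Python: application code is written against one abstract fabric interface, and the concrete fabric is chosen at runtime, falling back to TCP/IP. The class names, the `FABRIC` environment variable, and the selection order are hypothetical illustrations, not Intel MPI's actual device names or configuration variables.

```python
import os
from abc import ABC, abstractmethod


class Fabric(ABC):
    """Abstract fabric: the only interface the application code sees."""

    name = "abstract"

    @abstractmethod
    def available(self) -> bool:
        """Whether this fabric can be used for the current connection."""

    def send(self, dest: int, payload: bytes) -> str:
        # A real provider would move the bytes; here we only report the path.
        return f"{len(payload)} bytes to rank {dest} via {self.name}"


class SharedMemory(Fabric):
    name = "shm"

    def __init__(self, same_node: bool):
        self.same_node = same_node

    def available(self) -> bool:
        return self.same_node


class InfiniBand(Fabric):
    name = "infiniband"

    def __init__(self, has_adapter: bool):
        self.has_adapter = has_adapter

    def available(self) -> bool:
        return self.has_adapter


class Tcp(Fabric):
    name = "tcp"

    def available(self) -> bool:
        return True  # always-present fallback


def select_fabric(candidates, requested=None):
    """Return the requested fabric if usable, else the first available one."""
    if requested:
        for fabric in candidates:
            if fabric.name == requested and fabric.available():
                return fabric
    for fabric in candidates:
        if fabric.available():
            return fabric
    raise RuntimeError("no usable fabric")


if __name__ == "__main__":
    fabrics = [SharedMemory(same_node=False), InfiniBand(has_adapter=True), Tcp()]
    # FABRIC is a made-up variable standing in for a runtime selection setting.
    chosen = select_fabric(fabrics, requested=os.environ.get("FABRIC"))
    print(chosen.send(1, b"hello"))
```

Because the application only ever calls `send` on the abstract type, swapping the interconnect requires no change to application code, which is the property the slide attributes to the ISV view.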

Page 16

Intel MPI Library 3.0 enhancements

▪ Increased Application Performance

– Fine-tuning through environment variables

– Faster start-up

▪ Optimized Collective Operations

▪ Improved Stability and Correctness

▪ Increased Interoperability

– Thread-safe libraries at the MPI_THREAD_MULTIPLE level

– Support for Etnus* TotalView*, DDT*, and Intel debuggers

– Simplified process management through integration with leading job schedulers (LSF*, PBS Pro*, Torque*)

▪ Enhanced Operating System and Compiler Support

▪ Improved Support for the OpenFabrics stack

Page 17

Advantages for Developers

– Reduce development and testing costs

– Increase productivity and functionality

– Simplify maintenance

Eliminate the need to develop, maintain, and test an application separately on each supported fabric, saving resources and improving product quality

Page 18

Intel Trace Analyzer and Collector

▪ Linux

▪ Windows (Trace Analyzer GUI only)

▪ Itanium® 2

▪ Xeon™/EM64T

▪ Pentium® 4

Page 19

Trace Universe

[Diagram: Application → Intel Trace Collector → Tracefile → Intel Trace Analyzer]

Page 20

Components and Interaction

[Diagram: Application source, optionally calling the Intel® Trace Collector API, passes through the Compiler and Linker, which link in the Intel® Trace Collector library to produce an Instrumented Executable. Alternatively, an existing Executable is instrumented with itcinstrument. Running the instrumented executable writes Traces in STF format.]

Page 21

Intel® Trace Collector Overview

– Event based approach

• Event = time stamp + thread ID + description

• Function entry/exit

• Messages

• Collective operations

• Counter samples

– Low impact on application performance

– Provides API to instrument user code

– Traces optimized program runs

– Traces the MPI communication layer (default)
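The event-based approach can be illustrated with a small data structure: each record carries a time stamp, a thread ID, and a description, and matching function entry/exit records per thread yields the duration of each call. This is a hypothetical in-memory model for illustration only; it is neither the STF trace format nor the Trace Collector API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One trace record: time stamp + thread ID + description."""
    time: float   # seconds since the start of the run
    thread: int   # process/thread ID
    kind: str     # "enter" or "leave" for function events
    name: str     # function name, e.g. "MPI_Send"


def call_durations(events):
    """Match enter/leave pairs per thread; return (thread, name, start, end)."""
    stacks = {}   # thread -> stack of open (name, start) frames
    calls = []
    for ev in sorted(events, key=lambda e: e.time):
        stack = stacks.setdefault(ev.thread, [])
        if ev.kind == "enter":
            stack.append((ev.name, ev.time))
        elif ev.kind == "leave":
            name, start = stack.pop()
            if name != ev.name:
                raise ValueError(f"unbalanced trace on thread {ev.thread}")
            calls.append((ev.thread, name, start, ev.time))
    return calls


if __name__ == "__main__":
    trace = [
        Event(0.0, 0, "enter", "main"),
        Event(1.0, 0, "enter", "MPI_Send"),
        Event(1.5, 0, "leave", "MPI_Send"),
        Event(2.0, 0, "leave", "main"),
    ]
    for thread, name, start, end in call_durations(trace):
        print(f"thread {thread}: {name} took {end - start:.1f} s")
```

Keeping one stack per thread is what lets interleaved records from different processes be matched correctly.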

Page 22

Key Features

▪ Catches all MPI events

▪ Powerful configuration mechanism

– Filters, settings, features

▪ Automatic source-code references

▪ Instrumentation

– Rich API

– Binary instrumentation (itcinstrument)

– Compiler-based (beta)

▪ Fail-safe version

▪ Comparison feature

Page 23

Intel Trace Analyzer Overview

▪ Enables the user to quickly focus on the appropriate level of detail to find performance hotspots and bottlenecks

▪ Hierarchical displays address scalability in time and in processor space

▪ High-performance graphics, excellent zooming and filtering

▪ Windows version of the Graphical User Interface

Page 24

Chart

A Chart is a numerical or graphical diagram

Page 25

Timelines: Event Timeline

▪ Get an impression of the program structure

▪ Display functions, messages, and collective operations for each process/thread along the time axis

▪ Retrieve detailed event information

Page 26

Timelines: Qualitative Timeline

▪ Find patterns and irregularities

▪ Display attributes of functions, messages, or collective operations as they occur for any process/thread

▪ Retrieve detailed event information

Page 27

Timelines: Quantitative Timeline

▪ Get an impression of parallelism and load balance

▪ Show, for every function, how many threads/processes are executing it at each point in time
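A quantitative timeline amounts to counting, at every instant, how many processes are inside a given function. A sweep over the sorted start and end times of the calls computes this step function. The sketch assumes calls have already been reconstructed as (start, end) intervals and is not the Trace Analyzer's implementation.

```python
def concurrency_profile(intervals):
    """Return [(time, active)] steps: how many calls are open from `time` on.

    intervals: iterable of (start, end) pairs for one function, one per call.
    Ends are processed before starts at the same instant, so back-to-back
    calls do not count as overlapping.
    """
    points = []
    for start, end in intervals:
        points.append((start, 1))
        points.append((end, -1))
    points.sort(key=lambda p: (p[0], p[1]))   # -1 sorts before +1 at a tie
    steps, active = [], 0
    for time, delta in points:
        active += delta
        if steps and steps[-1][0] == time:
            steps[-1] = (time, active)         # merge changes at the same time
        else:
            steps.append((time, active))
    return steps


if __name__ == "__main__":
    # Four processes in MPI_Barrier; the last one arrives late (load imbalance).
    barrier = [(1.0, 3.0), (1.2, 3.0), (1.5, 3.0), (2.9, 3.0)]
    print(concurrency_profile(barrier))
    # [(1.0, 1), (1.2, 2), (1.5, 3), (2.9, 4), (3.0, 0)]
```

In the example, three processes sit in the barrier for most of the interval while one is still computing, which is exactly the kind of picture the quantitative timeline is meant to expose.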

Page 28

Profiles: Flat Function Profile

▪ Statistics about functions
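A flat function profile reduces the event stream to per-function statistics. The sketch below assumes a single thread's time-ordered (time, kind, name) records and computes the call count, inclusive time, and exclusive (self) time of each function; the tuple layout is a hypothetical simplification, and recursive functions would need extra care because their inclusive time would be counted once per nesting level.

```python
from collections import defaultdict


def flat_profile(events):
    """Per-function call count, inclusive time and exclusive (self) time.

    events: time-ordered (time, kind, name) tuples for one thread, where kind
    is "enter" or "leave". Exclusive time excludes time spent in callees.
    """
    stats = defaultdict(lambda: {"calls": 0, "incl": 0.0, "excl": 0.0})
    stack = []  # open frames: [name, start_time, time_spent_in_callees]
    for time, kind, name in events:
        if kind == "enter":
            stack.append([name, time, 0.0])
            continue
        frame_name, start, callee_time = stack.pop()
        if frame_name != name:
            raise ValueError(f"leave {name!r} does not match enter {frame_name!r}")
        inclusive = time - start
        entry = stats[name]
        entry["calls"] += 1
        entry["incl"] += inclusive
        entry["excl"] += inclusive - callee_time
        if stack:
            stack[-1][2] += inclusive  # charge this call to its caller
    return dict(stats)


if __name__ == "__main__":
    events = [(0, "enter", "main"), (1, "enter", "compute"), (4, "leave", "compute"),
              (4, "enter", "MPI_Allreduce"), (6, "leave", "MPI_Allreduce"),
              (7, "leave", "main")]
    print(flat_profile(events)["main"])  # {'calls': 1, 'incl': 7.0, 'excl': 2.0}
```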

Page 29

Profiles: Call-Tree and Call-Graph

▪ Function statistics, including the calling hierarchy

– Tree: call stack

– Graph: calling dependencies

Page 30

Communication Profiles

▪ Statistics about point-to-point or collective communication

▪ Generic matrix supports grouping by several attributes in each dimension: sender, receiver, data volume per message, tag, communicator, type

▪ Available attributes: count, bytes transferred, time, transfer rate
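The generic matrix can be sketched as a group-by over message records: one attribute per dimension (by default sender and receiver) and aggregated count, bytes, time, and transfer rate per cell. The `Message` record and its fields are assumptions made for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int
    receiver: int
    tag: int
    nbytes: int
    duration: float  # seconds from send start to receive completion


def comm_matrix(messages, row="sender", col="receiver"):
    """Aggregate messages into {(row_value, col_value): statistics}."""
    cells = defaultdict(lambda: {"count": 0, "bytes": 0, "time": 0.0})
    for msg in messages:
        cell = cells[(getattr(msg, row), getattr(msg, col))]
        cell["count"] += 1
        cell["bytes"] += msg.nbytes
        cell["time"] += msg.duration
    for cell in cells.values():
        # Transfer rate in bytes per second, aggregated over the whole cell.
        cell["rate"] = cell["bytes"] / cell["time"] if cell["time"] > 0 else 0.0
    return dict(cells)


if __name__ == "__main__":
    msgs = [Message(0, 1, 7, 1000, 0.001), Message(0, 1, 7, 3000, 0.001),
            Message(1, 0, 9, 500, 0.0005)]
    for key, stats in sorted(comm_matrix(msgs).items()):
        print(key, stats)
```

Passing `col="tag"` (or any other field name) regroups the same records along a different dimension, which is the flexibility the slide describes.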

Page 31

View

▪ Helps you navigate through the trace data and keep your orientation

▪ Every View can contain several Charts

▪ A View on a file is defined by a triplet of

– time span

– set of threads

– set of functions

▪ All Charts follow changes to the View (e.g., zooming)

▪ Timelines are correctly aligned along the time axis
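Because a View is a triplet of time span, thread set, and function set, it can be modeled as one filter that every Chart applies to the same events; zooming returns a new View with a narrower time span, so all Charts follow it automatically. The `View` class and the (time, thread, function) event tuples are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class View:
    """A View: (time span, set of threads, set of functions)."""
    start: float
    end: float
    threads: frozenset
    functions: frozenset

    def contains(self, time, thread, function):
        return (self.start <= time <= self.end
                and thread in self.threads
                and function in self.functions)

    def zoom(self, start, end):
        """Narrow the time span; the thread and function sets are kept."""
        return replace(self, start=max(start, self.start), end=min(end, self.end))


def visible(events, view):
    """Events (time, thread, function) that every Chart of `view` would show."""
    return [event for event in events if view.contains(*event)]


if __name__ == "__main__":
    events = [(0.5, 0, "MPI_Send"), (1.5, 1, "MPI_Recv"), (2.5, 0, "compute"),
              (3.5, 2, "MPI_Send")]
    view = View(0.0, 4.0, frozenset({0, 1}),
                frozenset({"MPI_Send", "MPI_Recv", "compute"}))
    print(visible(events, view.zoom(1.0, 2.0)))  # [(1.5, 1, 'MPI_Recv')]
```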

Page 32

View: zooming

Page 33

Flexibility of Views

▪ Several Views can be opened (on the same or on different files)

▪ Location, orientation, and size of Charts can easily be changed

▪ Entire Views and individual Charts can be cloned and closed

▪ Individual Charts can be cloned into their own View

Page 34

Understanding your code

▪ Parallel Poisson

▪ Example of intuitive parallelization with a disadvantageous communication pattern

Page 35

Understanding the problem

▪ Blocking border exchange

▪ Pn blocks until the communication between Pn+1 and Pn+2 has completed

▪ Solution: non-blocking communication
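Why a blocking border exchange serializes can be seen with a small timing model. Assume each rank first sends to its right neighbour with a synchronous (rendezvous) send and only then receives from its left neighbour, and that every transfer takes the same time `t`. The last rank has nothing to send, so it receives at once; that unblocks its left neighbour, and so on, so completion times grow linearly along the chain. With non-blocking calls (post MPI_Isend/MPI_Irecv, then wait) all transfers overlap. This is a simplified one-directional model of the slide's pattern, not an MPI program.

```python
def blocking_chain(nprocs, t):
    """Finish time of each rank for: synchronous Send to r+1, then Recv from r-1."""
    # send_done[r]: when rank r's Send to r+1 completes, which is also when
    # rank r posts its Recv from r-1. The last rank has nothing to send.
    send_done = [0.0] * nprocs
    for r in range(nprocs - 2, -1, -1):  # resolve the chain right to left
        # The Send can only be matched once rank r+1 has posted its Recv.
        send_done[r] = send_done[r + 1] + t
    # Rank r's Recv completes when rank r-1's Send completes; rank 0 has no Recv.
    return [send_done[r - 1] if r > 0 else send_done[0] for r in range(nprocs)]


def nonblocking_chain(nprocs, t):
    """Finish times when every rank posts Isend/Irecv at time 0, then waits."""
    return [t if nprocs > 1 else 0.0 for _ in range(nprocs)]


if __name__ == "__main__":
    print(blocking_chain(4, 1.0))     # [3.0, 3.0, 2.0, 1.0]
    print(nonblocking_chain(4, 1.0))  # [1.0, 1.0, 1.0, 1.0]
```

Under these assumptions, 16 ranks with t = 0.1 ms need 1.5 ms for the blocking chain against 0.1 ms with non-blocking calls, and the gap grows with the number of ranks.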

Page 36

Detecting Load Imbalance

▪ Mandelbrot set (MPI tutorial)

– mpitutorial.tar.gz

▪ Example of intuitive parallelization with a huge load imbalance
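The imbalance in the Mandelbrot example comes from giving each rank a contiguous band of rows: rows through the middle of the image contain many points of the set, each costing the full iteration limit, while outer rows escape quickly. The Python sketch below, which does not use MPI, estimates per-row cost and compares the busiest rank's load with the mean for a block and a cyclic (round-robin) row distribution. The image extent, resolution, and iteration limit are illustrative assumptions, not the values used in mpitutorial.tar.gz.

```python
def mandelbrot_row_work(width, height, max_iter=100):
    """Iterations needed per image row: a proxy for per-row compute cost."""
    work = []
    for row in range(height):
        y = -1.5 + 3.0 * row / (height - 1)
        total = 0
        for col in range(width):
            c = complex(-2.0 + 3.0 * col / (width - 1), y)
            z, n = 0j, 0
            while n < max_iter and abs(z) <= 2.0:
                z = z * z + c
                n += 1
            total += n
        work.append(total)
    return work


def imbalance(work, nprocs, assign):
    """max/mean load over ranks; assign(row, nrows, nprocs) gives the owner."""
    load = [0] * nprocs
    for row, w in enumerate(work):
        load[assign(row, len(work), nprocs)] += w
    return max(load) / (sum(load) / nprocs)


def block(row, nrows, nprocs):
    # Contiguous bands of rows: the "intuitive" decomposition.
    return row * nprocs // nrows


def cyclic(row, nrows, nprocs):
    # Round-robin rows: spreads the expensive middle rows over all ranks.
    return row % nprocs


if __name__ == "__main__":
    work = mandelbrot_row_work(64, 64)
    print(f"block:  {imbalance(work, 4, block):.2f}")
    print(f"cyclic: {imbalance(work, 4, cyclic):.2f}")
```

A ratio of 1.0 means perfect balance; with the block split, the ranks owning the middle bands dominate the run time while the others wait, which is what the trace on the following slide makes visible.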

Page 37

Understanding Load Imbalance

Page 38

Intel® Trace Analyzer and Collector 7.0 New Feature: Comparison of two program runs

[Screenshot callouts: timeline of the initial application run; timeline of the optimized application run; comparison of function and process profile data; network usage profile data for MPI messages]

Shorter red bars mean less MPI traffic and increased performance.

Works on systems from 2 processes to more than a thousand processes.
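Conceptually, comparing two runs means aligning the same function across two profiles and reporting the change. The sketch assumes each profile is a plain dictionary mapping function name to total time in seconds; the trace-file comparison in Trace Analyzer 7.0 works on full traces and offers much more than this.

```python
def compare_profiles(before, after):
    """Per-function (before, after, delta, relative change) for two runs.

    before/after: dict mapping function name -> total time in seconds.
    A function missing from one run counts as 0.0 there.
    """
    rows = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, 0.0)
        new = after.get(name, 0.0)
        if old:
            rel = (new - old) / old
        else:
            rel = float("inf") if new else 0.0  # function only in `after`
        rows[name] = (old, new, new - old, rel)
    return rows


if __name__ == "__main__":
    initial = {"MPI_Recv": 4.0, "compute": 10.0}
    optimized = {"MPI_Recv": 1.0, "compute": 10.0, "MPI_Wait": 0.5}
    for name, (old, new, delta, rel) in compare_profiles(initial, optimized).items():
        print(f"{name:12s} {old:6.2f} -> {new:6.2f} s ({rel:+.0%})")
```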

Page 39

Intel Message Checking with Intel Trace Analyzer and Collector

▪ A novel MPI correctness tool

– Detects errors with data types, buffers, communicators, point-to-point and collective operations, deadlocks, and hangs

▪ Online-based

– MPI correctness checking using the IMC library for Intel® Trace Collector

– All error checking done at runtime

▪ Offline analysis

– Interactive debugging using a traditional debugger

– Text error output for analysis

– Automates detection of errors

▪ Platforms

– Intel MPI Library on Linux*; IA-32, Intel® EM64T, IPF
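One classic way a runtime checker can recognize a deadlock is a wait-for graph: an edge from rank A to rank B means A is blocked in a call that only B can complete, such as a blocking receive from B. A cycle among blocked ranks means none of them can make progress. The sketch below shows that generic technique only; it is not a description of how the IMC library works internally.

```python
def find_deadlock(waits_for):
    """Return a list of ranks forming a wait-for cycle, or None.

    waits_for: dict rank -> rank it is blocked on (absent if not blocked),
    e.g. rank 0 stuck in MPI_Recv(source=1) is recorded as {0: 1}.
    """
    for start in waits_for:
        path, seen = [], set()
        rank = start
        # Follow the chain of dependencies until it leaves the blocked set
        # or returns to a rank already on the current path.
        while rank in waits_for and rank not in seen:
            seen.add(rank)
            path.append(rank)
            rank = waits_for[rank]
        if rank in seen:
            return path[path.index(rank):]
    return None


if __name__ == "__main__":
    # Ranks 0, 1, 2 each wait on the next one: a classic receive cycle.
    print(find_deadlock({0: 1, 1: 2, 2: 0}))  # [0, 1, 2]
```

Each blocked rank has a single outgoing edge here, which fits blocking point-to-point calls; operations such as collectives or MPI_Waitany would need a richer (AND/OR) dependency model.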

Page 40

Intel® Math Kernel Library Cluster Edition

▪ Linux

▪ Windows

▪ Itanium® 2

▪ Xeon™/EM64T

▪ Pentium® 4

Page 41

Intel® Math Kernel Library Cluster Edition

▪ All the functionality of Intel® MKL plus …

ScaLAPACK

– ScaLAPACK is for solving dense linear systems and computing eigenvalues for dense matrices

– Optimized version for Intel® processors

Page 42

ScaLAPACK

▪ “Scalable LAPACK”, or LAPACK for distributed-memory computer systems

▪ The standard for linear algebra problem solutions on clusters

▪ Netlib*

– Standard publicly available implementation of ScaLAPACK

▪ Performance (PDGETRF function, parallel LU factorization)

– Intel Cluster MKL significantly outperforms the Netlib* implementation

• >20% faster for block sizes of 64 to 128

• >50% faster for block sizes of 256 or greater

– Intel Cluster MKL is much less sensitive to block size differences

• Intel Cluster MKL performs well on a wide range of block sizes

Configuration info:

• Cluster of four 4-way Intel Itanium® 2 nodes, 1.4 GHz, 16 GB memory

• Red Hat Linux* Advanced Server release 2.1AS

Linear Algebra
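The block sizes quoted above refer to ScaLAPACK's block-cyclic data distribution: a global matrix dimension is cut into blocks of `nb` rows (or columns), and the blocks are dealt out to the process rows (or columns) of the grid in round-robin order, so the block size trades load balance against per-block efficiency. The sketch shows the standard one-dimensional mapping from a 0-based global index to its owner and local index, assuming the first block lives on process 0. It mirrors the logic of ScaLAPACK's INDXG2P/INDXG2L tool routines but is not a binding to them.

```python
def owner(i, nb, nprocs):
    """Process (grid row or column) that owns 0-based global index i."""
    return (i // nb) % nprocs


def local_index(i, nb, nprocs):
    """0-based position of global index i inside its owner's local array."""
    block = i // nb                      # global block number
    return (block // nprocs) * nb + i % nb


def local_length(n, nb, p, nprocs):
    """How many of the n global indices process p stores (cf. NUMROC)."""
    return sum(1 for i in range(n) if owner(i, nb, nprocs) == p)


if __name__ == "__main__":
    n, nb, nprocs = 10, 2, 3
    print([owner(i, nb, nprocs) for i in range(n)])
    # [0, 0, 1, 1, 2, 2, 0, 0, 1, 1]
    print([local_length(n, nb, p, nprocs) for p in range(nprocs)])  # [4, 4, 2]
```

A larger `nb` means fewer, bigger blocks per process (better BLAS efficiency) but coarser load balance at the edges of the matrix, which is why performance can depend on block size at all.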

Page 43

Intel® MPI Benchmarks (open source)

– Successor of the well-known Pallas MPI benchmarks (PMB)

– Comprehensive set of MPI kernels that provide performance measurements for:

• Point-to-point message-passing

• Global data movement and computation routines

• One-sided communications

• File I/O

– Intel® MPI Benchmarks helps to compare the performance of various computing platforms, MPI implementations, and interconnection fabrics
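Point-to-point results from a ping-pong style test are commonly reduced to two numbers: latency, taken as half the measured round-trip time, and bandwidth, the message size divided by that one-way time. The sketch does only this arithmetic on hypothetical timings and uses 1 MB = 10^6 bytes by assumption; consult the Intel MPI Benchmarks documentation for the exact definitions and units the suite reports.

```python
def pingpong_metrics(nbytes, total_s, repetitions=1):
    """Return (one-way time in microseconds, bandwidth in MB/s).

    total_s: wall time for `repetitions` complete send/receive round trips.
    Bandwidth uses 1 MB = 10**6 bytes (an assumption of this sketch).
    """
    one_way_s = total_s / repetitions / 2.0   # half of one round trip
    bandwidth = nbytes / one_way_s / 1e6 if nbytes else 0.0
    return one_way_s * 1e6, bandwidth


if __name__ == "__main__":
    # Hypothetical measurement: 1000 round trips of 1 MB took 2.0 s in total.
    latency_us, mb_per_s = pingpong_metrics(1_000_000, 2.0, repetitions=1000)
    print(f"{latency_us:.1f} us one-way, {mb_per_s:.1f} MB/s")
```

For the hypothetical numbers above this gives about 1000 µs one-way and about 1000 MB/s; with a 0-byte message only the latency figure is meaningful.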

Page 44

Conclusion and Next Steps

▪ Intel® Software Development tools help make software faster and developers more productive

– Gain competitive advantage

– Reduce development and deployment investment

– Increase productivity with profiling tools and libraries

Learn more and download evals at: www.intel.com/software/products