Regression Models - TU Dresden

Post on 20-Jul-2020

Matthias Müller

Center for Information Services and High Performance Computing (ZIH)

Lecture: Performance Analysis

Parallel SPEC Benchmarks

Regression Models

Parallel SPEC Benchmarks

SPEC OMP

Holger Brunst, Matthias Müller: Performance Analysis

SPEC OMP

Benchmark suite developed by SPEC HPG

Benchmark suite for performance testing of shared memory processor systems

Uses OpenMP versions of SPEC CPU2000 benchmarks

SPEC OMP mixes integer and FP in one suite

OMPM is focused on 4-way to 16-way systems

OMPL is targeting 32-way and larger systems


SPEC OMP Applications

| Code | Application | Language | Lines |
|---|---|---|---|
| ammp | Molecular dynamics | C | 13500 |
| applu | CFD, partial LU | Fortran | 4000 |
| apsi | Air pollution | Fortran | 7500 |
| art | Image recognition / neural networks | C | 1300 |
| fma3d | Crash simulation | Fortran | 60000 |
| gafort | Genetic algorithm | Fortran | 1500 |
| galgel | CFD, Galerkin FE | Fortran | 15300 |
| equake | Earthquake modeling | C | 1500 |
| mgrid | Multigrid solver | Fortran | 500 |
| swim | Shallow water modeling | Fortran | 400 |
| wupwise | Quantum chromodynamics | Fortran | 2200 |


CPU2000 vs OMPL2001

SPEC MPI2007


An application benchmark suite that measures:

– Type of computer processor

– Number of computer processors

– Communication interconnect

– Memory architecture

– Compilers

– MPI library performance

– File system performance

Identifying Candidate Applications

– From SPEC CPU2006

– Through a search program (call for candidate applications)

MPI2007 design goals: benchmark for distributed memory


Comparison of Different Benchmarks using MPI

| | SPEC MPI | NPB | HPCC |
|---|---|---|---|
| Number of applications | 13 | 8 | 7 |
| Language | F77, F90, C, C++ | F77, C | C |
| Code size (lines) | ~530,000 | 28,000 | 47,200 |
| Number of MPI calls in the code | ~2400 | ~400 | ~600 |
| Number of different MPI calls in the code | ~59 | ~36 | ~44 |


Application Fields

– Computational fluid dynamics

– Quantum chromodynamics

– Climate modeling

– Ray tracing

– Molecular Dynamics

– Weather prediction

– Heat transfer

– Hydrodynamics

– Flow Simulation


MPI2007 Benchmark Goals

– Runs on clusters or SMPs

– Validates for correctness and measures performance

– Supports 32-bit or 64-bit OS/ABI

– Consists of applications drawn from national labs and university research centers

– Supports a broad range of MPI implementations and operating systems, including Windows, Linux, and proprietary Unix

– Has a runtime of ~1 hour per benchmark test at 16 ranks using GigE, with a 1 GB memory footprint per rank

– Scales to 128 ranks

– Is extensible to future large and extreme data sets planned to cover larger numbers of ranks


MPI2007 – tested for portability

– Architectures:

• Opteron, Xeon, Itanium2, PA-RISC, Power5, SPARC

– Interconnects:

• Ethernet, InfiniBand, InfiniPath, SGI NUMAlink, and shared memory.

– Operating systems:

• Linux (RH FC3, SLES9/10, SUSE 9.3), Windows CCS, HP-UX, Solaris, AIX

– MPI implementations

• HP-MPI, MPICH, MPICH2, Open MPI, IBM-MPI, Intel MPI, MPICH-GM, MVAPICH, Fujitsu MPI, InfiniPath MPI, SGI MPT

– Compilers:

• SUN Studio, Fujitsu, Intel, PathScale, PGI, HP, and IBM compilers.


MPI2007 – tested for scalability

– Scalable from 16 to 128 ranks (processes) for medium data set

– Runtime of 1 hour per benchmark test at 16 ranks using GigE on an unspecified reference cluster.

– Memory footprint should be < 1GB per rank at 16 ranks.

– Exhaustively tested for each rank count: 12, 15 -> 130, 140, 160, 180, 200, 225, 256, 512


Overview of the applications

| Code | LOC | Language | MPI call sites | MPI calls | Area |
|---|---|---|---|---|---|
| 104.milc | 17987 | C | 51 | 18 | Lattice QCD |
| 107.leslie3d | 10503 | F77, F90 | 43 | 13 | Combustion |
| 113.GemsFDTD | 21858 | F90 | 237 | 16 | Electrodynamic simulation |
| 115.fds4 | 44524 | F90, C | 239 | 15 | CFD |
| 121.pop2 | 69203 | F90 | 158 | 17 | Geophysical fluid dynamics |
| 122.tachyon | 15512 | C | 17 | 16 | Ray tracing |
| 126.lammps | 6796 | C++ | 625 | 25 | Molecular dynamics |
| 127.wrf2 | 163462 | F90, C | 132 | 23 | Weather forecast |
| 128.GAPgeofem | 30935 | F77, C | 58 | 18 | Geophysical FEM |
| 129.tera_tf | 6468 | F90 | 42 | 13 | Eulerian hydrodynamics |
| 130.socorro | 91585 | F90 | 155 | 20 | Density-functional theory |
| 132.zeusmp2 | 44441 | C, F90 | 639 | 21 | Astrophysical CFD |
| 137.lu | 5671 | F90 | 72 | 13 | SSOR |


MPI2007 Benchmark dynamic message call counts


Pt2Pt Communication Statistics: 122.tachyon (ray tracing)


Pt2Pt Communication Statistics: 107.leslie3D (combustion)


Pt2Pt Communication Statistics: 113.GemsFDTD (electrodynamics)


Message Length Statistics (Pt2Pt)

Available Results


Available Results (blind submission)

– AMD A2210 Reference Platform (16 cores)

• Gigabit Ethernet

• Single Core AMD Opteron 848, 2.2 GHz

– SGI Altix 4700 (16-128 cores)

• SGI Numalink, SGI MPT 1.15

• Dual-Core Intel Itanium II 9040, 1.6 GHz

– HP Proliant BL460c Blade Cluster Platform 3000 BL (16-256 cores)

• Infiniband DDR, HP-MPI 2.2.5

• Dual-Core Intel Xeon 5160, 3.0 GHz

– QLogic, U. Cambridge Darwin Cluster (32-512 cores)

• Infinipath, QLogic Infinipath MPI library 2.0

• Dual-Core Intel Xeon 5160, 3.0 GHz

– QLogic, AMD Emerald Cluster (32-512 cores)

• Infinipath, QLogic Infinipath MPI library 2.1

• Dual-Core AMD Opteron 290, 2.8 GHz


Scales to 128, works on 512


Scalability on U. Cambridge's Darwin Cluster (II)


Scalability on HP Cluster


Summary and Conclusion

SPEC MPI2007 properties:

– Application benchmark with 13 different codes

– Run and reporting rules for reproducibility

– Tested on a wide range of platforms:

• CPU and Node Architectures

• Interconnects

• Compilers

• MPI implementations

– Available dataset (medium) scales to 128 ranks

– Next steps:

• Large dataset with enhanced scalability for larger systems

• …

Use Cases


Use cases

– Performance trends

– Compiler and performance

– Comparing different Itanium systems

– Comparing different system generations


Where Does the Performance Go? or Why Should I Care About the Memory Hierarchy?

Processor-DRAM memory gap (latency):

– Processor (CPU) performance: 60%/yr. (2X/1.5 yr), following Moore's Law

– DRAM performance: 9%/yr. (2X/10 yrs)

– Processor-memory performance gap: grows 50% / year
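The growth figures on this slide can be related to each other with a little compound-growth arithmetic. The sketch below assumes constant annual growth rates (an idealization, not something the slide states) and uses hypothetical names such as `doubling_time`. It suggests that the slide's doubling times and the 50% gap figure are rounded values.

```python
import math

def doubling_time(rate_per_year):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate_per_year)

cpu_rate = 0.60   # processor performance growth per year (from the slide)
dram_rate = 0.09  # DRAM performance growth per year (from the slide)

cpu_doubling = doubling_time(cpu_rate)
dram_doubling = doubling_time(dram_rate)

# The gap grows with the ratio of the two yearly growth factors.
gap_growth = (1 + cpu_rate) / (1 + dram_rate) - 1

print(f"CPU doubles every {cpu_doubling:.2f} years")
print(f"DRAM doubles every {dram_doubling:.2f} years")
print(f"Gap grows by {gap_growth:.1%} per year")
```

Under this assumption the processor doubling time comes out close to the 1.5 years quoted, while 9% per year corresponds to roughly 8 rather than 10 years, and the gap grows by a little under 50% per year.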


Performance Trends measured by SPECint

Source: Hennessy, Patterson: "Computer Architecture: A Quantitative Approach".


CPUint2006 development between 2005 and 2009


Performance Trends measured by SPECint

Figure annotations: 2009, 23%


SPEC CPU benchmark development over time


CPUfp2006 development between 1991 and 2009

– CPU 95: released 1995, 602 results between 3/1991 and 1/2001

– CPUfp2000: released 2000, 1385 results between 10/1996 and 2/2007

– CPUfp2006: released 2006, 1217 results between 4/1997 and 4/2009

Figure annotations: 42%, 33%, 30%


Performance Trends over 20 years of code life cycle


Comparison OMPM base compilers


Influence of compilers on OMPM base 32-way results


Comparison OMPM on 32-way 1.5 GHz Itanium


SMP Performance Gain Itanium/Itanium 2


The history of NEC SX series

Figure: performance over time (1985, 1990, 1995, 1998, 2001, 2004) for the SX-1/2, SX-3, SX-4, SX-5, SX-6/7 and SX-8. Technology labels: bipolar water-cooled, CMOS air-cooled, single chip vector processor. Architecture labels: multi CPUs, multi nodes, single module node, large scale cluster (>100 nodes), massive scale cluster (>500 nodes). Further annotations: over 1 GFLOP per node; CPU sizes 45.7 cm, 38.6 cm, 2 cm.


Performance Properties of Different SX systems

| System | Availability | CPU perf. | Memory bandwidth per CPU | Node perf. | Memory bandwidth per node |
|---|---|---|---|---|---|
| SX-4 | 1996 | 2 GF/s | 16 GB/s | 64 GF/s | 512 GB/s |
| SX-5e | 1999 | 4 GF/s | 32 GB/s | 64 GF/s | 512 GB/s |
| SX-6 | 2001 | 8 GF/s | 32 GB/s | 64 GF/s | 256 GB/s |
| SX-6+ | 2002 | 9 GF/s | 36 GB/s | 72 GF/s | 324 GB/s |
| SX-8 | 2004 | 16 GF/s | 64 GB/s | 128 GF/s | 512 GB/s |

Annotations: factor 2 in two years; factor 2 in eight years


Properties of SPEC codes on vector systems

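| Name | Lang | Vratio | Vlen | MEM (MB) |
|---|---|---|---|---|
| Wupwise | F | 87.34 | 58.74 | 1488 |
| Swim | F | 99.75 | 253.48 | 1584 |
| Mgrid | F | 99.14 | 211.04 | 480 |
| Applu | F | 81.31 | 34.17 | 1520 |
| Galgel | F | 92.57 | 45.14 | 272 |
| Equake | C | 0.06 | 9.6 | 464 |
| Apsi | F | 76.70 | 23.02 | 1648 |
| Gafort | F | 40.25 | 59.60 | 1680 |
| Fma3d | F | 10.29 | 8.95 | 1040 |
| Art | C | 32.06 | 242.14 | 272 |
| Ammp | C | 76.67 | 102.79 | 176 |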


Expectations

Swim, mgrid and maybe galgel should perform well

Equake, fma3d and art should perform poorly

However, the focus was not on absolute, but relative performance and scalability


SPEC efficiency on SX


Performance measurements

All performance is reported relative to the performance of one thread on SX-4

Number of threads used:

– 1,2,4,8,16,32 on SX-4

– 1,2,4,8,16 on SX-5

– 1,2,4,8 on SX-6+

– 1,2,4,8 on SX-8


Wupwise – expected behavior

Same node performance of SX-4/5/6


Art – improves better than peak performance

Art benefits from improvements to the scalar unit


Swim – surprisingly improves with every generation

Compute bound on SX-4 and SX-5!


Mgrid – large improvements from SX-6+ to SX-8

Improved stride-2 memory access


Not much improvement from SX-4 to 5 and 6 to 8


Explanation for ammp improvements

Ammp contains a lot of locks

Lock performance (measured by EPCC microbenchmarks)

| System | Lock | Lock ratio | Ammp | Ammp ratio |
|---|---|---|---|---|
| SX-6+ | 4.3 µs | 1.23 | 2.82 | 1 |
| SX-8 | 3.5 µs | 1 | 3.40 | 1.21 |
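The two ratio columns can be reproduced from the raw values. The short sketch below does this, assuming the Ammp column is a relative performance figure (higher is better) and the lock column is a time (lower is better); the variable names are illustrative only.

```python
# Values taken from the table: lock time in microseconds and ammp result.
lock_us = {"SX-6+": 4.3, "SX-8": 3.5}
ammp = {"SX-6+": 2.82, "SX-8": 3.40}

# Lock time ratio (SX-6+ relative to SX-8): how much slower locks are on SX-6+.
lock_ratio = lock_us["SX-6+"] / lock_us["SX-8"]

# Ammp ratio (SX-8 relative to SX-6+): how much faster ammp runs on SX-8.
ammp_ratio = ammp["SX-8"] / ammp["SX-6+"]

print(f"lock ratio: {lock_ratio:.2f}")
print(f"ammp ratio: {ammp_ratio:.2f}")
```

Both ratios come out at about 1.2, which is consistent with the slide's explanation that faster locks account for the ammp improvement.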


General observations

With the exception of equake and galgel the applications show good scalability

Peak performance improvements

– 87% to 96% realized for 1 thread

– 81% to 89% realized for 8 threads

On average an SX-8 CPU is 6.14 times faster than an SX-4 CPU (peak ratio is 8)

No significant difference between scalar and vector codes

Summary for SPEC


Summary – What you should have learned

– There are many different benchmark approaches: microbenchmarks, kernels, applications…

– SPEC benchmarks are applications, or at least application-oriented benchmarks, designed to represent current workloads

• An update is required after a few years

– SPEC benchmarks are used to:

• Measure and compare performance of systems

• Drive future development

• …

– Different metrics are used (base/peak, speed/throughput)

– Many different factors have an influence on application performance:

• CPU

• Memory system

• Compilers

• OS and runtime environment

• I/O system

• …

Regression Models


Terms

Regression models make it possible to estimate or predict a random variable as a function of several other variables.

The estimated variable is called the response variable; the variables used to predict the response are called predictor variables, predictors or factors.


Simple Linear Regression Model

Predictor variable x and predicted response $\hat{y}$:

$$\hat{y} = b_0 + b_1 x$$

Regression parameters: $b_0$, $b_1$

Figure: plot of y versus x showing the measured y, the estimated y, and the error between them.


Definitions

n observation pairs:

$$\{(x_1, y_1), \ldots, (x_n, y_n)\}$$

Error:

$$e_i = y_i - \hat{y}_i$$

Sum of Squared Errors (SSE):

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

Mean Error:

$$\bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i = \frac{1}{n} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)$$

Best linear model minimizes SSE and has a mean error of zero.

Exercise: calculate regression parameters for best linear model


Calculation of Linear Regression Parameters

$$b_0 = \bar{y} - b_1 \bar{x}$$

$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$$
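The closed-form parameters of the best linear model, b1 = (sum of x*y - n * mean(x) * mean(y)) / (sum of x^2 - n * mean(x)^2) and b0 = mean(y) - b1 * mean(x), translate directly into a few lines of code. The following is a minimal sketch with made-up sample data; the function name `linear_fit` is an illustrative choice, not part of the lecture.

```python
def linear_fit(xs, ys):
    """Return (b0, b1) of the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (sum_xy - n * x_mean * y_mean) / (sum_xx - n * x_mean ** 2)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Made-up sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = linear_fit(xs, ys)
mean_error = sum(y - (b0 + b1 * x) for x, y in zip(xs, ys)) / len(xs)

print(f"y_hat = {b0:.2f} + {b1:.2f} x")
print(f"mean error = {mean_error:.1e}")
```

For this data the fit is b0 = 2.2 and b1 = 0.6, and the mean error vanishes up to rounding, as required for the best linear model.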


Coefficient of determination

Sum of Squared Errors (SSE):

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

SSE without regression would be (total sum of squares SST):

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Difference between SSE and SST is explained by regression:

$$SSR = SST - SSE$$

Coefficient of determination (the higher $R^2$, the better the regression):

$$R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST}$$
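As a worked example of the coefficient of determination R^2 = (SST - SSE) / SST, the sketch below computes SSE, SST and R^2 for a small made-up data set. The parameters b0 = 2.2 and b1 = 0.6 are the least-squares values for exactly this data; the function name `r_squared` is hypothetical.

```python
def r_squared(xs, ys, b0, b1):
    """Coefficient of determination for the line y_hat = b0 + b1 * x."""
    y_mean = sum(ys) / len(ys)
    # SSE: squared errors that remain after the regression.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # SST: squared errors when predicting the mean of y for every x.
    sst = sum((y - y_mean) ** 2 for y in ys)
    return (sst - sse) / sst

# Made-up sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = 2.2, 0.6  # least-squares parameters for this data

r2 = r_squared(xs, ys, b0, b1)
print(f"R^2 = {r2:.2f}")
```

Here SSE is 2.4 and SST is 6, so the regression explains 60% of the variation in y.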


Assumptions

The relationship between the response variable y and the predictor variable x is linear

The predictor variable x is measured without any error

The model errors are statistically independent

The errors are normally distributed with zero mean and a constant standard deviation


Visual tests: look at the data

Figure: four scatter plots of y versus x: (a) linear, (b) multilinear, (c) outlier, (d) nonlinear.


Residual versus predicted response graph

Figure: three plots of residual versus predicted response: (a) no trend, (b) trend, (c) trend.
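A residual versus predicted response check can also be done numerically before plotting. The sketch below deliberately fits a straight line to made-up quadratic data, so the residuals show a clear trend; `linear_fit` is an illustrative helper, not code from the lecture.

```python
def linear_fit(xs, ys):
    """Return (b0, b1) of the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (sum_xy - n * x_mean * y_mean) / (sum_xx - n * x_mean ** 2)
    return y_mean - b1 * x_mean, b1

# Made-up data that is deliberately nonlinear (y = x^2).
xs = [float(i) for i in range(1, 8)]
ys = [x * x for x in xs]

b0, b1 = linear_fit(xs, ys)
predicted = [b0 + b1 * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

# One row per data point: predicted response and its residual.
for p, r in zip(predicted, residuals):
    print(f"{p:8.2f} {r:8.2f}")
```

The residuals are positive at both ends of the predicted range and negative in the middle. This is the kind of trend that indicates a linear model is not appropriate, even though the residuals still sum to zero.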


Residual versus experiment number

Figure: two plots of residual versus experiment number: (a) no trend, (b) trend.

Example: physical experiment with insufficient initial conditions.


Check for constant standard deviation of errors

Figure: two plots of residual versus predicted response: (a) no trend, (b) increasing spread.


Automatic fitting with gnuplot


gnuplot> f(x)=a*x+b

gnuplot> fit f(x) "data.txt" u 1:2 via a,b

After 4 iterations the fit converged.

final sum of squares of residuals : 1.80841

rel. change during last iteration : -6.64694e-07

degrees of freedom (ndf) : 15

rms of residuals (stdfit) = sqrt(WSSR/ndf) : 0.347218

variance of residuals (reduced chisquare) = WSSR/ndf : 0.120561

Final set of parameters Asymptotic Standard Error

======================= ==========================

a = 0.530196 +/- 0.01719 (3.242%)

b = 3.70353 +/- 0.1761 (4.756%)
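The numbers in this gnuplot report are related to each other, and it can help to re-derive them once by hand. The sketch below only re-uses the values printed above. It assumes that the degrees of freedom equal the number of data points minus the number of fitted parameters, so the data file is taken to hold 17 points.

```python
import math

# Values copied from the gnuplot output above.
wssr = 1.80841                 # final sum of squares of residuals
ndf = 15                       # degrees of freedom
a, a_err = 0.530196, 0.01719   # slope and its asymptotic standard error
b, b_err = 3.70353, 0.1761     # intercept and its asymptotic standard error

variance = wssr / ndf          # reduced chi-square
stdfit = math.sqrt(variance)   # rms of residuals
n_points = ndf + 2             # two fitted parameters: a and b

print(f"variance of residuals: {variance:.6f}")
print(f"rms of residuals:      {stdfit:.6f}")
print(f"relative error of a:   {a_err / a:.3%}")
print(f"relative error of b:   {b_err / b:.3%}")
print(f"data points:           {n_points}")
```

Small deviations in the last digit are to be expected, because the printed values are already rounded.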


Visual test

plot [0:][0:] "data.txt" u 1:2 w p, f(x)


Residual versus predicted response graph


Residual versus experiment number


Fitting with gnuplot: basics

The `fit` command can fit a user-defined function to a set of data points

(x,y), using an implementation of the nonlinear least-squares

(NLLS) Marquardt-Levenberg algorithm. Any user-defined variable occurring in

the function body may serve as a fit parameter, but the return type of the

function must be real.

Syntax: fit {[xrange] {[yrange]}} <function> '<datafile>'

{datafile-modifiers} via '<parameter file>' | <var1>{,<var2>,...}


Fitting with gnuplot: advanced

The default data formats for fitting functions with a single independent

variable, y=f(x), are {x:}y or x:y:s; those formats can be changed with

the datafile `using` qualifier. The third item (a column number or an

expression), if present, is interpreted as the standard deviation of the

corresponding y value and is used to compute a weight for the datum, 1/s**2.
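To illustrate what weighting each datum by 1/s**2 does, the following sketch solves the weighted least-squares problem for a straight line in closed form. This is not gnuplot's Marquardt-Levenberg implementation; for a linear model both approaches minimize the same weighted sum of squares. The function name and the sample data are made up for illustration.

```python
def weighted_linear_fit(xs, ys, sigmas):
    """Fit y = b0 + b1 * x, weighting each point by 1 / sigma**2."""
    ws = [1.0 / s ** 2 for s in sigmas]
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    return b0, b1

# Three precise points on y = 1 + 2x and one very uncertain outlier.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 20.0]
sigmas = [0.1, 0.1, 0.1, 100.0]

b0, b1 = weighted_linear_fit(xs, ys, sigmas)
print(f"weighted fit: y = {b0:.3f} + {b1:.3f} x")
```

Because the outlier carries a weight of only 1/100**2, the fitted line stays very close to y = 1 + 2x. With equal weights the same outlier would pull the slope up noticeably.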


Curvilinear regression

Sometimes life is more difficult than linear dependencies: nonlinear regression is needed

Often it is sufficient to convert the nonlinear function into a linear form with a suitable variable transformation; this is called curvilinear regression

Example:

$$y = b x^a$$

$$\ln y = \ln b + a \ln x$$
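The power-law example y = b * x^a, linearized as ln y = ln b + a * ln x, can be sketched as a linear fit on the log-transformed data. The helper names and the sample data below are made up for illustration, and the approach assumes that all x and y values are positive.

```python
import math

def linear_fit(xs, ys):
    """Return (b0, b1) of the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (sum_xy - n * x_mean * y_mean) / (sum_xx - n * x_mean ** 2)
    return y_mean - b1 * x_mean, b1

def power_law_fit(xs, ys):
    """Fit y = b * x**a via linear regression of ln y on ln x."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    ln_b, a = linear_fit(log_x, log_y)
    return a, math.exp(ln_b)

# Made-up data generated from y = 3 * x**1.5, without noise.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 1.5 for x in xs]

a, b = power_law_fit(xs, ys)
print(f"y = {b:.3f} * x^{a:.3f}")
```

Note that the fit minimizes the squared errors of ln y, not of y itself, so with noisy data the result can differ from a direct nonlinear fit.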


Examples of curvilinear regression functions

Note: if a predictor variable appears in more than one transformed predictor variable, the transformed variables are likely to be correlated, causing the problem of multicolinearity

Nonlinear            Linear
y = a + b/x          y = a + b (1/x)
y = 1/(a + bx)       (1/y) = a + bx
y = x/(a + bx)       (x/y) = a + bx
y = a b^x            ln y = ln a + (ln b) x
y = a + b x^n        y = a + b (x^n)
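The rows of the table can be turned into code by pairing each model with the transformation that makes it linear and with the step that recovers a and b from the fitted line. The following Python sketch is only an illustration: `TRANSFORMS`, `fit_curvilinear` and the model labels are hypothetical names, and the row y = a + b x^n is left out because the exponent n has to be chosen beforehand.

```python
import math


def linear_fit(us, vs):
    """Ordinary least squares for v = intercept + slope * u."""
    n = len(us)
    mu = sum(us) / n
    mv = sum(vs) / n
    suu = sum((u - mu) ** 2 for u in us)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    slope = suv / suu
    return mv - slope * mu, slope


# model label -> (map (x, y) to linear variables (u, v),
#                 map (intercept, slope) back to (a, b))
TRANSFORMS = {
    "a + b/x":     (lambda x, y: (1.0 / x, y),     lambda i, s: (i, s)),
    "1/(a + b*x)": (lambda x, y: (x, 1.0 / y),     lambda i, s: (i, s)),
    "x/(a + b*x)": (lambda x, y: (x, x / y),       lambda i, s: (i, s)),
    "a * b**x":    (lambda x, y: (x, math.log(y)),
                    lambda i, s: (math.exp(i), math.exp(s))),
}


def fit_curvilinear(model, xs, ys):
    """Transform the data, fit a straight line, and return (a, b)."""
    to_linear, recover = TRANSFORMS[model]
    pairs = [to_linear(x, y) for x, y in zip(xs, ys)]
    us = [u for u, _ in pairs]
    vs = [v for _, v in pairs]
    return recover(*linear_fit(us, vs))


# Made-up data generated from y = x / (2 + 0.5 x)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [x / (2.0 + 0.5 * x) for x in xs]
print(fit_curvilinear("x/(a + b*x)", xs, ys))
```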


Common mistakes in regression

Not verifying that the relationship is linear

Relying on automated results without visual verification

Not specifying confidence intervals for the regression parameters

Not specifying the coefficient of determination

Confusing the Coefficient of Determination R^2 and the Coefficient of Correlation R

Using regression to predict far beyond the measured range
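The difference between the two coefficients is easy to see numerically. The following Python sketch uses made-up data and hypothetical function names; it relies on the fact that for simple linear regression with one predictor the coefficient of determination is the square of the correlation coefficient.

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient R between x and y."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_determination(xs, ys):
    """R^2 for simple linear regression: fraction of variation explained."""
    r = correlation(xs, ys)
    return r * r


# Made-up data: R is about 0.8, but only about 64% of the variation
# in y is explained by the regression (R^2 is about 0.64).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
print(correlation(xs, ys), coefficient_of_determination(xs, ys))
```

Reporting R where R^2 is meant makes a fit look better than it is, since the absolute value of R is never smaller than R^2.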


Coefficient of determination provides wrong indication

[Figure: four scatter plots of y versus x]
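A high coefficient of determination does not prove that a straight line is the right model. The Python sketch below uses made-up data (it is not the data shown on the slide) and hypothetical function names: the points follow y = x^2 exactly, yet a linear fit reaches an R^2 of roughly 0.95. The residuals reveal the problem, because they change sign only twice instead of scattering randomly around zero.

```python
def linear_fit_with_r2(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b, (sxy * sxy) / (sxx * syy)


def residual_sign_changes(xs, ys, a, b):
    """Count sign changes of the residuals, with points ordered by x."""
    points = sorted(zip(xs, ys))
    residuals = [y - (a + b * x) for x, y in points]
    return sum(1 for r0, r1 in zip(residuals, residuals[1:]) if r0 * r1 < 0)


# Made-up data: clearly nonlinear, but R^2 is still high.
xs = [float(x) for x in range(1, 11)]
ys = [x * x for x in xs]
a, b, r2 = linear_fit_with_r2(xs, ys)
print(r2, residual_sign_changes(xs, ys, a, b))
```

Plotting the data or the residuals shows such a pattern immediately, which is why visual verification is part of the checklist.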


Short checklist for simple linear regression analysis

1. Has it been verified visually that the relationship is linear?

2. Are all predictors in appropriate units so that the regression coefficients are comparable?

3. Has the coefficient of determination been specified?

4. Is the coefficient of determination high enough?

5. Have the confidence intervals for regression parameters been calculated?

6. Are all regression parameters statistically significant?

7. Is the regression used only for predictions close to the measured range?


Not treated here

Confidence intervals for regression parameters

Confidence intervals for predictions

Multiple linear regression

General transformations

... and much more


Advertisement


Student assistant (SHK) wanted for the user help desk (Benutzerberatung)

What?

• Handling requests in the trouble ticket system (OTRS)

• Processing login applications: setting up, extending, deleting

• Issuing new passwords to replace forgotten ones

• Receiving fault reports (data network) and forwarding them to colleagues at ZIH

• Helping students and staff set up WLAN on their notebooks

• Updating web pages

Where and when?

Willers-Bau (WIL) A218

Mon – Fri, 14:00-19:00 (weekday by arrangement)

Wanted: starting 15.02., 01.03., or 01.04., for 5 h or 10 h per week

If interested, contact:

Claudia Schmidt, WIL A116, 39833, [email protected]

User help desk (Benutzerberatung)


Campus 2020: ideas competition of TU Dresden

Center for Information Services and High Performance Computing (ZIH)

Dresden, January 2012

In cooperation with:


Students help shape the campus

Development of innovative concept ideas by interdisciplinary teams

TU Dresden, 1/19/12

Center for Information Services and High Performance Computing (ZIH)

Call for entries, winter semester 2011/2012: modern access and locking systems

"Campus 2020": http://tu-dresden.de/campus-2020