tera scale systems and applications · cug 2001 - bob [email protected] 4 nasa advanced...

45
NASA Advanced Supercomputing Tera Tera Scale Scale Systems Systems and and Applications Applications Bob Ciotti Tera-Scale Application Group Lead NASA Advanced Supercomputing Division NASA Ames Research Center Moffett Field, CA 94035-1000 [email protected]

Upload: others

Post on 10-Nov-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

NASAAdvancedSupercomputing

Tera Tera Scale Scale Systems Systems

and and ApplicationsApplications

Bob Ciotti

Tera-Scale Application Group LeadNASA Advanced Supercomputing Division

NASA Ames Research Center

Moffett Field, CA [email protected]

Page 2: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

2

NASAAdvancedSupercomputing

NASA’s Computational ChallengesNASA’s Computational Challenges

– Aeronautics– X-Vehicle Development– Shuttle Upgrades– Airspace Control and

Simulation

– Astrobiology– Protein Folding and

Beyond

– Earth Science– Atmospheric Physics– Oceanography

– Nano Technology– Chemistry

– Space Science– Stellar Dynamics– Chemistry/Spectra– Instrument Data

Analysis and Support– Planetary Modeling

Page 3: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

3

NASAAdvancedSupercomputing

NASA Advanced Supercomputing FacilityNASA Advanced Supercomputing Facility

Current SGI Inventory - ~ 20 systems#CPUs System Name

512 O2k Lomax Development1024 O3k Chapman Development

8 O2k Evelyn Development4 O2k Piglet Development

1548 Subtotal512 O2k Lomax2 Production256 O2k Steger Production64 O2k Hopper Production64 O2k Dixon0 Production64 O2k Jimpf0 Production64 O2k Jimpf1 Production64 O2k Kalney Production64 O2k Sunrise Production24 O2k Turing Production

8 O2k Fermi Production1184 Subtotal

12 O2k Lou Storage8 O2k Lou2 Storage4 O2k This Storage4 O2k That Storage

28 Subtotal

2760 Origin CPUs

Page 4: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

4

NASAAdvancedSupercomputing

Goal: Production Quality Goal: Production Quality Highly Parallel SupercomputerHighly Parallel Supercomputer

• What is a Production Supercomputer?– Significantly faster that previous generation – Less Expensive– Productive Environment for the Users (easy to use)– As reliable as the “Gold Standard”

• Have to move into the realm of hundreds or thousands of processors– Its all about the interconnect

• Many Attempts (these all failed…)– Connection Machines - CM2 (performance)– Connection Machines - CM5 (performance)– Intel - IPSC-860 (performance)– Intel – Paragon (performance)– IBM - SP2 (performance)– SGI - Power Challenge Cluster (reliability)– Cray - J90 Cluster (vendor backed out of commitment)

Page 5: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

5

NASAAdvancedSupercomputing

Barriers for Scientists Obtaining a Barriers for Scientists Obtaining a SuperComputing SuperComputing Capability for Their ResearchCapability for Their Research

• There are many problems that require supercomputing

• However, its typical that applications may require substantial modifications to achieve even moderate levels of parallelism.

• This can translate into several man years of effort, and when you have done this, you’ll likely not run faster that a C90 supercomputer manufactured in 1993.

• Many (most) productive scientists are simply unable to access supercomputing because it either difficult or even not possible to effectively scale their applications.

• For example,– Computational Chemistry– DAO– Molecular Dynamics– Space Science

Shared memory can go a long way towards SIMPLIFICATION.

Page 6: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

6

NASAAdvancedSupercomputing

Are we getting sustained performance on parallel Are we getting sustained performance on parallel systems for our real applications?systems for our real applications?

FVCORE 2x2.5@55 Levels (B55)

4.5

0

1

2

3

4

5

6

7

8

9

10

11

12

13

0 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256#CPUs

Gig

aflo

ps

O2K 300mhz - NAS - MLP

C90 250mhz - NAS (estimated)

Power3 200mhz - NERSC

O2K 250mhz - NAS

PPC604e 332mhz - ASCI Blue

O2K 195mhz - NAS

EV5 440mhz - LLNL

T3E-900 - NERSC

Page 7: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

7

NASAAdvancedSupercomputing

Are we getting sustained performance on parallel Are we getting sustained performance on parallel systems for our real applications?systems for our real applications?

FVCORE 1x1.25@55 Levels (C55)

0

2

4

6

8

10

12

14

16

18

20

22

24

26

28

30

0 32 64 96 128 160 192 224 256 288 320 352 384 416 448 480 512

#CPUs

Gig

aflo

ps

O2K 300mhz - NAS - MLP

O2K 300mhz - NAS - MPI - w/PP

O2K 300mhz - NAS - MPI

C90 250mhz - NAS (estimated)

Page 8: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

8

NASAAdvancedSupercomputing

Performance Performance (mostly from 1996)(mostly from 1996)

63105640

710 470 234 210 129 86 770

1000

2000

3000

4000

5000

6000

7000

Meg

aflo

ps

O2K64p/195mhz

49x/77%

C90/1612x/75%

SX4 C90/16 O2K300mhz

SP3 power3 O2K195mhz

J90 SPP2000

ARC3D

Page 9: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

9

NASAAdvancedSupercomputing

256p Origin2000256p Origin2000

Page 10: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

10

NASAAdvancedSupercomputing

512p Origin2000512p Origin2000

Page 11: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

11

NASAAdvancedSupercomputing

Lomax: 512 Processor Origin2000Lomax: 512 Processor Origin2000

CPUs

512 MIPS R12000 300 MHz CPUs

600 MFLOP/s per CPU Peak

307 GFLOPS total (19 x C90) Peak

8 MByte cache per CPU (4 GByte total)

Memory

192 GB main memory (24 x C90)

Hypercube interconnect (9 hops)

Disk

2 TB FC Raid disk sub-system

System Software

Develop OS true single system image

Single XFS File SystemCompiler parallel loops

512 CPUs wide

Cost/performance OVERFLOW-MLP• Cost/perf O2K/C90 = 37.7x• Cost/perf O2K512 to O2K256 = 1.6x • Performance over 256p = 3x• Performance over C90/16 = 13x

XtremeccNUMA

Page 12: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

12

NASAAdvancedSupercomputing

OVERFLOW-MLP Performance vs CPU CountSteger (250MHz O2K/256p) Lomax (300MHz O2K/512p)

20 Gflops on 256p = 15.6% vs 60 Gflops on 512p =19.5% of peak

Page 13: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

13

NASAAdvancedSupercomputing

256p Single System Image256p Single System Image

Metarouter

Page 14: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

14

NASAAdvancedSupercomputing

512p Single System Image512p Single System Image(NUMA Effects Worsen….)(NUMA Effects Worsen….)

MetarouterMetarouter

Page 15: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

15

NASAAdvancedSupercomputing

NUMA Aware Operating SystemNUMA Aware Operating System

Metarouter

IRIX lacked sufficient control over

T2

T1

T3

Thread Affinity

Thread Location

Memory Allocation

Memory De-Allocation

NUMA effects require that these be dealt with

M3

M2M1

M1

T1tmp

IDEAL:

Logical Partitions

Page 16: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

16

NASAAdvancedSupercomputing

NUMA Aware Operating SystemNUMA Aware Operating System

Metarouter

CPU SetsMemory Limited CPU SetsThread Affinity (mustrun)

Page Placement (MLD Sets)

IDEAL:

Logical Partitions

Page 17: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

17

NASAAdvancedSupercomputing

NUMA Aware Job SchedulerNUMA Aware Job Scheduler

• Dynamic management of CPUSETs

– CPUSET created of size N at job start– CPUSET deleted at job termination

Page 18: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

18

NASAAdvancedSupercomputing

SpecFP SpecFP 20002000Single CPUSingle CPU

SpecFP 2000harmonic mean of base ratios

0 50 100 150 200 250 300 350 400 450 500 550 600

Intel P4 1.7 ghz

Alpha 21264B 833 mhz

Fijitsu P4 1.5 ghz

HP PA-8700 750mhz

Intel P4 1.4 ghz

Alpha 21264A 667 mhz

AMD Athlon 1.3 ghz

SN1MIPS R14k 500 mhz

HP-PA8600 552 mhz

SUN US3 900 mhz

SN1MIPS R12K 400 mhz

IBM P3-2 450mhz

SN0MIPS R12k 400 mhz

Page 19: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

19

NASAAdvancedSupercomputing

SpecFP2000 ThroughputSpecFP2000 ThroughputSpecFP 2000 Throughput

0

50

100

150

200

250

300

350

400

450

500

1 4 8 16 32 64 128

CPUs

SGI Origin

Dec AlphaServers

HP Superdome

IBM RS/6000

Sun Enterprise

Page 20: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

20

NASAAdvancedSupercomputing

What is Shared Memory MLP?What is Shared Memory MLP?

Shared Memory Multi-level Parallelism (MLP) is the utilization of multiple levels of parallelism within an application executing on a NUMA based system architecture in order to increase its parallel efficiency during execution. It is an open system design and has the following attributes:

• Usually two levels of parallelism

• Coarse grained parallelism provided by Unix forked processes

• Fine grained parallelism provided by the compiler at loop level

• No messaging - communication by Unix shared memory arenas

• Targeted for the new large CPU count NUMA SMP systems

• Method can also execute across clusters

Page 21: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

21

NASAAdvancedSupercomputing

Mapping MLP onto the Origin 2000

The diagram below is a graphical description of how user common blocks are mapped onto the system. In essence each virtual SMP executes a single a.out. Any data the user wants to share across a.out's is declared in the shared memory arena, and essentially appears as a common variable seen by all.

Common /global/ x,y,z

Common /local1/ a1,b1Common /local2/ c1,d1

Virtual SMP 1

Common /local1/ a1,b1Common /local2/ c1,d1

Virtual SMP 2

Memory

CPU

Memory

CPU

Memory

CPU

Memory

CPU

Memory

CPU

Memory

CPU

NUMA Memory Interface

Page 22: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

22

NASAAdvancedSupercomputing

MLPlibMLPlib

The MLPlib routines for scalable parallel execution support are:.• Subroutine GETMEM(xarray,xpoint,numxbyt)

– The GETMEM routine allocates numxbyt bytes to the xarray variable – resident in the shared memory arena. The xarray data will be visible to all – MLP processes using the shared memory arena.

• Subroutine FORKIT(nthread,numpro,nowpro)– The FORKIT routine spawns a total of numpro processes for the job – including the current one in the total. Nowpro is returned, and is the– current process id (1-numpro).– Nthread is an array of ints, indicating the number of OMP threads for each MLP

process• Subroutine BARRIER(numpro)

– The BARRIER routine waits until numpro processes have hit the – barrier, then all drop through.

Page 23: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

23

NASAAdvancedSupercomputing

ForkitForkit.f.f

c* Fork the Processes *nowpro=1c-----count offsets to starting cpus

ival=0do n=1,numprowrite(*,520) nistart(n) = ivalival=ival+nthread(n)

enddoc-----spawn the threads - manual forks

do n=2,numpronowpid=getpid()if(nowpid.eq.master) then

ierror=fork()endif

nowpid=getpid()if(nowpid.ne.master) then

nowpro=ngo to 200endif

enddo

Page 24: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

24

NASAAdvancedSupercomputing

ForkitForkit.f.f

c-----create mp environment200 call omp_set_num_threads(nthread(nowpro))

c-----check for pin requestcall getenv('PIN_TO_CPUS',evalue)if(evalue.ne.'') then

read(evalue,*) idopinendif

if(evalue.eq.'') thenidopin=0endif

if(idopin.eq.0) return

!$omp parallelcall mp_assign_to_cpu3(omp_get_thread_num(), istart(nowpro))

!$omp end parallel

Page 25: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

25

NASAAdvancedSupercomputing

Mp_assign_to_cpu3 Mp_assign_to_cpu3 (my_rank, starting_point)(my_rank, starting_point)

Figure out which physical cpus you have access to pmoctl(PMO_GETNODEMASK_UINT64,&sys_nodemask,sizeof(sys_nodemask))req.request = CPUSET_QUERY_CPUS;sysmp(MP_CPUSET, &req)

Figure out which physical memory the cpu is attached to sysmp(MP_NUMA_GETCPUNODEMAP,

(void *)cpu_node_mapping, sizeof(cnodeid_t) * ncpus)

Demand physical pages be allocated from the memory attached to the CPU you will assign to this thread.

mld_create(1,1024)mldset_create(&mld, 1)mldset_place(mldset, TOPOLOGY_PHYSNODES,

&affinity_info, 1, RQMODE_MANDATORY);

Locks the thread that called this routine to the chosen physical cpusysmp(MP_MUSTRUN, cpulist[my_rank+starting_point])

Page 26: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

26

NASAAdvancedSupercomputing

Result Result -- nthreadnthread=[8,13,…,16],=[8,13,…,16],numpronumpro=N=N

MLP #1

Thread

0 1 … 7

C C C P P PU U U0 1 7

Node0 Node1

MLP #2

Thread

0 1 … 12

C C C P P PU U U8 9 20

Node2 Node5

MLP #n

Thread

0 1 … 15

C C C P P PU U U78 79 93

Node19 Node23

Page 27: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

27

NASAAdvancedSupercomputing

Enhance incompressible flow simulation capability for developing aerospace vehicle components, especially, unsteady flow phenomena associated with high

speed turbo pump.

INS3D ObjectivesINS3D Objectives

Page 28: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

28

NASAAdvancedSupercomputing

Space Shuttle Main Engine RedesignSpace Shuttle Main Engine Redesign• External Tank filled with approx. 2

Olympic size pools worth of Liquid Hydrogen and Oxygen.

• Main engine turbo pump pushes 1 Olympic size pool every two minutes.

• If water were pumped it would produce a fountain 12 miles high.

• Redesign to reduce approx 1000 lbs of engine weight and increase engine efficiency.

• Will result in a significant payload capacity increase.

(Fact – First Shuttle Launch April 12, 1981, 7:00:03 a.m. EST)

Page 29: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

29

NASAAdvancedSupercomputing

Validation (SSME Impeller)

Page 30: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

30

NASAAdvancedSupercomputing

SSME Upgrade SSME Upgrade -- RIG1RIG1

• Full analysis - 3-5 rotations of the pump.

– Takes about a year on a Y-MP– Currenly about a week on 80

CPUs of Origin3000

– performance goal 1 day• After about 2 weeks on 160 Origin

3000 CPUs, got it to turn 40 degrees!

Page 31: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

31

NASAAdvancedSupercomputing

SSMESSME--rig1 / Initial startrig1 / Initial start

TIME STEP 5

Page 32: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

32

NASAAdvancedSupercomputing

SSMESSME--rig1 / Initial startrig1 / Initial start

TIME STEP 7 / Impeller rotated 3-degrees at 30% of design speed

PRESSURE VELOCITY MAGNITUDE

Page 33: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

33

NASAAdvancedSupercomputing

INS3DINS3D--MLP / MLP / 40 Groups40 GroupsRLV 2nd Gen Turbo pump114 Zones / 34.3 M grid points

INS3D INS3D ParallelizationParallelization

Per processor Mflop is between 60-70. This is a vector code.Code optimization for cache based platforms is currently underway. Target Mflops is to reach 120 per processor.Increasing number of OpenMP threads is also the main objective for this effort.

Page 34: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

34

NASAAdvancedSupercomputing

INS3DINS3D

INS3D Shuttle Main Engine Upgrade34.3 Million Points/114 Zones40 MLP Groups Pin vs. No Pin

0

25

50

75

100

125

150

175

200

225

250

275

300

325

350

375

400

425

450

475

500

40 80 160 320

#CPUs

Sec

on

ds/

Itte

rati

on

SN1 400mhz No Pin

Page 35: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

35

NASAAdvancedSupercomputing

INS3DINS3D

INS3D Shuttle Main Engine Upgrade34.3 Million Points/114 Zones40 MLP Groups Pin vs. No Pin

0

25

50

75

100

125

150

175

200

225

250

275

300

325

350

375

400

425

450

475

500

40 80 160 320

#CPUs

Sec

on

ds/

Itte

rati

on

SN0 400mhz Pin

SN1 400mhz No Pin

SN1 400mhz Pin

Page 36: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

36

NASAAdvancedSupercomputing

INS3DINS3D

INS3D Shuttle Main Engine Upgrade34.3 Million Points/114 Zones40 MLP Groups Pin vs. No Pin

0

25

50

75

100

125

150

175

200

225

250

275

300

325

350

375

400

425

450

475

500

40 80 160 320

#CPUs

Sec

on

ds/

Itte

rati

on

SN0 400mhz Pin

SN1 400mhz No Pin

SN1 400mhz Pin

Perfect Speedup

Page 37: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

37

NASAAdvancedSupercomputing

ForkitForkit.f.f

c-----check for pin requestcall getenv('PIN_TO_CPUS',evalue)if(evalue.ne.'') then

read(evalue,*) idopinendif

if(evalue.eq.'') thenidopin=0endif

if(idopin.eq.0) return

!$omp parallelcall mp_assign_to_cpu3(omp_get_thread_num(), istart(nowpro))

!$omp end parallel

Page 38: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

38

NASAAdvancedSupercomputing

INS3DINS3D

INS3D - Speedups Pin vs No Pin

0

20

40

60

80

100

120

140

160

180

200

220

240

260

280

300

320

340

40 80 160 320

#CPUs

Sp

eed

up

Perfect Speedup

SN1 400 pin Speedup

SN1 400 nopin Speedup

Page 39: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

39

NASAAdvancedSupercomputing

Overflow w/Test Case

This case is a simulation of a full aircraft configured for landing. It is a 35 million point problem distributed across 160 grids of widely varying size. The problem is interesting for several reasons. They are:

• It is one of the largest problems ever solved at NAS

• It requires hundreds of dedicated C90 CPU hours per run

• It fully stress tests the MLP code for correctness and performance• Load balance is a major issue with this problem• Cache issues are important (some grids break cache)

• It fully tests the Origin system hardware and software components

• The Overflow code is widely used within and outside of NASA

Page 40: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

40

NASAAdvancedSupercomputing

OverflowOverflowOverflow - Transport Configured for Landing

32 Million Points/150 zonesNo Pin vs Pin (O3000)

0

2

4

6

8

10

12

14

16

18

20

22

24

26

28

30

32

34

32 64 96 120 240 256 384 480

#CPUS

seco

nd

s/it

tera

tio

n

SN1 400mhz No Pin

Page 41: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

41

NASAAdvancedSupercomputing

OverflowOverflowOverflow - Transport Configured for Landing

32 Million Points/150 zonesNo Pin vs Pin (O3000)

0

2

4

6

8

10

12

14

16

18

20

22

24

26

28

30

32

34

32 64 96 120 240 256 384 480

#CPUS

seco

nd

s/it

tera

tio

n

SN1 400mhz No Pin

SN1 400mhz Pin

Page 42: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

42

NASAAdvancedSupercomputing

OverflowOverflow

Overflow - Transport Configured for Landing 32 Million Points/150 zones

No Pin vs Pin (O3000)

0

2

4

6

8

10

12

14

16

18

20

22

24

26

28

30

32

34

32 64 96 120 240 256 384 480

#CPUS

seco

nd

s/itt

erat

ion

SN1 400mhz No Pin

SN1 400mhz Pin

Perfect Speedup

Page 43: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

43

NASAAdvancedSupercomputing

OverflowOverflowSpeedupsPin vs No Pin

0

32

64

96

128

160

192

224

256

288

320

352

384

416

448

480

32 64 96 120 240 256 384 480

#CPUs

Sp

eed

up

ove

r 32

p

Perfect Speedup

SN1 400 pin Speedup

SN1 400 nopin Speedup

Page 44: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

44

NASAAdvancedSupercomputing

1024p Status1024p Status

• 2 operational 512p 400mhz O3000 system– testing and fallout

– staff and small group of users on systems

• Plan to bring up 1024p in initial topology configuration in June

• Move to higher bandwidth/lower latency topology by end of summer

• Processor speed upgrade around year end +-

• Expectation is to sustain 20% of peak - 240 Gflops

Page 45: Tera Scale Systems and Applications · CUG 2001 - Bob Ciotti@nas.nasa.gov 4 NASA Advanced Supercomputing Goal: Production Quality Highly Parallel Supercomputer • What is a Production

CUG 2001 - Bob [email protected]

45

NASAAdvancedSupercomputing

Some Open IssuesSome Open Issues

• Page placement doesn’t always work as expected

– some (not all) pages allocated via the mld scheme do not end up on the designated node

• Some nodes do not de-allocate memory fully when a job exits, forcing future jobs to allocate memory off other nodes.

• Mld_create returns a EINVAL with static parameters

• Sporadic issues with responsiveness possibly related to I/O activity

• CPU X not seen for 20 seconds. Process sleeping at SPL high.