cs8625-june-22-2006


CS 8625 High Performance and Parallel, Dr. Hoganson

Copyright © 2005, 2006 Dr. Ken Hoganson

CS8625-June-22-2006

Class Will Start Momentarily…

Homework & Midterm Review
CS8625 High Performance and Parallel Computing
Dr. Ken Hoganson

Balance Point

• The basis for the argument against “putting all your (speedup) eggs in one basket”: Amdahl’s Law

• Note the balance point in the denominator where both parts are equal.

• Increasing N (number of processors) beyond this point can at best halve the denominator, and double the speedup.

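$$\text{Speedup} = \frac{1}{(1-\alpha) + \frac{\alpha}{N}}$$

Balance point, where $(1-\alpha) = \frac{\alpha}{N}$

When $(1-\alpha) > \frac{\alpha}{N}$, very little additional speedup is possible through increasing $N$.

When $(1-\alpha) < \frac{\alpha}{N}$, significant additional speedup may be possible through increasing $N$.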

Balance Point Heuristic

• Increasing N (number of processors) beyond this point can at best halve the denominator, and double the speedup.

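$$\text{Speedup} = \frac{1}{(1-\alpha) + \frac{\alpha}{N}}$$

Balance point, where $(1-\alpha) = \frac{\alpha}{N}$

When $(1-\alpha) > \frac{\alpha}{N}$, very little additional speedup is possible through increasing $N$.

When $(1-\alpha) < \frac{\alpha}{N}$, significant additional speedup may be possible through increasing $N$.

Solved for $N$: $N = \frac{\alpha}{1-\alpha}$

Solved for $\alpha$: $\alpha = \frac{N}{N+1}$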
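Both solved forms follow from setting the two terms of the denominator equal. A short derivation, using the same symbols as the slide ($\alpha$ for the parallel fraction, $N$ for the number of processors):

```latex
\begin{align*}
(1-\alpha) &= \frac{\alpha}{N} \\
N(1-\alpha) &= \alpha
  \quad\Rightarrow\quad N = \frac{\alpha}{1-\alpha} \\
N &= \alpha + N\alpha = \alpha(N+1)
  \quad\Rightarrow\quad \alpha = \frac{N}{N+1}
\end{align*}
```

At the balance point the denominator equals $2(1-\alpha)$, so the speedup there is $\frac{1}{2(1-\alpha)}$, exactly half of the limiting speedup $\frac{1}{1-\alpha}$ reached as $N$ grows without bound.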

Balance Point

• Example
• Parallel fraction = 90% (10% in serial)

N          α/N         1-α     Speedup
1          0.90        0.10    1/(0.1+0.90) = 1.00
2          0.45        0.10    1/(0.1+0.45) = 1.82
4          0.225       0.10    1/(0.1+0.225) = 3.08
8          0.1125      0.10    1/(0.1+0.1125) = 4.71
16         0.05625     0.10    1/(0.1+0.05625) = 6.40
32         0.028125    0.10    1/(0.1+0.028125) = 7.80
64         0.0140625   0.10    1/(0.1+0.0140625) = 8.77
infinity   0.0         0.10    1/(0.1+0.0) = 10

Balance point, solved for $N$: $N = \frac{\alpha}{1-\alpha} = 0.90/0.10 = 9$, Speedup = 5
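The table can be reproduced with a few lines of Python. This is a minimal sketch; the function name `amdahl_speedup` is our own label and does not come from the lecture.

```python
def amdahl_speedup(alpha: float, n: float) -> float:
    """Amdahl's Law: speedup for parallel fraction alpha on n processors."""
    return 1.0 / ((1.0 - alpha) + alpha / n)


alpha = 0.90  # parallel fraction from the example (10% serial)

for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"N={n:>2}  alpha/N={alpha / n:.4f}  speedup={amdahl_speedup(alpha, n):.2f}")

# Balance point: N = alpha / (1 - alpha), where both denominator terms are equal.
n_balance = alpha / (1.0 - alpha)
limit = 1.0 / (1.0 - alpha)  # speedup as N grows without bound
print(f"balance N={n_balance:.0f}  speedup={amdahl_speedup(alpha, n_balance):.2f}  limit={limit:.0f}")
```

Past the balance point the speedup can only climb from 5 toward the limit of 10, which is the "at best double" bound stated earlier.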

Example

• Example: A workload has an average alpha of 94%. How many processors can reasonably be applied to speed up this workload?

Solved for $N$: $N = \frac{\alpha}{1-\alpha}$
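One way to work this example is to plug the value into the balance-point formula. The sketch below assumes that "reasonably" means "up to the balance point"; the function name `balance_point_processors` is hypothetical.

```python
def balance_point_processors(alpha: float) -> float:
    """Balance-point heuristic: N = alpha / (1 - alpha)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return alpha / (1.0 - alpha)


alpha = 0.94
n = balance_point_processors(alpha)
print(f"alpha={alpha:.2f} -> balance point at N = {n:.2f}")
```

The result, 0.94/0.06, is about 15.67, so roughly 16 processors; adding more than that can at best double the speedup.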

Example

• Example: An architecture has 32 processors. What is the minimum workload parallel fraction needed to make reasonably efficient use of the processors?

Solved for $\alpha$: $\alpha = \frac{N}{N+1}$
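The same heuristic, solved for alpha, answers this example. A minimal sketch, again treating the balance point as the threshold for "reasonably efficient"; `balance_point_alpha` is a hypothetical name.

```python
def balance_point_alpha(n: int) -> float:
    """Balance-point heuristic solved for alpha: alpha = N / (N + 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / (n + 1)


n = 32
alpha = balance_point_alpha(n)
speedup = 1.0 / ((1.0 - alpha) + alpha / n)  # Amdahl's Law at the balance point
print(f"N={n} -> alpha >= {alpha:.4f}, speedup at balance = {speedup:.2f}")
```

For 32 processors the parallel fraction must be at least 32/33, or about 97%. A workload with a smaller alpha sits past its balance point on this machine.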

Multi-Bus Multiprocessors

• Shared-memory multiprocessors are very fast
  – Low latency to memory on bus
  – Low communication overhead through shared memory
• Scalability problems
  – Length of bus slows signals (0.75 × speed of light)
  – Contention for the bus reduces performance
  – Requires cache to reduce contention

[Diagram: three CPUs and a memory (MEM) attached to one shared bus]

Bus Contention

Multiple devices (processors, etc.) compete for access to a bus.

Only one device can use a bus at a time, which limits performance and scalability.

$r$ = probability of a processor requesting a bus

$n$ = number of processors

$1 - r$ = probability of not requesting a bus

$(1-r)^n$ = probability that none will request the bus

$1 - (1-r)^n$ = probability that at least one will request the bus

$\frac{n!}{1!\,(n-1)!}\, r\,(1-r)^{n-1} = n\,r\,(1-r)^{n-1}$ = probability that exactly one processor will request the bus

$1 - (1-r)^n - n\,r\,(1-r)^{n-1}$ = probability of more than zero or one request (at least one request is blocked)

1 – zero requests – exactly one request = probability of 2 or more (at least one blocked request)
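These expressions translate directly into code. The sketch below assumes, as the formulas do, that each processor requests the bus independently with the same probability r; the function name is our own.

```python
def bus_request_probabilities(n: int, r: float) -> tuple[float, float, float]:
    """Return (P(no request), P(exactly one request), P(two or more requests))."""
    p_none = (1.0 - r) ** n
    p_one = n * r * (1.0 - r) ** (n - 1)
    p_blocked = 1.0 - p_none - p_one  # two or more requests: at least one is blocked
    return p_none, p_one, p_blocked


p_none, p_one, p_blocked = bus_request_probabilities(n=4, r=0.1)
print(f"none={p_none:.4f}  one={p_one:.4f}  blocked={p_blocked:.4f}")
```

Note that the third value is the probability that some request is blocked in a cycle, not the fraction of requests that are blocked.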

• Performance degrades as requests are blocked.
• Resubmitted blocked requests degrade performance even further than the table shows.

                   N=4      N=8      N=16
r                  0.1      0.1      0.2
1-r                0.9      0.9      0.8
(1-r)^n            0.6561   0.4305   0.0281
n r (1-r)^(n-1)    0.2916   0.3826   0.1126
Blocked            0.0523   0.1869   0.8593

Clearly, the probability that a processor’s access to a shared bus will be denied increases with both:

• The number of processors sharing a bus
• The probability that a processor will need access to the bus

• What can be done? What is the “universal band-aid” for performance problems?

• If cache greatly reduces accesses to memory, then
• the blocking rate on the bus is much lower.

                   N=4      N=8      N=16     N=16
r                  0.1      0.1      0.2      0.01
1-r                0.9      0.9      0.8      0.99
(1-r)^n            0.6561   0.4305   0.0281   0.8515
n r (1-r)^(n-1)    0.2916   0.3826   0.1126   0.1376
Blocked            0.0523   0.1869   0.8593   0.0109
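To see how strongly the blocking probability depends on r, one can hold N at 16 and lower r step by step. A minimal sketch under the same independence assumption as the formulas; the intermediate value r = 0.1 for N = 16 is not in the table and is included only to show the trend.

```python
def blocked_probability(n: int, r: float) -> float:
    """P(two or more of n processors request the bus in the same cycle)."""
    return 1.0 - (1.0 - r) ** n - n * r * (1.0 - r) ** (n - 1)


n = 16
for r in (0.2, 0.1, 0.01):  # lower r models a cache absorbing more memory accesses
    print(f"N={n}  r={r:<4}  blocked={blocked_probability(n, r):.4f}")
```

The two end points match the N=16 columns of the table: cutting r from 0.2 to 0.01 drops the blocking probability from about 86% to about 1%.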

Two approaches to improving shared memory/bus machine performance:

• Invest in large amounts, and multiple levels, of cache
  – and a connection network to allow caches to synchronize contents.

• Invest in multiple buses and independently accessible blocks of memory

• Combining both may be the best strategy.

Homework

• Your project is to explore the effect of interconnection network contention on the performance of a shared-memory, bus-based multiprocessor.

• You will do some calculations, use the HPPAS simulator, and write a report of a couple of pages to turn in.

Task 1

• For a machine whose processors include on-chip cache that yields a cache hit rate of 90%, determine the maximum number of processors that can go on a single shared bus and still maintain at least a 98% acceptance of requests.

• Use the calculations shown in the lecture to zero in on the correct answer, recording your calculations in a table for your report. Show each step of the calculation as was done in the lecture/ppt.

• Your results should “bracket” the maximum.

Task 1

• Task 1: Use the formula in the table to find

                   N=4    N=8    N=16   N=?    N=?
r = 10%            0.10   0.10   0.10   0.10   0.10
1-r                0.90   0.90   0.90   0.90   0.90
(1-r)^n
n r (1-r)^(n-1)
Blocked = 1 - P(0 requests) - P(1 request)
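The bracketing search can also be automated as a cross-check on hand calculations. The sketch below is one possible approach, not the method required for the report. It reads "acceptance" as 1 minus the blocked probability, which is our interpretation, and it deliberately uses hypothetical inputs rather than the homework values. The names `blocked_probability`, `max_processors`, and `n_limit` are our own.

```python
def blocked_probability(n: int, r: float) -> float:
    """P(two or more of n processors request the bus in the same cycle)."""
    return 1.0 - (1.0 - r) ** n - n * r * (1.0 - r) ** (n - 1)


def max_processors(r: float, max_blocked: float, n_limit: int = 1024) -> int:
    """Largest n (up to n_limit) whose blocked probability stays within max_blocked."""
    best = 0
    for n in range(1, n_limit + 1):
        if blocked_probability(n, r) > max_blocked:
            break  # blocked probability grows with n, so stop at the first failure
        best = n
    return best


# Hypothetical inputs: r = 0.05 and a 5% blocking budget (not the homework values).
n_max = max_processors(r=0.05, max_blocked=0.05)
for n in (n_max, n_max + 1):  # the two rows that bracket the threshold
    print(f"N={n}  blocked={blocked_probability(n, 0.05):.4f}")
```

The two printed rows are the ones that bracket the maximum: the last N that stays within the budget and the first N that exceeds it.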

Task 2

• Use the maximum number of processors (Task 1) and Amdahl’s law at the balance point, to figure out what workload parallel fraction yields a balance in the denominator.

• Determine the theoretical speedup that will be obtained.

Solved for $\alpha$: $\alpha = \frac{N}{N+1}$

$$\text{Speedup} = \frac{1}{(1-\alpha) + \frac{\alpha}{N}}$$

Task 3

• Use the data values developed so far, to run the HPPAS simulation system. Record the speedup obtained from this system.

• If it differs markedly from the theoretical value, check all the settings, and rerun the simulation, and explain any variation from the theoretical expected value.

• Record your results in your report, showing each step of the calculation as was done in the lecture/ppt.

Dates

• The current plan:
  – Make the midterm available on Friday June 23.
  – Due date will be July 10 (after the conference and after the July 4th weekend).

• Conference week:
  – Complete homework: Due on July 3 by email.
  – Work on Midterm exam.

• No class lecture on June 27 and 29.
• No class on July 4.
• Next live class is Wed July 6.

Topic Overview

Overview of topics for the exam:

• Five parallel levels
• Problems to be solved for parallelism
• Limitations to parallel speedup
• Amdahl’s Law: theory, implications
• Limiting factors in realizing parallel performance
• Pipelines and their performance issues
• Flynn’s classification
• SIMD architectures
• SIMD algorithms
• Elementary analysis of algorithms
• MIMD: Multiprocessors and Multicomputers
• Balance point and heuristic (from Amdahl’s Law)
• Bus contention and analysis of single shared bus
• Use of the online HPPAS tool
• Specific multiprocessor clustered architectures:
  – Compaq
  – DASH
  – Dell Blade Cluster

End of Lecture

End of Today’s Lecture.
