CS 8625 High Performance and Parallel, Dr. Hoganson
Copyright © 2005, 2006 Dr. Ken Hoganson
CS8625-June-22-2006
Class Will Start Momentarily…
Homework & Midterm Review
CS8625 High Performance and Parallel Computing
Dr. Ken Hoganson
Balance Point
• The basis for the argument against “putting all your (speedup) eggs in one basket”: Amdahl’s Law
• Note the balance point in the denominator where both parts are equal.
• Increasing N (number of processors) beyond this point can at best halve the denominator, and double the speedup.
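$$\text{Speedup} = \frac{1}{(1-\alpha) + \frac{\alpha}{N}}$$
Balance Point, where $(1-\alpha) = \frac{\alpha}{N}$
When $\frac{\alpha}{N} < (1-\alpha)$, very little additional speedup is possible through increasing N.
When $\frac{\alpha}{N} > (1-\alpha)$, significant additional speedup may be possible through increasing N.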
Balance Point Heuristic
• Increasing N (number of processors) beyond this point can at best halve the denominator, and double the speedup.
$$\text{Speedup} = \frac{1}{(1-\alpha) + \frac{\alpha}{N}}$$
Balance Point, where $(1-\alpha) = \frac{\alpha}{N}$
Solved for N: $N = \frac{\alpha}{1-\alpha}$
Solved for α: $\alpha = \frac{N}{N+1}$
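The two solved forms follow from the balance condition in a few algebra steps. A short derivation, using only the symbols already on the slide (α for the parallel fraction, N for the number of processors):

```latex
\begin{align*}
1-\alpha &= \frac{\alpha}{N} \\
N(1-\alpha) &= \alpha \quad\Rightarrow\quad N = \frac{\alpha}{1-\alpha} \\
N &= \alpha(N+1) \quad\Rightarrow\quad \alpha = \frac{N}{N+1}
\end{align*}
```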
Balance Point
• Example
• Parallel Fraction = 90%
• (10% in serial)
N         α/N      1-α    Speedup
1         0.90     0.10   1/(0.1+0.9)    = 1.00
2         0.45     0.10   1/(0.1+0.45)   = 1.82
4         0.225    0.10   1/(0.1+0.225)  = 3.08
8         0.1125   0.10   1/(0.1+0.1125) = 4.71
16        0.056    0.10   1/(0.1+0.056)  = 6.41
32        0.028    0.10   1/(0.1+0.028)  = 7.81
64        0.014    0.10   1/(0.1+0.014)  = 8.77
infinity  0.0      0.10   1/(0.1+0.0)    = 10
Solved for N: $N = \frac{\alpha}{1-\alpha} = 0.90/0.10 = 9$, which gives a speedup of 5.
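The table can be reproduced with a few lines of Python. This is a minimal sketch: the function name `amdahl_speedup` is a label chosen here, not something from the lecture. Values are computed from the unrounded α/N, so the last digit may differ slightly from a hand-rounded table.

```python
def amdahl_speedup(alpha: float, n: float) -> float:
    """Amdahl's Law: 1 / ((1 - alpha) + alpha / n)."""
    return 1.0 / ((1.0 - alpha) + alpha / n)


alpha = 0.90  # parallel fraction from the slide example

# One row per processor count: N, alpha/N, 1-alpha, speedup
for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:>3}  {alpha / n:.4f}  {1 - alpha:.2f}  {amdahl_speedup(alpha, n):.2f}")

# At the balance point N = alpha / (1 - alpha), both denominator terms are equal.
n_balance = alpha / (1 - alpha)
print(f"balance N = {n_balance:.2f}, speedup = {amdahl_speedup(alpha, n_balance):.2f}")
```

Past N = 9 the serial term dominates the denominator, which is why the speedup creeps toward 10 instead of continuing to grow with N.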
Example
• Example: Workload has an average alpha of 94%. How many processors can reasonably be applied to speed up this workload?
Solved for N: $N = \frac{\alpha}{1-\alpha}$
Example
• Example: An architecture has 32 processors. What is the minimum workload parallel fraction needed to make reasonably efficient use of the processors?
Solved for α: $\alpha = \frac{N}{N+1}$
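Both examples are direct applications of the balance point heuristic. A small sketch follows; the helper names `balance_n` and `balance_alpha` are made up for illustration.

```python
def balance_n(alpha: float) -> float:
    """Processor count at the balance point: N = alpha / (1 - alpha)."""
    return alpha / (1.0 - alpha)


def balance_alpha(n: int) -> float:
    """Parallel fraction at the balance point: alpha = N / (N + 1)."""
    return n / (n + 1)


# Example 1: workload with an average alpha of 94%
print(f"alpha = 0.94 -> N = {balance_n(0.94):.2f}")

# Example 2: architecture with 32 processors
print(f"N = 32 -> alpha = {balance_alpha(32):.4f}")
```

The first result is not a whole number (0.94/0.06 is about 15.67), so in practice the heuristic suggests roughly 15 or 16 processors. The second works out to 32/33, a parallel fraction of about 97%.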
Multi-Bus Multiprocessors
• Shared-Memory Multiprocessors are very fast
  – Low latency to memory on bus
  – Low communication overhead through shared memory
• Scalability problems
  – Length of bus slows signals (.75 SOL)
  – Contention for the bus reduces performance
  – Requires cache to reduce contention
[Figure: CPU, CPU, CPU, and MEM on a shared bus]
Bus Contention
Multiple devices – processors, etc, compete for access to a bus
Only one device can use a bus at a time, limiting performance and scalability
$r$ = probability of a processor requesting a bus
$n$ = number of processors
$1-r$ = probability of not requesting a bus
$(1-r)^n$ = probability that none will request the bus
$1-(1-r)^n$ = probability that at least one will request the bus
$\frac{n!}{1!\,(n-1)!}\,r\,(1-r)^{n-1} = n\,r\,(1-r)^{n-1}$ = probability that exactly one processor will request the bus
$1-(1-r)^n-n\,r\,(1-r)^{n-1}$ = probability of more than zero or one request (at least one request is blocked)
1 – zero requests – exactly one request = probability of 2 or more (at least one blocked request)
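The three probabilities can be computed directly from n and r. A minimal sketch follows, assuming (as the formulas do) that each processor requests the bus independently with the same probability r; the function name `bus_contention` is chosen here for illustration.

```python
def bus_contention(n: int, r: float) -> tuple:
    """Return (P(no request), P(exactly one request), P(two or more requests)).

    n: number of processors sharing the bus
    r: probability that a given processor requests the bus
    """
    p_none = (1.0 - r) ** n
    p_one = n * r * (1.0 - r) ** (n - 1)
    p_blocked = 1.0 - p_none - p_one  # at least one request is blocked
    return p_none, p_one, p_blocked


p_none, p_one, p_blocked = bus_contention(4, 0.1)
print(f"N=4, r=0.1: none={p_none:.4f} one={p_one:.4f} blocked={p_blocked:.4f}")
# prints: N=4, r=0.1: none=0.6561 one=0.2916 blocked=0.0523
```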
• Performance degrades as requests are blocked.
• Resubmitted blocked requests degrade performance even further than the table below shows.

                 N=4      N=8      N=16
r                0.1      0.1      0.2
1-r              0.9      0.9      0.8
(1-r)^n          0.6561   0.4305   0.0281
nr(1-r)^(n-1)    0.2916   0.3826   0.1126
Blocked          0.0523   0.1869   0.8593
Clearly, the probability that a processor’s access to a shared bus will be denied will increase with both:
• The number of processors sharing a bus
• The probability a processor will need access to the bus.
• What can be done? What is the “universal band-aid” for performance problems?
• If cache greatly reduces access to memory, then the blocking rate on the bus is much lower.

                 N=4      N=8      N=16     N=16
r                0.1      0.1      0.2      0.01
1-r              0.9      0.9      0.8      0.99
(1-r)^n          0.6561   0.4305   0.0281   0.8515
nr(1-r)^(n-1)    0.2916   0.3826   0.1126   0.1376
Blocked          0.0523   0.1869   0.8593   0.0109
Two approaches to improving shared memory/bus machine performance:
• Invest in large amounts of cache, at multiple levels,
  – and a connection network to allow caches to synchronize contents.
• Invest in multiple buses and independently accessible blocks of memory
• Combining both may be the best strategy.
Homework
• Your project is to explore how interconnection network contention affects the performance of a shared-memory, bus-based multiprocessor.
• You will do some calculations, use the HPPAS simulator, and write a couple-page report to turn in.
Task 1
• For a machine with processors that include on-chip cache that yields a cache hit rate of 90%, determine the maximum number of processors that can go on a single shared bus and still maintain at least a 98% acceptance of requests.
• Use the calculations shown in the lecture to zero in on the correct answer, recording your calculations in a table for your report. Show each step of the calculation as was done in the lecture/ppt.
• Your results should “bracket” the maximum.
Task 1
• Task 1: Use the formulas in the table to find the maximum N.

                            N=4    N=8    N=16   N=?    N=?
r = 10%                     0.10   0.10   0.10   0.10   0.10
1-r                         0.90   0.90   0.90   0.90   0.90
(1-r)^n
nr(1-r)^(n-1)
Blocked = 1 - 0Req - 1Req
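Hand calculations for the table can be cross-checked with a short search. This sketch rests on two assumptions that the slides imply but do not state: a 90% cache hit rate means r = 1 - 0.90 = 0.10 (matching the r = 10% row), and "acceptance of requests" means 1 - Blocked, the probability of zero or one request. The names `acceptance` and `max_processors` are hypothetical helpers.

```python
def acceptance(n: int, r: float) -> float:
    """P(zero or one request) = (1-r)^n + n*r*(1-r)^(n-1), i.e. 1 - Blocked."""
    return (1.0 - r) ** n + n * r * (1.0 - r) ** (n - 1)


def max_processors(r: float, min_acceptance: float, limit: int = 1024) -> int:
    """Largest n (up to limit) whose acceptance is still >= min_acceptance.

    Acceptance only falls as n grows, so the search stops at the first failure.
    """
    best = 0
    for n in range(1, limit + 1):
        if acceptance(n, r) >= min_acceptance:
            best = n
        else:
            break
    return best


hit_rate = 0.90
r = 1.0 - hit_rate  # assumption: every cache miss becomes a bus request

# Print a few rows so the values on either side of the threshold are visible.
for n in range(1, 6):
    print(f"N={n}: acceptance={acceptance(n, r):.4f} blocked={1 - acceptance(n, r):.4f}")

print("max N at 98% acceptance:", max_processors(r, 0.98))
```

The printed rows show the last passing and first failing values of N, which is the bracketing the assignment asks for.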
Task 2
• Use the maximum number of processors (Task 1) and Amdahl’s law at the balance point, to figure out what workload parallel fraction yields a balance in the denominator.
• Determine the theoretical speedup that will be obtained.
Solved for α: $\alpha = \frac{N}{N+1}$
$$\text{Speedup} = \frac{1}{(1-\alpha) + \frac{\alpha}{N}}$$
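Substituting the balance-point α into Amdahl's Law gives a closed form for the theoretical speedup. This is a worked sketch using only the two formulas on this slide:

```latex
\begin{align*}
\alpha = \frac{N}{N+1} \;\Rightarrow\; 1-\alpha &= \frac{1}{N+1}, \qquad \frac{\alpha}{N} = \frac{1}{N+1} \\
\text{Speedup} &= \frac{1}{\frac{1}{N+1} + \frac{1}{N+1}} = \frac{N+1}{2}
\end{align*}
```

As a check, the earlier 90% example has its balance point at N = 9, and (9 + 1)/2 = 5 matches the speedup found there.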
Task 3
• Use the data values developed so far, to run the HPPAS simulation system. Record the speedup obtained from this system.
• If it differs markedly from the theoretical value, check all the settings, and rerun the simulation, and explain any variation from the theoretical expected value.
• Record your results in your report, showing each step of the calculation as was done in the lecture/ppt.
Dates
• The current plan:
  – Make the midterm available on Friday June 23.
  – Due date will be July 10 (after the conference and after the July 4th weekend).
• Conference week:
  – Complete homework: Due on July 3 by email.
  – Work on Midterm exam.
• No class lecture on June 27 and 29.
• No class on July 4.
• Next live class is Wed July 6.
Topic Overview
Overview of topics for the exam:
• Five parallel levels
• Problems to be solved for parallelism
• Limitations to parallel speedup
• Amdahl’s Law: theory, implications
• Limiting factors in realizing parallel performance
• Pipelines and their performance issues
• Flynn’s classification
• SIMD architectures
• SIMD algorithms
• Elementary analysis of algorithms
• MIMD: Multiprocessors and Multicomputers
• Balance point and heuristic (from Amdahl’s Law)
• Bus contention and analysis of single shared bus.
• Use of the online HPPAS tool.
• Specific multiprocessor clustered architectures:
  – Compaq
  – DASH
  – Dell Blade Cluster
End of Lecture
End Of Today’s Lecture.