chapter 16 parallel processing - yonsei...

79
Yonsei University Chapter 16 Parallel Processing

Post on 26-Apr-2020

Page 1: Chapter 16 Parallel Processing - Yonsei Universitysoc.yonsei.ac.kr/class/material/computersystems/2003/... · 2017-03-06 · 16-3 Yonsei University Types of Parallel Processor Multiple

Yonsei University

Chapter 16

Parallel Processing

16-2 Contents

• Multiple Processor Organizations
• Symmetric Multiprocessors
• Cache Coherence and the MESI Protocol
• Clusters
• Nonuniform Memory Access
• Vector Computation

16-3 Types of Parallel Processor
Multiple Processor Organization

• Types of Parallel Processor Systems
  – Single instruction, single data stream - SISD
  – Single instruction, multiple data stream - SIMD
  – Multiple instruction, single data stream - MISD
  – Multiple instruction, multiple data stream - MIMD
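The four categories above differ only in how many instruction streams and data streams a machine handles at once. A minimal sketch of that classification follows; the function name `classify` and its integer arguments are illustrative assumptions, not part of the slides.

```python
def classify(instruction_streams: int, data_streams: int) -> str:
    """Return the category name (SISD, SIMD, MISD, MIMD) for the given stream counts."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    # "S" for a single stream, "M" for multiple streams
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


# One instruction stream driving eight data streams is a SIMD organization.
category = classify(1, 8)
```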

16-4 Taxonomy of Parallel Processor
Multiple Processor Organization

16-5 Alternative Computer Organization
Multiple Processor Organization

• SISD

16-6 Alternative Computer Organization
Multiple Processor Organization

• SIMD (with distributed memory)

16-7 Alternative Computer Organization
Multiple Processor Organization

• MIMD (with shared memory)

16-8 Alternative Computer Organization
Multiple Processor Organization

• MIMD (with distributed memory)

16-9 Design Issues
Multiple Processor Organization

• Physical Organization
• Interconnection Structures
• Interprocessor Communication
• Operating System Design
• Application Software Techniques

16-10 SMP Characteristics
Symmetric Multiprocessors

• Two or more similar processors
• Share the same main memory and I/O facilities
• All processors share access to I/O devices
• All processors can perform the same functions
• The system is controlled by an integrated OS

16-11 SMP Advantages
Symmetric Multiprocessors

• Performance
• Availability
• Incremental growth
• Scaling

16-12 Multiprogramming & Multiprocessing
Symmetric Multiprocessors

• Interleaving (multiprogramming)

16-13 Multiprogramming & Multiprocessing
Symmetric Multiprocessors

• Interleaving and overlapping (multiprocessing)

16-14 Multiprogramming & Multiprocessing
Symmetric Multiprocessors

• Interleaving (multiprogramming)

16-15 Organization
Symmetric Multiprocessors

• Time-shared or common bus
• Multiport memory
• Central control unit

16-16 Time-Shared Bus
Symmetric Multiprocessors

• To facilitate DMA transfers from I/O processors, the following features are provided
  – Addressing
    • Distinguish modules on the bus to determine the source and destination of data
  – Arbitration
    • An I/O module can temporarily function as “master”
  – Time sharing
    • When one module is controlling the bus, other modules are locked out and must, if necessary, suspend operation until bus access is achieved
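The arbitration and time-sharing features above can be sketched as a bus with a single master and a queue of locked-out modules. This is only an illustration: the class `TimeSharedBus` is hypothetical, and the first-come, first-served grant order is an assumption, since the slide does not name an arbitration policy.

```python
from collections import deque


class TimeSharedBus:
    """Toy model of a bus that only one module can control at a time."""

    def __init__(self):
        self.master = None       # module currently controlling the bus
        self.waiting = deque()   # locked-out modules, in arrival order (assumed FIFO)

    def request(self, module):
        """Return True if the module becomes master now, False if it must wait."""
        if self.master is None:
            self.master = module
            return True
        self.waiting.append(module)  # locked out until the bus is released
        return False

    def release(self, module):
        """The current master gives up the bus; the next waiter (if any) takes over."""
        if module != self.master:
            raise ValueError("only the current master can release the bus")
        self.master = self.waiting.popleft() if self.waiting else None
        return self.master


bus = TimeSharedBus()
bus.request("CPU0")   # granted immediately
bus.request("IO1")    # locked out, queued behind CPU0
```

Here the I/O module `IO1` becomes master only after `CPU0` calls `release`, which mirrors the "suspend operation until bus access is achieved" behaviour.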

16-17 Time-Shared Bus
Symmetric Multiprocessors

16-18 Bus Organization’s Advantages
Symmetric Multiprocessors

• Simplicity
  – The simplest approach to multiprocessor organization
• Flexibility
  – Easy to expand the system by attaching more processors to the bus
• Reliability
  – The bus is essentially a passive medium

16-19 Bus Organization’s Drawback
Symmetric Multiprocessors

• Performance
  – The speed of the system is limited by the bus cycle time
  – To improve performance, each processor is equipped with a cache memory

16-20 Multiport Memory
Symmetric Multiprocessors

• The multiport memory allows the direct, independent access of main memory modules by each processor and I/O module
• Advantages
  – Little or no modification is needed for either processor or I/O modules to accommodate multiport memory
  – Better performance than the bus approach
  – Security
    • Portions of memory can be configured as private to one or more processors and/or I/O modules
• Disadvantages
  – Logic associated with memory is required for resolving conflicts
  – More complex than the bus approach

16-21 Central Control Unit
Symmetric Multiprocessors

• Funnels separate data streams back and forth between independent modules: processor, memory, I/O module
• The controller can buffer requests and perform arbitration and timing functions
• Passes status and control messages between processors
• Performs cache update alerting
• Advantages
  – Flexibility and simplicity of interfacing of the bus approach
• Disadvantages
  – The control unit is quite complex
  – A potential performance bottleneck

16-22 Multiprocessor OS Design
Symmetric Multiprocessors

• An SMP OS manages processor and other computer resources so that the user perceives a single OS controlling system resources
• The key design issues
  – Simultaneous concurrent processes
  – Scheduling
  – Synchronization
  – Memory management
  – Reliability and fault tolerance

16-23 Mainframe SMP
Symmetric Multiprocessors

• IBM S/390 SMP Organization

16-24 Mainframe SMP
Symmetric Multiprocessors

• PU
  – CISC microprocessor
  – 64-KB L1 cache
• L2 cache
  – 384 KB
  – Arranged in clusters of two
  – Each cluster supports three PUs
• Bus-switching network adapter (BSN)
  – Interconnects the L2 caches and the main memory
  – Includes a level 3 (L3) cache (2 MB)
• Memory card

16-25 Switched Interconnection
Symmetric Multiprocessors

• The single bus becomes a bottleneck affecting scalability
• The S/390 copes with this problem in two ways
  – Main memory is split into four separate cards, each with its own storage controller that can handle memory accesses at high speeds
  – The connection from processors to a single memory card is not in the form of a shared bus but rather point-to-point links, where each link connects a group of three processors via an L2 cache to a BSN

16-26 Shared L2 Caches
Symmetric Multiprocessors

• Need for Shared L2 Caches
  – In moving from G3 (generation 3) to G4, IBM doubled the speed of the microprocessors. If the G3 organization had been retained, a significant increase in bus traffic would have occurred. At the same time, it was desired to reuse as many G3 components as possible. Without a significant bus upgrade, the BSNs would become a bottleneck
  – Analysis of typical S/390 workloads revealed a large degree of sharing of instructions and data among processors
• These considerations led to the use of one or more L2 caches, each of which is shared by multiple processors

16-27 L3 Cache
Symmetric Multiprocessors

• Each L3 cache provides a buffer between L2 caches and one memory card
• Provides the data much more quickly than a main memory access if the requested cache line is already shared by other processors but was not recently used by the requesting processor

Memory subsystem   Access Penalty (PU cycles)   Cache Size   Hit Rate (%)
L1 cache           1                            32 KB        89
L2 cache           5                            256 KB       5
L3 cache           14                           2 MB         3
Memory             32                           8 GB         3
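The table values (L1 cache: 1 cycle, 89%; L2 cache: 5 cycles, 5%; L3 cache: 14 cycles, 3%; memory: 32 cycles, 3%) can be combined into an average access penalty. The sketch below assumes that each hit rate is the share of all accesses satisfied at that level and that an access costs only the penalty of the level that satisfies it; the slide does not state this interpretation, so treat the result as illustrative.

```python
# (memory subsystem, access penalty in PU cycles, hit rate in percent), from the table
LEVELS = [
    ("L1 cache", 1, 89),
    ("L2 cache", 5, 5),
    ("L3 cache", 14, 3),
    ("Memory", 32, 3),
]


def average_access_penalty(levels):
    """Weighted mean of the per-level penalties, weighted by hit rate."""
    total_pct = sum(hit for _, _, hit in levels)
    if total_pct != 100:
        raise ValueError("hit rates are expected to add up to 100%")
    return sum(penalty * hit for _, penalty, hit in levels) / 100


# (1*89 + 5*5 + 14*3 + 32*3) / 100 = 2.52 PU cycles under the stated assumption
avg = average_access_penalty(LEVELS)
```

Under this reading, the 3% of accesses that reach main memory contribute 0.96 of the 2.52 cycles, slightly more than the 89% served by L1.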

16-28 Cache Coherence Problem
Cache Coherence and the MESI Protocol

• Write back
  – Write operations are usually made only to the cache.
  – Main memory is only updated when the corresponding cache line is flushed from the cache.
• Write through
  – All write operations are made to main memory as well as to the cache, ensuring that main memory is always valid.
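The two write policies can be compared with a toy model in which two private caches sit in front of one shared memory. The `Cache` class and its methods are hypothetical names chosen for this sketch. It shows that with write back, main memory is stale until the line is flushed, and that even with write through, a copy already held by another cache stays stale unless something invalidates or updates it.

```python
class Cache:
    """Toy private cache in front of a shared main memory (a dict)."""

    def __init__(self, memory, write_through):
        self.memory = memory
        self.write_through = write_through
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses written but not yet copied to memory

    def read(self, addr):
        if addr not in self.lines:          # miss: load the line from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value       # memory is always valid
        else:
            self.dirty.add(addr)            # memory updated only on flush

    def flush(self, addr):
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


memory = {0x10: 0}
a = Cache(memory, write_through=False)
b = Cache(memory, write_through=False)
a.write(0x10, 1)           # write back: memory still holds 0
stale = b.read(0x10)       # b loads the old value from memory
a.flush(0x10)              # only now does memory see the new value
```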

16-29 Software Solutions
Cache Coherence and the MESI Protocol

• Compiler-based coherence mechanisms: analyze the code to determine which data items may become unsafe for caching, and mark those items accordingly
• Simplest approach: prevent any shared data variables from being cached
• Efficient approach: analyze the code to determine safe periods for shared variables

16-30 Hardware Solutions
Cache Coherence and the MESI Protocol

• Cache coherence protocols
• Dynamic recognition at run time of potential inconsistency conditions
• Hardware schemes can be divided into two categories
  – Directory protocols
  – Snoopy protocols

16-31 Directory Protocols
Cache Coherence and the MESI Protocol

• Collect and maintain information about where copies of lines reside
• Typically, there is a centralized controller that is part of the main memory controller, and a directory that is stored in main memory
• The directory contains global state information about the contents of the various local caches

16-32 Snoopy Protocols
Cache Coherence and the MESI Protocol

• Distribute the responsibility for maintaining cache coherence among all of the cache controllers in a multiprocessor
• A cache must recognize when a line that it holds is shared with other caches
• When an update action is performed on a shared cache line, it must be announced to all other caches by a broadcast mechanism
• Each cache controller is able to “snoop” on the network to observe these broadcast notifications and react accordingly
• Suited to a bus-based multiprocessor

16-33 Snoopy Protocols
Cache Coherence and the MESI Protocol

• Two approaches to the snoopy protocol
  – Write-invalidate
    • Multiple readers but only one writer
    • When one of the caches wants to write to the line, it invalidates that line in the other caches and makes the line exclusive to the writing cache
  – Write-update (write-broadcast)
    • Multiple writers as well as multiple readers
    • When a processor wishes to update a shared line, the word to be updated is distributed to all others, and caches containing that line can update it
• Performance depends on the number of local caches and the pattern of memory reads and writes
• Some systems implement adaptive protocols that employ both
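The difference between the two snoopy approaches can be sketched with caches that broadcast every write on a shared bus. `SnoopyBus` and `SnoopyCache` are hypothetical names, and writing straight through to memory is a simplification assumed here to keep the example short.

```python
class SnoopyBus:
    """Shared bus: holds main memory and the list of attached caches."""

    def __init__(self, policy):
        if policy not in ("invalidate", "update"):
            raise ValueError("policy must be 'invalidate' or 'update'")
        self.policy = policy
        self.memory = {}
        self.caches = []


class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> cached value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value  # assumed write-through, for simplicity
        for other in self.bus.caches:  # broadcast the write to all other caches
            if other is not self:
                other.snoop(addr, value)

    def snoop(self, addr, value):
        if addr not in self.lines:
            return                      # this cache does not hold the line
        if self.bus.policy == "invalidate":
            del self.lines[addr]        # write-invalidate: drop the stale copy
        else:
            self.lines[addr] = value    # write-update: take the new word


bus = SnoopyBus("invalidate")
p, q = SnoopyCache(bus), SnoopyCache(bus)
bus.memory[5] = 1
p.read(5)
q.read(5)
p.write(5, 2)   # q's copy is invalidated; its next read misses and reloads 2
```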

16-34 MESI Protocol
Cache Coherence and the MESI Protocol

• Modified: The line in the cache has been modified and is available only in this cache
• Exclusive: The line in the cache is the same as that in main memory and is not present in any other cache
• Shared: The line in the cache is the same as that in main memory and may be present in another cache
• Invalid: The line in the cache does not contain valid data

16-35 MESI State Transition Diagram
Cache Coherence and the MESI Protocol

(a) Line in cache at initiating processor

16-36 MESI State Transition Diagram
Cache Coherence and the MESI Protocol

(b) Line in snooping cache

16-37 MESI State Transition Diagram
Cache Coherence and the MESI Protocol

16-38 Read Miss
Cache Coherence and the MESI Protocol

• When a read miss occurs in the local cache, the processor initiates a memory read to read the line of main memory containing the missing address
• The processor inserts a signal on the bus that alerts all other processor/cache units to snoop the transaction

16-39 Read Hit
Cache Coherence and the MESI Protocol

• When a read hit occurs on a line currently in the local cache, the processor simply reads the required item
• There is no state change
• The state remains modified, shared, or exclusive

16-40 Write Miss
Cache Coherence and the MESI Protocol

• When a write miss occurs in the local cache, the processor initiates a memory read to read the line of main memory containing the missing address
• RWITM (read-with-intent-to-modify)
• When the line is loaded, it is immediately marked modified

16-41 Write Hit
Cache Coherence and the MESI Protocol

• When a write hit occurs on a line currently in the local cache, the effect depends on the current state of that line in the local cache:
  • Shared
  • Exclusive
  • Modified
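The read and write cases on the preceding slides can be pulled together in a small state machine that tracks a single line across several caches. This is a sketch, not the slides' own specification: `MesiSystem` is a hypothetical class, data values and write-backs to memory are not modelled, and the outcome of each write-hit case (Shared, Exclusive, Modified) follows the usual textbook description of MESI, since the slide lists those cases by name only.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class MesiSystem:
    """Tracks the MESI state of ONE cache line in each of n caches."""

    def __init__(self, n_caches):
        self.states = [State.INVALID] * n_caches

    def _others(self, i):
        return [j for j in range(len(self.states)) if j != i]

    def read(self, i):
        if self.states[i] is not State.INVALID:
            return  # read hit: no state change
        # Read miss: the other caches snoop the memory read.
        holders = [j for j in self._others(i)
                   if self.states[j] is not State.INVALID]
        for j in holders:
            # A Modified holder would first write the line back (not modelled);
            # every holder then keeps the line as Shared.
            self.states[j] = State.SHARED
        self.states[i] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, i):
        if self.states[i] is State.MODIFIED:
            return  # write hit on Modified: already the only valid copy
        if self.states[i] is State.EXCLUSIVE:
            self.states[i] = State.MODIFIED  # no other copies to invalidate
            return
        # Write hit on Shared, or write miss (RWITM): invalidate other copies.
        for j in self._others(i):
            self.states[j] = State.INVALID
        self.states[i] = State.MODIFIED


system = MesiSystem(2)
system.read(0)    # cache 0: Exclusive
system.read(1)    # both caches: Shared
system.write(0)   # cache 0: Modified, cache 1: Invalid
```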

16-42 L1-L2 Cache Consistency
Cache Coherence and the MESI Protocol

• Need to maintain data integrity across both levels of cache and across all caches in the SMP configuration
• Extend the MESI protocol to the L1 caches
  – Each line in the L1 cache includes bits to indicate the state
• Why?
  – Adopt the write-through policy in the L1 cache
  – The write-through is to the L2 cache and not to the memory

16-43 Clustering
Clusters

• Clustering is an alternative to symmetric multiprocessing (SMP)
• Cluster
  – A group of interconnected, whole computers working together as a unified computing resource that can create the illusion of being one machine
• Whole Computer
  – A system that can run on its own, apart from the cluster
• Benefits
  – Absolute scalability
  – Incremental scalability
  – High availability
  – Superior price/performance

16-44 Cluster Configurations
Clusters

(a) Standby Server with No Shared Disk

16-45 Cluster Configurations
Clusters

(b) Shared Disk

16-46 Clustering Methods
Clusters

• Passive standby
  – Description: A secondary server takes over in case of primary server failure
  – Benefits: Easy to implement
  – Limitations: High cost because the secondary server is unavailable for other processing tasks
• Active secondary
  – Description: The secondary server is also used for processing tasks
  – Benefits: Reduced cost because secondary servers can be used for processing
  – Limitations: Increased complexity
• Separate servers
  – Description: Separate servers have their own disks. Data is continuously copied from primary to secondary server
  – Benefits: High availability
  – Limitations: High network and server overhead due to copying operations
• Servers connected to disks
  – Description: Servers are cabled to the same disks, but each server owns its disks. If one server fails, its disks are taken over by the other server
  – Benefits: Reduced network and server overhead due to elimination of copying operations
  – Limitations: Usually requires disk mirroring or RAID technology to compensate for risk of disk failure
• Servers share disks
  – Description: Multiple servers simultaneously share access to disks
  – Benefits: Low network and server overhead. Reduced risk of downtime caused by disk failure
  – Limitations: Requires lock manager software. Usually used with disk mirroring or RAID technology

16-47 Operating System Design Issues
Clusters

• Failure Management
  – Failover
  – Failback
• Load Balancing
• Clusters versus SMP

16-48 Failure Management
Clusters

• A highly available cluster
  – A high probability that all resources will be in service
  – If a failure does occur, the queries in progress are lost
  – Any lost query will be serviced by a different computer in the cluster
• A fault-tolerant cluster
  – All resources are always available
  – Achieved by the use of redundant shared disks and mechanisms for backing out uncommitted transactions
  – Failover: switching resources over from a failed system to an alternative system
  – Failback: restoring resources to the original system
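Failover and failback can be sketched as bookkeeping over which node currently serves each resource. The `Cluster` class below is hypothetical, and choosing the first surviving node as the alternative system is an assumption; the slides do not say how the alternative is selected.

```python
class Cluster:
    """Toy record of which node owns each resource."""

    def __init__(self, nodes):
        self.up = {node: True for node in nodes}
        self.home = {}    # resource -> node it originally ran on
        self.owner = {}   # resource -> node serving it right now

    def add_resource(self, resource, node):
        self.home[resource] = node
        self.owner[resource] = node

    def failover(self, failed):
        """Switch the failed node's resources to an alternative node."""
        self.up[failed] = False
        survivors = [n for n, ok in self.up.items() if ok]
        if not survivors:
            raise RuntimeError("no surviving node to take over")
        for resource, node in self.owner.items():
            if node == failed:
                self.owner[resource] = survivors[0]  # assumed choice

    def failback(self, restored):
        """Return resources to their original node once it is back."""
        self.up[restored] = True
        for resource, node in self.home.items():
            if node == restored:
                self.owner[resource] = restored


cluster = Cluster(["A", "B"])
cluster.add_resource("db", "A")
cluster.add_resource("web", "B")
cluster.failover("A")    # "db" is now served by B
cluster.failback("A")    # "db" returns to A
```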

16-49 Load Balancing
Clusters

• An effective capability for balancing the load among available computers includes the requirement that the cluster be incrementally scalable
• Middleware mechanisms need to recognize that services can appear on different members of the cluster and may migrate from one member to another

16-50 Clusters vs SMP
Clusters

• The advantages of the SMP approach
  – SMP is easier to manage and configure than a cluster
  – SMP usually takes up less physical space and draws less power than a cluster
• The advantages of the cluster approach
  – These advantages are likely to result in clusters dominating the high-performance server market
  – Superior in terms of incremental and absolute scalability
  – Superior in terms of availability
    • All components of the system can readily be made highly redundant

16-51 Nonuniform Memory Access
Nonuniform Memory Access

• Uniform memory access (UMA)
  – All processors have access to all parts of main memory using loads and stores
  – The memory access time of a processor to all regions of memory is the same
• Nonuniform memory access (NUMA)
  – All processors have access to all parts of main memory using loads and stores
  – The memory access time of a processor differs depending on which region of main memory is accessed
• Cache-coherent NUMA (CC-NUMA)
  – A NUMA system in which cache coherence is maintained among the caches of the various processors

16-52 Motivation
Nonuniform Memory Access

• A practical limit to the number of processors that can be used
  – As the number of processors increases, bus traffic also increases
  – The bus is used to exchange cache-coherence signals, adding to the burden
• An effective cache scheme reduces the bus traffic between any one processor and main memory
• The objective with NUMA is to maintain a transparent system-wide memory while permitting multiple multiprocessor nodes, each with its own bus or other internal interconnect system

16-53 CC-NUMA Organization
Nonuniform Memory Access

16-54 Organization
Nonuniform Memory Access

• Example
  – Suppose P2-3 requests memory location 798, which is in the memory of node 1
    1. P2-3 issues a read request on the snoopy bus of node 2
    2. The directory on node 2 sees the request and recognizes that the location is in node 1
    3. Node 2’s directory sends a request to node 1
    4. Node 1’s directory requests the contents of 798
    5. Node 1’s main memory responds by putting the requested data on the bus
    6. Node 1’s directory picks up the data from the bus
    7. The value is transferred back to node 2’s directory
    8. Node 2’s directory places the data back on node 2’s bus
    9. The value is picked up and placed in P2-3’s cache and delivered to P2-3
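The remote read in the example can be traced with a small model in which each node owns a slice of the global address space. Everything named here is an assumption for illustration: the `Node` class, the `read` function, the address ranges (node 1 owns 0 to 1023, node 2 owns 1024 to 2047), and the stored value 42. The trace also condenses the nine steps of the example into fewer messages.

```python
class Node:
    """One CC-NUMA node with its own slice of the global address space."""

    def __init__(self, node_id, base, size):
        self.node_id = node_id
        self.base = base
        self.size = size
        self.memory = {}  # address -> value, for addresses this node owns

    def owns(self, addr):
        return self.base <= addr < self.base + self.size


def read(nodes, requester_id, addr):
    """Return (value, trace) for a read issued by a processor on requester_id."""
    home = next((n for n in nodes if n.owns(addr)), None)
    if home is None:
        raise ValueError(f"address {addr} is not mapped to any node")
    remote = home.node_id != requester_id
    trace = [f"node {requester_id}: read request for {addr} on the local snoopy bus"]
    if remote:
        trace.append(f"node {requester_id} directory: {addr} is on node "
                     f"{home.node_id}, forwarding the request")
        trace.append(f"node {home.node_id} directory: requests {addr} from local memory")
    value = home.memory.get(addr, 0)
    trace.append(f"node {home.node_id} memory: puts the data on the bus")
    if remote:
        trace.append(f"node {home.node_id} directory: sends the value back to "
                     f"node {requester_id}")
        trace.append(f"node {requester_id} directory: places the data on the local bus")
    trace.append(f"node {requester_id}: value cached and delivered to the processor")
    return value, trace


# Hypothetical layout: node 1 owns 0..1023, node 2 owns 1024..2047.
nodes = [Node(1, 0, 1024), Node(2, 1024, 1024)]
nodes[0].memory[798] = 42
value, trace = read(nodes, requester_id=2, addr=798)  # remote read, as in the example
```

A read of the same location issued from node 1 produces a much shorter trace, which is the nonuniform access time that gives NUMA its name.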

16-55 NUMA Pros and Cons
Nonuniform Memory Access

• The advantage of a CC-NUMA system
  – Can deliver effective performance at higher levels of parallelism than SMP, without requiring major software changes
• The disadvantages of a CC-NUMA system
  – Does not transparently look like an SMP
    • Software changes will be required to move an operating system and applications from an SMP to a CC-NUMA system
  – Availability

Page 56: Chapter 16 Parallel Processing - Yonsei Universitysoc.yonsei.ac.kr/class/material/computersystems/2003/... · 2017-03-06 · 16-3 Yonsei University Types of Parallel Processor Multiple

YonseiYonsei UniversityUniversity16-56

Vector Computation
• The key to the design of a supercomputer or array processor is to recognize that the main task is to perform arithmetic operations on arrays or vectors of floating-point numbers
• In a general-purpose computer, this requires iteration through each element of the array
• Vector processing
• Parallel processing
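
To make the contrast concrete, here is a minimal sketch of the same vector addition written first as the element-by-element iteration of a general-purpose computer and then as a single whole-vector operation. The function names are illustrative. Python still loops internally in both cases, so this shows only the difference in programming model, not in hardware.

```python
def scalar_add(a, b):
    """Add two equal-length vectors one element at a time."""
    c = [0.0] * len(a)
    for i in range(len(a)):  # per-element housekeeping: index update, test, branch
        c[i] = a[i] + b[i]
    return c


def vector_add(a, b):
    """Express the addition as one operation over whole vectors."""
    return [x + y for x, y in zip(a, b)]


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(scalar_add(a, b))
print(vector_add(a, b))
```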


Example of Vector Addition


Matrix Multiplication (C = A x B)
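
As a rough illustration of C = A x B, the sketch below compares a scalar triple loop with a form in which a whole row of C is updated per step, which is the kind of operation a vector instruction could perform. It assumes square matrices stored as nested lists; the function names are hypothetical and are not taken from the slides.

```python
def matmul_scalar(a, b):
    """Scalar processing: one multiply-add per innermost iteration."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_rowwise(a, b):
    """Vector-style processing: each step updates a whole row of C at once."""
    n = len(a)
    c = []
    for i in range(n):
        row = [0.0] * n
        for k in range(n):
            # One "vector" step: row = row + a[i][k] * (row k of B)
            row = [r + a[i][k] * x for r, x in zip(row, b[k])]
        c.append(row)
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_scalar(a, b))
print(matmul_rowwise(a, b))
```

The row-wise form removes the innermost scalar loop over j, leaving N x N vector steps instead of N x N x N scalar ones.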


Organizations
• Pipelined ALU
• Parallel ALUs
• Parallel processors


Approaches to Vector Computation
(a) Pipelined ALU


Approaches to Vector Computation
(b) Parallel ALUs


Pipelined Processing


Ex. Computing C = (s x A) + B
1. Vector load A -> Vector Register (VR1)
2. Vector load B -> VR2
3. Vector multiply s x VR1 -> VR3
4. Vector add VR3 + VR2 -> VR4
5. Vector store VR4 -> C
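
The five-instruction sequence can be mimicked with plain lists standing in for vector registers. This is a sketch only: the helper names, the dictionary used as "memory", and the sample data are assumptions made for illustration and do not model any real instruction set.

```python
def vector_load(memory, name):
    """Copy a vector from memory into a (simulated) vector register."""
    return list(memory[name])


def vector_multiply(s, vr):
    """Multiply every element of a vector register by the scalar s."""
    return [s * x for x in vr]


def vector_add(vr_a, vr_b):
    """Add two vector registers element by element."""
    return [x + y for x, y in zip(vr_a, vr_b)]


def vector_store(memory, name, vr):
    """Copy a vector register back to memory."""
    memory[name] = list(vr)


memory = {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]}  # illustrative data
s = 2.0

vr1 = vector_load(memory, "A")   # 1. A -> VR1
vr2 = vector_load(memory, "B")   # 2. B -> VR2
vr3 = vector_multiply(s, vr1)    # 3. s x VR1 -> VR3
vr4 = vector_add(vr3, vr2)       # 4. VR3 + VR2 -> VR4
vector_store(memory, "C", vr4)   # 5. VR4 -> C
print(memory["C"])
```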


Taxonomy of Computer Organization


Taxonomy of Computer Organization
• Parallel processors
– Multiple processors can function cooperatively on a given task
• Vector processor
– Often equated with a pipelined ALU organization
– Parallel ALU and parallel processor organizations can also be designed for vector processing
• Array processing
– Usually refers to a parallel ALU
– But any of the three organizations is optimized for the processing of arrays
– The term array processor usually refers to an auxiliary processor attached to a general-purpose processor and used to perform vector computation
• The pipelined ALU organization dominates the marketplace
– Less complex than the other two approaches


IBM 3090 Vector Facility
• The IBM facility makes use of a number of vector registers
• Each register is actually a bank of scalar registers
• To compute the vector sum C = A + B, the vectors A and B are loaded into two vector registers
• The data from these registers are passed through the ALU as fast as possible
• The overlap of computation, together with loading the input data into the registers in a block, results in a significant speedup over an ordinary ALU operation
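
The benefit of overlapping computation can be estimated with a simple cycle count. The sketch assumes an idealized pipeline that accepts one element per cycle once it is full and never stalls; the stage count of 4 and the vector length of 128 are illustrative assumptions, not figures from the slides.

```python
def unpipelined_cycles(n_elements, stages):
    """Each element passes through all stages before the next one starts."""
    return n_elements * stages


def pipelined_cycles(n_elements, stages):
    """Fill the pipeline once, then complete one element per cycle."""
    if n_elements == 0:
        return 0
    return stages + n_elements - 1


n, k = 128, 4  # assumed vector length and number of pipeline stages
print(unpipelined_cycles(n, k), pipelined_cycles(n, k))
```

For long vectors the ratio of the two counts approaches the number of stages, which is why streaming a whole register through the ALU pays off.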


Organization
• The fixed and predetermined structure of vector data permits housekeeping instructions inside the loop to be replaced by faster internal machine operations
• Data-access and arithmetic operations on several successive vector elements can proceed concurrently by overlapping such operations in a pipelined design or by performing multiple-element operations in parallel
• The use of vector registers for intermediate results avoids additional storage references


IBM 3090 with Vector Facility


Registers
• The IBM organization is referred to as register-to-register, because the vector operands, both input and output, can be stored in vector registers
• The main disadvantage of the use of vector registers is that the programmer or compiler must take them into account


Alternative Programs


Alternative Programs
(a) Storage to Storage


Alternative Programs
(b) Register to Register


Alternative Programs
(c) Storage to Register


Alternative Programs
(d) Compound Instructions


Registers
• The sixteen 32-bit vector registers can also be coupled to form eight 64-bit vector registers
• The architecture specifies that each register contains from 8 to 512 scalar elements
• Three additional registers are needed by the vector facility


Registers of the IBM 3090


Compound Instruction
• Instruction execution can be overlapped using chaining to improve performance
• The designers of the IBM vector facility chose not to include this capability for several reasons
• Compound instructions do not require the use of additional registers for temporary storage of intermediate results, and they require one less register access
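
The register saving can be seen in a small sketch that computes (s x VR1) + VR2 two ways: as separate multiply and add steps, where the product needs its own register, and as one combined multiply-and-add pass. The function names and data are hypothetical, and lists stand in for vector registers; no real instruction encoding is implied.

```python
def multiply_then_add(s, vr1, vr2):
    """Two separate steps: the product occupies an extra temporary register."""
    vr3 = [s * x for x in vr1]                # intermediate result held in VR3
    return [p + y for p, y in zip(vr3, vr2)]  # final sum


def compound_multiply_add(s, vr1, vr2):
    """One combined step: each product is consumed as soon as it is formed."""
    return [s * x + y for x, y in zip(vr1, vr2)]


vr1 = [1.0, 2.0, 3.0]
vr2 = [10.0, 20.0, 30.0]
print(multiply_then_add(2.0, vr1, vr2))
print(compound_multiply_add(2.0, vr1, vr2))
```

Both forms produce the same values; the compound form simply never materializes the intermediate vector.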


The Instruction Set
• There are memory-to-register load and register-to-memory store instructions
• Many of the instructions use a three-operand format


IBM 3090 Vector Facility