CSCI 232 © 2005 JW Ryder 1
Parallel Processing
• Large class of techniques used to provide simultaneous data processing tasks
• Purpose: Increase computational speed of the computer
• A parallel processing system is able to process multiple tasks simultaneously
Parallel Processing
• While one instruction executes in the ALU, the next instruction can be read from memory
• 2 or more ALUs, 2 or more processors
• Speedup and throughput - throughput is the amount of processing that can be done in a given amount of time
• Amount of hardware increases, cost increases, complexity increases
Parallel Processing
• Viewed at various levels of complexity
• Lowest - distinguish between serial and parallel load registers
• Higher level - Multiple functional units (FU)
– Arithmetic: adder-subtractor, integer multiplier
– Logic: logic unit, incrementer, shifter
– Floating point: add-subtract, multiply, divide
Parallel Processing Classification
• Internal organization of processors
• Interconnection structure between processors
• Flow of information through the system
• Organization of computer system by number of instructions and data items that are manipulated simultaneously
Classifications
• Normal operation of a computer is to fetch instructions from memory, then execute them in the processor
• The sequence of instructions read from memory is the instruction stream
• The operations performed on the data in the processor constitute the data stream
• Parallel processing may occur in the instruction stream, the data stream, or both
4 Major Groups
• SISD - Single Instruction Single Data
• SIMD - Single Instruction Multiple Data
• MISD - Multiple Instruction Single Data
• MIMD - Multiple Instruction Multiple Data
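The four groups follow mechanically from how many instruction streams and data streams are active at once. A minimal sketch of that mapping, where the function name `flynn_class` and its integer arguments are illustrative assumptions rather than anything from the slides:

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn group for the given stream counts (hypothetical helper)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("need at least one instruction stream and one data stream")
    # "S" for a single stream, "M" for multiple streams
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    print(flynn_class(1, 1))   # one instruction stream, one data stream
    print(flynn_class(1, 16))  # one instruction stream, many data streams
```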
SISD
• Single computer containing a
– Control Unit
– Processing Unit
– Memory Unit
• Instructions executed sequentially
• System may or may not have internal parallel processing capabilities
– Multiple FUs or pipelining
SIMD
• Organization including many processing units under supervision of a common control unit
• All processors receive the same instruction from the control unit
• Operate on different items of data
• Shared memory unit must contain multiple modules so that it can communicate with all processors simultaneously
• Array processor
MISD
• Only of theoretical interest
MIMD
• Computer system capable of processing several programs at the same time
• Most multiprocessor and multi-computer systems are in this category
• Flynn’s classification depends on distinction between the performance of the control unit and the data processing unit
• Emphasizes behavioral characteristics of the computer system rather than its operational structures and interconnections
Pipelining
• Pipelining does not fit into Flynn’s parallel processing classification scheme
• The only 2 categories of the classification used in practice are SIMD and MIMD
Multiprocessors
• A multiprocessor system is an interconnection of 2 or more CPUs with memory and input-output equipment
• ‘Processor’ in multiprocessor can mean either a central processing unit (CPU) or an input-output processor (IOP)
• A system with a single CPU and multiple IOPs is usually not considered a multiprocessor
Multiprocessors / Multicomputers
• Both support concurrent operations
• Computers are interconnected with each other by means of communications lines to form a computer network
– Consists of several autonomous computers that may or may not communicate with each other
• A multiprocessor system is controlled by one operating system that provides interaction between processors, and all components in the system cooperate to solve the problem at hand
Multiprocessors
• Microprocessors are a major motivation - cheap, small
• VLSI helps make it possible too
• Improves reliability
– mutual funds, some loss of efficiency
• Benefits
– Improved system performance
– Computations can proceed in parallel in 2 ways
• Multiple independent jobs run in parallel
• Single job can be partitioned into multiple parallel tasks
Multiprocessors
• Overall functions can be partitioned into several tasks
• System tasks can be allocated to specialized processors
– Designed for optimal performance
– Example: One processor performs standard tasks for an industrial process and others sense and control various parameters such as temperature and flow rate
– Example: One processor takes care of high speed floating point operations while the other processes standard operations and tasks
Performance Improvement
• Decompose problem into multiple discrete tasks
• User can explicitly direct computer to split tasks
• Provide a compiler that automatically detects when parts of a program can be split
– Parallelizing compiler
• Multiprocessors are classified by the way their memory is organized
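The "user explicitly directs the computer to split tasks" option can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `partition` and `parallel_sum` are hypothetical, and a thread pool from Python's standard library stands in for multiple processors (in CPython, threads show the structure of the decomposition rather than a real speedup for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        # the first `extra` chunks take one additional element
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum(items, workers=4):
    """Sum each chunk as a separate task, then combine the partial results."""
    chunks = partition(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

The decomposition step (`partition`) and the combination step (the final `sum`) are the parts a parallelizing compiler would have to discover automatically.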
Tightly Coupled
• A multiprocessor system with common shared memory
– Shared memory or tightly coupled multiprocessor
• Does not preclude each processor from having own local memory
• Most commercial tightly coupled systems provide cache memory for each CPU
• In addition, global common memory provided that all CPUs can access
Loosely Coupled
• Distributed memory = Loosely coupled
• Each processing element (PE) in a loosely coupled system has its own local memory
• Processors tied together by switching scheme designed to route information between processors through a message passing scheme
• Programs and data relayed in packets consisting of address, data, error detection codes
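A packet of the kind described above (address, data, error detection code) might be sketched as follows. The field names, the 4-byte address width, and the choice of CRC-32 as the error detection code are all assumptions made for illustration; the slides do not specify a format.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest: int       # address of the destination processing element (assumed 32-bit)
    payload: bytes  # program or data bytes being relayed
    checksum: int   # error detection code: CRC-32 over address and payload


def _crc(dest: int, payload: bytes) -> int:
    # CRC-32 is an assumed choice; any error detection code would fit here
    return zlib.crc32(dest.to_bytes(4, "big") + payload)


def make_packet(dest: int, payload: bytes) -> Packet:
    """Build a packet and attach its error detection code."""
    return Packet(dest, payload, _crc(dest, payload))


def is_intact(packet: Packet) -> bool:
    """Recompute the code at the receiver and compare it with the one sent."""
    return _crc(packet.dest, packet.payload) == packet.checksum


if __name__ == "__main__":
    p = make_packet(3, b"hello")
    print(is_intact(p))
```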
Loosely Coupled
• Packets are either destined for a specific processor or grabbed by the first processor that finds them, depending on the communication system design
• Most efficient when interaction between tasks is minimal
• Tightly coupled systems can tolerate a higher degree of interaction between tasks
Interconnection Structures
• Components forming a multiprocessor are
– CPUs
– IOPs
– A memory unit (may be partitioned into separate modules)
• Interconnections can have different physical configurations
– Depending on number of transfer paths available between processors and memory in a shared memory system
– Depending on number of transfer paths among PEs in a loosely coupled system
Physical Forms
• Time-Shared Common Bus
• Multiported Memory
• Crossbar Switch
• Multistage Switching Network
• Hypercube System
Time-Shared Common Bus
• N processors connected through a common bus to a memory unit
• Only 1 processor can access (communicate with) the memory unit or another processor at a time
• Transfer operations conducted by processor that is in control of the bus
• Other processors must wait, checking availability
• Command issued to inform destination that communication is requested
– What operation, from where
• Destination responds and transfer begins
Common Bus
• Bus Contention
• Resolved by including a bus controller
– Priorities
• Restricted to a single transfer at a time
– When one processor is transferring to/from memory, other processors are either busy with internal processing or idle waiting
• System overall transfer rate is limited by speed of bus
• Multiple buses possible but you pay penalty ($$)
Dual Buses
Not more economical
• Local buses, local memory
• System bus controller is big coordinator
• Local memory can be cache memory
– Coherency problems possible
Multiported Memory
• Separate buses between each memory module (MM) and processor
• Each processor bus connected to each MM
• Processor bus consists of
– Address
– Data
– Control lines
• MM has 4 ports, 1 for each bus
Multiported Memory
• MM must have internal logic to determine which bus has control
• Fixed priorities assigned to each memory port (1,2,3,4)
• Advantage: High transfer rate
• Disadvantage:
– Expensive memory control logic
– Many cables and connectors
• Usually only appropriate for small number of processors
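The internal logic that decides which bus gets a memory module can be pictured as a fixed-priority arbiter. A minimal sketch, assuming that port 1 has the highest priority and port 4 the lowest (the slides give the port numbers but not the ordering), with `grant_port` as a hypothetical name:

```python
def grant_port(requests, ports=(1, 2, 3, 4)):
    """Return the port granted access this cycle, or None if nobody asked.

    `requests` is any iterable of port numbers currently requesting the
    memory module. Priority is fixed by position in `ports`; treating
    port 1 as highest priority is an assumption made for illustration.
    """
    pending = set(requests)
    for port in ports:
        if port in pending:
            return port
    return None


if __name__ == "__main__":
    # Ports 3 and 2 request the same module in the same cycle.
    print(grant_port([3, 2]))
```

Every memory module needs its own copy of this logic, which is one reason the control logic becomes expensive as the number of processors grows.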
Crossbar Switch
• Crosspoints placed at intersections of processor buses and memory buses
• See figure 13-4 on page 495
• Each switch determines path (control logic)
– Examines address on bus
– Resolves conflicts using a predetermined, hardcoded priority
• See figure 13-5 on page 495
– Data both directions
– Multiplexers select data (remember select lines??)
Crossbar Switch
• Supports simultaneous transfers from all MM
– Separate path associated with each MM
• Hardware can be large and complex
• Number of switches needed is Processors x MM
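Both points about the crossbar can be checked with a small model: the switch count is the product of processors and memory modules, and transfers can proceed simultaneously as long as no two processors ask for the same module. The function names and the dictionary representation of requests below are illustrative assumptions.

```python
from collections import defaultdict


def crossbar_switch_count(processors: int, memory_modules: int) -> int:
    """One crosspoint switch per (processor, memory module) pair."""
    return processors * memory_modules


def contended_modules(requests):
    """Return the modules requested by more than one processor.

    `requests` maps a processor number to the memory module it wants.
    Modules with a single requester can all be served simultaneously;
    modules listed here need the switch's conflict resolution.
    """
    by_module = defaultdict(list)
    for processor, module in sorted(requests.items()):
        by_module[module].append(processor)
    return {m: ps for m, ps in by_module.items() if len(ps) > 1}


if __name__ == "__main__":
    print(crossbar_switch_count(4, 4))
    print(contended_modules({0: 2, 1: 2, 2: 3}))
```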
Multistage Switching Network
• Basic Component is a 2-input 2-output interchange switch
• See figure 13-6 on page 496 - explain
• Switch can arbitrate between conflicts
• Can use to build a switching network
• See figure 13-7 on page 497 - explain
Patterns & Omega
• Not all patterns are always available to all processors
• If P1 is accessing 0xx, then P2 can only access 1xx
• Used in both tightly and loosely coupled systems
• Omega Switching Network - see figure 13-8 on page 498
– Exactly 1 path from each source to each MM
– Some patterns cannot be connected simultaneously (000 and 001)
• 1 switch 1 signal at a time
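The single-path property and the resulting blocking can be explored with a small simulation. This sketch assumes the common textbook model of an 8x8 omega network (a perfect shuffle before each of 3 stages, with each switch steering by one destination bit, most significant bit first); the wiring in figure 13-8 may be drawn differently, and the function names are hypothetical. Two transfers are treated as conflicting when they need the same switch output link at the same stage.

```python
def omega_path(source: int, dest: int, n_bits: int = 3):
    """Return the link a message occupies after each stage (assumed model)."""
    mask = (1 << n_bits) - 1
    pos, path = source, []
    for stage in range(n_bits):
        # perfect shuffle: rotate the n-bit line number left by one
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # switch setting: the low bit becomes the next destination bit
        bit = (dest >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def first_conflict(a, b, n_bits: int = 3):
    """a and b are (source, dest) pairs.

    Return the 0-based stage at which both need the same output link,
    or None if the two paths never collide.
    """
    paths = zip(omega_path(*a, n_bits), omega_path(*b, n_bits))
    for stage, (x, y) in enumerate(paths):
        if x == y:
            return stage
    return None


if __name__ == "__main__":
    # Source 0 to destination 000 and source 2 to destination 001
    print(omega_path(0, 0b000), omega_path(2, 0b001))
    print(first_conflict((0, 0b000), (2, 0b001)))
```

Because the path is fully determined by the source and the destination bits, there is no alternative route to fall back on when two paths collide.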
Omega Network
• Tightly Coupled Systems
– Sources - Processors
– Destinations - MM
• Loosely Coupled Systems
– Source - Processor
– Destination - Processor
Hypercube
• Hypercube or binary n-cube
• Loosely coupled system composed of N = 2^n processors interconnected in an n-dimensional binary cube
• Each node contains CPU, local memory, I/O interfaces
• Direct communications paths to n other nodes (1 hop)
• There are 2^n distinct n-bit binary addresses to be assigned to the processors
• Each neighboring processor address differs by exactly 1 bit position
• See figure 13-9 on page 499
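Since neighboring addresses differ in exactly 1 bit position, the n neighbors of a node can be listed by flipping each address bit in turn. A minimal sketch (the name `neighbors` is an assumption for illustration):

```python
def neighbors(node: int, n: int):
    """Return the n nodes directly connected to `node` in a binary n-cube."""
    # flipping bit `axis` moves one hop along that dimension of the cube
    return [node ^ (1 << axis) for axis in range(n)]


if __name__ == "__main__":
    # Neighbors of node 000 in a 3-cube, printed as 3-bit addresses
    print([format(x, "03b") for x in neighbors(0b000, 3)])
```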
Routing Messages
• A message takes from 1 to n hops from source to destination (n is the maximum)
• Routing procedure
– XOR Source and Destination addresses
• Result will show on which axes addresses differ
– Send along any indicated axis
– Repeat until arrival at destination
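The routing procedure can be sketched directly from those steps. The slides allow sending along any indicated axis; always picking the lowest differing bit, as done here, is an arbitrary choice made for illustration, and `route` is a hypothetical name.

```python
def route(source: int, dest: int):
    """Return the list of nodes visited from source to dest in a hypercube."""
    path = [source]
    node = source
    diff = node ^ dest  # 1 bits mark the axes on which the addresses differ
    while diff:
        axis_bit = diff & -diff  # lowest differing axis (assumed tie-break)
        node ^= axis_bit         # one hop along that axis
        path.append(node)
        diff = node ^ dest
    return path


if __name__ == "__main__":
    # 010 to 001 in a 3-cube: the addresses differ on two axes, so two hops
    print([format(x, "03b") for x in route(0b010, 0b001)])
```

The number of hops equals the number of 1 bits in the XOR of the two addresses, which is why it can never exceed n.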