Introduction to Parallel Processing: Parallel and Pipeline Processing

Upload: rachana-srinivas

Post on 25-Sep-2015


TRANSCRIPT

  • Introduction to Parallel Processing: Parallel and Pipeline Processing

  • What is Pipelining?

    Like an automobile assembly line for instructions: each step does a little job of processing the instruction, and ideally each step operates in parallel.

    Simple model: Instruction Fetch, Instruction Decode, Instruction Execute.
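    As a toy illustration of the assembly-line idea (a sketch, not part of the original slides): in an ideal pipeline, instruction i finishes at cycle n_stages + i, so instructions complete one cycle apart instead of n_stages cycles apart.

    ```python
    def finish_cycles(n_instructions, n_stages):
        # Ideal pipeline: the first instruction takes n_stages cycles to fill
        # the pipe; each later instruction finishes just one cycle behind it.
        return [n_stages + i for i in range(n_instructions)]

    # 5 instructions through the 3-stage Fetch/Decode/Execute model:
    print(finish_cycles(5, 3))   # [3, 4, 5, 6, 7] vs. [3, 6, 9, 12, 15] unpipelined
    ```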

  • What is Parallel Processing?

    Parallel processing is another method used to improve performance in a computer system. When a system processes two different instructions simultaneously, it is performing parallel processing.

  • Ideal Pipeline Performance

    If stages are perfectly balanced:

        Time per instruction = (Time per instruction, unpipelined) / (Number of stages)

    The more stages the better? Each stage typically corresponds to a clock cycle, and stages will not be perfectly balanced: in a synchronous design, the slowest stage dominates the cycle time. Many hazards await us.

    Two ways to view pipelining:
    Reduced CPI (when going from non-pipelined to pipelined).
    Reduced cycle time (when increasing pipeline depth).

  • Important Pipeline Characteristics

    Latency: the time required for an instruction to propagate through the pipeline, based on Number of Stages * Cycle Time. Dominant if there are lots of exceptions / hazards, i.e. we constantly have to re-fill the pipeline.

    Throughput: the rate at which instructions can start and finish. Dominant if there are few exceptions and hazards, i.e. the pipeline stays mostly full.

    Note that we need increased memory bandwidth over the non-pipelined processor.
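    The two metrics can be written out directly (a sketch; the 5-stage, 11 ns numbers are borrowed from the worked example elsewhere in the deck):

    ```python
    def latency_ns(n_stages, cycle_ns):
        # Latency: time for one instruction to propagate through the pipeline,
        # i.e. Number of Stages * Cycle Time.
        return n_stages * cycle_ns

    def throughput_per_ns(cycle_ns):
        # Throughput: with the pipeline full, one instruction completes per cycle.
        return 1.0 / cycle_ns

    print(latency_ns(5, 11))   # 55 ns for a 5-stage pipeline at 11 ns/cycle
    ```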

  • Basic Ideas: Parallel vs. Pipelined Processing

    [Figure: schedules over time for four processors P1-P4 handling four data streams a, b, c, d, each requiring four operations. Colors denote the different types of operations performed; a, b, c, d denote the different data streams processed. In parallel processing, each processor performs all the operations on its own stream (P1: a1 a2 a3 a4, ...); in pipelined processing, each processor performs one type of operation on every stream in turn (P1: a1 b1 c1 d1, ...).]

    Parallel processing: less inter-processor communication, but complicated processor hardware.
    Pipelined processing: more inter-processor communication, but simpler processor hardware.

  • Data Dependence

    Parallel processing requires NO data dependence between processors; pipelined processing will involve inter-processor communication.

    [Figure: timing diagrams for processors P1-P4 under both schemes.]

  • Pipelining Example

    Assume the 5 stages take 10 ns, 8 ns, 10 ns, 10 ns, and 7 ns respectively.

    Unpipelined: average instruction execution time = 10 + 8 + 10 + 10 + 7 = 45 ns.

    Pipelined: each stage introduces some overhead, say 1 ns per stage, and we can only go as fast as the slowest stage. Each stage then takes 11 ns; in steady state we execute each instruction in 11 ns.

    Speedup = Unpipelined Time / Pipelined Time = 45 ns / 11 ns = 4.1, or about a 4X speedup.
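    The arithmetic in this example can be checked directly (the stage delays and the 1 ns latch overhead are the slide's own numbers):

    ```python
    stage_ns = [10, 8, 10, 10, 7]    # the five stage delays from the example
    overhead_ns = 1                  # latch overhead added per stage

    unpipelined_ns = sum(stage_ns)               # 45 ns per instruction
    pipelined_ns = max(stage_ns) + overhead_ns   # slowest stage dominates: 11 ns
    speedup = unpipelined_ns / pipelined_ns

    print(unpipelined_ns, pipelined_ns, round(speedup, 1))  # 45 11 4.1
    ```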

  • Usage of Pipelined Processing

    By inserting latches or registers between combinational logic circuits, the critical path can be shortened. Consequence: reduced clock cycle time, increased clock frequency. Suitable for DSP applications that have effectively infinite data streams.

    Method to incorporate pipelining: cut-set retiming.
    Cut set: a set of edges of a graph such that, if these edges are removed, the remaining graph splits into two separate graphs.
    Retiming: the timing of an algorithm is re-adjusted while keeping the partial ordering of execution unchanged, so that the results remain correct.

  • Parallel Computing

    Parallel computing is central to many computationally intensive applications, such as image processing, database processing, and robotics. Given a problem, parallel computing is the process of splitting the problem into several subproblems, solving these subproblems simultaneously, and combining the solutions of the subproblems to get the solution to the original problem.
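    The split / solve / combine pattern can be sketched with a thread pool (a toy example; the function name and the choice of summation as the "problem" are illustrative):

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def parallel_sum(data, n_workers=4):
        # Split the problem into subproblems...
        chunk = (len(data) + n_workers - 1) // n_workers
        parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        # ...solve the subproblems simultaneously...
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            partial_sums = list(pool.map(sum, parts))
        # ...and combine the partial solutions into the final answer.
        return sum(partial_sums)

    print(parallel_sum(list(range(1, 101))))  # 5050
    ```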

  • Parallel Computer Structures

    Pipelined computers: a pipeline computer performs overlapped computations to exploit temporal parallelism.
    Array processors: an array processor uses multiple synchronized arithmetic logic units to achieve spatial parallelism.
    Multiprocessor systems: a multiprocessor system achieves asynchronous parallelism through a set of interactive processors.

  • Nonpipelined Processor

  • Pipeline Processor (Fall 2008)

  • Pipeline Computers

    Normally, there are four major steps to execute an instruction:
    1. Instruction Fetch (IF)
    2. Instruction Decoding (ID)
    3. Operand Fetch (OF)
    4. Execution (EX)

  • Performance Improvements

    CPU time = Instructions/Program (I/P) * Cycles/Instruction (C/I) * Seconds/Cycle (S/C). Computer engineers improve performance through the reduction of C/I; I/P is the domain of CS, writing software; S/C is the domain of EE/VLSI, IC fabrication.

    CPI (C/I) is improved by getting more instructions done in each cycle. This means doing work in parallel, distributed across the functional units of the IC.
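    The I/P, C/I, and S/C factors multiply out to total execution time; a sketch (the workload numbers are made up for illustration):

    ```python
    def cpu_time_s(instructions, cpi, cycle_time_s):
        # CPU time = I/P * C/I * S/C: instructions per program, cycles per
        # instruction, and seconds per cycle multiply to seconds per program.
        return instructions * cpi * cycle_time_s

    # e.g. a billion instructions at CPI 2 on a 1 GHz clock (1 ns cycle):
    print(cpu_time_s(1_000_000_000, 2.0, 1e-9))   # 2.0 seconds
    ```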

  • Multiprocessor Systems

    A multiprocessor system is a single computer that includes multiple processors (computer modules). Processors may communicate and cooperate at different levels in solving a given problem. The communication may occur by sending messages from one processor to the other or by sharing a common memory. A multiprocessor system is controlled by one operating system, which provides interaction between processors and their programs at the process, data set, and data element levels.

  • Array Computers

    An array processor is a synchronous parallel computer with multiple arithmetic logic units, called processing elements (PEs), that can operate in parallel. The PEs are synchronized to perform the same function at the same time. Only a few array computers are designed primarily for numerical computation, while the others are for research purposes.
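    The lock-step behaviour of the PEs can be sketched in plain Python, with a comprehension standing in for the synchronized PE array (purely illustrative, not real SIMD hardware):

    ```python
    def simd_add(vec_a, vec_b):
        # Array-processor style: every PE applies the SAME operation (+) to
        # its own operand pair, all at the same time step.
        return [a + b for a, b in zip(vec_a, vec_b)]

    print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
    ```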

  • Functional structure of a multiprocessor system

  • Multicomputers

    There is a group of processors, each of which has a sufficient amount of local memory. The communication between the processors is through messages. There is neither a common memory nor a common clock. This is also called distributed processing.
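    Message passing between nodes that share no memory can be sketched with queues standing in for the interconnect (a toy model with threads, not a real multicomputer):

    ```python
    import threading
    import queue

    def node(inbox, outbox):
        # A "node" with only local state: it interacts with the rest of the
        # system solely by receiving and sending messages.
        n = inbox.get()        # receive a request message
        outbox.put(n * n)      # send back the computed result

    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=node, args=(inbox, outbox))
    t.start()
    inbox.put(7)               # send work to the node
    result = outbox.get()      # receive its reply
    t.join()
    print(result)              # 7 * 7 = 49
    ```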

  • Power consumption trends for desktop processors

  • Advantages of Multiprocessing

    Reduced cost: multiple processors share the same resources; a separate power supply or motherboard for each chip is not required, which reduces the cost.

    Increased reliability: the failure of one processor does not affect the other processors, though it will slow down the machine. Several mechanisms are required to achieve increased reliability: if a processor fails, a job running on that processor also fails, so the system must be able to reschedule the failed job or alert the user that the job was not successfully completed.

    Increased throughput: an increase in the number of processors completes the work in less time. It is important to note that doubling the number of processors does not halve the time to complete a job, due to the overhead of communication between processors, contention for shared resources, etc.
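    The claim that doubling processors does not halve the time can be illustrated with a toy cost model (the overhead term and its coefficient are invented for illustration):

    ```python
    def completion_time(work, n_procs, overhead_per_proc=0.05):
        # Ideal speedup (work / n_procs) plus a communication/contention
        # cost that grows with the processor count.
        return work / n_procs + overhead_per_proc * n_procs

    t1 = completion_time(100, 1)
    t2 = completion_time(100, 2)
    print(t2 > t1 / 2)   # True: doubling processors does not halve the time
    ```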

  • Conclusion

    MPSoCs are an important chapter in the history of multiprocessing. System designers like uniprocessors with sufficient computation power, e.g. DSPs for audio processing. The tension is computational power (Moore's Law) vs. low-power, low-cost, real-time requirements.