Cache and Pipeline


DESCRIPTION

Cache addressing.

TRANSCRIPT

  • Microprocessors & Microcontrollers 1 RN Biswas

    Cache and Pipeline

    Prof. R. N. Biswas

  • Microprocessors & Microcontrollers 2 RN Biswas

    Improvement of Speed by Cache

    A Cache is a high-speed memory interposed between the processor and the slower Main Memory, enabling faster access to data/code.

    Primary or L1 cache is at the chip level; Secondary or L2 cache is at the board level.

    Cache reduces access time by exploiting Locality of Reference.

    Holds the more frequently used data/code.

    Frees the external bus for other operations.
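
    One way to see the benefit quantitatively (the formula and the figures below are an illustration, not from the slides) is the average access time with hit ratio h: t_avg = h * t_cache + (1 - h) * t_main. A minimal Python sketch:

      # Illustrative only: the hit ratio and access times are assumed values,
      # not figures from the slides.
      def average_access_time(hit_ratio, t_cache_ns, t_main_ns):
          """Average memory access time for a single-level cache."""
          return hit_ratio * t_cache_ns + (1.0 - hit_ratio) * t_main_ns

      # With a 95% hit ratio, a 10 ns cache and a 100 ns main memory:
      print(average_access_time(0.95, 10, 100))   # 14.5 ns vs. 100 ns without a cache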

  • Microprocessors & Microcontrollers 3 RN Biswas

    [Figure: Direct-mapped Cache. Structure of Main Memory Address: Tag (5 bits) | Block (7 bits) | Word (4 bits). Main memory blocks M0-M4095 map onto cache blocks C0-C127: block Mi goes to cache block C(i mod 128), so M0, M128, ..., M3968 all compete for C0 and are distinguished by the stored tag (tag values 0-31).]
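
    The field widths above imply a 16-bit address split as follows; the Python below is an assumed illustration of that split (the function name is made up), not code from the lecture:

      # Direct-mapped cache addressing with the slide's field widths:
      # 5-bit tag | 7-bit block index | 4-bit word offset (16-bit address).
      TAG_BITS, BLOCK_BITS, WORD_BITS = 5, 7, 4

      def split_direct_mapped(address):
          """Return (tag, cache_block_index, word_offset) for a 16-bit address."""
          word  = address & ((1 << WORD_BITS) - 1)
          block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
          tag   = address >> (WORD_BITS + BLOCK_BITS)
          return tag, block, word

      # Main-memory block M3968 starts at address 3968 * 16; it lands in
      # cache block C0 with tag 31, as in the slide's figure.
      print(split_direct_mapped(3968 * 16))   # (31, 0, 0)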

  • Microprocessors & Microcontrollers 4 RN Biswas

    [Figure: Fully Associative Cache. Structure of Main Memory Address: Tag (12 bits) | Word (4 bits). Any main memory block M0-M4095 may be placed in any cache block C0-C127; the full 12-bit block number is stored as the tag (tag values 0-4095).]
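
    With no block-index field, the controller must compare the 12-bit tag against every stored tag (in hardware this comparison is done in parallel). A minimal sketch, assuming a simple list of stored tags; all names are illustrative:

      TAG_BITS, WORD_BITS = 12, 4

      def split_fully_associative(address):
          word = address & ((1 << WORD_BITS) - 1)
          tag  = address >> WORD_BITS
          return tag, word

      def lookup(cache_tags, address):
          """cache_tags: list of 128 stored tags (or None if the block is empty)."""
          tag, _word = split_fully_associative(address)
          for block_index, stored_tag in enumerate(cache_tags):
              if stored_tag == tag:
                  return block_index          # cache hit
          return None                         # cache miss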

  • Microprocessors & Microcontrollers 5 RN Biswas

    [Figure: Set-associative Cache (4 blocks/set). Structure of Main Memory Address: Tag (7 bits) | Set (5 bits) | Word (4 bits). The 128 cache blocks C0-C127 are grouped into 32 sets (Set 0-Set 31) of 4 blocks each; main memory block Mi maps to set (i mod 32) and may occupy any of that set's 4 blocks, distinguished by the stored tag (tag values 0-127).]
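
    Here the 5-bit set field selects one of the 32 sets and only that set's four tags are compared. A minimal sketch under those assumptions (names are illustrative):

      # 4-way set-associative addressing: 7-bit tag | 5-bit set | 4-bit word.
      TAG_BITS, SET_BITS, WORD_BITS = 7, 5, 4
      WAYS = 4

      def split_set_associative(address):
          word = address & ((1 << WORD_BITS) - 1)
          set_index = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
          tag = address >> (WORD_BITS + SET_BITS)
          return tag, set_index, word

      def lookup(sets, address):
          """sets: list of 32 lists, each holding the 4 tags stored in that set."""
          tag, set_index, _word = split_set_associative(address)
          for way, stored_tag in enumerate(sets[set_index]):
              if stored_tag == tag:
                  return set_index, way       # cache hit
          return None                         # cache miss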

  • Microprocessors & Microcontrollers 6 RN Biswas

    Cache Access and Update Sequence

    The CPU places the memory address on the bus.

    The Cache Controller compares the tag field of the address with the tags in the selected set:

    Cache miss: main memory is accessed and the fetched contents are stored in the cache.

    Cache hit: the cache itself is accessed.

    A cache write requires a memory update:

    Write-back - memory is updated only when the cache block is replaced by a new one from memory.

    Write-through - memory is updated on every write.
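
    A toy model of this sequence for a direct-mapped cache, reusing the field widths from the earlier slides; the class and its behaviour are an assumed illustration, not the lecture's code:

      TAG_BITS, BLOCK_BITS, WORD_BITS = 5, 7, 4

      class Cache:
          def __init__(self, write_back=True):
              self.tags  = [None] * (1 << BLOCK_BITS)    # stored tag per cache block
              self.dirty = [False] * (1 << BLOCK_BITS)
              self.write_back = write_back

          def access(self, address, is_write=False):
              block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
              tag   = address >> (WORD_BITS + BLOCK_BITS)
              hit = self.tags[block] == tag
              if not hit:
                  # Cache miss: fetch the block from main memory; with write-back,
                  # a dirty block being replaced is written to memory first.
                  if self.write_back and self.dirty[block]:
                      pass                     # (write the old block back to memory here)
                  self.tags[block] = tag
                  self.dirty[block] = False
              if is_write:
                  if self.write_back:
                      self.dirty[block] = True # memory updated only on replacement
                  else:
                      pass                     # write-through: update memory immediately
              return "hit" if hit else "miss"

      c = Cache(write_back=True)
      print(c.access(0x1230))                  # first access to the block: miss
      print(c.access(0x1234, is_write=True))   # same block: hit, block becomes dirty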

  • Microprocessors & Microcontrollers 7 RN Biswas

    Speed Improvement by Pipelining

    Processor speed can be enhanced by having separate hardware units for the different functional blocks, with buffers between the successive units.

    The number of unit operations into which the instruction cycle of a processor can be divided for this purpose defines the number of stages in the pipeline.

    A processor with an n-stage pipeline can have up to n instructions being processed simultaneously by its different functional units.

    Effective processor speed increases ideally by a factor equal to the number of pipelining stages.
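
    As a worked check of that last claim, using the standard ideal-speedup expression (which the slide does not state explicitly): with k stages and n instructions the pipelined execution takes k + n - 1 stage times instead of n * k, so the speedup n * k / (k + n - 1) approaches k for large n.

      # Ideal pipeline speedup, assuming equal stage times and no stalls.
      def ideal_speedup(stages, instructions):
          return (stages * instructions) / (stages + instructions - 1)

      print(ideal_speedup(4, 1000))   # ~3.99, close to the 4x ideal for a 4-stage pipeline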

  • Microprocessors & Microcontrollers 8 RN Biswas

    Typical Pipeline Organisation

    A common choice is to have four such units:

    Fetch: Fetch the instruction code from the memory;

    Decode: Decode the Op Code and fetch operand(s);

    Operate: Perform operation required by the op code;

    Write: Store the result in the destination location.

    A four-stage pipeline would require three buffers, each separating two functional units of the processor.

    The Write cycle of I1, the Operate cycle of I2, the Decode cycle of I3 and the Fetch cycle of I4 take place in the same time slot, and each must be completed within the stage time prescribed by the pipeline design.

  • Microprocessors & Microcontrollers 9 RN Biswas

    A Four-stage Pipeline
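
    The slide's diagram is not reproduced in the transcript; the sketch below (an assumed illustration) prints the kind of overlap chart such a diagram shows, with instruction Ii advancing one stage per time slot:

      # Print an ideal 4-stage pipeline schedule: instruction i enters Fetch at
      # time slot i and advances one stage per slot.
      STAGES = ["F", "D", "O", "W"]

      def print_schedule(num_instructions):
          slots = num_instructions + len(STAGES) - 1
          for i in range(num_instructions):
              row = ["  "] * slots
              for s, stage in enumerate(STAGES):
                  row[i + s] = stage + str(i + 1)
              print(" ".join(row))

      print_schedule(4)
      # F1 D1 O1 W1
      #    F2 D2 O2 W2
      #       F3 D3 O3 W3
      #          F4 D4 O4 W4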

  • Microprocessors & Microcontrollers 10 RN Biswas

    Data Dependency in Pipelining

    If the input data for an instruction depends on the outcome of the previous instruction, the Write cycle of the previous instruction has to be over before the Operate cycle of the next instruction can start. The pipeline effectively idles through one instruction, creating a bubble in the pipeline which persists for several instructions.

    [Figure: pipeline bubble caused by the data dependency between I1 and I2 (I2's Operate cycle must wait for I1's Write cycle):

      Slot:  1   2   3    4     5   6   7   8
      I1:    F1  D1  O1   W1
      I2:        F2  D2   idle  O2  W2
      I3:            F3   idle  D3  O3  W3
      I4:                       F4  D4  O4  W4   <- bubble ends here]

  • Microprocessors & Microcontrollers 11 RN Biswas

    Branch Dependency in Pipelining

    A Branch instruction can cause a pipeline stall if the branch is taken, as the next instruction has to be aborted in that case. If I1 is an unconditional branch instruction, the next Fetch cycle (F2) can start after D1. But if I1 is a conditional branch instruction, F2 has to wait until O1 for the decision as to whether the branch will be taken or not.

    [Figure: Fetch timing of the instruction following a branch I1:

      I1:  F1  D1  O1  W1              (branch instruction)
      I2:      F2  D2  O2  W2          (executed if the branch is not taken)
      I2:          F2  D2  O2  W2      (executed for an unconditional branch)
      I2:              F2  D2  O2  W2  (for a conditional branch, if taken)]

  • Microprocessors & Microcontrollers 12 RN Biswas

    Avoidance of Pipeline Bubbles

    Data Dependency - an instruction unaffected by the write operation has to be placed in the Load Delay Slot (a toy reordering example is sketched below).

    Branch Dependency - the branch instruction has to perform a delayed branch, with instructions preceding the branch placed in the Branch Delay Slots.

    This requires optimising compilers to be written along with the design of the microprocessor.
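
    A toy sketch of the compiler's job for the data-dependency case: move an instruction that does not use the load result into the Load Delay Slot. The instruction format and the scheduling rule here are simplified assumptions, not any real ISA or compiler:

      # Hypothetical instruction tuples: (text, destination register, source registers).
      program = [
          ("LOAD  R1, [A]",    "R1", []),
          ("ADD   R2, R1, R3", "R2", ["R1"]),   # depends on the load result
          ("MOV   R5, R6",     "R5", ["R6"]),   # independent of the load
      ]

      def fill_load_delay_slot(prog):
          """Move the first instruction that neither reads nor writes the load's
          destination into the slot right after the load."""
          _load_text, load_dest, _ = prog[0]
          for i, (_text, dest, srcs) in enumerate(prog[1:], start=1):
              if load_dest not in srcs and dest != load_dest:
                  prog.insert(1, prog.pop(i))
                  break
          return prog

      for text, _, _ in fill_load_delay_slot(program):
          print(text)
      # LOAD  R1, [A]
      # MOV   R5, R6       <- now fills the load delay slot
      # ADD   R2, R1, R3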