Final Review Bernard Chen

Post on 15-Jan-2016


Page 1: Final Review

Final Review

Bernard Chen

Page 2: Final Review

Binary selector input

1) MUX A selector (SELA) : to place the content of R2 into BUS A

2) MUX B selector (SELB) : to place the content of R3 into BUS B

3) ALU operation selector (OPR) : to provide the arithmetic addition R2 + R3

4) Decoder selector (SELD) : to transfer the content of the output bus into R1

Example 1: R1 ← R2 + R3

Page 3: Final Review

Encoding of Register Selection Fields:

»SELA or SELB = 000 (External Input) : MUX selects the external data

»SELD = 000 (None) : no destination register is selected but the contents of the output bus are available in the external output

Page 4: Final Review

Example 2

1. Micro-operation: R1 ← R2 - R3

2. Control word:

Field:        SELA  SELB  SELD  OPR
Symbol:       R2    R3    R1    SUB
Control word: 010   011   001   00101
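The encoding in Example 2 can be sketched in a few lines of Python. This is only an illustration: the function name `control_word` is hypothetical, the mapping R1..R7 to 001..111 is inferred from the worked example, and the ADD code 00010 is an assumed value that does not appear in the slides (only SUB = 00101 does).

```python
# Register-select codes. 000 means "external input" for SELA/SELB and
# "none" for SELD, as stated in the slides. R1..R7 -> 001..111 is an
# assumption inferred from the worked example (R1=001, R2=010, R3=011).
REGISTER_CODES = {"INPUT": 0, "NONE": 0, **{f"R{i}": i for i in range(1, 8)}}

# Only SUB = 00101 is given in the slides; ADD = 00010 is an assumed value.
OPR_CODES = {"SUB": 0b00101, "ADD": 0b00010}


def control_word(sela: str, selb: str, seld: str, opr: str) -> str:
    """Return the 14-bit control word as 'SELA SELB SELD OPR' bit fields."""
    fields = [
        format(REGISTER_CODES[sela], "03b"),  # 3-bit source for bus A
        format(REGISTER_CODES[selb], "03b"),  # 3-bit source for bus B
        format(REGISTER_CODES[seld], "03b"),  # 3-bit destination register
        format(OPR_CODES[opr], "05b"),        # 5-bit ALU operation
    ]
    return " ".join(fields)
```

Calling `control_word("R2", "R3", "R1", "SUB")` reproduces the control word of Example 2.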

Page 5: Final Review

STACK OPERATIONS: REVERSE POLISH NOTATION (postfix)

• Evaluation procedure:

1. Scan the expression from left to right.
2. When an operator is reached, perform the operation with the two operands found on the left side of the operator.
3. Replace the two operands and the operator by the result obtained from the operation.

(Example) infix: 3 * 4 + 5 * 6 = 42; postfix: 3 4 * 5 6 * +

3 4 * 5 6 * + → 12 5 6 * + → 12 30 + → 42

Page 6: Final Review

STACK OPERATIONS: REVERSE POLISH NOTATION (postfix)

• Reverse Polish notation evaluation with a stack. A stack is an efficient way to evaluate arithmetic expressions, because each token is handled exactly once.

Stack evaluation:
Get value
If value is data: push data
Else if value is operation: pop, pop, evaluate, and push

Page 7: Final Review

STACK OPERATIONS: REVERSE POLISH NOTATION (postfix)

(Example) Evaluate 3 * 4 + 5 * 6 = 42 using a stack:

=> 3 4 * 5 6 * +
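The stack evaluation rule (push data; on an operator pop, pop, evaluate, push) can be sketched as follows. This is a minimal illustration, not part of the slides: the function name `evaluate_postfix` is hypothetical, tokens are assumed to be separated by spaces, and only the four basic binary operators are handled.

```python
import operator

# Binary operators supported by this sketch (an assumption, not from the slides).
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_postfix(expression: str) -> float:
    """Evaluate a space-separated postfix (reverse Polish) expression."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            # Operator: pop, pop, evaluate, push.
            right = stack.pop()  # the operand pushed last is the right-hand one
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            # Data: push it.
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]
```

For the expression above, `evaluate_postfix("3 4 * 5 6 * +")` returns 42.0.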

Page 8: Final Review

8.4 Instruction Formats

• Zero-address instructions: A stack is used. An arithmetic operation pops two operands from the stack and pushes the result.

• One-address instructions: AC and memory. Since the accumulator always provides one operand, only one memory address needs to be specified.

• Two-address instructions: Two registers or two memory locations are specified, one of which also receives the final result.

• Three-address instructions: Three registers or memory locations are specified, one for the final result. This is also called general register organization.

Page 9: Final Review

EXAMPLE: Show how the following operation can be performed using:
a- three-address instructions
b- two-address instructions
c- one-address instructions
d- zero-address instructions

X = (A + B) * (C + D)

Page 10: Final Review

a-Three-address instructions (general register organization)

ADD R1, A, B      R1 ← M[A] + M[B]
ADD R2, C, D      R2 ← M[C] + M[D]
MUL X, R1, R2     M[X] ← R1 * R2

Page 11: Final Review

b-Two-address instructions (general register organization)

MOV R1, A      R1 ← M[A]
ADD R1, B      R1 ← R1 + M[B]
MOV R2, C      R2 ← M[C]
ADD R2, D      R2 ← R2 + M[D]
MOV X, R2      M[X] ← R2
MUL X, R1      M[X] ← R1 * M[X]

Page 12: Final Review

c- One-address instructions

LOAD A       AC ← M[A]
ADD B        AC ← AC + M[B]
STORE T      M[T] ← AC
LOAD C       AC ← M[C]
ADD D        AC ← AC + M[D]
MUL T        AC ← AC * M[T]
STORE X      M[X] ← AC

Page 13: Final Review

d- Zero-address instructions (stack organization)

If an operand is encountered: push its value.
Else if an operator is encountered: pop, pop, operation, push. That is, pop an operand, pop another operand, then perform the operation and push the result back onto the stack.

PUSH A      TOS ← A
PUSH B      TOS ← B
ADD         TOS ← (A + B)
PUSH C      TOS ← C
PUSH D      TOS ← D
ADD         TOS ← (C + D)
MUL         TOS ← (C + D) * (A + B)
POP X       M[X] ← TOS

(TOS stands for top of stack.)
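The zero-address program can be traced with a small interpreter. This is a sketch under stated assumptions: `run_stack_program` is a hypothetical helper, memory is modeled as a Python dictionary, and only the PUSH, POP, ADD, and MUL instructions used in the example are implemented.

```python
def run_stack_program(program, memory):
    """Run zero-address instructions against a dict-based memory; return memory."""
    stack = []
    for instruction in program:
        opcode, *operand = instruction.split()
        if opcode == "PUSH":
            stack.append(memory[operand[0]])   # TOS <- M[address]
        elif opcode == "POP":
            memory[operand[0]] = stack.pop()   # M[address] <- TOS
        elif opcode in ("ADD", "MUL"):
            right = stack.pop()                # pop
            left = stack.pop()                 # pop
            result = left + right if opcode == "ADD" else left * right
            stack.append(result)               # operation, push
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return memory


# The program from the slide for X = (A + B) * (C + D).
PROGRAM = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP X"]
```

With the sample values A = 2, B = 3, C = 4, D = 5 (chosen here only for illustration), running `PROGRAM` stores 45 in X.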

Page 14: Final Review

Pipelining: Laundry Example

A small laundry has one washer, one dryer, and one operator. It takes 90 minutes to finish one load:

Washer takes 30 minutes
Dryer takes 40 minutes
"Operator folding" takes 20 minutes

Loads: A, B, C, D

Page 15: Final Review

Sequential Laundry

This operator schedules loads to be delivered to the laundry every 90 minutes, which is the time required to finish one load. In other words, he will not start a new task until he has finished the previous one.

The process is sequential. Sequential laundry takes 6 hours for 4 loads (4 x 90 minutes).

[Figure: timeline from 6 PM to midnight. Loads A, B, C, D in task order, each taking 30 + 40 + 20 = 90 minutes.]

Page 16: Final Review

Efficiently scheduled laundry: Pipelined Laundry
The operator starts work as soon as possible.

Another operator asks for loads to be delivered to the laundry every 40 minutes. Pipelined laundry takes 3.5 hours for 4 loads (30 + 4 x 40 + 20 = 210 minutes).

[Figure: timeline starting at 6 PM. Loads A, B, C, D in task order, with stage times 30, 40, 40, 40, 40, 20 minutes.]
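The two totals (6 hours sequential, 3.5 hours pipelined) can be checked with a short calculation. This is a sketch with hypothetical function names; it assumes all four loads are available at the start and that a load may wait between stages until the next stage is free.

```python
def sequential_time(stage_minutes, loads):
    """Total minutes when each load finishes all stages before the next starts."""
    return loads * sum(stage_minutes)


def pipelined_time(stage_minutes, loads):
    """Total minutes when each stage starts a load as soon as it can."""
    # finish[s] is the time at which stage s finished its most recent load.
    finish = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which the current load leaves the previous stage
        for s, minutes in enumerate(stage_minutes):
            # A stage starts when the load has arrived and the stage is free.
            start = max(ready, finish[s])
            finish[s] = start + minutes
            ready = finish[s]
    return finish[-1]


LAUNDRY_STAGES = [30, 40, 20]  # washer, dryer, folding (minutes)
```

For four loads, `sequential_time(LAUNDRY_STAGES, 4)` gives 360 minutes and `pipelined_time(LAUNDRY_STAGES, 4)` gives 210 minutes, matching the 6 hours and 3.5 hours on the slides.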

Page 17: Final Review

Pipelining Facts

• Multiple tasks operate simultaneously
• Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to "fill" the pipeline and time to "drain" it reduce speedup

[Figure: pipelined laundry timeline starting at 6 PM for loads A, B, C, D, with stage times 30, 40, 40, 40, 40, 20 minutes. Callout: the washer waits for the dryer for 10 minutes.]

Page 18: Final Review

9.2 Pipelining

Suppose we want to perform the combined multiply and add operations with a stream of numbers:

Ai * Bi + Ci for i =1,2,3,…,7
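A clock-by-clock simulation can make the multiply-add stream concrete. The segment layout below is an assumption, since the slides that showed the diagram did not survive extraction: segment 1 latches Ai and Bi into R1 and R2, segment 2 computes R3 = R1 * R2 and latches Ci into R4, and segment 3 computes R5 = R3 + R4. The function name and register names are illustrative.

```python
def multiply_add_pipeline(a, b, c):
    """Simulate an assumed 3-segment pipeline computing a[i] * b[i] + c[i]."""
    n = len(a)
    r1 = r2 = r3 = r4 = None  # None marks an empty pipeline register
    results = []
    # k + n - 1 clock pulses are needed, with k = 3 segments.
    for clock in range(n + 2):
        # Segments are updated last-to-first so that each one reads the
        # values latched by the preceding segment on the previous clock.
        if r3 is not None:
            results.append(r3 + r4)            # segment 3: R5 <- R3 + R4
        if r1 is not None:
            r3, r4 = r1 * r2, c[clock - 1]     # segment 2: multiply, latch Ci
        else:
            r3 = r4 = None
        if clock < n:
            r1, r2 = a[clock], b[clock]        # segment 1: latch Ai and Bi
        else:
            r1 = r2 = None
    return results
```

For seven input triples the loop runs for 9 clock pulses, and one result leaves the pipeline on every pulse once it is full.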

Page 19: Final Review
Page 20: Final Review
Page 21: Final Review

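Pipeline Performance

n: number of instructions
k: number of stages in the pipeline
τ: clock cycle
T_k: total time

$$T_k = (k + (n - 1))\,\tau$$

$$\text{Speedup} = \frac{T_1}{T_k} = \frac{nk}{k + (n - 1)}$$

Here $T_1 = nk\tau$ is the time needed without pipelining.

n is equivalent to the number of loads in the laundry example. k is the number of stages (washing, drying, and folding). The clock cycle is the slowest task time.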

Page 22: Final Review

Example: 6 tasks, divided into 4 segments

Clock cycle: 1  2  3  4  5  6  7  8  9
Segment 1:   T1 T2 T3 T4 T5 T6
Segment 2:      T1 T2 T3 T4 T5 T6
Segment 3:         T1 T2 T3 T4 T5 T6
Segment 4:            T1 T2 T3 T4 T5 T6
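The cycle count and speedup for this example follow from the pipeline performance formulas, with k stages and n tasks. The sketch below uses hypothetical function names and measures time in clock cycles (that is, it takes the clock cycle as 1).

```python
def pipeline_cycles(k: int, n: int) -> int:
    """Clock cycles needed to push n tasks through a k-segment pipeline."""
    return k + (n - 1)


def speedup(k: int, n: int) -> float:
    """Ratio of non-pipelined time (n * k cycles) to pipelined time."""
    return (n * k) / pipeline_cycles(k, n)


def space_time_diagram(k: int, n: int) -> str:
    """Return one text row per segment showing which task it holds each cycle."""
    rows = []
    for segment in range(k):
        cells = ["  "] * pipeline_cycles(k, n)
        for task in range(n):
            # Task t enters segment s at cycle s + t (counting from zero).
            cells[segment + task] = f"T{task + 1}"
        rows.append(f"Segment {segment + 1}: " + " ".join(cells).rstrip())
    return "\n".join(rows)
```

For 6 tasks and 4 segments, `pipeline_cycles(4, 6)` is 9, matching the nine clock cycles in the diagram, and `speedup(4, 6)` is 24/9, roughly 2.67 rather than the potential maximum of 4.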

Page 23: Final Review

Some definitions

Pipelining: an implementation technique in which multiple instructions are overlapped in execution.

Pipeline stage: A computer pipeline divides instruction processing into stages. Each stage completes one part of an instruction while the other stages work on other instructions in parallel. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end.

Page 24: Final Review

Some definitions

Throughput of the instruction pipeline is determined by how often an instruction exits the pipeline. Pipelining does not decrease the time for individual instruction execution. Instead, it increases instruction throughput.

Machine cycle: The time required to move an instruction one step further in the pipeline. The length of the machine cycle is determined by the time required for the slowest pipe stage.

Page 25: Final Review

Instruction pipeline (Contd.)

Sequential processing is faster for few instructions.

Page 26: Final Review

Difficulties...

If a complicated memory access occurs in stage 1, stage 2 will be delayed and the rest of the pipe is stalled.

If there is a branch (an if or a jump), then some of the instructions that have already entered the pipeline should not be processed.

We need to deal with these difficulties to keep the pipeline moving.

Page 27: Final Review

5-Stage Pipelining

S1: Fetch Instruction (FI)
S2: Decode Instruction (DI)
S3: Fetch Operand (FO)
S4: Execute Instruction (EI)
S5: Write Operand (WO)

Time →
S1: 1 2 3 4 5 6 7 8 9
S2:   1 2 3 4 5 6 7 8
S3:     1 2 3 4 5 6 7
S4:       1 2 3 4 5 6
S5:         1 2 3 4 5

Page 28: Final Review

Five Stage Instruction Pipeline

• Fetch instruction
• Decode instruction
• Fetch operands
• Execute instructions
• Write result

Page 29: Final Review

Two major difficulties

• Data Dependency
• Branch Difficulties

Solutions:
• Prefetch target instruction
• Delayed Branch
• Branch target buffer (BTB)
• Branch Prediction

Page 30: Final Review

Data Dependency

Use Delay Load to solve:

Example:
Load R1      R1 ← M[Addr1]
Load R2      R2 ← M[Addr2]
ADD          R3 ← R1 + R2
Store        M[Addr3] ← R3
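One way to picture delayed load is as a compiler pass that inserts a no-operation after any load whose result is needed by the very next instruction. The sketch below is illustrative only: the `Instr` type, the `insert_delayed_loads` name, and the single delay slot are assumptions, since the actual pipeline timing was on slides that did not survive.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                 # e.g. "LOAD", "ADD", "STORE", "NOP"
    dest: str               # register or memory location written
    srcs: Tuple[str, ...]   # registers or memory locations read


NOP = Instr("NOP", "", ())


def insert_delayed_loads(program, delay_slots=1):
    """Insert NOPs after a LOAD when the next instruction reads its result."""
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        if instr.op == "LOAD" and i + 1 < len(program):
            if instr.dest in program[i + 1].srcs:
                # The loaded value is not yet available: delay the consumer.
                out.extend([NOP] * delay_slots)
    return out


# The four-instruction example from the slide.
PROGRAM = [
    Instr("LOAD", "R1", ("Addr1",)),
    Instr("LOAD", "R2", ("Addr2",)),
    Instr("ADD", "R3", ("R1", "R2")),
    Instr("STORE", "Addr3", ("R3",)),
]
```

Applied to `PROGRAM`, the pass places one NOP between the second load and the ADD, because ADD reads R2 immediately after it is loaded.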

Page 31: Final Review

Delay Load

Page 32: Final Review

Delay Load

Page 33: Final Review

Example

Five instructions need to be carried out:

1. Load from memory to R1
2. Increment R2
3. Add R3 to R4
4. Subtract R5 from R6
5. Branch to address X

Page 34: Final Review

Delay Branch

Page 35: Final Review

Rearrange the Instruction

Page 36: Final Review

Floating Point Arithmetic Pipeline

Example for floating-point addition and subtraction

Inputs are two normalized floating-point binary numbers:
X = A x 2^a
Y = B x 2^b

A and B are two fractions that represent the mantissas

a and b are the exponents

Try to design the segments that are used to perform the "add" operation

Page 37: Final Review

Floating Point Arithmetic Pipeline

1. Compare the exponents
2. Align the mantissas
3. Add or subtract the mantissas
4. Normalize the result

Page 38: Final Review

Floating Point Arithmetic Pipeline

X = 0.9504 x 10^3 and Y = 0.8200 x 10^2

The two exponents are subtracted in the first segment to obtain 3 - 2 = 1

The larger exponent 3 is chosen as the exponent of the result

Segment 2 shifts the mantissa of Y to the right to obtain Y = 0.0820 x 10^3

The mantissas are now aligned

Segment 3 produces the sum Z = 1.0324 x 10^3

Segment 4 normalizes the result by shifting the mantissa once to the right and incrementing the exponent by one to obtain Z = 0.10324 x 10^4
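The four segments can be traced in code on the decimal example. This is a sketch, not a hardware description: the function name `fp_add` is hypothetical, decimal mantissas are used to match the worked numbers (the slides describe binary inputs), and Python's `decimal` module stands in for the shifters.

```python
from decimal import Decimal


def fp_add(mant_x: Decimal, exp_x: int, mant_y: Decimal, exp_y: int):
    """Add X = mant_x * 10^exp_x and Y = mant_y * 10^exp_y in four steps."""
    # Segment 1: compare the exponents and keep the larger one.
    diff = exp_x - exp_y
    exp_z = max(exp_x, exp_y)

    # Segment 2: align the mantissas by shifting the one that belongs to
    # the smaller exponent to the right.
    if diff > 0:
        mant_y = mant_y.scaleb(-diff)
    elif diff < 0:
        mant_x = mant_x.scaleb(diff)

    # Segment 3: add the mantissas.
    mant_z = mant_x + mant_y

    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(mant_z) >= 1:
        mant_z = mant_z.scaleb(-1)   # shift right, increment exponent
        exp_z += 1
    while mant_z != 0 and abs(mant_z) < Decimal("0.1"):
        mant_z = mant_z.scaleb(1)    # shift left, decrement exponent
        exp_z -= 1
    return mant_z, exp_z
```

With the values from the slide, `fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)` yields a mantissa equal to 0.10324 and an exponent of 4.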

Page 39: Final Review

Memory Hierarchy

The main memory occupies a central position by being able to communicate directly with the CPU and with auxiliary memory devices through an I/O processor

A special very-high-speed memory called cache is used to increase the speed of processing by making current programs and data available to the CPU at a rapid rate

Page 40: Final Review

RAM

Page 41: Final Review

ROM

Page 42: Final Review

Memory Address Map

A memory address map is a pictorial representation of the address space assigned to each chip in the system

To demonstrate with an example, assume that a computer system needs 512 bytes of RAM and 512 bytes of ROM

Each RAM chip has 128 bytes and needs seven address lines, whereas the ROM chip has 512 bytes and needs nine address lines

Page 43: Final Review

Memory Address Map

Page 44: Final Review

Memory Address Map

The hexadecimal address column assigns a range of hexadecimal addresses to each chip

Lines 8 and 9 provide four distinct binary combinations that specify which RAM chip is chosen

When line 10 is 0, the CPU selects a RAM chip. When it is 1, it selects the ROM
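The chip-select rules can be expressed as a small decoder. This is a sketch under stated assumptions: the function name `decode_address` is hypothetical, address lines are numbered from 1 (so line 10 is bit 9 of the address), and the chips are labeled "RAM 1" to "RAM 4" in increasing address order, which the surviving text does not state.

```python
def decode_address(address: int):
    """Map a 10-bit address to (chip name, offset within that chip)."""
    if not 0 <= address < (1 << 10):
        raise ValueError("address must fit in 10 lines")
    if (address >> 9) & 1:
        # Line 10 is 1: the ROM is selected; lines 1-9 address its 512 bytes.
        return "ROM", address & 0x1FF
    # Line 10 is 0: lines 9 and 8 pick one of the four 128-byte RAM chips,
    # and lines 1-7 address a byte inside it.
    ram_index = (address >> 7) & 0b11
    return f"RAM {ram_index + 1}", address & 0x7F
```

Under these assumptions, address 0x0080 falls at offset 0 of the second RAM chip, and 0x0200 is the first byte of the ROM.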

Page 45: Final Review
Page 46: Final Review

Cache memory

The performance of cache memory is frequently measured in terms of a quantity called hit ratio

When the CPU refers to memory and finds the word in cache, it is said to produce a hit

Otherwise, it is a miss

Hit ratio = hit / (hit + miss)

Page 47: Final Review

Cache memory

The basic characteristic of cache memory is its fast access time. Therefore, very little or no time must be wasted when searching for words in the cache

The transfer of data from main memory to cache memory is referred to as a mapping process. There are three types of mapping:

• Associative mapping
• Direct mapping
• Set-associative mapping

Page 48: Final Review

Cache memory

To help understand the mapping procedure, we have the following example:

Page 49: Final Review

Associative mapping

Page 50: Final Review

Direct Mapping

Page 51: Final Review

Direct Mapping

Page 52: Final Review

Set-Associative Mapping

Page 53: Final Review

Page Replacement Algorithms

Goal: the lowest page-fault rate

Evaluate an algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string

In all our examples, the reference string is

1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
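Counting page faults on a reference string is straightforward to sketch. The slides do not say which algorithm or how many frames they use at this point, so the example below picks first-in, first-out (FIFO) replacement as one illustrative choice; the function name and the frame counts are assumptions.

```python
from collections import deque


def fifo_page_faults(reference_string, frame_count):
    """Count page faults under FIFO replacement with a fixed number of frames."""
    frames = deque()  # oldest resident page sits at the left end
    faults = 0
    for page in reference_string:
        if page in frames:
            continue  # hit: the page is already resident
        faults += 1
        if len(frames) == frame_count:
            frames.popleft()  # evict the page that has been resident longest
        frames.append(page)
    return faults


REFERENCE_STRING = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
```

On this reference string FIFO produces 9 faults with 3 frames and 10 faults with 4 frames, so adding a frame does not always lower the fault count under this policy.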