Lecture 5: Instruction Set Architecture

Post on 05-Jan-2016

Computer Engineering 585, Fall 2001

Summary, #1

• Designing to Last through Trends

         Capacity         Speed
Logic    2x in 3 years    2x in 3/2 years
DRAM     4x in 3 years    2x in 10 years
Disk     4x in 3 years    2x in 10 years

• 6 yrs to graduate => 16X CPU speed, DRAM/Disk size

• Time to run the task
  – Execution time, response time, latency

• Tasks per day, hour, week, sec, ns, …
  – Throughput, bandwidth

• “X is n times faster than Y” means

$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = n$$
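As a rough check on the numbers in this summary, the sketch below compounds a growth trend over a span of years and computes the "n times faster" ratio from two execution times. The function names and the sample execution times are illustrative assumptions, not part of the lecture.

```python
def growth(factor, period_years, years):
    """Compound improvement: `factor` every `period_years`, over `years`."""
    return factor ** (years / period_years)


def times_faster(extime_y, extime_x):
    """Return n such that X is n times faster than Y: ExTime(Y) / ExTime(X)."""
    return extime_y / extime_x


# Six years of logic speed doubling every 3/2 years.
print(growth(2, 1.5, 6))         # 16.0
# Six years of DRAM/disk capacity quadrupling every 3 years.
print(growth(4, 3, 6))           # 16.0
# Hypothetical times: Y takes 15 s and X takes 10 s for the same task.
print(times_faster(15.0, 10.0))  # 1.5
```

Both trends give the 16X figure quoted for a six-year stay.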

Summary, #2

• Amdahl’s Law:

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

• CPI Law:

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

• Execution time is the REAL measure of computer performance!

• Good products are created when you have: good benchmarks, good ways to summarize performance

• Die cost goes roughly with (die area)^4

• Can PC industry support engineering/research investment?
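The two laws in this summary are easy to evaluate numerically. The following is a minimal sketch; the function names and the sample workload figures are hypothetical and chosen only for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


def cpu_time(instructions, cycles_per_instruction, seconds_per_cycle):
    """CPI law: instructions/program x cycles/instruction x seconds/cycle."""
    return instructions * cycles_per_instruction * seconds_per_cycle


# Hypothetical: 40% of the execution time is made 10x faster.
print(amdahl_speedup(0.4, 10))   # about 1.5625

# Hypothetical: 10^9 instructions, CPI of 2, 1 ns cycle time.
print(cpu_time(1_000_000_000, 2.0, 1e-9))  # about 2 seconds
```

Even a 10x enhancement yields well under 2x overall here, because the unenhanced 60% dominates the new execution time.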

Computer Architecture Is …

"... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."

Amdahl, Blaauw, and Brooks, 1964

Computer Architecture’s Changing Definition

1950s to 1960s: Computer Architecture Course: Computer Arithmetic

1970s to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers

1990s-2000s: Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors

Instruction Set Architecture (ISA)

[Figure: the instruction set as the interface between software and hardware]

Interface Design

A good interface:

• Lasts through many implementations (portability, compatibility)

• Is used in many different ways (generality)

• Provides convenient functionality to higher levels

• Permits an efficient implementation at lower levels

[Figure: one interface serving several uses and outlasting successive implementations (imp 1, imp 2, imp 3) over time]

Evolution of Instruction Sets

• Single Accumulator (EDSAC 1950)

• Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)

• Separation of Programming Model from Implementation
  – High-level Language Based (B5000 1963)
  – Concept of a Family (IBM 360 1964)

• General Purpose Register Machines
  – Complex Instruction Sets (Vax, Intel 432 1977-80)
  – Load/Store Architecture (CDC 6600, Cray 1 1963-76)

• RISC (Mips, Sparc, HP-PA, IBM RS6000, . . . 1987)

A "Typical" RISC

• 32-bit fixed format instruction (3 formats)

• 32 32-bit GPR (R0 contains zero, DP take pair)

• 3-address, reg-reg arithmetic instruction

• Single address mode for load/store: base + displacement
  – no indirection

• Simple branch conditions

• Delayed branch

see: SPARC, MIPS, HP PA-Risc, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

Evolution of Instruction Sets

• Major advances in computer architecture are typically associated with landmark instruction set designs
  – Ex: Stack vs GPR (System 360)

• Design decisions must take into account:
  – technology
  – machine organization
  – programming languages
  – compiler technology
  – operating systems

• And they in turn influence these

Example: MIPS

Format               Bits 31-26   25-21   20-16     15-11   10-6          5-0
Register-Register    Op           Rs1     Rs2       Rd      (unlabeled)   Opx
Register-Immediate   Op           Rs1     Rd        immediate (bits 15-0)
Branch               Op           Rs1     Rs2/Opx   immediate (bits 15-0)
Jump / Call          Op           target (bits 25-0)
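The fixed field boundaries in these formats (31-26, 25-21, 20-16, 15-11, 5-0, with a 16-bit immediate and a 26-bit target) mean that decoding is just shifting and masking. The sketch below illustrates this under those bit positions; the function names, the dictionary keys, and the sample word are hypothetical, and no real opcode assignments are implied.

```python
def extract(word, high, low):
    """Return bits high..low (inclusive) of a 32-bit instruction word."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


def decode(word):
    """Split a word into every field the four formats define.

    Which fields are meaningful depends on the format selected by Op.
    """
    return {
        "op": extract(word, 31, 26),
        "rs1": extract(word, 25, 21),
        "rs2_or_rd": extract(word, 20, 16),  # Rs2, Rd, or Rs2/Opx by format
        "rd": extract(word, 15, 11),         # register-register only
        "opx": extract(word, 5, 0),          # register-register only
        "immediate": extract(word, 15, 0),   # register-immediate, branch
        "target": extract(word, 25, 0),      # jump / call
    }


# Hypothetical register-register word: Op=0, Rs1=1, Rs2=2, Rd=3, Opx=0x20.
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | 0x20
fields = decode(word)
print(fields["rs1"], fields["rs2_or_rd"], fields["rd"], hex(fields["opx"]))
# 1 2 3 0x20
```

Because every format keeps Op and Rs1 in the same place, the register read can start before the format is fully known, which is one reason fixed formats suit pipelining.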

Architecture, Implementation

• Architecture deals with the functions provided to the programmer: addressing, addition, interrupt, and I/O

• Implementation deals with the method used to achieve these functions, such as a parallel datapath and a microprogrammed control

• Realization is the means used to materialize this method: electrical, magnetic or mechanical devices; power and packaging.

Clock Architecture

[Figure: a clock face (the architecture) and its variant realizations]

Architecture: Two hands, a short one for hours and a longer one for minutes; possibly an alarm.

Realization: Shape of the clock hands and dial, numbers. Mechanical or digital mechanism. Energy source: a wound spring or a battery.

Instruction Set Design: (1) Ease of Use

consistency: with a partial knowledge of the system, one can predict the remainder. e.g. including square-root as an instruction should almost fully define everything else. FP op halve was added to IBM 360 as an afterthought and lacked post-normalization.

orthogonality: Two independent concerns should be handled as such. e.g. clock architecture -- (1) luminous dial (2) alarm. Counter-example: on the IBM 650, the low-order address bits determine the amount of shift; yet, if the address exceeds the address space, a violation occurs.

transparency: an architectural function is transparent if its implementation does not produce any architecturally visible side-effects. e.g. pipelining should not affect the compiler-visible machine.

generality: Designer should not limit a function by his/her own notions about its use. Intel 8080 has a restart op intended to restart after an interrupt. Its larger use is a return from a subroutine, since it was designed in all its generality.

open-endedness: provision for future expansion.

completeness: all functions of a given class are provided. special case: symmetry: inverse is also provided.

Instruction Set Design:

(2) Program size: memory size; CPU-MM bandwidth; frequently used (written-down) instructions should be short.

(3) Execution speed: time required to execute an instruction. Can they be pipelined? Are they uniform in execution length? Control and cache are often in the critical path of a processor design. Uniform length requirements are at loggerheads with (2) above.

(4) Complexity of control unit: Some instructions should not even be in the instruction set. (RISC)

Instruction Set Classification

internal CPU operand storage mechanism: registers, stack, accumulator

# explicit operands / instruction: 0, 1, 2, 3

presumed operand locations: memory, stack

Operations

type and size of operands

Instruction = Opcode + Operands, e.g. ADD R1, 20

Instruction Formats

#Ops   Instruction    Semantics         Machine
4      NI op A B C    C = A op B        IBM 650 µ-code
3      op A B C       C = A op B        RISC, Cray
2      op A B         A = A op B        IBM370, VAX
1      op A           Acc = Acc op A    PDP8, M6809
0      op             X = X op Y        stack machines: transputer, B5500

Stack/Reg/Acc Architectures

Code sequences for C = A+B:

Stack     Accumulator   Reg-Mem        Reg-Reg
PUSH A    LOAD A        LOAD R1, A     LOAD R1, A
PUSH B    ADD B         ADD R1, B      LOAD R2, B
ADD       STORE C       STORE C, R1    ADD R3, R1, R2
POP C                                  STORE C, R3

• Stack: short inst, post-fix model of expression evaluation; sequential operand access, which is hard for compilers. Implementation issues: how deep? Exception handling, e.g. when empty?

• Accumulator: short inst and relatively small machine state (easier context-switch); high memory traffic.

• Reg-Reg: Easiest for compiler optimization; most general model. Long instructions and large state.
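To make the contrast concrete, the sketch below interprets the stack and reg-reg sequences for C = A+B. It is a toy model only: the tuple encoding of instructions, the function names, and the sample values of A and B are assumptions, and STORE is taken to name its memory destination first.

```python
def run_stack(program, memory):
    """Interpret a stack-machine program; operands are implicit on the stack."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(memory[args[0]])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "POP":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return memory


def run_reg_reg(program, memory):
    """Interpret a load/store program; ADD touches registers only."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":       # LOAD reg, addr
            regs[args[0]] = memory[args[1]]
        elif op == "ADD":      # ADD dest, src1, src2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":    # STORE addr, reg (assumed operand order)
            memory[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown reg-reg op: {op}")
    return memory


stack_prog = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
reg_prog = [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
            ("ADD", "R3", "R1", "R2"), ("STORE", "C", "R3")]

print(run_stack(stack_prog, {"A": 2, "B": 3})["C"])   # 5
print(run_reg_reg(reg_prog, {"A": 2, "B": 3})["C"])   # 5
```

The stack ADD carries no operand names at all (short instruction), while the reg-reg ADD names three registers (longer instruction, but A and B stay available in R1 and R2 for reuse).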

Endian-ness of Memory Addressing

Cohen's article: "On Holy Wars and a Plea for Peace", IEEE Computer, Oct 81.

CPU: words, pages

Memory: bits, bytes

In what order are they composed to form the next object in the hierarchy?

• LSB (less-significant unit) travels first: little endians (Lilliputians)

• MSB (more-significant unit) travels first: big endians (Blefuscians)

Endian-ness

• Big-endian: IBM 360, MIPS, Motorola 680xx, SPARC, DLX

• Little-endian: DEC VAX, Compaq/HP Alpha, Intel 80x86

• Selectable: PowerPC, MIPS: mode bit: 0-Big, 1-Little

Addr   Contents
4      0x10
5      0x20
6      0x30
7      0x40

Word at Addr 4: 0x10203040 (Big) 0x40302010 (Little)
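The word values in this example can be reproduced with Python's built-in `int.from_bytes`. This is a small sketch; the `memory` dictionary and the `load_word` helper are hypothetical names used only to model the four bytes at addresses 4 to 7.

```python
# Byte-addressed memory contents from the example.
memory = {4: 0x10, 5: 0x20, 6: 0x30, 7: 0x40}


def load_word(memory, addr, byteorder):
    """Assemble the 4 bytes starting at addr into a word.

    byteorder is "big" (byte at the lowest address is most significant)
    or "little" (byte at the lowest address is least significant).
    """
    raw = bytes(memory[addr + i] for i in range(4))
    return int.from_bytes(raw, byteorder)


print(hex(load_word(memory, 4, "big")))     # 0x10203040
print(hex(load_word(memory, 4, "little")))  # 0x40302010
```

The bytes in memory are identical in both cases; only the significance assigned to each address differs.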

Memory addressing contd: data alignment

Most machines are byte addressable.

Object        Aligned at byte addr   Misaligned at byte addr
Byte          0,1,2,3,4,5,6,7        Never
Half word     0,2,4,6                1,3,5,7
Word          0,4                    1,2,3,5,6,7
Double word   0                      1,2,3,4,5,6,7
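The table follows from one rule: an object is aligned when its byte address is a multiple of its size. The sketch below regenerates the table for addresses 0 to 7, assuming sizes of 1, 2, 4 and 8 bytes for byte, half word, word and double word (the sizes the table implies); the names are illustrative.

```python
# Assumed object sizes in bytes, matching the table's address pattern.
SIZES = {"Byte": 1, "Half word": 2, "Word": 4, "Double word": 8}


def is_aligned(addr, size):
    """An object of `size` bytes is aligned if addr is a multiple of size."""
    return addr % size == 0


for name, size in SIZES.items():
    aligned = [a for a in range(8) if is_aligned(a, size)]
    misaligned = [a for a in range(8) if not is_aligned(a, size)]
    print(f"{name:12} aligned={aligned} misaligned={misaligned}")
```

For power-of-two sizes the same test can be written as `addr & (size - 1) == 0`, which is how hardware typically checks it: the low address bits must all be zero.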

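Physical Rationale for Alignment

[Figure: two organizations of a 1K memory with 10 addr. bits. (1) A 1K x 1B memory selected by a single 1K-to-1 decoder, delivering 1 B. (2) A 32 x 32B memory: the 5 MSB addr bits drive a 32-to-1 decoder that selects a 32 B row, and the 5 LSB addr bits drive a 32-to-1 multiplexor that selects within the row.]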
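One way to read the two-dimensional organization is as a split of the 10 address bits into a 5-bit row number (MSBs, to the decoder) and a 5-bit position within the row (LSBs, to the multiplexor). The sketch below models that split and shows which rows an object touches; it is an illustration under that reading, with hypothetical function names and sample addresses.

```python
ROW_BITS = 5   # 5 MSB address bits select one of 32 rows
COL_BITS = 5   # 5 LSB address bits select a position within the row


def split_address(addr):
    """Split a 10-bit address into (row, column)."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address must fit in 10 bits")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


def rows_touched(addr, size):
    """Rows that must be read to fetch `size` units starting at addr."""
    return sorted({split_address(a)[0] for a in range(addr, addr + size)})


print(split_address(0b1000100011))  # (17, 3)
print(rows_touched(28, 4))          # [0]     aligned: one row access
print(rows_touched(30, 4))          # [0, 1]  misaligned: spans two rows
```

An aligned object never straddles a row boundary, so one row read always suffices; a misaligned one may need two reads plus extra logic to merge the pieces.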

Costs of misalignment

[Figure: memory bytes 0 1 2 3 and 4 5 6 7 feeding a multiplexor, selected by 3 addr bits (a3, a2, a1); labels a3=0, a3=1, a2=0, a2=1]
