lecture 4 instruction set examples - university of · pdf file– derived from many...

Lecture 4 Instruction Set Examples

Computer Architectures 521480S

DLX Architecture
• Introduced by Hennessy and Patterson in 1990
  – Derived from many different instruction set architectures, including those of MIPS, Sun, IBM, Intel, HP, AMD, etc.
• DLX is a typical RISC architecture.
  – 32-bit fixed-length instructions
  – 3 instruction formats: register-register (R-type), register-immediate (I-type), and jump (J-type)
  – Load/store architecture
  – Simple branch conditions (no condition codes; a general-purpose register holds the condition and is compared with zero)
• DLX registers
  – 32 32-bit general-purpose registers (R0 = 0, always!)
  – 32 32-bit (or 16 64-bit) floating-point registers
  – Special-purpose registers (e.g., FP status and PC)
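The "R0 = 0 (always!)" rule can be made concrete with a toy register file. This is a minimal Python sketch that assumes registers store 32-bit patterns; the class name `DLXRegisterFile` is hypothetical and not part of any DLX tool.

```python
class DLXRegisterFile:
    """Toy model of the 32 DLX general-purpose registers."""

    MASK32 = 0xFFFFFFFF

    def __init__(self):
        self._regs = [0] * 32

    def read(self, index):
        return self._regs[index]

    def write(self, index, value):
        # Writes to R0 are discarded, so R0 always reads as 0.
        if index != 0:
            # Keep only the low 32 bits, as a 32-bit register would.
            self._regs[index] = value & self.MASK32


regs = DLXRegisterFile()
regs.write(0, 123)  # has no effect
regs.write(5, -1)   # stored as the 32-bit pattern 0xFFFFFFFF
print(regs.read(0), hex(regs.read(5)))  # 0 0xffffffff
```

Hardwiring R0 to zero gives every instruction a free constant operand, which is why DLX needs no separate "clear register" instruction.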

DLX Design Decisions
• DLX is based on the following design decisions:
  – Use general-purpose registers with a load-store architecture
  – Support commonly used addressing modes
    » displacement, immediate, and register indirect
  – Support simple instructions that occur frequently
    » load, store, add, subtract, move, and, shift, compare equal, branch, jump, call, and return
  – Support commonly required data sizes
    » 8-bit (byte), 16-bit (half word), and 32-bit (word) integers
    » 32-bit (float) and 64-bit (double) floating point
  – Use fixed-length instructions that are easy to decode
  – Provide plenty of general-purpose registers and separate floating-point registers: this helps the compiler

DLX Instruction Formats
(bit positions are numbered 0 to 31, starting from the leftmost field)

Register-Register (R-type), e.g. ADD R1, R2, R3
  Op (bits 0-5) | rs1 (6-10) | rs2 (11-15) | rd (16-20) | func (21-31)
  (All reg. operations, read/write special registers and moves)

Register-Immediate (I-type), e.g. SUBI R1, R2, #3
  Op (bits 0-5) | rs1 (6-10) | rd (11-15) | immediate (16-31)
  (ALU imm. operations, loads and stores, conditional branch, jump (and link) register)

Jump (J-type)
  Op (bits 0-5) | offset added to PC (6-31)
  (Jump, jump and link, trap and return from exception)
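The field layouts above can be checked with a small encoder/decoder. This Python sketch assumes fields are packed from the most significant bit (bit 0 in the slide's numbering) downward; the opcode value in the example is a made-up placeholder, not a real DLX encoding, and the function names are hypothetical.

```python
# (field name, width in bits), from the most significant field (bit 0 in the
# slide's numbering) to the least significant one (bit 31).
R_TYPE = [("op", 6), ("rs1", 5), ("rs2", 5), ("rd", 5), ("func", 11)]
I_TYPE = [("op", 6), ("rs1", 5), ("rd", 5), ("immediate", 16)]
J_TYPE = [("op", 6), ("offset", 26)]


def encode(layout, **fields):
    """Pack named fields into one 32-bit instruction word."""
    word = 0
    for name, width in layout:
        # Truncate each value to its field width (e.g. a negative immediate
        # becomes its 16-bit two's-complement pattern).
        word = (word << width) | (fields[name] & ((1 << width) - 1))
    return word


def decode(layout, word):
    """Unpack a 32-bit instruction word into a dict of raw field values."""
    fields = {}
    shift = 32
    for name, width in layout:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# Hypothetical opcode 0b001010 standing in for SUBI R1, R2, #3.
word = encode(I_TYPE, op=0b001010, rs1=2, rd=1, immediate=3)
print(hex(word))  # 0x28410003
print(decode(I_TYPE, word))
```

Because every format puts the opcode in the same six bits and rs1 in the same five, a decoder can read those fields before it knows which format it is looking at, which is one reason fixed formats are easy to decode.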

Examples of DLX Instructions
• Data Transfer
  – LW R1, 30(R2)      Regs[R1] <- Mem[30 + Regs[R2]]
  – SD F0, 40(R3)      Mem[40 + Regs[R3]] <- Regs[F0]
                       Mem[44 + Regs[R3]] <- Regs[F1]
  – Loads and stores also exist for bytes, half words, and floats
  – How would you perform a register move? A NOP?
• Arithmetic and Logic
  – SUB R1, R2, R3     Regs[R1] <- Regs[R2] - Regs[R3]
  – SLLI R1, R2, #5    Regs[R1] <- Regs[R2] << 5
  – LHI R1, #42        Regs[R1] <- 42 ## 0^16   (## is concatenation; 0^16 is sixteen zero bits)
  – SLT R1, R2, R3     if (Regs[R2] < Regs[R3]) Regs[R1] <- 1 else Regs[R1] <- 0
  – How would you load a 32-bit immediate into a register?
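One common answer to the last question is a two-instruction sequence: LHI for the upper 16 bits, then an OR-immediate for the lower 16 bits. The Python sketch below models it and also shows a register move written as ADD R1, R0, R2. It assumes, as in MIPS, that ORI zero-extends its 16-bit immediate; if the low half were instead sign-extended and added, a low half of 0x8000 or more would require adjusting the upper half by one. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lhi(imm16):
    """LHI Rd, #imm: Regs[Rd] <- imm ## 0^16."""
    return (imm16 & 0xFFFF) << 16


def ori(rs1_value, imm16):
    """ORI Rd, Rs1, #imm, assuming a zero-extended immediate."""
    return (rs1_value | (imm16 & 0xFFFF)) & MASK32


def add(rs1_value, rs2_value):
    """ADD Rd, Rs1, Rs2 with 32-bit wrap-around."""
    return (rs1_value + rs2_value) & MASK32


def load_imm32(constant):
    """LHI Rd, #hi  followed by  ORI Rd, Rd, #lo."""
    rd = lhi(constant >> 16)
    return ori(rd, constant & 0xFFFF)


r0 = 0                        # R0 is hardwired to zero
r2 = load_imm32(0xDEADBEEF)
r1 = add(r0, r2)              # register move: ADD R1, R0, R2
print(hex(r1))                # 0xdeadbeef
```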

Examples of DLX Instructions
• Control
  – JALR R2          Regs[R31] <- PC + 4, PC <- Regs[R2]
  – JR R3            PC <- Regs[R3]
  – BNEZ R4, name    if (Regs[R4] != 0) PC <- name
                     else PC <- PC + 4
  – How would you implement a subroutine call and return?
• Floating Point
  – MULF F1, F2, F3  Regs[F1] <- Regs[F2] * Regs[F3]
  – ADDD F0, F2, F4  Regs[F0&F1] <- Regs[F2&F3] + Regs[F4&F5]
    (a double occupies an even/odd register pair)
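One way to answer the subroutine question uses the two register jumps shown above: JALR saves the return address PC + 4 in R31 and jumps, and JR R31 returns. The Python sketch below traces the PC through one call; the addresses and helper names are hypothetical.

```python
def jalr(pc, regs, rs):
    """JALR rs: Regs[R31] <- PC + 4, PC <- Regs[rs]."""
    regs[31] = pc + 4      # link: address of the instruction after the call
    return regs[rs]


def jr(regs, rs):
    """JR rs: PC <- Regs[rs]."""
    return regs[rs]


regs = [0] * 32
regs[2] = 0x400           # hypothetical start address of the subroutine

pc = 0x100                # the caller executes JALR R2 at address 0x100
pc = jalr(pc, regs, 2)    # call: pc is now 0x400 and R31 holds 0x104
# ... the subroutine body would execute here ...
pc = jr(regs, 31)         # return: JR R31
print(hex(pc))            # 0x104
```

Because every call overwrites R31, a subroutine that itself calls another subroutine must save R31 (typically on a stack in memory) and restore it before returning.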

DLX Instruction Set (Appendix C.3)
• Data transfer
  – Load/store word
  – Load/store halfword or byte (signed/unsigned loads)
  – Load/store floating point single/double
  – Register moves
• Arithmetic and Logic
  – Add/subtract (signed or unsigned, reg. or imm.)
  – Multiply/divide (signed or unsigned, operands in FP reg.)
  – And, or, xor (reg. or imm.)
  – Load high word (loads upper half of a reg. with imm.)
  – Shifts (LL, RL, RA) (reg. or imm.)
  – Set conditionals (LT, GT, LE, GE, EQ, NE) (reg. or imm.)
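Several entries above differ only in what happens to the sign bit. A signed load (LB, LH) copies bit 7 or bit 15 into the upper register bits while an unsigned load (LBU, LHU) fills them with zeros; likewise an arithmetic right shift (SRA) copies bit 31 while a logical one (SRL) shifts in zeros, and SLT compares values as signed integers. The Python sketch below models these on 32-bit register patterns. The helper names and the memory contents are hypothetical, and big-endian byte order for the halfword load is an assumption.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def extend(value, bits, signed):
    """Widen a bits-wide value to a 32-bit register pattern."""
    value &= (1 << bits) - 1
    if signed and value & (1 << (bits - 1)):
        value |= MASK32 ^ ((1 << bits) - 1)  # copy the sign bit upward
    return value


memory = bytes([0xF0, 0x01])  # hypothetical bytes at addresses 0 and 1


def load_byte(addr, signed):  # LB (signed) / LBU (unsigned)
    return extend(memory[addr], 8, signed)


def load_half(addr, signed):  # LH (signed) / LHU (unsigned), big-endian assumed
    return extend(int.from_bytes(memory[addr:addr + 2], "big"), 16, signed)


def srl(x, n):  # shift right logical: zeros enter from the left
    return (x & MASK32) >> n


def sra(x, n):  # shift right arithmetic: copies of bit 31 enter from the left
    return (to_signed(x) >> n) & MASK32


def slt(a, b):  # set on less than, comparing as signed integers
    return 1 if to_signed(a) < to_signed(b) else 0


print(hex(load_byte(0, True)), hex(load_byte(0, False)))  # 0xfffffff0 0xf0
print(hex(srl(0x80000010, 4)), hex(sra(0x80000010, 4)))   # 0x8000001 0xf8000001
print(slt(0xFFFFFFFF, 1))                                 # 1
```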

Instruction Examples

DLX Instruction Set
• Control
  – Conditional branch on register (compare with zero)
  – Conditional branch on FP status bit (bit true or false)
  – Jump, jump register (26-bit imm. or reg.)
  – Jump and link, jump and link register (26-bit imm. or reg.)
  – Trap, return from exception (trap to and return from O.S.)
• Floating Point
  – Add, subtract, multiply, divide (single or double)
  – FP converts (convert between single, double, and integer)
  – FP compares (single or double, sets bit in FP status)

DLX Instruction Usage

Instr. class     gcc    espresso   spice   nasa7
int load/store   43%    29%        23%     1%
int arith        26%    30%        33%     22%
control          17%    13%        11%     4%
logical          10%    23%        5%      1%
fp load/store    0%     0%         8%      33%
fp arith         0%     0%         5%      39%
misc             4%     0%         5%      0%

DLX Summary
• Simple load/store architecture
  – Only accesses memory on loads/stores
  – All other operations use registers and immediates
• Designed for pipeline efficiency
  – Fixed-length instruction encoding
  – Simple instructions
• Easy to compile to
  – Simple, frequently used instructions
  – Orthogonal instruction set
  – Few addressing modes
• Reduces execution time by
  – reducing CPI
  – increasing clock rate
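The last bullet follows from the standard CPU performance equation (added here; it is not on the original slide), where IC is the number of instructions executed, CPI the average number of clock cycles per instruction, T_clock the clock cycle time, and f_clock the clock rate:

```latex
\[
  \text{CPU time}
    = \text{IC} \times \text{CPI} \times T_{\text{clock}}
    = \frac{\text{IC} \times \text{CPI}}{f_{\text{clock}}}
\]
```

A simple ISA may raise IC, since each instruction does less work, and still reduce CPU time if the savings in CPI and T_clock are larger.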

History of the Intel 80x86
• 1971: Intel invents the microprocessor (4004)
• 1974: 8080 introduced
  – 8-bit microprocessor, used in the Altair personal computer (1975)
  – Accumulator machine
• 1978: 8086 introduced
  – 16-bit microprocessor
  – Accumulator plus dedicated registers
• 1980: IBM selects 8088 as basis for IBM PC
  – 8088 is 8-bit external bus version of 8086
• 1980: 8087 floating-point coprocessor
  – adds 60 floating-point instructions
  – 80-bit floating-point registers
  – uses hybrid stack/register scheme

History of the Intel 80x86
• 1982: 80286 introduced
  – 24-bit address
  – memory mapping & protection
• 1985: 80386 introduced
  – 32-bit address, 32-bit GP registers
  – Support for multitasking
• 1989: 80486 introduced
  – Built-in math coprocessor
  – More powerful cache and instruction pipelining
• 1993: Pentium introduced
  – Superscalar processor (multiple instructions per cycle)
• 1995: Pentium Pro introduced
  – More aggressive superscalar with register renaming, branch prediction, and speculative execution

History of the Intel 80x86
• Pentium II introduced
  – Incorporated MMX Technology
  – 57 new instructions for processing video, audio, and graphics
  – Supports single instructions that operate on multiple data (SIMD)
    » eight 8-bit, four 16-bit, or two 32-bit data elements
• Pentium III introduced
  – Features Internet Streaming SIMD Extensions
  – Improves performance of 3D graphics and internet applications
  – Allows one instruction to be executed on 4 pairs of 32-bit floating-point data
• Itanium introduced
  – New 64-bit RISC-like architecture (IA-64)
  – 128-bit instruction bundles (3 instructions per bundle)
• The shape of the Intel architecture was driven by the desire for backward compatibility
  – Highly irregular architecture
  – Over 50 million sold per year

Intel 80x86 Integer Registers

x86 Operand Types
• x86 instructions typically have two operands, where one operand is both a source and a destination operand.
• Possible combinations include:

  Source/destination type   Second source type
  Register                  Register
  Register                  Immediate
  Register                  Memory
  Memory                    Register
  Memory                    Immediate

• No memory-memory or immediate-immediate combinations
• Immediates can be 8, 16, or 32 bits
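Because one x86 operand is both a source and the destination, a three-operand operation such as DLX's ADD R1, R2, R3 may need an extra move. The Python sketch below shows one way a code generator could lower `dest = src1 + src2` into two-operand steps; the function name and the tuple format are hypothetical, and only the commutative ADD case is handled.

```python
def lower_add(dest, src1, src2):
    """Lower 'dest = src1 + src2' into x86-style two-operand instructions.

    Each instruction is a (mnemonic, destination/source, source) tuple
    written in Intel operand order.
    """
    if dest == src1:
        return [("ADD", dest, src2)]
    if dest == src2:
        # ADD is commutative, so src1 can be added into dest directly.
        return [("ADD", dest, src1)]
    # Otherwise copy src1 into dest first, then add src2 to it.
    return [("MOV", dest, src1), ("ADD", dest, src2)]


print(lower_add("eax", "ebx", "ecx"))
# [('MOV', 'eax', 'ebx'), ('ADD', 'eax', 'ecx')]
```

A non-commutative operation such as SUB cannot use the second shortcut, so it would need either the MOV form or a scratch register.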

Intel 80x86 Floating Point Registers

• Operations use the top of the stack and one other register within the stack

80x86 Instruction Format
• Instruction sizes vary from 1 to 17 bytes

80x86 Instructions
• Data movement (move, push, pop)
• Arithmetic and logic (logic ops, tests condition codes, shifts, integer and decimal arithmetic)
• Control flow (branches, jumps, calls, returns)
• String instructions (move and compare)
• FP data movement (load, load const., store)
• Arithmetic instructions (add, subtract, multiply, divide, square root, absolute value)
• Comparisons (can send result to ALU)
• Transcendental functions (sin, cos, log, etc.)

80x86 Addressing Mode Usage for 32-bit Mode

Addressing mode                     Gcc   Espr.  NASA7  Spice  Avg.
Register indirect                   10%   10%    6%     2%     7%
Base + 8-bit disp                   46%   43%    32%    4%     31%
Base + 32-bit disp                  2%    0%     24%    10%    9%
Indexed                             1%    0%     1%     0%     1%
Based indexed + 8-bit disp          0%    0%     4%     0%     1%
Based indexed + 32-bit disp         0%    0%     0%     0%     0%
Base + scaled indexed               12%   31%    9%     0%     13%
Base + scaled index + 8-bit disp    2%    1%     2%     0%     1%
Base + scaled index + 32-bit disp   6%    2%     2%     33%    11%
32-bit direct                       19%   12%    20%    51%    26%
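All of the modes in the table are special cases of one effective-address formula, base + index × scale + displacement, where x86 allows a scale factor of 1, 2, 4, or 8. The Python sketch below computes it for a few rows of the table; the register contents are hypothetical.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """x86 32-bit effective address: base + index * scale + displacement."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("the scale factor must be 1, 2, 4, or 8")
    # Addresses wrap around modulo 2**32 in 32-bit mode.
    return (base + index * scale + disp) & 0xFFFFFFFF


ebx, esi = 0x1000, 3  # hypothetical register contents

print(hex(effective_address(base=ebx)))                      # register indirect: 0x1000
print(hex(effective_address(base=ebx, disp=8)))              # base + 8-bit disp: 0x1008
print(hex(effective_address(base=ebx, index=esi, scale=4)))  # base + scaled index: 0x100c
print(hex(effective_address(disp=0x2000)))                   # 32-bit direct: 0x2000
```

The scaled-index form lets `index` be an array subscript directly, with the scale equal to the element size in bytes.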

80x86 Length Distribution

[Figure: percentage of instructions at each length (1 to 11 bytes) for Espresso, Gcc, Spice, and NASA7]

Instruction Counts: 80x86 vs. DLX

SPEC pgm    x86              DLX              DLX/x86
gcc         3,771,327,742    3,892,063,460    1.03
espresso    2,216,423,413    2,801,294,286    1.26
spice       15,257,026,309   16,965,928,788   1.11
nasa7       15,603,040,963   6,118,740,321    0.39

• DLX tends to execute more instructions for integer programs, while the 80x86 executes more instructions for floating-point programs
• The 80x86 performs many more data transfers
  – Two to four times more for floating-point programs (reason: floating-point register stack)
  – About 1.25 times more for integer programs
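As a quick consistency check, the DLX/x86 column can be recomputed from the two count columns; the Python snippet below does this with the table's numbers. A ratio above 1 means DLX executes more instructions.

```python
# Dynamic instruction counts (x86, DLX) taken from the table above.
counts = {
    "gcc": (3_771_327_742, 3_892_063_460),
    "espresso": (2_216_423_413, 2_801_294_286),
    "spice": (15_257_026_309, 16_965_928_788),
    "nasa7": (15_603_040_963, 6_118_740_321),
}

ratios = {name: round(dlx / x86, 2) for name, (x86, dlx) in counts.items()}
print(ratios)  # {'gcc': 1.03, 'espresso': 1.26, 'spice': 1.11, 'nasa7': 0.39}
```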

Comparison
• How would you expect the x86 and MIPS architectures to compare on the following?
  – CPI on SPEC benchmarks
    » MIPS < x86
  – Ease of design and implementation
    » MIPS easier
  – Ease of writing assembly language & compilers
    » compiler: MIPS is easier (regular ISA, fewer addressing modes, more general-purpose registers, ...)
    » writing assembly language?
  – Code density
    » x86 > MIPS
  – Overall performance?
• What other advantages/disadvantages are there to the two architectures?

Graphics and Multimedia Instruction Set Extensions
• Several companies have extended their computers’ instruction sets to better support graphics and multimedia applications.
  – Intel’s MMX Technology
  – Intel’s Internet Streaming SIMD Extensions
  – AMD’s 3DNow! Technology
  – Sun’s Visual Instruction Set
  – Motorola’s and IBM’s AltiVec Technology
• These extensions improve the performance of
  – Computer-aided design
  – Internet applications
  – Computer visualization
  – Video games
  – Speech recognition

MMX Data Types
• MMX Technology supports operations on the following 64-bit integer data types:
  – Packed byte (eight 8-bit elements)
  – Packed word (four 16-bit elements)
  – Packed double word (two 32-bit elements)
  – Packed quad word (one 64-bit element)
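The four data types are just four ways of viewing the same 64 bits. The Python sketch below splits a 64-bit value into 8-, 16-, 32-, or 64-bit elements and packs them back; it assumes element 0 sits in the least significant bits, following x86's little-endian convention. The function names are hypothetical.

```python
def unpack_lanes(reg64, lane_bits):
    """Split a 64-bit value into 64 // lane_bits elements, element 0 lowest."""
    mask = (1 << lane_bits) - 1
    return [(reg64 >> (i * lane_bits)) & mask for i in range(64 // lane_bits)]


def pack_lanes(lanes, lane_bits):
    """Inverse of unpack_lanes: place element i at bit offset i * lane_bits."""
    mask = (1 << lane_bits) - 1
    reg64 = 0
    for i, lane in enumerate(lanes):
        reg64 |= (lane & mask) << (i * lane_bits)
    return reg64


mm = 0x0102030405060708
for bits in (8, 16, 32, 64):
    print(bits, [hex(v) for v in unpack_lanes(mm, bits)])
```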

SIMD Operations
• MMX Technology allows a Single Instruction to work on Multiple pieces of Data (SIMD).

PADD[W]: Packed add word

     A3       A2       A1       A0
  +  B3       B2       B1       B0
  =  A3+B3    A2+B2    A1+B1    A0+B0

• In the above example, 4 parallel adds are performed on 16-bit elements.
• Most MMX instructions only require a single cycle.
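The PADDW picture can be modelled in Python in two ways: a lane-by-lane reference loop, and a single 64-bit addition with the carries between lanes suppressed. The second is a "SIMD within a register" software trick, swapped in here for illustration; it is not how the slide describes the MMX hardware. Both wrap around within each 16-bit lane, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
TOP_BITS = 0x8000_8000_8000_8000  # bit 15 of each 16-bit lane


def paddw_lanes(a, b):
    """Reference model: add each 16-bit lane separately, discarding carries."""
    result = 0
    for i in range(4):
        shift = 16 * i
        result |= (((a >> shift) + (b >> shift)) & 0xFFFF) << shift
    return result


def paddw_swar(a, b):
    """Same result using one wide add ("SIMD within a register")."""
    # Clear bit 15 of every lane so no carry can cross into the next lane...
    low = (a & ~TOP_BITS) + (b & ~TOP_BITS)
    # ...then fix up bit 15: its sum bit is carry-in XOR a15 XOR b15.
    return (low ^ ((a ^ b) & TOP_BITS)) & MASK64


a = 0x0001_7FFF_FFFF_1234
b = 0x0001_0001_0001_1111
print(f"{paddw_swar(a, b):016x}")  # 0002800000002345
```

Lane 1 (0xFFFF + 0x0001) wraps to 0x0000 without disturbing lane 2, which is exactly the wrap-around behaviour of PADDW.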

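Saturating Arithmetic
• Both wrap-around and saturating adds are supported.
• With saturating arithmetic, a result that overflows is set to the largest representable value instead of wrapping around (with signed saturation, a result below the range is likewise set to the smallest value).

PADD[W]: Packed wrap-around add (with 16-bit words)

PADDUS[W]: Packed unsigned saturating add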
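The three flavours of 16-bit packed add can be compared side by side. The Python sketch below models PADDW (wrap-around), PADDUSW (unsigned saturation), and PADDSW (signed saturation, also part of MMX) on lists of four 16-bit bit patterns; representing a 64-bit register as a list is a simplification, and the sample values are arbitrary.

```python
def to_s16(v):
    """Interpret a 16-bit pattern as a signed value."""
    return v - 0x10000 if v & 0x8000 else v


def paddw(a, b):    # wrap-around: keep only the low 16 bits of each sum
    return [(x + y) & 0xFFFF for x, y in zip(a, b)]


def paddusw(a, b):  # unsigned saturation: clamp to 0xFFFF
    return [min(x + y, 0xFFFF) for x, y in zip(a, b)]


def paddsw(a, b):   # signed saturation: clamp to [-32768, 32767]
    sums = (to_s16(x) + to_s16(y) for x, y in zip(a, b))
    return [max(-0x8000, min(0x7FFF, s)) & 0xFFFF for s in sums]


a = [0xFFF0, 0x7FFF, 0x8000, 0x0001]
b = [0x0020, 0x0001, 0xFFFF, 0x0002]
for op in (paddw, paddusw, paddsw):
    print(op.__name__, [hex(v) for v in op(a, b)])
```

Saturation suits pixel arithmetic: brightening an almost-white pixel should give white, not wrap around to a dark value.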

Pack and Unpack Instructions
• Pack and unpack instructions convert between standard data types and packed data types.

PACKSS[DW]: Pack signed double words into signed words with saturation
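A sketch of PACKSSDW in Python, with values written as plain signed integers rather than bit patterns. To our understanding of Intel's documentation, the two double words of the destination become result words 0 and 1 and the two of the source become words 2 and 3; treat that element order as an assumption.

```python
def saturate_s16(v):
    """Clamp a signed integer to the signed 16-bit range."""
    return max(-32768, min(32767, v))


def packssdw(dest, src):
    """Pack 2 + 2 signed 32-bit elements into 4 signed 16-bit elements."""
    return [saturate_s16(v) for v in list(dest) + list(src)]


print(packssdw([100, -70000], [40000, -5]))  # [100, -32768, 32767, -5]
```

Values that fit in 16 bits pass through unchanged; only out-of-range values are clamped, so no wrap-around artefacts appear when narrowing.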

Multiply-Add Operations
• Many graphics applications require multiply-accumulate operations
  – Vector dot products
  – Matrix multiplies
  – Fast Fourier Transforms (FFTs)
  – Filter implementations

PMADDWD: Packed multiply-add word to double word
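PMADDWD multiplies four pairs of signed 16-bit elements and adds adjacent 32-bit products, giving two signed 32-bit results. The Python sketch below models this on lists of signed integers; the list representation and helper names are assumptions for illustration.

```python
def wrap_s32(v):
    """Reduce an integer to a signed 32-bit two's-complement value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def pmaddwd(a, b):
    """Multiply four signed 16-bit pairs, then sum adjacent products."""
    products = [x * y for x, y in zip(a, b)]
    return [wrap_s32(products[0] + products[1]),
            wrap_s32(products[2] + products[3])]


print(pmaddwd([1, 2, 3, 4], [5, 6, 7, 8]))  # [17, 53]
```

The only sum that does not fit in 32 bits arises when all four inputs of a pair are -32768 (2 × 2^30 = 2^31); `wrap_s32` then yields -2^31, which, as far as we know, matches the 0x80000000 result Intel documents for that case.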

Vector Dot Product
• A dot product of two 8-element vectors can be performed using 8 MMX instructions.
• Without MMX, 40 instructions are required.
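The core of the MMX dot product is two PMADDWD steps plus a packed add of the partial sums. The Python sketch below outlines that data flow for two 8-element vectors of signed 16-bit values; it does not reproduce the slide's instruction count, which also covers loads and the final reduction, and the last horizontal add is done in plain Python here (on MMX it would take extra instructions, e.g. a shift and an add). Helper names are hypothetical.

```python
def wrap_s32(v):
    """Reduce an integer to a signed 32-bit two's-complement value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def pmaddwd(a, b):
    """Four signed 16-bit products, summed in adjacent pairs."""
    p = [x * y for x, y in zip(a, b)]
    return [wrap_s32(p[0] + p[1]), wrap_s32(p[2] + p[3])]


def dot8(x, y):
    """Dot product of two 8-element signed 16-bit vectors, MMX style."""
    lo = pmaddwd(x[0:4], y[0:4])          # elements 0..3 -> 2 partial sums
    hi = pmaddwd(x[4:8], y[4:8])          # elements 4..7 -> 2 partial sums
    acc = [wrap_s32(lo[0] + hi[0]),       # packed 32-bit add (PADDD),
           wrap_s32(lo[1] + hi[1])]       # lane by lane
    return wrap_s32(acc[0] + acc[1])      # final horizontal add


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [8, 7, 6, 5, 4, 3, 2, 1]
print(dot8(x, y), sum(a * b for a, b in zip(x, y)))  # 120 120
```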

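Packed Compare Instructions
• Packed compare instructions produce a bit mask: each result element is set to all ones where the comparison is true and cleared to all zeros where it is false.
• This is useful when the parts of an image with certain qualities need to be extracted.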
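The compare mask is typically used for branch-free selection: AND the mask with one image, AND its complement with the other (PANDN, also part of MMX), and OR the two. The Python sketch below replaces "key colour" pixels of a foreground with background pixels, blue-screen style; the 16-bit pixel values and the key colour are hypothetical.

```python
MASK16 = 0xFFFF


def pcmpeqw(a, b):
    """Each result element is all ones if the elements are equal, else zero."""
    return [MASK16 if x == y else 0 for x, y in zip(a, b)]


def select(mask, if_set, if_clear):
    """(mask AND if_set) OR ((NOT mask) AND if_clear), element by element."""
    return [(m & s) | (~m & MASK16 & c) for m, s, c in zip(mask, if_set, if_clear)]


KEY = 0x001F                                 # hypothetical "blue screen" colour
foreground = [0x001F, 0x07E0, 0x001F, 0xF800]
background = [0x1111, 0x2222, 0x3333, 0x4444]

mask = pcmpeqw(foreground, [KEY] * 4)        # [0xFFFF, 0, 0xFFFF, 0]
print([hex(p) for p in select(mask, background, foreground)])
# ['0x1111', '0x7e0', '0x3333', '0xf800']
```

No per-pixel branch is needed, which keeps the processing pipeline free of hard-to-predict branches.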

MMX Instructions
• MMX Technology adds 57 new instructions to the x86 architecture.
• Some of these instructions include
  – PADD(b, w, d)        Packed addition
  – PSUB(b, w, d)        Packed subtraction
  – PCMPEQ(b, w, d)      Packed compare equal
  – PMULLw               Packed word multiply low
  – PMULHw               Packed word multiply high
  – PMADDwd              Packed word multiply-add
  – PSRL(w, d, q)        Packed shift right logical
  – PACKSS(wb, dw)       Pack data
  – PUNPCK(bw, wd, dq)   Unpack data
  – PAND, POR, PXOR      Packed logical operations
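PMULLW and PMULHW exist as a pair because a 16 × 16-bit product needs 32 bits, but a 64-bit register holds only four 16-bit results: one instruction keeps the low halves, the other the high halves of the signed products. The Python sketch below models both on lists of signed integers; the combined function name is hypothetical.

```python
def pmullw_pmulhw(a, b):
    """Return (low halves, high halves) of the signed 16 x 16-bit products."""
    products = [x * y for x, y in zip(a, b)]        # exact 32-bit results
    low = [p & 0xFFFF for p in products]            # PMULLW
    high = [(p >> 16) & 0xFFFF for p in products]   # PMULHW
    return low, high


a = [300, -2, 1, 0x7FFF]
b = [300, 3, -1, 0x7FFF]
low, high = pmullw_pmulhw(a, b)
print([hex(v) for v in low])   # ['0x5f90', '0xfffa', '0xffff', '0x1']
print([hex(v) for v in high])  # ['0x1', '0xffff', '0xffff', '0x3fff']
```

Interleaving the low and high halves (for example with the unpack instructions) recovers the full 32-bit products.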

Performance Comparison
• The following shows the performance of Pentium processors with and without MMX Technology

Application        Without MMX   With MMX   Speedup
Video              155.52        268.70     1.72
Image Processing   159.03        743.90     4.67
3D geometry        161.52        166.44     1.03
Audio              149.80        318.90     2.13
Overall            156.00        255.43     1.64

MMX Technology Summary
• MMX technology extends the Intel x86 architecture to improve the performance of multimedia and graphics applications.
• It provides a speedup of 1.5 to 2.0 for certain applications.
• MMX instructions are hand-coded in assembly or implemented as libraries to achieve high performance.
• MMX increases the chip area by only about 5% (note: as we know from home assignment 2, it is important to keep chip area as small as possible to reduce die costs).