![Page 1: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/1.jpg)
SE 292 (3:0) High Performance Computing
L2: Basic Computer Organization
R. Govindarajan
govind@serc
![Page 2: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/2.jpg)
Basic Computer Organization
Main parts of a computer system:
• Processor: executes programs
• Main memory: holds program and data
• I/O devices: for communication with the outside
Machine instruction: description of a primitive operation that the machine hardware is able to execute, e.g., ADD these two integers
Instruction set: complete specification of all the kinds of instructions that the processor hardware was built to execute
![Page 3: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/3.jpg)
Basic Computer Organization
[Figure: block diagram. The CPU (ALU, registers, control) is connected over a bus to memory and to the I/O devices.]
![Page 4: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/4.jpg)
Inside the Processor…
• Hardware to manage instruction execution
• Arithmetic, logic hardware
• Registers: small units of memory to hold data/instructions temporarily during execution
Two kinds of registers
1. Special purpose registers
2. General purpose registers
![Page 5: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/5.jpg)
Special Purpose Registers
• Program Counter (PC): specifies the location in memory of the instruction being executed
• Instruction Register (IR): holds that instruction
• Processor Status Register: holds status information about the current state of the processor, such as whether an arithmetic overflow has occurred
![Page 6: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/6.jpg)
General Purpose Registers
• Available for use by the programmer, possibly for keeping frequently used data
• Why? Since there is a large speed disparity between processor and main memory
  • 1 GHz processor: 1 nanosecond time scale
  • Memory: ~50 to 100 nsec time scale
• What do these numbers mean?
• Instruction operands can come from registers or from main memory
![Page 7: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/7.jpg)
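Basic Computer Organization
[Figure: block diagram of the CPU (ALU, registers, control) with a cache and an MMU, connected over a bus to memory and to the I/O devices. The registers are divided into general purpose registers (integer registers, FP registers) and special purpose registers (program counter, instruction register).]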
![Page 8: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/8.jpg)
Main Memory
• Holds instructions and data
• View as a sequence of locations, each referred to by a unique memory address
• If the size of each memory location is 1 Byte, we call the memory byte addressable
• This is quite typical, as the smallest data item (a character) is represented in 1 Byte
• Larger data items are stored in contiguous memory locations, e.g., a 4 Byte integer would occupy 4 consecutive memory locations
![Page 9: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/9.jpg)
Terms: Byte ordering

| Address | 400 | 401 | 402 | 403 | 404 | 405 | 406 | 407 |
|---|---|---|---|---|---|---|---|---|
| Data | 1A | C8 | B2 | 46 | F0 | 8C | 1E | DF |

Data values are in hexadecimal (digits 0, 1, 2, …, 9, A, B, C, D, E, F).
What is the integer (4 byte data) at Address 400?
• Big Endian byte ordering: 1AC8B246
  Binary: 0001 1010 1100 1000 1011 0010 0100 0110
  Decimal: 449,360,454
• Little Endian byte ordering: 46B2C81A
  Binary: 0100 0110 1011 0010 1100 1000 0001 1010
  Decimal: 1,186,121,754
Some machines use big endian byte ordering and others use little endian byte ordering
![Page 10: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/10.jpg)
Terms: Word Size, Word Alignment
Word Size
• Normal size of an integer or pointer
• 32b (4B) on many machines
Word Alignment
• "Integer variable X is not word aligned": the data item is not located at a word boundary
• Word boundaries: addresses 0, 4, 8, 12, …
HW:
• Write a C program to identify whether a machine uses Little Endian or Big Endian byte ordering
• Write a C program to convert a sequence of 4-byte values from Little Endian to Big Endian
![Page 11: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/11.jpg)
Instruction Set Architecture (ISA)
View of the computer visible to the programmer (or compiler)
Two kinds of ISAs
1. Complex Instruction Set Computer (CISC)
   A single instruction can perform a complex operation involving several actions
2. Reduced Instruction Set Computer (RISC)
   Each instruction performs only a simple operation
![Page 12: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/12.jpg)
Instruction Set Architecture
• Description of machine from view of the programmer/compiler
• Example: Intel x86 ISA
Includes specification of
1. The different kinds of instructions available (instruction set)
2. How operands are specified (addressing modes)
3. What each instruction looks like (instruction format)
![Page 13: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/13.jpg)
Kinds of Instructions
1. Arithmetic/logical instructions
   • Add, subtract, multiply, divide, compare (int/fp)
   • Or, and, not, xor
   • Shift (left/right, arithmetic/logical), rotate
2. Data transfer instructions
   • Load (to register from memory)
   • Store (to memory location from register)
   • Move
3. Control transfer instructions
   • Jump, conditional branch, function call, return
4. Other instructions
   • Example: halt
![Page 14: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/14.jpg)
Operand Addressing Modes
• Operands to an instruction
  • Source: input value to instruction
  • Destination: where result is to go
• Addressing Mode
  • How the location of operand is specified
• An operand can be either
  • in a memory location
  • in a register
![Page 15: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/15.jpg)
Addressing Modes: Operand in Register
1. Register Direct Addressing Mode
   Operand is in the specified general purpose register
   Example: suppose that the General Purpose Registers are numbered as 0, 1, 2, etc.
   ADD R1, R2, R3    // R1 ← R2 + R3
   [Figure: R2 = 24 and R3 = 35 are the source operands; the destination operand R1 changes from 17 to 59.]
2. Immediate Addressing Mode
   Operand is included in the instruction
   ADD R1, R2, 1    // R1 ← R2 + 1
![Page 16: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/16.jpg)
Addressing Modes: Operand in Memory
3. Register Indirect Addressing Mode
   Memory address of operand is in the specified general purpose register
   ADD R1, R1, (R2)
   Example: with R1 = 32 and R2 = 100, the operand is the value 10 at address 100, and R1 becomes 42
4. Base-Displacement Addressing Mode
   Memory address of operand is calculated as the sum of the value in the specified register and the specified displacement
   ADD R1, R1, 4(R2)
   Example: with R1 = 32 and R2 = 100, the operand is the value 35 at address 104, and R1 becomes 67
MAIN MEMORY (same contents in both examples)

| Address | 96 | 100 | 104 | 108 |
|---|---|---|---|---|
| Value | 0 | 10 | 35 | -17 |
![Page 17: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/17.jpg)
Addressing Modes: Operand in Memory
5. Absolute Addressing Mode
   Memory address of operand is specified directly in the instruction
   ADD R1, R2, #100
6. Indexed Addressing Mode
   Memory address of operand is calculated as the sum of the contents of 2 registers
   ADD R1, R2, (R3+R4)
Others
• Auto-increment/decrement (pre/post)
• PC relative
![Page 18: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/18.jpg)
Case Study: MIPS I Integer Instruction Set
Registers
• 32 32b general purpose registers, R0..R31
  • R0 hardwired to value 0
  • R31 implicitly used by instructions JAL, JALR
• HI, LO: 2 other 32b registers
  • Used implicitly by multiply and divide instructions
Addressing Modes
• Immediate, Register direct (arithmetic)
• Absolute (jumps)
• Base-displacement (loads, stores)
• PC relative (branches)
![Page 19: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/19.jpg)
MIPS I ISA: General Comments
• All instructions, registers are 32b in size
• Load-store architecture: the only instructions that have memory operands are loads & stores
• Terminology
  • Word: 32b
  • Halfword: 16b
  • Byte: 8b
• Displacements and immediates are signed 16 bit quantities
![Page 20: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/20.jpg)
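A RISC Instruction Set

Data Transfer Instructions

| Instruction | Mnemonic | Example | Meaning |
|---|---|---|---|
| Load | LB, LBU, LH, LHU, LUI, LW | lw R2, 4(R3) | R2 ← Mem[R3 + 4] |
| Store | SB, SH, SW | sb R2, -8(R4) | Mem[R4 - 8] ← R2 |
| Move | MFHI, MFLO, MTHI, MTLO | mfhi R1 | R1 ← HI |

Integer ALU Instructions

| Instruction | Mnemonic | Example | Meaning |
|---|---|---|---|
| Add | ADD, ADDU, ADDI, ADDIU | add R1, R2, R3 | R1 ← R2 + R3 |
| Subtract | SUB, SUBU | sub R1, R2, R3 | R1 ← R2 - R3 |
| Multiply | MULT, MULTU | mult R1, R2 | LO ← LSW(R1 × R2); HI ← MSW(R1 × R2) |
| Divide | DIV, DIVU | div R1, R2 | LO ← R1 div R2; HI ← R1 mod R2 |
| Logical | AND, ANDI, OR, ORI, NOR, XOR, XORI | ori R1, R2, 0xF0 | R1 ← R2 \| ZE(0xF0) |
| Shift | SLL, SLLV, SRA, SRAV, SRL, SRLV | srl R1, R2, 4 | R1 ← 0000 \|\| (R2)31-4 |
| Comparison | SLT, SLTI, SLTU, SLTIU | slti R1, R2, 16 | R1 ← 1 if R2 < SE(16), 0 otherwise |

SE: sign extend. ZE: zero extend (MIPS logical instructions zero extend their immediates). LSW/MSW: least/most significant word. ||: concatenation.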
![Page 21: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/21.jpg)
RISC Instruction Set (contd)

Control Transfer Instructions

| Instruction | Mnemonic | Example | Meaning |
|---|---|---|---|
| Conditional Branch | BEQ, BGEZ, BLTZ, BLEZ, BGTZ, BNE | bltz R2, -16 | PC ← PC + 4 - 16 if R2 < 0 |
| Jump | J, JR | `j <target>` | PC ← (PC)31-28 \|\| target \|\| 00 |
| Jump & Link | JAL, JALR | jalr R2 | R31 ← PC + 8; PC ← R2 |
| System Call | SYSCALL | syscall | |

HW: Write a simple C program and generate the corresponding assembly language program for a RISC/CISC machine. Understand the instructions, function call mechanism, formats of branch and jump instructions, etc.
![Page 22: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/22.jpg)
MIPS Instruction Encoding
R-Format

| Opcode | Src1 (rs) | Src2 (rt) | Dst (rd) | shamt | Func. code |
|---|---|---|---|---|---|
| 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

Example: add R1, R2, R3
![Page 23: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/23.jpg)
MIPS Instruction Encoding
I-Format

| Opcode | Src1 (rs) | Dst (rt) | constant |
|---|---|---|---|
| 6 bits | 5 bits | 5 bits | 16 bits |

Examples:
addi R1, R2, 8
lw R1, 24(R2)
bltz R1, loop
![Page 24: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/24.jpg)
MIPS Instruction Encoding
J-Format

| Opcode | Jump address |
|---|---|
| 6 bits | 26 bits |

Example: jal fact
![Page 25: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/25.jpg)
CISC vs RISC -- ISA Comparison
C code:
    a[i++] = a[i] + b[i];
    b[i] = b[--i] - 1;
RISC Code:
    lw R1, 0(R3)
    lw R2, 0(R4)
    add R5, R1, R2
    subi R2, R2, 1
    sw 0(R3), R5
    sw 0(R4), R2
CISC Code:
    add (R3)+, (R3), (R4)
    sub (R4), -(R4), 1
# of Data Memory Accesses:
• RISC: 4
• CISC: 5
![Page 26: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/26.jpg)
On Instruction Processing
Fetch
• Get instruction whose address is in PC from memory into IR
• Increment PC
Decode
• Understand instruction, addressing modes, etc.
• Calculate effective addresses and fetch operands
Execute
• Do required operation
Write back the result of the instruction
![Page 27: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/27.jpg)
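Instruction Execution
Instruction Fetch (IF): from program memory to instruction register
• IR ← Mem[PC]
• Increment PC: NPC ← PC + 4
[Figure: Instr Fetch stage. PC addresses Mem, the fetched instruction goes to IR, and an adder computes NPC from PC and 4.]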
![Page 28: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/28.jpg)
Instruction Execution…
Instruction Decode & Operand Fetch (ID)
• A ← RegisterFile[rs]
• B ← RegisterFile[rt]
• Imm ← sign extend(IR15-0)
[Figure: Instr Fetch stage (PC, +4 adder, NPC, Inst Mem, IR) followed by the Instr Decode stage (RegFile feeding A and B, sign extend feeding Imm).]
![Page 29: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/29.jpg)
Instruction Execution… Execution (EX)
• Arithmetic Inst: ALU-Out ← A op B, or ALU-Out ← A op Imm
• Load/Store Inst: ALU-Out ← A + Imm
• Branch Inst: ALU-Out ← NPC + Imm
• Jump Inst: PC ← NPC31-28 || IR25-0 || 00
[Figure: Execution stage. The ALU takes its inputs from NPC, A, B and Imm and produces ALU-out; a Zero? test produces Cond.]
![Page 30: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/30.jpg)
Instruction Execution… Memory (MEM)
• Load Instr: LMD ← Mem[ALUout]
• Store Instr: Mem[ALUout] ← B
[Figure: Execution stage (ALU, Zero?, Cond, ALUout) followed by the Memory stage (Mem, LMD).]
![Page 31: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/31.jpg)
Instruction Execution… Write Back (WB)
• ALU Inst: RegisterFile[rd] ← ALUout
• Load Inst: RegisterFile[rt] ← LMD
• Conditional Branch Inst: PC ← ALU-out if Cond; PC ← NPC otherwise
![Page 32: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/32.jpg)
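Processor Datapath
[Figure: processor datapath across the five stages. Inst Fetch (IF): PC, +4 adder, NPC, Mem, IR. Inst Decode (ID): RegFile, sign extend, A, B, Imm. Execution (EX): ALU, Zero?, Cond, ALUout. Memory (MEM): Mem, LMD. Write back (WB).]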
![Page 33: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/33.jpg)
Our Assumptions
1. Disparity in processor vs memory speed
   • Time for performing addition, register access, etc. vs memory fetch?
   • Which stages require memory access?
2. Main memory delays are not typically seen by the instruction processor
   • Otherwise the timeline is dominated by them
   • There is some hardware mechanism through which most memory access requests can be satisfied at processor speeds (cache memory)
3. It is preferable that the time required for each stage of instruction processing be the same: the cycle time
![Page 34: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/34.jpg)
Processor cycle time: time required to do
• Cache memory access
• Register access + some logic (like decode)
• ALU operation
[Figure: processor datapath with the stages Inst Fetch (IF), Inst Decode (ID), Execution (EX), Memory (MEM) and Write Back (WB).]
![Page 35: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/35.jpg)
Performance of Processor
Which is more important?
• Execution time of a single instruction
• Throughput of instruction execution, i.e., number of instructions executed per unit time
Cycles Per Instruction (CPI)
• Current ideas: CPI between 3 and 5
![Page 36: SE 292 (3:0) High Performance Computing L2: Basic Computer Organization](https://reader036.vdocument.in/reader036/viewer/2022062801/5681431a550346895daf7442/html5/thumbnails/36.jpg)
CPI Calculation
Cycles for
• ALU Ins.: 4; Load: 5; Store: 4; Conditional: 4; Jump: 3
% of Instructions in a Program
• ALU Ins.: 45%; Load: 15%; Store: 10%; Conditional: 20%; Jump: 10%
CPI = ?
CPI = 0.45*4 + 0.15*5 + 0.1*4 + 0.2*4 + 0.1*3 = 4.05
How to improve CPI?
• Pipelining: fetch the next instruction while the previous one is being decoded.