seq part 3 / hclrs - computer sciencecr4bd/3330/f2017/notes/20170921... · exercise: ... hlt:stop...
TRANSCRIPT
SEQ part 3 / HCLRS
1
Changelog
Changes made in this version not seen in first lecture:21 September 2017: data memory value MUX input for call is PC + 10,not PC21 September 2017: slide 23: add input to pre R[dstE] mux for irmovq21 September 2017: slide 26: need MUX for 0 ALU input21 September 2017: correct a couple instances of ‘HCL2D’ to ‘HCLRS’
1
last time
single cycle processor design strategy
conceptual stagesfor now: ease processor designconsider what every instruction does for a particular stage
actual timing — clock signalone cycle per instruction in this designcalculations between rising edges of clockrising edge of clock triggers state change (register/memory valueschange)
2
SEQ: instruction fetch
read instruction memory at PC
split into seperate wires:icode:ifun — opcoderA, rB — register numbersvalC — call target or mov displacement
compute next instruction address:valP — PC + (instr length)
3
instruction fetch
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
4
SEQ: instruction “decode”
read registersvalA, valB — register values
5
instruction decode (1)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
exercise: which of these instructions can this not work for?nop, addq, mrmovq, popq, call,
6
instruction decode (1)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
exercise: which of these instructions can this not work for?nop, addq, mrmovq, popq, call,
6
SEQ: srcA, srcB
always read rA, rB?
Problems:push rApopcallret
extra signals: srcA, srcB — computed input register
MUX controlled by icode
7
SEQ: possible registers to read
instruction srcA srcBhalt, nop, jCC, irmovq none nonecmovCC, rrmovq rA nonemrmovq none rBrmmovq, OPq rA rBcall, ret none? %rsppushq, popq rA %rsp
MUX srcB
rB
%rsp
(none) F
logic functionicode
8
SEQ: possible registers to read
instruction srcA srcBhalt, nop, jCC, irmovq none nonecmovCC, rrmovq rA nonemrmovq none rBrmmovq, OPq rA rBcall, ret none? %rsppushq, popq rA %rsp
MUX srcB
rB
%rsp
(none) F
logic functionicode
8
SEQ: possible registers to read
instruction srcA srcBhalt, nop, jCC, irmovq none nonecmovCC, rrmovq rA nonemrmovq none rBrmmovq, OPq rA rBcall, ret none? %rsppushq, popq rA %rsp
MUX srcB
rB
%rsp
(none) F
logic functionicode
8
instruction decode (2)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
9
SEQ: execute
perform ALU operation (add, sub, xor, and)valE — ALU output
read prior condition codesCnd — condition codes based on ifun (instruction type for jCC/cmovCC)
write new condition codes
10
using condition codes: cmov*
(always) 1(le) SF | ZF
(l) SF
cc(from instr)
rB0xF dstENOT
11
execute (1)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
exercise: which of these instructions can this not work for?nop, addq, mrmovq, popq, call,
12
execute (1)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
exercise: which of these instructions can this not work for?nop, addq, mrmovq, popq, call,
12
SEQ: ALU operations?
ALU inputs always valA, valB (register values)?
no, inputs from instruction: (Displacement + rB)MUX aluB
valBvalC
mrmovqrmmovq
no, constants: (rsp +/- 8)pushqpopqcallret
extra signals: aluA, aluBcomputed ALU input values
13
execute (2)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
14
SEQ: Memory
read or write data memoryvalM — value read from memory (if any)
15
memory (1)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
exercise: which of these instructions can this not work for?nop, rmmovq, mrmovq, popq, call,
16
memory (1)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
exercise: which of these instructions can this not work for?nop, rmmovq, mrmovq, popq, call,
16
SEQ: control signals for memory
read/write — read enable? write enable?
Addr — addressmostly ALU outputtricky cases: popq, ret
Data — value to writemostly valBtricky cases: call, push
17
memory (2)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
18
SEQ: write back
write registers
19
write back (1)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
exercise: which of these instructions can this not work for?nop, pushq, mrmovq, popq, call,
20
write back (1)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
exercise: which of these instructions can this not work for?nop, pushq, mrmovq, popq, call,
20
SEQ: control signals for WB
two write inputs — two needed by popqvalM (memory output), valE (ALU output)
two register numbersdstM, dstE
write disable — use dummy register number 0xF
MUX dstErBF
%rsp
21
write back (2)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
22
write back (3a)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
23
write back (3b)
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
24
SEQ: Update PC
choose value for PC next cycle (input to PC register)usually valP (following instruction)exceptions: call, jCC, ret
25
PC update
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Stat
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+valP
26
describing hardware
how do we describe hardware?
pictures?
add 1count
27
circuits with pictures?
yes, something you can do
such commercial tools exist, but…
not commonly used for processors
28
hardware description language
programming language for hardware
(typically) text-based representation of circuit
often abstracts away details like:how to build arithmetic operations from gateshow to build registers from transistorshow to build memories from transistorshow to build MUXes from gates…
those details also not a topic in this course
29
our tool: HCLRS
built for this course
assumes you’re making a processor
somewhat different from textbook’s HCL
30
nop CPU
thePcInstr.Mem.
add 1
“pc” “i10bytes”
StatSTAT_AOK
register pF {thePc : 64 = 0;
}p_thePc = F_thePc + 1;pc = F_thePc;
built-in componentuse is mandatory
Stat = STAT_AOK;
built-in component:AOK: continueHLT: stop
31
nop CPU
thePcInstr.Mem.
add 1
“pc” “i10bytes”
StatSTAT_AOK
register pF {thePc : 64 = 0;
}
p_thePc = F_thePc + 1;pc = F_thePc;
built-in componentuse is mandatory
Stat = STAT_AOK;
built-in component:AOK: continueHLT: stop
31
nop CPU
thePcInstr.Mem.
add 1
“pc” “i10bytes”
StatSTAT_AOK
register pF {thePc : 64 = 0;
}
p_thePc = F_thePc + 1;pc = F_thePc;
built-in componentuse is mandatory
Stat = STAT_AOK;
built-in component:AOK: continueHLT: stop
31
nop CPU
thePcInstr.Mem.
add 1
“pc” “i10bytes”
StatSTAT_AOK
register pF {thePc : 64 = 0;
}p_thePc = F_thePc + 1;
pc = F_thePc;
built-in componentuse is mandatory
Stat = STAT_AOK;
built-in component:AOK: continueHLT: stop
31
nop CPU
thePcInstr.Mem.
add 1
“pc” “i10bytes”
StatSTAT_AOK
register pF {thePc : 64 = 0;
}p_thePc = F_thePc + 1;pc = F_thePc;
built-in componentuse is mandatory
Stat = STAT_AOK;
built-in component:AOK: continueHLT: stop
31
nop CPU
thePcInstr.Mem.
add 1
“pc” “i10bytes”
StatSTAT_AOK
register pF {thePc : 64 = 0;
}p_thePc = F_thePc + 1;pc = F_thePc;
built-in componentuse is mandatory
Stat = STAT_AOK;
built-in component:AOK: continueHLT: stop
31
nop CPU
thePcInstr.Mem.
add 1
“pc” “i10bytes”
StatSTAT_AOK
register pF {thePc : 64 = 0;
}p_thePc = F_thePc + 1;pc = F_thePc;
built-in componentuse is mandatory
Stat = STAT_AOK;
built-in component:AOK: continueHLT: stop
31
nop CPU: running
need a program in memory.yo file
tools/yas — convert .ys to .yo
tools/yis — reference interpreter for .yo filesif your processor doesn’t do the same thing…
can build tools by running make
32
nop CPU: creating a program
create assemby file: nops.ys:nopnopnopnopnop
assemble using tools/yas nops.ys or make nops.yo
33
nop.yo
more readable/simpler than normal executables:0x000: 10 | nop0x001: 10 | nop0x002: 10 | nop0x003: 10 | nop0x004: 10 | nop
|
loaded into data and program memory
parts left of | just comments
34
running a simulator (1)
Usage: ./hclrs [options] HCL-FILE [YO-FILE [TIMEOUT]]Runs HCL_FILE on YO-FILE. If --check is specified, no YO-FILE may be supplied.Default timeout is 9999 cycles.
Options:-c, --check check syntax only-d, --debug output traces of all assignments for debugging-q, --quiet only output state at the end-t, --testing do not output custom register banks (for autograding)-h, --help print this help menu
--version print version number
35
running a simulator (2)
$ ./hclrs nop_cpu.hcl nops.yo+------------------- between cycles 0 and 1 ----------------------+| RAX: 0 RCX: 0 RDX: 0 || RBX: 0 RSP: 0 RBP: 0 || RSI: 0 RDI: 0 R8: 0 || R9: 0 R10: 0 R11: 0 || R12: 0 R13: 0 R14: 0 || register pF(N) thePc=0000000000000000 || used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f || 0x0000000_: 10 10 10 10 10 |+-----------------------------------------------------------------------+pc = 0x0; loaded [10 : nop]+------------------- between cycles 1 and 2 ----------------------+....
36
running a simulator (2)
$ ./hclrs nop_cpu.hcl nops.yo+------------------- between cycles 0 and 1 ----------------------+| RAX: 0 RCX: 0 RDX: 0 || RBX: 0 RSP: 0 RBP: 0 || RSI: 0 RDI: 0 R8: 0 || R9: 0 R10: 0 R11: 0 || R12: 0 R13: 0 R14: 0 || register pF(N) thePc=0000000000000000 || used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f || 0x0000000_: 10 10 10 10 10 |+-----------------------------------------------------------------------+pc = 0x0; loaded [10 : nop]+------------------- between cycles 1 and 2 ----------------------+....
36
running a simulator (2)
$ ./hclrs nop_cpu.hcl nops.yo+------------------- between cycles 0 and 1 ----------------------+| RAX: 0 RCX: 0 RDX: 0 || RBX: 0 RSP: 0 RBP: 0 || RSI: 0 RDI: 0 R8: 0 || R9: 0 R10: 0 R11: 0 || R12: 0 R13: 0 R14: 0 || register pF(N) thePc=0000000000000000 || used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f || 0x0000000_: 10 10 10 10 10 |+-----------------------------------------------------------------------+pc = 0x0; loaded [10 : nop]+------------------- between cycles 1 and 2 ----------------------+....
36
running a simulator (2)
$ ./hclrs nop_cpu.hcl nops.yo+------------------- between cycles 0 and 1 ----------------------+| RAX: 0 RCX: 0 RDX: 0 || RBX: 0 RSP: 0 RBP: 0 || RSI: 0 RDI: 0 R8: 0 || R9: 0 R10: 0 R11: 0 || R12: 0 R13: 0 R14: 0 || register pF(N) thePc=0000000000000000 || used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f || 0x0000000_: 10 10 10 10 10 |+-----------------------------------------------------------------------+pc = 0x0; loaded [10 : nop]+------------------- between cycles 1 and 2 ----------------------+....
36
nop/halt CPU
thePcInstr.Mem.
add 1
valP
Stat
MUX
STAT_AOKSTAT_HLTSTAT_INS
extract opcode
register pP {thePc : 64 = 0;
}p_thePc = P_thePc + 1;pc = P_thePc;Stat = [
i10bytes[4..8] == NOP : STAT_AOK;i10bytes[4..8] == HALT : STAT_HLT;1 : STAT_INS; (default case)
];
37
nop/halt CPU
thePcInstr.Mem.
add 1
valP
Stat
MUX
STAT_AOKSTAT_HLTSTAT_INS
extract opcode
register pP {thePc : 64 = 0;
}p_thePc = P_thePc + 1;pc = P_thePc;Stat = [
i10bytes[4..8] == NOP : STAT_AOK;i10bytes[4..8] == HALT : STAT_HLT;1 : STAT_INS; (default case)
];
37
MUXes in HCLRS
book calls “case expression”conditions evaluated (as if) in orderfirst match is output: result = [
x == 5: 1;x in {0, 6}: 2;x > 2: 3;1: 4;
];x = 5: result is 1x = 6: result is 2x = 3: result is 3x = 4: result is 3x = 1: result is 4
38
nop/halt CPU
thePcInstr.Mem.
add 1
valP
Stat
MUX
STAT_AOKSTAT_HLTSTAT_INS
extract opcode
register pP {thePc : 64 = 0;
}p_thePc = P_thePc + 1;pc = P_thePc;Stat = [
i10bytes[4..8] == NOP : STAT_AOK;i10bytes[4..8] == HALT : STAT_HLT;1 : STAT_INS; (default case)
];
39
subsetting bits in HCLRS
extracting bits 2 (inclusive)–9 (exclusive): value[2..9]
least significant bit is bit 0
40
bit numbers and instructions
value from instruction memory in i10bytes
HCLRS numbers bits from LSB to MSB
80-bit integer, little-endian order:
first byte is least significant byte
HCLRS bit ‘0’ is least significant bit
41
example
pushq %rbx at memory address x: A F 2 F
memory at x + 0: pushq F ; at x + 1: rbx F
x + 0: A F ; at x + 1: 2 F
as a little-endian 2-byte number in typical English order:2 F A F
0010 1111 1010 1111
most sig. bit(bit 15)
least sig. bit(bit 0)
42
Y86 encoding table
byte: 0 1 2 3 4 5 6 7 8 9
halt 0 0
nop 1 0
rrmovq/cmovCC rA, rB 2 cc rA rB
irmovq V, rB 3 0 F rB
rmmovq rA, D(rB) 4 0 rA rB
mrmovq D(rB), rA 5 0 rA rB
OPq rA, rB 6 fn rA rB
jCC Dest 7 cc
call Dest 8 0
ret 9 0
pushq rA A 0 rA F
popq rA B 0 rA F
V
D
D
Dest
Dest
byte 0: bits 0–7least sig. 4 bits of byte 0: bits 0–4most sig. 4 bits of byte 0: bits 4–8most sig. 4 bits of byte 1: bits 12–16least sig. 4 bits of byte 1: bits 8–12
43
Y86 encoding table
byte: 0 1 2 3 4 5 6 7 8 9
halt 0 0
nop 1 0
rrmovq/cmovCC rA, rB 2 cc rA rB
irmovq V, rB 3 0 F rB
rmmovq rA, D(rB) 4 0 rA rB
mrmovq D(rB), rA 5 0 rA rB
OPq rA, rB 6 fn rA rB
jCC Dest 7 cc
call Dest 8 0
ret 9 0
pushq rA A 0 rA F
popq rA B 0 rA F
V
D
D
Dest
Dest
byte 0: bits 0–7
least sig. 4 bits of byte 0: bits 0–4most sig. 4 bits of byte 0: bits 4–8most sig. 4 bits of byte 1: bits 12–16least sig. 4 bits of byte 1: bits 8–12
43
Y86 encoding table
byte: 0 1 2 3 4 5 6 7 8 9
halt 0 0
nop 1 0
rrmovq/cmovCC rA, rB 2 cc rA rB
irmovq V, rB 3 0 F rB
rmmovq rA, D(rB) 4 0 rA rB
mrmovq D(rB), rA 5 0 rA rB
OPq rA, rB 6 fn rA rB
jCC Dest 7 cc
call Dest 8 0
ret 9 0
pushq rA A 0 rA F
popq rA B 0 rA F
V
D
D
Dest
Dest
byte 0: bits 0–7
least sig. 4 bits of byte 0: bits 0–4
most sig. 4 bits of byte 0: bits 4–8most sig. 4 bits of byte 1: bits 12–16least sig. 4 bits of byte 1: bits 8–12
43
Y86 encoding table
byte: 0 1 2 3 4 5 6 7 8 9
halt 0 0
nop 1 0
rrmovq/cmovCC rA, rB 2 cc rA rB
irmovq V, rB 3 0 F rB
rmmovq rA, D(rB) 4 0 rA rB
mrmovq D(rB), rA 5 0 rA rB
OPq rA, rB 6 fn rA rB
jCC Dest 7 cc
call Dest 8 0
ret 9 0
pushq rA A 0 rA F
popq rA B 0 rA F
V
D
D
Dest
Dest
byte 0: bits 0–7least sig. 4 bits of byte 0: bits 0–4
most sig. 4 bits of byte 0: bits 4–8
most sig. 4 bits of byte 1: bits 12–16least sig. 4 bits of byte 1: bits 8–12
43
Y86 encoding table
byte: 0 1 2 3 4 5 6 7 8 9
halt 0 0
nop 1 0
rrmovq/cmovCC rA, rB 2 cc rA rB
irmovq V, rB 3 0 F rB
rmmovq rA, D(rB) 4 0 rA rB
mrmovq D(rB), rA 5 0 rA rB
OPq rA, rB 6 fn rA rB
jCC Dest 7 cc
call Dest 8 0
ret 9 0
pushq rA A 0 rA F
popq rA B 0 rA F
V
D
D
Dest
Dest
byte 0: bits 0–7least sig. 4 bits of byte 0: bits 0–4most sig. 4 bits of byte 0: bits 4–8
most sig. 4 bits of byte 1: bits 12–16
least sig. 4 bits of byte 1: bits 8–12
43
Y86 encoding table
byte: 0 1 2 3 4 5 6 7 8 9
halt 0 0
nop 1 0
rrmovq/cmovCC rA, rB 2 cc rA rB
irmovq V, rB 3 0 F rB
rmmovq rA, D(rB) 4 0 rA rB
mrmovq D(rB), rA 5 0 rA rB
OPq rA, rB 6 fn rA rB
jCC Dest 7 cc
call Dest 8 0
ret 9 0
pushq rA A 0 rA F
popq rA B 0 rA F
V
D
D
Dest
Dest
byte 0: bits 0–7least sig. 4 bits of byte 0: bits 0–4most sig. 4 bits of byte 0: bits 4–8most sig. 4 bits of byte 1: bits 12–16
least sig. 4 bits of byte 1: bits 8–12 43
nop/halt CPU
thePcInstr.Mem.
add 1
valP
Stat
MUX
STAT_AOKSTAT_HLTSTAT_INS
extract opcode
register pP {thePc : 64 = 0;
}p_thePc = P_thePc + 1;pc = P_thePc;Stat = [
i10bytes[4..8] == NOP : STAT_AOK;i10bytes[4..8] == HALT : STAT_HLT;1 : STAT_INS; (default case)
];
44
nop/halt CPU
thePcInstr.Mem.
add 1
valP
Stat
MUX
STAT_AOKSTAT_HLTSTAT_INS
extract opcode
register pP {thePc : 64 = 0;
}p_thePc = P_thePc + 1;pc = P_thePc;Stat = [
i10bytes[4..8] == NOP : STAT_AOK;i10bytes[4..8] == HALT : STAT_HLT;1 : STAT_INS; (default case)
];
44
nop/halt CPU
thePcInstr.Mem.
add 1
valP
Stat
MUX
STAT_AOKSTAT_HLTSTAT_INS
extract opcode
register pP {thePc : 64 = 0;
}p_thePc = P_thePc + 1;pc = P_thePc;Stat = [
i10bytes[4..8] == NOP : STAT_AOK;i10bytes[4..8] == HALT : STAT_HLT;1 : STAT_INS; (default case)
];
44
nop/jmp CPU
PC
Instr.Mem. split MUX
1 if jmp0 if nop
icode
dest
+ 1 (nop size)
wire valP : 64;wire icode : 4, dest: 64;register pP {
thePc : 64 = 0;}icode = i10bytes[4..8];dest = i10bytes[8..72];valP = [
icode == NOP : P_thePc + 1;icode == JXX : dest;
];p_thePc = valP;pc = P_thePc;
Stat = [(icode == NOP ||icode == JXX) : STAT_AOK;icode == HALT : STAT_HLT;1 : STAT_INS;
];45
nop/jmp CPU
PC
Instr.Mem. split MUX
1 if jmp0 if nop
icode
dest
+ 1 (nop size)
wire valP : 64;wire icode : 4, dest: 64;register pP {
thePc : 64 = 0;}icode = i10bytes[4..8];dest = i10bytes[8..72];valP = [
icode == NOP : P_thePc + 1;icode == JXX : dest;
];p_thePc = valP;pc = P_thePc;
Stat = [(icode == NOP ||icode == JXX) : STAT_AOK;icode == HALT : STAT_HLT;1 : STAT_INS;
];45
running nop/jmp/halt
nopjmp.ys:nopjmp C
B: jmp DC: jmp BD: nop
nophalt
…assemble with yas
46
nopjmp.yo
nopjmp.yo:0x000: 10 | nop0x001: 701300000000000000 | jmp C0x00a: 701c00000000000000 | B: jmp D0x013: 700a00000000000000 | C: jmp B0x01c: 10 | D: nop0x01d: 10 | nop0x01e: 00 | halt
47
nopjmp.yo
nopjmp.yo:0x000: 10 | nop0x001: 701300000000000000 | jmp C0x00a: 701c00000000000000 | B: jmp D0x013: 700a00000000000000 | C: jmp B0x01c: 10 | D: nop0x01d: 10 | nop0x01e: 00 | halt
47
running nopjmp.yo
$ ./hclrs nopjmp_cpu.hcl nopjmp.yo......+--------------------- (end of halted state) ---------------------------+Cycles run: 7
48
differences from book
wire not bool or int
book uses names like valC — not required!author’s environment limited adding new wires
implement your own ALU
49
differences from book
wire not bool or int
book uses names like valC — not required!author’s environment limited adding new wires
implement your own ALU
49
things in HCLRS
register banks
wires
things for our processor:Stat registerinstruction memorythe register filedata memory
50
things in HCLRS
register banks
wires
things for our processor:Stat registerinstruction memorythe register filedata memory
51
register banks
register xY {foo : width1 = defaultValue1;bar : width2 = defaultValue2;
}two letters: input (X) / Output (Y)
input signals: x_foo, x_baroutput signals: Y_foo, Y_bar
each value has width in bits
each value has initial value — mandatory
some other signals — stall, bubblelater in semester
52
register banks
register xY {foo : width1 = defaultValue1;bar : width2 = defaultValue2;
}two letters: input (X) / Output (Y)
input signals: x_foo, x_baroutput signals: Y_foo, Y_bar
each value has width in bits
each value has initial value — mandatory
some other signals — stall, bubblelater in semester
52
register banks
register xY {foo : width1 = defaultValue1;bar : width2 = defaultValue2;
}two letters: input (X) / Output (Y)
input signals: x_foo, x_baroutput signals: Y_foo, Y_bar
each value has width in bits
each value has initial value — mandatory
some other signals — stall, bubblelater in semester
52
things in HCLRS
register banks
wires
things for our processor:Stat registerinstruction memorythe register filedata memory
53
wires
wire wireName : wireWidth;wireName = ...;... = wireName;... = wireName;things that can accept/produce a signal
some created implicitly – e.g. by creating registersome builtin — supplied components (like instruction memory)
assignment — connecting wires
54
wires and order
wire icode : 4;wire valP : 64;register pP {
thePc : 64 = 0;}valP = P_thePC + 1;p_thePc = valP;pc = P_thePc;icode = i10bytes[4..8];Stat = [
icode == NOP : STAT_AOK;icode == HALT : STAT_HLT;1 : STAT_INS;
];
wire icode : 4;wire valP : 64;register pP {
thePc : 64 = 0;}p_thePc = valP;pc = P_thePc;Stat = [
icode == NOP : STAT_AOK;icode == HALT : STAT_HLT;1 : STAT_INS;
];valP = P_thePC + 1;icode = i10bytes[4..8];
order doesn’t matterwire is connected or not connected
55
wires and order
wire icode : 4;wire valP : 64;register pP {
thePc : 64 = 0;}valP = P_thePC + 1;p_thePc = valP;pc = P_thePc;icode = i10bytes[4..8];Stat = [
icode == NOP : STAT_AOK;icode == HALT : STAT_HLT;1 : STAT_INS;
];
wire icode : 4;wire valP : 64;register pP {
thePc : 64 = 0;}p_thePc = valP;pc = P_thePc;Stat = [
icode == NOP : STAT_AOK;icode == HALT : STAT_HLT;1 : STAT_INS;
];valP = P_thePC + 1;icode = i10bytes[4..8];
order doesn’t matterwire is connected or not connected
55
wires and order
wire icode : 4;wire valP : 64;register pP {
thePc : 64 = 0;}valP = P_thePC + 1;p_thePc = valP;pc = P_thePc;icode = i10bytes[4..8];Stat = [
icode == NOP : STAT_AOK;icode == HALT : STAT_HLT;1 : STAT_INS;
];
wire icode : 4;wire valP : 64;register pP {
thePc : 64 = 0;}p_thePc = valP;pc = P_thePc;Stat = [
icode == NOP : STAT_AOK;icode == HALT : STAT_HLT;1 : STAT_INS;
];valP = P_thePC + 1;icode = i10bytes[4..8];
order doesn’t matterwire is connected or not connected
55
wires and widthwire bigValueOne: 64;wire bigValueTwo: 64;wire smallValue: 32;bigValueOne = smallValue; /* ERROR */smallValue = bigValueTwo; /* ERROR */…
wire bigValueOne: 64;wire bigValueTwo: 64;wire smallValue: 32;
smallValue = bigValueTwo[0..32]; /* OKAY */
56
things in HCLRS
register banks
wires
things for our processor:Stat registerinstruction memorythe register filedata memory
57
Stat register
how do we stop the machine?
hard-wired mechanism — Stat register
possible values:STAT_AOK — keep goingSTAT_HLT — stop, normal shtdownSTAT_INS — invalid instruction…(and more errors)
must be set
determines if simulator keeps going
58
things in HCLRS
register banks
wires
things for our processor:Stat registerinstruction memorythe register filedata memory
59
program memory
input wire: pc
output wire: i10bytes80-bits wide (10 bytes)bit 0 — least significant bit of first byte(width of largest instruction)
what about less than 10 byte instructions?just don’t use the extra bits
60
program memory
input wire: pc
output wire: i10bytes80-bits wide (10 bytes)bit 0 — least significant bit of first byte(width of largest instruction)
what about less than 10 byte instructions?just don’t use the extra bits
60
things in HCLRS
register banks
wires
things for our processor:Stat registerinstruction memorythe register filedata memory
61
register file
four register number inputs (4-bit):sources: reg_srcA, reg_srcBdestinations: reg_dstM reg_dstE
no write or no read? register number 0xF (REG_NONE)
two register value inputs (64-bit):reg_inputE, reg_inputM
two register output values (64-bit):reg_outputA, reg_outputB
62
example using register file: add CPU
wire rA : 4, rB : 4, icode : 4, ifunc: 4;register pP {
thePC : 64 = 0;}/* PC update: */pc = P_thePC; p_thePC = P_thePC + 2;/* Decode: */icode = i10bytes[4..8]; ifunc = i10bytes[0..4];rA = i10bytes[12..16]; rB = i10bytes[8..12];reg_srcA = rA;reg_srcB = rB;/* Execute + Writeback: */reg_inputE = reg_outputA + reg_outputB;reg_dstE = rB;/* Status maintainence: */Stat = ...
63
example using register file: add CPU
wire rA : 4, rB : 4, icode : 4, ifunc: 4;register pP {
thePC : 64 = 0;}/* PC update: */pc = P_thePC; p_thePC = P_thePC + 2;/* Decode: */icode = i10bytes[4..8]; ifunc = i10bytes[0..4];rA = i10bytes[12..16]; rB = i10bytes[8..12];reg_srcA = rA;reg_srcB = rB;/* Execute + Writeback: */reg_inputE = reg_outputA + reg_outputB;reg_dstE = rB;/* Status maintainence: */Stat = ...
63
example using register file: add CPU
wire rA : 4, rB : 4, icode : 4, ifunc: 4;register pP {
thePC : 64 = 0;}/* PC update: */pc = P_thePC; p_thePC = P_thePC + 2;/* Decode: */icode = i10bytes[4..8]; ifunc = i10bytes[0..4];rA = i10bytes[12..16]; rB = i10bytes[8..12];reg_srcA = rA;reg_srcB = rB;/* Execute + Writeback: */reg_inputE = reg_outputA + reg_outputB;reg_dstE = rB;/* Status maintainence: */Stat = ...
63
register file pictureregister file
reg_srcA
reg_srcB
reg_dstM
reg_dstE
next R[dstM] = reg_inputM
next R[dstE] = reg_inputE
reg_outputA = R[srcA]reg_outputB = R[srcB]
from rAfrom rB
from rB
from sum
unset (default 0xF = none)
unused
64
register file pictureregister file
reg_srcA
reg_srcB
reg_dstM
reg_dstE
next R[dstM] = reg_inputM
next R[dstE] = reg_inputE
reg_outputA = R[srcA]reg_outputB = R[srcB]
from rAfrom rB
from rB
from sum
unset (default 0xF = none)
unused
64
register file pictureregister file
reg_srcA
reg_srcB
reg_dstM
reg_dstE
next R[dstM] = reg_inputM
next R[dstE] = reg_inputE
reg_outputA = R[srcA]reg_outputB = R[srcB]
from rAfrom rB
from rB
from sum
unset (default 0xF = none)
unused
64
register file pictureregister file
reg_srcA
reg_srcB
reg_dstM
reg_dstE
next R[dstM] = reg_inputM
next R[dstE] = reg_inputE
reg_outputA = R[srcA]reg_outputB = R[srcB]
from rAfrom rB
from rB
from sum
unset (default 0xF = none)
unused
64
things in HCLRS
register banks
wires
things for our processor:Stat registerinstruction memorythe register filedata memory
65
data memory
input address: mem_addr
input value: mem_input
output value: mem_output
read/write enable: mem_readbit, mem_writebit
66
reading from data memory
mem_addr = 0x12345678;mem_readbit = 1;mem_writebit = 0;... = mem_output;
mem_output has value in same cycle
67
reading from data memory
mem_addr = 0x12345678;mem_readbit = 1;mem_writebit = 0;... = mem_output;
mem_output has value in same cycle
67
reading from data memory
mem_addr = 0x12345678;mem_readbit = 1;mem_writebit = 0;... = mem_output;
mem_output has value in same cycle
67
writing to data memory
mem_addr = 0x12345678;mem_input = ...;mem_readbit = 0;mem_writebit = 1;
memory updated for next cycle
68
writing to data memory
mem_addr = 0x12345678;mem_input = ...;mem_readbit = 0;mem_writebit = 1;
memory updated for next cycle
68
writing to data memory
mem_addr = 0x12345678;mem_input = ...;mem_readbit = 0;mem_writebit = 1;
memory updated for next cycle
68
debugging mode
+------------------- between cycles 0 and 1 ----------------------+| RAX: 0 RCX: 0 RDX: 0 || RBX: 0 RSP: 0 RBP: 0 || RSI: 0 RDI: 0 R8: 0 || R9: 0 R10: 0 R11: 0 || R12: 0 R13: 0 R14: 0 || register pP(N) thePc=0000000000000000 || used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f || 0x0000000_: 10 70 13 00 00 00 00 00 00 00 70 1c 00 00 00 00 || 0x0000001_: 00 00 00 70 0a 00 00 00 00 00 00 00 10 10 00 |+-----------------------------------------------------------------------+pc set to 0x0i10bytes set to 0x137010 (reading 10 bytes from memory at pc=0x0)pc = 0x0; loaded [10 : nop]icode set to 0x1dest set to 0x1370Stat set to 0x1valP set to 0x1p_thePc set to 0x1.------------------- between cycles 1 and 2 ----------------------+...
69
debugging mode
+------------------- between cycles 0 and 1 ----------------------+| RAX: 0 RCX: 0 RDX: 0 || RBX: 0 RSP: 0 RBP: 0 || RSI: 0 RDI: 0 R8: 0 || R9: 0 R10: 0 R11: 0 || R12: 0 R13: 0 R14: 0 || register pP(N) thePc=0000000000000000 || used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f || 0x0000000_: 10 70 13 00 00 00 00 00 00 00 70 1c 00 00 00 00 || 0x0000001_: 00 00 00 70 0a 00 00 00 00 00 00 00 10 10 00 |+-----------------------------------------------------------------------+pc set to 0x0i10bytes set to 0x137010 (reading 10 bytes from memory at pc=0x0)pc = 0x0; loaded [10 : nop]icode set to 0x1dest set to 0x1370Stat set to 0x1valP set to 0x1p_thePc set to 0x1.------------------- between cycles 1 and 2 ----------------------+...
69
interactive + debugging mode
$ ./nopjmp_cpu.exe -i -d nopjmp.yo+------------------- between cycles 0 and 1 ----------------------+| RAX: 0 RCX: 0 RDX: 0 || RBX: 0 RSP: 0 RBP: 0 || RSI: 0 RDI: 0 R8: 0 || R9: 0 R10: 0 R11: 0 || R12: 0 R13: 0 R14: 0 || register pP(N) thePc=0000000000000000 || used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f || 0x0000000_: 10 70 13 00 00 00 00 00 00 00 70 1c 00 00 00 00 || 0x0000001_: 00 00 00 70 0a 00 00 00 00 00 00 00 10 10 00 |+-----------------------------------------------------------------------+(press enter to continue)set pc to 0x0pc = 0x0; loaded [10 : nop]set icode to 0x1set valP to 0x1set p_thePc to 0x1set Stat to 0x1+------------------- between cycles 1 and 2 ----------------------+...
70
interactive + debugging mode
$ ./nopjmp_cpu.exe -i -d nopjmp.yo+------------------- between cycles 0 and 1 ----------------------+| RAX: 0 RCX: 0 RDX: 0 || RBX: 0 RSP: 0 RBP: 0 || RSI: 0 RDI: 0 R8: 0 || R9: 0 R10: 0 R11: 0 || R12: 0 R13: 0 R14: 0 || register pP(N) thePc=0000000000000000 || used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f || 0x0000000_: 10 70 13 00 00 00 00 00 00 00 70 1c 00 00 00 00 || 0x0000001_: 00 00 00 70 0a 00 00 00 00 00 00 00 10 10 00 |+-----------------------------------------------------------------------+(press enter to continue)set pc to 0x0pc = 0x0; loaded [10 : nop]set icode to 0x1set valP to 0x1set p_thePc to 0x1set Stat to 0x1+------------------- between cycles 1 and 2 ----------------------+...
70
quiet mode
$ ./hclrs nopjmp_cpu.hcl -q nopjmp.yo+----------------------- halted in state: ------------------------------+| RAX: 0 RCX: 0 RDX: 0 || RBX: 0 RSP: 0 RBP: 0 || RSI: 0 RDI: 0 R8: 0 || R9: 0 R10: 0 R11: 0 || R12: 0 R13: 0 R14: 0 || register pP(N) { thePc=0000000000000000 } || used memory: _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _a _b _c _d _e _f || 0x0000000_: 10 70 13 00 00 00 00 00 00 00 70 1c 00 00 00 00 || 0x0000001_: 00 00 00 70 0a 00 00 00 00 00 00 00 10 10 00 |+--------------------- (end of halted state) ---------------------------+Cycles run: 7
71
HCLRS summary
declare/assign values to wires
MUXes with[ test1: value1; test2: value2 ]
register banks with register iO:next value on i_name; current value on O_name
fixed functionalityregister file (15 registers; 2 read + 2 write)memories (data + instruction)Stat register (start/stop/error)
72
exercise: implementing ALU?
wire aluOp : 2,aluValueA : 64,aluValueB : 64,aluResult : 64;
const ALU_ADD = 0b00,ALU_SUB = 0b01,ALU_AND = 0b10,ALU_XOR = 0b11;
aluResult = [aluOp == ALU_ADD : aluValueA + aluValueB;aluOp == ALU_SUB : aluValueA - aluValueB;aluOp == ALU_AND : aluValueA & aluValueB;aluOp == ALU_XOR : aluValueA ^ aluValueB
];
73
on design choices
textbook choices:memory always goes to ‘M’ port of register fileRSP +/- 8 uses normal ALU, not seperate adders…
do you have to do this? no
you: single cycle/instruction; use supplied register/memory
other logic: make it function correctly
74
comparing to yis
$ ./hclrs nopjmp_cpu.hcl nopjmp.yo......+--------------------- (end of halted state) ---------------------------+Cycles run: 7
$ ./tools/yis nopjmp.yoStopped in 7 steps at PC = 0x1e. Status 'HLT', CC Z=1 S=0 O=0Changes to registers:
Changes to memory:
75
circuit: setting MUXes
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+
8
9
PC+2
M[PC+1]
rA=8
rB=9
R[8]
R[9]
aluA + aluB
M[PC+2]
add
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select when running addq %r8, %r9?
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for rmmovq?MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for call?
76
circuit: setting MUXes
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+
8
9
PC+2
M[PC+1]
rA=8
rB=9
R[8]
R[9]
aluA + aluB
M[PC+2]
add
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select when running addq %r8, %r9?
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for rmmovq?MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for call?
76
circuit: setting MUXes
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+
8
9
PC+2
M[PC+1]
rA=8
rB=9
R[8]
R[9]
aluA + aluB
M[PC+2]
add
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select when running addq %r8, %r9?
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for rmmovq?MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for call?
76
circuit: setting MUXes
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+
8
9
PC+2
M[PC+1]
rA=8
rB=9
R[8]
R[9]
aluA + aluB
M[PC+2]
add
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select when running addq %r8, %r9?MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for rmmovq?MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for call?
76
circuit: setting MUXes
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+
8
9
PC+2
M[PC+1]
rA=8
rB=9
R[8]
R[9]
aluA + aluB
M[PC+2]
add
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select when running addq %r8, %r9?
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for rmmovq?
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for call?
76
circuit: setting MUXes
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+
8
9
PC+2
M[PC+1]
rA=8
rB=9
R[8]
R[9]
aluA + aluB
M[PC+2]
add
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select when running addq %r8, %r9?
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for rmmovq?
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for call?
76
circuit: setting MUXes
PC
Instr.Mem.
register filesrcA
srcB
R[srcA]R[srcB]
dstE
next R[dstE]
dstM
next R[dstM]
DataMem.
ZF/SF
Data in
Addr inData out
valC
0xF
0xF%rsp
%rsp
rA
rB
ALUaluA
aluBvalE8
0
add/subxor/and(functionof instr.)
write?
functionof opcode
PC + 10
instr.length+
8
9
PC+2
M[PC+1]
rA=8
rB=9
R[8]
R[9]
aluA + aluB
M[PC+2]
add
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select when running addq %r8, %r9?MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for rmmovq?
MUXes — PC, dstM, dstE, aluA, aluB, dmemInExercise: what do they select for call?
76