EE 357 Unit 21 (USC Viterbi)
© Mark Redekopp, All rights reserved
Where EE 357 Fits
• CS 101, 102, 105, 201: Programming with high-level languages (HLLs) like C / C++ / Java
• EE 101, 201: Digital hardware (registers, adders, muxes)
• EE 357: Computer organization and architecture
– HW/SW System Perspective
– Topics
• HW/SW interface
• System Software
• Assembly Language
• Computer Architecture
[Figure: abstraction layers spanning SW and HW: Applications; Libraries; OS; C / C++ / Java; Assembly / Machine Code; Processor / Memory / I/O; Functional Units (Registers, Adders, Muxes); Logic Gates; Transistors; Voltage / Currents]
Where Computer Architecture Fits
[Figure: Computer Architecture in relation to neighboring areas]
• Software Development (Parallel Programming, Memory Hierarchy effects)
• Operating Systems + Compilers
• Embedded Devices + Applications
• IC/VLSI/Digital Design
• New Technology
EE 357 in Context
[Figure: architecture course sequence EE 357, EE 457, EE 557, EE 653 and related areas]
• Software Development (CS 303, EE 451?)
• Operating Systems + Compilers (CS 402 / CS 410)
• IC/Digital Design (EE 477L, EE 438L)
• New Technology (EE 337L)
• Embedded Devices + Applications (EE 459L, EE 454L, EE 579, EE 483, EE 434L, BME 302, BME 405L)
Architecture
• Multicore
– Power, multithreaded/multicore, parallel programming
• Reliability
– Smaller transistors lead to reliability issues (bits can be flipped
accidentally, transistors can “break”, etc.)
– How do we add reliability into the architecture?
• Mobile and network-centric
– Low power, software/hardware decomposition
• Architectural Concepts
– Make the common case fast (Amdahl’s law)
– Concept of caching (save your work and reuse it next time you
need it) applies to almost anything (SW programs can “cache”
their results, too), not just “cache memory”
– Often easier to tackle throughput rather than latency
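The point that software can "cache" its own results can be illustrated with memoization. The sketch below is only an illustration: `slow_square` is a hypothetical stand-in for an expensive computation, and `call_count` exists just to show that the second call never reaches the function body.

```python
from functools import lru_cache

call_count = 0  # how many times the function body actually runs


@lru_cache(maxsize=None)
def slow_square(n):
    """Hypothetical stand-in for an expensive computation."""
    global call_count
    call_count += 1
    return n * n


first = slow_square(12)   # computed and stored
second = slow_square(12)  # served from the software cache
```

As with cache memory, the saving only appears when the same input is requested again.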
Digital Design & VLSI
• Build the structures and implement the algorithms that
architects, signal processing, and communications
engineers develop given power, area, and performance
constraints
• Focus on SoC (System-on-chip)
– Combine processor core + IP cores => Embedded System
– http://en.wikipedia.org/wiki/List_of_semiconductor_IP_core_vendors
– http://www.design-reuse.com
• Learn Verilog and/or VHDL
• Understand advantages of FPGA vs. custom chips
• Learn to program (likely in C/C++ or maybe Python)
• Take EE 477L (VLSI design) as a technical elective
Programming
• Understand the hardware architecture you are working on
and how to take advantage of it
– Effects of Cache
• Sequential access is best
• Static allocation (fewer pointer based data structures) is often better
• Cluster accesses to the same data; if you don’t have to, avoid using it,
doing a lot of other work, and only then reusing it
– Thread Level Parallelism (Create parallel tasks)
• OpenMP, MPI, Native Threads
– Data Level Parallelism (SIMD)
• Compiler options, intrinsics
• Understand the cost of parallelization
– Amdahl’s law
– Cost of communications/synchronization
Embedded Systems
• Complex systems which integrate and
interface to many I/O devices can be
cheaply and efficiently made with
programmable microcontrollers
• Identify your I/O and computational needs
and select an appropriate
microcontroller/microprocessor
Microcontrollers
• Freescale (www.freescale.com)
• Atmel – AVR (www.atmel.com)
• PIC (www.microchip.com)
• ARM (www.arm.com)
Embedded Systems Devices
• Parallel I/O
– Character LCD’s
– LED’s, Switches, Pushbuttons
• Serial I/O
– LCD’s (IIC)
– GPS (RS-232)
– USB
– Bluetooth (direct or RS-232)
– Wireless (direct or RS-232)
– Real Time Clocks (IIC)
– Servo Motors
• Analog Inputs
– Pressure, temperature,
biometric sources
– Touch screens/sensors
• Web sites
– Digikey.com
– Jameco.com
– Sparkfun.com
Final Jeopardy
Binary Brainteasers | Performance Puzzles | Memory Madness | Processor Predicaments | Programming Pickles
100 | 100 | 100 | 100 | 100
200 | 200 | 200 | 200 | 200
300 | 300 | 300 | 300 | 300
400 | 400 | 400 | 400 | 400
500 | 500 | 500 | 500 | 500
Binary Brainteaser 100
• Given the binary string “10001101”, what
would its decimal equivalent be assuming
a 2’s complement representation?
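One way to check a 2's complement conversion like this is to read the bits as unsigned and subtract 2^n when the sign bit is set. The helper name `twos_complement_value` below is made up for illustration.

```python
def twos_complement_value(bits):
    """Interpret a string of '0'/'1' characters as a 2's complement integer."""
    width = len(bits)
    unsigned = int(bits, 2)
    if bits[0] == "1":
        # The MSB carries weight -2^(width-1), i.e. unsigned value minus 2^width.
        return unsigned - (1 << width)
    return unsigned


value = twos_complement_value("10001101")  # 141 - 256
```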
Binary Brainteaser 200
• Assuming the 12-bit IEEE shortened FP
format, what is the decimal equivalent of
the following number?
1 10010 100010
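The decoding steps can be sketched as follows. This is a sketch under assumptions: the slide does not restate the format, so the field layout (1 sign bit, 5 exponent bits, 6 fraction bits) is read off the spacing of the bit string, and the excess-15 bias is an assumption. Only normalized numbers are handled; reserved exponent codes are ignored.

```python
def decode_short_fp(bits, exp_bits=5, frac_bits=6, bias=15):
    """Decode a normalized value in a short IEEE-style format.

    The field widths and the excess-15 bias are assumptions, not taken
    from the slide.
    """
    bits = bits.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:1 + exp_bits], 2)
    fraction = int(bits[1 + exp_bits:], 2)
    significand = 1 + fraction / (1 << frac_bits)  # implicit leading 1
    return sign * significand * 2.0 ** (exponent - bias)


value = decode_short_fp("1 10010 100010")
```

Under those assumptions the exponent field is 18, so the scale is 2^3, and the significand is 1.100010 in binary.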
Binary Brainteaser 300
• Under what conditions does overflow
occur in signed arithmetic
(addition/subtraction)?
Binary Brainteaser 400
• Under what conditions does overflow
occur in unsigned arithmetic
(addition/subtraction)?
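The signed and unsigned overflow questions can be explored with a small model of an n-bit adder. The sketch below covers addition only and uses a hypothetical helper `add_with_flags`; for subtraction, the unsigned condition would be a borrow instead of a carry.

```python
def add_with_flags(a, b, width=8):
    """Add two width-bit patterns; return (result, carry_out, signed_overflow)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    carry_out = total > mask  # unsigned overflow: sum does not fit in width bits
    sign_bit = 1 << (width - 1)
    # Signed overflow: both operands share a sign and the result's sign differs.
    signed_overflow = ((a ^ result) & (b ^ result) & sign_bit) != 0
    return result, carry_out, signed_overflow


pos_plus_pos = add_with_flags(0x7F, 0x01)  # 127 + 1 as signed values
max_plus_one = add_with_flags(0xFF, 0x01)  # 255 + 1 as unsigned values
```

The two cases show that the conditions are independent: the first overflows only when read as signed, the second only when read as unsigned.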
Binary Brainteaser 500
• Given the following normalized FP
number, what would the result be after
using the round-to-nearest method?
+1.011011 100 × 2^5
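Rounding a significand with extra low-order bits can be sketched as below. The slide only says "round-to-nearest"; breaking an exact tie toward an even last bit (the IEEE 754 default) is an assumption here. The helper `round_nearest_even` is hypothetical, takes the kept bits without the binary point, and does not renormalize if the increment carries out of the top bit.

```python
def round_nearest_even(kept_bits, extra_bits):
    """Round kept_bits using extra_bits; exact ties go to an even last bit."""
    kept = int(kept_bits, 2)
    extra = int(extra_bits, 2)
    half = 1 << (len(extra_bits) - 1)  # the pattern 100...0 is exactly halfway
    if extra > half or (extra == half and kept & 1):
        kept += 1
    return format(kept, "0{}b".format(len(kept_bits)))


# 1.011011 with extra bits 100: exactly halfway, and the kept LSB is odd.
rounded = round_nearest_even("1011011", "100")
```

Under that tie rule the example rounds up to 1.011100 × 2^5, because the kept value ends in 1.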
Performance Puzzle 100
• What is the best metric for performance
measurement (i.e. not subject to
manipulation or misleading results)?
Performance Puzzle 200
• What are the three basic components of
the performance equation learned in
class?
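Assuming the equation meant here is the commonly taught form, execution time = instruction count × CPI × clock cycle time, a short calculation shows how the three factors combine. The function name and the sample numbers are illustrative only, and the clock is given as a rate (1 / cycle time).

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds = instructions * cycles/instruction / cycles/second."""
    return instruction_count * cpi / clock_rate_hz


# Illustrative numbers: 1 billion instructions, CPI of 2, 1 GHz clock.
seconds = execution_time(1_000_000_000, 2, 1_000_000_000)
```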
Performance Puzzle 300
• Which of the three components of the
performance equation would be affected
by the choice of compiler?
Performance Puzzle 500
• Using Amdahl’s law, if 50% of the
instructions of a program can be sped up
by a factor of 2, will the speedup of the
program be 1 / 0.75 = 4/3??
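The arithmetic in the question can be checked with a direct implementation of Amdahl's law. One caveat: the law is stated in terms of the fraction of execution time that benefits, so the sketch below gives 4/3 only if the sped-up instructions also account for 50% of the original execution time. The function name is illustrative.

```python
def amdahl_speedup(time_fraction_enhanced, enhancement_factor):
    """Overall speedup when a fraction of execution time is sped up by a factor."""
    unchanged = 1.0 - time_fraction_enhanced
    enhanced = time_fraction_enhanced / enhancement_factor
    return 1.0 / (unchanged + enhanced)


speedup = amdahl_speedup(0.5, 2)  # 1 / (0.5 + 0.25)
```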
Memory Madness 100
• SDRAM will allow consecutive
(columns/rows/banks) to be read/written in
bursts?
Memory Madness 200
• In a 4-way set associative cache with 512
total blocks, how many bits will be used to
index the set (i.e. the set field of the
address breakdown)?
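The set field width follows from the number of sets, which is the total block count divided by the associativity. A minimal sketch, assuming power-of-two sizes and using an illustrative function name:

```python
def set_index_bits(total_blocks, ways):
    """Number of address bits needed to select a set."""
    sets = total_blocks // ways
    assert sets > 0 and sets & (sets - 1) == 0, "expects a power-of-two set count"
    return sets.bit_length() - 1  # log2(sets)


bits = set_index_bits(512, 4)  # 512 / 4 = 128 sets
```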
Memory Madness 300
• DRAM (may / will not) lose its content
even though power is continuously
provided and in general is (faster / slower)
to access than SRAM
Memory Madness 400
• In general, caches closer to the processor
core are (smaller / larger) so that they can
be faster. In addition, they usually have a
(lower / higher) degree of associativity?
Memory Madness 500
• In a 4-way set-associative cache with 128
sets, the worst cache performance will
occur when all accesses map to different
blocks in (the same / different) set(s) and
the earliest an eviction can occur is on the
(1st/ 4th/ 5th/ 128th/ 129th) block access.
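The eviction question can be explored with a tiny simulation. The sketch below assumes an initially empty cache and LRU replacement (the slide does not name a policy), and identifies blocks by block number, with the set chosen as block number mod number of sets. All names are illustrative.

```python
from collections import OrderedDict


def first_eviction_access(block_numbers, num_sets=128, ways=4):
    """Return the 1-based access number that causes the first eviction, or None."""
    sets = [OrderedDict() for _ in range(num_sets)]
    for i, block in enumerate(block_numbers, start=1):
        lines = sets[block % num_sets]
        tag = block // num_sets
        if tag in lines:
            lines.move_to_end(tag)  # hit: mark as most recently used
            continue
        if len(lines) == ways:
            return i                # miss in a full set: eviction needed
        lines[tag] = True           # miss with a free way: just fill it
    return None


same_set = first_eviction_access([k * 128 for k in range(10)])  # all map to set 0
spread_out = first_eviction_access(range(129))                  # consecutive blocks
```

With every access landing in one set, only the 4 ways of that set are usable, so the rest of the cache sits idle.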
Processor Predicaments 100
• What is the ideal throughput (CPI or IPC)
of a pipelined CPU?
Processor Predicaments 200
• Name the three kinds of hazards that
prevent the pipeline from being kept full?
Processor Predicaments 300
• What method(s) can be used to solve the
following Read-After-Write data
hazards/dependencies or at least reduce
the associated stall penalty?
Processor Predicaments 400
• Temporary registers are needed in the
(single- / multi-) cycle CPU. An example
of a temporary register is the (PC / IR).
Processor Predicaments 500
• The (single- / multi-) cycle CPU
architecture implies variable CPI’s for
different instruction classes and the clock
cycle time is set by the longest (state /
instruction) delay.
Programming Pickles 100
• When checking the status of an I/O device
one can rely on interrupts or __________?
Programming Pickles 200
• Calling a subroutine requires using the
(bsr / bra) instruction and will result in the
return address being stored (on the stack /
in A7)?
Programming Pickles 300
• The stack frame of a subroutine includes
space for three sections of data, what are
they?
Programming Pickles 400
• System calls/TRAPS, interrupts, and error
conditions cause breaks in normal
program execution. What is the name we
give to these events?
Programming Pickles 500
• What is the name we use for software
routines associated with an interrupt or
other error event?
Cache Operation Example
• Perform address breakdown and apply
address trace
• 2-Way Set-Assoc, N=8, B=8 words
• Address Trace
– R: 0x3c0
– W: 0x048
– R: 0x3d4
– W: 0xb50
• Operations
– Hit
– Fetch block XX
– Evict block XX (w/ or w/o WB)
– Final WB of block XX

Processor Access | Cache Operation
R: 0x3c0 |
W: 0x048 |
R: 0x3d4 |
W: 0xb50 |
Done! |

Address | Tag | Set | Word | Unused
0x3c0 | | | |
0x048 | | | |
0x3d4 | | | |
0xb50 | | | |
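The address breakdown step can be sketched in code. Several things are assumed here because the slide does not spell them out: N=8 is taken to be the total number of blocks (so 4 sets with 2 ways), words are 4 bytes (so the 2 low-order bits are the "Unused" field), and fields are ordered Tag, Set, Word, Unused from most to least significant. The function name is illustrative.

```python
def break_down(addr, num_blocks=8, ways=2, words_per_block=8, bytes_per_word=4):
    """Split a byte address into (tag, set, word, unused) fields.

    Assumes power-of-two sizes; the defaults are one reading of the slide.
    """
    unused_bits = bytes_per_word.bit_length() - 1     # byte-within-word bits
    word_bits = words_per_block.bit_length() - 1      # word-within-block bits
    set_bits = (num_blocks // ways).bit_length() - 1  # set index bits
    unused = addr & ((1 << unused_bits) - 1)
    word = (addr >> unused_bits) & ((1 << word_bits) - 1)
    set_index = (addr >> (unused_bits + word_bits)) & ((1 << set_bits) - 1)
    tag = addr >> (unused_bits + word_bits + set_bits)
    return tag, set_index, word, unused


trace = [0x3C0, 0x048, 0x3D4, 0xB50]
fields = {hex(a): break_down(a) for a in trace}
```

Under these assumptions all four addresses fall in the same set, which is what makes the trace exercise the replacement behavior.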
DBNZ on Multicycle CPU
• Many looping operations require decrementing a
counter and branching if the new value is not zero
• Many instruction sets include an instruction that
we will term DBNZ (Decrement and Branch if
Not Zero)
• Format: DBNZ $rs, disp
• Operation:
– $rs = $rs – 1
– if $rs /= 0, branch to PC+4+disp
• Instruction fields:
Opcode (6 bits) | Rs (5 bits) | Rt (5 bits, copy of Rs) | Displacement (16 bits)
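The DBNZ operation described on this slide can be modeled behaviorally as below. This is a sketch of the instruction's effect only, not of the multicycle datapath: registers are a plain dictionary, values wrap at 32 bits, and `disp` is used exactly as written in the slide's formula (PC+4+disp), with no extra shifting.

```python
def dbnz(regs, rs, pc, disp):
    """Decrement regs[rs]; return the next PC (branch if the new value is nonzero)."""
    regs[rs] = (regs[rs] - 1) & 0xFFFFFFFF  # 32-bit wraparound
    if regs[rs] != 0:
        return pc + 4 + disp  # taken: loop again
    return pc + 4             # not taken: fall through


regs = {"$t0": 2}
next_pc_first = dbnz(regs, "$t0", 0x100, -8)   # counter becomes 1, branch taken
next_pc_second = dbnz(regs, "$t0", 0x100, -8)  # counter becomes 0, fall through
```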
Modified Datapath for DBNZ
[Figure: multicycle datapath. Components: PC, Memory (Addr., Read Data, Write Data), Instruc. Reg., Register File (Read Reg. 1 #, Read Reg. 2 #, Write Reg. #, Write Data, Read data 1, Read data 2), ALU (Res., Zero), ALU control, Sign Extend, Sh. Left 2, Target Reg. Control signals: MemRead, MemWrite, IRWrite, RegWrite, PCWrite, ALUSelA, ALUSelB, PCSource, TargetWrite, IorD, RegDst, MemtoReg]
Multi-cycle CPU FSM w/ DBNZ
[Figure: state diagram; Reset enters state 0]
• State 0, Instruc. Fetch: MemRead, ALUSelA=0, IorD=0, IRWrite, ALUSelB=01, ALUOp=00, PCSource=00, PCWrite
• State 1, Instruc. Decode + Reg. Fetch: ALUSelA=0, ALUSelB=11, ALUOp=00, TargetWrite
• State 8, Branch Completion (Op=‘BEQ’): ALUSelA=1, ALUSelB=00, ALUOp=01, PCWriteCond, PCSource=01
• State 9, Jump Completion (Op=‘JMP’): PCWrite, PCSource=10
Addressing Modes Review
Address    Code                          Result
20003004         .data
           PTR:  .long 0x2000300c
                 .long 0x20003010
           DAT:  .long -1,1
           RES:  .space 4                RES = ___________
                 .text
20000500   MAIN: MOVEA.L #DAT,A1         A1 = ___________
20000506         MOVEA.L -4(A1),A0       A0 = ___________
2000050a         MOVE.L (A0),D5          D5 = ___________
2000050c         MOVE.L -(A0),D6         D6 = ___________   A0 = ___________
2000050e         ADD.L D5,D6             D6 = ___________   N,Z,V,C = _________
20000510         OR.L D5,D6              D6 = ___________
20000512         LSL.L #1,D6             D6 = ___________
20000514         MOVE.L D6,RES