
© Mark Redekopp, All rights reserved

EE 357 Unit 21

Final Review


A LOOK BACK

EE 357 in review…


Where EE 357 Fits

• CS 101, 102, 105, 201 – Programming with high-level languages (HLLs) like C / C++ / Java

• EE 101, 201 – Digital hardware (registers, adders, muxes)

• EE 357 – Computer organization and architecture

• HW/SW System Perspective

– Topics

• HW/SW interface

• System Software

• Assembly Language

• Computer Architecture

[Figure: abstraction stack spanning software (SW) and hardware (HW), with layers labeled Applications, Libraries, OS, C / C++ / Java, Assembly / Machine Code, Processor / Memory / I/O, Functional Units (Registers, Adders, Muxes), Logic Gates, Transistors, Voltage / Currents]

Where Computer Architecture Fits

[Figure: Computer Architecture shown in relation to the following areas]

• Software Development (Parallel Programming, Memory Hierarchy effects)

• Operating Systems + Compilers

• Embedded Devices + Applications

• IC/VLSI/Digital Design

• New Technology

EE 357 in Context

[Figure: course sequence EE 357, EE 457, EE 557, EE 653, shown in relation to the following areas and courses]

• Software Development (CS 303, EE 451?)

• Operating Systems + Compilers (CS 402 / CS 410)

• IC/Digital Design (EE 477L, EE 438L)

• New Technology (EE 337L)

• Embedded Devices + Applications (EE 459L, EE 454L, EE 579, EE 483, EE 434L, BME 302, BME 405L)

Architecture

• Multicore

– Power, multithreaded/multicore, parallel programming

• Reliability

– Smaller transistors lead to reliability issues (bits can be flipped accidentally, transistors can “break”, etc.)

– How do we add reliability into the architecture?

• Mobile and network-centric

– Low power, software/hardware decomposition

• Architectural Concepts

– Make the common case fast (Amdahl’s law)

– Concept of caching (save your work and reuse it next time you need it) applies to almost anything (SW programs can “cache” their results, too), not just “cache memory”

– Often easier to tackle throughput rather than latency

Digital Design & VLSI

• Build the structures and implement the algorithms that architects, signal processing, and communications engineers develop given power, area, and performance constraints

• Focus on SoC (System-on-chip)

– Combine processor core + IP cores => Embedded System

– http://en.wikipedia.org/wiki/List_of_semiconductor_IP_core_vendors

– http://www.design-reuse.com

• Learn Verilog and/or VHDL

• Understand advantages of FPGA vs. custom chips

• Learn to program (likely in C/C++ or maybe Python)

• Take EE 477L (VLSI design) as a technical elective

Programming

• Understand the hardware architecture you are working on and how to take advantage of it

– Effects of Cache

• Sequential access is best

• Static allocation (fewer pointer-based data structures) is often better

• Cluster accesses to the same data: if you don’t have to, don’t use it, do a lot of other work, and then reuse it

– Thread Level Parallelism (Create parallel tasks)

• OpenMP, MPI, Native Threads

– Data Level Parallelism (SIMD)

• Compiler options, intrinsics

• Understand the cost of parallelization

– Amdahl’s law

– Cost of communications/synchronization

Embedded Systems

• Complex systems which integrate and interface to many I/O devices can be cheaply and efficiently made with programmable microcontrollers

• Identify your I/O and computational needs and select an appropriate microcontroller/microprocessor


Microcontrollers

• Freescale (www.freescale.com)

• Atmel – AVR (www.atmel.com)

• PIC (www.microchip.com)

• ARM (www.arm.com)


Embedded Systems Devices

• Parallel I/O

– Character LCD’s

– LED’s, Switches, Pushbuttons

• Serial I/O

– LCD’s (IIC)

– GPS (RS-232)

– USB

– Bluetooth (direct or RS-232)

– Wireless (direct or RS-232)

– Real Time Clocks (IIC)

– Servo Motors

• Analog Inputs

– Pressure, temperature, biometric sources

– Touch screens/sensors

• Web sites

– Digikey.com

– Jameco.com

– Sparkfun.com


REVIEW FOR FINAL

Final Jeopardy

Binary Brainteasers | Performance Puzzles | Memory Madness | Processor Predicaments | Programming Pickles
100 | 100 | 100 | 100 | 100
200 | 200 | 200 | 200 | 200
300 | 300 | 300 | 300 | 300
400 | 400 | 400 | 400 | 400
500 | 500 | 500 | 500 | 500

Binary Brainteaser 100

• Given the binary string “10001101”, what would its decimal equivalent be assuming a 2’s complement representation?
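One way to check an answer to this kind of question: in an n-bit 2's complement number the most significant bit carries a weight of -2^(n-1). The sketch below applies that rule; the helper name `twos_complement_value` is illustrative and not from the slides.

```python
def twos_complement_value(bits: str) -> int:
    """Interpret a string of '0'/'1' characters as a 2's complement integer."""
    unsigned = int(bits, 2)
    # A set MSB means the pattern represents unsigned - 2^n.
    if bits[0] == "1":
        return unsigned - (1 << len(bits))
    return unsigned


print(twos_complement_value("10001101"))  # -115
```

The unsigned reading of the pattern is 141, and 141 - 256 gives -115.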

Binary Brainteaser 200

• Assuming the 12-bit IEEE shortened FP format, what is the decimal equivalent of the following number?

1 10010 100010
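A minimal decoding sketch, under assumptions the slide does not spell out: the 12 bits are taken as 1 sign bit, 5 exponent bits, and 6 fraction bits (matching the grouping shown), the exponent is assumed to use an IEEE-style excess-15 bias, and the value is assumed normalized with a hidden leading 1. Special exponent patterns (all 0s, all 1s) are not handled. The name `decode_fp12` is hypothetical.

```python
def decode_fp12(bits: str, exp_bits: int = 5, bias: int = 15) -> float:
    """Decode sign | exponent | fraction for a normalized value (assumed layout)."""
    bits = bits.replace(" ", "")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:1 + exp_bits], 2) - bias      # remove the assumed bias
    frac_field = bits[1 + exp_bits:]
    # Hidden leading 1 plus the fraction bits scaled to [0, 1).
    significand = 1.0 + int(frac_field, 2) / (1 << len(frac_field))
    return sign * significand * 2.0 ** exponent


print(decode_fp12("1 10010 100010"))  # -12.25
```

Under these assumptions the exponent field 10010 (18) becomes 18 - 15 = 3, so the value is -1.100010 × 2^3 = -1100.010 in binary.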

Binary Brainteaser 300

• Under what conditions does overflow occur in signed arithmetic (addition/subtraction)?

Binary Brainteaser 400

• Under what conditions does overflow occur in unsigned arithmetic (addition/subtraction)?
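The two overflow questions can be explored with a small sketch of n-bit addition. It reports the carry out of the top bit (the unsigned overflow condition for addition) and the signed overflow condition (both operands share a sign and the result's sign differs). For unsigned subtraction the analogous condition is a borrow, which this sketch does not model. The function name and the 8-bit default width are illustrative choices.

```python
def add_with_flags(a: int, b: int, n: int = 8):
    """Add two n-bit patterns; return (result, carry_out, signed_overflow)."""
    mask = (1 << n) - 1
    total = (a & mask) + (b & mask)
    result = total & mask
    carry_out = total > mask                 # unsigned overflow on addition
    sign = 1 << (n - 1)
    # Signed overflow: operands have the same sign, result has the other sign.
    signed_overflow = ((a ^ result) & (b ^ result) & sign) != 0
    return result, carry_out, signed_overflow


print(add_with_flags(0x7F, 0x01))  # (128, False, True)
print(add_with_flags(0xFF, 0x01))  # (0, True, False)
```

The first call is +127 + 1 in signed terms (overflow, no carry); the second is 255 + 1 in unsigned terms (carry, no signed overflow since it is -1 + 1).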

Binary Brainteaser 500

• Given the following normalized FP number, what would the result be after using the round-to-nearest method?

+1.011011 100 × 2^5
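A sketch of round-to-nearest with ties going to the even value, reading the operand as a 6-bit fraction 011011 followed by three extra bits 100 that must be rounded away. That split, and the tie-to-even rule, are assumptions about what the slide intends. The helper name is hypothetical.

```python
def round_to_nearest_even(fraction: str, extra: str) -> str:
    """Round a 1.fraction significand using the extra low-order bits.

    Returns the significand as text. If rounding carries out of the
    integer bit (e.g. '10.000000'), the caller must renormalize.
    """
    kept = int("1" + fraction, 2)        # significand including the leading 1
    half = 1 << (len(extra) - 1)         # extra-bit value that is exactly 1/2 LSB
    rest = int(extra, 2)
    if rest > half or (rest == half and kept & 1):
        kept += 1                        # round up; a tie rounds to an even LSB
    digits = format(kept, "b")
    width = len(fraction)
    return digits[:-width] + "." + digits[-width:]


print(round_to_nearest_even("011011", "100"))  # 1.011100
```

Here the extra bits 100 are exactly halfway and the kept LSB is 1, so the value rounds up to 1.011100; the exponent is unchanged because the significand stays below 2.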

Performance Puzzle 100

• What is the best metric for performance measurement (i.e. not subject to manipulation or misleading results)?

Performance Puzzle 200

• What are the three basic components of the performance equation learned in class?

Performance Puzzle 300

• Which of the three components of the performance equation would be affected by the choice of compiler?

Performance Puzzle 400

• State Amdahl’s law for speedup.

Performance Puzzle 500

• Using Amdahl’s law, if 50% of the instructions of a program can be sped up by a factor of 2, will the speedup of the program be 1 / 0.75 = 4/3?
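The arithmetic in the question can be reproduced with the usual form of Amdahl's law, speedup = 1 / ((1 - f) + f / s). In this sketch `fraction` is assumed to be the fraction of original execution time that is improved; the law is stated in terms of time, which need not equal the fraction of instructions. The function name is illustrative.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of execution time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# 50% of execution time improved by 2x: 1 / (0.5 + 0.25) = 1 / 0.75
print(amdahl_speedup(0.5, 2.0))
```

Letting `factor` grow without bound shows the ceiling of the law: with `fraction = 0.5` the speedup can never reach 2.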

Memory Madness 100

• SDRAM will allow consecutive (columns/rows/banks) to be read/written in bursts?

Memory Madness 200

• In a 4-way set associative cache with 512 total blocks, how many bits will be used to index the set (i.e. the set field of the address breakdown)?
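The set field width follows from the number of sets, which is the total block count divided by the associativity. A short sketch, assuming power-of-two sizes (the function name is illustrative):

```python
def set_index_bits(total_blocks: int, ways: int) -> int:
    """Number of address bits needed to select a set (power-of-two sizes assumed)."""
    sets = total_blocks // ways
    return sets.bit_length() - 1     # log2(sets) when sets is a power of two


print(set_index_bits(512, 4))  # 7
```

With 512 blocks and 4 ways there are 128 sets, and log2(128) = 7.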

Memory Madness 300

• DRAM (may / will not) lose its content even though power is continuously provided and in general is (faster / slower) to access than SRAM

Memory Madness 400

• In general, caches closer to the processor core are (smaller / larger) so that they can be faster. In addition, they usually have a (lower / higher) degree of associativity?

Memory Madness 500

• In a 4-way set-associative cache with 128 sets, the worst cache performance will occur when all accesses map to different blocks in (the same / different) set(s) and the earliest an eviction can occur is on the (1st / 4th / 5th / 128th / 129th) block access.

Processor Predicaments 100

• What is the ideal throughput (CPI or IPC) of a pipelined CPU?

Processor Predicaments 200

• Name the three kinds of hazards that prevent the pipeline from being kept full?

Processor Predicaments 300

• What method(s) can be used to solve the following Read-After-Write data hazards/dependencies or at least reduce the associated stall penalty?

Processor Predicaments 400

• Temporary registers are needed in the (single- / multi-) cycle CPU. An example of a temporary register is the (PC / IR).

Processor Predicaments 500

• The (single- / multi-) cycle CPU architecture implies variable CPI’s for different instruction classes and the clock cycle time is set by the longest (state / instruction) delay.

Programming Pickles 100

• When checking the status of an I/O device one can rely on interrupts or __________?

Programming Pickles 200

• Calling a subroutine requires using the (bsr / bra) instruction and will result in the return address being stored (on the stack / in A7)?

Programming Pickles 300

• The stack frame of a subroutine includes space for three sections of data. What are they?

Programming Pickles 400

• System calls/TRAPS, interrupts, and error conditions cause breaks in normal program execution. What is the name we give to these events?

Programming Pickles 500

• What is the name we use for software routines associated with an interrupt or other error event?

Cache Operation Example

• Perform address breakdown and apply address trace

• 2-Way Set-Assoc, N=8, B=8 words

• Address Trace

– R: 0x3c0

– W: 0x048

– R: 0x3d4

– W: 0xb50

• Operations

– Hit

– Fetch block XX

– Evict block XX (w/ or w/o WB)

– Final WB of block XX

Address | Tag | Set | Word | Unused
0x3c0   |     |     |      |
0x048   |     |     |      |
0x3d4   |     |     |      |
0xb50   |     |     |      |

Processor Access | Cache Operation
R: 0x3c0         |
W: 0x048         |
R: 0x3d4         |
W: 0xb50         |
Done!            |
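The address breakdown step can be sketched as follows. Several readings are assumptions here, not statements from the slide: N=8 is taken as the total number of blocks (so a 2-way cache has 4 sets), words are taken as 4 bytes (so the 2 low address bits are the unused byte offset), and the fields are ordered Tag | Set | Word | Unused from high to low bits. The function name `breakdown` is hypothetical.

```python
def breakdown(addr: int, blocks: int = 8, ways: int = 2,
              words_per_block: int = 8, bytes_per_word: int = 4):
    """Split an address into (tag, set, word, unused) under the assumed geometry."""
    unused_bits = bytes_per_word.bit_length() - 1     # 2 bits of byte offset
    word_bits = words_per_block.bit_length() - 1      # 3 bits select the word
    sets = blocks // ways                             # 4 sets
    set_bits = sets.bit_length() - 1                  # 2 bits select the set
    unused = addr & (bytes_per_word - 1)
    word = (addr >> unused_bits) & (words_per_block - 1)
    set_index = (addr >> (unused_bits + word_bits)) & (sets - 1)
    tag = addr >> (unused_bits + word_bits + set_bits)
    return tag, set_index, word, unused


for access, addr in (("R", 0x3c0), ("W", 0x048), ("R", 0x3d4), ("W", 0xb50)):
    print(access, breakdown(addr))
# R (7, 2, 0, 0)
# W (0, 2, 2, 0)
# R (7, 2, 5, 0)
# W (22, 2, 4, 0)
```

Under these assumptions all four addresses fall in set 2, and 0x3c0 and 0x3d4 share tag 7, so the third access finds the block fetched by the first. The fourth access brings a third distinct tag into a 2-way set, so it forces an eviction; which block leaves, and whether a write-back is needed, depends on the replacement and write policies used in class.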

DBNZ on Multicycle CPU

• Many looping operations require decrementing a counter and branching if the new value is not zero

• Many instruction sets include an instruction that we will term DBNZ (Decrement and Branch if Not Zero)

• Format: DBNZ $rs, disp

• Operation:

– $rs = $rs – 1

– if $rs /= 0, branch to PC+4+disp

• Instruction format:

Opcode | Rs     | Rt (copy of Rs) | Displacement
6-bits | 5-bits | 5-bits          | 16-bits
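The operation listed above can be modeled in a few lines. This is a behavioral sketch only, not the multicycle implementation: registers live in a dictionary, values wrap at 32 bits, and `disp` is treated as a byte offset added directly as the slide writes it (PC+4+disp), which is an assumption about scaling. The names `dbnz`, `regs`, and `$t0` are illustrative.

```python
def dbnz(regs: dict, rs: str, disp: int, pc: int) -> int:
    """Apply DBNZ to the register dict and return the next PC."""
    regs[rs] = (regs[rs] - 1) & 0xFFFFFFFF    # $rs = $rs - 1, 32-bit wraparound
    if regs[rs] != 0:
        return pc + 4 + disp                  # branch taken
    return pc + 4                             # counter hit zero: fall through


# A one-instruction loop at address 0x100 that branches back to itself.
regs = {"$t0": 3}
iterations = 0
pc = 0x100
while pc == 0x100:
    iterations += 1                           # a loop body would execute here
    pc = dbnz(regs, "$t0", -4, 0x100)

print(iterations, hex(pc))  # 3 0x104
```

Starting from a count of 3, the branch is taken twice and falls through on the third decrement, so the body runs three times.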

Modified Datapath for DBNZ

[Figure: multicycle datapath. Components: PC, Memory (Addr., Read Data, Write Data), Instruc. Reg. (Instruc[31:26], Instruc[25:0]), Register File (Read Reg. 1 #, Read Reg. 2 #, Write Reg. #, Write Data, Read data 1, Read data 2), ALU (Res., Zero), ALU control, Sign Extend (16 to 32), Sh. Left 2 (26 to 30, joined with PC[31:28]), Target Reg., and input multiplexers. Instruction fields used: [25:21], [20:16], [15:11], [15:0], [5:0]. Control signals: MemRead, MemWrite, IRWrite, RegWrite, PCWrite, ALUSelA, ALUSelB, PCSource, TargetWrite, IorD, RegDst, MemtoReg.]

Multi-cycle CPU FSM w/ DBNZ

[Figure: state diagram; the states and control outputs recoverable from the slide are listed below]

• State 0, Instruc. Fetch (entered on Reset): MemRead, ALUSelA=0, IorD=0, IRWrite, ALUSelB=01, ALUOp=00, PCSource=00, PCWrite

• State 1, Instruc. Decode + Reg. Fetch: ALUSelA=0, ALUSelB=11, ALUOp=00, TargetWrite

• State 8, Branch Completion (Op=‘BEQ’): ALUSelA=1, ALUSelB=00, ALUOp=01, PCWriteCond, PCSource=01

• State 9, Jump Completion (Op=‘JMP’): PCWrite, PCSource=10

Addressing Modes Review

Address   Label  Instruction          Result
                 .data
20003004  PTR:   .long 0x2000300c
                 .long 0x20003010
          DAT:   .long -1,1
          RES:   .space 4             RES = ___________
                 .text
20000500  MAIN:  MOVEA.L #DAT,A1      A1 = ___________
20000506         MOVEA.L -4(A1),A0    A0 = ___________
2000050a         MOVE.L (A0),D5       D5 = ___________
2000050c         MOVE.L -(A0),D6      D6 = ___________   A0 = ___________
2000050e         ADD.L D5,D6          D6 = ___________   N,Z,V,C = _________
20000510         OR.L D5,D6           D6 = ___________
20000512         LSL.L #1,D6          D6 = ___________
20000514         MOVE.L D6,RES
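To check answers for this exercise, the listing can be traced by hand in a few lines of Python. The sketch assumes that 20003004 is the address of PTR, which places DAT at 0x2000300c and RES at 0x20003014; that placement is inferred from the slide's address column, not stated on it. Registers are plain variables, memory is a dictionary of longwords, and the condition codes are recorded only after the ADD.L.

```python
MASK = 0xFFFFFFFF

# Assumed data layout: PTR at 0x20003004, so DAT = 0x2000300c, RES = 0x20003014.
DAT, RES = 0x2000300C, 0x20003014
mem = {
    0x20003004: 0x2000300C,   # PTR: .long 0x2000300c
    0x20003008: 0x20003010,   #      .long 0x20003010
    0x2000300C: 0xFFFFFFFF,   # DAT: .long -1
    0x20003010: 0x00000001,   #      .long 1
    0x20003014: 0x00000000,   # RES: .space 4
}

a1 = DAT                      # MOVEA.L #DAT,A1     (immediate)
a0 = mem[a1 - 4]              # MOVEA.L -4(A1),A0   (displacement)
a0_before = a0
d5 = mem[a0]                  # MOVE.L (A0),D5      (register indirect)
a0 -= 4                       # MOVE.L -(A0),D6     (predecrement, then load)
d6 = mem[a0]

total = d5 + d6               # ADD.L D5,D6
result = total & MASK
flags = {
    "N": result >> 31,
    "Z": int(result == 0),
    "V": ((d5 ^ result) & (d6 ^ result)) >> 31,   # same-sign operands, other-sign result
    "C": int(total > MASK),
}
d6_after_add = result

d6 = d6_after_add | d5        # OR.L D5,D6
d6_after_or = d6
d6 = (d6 << 1) & MASK         # LSL.L #1,D6
mem[RES] = d6                 # MOVE.L D6,RES

print(hex(a1), hex(a0_before), hex(a0), d5, d6_after_add, flags, d6_after_or, d6)
```

Under the assumed layout the trace gives A1 = 0x2000300c, A0 = 0x20003010 and then 0x2000300c after the predecrement, D5 = 1, D6 = 0xffffffff before the add and 0 after it with N,Z,V,C = 0,1,0,1, then D6 = 1, D6 = 2, and RES = 2.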