datapath components (3) — those who “remember” thingshtseng/classes/ee120a...• the output is...

43
Datapath Components (3) — Those Who “Remember” Things Prof. Usagi

Upload: others

Post on 09-Mar-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Datapath Components (3) — Those Who “Remember” Things

Prof. Usagi

Page 2: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Combinational logic • The output is a pure function of its current inputs • The output doesn’t change regardless how many times the logic is

triggered — Idempotent • Sequential logic

• The output depends on current inputs, previous inputs, their history

2

Recap: Combinational v.s. sequential logic

Sequential circuit has memory!

Page 3: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Master-Slave D Flip-flop

Recap: D flip-flop

3

D-latchD Q

ClkD-latch

D Q

Clk

Input

Clk

Output

Clk

Input

Output

Page 4: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Recap: Positive-edge-triggered D flip-flop

4

Q

Clock

Data

Q

Page 5: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Volatile Memory • Registers • SRAM • DRAM

• Programming and memory • Non-volatile Memory

5

Outline

Page 6: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Registers

6

Page 7: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Register

Clk

D Flip-flop

DD Q

Input 1

Output 1

D Flip-flop

DD Q

Input 2

Output 2

D Flip-flop

DD Q

Input 3

Output 3

D Flip-flop

DD Q

Input 4

Output 4

D Flip-flop

DD Q

Input 5

Output 5

DD

Input 5

• Register: a sequential component that can store multiple bits • A basic register can be built simply by using multiple D-FFs

7

Registers

Page 8: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

8

What will we output 4 cycles later?

Clk

D Flip-flop

DD Q

Input 1

Output 1

D Flip-flop

DD Q

Input 2

Output 2

D Flip-flop

DD Q

Input 3

Output 3

D Flip-flop

DD Q

Input 4

Output 4

Clk

D Flip-flop

DD QInput 1

D Flip-flop

DD Q

D Flip-flop

DD Q

D Flip-flop

DD Q

Output 4Output 3Output 2Output 1

• For the above D-FF organization, what are we expecting to see in (O1,O2,O3,O4) in the beginning of the 5th cycle after receiving (1,0,1,1)?

A. (1,1,1,1) B. (1,0,1,1) C. (1,1,0,1) D. (0,0,1,0) E. (0,1,0,0)

Poll close in

Page 9: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

9

What will we output 4 cycles later?

Clk

D Flip-flop

DD QInput 1

D Flip-flop

DD Q

D Flip-flop

DD Q

D Flip-flop

DD Q

Output 4Output 3Output 2Output 1

• For the above D-FF organization, what are we expecting to see in (O1,O2,O3,O4) in the beginning of the 5th cycle after receiving (1,0,1,1)?

A. (1,1,1,1) B. (1,0,1,1) C. (1,1,0,1) D. (0,0,1,0) E. (0,1,0,0)

Page 10: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Holds & shifts samples of input

10

Shift register

Clk

D Flip-flop

DD Q

Input 1

Output 1

D Flip-flop

DD Q

Input 2

Output 2

D Flip-flop

DD Q

Input 3

Output 3

D Flip-flop

DD Q

Input 4

Output 4

Clk

D Flip-flop

DD QInput 1

D Flip-flop

DD Q

D Flip-flop

DD Q

D Flip-flop

DD Q

Output 4Output 3Output 2Output 1

Page 11: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

11

Let’s play with the shift register more…

Clk

D Flip-flop

DD QInput 1

D Flip-flop

DD Q

D Flip-flop

DD Q

D Flip-flop

DD Q

Output 4Output 3Output 2Output 1

Clk

D Flip-flop

DD QInput 1

D Flip-flop

DD Q

D Flip-flop

DD Q

D Flip-flop

DD Q

Output 4Output 3Output 2Output 1

Page 12: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• For the extended shift register, what sequence of input will the let the circuit output “1”?

A. (1, 1, 1, 1) B. (0, 1, 0, 1) C. (1, 0, 1, 0) D. (0, 1, 1, 0) E. (1, 0, 0, 1)

12

Let’s play with the shift register more…Poll close in

Page 13: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• For the extended shift register, what sequence of input will the let the circuit output “1”?

A. (1, 1, 1, 1) B. (0, 1, 0, 1) C. (1, 0, 1, 0) D. (0, 1, 1, 0) E. (1, 0, 0, 1)

13

Let’s play with the shift register more…

Page 14: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Combinational function of input samples

14

Pattern Recognizer

Clk

D Flip-flop

DD QInput 1

D Flip-flop

DD Q

D Flip-flop

DD Q

D Flip-flop

DD Q

Output 4Output 3Output 2Output 1

Clk

D Flip-flop

DD QInput 1

D Flip-flop

DD Q

D Flip-flop

DD Q

D Flip-flop

DD Q

Output 4Output 3Output 2Output 1

We can recognize 1001!

Page 15: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Sequences through a fixed set of patterns • Note: definition is general • For example, the one in the figure is a type of counter called Linear Feedback

Shift Register (LFSR)

15

Counters

Clk

D Flip-flop

DD Q

D Flip-flop

DD Q

D Flip-flop

DD Q

D Flip-flop

DD Q

Output 4Output 3Output 2Output 1

Page 16: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Static Random Access Memory (SRAM)

16

Page 17: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

A Classical 6-T SRAM Cell

17

bitlinebitline’wordline

Q’ Q

Sense Amplifier

Page 18: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

A Classical 6-T SRAM Cell

18

Sense Amplifier

Page 19: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

19

Write “1” to an SRAM Cellbitlinebitline’

wordline

Q’ Q0 1

100 1

Sense Amplifier

• Bitlines overpower cell with new value • Q = 0, Q’ = 1, BL = 1, BL’ = 0 — Force

Q’ low, then Q rises high

Page 20: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

20

Write “0” to an SRAM Cellbitlinebitline’

wordline

Q’ Q1 0

011 0

Sense Amplifier

Page 21: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

21

Reading from an SRAM Cellbitlinebitline’

wordline

Q’ Q0 1

Sense Amplifier

0 1

Page 22: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

MUX

SRAM array

22

Deco

der

012

n-1SenseAmp

SenseAmp

SenseAmp

SenseAmp

wd0 wd1 wd2 wd(m-1)

We can only work on cells sharing the same word line simultaneously

upper bits of address

lower bits of address

Page 23: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Dynamic Random Access Memory (DRAM)

23

Page 24: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• 1 transistor (rather than 6) • Relies on large capacitor to store

bit • Write: transistor conducts, data

voltage level gets stored on top plate of capacitor

• Read: look at the value of d • Problem: Capacitor discharges

over time • Must “refresh” regularly, by reading

d and then writing it right back24

An DRAM cell

wordline

data

Page 25: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

DRAM array

25

Row

Dec

oder

012

n-1

upper bits of address

Row Buffer

lower bits of address

Usually 4K — the page size of your OS!

MUX

Page 26: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Consider the following memory elements ① 64*64-bit Registers ② 512B SRAM ③ 512B DRAM A. Area: (1) > (2) > (3) Delay: (1) < (2) < (3) B. Area: (1) > (3) > (2) Delay: (1) < (3) < (2) C. Area: (3) > (1) > (2) Delay: (1) < (3) < (2) D. Area: (3) > (2) > (1) Delay: (3) < (2) < (1) E. Area: (2) > (3) > (1) Delay: (2) < (3) < (1)

26

Register v.s. DRAM v.s. SRAMPoll close in

Page 27: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Consider the following memory elements ① 64*64-bit Registers ② 512B SRAM ③ 512B DRAM A. Area: (1) > (2) > (3) Delay: (1) < (2) < (3) B. Area: (1) > (3) > (2) Delay: (1) < (3) < (2) C. Area: (3) > (1) > (2) Delay: (1) < (3) < (2) D. Area: (3) > (2) > (1) Delay: (3) < (2) < (1) E. Area: (2) > (3) > (1) Delay: (2) < (3) < (1)

27

Register v.s. DRAM v.s. SRAM

Page 28: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

RC charging

28

Page 29: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Latency of volatile memory

29

Size (Transistors per bit) Latency (ns)

Register 18T ~ 0.1 ns

SRAM 6T ~ 0.5 ns

DRAM 1T 50-100 ns

Page 30: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Programming and memory

30

Page 31: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Memory “hierarchy” in modern processor architectures

31

Processor

DRAM

Storage

SRAM $

Processor Core

Registers

larger

fastest

< 1ns

tens of ns

tens of ns

a few ns

GBs

TBs

32 or 64 words

KBs ~ MBs

L1 $

L2 $

L3 $

fastest

larger

Page 32: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Which side is faster in executing the for-loop? A. Left B. Right C. About the same32

Thinking about programmingstruct student_record { int id; double homework; double midterm; double final; };

int main(int argc, char **argv) { int i,j; double midterm_average=0.0; int number_of_records = 10000000; struct timeval time_start, time_end; struct student_record *records; records = (struct student_record*)malloc(sizeof(struct student_record)*number_of_records); init(number_of_records,records);

for (j = 0; j < 100; j++) for (i = 0; i < number_of_records; i++) midterm_average+=records[i].midterm;

printf("average: %lf\n",midterm_average/number_of_records); free(records); return 0; }

int main(int argc, char **argv) { int i,j; double midterm_average=0.0; int number_of_records = 10000000; struct timeval time_start, time_end; id = (int*)malloc(sizeof(int)*number_of_records); midterm = (double*)malloc(sizeof(double)*number_of_records); final = (double*)malloc(sizeof(double)*number_of_records); homework = (double*)malloc(sizeof(double)*number_of_records); init(number_of_records); for (j = 0; j < 100; j++) for (i = 0; i < number_of_records; i++) midterm_average+=midterm[i]; free(id); free(midterm); free(final); free(homework); return 0; }

Poll close in

Page 33: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Which side is faster in executing the for-loop? A. Left B. Right C. About the same33

Thinking about programmingstruct student_record { int id; double homework; double midterm; double final; };

int main(int argc, char **argv) { int i,j; double midterm_average=0.0; int number_of_records = 10000000; struct timeval time_start, time_end; struct student_record *records; records = (struct student_record*)malloc(sizeof(struct student_record)*number_of_records); init(number_of_records,records);

for (j = 0; j < 100; j++) for (i = 0; i < number_of_records; i++) midterm_average+=records[i].midterm;

printf("average: %lf\n",midterm_average/number_of_records); free(records); return 0; }

int main(int argc, char **argv) { int i,j; double midterm_average=0.0; int number_of_records = 10000000; struct timeval time_start, time_end; id = (int*)malloc(sizeof(int)*number_of_records); midterm = (double*)malloc(sizeof(double)*number_of_records); final = (double*)malloc(sizeof(double)*number_of_records); homework = (double*)malloc(sizeof(double)*number_of_records); init(number_of_records); for (j = 0; j < 100; j++) for (i = 0; i < number_of_records; i++) midterm_average+=midterm[i]; free(id); free(midterm); free(final); free(homework); return 0; }

More row buffer hits in the DRAM, more SRAM hits

Page 34: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Which side is consuming less memory? A. Left B. Right C. About the same34

Thinking about programming (2)struct student_record { int id; double homework; double midterm; double final; };

int main(int argc, char **argv) { int i,j; double midterm_average=0.0; int number_of_records = 10000000; struct timeval time_start, time_end; struct student_record *records; records = (struct student_record*)malloc(sizeof(struct student_record)*number_of_records); init(number_of_records,records);

for (j = 0; j < 100; j++) for (i = 0; i < number_of_records; i++) midterm_average+=records[i].midterm;

printf("average: %lf\n",midterm_average/number_of_records); free(records); return 0; }

int main(int argc, char **argv) { int i,j; double midterm_average=0.0; int number_of_records = 10000000; struct timeval time_start, time_end; id = (int*)malloc(sizeof(int)*number_of_records); midterm = (double*)malloc(sizeof(double)*number_of_records); final = (double*)malloc(sizeof(double)*number_of_records); homework = (double*)malloc(sizeof(double)*number_of_records); init(number_of_records); for (j = 0; j < 100; j++) for (i = 0; i < number_of_records; i++) midterm_average+=midterm[i]; free(id); free(midterm); free(final); free(homework); return 0; }

Poll close in

Page 35: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Which side is consuming less memory? A. Left B. Right C. About the same35

Thinking about programming (2)struct student_record { int id; double homework; double midterm; double final; };

int main(int argc, char **argv) { int i,j; double midterm_average=0.0; int number_of_records = 10000000; struct timeval time_start, time_end; struct student_record *records; records = (struct student_record*)malloc(sizeof(struct student_record)*number_of_records); init(number_of_records,records);

for (j = 0; j < 100; j++) for (i = 0; i < number_of_records; i++) midterm_average+=records[i].midterm;

printf("average: %lf\n",midterm_average/number_of_records); free(records); return 0; }

int main(int argc, char **argv) { int i,j; double midterm_average=0.0; int number_of_records = 10000000; struct timeval time_start, time_end; id = (int*)malloc(sizeof(int)*number_of_records); midterm = (double*)malloc(sizeof(double)*number_of_records); final = (double*)malloc(sizeof(double)*number_of_records); homework = (double*)malloc(sizeof(double)*number_of_records); init(number_of_records); for (j = 0; j < 100; j++) for (i = 0; i < number_of_records; i++) midterm_average+=midterm[i]; free(id); free(midterm); free(final); free(homework); return 0; }

64-bit

final

homework

midterm

final

homework

midterm

id

id

Page 36: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Non-volatile memory

36

Page 37: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Volatile memory • The stored bits will vanish if the cell is not supplied with eletricity • Register, SRAM, DRAM

• Non-volatile memory • The stored bits will not vanish “immediately” when it’s out of

electricity — usually can last years • Flash memory, PCM, MRAM, STTRAM

37

Volatile v.s. Non-volatile

Page 38: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Floating gate made by polycrystalline silicon trap electrons

• The voltage level within the floating gate determines the value of the cell

• The floating gates will wear out eventually

38

Flash memory

Page 39: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Basic flash operations

39

Block #0 …………………

Page #: 0 1 2 3 4 5 6 7 n-8 n-7 n-6 n-5 n-4 n-3n-2 n-1

Block #1 …………………

Block #2 …………………

…………

…………

…………

Block #n-2 …………………

Block #n-1 …………………

Free PageProgram Read Programmed page

Page 40: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Types of Flash Chips

40

Single-Level Cell(SLC)

Multi-Level Cell(MLC)

Triple-Level Cell(TLC)

2 voltage levels, 1-bit

4 voltage levels, 2-bit

8 voltage levels, 3-bit

Quad-Level Cell(QLC)

16 voltage levels, 4-bit

Page 41: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

Programming in MLC

41

Multi-Level Cell(MLC)

4 voltage levels, 2-bit

11

10

01

00

 3.1400000000000001243449787580= 0x40091EB851EB851F

= 01000000 00001001 00011110 10111000 01010001 11101011 10000101 00011111

11 10 01 00

3 Cycles/Phases to finish programming

phase #1

phase #2

phase #3

Page 42: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

• Assignment #4 due next Tuesday — Chapter 4.8-4.9 & 5.2-5.4

• Lab 5 is up — due next Thursday • Start early & plan your time carefully • Watch the video and read the instruction BEFORE your session • There are links on both course webpage and iLearn lab section • Submit through iLearn > Labs

• Check your grades in iLearn

42

Announcement

Page 43: Datapath Components (3) — Those Who “Remember” Thingshtseng/classes/ee120a...• The output is a pure function of its current inputs ... D Q Clk D-latch D Q Clk Input Clk Output

つづく

ElectricalComputerEngineering

Science 120A