Operating Systems - Architecture

University of Massachusetts Amherst, Department of Computer Science
Emery Berger, University of Massachusetts Amherst
Operating Systems, CMPSCI 377: Architecture

Upload: emery-berger

Post on 01-Nov-2014


DESCRIPTION

From the Operating Systems course (CMPSCI 377) at UMass Amherst, Fall 2007.

TRANSCRIPT

Page 1: Operating Systems - Architecture

UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science

Emery Berger

University of Massachusetts Amherst

Operating Systems, CMPSCI 377: Architecture

Page 2: Operating Systems - Architecture


Architecture

Hardware Support for Applications & OS

Architecture basics & details

Focus on characteristics exposed to application programmer / OS

Page 3: Operating Systems - Architecture

The Memory Hierarchy

Registers

Caches

Associativity

Misses

Locality


Page 5: Operating Systems - Architecture


Registers

Register = dedicated name for one word of memory managed by CPU

General-purpose: “AX”, “BX”, “CX” on x86

Special-purpose:

“SP” = stack pointer

“FP” = frame pointer

“PC” = program counter

Changing processes: save current registers & load saved registers = context switch

[Figure: stack frame, with SP and FP marking the current frame and its arguments arg0, arg1]

Page 6: Operating Systems - Architecture


Caches

Access to main memory: “expensive”

~ 100 cycles (slow, relatively cheap)

Caches: small, fast, expensive memory

Hold recently-accessed data (D$) or instructions (I$)

Different sizes & locations

Level 1 (L1) – on-chip, smallish

Level 2 (L2) – on or next to chip, larger

Level 3 (L3) – pretty large, on bus

Manages lines of memory (32-128 bytes)


Page 7: Operating Systems - Architecture


Memory Hierarchy

Higher = small, fast, more $, lower latency

Lower = large, slow, less $, higher latency

The hierarchy, top to bottom:

registers: 1-cycle latency
L1 (separate D$ and I$): 2-cycle latency
L2 (unified D$ and I$): 7-cycle latency
RAM: 100-cycle latency
Disk: 40,000,000-cycle latency
Network: 200,000,000+ cycle latency

On a miss, data is loaded from the level below; to make room, lines are evicted downward.

Page 8: Operating Systems - Architecture


Cache Jargon

Cache initially cold

Accessing data initially misses

Fetch from lower level in hierarchy

Bring line into cache (populate cache)

Next access: hit

Once cache holds most-frequently used data: “warmed up”

Context switch implications?


Page 9: Operating Systems - Architecture


Cache Details

Ideal cache would be fully associative

That is, LRU (least-recently used) queue

Generally too expensive

Instead, partition memory addresses into separate bins (sets), each divided into ways

1-way or direct-mapped

2-way = 2 entries per bin

4-way = 4 entries per bin, etc.


Page 10: Operating Systems - Architecture


Associativity Example

Hash memory based on addresses to different indices in cache


Page 11: Operating Systems - Architecture


Miss Classification

First access = compulsory miss

Unavoidable without prefetching

Too many items mapping to the same set = conflict miss

Avoidable if we had higher associativity

No space in cache = capacity miss

Avoidable if cache were larger

Invalidated = coherence miss

Avoidable if cache were unshared


Page 12: Operating Systems - Architecture


Exercise

Cache with 4 entries, 2-way associativity

Assume hash(x) = x % 4 (modulus)

How many misses?

# compulsory misses?

# conflict misses?

# capacity misses?

Trace: 3 7 11 2 3 7 7 9 9 6 13 7 2 5 8 10


Page 15: Operating Systems - Architecture


Solution

Cache with 4 entries, 2-way associativity

Assume hash(x) = x % 4 (modulus)

How many misses?

# compulsory misses? 10

# conflict misses? 2

# capacity misses? 0


3 7 11 2 3 7 7 9 9 6 13 7 2 5 8 10

Page 16: Operating Systems - Architecture


Locality

Locality = re-use of recently-used items

Temporal locality: re-use in time

Spatial locality: use of nearby items

In same cache line, same page (4K chunk)

Intuitively – greater locality = fewer misses

# misses depends on cache layout, # of levels, associativity…

Machine-specific


Page 17: Operating Systems - Architecture


Quantifying Locality

Instead of counting misses,compute hit curve from LRU histogram

Assume perfect LRU cache

Ignore compulsory misses

Trace: 3 7 7 2 3 7 (access positions 1-6)

[Figure: LRU stack built up access by access, most recently used on top]


Page 23: Operating Systems - Architecture


Quantifying Locality

Instead of counting misses,compute hit curve from LRU histogram

Start with total misses on right hand side

Subtract histogram values

Cache size d:    1 2 3 4 5 6
Cumulative hits: 1 1 3 3 3 3

Page 24: Operating Systems - Architecture


[Figure: hit curve, hit rate from 0% to 100% vs. cache size 1-5]

Quantifying Locality

Instead of counting misses,compute hit curve from LRU histogram

Start with total misses on right hand side

Subtract histogram values

Normalize

Normalized hit rates (divide by the 3 total hits): .3 .3 1 1 1 1

Page 25: Operating Systems - Architecture


Hit Curve Exercise

Derive hit curve for following trace:


3 5 4 2 8 3 6 9 9 6 13 7 2 5 8 10


Page 27: Operating Systems - Architecture


Hit Curve Exercise

Derive hit curve for following trace:


3 5 4 2 8 3 6 9 9 6 13 7 2 5 8 10

Cache size d:    1 2 3 4 5 6 7 8 9
Cumulative hits: 1 2 2 2 3 3 4 5 6

Page 28: Operating Systems - Architecture


Hit Curve Exercise

Derive hit curve for following trace:

Cumulative hits: 1 2 2 2 3 3 4 5 6

[Figure: hit curve, hit rate from 0% to 100% vs. cache size 1-9]

Page 29: Operating Systems - Architecture


Important CPU Internals


Issues that affect performance

Pipelining

Branches & prediction

System calls (kernel crossings)

Page 30: Operating Systems - Architecture


Scalar architecture + memory…

Straight-up sequential execution

Fetch instruction

Decode it

Execute it

Problem: instruction or data miss in cache

Result – stall: everything stops

How long to wait for miss all the way to RAM?


Page 31: Operating Systems - Architecture


Superscalar architectures

Out-of-order processors

Pipeline of instructions in flight

Instead of stalling on load, guess!

Branch prediction

Value prediction

Predictors based on history, location in program

Speculatively execute instructions

Actual results checked asynchronously

If mispredicted, squash instructions

Accurate prediction = massive speedup

Hides latency of memory hierarchy

Page 32: Operating Systems - Architecture


Pipelining and Branches

Instruction fetch

Instruction decode

Execute

Memory access

Write back

Pipelining overlaps instructions to exploit parallelism, allowing the clock rate to be increased. Branches cause bubbles in the pipeline, where some stages are left idle.

[Figure: pipeline diagram; stages sit idle while a branch is unresolved]

Page 33: Operating Systems - Architecture


Branch Prediction

Instruction fetch

Instruction decode

Execute

Memory access

Write back

A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.

[Figure: pipeline diagram; speculative execution continues down the predicted path]

Page 34: Operating Systems - Architecture


Kernel Mode

Protects OS from users

kernel = English for nucleus

Think atom

Only privileged code executes in kernel

System call –

Enters kernel mode

Flushes pipeline, saves context

Executes code in kernel land

Returns to user mode, restoring context

Where we are in user land


Page 35: Operating Systems - Architecture


Timers & Interrupts

Need to respond to events periodically

Change executing processes

Quantum – time limit for process execution

Fairness – when timer goes off, interrupt

Current process stops

OS takes control through interrupt handler

Scheduler chooses next process

Interrupts also signal I/O events

Network packet arrival, disk read complete…


Page 36: Operating Systems - Architecture


To do

Read C/C++ notes for next week

First homework assigned next week

Language: C/C++

Will be due in 2 weeks


Page 37: Operating Systems - Architecture


The End
