1 virtual memory. 2 outline pentium/linux memory system core i7 suggested reading: 9.6, 9.7

33
1 Virtual Memory

Upload: judith-strickland

Post on 01-Jan-2016

222 views

Category:

Documents


0 download

TRANSCRIPT

1

Virtual Memory

2

Outline

• Pentium/Linux Memory System

• Core i7

• Suggested reading: 9.6, 9.7

3

bus interface unit

DRAMexternal

system bus (e.g. PCI)

instruction

fetch unit

L1

i-cache

L2

cache

cache bus

L1

d-cache

inst

TLB

data

TLB

processor package

• 32 bit address space• 4 KB page size• L1, L2, and TLBs

• 4-way set associative• inst TLB

• 32 entries• 8 sets

• data TLB• 64 entries• 16 sets

• L1 i-cache and d-cache• 16 KB• 32 B line size• 128 sets

• L2 cache• unified• 128 KB -- 2 MB• 32 B line size

P6 Memory System

4

Review of Address Translation

• Basic Parameters– N = 2n = Virtual address limit– M = 2m = Physical address limit– P = 2p = page size (bytes).

• Components of the virtual address (VA)– VPO: Virtual page offset – VPN: Virtual page number

• Components of the physical address (PA)– PPO: Physical page offset (same as VPO)– PPN: Physical page number

5

Review of Address Translation

virtual page number page offset virtual address

physical page number page offset physical address

0p–1

address translation

pm–1

n–1 0p–1p

TLB tag TLB index

Cache tag Cache index Cache offset

6

Review of Address Translation

• Components of the virtual address (VA)– VPO: Virtual page offset – VPN: Virtual page number – TLBI: TLB index– TLBT: TLB tag

• Components of the physical address (PA)– PPO: Physical page offset (same as VPO)– PPN: Physical page number– CO: Byte offset within cache line– CI: Cache index– CT: Cache tag

7

Multi-Level Page Tables

• multi-level page tables

• e.g., 2-level page table

– Level 1 tableLevel 1 table: 1024

entries, each of which

points to a Level 2 page

table.

– Level 2 tableLevel 2 table: 1024

entries, each of which

points to a page

Level 1

Table

...

Level 2

Tables

8

CPU

VPN VPO20 12

TLBT TLBI416

virtual address (VA)

...TLB (16 sets, 4 entries/set)VPN1 VPN2

1010

PDE PTE

PDBR

PPN PPO20 12

Page tables

TLB

miss

TLB

hit

physical

address (PA)

result32

...

CT CO20 5

CI7

L2 and DRAM

L1 (128 sets, 4 lines/set)

L1

hit

L1

miss

P6 Address Translation

9

• Page directory – 1024 4-byte page directory

entries (PDEs) that point to page tables

– one page directory per process.– page directory must be in

memory when its process is running

– always pointed to by PDBR

• Page tables:– 1024 4-byte page table entries

(PTEs) that point to pages.– page tables can be paged in and

out.

page directory

...

Up to 1024 page tables

1024PTEs

1024PTEs

1024PTEs

...

1024PDEs

P6 Page Table

10

Page table physical base addr Avail G PS A CD WT U/S R/W P=1

Page table physical base address: 20 most significant bits of physical page table address (forces page tables to be 4KB aligned)

Avail: available for system programmers

G: global page (don’t evict from TLB on task switch)

PS: page size 4K (0) or 4M (1)

A: accessed (set by MMU on reads and writes, cleared by software)

CD: cache disabled (1) or enabled (0)

WT: write-through or write-back cache policy for this page table

U/S: user or supervisor mode access

R/W: read-only or read-write access

P: page table is present in memory (1) or not (0)

31 12 11 9 8 7 6 5 4 3 2 1 0

Available for OS (page table location in secondary storage) P=0

31 01

P6 page directory entry (PDE)

11

Page physical base address Avail G 0 D A CD WT U/S R/W P=1

Page base address: 20 most significant bits of physical page address (forces pages to be 4 KB aligned)

Avail: available for system programmers

G: global page (don’t evict from TLB on task switch)

D: dirty (set by MMU on writes)

A: accessed (set by MMU on reads and writes)

CD: cache disabled or enabled

WT: write-through or write-back cache policy for this page

U/S: user/supervisor

R/W: read/write

P: page is present in physical memory (1) or not (0)

31 12 11 9 8 7 6 5 4 3 2 1 0

Available for OS (page location in secondary storage) P=0

31 01

P6 page table entry (PTE)

12

PDE

PDBRphysical address

of page table base

(if P=1)

physical

address

of page base

(if P=1)physical address

of page directory

word offset into

page directory

word offset into

page table

page directory page table

VPN1

10VPO

10 12VPN2 Virtual address

PTE

PPN PPO

20 12Physical address

word offset into

physical and virtual

page

Page tables Translation

13

CPU

VPN VPO20 12

TLBT TLBI416

virtual address (VA)

...TLB (16 sets, 4 entries/set)VPN1 VPN2

1010

PDE PTE

PDBR

PPN PPO20 12

Page tables

TLB

miss

TLB

hit

physical

address (PA)

result32

...

CT CO20 5

CI7

L2 and DRAM

L1 (128 sets, 4 lines/set)

L1

hit

L1

miss

P6 TLB Translation

14

• TLB entry (not all documented, so this is speculative):

– V: indicates a valid (1) or invalid (0) TLB entry– PD: is this entry a PDE (1) or a PTE (0)?– tag: disambiguates entries cached in the same set– PDE/PTE: page directory or page table entry

• Structure of the data TLB:– 16 sets, 4 entries/set

PDE/PTE Tag PD V

1 11632

entry entry entry entryentry entry entry entryentry entry entry entry

entry entry entry entry

...

set 0set 1set 2

set 15

P6 TLB

15

1. Partition VPN into TLBT

and TLBI.

2. Is the PTE for VPN

cached in set TLBI?

3. Yes: then build

physical address.

4. No: then read PTE

(and PDE if not

cached) from memory

and build physical

address.

CPU

VPN VPO20 12

TLBT TLBI416

virtual address

PDE PTE

...TLB

miss

TLB

hit

page table translation

PPN PPO20 12

physical address

1 2

3

4

Translating with the P6 TLB

16

CPU

VPN VPO20 12

TLBT TLBI416

virtual address (VA)

...TLB (16 sets, 4 entries/set)VPN1 VPN2

1010

PDE PTE

PDBR

PPN PPO20 12

Page tables

TLB

miss

TLB

hit

physical

address (PA)

result32

...

CT CO20 5

CI7

L2 and DRAM

L1 (128 sets, 4 lines/set)

L1

hit

L1

miss

P6 Page Table Translation

17

• Case 1/1: page table and page present.

• MMU Action: – MMU build

physical address and fetch data word.

• OS action

– none

VPN

VPN1 VPN2

PDE

PDBR

PPN PPO20 12

20VPO

12

p=1 PTE p=1

Data page

data

Page directory

Page table

Mem

Disk

Translating with the P6 page tables (case 1/1)

18

• Case 1/0: page table present but page missing.

• MMU Action: – page fault

exception– handler receives

the following args:

• VA that caused fault

• fault caused by non-present page or page-level protection violation

• read/write• user/supervisor

VPN

VPN1 VPN2

PDE

PDBR

20VPO

12

p=1 PTE

Page directory

Page table

Mem

DiskData page

data

p=0

Translating with the P6 page tables (case 1/0)

19

• OS Action: – Check for a legal

virtual address.– Read PTE through

PDE. – Find free physical

page (swapping out current page if necessary)

– Read virtual page from disk and copy to virtual page

– Restart faulting instruction by returning from exception handler.

VPN

VPN1 VPN2

PDE

PDBR

20VPO

12

p=1 PTE p=1

Page directory

Page table

Data page

data

PPN PPO20 12

Mem

Disk

Translating with the P6 page tables (case 1/0, cont)

20

• Case 0/1: page table missing but page present.

• Introduces consistency issue.

– potentially every page out requires update of disk page table.

• Linux disallows this

– if a page table is swapped out, then swap out its data pages too.

VPN

VPN1 VPN2

PDE

PDBR

20VPO

12

p=0

PTE p=1

Page directory

Page table

Mem

Disk

Data page

data

Translating with the P6 page tables (case 0/1)

21

• Case 0/0: page

table and page

missing.

• MMU Action:

– page fault

exception

VPN

VPN1 VPN2

PDE

PDBR

20VPO

12

p=0

PTE

Page directory

Page table

Mem

DiskData page

datap=0

Translating with the P6 page tables (case 0/0)

22

• OS action:

– swap in page

table.

– restart faulting

instruction by

returning from

handler.

• Like case 0/1 from

here on.

VPN

VPN1 VPN2

PDE

PDBR

20VPO

12

p=1 PTE

Page directory

Page table

Mem

DiskData page

data

p=0

Translating with the P6 page tables (case 0/0, cont)

23

CPU

VPN VPO20 12

TLBT TLBI416

virtual address (VA)

...TLB (16 sets, 4 entries/set)VPN1 VPN2

1010

PDE PTE

PDBR

PPN PPO20 12

Page tables

TLB

miss

TLB

hit

physical

address (PA)

result32

...

CT CO20 5

CI7

L2 and DRAM

L1 (128 sets, 4 lines/set)

L1

hit

L1

miss

P6 L1 Cache Access

24

• Partition physical address into CO, CI, and CT.

• Use CT to determine if line containing word at address PA is cached in set CI.

• If no: check L2.• If yes: extract word

at byte offset CO and return to processor.

physical

address (PA)

data32

...

CT CO20 5

CI7

L2 and DRAM

L1 (128 sets, 4 lines/set)

L1

hit

L1

miss

L1 cache access

25

• Core i7– The 64-bit Nehalem microarchitecture

– Current support 48-bit (256 TB) virtual address space and 52-bit (4 PB) physical address space

– Compatibility mode support 32-bit (4 GB) virtual and physical address spaces

Core i7 Summary

26

Pro

cessor

packag

eCore i7 Memory System

Core x 4

RegistersRegisters Instruction fetch

Instruction fetch

L1 d-cache 32 KB,8-wayL1 d-cache

32 KB,8-wayL1 i-cache

32 KB,8-wayL1 i-cache

32 KB,8-way

L2 unified cache256 KB, 8-way

L2 unified cache256 KB, 8-way

L3 unified cache8 MB, 16-way

(shared by all cores)

L3 unified cache8 MB, 16-way

(shared by all cores)

MMU(addr translation)

MMU(addr translation)

L1 d-TLB 64 entries, 4-way

L1 d-TLB 64 entries, 4-way

L1 i-TLB 64 entries, 4-way

L1 i-TLB 64 entries, 4-way

L2 unified TLB 512 entries, 4-way

L2 unified TLB 512 entries, 4-way

QuickPath interconnect4 links @ 25.6 GB/s

102.4 GB/s total

QuickPath interconnect4 links @ 25.6 GB/s

102.4 GB/s total

DDR3 memory controller3 x 64 bit @ 10.66 GB/s

32 GB/s total (shared by all cores)

DDR3 memory controller3 x 64 bit @ 10.66 GB/s

32 GB/s total (shared by all cores)

Main memoryMain memory

To othercores

To I/Obridge

...

27

...

CPUCPU

VPNVPN VPOVPO36 12

432

Virtual address (VA)

L1 TLB (16 sets, 4 entries/set)

PPNPPN PPOPPO40 12

TLBmiss

TLB hit

physical

address (PA)

resultresult

32/64

CTCT40 66

L2, L3, and main memoryL2, L3, and

main memory

L1 d-cache(64 sets, 8 lines/set)

L1 hitL1 miss

TLBTTLBT TLBITLBI

Core i7 Address Translation

CICI COCOVPN1VPN1 VPN2VPN2 VPN3VPN3 VPN4VPN4

PTEPTE PTE PTE

CR3

Page Tables

9 9 9 9

28

Page table physical base addr Unused G PS A CD WT U/S R/WP=1

XD Disable or enable instruction fetches

Base addr 40 most significant bits of base address of child page table (forces page tables to be 4KB aligned)

G global page (don’t evict from TLB on task switch)

PS Page size either 4K(0) or 4M(1) (defined for Level 1 PTEs only)

A Reference bit (set by MMU on reads and writes, cleared by software)

CD Cache disabled(1) or enabled(0) for child page table

WT Write-through or write-back cache policy

U/S User or supervisor(kernel) mode access permission

R/W Read-only or read-write access permissiom

P Child page table present in memory(1) or not(0)

51 1211 9 8 7 6 5 4 3 2 1 0

Available for OS (page table location in secondary storage) P=0

Level 1, Level 2 and Level 3 Page Table Entry

Unused

52

XD

6263

29

Page table physical base addr Unused G D A CD WT U/S R/WP=1

XD Disable or enable instruction fetches

Base addr 40 most significant bits of base address of child page table (forces page tables to be 4KB aligned)

G global page (don’t evict from TLB on task switch)

D Dirty bit (Set by MMU on writes, cleared by software)

A Reference bit (set by MMU on reads and writes, cleared by software)

CD Cache disabled(1) or enabled(0) for child page table

WT Write-through or write-back cache policy

U/S User or supervisor(kernel) mode access permission

R/W Read-only or read-write access permissiom

P Child page table present in memory(1) or not(0)

51 1211 9 8 7 6 5 4 3 2 1 0

Available for OS (page table location in secondary storage) P=0

Level 4 Page Table Entry

Unused

52

XD

6263

30

Physical addressof L1 PT

physical addressof page512 GB

regionper entry

L1 PTPage global

directory

12

Virtual address

40 12 Physical address

Offset into physical and virtual page

Page tables Translation

VPN 1 VPN 2 VPN 3 VPN 4 VPO

9999

PPO

L1 PTE

CR3

1 GBregion

per entry

L2 PTPage upper

directory

L2 PTE

2 MBregion

per entry

L3 PTPage middle

directory

L3 PTE

4 KBregion

per entry

L4 PTPage

directory

L4 PTE

PPO

12

40

9999

40404040

Cute Trick for Speeding Up L1 Access

• Observation– Bits that determine CI identical in virtual and physical address– Can index into cache while address translation taking place– Generally we hit in TLB, so PPN bits (CT bits) available next– “Virtually indexed, physically tagged”– Cache carefully sized to make this possible

Physical address

(PA)

CT CO36 6

CI6

Virtual address

(VA) VPN VPO

36 12

PPOPPN

AddressTranslation

NoChange

CI

L1 Cache

CT Tag Check

32

Process-specific data structures(e.g. task and mm structs,page tables, kernel stack)

Process-specific data structures(e.g. task and mm structs,page tables, kernel stack)

Physical memoryPhysical memory

Kernel code and dataKernel code and data

User stackUser stack

Memory mapped regionfor shared libraries

Memory mapped regionfor shared libraries

Run-time heap (via malloc)Run-time heap (via malloc)

Uninitialized data (.bss)Uninitialized data (.bss)

Initialized data (.data)Initialized data (.data)

Program text (.text)Program text (.text)

Kernel virtual memory

Processvirtualmemory

Different foreach process

Identical foreach process

%esp

brk

x08048000 (32)x40000000 (64)

Linux Virtual Memory System

Next

• Memory Mapping

• Suggested reading: 10.7, 10.8

33