last time: p6 (ia32) address translation - systems group @ … ·  · 2014-06-24versus > ] u w...

8
Lecture 20: Devices and I/O Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 © Systems Group | Department of Computer Science | ETH Zürich Last time: P6 (ia32) Address Translation CPU VPN VPO 20 12 TLBT TLBI 4 16 virtual address (VA) ... TLB (16 sets, 4 entries/set) VPN1 1 VPN2 10 10 PDE PTE PDBR PPN PPO 20 12 N 0 N Page tables TLB miss TLB hit P 1 physical address (PA) result 32 ... CT CO 20 5 C CI 7 L2 and DRAM L1 (128 sets, 4 lines/set) L1 hit L1 miss PTE VPN Last time: VAX translation Addr VPN Addr VPO 21 9 0 0 0 x Addr VPN 0 0 0 0 PxTB 1 1 1 0 0 0 0 PTE VPO PTE VPN 1 1 1 0 0 0 0 0 SPTB 1 1 1 0 0 0 0 PTE VPO PTE PFN 1 1 1 0 Load TE VP Addr PFO Addr PFN 1 1 1 0 Load dr P Load і sŝƌƚƵĂů ĂĚĚƌĞƐƐ ƌĞƋƵĞƐƚĞĚ (in seg. Px: user space) і sŝƌƚƵĂů ĂĚĚƌĞƐƐ ŽĨ Wd (in seg. S0: system space) і WŚLJƐŝĐĂů ĂĚĚƌĞƐƐ ŽĨ Wd mapping the PTE we want і WŚLJƐŝĐĂů ĂĚĚƌĞƐƐ ŽĨ Wd we want і dŚĞ Wd ƉŚLJƐŝĐĂů ĂĚĚƌĞƐƐ ŽĨ ǀĂůƵĞ ǁĞ ǁĂŶƚ і DĞŵŽƌLJ ǀĂůƵĞ ;Ăƚ ůĂƐƚͿ Data Las time: Large Pages ϰD ŽŶ ϯϮ-ďŝƚ ϮD ŽŶ ϲϰ-bit ^ŝŵƉůŝĨLJ ĂĚĚƌĞƐƐ ƚƌĂŶƐůĂƚŝŽŶ hƐĞĨƵů ĨŽƌ ƉƌŽŐƌĂŵƐ ǁŝƚŚ ǀĞƌLJ ůĂƌŐĞ ĐŽŶƚŝŐƵŽƵƐ ǁŽƌŬŝŶŐ ƐĞƚƐ Reduces compulsory TLB misses How to use (Linux) hugetlbfs support (since at least 2.6.16) Use libhugetlbs {m,c,re}alloc replacements VPN VPO 20 12 PPN PPO 20 12 VPN VPO 10 22 PPN PPO 10 22 versus >ĂƐƚ ƚŝŵĞ sD ƵĨĨĞƌŝŶŐ KŶĞ ŝƚĞƌĂƚŝŽŶ ŽĨ Ă ŵĂƚƌŝdž-matrix multiply: ŽŶƐĞƋƵĞŶĐĞ ĂĐŚ ƌŽǁ ŝƐ ŽŶ ĚŝĨĨĞƌĞŶƚ ƉĂŐĞ DŽƌĞ ƌŽǁƐ ƚŚĂŶ d> ĞŶƚƌŝĞƐ d> ƚŚƌĂƐŚŝŶŐ Solution: ďƵĨĨĞƌŝŶŐ с ĐŽƉLJ ďůŽĐŬ ƚŽ ĐŽŶƚŝŐƵŽƵƐ ŵĞŵŽƌLJ O(B 2 Ϳ ĐŽƐƚ ĨŽƌ K; 3 ) operations a b * c с c + ĂƐƐƵŵĞ х ϰ < с ϱϭϮ ĚŽƵďůĞƐ ďůŽĐŬƐŝnjĞ с ϭϱϬ each row used O(B) times but every time O(B 2 ) ops between Today: I/O Devices What is a device? Registers Example: NS16550 UART Interrupts ŝƌĞĐƚ DĞŵŽƌLJ ĐĐĞƐƐ ;DͿ PCI (Peripheral Component Interconnect) Summary

Upload: vandien

Post on 02-Apr-2018

223 views

Category:

Documents


4 download

TRANSCRIPT

Lecture 20: Devices and I/O Computer Architecture and

Systems Programming (252-0061-00)

Timothy Roscoe

Herbstsemester 2012

© Systems Group | Department of Computer Science | ETH Zürich

Last time: P6 (ia32) Address Translation

CPU

VPN VPO 20 12

TLBT TLBI 4 16

virtual address (VA)

...

TLB (16 sets, 4 entries/set) VPN1 1 VPN2

10 10

PDE PTE

PDBR

PPN PPO 20 12

N0

N

Page tables

TLB miss

TLB hit

P1

physical address (PA)

result 32

...

CT CO 20 5

CCI 7

L2 and DRAM

L1 (128 sets, 4 lines/set)

L1 hit

L1 miss

PTE VPN

Last time: VAX translation

Addr VPN Addr VPO

21 9

0 00x

Addr VPN 0 000 PxTB 1 110 000

PTE VPO PTE VPN 1 110

0 000 SPTB 1 110 000

PTE VPO PTE PFN 1 110

Load

TE VP

Addr PFO Addr PFN 1 110

Load

dr P

Load

(in seg. Px: user space)

(in seg. S0: system space)

mapping the PTE we want

we want

Data

Las time: Large Pages

• - -bit

• – Reduces compulsory TLB misses

• How to use (Linux) – hugetlbfs support (since at least 2.6.16) – Use libhugetlbs

• {m,c,re}alloc replacements

VPN VPO

20 12

PPN PPO

20 12

VPN VPO

10 22

PPN PPO

10 22

versus

• -matrix multiply:

• – – – Solution:

• O(B2 3) operations

a b*

c

c+

each row used O(B) times but every time O(B2) ops between

Today: I/O Devices

• What is a device?

• Registers

– Example: NS16550 UART

• Interrupts

• PCI (Peripheral Component Interconnect)

• Summary

What is a device?

• Occupies some location on a bus

• registers

• interrupts

Today: I/O Devices

• What is a device?

• Registers

– Example: NS16550 UART

• Interrupts

• PCI (Peripheral Component Interconnect)

• Summary

Registers

– Read input data

• CPU can store to device registers:

– Write output data

– Reset states

Example: National Semiconductor ns16550 UART

• Universal Asynchronous Receiver/Transmitter – RS-232 serial line

• – Chip now integrated into

Southbridge on all PCs – – Rarely seen on laptops any

more • First device driver

OS…

Registers

•given in chip “datasheets” or “data

•trusted by OS programmers

From the data

ns16550 UART

Addressing registers

1. – Registers appear as memory locations – Access using loads/stores (movb/movw/movl/movq)

2. “I/O instructions”: – – – Special instructions: inb, outb, etc.

3. Indirection: –

with the actual register value – Used to save scarce address space (usually I/O space)

Addr. Name Description Notes

0 RBR (read only)

0 THR Transmit Holding Register (write only)

1 IER Interrupt Enable Register

2 IIR

2 FCR FIFO Control Register (write only)

3 LCR Line Control Register

4 Control Register

5 LSR Line Status Register

6

7 SCR Scratch Register

0 DLL Divisor Latch (LSB)

1

DLAB = bit 7 of the LCR register

ns16550 LSR: Line Status Register

DR OE PE FE BI THRE TEMT ERRF

0 7

Error in Receiver FIFO

Transmitter Empty

Transmit Holding Register Empty

Framing Error

Parity Error

Overrun Error

Data Ready

Very simple UART driver #define UART_BASE 0x3f8 #define UART_THR (UART_BASE + 0) #define UART_RBR (UART_BASE + 0) #define UART_LSR (UART_BASE + 5) void serial_putc(char c) { // Wait until FIFO can hold more chars while( (inb(UART_LSR) & 0x20)== 0); // Write character to FIFO outb(UART_THR, c); } char serial_getc() { // Wait until there is a char to read while( (inb(UART_LSR) & 0x01) == 0); // Read from the receive FIFO return inb(UART_RBR); }

Register addresses

Send a character (wait

Read a character (spin waiting until one is

there to read)

Very simple UART driver

• –

• • Uses Programmed I/O (PIO)

– – All data must pass through CPU registers

• Uses polling –

• – Can’t do anything else in the meantime

• Although could be extended with threads and care… – Without CPU polling, no I/O can occur

Registers are not memory

Device registers don’t behave like RAM!

– Status words

– Incoming data

• Writes to registers are used to trigger actions

– Sending data

– Resetting state machines

Dealing with caches

• – Register value changes cache becomes inconsistent

• Write- –

• Reads and writes cannot be combined into cache lines – – Line- – Even spurious reads trigger device state changes

Device register access must bypass the cache – PTEs – I/O space access isn’t cached anyway

Other challenges

1. How to avoid polling all the time?

2. How to avoid the CPU copying all the data?

–caches?

– Can the CPU get on with something else?

3.

– How are the physical addresses allocated?

Today: I/O Devices

• What is a device?

• Registers

– Example: NS16550 UART

• Interrupts

• PCI (Peripheral Component Interconnect)

• Summary

Avoiding polling

• Solution: device can interrupt the processor

–address

– vector

Interrupts

• CPU Interrupt-request line triggered by I/O device – - or level-triggered

• Interrupt handler receives interrupts

• Maskable to ignore or delay some interrupts

• Interrupt vector to dispatch interrupt to correct handler – Based on priority – Some nonmaskable

0 Divide error

1 Debug exception

2 Non-maskable interrupt (NMI)

3

4

5 Bound

6 Invalid opcode

7 Coprocessor not available

9 Coprocessor

A

B Segment not present

C

D

E Page Fault

F Reserved

10

11

12

13

14-1F Reserved to Intel

20-FF Available for external interrupts

Programmable Interrupt Controllers

INTA

INTR

NMI

D0 … D7

Device 1

Device 2

Device n

R

PIC

IRQ0

IRQ1

IRQ<n-1>

Today: I/O Devices

• What is a device?

• Registers

– Example: NS16550 UART

• Interrupts

• PCI (Peripheral Component Interconnect)

• Summary

• Avoid programmed I/O

• DMA controller

– Generally built-in these days

•device and memory

– Can save memory bandwidth

Old-

CPU

Cache

controller

controller

PCI bus

Frontside (memory) bus

1.

controller

2.

3.

4.byte to address A, increments A, decrements S.

5.

interrupts CPU to

complete

• Decoupling

– Doesn’t pollute CPU cache

– Can be processed when the CPU (or OS) decides

– Higher- •

• Possible disadvantages:

– Generally not a problem (even the UARTs

• inconsistent with CPU caches

• Options:

1. -cacheable large hit – probably wants to process data anyway

2.

3.

• physical – Appear on external bus

• User and OS code deal with virtual addresses (mostly)

• OS (and device drivers) must manually translate virtual

– – not be contiguous in

physical address space – Scatter-gather

• – – – –

Today: I/O Devices

• What is a device?

• Registers

– Example: NS16550 UART

• Interrupts

• PCI (Peripheral Component Interconnect)

• Summary

Where are all the registers?

• Where are the device registers in the physical address space?

– And which interrupt vectors correspond to them?

• Solution: built into the I/O bus design

– Example: PCI

PCI is…

Peripheral Component Interconnect

• – -X, PCI-Express, etc.

••

devices

• -

PCIe has succeeded PCI, but extends the same software-visible interface

PCI tries to solve many problems:

• Device discovery

– Finding out which devices are in the system

• Address allocation

– Which addresses should each device’s registers appear at?

• Interrupt routing

–which exception vectors?

Physical connections: PCI is a tree

CPU

PCI root bridge

PCI ro

USB

PCI-PCI bridge

SCSI Graphics CI-P

ridg

ridg

Sound

NIC

Wireless

• • Also:

• PCI-ISA • PCI-USB • PCI-SCSI • Etc.

PCI address space is flat

– Physical address space (32-bit or 64-bit)

– I/O address space (usually 16-bit)

• Bridges up the tree remap addresses to the device

• Result:

– • In memory space

-describing

• – Accessed through parent bridge, initially

• Plus: – – Interrupts – – Etc.

Bits Description

16 ID ( Intel, 3Com, NVidia, etc.

16

24

Finding all the devices

• Find the PCI “root complex” bridge – – Large PCs can have more than one

• – – – recurse

• Result is: –

Allocating addresses

– –

bridge’s range – Each bridge has a range which includes all it’s “children”. – Each range is aligned to some power- -2 boundary

• Then program: – – Each device with “base-address/range” (BAR) registers

PCI Interrupts

• Four interrupt lines – INTA, INTB, INTC, INTD… – – Translated by root bridge into system interrupt

• PCI Express introduces – -signalled interrupts – – Translated by root bridge into system interrupt – Interrupts can be individually steered to particular

cores/APICs

• PCI allows Bus Mastering

– Device can issue read/write transactions to anywhere in memory

– Even (in some cases) other PCI devices

– Principle applies: device

Today: I/O Devices

• What is a device?

• Registers

– Example: NS16550 UART

• Interrupts

• PCI (Peripheral Component Interconnect)

• Summary

Intelligent devices

• Devices can now autonomously access any:

– Other devices

• CPU Device interaction

– Extensive in-

Example: Intel e1000 PCI-Express Ethernet card

• • • • • in hardware • etc.

Summary

• Devices and the CPU communicate via:

– Interrupts and interrupt vectors

• I/O Interconnects (such as PCI):

– Allow devices to share/allocate physical addresses

– Allocate interrupts

– Permit bus mastering direct memory access