Adapted from Computer Organization and Design, Patterson & Hennessy ECE232: Hardware Organization and Design Part 17: Input/Output Chapter 6 http://www.ecs.umass.edu/ece/ece232/

Upload: doreen-fitzgerald

Post on 18-Jan-2016


Page 1

Adapted from Computer Organization and Design, Patterson & Hennessy

ECE232: Hardware Organization and Design

Part 17: Input/Output

Chapter 6

http://www.ecs.umass.edu/ece/ece232/

Page 2

ECE232: I/O-1 2 Adapted from Computer Organization and Design, Patterson&Hennessy, Kundu,UMass Koren, 2011

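Anatomy: 5 components of any Computer

[Figure: the five components of a computer: Processor (Control and Datapath), Memory, and Devices (Input: keyboard, mouse; Output: display, printer; Disk). Below it, a Processor with its Cache connects over the Memory - I/O Bus to Main Memory and to three I/O Controllers, which serve two disks, graphics, and the network. The I/O controllers signal the processor through interrupts.]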

Page 3

Handling IO

Users like to connect devices to their computers
• Keyboard, mouse, printer…

External devices may require attention from the processor at unpredictable times
• The CPU doesn’t know when you’re about to hit a key

IO devices can be very fast or very slow

Need to have a flexible way to control all devices

Page 4

I/O Device Examples and Speeds

I/O Speed: bytes transferred per second (from mouse to display: million-to-1)

Device            Behavior   Partner   Data Rate (Mbit/sec)
Keyboard          Input      Human     0.0001
Mouse             Input      Human     0.0038
Laser Printer     Output     Human     3.2000
Magnetic Disk     Storage    Machine   240-2560
Modem             I or O     Machine   0.016-0.064
Network-LAN       I or O     Machine   100-1000
Graphics Display  Output     Human     800-8000

See Fig. 6.2 in the text

Page 5

Hardware Solution (875 Chipset)

[Figure: block diagram of the 875 chipset. A Pentium 4 processor connects over the system bus (800 MHz, 6.4 GB/sec) to the memory controller hub (north bridge, 82875P). The north bridge connects to the main memory DIMMs over two DDR 400 channels (3.2 GB/sec each), to the graphics output over AGP 8X (2.1 GB/sec), and to 1 Gbit Ethernet over CSA (0.266 GB/sec). A 266 MB/sec link joins it to the I/O controller hub (south bridge, 82801EB), which serves two Serial ATA disks (150 MB/sec each), two Parallel ATA channels (100 MB/sec each) for CD/DVD and tape, AC/97 stereo (surround-sound) audio (1 MB/sec), USB 2.0 (60 MB/sec), 10/100 Mbit Ethernet, and the PCI bus (132 MB/sec). One further link is labeled 20 MB/sec.]

Page 6

Disk Device Terminology

Several platters, with information recorded magnetically on both surfaces (usually)

Bits are recorded in tracks, which in turn are divided into sectors (e.g., 512 Bytes)

Actuator moves head (end of arm, 1/surface) over track (“seek”), selects the surface, waits for the sector to rotate under the head, then reads or writes
• “Cylinder”: all tracks under heads

[Figure: disk drive with labels Platter, Outer Track, Inner Track, Sector, Head, Arm, Actuator]

Page 7

Disk Device Performance

Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead

Seek Time - depends on no. of tracks the arm moves and the seek speed
Average no. of tracks the arm moves?
• Sum all possible seek distances from all possible tracks / total #
• Assumes average seek distance is random
• Disk industry standard benchmark

Rotation Time - depends on rotation speed and how far the sector is from the head
On average, 1/2 the time of a rotation
• Example: 7200 Revolutions Per Minute ⇒ 120 Rev/sec
• 1 revolution = 1/120 sec ⇒ 8.33 milliseconds
• 1/2 rotation (revolution) ⇒ 4.17 ms

Transfer Time - depends on data rate (bandwidth) of disk (bit density) and size of request
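The rotation-time example on this slide can be checked with a few lines of Python. This is only a sketch: the function name `avg_rotational_delay_ms` is made up for illustration, and it assumes (as the slide does) that the sector is on average half a revolution away from the head.

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay in ms, assuming half a revolution on average."""
    ms_per_revolution = 60_000.0 / rpm   # 60,000 ms per minute / revolutions per minute
    return 0.5 * ms_per_revolution


if __name__ == "__main__":
    for rpm in (7200, 10_000):
        print(f"{rpm} RPM: {avg_rotational_delay_ms(rpm):.2f} ms")
```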

Page 8

Disk Performance Model / Trends

Capacity
• +100%/year (2X/1 yr)

Transfer rate (BW)
• +40%/year (2X/2 yrs)

Rotation + Seek time
• -8%/year (1/2 in 10 yrs)

MB/$
• > 100%/yr (2X/<1.5 yr)
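The doubling and halving times quoted next to each growth rate follow from compound growth. A rough sketch, assuming the rates compound once per year (the helper name `years_to_scale` is hypothetical):

```python
import math


def years_to_scale(annual_rate: float, factor: float) -> float:
    """Years for a quantity changing by annual_rate per year to scale by factor."""
    return math.log(factor) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    print(f"capacity doubles in {years_to_scale(1.00, 2.0):.1f} years")
    print(f"transfer rate doubles in {years_to_scale(0.40, 2.0):.1f} years")
    print(f"rotation + seek time halves in {years_to_scale(-0.08, 0.5):.1f} years")
```

Under this assumption the results land close to the slide's rounded figures; the slide's "1/2 in 10 yrs" is a looser approximation than the others.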

Page 9

Disk Performance

Calculate time to read 1 sector (512B) for UltraStar 72 using advertised performance; sector is on outer track

Disk latency = average seek time + average rotational delay + transfer time + controller overhead
= 5.3 ms + 0.5 * 1/(10000 RPM) + 0.5 KB / (50 MB/s) + 0.15 ms
= 5.3 + 3.0 + 0.01 + 0.15 ms
= 8.46 ms
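The four terms of the latency formula can be evaluated directly. A minimal sketch using the slide's inputs; the function and parameter names are illustrative, and MB is taken as 10^6 bytes:

```python
def disk_latency_ms(seek_ms, rpm, request_bytes, transfer_mb_per_s, controller_ms):
    """Seek + average rotational delay + transfer + controller overhead, in ms."""
    rotation_ms = 0.5 * 60_000.0 / rpm                               # half a revolution
    transfer_ms = request_bytes / (transfer_mb_per_s * 1e6) * 1e3    # 512 B at 50 MB/s is about 0.01 ms
    return seek_ms + rotation_ms + transfer_ms + controller_ms


if __name__ == "__main__":
    total = disk_latency_ms(seek_ms=5.3, rpm=10_000, request_bytes=512,
                            transfer_mb_per_s=50, controller_ms=0.15)
    print(f"{total:.2f} ms")
```

Seek time and rotational delay dominate for a single sector; the transfer term only starts to matter for much larger requests.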

Page 10

Instruction Set Architecture for I/O

Some machines have special input and output instructions

Alternative model (used by MIPS):
• Input: ~ reads a sequence of bytes
• Output: ~ writes a sequence of bytes

Memory is also a sequence of bytes, so use loads for input, stores for output
• Called “Memory Mapped Input/Output”

A portion of the address space is dedicated to communication paths to Input or Output devices (no memory there)

These addresses are not regular memory; instead, they correspond to registers in I/O devices

[Figure: the address space from 0 to 0xFFFFFFFF, with the cmd reg. and data reg. located at 0xFFFF0000]

Page 11

Memory Mapped IO

Make control registers and I/O device data registers appear to be part of the system’s main memory
• Reads and writes to the mapped region of the memory are translated by memory controller hardware into accesses of the hardware device
• Makes it easy to support variable numbers/types of devices: just map them onto different regions of memory

Accessing I/O device registers and memory can be done by accessing data structures via the device pointers
• Most device drivers are now written in C/C++. Memory mapped I/O makes this feasible without any changes to the way a CPU is programmed
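The idea that the same load and store operations reach either memory or a device, depending only on the address, can be illustrated with a toy model. This is a sketch, not real driver code: the `Bus` and `Device` classes are invented for illustration, and the register addresses are assumptions based on the 0xFFFF0000 region shown in the earlier address-space figure.

```python
IO_BASE = 0xFFFF0000          # start of the device region (from the address-space figure)
CMD_REG = IO_BASE             # hypothetical command register address
DATA_REG = IO_BASE + 4        # hypothetical data register address


class Device:
    """Toy device with one command register and one data register."""

    def __init__(self):
        self.registers = {CMD_REG: 0, DATA_REG: 0}


class Bus:
    """Routes each load/store to RAM or to the device, based only on the address."""

    def __init__(self, device):
        self.ram = {}
        self.device = device

    def store(self, address, value):
        if address >= IO_BASE:
            self.device.registers[address] = value   # lands in a device register
        else:
            self.ram[address] = value                # ordinary memory write

    def load(self, address):
        if address >= IO_BASE:
            return self.device.registers[address]
        return self.ram.get(address, 0)


if __name__ == "__main__":
    bus = Bus(Device())
    bus.store(0x1000, 42)            # regular memory
    bus.store(DATA_REG, ord("A"))    # same operation, but it reaches the device
    print(bus.load(0x1000), bus.load(DATA_REG))
```

The program issuing the stores needs no special I/O instruction; the routing decision is made by the hardware (here, the `Bus` class).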

Page 12

Processor-I/O Speed Mismatch

A 1 GHz microprocessor can execute 1000 million load or store instructions per second, or a 4 million KB/s data rate
• I/O devices range from 0.01 KB/s to 30,000 KB/s

Input: device may not be ready to send data as fast as the processor loads it
• Also, might be waiting for a human to act

Output: device may not be ready to accept data as fast as the processor stores it

What to do?

Page 13

Processor Checks Status before Acting: Polling

Path to device generally has 2 registers:
• 1 register says it’s OK to read/write (I/O ready), often called Control Register
• 1 register that contains data, often called Data Register

Processor reads from Control Register in a loop, waiting for the device to set the Ready bit in the Control Register to say it’s OK (0 → 1)

Processor then loads from (input) or writes to (output) the Data Register
• Load from device/Store into Data Register resets the Ready bit (1 → 0) of the Control Register
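The polling protocol can be simulated in software. The sketch below models an input device only; the `InputDevice` class, the bit position of Ready, and the number of status reads before the device becomes ready are all assumptions made for illustration.

```python
READY = 0x1  # assumed position of the Ready bit in the control register


class InputDevice:
    """Toy input device with a control register and a data register."""

    def __init__(self, pending, polls_until_ready=3):
        self.pending = list(pending)        # values the device will deliver
        self.delay = polls_until_ready      # status reads before Ready goes 0 -> 1
        self.countdown = self.delay
        self.control = 0
        self.data = 0

    def read_control(self):
        if not (self.control & READY) and self.pending:
            self.countdown -= 1
            if self.countdown <= 0:
                self.data = self.pending.pop(0)
                self.control |= READY       # device sets Ready (0 -> 1)
        return self.control

    def read_data(self):
        value = self.data
        self.control &= ~READY              # loading the data resets Ready (1 -> 0)
        self.countdown = self.delay
        return value


def poll_read(device):
    """Spin on the control register until Ready is set, then load the data."""
    wasted_polls = 0
    while not (device.read_control() & READY):
        wasted_polls += 1
    return device.read_data(), wasted_polls


if __name__ == "__main__":
    keyboard = InputDevice([ord("h"), ord("i")])
    print(poll_read(keyboard))
    print(poll_read(keyboard))
```

Note that `poll_read` never returns if the device has nothing left to deliver, which is exactly the spin-waiting cost of polling.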

Page 14

Cost of Polling?

Assume: a 1 GHz processor takes 400 clock cycles for a polling operation (calling the polling routine, accessing the device, and returning). Determine the % of processor time spent polling
• Mouse: polled 30 times/sec so as not to miss user movement
• Hard disk: transfers data in 16-byte chunks and can transfer at 8 MB/second. No transfer can be missed

Mouse Polling Clocks/sec = 30 * 400 = 12,000 clocks/sec
% Processor for polling = (12*10^3) / (1*10^9) = 0.0012% ⇒ Polling mouse has little impact on processor

Times Polling Disk/sec = 8 MB/s / 16 B = 500K polls/sec
Disk Polling Clocks/sec = 500K * 400 = 200,000,000 clocks/sec
% Processor for polling:
• (2*10^8) / (1*10^9) = 20% ⇒ Unacceptable
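The two percentages can be reproduced with a short calculation. A sketch using the slide's numbers; the helper name `polling_overhead_pct` is made up, and MB is taken as 10^6 bytes:

```python
def polling_overhead_pct(polls_per_sec, cycles_per_poll, clock_hz):
    """Percentage of processor cycles spent polling."""
    return 100.0 * polls_per_sec * cycles_per_poll / clock_hz


CLOCK_HZ = 1e9          # 1 GHz processor
CYCLES_PER_POLL = 400   # cost of one polling operation

mouse_pct = polling_overhead_pct(30, CYCLES_PER_POLL, CLOCK_HZ)
disk_polls_per_sec = 8e6 / 16        # 8 MB/s delivered in 16-byte chunks
disk_pct = polling_overhead_pct(disk_polls_per_sec, CYCLES_PER_POLL, CLOCK_HZ)

print(f"mouse: {mouse_pct:.4f}%  disk: {disk_pct:.0f}%")
```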

Page 15

What is the alternative to polling? Interrupt

Wasteful to have the processor spend most of its time “spin-waiting” for I/O to be ready

Wish we could have an unplanned procedure call that would be invoked only when the I/O device is ready

Solution: use the exception mechanism to help I/O. Interrupt the program when I/O is ready, return when done with the data transfer

Polling is like picking up the phone every few seconds to see if you have a call. Interrupt is like letting the phone ring

Page 16

I/O Interrupt

Controller sends an interrupt to the processor along with additional information
• which device
• nature of interrupt: error, no paper, no ink,…

Processor halts execution of the current program ⇒ Saves State

Processor looks up which handler to start from the interrupt information

When the interrupt is handled, returns to the program state and resumes
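The sequence on this slide (save state, pick a handler from the interrupt information, run it, restore state) can be mimicked in software. This is a loose analogy rather than how hardware does it; the handler table, the state dictionary, and the addresses below are all hypothetical.

```python
import copy


def handle_interrupt(cpu_state, device, cause, handlers):
    """Save state, run the handler chosen from the interrupt info, then restore."""
    saved = copy.deepcopy(cpu_state)        # save the interrupted program's state
    handler = handlers[device]              # look up the handler for this device
    result = handler(cpu_state, cause)      # service the interrupt (may clobber state)
    cpu_state.clear()
    cpu_state.update(saved)                 # return to the saved program state
    return result


def printer_handler(cpu_state, cause):
    cpu_state["pc"] = 0x80000180            # made-up handler address
    cpu_state["regs"][0] = 99               # handler uses a register
    return f"printer: {cause}"


if __name__ == "__main__":
    handlers = {"printer": printer_handler}
    state = {"pc": 0x00400010, "regs": [0, 0, 0, 0]}
    print(handle_interrupt(state, "printer", "no paper", handlers))
    print(hex(state["pc"]), state["regs"])
```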

Page 17

Interrupt Driven Data Transfer

[Figure: memory holding a user program (add, sub, and, or, ...) and an interrupt service routine (read, store, ..., jr). Numbered steps: (1) I/O interrupt, (2) save PC, (3) interrupt service addr, followed by arrows numbered (4) and (5).]

Page 18

Benefit of Interrupt-Driven I/O

500 clock cycle overhead for each transfer, including interrupt. Find the % of processor consumed if the hard disk is only active 5% of the time

If interrupt rate = polling rate
• Disk Interrupts/sec = 8 MB/s / 16 B = 500K interrupts/sec
• Disk Interrupt Clocks/sec = 500K * 500 = 250,000,000 clocks/sec
• % Processor used during transfers: (250*10^6) / (1*10^9) = 25%

If disk active 5% ⇒ 5% * 25% ⇒ 1.25% busy
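The same arithmetic in executable form, as a sketch. It assumes the 1 GHz processor from the polling example and takes MB as 10^6 bytes; the variable names are illustrative.

```python
CLOCK_HZ = 1e9                   # 1 GHz processor, as in the polling example
CYCLES_PER_INTERRUPT = 500       # overhead per transfer, including the interrupt
interrupts_per_sec = 8e6 / 16    # 8 MB/s delivered in 16-byte chunks

# Share of the processor consumed while the disk is actually transferring.
busy_while_transferring_pct = 100.0 * interrupts_per_sec * CYCLES_PER_INTERRUPT / CLOCK_HZ

# With interrupts, the cost is only paid while the disk is active.
disk_active_fraction = 0.05
overall_pct = disk_active_fraction * busy_while_transferring_pct

print(f"during transfers: {busy_while_transferring_pct:.0f}%  overall: {overall_pct:.2f}%")
```

Per transfer, an interrupt costs more than a poll (500 vs. 400 cycles), but the processor pays nothing while the disk is idle, which is where the saving comes from.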

Page 19

Interrupts – Multiple devices

Aggregates interrupts

Prioritization (network, keyboard,..)

[Figure: Device 1, Device 2, ..., Device i, ..., Device n all feed an Advanced Programmable Interrupt Controller (APIC), which connects to the Processor]

Page 20

Interrupt vs. Polling

Which is better: Interrupts or Polling?
• Interrupts are better if the processor has something else to do and the time-to-response is not critical

Polling is better if the processor has to respond to an event ASAP
• Polling is also used when data is expected at regular intervals such as in a modem
• Modem typically connects to a “com” port
• The “com” port can be polled at expected intervals

Page 21

Direct Memory Access (DMA)

How to transfer large amounts of data between a Device and Memory?

Waste of CPU cycles if done through the CPU

Let the device controller transfer data directly to and from memory => DMA

The CPU sets up the DMA transfer by supplying the type of operation, the memory address, and the number of bytes to be transferred

The DMA controller contacts the bus directly, provides the memory address, and transfers the data

Once the DMA transfer is complete, the controller interrupts the CPU to signal completion

Cycle Stealing: the bus gives priority to the DMA controller, thus stealing cycles from the CPU
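The division of labor in a DMA transfer can be sketched as a toy model. Everything here is illustrative: the `DMAController` class, the operation names, and the callback standing in for the completion interrupt are assumptions, and the copy runs synchronously, whereas a real controller works in parallel with the CPU.

```python
class DMAController:
    """Toy DMA controller: copies a block into or out of memory, then signals completion."""

    def __init__(self, memory, on_complete):
        self.memory = memory              # bytearray standing in for main memory
        self.on_complete = on_complete    # stands in for the completion interrupt

    def start(self, operation, address, count, device_buffer):
        """The CPU supplies the operation type, memory address, and byte count."""
        if count > len(device_buffer) or address + count > len(self.memory):
            raise ValueError("transfer does not fit")
        if operation == "device_to_memory":
            self.memory[address:address + count] = device_buffer[:count]
        elif operation == "memory_to_device":
            device_buffer[:count] = self.memory[address:address + count]
        else:
            raise ValueError(f"unknown operation: {operation}")
        self.on_complete(count)           # "interrupt" the CPU once the transfer is done


if __name__ == "__main__":
    memory = bytearray(16)
    completed = []
    dma = DMAController(memory, completed.append)
    dma.start("device_to_memory", address=4, count=3, device_buffer=bytearray(b"abc"))
    print(memory, completed)
```

The CPU's involvement is limited to the call to `start` and the completion callback; it never touches the individual bytes.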

Page 22

OS control of I/O operations

Low-level control of an I/O device is complex because it requires managing a set of concurrent events and because requirements for correct device control are often very detailed

I/O systems often use interrupts to communicate information about I/O operations, and these can occur at random times

The I/O system is shared by multiple programs using the processor

Would like I/O services for all user programs under safe control