Adapted from Computer Organization and Design, Patterson & Hennessy
ECE232: Hardware Organization and Design
Part 17: Input/Output
Chapter 6
http://www.ecs.umass.edu/ece/ece232/
ECE232: I/O-1 2 Adapted from Computer Organization and Design, Patterson&Hennessy, Kundu,UMass Koren, 2011
Anatomy: 5 components of any Computer
[Figure: the five components are Processor (Control, Datapath), Memory, and Devices (Input: keyboard, mouse; Output: display, printer), with disk as a further device. The processor and its cache share the memory-I/O bus with main memory and several I/O controllers (disks, graphics, network), which send interrupts to the processor.]
Handling IO
Users like to connect devices to their computers
• Keyboard, mouse, printer…
External devices may require attention from the processor at unpredictable times
• The CPU doesn’t know when you’re about to hit a key
I/O devices can be very fast or very slow
Need a flexible way to control all devices
I/O Device Examples and Speeds
I/O Speed: bytes transferred per second (from mouse to display: million-to-1)

Device             Behavior   Partner   Data Rate (Mbit/sec)
Keyboard           Input      Human     0.0001
Mouse              Input      Human     0.0038
Laser Printer      Output     Human     3.2000
Magnetic Disk      Storage    Machine   240-2560
Modem              I or O     Machine   0.016-0.064
Network-LAN        I or O     Machine   100-1000
Graphics Display   Output     Human     800-8000

See Fig. 6.2 in the text
Hardware Solution (875 Chipset)
[Figure: a Pentium 4 processor connects over the system bus (800 MHz, 6.4 GB/sec) to the memory controller hub (north bridge, 82875P). The north bridge serves the main memory DIMMs (two DDR 400 channels, 3.2 GB/sec each), graphics output over AGP 8X (2.1 GB/sec), and 1 Gbit Ethernet over CSA (0.266 GB/sec). A 266 MB/sec link joins it to the I/O controller hub (south bridge, 82801EB), which serves two Serial ATA disks (150 MB/sec each), two Parallel ATA channels (100 MB/sec each) for CD/DVD and tape, AC/97 stereo (surround-sound) audio (1 MB/sec), USB 2.0 (60 MB/sec), 10/100 Mbit Ethernet (20 MB/sec), and the PCI bus (132 MB/sec).]
Disk Device Terminology
Several platters, with information recorded magnetically on both surfaces (usually)
Bits are recorded in tracks, which in turn are divided into sectors (e.g., 512 bytes)
The actuator moves the head (at the end of the arm, one per surface) over the track (“seek”), selects the surface, waits for the sector to rotate under the head, then reads or writes
• “Cylinder”: all tracks under the heads
[Figure: disk with the platter, outer track, inner track, sector, head, arm, and actuator labeled.]
Disk Device Performance
Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead
Seek Time - depends on the number of tracks the arm moves and the seek speed
Average number of tracks the arm moves?
• Sum all possible seek distances from all possible tracks / total number
• Assumes the seek distance is random
• Disk industry standard benchmark
Rotation Time - depends on rotation speed and how far the sector is from the head
On average, 1/2 the time of a rotation
• Example: 7200 revolutions per minute ⇒ 120 rev/sec
• 1 revolution = 1/120 sec ⇒ 8.33 milliseconds
• 1/2 rotation (revolution) ⇒ about 4.17 ms
Transfer Time - depends on the data rate (bandwidth) of the disk (bit density) and the size of the request
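The rotation-time arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `avg_rotational_delay_ms` is invented for illustration, and it assumes, as the slide does, that the average wait is half a revolution.

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay in milliseconds, taken as half of one revolution."""
    ms_per_revolution = 60_000 / rpm  # one minute is 60,000 ms
    return ms_per_revolution / 2


# A few common spindle speeds, including the slide's 7200 RPM example
for rpm in (5400, 7200, 10000):
    print(f"{rpm} RPM: {avg_rotational_delay_ms(rpm):.2f} ms")
```

For 7200 RPM this gives roughly 4.17 ms, matching the half-rotation figure worked out by hand.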
Disk Performance Model / Trends
Capacity
• +100%/year (2X / 1 yr)
Transfer rate (BW)
• +40%/year (2X / 2 yrs)
Rotation + Seek time
• -8%/year (1/2 in 10 yrs)
MB/$
• >100%/year (2X / <1.5 yr)
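Each trend pairs an annual rate with a doubling (or halving) time; the two are linked by compound growth. The sketch below, using a made-up helper `years_to_scale`, shows the relation. It assumes a constant yearly rate, so the slide's parenthetical figures should be read as rounded rules of thumb rather than exact results.

```python
import math


def years_to_scale(annual_change, factor):
    """Years for a quantity changing by `annual_change` per year
    (0.40 means +40%, -0.08 means -8%) to be multiplied by `factor`."""
    return math.log(factor) / math.log(1 + annual_change)


print(f"capacity doubles in      {years_to_scale(1.00, 2):.1f} years")
print(f"transfer rate doubles in {years_to_scale(0.40, 2):.1f} years")
print(f"rotation+seek halves in  {years_to_scale(-0.08, 0.5):.1f} years")
```

Under this model +100%/year doubles in exactly one year and +40%/year in about two, while -8%/year halves in a little over eight years, the same order as the slide's ten.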
Disk Performance
Calculate the time to read 1 sector (512 B) for the UltraStar 72 using advertised performance; the sector is on an outer track
Disk latency = average seek time + average rotational delay + transfer time + controller overhead
= 5.3 ms + 0.5 * 1/(10000 RPM) + 0.5 KB / (50 MB/s) + 0.15 ms
= 5.3 + 3.0 + 0.01 + 0.15 ms = 8.46 ms
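The same calculation as a small Python sketch, so each term can be inspected. The function `disk_latency_ms` and its parameter names are illustrative, not from the text; it assumes decimal units (1 MB/s = 1 KB per millisecond) and an average rotational delay of half a revolution.

```python
def disk_latency_ms(seek_ms, rpm, request_kb, transfer_mb_per_s, controller_ms):
    """Return (rotation_ms, transfer_ms, total_ms) for one disk request."""
    rotation_ms = 0.5 * 60_000 / rpm              # half a revolution, in ms
    transfer_ms = request_kb / transfer_mb_per_s  # 1 MB/s moves 1 KB per ms
    total_ms = seek_ms + rotation_ms + transfer_ms + controller_ms
    return rotation_ms, transfer_ms, total_ms


# Inputs from the UltraStar 72 example: 5.3 ms seek, 10000 RPM,
# 0.5 KB request, 50 MB/s transfer rate, 0.15 ms controller overhead
rotation, transfer, total = disk_latency_ms(5.3, 10000, 0.5, 50, 0.15)
print(f"rotation {rotation:.2f} ms, transfer {transfer:.2f} ms, total {total:.2f} ms")
```

With the slide's inputs the transfer term comes to 0.01 ms and the total to about 8.46 ms; seek and rotation dominate, which is why a single-sector read costs nearly as much as a much larger one.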
Instruction Set Architecture for I/O
Some machines have special input and output instructions
Alternative model (used by MIPS):
• Input: ~ reads a sequence of bytes
• Output: ~ writes a sequence of bytes
Memory is also a sequence of bytes, so use loads for input, stores for output
• Called “Memory Mapped Input/Output”
A portion of the address space is dedicated to communication paths to input or output devices (no memory there)
These addresses are not regular memory; instead, they correspond to registers in I/O devices
[Figure: address space running from 0 to 0xFFFFFFFF, with a device's cmd reg. and data reg. mapped at 0xFFFF0000.]
Memory Mapped IO
Make control registers and I/O device data registers appear to be part of the system’s main memory
• Reads and writes to the mapped region of memory are translated by memory controller hardware into accesses of the hardware device
• Makes it easy to support variable numbers/types of devices: just map them onto different regions of memory
Accessing I/O device registers and memory can be done by accessing data structures via device pointers
• Most device drivers are now written in C/C++. Memory mapped I/O makes this feasible without any changes to the way a CPU is programmed
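To make the address decoding concrete, here is a toy Python model of a bus that routes loads and stores either to RAM or to device registers. Everything in it is an assumption for illustration: the `Bus` and `Device` classes are invented, and placing the data register one word after the command register at 0xFFFF0000 is a guess at the layout, not something the slides specify.

```python
IO_BASE = 0xFFFF0000    # start of the I/O region in the slide's address map
CMD_REG = IO_BASE       # assumed address of the command register
DATA_REG = IO_BASE + 4  # assumed: data register in the next word


class Device:
    """Hypothetical device exposing a command and a data register."""

    def __init__(self):
        self.cmd = 0
        self.data = 0


class Bus:
    """Routes each address to RAM or to a device register."""

    def __init__(self, device):
        self.ram = {}  # sparse model of ordinary memory
        self.device = device

    def load(self, addr):
        if addr == CMD_REG:
            return self.device.cmd
        if addr == DATA_REG:
            return self.device.data
        return self.ram.get(addr, 0)

    def store(self, addr, value):
        if addr == CMD_REG:
            self.device.cmd = value
        elif addr == DATA_REG:
            self.device.data = value
        else:
            self.ram[addr] = value


bus = Bus(Device())
bus.store(0x1000, 7)     # ordinary store: lands in RAM
bus.store(DATA_REG, 65)  # same operation, but it reaches the device
```

The program issues the same `store` in both cases; only the address decides whether memory or the device sees it, which is the point of the memory-mapped model.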
Processor-I/O Speed Mismatch
A 1 GHz microprocessor can execute 1000 million load or store instructions per second, or a 4 million KB/s data rate at 4 bytes per instruction
• I/O devices range from 0.01 KB/s to 30,000 KB/s
Input: the device may not be ready to send data as fast as the processor loads it
• Also, it might be waiting for a human to act
Output: the device may not be ready to accept data as fast as the processor stores it
What to do?
Processor Checks Status before Acting: Polling
The path to a device generally has 2 registers:
• 1 register says it’s OK to read/write (I/O ready), often called the Control Register
• 1 register that contains data, often called the Data Register
The processor reads from the Control Register in a loop, waiting for the device to set the Ready bit in the Control Register to say it’s OK (0 → 1)
The processor then loads from (input) or writes to (output) the Data Register
• A load from the device / store into the Data Register resets the Ready bit (1 → 0) of the Control Register
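The polling protocol can be sketched in Python with a simulated input device. The `InputDevice` class, its `polls_until_ready` delay, and the `poll_read` helper are all hypothetical stand-ins; a real driver would read hardware registers instead of calling methods.

```python
READY = 1  # assumed position of the Ready bit in the control register


class InputDevice:
    """Simulated input device with a control register and a data register."""

    def __init__(self, pending, polls_until_ready=3):
        self._pending = list(pending)    # bytes the device will deliver
        self._delay = polls_until_ready  # how slow the device is
        self._countdown = polls_until_ready
        self._ready = 0
        self._data = 0

    def read_control(self):
        # Device side: after enough time has passed, latch the next byte
        # and set Ready (0 -> 1).
        if not self._ready and self._pending:
            self._countdown -= 1
            if self._countdown <= 0:
                self._data = self._pending.pop(0)
                self._ready = READY
        return self._ready

    def read_data(self):
        # Loading the data register resets Ready (1 -> 0).
        self._ready = 0
        self._countdown = self._delay
        return self._data


def poll_read(device):
    """Spin on the control register, then load the data register."""
    polls = 0
    while True:
        polls += 1
        if device.read_control() & READY:
            break
    return device.read_data(), polls


dev = InputDevice([72, 105])
print(poll_read(dev))  # first byte, plus how many polls it took
```

The returned poll count is the work the processor spent doing nothing useful, which is what the next slide puts a price on.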
Cost of Polling?
Assume: a 1 GHz processor takes 400 clock cycles for a polling operation (calling the polling routine, accessing the device, and returning). Determine the % of processor time for polling
• Mouse: polled 30 times/sec so as not to miss user movement
• Hard disk: transfers data in 16-byte chunks and can transfer at 8 MB/second. No transfer can be missed
Mouse polling clocks/sec = 30 * 400 = 12,000 clocks/sec
% Processor for polling = 12*10^3 / (1*10^9) = 0.0012%
⇒ Polling the mouse has little impact on the processor
Times polling disk/sec = 8 MB/s / 16 B = 500K polls/sec
Disk polling clocks/sec = 500K * 400 = 200,000,000 clocks/sec
% Processor for polling = 2*10^8 / (1*10^9) = 20%
⇒ Unacceptable
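The two cases reduce to one formula, sketched here in Python. The helper name `polling_overhead` is made up, and 8 MB is taken as 8,000,000 bytes, which is what the slide's 500K polls/sec implies.

```python
def polling_overhead(polls_per_sec, cycles_per_poll=400, clock_hz=1_000_000_000):
    """Fraction of processor time spent polling."""
    return polls_per_sec * cycles_per_poll / clock_hz


mouse = polling_overhead(30)             # 30 polls/sec
disk = polling_overhead(8_000_000 / 16)  # one poll per 16-byte chunk at 8 MB/s

print(f"mouse: {mouse:.4%}")
print(f"disk:  {disk:.0%}")
```

The overhead scales linearly with the polling rate, so a device that must be checked 500K times a second costs far more than one checked 30 times a second, even though each poll is equally cheap.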
What is the alternative to polling? Interrupt
It is wasteful to have the processor spend most of its time “spin-waiting” for I/O to be ready
Wish we could have an unplanned procedure call that would be invoked only when the I/O device is ready
Solution: use the exception mechanism to help I/O. Interrupt the program when I/O is ready, return when done with the data transfer
Polling is like picking up the phone every few seconds to see if you have a call. An interrupt is like letting the phone ring
I/O Interrupt
The controller sends an interrupt to the processor along with additional information
• which device
• nature of the interrupt: error, no paper, no ink, …
The processor halts execution of the current program
Saves state
The processor looks up which handler to start from the interrupt information
When the interrupt is handled, it returns to the program state and resumes
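The sequence above (save state, look up the handler, run it, restore and resume) can be modeled in a few lines of Python. This is a software analogy only: the dictionary standing in for CPU state, the handler table, and the names `service_interrupt` and `printer_handler` are all invented for illustration.

```python
def service_interrupt(cpu, handlers, device, cause):
    """Save state, dispatch to the device's handler, then restore state."""
    saved = {"pc": cpu["pc"], "regs": list(cpu["regs"])}  # save state
    handler = handlers[device]  # look up the handler from the interrupt info
    handler(cpu, cause)         # the handler may overwrite pc and registers
    cpu["pc"] = saved["pc"]     # restore state and resume the program
    cpu["regs"] = saved["regs"]


def printer_handler(cpu, cause):
    cpu["regs"][0] = 0xDEAD  # clobber a register to show why saving matters
    cpu["log"].append(("printer", cause))


cpu = {"pc": 0x400100, "regs": [1, 2, 3], "log": []}
service_interrupt(cpu, {"printer": printer_handler}, "printer", "no paper")
```

After the call the program counter and registers are exactly as they were, so the interrupted program cannot tell that the handler ran.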
Interrupt Driven Data Transfer
[Figure: memory holds a user program (add, sub, and, or) and an interrupt service routine (read, store, ..., jr). Steps marked: (1) I/O interrupt, (2) save PC, (3) interrupt service addr, (4), (5).]
Benefit of Interrupt-Driven I/O
Assume a 500 clock cycle overhead for each transfer, including the interrupt. Find the % of processor consumed if the hard disk is only active 5% of the time
If interrupt rate = polling rate
• Disk interrupts/sec = 8 MB/s / 16 B = 500K interrupts/sec
• Disk interrupt clocks/sec = 500K * 500 = 250,000,000 clocks/sec
• % Processor used during transfers: 250*10^6 / (1*10^9) = 25%
If the disk is active 5% of the time ⇒ 5% * 25% = 1.25% busy
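The interrupt-driven case can be computed the same way. In this sketch the function `interrupt_overhead` is a made-up name, the 1 GHz clock is carried over from the polling example, and 8 MB is again taken as 8,000,000 bytes.

```python
def interrupt_overhead(bytes_per_sec, bytes_per_transfer, cycles_per_transfer,
                       clock_hz, active_fraction):
    """Return (fraction busy while the disk transfers, overall fraction busy)."""
    transfers_per_sec = bytes_per_sec / bytes_per_transfer
    busy_while_active = transfers_per_sec * cycles_per_transfer / clock_hz
    return busy_while_active, busy_while_active * active_fraction


during, overall = interrupt_overhead(
    bytes_per_sec=8_000_000,
    bytes_per_transfer=16,
    cycles_per_transfer=500,
    clock_hz=1_000_000_000,
    active_fraction=0.05,
)
print(f"during transfers: {during:.0%}, overall: {overall:.2%}")
```

Each interrupt costs more than a poll (500 cycles versus 400), yet the overall load is far lower because the processor pays only while the disk is actually transferring.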
Interrupts – Multiple devices
[Figure: Device 1, Device 2, ..., Device i, ..., Device n connect to an Advanced Programmable Interrupt Controller (APIC), which aggregates their interrupts, prioritizes them (network, keyboard, ...), and forwards them to the processor.]
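Aggregation and prioritization can be illustrated with a priority queue. The following Python sketch is a software analogy, not a description of real APIC hardware: the `InterruptController` class is invented, and the priority numbers (lower means more urgent, network ahead of keyboard) are assumptions.

```python
import heapq


class InterruptController:
    """Collects pending interrupts and hands out the most urgent one first."""

    def __init__(self, priorities):
        self._priorities = priorities  # device name -> priority, lower is more urgent
        self._pending = []             # heap of (priority, arrival order, device)
        self._seq = 0

    def raise_irq(self, device):
        # The arrival counter keeps equal-priority requests in arrival order.
        heapq.heappush(self._pending, (self._priorities[device], self._seq, device))
        self._seq += 1

    def next_irq(self):
        """Return the most urgent pending device, or None if nothing is pending."""
        if not self._pending:
            return None
        return heapq.heappop(self._pending)[2]


ctrl = InterruptController({"network": 0, "keyboard": 1, "printer": 2})
for dev in ("keyboard", "printer", "network"):
    ctrl.raise_irq(dev)
order = [ctrl.next_irq() for _ in range(3)]
```

Here `order` comes out as network, keyboard, printer regardless of the order in which the requests arrived.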
Interrupt vs. Polling
Which is better: Interrupts or Polling?
• Interrupts are better if the processor has something else to do and the time-to-response is not critical
Polling is better if the processor has to respond to an event ASAP
• Polling is also used when data is expected at regular intervals, such as in a modem
• A modem typically connects to a “com” port
• The “com” port can be polled at expected intervals
Direct Memory Access (DMA)
How to transfer large amounts of data between a device and memory?
It is a waste of CPU cycles if done through the CPU
Let the device controller transfer data directly to and from memory => DMA
The CPU sets up the DMA transfer by supplying the type of operation, the memory address, and the number of bytes to be transferred
The DMA controller contacts the bus directly, provides the memory address, and transfers the data
Once the DMA transfer is complete, the controller interrupts the CPU to signal completion
Cycle Stealing: the bus gives priority to the DMA controller, thus stealing cycles from the CPU
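The division of labor in a DMA transfer can be sketched in Python. This is a toy model under stated assumptions: the `DMAController` class, the operation names `"to_memory"` and `"from_memory"`, and the callback standing in for the completion interrupt are all hypothetical, and bus arbitration (cycle stealing) is not modeled.

```python
class DMAController:
    """Toy DMA engine: moves bytes between a device buffer and memory."""

    def __init__(self, memory, on_complete):
        self.memory = memory            # bytearray standing in for main memory
        self.on_complete = on_complete  # callback standing in for the interrupt

    def start(self, op, device_buf, mem_addr, nbytes):
        # The CPU supplies only the operation, the address, and the byte count.
        if nbytes > len(device_buf) or mem_addr + nbytes > len(self.memory):
            raise ValueError("transfer does not fit")
        if op == "to_memory":      # device -> memory
            self.memory[mem_addr:mem_addr + nbytes] = device_buf[:nbytes]
        elif op == "from_memory":  # memory -> device
            device_buf[:nbytes] = self.memory[mem_addr:mem_addr + nbytes]
        else:
            raise ValueError(f"unknown operation: {op}")
        self.on_complete(op, nbytes)  # tell the CPU the transfer is done


memory = bytearray(16)
completions = []
dma = DMAController(memory, lambda op, n: completions.append((op, n)))
dma.start("to_memory", bytearray(b"DATA"), mem_addr=4, nbytes=4)
```

The CPU-side code never touches the individual bytes; it issues one setup call and later learns of completion through the callback.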
OS control of I/O operations
Low-level control of an I/O device is complex because it requires managing a set of concurrent events and because the requirements for correct device control are often very detailed
I/O systems often use interrupts to communicate information about I/O operations, and these can occur at random times
The I/O system is shared by multiple programs using the processor
Would like I/O services for all user programs under safe control