i/o subsystem · ae0b36apo computer architectures 5 what is the main task of i/o subsystem?...
TRANSCRIPT
1AE0B36APO Computer Architectures
Computer Architectures
I/O subsystem
Miroslav Šnorek, Pavel Píša, Michal Štepanovský
Ver.1.10
Czech Technical University in Prague, Faculty of Electrical Engineering
English version partially supported by:European Social Fund Prague & EU: We invests in your future.
2AE0B36APO Computer Architectures
This lecture topics?
● Computer subsystem interconnection● Buses, examples of different platforms (VME)
● PC standard example● PCI● Why? Introduction for laboratory experiments
● PCI innovations● PCIe● Why? Introduction for laboratory experiments● Further innovations● Hypertransport, Infiniband, QPI● Why? For you to know
3AE0B36APO Computer Architectures
Motivation
Goal of today lectureunderstandbus interconnectiontechnologies
4AE0B36APO Computer Architectures
Motivation – Intel – Only as an example
5AE0B36APO Computer Architectures
What is the main task of I/O subsystem?
● Interconnection of subsystems inside computer, connection of external peripherals and computers together
● Demands on I/O subsystem:
Creating optimal data paths, especially critical for the most demanding peripherals (graphic cards, external memories)
● Possible solutions:
There have to be compromises due to price/performance ratio when it is
● possible to share data paths, or● advantageous to share them.
6AE0B36APO Computer Architectures
Single cycle processor form lecture 2
MemWriteMemToReg
BranchALUControl 2:0ALUScrRegDestRegWrite
31:26
5:0
Control UnitOpcod
e
Funct
4
PC’ PC Instr25:21
20:16
20:16
15:11
15:0
SrcA
SrcB
Zero
AluOutM
WriteDataWriteReg
SignImmPCPlus4D
PCBranchPCPlus4E
AluOutW
ReadData
Result
PCPlus4F
RtRd
Instr. Memory
A RD
Data Memory
A RD
WD
WE
Reg. File
A1 RD1
A2 RD2A3WD
3
WE3
+
+
01
01
01
01
Sign Ext <<2
ALU
Main memory is not part of the CPU…
7AE0B36APO Computer Architectures
Single cycle processor form lecture 2
Data Memory
MemWriteMemToReg
BranchALUControl 2:0ALUScrRegDestRegWrite
31:26
5:0
Control UnitOpcod
e
Funct
4
PC’ PC Instr25:21
20:16
20:16
15:11
15:0
SrcA
SrcB
Zero
AluOutM
WriteDataWriteReg
SignImmPCPlus4D
PCBranchPCPlus4E
Result
PCPlus4F
RtRd
A RDA RD
WD
WE
Reg. File
A1 RD1
A2 RD2A3WD
3
WE3
+
+
01
01
01
01
Sign Ext <<2
ALU
ReadData
AluOutW
64 signals are required to connect momory (control ones not counted)
Instr. Memory
A RD
A RD
WD
WE
The address bus width is 32 bits, data path is 32 bits wide too as well as instruction!
8AE0B36APO Computer Architectures
Some other examples and solutions● Do you know Parallel ATA (PATA)?● Integrated Drive Electronics (IDE) nebo EIDE
(Enhanced IDE) from Western Digital may be more known term
● ATA = Advanced Technology Attachment
• It has been the most used interface to connect hard-drives and optical units
• 40-pin header -> 40 leads (16 of these for data)
• later 80 leads used for better signal integrity (shielding), but 40-pin header preserved
9AE0B36APO Computer Architectures
Serial ATA – Solution used today
http://en.wikipedia.org/wiki/Serial_ATA
● Serial ATA (SATA) more used today● SATA 1.0: 150 MB/s (PATA:130MB/s)● SATA 2.0: 300 MB/s● SATA 3.0: 600 MB/s● SATA 3.2: about 2 GB/s
• Interface used for drives and optical units connection today
• 7 leads only!!! Pin Mating Function
1 1st Ground
2 2nd A+ (Transmit)
3 2nd A− (Transmit)
4 1st Ground
5 2nd B− (Receive)
6 2nd B+ (Receive)
7 1st Ground
10AE0B36APO Computer Architectures
Interfacing terminology – Important terms:
● Interface● Common communication part shared by two systems,
equipment or programs.● Includes also boundary and supporting control elements
necessary for their interconnection.● Bus × point-to-point connection.● Address, data, control bus.● Multiplexed/separate bus.● Processor, system, local, I/O bus.● Bus cycle, bus transaction.● Open collector, 3-state output, switched multiplexers
11AE0B36APO Computer Architectures
Reminder: bus x point-to-point connection
Bus – shared data path
Point-to-point connection
Many different combinations in between in real systems
Remark: logical topology as seen from computer system inside (OS, programs) can differ from the physical topology
12AE0B36APO Computer Architectures
PC architecture (cca 2000+) … based on PCI
Buses
Source: http://computer.howstuffworks.com/pci.htm
Point-to-point
13AE0B36APO Computer Architectures
PCIe architecture - bus is replaced by shared switch
MicroprocessorRoot
complex
Endpoint
Endpoint
Endpoint RAM
RAM
RAM
Endpoint
Endpoint End
point
Endpoint
Endpoint
Endpoint
Endpoint
Switch
Microprocessor
Busbridge
L2cache
RAM
Front Side bus (FSB)
AGPchipset
Memorycontroller
RAM
RAMGraphic
controller
Busbridge
PCI Bus ISA Bus
Sys
tem
mem
ory
Back Side bus (BSB)
PC
I de
vice
s
ISA
dev
ices
PCIe = PCI Express
More details later …
14AE0B36APO Computer Architectures
Another bus example
Microcomputer control system VME
15AE0B36APO Computer Architectures
VME Bus (and system)
Students of KyR surely will meet soon (avionic, railway, industry)
16AE0B36APO Computer Architectures
VME milestones
Version Max. throughput
VME32(IEEE-1014)
40 MB/s
VME64 80 MB/s
VME2eSST 320+ MB/s
Memory Processor MemoryDisc
ControllerI/O
Controller
ProcessorI/O
ControllerMemory
VMX bus
VMX bus
VME bus
VMS
17AE0B36APO Computer Architectures
PC platform: some details concerning PCI
18AE0B36APO Computer Architectures
PCI signals – 32-bit version
PCIdevice interrupts
INTA#
INTB#
INTC#
INTD#
AD[31::00]
C/BE[3::0]#
FRAME#
TRDY#
IRDY#
STOP#
DEVSEL#
IDSEL#
PERR#
SERR#
REQ#
GNT#
CLK#
RST#
addressand data
PAR
interfacecontrol
errorreporting
accessarbitration
(master only)
system
19AE0B36APO Computer Architectures
PCI - architecture
C/BE[3::0]#
Control
AD[31::00]
Slot 1ID
SE
L
RE
Q#
GN
T#
Slot 2
IDS
EL
RE
Q#
GN
T#
Slot 4
IDS
EL
RE
Q#
GN
T#
Slot 3
IDS
EL
RE
Q#
GN
T#
PCI
Bus access arbiter
Interrupt subsystemIRQ
IDSEL
20AE0B36APO Computer Architectures
PCI bus cycle example
AD bus is - bidirectional and - multiplexed (explanation later)
21AE0B36APO Computer Architectures
Bus-line electronics - buffer
uni-directional
bi-directional
22AE0B36APO Computer Architectures
IO output: TP and OC? Practical examples:
TP – Totem Pole
OC – Open Collector
Vdd Vdd
A B
A
B
Vss
Out
Internal structure of two inputsNAND chip from TTL 7400 family
Internal structure NAND circuit equippedby open collector output.Pull up resistor is connected externally.
23AE0B36APO Computer Architectures
Study of conflict: bi-directional transfer on a single bus line
+Vcc +Vcc
1 0
No problem with parallel inputs.
There can be short circuit on outputs!
Solution?
Inputs
Outputs
& &
& &
switched on
switched on
24AE0B36APO Computer Architectures
One solution: output circuit as OC x tri-state
OC – Open collector 3S – Tri-state output
OE = Output Enable
Bus line
Vin
+Vcc
Bus line
+Vcc
OE
25AE0B36APO Computer Architectures
Bus switches and multiplexers
OE
A B
Two-Port
OE
1A
S
11B
MUX/DeMUX
1B2
Control
Logic
1A1
Bus Exchange
OE
S
1B2
1B1
1A2
Control
Logic
Source:http://www.ti.com/signalswitches
Texas Instruments Digital Bus Switch Selection Guide
26AE0B36APO Computer Architectures
Data transfer synchronization
27AE0B36APO Computer Architectures
How to synchronize/recognize valid data on the bus?
● Possible transfer types:● asynchronous,● synchronous,
– pseudo-synchronous,● isochronous (not focus
of this lecture).
● Remark:● usually we call these timing
diagrams as bus protocol● usually only one bus transfer cycle
is analyzed
28AE0B36APO Computer Architectures
Synchronization options: synchronous transfer
29AE0B36APO Computer Architectures
Synchronization options: asynchronous transfer
30AE0B36APO Computer Architectures
How to slow down synchronous variant
Status signals used by transfer participants (initiator or target of the transfer request) requests to insert WAIT-cycles
Pseudo-synchronous transfer variant
What to do when data source or target cannot process given data at clock rate
31AE0B36APO Computer Architectures
Bus/signal lines reuse/multiplexing
C/BE[3::0]#
Control
AD[31::00]
Slot 1 Slot 2 Slot 4Slot 3PCI
Bus access arbiter
Interrupt subsystemIRQ
IDSEL
AD[31::00] signals are used for address transfer during initial transfer phase, then they are used for data
32AE0B36APO Computer Architectures
PCI terms and definitions I.
● Two devices participate in each bus transaction:● Initiator (starts the transaction) × Target (obeys request)● Initiator = Bus Master,● Target = Slave.
● Initiator and target role does not directly impose data source and receiver role!
● What does it mean that bus is multimaster?● More participants can act as Bus Master – initiate transfers!
● When bus signals are shared, then transactions need to be serialized and only one device can act as master at any given time instant
● Bus master is responsible to grant bus to requesting participant● Who takes care of bus arbitration?
● One dedicated device or bus backplane● Or decentralized arbitration is possible for some buses (not PCI)
33AE0B36APO Computer Architectures
PCI terms and definitions II.
● The rising clock edge is reference/trigger point for all phase of the bus cycle/transaction timing
● Bus cycle is formed in most cases by● address phase● data phase
● Premature termination of data transfer is possible during bus cycle
● Transfer synchronization itself is pseudo-synchronous
34AE0B36APO Computer Architectures
Bus cycle kind/direction – command – specified by C/BE
C/BE[0::3]# Bus command (BUS CMD)
0000 Interrupt Acknowledge
0001 Special Cycle
0010 I/O Read
0011 I/O Write
0100 Reserved
0101 Reserved
0110 Memory Read
0111 Memory Write
1000 Reserved
1001 Reserved
1010 Configuration Read (only 11 low addr bits for fnc and reg + IDSEL)
1011 Configuration Write (only 11 low addr bits for fnc and reg + IDSEL)
1100 Memory Read Multiple
1101 Dual Address Cycle (more than 32 bits for address – i.e. 64-bit)
1110 Memory Read Line
1111 Memory Write and Invalidate
35AE0B36APO Computer Architectures
PCI bus memory write timing
36AE0B36APO Computer Architectures
Some remarks and observations
● The length of the transferred data block is controlled by the FRAME signal. It is negated (de-asserted) before last word transfer by initiator.
● Single world or burst transfer can be delayed by inserting wait cycles (pseudo-synchronous synchronization)!
37AE0B36APO Computer Architectures
PCI bus memory read timing
This transaction is going to be captured and shown by logic analyzer during labs
38AE0B36APO Computer Architectures
Interrupt acknowledge cycle
● Processor needs time to save interrupted execution state to allow state restoration after return from service routine
● Interrupt controller provides vector number (assigned to the asynchronous event) and CPU is required to translate it to the interrupt service routine start address
● All these activities require some time
39AE0B36APO Computer Architectures
Interrupt acknowledge cycle timing
40AE0B36APO Computer Architectures
Some more details about standard PC PIC IRQ routing
INTA#INTB#INTC#INTD#
PCI#4
INTA#INTB#INTC#INTD#
PCI#3
INTA#INTB#INTC#INTD#
PCI#4
INTA#INTB#INTC#INTD#
PCI#4
PCI slots
IRQXIRQYIRQZ
IRQW
IRQ2,9IRQ3IRQ4IRQ5IRQ6IRQ7
IRQ10IRQ11IRQ12IRQ14IRQ15In
terr
upts
fro
m m
ainb
oard
in
teg
rate
d p
erip
hera
ls
Inte
rrup
t re
que
sts
mu
ltipl
exor
(P
CI
chip
set)
PIC
825
9#1
TimerKeyboard
CMOS RTC
0
1234567
INTR
to CPU
PIC
825
9#2
0
1234567
NXP
41AE0B36APO Computer Architectures
Standard PC PIC interrupt vectors assignment
Common interrupt numbers for PC category computers:
IRQ
0 Timer (for scheduler, timers)
1 Keyboard
2 i8259 cascade interrupt
8 Real-time clocks (CMOS wall time)
9 Available or SCSI controller
10,11 Available
12 Available or PS/2 mouse
13 Available or arithmetics co-processor
14 1-st IDE controller
15 2-nd IDE controller
3 COM2
4 COM1
5 LPT2 or available
6 Floppy disc controller
7 LPT1
42AE0B36APO Computer Architectures
Recapitulation of the bus description steps
● Notice: we started PCI description by bus topology analysis, PCI signals and then we moved to timing diagrams
● simpler cases first (read and write)● then how bus access is arbitrated ● the special functions (interrupt acknowledge) last
● Doc. Šnorek recommends: always follow these steps when trying to learn new bus technology.
43AE0B36APO Computer Architectures
PCI bus timing laboratory exercise
44AE0B36APO Computer Architectures
Some more notes regarding the bus hardware realization
45AE0B36APO Computer Architectures
How signals are transferred
Single ended (asymmetric) versus. differential (symmetric)
English term Signalling
46AE0B36APO Computer Architectures
Typical signaling levels and some speed considerations
Differential signaling Single ended signaling
47AE0B36APO Computer Architectures
PCIe
48AE0B36APO Computer Architectures
MicroprocessorRoot
complex
Endpoint
Endpoint
Endpoint RAM
RAM
RAM
Endpoint
Endpoint End
point
Endpoint
Endpoint
Endpoint
Endpoint
Switch
PCIe architecture
Source: http://computer.howstuffworks.com/pci-express.htm
49AE0B36APO Computer Architectures
PCIe topology and components
PCI/PCI-X
Switch
Memory
PCI ExpressEndpoint CPU Root
Complex
CPU
LegacyEndpoint
PCI ExpressEndpoint
PCI ExpressEndpoint
LegacyEndpoint
PCI Express-PCIBridge
PCI Express
PCI Express
PCI Express
PCI Express
PCI Express
PCI Express
50AE0B36APO Computer Architectures
Why switch to serial data transfer
51AE0B36APO Computer Architectures
PCIe transfers signaling, PCIe lanes
● Link interconnect switch with exactly one device
● Differential AC signaling is used – two wires for single direction
● Each link consists of one or more lanes
● Lane consists of two pair of wires
● One pair for Tx and other one for Rx
● Data are serialized by 8/10 code● The separate pairs allow full-
duplex operation/transfers
52AE0B36APO Computer Architectures
PCIe physical link layer
Differential full-duplex physical layer
8b/10b encoding provides enough edges for clock signal reconstruction/synchronization and balanced number of ones and zeros. This ensures zero common signal (DC) and AC (only) coupling is possible
53AE0B36APO Computer Architectures
PCIe physical layer model
Source: Budruk, R., at all: PCI Express System Architecture
54AE0B36APO Computer Architectures
PCIe characteristics
● 2.5 GHz clock frequency (1 GT/s), raw single link single direction bandwidth 250 MB/s (can be multiplied by parallel lanes – 2×, 4×, 8×)
● Single link efficient data rate is 200 MB/s, this is 2× … 4× more than for classic PCI
● The bandwidth is not shared, point to point interconnection● Two pairs of wires, differential signaling● Data are encoded (modulated) using 8b/10b code● Expected up to 10 GHz clocks due technology advances● PCI Express 2.x (2007) allows 5 GT/s (5 GHz clock)● PCI Express 3.x (started at 2010) increases it to 8 GT/s● Encoding changed from 8b/10b (20% bandwidth used by
encoding) to "scrambling" and 128b/130b encoding (takes only 1.5% of the bandwidth)
55AE0B36APO Computer Architectures
Picture source: Wikipedia
PCIe x4
PCIe x16
PCIe x1
PCIe x16
32-bit classicPCI slot
LanParty nF4 Ultra-D mainboard from DFI
56AE0B36APO Computer Architectures
The PCIe board DB4CGX15 used in your semester work
● PCIe x1● Altera
EP4CGX15 BF14C6N FPGA
● EPCS16 Configuration device
● 32Mbyte DDR2 SDRAM
● 20 I/O Pins● 4 Input Pins● 2 User LEDs
57AE0B36APO Computer Architectures
PCIe communication protocol – hardware and software layers
PCIe physical topology is serial, point-to-point, packet oriented, but its logical view and behavior is the same as PCI – that is multi-master bus, same enumeration, read/write cycles
58AE0B36APO Computer Architectures
Data packet structure and transport layers
Source: http://zone.ni.com/devzone/cda/tut/p/id/3767#toc0
59AE0B36APO Computer Architectures
PCIe packet format
Specification highlights:
• Packet length from 4B to 4096B• PCIe packed has o be exchanged as the whole (no option to
preempt when higher priority transfer is requested) • Long packed can cause increase of latencies in the system• On the other hand, short packets overhead (frame/data) is
considerable
60AE0B36APO Computer Architectures
Other, more advanced, architectures and PC future
61AE0B36APO Computer Architectures
What are future options?
● In PC development, 15 years is quite a long time and there have been significant successes, but what about the future?
● There are already other technologies in use (some in PC systems already), many originating from supercomputers:
● PCI-X, ● HyperTransport● InfiniBand● QPI
● More:● Advanced Computer Architectures (AE4M36PAP) course
62AE0B36APO Computer Architectures
InfiniBand vs. HyperTransport
63AE0B36APO Computer Architectures
Miscellaneous information regarding buses
64AE0B36APO Computer Architectures
Nowadays, using high speed, serial bus is quite common.
FireWire
DVI-D, HDMIswitch from analog VGA/ RGB
Serial attached SCSIswitch from parallel (wide, differential) SCSI Switch from parallel ATA
Replacement of slow RS-232 serial, PS/2.General purpose now:disc, video, etc.
65AE0B36APO Computer Architectures
Key characteristics of the dominant bus technologies
Parameter Firewire USB 2.0 PCI Express
SATA Serial Attached SCSI
Expected use external internal internal internal both
Max. nodes number 63 127 1 1 4
No. of wires (fundamental only)
4 2 4 4 4
Raw bandwidth 50 MB/s100 MB/s
0,2 MB/s1,5 MB/s60 MB/s
250 MB/s 1x
300 MB/s
300 MB/s
Hot-plug attach? yes yes (yes) yes yes
Max. bus distance 4,5 m 5 m 0,5 m 1 m 8 m
Official standard designation
IEEE 1394
USB PCI-SIGPCI
SATA-IO T10 committee
66AE0B36APO Computer Architectures
Key characteristics – another view
67AE0B36APO Computer Architectures
Another comparison of bus standards