1 microprocessor-based systems course 4 - microprocessors
Post on 19-Dec-2015
1
Microprocessor-based Systems
Course 4 - Microprocessors
2
Microprocessors
Definition 1: a VLSI circuit that integrates a central processing unit (CPU)
Definition 2: an integrated circuit that integrates:
  one or more central processing units (CPUs)
    symmetric multiprocessor architecture
    asymmetric multiprocessor architecture
  cache memory
  other components:
    interrupt controller, bus management unit, memory management unit (MMU)
3
Microprocessors -
First microprocessor: Intel I4004, 4-bit organization
First successful microprocessor: Intel I8080, 8-bit processor
First 16-bit processor: Intel I8086
First 32-bit processor: Intel I80386
Superscalar microprocessor architecture: Pentium Pro
64-bit processors, multi-core architectures: Pentium IV, dual core, Core Duo
4
Components of a microprocessor
Traditional components:
  Control Unit (CU)
  Arithmetic and Logic Unit (ALU)
  General and special registers (GR, SR)
Supplementary components:
  Cache memories (Cache)
    high-speed, low-capacity memories
    hierarchical organization on 2-3 levels
  Mathematical co-processor (CoP)
    for floating-point arithmetic
  Memory Management Unit (MMU)
    controls the traffic (instructions and data) between the main memory and the cache memory
  Interrupt controller
    handles internal and external events
    synchronizes the processor with I/O interfaces
5
Signals of a microprocessor – the System Bus
[Figure: Generic scheme of a microprocessor-based system. Blocks shown: microprocessor, memory, I/O modules (interfaces) and peripheral devices, linked by the address bus, the data bus and the command & control bus.]
6
Typical signals for a microprocessor
[Figure: Signals of a microprocessor. Signal groups shown around the microprocessor: address, data, command, status, bus arbitration, interrupt, clock, supply and other signals.]
7
Typical signals for a microprocessor
Address signals: A0-An
  used for specifying memory locations or I/O ports (registers)
  generated by the microprocessor towards the other components in order to address them (read or write operations)
  the number of address lines determines the maximum addressing space of a microprocessor
    Ex: 20 lines => 1 MB, 32 lines => 4 GB
Data signals: D0-Dm
  bidirectional lines used to transfer instruction codes and data between the microprocessor and the other components of the system
  the number of data lines usually matches the internal organization of the processor (there are exceptions, see 8088, Pentium Pro)
  the number of data lines determines the maximum width of a data item transferred on the bus
    Ex: 8, 16, 32, 64 lines
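The two rules on this slide (address lines fix the addressing space, data lines fix the width of one transfer) can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function names are hypothetical, and it assumes byte-addressable memory and aligned, full-width transfers.

```python
def address_space_bytes(address_lines: int) -> int:
    """Distinct byte addresses reachable with the given number of address lines."""
    return 2 ** address_lines


def transfers_needed(n_bytes: int, data_lines: int) -> int:
    """Bus transfers needed to move n_bytes (assumes aligned, full-width transfers)."""
    bytes_per_transfer = data_lines // 8
    return -(-n_bytes // bytes_per_transfer)  # ceiling division


for lines in (20, 24, 32):
    print(f"{lines} address lines -> {address_space_bytes(lines)} bytes")

for width in (8, 16, 32, 64):
    print(f"{width} data lines -> {transfers_needed(64, width)} transfers for 64 bytes")
```

With 20 lines the result is 2^20 = 1,048,576 bytes (1 MB), and with 32 lines it is 2^32 bytes (4 GB), matching the examples on the slide.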
8
Typical signals for a microprocessor: command and control signals
Command signals: MRDC\, MWTC\, IORC\, IOWC\, INTA\
  determine memory and interface read and write cycles
  very important signals; similar signals exist for any microprocessor
Control signals: ALE (Address Latch Enable), DEN (Data Enable)
  help control the address and data amplifiers
  specific to each microprocessor
Interrupt signals: INTR, NMI
Clock signals: CLK, PCLK
Power supply signals: GND, +5 V, 3.3 V
9
Instruction execution
Steps:
  Instruction fetch
  Operand read
  Operation execution
  Result write
Seen from outside:
  Instruction fetch cycle (read from memory): mandatory
  Operand read: optional
  Result write: optional
Transfer cycle (on the bus)
  a transfer on the bus that involves:
    the processor and the memory, or
    the processor and an I/O interface
  a cycle has a fixed number of clock periods (determined by the microprocessor's architecture)
    it may be extended on request by an integer number of clock periods if a slow module is addressed (e.g. EPROM memory)
  a cycle is a sequence of signal activations on the bus (address, data and command)
  a cycle is described by a time diagram
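The extension of a transfer cycle for a slow module can be sketched numerically. The example below is a simplified model, not a timing specification: it assumes that the whole cycle must be at least as long as the time the module needs, and the concrete values (4 clock periods per cycle, a 5 MHz clock, a 1000 ns module) are illustrative assumptions rather than figures from the slides.

```python
import math


def bus_cycle_ns(base_clocks: int, wait_states: int, clock_mhz: float) -> float:
    """Duration of one transfer cycle: fixed clocks plus the inserted extra clocks."""
    clock_period_ns = 1000.0 / clock_mhz
    return (base_clocks + wait_states) * clock_period_ns


def wait_states_needed(required_ns: float, base_clocks: int, clock_mhz: float) -> int:
    """Smallest integer number of extra clock periods so the cycle lasts required_ns."""
    clock_period_ns = 1000.0 / clock_mhz
    total_clocks = math.ceil(required_ns / clock_period_ns)
    return max(0, total_clocks - base_clocks)


# Assumed example: 4-clock cycle at 5 MHz, module that needs 1000 ns.
extra = wait_states_needed(1000.0, base_clocks=4, clock_mhz=5.0)
print("extra clock periods:", extra)
print("extended cycle (ns):", bus_cycle_ns(4, extra, 5.0))
```

The extension is always a whole number of clock periods, which is why the required time is rounded up before the fixed part of the cycle is subtracted.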
10
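Processors of the Intel x86 family
I8086 and I8088
[Figure: Internal structure of the I8086 and I8088. Blocks shown in the EU: registers AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL), SI, DI, BP, SP, temporary registers, ALU, state register, control unit. Blocks shown in the BIU: registers CS, DS, ES, SS, IP, IR, instruction queue (1, 2, 3, 4, ...), external bus control.]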
11
I8086, I8088
I8086
  16-bit processor with 16 data lines and 20 address lines (1 MB addressing space)
  40-pin integrated circuit
  Supporting circuits:
    8087: mathematical co-processor (floating point)
    8288: bus controller
    8289: bus arbiter
  Structure:
    EU (Execution Unit): dedicated to instruction execution
      CU, ALU, general registers, state register
    BIU (Bus Interface Unit): responsible for the operations (transfer cycles) on the external bus
      transfers instructions (in advance) and data
      contains:
        special registers (segment registers, IP)
        instruction queue, bus amplifiers
I8088
  identical to the 8086, but with 8 data signals on the external bus
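The slides list segment registers and IP in the BIU but do not spell out how 16-bit registers reach a 20-line address bus. A minimal sketch of the standard 8086 real-mode calculation follows (segment shifted left by 4 bits, plus offset, truncated to 20 bits); the function name is hypothetical.

```python
ADDRESS_LINES = 20
ADDRESS_MASK = (1 << ADDRESS_LINES) - 1  # 0xFFFFF


def physical_address(segment: int, offset: int) -> int:
    """20-bit physical address formed from a 16-bit segment and a 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must fit in 16 bits")
    # Shifting by 4 multiplies the segment by 16; the mask models the
    # 20 physical address lines (anything above bit 19 is lost).
    return ((segment << 4) + offset) & ADDRESS_MASK


print(hex(physical_address(0x1234, 0x0010)))  # CS=0x1234, IP=0x0010
print(hex(physical_address(0xFFFF, 0x0000)))  # address of the first fetch after reset
```

For an instruction fetch the segment is CS and the offset is IP; data accesses use DS, ES or SS in the same way.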
12
I80286
16-bit processor
16 data lines, 24 address lines (16 MB addressing space)
Working modes: real and protected (privileged)
[Figure: Internal structure of the I80286 processor. Blocks shown: addressing unit; interfacing unit (data amplifiers, address amplifiers, bus control) towards the external bus; execution unit; instruction unit (instruction queue, instruction decode).]
13
I80386
32-bit processor, 32 data lines, 32 address lines (4 GB addressing space)
General registers extended to 32 bits
2 extra segment registers (FS and GS)
Protected mode improved
[Figure: Internal structure of the I80386 processor. Blocks shown: segmenting unit, paging unit, execution unit, interface unit, decoding unit, instruction prefetch unit.]
14
I80486
Integrates: processor + co-processor + MMU
Enables the use of cache memory
Protected mode improved
[Figure: Internal structure of the I80486. Blocks shown: segmenting unit, paging unit, integer execution unit, floating-point execution unit, cache unit, bus interface unit, instruction decoder, instruction prefetch unit.]
15
Pentium
Two integer pipelines: U (executes any instruction) and V (executes simple instructions only)
64-bit external data bus (for a 32-bit processor)
Versions:
  Pentium: 2-pipeline architecture
  Pentium Pro, Pentium II, Pentium III: superscalar P6 architecture
  Pentium IV: NetBurst architecture
  I7: multicore
16
Pentium Processors
Pentium Pro
  Superscalar P6 architecture (CPI < 1)
  Dynamic instruction execution:
    data flow analysis
    branch prediction
    speculative execution of instructions
Pentium II
  MMX technology:
    a SIMD execution unit dedicated to multimedia data
    parallel (SIMD) execution of arithmetic operations
    57 new MMX instructions
Pentium III
  SSE technology
    parallel (SIMD) execution on floating-point variables
    good for 2D/3D graphics
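To make "parallel (SIMD) execution of arithmetic operations" concrete, the sketch below models one packed-byte addition on a 64-bit value holding eight independent 8-bit lanes, in the spirit of an MMX packed add with wraparound. It is a software illustration with hypothetical function names: the Python loop visits the lanes one by one, whereas the hardware unit processes all lanes in a single operation.

```python
LANES = 8  # eight 8-bit lanes in one 64-bit value


def pack_bytes(values):
    """Pack eight integers (0..255) into one 64-bit integer, lane 0 in the low byte."""
    if len(values) != LANES:
        raise ValueError("exactly 8 values are required")
    word = 0
    for lane, value in enumerate(values):
        word |= (value & 0xFF) << (8 * lane)
    return word


def unpack_bytes(word):
    """Split a 64-bit integer back into its eight byte lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(LANES)]


def packed_add_bytes(a, b):
    """Lane-wise addition with wraparound; a carry never crosses into the next lane."""
    result = 0
    for lane in range(LANES):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift
    return result


pixels = pack_bytes([1, 2, 3, 4, 5, 6, 7, 250])
offset = pack_bytes([10] * 8)
print(unpack_bytes(packed_add_bytes(pixels, offset)))
```

The last lane wraps around (250 + 10 gives 4) without disturbing its neighbours, which is the property that distinguishes a packed add from an ordinary 64-bit addition.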
17
P6 superscalar architecture
3 autonomous units
Speculative execution
[Figure: Functional blocks of the P6 architecture. Blocks shown: instruction fetch and decode unit, instruction dispatch and execute unit, retirement unit, instruction pool.]
18
Detailed view of the P6 architecture
[Figure: Blocks shown: system bus, L2 cache, bus interface unit (BIU), L1 ICache, L1 DCache, instruction fetch and decode unit, instruction dispatch and execute unit, retirement unit, instruction pool.]
19
Instruction fetch and decoding unit
Fetches and decodes instructions in advance
In-order unit
3 instructions decoded per clock
Branch prediction
Components:
  Decoder (3 units)
  Address generator unit (next_IP)
  Branch target buffer
  Micro-operation sequencer
  Alias register allocator
[Figure: Instruction fetch and decoding unit. Blocks shown: input from the BIU (Bus Interface Unit), L1 ICache, Next_IP, branch target buffer, instruction decoder (x3), micro-operation sequencer, alias register allocator, output to the instruction pool.]
20
Instruction dispatch and execute unit
Responsible for instruction execution
Out-of-order unit
7 execution units + reservation station
  IEU: Integer Execution Unit
  FEU: Floating-point Execution Unit
  MMX: Multimedia Execution Unit
  AGU: Address Generation Unit
  JEU: Jump Execution Unit
[Figure: Instruction dispatch and execute unit. The reservation station, fed from the instruction pool, issues through port 0 (IEU, FEU, MMX), port 1 (IEU, JEU, MMX), port 2 (AGU read) and ports 3, 4 (AGU write).]
21
Retirement Unit
Re-establishes the normal (program) order of the instructions and of their results
In-order unit
Components:
  MIU: Memory Interface Unit
  RRF: Retirement Register File
[Figure: Retirement unit. Blocks shown: DCache, reservation station, MIU, RRF, instruction pool.]
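The interplay between the out-of-order dispatch/execute unit and the in-order retirement unit can be illustrated with a toy simulation. This is a simplified sketch, not the P6 implementation: it assumes every micro-operation writes a distinct destination (as if the alias register allocator had already removed false dependencies), it ignores port limits, and all names and latencies are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    name: str
    dest: str        # register written by the micro-operation
    sources: tuple   # registers that must be ready before it can start
    latency: int     # clock periods needed to execute (at least 1)


def simulate(program, initially_ready):
    """Return (completion_order, retirement_order) as lists of micro-op names."""
    ready = set(initially_ready)
    started = {}      # program index -> clock in which execution started
    finished = set()  # program indices whose result is available
    completion_order, retirement_order = [], []
    next_to_retire = 0
    cycle = 0
    limit = sum(op.latency for op in program) + len(program) + 1
    while next_to_retire < len(program):
        cycle += 1
        if cycle > limit:
            raise RuntimeError("a micro-op never received its operands")
        # Execute: results become available once the latency has elapsed.
        for i, op in enumerate(program):
            if i in started and i not in finished and cycle - started[i] >= op.latency:
                finished.add(i)
                ready.add(op.dest)
                completion_order.append(op.name)
        # Dispatch: any waiting micro-op with ready operands may start (out of order).
        for i, op in enumerate(program):
            if i not in started and all(src in ready for src in op.sources):
                started[i] = cycle
        # Retire: strictly in program order, only finished micro-ops.
        while next_to_retire in finished:
            retirement_order.append(program[next_to_retire].name)
            next_to_retire += 1
    return completion_order, retirement_order


PROGRAM = [
    MicroOp("load", "r1", ("a",), 3),
    MicroOp("add1", "r2", ("r1",), 1),
    MicroOp("mul", "r3", ("b",), 1),
    MicroOp("add2", "r4", ("r3",), 1),
]
completed, retired = simulate(PROGRAM, {"a", "b"})
print("completed:", completed)
print("retired:  ", retired)
```

In this run the independent `mul` and `add2` finish while the slow `load` is still executing, so the completion order is mul, add2, load, add1, yet the results are retired as load, add1, mul, add2, which is the original program order.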
22
The P6 Bus
The main elements of the P6 bus:
  the bus works in synchronous mode; every signal is sampled on clock signal edges
  transfers are made through transactions that may be executed in parallel
  it is a multi-processor bus: several processors can share the same bus
  block transfers are preferred
  there are error detection and correction mechanisms
  there are mechanisms that ensure cache memory consistency
  a new digital technology (different amplifiers) that ensures high-frequency transmission on the bus
23
Transfer on the P6 bus
Parallel (pipelined) transactions
Phases:
  Arbitration
  Transfer request
  Error
  Snooping
  Response
  Transfer
Technology: GTL (instead of TTL)
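The benefit of pipelined transactions can be sketched with a small scheduling model. The phases are ordered as the rows of the time diagram for the P6 bus; everything else is an assumption made for illustration: each phase is given the same fixed length (2 clocks by default), and a new transaction enters the pipeline as soon as the previous one has left the first phase. Real P6 phase timings differ.

```python
PHASES = ("arbitration", "request", "error", "snoop", "response", "transfer")


def schedule(n_transactions, clocks_per_phase=2, issue_interval=None):
    """Map each transaction to {phase: (first_clock, last_clock)}.

    issue_interval is the distance in clocks between the starts of two
    consecutive transactions; by default it equals one phase length, so no
    two transactions ever occupy the same phase at the same time.
    """
    if issue_interval is None:
        issue_interval = clocks_per_phase
    table = {}
    for t in range(n_transactions):
        start = 1 + t * issue_interval
        table[t] = {
            phase: (start + k * clocks_per_phase,
                    start + (k + 1) * clocks_per_phase - 1)
            for k, phase in enumerate(PHASES)
        }
    return table


def total_clocks(table):
    """Clock in which the last phase of the last transaction ends."""
    return max(last for phases in table.values() for _, last in phases.values())


pipelined = schedule(3)
serial = schedule(3, issue_interval=len(PHASES) * 2)
print("pipelined:", total_clocks(pipelined), "clocks")
print("serial:   ", total_clocks(serial), "clocks")
```

Under these assumptions three overlapped transactions finish in 16 clocks, whereas running them one after another would take 36.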
24
Time diagram for the P6 bus
[Figure 6-14: Concurrent transactions on the P6 bus. Time diagram over BCLK periods 1-16, with one row per phase: arbitration, request, error, snooping, response, transfer.]
25
Pentium IV: NetBurst architecture
  a 20-stage pipeline architecture
    double compared with P6
  bus transfer rate increased 4 times
    400 MHz, with "quad pump" technology; 3.2 GB/s transfer speed
  ALU speed doubled: 2 arithmetic operations are executed in every clock period
    the ALU works with a double-frequency clock
  very high-speed cache memory
    Advanced Transfer Cache, which provides 64 GB/s data transfer at 2 GHz
  extension of the MMX technology
    SSE2 (Streaming SIMD Extensions 2)
    144 new SIMD instructions that extend the data width to 128 bits (16 bytes processed in parallel)
  branch prediction improved by approx. 30%
    through the extension of the BTB unit and by increasing the instruction queue to 126 instructions
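The two transfer rates quoted on this slide can be reproduced with simple arithmetic. The sketch below relies on values the slide does not state and that are labeled here as assumptions: a 100 MHz base bus clock, a 64-bit (8-byte) external data bus, and a 256-bit (32-byte) cache interface delivering one transfer per core clock.

```python
def bandwidth_bytes_per_s(clock_hz: int, transfers_per_clock: int, bytes_per_transfer: int) -> int:
    """Peak transfer rate = clock frequency x transfers per clock x bytes per transfer."""
    return clock_hz * transfers_per_clock * bytes_per_transfer


# External bus: assumed 100 MHz base clock, 4 transfers per clock ("quad pump"),
# assumed 8-byte data bus.
bus = bandwidth_bytes_per_s(100_000_000, 4, 8)

# Cache interface: 2 GHz core clock (from the slide), assumed 32 bytes per clock.
cache = bandwidth_bytes_per_s(2_000_000_000, 1, 32)

print(f"bus:   {bus / 1e9:.1f} GB/s")
print(f"cache: {cache / 1e9:.1f} GB/s")
```

With these assumptions the results are 3.2 GB/s for the bus and 64 GB/s for the cache, in line with the figures on the slide.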
26
Pentium IV
[Figure: The NetBurst Pentium IV architecture. Blocks shown: BTB, decoder, alias register allocator, trace cache, instruction queues for micro-operations, schedulers, L2 cache and control, registers for floats, registers for integers, ALU (x4), AGU (x2), ALU-F (x2), L1 D-Cache, ROM. Regions: interface with the external bus; instruction fetch and decode; instruction scheduling and execution.]
27
Pentium IV
New tendencies:
  Hyper-threading technology
    two threads executed in parallel on the same core
  Multi-core technology
    more processors on the same chip
  64-bit architecture