Chapter 1: Microcomputers and Microprocessors: Microprocessor Evolution and Performance
Contents
Introduction to microcomputer system
Microprocessor evolution
The Intel processor family
Microprocessor performance
Introduction to Microcomputer
A microcomputer can be viewed as a machine with:
  I/O devices for input and output
  A microprocessor for processing
  Memory units for storage
  Buses connecting the above components
In the 1970s, a microcomputer was normally understood to be a computer considerably smaller than a minicomputer, possibly using ROM for program storage
Basic hardware units
Input e.g. keyboard, mouse
Microprocessor e.g. 8085, 8086, MC68000 microprocessors
Memory e.g. RAM, hard disk
Output e.g. monitor, printer
Buses
Buses: external connections to the input/output units
Major buses:
  Address bus: address of the memory location containing an instruction or data
  Data bus: contents of memory locations
  Control bus: synchronization and handshaking between components
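The width of the address bus fixes how many memory locations a processor can name directly. A minimal sketch, assuming byte-addressable memory and using the bus widths quoted later in these notes; the function name `addressable_bytes` is illustrative only:

```python
def addressable_bytes(address_lines: int) -> int:
    """Number of distinct byte addresses reachable with n address lines (2**n)."""
    return 2 ** address_lines


# Address-bus widths as quoted in the slides for each processor.
for name, lines in [("8080", 16), ("8086", 20), ("80286", 24), ("80386", 32)]:
    print(f"{name}: {lines}-bit address bus -> {addressable_bytes(lines):,} bytes")
```

Each extra address line doubles the reachable memory, which is why going from 16 to 20 lines takes the limit from 64 KB to 1 MB.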
General Architecture
Figure: block diagram with an input unit, a microprocessing unit, and an output unit, plus a memory unit made up of primary memory and secondary memory
First Generation Computers
Vacuum tube technology
  Large, air-conditioned room
  Tube lifetime: 3,000 hours
Useless machine?
  1951: first UNIVAC I (UNIVersal Automatic Computer) delivered
  1952: prediction of the presidential election by CBS
  1952: IBM Model 701 Data Processing System
Second Generation Computers
The Transistor Is Born (Solid-State Era)
1948: invention of the bipolar transistor
1956: Nobel Prize in Physics: Drs. William Shockley, John Bardeen, and Walter H. Brattain (Bell Labs)
1954: Bell Labs: all-transistorized computer (TRADIC)
  800 transistors
  Much less heat
  More reliable and less costly
Second Generation Computers
Mainframe Computers
1958: IBM's first transistorized computers, the 7070/7090
1959: 1401 (business-oriented model)
Built on circuit boards mounted into rack panels, or frames
Main frame (mainframe): the CPU portion of the computer
Popular with business and industry
Third Generation Computers
Invention of the IC: 1959, Dr. Robert Noyce (Fairchild) and Jack Kilby (TI)
  Kilby: fabricated resistors, capacitors, and transistors on a germanium wafer, and connected these parts with fine gold wires
  Noyce: isolated individual components with reverse-biased diodes, and deposited an adherent metal film over the circuit, thus connecting the components
First IC: 2-transistor multivibrator
By the mid-1960s: memory chips with 1,000 components are common
Third Generation Computers
1964: IBM 360 Series (32-bit) The first to use IC technology
A family of 6 compatible computers
40 different I/O and auxiliary storage devices
Memory capacity: 16K words to over 1 MB
16 registers of 32 bits each
24-bit address bus
128-bit data bus
Third Generation Computers
1964: IBM 360 Series (32-bit)
375,000 computations per second (far below the 150 MIPS of a Pentium 100)
$5 billion development cost
IBM became the leading mainframe company
Minicomputer
1960s: Space Race between the US and the USSR
  IC industry boom
  A tremendous demand from scientists and engineers for an inexpensive computer that they could operate by themselves
1965: DEC PDP-8 (by Edson de Castro's group)
  Low-cost ($25,000) minicomputer
  12-bit
  16-bit: PDP-11
Supermini …
Microprocessors: CPU on a Chip
1968: Intel (INTegrated ELectronics)
  Founded by Robert Noyce and Gordon Moore (from Fairchild)
  Original goal: the semiconductor memory market
1969: customized ICs for Busicom calculators
  Ted Hoff and Stan Mazor proposed a 4-bit CPU on a single chip, plus ROM and RAM chips
Microprocessors: CPU on a Chip
1971: 4000 family, by Federico Faggin
  4001: 2K ROM with 4-bit I/O port
  4002: 320-bit RAM, 4-bit output port
  4003: 10-bit serial-in parallel-out shift register
  4004: 4-bit processor
Processor-on-a-chip: Micro-processor era
Microprocessors: CPU on a Chip
8-bit CPUs, 16-bit address (64K)
MC6800: Motorola
6502: MOS Technology (spin-off from Motorola)
  Apple II, Apple DOS
Z-80: Zilog (spin-off from Intel)
  Z-80 cards on Apple II, CP/M
Microprocessors: CPU on a Chip
16-bit CPUs (late 1970s)
8086, 80186, 80286: Intel
  PC, PC-DOS, MS-DOS, SCO Unix
MC68000: Motorola
  16-bit instructions
  Hardware multiply and divide
  24-bit address bus (16 MB)
  Workstations: Sun3
Microprocessors: CPU on a Chip
32-bit CPUs
  80386, 80486: Intel
  MC68020, 68030: Motorola
64-bit CPUs
  Pentium, Pentium Pro (64-bit external data bus but 32-bit internal registers, so not regarded as 64-bit CPUs in terms of internal register word length)
Microcomputers: Computers Based on Microprocessors
1975: MITS Altair 8800 (kit)
  $399, Intel 8080, programmed by depositing 1s and 0s via front-panel switches
Other computers boom
  8080: MITS, …
  6800: SWTPC 6800, …
  Z-80: TRS-80, …
  6502: Apple I, 8K, programmed with BASIC
Steve Jobs & Steve Wozniak, millionaires from PC COM’s …
Personal Computers: the Open Architecture Era
1981: IBM PC
  A system board (motherboard)
  Intel 8088 processor
  16K memory
  5 expansion slots
Third-party vendors supply various I/O adapter cards
Open architecture
Computer with interchangeable components
Micro-controllers: Microcomputers on a Chip
Microcontroller: a computer on a chip
  Microprocessor, plus
  On-chip memory, plus
  Input/output ports
1995: microcontrollers outsold microprocessors 10:1
Embedded in a wide range of equipment:
  Thermostats, machine tools, communications, automotive, …
Evolution: ever greater I/O capabilities
  Intel: MCS-51, MCS-96, …
High-Performance Processors
Supercomputers
  Aircraft design, global climate modeling, oil-bearing formation, molecular design of new drugs, financial behavior
CDC 6600, 7600: Seymour Cray
Cray-1: 1976, the first true supercomputer
  ECL logic, 128 kW power consumption
  130 MFLOPS (Pentium 100: 150 MFLOPS)
  $5.1 million
High-Performance Processors
Parallel Processors
  Tens of gigaflops
  Multiple processors wired to a common bus
  Each is given a portion of the problem to solve
Hypercube: early 1980s
  Cosmic Cube, iPSC (with i860 RISC chips)
2D rectangular mesh architecture: multiple processors at each node
  Intel: teraflops computer with 4500 nodes, each powered by 2 Pentium Pro 200s
RISC vs. CISC
RISC: Reduced Instruction Set Computer (1980s)
  A small number of fixed-length instructions
  Simple addressing modes
  A large number of registers
  Instructions executed in one clock cycle
Intel i860 (“Cray on a Chip”)
  82 instructions, each 32 bits long
  Four addressing modes
  32 general-purpose registers
RISC vs. CISC
CISC: Complex Instruction Set Computer
  A large number of variable-length instructions
  Multiple addressing modes
  A small number of registers
  Multiple clock cycles to execute an instruction
Intel 8086
  Over 3000 instruction forms, 1-6 bytes
  9 addressing modes
  8 general-purpose registers
  Execution takes from 2 to 80+ cycles
RISC vs. CISC
RISC
  Control unit is much simpler (simpler instructions, execution in 1 CLK)
  Faster execution with less total on-chip logic
  Chip area for control: 10% (vs. 50% for CISC)
  More area for the register file, data and instruction caches, FPU, and co-processor
PowerPC: 32-bit, by IBM, Apple, Motorola
SPARC: for Sun Microsystems workstations
Application-Specific Processors
DSP Chips
  Mostly for analog signal processing
  ADC-DSP-DAC architecture
  Avoid processing analog signals with discrete circuits involving capacitors and inductors
DSP: performs complex mathematical functions
  Digital filtering, spectrum analysis
Application-Specific Processors
DSP Chip Architecture
  Separate data and program areas: Harvard architecture
  Hardware multipliers and adders, optimized to execute in a single cycle
  Arithmetic pipelining: several instructions in progress at once
  Hardware loop control
  Multiple I/O ports for communication with other processors
Summary of Processor History
1940s: Vacuum tubes, large and power-hungry
1950s: Transistor (1948-)
1959: First IC (second industrial revolution)
1960s: ICs became popular for building CPUs
1971: Intel 4004 microprocessor (2300 transistors)
Start of the microprocessor age
Late 1970’s: 8080/85
Summary of Processor History
1980s: RISC (reduced instruction set computer)
CISC (complex instruction set computer) vs. RISC
CISC families: Intel 80x86, Pentium; Motorola 68000 series
Most other processor families introduced in this period are RISC designs
INTEL
Integrated Electronics
1968: founded by Robert Noyce and Gordon Moore
IA: Intel Architecture (e.g., IA-16, IA-32, IA-64); the line that began with the 8008 ('72) became the de facto standard
Evolution:
  Internal register sizes
  External bus widths
  Real, Protected, and Virtual 8086 modes
4-bit Processors
4004: the first microprocessor, available in 1971
4-bit microprocessor:
  4-bit registers and 4-bit data bus
  Transistors: 2300
  Min. feature size: 10 microns
  Address bus: 10 bits (1K)
  0.06 MIPS (@ 0.108 MHz)
  No internal cache
8086: IA standard
Became available in 1978
16-bit data bus
20-bit address bus (was 16-bit for the 8080)
Memory organization: 16 segments of 64 KB (1 MB limit)
CPU reorganized into a BIU (bus interface unit) and an EU (execution unit)
  Allows fetch and execution to proceed simultaneously
Internal registers expanded to 16 bits
  Low and high bytes can be accessed separately
8086
Hardware multiply and divide instructions
External math co-processor
Instruction set source-compatible with the 8080/8085
The 8086 defined the 80x86 architecture
8086
Not quite successful
  16-bit data bus: requires two separate 8-bit memory banks
  Memory chips were expensive
8088: PC standard
Became available in 1979, almost identical to the 8086
8-bit external data bus: for hardware compatibility with the 8080
16-bit internal registers and internal data bus (same as the 8086)
20-bit address bus (was 16-bit for the 8080)
BIU redesigned
Memory organization: 16 segments of 64 KB (1 MB limit)
Two memory accesses for 16-bit data (less efficient), but lower cost
8088: used by the IBM PC (1981), 16K-64K, 4.77 MHz
80186, 80188: High Integration CPU
PC system: 8088 CPU + various supporting chips
  Clock generator
  8251: serial I/O (RS-232)
  8253: timer/counter
  8255: PPI (programmable peripheral interface)
  8257: DMA controller
  8259: interrupt controller
80186/80188: 8086/8088 + supporting functions
  Compatible instruction set (+ 9 new instructions)
80286
Became available in 1982
Used in the IBM AT computer (1984)
16-bit data bus
Clock speed 25% faster than the 8088, throughput 5 times greater than the 8088
24-bit address bus (16 MB) (vs. 20-bit/1 MB for the 8086)
80286: Real vs. Protected Modes
Larger address space: 24-bit address bus
Real Mode vs. Protected Mode
Real Mode:
  Power-on default mode
  Functions like an 8086: uses the 20 least significant address lines (1 MB)
  Software compatible with the 8086
  16 new instructions (for Protected Mode management)
  Faster than the 8086: redesigned processor, plus higher clock rate (6-8 MHz)
80286: Real vs. Protected Modes
Protected Mode:
  Multi-program environment
  Each program has a predetermined amount of memory
  Addressed via segment selector (physical addresses invisible): 16 MB addressable
  Multiple programs loaded at once (within their respective segments), protected from reads and writes by each other
80286: Real vs. Protected Modes
Protected Mode:
  The processor cannot be switched back to Real Mode; this prevents illegal access by switching back and forth between modes
A faster 8086 only?
  MS-DOS requires that all programs be run in Real Mode
Clock Speed
Electrical signals cannot change instantaneously (a transition period is required)
The system clock provides the timing signal for synchronization
Clock speed alone cannot be used to compare the performance of microprocessors with different instruction sets or internal designs
  e.g., a 66 MHz Pentium is twice as fast as a 66 MHz 80486
80386DX (aka. 80386)
Available in 1985, a major redesign of the 8086/286
Compatibility commitment through 2000
32-bit data and address buses (4 GB memory)
Real Address Mode: 1 MB visible, same as 286 real mode
Protected Virtual Address Mode:
  On-board MMU
  Segmented tasks of 1 byte to 4 GB
    • Segment base, limit, and attributes defined by a descriptor register
  Page swapping: 4K pages, up to 64 TB virtual memory space
  Windows, OS/2, Unix/Linux
80386DX (aka. 80386)
Virtual 8086 mode (a special Protected Mode feature):
  Permits multiple 8086 virtual machines to multitask (each behaves like real mode)
  Windows (multiple MS-DOS sessions)
Clock rate: max. 40 MHz, 2 clock pulses per R/W bus cycle
External memory cache to avoid wait states
  Fast SRAM
  93% hit rate with a 64K cache
Compatible instructions (14 new)
80486DX
1989: a polished 386, 6 new OS-level instructions
Virtually identical to the 386 in terms of compatibility
RISC design concepts
  Fewer clock cycles per operation; a single clock cycle for the most frequently used instructions
Max. 50 MHz
5-stage execution pipeline
  Portions of 5 instructions execute at once
80486DX
Highly integrated:
  On-board 8K memory cache
  FPP (equivalent to the external 80387 co-processor)
Twice as fast as the 386 at any given clock rate
  20 MHz 486 ~= 40 MHz 386
80486SX
80486SX
  NOT a 16-bit version for transition purposes
  No coprocessor
  Keeps the 8K internal cache of the 486DX
  For low-end applications
  Max. 33 MHz only
80486DX2/DX4: Overdrive Chips
Processor speeds increased too fast
  Redesigning the microcomputer for compatibility becomes harder
Solution: separate the internal speed from the external speed, and improve performance independently
80486DX2/DX4: internal clock two/three times (NOT four times) the external clock, so the processor runs faster internally
80486DX2/DX4: Overdrive Chips
System board design is independent of processor upgrades (less expensive components are allowed)
The processor operates at its maximum data rate internally
  Only the slower accesses to external data run at the system board rate
  The internal cache offsets the speed gap
486DX2 66: 66 MHz internal, 33 MHz external
486DX4 100: 100 MHz internal, 33 MHz external (3x)
OverDrive sockets: for upgrading a 486DX/SX to a 486DX2/DX4 (with OverDrive socket pin-outs)
Pentium: Superscalar Processor
Available in 1993
32-bit architecture
Superscalar architecture
  Scaling: scaling down the etchable feature size to increase the complexity of an IC (e.g., DRAM)
    10 microns (4004) to 0.13 microns (2001)
  Superscalar: go beyond simply scaling down
    Two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface
    Executes two different instructions simultaneously
Pentium: Superscalar Processor
On-board cache
  Separate 8K data and code caches to avoid access conflicts
FPP
  Instruction pipeline: 8 stages
  Optimized floating-point functions
  5x-10x the FLOPS of the 486
2x the performance of the 486 at any clock rate
Pentium: Superscalar Processor
Compatibility with the 386/486:
  Internal 32-bit registers and address bus
  Data bus expanded to 64 bits for a higher data transfer rate
  Compare the 8088 to 386SX transition
Pentium: Superscalar Processor
Non-clone competition from AMD and Cyrix
Development of brand identity by Intel
Pentium Pro: Two Chips in One
Became available in 1995
Superscalar of degree 3
  Can execute 3 instructions simultaneously
Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp)
Two separate silicon dies in the same package
  Processor: 0.35 micron, 5.5 million transistors
  256 KB (or 512 KB) Level 2 cache included in the package, 15.5 million transistors in a smaller area
Pentium Pro: Two Chips in One
On-board Level 2 cache
  Simplifies system board design
  Requires less space
  Gains faster communication with the processor
Internal (Level 1) cache: 8K
Pentium Pro 133 ~= 2x Pentium 66 ~= 4x 486DX2 66
Pentium Pro: Dynamic Execution
Dynamic execution: reduce idle processor time by predicting instruction behavior
  Multiple Branch Prediction: looks as far as 30 instructions ahead to anticipate program branches
  Data Flow Analysis: looks at upcoming instructions and determines whether they are available for processing, depending on other instructions; determines optimal execution sequences
  Speculative Execution: executes instructions in a different order from the one in which they were entered; speculative results are stored until final states can be determined
Moore's Law
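In 1965, Gordon Moore predicted that the number of components per integrated circuit would double every year
He forecast that this trend would continue through 1975
In 1975 he revised the rate to a doubling every two years; the often-quoted “18 months” figure is a later popularization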
Other Microprocessors
Motorola family from the 6809 through the 68040 (the Apple II itself used the MOS Technology 6502)
PowerPC joint venture between Apple, IBM, and Motorola
RISC Processors DEC Alpha, MIPS, Sun SPARC, etc.
CISC vs. RISC
CISC (Complex Instruction Set Computer)
  CISC processors have a large, versatile instruction set that supports many complex addressing modes
  Moves complexity from software to hardware
RISC (Reduced Instruction Set Computer)
  RISC processors have a small instruction set
  Moves complexity from hardware to software
Microprocessor Performance
Two main measures:
  Response time: the time between the start and completion of a task, also referred to as execution time
  Throughput: the total amount of work done in a given time
MIPS
Million Instructions Per Second
$\text{MIPS} = \dfrac{\text{instruction count}}{\text{execution time (in seconds)} \times 10^{6}}$
For a fixed instruction count, the rating is inversely proportional to execution time
Faster machines have a higher MIPS rating
Some Problems of MIPS
Cannot compare computers with different instruction sets, since the instruction count will certainly differ
MIPS varies between programs on the same computer
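The two problems above can be seen with a small calculation. This is a sketch only: the function name `mips` and the instruction counts and times for the two machines are made-up illustration values, not measurements.

```python
def mips(instruction_count: int, execution_time_s: float) -> float:
    """MIPS rating = instruction count / (execution time in seconds * 10**6)."""
    return instruction_count / (execution_time_s * 1e6)


# Hypothetical numbers: the same task compiled for two different instruction sets.
# Machine A needs many simple instructions, machine B fewer complex ones.
machine_a = mips(8_000_000, 2.0)  # 8 million instructions in 2 s
machine_b = mips(3_000_000, 1.0)  # 3 million instructions in 1 s

print(f"A: {machine_a} MIPS, finishes in 2.0 s")
print(f"B: {machine_b} MIPS, finishes in 1.0 s")
```

In this made-up case machine A has the higher MIPS rating yet takes twice as long to finish the task, which is why MIPS is a poor basis for comparing different instruction sets.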
iCOMP
An index provided by Intel for comparison of performance of their 32-bit microprocessors
Based on a variety of performance components that represent integer mathematics, graphics, etc.
Combine results of a set of software application benchmarks
Chapter 2: Computer Codes, Programming, and Operating Systems
Number Systems
Computer Codes
Programming
Operating Systems
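Base Conversion: 2 ↔ 10
Binary to decimal: $D = \sum_{i=0}^{n-1} b_i \times 2^i$
Decimal to binary
  Repeated subtraction
    $D' = \sum_{i=0}^{m-1} b_i \times 2^i = D - 2^m$, with $b_m = 1$, where $m$ is the largest exponent such that $2^m \le D$
    $D \leftarrow D'$ and $m \leftarrow m'$ ($m'$: the largest exponent of the new $D$, so $b_{m'} = 1$); repeat until $D = 0$
  Long division
    $D' = \lfloor D/2 \rfloor$ with remainder $b_i$; then $D \leftarrow D'$, repeating until $D = 0$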
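The three conversion procedures on this slide can be sketched in a few lines. The function names are illustrative, and non-negative integers are assumed throughout.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum b_i * 2**i over all bit positions (leftmost character is the MSB)."""
    n = len(bits)
    return sum(int(ch) * 2 ** (n - 1 - pos) for pos, ch in enumerate(bits))


def decimal_to_binary_subtract(d: int) -> str:
    """Repeated subtraction: peel off the largest power of two that still fits."""
    if d == 0:
        return "0"
    m = d.bit_length() - 1  # largest exponent m with 2**m <= d
    digits = []
    for exp in range(m, -1, -1):
        if 2 ** exp <= d:
            digits.append("1")
            d -= 2 ** exp  # D <- D - 2**exp
        else:
            digits.append("0")
    return "".join(digits)


def decimal_to_binary_divide(d: int) -> str:
    """Long division: divide by 2 repeatedly; remainders are the bits, LSB first."""
    if d == 0:
        return "0"
    digits = []
    while d > 0:
        d, remainder = divmod(d, 2)
        digits.append(str(remainder))
    return "".join(reversed(digits))


print(binary_to_decimal("1101"))
print(decimal_to_binary_subtract(13))
print(decimal_to_binary_divide(13))
```

Repeated subtraction produces the bits from the most significant end, while long division produces them from the least significant end, which is why the second routine reverses its result.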
MCS-51 Program Development
Figure: tool chain from Editor to Assembler (X8051) to Linker (Link) to Symbol Converter (CVTSYM) to ICE to Target; file types shown along the way: program .ASM, .OBJ, .HEX, .SYM, .SDT
8086 Processor Model: BIU+EU
BIU
  Memory and I/O address generation
EU
  Receives code and data from the BIU
  Not connected to the system buses
  Executes instructions
  Saves results in registers, or passes them to the BIU for memory and I/O
8086 Processor Model
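Figure: 8086 processor model. EU: general registers AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL); SP, BP, SI, DI; ALU; flags. BIU: CS, DS, SS, ES, IP; address generation and bus control; instruction queue.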
Fetch and Execution Cycle
BIU+EU allows the fetch and execution cycles to overlap
  0. System boot: the instruction queue is empty
  1. IP => BIU => address bus, and IP++
  2. Mem[IP-1] => InstrQ[tail++]
  3a. InstrQ[head] => EU => execution
  3b. Mem[IP++] => InstrQ[tail++] (maybe multiple instructions)
  Repeat 3a+3b (overlapped)
Waiting Conditions: Memory Access
BIU+EU: execute (almost) continuously without waiting
Waiting condition: accessing memory locations not in the queue
  BIU suspends instruction fetch
  Issues the external memory address
  Resumes instruction fetch and execution
Waiting Conditions: Jump
The next instruction is a jump
  Instructions in the queue are discarded
  EU waits for the instruction at the jump target to be fetched by the BIU
  Execution resumes
Waiting Conditions: Long Instructions
A long instruction is being executed
  Instruction queue is full
  BIU waits
  Instruction fetch resumes after the EU pulls one or two bytes from the queue
BIU: 8088 vs. 8086
The BIU is the major difference
8088:
  Data bus: 8-bit (vs. 16-bit for the 8086)
  Instruction queue: 4 bytes (vs. 6 bytes for the 8086)
Only 30% slower than the 8086
  If the queue is kept full
8086 Programming Model
Data Group:
  AX (AH+AL): Accumulator
  BX (BH+BL): Base
  CX (CH+CL): Counter
  DX (DH+DL): Data
8086 Programming Model
Segment Group:
  CS: Code Segment
  DS: Data Segment
  ES: Extra Segment
  SS: Stack Segment
Segment registers: base address of a particular segment
8086 Programming Model
Pointer/Index Group (with the segment register each is paired with):
  IP: Instruction Pointer (CS)
  SI: Source Index (DS)
  DI: Destination Index (ES)
  SP: Stack Pointer (SS)
Index registers: index (offset) or pointer relative to a base address
8086 Flag Word
Flag low byte (bits 7 to 0):
SF ZF X AF X PF X CF
CF: Carry Flag
  CF = 0: no carry (ADD) or borrow (SUB)
  CF = 1: carry/borrow out of the high-order bit
AF: Aux. Carry: carry/borrow on bit 3 (low nibble of AL)
SF: Sign Flag (0: positive, 1: negative)
ZF: Zero Flag (1: result is zero)
PF: (Even) Parity Flag (even number of 1s in the low-order 8 bits of the result)
8086 Flag Word
Flag high byte (bits 15 to 8):
X X X X OF DF IF TF
TF: Trap flag (single-step after next instruction; clear by single-step interrupt)
IF: Interrupt-Enable: enable maskable interrupts
DF: Direction flag: auto-decrement (1) or increment(0) index on string operations
OF: Overflow: signed result cannot be expressed within #bits in destination operand
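The arithmetic flags described on these two slides can be reproduced for an 8-bit ADD. This is a minimal sketch, assuming byte operands; the function `add8_flags` is an illustration, not a model of the actual 8086 circuitry.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two bytes and report the result with the flags an 8-bit ADD would set."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    return {
        "result": result,
        "CF": int(full > 0xFF),                             # carry out of bit 7
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),          # carry out of bit 3
        "ZF": int(result == 0),                             # result is zero
        "SF": (result >> 7) & 1,                            # copy of the sign bit
        "PF": int(bin(result).count("1") % 2 == 0),         # even number of 1 bits
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }


print(add8_flags(0x7F, 0x01))
print(add8_flags(0xFF, 0x01))
```

The first call overflows as a signed addition (127 + 1) without producing a carry; the second produces a carry and a zero result without signed overflow (-1 + 1), which illustrates why CF and OF are separate flags.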
Segmented Memory
Linear vs. Segmented
Linear addressing:
  The entire memory is regarded as a whole
  The entire memory space is available all the time
Segmented:
  Memory is divided into segments
  A process is limited to accessing designated segments at a given time
8086 Memory Organization
Even and odd memory banks
  16-bit data bus: one two-byte access or two one-byte accesses
  Allows the processor to work on bytes or on words (16-bit)
  I/O operations are normally conducted in bytes
Can handle odd-length instructions
  Single-byte instructions
  Multiple-byte (and very long) instructions
8086 Memory Organization
Memory space: 20-bit address bus
  Linearly, 1M bytes directly addressable
Memory banks
  Can read 16-bit data (512K words) from even and odd addresses simultaneously
  Needs two memory banks in parallel
  BHE control line: allows addressing the even bank, the odd bank, or both
Memory Organization: Alignment
Endianness:
  One way to model a multi-byte CPU register: AX = AH+AL
  Two ways to store operands in memory
Big-endian CPU (IBM 370, M68*, SPARC):
  High-order-byte-first (HOBF)
  Maps the highest-order byte of the internal register to the lowest (first) memory byte address
  Operand address = address of the MSB
  MOV R1, N (N: first byte in memory and MSB of the register)
Memory Organization: Alignment
Little-endian CPU (DEC, Intel):
  Low-order-byte-first (LOBF)
  Maps the lowest-order byte of the register to the first memory byte
  Operand address = address of the LSB (first memory byte)
  MOV AX, N (N: first byte in memory and LSB of the register; AL <= N, AH <= N+1)
Configurable:
  Can switch between big- and little-endian, or
  Provide instructions that convert 16-/32-bit data between the two byte orderings (80486)
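The two byte orderings can be tried out on a small simulated memory. A minimal sketch: `store_word` and `load_word` are illustrative helpers built on Python's `int.to_bytes` and `int.from_bytes`, and the `bytearray` stands in for memory.

```python
def store_word(memory: bytearray, address: int, value: int, byteorder: str) -> None:
    """Write a 16-bit value at address using 'little' or 'big' byte order."""
    memory[address:address + 2] = value.to_bytes(2, byteorder)


def load_word(memory: bytearray, address: int, byteorder: str) -> int:
    """Read a 16-bit value from address using 'little' or 'big' byte order."""
    return int.from_bytes(memory[address:address + 2], byteorder)


memory = bytearray(4)
store_word(memory, 0, 0x1234, "little")  # Intel style: LSB at the lower address
store_word(memory, 2, 0x1234, "big")     # M68*/SPARC style: MSB at the lower address

print(memory.hex())                          # byte layout of both copies
print(hex(load_word(memory, 0, "little")))   # read back with the matching order
print(hex(load_word(memory, 0, "big")))      # same bytes read with the wrong order
```

Reading little-endian data with big-endian rules swaps the two bytes, which is the kind of conversion the byte-swapping instructions mentioned above are meant to handle.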
8086 Memory Organization
Aligned operand
  Operand aligned at even-byte (word/dword) boundaries
  Allows a single access to read/write one operand
  Through an internal shift/swap mechanism, if necessary
Misaligned words:
  Word operand does not start at an even address
  Needs 2 bus cycles to read/write the word (8086)
  Issues two addresses to access the two even-aligned words containing the operand
  Slower but transparent to the programmer
8086 Memory Organization
8088: always 2 cycles for word operations
  Aligned or not
  Because of the 8-bit external data bus
  A single memory bank is sufficient
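The alignment rules for word accesses on the 8086 and 8088 reduce to a small decision. A sketch under the assumptions stated on the slides (16-bit bus with even/odd banks for the 8086, 8-bit bus for the 8088); the function name is illustrative.

```python
def word_access_cycles(address: int, data_bus_bits: int) -> int:
    """Bus cycles needed to read or write one 16-bit word at the given address."""
    if data_bus_bits == 8:
        # 8088: one byte per bus cycle, so a word always takes two cycles.
        return 2
    if data_bus_bits == 16:
        # 8086: an even address lets both banks deliver the word in one cycle;
        # an odd address straddles two aligned words and needs two cycles.
        return 1 if address % 2 == 0 else 2
    raise ValueError("only 8-bit and 16-bit data buses are modeled here")


for addr in (0x1000, 0x1001):
    print(hex(addr), "8086:", word_access_cycles(addr, 16),
          "8088:", word_access_cycles(addr, 8))
```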
8086 Memory Map
Memory map: how the memory space is allocated
  ROM area: boot, BIOS
  RAM: OS/user applications and data
  Unused
  Reserved: for future hardware/software uses
  Dedicated: for specific system interrupt and reset functions, etc.
Logical and Physical Addresses
Physical address: 20-bit
Logical address: 16-bit segment and 16-bit offset
  Segments start on 16-byte boundaries
Address translation: physical address = segment × 16 + offset
  E.g., CS:IP
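Because segments start on 16-byte boundaries, the 8086 forms the 20-bit physical address by shifting the segment value left 4 bits and adding the offset. A minimal sketch; the function name and the sample CS:IP values are illustrative.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode translation: segment * 16 + offset, kept to 20 bits."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


# Illustrative values: CS = 0x1234, IP = 0x0010
print(hex(physical_address(0x1234, 0x0010)))

# Different segment:offset pairs can name the same physical byte.
print(physical_address(0x1234, 0x0010) == physical_address(0x1235, 0x0000))
```

The final mask models the 20 address lines of the 8086: a sum that exceeds 0xFFFFF wraps around to the bottom of memory.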
80386: Real vs. Protected Modes
Larger address space: 32-bit address bus (4 GB)
Real Mode vs. Protected Mode (refined from the 286)
Real Mode:
  Power-on default mode
  Functions like an 8086: (1) uses only the 20 least significant address lines (1 MB); (2) segmented memory retained (64K)
  Software compatible with the 286
New Real Mode features:
  Access to the 32-bit register set
  Two new segment registers: FS, GS
80386: Real vs. Protected Modes
Protected Mode:
  New addressing mechanism vs. real mode
  Supports protection levels
  Segment size: 1 byte to 4 GB (not fixed at 64K)
  Segment register: pointer into a descriptor table, not a base address
80386: Real vs. Protected Modes
Protected Mode:
  Descriptor table (8 bytes per entry):
    32-bit base address of the segment
    Segment size
    Access rights
  Memory address = base address (in table) + offset (in instruction)
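The descriptor-table lookup can be sketched as follows. This is a simplified model: the `Descriptor` class, its field names, the string form of the access rights, and the choice of `MemoryError` for an out-of-range offset are all assumptions for illustration; a real 386 descriptor packs these fields into 8 bytes and raises a protection fault.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int    # 32-bit base address of the segment
    size: int    # segment size in bytes
    rights: str  # access rights, simplified here to a string such as "rw"


def linear_address(table: list, index: int, offset: int) -> int:
    """Memory address = base address (from the table) + offset (from the instruction)."""
    descriptor = table[index]
    if offset >= descriptor.size:
        # Stand-in for the protection fault a real processor would raise.
        raise MemoryError("offset lies outside the segment")
    return (descriptor.base + offset) & 0xFFFFFFFF


# Hypothetical one-entry table: a 4 KB read/write segment based at 1 MB.
table = [Descriptor(base=0x0010_0000, size=0x1000, rights="rw")]
print(hex(linear_address(table, 0, 0x20)))
```

Unlike real mode, the segment register here only selects a table entry; the base address never appears in the program itself.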
80386: Real vs. Protected Modes
Protected Mode:
  Paging mechanism:
    Maps the 32-bit linear address (base + offset) to a physical address: page frame address plus offset within the page
    4K page frames in system memory
    64 TB of virtual memory
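With 4K pages, the low 12 bits of a linear address are the offset within the page and the upper 20 bits select the page. A sketch, assuming the 386's two-level split of those 20 bits into a 10-bit directory index and a 10-bit table index (not stated on the slide); the flat dictionary used as a page table and the sample addresses are illustrative only.

```python
PAGE_SIZE = 4096  # 4K pages, as stated on the slide


def split_linear(linear: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory = (linear >> 22) & 0x3FF  # top 10 bits (assumed 386 layout)
    table = (linear >> 12) & 0x3FF      # next 10 bits
    offset = linear & 0xFFF             # low 12 bits: position inside the page
    return directory, table, offset


def translate(linear: int, page_frames: dict) -> int:
    """Map a linear address to a physical one via a page-number -> frame-base dict."""
    page_number = linear >> 12
    offset = linear & (PAGE_SIZE - 1)
    if page_number not in page_frames:
        # Stand-in for a page fault: the page would be swapped in from disk.
        raise LookupError("page not present")
    return page_frames[page_number] + offset


# Hypothetical mapping: linear page 0x403 lives in the frame at 0x0009A000.
frames = {0x403: 0x0009_A000}
print(split_linear(0x0040_3025))
print(hex(translate(0x0040_3025, frames)))
```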
80386: Real vs. Protected Modes
Protected Mode:
  Protection mechanism:
    Tasks, data, and instructions are assigned a privilege level (PL)
    Tasks running at a lower privilege cannot access tasks or data segments at a higher privilege
    Running programs are protected from one another
80386: Real vs. Protected Modes
Two ways to run 8086 programs:
  Real Mode
  Virtual 8086 Mode
Virtual 8086 Mode:
  Runs multiple 8086 programs plus other 386 (protected mode) programs independently
  Each 8086 program sees 1 MB (mapped via paging to anywhere in the 4 GB space)
  Runs V8086 and Protected Mode programs simultaneously
80386 Processor Model: BIU+CPU+MMU
BIU
  Controls the 32-bit address and data buses
  Keeps the instruction queue full (16 bytes)
Address pipelining
  The address of the next memory location is output halfway through the current bus cycle
  More address decode time
  Slower memory chips are OK
  Easier to keep up with the faster (2 CLK) bus cycle of the 386
80386 Processor Model: BIU
Dynamic data bus sizing
  Switches between a 16-bit and a 32-bit data bus on the fly
  Accommodates external 16-bit memory cards or I/O devices
  Adjusts bus timing to use only the least significant 16 bits
80386 Processor Model: BIU
External memory
  4 memory banks (4 x 8 = 32 bits)
  BE0-BE3 for bank selection
  Access a byte, word, or double word
  Aligned operands: 1 bus cycle
  Misaligned operands (address not a multiple of 4): 2 bus cycles
80386 Processor Model: CPU
CPU = IU (instruction unit) + EU (execution unit)
  Fetching and execution overlap
IU:
  Retrieves instructions from the queue
  Decodes them
  Stores them in the decoded queue
EU: ALU + registers (32-bit)
  Executes decoded instructions
80386 Processor Model: MMU
Segmentation unit
  Real mode: generates the 20-bit physical address
  Protected mode: stores base/size/rights in descriptor registers
    Caches entries of the descriptor tables held in RAM
    Faster operation
Paging unit
  Determines the physical addresses associated with active segments (divided into 4K pages)
  Virtual memory support to allow larger programs
80386 Programming Model
General-purpose registers
  Data and address groups
Status and control flags
  VM, RF, NT, IOPL
Segment group
80386 Programming Model
Memory management
  Segment descriptors
    Keep base, size, access rights
    3 types of tables: global (GDT), local (LDT), interrupt (IDT)
    Addressing:
      • index (to a table) + RPL
      • base + offset (from instruction)
  Paging
    TLB
80386 Programming Model
Protection (PL)
  Task: CPL
  Instruction: RPL
  Data segment: DPL
Gates
  Special descriptors that allow access to higher-PL tasks from lower-PL tasks
486 Processor Features
386 features:
  Real/Protected Modes
  Memory management
  PLs
  Registers and bus sizes
New features
  6 OS instructions
  8K/16K on-board cache (the cache was external on the 386)
486 Processor Features
A better 386
5-stage instruction pipeline
  IF/ID/EX => PF/D1/D2/EX/WB
  PF: instructions => queue (2 x 16 bytes)
  D1: determine opcode
  D2: determine memory address of operands
  EX: execute indicated operation
  WB: update registers
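How five stages let portions of five instructions execute at once can be shown with an idealized schedule. This sketch assumes one stage per clock and no stalls, jumps, or memory waits, which a real 486 does not guarantee; the helper names are illustrative.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]  # stage names from the slide


def total_cycles(num_instructions: int) -> int:
    """Clocks to finish n instructions in an ideal pipeline: stages + n - 1."""
    if num_instructions <= 0:
        return 0
    return len(STAGES) + num_instructions - 1


def pipeline_schedule(num_instructions: int) -> list:
    """One row per instruction; row[c] is the stage it occupies in clock c, or ''."""
    width = total_cycles(num_instructions)
    rows = []
    for i in range(num_instructions):
        row = [""] * width
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters the pipeline at clock i
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_schedule(5)):
    print(f"I{i}: " + " ".join(f"{stage:>2}" for stage in row))
print("clocks needed:", total_cycles(5), "vs", 5 * len(STAGES), "without overlap")
```

Once the pipeline is full, one instruction completes every clock, which is the sense in which the 486 approaches a one-clock instruction cycle.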
486 Processor Features
Reduced instruction cycle times
  5-stage instruction pipeline (e.g., Fig. 3.18)
  Instruction cycle times:
    8086: 4 CLK
    80386: 2 CLK
    80486: 1 CLK (close to RISC)
  About 2x faster than the 386
486 Processor Model: 386+FPU+Cache
386 units retained: BIU, CPU, MMU
New: FPU (80387) + cache (8K/16K)
FPU:
  387 on board
  0.8 micron process => transistor count increased (275K => 1+ million)
  Simplified system board design
  Speeds up FP operations
486 Processor Model: Cache
Cache (8K, or 16K on the DX4)
Function: bridges the processor-memory bandwidth gap
  8088: 4.77 MHz
  80486: 50 MHz
  Pentium: 100 MHz
  Pentium Pro: 133 MHz
  Main memory (DRAM): relatively slow
Fast static RAMs (SRAM) as cache
486 Processor Model: Cache
Organization: 8K, 4-way set associative
  4 direct-mapped caches wired in parallel
  Each block maps to a set of 4 lines
Unified: data and code in the same cache
Write-through: update both the cache and main memory on write operations
486 Processor Model: Cache
Locality (why do caches help?)
  Spatial locality: e.g., arrays of data
  Temporal locality: e.g., loops in code
Operations on hit/miss
128-bit cache lines
  32 bits x N to capture locality (N = 4)
  128 bits = 16 bytes
486 Processor Model: Cache
Mapping: memory => many-to-many => cache
  Data RAM: stores memory data
  Tag RAM: stores memory address information
3 methods of mapping
  Fully associative: a memory block may go to any cache line
  Direct mapped: a memory block goes to one specific line
    Thrashing
  Set associative: a memory block may go to any line in a set of cache lines
486 Processor Model: Cache
Replacement policy (LRU)
  Valid bits: are all 4 lines in use?
    NO => use any unused line
    YES => find one to replace
  LRU bits: which line is least recently used
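The organization and replacement policy above can be combined in a small simulation. A sketch using the figures from the slides (8K, 4-way, 16-byte lines, hence 128 sets); it tracks exact LRU order with a list, which is a simplification, and it models only hits and misses, not the stored data or the write-through traffic.

```python
LINE_BYTES = 16
WAYS = 4
CACHE_BYTES = 8 * 1024
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)  # 8192 / 64 = 128 sets


class SetAssociativeCache:
    def __init__(self) -> None:
        # Each set lists the tags it holds, ordered least to most recently used.
        self.sets = [[] for _ in range(NUM_SETS)]

    def access(self, address: int) -> bool:
        """Touch one address; return True on a hit, False on a miss."""
        block = address // LINE_BYTES  # which 16-byte memory block
        index = block % NUM_SETS       # the set this block maps to
        tag = block // NUM_SETS        # identifies the block within that set
        lines = self.sets[index]
        if tag in lines:
            lines.remove(tag)          # hit: mark as most recently used
            lines.append(tag)
            return True
        if len(lines) == WAYS:         # all 4 lines valid: evict the LRU one
            lines.pop(0)
        lines.append(tag)              # miss: fill a line with the new block
        return False


cache = SetAssociativeCache()
stride = NUM_SETS * LINE_BYTES  # addresses this far apart share a set
trace = [0, 4, stride, 2 * stride, 3 * stride, 0, 4 * stride, stride]
print([cache.access(a) for a in trace])
```

In the trace, the second access hits because of spatial locality (same 16-byte line), the fifth block forces an eviction of the least recently used line, and the final access misses because that evicted line is the one being requested.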