hardware fundamentals(1)

Upload: nandhalaalaa

Post on 03-Jun-2018


  • 8/12/2019 Hardware Fundamentals(1)

    1/44

    Some Hardware Fundamentals and an Introduction to Software

    In order to comprehend fully the function of system software, it is vital to understand the operation of computer hardware and peripherals. The reason for this is that software and hardware are inextricably connected in a symbiotic relationship. First, however, we need to identify types of software and their relationship to each other and, ultimately, to the hardware.

    Figure 1 Software Hierarchy

    Figure 1 represents the relationship between the various types of software and hardware.

    The figure appears as an inverted pyramid to reflect the relative size and number of the

    various types of software, on one hand, and their proximity to computer hardware, on the

    other.

    First, application software is remote from, and rarely interacts with, the computer's hardware. This is particularly true of applications that run on modern operating systems such as Windows NT 4.0, 2000, XP and UNIX variants. By 2000, with the advent of the .NET and Java paradigms, applications became even further removed from hardware, as .NET's Common Language Runtime (CLR) and the Java Virtual Machine (JVM) provide operating system and hardware access.

    [Figure 1 labels: Application Software; System Software; Firmware; Hardware; Computer Logic Circuits]

    Tom Butler

    Indeed, from the operating system's perspective, the JVM and the CLR are merely applications. Older operating systems such as MS-DOS permitted some direct interaction between applications (chiefly computer games) and the hardware; however, this meant that vendors of such applications had to write code that would interact with the computer's BIOS (Basic Input/Output System, or firmware based on the computer's read only memory or ROM integrated circuits). Indeed, when computers first appeared on the market this practice was the norm, rather than the exception. Application programmers soon tired of reinventing the wheel every time they wrote an application, as they would have to include software routines that helped the application software communicate with and control hardware devices, including the CPU (central processing unit). In order to overcome this, computer scientists focused on developing a new type of software (operating or system software) whose sole purpose was to provide an environment or interface for applications such that the burden of managing and communicating with the computer hardware was removed from application programs. This proved important as technological advances resulted in computer hardware becoming more sophisticated and difficult to manage. Thus, operating systems were developed to manage a computer's hardware resources and provide an application programming interface, as well as a user or administrator interface, to permit access to the hardware for use and configuration by application software programmers and systems administrators.

    In early computer systems, bootstrap code, loaded into the system manually via switches and/or pre-coded punched cards or teletype tape, was required to load the operating system program and boot the system so that an application program could be loaded and run. The advent of read only memory (ROM) in the 1[...] However, firmware came into its own in microprocessor systems and, later, personal computers. By the turn of the new millennium, entire operating systems, such as Windows NT and Linux, appeared in the firmware of embedded systems. The most recent advances in this area have been in the PDA or Pocket PC market, where Palm OS and Microsoft CE are competing for dominance. That said, while almost every type of electronic device possesses firmware of one form or other, the most prevalent appears in personal computers (PCs). Likewise, PCs dominate the computer market due to their presence in all areas of human activity. Hence, understanding PC hardware has become a sine qua non for all who call themselves IT professionals. The remainder of this chapter therefore focuses on delineating the basic architecture of today's PC.

    A Brief Look Under the Hood of Today's PC

    This section provides a brief examination of the major components of the PC.

    The Power Supply

    The most often ignored of the PC's components is the system power supply. Most household electrical appliances operate on alternating current (AC): 110 Volt 60 Hz AC (e.g. USA) or 220 Volt 50 Hz AC (Europe). However, electronic subassemblies or entire devices with embedded logic circuitry, whether microprocessor-based or not, operate exclusively on direct current (DC). The job of a PC's power supply is to transform and rectify the external AC commercial supply to the range of DC voltages required by the computer logic, associated electronic components, the DC motors in the hard disk, floppy, CD-ROM and DVD drives, and the system fans. Typical DC power supplies in a PC are rated at 1.5, 3.3, 5, -5, 12 and -12 volts. Also note that as Notebook and Laptop computers have a rechargeable DC battery, they require special DC-DC converters to generate the required range of DC voltages. Several colour-designated cables emanate from a computer's power supply unit, the largest of which is connected to the computer's main circuit board, called the motherboard. The various DC voltages are distributed via the power supply rails printed onto the circuit board.


    The Basic Input/Output Operating System

    The Basic Input/Output System (BIOS) is system software and is a collection of hardware-related software routines embedded as firmware on a read-only memory (ROM) integrated circuit (IC) "chip" which is typically housed on a computer's motherboard. Usually referred to as ROM BIOS, this software component provides the most fundamental means for the operating system to communicate with the hardware. However, most BIOSs are 16 bit programs and must operate in real mode (see note 1) on machines with Intel processors. While this does not cause performance problems during the boot-up phase, it means a degradation in PC performance as the CPU switches from protected to real mode when BIOS routines are referenced by an operating system. 32 bit BIOSs are presently in use, but are not widespread. Modern 32 bit operating systems such as LINUX do not use the BIOS after bootup, as the designers of LINUX integrated 32 bit versions of the BIOS routines into the LINUX kernel. Hence, the limitations of real mode switches in the CPU are avoided. Nevertheless, the BIOS plays a critical role during the boot-up phase, as it performs the power-on self test (POST) for the computer and then loads the boot code from the hard disk's master boot record (MBR), which in turn copies the system software into RAM for execution by the CPU.

    When a computer is first turned on, DC voltages are applied to the CPU and associated electrical and logic circuits. This would lead to electronic mayhem if the CPU did not assert control. However, a CPU is merely a collection of hundreds of thousands (and now millions) of logic circuits. CPU designers therefore built in a predetermined sequence of programmed electronic events, which are triggered when a signal appears on the CPU's reset pin. This has the CPU's control unit use the memory address in the instruction counter (IC) register to fetch the first instruction to be executed. The 32 bit value placed in the IC is the address of the first byte of the final 64 KB segment in the first 1 MB of the computer's address space (this is a hangover from the early days of the PC when the last 384 KB of the first 1 MB of RAM was reserved for the system and peripheral BIOS routines, each of which were 64 KB in length). This is the address of the first of the many 16 [...]

    Note 1: 16 bit applications operate in real mode on all Intel CPUs. This effectively limits the address space to 1 MB, by using 16 x 64 KB program segments. Each 16 bit application can only address 64 KB (2^16 = 65,536 locations); however, the CPU manages and uses an extra 4 bits (address lines) to provide the BIOS, OS and applications with 16 (2^4 = 16) segment addresses.
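The arithmetic in the note above can be checked with a few lines of Python. This is a minimal sketch of the simplified model the note describes (16 non-overlapping 64 KB segments); the constant and function names are illustrative assumptions, and the actual x86 segment:offset calculation is more involved than this model.

```python
# Simplified real-mode address arithmetic, following the note above.
# Names are illustrative; this models 16 non-overlapping 64 KB segments.

SEGMENT_SIZE = 2 ** 16   # 64 KB reachable with a 16 bit offset
SEGMENT_COUNT = 2 ** 4   # 4 extra address lines select one of 16 segments


def address_space_bytes() -> int:
    """Total real-mode address space: 16 segments of 64 KB each."""
    return SEGMENT_COUNT * SEGMENT_SIZE


def final_segment_start() -> int:
    """Address of the first byte of the final 64 KB segment in the first 1 MB."""
    return address_space_bytes() - SEGMENT_SIZE


print(address_space_bytes())        # total bytes in the 1 MB space
print(hex(final_segment_start()))   # start of the last 64 KB segment
```

Under these assumptions the final segment starts at F0000 hex, which is the region the text associates with the system BIOS.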


    Figure 2 The Intel 845 Chipset

    The Motherboard and the Chipset

    The motherboard or system board houses all system components, from the CPU, RAM and expansion slots (e.g. ISA and PCI) to the I/O controllers. However, the key component on a motherboard is the chipset. While motherboards are identified physically by their form factor, the chipset designation indicates the capability of the motherboard to house system components. The most popular form factor is Intel's ATX. This motherboard was designed by Intel to increase air movement for cooling on-board components, and to allow easier access to the CPU and RAM. While the motherboard contains many chips or ICs, such as the CPU, RAM, BIOS, and a variety of smaller chips, two chips now handle most of the I/O functionality of a PC. The first is the Northbridge chip, which handles all communication (address, data and control) to the CPU, RAM, Accelerated Graphics Port (AGP) and PCI devices. The frontside system bus (FSB) terminates on the Northbridge chip and permits the CPU to access the RAM, AGP and PCI devices and those serviced by the Southbridge chip (and vice versa). The Southbridge chip permits communication with slow peripherals such as the floppy disk drive, the hard disk drive and CD-ROMs, ISA devices, the parallel, serial, mouse and keyboard ports, and the Flash ROM BIOS.

    Figure 3 The Intel 850 Chipset

    Intel and VIA are the leaders in chipset manufacture as of 2002, although there are several other manufacturers, such as ALi and SiS. While Intel services its own CPUs, VIA manufactures for both Intel and its major competitor AMD. In 2002, the basic Intel i850 chipset consisted of the 82850 Northbridge MCH (Memory Controller Hub) and an ICH2 (I/O Controller Hub) Southbridge. The chipset also contains a Firmware Hub (FWH) that provides access to the Flash ROM BIOS. This permits up to 4 GB of RAM with ECC (error correction), 4X AGP Mode, 4 Ultra ATA 100 IDE disk drives, and four USB ports. ISA is not supported. Different chipset designs support different RAM types and speeds (e.g. DDR SDRAM or RAMBus DRAM), CPU types and packaging, system bus speeds, and so on.

    In 2000, Intel announced that the future of RAM in the PC industry was RAMBus DRAM (RDRAM). This heralded the release of the Intel 820 "Camino" chipset, which supported three RAMBus memory slots. However, errors in the design meant that only two memory slots could be used. A loss of confidence in the marketplace led to the withdrawal of the ill-fated Camino and its replacement with the Intel 840 "Carmel" chipset. This includes a 64 bit PCI controller, a redesigned and improved RDRAM memory repeater, and an SDRAM memory repeater that converts the RDRAM protocol to SDRAM. This was a smart move by Intel, but it backfired terribly as the SDRAM hub had design errors that limited the number of SDRAMs that could be used. In addition, the RDRAM to SDRAM conversion protocol impaired overall memory throughput when using SDRAM. Consequently, faster memory performance on Intel's Pentium III Coppermine CPUs with a 133 MHz Frontside Bus could only be achieved using VIA's Apollo Pro 133 A. To make matters worse, the Intel 815 Solano chipset, which was introduced to support PC 133 DIMMs (SDRAM memory modules) and to help regain market share from VIA, would not allow SDRAM modules to work at 133 MHz if CPUs (such as certain variants of Intel's Pentium III) rated for a 100 MHz external clock rate were fitted on the motherboard. This particularly applies to the Celeron family, which ran at a 66 MHz external clock rate. It is significant that many of Intel's competitors promoted PC133 and PC 266 DIMM standards over the more expensive RAMBus DRAM. This further impeded the acceptance of RDRAM; however, by late 2002, RDRAM had its own market niche as the price of SDRAM increased once more.

    Intel learned from its experience with the Camino and Carmel chipsets. Bowing to market pressure, it designed two new chipset families for use with its new Pentium IV CPU. The first of these, the i845 (see Figure 2), was targeted at systems based on the Pentium IV and synchronous DRAM memory such as the PC133, 233, and 333, with up to 3 GB of memory. The i850 (see Figure 3) was targeted at RDRAM-based systems of up to 4 GB, which supported the PC 800, 1033 and 1066 RAMBus memory. In late 2002, the Intel 845GE chipset was released to support PC333 DDR SDRAM and the Pentium 4 processor. The chipset also included Intel's Extreme Graphics technology, which ran at 266 MHz core speed. The basic member of the Intel 850 chipset family had support for PC800 RDRAM memory and provided a balanced performance platform for the Pentium 4 processor with 400 MHz system bus and NetBurst Architecture. It also supports dual channel access to RDRAM RIMMs, which increases overall throughput to 3.2 GB/s. Subsequent developments in this chipset family provided support for RDRAM running at 1033 MHz and 1066 MHz, and a 533 MHz FSB. Further advances in DDR SDRAM technologies saw DDR SDRAM-based Intel and VIA chipsets which accommodated PC2400 and PC2700 DDR SDRAM running at 150 MHz and 166 MHz respectively, double clocked to 300 and 333 MHz (so called DDR 300 and 333). However, the evolution of DDR366 and chipset design led to the PC3000 DDR SDRAM being released with even higher bandwidth speeds.
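The relationship between the DDR base clock, the "double clocked" rate and the PC2400-style module names can be sketched as follows. This is an illustrative calculation only; the 8 byte (64 bit) module data width is an assumption not stated in the text, and the function names are hypothetical.

```python
# Sketch of DDR "double clocking" and peak bandwidth.
# Assumption (not from the text): a DIMM transfers 8 bytes (64 bits) at a time.

BUS_WIDTH_BYTES = 8


def ddr_effective_mhz(base_clock_mhz: float) -> float:
    """DDR transfers data on both clock edges, doubling the effective rate."""
    return base_clock_mhz * 2


def peak_bandwidth_mb_s(base_clock_mhz: float) -> float:
    """Peak transfer rate in MB/s under the 8 byte bus-width assumption."""
    return ddr_effective_mhz(base_clock_mhz) * BUS_WIDTH_BYTES


for clock in (150, 166):
    print(clock, ddr_effective_mhz(clock), peak_bandwidth_mb_s(clock))
```

With a 150 MHz base clock the sketch gives 300 MHz effective and 2400 MB/s, matching the PC2400 name. The 166 MHz figure is a rounded clock, so the computed bandwidth comes out slightly below the nominal 2700.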

    Basic CPU Architectures

    CISC vs. RISC

    There are two types of fundamental CPU architecture: complex instruction set computers (CISC) and reduced instruction set computers (RISC). CISC is the most prevalent and established microprocessor architecture, while RISC is a relative newcomer. Intel's 80x86 and Pentium microprocessor families are CISC-based, although RISC-type functionality has been incorporated into Pentium CPUs. Motorola's 68000 family of microprocessors is another example of this type of architecture. Sun Microsystems' SPARC microprocessors and the MIPS R2000, R3000 and R4000 families dominate the RISC end of the market; however, Motorola's PowerPC, G4, Intel's i860, and Analog Devices Inc.'s digital signal processors (DSP) are in wide use. In the PC/Workstation market, Apple Computers and Sun employ RISC microprocessors as their choice of CPU.


    Table 1 CISC and RISC

    CISC                                                    RISC
    Large instruction set                                   Compact instruction set
    Complex, powerful instructions                          Simple hard-wired machine code and control unit
    Instruction sub-commands microcoded in on-board ROM     Pipelining of instructions
    Compact and versatile register set                      Numerous registers
    Numerous memory addressing options for operands         Compiler and IC developed simultaneously

    The difference between the two architectures is the relative complexity of the instruction sets and underlying electronic and logic circuits in CISC microprocessors. For example, the original RISC I prototype had just 31 instructions, while the RISC II had 3[...]

    Figure 4 Typical Microprocessor Architectures

    [Figure 4 labels: Bus Interface Unit; Program Counter; Stack Pointer; registers AX, BX, CX, DX, BP, SI, DI and Flag (general purpose registers, of which AX is the Accumulator); Instruction Register; Internal Bus; Decode Unit; Arithmetic and Logic Unit; Control Unit; Address Bus; Data Bus; Control Bus (includes read/write, interrupt, clock and reset)]

    the 1


    several integrated transistors which are configured as flip-flop circuits, each of which can be switched into a 1 or 0 state. They remain in that state until changed under control of the CPU or until the power is removed from the processor. Each register has a specific name and is addressable; some, however, are dedicated to specific tasks while the majority are "general purpose". The width of a register depends on the type of CPU, e.g., a 16, 32 or 64 bit microprocessor. In order to provide backward compatibility, registers may be sub-divided. For example, the Pentium processor is a 32 bit CPU, and its registers are 32 bits wide. Some of these are sub-divided and named as 8 and 16 bit registers in order to run 8 and 16 bit applications designed for earlier x86 microprocessors.
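The sub-division of a 32 bit register into 16 and 8 bit views can be illustrated with bit masks. This is a minimal sketch; the function names are hypothetical, and the comments use the conventional x86 names (EAX, AX, AH, AL) only as an example of how such views are usually labelled.

```python
# Sketch: viewing parts of one 32 bit register value as smaller registers.
# Function names are illustrative, not taken from the text.


def low_16(reg32: int) -> int:
    """Lower 16 bits of a 32 bit register (e.g. AX within EAX)."""
    return reg32 & 0xFFFF


def low_8(reg32: int) -> int:
    """Lowest 8 bits (e.g. AL)."""
    return reg32 & 0xFF


def high_8_of_low_16(reg32: int) -> int:
    """Upper 8 bits of the lower 16 bits (e.g. AH)."""
    return (reg32 >> 8) & 0xFF


value = 0x12345678
print(hex(low_16(value)), hex(high_8_of_low_16(value)), hex(low_8(value)))
```

The same stored bits serve all three views, which is why older 8 and 16 bit code can run unchanged on the wider register set.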

    Instruction Register

    When the Bus Interface Unit receives an instruction it transfers it to the Instruction Register for temporary storage. In Pentium processors the Bus Interface Unit transfers instructions to the L1 I-Cache; there is no instruction register as such.

    Stack Pointer

    A "stack" is a small area of reserved memory used to store the data in the CPU's registers when: (1) system calls are made by a process to operating system routines; (2) hardware interrupts are generated by input/output (I/O) transactions on peripheral devices; (3) a process initiates an I/O transfer; (4) a process rescheduling event occurs on foot of a hardware timer interrupt. This transfer of register contents is called a "context switch". The stack pointer is the register which holds the address of the most recent "stack" entry. Hence, when a system call is made by a process (to, say, print a document) and its context is stored on the stack, the called system routine uses the stack pointer to reload the register contents when it is finished printing. Thus the process can continue where it left off.
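The save-and-restore behaviour described above can be modelled in a few lines. This is a toy sketch, not how any real operating system is implemented: the class name, the three registers and the use of a Python list in place of reserved memory are all illustrative assumptions.

```python
# Toy model of saving and restoring register contents via a stack.
# TinyCpu and its register set are hypothetical, for illustration only.


class TinyCpu:
    def __init__(self) -> None:
        self.registers = {"AX": 0, "BX": 0, "PC": 0}
        self.stack = []  # a Python list stands in for the reserved memory area

    @property
    def stack_pointer(self) -> int:
        """Index of the most recent stack entry (-1 when the stack is empty)."""
        return len(self.stack) - 1

    def save_context(self) -> None:
        """Copy the current register contents onto the stack."""
        self.stack.append(dict(self.registers))

    def restore_context(self) -> None:
        """Reload the registers from the most recent stack entry."""
        self.registers = self.stack.pop()


cpu = TinyCpu()
cpu.registers.update(AX=7, PC=0x00B0)
cpu.save_context()                      # e.g. on entry to a system call
cpu.registers.update(AX=99, PC=0x4000)  # the system routine uses the registers
cpu.restore_context()                   # the process resumes where it left off
print(cpu.registers)
```

After `restore_context()` the registers hold the values they had before the call, which is the property the text relies on.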

    Instruction Decoder

    The Instruction Decoder is an arrangement of logic elements which act on the bits that constitute the instruction. Simple instructions with corresponding logic hard-wired into the execution unit are simply passed to the Execution Unit (and/or the MMX in the Pentium II, III and IV); complex instructions are decoded so that related microcode modules can be transferred from the CPU's microcode ROM to the execution unit. The Instruction Decoder will also store referenced operands in appropriate registers so data at the memory locations referenced can be fetched.

    Program or Instruction Counter

    The Program Counter (PC) is the register that stores the address in primary memory (RAM or ROM) of the next instruction to be executed. In 32 bit systems, this is a 32 bit linear or virtual memory address that references a byte (the first of 4 required to store the 32 bit instruction) in the process's virtual memory address space. This value is translated to determine the real memory address in which the instruction is stored. When the referenced instruction is fetched, the address in the PC is incremented to the address of the next instruction to be executed. If the current address is 00B0 hex, then the next address will be 00B4 hex. Remember each byte in RAM is individually addressable; however, each complete instruction is 32 bits or 4 bytes, and the address of the next instruction in the process will be 4 bytes on.
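The increment in the example above can be reproduced directly. This is a minimal sketch that assumes, as the text does, a fixed 4 byte instruction; real x86 instructions vary in length, so treat the constant as an illustrative simplification.

```python
# Sketch of the program counter increment described above.
# Assumption taken from the text: every instruction occupies 4 bytes.

INSTRUCTION_BYTES = 4


def next_pc(pc: int) -> int:
    """Address of the next sequential instruction, wrapped to 32 bits."""
    return (pc + INSTRUCTION_BYTES) & 0xFFFFFFFF


print(f"{next_pc(0x00B0):04X}")
```

Starting from 00B0 hex this yields 00B4 hex, as in the text.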

    Accumulator

    The accumulator may contain data to be used in a mathematical or logical operation, or it may contain the result of an operation. General purpose registers are used to support the accumulator by holding data to be loaded to/from the accumulator.

    Computer Status Word or Flag Register

    The result of an ALU operation may have consequences for subsequent operations; for example, changing the path of execution. Individual bits in this register are set or reset in accordance with the result of mathematical or logical operations. Each bit in the register, also called a flag, has a preassigned meaning, and the contents are monitored by the control unit to help control CPU related actions.
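How an arithmetic result sets individual flag bits can be sketched with an 8 bit addition. The three flags chosen here (zero, carry, sign) and the names `Flags` and `add8` are illustrative assumptions; a real flag register holds more bits than this.

```python
# Sketch: an 8 bit add that also reports status flags.
# The flag set and names are hypothetical, chosen for illustration.

from typing import NamedTuple, Tuple


class Flags(NamedTuple):
    zero: bool   # result is zero
    carry: bool  # result did not fit in 8 bits
    sign: bool   # most significant bit of the result is set


def add8(a: int, b: int) -> Tuple[int, Flags]:
    """Add two 8 bit values and derive the flags from the result."""
    total = a + b
    result = total & 0xFF
    flags = Flags(zero=result == 0, carry=total > 0xFF, sign=bool(result & 0x80))
    return result, flags


print(add8(200, 56))
print(add8(100, 50))
```

A later conditional branch would test one of these bits, which is how the result of one operation changes the path of execution.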

    Arithmetic and Logic Unit

    The Arithmetic and Logic Unit (ALU) performs all arithmetic and logic operations in a microprocessor, viz. addition, subtraction, logical AND, OR, EX-OR, etc. A typical ALU is connected to the accumulator, general purpose registers and other CPU components that help transfer the result of its operations to RAM via the Bus Interface Unit and the system bus. The results may also be written into internal or external caches.

    Control Unit

    The control unit coordinates and manages CPU activities, in particular the execution of instructions by the arithmetic and logic unit (ALU). In Pentium processors its role is complex, as microcode from decoded instructions is pipelined for execution by two ALUs.

    The System Clock

    The Intel 8088 had a clock speed of 4.77 MHz; that is, its internal logic gates were opened and closed under the control of a square wave pulsed signal that had a frequency of 4.77 million cycles per second. Alternatively put, the logic gates opened and closed 4.77 million times per second. Thus, instructions and data were pumped through the integrated transistor logic circuits at a rate of 4.77 million bits per second. Later designs ran at higher speeds, viz. the i286 at 8-20 MHz, the i386 at 16-33 MHz and the i486 at 25-50 MHz. Where does this clock signal come from? Each motherboard is fitted with a quartz oscillator in a metal package that generates a square wave clock pulse of a certain frequency. In i8088 systems the crystal oscillator ran at 14.318 MHz and this was fed to the i8284 to generate the system clock frequency of 4.77 MHz in earlier systems, and up to 10 MHz in later designs. Later, the i286 PCs had a 12 MHz crystal which provided the i82284 IC multiplier/divider with the primary clock signal. This then divided/multiplied the basic 12 MHz to generate the system clock signal of 8-20 MHz. With the advent of the i486DX, the system clock signal, which ran at 25 or 33 MHz, was effectively multiplied by factors of 2 and 3 to deliver an internal CPU clock speed of 50, 66, 75 or 100 MHz. This approach is used in Pentium IV architectures, where the primary crystal source delivers a relatively slow 50 MHz clock signal that is then multiplied to the system clock speed of 100-133 MHz. The internal multiplier in the Pentium then multiplies this by a factor of 20 or more to obtain speeds of 2 GHz and above.
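The multiplier and divider arithmetic in this section can be worked through as a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, and the divide-by-3 step for the 8088 system clock is an assumption not given in the text (it is simply the ratio between the quoted crystal and system clock figures).

```python
# Sketch of the clock multiplication and division described above.
# clock_mhz is a hypothetical helper, not a name from the text.


def clock_mhz(source_mhz: float, factor: float) -> float:
    """Clock frequency after scaling the source clock by a factor."""
    return source_mhz * factor


# i486DX-style internal clocks: 25 or 33 MHz system clock times 2 or 3.
internal = sorted(clock_mhz(bus, mult) for bus in (25, 33) for mult in (2, 3))
print(internal)

# Pentium IV-style chain: 100 MHz system clock times an internal factor of 20.
print(clock_mhz(100, 20))

# Assumption: the 14.318 MHz crystal is divided by 3 for the 8088 system clock.
print(round(14.318 / 3, 2))
```

The 33 MHz times 3 case gives 99 MHz, which is marketed as 100 MHz because the nominal 33 MHz bus clock is itself a rounded figure.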


    Instruction Cycle

    An instruction cycle consists of the activities required to fetch and execute an instruction. The length of time taken to fetch and execute is measured in clock cycles. In CISC processors this will take many clock cycles, depending on the complexity of the instruction and the number of memory references made to load operands. In RISC computers the number of clock cycles is reduced significantly. When the CPU finishes the execution of an instruction it transfers the content of the program or instruction counter into the Bus Interface Unit (1 clock cycle). This is then gated onto the system address bus and the read signal is asserted on the control bus (1 clock cycle). This is a signal to the RAM controller that the value at this address is to be read from memory and loaded onto the data bus (4+ clock cycles). The instruction is read in from the data bus and decoded (2+ clock cycles). The fetch and decode activities constitute the first machine cycle of the instruction cycle. The second machine cycle begins when the instruction's operand is read from RAM and ends when the instruction is executed and the result written back to memory. This will take at least another 8+ clock cycles, depending on the complexity of the instruction. Thus an instruction cycle will take at least 16 clock cycles, a considerable length of time. Together, RISC processors and fast RAM can keep this to a minimum. However, Intel made advances by super pipelining instructions, that is, by interleaving fetch, decode, operand read, execute, and retire (i.e. write the result of the instruction to RAM) activities into two separate pipelines serving two ALUs. Hence, instructions are not executed sequentially, but concurrently and in parallel (more about pipelining later).
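The minimum of 16 clock cycles quoted above is the sum of the per-step minimums in the paragraph. The sketch below adds them up and converts the total to time for a given clock rate; the step labels and function name are illustrative, and the cycle counts are the lower bounds from the text.

```python
# Sketch: summing the minimum clock cycles per step of an instruction cycle.
# Step labels are paraphrased; the counts are the lower bounds given in the text.

MIN_CYCLES = {
    "transfer counter to Bus Interface Unit": 1,
    "gate address and assert read signal": 1,
    "read memory onto data bus": 4,
    "read in and decode instruction": 2,
    "operand read, execute, write back": 8,
}

total_min_cycles = sum(MIN_CYCLES.values())


def instruction_time_ns(cycles: int, clock_mhz: float) -> float:
    """Time for a number of clock cycles at a given clock frequency."""
    return cycles * 1000 / clock_mhz


print(total_min_cycles)
print(instruction_time_ns(total_min_cycles, 100))
```

At a 100 MHz clock, 16 cycles take 160 ns, which shows why both a faster clock and fewer cycles per instruction matter.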

    5th and 6th Generation Intel CPU Architecture

    The Pentium microprocessor was the last of Intel's 5th generation microprocessors and had several basic units: the Bus Interface Unit (BIU); the I-Cache (8 KB of write-through Static RAM, or SRAM); the Instruction Translation Lookaside Buffer (TLB); the D-Cache (8 KB of write-back SRAM); the Data TLB; the Clock Driver/Multiplier; the Instruction Fetch Unit; the Branch Prediction Unit; the Instruction Decode Unit; the Complex Instruction Support Unit; the Superscalar Integer Execution Unit; and the Pipelined Floating Point Unit. Figure 5 presents a block diagram of the original Pentium.


    The Pentium was the first Intel chip to have a 64 bit external data bus which was split internally into two separate pipelines, each 32 bits wide. This allowed the Pentium to execute two instructions simultaneously; however, more than one instruction could be in the pipeline, thus increasing instruction throughput.

    Heat dissipation is the enemy of chip designers: the greater the number of integrated transistors, and the higher the speed of operation and the operating voltage, the more power is consumed and the more heat is generated. The first two Pentium versions ran at 60 and 66 MHz respectively with an operating voltage of 5 V DC. Hence they ran quite hot. However, a change in package design (from Socket 5 to 7, Pin Grid Array or PGA) and a reduction in operating voltage to 3.3 Volts lowered power consumption and heat dissipation. Intel also introduced a clock multiplier which multiplied the external clock signals and enabled the Pentium to run at 1.5, 2, 2.5 and finally 3 times this speed. Thus while the system bus ran at 50, 60, and 66 MHz, the CPU ran at 75-200 MHz.

    In 1[...] However, major design changes came with the Pentium IV. Modifications and design changes centered on (a) the physical package; (b) the process by which instructions were decoded and executed; (c) support for memory beyond the 4 GB limit; (d) the integration and enhancement of L1 and L2 cache performance and size; (e) the addition of a new cache; (f) the speed of internal and external operation. Each of these issues receives attention in the following subsections.


    Figure 5 Pentium CPU Block Diagram

    [Figure 5 labels: Bus Interface Unit; Address Bus; Control Bus; Data Bus; two caches; Branch Target Buffer; Control Unit; Prefetch Buffer; Fetch and Decode Unit; Microcode ROM; Clock Multiplier; Dual Pipeline Execution Unit (two pipelines); Registers; Floating Point Unit; Advanced Programmable Interrupt Controller]

    Physical Packaging

    Two terms are employed to describe the packaging employed for the Pentium family of processors: the first refers to the motherboard connection, and the second to the actual package itself. For example, the original Pentium P5 was fitted to the Socket 5 type connection on the motherboard using a Staggered Pin Grid Array (SPGA) for the die's I/O (die is the technical term for the physical structure that incorporates the chip). Later variants used the Socket 7 connector. The Pin Grid Array (PGA) family of packages is associated with different Socket types, which are numbered. A pin grid array is simply an array of metal pin connectors used to form an electrical connection between the internal electronics of the CPU (packaged on the die) and other system components like the system chipsets. The pins plug into corresponding receptacle pinholes in the CPU's socket on the motherboard. The different types of PGA reflect the type of packaging (e.g. ceramic or plastic), the number of pins, and how they are arrayed. The Pentium Pro used an SPGA with a staggering 387 pins for connection to the motherboard socket, called Socket 8. The Pentium Pro was the first Intel processor to have an L2 cache connected to the CPU via a backside bus, but on a separate die. This was a significant technical achievement in packaging. When Intel designed the Pentium II they decided to change the packaging significantly and introduced a Single Edge Contact Cartridge (SECC) package with three variants (SECC for the Pentium II, SECC2 for the Pentium II and SEPP for the Celeron), each of which plugged into the Slot 1 connector on the motherboard. However, later variants of the Celeron and Pentium III used PGA packaging for certain applications: the Celeron uses the Plastic PGA, the Celeron III and Pentium III the Flip-Chip Pin Grid Array (FC-PGA). Both use the 370-pin Socket. The Pentium IV saw a full return to the PGA for all chips. Here a Flip-Chip Pin Grid Array (FC-PGA) was employed in a 478 PCPGA package.

    Overall Architectural Comparison of the Pentium Family of Microprocessors

    The Pentium (P54) first shipped in 1[...]


    variants, and the Pentium III. As indicated, the physical package was also a significant advance, as was the incorporation of additional RISC features. However, aimed as it was at the server market, the Pentium Pro did not incorporate MMX technology. It was expensive to produce as it included the L2 cache on its substrate (but on a separate die) and had 5.5 million transistors at its core and over [...] million in its L2 cache. Its core logic operated at 3.3 Volts. The microprocessor was still, however, chiefly CISC in design, and optimized for 32 bit operation. The chief features of the Pentium Pro were:

    • A partly integrated L2 cache of up to 512 KB (on a specially manufactured, separate SRAM
      die) that was connected via a dedicated "backside" bus running at full CPU speed.

    • Three 12-stage pipelines

    • Speculative execution of instructions

    • Out-of-order completion of instructions

    • 40 renamed registers

    • Dynamic branch prediction

    • Multiprocessing with up to 4 Pentium Pros

    • An address bus widened to 36 bits (from 32) so that up to 64 GB of memory can be used.
      (The 4 extra address bits multiply the addressable range by 2^4 = 16, which gives
      4 GB x 16 = 64 GB of memory.)
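
    The address-bus arithmetic in the last item can be checked with a short sketch. The function
    name and the GIB constant are illustrative choices, not part of the original text.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with this many address lines."""
    return 2 ** address_bits


GIB = 2 ** 30  # one gigabyte in the binary sense used by the text

space_32 = addressable_bytes(32)   # bus width before the Pentium Pro
space_36 = addressable_bytes(36)   # Pentium Pro bus width

print(space_32 // GIB)        # gigabytes reachable with 32 address lines
print(space_36 // GIB)        # gigabytes reachable with 36 address lines
print(space_36 // space_32)   # growth factor contributed by the 4 extra lines
```

    Each extra address line doubles the reachable range, which is why four extra lines give a
    factor of sixteen.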

    The following description is taken from Intel's introduction to its microprocessor
    architecture and is relevant to all members of the P6 family, including the Celeron,
    Pentium II and III.

    The Intel Pentium Pro processor has three-way superscalar architecture. The term "three-way
    superscalar" means that using parallel processing techniques, the processor is able on
    average to decode, dispatch, and complete execution of (retire) three instructions per clock
    cycle. To handle this level of instruction throughput, the Pentium Pro processor uses a
    decoupled, 12-stage superpipeline that supports out-of-order instruction execution. It does
    this by incorporating even more parallelism than the Pentium processor. The Pentium Pro
    processor provides Dynamic Execution (micro-data flow analysis, out-of-order execution,
    superior branch prediction, and speculative execution) in a superscalar implementation.

    The centerpiece of the Pentium Pro processor architecture is an innovative out-of-order
    execution mechanism called "dynamic execution." Dynamic execution incorporates three
    data-processing concepts:

    • Deep branch prediction.

    • Dynamic data flow analysis.

    • Speculative execution.

    Branch prediction is a concept found in most mainframe and high-speed RISC microprocessor
    architectures. It allows the processor to decode instructions beyond branches to keep the
    instruction pipeline full. In the Pentium Pro processor, the instruction fetch/decode unit
    uses a highly optimized branch prediction algorithm to predict the direction of the
    instruction stream through multiple levels of branches, procedure calls, and returns.
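
    The text does not describe the Pentium Pro's actual prediction algorithm, so the sketch below
    uses a generic textbook scheme instead, a 2-bit saturating counter, purely to illustrate what
    predicting a branch direction means. The class name, the starting state and the sample
    branch history are all assumptions made for the example.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 2  # start in "weakly taken" (an arbitrary choice)

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes):
    """Fraction of branch outcomes the predictor guessed correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop-closing branch: taken nine times, then falls through, repeated three times.
loop_branch = ([True] * 9 + [False]) * 3
print(accuracy(loop_branch))
```

    With this history only the loop exit is mispredicted on each pass, which is the kind of
    regular behaviour that lets a pipeline stay full across branches.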

    Figure: Functional Block Diagram of the Pentium Pro Processor Microarchitecture


    Dynamic data flow analysis involves real-time analysis of the flow of data through the
    processor to determine data and register dependencies and to detect opportunities for
    out-of-order instruction execution. The Pentium Pro processor dispatch/execute unit can
    simultaneously monitor many instructions and execute these instructions in the order that
    optimizes the use of the processor's multiple execution units, while maintaining the
    integrity of the data being operated on. This out-of-order execution keeps the execution
    units busy even when cache misses and data dependencies among instructions occur.

    Speculative execution refers to the processor's ability to execute instructions ahead of the
    program counter but ultimately to commit the results in the order of the original instruction
    stream. To make speculative execution possible, the Pentium Pro processor microarchitecture
    decouples the dispatching and executing of instructions from the commitment of results. The
    processor's dispatch/execute unit uses data-flow analysis to execute all available
    instructions in the instruction pool and store the results in temporary registers. The
    retirement unit then linearly searches the instruction pool for completed instructions that
    no longer have data dependencies with other instructions or unresolved branch predictions.
    When completed instructions are found, the retirement unit commits the results of these
    instructions to memory and/or the Intel Architecture registers (the processor's eight
    general-purpose registers and eight floating-point unit data registers) in the order they
    were originally issued and retires the instructions from the instruction pool.

    Through deep branch prediction, dynamic data-flow analysis, and speculative execution,
    dynamic execution removes the constraint of linear instruction sequencing between the
    traditional fetch and execute phases of instruction execution. It allows instructions to be
    decoded deep into multi-level branches to keep the instruction pipeline full. It promotes
    out-of-order instruction execution to keep the processor's six instruction execution units
    running at full capacity. And finally it commits the results of executed instructions in
    original program order to maintain data integrity and program coherency.

    Three instruction decode units work in parallel to decode object code into smaller operations
    called "micro-ops" (microcode). These go into an instruction pool, and (when
    interdependencies don't prevent) can be executed out of order by the five parallel execution
    units (two integer, two FPU and one memory interface unit). The Retirement Unit retires
    completed micro-ops in their original program order, taking account of any branches.
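
    The split between out-of-order completion and in-order retirement can be illustrated with a
    toy model. This is only a sketch under simplifying assumptions: the micro-op names and
    latencies are made up, execution units are unlimited, and branches are ignored. It is not a
    model of the real Pentium Pro pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class MicroOp:
    name: str
    latency: int                               # cycles to execute (made-up values)
    deps: list = field(default_factory=list)   # names of earlier micro-ops it needs


def simulate(program):
    """Return (completion_order, retirements) for micro-ops listed in program order."""
    finish = {}
    for op in program:
        # Start once every input is available; unlimited execution units are assumed.
        start = max((finish[d] for d in op.deps), default=0)
        finish[op.name] = start + op.latency

    completion_order = sorted(finish, key=finish.get)

    # Retirement is strictly in program order: an op retires only after it and
    # every earlier op have finished executing.
    retirements, latest = [], 0
    for op in program:
        latest = max(latest, finish[op.name])
        retirements.append((op.name, latest))
    return completion_order, retirements


program = [
    MicroOp("load", 5),            # slow, e.g. waiting on a cache miss
    MicroOp("add", 1, ["load"]),   # depends on the load
    MicroOp("mul", 2),             # independent
    MicroOp("sub", 1),             # independent
]
completion_order, retirements = simulate(program)
print(completion_order)
print(retirements)
```

    The independent micro-ops finish first, yet none of them retires before the slow load ahead
    of them, which mirrors the role of the Retirement Unit described above.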

    The power of the Pentium Pro processor is further enhanced by its caches: it has the same two
    on-chip 8-KByte L1 caches as does the Pentium processor, and also has a 256-512 KByte L2
    cache that's in the same package as, and closely coupled to, the CPU, using a dedicated
    64-bit ("backside") full clock speed bus. The L1 cache is dual ported, the L2 cache supports
    up to 4 concurrent accesses, and the 64-bit external data bus is transaction-oriented,
    meaning that each access is handled as a separate request and response, with numerous
    requests allowed while awaiting a response. These parallel features for data access work with
    the parallel execution capabilities to provide a "non-blocking" architecture in which the
    processor is more fully utilized and performance is enhanced.

    Pentium Pro Modes of Operation

    The Intel Architecture supports three operating modes: protected mode, real-address mode, and
    system management mode. The operating mode determines which instructions and architectural
    features are accessible:

    • Protected mode. The native state of the processor. In this mode all instructions and
      architectural features are available, providing the highest performance and capability.
      This is the recommended mode for all new applications and operating systems. Among the
      capabilities of protected mode is the ability to directly execute "real-address mode" 8086
      software in a protected, multi-tasking environment. This feature is called virtual-8086
      mode, although it is not actually a processor mode. Virtual-8086 mode is actually a
      protected mode attribute that can be enabled for any task.


    • Real-address mode. Provides the programming environment of the Intel 8086 processor with a
      few extensions (such as the ability to switch to protected or system management mode). The
      processor is placed in real-address mode following power-up or a reset.

    • System management mode. A standard architectural feature unique to all Intel processors,
      beginning with the Intel386 SL processor. This mode provides an operating system or
      executive with a transparent mechanism for implementing platform-specific functions such as
      power management and system security. The processor enters SMM when the external SMM
      interrupt pin (SMI#) is activated or an SMI is received from the advanced programmable
      interrupt controller (APIC). In SMM, the processor switches to a separate address space
      while saving the entire context of the currently running program or task. SMM-specific code
      may then be executed transparently. Upon returning from SMM, the processor is placed back
      into its state prior to the system management interrupt.

    The basic execution environment is the same for each of these operating modes.

    Basic Pentium Execution Environment

    Any program or task running on an Intel Architecture processor is given a set of resources
    for executing instructions and for storing code, data, and state information. These resources
    (shown in the Basic Execution Environment figure) include an address space of up to 2^32
    bytes, a set of general data registers, a set of segment registers, and a set of status and
    control registers. When a program calls a procedure, a procedure stack is added to the
    execution environment. (Procedure calls and the procedure stack implementation are described
    in Chapter 4, Procedure Calls, Interrupts, and Exceptions.)

    Figure: Basic Execution Environment

    Pentium Pro Memory Organization

    The memory that the processor addresses on its bus is called physical memory. Physical memory
    is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a
    physical address. The physical address space ranges from zero to a maximum of 2^32 - 1
    (4 gigabytes). Virtually any operating system or executive designed to work with an Intel
    Architecture processor will use the processor's memory management facilities to access
    memory. These facilities provide features such as segmentation and paging, which allow memory
    to be managed efficiently and reliably. Memory management is described in detail later. The
    following paragraphs describe the basic methods of addressing memory when memory management
    is used. When employing the processor's memory management facilities, programs do not
    directly address physical memory. Instead, they access memory using any of three memory
    models: flat, segmented, or real-address mode.

    With the flat memory model (see the Three Memory Management Models figure), memory appears to
    a program as a single, continuous address space, called a linear address space. Code (a
    program's instructions), data, and the procedure stack are all contained in this address
    space. The linear address space is byte addressable, with addresses running contiguously from
    0 to 2^32 - 1. An address for any byte in the linear address space is called a linear
    address.

    With the segmented memory model, memory appears to a program as a group of independent
    address spaces called segments. When using this model, code, data, and stacks are typically
    contained in separate segments. To address a byte in a segment, a program must issue a
    logical address, which consists of a segment selector and an offset. (A logical address is
    often referred to as a far pointer.) The segment selector identifies the segment to be
    accessed and the offset identifies a byte in the address space of the segment. The programs
    running on an Intel Architecture processor can address up to 16,383 segments of different
    sizes and types, and each segment can be as large as 2^32 bytes (4 GB).

    Internally, all the segments that are defined for a system are mapped into the processor's
    linear address space. So, the processor translates each logical address into a linear address
    to access a memory location. This translation is transparent to the application program.
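
    A minimal sketch of that translation follows, assuming each segment is described only by a
    base address and a limit. The SEGMENTS table and its selector values are hypothetical; on
    real hardware the information lives in descriptor tables and a limit violation raises a
    processor exception rather than a Python error.

```python
# Hypothetical selector -> (base, limit) entries, standing in for descriptor tables.
SEGMENTS = {
    0x08: (0x00000000, 0x000FFFFF),
    0x10: (0x00400000, 0x0000FFFF),
}


def logical_to_linear(selector, offset):
    """Translate a (segment selector, offset) pair into a 32-bit linear address."""
    base, limit = SEGMENTS[selector]
    if offset > limit:
        raise ValueError("offset outside segment limit")
    return (base + offset) & 0xFFFFFFFF


print(hex(logical_to_linear(0x10, 0x1234)))
```

    The application only ever supplies the selector and the offset; the addition of the segment
    base happens behind its back, which is what "transparent" means here.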

    The primary reason for using segmented memory is to increase the reliability of programs and
    systems. For example, placing a program's stack in a separate segment prevents the stack from
    growing into the code or data space and overwriting instructions or data, respectively. And
    placing the operating system's or executive's code, data, and stack in separate segments
    protects them from the application program and vice versa.

    With either the flat or segmented model, the Intel Architecture provides facilities for
    dividing the linear address space into pages and mapping the pages into virtual memory. If an
    operating system/executive uses the Intel Architecture's paging mechanism, the existence of
    the pages is transparent to an application program.
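
    The text does not give the page size or table layout. The sketch below assumes the 4-KByte
    pages and two-level tables of standard 32-bit Intel Architecture paging, in which a linear
    address splits into a 10-bit page-directory index, a 10-bit page-table index and a 12-bit
    offset. The function name is illustrative.

```python
def split_linear_address(linear):
    """Split a 32-bit linear address into (directory index, table index, page offset)."""
    directory = (linear >> 22) & 0x3FF   # bits 31-22
    table = (linear >> 12) & 0x3FF       # bits 21-12
    offset = linear & 0xFFF              # bits 11-0: byte within the 4-KByte page
    return directory, table, offset


print(split_linear_address(0x00403025))
```

    Only the two indexes are remapped by the operating system; the offset within the page passes
    through unchanged.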

    The real-address mode model uses the memory model for the Intel 8086 processor, the first
    Intel Architecture processor. It was provided in all the subsequent Intel Architecture
    processors for compatibility with existing programs written to run on the Intel 8086
    processor. The real-address mode uses a specific implementation of segmented memory in which
    the linear address space for the program and the operating system/executive consists of an
    array of segments of up to 64K bytes in size each. The maximum size of the linear address
    space in real-address mode is 2^20 bytes.
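
    As a sketch of how 16-bit segments and 16-bit offsets cover a 2^20-byte space: in
    real-address mode the segment value is shifted left by four bits and added to the offset.
    The 20-bit mask below models the wrap-around of the original 8086 and is an assumption of
    this example, as is the function name.

```python
def real_mode_linear(segment, offset):
    """Form a 20-bit real-address-mode address from 16-bit segment and offset values."""
    return ((segment << 4) + offset) & 0xFFFFF   # wrap at 2^20 as on the 8086


print(hex(real_mode_linear(0xB800, 0x0010)))
# Different segment:offset pairs can name the same byte.
print(real_mode_linear(0x1234, 0x0005) == real_mode_linear(0x1000, 0x2345))
```

    Because segments start every 16 bytes and span 64K bytes, they overlap heavily, so a given
    byte has many valid segment:offset names.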

    Figure: Three Memory Management Models

    32-bit vs. 16-bit Address and Operand Sizes

    The processor can be configured for 32-bit or 16-bit address and operand sizes. With 32-bit
    address and operand sizes, the maximum linear address or segment offset is FFFFFFFFH
    (2^32 - 1), and operand sizes are typically 8 bits or 32 bits. With 16-bit address and
    operand sizes, the maximum linear address or segment offset is FFFFH (2^16 - 1), and operand
    sizes are typically 8 bits or 16 bits. When using 32-bit addressing, a logical address (or
    far pointer) consists of a 16-bit segment selector and a 32-bit offset; when using 16-bit
    addressing, it consists of a 16-bit segment selector and a 16-bit offset.

    Instruction prefixes allow temporary overrides of the default address and/or operand sizes
    from within a program. When operating in protected mode, the segment descriptor for the
    currently executing code segment defines the default address and operand size. (A segment
    descriptor is a system data structure not normally visible to application code.) Assembler
    directives allow the default addressing and operand size to be chosen for a program. The
    assembler and other tools then set up the segment descriptor for the code segment
    appropriately. When operating in real-address mode, the default addressing and operand size
    is 16 bits. An address-size override can be used in real-address mode to enable 32-bit
    addressing; however, the maximum allowable 32-bit address is still 0000FFFFH (2^16 - 1).

    Figure: Application Programming Registers

    REGISTERS

    The processor provides 16 registers for use in general system and application programming. As
    shown in the Application Programming Registers figure, these registers can be grouped as
    follows:

    • General-purpose data registers. These eight registers are available for storing operands
      and pointers.

    • Segment registers. These registers hold up to six segment selectors.

    • Status and control registers. These registers report and allow modification of the state of
      the processor and of the program being executed.

    General-Purpose Data Registers

    The 32-bit general-purpose data registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP are
    provided for holding the following items:

    • Operands for logical and arithmetic operations

    • Operands for address calculations

    Although all of these registers are available for general storage of operands, results, and
    pointers, caution should be used when referencing the ESP register. The ESP register holds
    the stack pointer and as a general rule should not be used for any other purpose.

    Many instructions assign specific registers to hold operands. For example, string
    instructions use the contents of the ECX, ESI, and EDI registers as operands. When using a
    segmented memory model, some instructions assume that pointers in certain registers are
    relative to specific segments. For instance, some instructions assume that a pointer in the
    EBX register points to a memory location in the DS segment.

    The following is a summary of these special uses:

    • EAX: Accumulator for operands and results data.

    • EBX: Pointer to data in the DS segment.

    • ECX: Counter for string and loop operations.

    • EDX: I/O pointer.

    • ESI: Pointer to data in the segment pointed to by the DS register; source pointer for
      string operations.

    • EDI: Pointer to data (or destination) in the segment pointed to by the ES register;
      destination pointer for string operations.

    • ESP: Stack pointer (in the SS segment).

    • EBP: Pointer to data on the stack (in the SS segment).

    As shown in the Application Programming Registers figure, the lower 16 bits of the
    general-purpose registers map directly to the register set found in the 8086 and Intel 286
    processors and can be referenced with the names AX, BX, CX, DX, BP, SP, SI, and DI. Each of
    the lower two bytes of the EAX, EBX, ECX, and EDX registers can be referenced by the names
    AH, BH, CH, and DH (high bytes) and AL, BL, CL, and DL (low bytes).
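
    The overlap between the 32-bit, 16-bit and 8-bit register names can be sketched with plain
    bit masks. The helper names below are illustrative; the point is only that AX, AH and AL are
    views onto parts of EAX rather than separate storage.

```python
def sub_registers(eax):
    """Return the (AX, AH, AL) views of a 32-bit EAX value."""
    ax = eax & 0xFFFF          # low 16 bits
    ah = (eax >> 8) & 0xFF     # bits 15-8
    al = eax & 0xFF            # bits 7-0
    return ax, ah, al


def write_al(eax, value):
    """Model a write to AL: only the lowest byte of EAX changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


eax = 0x12345678
print([hex(v) for v in sub_registers(eax)])
print(hex(write_al(eax, 0xFF)))
```

    Writing through one of the narrow names therefore alters the wide register, which matters
    when 16-bit and 32-bit code share data.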

    Segment Registers

    The segment registers (CS, DS, SS, ES, FS, and GS) hold 16-bit segment selectors. A segment
    selector is a special pointer that identifies a segment in memory. To access a particular
    segment in memory, the segment selector for that segment must be present in the appropriate
    segment register. When writing application code, you generally create segment selectors with
    assembler directives and symbols. The assembler and other tools then create the actual
    segment selector values associated with these directives and symbols. If you are writing
    system code, you may need to create segment selectors directly.

    How segment registers are used depends on the type of memory management model that the
    operating system or executive is using. When using the flat (unsegmented) memory model, the
    segment registers are loaded with segment selectors that point to overlapping segments, each
    of which begins at address 0 of the linear address space (as shown in the accompanying
    figure). These overlapping segments then comprise the linear-address space for the program.
    (Typically, two overlapping segments are defined: one for code and another for data and
    stacks. The CS segment register points to the code segment and all the other segment
    registers point to the data and stack segment.)
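
    The text does not show what a 16-bit selector contains. In Intel's published protected-mode
    format, which is assumed here, it packs a 13-bit descriptor index, a table-indicator bit and
    a 2-bit requested privilege level. The decoder below is a sketch with illustrative names.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into (index, table_indicator, rpl)."""
    index = (selector >> 3) & 0x1FFF       # bits 15-3: entry in the descriptor table
    table_indicator = (selector >> 2) & 1  # bit 2: 0 = global table, 1 = local table
    rpl = selector & 0x3                   # bits 1-0: requested privilege level
    return index, table_indicator, rpl


print(decode_selector(0x001B))

# Two tables of 2^13 entries each, minus the unusable null selector.
print(2 * 2 ** 13 - 1)
```

    The last line reproduces the 16,383-segment figure quoted in the discussion of the
    segmented memory model.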

    When using the segmented memory model, each segment register is ordinarily loaded with a
    different segment selector so that each segment register points to a different segment within
    the linear-address space (as shown in Figure 11).

    Figure 11 Use of Segment Registers in Segmented Memory Model

    Each of the segment registers is associated with one of three types of storage: code, data,
    or stack. For example, the CS register contains the segment selector for the code segment,
    where the instructions being executed are stored. The processor fetches instructions from the
    code segment, using a logical address that consists of the segment selector in the CS
    register and the contents of the EIP register. The EIP register contains the offset within
    the code segment of the next instruction to be executed. The CS register cannot be loaded
    explicitly by an application program. Instead, it is loaded implicitly by instructions or
    internal processor operations that change program control (such as procedure calls, interrupt
    handling, or task switching).

    The DS, ES, FS, and GS registers point to four data segments. The availability of four data
    segments permits efficient and secure access to different types of data structures. For
    example, four separate data segments might be created: one for the data structures of the
    current module, another for the data exported from a higher-level module, a third for a
    dynamically created data structure, and a fourth for data shared with another program. To
    access additional data segments, the application program must load segment selectors for
    these segments into the DS, ES, FS, and GS registers, as needed.

    The SS register contains the segment selector for a stack segment, where the procedure stack
    is stored for the program, task, or handler currently being executed. All stack operations
    use the SS register to find the stack segment. Unlike the CS register, the SS register can be
    loaded explicitly, which permits application programs to set up multiple stacks and switch
    among them.

    The four segment registers CS, DS, SS, and ES are the same as the segment registers found in
    the Intel 8086 and Intel 286 processors, and the FS and GS registers were introduced into the
    Intel Architecture with the Intel386 family of processors.

    EFLAGS Register

    The 32-bit EFLAGS register contains a group of status flags, a control flag, and a group of
    system flags. Figure 3-7 defines the flags within this register. Following initialization of
    the processor (either by asserting the RESET pin or the INIT pin), the state of the EFLAGS
    register is 00000002H. Bits 1, 3, 5, 15, and 22 through 31 of this register are reserved.
    Software should not use or depend on the states of any of these bits.
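
    The reserved-bit list and the reset value can be cross-checked with a small sketch. The
    reserved bit numbers and the reset state come from the paragraph above; the status-flag names
    and positions are taken from Intel's published EFLAGS layout rather than from this text, and
    the helper names are illustrative.

```python
RESERVED_BITS = [1, 3, 5, 15] + list(range(22, 32))
RESERVED_MASK = sum(1 << bit for bit in RESERVED_BITS)

# Status-flag positions (assumed from Intel's documented EFLAGS layout).
STATUS_FLAGS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}

RESET_STATE = 0x00000002


def set_status_flags(eflags):
    """Names of the status flags that are set in an EFLAGS value."""
    return [name for name, bit in STATUS_FLAGS.items() if (eflags >> bit) & 1]


print(hex(RESERVED_MASK))
print(set_status_flags(RESET_STATE))   # only reserved bit 1 is set after reset
print(set_status_flags(0x41))
```

    Masking with RESERVED_MASK before comparing two EFLAGS values is one way for software to
    avoid depending on the reserved bits.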

    Some of the flags in the EFLAGS register can be modified directly, using special-purpose
    instructions (described in the following sections). There are no instructions that allow the
    whole register to be examined or modified directly. However, the following instructions can
    be used to move groups of flags to and from the procedure stack or the EAX register: LAHF,
    SAHF, PUSHF, PUSHFD, POPF, and POPFD. After the contents of the EFLAGS register have been
    transferred to the procedure stack or EAX register, the flags can be examined and modified
    using the processor's bit manipulation instructions (BT, BTS, BTR, and BTC).

    When suspending a task (using the processor's multitasking facilities), the processor
    automatically saves the state of the EFLAGS register in the task state segment (TSS) for the
    task being suspended. When binding itself to a new task, the processor loads the EFLAGS
    register with data from the new task's TSS.

    When a call is made to an interrupt or exception handler procedure, the processor
    automatically saves the state of the EFLAGS register on the procedure stack. When an
    interrupt or exception is handled with a task switch, the state of the EFLAGS register is
    saved in the TSS for the task being suspended.

    Instruction Pointer

    The instruction pointer (EIP) register contains the offset in the current code segment for
    the next instruction to be executed. It is advanced from one instruction boundary to the next
    in straight-line code or it is moved ahead or backwards by a number of instructions when
    executing JMP, Jcc, CALL, RET, and IRET instructions.

    The EIP register cannot be accessed directly by software; it is controlled implicitly by
    control-transfer instructions (such as JMP, Jcc, CALL, and RET), interrupts, and exceptions.
    The only way to read the EIP register is to execute a CALL instruction and then read the
    value of the return instruction pointer from the procedure stack. The EIP register can be
    loaded indirectly by modifying the value of a return instruction pointer on the procedure
    stack and executing a return instruction (RET or IRET).

    All Intel Architecture processors prefetch instructions. Because of instruction prefetching,
    an instruction address read from the bus during an instruction load does not match the value
    in the EIP register. Even though different processor generations use different prefetching
    mechanisms, the function of the EIP register to direct program flow remains fully compatible
    with all software written to run on Intel Architecture processors.

    Operand-size and Address-size Attributes

    When the processor is executing in protected mode, every code segment has a default
    operand-size attribute and address-size attribute. These attributes are selected with the D
    (default size) flag in the segment descriptor for the code segment. When the D flag is set,
    the 32-bit operand-size and address-size attributes are selected; when the flag is clear, the
    16-bit size attributes are selected. When the processor is executing in real-address mode,
    virtual-8086 mode, or SMM, the default operand-size and address-size attributes are always
    16 bits.

    The operand-size attribute selects the sizes of operands that instructions operate on. When
    the 16-bit operand-size attribute is in force, operands can generally be either 8 bits or
    16 bits, and when the 32-bit operand-size attribute is in force, operands can generally be
    8 bits or 32 bits. The address-size attribute selects the sizes of addresses used to address
    memory: 16 bits or 32 bits. When the 16-bit address-size attribute is in force, segment
    offsets and displacements are 16 bits. This restriction limits the size of a segment that can
    be addressed to 64 KBytes. When the 32-bit address-size attribute is in force, segment
    offsets and displacements are 32 bits, allowing segments of up to 4 GBytes to be addressed.
    The default operand-size attribute and/or address-size attribute can be overridden for a
    particular instruction by adding an operand-size and/or address-size prefix to an
    instruction. The effect of this prefix applies only to the instruction it is attached to.
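
    The interaction between the default attribute and a size prefix can be sketched as a simple
    toggle. The function names are illustrative. The prefix byte values mentioned in the comment
    (66H for operand size, 67H for address size) come from Intel's instruction encoding rather
    than from this text.

```python
def effective_size(default_bits, prefix_present):
    """Size in force for one instruction, given the segment default and a size prefix.

    A size prefix (66H for operands, 67H for addresses in Intel's encoding)
    selects the non-default size for that single instruction only.
    """
    if default_bits not in (16, 32):
        raise ValueError("default size must be 16 or 32 bits")
    return 48 - default_bits if prefix_present else default_bits


def max_segment_bytes(address_bits):
    """Largest segment reachable with offsets of the given width."""
    return 2 ** address_bits


print(effective_size(32, prefix_present=True))    # 32-bit segment, prefixed instruction
print(effective_size(16, prefix_present=True))    # 16-bit segment, prefixed instruction
print(max_segment_bytes(16), max_segment_bytes(32))
```

    The same prefix therefore means "use 16 bits" in a 32-bit code segment and "use 32 bits" in
    a 16-bit one.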

    Pentium II

    The Pentium II incorporates many of the salient features of the Pentium Pro and Pentium MMX;
    however, its physical package was based on the SECC/Slot 1 interface and its 512 KB L2 cache
    ran at only half the processor's internal clock rate. First-generation Pentium II Klamath
    CPUs operated at 233, 266, 300 and 333 MHz with an FSB of 66 MHz and a core voltage of
    2.8 Volts. In 1998, second-generation Pentium II (Deschutes) CPUs followed, with a 100 MHz
    FSB on the faster models and 2.0 Volts at the core. The major improvements of the Pentium II
    were:

    • 16 KB L1 instruction and data caches

    • L2 cache built from non-proprietary, commercially available SRAM

    • Improved 16-bit capability through segment register caches

    • MMX unit.

    The standard Pentium II could only be used in dual-processor configurations; however, Pentium
    Xeon CPUs had up to 2 MB of L2 cache and could be used in multiprocessor configurations of up
    to 4 processors.


    Celeron

    The Celeron began as a scaled-down version of the Pentium II and was designed to compete
    against similar offerings from Intel's competitors. The Klamath-based Covington core ran at
    266 and 300 MHz and was constructed without an L2 cache. However, adverse market reaction saw
    the Deschutes-based Mendocino core introduced with a 128 KB L2 cache; it ran at 300, 333,
    400, 433, 466, 500 and 533 MHz. Celerons have the same L1 cache as their bigger brothers,
    the Pentium II and III. The important distinction is that the L2 cache operates at the full
    CPU clock rate, unlike the Pentium II and the SECC-packaged Pentium III. (Later variants of
    the Pentium III had an on-die L2 cache which ran at the full CPU clock rate.) The Celeron III
    (Coppermine128 core) has the same internal features as the Pentium III, but has reduced
    functionality: a 66 MHz bus clock rate, no error correction codes for the data bus or parity
    creation for the address bus, and a maximum of 4 GB of address space. Celeron III
    Coppermine128s with a 1.6 V core and a 100 MHz FSB were produced in 2001 and operated at core
    speeds of up to 1.1 GHz. Tualatin-core Celerons were put on the market in late 2001 and ran
    at 1.2 GHz. 2002 saw the final versions produced, running at 1.3 and 1.4 GHz.

    Pentium III

    The only significant difference between the Pentium III and its predecessor was the inclusion
    of 72 new MMX instructions, known as the Internet Streaming Single Instruction Multiple Data
    Extensions (ISSE), which include integer and floating point operations. However, like the
    original MMX instructions, application programmers must include the corresponding extensions
    if any use is to be made of these instructions. The most controversial and short-lived
    addition was the CPU ID number, which could be used for software licensing and e-commerce.
    After protest from various sources, Intel disabled it by default, but did not remove it.
    Depending on the BIOS and motherboard manufacturer, it may remain disabled, but it can be
    enabled via the BIOS. In reality, Pentium III performance was based.

    The three variants of the Pentium III were the Katmai, Coppermine, and Tualatin. Katmai first
    introduced the ISSE (MMX/2) as described, with an FSB of 100 MHz. The Coppermine also
    introduced the Advanced Transfer Cache (ATC) for the L2 cache, which reduced cache capacity
    to 256 KB but saw the cache run at full processor speed. Also, the 64-bit Katmai cache bus
    was quadrupled to 256 bits. Coppermine also uses an 8-way set associative cache, rather than
    the 4-way set associative cache in the Katmai and older Pentiums. Bringing the cache on-die
    also increased the transistor count to 30 million, from the 10 million on the Katmai. Another
    advance in the Coppermine was Advanced System Buffering (ASB), which simply increased the
    number of buffers to account for the increased FSB speed of 133 MHz. The Pentium III Tualatin
    had a reduced die size that allowed it to run at higher speeds. Tualatins use a 133 MHz FSB
    and have ATC and ASB.

    Pentium IV: The Next Generation

    The release of the Pentium IV in 2000 heralded the seventh generation of Intel
    microprocessors. The release was premature, however, because the Pentium III Coppermine, with
    its 1 GHz performance threshold, was being outperformed by the AMD Athlon from Intel's major
    competitor in the microprocessor market. Intel was not ready to answer the competition
    through the early release of the next member of its Pentium III family, the Pentium III
    Tualatin, which was designed to break the 1 GHz barrier. (Previous attempts to do so with the
    Pentium III Coppermine 1.13 GHz met with failure due to design flaws.) Paradoxically,
    however, Intel was in a position to release the first of the Pentium IV family, the
    Willamette, which ran at 1.3, 1.4 and 1.5 GHz, using an FC-PGA package on the short-lived
    Socket 423, which was a design dead end for motherboard manufacturers and consumers. Worse
    still, the only Intel chipset available for the Pentium IV could only house the highly
    expensive Rambus DRAM. In addition, the early versions of the Pentium IV CPU were
    outperformed by slower AMD Athlons. Nevertheless, the core capability of Intel's seventh
    generation processors is that they can run at ever-higher speeds. For example, Intel's sixth
    generation Pentiums began at 120 MHz with the Pentium Pro and ended at over 1.2 GHz, a
    tenfold increase. The bottom line here is that Intel's seventh generation chips could end up
    running at speeds of 10 GHz or more. How has Intel achieved this? Through a radical redesign
    of the Pentium's core architecture. The following sections illustrate the major advances.

    The most visible feature of the new Pentium IV is the Front Side Bus (FSB), which
    initially operated at an effective speed of 400 MHz, compared with 100 MHz on the early
    Pentium III. At 133 MHz, the Pentium III's 64-bit data bus delivered a data throughput
    of 1.066 GB/s (8 bytes x 133 MHz = 1.066 GB/s). The Pentium IV FSB is also 64 bits wide;
    however, its 100 MHz bus clock is "quad-pumped", giving an effective bus speed of
    400 MHz and a data transfer rate of 3.2 GB/s. The newer (as of late 2002) Pentium IV
    processors and chipsets operate at 133 MHz and deliver an effective bus speed of 533 MHz
    and a data transfer rate of 4.2 GB/s. Thus, the Pentium IV exchanges data with the i845
    and i850 chipsets faster than any other processor, removing the Pentium III's most
    significant bottleneck. Intel's 850 chipset for the Pentium IV uses two Rambus channels
    to 2-4 RDRAM RIMMs. Together, these two RDRAM channels are able to deliver the same data
    bandwidth as the Pentium IV FSB. As the later discussion on DRAM indicates, similar
    transfer rates are delivered using the i845 chipset and DDR DRAM. This arrangement
    enables Pentium IV systems to have the highest data transfer rates between processor,
    system and main memory, which is a clear benefit.
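    The bus figures quoted above all follow from the same arithmetic: bus width in bytes,
    multiplied by the bus clock, multiplied by the number of transfers per clock. The
    following Python sketch reproduces them. The function name is purely illustrative, and
    the RDRAM parameters (16-bit channel, 400 MHz clock, two transfers per clock, as in
    PC800 modules) are an assumption that does not come from the text above.

```python
def bus_bandwidth_gb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    bytes_per_transfer = width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Pentium III: 64-bit bus, 133 MHz, one transfer per clock.
p3_fsb = bus_bandwidth_gb_s(64, 133.33)

# Pentium IV: 64-bit bus, quad-pumped (four transfers per clock).
p4_fsb_400 = bus_bandwidth_gb_s(64, 100, transfers_per_clock=4)
p4_fsb_533 = bus_bandwidth_gb_s(64, 133.33, transfers_per_clock=4)

# Assumed PC800 RDRAM: two 16-bit channels, 400 MHz, two transfers per clock.
rdram_dual = 2 * bus_bandwidth_gb_s(16, 400, transfers_per_clock=2)

for label, value in [
    ("Pentium III FSB", p3_fsb),
    ("Pentium IV FSB (400 MHz effective)", p4_fsb_400),
    ("Pentium IV FSB (533 MHz effective)", p4_fsb_533),
    ("Dual-channel RDRAM (assumed PC800)", rdram_dual),
]:
    print(f"{label}: {value:.2f} GB/s")
```

    Under these assumptions the dual-channel RDRAM figure matches the 400 MHz FSB figure,
    which is the balance between bus and memory that the paragraph describes.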

    Advanced Transfer Cache

    The first major improvement is the integration of the L2 cache and the evolution of the
    Advanced Transfer Cache introduced in the Pentium III Coppermine, which had just 256 KB
    of L2 cache. The first Pentium IV, the Willamette, had a similar sized cache, but could
    transfer data at 48 GB per second at a CPU clock speed of 1.5 GHz into the CPU's core
    logic. In comparison, the Coppermine could only transfer 16 GB/s at 1 GHz to its L1
    Instruction Cache. Note also that the Front Side Bus speed of the Pentium III was
    133 MHz, while the Pentium IV Willamette had an FSB speed of 400 MHz.

    In addition, the Pentium IV L2 cache has 128-byte cache lines, which are divided into
    two 64-byte segments. When the Pentium IV fetches data from RAM, it does so in 64-byte
    burst transfers. If just four bytes (32 bits) are required, this block transfer becomes
    inefficient. To compensate, the cache has advanced Data Prefetch Logic that predicts the
    data required by the cache and loads it into the L2 cache in advance. The Pentium IV's
    hardware prefetch logic significantly accelerates the execution of processes that
    operate on large data arrays.

    The read latency (the time it takes the cache to transfer data into the pipeline) of the
    Pentium IV's L2 cache is 7 clock pulses. However, its connection to the core logic (the
    Translation Lookaside Buffer in this case, as there is no I-Cache in the Pentium IV) is
    256 bits wide and clocked at the full processor speed. The second member of the
    Pentium IV family was the Northwood, which had a 512 KB L2 cache running at the
    processor's clock speed.
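    Two of the figures in this section can be checked with a few lines of Python: the
    48 GB/s and 16 GB/s cache transfer rates, and the way an address falls into a 128-byte
    line made of two 64-byte segments. This is only a sketch. The function names are
    illustrative, and the assumption that the Coppermine moves data over a 256-bit path on
    every second clock is not stated in the text; it is used here because it reproduces
    the quoted 16 GB/s.

```python
LINE_BYTES = 128     # L2 cache line size quoted for the Pentium IV
SEGMENT_BYTES = 64   # each line is split into two 64-byte segments


def cache_bandwidth_gb_s(path_bits, core_ghz, clocks_per_transfer=1):
    """Peak cache-to-core bandwidth in GB/s for a path of the given width."""
    return (path_bits / 8) * core_ghz / clocks_per_transfer


def locate(address):
    """Return (line base address, segment index, offset within segment)."""
    within_line = address % LINE_BYTES
    line_base = address - within_line
    return line_base, within_line // SEGMENT_BYTES, within_line % SEGMENT_BYTES


# Willamette: 256-bit path, one transfer per clock at 1.5 GHz.
willamette = cache_bandwidth_gb_s(256, 1.5)
# Coppermine: assumed 256-bit path, one transfer every second clock at 1 GHz.
coppermine = cache_bandwidth_gb_s(256, 1.0, clocks_per_transfer=2)

print(willamette, coppermine)
print(locate(0x1234), locate(0x12C4))
```

    The second address lands in the upper segment of its line, so a request for a few bytes
    there still pulls in a full 64-byte burst, which is the inefficiency the prefetch logic
    is meant to hide.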


    L1 Data Cache

    The second major development in cache technology is that the Pentium IV has only one
    8 KB data cache. In place of the L1 instruction cache (I-Cache) in the 6th generation
    Pentiums, it has a much more efficient Execution Trace Cache.

    Intel reduced the size of its L1 data cache to enable a very low latency of only 2 clock
    cycles. This results in an overall read latency (the time it takes to read data from
    cache memory) of less than half that of the Pentium III's L1 data cache.
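    Latencies quoted in clock cycles only become comparable across processors once they are
    converted to time. The short Python sketch below does this for the 2-cycle L1 and
    7-cycle L2 figures given in the text, assuming a 1.5 GHz Willamette clock; the function
    name is illustrative.

```python
def latency_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_ghz


l1_ns = latency_ns(2, 1.5)  # L1 data cache, 2 cycles
l2_ns = latency_ns(7, 1.5)  # L2 cache, 7 cycles

print(f"L1: {l1_ns:.2f} ns, L2: {l2_ns:.2f} ns")
```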

    7th Generation NetBurst Micro-Architecture

    Intel's NetBurst Micro-Architecture provides a firm foundation for future advances in
    processor performance, particularly where speed of operation is concerned. The NetBurst
    micro-architecture has four major components: Hyper Pipelined Technology, Rapid
    Execution Engine, Execution Trace Cache and a 400 MHz system bus. Also incorporated
    are four significant improvements over sixth generation architecture: Advanced Dynamic
    Execution, Advanced Transfer Cache, Enhanced Floating Point & Multimedia Unit, and
    Streaming SIMD Extensions 2.

    Hyper Pipelined Technology

    The traditional approach to increasing a CPU's clock speed was to make smaller
    processors by shrinking the die. An alternative strategy, evident in RISC processors, is
    to make the CPU more efficient: do less per clock cycle and have more cycles. To do this
    in a CISC-based processor, Intel simply increased the number of stages in the
    processor's pipeline. The upshot of this is that less is accomplished per clock cycle.
    This is akin to a "bucket brigade" passing smaller buckets rapidly down a chain, rather
    than larger buckets at a slower rate. For example, the U and V integer pipelines in the
    original Pentium each had just five stages: instruction fetch, decode 1, decode 2,
    execute and write-back. The Pentium Pro introduced the P6 architecture with a pipeline
    consisting of 10 stages. The P7 NetBurst micro-architecture in the Pentium IV increased
    the number of stages to 20. Intel terms this its Hyper Pipelined Technology.
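    The bucket-brigade idea can be put into numbers with a toy model, sketched here in
    Python. It assumes a fixed amount of logic work per instruction that is divided evenly
    among the pipeline stages, plus a small fixed overhead per stage. The 10 ns and 0.1 ns
    values are hypothetical and chosen only for illustration; they are not measurements of
    any Intel processor.

```python
def clock_ghz(total_logic_ns, stages, stage_overhead_ns):
    """Clock rate when the per-instruction work is split evenly across stages."""
    cycle_ns = total_logic_ns / stages + stage_overhead_ns
    return 1.0 / cycle_ns


def cycles_to_run(instructions, stages):
    """Cycles for an ideal pipeline: fill it once, then retire one per cycle."""
    return stages + instructions - 1


# Hypothetical figures: 10 ns of logic per instruction, 0.1 ns overhead per stage.
for stages in (5, 10, 20):
    ghz = clock_ghz(10.0, stages, 0.1)
    cycles = cycles_to_run(1000, stages)
    print(f"{stages:2d} stages: {ghz:.2f} GHz, {cycles} cycles, "
          f"{cycles / ghz:.0f} ns for 1000 instructions")
```

    In this model each stage carries a smaller bucket, so the clock can run faster, while
    the only extra cost for an uninterrupted instruction stream is the longer time to fill
    the pipeline. The model deliberately ignores stalls and pipeline flushes.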

    Enhanced Branch Prediction

    The key to pipeline efficiency and operation is effective branch prediction, hence the
    much improved branch prediction logic in the Pentium IV's Advanced Dynamic


    point operations, which are not prone to the same type of branch prediction
    inefficiencies as integer-based instructions.

    Streaming SIMD Extensions 2

    SSE2 is the follow-up to Intel's Streaming SIMD (Single Instruction Multiple Data)
    Extensions (SSE). SIMD is a technology that allows a single instruction to be applied to
    multiple datasets at the same time. This is especially useful when processing 3D
    graphics. SIMD-FP (Floating Point) extensions help speed up graphics processing by
    taking the multiplication, addition and reciprocal functions and applying them to
    multiple datasets simultaneously. Recall that SIMD first appeared with the Pentium MMX,
    which incorporated 57 MMX instructions. These are essentially SIMD-Int (integer)
    instructions. Intel first introduced SIMD-FP extensions in the Pentium III with 70
    Streaming SIMD Extensions (SSE). Intel introduced 144 new instructions in the Pentium IV
    that enable it to handle two 64-bit SIMD-Int operations and two double precision 64-bit
    SIMD-FP operations. This is in contrast to the two 32-bit operations the Pentium MMX and
    III (under SSE) handle. The major benefit of SSE2 is greater performance, particularly
    with SIMD-FP instructions, as it increases the processor's ability to handle greater
    precision floating point calculations. As with MMX and SSE, these instructions require
    software support.
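    The idea of one instruction operating on several data items can be imitated in Python
    with the standard struct module. The sketch below treats a 16-byte value as a register
    and adds its contents lane by lane, either as two 64-bit doubles or as four 32-bit
    floats. The 128-bit register width is background knowledge about SSE and SSE2 rather
    than something stated in the text, and the function names are illustrative. This is an
    emulation for explanation only; Python runs the lanes one after another, whereas the
    hardware processes them together.

```python
import struct


def packed_add(reg_a, reg_b, layout):
    """Add two 16-byte 'registers' lane by lane using the given struct layout."""
    lanes_a = struct.unpack(layout, reg_a)
    lanes_b = struct.unpack(layout, reg_b)
    return struct.pack(layout, *(x + y for x, y in zip(lanes_a, lanes_b)))


TWO_DOUBLES = "<2d"  # two 64-bit double precision lanes
FOUR_FLOATS = "<4f"  # four 32-bit single precision lanes

a = struct.pack(TWO_DOUBLES, 1.5, 2.5)
b = struct.pack(TWO_DOUBLES, 2.5, 3.5)
double_sum = struct.unpack(TWO_DOUBLES, packed_add(a, b, TWO_DOUBLES))

c = struct.pack(FOUR_FLOATS, 1.0, 2.0, 3.0, 4.0)
d = struct.pack(FOUR_FLOATS, 0.5, 0.5, 0.5, 0.5)
float_sum = struct.unpack(FOUR_FLOATS, packed_add(c, d, FOUR_FLOATS))

print(len(a), double_sum, float_sum)
```

    The same 16 bytes hold either two high precision values or four lower precision ones,
    which is the trade-off between precision and the number of values handled per
    instruction.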

    Celeron IV

    The Celeron IV first appeared in 2002. These processors were based on the Pentium IV and
    could be accommodated on Socket 478 motherboards. The first model was based on the
    Willamette; its L2 cache was halved to 128 KB and it ran at 1.7 GHz. Later models ran at
    1.8, 1.9 and 2 GHz. The next member was based on the Northwood and had 256 KB of L2
    cache. Paired with the i845 chipset, the new Celerons are now good value entry level
    processors.

    Additional Resources

    The following diagrams of the Pentium III, IV and AMD Athlon CPUs are provided to
    highlight the architectural features of these microprocessors and enhance the foregoing
    text. The following figures have been obtained from Tom's Hardware Guide (NOT this
    Tom); further insights into the Intel architectures may be found at:

    [email protected]$###11$#7index.html/.
