computer architecture foundations for graduate level students

Computer Architecture Foundations for Graduate Level Students

Author: anna-strickland

Post on 19-Jan-2016






  • Computer Architecture Foundations for Graduate Level Students

  • Basic Paradigm [diagram: hard disk (HD), main memory (MM), CPU, cache]

  • Transfers of data [diagram: HD, MM, CPU, cache]

  • CPU needs particular data [diagram: HD, MM, CPU, cache]

  • If required data is found in cache [diagram: HD, MM, CPU, cache]

  • When required data is not in cache [diagram: HD, MM, CPU, cache]

  • CPU ultimately gets data from cache [diagram: HD, MM, CPU, cache]

  • If data is in neither cache nor MM [diagram: HD, MM, CPU, cache]

  • From HD to MM to cache [diagram: HD, MM, CPU, cache]

  • If MM is full [diagram: HD, MM, CPU, cache]

  • If cache is full [diagram: HD, MM, CPU, cache]

  • Swapping between MM and cache [diagram: HD, MM, CPU, cache]

  • Access Time
    If every memory reference to the cache required the transfer of one word between MM and cache, no increase in speed would be achieved. In fact, speed would drop because, apart from the MM access, there is an additional access to the cache.
    Suppose a reference is repeated n times and, after the first reference, the location is always found in the cache.

  • Cache Hit Ratio
    The probability that a word will be found in the cache.
    Depends upon the program and on the size and organization of the cache.

    h = (number of times the required word is found in the cache) / (total number of references)
    h: hit ratio

  • Access Time
    ta = average access time
    tc = cache access time
    (1-h) = miss ratio
    tm = memory access time
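
The symbols on the Access Time slide are commonly combined as ta = tc + (1 - h) * tm. That formula does not survive in the slide text itself, so the sketch below assumes it, in line with the earlier remark that a miss costs an MM access on top of the cache access. The function name and the sample timings are illustrative only.

```python
def average_access_time(h, tc, tm):
    """Average access time, assuming every reference costs tc and a miss adds tm."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit ratio h must be between 0 and 1")
    return tc + (1.0 - h) * tm


# Hypothetical timings: 10 ns cache access, 100 ns main memory access.
for h in (0.0, 0.5, 1.0):
    print(f"h = {h:.1f}: ta = {average_access_time(h, 10, 100):.1f} ns")
```

Under the same assumption, a reference repeated n times with only the first one missing has h = (n - 1)/n, which gives ta = tc + tm/n.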

  • Fetch Mechanisms
    Demand fetch: fetch a block from memory when it is needed and is not in the cache.
    Prefetch: fetch one or more blocks from memory before they are requested.
    Selective fetch: blocks are not always fetched; fetching depends on some defined criterion, and blocks that are not fetched stay in MM rather than in the cache.

  • Data in cache should be replaced with data from MM
    Blocks (groups of memory addresses) are transferred from MM to cache.
    Cache has a limited capacity (page frames).
    [diagram: MM, cache]

  • Replacement Algorithms
    When the word requested by the CPU is not in the cache, it needs to be transferred from MM (or, one level down, from secondary memory to MM).
    A page fault occurs when a page or a block is not in the cache (or not in MM, in the case of secondary memory).
    Replacement algorithms determine which page/block to remove or overwrite.

  • Characteristics
    Usage based or non-usage based
    Usage based: the choice of page/block to replace depends on how many times each page/block has been referenced.
    Non-usage based: some other criterion is used for replacement.

  • Assumptions
    For a given page size, we only need to consider the page/block number.
    If we have a reference (hit) to a page p, then any immediately succeeding references to p do not cause a page fault.
    The size of memory/cache is represented as the number of pages it is capable of holding (page frames).

  • Example
    Consider the following sequence of addresses:
    0110 0432 0101 0612 0102 0103 0104 0101 0611 0102 0103 0302

    which, at 100 bytes per page, can be reduced to the following access string:
    1 4 1 6 1 6 1 3
    This sequence of page requests is called a reference string.
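
The reduction in this example can be sketched in a few lines. The sketch assumes the run of digits splits into four-digit decimal addresses (0110, 0432, 0101, and so on), which is the grouping that reproduces the reference string on the slide. The function name is illustrative.

```python
def reference_string(addresses, page_size=100):
    """Map addresses to page numbers and drop immediately repeated pages."""
    pages = []
    for address in addresses:
        page = address // page_size
        # An immediately repeated page cannot fault, so it is not recorded again.
        if not pages or pages[-1] != page:
            pages.append(page)
    return pages


# Addresses from the example, read as decimal numbers (assumed grouping).
addresses = [110, 432, 101, 612, 102, 103, 104, 101, 611, 102, 103, 302]
print(reference_string(addresses))  # [1, 4, 1, 6, 1, 6, 1, 3]
```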

  • Replacement Policies
    Random replacement algorithm
    First-in first-out replacement
    Optimal algorithm
    Least recently used algorithm
    Least frequently used
    Most frequently used

  • Random Replacement
    A page is chosen randomly at page fault time.
    There is no relationship between the pages or their use.
    The choice is made by a random number generator.

  • FIFO
    Memory is treated as a queue.
    When a page comes in, it is inserted at the tail.
    When a page is removed, the entry at the head of the queue gets deleted.
    Easy to understand and program.
    Performance is not consistently good; it depends on the reference string.

  • FIFO Example
    Consider the following reference string:
    7 0 1 2 0 3 0 4 2

    With 3 page frames



    An * indicates a miss (the page requested by the CPU is not in the cache or in MM)

  • FIFO Example #2
    Consider the following reference string:
    1 2 3 4 1 2 5 1 2 3 4 5

    With 3 page frames (* marks a page fault; the newest page is in the top row):

    Reference:  1  2  3  4  1  2  5  1  2  3  4  5
    Fault:      *  *  *  *  *  *  *        *  *
                1  2  3  4  1  2  5  5  5  3  4  4
                   1  2  3  4  1  2  2  2  5  3  3
                      1  2  3  4  1  1  1  2  5  5

    We have 9 page faults.
    Try performing this FIFO with 4 page frames.

  • Belady's Anomaly
    An increase in the number of page frames does not necessarily mean a decrease in page faults.
    More formally, Belady's anomaly reflects the fact that, for some page-replacement algorithms, the page fault rate may increase as the number of allocated frames increases.
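
A minimal FIFO simulation makes the anomaly concrete. This is a sketch rather than a reference implementation: the function name is illustrative, and the queue follows the FIFO slide (pages enter at the tail and are evicted from the head).

```python
from collections import deque


def fifo_faults(reference_string, frames):
    """Count page faults under FIFO replacement with the given number of frames."""
    queue = deque()      # oldest page at the left (head), newest at the right (tail)
    resident = set()     # pages currently held in a frame
    faults = 0
    for page in reference_string:
        if page in resident:
            continue     # hit: FIFO does not reorder the queue on a hit
        faults += 1
        if len(queue) == frames:
            resident.remove(queue.popleft())  # evict the page at the head
        queue.append(page)
        resident.add(page)
    return faults


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9
print(fifo_faults(refs, 4))  # 10
```

For the reference string of FIFO Example #2, adding a fourth frame raises the fault count from 9 to 10, which is the behaviour Belady's anomaly describes.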

  • Optimal Algorithm
    The page that will not be used for the longest period of time is replaced.
    Guarantees the lowest page fault rate for a fixed number of frames.
    Difficult to implement because it requires future knowledge of the reference string.

  • Optimal Algorithm Example
    Consider the following reference string:
    7 0 1 2 0 3 0 4 2
    With 3 page frames

    We look ahead and see that 7 is a page that will not be used again, so we replace 7 first. After the first hit, we should not replace 0 but rather 1, because 1 will not be referenced any more (2 will be referenced last).
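
The look-ahead described above can be sketched as follows. The function name is illustrative. When several resident pages are never referenced again, this sketch evicts the one loaded earliest, which is a tie-breaking assumption rather than something the slides specify.

```python
def optimal_faults(reference_string, frames):
    """Count page faults when evicting the page whose next use is farthest away."""
    resident = []   # pages currently in frames, in load order
    faults = 0
    for i, page in enumerate(reference_string):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = reference_string[i + 1:]

            def next_use(p):
                # Pages never referenced again count as infinitely far away.
                return future.index(p) if p in future else float("inf")

            resident.remove(max(resident, key=next_use))
        resident.append(page)
    return faults


print(optimal_faults([7, 0, 1, 2, 0, 3, 0, 4, 2], 3))  # 6
```

On the example string the sketch evicts 7 first and then 1, matching the reasoning on the slide.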


  • Least Recently Used
    Approximates the optimal algorithm.
    Replaces the page that has not been used for the longest period of time.
    When all page frames have been used up, every time there is a page hit the referenced page is placed at the tail to indicate that it has been recently accessed.

  • LRU Example
    Consider the following reference string:
    7 0 1 2 0 3 0 4 0 3 0 2

    With 3 page frames (* marks a page fault; the most recently used page is in the top row):

    Reference:  7  0  1  2  0  3  0  4  0  3  0  2
    Fault:      *  *  *  *     *     *           *
                7  0  1  2  0  3  0  4  0  3  0  2
                   7  0  1  2  0  3  0  4  0  3  0
                      7  0  1  2  2  3  3  4  4  3

    We have 7 page faults.
    Try performing this LRU with 4 page frames.
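
The LRU bookkeeping can be sketched with an ordered dictionary that keeps the least recently used page at the front. This is one possible implementation under that assumption, not the only one; the function name is illustrative.

```python
from collections import OrderedDict


def lru_faults(reference_string, frames):
    """Count page faults under LRU replacement."""
    recency = OrderedDict()  # least recently used first, most recently used last
    faults = 0
    for page in reference_string:
        if page in recency:
            recency.move_to_end(page)        # hit: mark as most recently used
            continue
        faults += 1
        if len(recency) == frames:
            recency.popitem(last=False)      # evict the least recently used page
        recency[page] = True
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 0, 3, 0, 2]
print(lru_faults(refs, 3))  # 7
print(lru_faults(refs, 4))  # 6
```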

  • Least Frequently Used
    Counts the number of references made to each page; when a page is accessed, its counter is incremented by one.
    The page with the smallest count is replaced.
    FIFO is used to resolve a tie.
    Rationale: a page with a bigger counter is an actively used page.
    Problem: a page that is actively used at first may never be used again, yet it keeps its high count.
    Solved by using a decaying counter.

  • LFU Example
    Consider the following reference string:
    7 0 1 2 0 3 0 4 0 3 0 2

    With 3 page frames (* marks a page fault; each entry is page:count):

    Reference:  7    0    1    2    0    3    0    4    0    3    0    2
    Fault:      *    *    *    *         *         *                   *
                7:1  0:1  1:1  2:1  2:1  3:1  3:1  4:1  4:1  4:1  4:1  2:1
                     7:1  0:1  1:1  1:1  2:1  2:1  3:1  3:1  3:2  3:2  3:2
                          7:1  0:1  0:2  0:2  0:3  0:3  0:4  0:4  0:5  0:5

    We have 7 page faults.
    Try performing this LFU with 4 page frames.
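
A small simulation of LFU with FIFO tie-breaking is sketched below. It assumes that a page's counter starts again at 1 when the page is reloaded after eviction, which is what the example appears to show for page 2. The function and variable names are illustrative, and the decaying counter mentioned on the LFU slide is not modelled.

```python
def lfu_faults(reference_string, frames):
    """Count page faults under LFU, breaking ties by evicting the earliest-loaded page."""
    count = {}      # page -> references since it was last loaded
    loaded_at = {}  # page -> position in the reference string where it was loaded
    faults = 0
    for t, page in enumerate(reference_string):
        if page in count:
            count[page] += 1
            continue
        faults += 1
        if len(count) == frames:
            # Smallest count first; among equal counts, the earliest load (FIFO).
            victim = min(count, key=lambda p: (count[p], loaded_at[p]))
            del count[victim]
            del loaded_at[victim]
        count[page] = 1
        loaded_at[page] = t
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 0, 3, 0, 2]
print(lfu_faults(refs, 3))  # 7
```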

  • Most Frequently Used
    Opposite of LFU: replace the page with the highest count.
    A tie is resolved using FIFO.
    Based on the argument that the page with the smallest count has probably just been brought in and is yet to be used.
    Both LFU and MFU are uncommon, and their implementation is expensive.

  • The Central Processing Unit
    The operating hub and heart of every computer system
    Composed of:
      Control Unit
      Datapath
    Each component inside the CPU has a specific role in executing a command.
    Communicates with other components of the system

  • Control Unit (CU)
    Regulates all activities inside the machine
    Serves as the nerve center that sends control signals to other units and senses their status
    Connected to all components in the CPU as well as main memory

  • How The CU Is Connected
    [diagram: Control Unit, Registers, ALU, CPU, Main Memory]

  • Inside the CPU: The Datapath

  • Registers
    Components used for data storage (can be read from or written to)
    High-speed memory locations used to store important information during CPU operations
    Two types:
      Special
      General-purpose

  • Special Registers
    Registers used for specific purposes
    Used heavily during execution of CPU instructions

  • General Purpose Registers
    CPU registers used as a scratch pad during execution of machine-level instructions
    Their number varies between processors

  • Arithmetic Logic Unit (ALU)
    Performs all mathematical and logical operations within the CPU
    Operands not in the CPU have to be retrieved from main memory

  • CPU-Memory Coordination
    Bus: a group of wires that connects separate components
    Types of bus:
      Control bus (control signals)
      Address bus (address information)
      Data bus (instructions/data)

  • CPU-Memory Coordination
    The different buses facilitate communication between the CPU and main memory.
    Actions of the two components are highly synchronized to ensure efficient and timely execution of instructions.

  • CPU Operations
    Instructions do not reside in the CPU; they have to be fetched from memory.
    Each machine-level instruction is broken down into a logical sequence of smaller steps.

  • CPU Operations
    Instructions are carried out by performing one or more of the following functions in some pre-specified sequence:
      Retrieving data from main memory
      Putting data to main memory
      Register data transfer
      ALU operation

  • How An Instruction is Processed
    The instruction is retrieved from memory.
    The CPU analyzes what the instruction is and how to execute it.
    Operands/parameters (if any) are fetched from main memory.
    The instruction is executed.
    Results are stored (CPU or MM).
    Prepare for the next instruction.

  • Instruction Processing Example
    Fetch the instruction from memory.
    Decode it (it turns out to be an ADD).
    Get the two numbers to add from MM.
    Perform the addition.
    Where will the result be stored?
    Prepare for the next instruction.
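
The processing steps can be illustrated with a toy machine. Everything in this sketch is hypothetical: the opcode names, the single accumulator register `ACC`, and the memory layout are assumptions chosen for brevity and do not describe any particular CPU.

```python
def run(memory, pc=0):
    """Run a toy fetch-decode-execute loop until a HALT instruction."""
    registers = {"ACC": 0}
    while True:
        opcode, operand = memory[pc]              # fetch the instruction from memory
        pc += 1                                   # prepare for the next instruction
        if opcode == "LOAD":                      # decode, then execute
            registers["ACC"] = memory[operand]    # operand fetched from MM
        elif opcode == "ADD":
            registers["ACC"] += memory[operand]   # ALU operation on a fetched operand
        elif opcode == "STORE":
            memory[operand] = registers["ACC"]    # result stored back to MM
        elif opcode == "HALT":
            return registers, memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Addresses 0-3 hold the program; addresses 4-6 hold data.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None), 2, 3, 0]
registers, memory = run(memory)
print(registers["ACC"], memory[6])  # 5 5
```

Here the sum ends up both in a register and in main memory, the two places the slides name for storing results.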

  • Processing Data in Clusters
    Information is organized into groups of fixed-size data that can be stored and retrieved in a single, basic operation.
    Each group of n bits is referred to as a word of information.
    Access to each word requires a distinct name (location/address).
    The term can also refer to a characteristic of other components (e.g. the size of a bus).

  • Word Length
    The size of a word, specified in bits, is known as the word length.
    Possible benefits of a large word length:
      Faster processing (more data and/or instructions can be fetched at a time)
      Greater numeric precision
      More powerful instructions (e.g. instructions can have more operands)

  • Machine Language
    Composed of:
      Instruction
      Data (instruction parameters)
    Instructions and data are represented by a stream of 1s and 0s.
    Cumbersome to deal with when preparing programs; programmers use hexadecimal numbers instead.
    In some computers, both instruction and data are stored in a single memory location.

  • Assembly: An Improvement on Machine Language
    Symbols, called mnemonics, are used to represent instructions.
    Sample instruction:
      add (105),(8)
      where add is the instruction and (105),(8) are the instruction parameters
    Advantage: easier recall of instructions
    Disadvantage: need to convert mnemonics and hexadecimal numbers back to binary

  • How Programs Are Loaded Into Main Memory
    Programs are loaded as binary numbers.
    Assumptions:
      An instruction is represented by a 2-digit hexadecimal number (e.g. add by 1A, mov by A0)
      Word length of instruction parameters: 3 hex digits (12 bits)
    [diagram: instruction, instruction parameters]
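
Under the assumptions on this slide, turning an assembly instruction into the binary form that is loaded into memory can be sketched as below. The opcode values come from the slide. The function name, the packing order (opcode first, then each parameter), and the reading of 105 and 8 as hexadecimal are assumptions made for illustration.

```python
OPCODES = {"add": 0x1A, "mov": 0xA0}  # 2-digit hex opcodes from the slide's assumptions


def encode(mnemonic, *params):
    """Encode an instruction as a bit string: 8-bit opcode, then 12 bits per parameter."""
    bits = format(OPCODES[mnemonic], "08b")
    for value in params:
        if not 0 <= value < 1 << 12:
            raise ValueError("parameter does not fit in 3 hex digits")
        bits += format(value, "012b")
    return bits


# add (105),(8), reading both parameters as hexadecimal (an assumption).
word = encode("add", 0x105, 0x008)
print(word)                         # 00011010000100000101000000001000
print(format(int(word, 2), "08X"))  # 1A105008
```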


  • Executing Multiple Programs
    Programs share processor time.
    Time slicing
    Supported by modern CPUs

    There is a strong possibility that hexadecimal numbers in relation to binary numbers may have to be reviewed here.