The Memory Hierarchy
Jehan-François Pâris, jfparis@uh.edu
TRANSCRIPT
Chapter Organization
• Technology overview
• Caches
  – Cache associativity, write through and write back, …
• Virtual memory
  – Page table organization, the translation lookaside buffer (TLB), page fault handling, memory protection
• Virtual machines
• Cache consistency
TECHNOLOGY OVERVIEW
Dynamic RAM
• Standard solution for main memory since the 1970s
  – Replaced magnetic core memory
• Bits are stored on capacitors
  – A charged state represents a one
• Capacitors discharge over time
  – Must be dynamically refreshed
  – Achieved by accessing each cell several thousand times each second
Dynamic RAM
[Figure: a DRAM cell — an nMOS transistor gated by the row select line connects the column select line to a capacitor tied to ground.]
The role of the nMOS transistor
[Figure: the row select line drives the gate; the column select line is the source; the capacitor side is the drain.]
• Normally, no current can flow from the source to the drain
• When the gate is positive with respect to ground, electrons are attracted to the gate (the "field effect") and current can flow through
• Not on the exam
Magnetic disks
[Figure: a disk drive — platters, read/write heads on an arm positioned by a servo.]
Magnetic disk (I)
• Data are stored on circular tracks
• Tracks are partitioned into a variable number of fixed-size sectors
• If the disk drive has more than one platter, all tracks corresponding to the same position of the R/W head form a cylinder
Magnetic disk (II)
• Disk spins at a speed varying between
  – 5,400 rpm (laptops) and
  – 15,000 rpm (Seagate Cheetah X15, …)
• Accessing data requires
  – Positioning the head on the right track: the seek time
  – Waiting for the data to reach the R/W head: on average, half a rotation
Disk access times
• Dominated by seek time and rotational delay
• We try to reduce seek times by placing all data that are likely to be accessed together on nearby tracks or the same cylinder
• Cannot do as much for rotational delay
  – On average, half a rotation
Average rotational delay

RPM      Delay (ms)
 5,400      5.6
 7,200      4.2
10,000      3.0
15,000      2.0
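The delays in the table are just the time for half a revolution. A quick check (the helper name is ours, not from the slides):

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay in milliseconds: half a revolution
    at `rpm` revolutions per minute."""
    return (60.0 / rpm) * 1000.0 / 2.0

for rpm in (5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM -> {avg_rotational_delay_ms(rpm):.1f} ms")
```

This reproduces the 5.6, 4.2, 3.0, and 2.0 ms entries above.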
Overall performance
• Disk access times are still dominated by rotational latency
  – Rotational delays were 8-10 ms in the late 70's, when rotational speeds were 3,000 to 3,600 RPM
• Disk capacities and maximum transfer rates have done much better
  – Pack many more tracks per platter
  – Pack many more bits per track
The internal disk controller
• A printed circuit board attached to the disk drive
  – As powerful as the CPU of a personal computer of the early 80's
• Functions include
  – Speed buffering
  – Disk scheduling
  – …
Reliability issues
• Disk drives have more reliability issues than most other computer components
  – Moving parts eventually wear out
  – Infant mortality
  – It would be too costly to produce perfect magnetic surfaces
    • Disks have bad blocks
Disk failure rates
• Failure rates follow a bathtub curve
  – High infant mortality
  – Low failure rate during useful life
  – Higher failure rates as disks wear out
Disk failure rates (II)
[Figure: the bathtub curve — failure rate versus time, with three phases: infant mortality, useful life, and wear-out.]
Disk failure rates (III)
• The infant mortality effect can last for months for disk drives
• Cheap ATA disk drives seem to age less gracefully than SCSI drives
MTTF
• Disk manufacturers advertise very high Mean Times To Fail (MTTF) for their products
  – 500,000 to 1,000,000 hours, that is, 57 to 114 years
• Does not mean that a disk will last that long!
• Means that disks will fail at an average rate of one failure per 500,000 to 1,000,000 hours during their useful life
More MTTF Issues (I)
• Manufacturers' claims are not supported by solid experimental evidence
• They are obtained by submitting disks to a stress test at high temperature and extrapolating the results to ideal conditions
  – A procedure that raises many issues
More MTTF Issues (II)
• Failure rates observed in the field are much higher
  – Can go up to 8 to 9 percent per year
• The corresponding MTTFs are 11 to 12.5 years
• If we have 100 disks and an MTTF of 12.5 years, we can expect an average of 8 disk failures per year
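The arithmetic behind these figures is worth making explicit (the function is a sketch of ours, not from the slides):

```python
def expected_failures_per_year(n_disks, mttf_years):
    # During their useful life, disks fail at an average rate of
    # one failure per MTTF, so a population fails at n/MTTF per year.
    return n_disks / mttf_years

print(expected_failures_per_year(100, 12.5))   # -> 8.0

# An 8-9% annual failure rate corresponds to an MTTF of 1/0.09 to 1/0.08 years:
print(round(1 / 0.09, 1), round(1 / 0.08, 1))  # -> 11.1 12.5
```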
Bad blocks (I)
• Also known as
  – Irrecoverable read errors
  – Latent sector errors
• Can be caused by
  – Defects in the magnetic substrate
  – Problems during the last write
Bad blocks (II)
• The disk controller uses a redundant encoding that can detect and correct many errors
• When the internal disk controller detects a bad block, it
  – Marks it as unusable
  – Remaps the logical block address of the bad block to spare sectors
• Each disk is extensively tested during a burn-in period before being released
The memory hierarchy (I)

Level  Device                          Access time
1      Fastest registers (2 GHz CPU)   0.5 ns
2      Main memory                     10-60 ns
3      Secondary storage (disk)        7 ms
4      Mass storage (CD-ROM library)   a few s
The memory hierarchy (II)
• To make sense of these numbers, let us consider an analogy
Writing a paper
• Scale the access times so that a register access (0.5 ns) corresponds to 1 s

Level  Resource           Access time
1      Open book on desk  1 s
2      Book on desk       20-120 s
3      Book in library    162 days
4      Book far away      63 years
Major issues
• There are huge gaps between
  – CPU speeds and SDRAM access times
  – SDRAM access times and disk access times
• The two problems have very different solutions
  – The gap between CPU speeds and SDRAM access times is handled by hardware
  – The gap between SDRAM access times and disk access times is handled by a combination of software and hardware
Why?
• Having hardware handle an issue
  – Complicates hardware design
  – Offers a very fast solution
  – Standard approach for very frequent actions
• Letting software handle an issue
  – Is cheaper
  – Has a much higher overhead
  – Standard approach for less frequent actions
Will the problem go away?
• It will become worse
  – RAM access times are not improving as fast as CPU power
  – Disk access times are limited by the rotational speed of the disk drive
What are the solutions?
• To bridge the CPU/DRAM gap:
  – Interpose between the CPU and the DRAM smaller, faster memories that cache the data the CPU currently needs
    • Cache memories
    • Managed by the hardware and invisible to the software (OS included)
What are the solutions?
• To bridge the DRAM/disk drive gap:
  – Store in main memory the data blocks that are currently accessed (I/O buffer)
  – Manage memory space and disk space as a single resource (virtual memory)
• The I/O buffer and virtual memory are managed by the OS and invisible to the user processes
Why do these solutions work?
• The locality principle:
  – Spatial locality: at any time a process only accesses a small portion of its address space
  – Temporal locality: this subset does not change too frequently
Can we think of examples?
• The way we write programs
• The way we act in everyday life
  – …
CACHING
The technology
• Caches use faster static RAM (SRAM)
  – Similar organization to that of D flip-flops
• Can have
  – Separate caches for instructions and data
    • Great for pipelining
  – A unified cache
A little story (I)
• Consider a closed-stack library
  – Customers bring book requests to the circulation desk
  – Librarians go to the stacks to fetch the requested books
• This solution is used in national libraries
  – Costlier than the open-stack approach
  – Much better control of assets
A little story (II)
• The librarians have noted that some books get asked for again and again
  – They want to put them closer to the circulation desk
    • Would result in much faster service
• The problem is how to locate these books
  – They will not be at their right location!
A little story (III)
• The librarians come up with a great solution
  – They put behind the circulation desk shelves with 100 book slots, numbered from 00 to 99
  – Each slot is a home for the most recently requested book whose call number's last two digits match the slot number
    • 3141593 can only go in slot 93
    • 1234567 can only go in slot 67
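The librarians' rule is a hash function on the call number; a one-line sketch (the helper is hypothetical):

```python
def slot(call_number):
    """The librarians' rule: a book can only live in the slot whose
    two-digit number matches the last two digits of its call number."""
    return call_number % 100

print(slot(3141593), slot(1234567))   # -> 93 67
print(slot(4444493))                  # -> 93: competes with 3141593 for slot 93
```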
A little story (IV)
"The call number of the book I need is 3141593."
"Let me see if it's in bin 93."
A little story (V)
• To let the librarian do her job, each slot must contain either
  – Nothing, or
  – A book and its call number
• There are many books whose call number ends in 93, or in any two given digits
A little story (VI)
"Could I now get the book whose call number is 4444493?"
"Sure."
A little story (VII)
• This time the librarian will
  – Go to bin 93
  – Find that it contains a book with a different call number
• She will
  – Bring that book back to the stacks
  – Fetch the new book
Basic principles
• Assume we want to store in a faster memory the 2^n words that are currently accessed by the CPU
  – Can be instructions or data or even both
• When the CPU needs to fetch an instruction or load a word into a register
  – It will look first into the cache
  – Can have a hit or a miss
Cache hits
• Occur when the requested word is found in the cache
  – The cache avoided a memory access
  – The CPU can proceed
Cache misses
• Occur when the requested word is not found in the cache
  – Will need to access main memory
  – Will bring the new word into the cache
    • Must make space for it by expelling one of the cache entries
      – Need to decide which one
Handling writes (I)
• When the CPU has to store the contents of a register into main memory
  – The write will update the cache
• If the modified word is already in the cache
  – Everything is fine
• Otherwise
  – Must make space for it by expelling one of the cache entries
Handling writes (II)
• Two ways to handle writes
  – Write through:
    • Each write updates both the cache and the main memory
  – Write back:
    • Writes are not propagated to the main memory until the updated word is expelled from the cache
Handling writes (III)
[Figure: write through — each CPU write goes to the cache and on to RAM immediately; write back — the write goes to the cache and only reaches RAM later.]
Pros and cons
• Write through:
  – Ensures that memory is always up to date
    • Expelled cache entries can simply be overwritten
• Write back:
  – Faster writes
  – Complicates the cache expulsion procedure
    • Must write back cache entries that have been modified in the cache
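The trade-off can be made concrete with a toy model that only counts memory write traffic. This is a sketch of ours, not any particular hardware: one word per entry, and we only model writes (read misses and replacement policy are ignored):

```python
class TinyCache:
    """Toy direct-mapped cache that tracks writes reaching main memory."""
    def __init__(self, n_entries, write_back=False):
        self.n = n_entries
        self.write_back = write_back
        self.entries = {}          # index -> (tag, dirty)
        self.memory_writes = 0

    def write(self, addr):
        index = (addr // 4) % self.n
        tag = addr // (4 * self.n)
        old = self.entries.get(index)
        if old is not None and old[0] != tag and old[1]:
            self.memory_writes += 1            # write back an expelled dirty entry
        if self.write_back:
            self.entries[index] = (tag, True)  # mark dirty, defer the memory write
        else:
            self.entries[index] = (tag, False)
            self.memory_writes += 1            # write through: update RAM now

wt, wb = TinyCache(8), TinyCache(8, write_back=True)
for addr in [0, 0, 0, 4, 4]:       # repeated writes to two nearby words
    wt.write(addr)
    wb.write(addr)
print(wt.memory_writes, wb.memory_writes)   # -> 5 0
```

Write through pays one memory write per store; write back pays nothing until a dirty entry is expelled, which is why its expulsions are more complicated.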
Picking the right solution
• Caches use write through:
  – Provides simpler cache expulsions
  – Can minimize the write-through overhead with additional circuitry
• I/O buffers and virtual memory use write back:
  – The write-through overhead would be too high
A better write through (I)
• Add a small buffer to speed up the write performance of write-through caches
  – At least four words
• Holds modified data until they are written into main memory
  – The cache can proceed as soon as the data are written into the write buffer
A better write through (II)
[Figure: plain write through — CPU → cache → RAM; better write through — CPU → cache → write buffer → RAM.]
A very basic cache
• Has 2^n entries
• Each entry contains
  – A word (4 bytes)
  – Its RAM address
    • The sole way to identify the word
  – A bit indicating whether the cache entry contains something useful
A very basic cache (I)
[Figure: a cache table with eight rows; each row holds a valid bit (Y/N), a tag (the word's RAM address), and its contents (the word). Actual caches are much bigger.]
A very basic cache (II)
[Figure: the same table with its rows indexed 000 to 111.]
Comments (I)
• The cache organization we have presented is nothing but a hardware implementation of a hash table
• Each entry has
  – A key: the word address
  – A value: the word contents, plus a valid bit
Comments (II)
• The hash function is
    h(k) = (k / 4) mod N
  where k is the key and N is the cache size
  – Can be computed very fast
• Unlike conventional hash tables, this organization has no provision for handling collisions
  – It uses expulsion to resolve collisions
Managing the cache
• Each word fetched into the cache can occupy a single cache location
  – Specified by bits n+1 to 2 of its address
• Two words with the same bits n+1 to 2 cannot be in the cache at the same time
  – Happens whenever the addresses of the two words differ by a multiple K × 2^(n+2)
Example
• Assume the cache can contain 8 words
• If word 48 is in the cache, it will be stored at cache index (48/4) mod 8 = 12 mod 8 = 4
• In our case 2^(n+2) = 2^(3+2) = 32
• The only possible cache index for word 80 would be (80/4) mod 8 = 20 mod 8 = 4
  – Same for words 112, 144, 176, …
Saving cache space
• We do not need to store the whole address of each word in the cache
  – Bits 1 and 0 will always be zero
  – Bits n+1 to 2 can be inferred from the cache index
    • If the cache has 8 entries, bits 4 to 2
• We only store in the tag the remaining bits of the address
A very basic cache (III)
[Figure: the cache now uses bits 4 to 2 of the word address as the index (rows 000 to 111) and stores only bits 31:5 as the tag, next to the valid bit and the word.]
Storing a new word in the cache
• The location of the new word's entry is obtained from the LSB of the word address
  – Discard the 2 LSB
    • Always zero for a well-aligned word
  – Take the n next LSB for a cache of size 2^n
    • They give the cache index

  Address layout:  | MSB of word address | n next LSB | 00 |
Accessing a word in the cache (I)
• Start with the word address
• Remove the two least significant bits
  – Always zero

  | Word address |  →  | Word address minus two LSB |
Accessing a word in the cache (II)
• Split the remainder of the address into
  – The n least significant bits
    • The word's index in the cache
  – The cache tag

  | Word address minus two LSB |  =  | Cache tag | n LSB |
Towards a better cache
• Our cache takes into account the temporal locality of accesses
  – Repeated accesses to the same location
• But not their spatial locality
  – Accesses to neighboring locations
• Cache space is poorly used
  – Need 27 + 1 bits of overhead (a bits 31:5 tag plus a valid bit) to store 32 bits of data
Multiword cache (I)
• Each cache entry will contain a block of 2, 4, 8, … words with consecutive addresses
  – Will require blocks to be well aligned
    • A pair of words should start at an address that is a multiple of 2×4 = 8
    • A group of four words should start at an address that is a multiple of 4×4 = 16
Multiword cache (II)
[Figure: a cache with eight entries (indices 000 to 111); each entry holds a valid bit (Y/N), a tag (bits 31:6 of the address), and a block of two words.]
Multiword cache (III)
• Has 2^n entries, each containing 2^m words
• Each entry contains
  – 2^m words
  – A tag
  – A bit indicating whether the cache entry contains useful data
Storing a new word in the cache
• The location of the new word's entry is obtained from the LSB of the word address
  – Discard the 2 + m LSB
    • Always zero for a well-aligned group of words
  – Take the n next LSB for a cache of size 2^n

  Address layout:  | MSB of address | n next LSB | 2 + m LSB |
Example
• Assume
  – The cache can contain 8 entries
  – Each block contains 2 words
• Words 48 and 52 belong to the same block
  – If word 48 is in the cache, it will be stored at cache index (48/8) mod 8 = 6 mod 8 = 6
  – If word 52 is in the cache, it will be stored at cache index (52/8) mod 8 = 6 mod 8 = 6
Selecting the right block size
• Larger block sizes improve the performance of the cache
  – Allow us to exploit spatial locality
• Three limitations
  – The spatial locality effect is less pronounced if the block size exceeds 128 bytes
  – Too many collisions in very small caches
  – Larger blocks take more time to be fetched into the cache
[Figure: miss rate versus block size (16 to 256 bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB; miss rates range from 0% to 40%.]
Collision effect in small caches
• Consider a 4 KB cache
  – If the block size is 16 B, that is, 4 words, the cache will have 256 blocks
  – …
  – If the block size is 128 B, that is, 32 words, the cache will have 32 blocks
    • Too many collisions
Problem
• Consider a very small cache with 8 entries and a block size of 8 bytes (2 words)
  – Which words will be fetched into the cache when the CPU accesses the words at addresses 32, 48, 60 and 80?
  – How will these words be stored in the cache?
Solution (I)
• Since the block size is 8 bytes
  – The 3 LSB of the address are used to address one of the 8 bytes in a block
• Since the cache holds 8 blocks
  – The next 3 LSB of the address are used as the cache index
• As a result, the tag has 32 - 3 - 3 = 26 bits
Solution (II)
• Consider the word at address 32
• The cache index is (32/2^3) mod 2^3 = (32/8) mod 8 = 4
• The block tag is 32/2^6 = 32/64 = 0

  Row 4: Tag = 0, bytes 32 33 34 35 36 37 38 39
Solution (III)
• Consider the word at address 48
• The cache index is (48/8) mod 8 = 6
• The block tag is 48/64 = 0

  Row 6: Tag = 0, bytes 48 49 50 51 52 53 54 55
Solution (IV)
• Consider the word at address 60
• The cache index is (60/8) mod 8 = 7
• The block tag is 60/64 = 0

  Row 7: Tag = 0, bytes 56 57 58 59 60 61 62 63
Solution (V)
• Consider the word at address 80
• The cache index is (80/8) mod 8 = 10 mod 8 = 2
• The block tag is 80/64 = 1

  Row 2: Tag = 1, bytes 80 81 82 83 84 85 86 87
Set-associative caches (I)
• Can be seen as 2, 4, or 8 direct-mapped caches attached together
• Reduces collisions
Set-associative caches (II)
[Figure: a two-way set-associative cache — two direct-mapped banks side by side; each row (000 to 111) holds, in each bank, a valid bit, a tag (bits 31:5), and a block.]
Set-associative caches (III)
• Advantage:
– We take care of more collisions
• Like a hash table with a fixed bucket size
– Results in lower miss rates than direct-mapped caches
• Disadvantage:
– Slower access
• Best solution if the miss penalty is very big
![Page 86: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/86.jpg)
Fully associative caches
• The dream!
• A block can occupy any index position in the cache
• Requires an associative memory
– Content-addressable
– Like our brain!
• Remains a dream
![Page 87: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/87.jpg)
Designing RAM to support caches
• RAM is connected to the CPU through a "bus"
– Its clock rate is much slower than the CPU clock rate
• Assume that a RAM access takes
– 1 bus clock cycle to send the address
– 15 bus clock cycles to initiate a read
– 1 bus clock cycle to send a word of data
![Page 88: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/88.jpg)
Designing RAM to support caches
• Assume
– Cache block size is 4 words
– One-word bank of DRAM
• Fetching a cache block would take
1 + 4×15 + 4×1 = 65 bus clock cycles
– Transfer rate is 0.25 byte/bus cycle
• Awful!
![Page 89: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/89.jpg)
Designing RAM to support caches
• Could
– Double the bus width (from 32 to 64 bits)
– Have a two-word bank of DRAM
• Fetching a cache block would take
1 + 2×15 + 2×1 = 33 bus clock cycles
– Transfer rate is 0.48 byte/bus cycle
• Much better
• Costly solution
![Page 90: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/90.jpg)
Designing RAM to support caches
• Could
– Have an interleaved memory organization
– Four one-word banks of DRAM
– A 32-bit bus
32 bits
RAMbank 1
RAMbank 0
RAMbank 2
RAMbank 3
![Page 91: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/91.jpg)
Designing RAM to support caches
• Can do the 4 accesses in parallel
• Must still transmit the block 32 bits by 32 bits
• Fetching a cache block would take
1 + 15 + 4×1 = 20 bus clock cycles
– Transfer rate is 0.80 byte/bus cycle
• Even better• Much cheaper than having a 64-bit bus
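The three timings above come from one formula: one cycle to send the address, one 15-cycle read round per group of banks accessed in parallel, and one transfer cycle per bus-width of data. A sketch (function and parameter names are mine, not from the slides):

```python
def fetch_cycles(block_words=4, banks=1, bank_width=1, bus_width=1,
                 addr=1, read=15, xfer=1):
    """Bus cycles to fetch one cache block.

    Banks are read in parallel, so each read round yields
    banks * bank_width words; the bus moves bus_width words per cycle."""
    read_rounds = block_words // (banks * bank_width)
    transfers = block_words // bus_width
    return addr + read_rounds * read + transfers * xfer

print(fetch_cycles())                            # 65: one one-word bank, 32-bit bus
print(fetch_cycles(bank_width=2, bus_width=2))   # 33: two-word bank, 64-bit bus
print(fetch_cycles(banks=4))                     # 20: four interleaved one-word banks
```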
![Page 92: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/92.jpg)
ANALYZING CACHE PERFORMANCE
![Page 93: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/93.jpg)
Memory stalls
• Can divide CPU time into
– NEXEC clock cycles spent executing instructions
– NMEM_STALLS cycles spent waiting for memory accesses
• We have
CPU time = (NEXEC + NMEM_STALLS)×TCYCLE
![Page 94: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/94.jpg)
Memory stalls
• We assume that
– cache access times can be neglected
– most CPU cycles spent waiting for memory accesses are caused by cache misses
• Distinguishing between read stalls and write stalls
NMEM_STALLS = NRD_STALLS + NWR_STALLS
![Page 95: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/95.jpg)
Read stalls
• Fairly simple
NRD_STALLS = NMEM_RD×Read miss rate×
Read miss penalty
![Page 96: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/96.jpg)
Write stalls (I)
• Two causes of delays
– Must fetch missing blocks before updating them
• We update at most 8 bytes of the block!
– Must take into account the cost of write through
• Buffering delay depends on the proximity of writes, not the number of cache misses
– Writes that are too close to each other stall the CPU
![Page 97: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/97.jpg)
Write stalls (II)
• We have
NWR_STALLS =NWRITES×Write miss rate×
Write miss penalty + NWR_BUFFER_STALLS
• In practice, very few buffer stalls if the buffer contains at least four words
![Page 98: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/98.jpg)
Global impact
• We have
NMEM_STALLS = NMEM_ACCESSES×Cache miss rate×
Cache miss penalty • and also
NMEM_STALLS = NINSTRUCTIONS×(NMISSES/Instruction)×
Cache miss penalty
![Page 99: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/99.jpg)
Example
• Miss rate of instruction cache is 2 percent
• Miss rate of data cache is 4 percent
• In the absence of memory stalls, each instruction would take 2 cycles
• Miss penalty is 100 cycles
• 36 percent of instructions access the main memory
• How many cycles are lost due to cache misses?
![Page 100: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/100.jpg)
Solution (I)
• Impact of instruction cache misses
0.02×100 = 2 cycles/instruction
• Impact of data cache misses
0.36×0.04×100 = 1.44 cycles/instruction
• Total impact of cache misses
2 + 1.44 = 3.44 cycles/instruction
![Page 101: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/101.jpg)
Solution (II)
• Average number of cycles per instruction
2 + 3.44 = 5.44 cycles/instruction
• Fraction of time wasted
3.44/5.44 = 63 percent
![Page 102: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/102.jpg)
Problem
• Redo the example with the following data
– Miss rate of instruction cache is 3 percent
– Miss rate of data cache is 5 percent
– In the absence of memory stalls, each instruction would take 2 cycles
– Miss penalty is 100 cycles
– 40 percent of instructions access the main memory
![Page 103: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/103.jpg)
Solution
• The fraction of time wasted on memory stalls is 71 percent
![Page 104: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/104.jpg)
Average memory access time
• Some authors call it AMAT
TAVERAGE = TCACHE + f×TMISS
where f is the cache miss rate
• Times can be expressed
– In nanoseconds
– In number of cycles
![Page 105: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/105.jpg)
Example
• A cache has a hit rate of 96 percent• Accessing data
– In the cache requires one cycle– In the memory requires 100 cycles
• What is the average memory access time?
![Page 106: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/106.jpg)
Solution
• Miss rate = 1 – Hit rate = 0.04
• Applying the formula
TAVERAGE = 1 + 0.04×100 = 5 cycles
![Page 107: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/107.jpg)
Impact of a better hit rate
• What would be the impact of improving the hit rate of the cache from 96 to 98 percent?
![Page 108: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/108.jpg)
Solution
• New miss rate = 1 – New hit rate = 0.02
• Applying the formula
TAVERAGE = 1 + 0.02×100 = 3 cycles
When the hit rate is above 80 percent, small improvements in the hit rate result in much larger relative reductions of the miss rate
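The AMAT formula from the previous slides can be checked in a couple of lines (a sketch, not part of the slides):

```python
def amat(t_cache, miss_rate, t_miss):
    """Average memory access time: T_CACHE + f * T_MISS."""
    return t_cache + miss_rate * t_miss

print(amat(1, 0.04, 100))  # about 5 cycles at a 96 percent hit rate
print(amat(1, 0.02, 100))  # about 3 cycles at a 98 percent hit rate
```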
![Page 109: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/109.jpg)
Examples
• Old hit rate: 80 percent
New hit rate: 90 percent
– Miss rate goes from 20 to 10 percent!
• Old hit rate: 94 percent
New hit rate: 98 percent
– Miss rate goes from 6 to 2 percent!
![Page 110: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/110.jpg)
In other words
It's the miss rate, stupid!
![Page 111: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/111.jpg)
Improving cache hit rate
• Two complementary techniques
– Using set-associative caches
• Must check the tags of all blocks with the same index value
– Slower
• Have fewer collisions
– Fewer misses
– Using a cache hierarchy
![Page 112: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/112.jpg)
A cache hierarchy (I)
CPU
L1
L2
L3
RAM
L1 misses
L2 misses
L3 misses
![Page 113: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/113.jpg)
A cache hierarchy
• Topmost cache– Optimized for speed, not miss rate– Rather small– Uses a small block size
• As we go down the hierarchy– Cache sizes increase– Block sizes increase– Cache associativity level increases
![Page 114: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/114.jpg)
Example
• Cache miss rate per instruction is 2 percent
• In the absence of memory stalls, each instruction would take one cycle
• Cache miss penalty is 100 ns
• Clock rate is 4 GHz
• How many cycles are lost due to cache misses?
![Page 115: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/115.jpg)
Solution (I)
• Duration of a clock cycle
1/(4 GHz) = 0.25×10^-9 s = 0.25 ns
• Cache miss penalty
100 ns = 400 cycles
• Total impact of cache misses
0.02×400 = 8 cycles/instruction
![Page 116: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/116.jpg)
Solution (II)
• Average number of cycles per instruction
1 + 8 = 9 cycles/instruction
• Fraction of time wasted
8/9 = 89 percent
![Page 117: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/117.jpg)
Example (cont'd)
• How much faster would the processor be if we added an L2 cache that
– Has a 5 ns access time
– Would reduce the miss rate to main memory to 0.5 percent?
• We will see later how to get that miss rate
![Page 118: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/118.jpg)
Solution (I)
• L2 cache access time
5 ns = 20 cycles
• Impact of cache misses per instruction
L1 cache misses + L2 cache misses =
0.02×20 + 0.005×400 = 0.4 + 2.0 = 2.4 cycles/instruction
• Average number of cycles per instruction
1 + 2.4 = 3.4 cycles/instruction
![Page 119: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/119.jpg)
Solution (II)
• Fraction of time wasted
2.4/3.4 = 71 percent
• CPU speedup
9/3.4 = 2.6
![Page 120: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/120.jpg)
How to get the 0.005 miss rate
• The wanted miss rate corresponds to a combined cache hit rate of 99.5 percent
• Let H1 be the hit rate of the L1 cache and H2 the hit rate of the L2 cache
• The combined hit rate of the cache hierarchy is
H = H1 + (1 – H1)H2
![Page 121: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/121.jpg)
How to get the 0.005 miss rate
• We have
0.995 = 0.98 + 0.02 H2
• H2 = (0.995 – 0.98)/0.02 = 0.75
– Quite feasible!
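The combined hit rate formula, and the L2 hit rate it implies, can be sketched as follows (function names are mine):

```python
def combined_hit_rate(h1, h2):
    """Two-level hierarchy: the L2 cache only sees the references that miss in L1."""
    return h1 + (1 - h1) * h2

def required_h2(h, h1):
    """L2 hit rate needed to reach a combined hit rate h, given L1 hit rate h1."""
    return (h - h1) / (1 - h1)

print(required_h2(0.995, 0.98))   # about 0.75, as computed above
```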
![Page 122: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/122.jpg)
Can we do better? (I)
• Keep the 98 percent hit rate for the L1 cache
• Raise the hit rate of the L2 cache to 85 percent
– The L2 cache is now slower: 6 ns, that is, 24 cycles
• Impact of cache misses per instruction
L1 cache misses + L2 cache misses =
0.02×24 + 0.02×0.15×400 = 0.48 + 1.2 = 1.68 cycles/instruction
![Page 123: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/123.jpg)
The verdict
• Fraction of time wasted per cycle1.68/2.68 = 63 percent
• CPU speedup 9/2.68 = 3.36
![Page 124: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/124.jpg)
Would a faster L2 cache help?
• Redo the example assuming
– Hit rate of the L1 cache is still 98 percent
– New, faster L2 cache
• Access time reduced to 3 ns
• Hit rate only 50 percent
![Page 125: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/125.jpg)
The verdict
• Fraction of time wasted
4.24/5.24 = 81 percent
• CPU speedup 1.72
New L2 cache with a lower access timebut a higher miss rate performs much worsethan original L2 cache
![Page 126: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/126.jpg)
Cache replacement policy
• Not an issue in direct-mapped caches
– We have no choice!
• An issue in set-associative caches
– Best policy is least recently used (LRU)
• Expels from the cache a block in the same set as the incoming block
• Picks the block that has not been accessed for the longest period of time
![Page 127: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/127.jpg)
Implementing LRU policy
• Easy when each set contains two blocks
– We attach to each block a use bit that is
• Set to 1 when the block is accessed
• Reset to 0 when the other block is accessed
– We expel the block whose use bit is 0
• Much more complicated for higher associativity levels
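The two-block use-bit scheme can be sketched as follows (a toy model, mine, tracking only tags and use bits for one set):

```python
class TwoWaySet:
    """One set of a two-way set-associative cache with LRU use bits."""

    def __init__(self):
        self.tags = [None, None]
        self.use = [0, 0]        # use[i] == 1 means block i was accessed last

    def access(self, tag):
        """Return 'hit' or 'miss'; on a miss, expel the block whose use bit is 0."""
        if tag in self.tags:
            i = self.tags.index(tag)
            result = "hit"
        else:
            i = self.use.index(0)     # the least recently used slot
            self.tags[i] = tag
            result = "miss"
        self.use = [0, 0]
        self.use[i] = 1
        return result

s = TwoWaySet()
print([s.access(t) for t in "ABAC"])  # accessing C expels B, whose use bit is 0
```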
![Page 128: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/128.jpg)
REALIZATIONS
![Page 129: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/129.jpg)
Caching in a multicore organization
• Multicore organizations often involve multiple chips
– Say four chips with four cores per chip
• Have a cache hierarchy on each chip
– L1, L2, L3
– Some caches are private, others are shared
• Accessing a cache on a chip is much faster than accessing a cache on another chip
![Page 130: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/130.jpg)
AMD 16-core system (I)
• AMD 16-core system
– Sixteen cores on four chips
• Each core has a 64-KB L1 and a 512-KB L2 cache
• Each chip has a 2-MB shared L3 cache
![Page 131: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/131.jpg)
[Figure: inter-cache access costs, labeled X/Y where X is the latency in cycles and Y the bandwidth in bytes/cycle]
![Page 132: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/132.jpg)
AMD 16-core system (II)
• Observe that access times are non-uniform
– It takes more time to access the L1 or L2 cache of another core than the shared L3 cache
– It takes more time to access caches on another chip than local caches
– Access times and bandwidths depend on the chip interconnect topology
![Page 133: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/133.jpg)
VIRTUAL MEMORY
![Page 134: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/134.jpg)
Main objective (I)
• To allow programmers to write programs that reside
– partially in main memory
– partially on disk
![Page 135: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/135.jpg)
Main objective (II)
[Figure: a main memory shared by two address spaces, each only partially resident]
![Page 136: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/136.jpg)
Motivation
• Most programs do not access their whole address space at the same time
• Compilers go through several phases
– Lexical analysis
– Preprocessing (C, C++)
– Syntactic analysis
– Semantic analysis
– …
![Page 137: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/137.jpg)
Advantages (I)
– VM allows programmers to write programs that would not otherwise fit in main memory
• They will run, although much more slowly
• Very important in the 70's and 80's
– VM allows the OS to allocate main memory much more efficiently
• Do not waste precious memory space
• Still important today
![Page 138: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/138.jpg)
Advantages
• VM lets programmers use
– Sparsely populated
– Very large address spaces
![Page 139: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/139.jpg)
Sparsely populated address spaces
• Lets programmers put different items apart from each other
– Code segment
– Data segment
– Stack
– Shared library
– Mapped files
Wait until you take 4330 to study this
![Page 140: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/140.jpg)
Big difference with caching
• Miss penalty is much bigger
– Around 5 ms
– Assuming a memory access time of 50 ns, 5 ms equals 100,000 memory accesses
– For caches, the miss penalty was around 100 cycles
![Page 141: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/141.jpg)
Consequences
• Will use much larger block sizes
– Blocks, here called pages, measure 4 KB, 8 KB, … with 4 KB an unofficial standard
• Will use fully associative mapping to reduce misses, here called page faults
• Will use write back to reduce disk accesses
– Must keep track of modified (dirty) pages in memory
![Page 142: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/142.jpg)
Virtual memory
• Combines two big ideas
– Non-contiguous memory allocation:
processes are allocated page frames scattered all over the main memory
– On-demand fetch:
process pages are brought into main memory when they are accessed for the first time
• The MMU takes care of almost everything
![Page 143: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/143.jpg)
Main memory
• Divided into fixed-size page frames
– Allocation units
– Sizes are powers of 2 (512 B, …, 4 KB, …)
– Properly aligned
– Numbered 0, 1, 2, …
0 1 2 3 4 5 6 7 8
![Page 144: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/144.jpg)
Program address space
• Divided into fixed-size pages
– Same sizes as page frames
– Properly aligned
– Also numbered 0, 1, 2, …
0 1 2 3 4 5 6 7
![Page 145: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/145.jpg)
The mapping
• Will allocate non-contiguous page frames to the pages of a process
[Figure: page frames 0 to 7 of main memory; pages 0, 1, 2 of a process are mapped to scattered frames]
![Page 146: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/146.jpg)
The mapping
Page Number Frame number
0 0
1 4
2 2
![Page 147: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/147.jpg)
The mapping
• Assuming 1KB pages and page frames
Virtual Addresses Physical Addresses
0 to 1,023 0 to 1,023
1,024 to 2,047 4,096 to 5,119
2,048 to 3,071 2,048 to 3,071
![Page 148: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/148.jpg)
The mapping
• Observing that 2^10 = 10000000000 in binary
• We will write 0-0 for ten zeroes and 1-1 for ten ones
Virtual Addresses Physical Addresses
0000-0 to 0001-1 0000-0 to 0001-1
0010-0 to 0011-1 1000-0 to 1001-1
0100-0 to 0101-1 0100-0 to 0101-1
![Page 149: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/149.jpg)
The mapping
• The ten least significant bits of the address do not change
Virtual Addresses Physical Addresses
000 0-0 to 000 1-1 000 0-0 to 000 1-1
001 0-0 to 001 1-1 100 0-0 to 100 1-1
010 0-0 to 010 1-1 010 0-0 to 010 1-1
![Page 150: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/150.jpg)
The mapping
• Must only map page numbers into page frame numbers
Page number Page frame number
000 000
001 100
010 010
![Page 151: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/151.jpg)
The mapping
• Same in decimal
Page number Page frame number
0 0
1 4
2 2
![Page 152: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/152.jpg)
The mapping
• Since page numbers are always in sequence, they are redundant
Page number Page frame number
0 0
1 4
2 2 X
![Page 153: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/153.jpg)
The algorithm
• Assume page size = 2^p
• Remove p least significant bits from virtual address to obtain the page number
• Use page number to find corresponding page frame number in page table
• Append p least significant bits from virtual address to page frame number to get physical address
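The algorithm above, sketched in Python (mine, not part of the slides), using 1 KB pages (p = 10) and the page-to-frame mapping 0→0, 1→4, 2→2 from the earlier slides:

```python
def translate(vaddr, page_table, p=10):
    """Translate a virtual address, assuming 2**p-byte pages."""
    page = vaddr >> p                 # remove the p least significant bits
    offset = vaddr & ((1 << p) - 1)   # the p bits that never change
    frame = page_table[page]          # a missing entry would mean a page fault
    return (frame << p) | offset

table = [0, 4, 2]               # pages 0, 1, 2 -> frames 0, 4, 2
print(translate(1500, table))   # page 1, offset 476 -> 4*1024 + 476 = 4572
```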
![Page 154: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/154.jpg)
Realization
[Figure: translating a virtual address with page number 2 and 10-bit offset 897. The page number indexes the page table to obtain a page frame number; that frame number, followed by the unchanged offset 897, forms the physical address.]
![Page 155: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/155.jpg)
The offset
• Offset contains all bits that remain unchanged through the address translation process
• Function of page size
Page size Offset
1 KB 10 bits
2 KB 11 bits
4 KB 12 bits
![Page 156: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/156.jpg)
The page number
• Contains the other bits of the virtual address
• Assuming 32-bit addresses
Page size Offset Page number
1 KB 10 bits 22 bits
2 KB 11 bits 21 bits
4 KB 12 bits 20 bits
![Page 157: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/157.jpg)
Internal fragmentation
• Each process now occupies an integer number of pages
• Actual process space is not a round number
– Last page of a process is rarely full
• On the average, half a page is wasted per process
– Not a big issue
– Called internal fragmentation
![Page 158: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/158.jpg)
On-demand fetch (I)
• Most processes terminate without having accessed their whole address space
– Code handling rare error conditions, …
• Other processes go through multiple phases during which they access different parts of their address space
– Compilers
![Page 159: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/159.jpg)
On-demand fetch (II)
• VM systems do not fetch whole address space of a process when it is brought into memory
• They fetch individual pages on demand when they get accessed the first time– Page miss or page fault
• When memory is full, they expel from memory pages that are not currently in use
![Page 160: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/160.jpg)
On-demand fetch (III)
• The pages of a process that are not in main memory reside on disk
– In the executable file for the program being run, for the pages in the code segment
– In a special swap area, for the data pages that were expelled from main memory
![Page 161: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/161.jpg)
On-demand fetch (IV)
Main memory Code Data
Disk Executable
Swap area
![Page 162: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/162.jpg)
On-demand fetch (V)
• When a process tries to access data that are not present in main memory
– MMU hardware detects that the page is missing and causes an interrupt
– The interrupt wakes up the page fault handler
– The page fault handler puts the process in the waiting state and brings the missing page into main memory
![Page 163: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/163.jpg)
Advantages
• VM systems use main memory more efficiently than other memory management schemes
– Give to each process more or less what it needs
• Process sizes are not limited by the size of main memory
– Greatly simplifies program organization
![Page 164: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/164.jpg)
Sole disadvantage
• Bringing pages from disk is a relatively slow operation
– Takes milliseconds while memory accesses take nanoseconds
• Ten thousand to a hundred thousand times slower
![Page 165: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/165.jpg)
The cost of a page fault
• Let
– Tm be the main memory access time
– Td the disk access time
– f the page fault rate
– Ta the average access time of the VM
Ta = (1 – f) Tm + f (Tm + Td) = Tm + f Td
![Page 166: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/166.jpg)
Example
• Assume Tm = 50 ns and Td = 5 ms
f Mean memory access time
10-3 = 50 ns + 5 ms/103 = 5,050 ns
10-4 = 50 ns + 5 ms/104 = 550 ns
10-5 = 50 ns + 5 ms/105 = 100 ns
10-6 = 50 ns + 5 ms/ 106 = 55 ns
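The table can be reproduced with the formula Ta = Tm + f Td (a sketch; results match the table up to floating-point rounding):

```python
def avg_access_ns(f, t_mem_ns=50, t_disk_ns=5_000_000):
    """Average access time in ns with page fault rate f: Tm + f * Td."""
    return t_mem_ns + f * t_disk_ns

for f in (1e-3, 1e-4, 1e-5, 1e-6):
    print(f, avg_access_ns(f), "ns")   # 5050, 550, 100, 55 ns, approximately
```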
![Page 167: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/167.jpg)
Conclusion
• Virtual memory works best when page fault rate is less than a page fault per 100,000 instructions
![Page 168: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/168.jpg)
Locality principle (I)
• A process that would access its pages in a totally unpredictable fashion would perform very poorly in a VM system unless all its pages are in main memory
![Page 169: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/169.jpg)
Locality principle (II)
• Process P accesses randomly a very large array consisting of n pages
• If m of these n pages are in main memory, the page fault frequency of the process will be( n – m )/ n
• Must switch to another algorithm
![Page 170: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/170.jpg)
Tuning considerations
• In order to achieve an acceptable performance,a VM system must ensure that each process has in main memory all the pages it is currently referencing
• When this is not the case, the system performance will quickly collapse
![Page 171: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/171.jpg)
First problem
• A virtual memory system has
– 32-bit addresses
– 8 KB pages
• What are the sizes of the
– Page number field?
– Offset field?
![Page 172: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/172.jpg)
Solution (I)
• Step 1:Convert page size to power of 2
8 KB = 2----- B
• Step 2:Exponent is length of offset field
![Page 173: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/173.jpg)
Solution (II)
• Step 3:Size of page number field =Address size – Offset size
Here 32 – ____ = _____ bits
• Highlight the text in the box to see the answers
13 bits for the offset and 19 bits for the page number
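The same computation can be scripted. A small Python helper (the function name split_address is mine, not from the slides):

```python
def split_address(address_bits, page_size):
    """Return (page_number_bits, offset_bits) for a given address
    width and power-of-two page size in bytes."""
    offset_bits = page_size.bit_length() - 1   # log2 of the page size
    assert 1 << offset_bits == page_size, "page size must be a power of 2"
    return address_bits - offset_bits, offset_bits

# 32-bit addresses with 8 KB pages: 8 KB = 2**13 bytes
print(split_address(32, 8 * 1024))   # (19, 13)
```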
![Page 174: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/174.jpg)
PAGE TABLE REPRESENTATION
![Page 175: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/175.jpg)
Page table entries
• A page table entry (PTE) contains– A page frame number– Several special bits
• Assuming 32-bit addresses, all fit into four bytes
[PTE layout: page frame number | special bits]
![Page 176: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/176.jpg)
The special bits (I)
• Valid bit:1 if page is in main memory, 0 otherwise
• Missing bit:1 if page is not in main memory, 0 otherwise
• Both serve the same function but use different conventions
![Page 177: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/177.jpg)
The special bits (II)
• Dirty bit:1 if page has been modified since it was brought into main memory,0 otherwise– A dirty page must be saved in the process
swap area on disk before being expelled from main memory
– A clean page can be immediately expelled
![Page 178: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/178.jpg)
The special bits (III)
• Page-referenced bit:1 if page has been recently accessed,0 otherwise– Often simulated in software
![Page 179: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/179.jpg)
Where to store page tables
• Use a three-level approach
• Store parts of the page table
– In high speed registers located in the MMU:the translation lookaside buffer (TLB)(good solution)
– In main memory (bad solution)– On disk (ugly solution)
![Page 180: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/180.jpg)
The translation lookaside buffer
• Small high-speed memory– Contains fixed number of PTEs– Content-addressable memory
• Entries include page frame number and page number
[TLB entry layout: page number | page frame number | special bits]
![Page 181: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/181.jpg)
Realizations (I)
• TLB of Intrinsity FastMATH– 32-bit addresses– 4 KB pages– Fully associative TLB with 16 entries– Each entry occupies 64 bits
• 20 bits for page number• 20 bits for page frame number• Valid bit, dirty bit, …
![Page 182: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/182.jpg)
Realizations (II)
• TLB of ULTRA SPARC III– 64-bit addresses
• Maximum program size is 2^44 bytes, that is,16 TB
– Supported page sizes are 4 KB, 16 KB, 64 KB, and 4 MB ("superpages")
![Page 183: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/183.jpg)
Realizations (III)
• TLB of ULTRA SPARC III– Dual direct-mapped TLB
• 64 entries for code pages• 64 entries for data pages
– Each entry occupies 64 bits• Page number and page frame number• Context• Valid bit, dirty bit, …
![Page 184: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/184.jpg)
The context (I)
• Conventional TLBs contain the PTEs for a specific address space – Must be flushed each time the OS switches
from the current process to a new process• Frequent action in any modern OS
– Introduces a significant time penalty
![Page 185: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/185.jpg)
The context (II)
• UltraSPARC III architecture adds to TLB entries a context identifying a specific address space– Page mappings from different address
spaces can coexist in the TLB– A TLB hit now requires a match for both
page number and context– Eliminates the need to flush the TLB
![Page 186: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/186.jpg)
TLB misses
• When a PTE cannot be found in the TLB, a TLB miss is said to occur
• TLB misses can be handled– By the computer firmware:
• Cost of miss is one extra memory access– By the OS kernel:
• Cost of miss is two context switches
![Page 187: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/187.jpg)
Letting SW handle TLB misses
• As in other exceptions, must save current value of PC in EPC register
• Must also assert the exception by the end of the clock cycle during which the memory access occurs
– In MIPS, must prevent the WB cycle from occurring after the MEM cycle that generated the exception
![Page 188: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/188.jpg)
Example
• Consider the instructionlw $1, 0($2)
– If the mapping for address 0($2) is not in the TLB,we must prevent any update of $1
![Page 189: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/189.jpg)
Performance implications
• When TLB misses are handled by the firmware, they are very cheap– A TLB hit rate of 99% is very good:
Average access cost will be
Ta = 0.99×Tm + 0.01×2×Tm = 1.01×Tm
• Less true if TLB misses are handled by the kernel
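The arithmetic can be checked directly. A one-function Python sketch, assuming (as stated above) that a firmware-handled miss costs exactly one extra memory access:

```python
def avg_access_time(hit_rate, t_mem):
    """Average memory access cost when a TLB miss handled in
    firmware costs one extra memory access (2 accesses total)."""
    miss_rate = 1 - hit_rate
    return hit_rate * t_mem + miss_rate * 2 * t_mem

# 99% hit rate: Ta = 0.99*Tm + 0.01*2*Tm = 1.01*Tm
print(avg_access_time(0.99, 1.0))   # about 1.01
```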
![Page 190: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/190.jpg)
Storing the rest of the page table
• PTs are too large to be stored in their entirety in main memory– Will store the active part of the PT in main memory– Other entries on disk
• Three solutions– Linear page tables– Multilevel page tables– Hashed page tables
![Page 191: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/191.jpg)
Storing the rest of the page table
• We will review these solutions even though page table organizations are an operating system topic
![Page 192: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/192.jpg)
Linear page tables (I)
• Store PT in virtual memory (VMS solution)
• Very large page tables need more than 2 levels
(3 levels on MIPS R3000)
![Page 193: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/193.jpg)
Linear page tables (II)
[Diagram: the PT and the other PTs reside in virtual memory, which is mapped onto physical memory]
![Page 194: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/194.jpg)
Linear page tables (III)
• Assuming a page size of 4 KB,
– Each page of virtual memory requires a 4-byte PTE
– Each PT maps 4 GB of virtual addresses
– A PT will occupy 4 MB
– Storing these 4 MB in virtual memory will require 4 KB of physical memory
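These numbers follow mechanically from the sizes. A short Python check, using the slide's assumptions (4 KB pages, 4-byte PTEs, 32-bit addresses):

```python
PAGE_SIZE = 4 * 1024       # 4 KB pages
PTE_SIZE = 4               # 4-byte page table entries
ADDRESS_SPACE = 2 ** 32    # 4 GB of virtual addresses

entries = ADDRESS_SPACE // PAGE_SIZE    # one PTE per virtual page
pt_bytes = entries * PTE_SIZE           # size of the full page table
pages_of_pt = pt_bytes // PAGE_SIZE     # pages occupied by the PT itself
upper_bytes = pages_of_pt * PTE_SIZE    # PTEs needed to map the PT

print(entries, pt_bytes, upper_bytes)   # 1048576 4194304 4096
```

That is 2^20 entries, a 4 MB table, and only 4 KB of mappings to keep the table itself in virtual memory.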
![Page 195: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/195.jpg)
Multi-level page tables (I)
• PT is divided into – A master index that always remains in main
memory– Sub indexes that can be expelled
![Page 196: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/196.jpg)
Multi-level page tables (II)
[Diagram: the virtual address splits into a primary (1ary) index, a secondary (2ary) index, and an offset; the primary index selects a sub-index through the master index, the sub-index yields the frame number, and the offset passes into the physical address unchanged]
![Page 197: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/197.jpg)
Multi-level page tables (III)
• Especially suited for a page size of 4 KB and 32-bit virtual addresses
• Will allocate
– 10 bits of the address for the first level,
– 10 bits for the second level, and
– 12 bits for the offset.
• Master index and sub-indexes will all have 2^10 entries and occupy 4 KB
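A toy translation routine makes the 10/10/12 split concrete. This is an illustrative sketch only: the dict-of-dicts tables and the function name translate are mine, not the slides' notation:

```python
def translate(vaddr, master_index):
    """Translate a 32-bit virtual address with a 10/10/12 split.
    master_index maps primary indexes to sub-indexes; each
    sub-index maps secondary indexes to page frame numbers."""
    primary = (vaddr >> 22) & 0x3FF     # top 10 bits
    secondary = (vaddr >> 12) & 0x3FF   # next 10 bits
    offset = vaddr & 0xFFF              # low 12 bits, unchanged
    frame = master_index[primary][secondary]
    return (frame << 12) | offset

master = {0: {1: 435}}                  # virtual page 1 -> frame 435
print(hex(translate(0x1ABC, master)))   # 0x1b3abc
```

A missing sub-index (a KeyError here) corresponds to a page fault on the page table itself.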
![Page 198: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/198.jpg)
Hashed page tables (I)
• Only contain pages that are in main memory– PTs are much smaller
• Also known as inverted page tables
![Page 199: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/199.jpg)
Hashed page table (II)
[Diagram: the page number PN is hashed to index the table, whose entries hold (PN, PFN) pairs; PN = page number, PFN = page frame number]
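A hashed page table can be sketched in a few lines of Python. The class below is a toy model (names and bucket count are mine); real hardware-assisted tables are more elaborate:

```python
class HashedPageTable:
    """Toy hashed (inverted) page table: only resident pages have
    entries, so its size tracks physical memory, not the size of
    the virtual address space."""
    def __init__(self, buckets=64):
        self.buckets = [[] for _ in range(buckets)]

    def insert(self, pn, pfn):
        self.buckets[hash(pn) % len(self.buckets)].append((pn, pfn))

    def lookup(self, pn):
        for entry_pn, pfn in self.buckets[hash(pn) % len(self.buckets)]:
            if entry_pn == pn:
                return pfn
        return None                      # not resident: page fault

pt = HashedPageTable()
pt.insert(735, 435)
print(pt.lookup(735), pt.lookup(736))    # 435 None
```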
![Page 200: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/200.jpg)
Selecting the right page size
• Increasing the page size– Increases the length of the offset– Decreases the length of the page number– Reduces the size of page tables
• Fewer entries– Increases internal fragmentation
• 4KB seems to be a good choice
![Page 201: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/201.jpg)
MEMORY PROTECTION
![Page 202: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/202.jpg)
Objective
• Unless we have an isolated single-user system, we must prevent users from– Accessing– Deleting– Modifying
the address spaces of other processes, including the kernel
![Page 203: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/203.jpg)
Historical considerations
• Earlier operating systems for personal computers did not have any protection– They were single-user machines– They typically ran one program at a time
• Windows 2000, Windows XP, Vista, and Mac OS X are protected
![Page 204: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/204.jpg)
Memory protection (I)
• VM ensures that processes cannot access page frames that are not referenced in their page table.
• Can refine control by distinguishing among– Read access– Write access– Execute access
• Must also prevent processes from modifying their own page tables
![Page 205: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/205.jpg)
Dual-mode CPU
• Require a dual-mode CPU• Two CPU modes
– Privileged mode or executive mode that allows CPU to execute all instructions
– User mode that allows CPU to execute only safe unprivileged instructions
• State of the CPU is determined by a special bit
![Page 206: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/206.jpg)
Switching between states
• User mode will be the default mode for all programs– Only the kernel can run in supervisor mode
• Switching from user mode to supervisor mode is done through an interrupt – Safe because the jump address is at a well-
defined location in main memory
![Page 207: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/207.jpg)
Memory protection (II)
• Has additional advantages:– Prevents programs from corrupting address
spaces of other programs– Prevents programs from crashing the kernel
• Not true for device drivers which are inside the kernel
• Required part of any multiprogramming system
![Page 208: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/208.jpg)
INTEGRATING CACHES AND VM
![Page 209: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/209.jpg)
The problem
• In a VM system, each byte of memory has two addresses– A virtual address– A physical address
• Should cache tags contain virtual addresses or physical addresses?
![Page 210: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/210.jpg)
Discussion

• Using virtual addresses
– Directly available
– Bypass the TLB
– Cache entries are specific to a given address space
– Must flush caches when the OS selects another process

• Using physical addresses
– Must first access the TLB
– Cache entries are not specific to a given address space
– Do not have to flush caches when the OS selects another process
![Page 211: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/211.jpg)
The best solution
• Let the cache use physical addresses– No need to flush the cache at each context
switch– TLB access delay is tolerable
![Page 212: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/212.jpg)
Processing a memory access (I)
if virtual address in TLB :
    get physical address
else :
    create TLB miss exception
    break
…
I use Python because it is very compact:hetland.org/writing/instant-python.html
![Page 213: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/213.jpg)
Processing a memory access (II)
if read_access :
    while data not in cache :
        stall
    deliver data to CPU
else :  # write_access
    …

Continues on next page
![Page 214: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/214.jpg)
Processing a memory access (III)

if write_access_OK :
    while data not in cache :
        stall
    write data into cache
    update dirty bit
    put data and address in write buffer
else :
    # illegal access
    create write protection exception
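The pseudocode on the last three slides can be made runnable. In the toy Python model below, plain dicts stand in for the TLB and cache, a cache fill stands in for the stall, and exceptions model the traps; all names are mine:

```python
def access_memory(vaddr, tlb, cache, write=False, value=None, write_ok=True):
    """Toy model of processing one memory access: TLB lookup first,
    then the cache; traps are modeled as Python exceptions."""
    if vaddr not in tlb:
        raise RuntimeError("TLB miss exception")
    paddr = tlb[vaddr]
    if not write:                        # read access
        if paddr not in cache:           # "stall" until the block arrives
            cache[paddr] = {"data": 0, "dirty": False}
        return cache[paddr]["data"]      # deliver data to CPU
    if not write_ok:                     # illegal access
        raise RuntimeError("write protection exception")
    if paddr not in cache:
        cache[paddr] = {"data": 0, "dirty": False}
    cache[paddr]["data"] = value         # write data into cache
    cache[paddr]["dirty"] = True         # update dirty bit

tlb, cache = {0x1000: 0x5000}, {}
access_memory(0x1000, tlb, cache, write=True, value=42)
print(access_memory(0x1000, tlb, cache))   # 42
```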
![Page 215: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/215.jpg)
More Problems (I)
• A virtual memory system has a virtual address space of 4 Gigabytes and a page size of 4 Kilobytes. Each page table entry occupies 4 bytes.
![Page 216: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/216.jpg)
More Problems (II)
• How many bits are used for the byte offset?
• Since 4K =2___, the byte offset will
use __ bits.
• Highlight text in box to see the answer
Since 4 KB = 2^12 bytes, the byte offset uses 12 bits
![Page 217: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/217.jpg)
More Problems (III)
• How many bits are used for the page number?
• Since 4G = 2__ we will have __-bit virtual addresses. Since the byte offset occupies ___ of these __ bits, __ bits are left for the page number.
The page number uses 20 bits of the address
![Page 218: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/218.jpg)
More Problems (IV)
• What is the maximum number of page table entries in a page table?
• Address space/ Page size =
2__ / 2__ =
2 ___ PTE’s.
2^20 page table entries
![Page 219: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/219.jpg)
More problems (V)
• A computer has 32 bit addresses and a page size of one kilobyte.
• How many bits are used to represent the page number?
___ bits• What is the maximum number of entries in a
process page table?2___ entries
![Page 220: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/220.jpg)
Answer
• As 1 KB = 2^10 bytes, the byte offset occupies10 bits
• The page number uses the remaining 22 bits ofthe address
![Page 221: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/221.jpg)
Some review questions
• Why are TLB entries 64-bit wide while page table entries only require 32 bits?
• What would be the main disadvantage of a virtual memory system lacking a dirty bit?
• What is the big limitation of VM systems that cannot prevent processes from executing the contents of any arbitrary page in their address space?
![Page 222: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/222.jpg)
Answers
• We need extra space for storing the page number
• It would have to write back to disk all pages thatit expels even when they were not modified
• It would make the system less secure
![Page 223: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/223.jpg)
VIRTUAL MACHINES
![Page 224: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/224.jpg)
Key idea
• Let different operating systems run at the same time on a single computer– Windows, Linux and Mac OS– A real-time OS and a conventional OS– A production OS and a new OS being tested
![Page 225: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/225.jpg)
How it is done
• A hypervisor (VM monitor) defines two or more virtual machines
• Each virtual machine has– Its own virtual CPU– Its own virtual physical memory– Its own virtual disk(s)
![Page 226: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/226.jpg)
The virtualization process
[Diagram: the hypervisor maps the actual hardware (CPU, memory, disk) onto several sets of virtual hardware, each with its own CPU, memory, and disk]
![Page 227: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/227.jpg)
Reminder
• In a conventional OS,– Kernel executes in privileged/supervisor
mode• Can do virtually everything
– User processes execute in user mode• Cannot modify their page tables• Cannot execute privileged instructions
![Page 228: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/228.jpg)
[Diagram: user processes run in user mode and enter the kernel, which runs in privileged mode, through system calls]
![Page 229: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/229.jpg)
Two virtual machines
[Diagram: only the hypervisor runs in privileged mode; both VM kernels and all their user processes run in user mode]
![Page 230: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/230.jpg)
Explanations (II)
• Whenever the kernel of a VM issues a privileged instruction, an interrupt occurs
– The hypervisor takes control and does the physical equivalent of what the VM attempted to do:
• Must convert virtual RAM addresses into physical RAM addresses
• Must convert virtual disk block addresses into physical block addresses
![Page 231: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/231.jpg)
Translating a block address
[Diagram: the VM kernel asks to access block (x, y) of its virtual disk; the hypervisor translates this to block (v, w) of the actual disk and performs the access there]
![Page 232: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/232.jpg)
Handling I/Os
• Difficult task because– Wide variety of devices– Some devices may be shared among several
VMs• Printers• Shared disk partition
–Want to let Linux and Windowsaccess the same files
![Page 233: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/233.jpg)
Virtual Memory Issues
• Each VM kernel manages its own memory– Its page tables map program virtual
addresses into what it believes to be physical addresses
![Page 234: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/234.jpg)
The dilemma
[Diagram: the VM kernel believes page 735 of process A is stored in page frame 435; the hypervisor knows it is actually in page frame 993 of the real RAM]
![Page 235: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/235.jpg)
The solution (I)
• Address translation must remain fast!
– The hypervisor lets each VM kernel manage its own page tables but does not use them
• They contain bogus mappings!
– It maintains instead its own shadow page tables with the correct mappings
• Used to handle TLB misses
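A toy model shows how shadow tables relate to the guest's own (bogus) mappings. Everything here (class and method names, the dict representation) is illustrative, not the slides' notation:

```python
class Hypervisor:
    """Toy shadow-page-table model: the VM kernel's page table maps
    virtual pages to *guest* frames; the hypervisor alone knows the
    guest-frame -> real-frame mapping."""
    def __init__(self, guest_to_real):
        self.guest_to_real = guest_to_real
        self.shadow = {}

    def update_shadow(self, guest_pt):
        # Rebuild the shadow table from the guest's mappings,
        # substituting real frames for guest frames.
        self.shadow = {page: self.guest_to_real[gframe]
                       for page, gframe in guest_pt.items()}

    def tlb_fill(self, page):
        return self.shadow[page]       # real frame used on a TLB miss

hv = Hypervisor({435: 993})            # guest frame 435 is real frame 993
hv.update_shadow({735: 435})           # VM kernel: page 735 -> frame 435
print(hv.tlb_fill(735))                # 993
```

In this model, marking the guest's tables read-only is what lets the hypervisor know when update_shadow must run again.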
![Page 236: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/236.jpg)
Why it works
• Most memory accesses go through the TLB
• The system can tolerate slower page table updates
![Page 237: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/237.jpg)
The solution (II)
• To keep its shadow page tables up to date, hypervisor must track any changes made by the VM kernels
• Mark the page tables read-only
– Each attempt to update them by a VM kernel results in an interrupt
![Page 238: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/238.jpg)
Nastiest Issue
• The whole VM approach assumes that a kernel executing in user mode will behave exactly like a kernel executing in privileged mode, except that privileged instructions will be trapped
• Not true for all architectures!– Intel x86 Pop flags (POPF) instruction– …
![Page 239: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/239.jpg)
POPF instruction
• Pop top of stack into lower 16 bits of EFLAGS – Designed for a 16-bit architecture
• EFLAGS contains the interrupt enable flag (IE)
• When executed in privileged mode, POPF
updates all flags• When executed in user mode, POPF updates all
flags but the IE flag
![Page 240: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/240.jpg)
Solutions
1. Modify the instruction set and eliminate instructions like POPF
• IBM redesigned the instruction set of their 360 series for the 370 series
2. Mask it through clever software
• Dynamic "binary translation" when direct execution of code could not work (VMWare)
![Page 241: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/241.jpg)
Other Approaches (I)
• Can use the VM approach to let binaries written in a specific machine language run on a machine with a different instruction set
• Called emulators
• Have a huge performance penalty
– Still work fairly well when the target machine is much faster than the original architecture
– Lets us run very old binaries
![Page 242: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/242.jpg)
Other Approaches (II)
• Can use the VM approach to let programs written in any arbitrary low-level language run on many different architectures
• Java virtual machine (JVM)– Ported to may architectures– Allow execution of programs written in
"bytecode"– Professes to be inherently safe
![Page 243: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/243.jpg)
CP/CMS (I)
• IBM was the dominant computer manufacturer during the 60's and the 70's– Machines were designed for batch processing– Lacked any decent time-sharing OS
• Wanted by universities– TSS/360 was not a great success
![Page 244: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/244.jpg)
CP/CMS (II)
• IBM Cambridge Scientific Center– In Cambridge, MA– Developed a combination of
• A Control Program (CP) supporting virtual machines
• A time-sharing OS (CMS) for a single user• Was a great success!
![Page 245: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/245.jpg)
CP/CMS (III)
• How it worked
[Diagram: several copies of CMS, each running on its own virtual machine, sit on top of CP, the hypervisor]
![Page 246: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/246.jpg)
CACHE CONSISTENCY
![Page 247: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/247.jpg)
The problem
• Specific to architectures with
– Several processors sharing the same main memory
– Multicore architectures
• Each core/processor has its own private cache
– Needed for performance
• Problems occur when the same data are present in two or more private caches
![Page 248: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/248.jpg)
An example (I)
[Diagram: two CPUs, each with a private cache holding x = 0, share the same RAM]
![Page 249: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/249.jpg)
An example (II)
[Diagram: the first CPU increments x, so its cache holds x = 1, while the second CPU's cache still assumes x = 0]
![Page 250: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/250.jpg)
Our Objective
• Single copy serializability– All operations on all the variables should have
the same effect as if they were executed• in sequence with• a single copy of each variable
![Page 251: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/251.jpg)
One-copy serializability rules
1. Whenever a processor accesses a variable, it always gets the value stored by the processor that updated that variable last
2. A processor accessing a variable sees all updates applied to that variable in the same order
– The exact order does not matter as long as everybody agrees on it
![Page 252: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/252.jpg)
An example
[Diagram: one CPU sets x to 1 while another resets x to 0; the two remaining CPUs must apply the two updates in the same order]
![Page 253: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/253.jpg)
Big problem
• When a processor updates a cached variable, the new value of the variable is not immediately written into the main memory– Perfect one-copy serializability is not feasible
![Page 254: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/254.jpg)
New rules
1. Whenever a processor accesses a variable, it always gets the value stored by the processor that updated that variable last, provided the updates are sufficiently separated in time
2. A processor accessing a variable sees all updates applied to that variable in the same order
– No compromise is possible here
![Page 255: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/255.jpg)
A remark
• Data consistency issues appear in many disguises
– Cache consistency
– Distributed shared memory
• work done in early to mid 90's– Distributed file systems– Distributed databases
![Page 256: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/256.jpg)
An example (I)
• UNIX workstations use a distributed file system called NFS (Network File System)
• An NFS comprises– client workstations– a centralized server
• NFS allows client workstations to cache contents of the file they access
• What happens when two workstations access the same file?
![Page 257: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/257.jpg)
An example (II)
[Diagram: workstations A and B each cache a copy of x from the server and update it independently (x′ and x″), producing inconsistent updates]
![Page 258: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/258.jpg)
Possible Approaches (I)
• Always keep a single copy:
– Guarantees one-copy serializability
– Would make the system too slow
• No caching!
• Prevent shared access:
– Guarantees one-copy serializability
– Would be very slow and complicated
![Page 259: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/259.jpg)
Possible Approaches (II)
• Replicate and update:
– Allows multiple processors to cache variables already cached by other processors
– Whenever a processor updates a cached variable, it propagates the update to all other caches holding a copy of the variable
– Costly because processors tend to repeatedly update the same variable
• Temporal locality of accesses
![Page 260: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/260.jpg)
Possible Approaches (III)
• Replicate and invalidate:
– Allows multiple processors to cache variables already cached by other processors
– Whenever a processor updates a cached variable, we invalidate all other cached copies of the variable
• Works well with write-through caches
– Will get the correct value later from RAM
![Page 261: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/261.jpg)
A realization: Snoopy caches
• All caches are linked to the main memory through a shared bus
– All caches observe the writes performed by other caches
• When a cache observes another cache writing to a memory location it also holds, it invalidates the corresponding cache block
![Page 262: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/262.jpg)
An example (I)
[Diagram: two CPUs with private caches share a bus to RAM, which holds x = 2; the left CPU's cache fetches x = 2]
![Page 263: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/263.jpg)
An example (II)
[Diagram: the right CPU's cache also fetches x; both caches and RAM now hold x = 2]
![Page 264: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/264.jpg)
An example (III)
[Diagram: the left CPU resets x to 0 in its cache; the right CPU's cache and RAM still hold x = 2]
![Page 265: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/265.jpg)
An example (IV)
[Diagram: the left cache performs the write-through, so RAM now holds x = 0; the right cache detects the write-through on the bus and invalidates its copy of x]
![Page 266: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/266.jpg)
An example (V)

[Diagram: when the right CPU next accesses x, its cache fetches the correct value x = 0 from RAM; both caches and RAM now hold x = 0]
![Page 267: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/267.jpg)
A last correctness condition
• Caches cannot reorder their memory updates
– The cache-to-RAM write buffer must be FIFO
• First in, first out
![Page 268: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/268.jpg)
Example
• A CPU performs
– x = 0;
– x++; // sets x to 1
• Final value of x in the CPU cache is 1
• If the write buffer reorders the write-through requests, the final value of x in RAM (and in the other caches) will be 0
– Ouch!
![Page 269: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/269.jpg)
Miscellaneous fallacies (I)
• Segmented address spaces
– Address is segment number + offset in segment
– Supposed to let programmers organize their address space into meaningful segments
– Programmers—and compilers—hate them
![Page 270: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/270.jpg)
Miscellaneous fallacies (II)
• Ignoring virtual memory behavior when accessing large two-dimensional arrays
– Must access the array in an order that minimizes the number of page faults
– Done by all good mathematical software libraries
![Page 271: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/271.jpg)
Miscellaneous fallacies (III)
• Believing that you can virtualize any CPU architecture – Some are much more difficult than others
![Page 272: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu](https://reader034.vdocument.in/reader034/viewer/2022051315/56649e405503460f94b31312/html5/thumbnails/272.jpg)
Concluding remarks
• As before, we have seen how human ingenuity has worked around hardware limitations
– Cannot increase CPU clock rates much above 3 to 4 GHz
• Pipelining, multicore architectures
– RAM is slower than the CPU
• Caches
– Hard disks are much slower than RAM
• Virtual memory, I/O buffering