multicore architechture-2

58
Copyright © 2009 by W. Feng. Based on material from Matthew Sottile. Multicore Architectures Week 1, Lecture 2

Upload: sheik-mohideen

Post on 14-Apr-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 1/58

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Multicore Architectures 

Week 1, Lecture 2

Page 2: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 2/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Multicore Landscape•  Intel

 –  Dual and quad-core Pentium family.

 –  80-core demonstration last year.

•   AMD –  Dual, triple (?!), and quad-core Opteron family.

•  IBM –  Dual and quad-core Power family

 – 

Cell Broadband Engine•  Sun

 –  (Multithreaded) Niagara: 8 cores (64 threads).

Page 3: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 3/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Multicore Classes•  Homogenous Multicore

 –  Replication of the same processor type on the die as ashared memory multiprocessor.

 –  Examples: AMD and Intel dual- and quad-core processors•  Heterogeneous/Hybrid Multicore

 –  Different processor types on a die

 –  Example: IBM Cell Broadband Engine

•  One standard core (PowerPC) + 8 specialiced cores (SPEs)

 –  Rumors: Intel and AMD to follow suit.•  Combining GPU-like processors with standard multicore cores

Page 4: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 4/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Homogeneous Multicore•  The major mainstream chips are all of this form

 –  Examples: Intel, IBM, Sun, AMD, and so on.

•  Take a single “core”, stamp it out lots of times on a die.

•  This will work for now, but it likely will not scale to highcore counts. Why?

Page 5: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 5/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

 AMD Dual-, Triple-, and Quad-Core

Page 6: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 6/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

 AMD Triple-Core?•   Actually quad-core processors with one core disabled.

•  Manufacturing defects that kill one core but leave therest functional would be thrown out as failed quad core.

 –  Why not resell them as triple core?

•  Not a new idea …

 –  Lower clock-rate versions of processors are identical to their higher clock-rate siblings but were part of a batch that failed tomeet certain manufacturing tolerances but were otherwise fine.

Page 7: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 7/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

 AMD Rev. H Quad-Core

Page 8: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 8/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

IBM: Power6

Page 9: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 9/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Sun Niagara8 cores, 8 threadsper core = 64threads

Page 10: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 10/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Intel Nehalem

Page 11: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 11/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Dies

•  Observations –  Core replication obvious.

 – 

Big chunks of real estate dedicated to caches. –  Increasing real estate dedicated to memory and bus logic.

Why?

 –  In the past, SMPs were usually 2 or 4 CPUs, so this logicwas on other chips on the motherboard. Even with big fatSMPs, the logic was off on a different board (often this was

what the $$$ went for in the expensive SMPs.)

Page 12: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 12/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Wait a Minute! There’s More!•  Itanium processors are

based on the concept of VLIW (Very Long

Instruction Word). –  Each instruction is a 128-

bit word composed of four 32-bit instructions to beexecuted concurrently.

 –  Difficulty of identifyingparallelism is left to thecompiler instead of hardware or programmer.

 –  VLIW dead with “Itanic”?

Page 13: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 13/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

VLIW Lives!•  VLIW not dead.

•  Intel Terascale demoprocessor 

 –  80 VLIW cores insteadof 80 superscalar coressimilar to those found inthe Core2 architecturemore commonly foundtoday.

Page 14: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 14/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Multicore: Present•  Core count is increasing.

•  Operating systems schedule processes out to thevarious cores in the same way they always have on

traditional multiprocessor systems.•   Applications get increased performance for free by

reducing the # processes per core yielding adecrease in contention for processor resources.

Page 15: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 15/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Resource Contention•   Assume each process or thread has its own CPU.

 –  Example: The little clock in the corner of the screen has itsown CPU.

•  Result: Massive contention for resources whensimultaneous threads all “fight” for off-chip resources.

•  Process preemption when doing something slow,e.g., going to memory. Other useful work can occur.

• 

If a processor exists for each process, everyonecould proceed at the same time yet will still befighting for a bus that is not growing at the same rateas the core count. Same with caches.

Page 16: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 16/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Blast Back to the Past … 70s and 80s

•  Network topology will matter again!

 –  Mesh-based algorithms, hypercubes,tori, etc.

 –  80-core Intel Terascale demoprocessor had a huge componentrelated to interconnection topology.

 – 

Systolic algorithms are making acomeback!

Page 17: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 17/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Reminder: Reading Assignment•  Brian Hayes, “Computing in a Parallel Universe,”

 American Scientist , November-December 2007.http://www.americanscientist.org/issues/pub/

computing-in-a-parallel-universe •  Herb Sutter, “The Free Lunch Is Over: A

Fundamental Turn Toward Concurrency in Software,”Dr. Dobb's Journal , 30(3), March 2005.http://gotw.ca/publications/concurrency-ddj.htm 

Page 18: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 18/58

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Why Memory Matters

Yes for “traditional multicore” …

No for “emerging multicore” …

Page 19: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 19/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Memory Architecture in Multicore•   As you saw in one of the readings …

 –  The cache is still a key performance feature.

•  … but interesting ideas exist that will turn the usual memory

architecture upside down.•  … hence, we look at traditional memory architecture so we can

compare and contrast it with emerging memory architecture of the Cell, GPGPU, and so on.

Page 20: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 20/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: The Multicore Feature of Interest•  Introduced in the 1960s as a way to overcome the

“memory wall”. –  Memory speeds did not advance as fast as the speed of 

functional units. –  Consequence: Processor outruns memory, leading to

decreased utilization.

•  Utilization = (ttotal – tidle) / ttotal 

•  What happens when you go out to main memory?

Idle time. –  Decreased utilization = less work per unit time.

 –  This costs money because time=$$ … so time doing nothingis $$$ wasted.

Page 21: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 21/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Idle Time•  Caches were not the sole fix for idle time.

 –  Preemptive multitasking and time sharing were actually thedominant methods in the early days.

 –  But every program inevitably has to go out to memory, andyou do not always have enough jobs to swap in while otherswait for memory.

 –  Plus, do you really want to be preempted every time you askfor data? Of course not!

• 

So, caches important (for now :-) …

Page 22: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 22/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: Quick Overview•  Traditional Single-Level Memory

•  Multiple Memory Levels

Page 23: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 23/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•   Access a location in memory

•  Copy the location and its neighbors into the faster memories closer to the CPU.

Page 24: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 24/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•   Access a location in memory

•  Copy the location and its neighbors into the faster memories closer to the CPU.

Page 25: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 25/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•   Access a location in memory

•  Copy the location and its neighbors into the faster memories closer to the CPU.

Page 26: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 26/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•   Access a location in memory

•  Copy the location and its neighbors into the faster memories closer to the CPU.

Page 27: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 27/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•   Access a location in memory

•  Copy the location and its neighbors into the faster memories closer to the CPU.

Page 28: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 28/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•   Access a location in memory

•  Copy the location and its neighbors into the faster memories closer to the CPU.

Page 29: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 29/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•  Next time you access memory, if you already pulled

the address into one of the cache levels in a previousaccess, the value is provided from the cache.

Page 30: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 30/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•  Next time you access memory, if you already pulled

the address into one of the cache levels in a previousaccess, the value is provided from the cache.

Page 31: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 31/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•  Next time you access memory, if you already pulled

the address into one of the cache levels in a previousaccess, the value is provided from the cache.

Page 32: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 32/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•  Next time you access memory, if you already pulled

the address into one of the cache levels in a previousaccess, the value is provided from the cache.

Page 33: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 33/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: In Action•  Next time you access memory, if you already pulled

the address into one of the cache levels in a previousaccess, the value is provided from the cache.

Page 34: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 34/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Why Do Caches Work? Locality•  Spatial and Temporal

 –  Locations near each other in space (address) are highly likelyto be accessed near each other in time.

•  Cost? –   A high cost for one access but amortize this out with faster 

accesses afterwards.

•  Burden?

 –  The machine makes sure that memory is kept consistent. If a

part of cache must be reused, the cache system writes thedata back to main memory before overwriting it with new data.

 –  Hardware cache design deals with managing mappingsbetween the different levels and deciding when to write backdown the hierarchy.

Page 35: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 35/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: Memory Consistency?•  What happens when you modify something in

memory?

Page 36: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 36/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: Memory Consistency?•  What happens when you modify something in

memory?

Page 37: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 37/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: Memory Consistency?•  What happens when you modify something in

memory?

•  Writes to memory become cheap. Only go to slow

memories when needed. Called write-back memory.

Page 38: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 38/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caches: Memory Consistency?•  Eventually written values must make it back to the

main store.

•  When?

 –  Typically, when a cache block is replaced due to a cachemiss, where new data must take the place of old.

•  The programmer does NOT see this.

 –  Hardware takes care of all this … but things can go wrongvery quickly when you modify this model.

•  Forecast for emergent chip multiprocessors: Cell, GPGPU,Terascale, and so on.

Page 39: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 39/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Common Memory Models•  Shared Memory Architecture

•  Distributed Memory Architecture

•  Which is Intel? Which is AMD? Others?

Page 40: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 40/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Memory Models: Shared Memory•  Before

 –  Only one processor has access to modify memory.

•  How do we avoid problems when multiple cache

hierarchies see the same memory?

Page 41: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 41/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caching Issues•   Assume two processors load

locations that are neighbors,so data is replicated in the local processor caches.

•  Now, let one processor modify a value.•  The memory view is now inconsistent. One

processor sees one version of memory, the other sees a different version.

• 

How do we resolve this in hardware such that theadvantages of caches are still seen by applicationdevelopers in terms of performance while ensuring aconsistent (or, coherent) view of memory?

Page 42: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 42/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caching Issues

Page 43: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 43/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caching Issues

Page 44: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 44/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caching Issues

Page 45: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 45/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caching Issues

Page 46: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 46/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Caching: Memory Consistency?•  Easy to see “memory consistency” problem if we

restrict each cache hierarchy to being isolated fromthe others, only sharing main memory .

•  Key insight –  Make this inconsistency “go away” by making the caches

aware of each other.

Page 47: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 47/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

What is Memory Coherence?•  Definition (Courtesy: “Parallel Computer Architecture” by Culler and Singh)

1.  Operations issued by any particular process occur in theorder in which they were issued to the memory system bythat process.

2.  The value returned by each read operation is the valuewritten by the last write to that location in the serial order.

•   Assumption: The above requires a hypothetical ordering for 

all read/write operations by all processes into a total order that is

consistent with the results of the overall execution.

•  Sequential Consistency (SC)

 –  The memory coherence hardware assists in enforcing SC.

Page 48: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 48/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Implicit Properties of Coherence•  The key to solving the cache coherence problem is

the hardware implementation of a cache coherence

 protocol.

•   A cache coherence protocol takes advantage of twohardware features

1.  State annotations for each cache block (often just a couplebits per block).

2.  Exclusive access to the bus by any accessing process.

Page 49: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 49/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Bus Properties•   All processors on the bus see the same activity.

•  So, every cache controller sees bus activity in thesame order.

•  Serialization at the bus level results from the phasesthat compose a bus transaction: –  Bus arbitration: The bus arbiter grants exclusive access to

issue commands onto the bus.

 –  Command/address: The operation to perform (“Read”, “Write”),

and the address. –  Data: The data is then transferred.

Page 50: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 50/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Granularity•  Cache coherence applies at the block level.

•  Recall that when you access a location in memory,that location and its neighbors are pulled into the

cache(s). These are blocks.

Note: To simplify the discussion, we will only consider asingle level of cache. The same ideas translate to

deeper cache hierarchies. 

Page 51: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 51/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Cache Coherency via the Bus•  Key Idea: Bus Snooping

 –   All CPUs on the bus can see activity on the bus regardlessof if they initiated it.

Page 52: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 52/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Cache Coherency via the Bus•  Key Idea: Bus Snooping

 –   All CPUs on the bus can see activity on the bus regardlessof if they initiated it.

Page 53: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 53/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Cache Coherency via the Bus•  Key Idea: Bus Snooping

 –   All CPUs on the bus can see activity on the bus regardlessof if they initiated it.

Page 54: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 54/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Cache Coherency via the Bus•  Key Idea: Bus Snooping

 –   All CPUs on the bus can see activity on the bus regardlessof if they initiated it.

Page 55: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 55/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Invalidation vs. Update•   A cache controller snoops and sees a write to a

location that it has a now-outdated copy of. –  What does it do?

•  Invalidation –  Mark cache block as invalid , so when CPU accesses it again,

a miss will result and the updated data from main memory willbe loaded. Requires one bit per block to implement. 

•  Update

 –  See the write and update the caches with the value observedbeing written to main memory.

Page 56: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 56/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Write-Back vs. Write-Through Caches•  Write-Back

 –  On a write miss, the CPU reads the entire block frommemory where the write address is, updates the value incache, and marks the block as modified (aka dirty).

•  Write-Through –  When the processor writes, even to a block in cache, a bus

write is generated.

•  Write-back is more efficient with respect to bandwidth

usage on the bus, and hence, ubiquitously adopted.

Page 57: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 57/58

Multicore Landscape

Copyright © 2009 by W. Feng. Based on material from Matthew Sottile.

Cache Coherence and Performance•  Unlike details with pipelining that only concern

compiler writers, you the programmer really need toacknowledge that this is going on under the covers.

•  The coherence protocol can impact your 

 performance.

Page 58: multicore architechture-2

7/29/2019 multicore architechture-2

http://slidepdf.com/reader/full/multicore-architechture-2 58/58

Multicore Landscape

C i ht © 2009 b W F B d t i l f M tth S ttil

Cache Coherence: Performance Demo