Multi-core Architecture
Instructor: Nguyễn Tiến Dũng
Students: Ngô Quang Thìn, Nguyễn Trung Thành, Trần Hoàng Điệp
Class: KSTN-ĐTVT-K52
Contents
1. Background
2. Multi-core basics
3. Multi-core challenges
1. Background
1.1 Brief history of microprocessors
1.2 Moore’s Law
1.3 Past efforts to increase efficiency
1.4 The need for multi-core
1.1 Brief history of microprocessors
• Early 1970s: 4-bit 4004 (Intel)
• 8-bit: 8008 & 8080 (Intel), 6800 (Motorola)
• 16-bit: 8086 & 8088 (Intel), 68000 (Motorola)
• 32-bit: 80386 (Intel)
• Pentium, … (Intel)
1.2 Moore’s Law
1.3 Past efforts to increase efficiency
• Microprocessor frequency was treated as synonymous with performance
→ designers kept increasing the processor frequency
Pentium 4: 1.3 – 3.8 GHz over 8 years
• Multiple instruction
1.4 The need for multi-core
• A dual-core processor can deliver up to twice the performance of the fastest single-core processor while dissipating less heat.
• 2005, IEEE review: “power consumption increase by 60% with every 400MHz rise in clock speed…But the dual-core approach means you can get a significant boost in performance without the need to run at ruinous clock rates”
• Some experts believe that “by 2017 embedded processors could sport 4096 cores, server CPUs might have 512 cores and desktop chips could use 128 cores” → Astounding numbers!
2. Multi-core basics
Basic configuration of a microprocessor:
• Level 1 (L1) cache: stores data frequently used by the processor
• L2 cache: larger than L1
• Main memory: very large, slower than cache
• Communication method: a single communication bus (shared memory model) or an interconnection network (distributed memory model)
[Figure: the single core]
Multi-core architecture:
[Figure: Core 1, Core 2, Core 3, Core 4]
[Figure: cores 1 to 4, with thread 1 to thread 4 running one per core]
The cores run in parallel.
[Figure: comparison of a single-core and a multi-core (8 cores) processor used by the Packaging Research Center at Georgia Tech]
3. Multi-core Challenges
• Power and temperature management
• Memory/cache
• Programming for multi-core
3.1 Power and Temperature
If 2 cores are simply placed on a single chip:
→ the chip can consume twice as much power and generate a large amount of heat
→ in the extreme case, the computer may even combust
The chip is therefore architected so that the number of hot spots does not grow too large and the heat is spread out across the chip.
Most of the heat in the CELL processor is dissipated in the Power Processing Element, and the rest is spread across the Synergistic Processing Elements.
Temperature in the chip is monitored by one linear sensor and ten internal digital sensors.
3.2 Cache Coherence
Since the cores have private caches, how do we keep the data consistent across caches?
→ Each core should perceive the memory as a monolithic array, shared by all the cores
Core 1 reads x, then Core 2 reads x
[Figure: multi-core chip with Cores 1 to 4, each with one or more levels of cache, above main memory. State: Core 1 cache x=1, Core 2 cache x=1, main memory x=1]
Core 1 writes to x, setting it to 2
[Figure: multi-core chip, assuming write-through caches. State: Core 1 cache x=2, Core 2 cache x=1, main memory x=2]
Core 2 attempts to read x… gets a stale copy
[Figure: multi-core chip. State: Core 1 cache x=2, Core 2 cache x=1 (stale), main memory x=2]
Solutions for cache coherence:
• This is a general problem with multiprocessors, not limited just to multi-core
• Many solutions exist: algorithms, coherence protocols, etc.
• A simple solution: an invalidation-based protocol with snooping
Inter-core bus
[Figure: multi-core chip with Cores 1 to 4, each with one or more levels of cache, connected by an inter-core bus, above main memory]
Revisited: Cores 1 and 2 have both read x
[Figure: multi-core chip. State: Core 1 cache x=1, Core 2 cache x=1, main memory x=1]
Core 1 writes to x, setting it to 2
[Figure: multi-core chip, assuming write-through caches. Core 1 sends an invalidation request over the inter-core bus, and Core 2's copy (x=1) is marked INVALIDATED. State: Core 1 cache x=2, main memory x=2]
After invalidation:
[Figure: multi-core chip. State: Core 1 cache x=2, Core 2 cache holds no copy of x, main memory x=2]
Core 2 reads x. Cache misses, and loads the new copy.
[Figure: multi-core chip. State: Core 1 cache x=2, Core 2 cache x=2, main memory x=2]
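The invalidation walk-through above can be sketched as a small C simulation. This is only a minimal sketch under the slides' assumptions (write-through caches, a single shared variable x, snooping modeled as a loop over the other caches). The names `Chip`, `CacheLine`, `chip_init`, `core_read` and `core_write` are hypothetical and do not correspond to any real hardware interface. Core 1 in the slides is index 0 here.

```c
#include <stdbool.h>

#define NUM_CORES 4

/* One cache line per core, holding that core's private copy of x. */
typedef struct {
    bool valid;   /* false: this core has no usable copy of x */
    int value;
} CacheLine;

typedef struct {
    int memory_x;                /* x in main memory */
    CacheLine cache[NUM_CORES];  /* private caches, one per core */
} Chip;

/* Start with x in memory only; all cache lines are zeroed, so invalid. */
Chip chip_init(int initial_x) {
    Chip chip = { .memory_x = initial_x };
    return chip;
}

/* Read x on one core: hit if the copy is valid, otherwise load from memory. */
int core_read(Chip *chip, int core) {
    CacheLine *line = &chip->cache[core];
    if (!line->valid) {          /* cache miss */
        line->value = chip->memory_x;
        line->valid = true;
    }
    return line->value;
}

/* Write x on one core: write through to memory, invalidate all other copies. */
void core_write(Chip *chip, int core, int value) {
    chip->cache[core].value = value;
    chip->cache[core].valid = true;
    chip->memory_x = value;      /* write-through assumption */
    for (int other = 0; other < NUM_CORES; other++) {
        if (other != core)
            chip->cache[other].valid = false;  /* snooped invalidation */
    }
}
```

Replaying the slides with this sketch: after `chip_init(1)`, Cores 1 and 2 both read 1; `core_write(&chip, 0, 2)` invalidates Core 2's copy, so the next `core_read(&chip, 1)` misses and loads the new value 2 instead of the stale 1.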
Update protocol
Core 1 writes x=2:
[Figure: multi-core chip, assuming write-through caches. Core 1 broadcasts the updated value over the inter-core bus, and Core 2's copy is UPDATED. State: Core 1 cache x=2, Core 2 cache x=2, main memory x=2]
Which do you think is better: invalidation or update?
• Multiple writes to the same location
  – invalidation: a request is sent only the first time
  – update: must broadcast each write (which includes the new variable value)
• Invalidation generally performs better: it generates less bus traffic
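The bus-traffic argument can be made concrete with a deliberately simplified count. The sketch below assumes Core 1 writes x several times in a row while Core 2 starts with a copy and never reads in between; it counts only coherence broadcasts and ignores the write-through traffic to main memory. `Policy` and `count_broadcasts` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

typedef enum { INVALIDATE, UPDATE } Policy;

/* Count coherence broadcasts on the inter-core bus for `writes` consecutive
   writes by Core 1, with Core 2 initially holding a copy of x. */
int count_broadcasts(Policy policy, int writes) {
    bool core2_has_copy = true;
    int broadcasts = 0;
    for (int i = 0; i < writes; i++) {
        if (policy == UPDATE) {
            broadcasts++;            /* every write carries the new value */
        } else if (core2_has_copy) {
            broadcasts++;            /* first write: invalidation request */
            core2_has_copy = false;  /* later writes find no other copy */
        }
    }
    return broadcasts;
}
```

Under these assumptions, five writes cost one broadcast with invalidation and five with update. The gap narrows if Core 2 re-reads x between writes, since each re-read restores its copy and the next write must invalidate it again.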
3.3 Programming for multi-core
• Programmers must use threads or processes
• Spread the workload across multiple cores
• Write parallel algorithms
• The OS will map threads/processes to cores
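As one illustration of spreading a workload across cores, here is a minimal sketch using POSIX threads: an array sum split into four slices, one per thread. The choice of pthreads, the thread count of 4, and the names `SumTask`, `sum_range` and `parallel_sum` are assumptions made for this example, not something the slides prescribe. Which core each thread runs on is left to the OS.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_THREADS 4

typedef struct {
    const int *data;
    size_t begin, end;   /* half-open slice [begin, end) */
    long partial;        /* sum of this slice, written by one thread only */
} SumTask;

/* Thread body: sum one slice into the task's private partial result. */
static void *sum_range(void *arg) {
    SumTask *task = arg;
    task->partial = 0;
    for (size_t i = task->begin; i < task->end; i++)
        task->partial += task->data[i];
    return NULL;
}

/* Split data[0..n) into NUM_THREADS slices and sum them in parallel. */
long parallel_sum(const int *data, size_t n) {
    pthread_t threads[NUM_THREADS];
    bool started[NUM_THREADS];
    SumTask tasks[NUM_THREADS];

    for (int t = 0; t < NUM_THREADS; t++) {
        tasks[t].data = data;
        tasks[t].begin = n * t / NUM_THREADS;
        tasks[t].end = n * (t + 1) / NUM_THREADS;
        started[t] = pthread_create(&threads[t], NULL, sum_range, &tasks[t]) == 0;
        if (!started[t])
            sum_range(&tasks[t]);   /* fall back to running the slice inline */
    }

    long total = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        if (started[t])
            pthread_join(threads[t], NULL);
        total += tasks[t].partial;  /* safe: the thread has finished */
    }
    return total;
}
```

Each thread writes only its own `partial` field, so no locking is needed here; the results are combined only after `pthread_join`.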
Thread safety is very important
Pre-emptive context switching: a context switch can happen AT ANY TIME
→ Need to use synchronization
Example:
int counter = 0;

void thread1() {
    int temp1 = counter;
    counter = temp1 + 1;
}

void thread2() {
    int temp2 = counter;
    counter = temp2 + 1;
}

Possible interleavings:
    temp1 = counter; counter = temp1 + 1; temp2 = counter; counter = temp2 + 1;
    gives counter=2
    temp1 = counter; temp2 = counter; counter = temp1 + 1; counter = temp2 + 1;
    gives counter=1
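One common way to rule out the second interleaving is to protect the read-modify-write with a mutex. The sketch below assumes POSIX threads; `counter_lock`, `increment_many`, `run_two_threads` and the iteration count are hypothetical names and values chosen for illustration.

```c
#include <pthread.h>

#define INCREMENTS 100000

static int counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: the same read-then-write as in the example, but the lock
   makes each pair of steps indivisible with respect to the other thread. */
static void *increment_many(void *unused) {
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        int temp = counter;
        counter = temp + 1;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run two incrementing threads; returns the final counter, or -1 if a
   thread could not be created. */
int run_two_threads(void) {
    pthread_t t1, t2;
    counter = 0;
    if (pthread_create(&t1, NULL, increment_many, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, increment_many, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

With the lock held around both steps, no update is lost, so the final value is `2 * INCREMENTS`. Without the lock, the result can come out lower whenever the threads interleave as in the second case above.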
→ Assigning threads to the cores
• Each thread/process has an affinity mask
• The affinity mask specifies which cores the thread is allowed to run on
• Different threads can have different masks
Example: 4-way multi-core, without SMT
Affinity mask:    1       1       0       1
                core 3  core 2  core 1  core 0
The process/thread is allowed to run on cores 0, 2 and 3, but not on core 1.
Conclusion
• Multi-core chips are an important new trend in computer architecture
• Several new multi-core chips are in the design phase
• Parallel programming techniques are likely to gain importance