high performance embedded systems mpsocs
![Page 1: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/1.jpg)
High Performance Embedded Systems
July 2020
Electronics Engineering Department
Electronics Master Program
MPSoCs
![Page 2: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/2.jpg)
Outline
2
• Multiprocessors Architecture and Taxonomy
• Parallel Execution Mechanism
• Multiprocessors Design Techniques
• Memory Systems
• Processors Symmetry
• Co-processing
![Page 3: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/3.jpg)
3
Multiprocessors Architecture and Taxonomy
Taken from: https://arstechnica.com/gadgets/2020/05/intels-comet-lake-desktop-cpus-are-here/
Intel 4004 Core i9??
![Page 5: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/5.jpg)
5
Multiprocessors Architecture and Taxonomy
Taken from: https://arstechnica.com/gadgets/2020/05/intels-comet-lake-desktop-cpus-are-here/
Exynos 7420 finFET transistors
![Page 7: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/7.jpg)
7
Multiprocessors Architecture and Taxonomy
Taken from: https://www.researchgate.net/publication/257711815_Where_Photovoltaics_Meets_Microelectronics/figures?lo=1
![Page 8: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/8.jpg)
8
Multiprocessors Architecture and Taxonomy
Taken from: https://www.semiconductor-digest.com/2020/03/10/transistor-count-trends-continue-to-track-with-moores-law/
![Page 9: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/9.jpg)
9
Multiprocessors Architecture and Taxonomy
Taken from: https://www.elprocus.com/difference-between-soc-system-on-chip-single-board-computer/
SoC
![Page 10: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/10.jpg)
10
Multiprocessors Architecture and Taxonomy
Taken from: http://soc.inha.ac.kr/index.php/Project
2-Parallel Radix-
2^4 FFT/IFFT
Processor Chip for
MB-OFDM UWB
communications
![Page 11: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/11.jpg)
11
Multiprocessors Architecture and Taxonomy
Taken from: PrSoC: Programmable System-on-chip (SoC) for silicon prototyping IEEE 2008
![Page 12: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/12.jpg)
12
Multiprocessors Architecture and Taxonomy
Taken from: https://www.elprocus.com/difference-between-soc-system-on-chip-single-board-computer/
SoC
MPSoC
![Page 13: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/13.jpg)
13
Multiprocessors Architecture and Taxonomy
Taken from: https://commons.wikimedia.org/wiki/File:ARM-Cortex-A9.gif
MPSoCs?
![Page 14: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/14.jpg)
14
Multiprocessors Architecture and Taxonomy
SoC
Taken from: W. Wolf Multiprocessor Systems-On-Chip
• Is an integrated circuit that implements
most or all of the functions of a
complete electronic system.
• The most fundamental characteristic of
an SoC is complexity.
![Page 15: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/15.jpg)
15
Multiprocessors Architecture and Taxonomy
SoC
Taken from: W. Wolf Multiprocessor Systems-On-Chip
Many product categories:
• Cell phones.
• Telecommunications and networking.
• Digital television.
• Video games.
• …
![Page 16: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/16.jpg)
16
Multiprocessors Architecture and Taxonomy
SoC Example
Taken from: W. Wolf Multiprocessor Systems-On-Chip
Processing Elements
![Page 17: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/17.jpg)
17
Multiprocessors Architecture and Taxonomy
SoC Example
Taken from: W. Wolf Multiprocessor Systems-On-Chip
Memory
![Page 18: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/18.jpg)
18
Multiprocessors Architecture and Taxonomy
SoC Example
Taken from: W. Wolf Multiprocessor Systems-On-Chip
Communications
![Page 19: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/19.jpg)
19
Multiprocessors Architecture and Taxonomy
SoC Example
Taken from: W. Wolf Multiprocessor Systems-On-Chip
MPSoCs?
![Page 20: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/20.jpg)
20
Multiprocessors Architecture and Taxonomy
MPSoCs?
Wait!
What is a Parallel Architecture?
![Page 21: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/21.jpg)
21
Multiprocessors Architecture and Taxonomy
Parallel Architecture
“A large collection of processing elements that communicate and cooperate to
solve large problems fast”. - Almasi.
Taken from: M. Aguilar MPSoCs
![Page 23: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/23.jpg)
23
Multiprocessors Architecture and Taxonomy
Parallel Architecture
“A large collection of processing elements that communicate and cooperate to
solve large problems fast”. - Almasi.
Taken from: M. Aguilar MPSoCs
SoC
HW+SW
![Page 24: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/24.jpg)
24
Multiprocessors Architecture and Taxonomy
Parallel Architecture
“A large collection of processing elements that communicate and cooperate to
solve large problems fast”. - Almasi.
Taken from: M. Aguilar MPSoCs
SoC
HW+SW
Technology advanced
![Page 26: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/26.jpg)
26
Multiprocessors Architecture and Taxonomy
Parallel Architecture
“A large collection of processing elements that communicate and cooperate to
solve large problems fast”. - Almasi.
Taken from: M. Aguilar MPSoCs
SoC
HW+SW
MPSoCs
Technology advanced
![Page 27: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/27.jpg)
27
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Serial Communication
Parallel Communication
![Page 28: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/28.jpg)
28
Multiprocessors Architecture and Taxonomy
Here we go
What are MPSoCs?
Taken from: W. Wolf Multiprocessor Systems-On-Chip
![Page 29: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/29.jpg)
29
Multiprocessors Architecture and Taxonomy
What are MPSoCs?
“Are the latest incarnation of very-large-scale integration (VLSI)
technology”
Taken from: W. Wolf Multiprocessor Systems-On-Chip
???
![Page 30: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/30.jpg)
30
Multiprocessors Architecture and Taxonomy
What are MPSoCs?
“Are the latest incarnation of very-large-scale integration (VLSI)
technology”
Taken from: W. Wolf Multiprocessor Systems-On-Chip
???
• Silicon
• Power
• Area
• …
![Page 31: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/31.jpg)
31
Multiprocessors Architecture and Taxonomy
What are MPSoCs?
“Are the latest incarnation of very-large-scale integration (VLSI)
technology”
“A single integrated circuit can contain over
100 million transistors, and the International Technology Roadmap
for Semiconductors predicts that chips with a billion transistors are
within reach”
Taken from: W. Wolf Multiprocessor Systems-On-Chip
![Page 32: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/32.jpg)
32
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
MPSoCs
“The multiprocessor System-on-Chip (MPSoC) is a system-on-a-chip
(SoC) which uses multiple processors (see multi-core), usually
targeted for embedded applications”.
SoC
HW+SW
MPSoCs Understood!!
![Page 33: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/33.jpg)
33
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
MPSoCs
“The multiprocessor system-on-chip (MPSoC) uses multiple CPUs
along with other hardware subsystems to implement a system”. -
Wayne Wolf.
Multiprocessor = Multicore?
![Page 34: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/34.jpg)
34
Multiprocessors Architecture and Taxonomy
General Structure of MPSoCs
Processing Elements (PEs)
• Chosen according to the application context and requirements.
• Homogeneous MPSoCs.
• Heterogeneous MPSoCs.
Interconnection Element
• Buses.
• NoCs (Networks on Chip). More information here.
Taken from: M. Aguilar MPSoCs
![Page 35: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/35.jpg)
35
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Advantages of MPSoCs
Performance
• Powerful platform (multiple cores).
• Multiple users.
• Multiple applications.
• Multiple tasks within the same application.
Power Consumption
• Lower power through the parallel approach.
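The power claim can be made concrete with the standard CMOS dynamic-power relation, where α is the activity factor, C the switched capacitance, V_dd the supply voltage, and f the clock frequency. The following is a rough sketch only: it assumes a workload that splits perfectly across two cores, and the 0.7 voltage scaling factor is a hypothetical value chosen for illustration, not a figure from the slides.

```latex
P_{\text{dyn}} = \alpha \, C \, V_{dd}^{2} \, f

% One core at full voltage and frequency:
P_{\text{1 core}} = \alpha \, C \, V_{dd}^{2} \, f

% Two cores, each at half the frequency and (assumed) 0.7 V_dd:
P_{\text{2 cores}} = 2 \cdot \alpha \, C \, (0.7\,V_{dd})^{2} \, \frac{f}{2}
                   = 0.49 \, \alpha \, C \, V_{dd}^{2} \, f
```

Under these assumptions the two-core design delivers the same throughput at roughly half the dynamic power, because halving the frequency allows a lower supply voltage and power depends on the voltage squared.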
![Page 36: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/36.jpg)
36
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
![Page 37: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/37.jpg)
37
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
MPSoCs Benefits
• Wireless.
• Multimedia: video and audio.
• Health.
• Military.
• Avionics.
• Aerospace.
![Page 38: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/38.jpg)
38
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Multiprocessor = Multicore?
![Page 39: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/39.jpg)
39
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Multiprocessor
• Platform with several CPUs.
• Uses a parallel approach.
Multicore
• Platform with a single CPU chip.
• Multiple cores inside that CPU.
![Page 40: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/40.jpg)
40
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
MPSoCs Software
![Page 41: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/41.jpg)
41
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Parallel Approaches
Parallel
Approaches
![Page 42: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/42.jpg)
42
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Parallel Approaches
Parallel
Approaches
Bits
Threads
Tasks
Instructions
Data
![Page 43: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/43.jpg)
43
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
MPSoCs Architecture?
![Page 44: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/44.jpg)
44
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
MPSoCs
PEs
![Page 45: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/45.jpg)
45
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
MPSoCs
Homogeneous Heterogeneous
PEs
![Page 46: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/46.jpg)
46
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Heterogeneous MPSoCs
• Different PEs, for example:
• GPU (Graphics Processing Unit).
• DSPs.
• HW acceleration.
• NoC infrastructure.
• Better performance and power consumption.
• Used in embedded systems.
• Portable systems.
• Power consumption.
![Page 47: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/47.jpg)
47
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Homogeneous MPSoCs
• Identical PEs make up the SoC.
• The same PE is instantiated several times.
• The instances are connected by a communication
infrastructure.
• Flexibility and scalability.
• Worse power consumption.
![Page 48: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/48.jpg)
48
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
MPSoCs Taxonomy?
![Page 49: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/49.jpg)
49
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Processor Organization
• Serial
  • SISD
    • Uniprocessor
      • Multi ALU
      • Overlapped operations
• Parallel
  • SIMD
    • Vector processor
    • Array processor
  • MISD
  • MIMD
    • Tightly coupled (shared memory)
      • Symmetric multiprocessor (SMP)
      • Nonuniform memory access (NUMA)
    • Loosely coupled (distributed memory)
      • Clusters
![Page 50: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/50.jpg)
50
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Where are MPSoCs located?
![Page 51: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/51.jpg)
51
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Processor Organization
• Serial
  • SISD
    • Uniprocessor
      • Multi ALU
      • Overlapped operations
• Parallel
  • SIMD
    • Vector processor
    • Array processor
  • MISD
  • MIMD
    • Tightly coupled (shared memory)
      • Symmetric multiprocessor (SMP)
      • Nonuniform memory access (NUMA)
    • Loosely coupled (distributed memory)
      • Clusters
MPSoCs
![Page 52: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/52.jpg)
52
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs and Parallel Computing Lectures Notes
MIMD
• This architecture executes
different operations on
different data streams.
• Multiprocessing and
MPSoCs fall into this
category.
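Flynn's taxonomy names each category by how many instruction streams and how many data streams run at once. A minimal sketch of that naming rule follows; the function name `flynn_class` is hypothetical and exists only for illustration.

```python
def flynn_class(instruction_streams, data_streams):
    """Return the Flynn category for the given numbers of concurrent streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"  # Single or Multiple instruction
    data = "S" if data_streams == 1 else "M"          # Single or Multiple data
    return f"{instr}I{data}D"
```

For example, four cores that each run their own instruction stream on their own data give `flynn_class(4, 4) == "MIMD"`, while one instruction stream applied to eight data lanes gives `"SIMD"`.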
![Page 53: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/53.jpg)
53
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
MPSoCs Architecture
PEs
• Homogeneous
• Heterogeneous
Memory Access
• Uniform Memory Access (UMA)
• Non-Uniform Memory Access (NUMA)
Processors Symmetry
• SMP (Symmetric Multi-processing)
• AMP (Asymmetric Multi-processing)
Memory Architecture
• Shared memory
• Distributed memory
![Page 54: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/54.jpg)
54
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
ARM Cortex A9
![Page 55: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/55.jpg)
55
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Analog Devices - Blackfin
![Page 56: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/56.jpg)
56
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
TI Davinci DM355
![Page 57: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/57.jpg)
57
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
TI OMAP5
![Page 58: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/58.jpg)
58
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
STMicroelectronics Nomadik
![Page 59: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/59.jpg)
59
Multiprocessors Architecture and Taxonomy
Taken from: M. Aguilar MPSoCs
Nexperia
![Page 60: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/60.jpg)
60
Multiprocessors Architecture and Taxonomy
Taken from: http://linuxgizmos.com/new-arm-cortex-a72-nearly-twice-as-fast-as-cortex-a57/
Cortex-A72
![Page 61: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/61.jpg)
Outline
61
• Multiprocessors Architecture and Taxonomy
• Parallel Execution Mechanism
• Multiprocessors Design Techniques
• Memory Systems
• Processors Symmetry
• Co-processing
![Page 62: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/62.jpg)
62
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
![Page 63: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/63.jpg)
63
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
Consider the following approaches:
• Shared memory.
• Threads.
• Message Passing.
• Data Parallel.
• Hybrid.
• Others
All these can be implemented on any architecture.
![Page 65: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/65.jpg)
65
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
Shared Memory
• Tasks share a common address space, which they read and write
asynchronously.
• Various mechanisms such as locks/semaphores may be used to control access to
the shared memory.
• Advantage
• No need to explicitly communicate data between tasks, which simplifies programming.
• Disadvantages
• Care is needed when managing memory to avoid synchronization conflicts.
• Harder to control data locality.
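The role of a lock in the shared memory model can be sketched as follows. This is an illustrative example using Python's standard `threading` module; the names `counter`, `add_many`, and `run` are hypothetical, and the thread and iteration counts are arbitrary.

```python
import threading

counter = 0                      # shared state, visible to every thread
counter_lock = threading.Lock()  # guards every update to `counter`

def add_many(n):
    """Increment the shared counter n times, one locked update at a time."""
    global counter
    for _ in range(n):
        with counter_lock:       # only one thread may update at a time
            counter += 1

def run(num_threads=4, increments=10_000):
    """Start the threads, wait for all of them, and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(increments,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

No data is sent between tasks explicitly: every thread simply reads and writes `counter`. The lock is what prevents two threads from interleaving their read-modify-write steps and losing updates.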
![Page 66: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/66.jpg)
66
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
In Hardware
• Shared memory systems use:
• UMA (Uniform Memory Access)
• NUMA (Non-Uniform Memory
Access)
• COMA (Cache-only memory
architecture)
In Software
• Inter-process communication (IPC).
• Virtual memory mapping.
![Page 68: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/68.jpg)
68
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
Threads
• A thread can be considered as a
subroutine in the main program.
• Threads communicate with each other
through the global memory.
• Commonly associated with shared
memory architectures and operating
systems.
• POSIX Threads (pthreads).
• OpenMP.
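The idea of a thread as a subroutine that communicates through global memory can be sketched like this. The example uses Python's `threading` module as a stand-in for pthreads or OpenMP; the names `results`, `worker`, and `run_workers` are hypothetical.

```python
import threading

results = {}  # global memory shared by the main program and all threads

def worker(name, values):
    """Subroutine run by one thread; it reports back through the global dict."""
    results[name] = sum(values)  # each thread writes only its own key

def run_workers(jobs):
    """Run one thread per job and return a copy of what the threads wrote."""
    results.clear()
    threads = [threading.Thread(target=worker, args=(name, vals))
               for name, vals in jobs.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # wait until every subroutine has finished
    return dict(results)
```

Because each thread writes a distinct key, this sketch needs no lock; as soon as two threads update the same entry, synchronization becomes necessary.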
![Page 69: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/69.jpg)
69
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
Threads
Advantages
• Responsiveness.
• Faster execution.
• Lower resource consumption.
• Better system utilization.
• Simplified sharing and communication.
• Parallelization.
Drawbacks
• Synchronization.
• A crash in one thread can bring down the whole process.
![Page 71: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/71.jpg)
71
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
Message Passing
• A set of tasks that use their own local memory
during computation.
• Data exchange through sending and receiving
messages.
• Data transfer usually requires cooperative
operations to be performed by each process.
• For example, a send operation must have a
matching receive operation.
• MPI
• Example here
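The matching send/receive pattern can be sketched with queues. This is only an approximation: real message passing (for example MPI) runs tasks in separate address spaces, whereas here two Python threads stand in for the tasks and, by convention, interact only through `queue.Queue` objects. The names `producer`, `consumer`, and `exchange` are hypothetical.

```python
import queue
import threading

def producer(outbox, items):
    """Send every item as a message, then a sentinel marking the end."""
    for item in items:
        outbox.put(item)         # send
    outbox.put(None)             # sentinel: no more messages

def consumer(inbox, outbox):
    """Receive messages until the sentinel arrives, then send back the total."""
    total = 0                    # local state, never shared directly
    while True:
        msg = inbox.get()        # matching receive for each send
        if msg is None:
            break
        total += msg
    outbox.put(total)

def exchange(items):
    """Wire the two tasks together and return the consumer's reply."""
    to_consumer = queue.Queue()
    to_main = queue.Queue()
    tasks = [threading.Thread(target=producer, args=(to_consumer, items)),
             threading.Thread(target=consumer, args=(to_consumer, to_main))]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return to_main.get()
```

Each `put` has exactly one matching `get`, which mirrors the cooperative send/receive requirement described above.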
![Page 73: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/73.jpg)
73
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
Data Parallel
• Consider the following characteristics:
• Parallel work performs operations on a data set,
organized into a common structure.
• Tasks work collectively on the same data structure,
with each task working on a different partition.
• Tasks perform the same operation on their partition.
• On shared memory architectures, all tasks may have
access to the data structure through global memory.
• On distributed memory architectures, the data structure is
split up and resides as “chunks” in the local memory
of each task.
• More information here.
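The partition-then-apply pattern can be sketched as follows. The example uses Python's `concurrent.futures.ThreadPoolExecutor`; the helper names `partition`, `square_chunk`, and `parallel_square` are hypothetical, and squaring is just a placeholder operation. In CPython, threads illustrate the structure but do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, parts):
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def square_chunk(chunk):
    """The same operation, applied by every task to its own partition."""
    return [x * x for x in chunk]

def parallel_square(data, workers=4):
    """Square every element, one partition per worker, preserving order."""
    chunks = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        squared = list(pool.map(square_chunk, chunks))  # map keeps chunk order
    return [y for chunk in squared for y in chunk]
```

On a distributed memory system the chunks would live in each task's local memory instead of being slices of one shared list, but the same split, apply, and gather steps apply.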
![Page 75: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/75.jpg)
75
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
Hybrid
• Combines several models (for example, OpenMP with MPI).
• Single Program Multiple Data (SPMD)
• A single program is executed by all tasks simultaneously.
• Multiple Program Multiple Data (MPMD)
• Has multiple executables. Each task can execute the same program as the other tasks
or a different one.
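SPMD can be sketched as one function that every task runs, with a task identifier selecting which part of the data it handles. The names `rank` and `size` follow common MPI usage but are assumptions here, as are `program` and `run_spmd`; threads stand in for the tasks.

```python
import threading

def program(rank, size, data, partial):
    """The single program every task runs; `rank` selects its share of the data."""
    partial[rank] = sum(data[rank::size])  # strided slice owned by this rank

def run_spmd(data, size=4):
    """Launch `size` tasks running the same program and combine their results."""
    partial = [0] * size
    tasks = [threading.Thread(target=program, args=(r, size, data, partial))
             for r in range(size)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return sum(partial)
```

An MPMD variant would pass a different target function to some of the tasks instead of reusing `program` everywhere.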
![Page 76: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/76.jpg)
76
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
Consider the following approaches:
• Shared memory.
• Threads.
• Message Passing.
• Data Parallel.
• Hybrid.
• Others. (Depends on the architecture)
All these can be implemented on any architecture.
![Page 77: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/77.jpg)
77
Parallel Execution Mechanism
Taken from: Parallel Computing Lectures Notes
Others
• MCAPI (Multicore Association)
• Poly-Platform
• CUDA
![Page 79: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/79.jpg)
79
Parallel Execution Mechanism
Taken from: https://en.wikipedia.org/wiki/Multicore_Association
MCAPI (Multicore Association)
• The Multicore Association was founded in 2005.
• Its first specification is referred to as MCAPI.
• Based on message passing.
• Targets heterogeneity in systems, toolchains, and programming
languages.
• Active working groups:
• MCAPI.
• Virtualization.
• Open Asymmetric Multiprocessing (OpenAMP).
![Page 81: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/81.jpg)
81
Parallel Execution Mechanism
Taken from: http://polycoresoftware.com/poly-platform
Poly-Platform
• Collection of productivity tools.
• Supports the migration process.
• Mainly targets multicore platforms.
• Support for several SoCs, OSes, and transports.
![Page 83: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/83.jpg)
83
Parallel Execution Mechanism
Taken from: https://en.wikipedia.org/wiki/CUDA
CUDA
• Initial release 2007.
• Parallel computing platform and
application programming interface.
• Created by NVIDIA.
• GPU approach.
• Supported on Windows, Linux, and
macOS.
![Page 84: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/84.jpg)
Outline
84
• Multiprocessors Architecture and Taxonomy
• Parallel Execution Mechanism
• Multiprocessors Design Techniques
• Memory Systems
• Processors Symmetry
• Co-processing
![Page 85: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/85.jpg)
85
Multiprocessors Design Techniques
Taken from: W. Wolf High-Performance Embedded Computing
Embedded Systems Design Flows
• Co-design flows.
• Platform-based design.
• Two-stage process.
• Programming platforms.
• Standards-Based design.
MPSoCs?
![Page 86: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/86.jpg)
86
Multiprocessors Design Techniques
Challenges
• Software development is a major challenge for MPSoC designers.
• Software that runs on the multiprocessor must be high performance, real time,
and low power.
• Each MPSoC requires its own software development environment: compiler,
debugger, simulator, and other tools.
• Better understanding of how to abstract tasks properly to capture the essential
characteristics of their low-level behavior for system-level analysis.
Taken from: W. Wolf Multiprocessor Systems on Chip
![Page 87: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/87.jpg)
87
Multiprocessors Design Techniques
Taken from: W. Wolf Multiprocessor Systems on Chip
Challenges
• Networks-on-chips have emerged over the past few years as an architectural
approach to the design of single-chip multiprocessors.
• FPGAs have emerged as a viable alternative to application-specific integrated
circuits (ASICs) in many markets. FPGA fabrics are also starting to be
integrated into SoCs.
![Page 88: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/88.jpg)
88
Multiprocessors Design Techniques
Taken from: SoC Lectures Notes
Challenges
• Sequential C code is not easy to replace.
• The algorithm specification contains parallel specifications (models of computation
such as KPN, SDF, etc.).
• No new programming languages.
• Automatic parallel programming.
• Platform-based design (SW synthesis) or SW and HW synthesis.
![Page 89: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/89.jpg)
89
Multiprocessors Design Techniques
Taken from: MPSoCs https://slideplayer.com/slide/8773117/
Challenges
All MPSoC designs have the following requirements:
• Speed.
• Power.
• Area.
• Application Performance.
• Time to market.
![Page 90: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/90.jpg)
90
Multiprocessors Design Techniques
Taken from: SoC Lectures Notes
MPSoCs Programming
• Task mapping to multiprocessors or cores.
• Inter-processor communication management.
• Data transfer engine management.
• Shared resource management.
• Memory management
• Debugging.
![Page 91: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/91.jpg)
91
Multiprocessors Design Techniques
Taken from: SoC Lectures Notes
MPSoCs Exploration
• Separate computation from communication.
![Page 92: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/92.jpg)
92
Multiprocessors Design Techniques
Taken from: SoC Lectures Notes
Virtual Processing Unit VPU
• Load simulator: It is a high-level simulation of
the core behavior.
• Functional simulator: Native execution of
tasks, scheduling is given by the VPU OS.
![Page 93: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/93.jpg)
93
Multiprocessors Design Techniques
Taken from: SoC Lectures Notes
Virtual Processing Unit VPU
Allows spatial and temporal modeling of task mapping to processing elements (PEs).
![Page 94: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/94.jpg)
94
Multiprocessors Design Techniques
Taken from: SoC Lectures Notes
Virtual Platform
• It is a software model that allows the exploration of hardware and software.
• It allows hardware platform exploration and optimization.
• Software development, debugging and optimization.
• Concurrent hardware and software design.
![Page 95: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/95.jpg)
95
Multiprocessors Design Techniques
Taken from: SoC Lectures Notes
Virtual Platform
• Requirements:
• High simulation speed.
• Compromise between simulation speed and precision.
• Flexibility.
• Usable by developers who are not hardware experts.
![Page 96: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/96.jpg)
96
Multiprocessors Design Techniques
Design Techniques
• Core-based Strategy.
• Wrappers.
• System-level design flow.
• Platform-based design.
• Component-based design.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 97: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/97.jpg)
97
Multiprocessors Design Techniques
Design Techniques
• Core-based Strategy.
• Wrappers.
• System-level design flow.
• Platform-based design.
• Component-based design.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 98: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/98.jpg)
98
Multiprocessors Design Techniques
Core-based Strategy
• Core-based synthesis strategy for the IBM CoreConnect bus.
• Coral tool automates many of the tasks required to stitch together multiple
cores using virtual components.
• Each virtual component describes the interfaces for a class of real
components.
• Coral can synthesize some combinational logic.
• Coral also checks the connections between cores using Boolean decision
diagrams.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 99: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/99.jpg)
99
Multiprocessors Design Techniques
Core-based Strategy
CoreConnect provides three types of buses:
• A high-speed processor local bus (PLB).
• An on-chip peripheral bus (OPB).
• A device control register (DCR) bus for configuration and status information.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 100: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/100.jpg)
100
Multiprocessors Design Techniques
Taken from: SoC Lectures Notes
Core-based Strategy
![Page 101: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/101.jpg)
101
Multiprocessors Design Techniques
Design Techniques
• Core-based Strategy.
• Wrappers.
• System-level design flow.
• Platform-based design.
• Component-based design.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 102: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/102.jpg)
102
Multiprocessors Design Techniques
Wrappers
• Treats both hardware and software as
components.
• A wrapper is a design unit that interfaces a
module to another module.
• A wrapper can be hardware or software
and may include both.
• The wrapper performs only low-level
adaptations, such as protocol
transformation.
Taken from: W.Wolf High-Performance Embedded Computing
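As a rough illustration of a wrapper doing only a low-level protocol transformation, the sketch below adapts a byte-oriented producer to a module that expects 32-bit words. The class names (`WordModule`, `ByteToWordWrapper`), the little-endian packing, and the 4-byte word size are all assumptions made for this example and do not come from the source.

```python
class WordModule:
    """Hypothetical IP block whose interface accepts whole 32-bit words."""

    def __init__(self):
        self.received = []

    def push_word(self, word):
        self.received.append(word)


class ByteToWordWrapper:
    """Wrapper: adapts a byte-stream protocol to the word protocol.

    It performs no computation of its own; it only repacks the data.
    """

    def __init__(self, module):
        self.module = module
        self._buffer = []

    def push_byte(self, byte):
        self._buffer.append(byte & 0xFF)
        if len(self._buffer) == 4:
            # Assumption of this sketch: bytes arrive least significant first.
            word = int.from_bytes(bytes(self._buffer), "little")
            self.module.push_word(word)
            self._buffer.clear()
```

The wrapped module never sees bytes and the byte producer never sees words, which is the sense in which a wrapper interfaces one module to another.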
![Page 103: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/103.jpg)
103
Multiprocessors Design Techniques
Wrappers
Heterogeneous multiprocessors introduce several types of problems:
• Many chips have multiple communication networks to match the network to
the processing needs. Synchronizing communication across network
boundaries is more difficult than communicating within a network.
• Specialized hardware is often needed to accelerate interprocess
communication and free the CPU for more interesting computations.
• The communication primitives should be at a higher level of abstraction than
shared memory.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 104: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/104.jpg)
104
Multiprocessors Design Techniques
Wrappers
When a dedicated CPU is added to the system, its software must be adapted
in several ways:
1. The software must be updated to support the platform’s communication
primitives.
2. Optimized implementations of the host processor’s communication
functions must be provided for interprocessor communication.
3. Synchronization functions must be provided.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 105: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/105.jpg)
105
Multiprocessors Design Techniques
Design Techniques
• Core-based Strategy.
• Wrappers.
• System-level design flow.
• Platform-based design.
• Component-based design.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 106: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/106.jpg)
106
Multiprocessors Design Techniques
System-Level Design
• An abstract platform is created from a combination of system requirements,
models of the software, and models of the hardware components.
• Abstract platform is analyzed to determine the application’s performance
and power/energy consumption.
• Based on the results of this analysis, software is allocated and scheduled
onto the platform.
• The result is a golden abstract architecture that can be used to build the implementation.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 107: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/107.jpg)
107
Multiprocessors Design Techniques
System-Level Design
Taken from: W.Wolf High-Performance Embedded Computing
![Page 108: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/108.jpg)
108
Multiprocessors Design Techniques
System-Level Design
Major elements of an abstract architecture:
1. Software tasks are described by their data and
scheduling dependencies; they
interface to an API.
2. Hardware components consist of a core and an
interface.
3. The hardware/software integration is modeled by
the communication network that connects the CPUs
that run the software and the hardware IP
cores.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 109: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/109.jpg)
109
Multiprocessors Design Techniques
Design Techniques
• Core-based Strategy.
• Wrappers.
• System-level design flow.
• Platform-based design.
• Component-based design.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 110: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/110.jpg)
110
Multiprocessors Design Techniques
Platform-based Design
• Design space: platform selection
• Platform programming
• Multi-CPUs
• Concurrency
• Real-Time
• Platform developer must be
provided tools (compiler, editors,
debuggers, simulators, etc)
Taken from: Introduction to Embedded Systems
![Page 111: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/111.jpg)
111
Multiprocessors Design Techniques
Platform-based Design
• Start with functional specifications
• Task graphs.
• Nodes: Task to complete
• Edges: Communication and
dependence between tasks
• Execution time on the nodes.
• Data communicated on the edges.
Taken from: MPSoCs https://slideplayer.com/slide/8773117/
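A task graph of this kind can be sketched as two dictionaries, one for node weights (execution time) and one for edge weights (data communicated). The task names and all numbers below are invented for illustration. The helper computes the longest path through the graph, which bounds the execution time from below when communication is ignored.

```python
# Hypothetical task graph: node weights are execution times, edge weights
# are the amount of data communicated between tasks (all values invented).
exec_time = {"read": 2, "filter": 5, "fft": 4, "merge": 3}
edges = {  # (producer, consumer) -> data units sent along the edge
    ("read", "filter"): 8,
    ("read", "fft"): 8,
    ("filter", "merge"): 4,
    ("fft", "merge"): 4,
}


def topological_order(exec_time, edges):
    """Return the tasks in an order that respects every dependence edge."""
    indegree = {task: 0 for task in exec_time}
    for _, consumer in edges:
        indegree[consumer] += 1
    ready = sorted(task for task, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        task = ready.pop(0)
        order.append(task)
        for producer, consumer in edges:
            if producer == task:
                indegree[consumer] -= 1
                if indegree[consumer] == 0:
                    ready.append(consumer)
    return order


def critical_path_length(exec_time, edges):
    """Longest path through the graph, counting execution time only."""
    finish = {}
    for task in topological_order(exec_time, edges):
        preds = [p for (p, c) in edges if c == task]
        start = max((finish[p] for p in preds), default=0)
        finish[task] = start + exec_time[task]
    return max(finish.values())
```

For the example graph the critical path is read, filter, merge, with a length of 10 time units.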
![Page 112: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/112.jpg)
112
Multiprocessors Design Techniques
Platform-based Design
• Map tasks onto pre-designed HW.
• Use extended task graph for SW and
Communication
Taken from: MPSoCs https://slideplayer.com/slide/8773117/
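One simple way to map a task graph onto a fixed platform is a greedy heuristic: visit tasks in dependence order and place each on the processing element (PE) where it would finish earliest, charging a communication delay when producer and consumer sit on different PEs. This is only a sketch of one possible heuristic, not the method of the cited slides. The task names, the numbers, and the linear `comm_cost_per_unit` model are assumptions.

```python
# Hypothetical task graph (same shape as a functional specification:
# execution time on the nodes, data volume on the edges).
exec_time = {"read": 2, "filter": 5, "fft": 4, "merge": 3}
edges = {
    ("read", "filter"): 8,
    ("read", "fft"): 8,
    ("filter", "merge"): 4,
    ("fft", "merge"): 4,
}
order = ["read", "filter", "fft", "merge"]  # must respect the dependences


def map_tasks(order, exec_time, edges, num_pes, comm_cost_per_unit=0.5):
    """Greedy mapping: place each task on the PE where it finishes earliest."""
    pe_free = [0.0] * num_pes   # time at which each PE becomes idle
    placement = {}              # task -> PE index
    finish = {}                 # task -> finish time
    for task in order:
        best = None
        for pe in range(num_pes):
            ready = 0.0
            for (producer, consumer), data in edges.items():
                if consumer != task:
                    continue
                arrival = finish[producer]
                if placement[producer] != pe:
                    # Data crossing PEs pays the assumed transfer cost.
                    arrival += data * comm_cost_per_unit
                ready = max(ready, arrival)
            end = max(ready, pe_free[pe]) + exec_time[task]
            if best is None or end < best[0]:
                best = (end, pe)
        finish[task], placement[task] = best
        pe_free[best[1]] = best[0]
    return placement, finish
```

With two PEs the example finishes at time 13 instead of 14 on a single PE; the gain is small because the transfer cost eats most of the parallelism, which is the kind of trade-off the extended task graph is meant to expose.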
![Page 113: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/113.jpg)
113
Multiprocessors Design Techniques
Platform-based Design
• Map tasks onto pre-designed HW.
• Use extended task graph for SW and
Communication
Taken from: MPSoCs https://slideplayer.com/slide/8773117/
![Page 114: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/114.jpg)
114
Multiprocessors Design Techniques
Design Techniques
• Core-based Strategy.
• Wrappers.
• System-level design flow.
• Platform-based design.
• Component-based design.
Taken from: W.Wolf High-Performance Embedded Computing
![Page 115: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/115.jpg)
115
Multiprocessors Design Techniques
Component Based Design
• Conceptual MPSOCs platform.
• SW, Processor, IP, Communication
Fabric.
• Parallel Development
• Use APIs.
• Quicker time to market.
Taken from: MPSoCs https://slideplayer.com/slide/8773117/
![Page 116: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/116.jpg)
116
Multiprocessors Design Techniques
Component Based Design
Taken from: MPSoCs https://slideplayer.com/slide/8773117/
![Page 117: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/117.jpg)
117
Multiprocessors Design Techniques
Multicore Application Programming Studio (MAPS)
• Developed at RWTH Aachen University in Germany.
• It is a platform that offers tools and technologies for MPSoC programming.
• Main features are:
• Sequential C code partitioning.
• Parallel programming model.
• Mapping and scheduling.
• Different types of applications.
• Functional Verification (Virtual Platform).
• Multiple applications environment.
• Easy-to-use IDE.
Taken from: M. Aguilar SoC Lectures Notes
![Page 118: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/118.jpg)
118
Multiprocessors Design Techniques
MAPS Flow
Taken from: M. Aguilar SoC Lectures Notes
![Page 119: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/119.jpg)
119
Multiprocessors Design Techniques
MAPS Flow
Taken from: M. Aguilar SoC Lectures Notes
![Page 120: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/120.jpg)
120
Multiprocessors Design Techniques
MAPS Programming Model: C for Process Networks (CPN)
• Embedded systems have traditionally been programmed in C.
• CPN is a language developed as an extension of ANSI C in order to
describe process networks (KPN and SDF).
• A compiler called cpn-cc performs a source-to-source transformation that
converts CPN code into standard C code using the APIs of the target
architecture.
Taken from: M. Aguilar SoC Lectures Notes
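To make the process-network idea concrete, here is a minimal Kahn process network (KPN) sketch in Python. It is not CPN syntax and not the output of cpn-cc; it only illustrates the model that CPN describes: independent processes connected by FIFO channels, where a read blocks until a token is available. The process names and the `None` end-of-stream marker are conventions of this sketch.

```python
import queue
import threading


def producer(out_ch, count):
    """Source process: emits the tokens 0 .. count-1."""
    for i in range(count):
        out_ch.put(i)
    out_ch.put(None)  # end-of-stream marker (a convention of this sketch)


def scale(in_ch, out_ch, factor):
    """Worker process: multiplies every token by a constant."""
    while True:
        token = in_ch.get()  # blocking read, the defining KPN rule
        if token is None:
            out_ch.put(None)
            return
        out_ch.put(token * factor)


def run_network(count=5, factor=3):
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    threads = [
        threading.Thread(target=producer, args=(a, count)),
        threading.Thread(target=scale, args=(a, b, factor)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        token = b.get()
        if token is None:
            break
        results.append(token)
    for t in threads:
        t.join()
    return results
```

Because every process only communicates through blocking FIFO reads, the result does not depend on how the threads are scheduled, which is what makes such networks attractive for mapping onto different numbers of cores.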
![Page 121: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/121.jpg)
121
Multiprocessors Design Techniques
MAPS Programming Model: C for Process Networks (CPN)
Taken from: M. Aguilar SoC Lectures Notes
![Page 122: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/122.jpg)
122
Multiprocessors Design Techniques
MAPS Virtual Platform (MVP)
• MAPS Virtual Platform (MVP)
• High level: abstract PEs based on SystemC.
• Low level: (Instruction Set Simulators) ISS-based virtual platform.
• “mPhone” virtual smartphone.
Taken from: M. Aguilar SoC Lectures Notes
![Page 123: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/123.jpg)
123
Multiprocessors Design Techniques
Virtual Processing Element
• It is a parameterizable processing element.
• Clock frequency.
• Type (RISC, VLIW, DSP, etc).
• Scheduling algorithm (Round robin, EDF, based on priorities, etc).
Taken from: M. Aguilar SoC Lectures Notes
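A parameterizable processing element can be sketched as a small class whose fields are the parameters listed above. This is a hypothetical model, not the MAPS implementation: the field names, the cycle-count task model, and the simplification that all tasks are released at time zero (so EDF reduces to running tasks in deadline order) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cycles: int        # work remaining, in clock cycles
    deadline_s: float  # absolute deadline, in seconds


@dataclass
class VirtualPE:
    clock_hz: float
    kind: str = "RISC"           # label only; not modeled further here
    policy: str = "round_robin"  # or "edf"
    quantum_cycles: int = 1000   # time slice used by round robin

    def run(self, tasks):
        """Return {task name: completion time in seconds}."""
        pending = [Task(t.name, t.cycles, t.deadline_s) for t in tasks]
        if self.policy == "edf":
            # All tasks are released at t = 0, so earliest deadline first
            # is just a sort by deadline.
            pending.sort(key=lambda t: t.deadline_s)
        elapsed_cycles = 0
        done = {}
        while pending:
            task = pending.pop(0)
            if self.policy == "edf":
                slice_cycles = task.cycles
            else:
                slice_cycles = min(task.cycles, self.quantum_cycles)
            elapsed_cycles += slice_cycles
            task.cycles -= slice_cycles
            if task.cycles == 0:
                done[task.name] = elapsed_cycles / self.clock_hz
            else:
                pending.append(task)  # back of the queue for the next slice
        return done
```

Running the same task set with a different `policy` or `clock_hz` changes only the completion times, which is how a load simulator can explore mappings without executing real code.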
![Page 124: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/124.jpg)
Outline
124
• Multiprocessors Architecture and Taxonomy
• Parallel Execution Mechanism
• Multiprocessors Design Techniques
• Memory Systems
• Processors Symmetry
• Co-processing
![Page 125: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/125.jpg)
125
Memory Systems
Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing
Memory Systems
![Page 126: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/126.jpg)
126
Memory Systems
Memory Systems
• The memory system is a traditional bottleneck in computing.
• Not only are memories slower than processors, but processor clock rates
are increasing much faster than memory cycle times.
Taken from: W. Wolf High-Performance Embedded Computing and
https://www.taringa.net/+serviciotecnico/consulta-cuello-de-botella-cpu-debil-en-gpu-potente_15casq
![Page 127: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/127.jpg)
127
Memory Systems
Memory Systems
Taken from: Multi-core architectures
![Page 128: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/128.jpg)
128
Memory Systems
Memory Systems
Taken from: MPSoCs Hardware platforms Lectures Notes
![Page 129: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/129.jpg)
129
Memory Systems
Memory Systems
• Start with a look at parallel memory systems in scientific multiprocessors.
• Consider models for memory and motivations for heterogeneous memory
systems.
• Look at what sorts of consistency mechanisms are needed in embedded
multiprocessors.
Taken from: W. Wolf High-Performance Embedded Computing
![Page 130: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/130.jpg)
130
Memory Systems
Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing
Memory Systems
Homogeneous Heterogeneous
![Page 131: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/131.jpg)
131
Memory Systems
Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing
Memory Systems
Homogeneous Heterogeneous
![Page 132: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/132.jpg)
132
Memory Systems
Memory Systems
To understand memory systems, consider the following case study:
• Scientific processors traditionally use parallel, homogeneous memory
systems to increase system performance.
• Multiple memory banks allow several memory accesses to occur
simultaneously.
Taken from: W. Wolf High-Performance Embedded Computing
![Page 133: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/133.jpg)
133
Memory Systems
Memory Systems
• Each bank is separately addressable.
Taken from: W. Wolf High-Performance Embedded Computing
![Page 134: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/134.jpg)
134
Memory Systems
Memory Systems
• If the memory system has n banks,
then n accesses can be performed in
parallel.
• This is known as the peak access
rate.
Taken from: W. Wolf High-Performance Embedded Computing
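The effect of banking on the access rate can be sketched with a small model. The low-order interleaving rule (`address % n_banks`) is an assumption of this example, since the slides do not say how addresses are assigned to banks. Each bank is assumed to serve one access per cycle.

```python
from collections import Counter


def bank_of(address, n_banks):
    """Assumed mapping: consecutive addresses go to consecutive banks."""
    return address % n_banks


def cycles_needed(addresses, n_banks):
    """Cycles to serve a batch of accesses issued together.

    Banks work in parallel, so the batch takes as long as the most
    heavily loaded bank.
    """
    per_bank = Counter(bank_of(a, n_banks) for a in addresses)
    return max(per_bank.values(), default=0)
```

With four banks, four sequential addresses are served in one cycle (the peak rate of n accesses in parallel), while the strided pattern 0, 4, 8, 12 hits a single bank and takes four cycles.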
![Page 135: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/135.jpg)
135
Memory Systems
Memory Systems
• A program cannot keep the memory busy all of
the time.
• A simple statistical model lets us
estimate performance of a random-
access program.
Taken from: W. Wolf High-Performance Embedded Computing
![Page 136: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/136.jpg)
136
Memory Systems
Memory Systems
• Assume that the program accesses a
certain number of sequential
locations, then moves to some other
location.
• Where:
• λ is the probability of a
nonsequential memory access (a
branch in code or a nonconsecutive
data location).
• k is the number of sequential accesses.
Taken from: W. Wolf High-Performance Embedded Computing
![Page 137: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/137.jpg)
137
Memory Systems
Memory Systems
• Where:
• $p(k) = \lambda (1 - \lambda)^{k-1}$
• And the mean length of a sequential
access sequence is:
• $L_b = \dfrac{1 - (1 - \lambda)^{m}}{\lambda}$
Taken from: W. Wolf High-Performance Embedded Computing
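A short numerical check of the model, with p(k) = λ(1 − λ)^(k−1) and L_b = (1 − (1 − λ)^m)/λ. The slide does not define m; this sketch assumes it is the number of banks and that a sequential run is counted up to at most m accesses. Under that reading, the closed form equals the expected value of min(k, m), which the second function verifies by direct summation.

```python
def p(k, lam):
    """Probability of a run of exactly k sequential accesses."""
    return lam * (1.0 - lam) ** (k - 1)


def mean_run_closed_form(lam, m):
    """L_b = (1 - (1 - lam)^m) / lam."""
    return (1.0 - (1.0 - lam) ** m) / lam


def mean_run_by_summation(lam, m):
    """Expected value of min(k, m), summed term by term.

    Runs shorter than m contribute k * p(k); all runs of length m or more
    are counted as m, and together they have probability (1 - lam)^(m - 1).
    """
    head = sum(k * p(k, lam) for k in range(1, m))
    tail = m * (1.0 - lam) ** (m - 1)
    return head + tail
```

For example, with λ = 0.5 and m = 2 both functions give 1.5. As λ approaches 0 (almost purely sequential code) L_b approaches m, so all banks are kept busy; as λ approaches 1, L_b approaches 1.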
![Page 138: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/138.jpg)
138
Memory Systems
Memory Systems
• Use program statistics to estimate
the average probability of
nonsequential accesses, design the
memory system accordingly.
• Use software techniques to
maximize the length of access
sequences wherever possible.
Taken from: W. Wolf High-Performance Embedded Computing
![Page 139: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/139.jpg)
139
Memory Systems
Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing
Memory Systems
Homogeneous Heterogeneous
![Page 140: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/140.jpg)
140
Memory Systems
Memory Systems
• Embedded systems can make use of multiple-bank memory systems, but they
also make use of more heterogeneous memory architectures.
• They do so to improve the real-time performance and lower the power
consumption of the memory system.
Taken from: W. Wolf High-Performance Embedded Computing
![Page 141: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/141.jpg)
141
Memory Systems
Memory Systems
Why do heterogeneous memory systems
improve real-time performance?
Taken from: W. Wolf High-Performance Embedded Computing
![Page 142: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/142.jpg)
142
Memory Systems
Memory Systems
• The energy required to perform a memory access depends in part on the size of
the memory block being accessed.
• A heterogeneous memory may be able to use smaller memory blocks, reducing
the access time.
• Energy per access also depends on the number of ports on the memory block.
• By reducing the number of units that can access a given part of memory, the
heterogeneous memory system can reduce the energy required to access that
part of the memory space.
Taken from: W. Wolf High-Performance Embedded Computing
![Page 143: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/143.jpg)
143
Memory Systems
Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing
Memory Systems
Homogeneous Heterogeneous
Consistent Memory Systems
![Page 144: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/144.jpg)
144
Memory Systems
Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing
Consistent Memory Systems: Shared variables, Cache consistency, Snooping caches
![Page 145: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/145.jpg)
145
Memory Systems
Memory Systems
• Shared variables
• We must worry about whether two processors see the same state of a shared variable.
• If reads and writes of two processors are interleaved, then one processor may write
the variable after another one has written it, causing the earlier writer to erroneously
assume that the variable still holds its value.
• Use critical sections, guarded by semaphores, to ensure that critical operations occur in
the right order.
• Use atomic test-and-set operations (often called spin locks) to guard small pieces of
memory.
Taken from: W. Wolf High-Performance Embedded Computing
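A minimal sketch of a guarded critical section, using Python threads as stand-ins for processors. The read-modify-write on the shared counter is the critical operation. The standard `threading.Lock` plays the role of the semaphore here; a spin lock would busy-wait on an atomic test-and-set instead of blocking. The function name and the thread and iteration counts are arbitrary choices for illustration.

```python
import threading


def count_with_lock(n_threads=4, increments=10000):
    """Increment one shared counter from several threads, safely."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:        # enter the critical section
                counter += 1  # read, modify, write as one guarded step

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place the result is always `n_threads * increments`. Without it, interleaved reads and writes can lose updates, which is the failure described above.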
![Page 146: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/146.jpg)
146
Memory Systems
Memory Systems
• Cache consistency
• If two processors access the same
memory location, then each may have
a copy of the location in its own cache.
• If one processing element writes that
location, then the other will not
immediately see the change and will
make an incorrect computation.
Taken from: W. Wolf High-Performance Embedded Computing
![Page 147: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/147.jpg)
147
Memory Systems
Memory Systems
• Snooping Cache
• This type of cache contains extra
logic that watches the
multiprocessor interconnect for
memory transactions.
• When it sees a write to a location
that it currently contains, it
invalidates that location.
Taken from: W. Wolf High-Performance Embedded Computing
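The invalidate-on-write behavior can be sketched with a toy model. It assumes a single shared bus, write-through caches, and one value per address. Real snooping protocols (for example MESI) track more states, so treat the class names and the protocol here as a simplified illustration rather than any specific hardware design.

```python
class Bus:
    """Shared interconnect that every cache can observe."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, writer, address, value):
        self.memory[address] = value  # write-through to main memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(address)


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> cached value
        bus.attach(self)

    def read(self, address):
        if address not in self.lines:  # miss: fetch from memory
            self.lines[address] = self.bus.memory.get(address, 0)
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.broadcast_write(self, address, value)

    def snoop_write(self, address):
        # Another cache wrote this location: drop our stale copy, if any.
        self.lines.pop(address, None)
```

After one cache writes a location, the other cache's copy is invalidated, so its next read misses and fetches the new value from memory instead of computing with the stale one.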
![Page 148: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/148.jpg)
148
Memory Systems
Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing
Memory Systems Architecture: Shared memory, Distributed memory, Hybrid memory
![Page 149: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/149.jpg)
149
Memory Systems
Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing
Memory Systems Architecture: Shared memory, Distributed memory, Hybrid memory
![Page 150: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/150.jpg)
150
Memory Systems
Memory Systems
• Shared Memory
• Shared memory parallel computers vary
widely, but generally have in common the
ability for all processors to access all
memory as global address space.
• Multiple processors can operate
independently but share the same memory
resources.
Taken from: W. Wolf High-Performance Embedded Computing,
https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 151: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/151.jpg)
151
Memory Systems
Memory Systems
• Shared Memory
• Changes in a memory location effected by
one processor are visible to all other
processors.
• Historically, shared memory machines
have been classified as UMA and NUMA,
based upon memory access times.
Taken from: W. Wolf High-Performance Embedded Computing,
https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 152: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/152.jpg)
152
Memory Systems
Memory Systems
• Shared Memory (Uniform Memory
Access UMA)
• Most commonly represented today by
Symmetric Multiprocessor (SMP)
machines.
• Identical processors.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 153: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/153.jpg)
153
Memory Systems
Memory Systems
• Shared Memory (Uniform Memory
Access UMA)
• Equal access and access times to
memory.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 154: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/154.jpg)
154
Memory Systems
Memory Systems
• Shared Memory (Uniform Memory Access
UMA)
• Sometimes called CC-UMA - Cache
Coherent UMA. Cache coherent means if one
processor updates a location in shared
memory, all the other processors know about
the update. Cache coherency is accomplished
at the hardware level.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 155: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/155.jpg)
155
Memory Systems
Memory Systems
• Shared Memory (Non-Uniform Memory
Access NUMA)
• Often made by physically linking two or
more SMPs.
• One SMP can directly access memory of
another SMP.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 156: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/156.jpg)
156
Memory Systems
Memory Systems
• Shared Memory (Non-Uniform Memory
Access NUMA)
• Not all processors have equal access time to
all memories.
• Memory access across the link is slower.
• If cache coherency is maintained, then may
also be called CC-NUMA - Cache Coherent
NUMA.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 157: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/157.jpg)
157
Memory Systems
Memory Systems
• Shared Memory
• Advantages
• Global address space provides a user-
friendly programming perspective to
memory.
• Data sharing between tasks is both fast
and uniform due to the proximity of
memory to CPUs.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 158: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/158.jpg)
158
Memory Systems
Memory Systems
• Shared Memory
• Disadvantages
• Primary disadvantage is the lack of
scalability between memory and CPUs.
Adding more CPUs can geometrically
increase traffic on the shared memory-CPU
path, and for cache coherent systems,
geometrically increase traffic associated with
cache/memory management.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 159: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/159.jpg)
159
Memory Systems
Memory Systems
• Shared Memory
• Disadvantages
• Programmer responsibility for
synchronization constructs that ensure
"correct" access of global memory.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 160: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/160.jpg)
Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing
[Diagram: Memory Systems Architecture, branching into Shared memory, Hybrid memory, and Distributed memory]
![Page 161: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/161.jpg)
Memory Systems
• Distributed Memory
• Like shared memory systems, distributed
memory systems vary widely but share a
common characteristic.
• Distributed memory systems require a communication network to connect inter-processor memory.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 162: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/162.jpg)
Memory Systems
• Distributed Memory
• Processors have their own local memory.
Memory addresses in one processor do not
map to another processor, so there is no
concept of global address space across all
processors.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 163: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/163.jpg)
Memory Systems
• Distributed Memory
• Because each processor has its own local
memory, it operates independently.
Changes it makes to its local memory have
no effect on the memory of other
processors. Hence, the concept of cache
coherency does not apply.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 164: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/164.jpg)
Memory Systems
• Distributed Memory
• When a processor needs access to data in
another processor, it is usually the task of
the programmer to explicitly define how
and when data is communicated.
Synchronization between tasks is likewise
the programmer's responsibility.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
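The explicit communication described above can be sketched as follows. This is a simulation under stated assumptions: two Python threads stand in for two processors, each working only on its own data, and two `queue.Queue` objects stand in for the one-way network links. The names `node` and `exchange_partial_sums` are hypothetical; a real distributed memory system would use a message-passing library over an actual network.

```python
import queue
import threading


def node(local_data, inbox, outbox, result):
    """One simulated processor: it computes only on its own local_data."""
    partial = sum(local_data)       # work on local memory only
    outbox.put(partial)             # explicit send to the peer
    remote_partial = inbox.get()    # explicit blocking receive; also synchronizes
    # 'result' exists only to hand the value back to the caller of this sketch.
    result.append(partial + remote_partial)


def exchange_partial_sums(data_a, data_b):
    """Run two nodes that exchange partial sums over two one-way links."""
    link_a_to_b = queue.Queue()
    link_b_to_a = queue.Queue()
    result_a, result_b = [], []
    threads = [
        threading.Thread(target=node,
                         args=(data_a, link_b_to_a, link_a_to_b, result_a)),
        threading.Thread(target=node,
                         args=(data_b, link_a_to_b, link_b_to_a, result_b)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result_a[0], result_b[0]
```

Each node sends before it receives, so neither side blocks waiting for a message that was never sent. Choosing that ordering is exactly the kind of decision the programmer has to make explicitly in this model.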
![Page 165: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/165.jpg)
Memory Systems
• Distributed Memory
• The network "fabric" used for data transfer
varies widely, though it can be as simple as
Ethernet.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 166: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/166.jpg)
Memory Systems
• Distributed Memory
• Advantages
• Memory is scalable with the number
of processors. Increase the number of
processors and the size of memory
increases proportionately.
Taken from: W. Wolf High-Performance Embedded Computing,
https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 167: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/167.jpg)
Memory Systems
• Distributed Memory
• Advantages
• Each processor can rapidly access its
own memory without interference and
without the overhead incurred with
trying to maintain global cache
coherency.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 168: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/168.jpg)
Memory Systems
• Distributed Memory
• Advantages
• Cost effectiveness: can use
commodity, off-the-shelf processors
and networking.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 169: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/169.jpg)
Memory Systems
• Distributed Memory
• Disadvantages
• The programmer is responsible for
many of the details associated with data
communication between processors.
• It may be difficult to map existing data
structures, based on global memory, to
this memory organization.
Taken from: W. Wolf High-Performance Embedded Computing,
https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 170: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/170.jpg)
Memory Systems
• Distributed Memory
• Disadvantages
• Non-uniform memory access times: data residing on a remote node takes longer to access than node-local data.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
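A rough way to reason about non-uniform access times is a weighted average of the local and remote latencies. The function below is a simple illustrative model, not a formula from the cited sources; the name `average_access_time` and the latency figures in the usage note are assumptions.

```python
def average_access_time(local_fraction, t_local_ns, t_remote_ns):
    """Expected access time when local_fraction of accesses hit node-local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns
```

For example, with hypothetical latencies of 100 ns local and 500 ns remote, `average_access_time(0.75, 100, 500)` gives 200.0 ns: even a quarter of the accesses going remote doubles the average latency in this model.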
![Page 171: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/171.jpg)
Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing
[Diagram: Memory Systems Architecture, branching into Shared memory, Hybrid memory, and Distributed memory]
![Page 172: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/172.jpg)
Memory Systems
• Hybrid Memory
• The largest and fastest computers in the
world today employ both shared and
distributed memory architectures.
• The shared memory component can be a shared memory machine and/or graphics processing units (GPUs).
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 173: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/173.jpg)
Memory Systems
• Hybrid Memory
• The distributed memory component is
the networking of multiple shared
memory/GPU machines, which know
only about their own memory - not the
memory on another machine. Therefore,
network communications are required to
move data from one machine to another.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 174: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/174.jpg)
Memory Systems
• Hybrid Memory
• Current trends seem to indicate that this
type of memory architecture will
continue to prevail and increase at the
high end of computing for the
foreseeable future.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 175: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/175.jpg)
Memory Systems
• Hybrid Memory
• Advantages and Disadvantages
• Whatever is common to both shared and
distributed memory architectures.
• Increased scalability is an important
advantage.
• Increased programmer complexity is an
important disadvantage.
Taken from: W. Wolf High-Performance Embedded Computing,
https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
![Page 176: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/176.jpg)
Memory Systems
Design Memory Systems?
Taken from: W. Wolf High-Performance Embedded Computing,
![Page 177: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/177.jpg)
Memory Systems
Design Memory Systems
A simple model of memory components for parallel memory design would include
three major parameters of a memory component of a given size.
• Area: The physical size of the logical component. This is most important in chip design, but it also
relates to cost in board design.
• Performance: The access time of the component. There may be more than one parameter, with
variations for read and write times, page mode accesses, and so on.
• Energy: The energy required per access. If performance is characterized by multiple modes, energy
consumption will exhibit similar modes.
Taken from: W. Wolf High-Performance Embedded Computing,
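The three parameters above can be captured in a small record. The sketch below is one possible encoding, assuming separate read and write modes and strictly sequential accesses; the class name `MemoryComponent`, its field names, and the units are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryComponent:
    """Hypothetical model of one memory component of a given size."""
    name: str
    area_mm2: float          # area: physical size of the component
    read_time_ns: float      # performance: read access time
    write_time_ns: float     # performance: write access time
    read_energy_nj: float    # energy per read access
    write_energy_nj: float   # energy per write access

    def cost(self, reads, writes):
        """Return (total time in ns, total energy in nJ) for the access counts.

        Assumes accesses happen one after another, with no overlap
        and no page-mode speedup.
        """
        time_ns = reads * self.read_time_ns + writes * self.write_time_ns
        energy_nj = reads * self.read_energy_nj + writes * self.write_energy_nj
        return time_ns, energy_nj
```

With made-up figures, `MemoryComponent("scratchpad", 1.5, 2.0, 3.0, 0.5, 1.0).cost(100, 50)` returns `(350.0, 100.0)`. Comparing such tuples, together with `area_mm2`, across candidate components is the kind of trade-off this model is meant to support.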
![Page 178: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/178.jpg)
Memory Systems
Design Memory Systems
Taken from: W. Wolf High-Performance Embedded Computing,
![Page 179: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/179.jpg)
Memory Systems
Taken from: https://www.xataka.com/ordenadores/el-cuello-de-botella-de-la-ley-de-moore-no-esta-en-los-procesadores-sino-en-las-memorias
![Page 180: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/180.jpg)
Memory Systems
Taken from: https://www.xataka.com/ordenadores/el-cuello-de-botella-de-la-ley-de-moore-no-esta-en-los-procesadores-sino-en-las-memorias
![Page 181: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/181.jpg)
Outline
• Multiprocessors Architecture and Taxonomy
• Parallel Execution Mechanism
• Multiprocessors Design Techniques
• Memory Systems
• Processors Symmetry
• Co-processing
![Page 182: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/182.jpg)
Processors Symmetry
Taken from: W. Wolf High-Performance Embedded Computing
[Diagram: Multi-processing, branching into Symmetric (SMP) and Asymmetric (AMP)]
![Page 183: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/183.jpg)
Processors Symmetry
Taken from: W. Wolf High-Performance Embedded Computing
[Diagram: Multi-processing, branching into Symmetric (SMP) and Asymmetric (AMP)]
![Page 184: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/184.jpg)
Processors Symmetry
Taken from: M. Aguilar SoCs
Symmetric Multi-processing (SMP)
• A system with multiple processors or cores that communicate through a single shared memory and are controlled by a single operating system.
![Page 185: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/185.jpg)
Processors Symmetry
Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/
Symmetric Multi-processing (SMP)
• Identical: all processors are identical and are treated equally.
• Communication: processors communicate through shared memory.
• Complexity: the design is complex, since all units share the same memory and data bus.
• Cost: these systems are more expensive.
• Unlike asymmetric multiprocessing, where operating system tasks are done only by the master processor, here operating system tasks are handled individually by the processors.
![Page 186: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/186.jpg)
Processors Symmetry
Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/
Symmetric Multi-processing (SMP)
• Applications
• SMP is used in parallel processing, where time-sharing systems (TSS) assign tasks to different processors running in parallel, and in TSS that use multithreading, i.e. multiple threads running simultaneously.
![Page 187: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/187.jpg)
Processors Symmetry
Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/
Symmetric Multi-processing (SMP)
• Advantages
• Throughput: since tasks can run on any processor, unlike in asymmetric systems, throughput (processes executed per unit time) increases.
• Reliability: the failure of one processor does not bring down the whole system, since all processors are equally capable, though throughput drops somewhat.
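Both points can be illustrated with an idealized scheduling model. The sketch below assigns each task to whichever processor becomes free first and reports the total finish time. It is only a model under simplifying assumptions (identical processors, no memory contention, no OS overhead), and the function name `makespan` is hypothetical.

```python
import heapq


def makespan(task_times, num_processors):
    """Finish time when each task goes to the processor that frees up first."""
    if num_processors < 1:
        raise ValueError("need at least one working processor")
    free_at = [0] * num_processors      # time at which each processor is idle
    heapq.heapify(free_at)
    for duration in task_times:
        start = heapq.heappop(free_at)  # earliest-available processor
        heapq.heappush(free_at, start + duration)
    return max(free_at)
```

For eight tasks of 4 time units each, `makespan([4] * 8, 4)` is 8, while `makespan([4] * 8, 3)` is 12: if one of four processors fails, all the work still completes, only later.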
![Page 188: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/188.jpg)
Processors Symmetry
Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/
Symmetric Multi-processing (SMP)
• Disadvantages
• Complex design: since the OS treats all processors equally, designing and managing such an OS is difficult.
• Cost: since all processors share a common main memory, a larger memory is required, which makes the system more expensive.
![Page 189: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/189.jpg)
Processors Symmetry
Taken from: https://www.enea.com/globalassets/downloads/operating-systems/enea-oseck/enea-smp-platform-for-xilinx-zynq-datasheet.pdf
Symmetric Multi-processing (SMP)
![Page 190: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/190.jpg)
Processors Symmetry
Taken from: https://www.enea.com/globalassets/downloads/operating-systems/enea-oseck/enea-smp-platform-for-xilinx-zynq-datasheet.pdf
Symmetric Multi-processing (SMP)
![Page 191: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/191.jpg)
Processors Symmetry
Taken from: W. Wolf High-Performance Embedded Computing
[Diagram: Multi-processing, branching into Symmetric (SMP) and Asymmetric (AMP)]
![Page 192: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/192.jpg)
Processors Symmetry
Taken from: M. Aguilar SoC Lecture Notes
Asymmetric Multi-processing (AMP)
• A system with multiple processors or cores that communicate through a single shared memory, where each processor or core is controlled by an independent operating system (the same or different).
![Page 193: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/193.jpg)
Processors Symmetry
Asymmetric Multi-processing (AMP)
• Characteristics
• Processors are not treated equally.
• Operating system tasks are done by the master processor.
• There is no communication between processors, as they are controlled by the master processor.
• Processors have a master-slave relationship.
• Systems are cheaper.
• Systems are easier to design.
Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/
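The master-slave organization listed above can be sketched as follows. This is a simulation, not an AMP implementation: Python threads stand in for processors, and each slave has a private command queue that only the master writes to. The names `master`, `slave`, and `STOP`, and the squaring workload, are assumptions chosen for illustration.

```python
import queue
import threading

STOP = None  # hypothetical sentinel the master sends to shut a slave down


def slave(commands, results):
    """Slave loop: run whatever the master sends; never contacts other slaves."""
    while True:
        job = commands.get()
        if job is STOP:
            break
        job_id, value = job
        results.put((job_id, value * value))  # placeholder workload


def master(values, num_slaves=2):
    """Master: owns all scheduling, hands jobs out round-robin, gathers results."""
    results = queue.Queue()
    command_queues = [queue.Queue() for _ in range(num_slaves)]
    slaves = [threading.Thread(target=slave, args=(q, results))
              for q in command_queues]
    for s in slaves:
        s.start()
    for job_id, value in enumerate(values):
        command_queues[job_id % num_slaves].put((job_id, value))
    for q in command_queues:
        q.put(STOP)
    for s in slaves:
        s.join()
    gathered = dict(results.get() for _ in values)
    return [gathered[i] for i in range(len(values))]
```

Calling `master([1, 2, 3, 4])` returns `[1, 4, 9, 16]`. All coordination passes through the master, which keeps the design simple but also makes the master a single point of control.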
![Page 194: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/194.jpg)
Processors Symmetry
Taken from: https://www.openampproject.org/old_website/docs/mca/BKK19%20OpenAMP%20Introduction.pdf
Asymmetric Multi-processing (AMP)
![Page 195: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/195.jpg)
Processors Symmetry
Taken from: https://www.openampproject.org/old_website/docs/mca/BKK19%20OpenAMP%20Introduction.pdf
Asymmetric Multi-processing (AMP)
![Page 196: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/196.jpg)
Processors Symmetry
Taken from: https://www.openampproject.org/old_website/docs/mca/BKK19%20OpenAMP%20Introduction.pdf
Asymmetric Multi-processing (AMP)
![Page 197: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/197.jpg)
Processors Symmetry
Asymmetric Multi-processing (AMP)
Taken from: https://github.com/OpenAMP/open-amp
![Page 198: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/198.jpg)
Processors Symmetry
Asymmetric Multi-processing (AMP)
Taken from: https://github.com/OpenAMP/open-amp
![Page 199: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/199.jpg)
Processors Symmetry
Taken from: https://www.openampproject.org/old_website/docs/mca/BKK19%20OpenAMP%20Introduction.pdf
Asymmetric Multi-processing (AMP)
![Page 200: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/200.jpg)
Outline
• Multiprocessors Architecture and Taxonomy
• Parallel Execution Mechanism
• Multiprocessors Design Techniques
• Memory Systems
• Processors Symmetry
• Co-processing
![Page 201: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/201.jpg)
Co-processing
Taken from: https://www.researchgate.net/publication/250840737_Automatic_Generation_of_Application-Specific_Architectures_for_Heterogeneous_MPSoC_through_Combination_of_Processors/figures
![Page 202: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/202.jpg)
Co-processing
Taken from: https://www.researchgate.net/publication/250840737_Automatic_Generation_of_Application-Specific_Architectures_for_Heterogeneous_MPSoC_through_Combination_of_Processors/figures
![Page 203: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/203.jpg)
Co-processing
Taken from: https://www.researchgate.net/publication/250840737_Automatic_Generation_of_Application-Specific_Architectures_for_Heterogeneous_MPSoC_through_Combination_of_Processors/figures
![Page 204: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/204.jpg)
Co-processing
Taken from: http://www.cecs.uci.edu/~papers/esweek06/codes/p288.pdf
![Page 205: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/205.jpg)
Co-processing
Taken from: https://www.researchgate.net/publication/221656884_A_Generic_Wrapper_Architecture_for_Multi-Processor_SoC_Cosimulation_and_Design/figures?lo=1
![Page 206: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/206.jpg)
Co-processing
Taken from: https://link.springer.com/chapter/10.1007/978-3-319-01113-4_1
![Page 207: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/207.jpg)
Co-processing
What is a coprocessor?
![Page 208: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/208.jpg)
Co-processing
A coprocessor is:
• A computer processor used to supplement the functions of the primary processor.
• Operations performed by coprocessors include:
• Floating point (FPU).
• Graphics processing.
• Signal processing.
• Cryptography.
• Etc.
Taken from: https://youtu.be/xrMUv9ZVKY0
![Page 209: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/209.jpg)
Co-processing
A coprocessor is:
• By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance.
• Coprocessors allow a line of computers to be customized, so that customers who do not need extra performance need not pay for it.
Taken from: https://youtu.be/xrMUv9ZVKY0
![Page 210: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/210.jpg)
Co-processing
Functions
• A coprocessor may not be a general-purpose processor.
• Some coprocessors cannot fetch instructions from memory, execute program flow control instructions, do input/output operations, manage memory, and so on.
• Such a coprocessor requires the host (main) processor to fetch the coprocessor instructions and handle all other operations aside from the coprocessor functions.
• In some architectures the coprocessor is a more general-purpose computer but carries out only a limited range of functions under the close control of a supervisory processor.
Taken from: https://youtu.be/xrMUv9ZVKY0
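The division of labor described above can be sketched with a toy interpreter. In this model the host performs every instruction fetch and all flow control, and forwards only arithmetic operations to the coprocessor. The instruction names (`FADD`, `FMUL`, `JUMP_IF_LESS`, `HALT`), the register names, and the classes are hypothetical and do not correspond to any real instruction set.

```python
class FloatCoprocessor:
    """Executes arithmetic only: no program counter, no memory access."""

    def execute(self, op, a, b):
        if op == "FADD":
            return a + b
        if op == "FMUL":
            return a * b
        raise ValueError(f"unsupported coprocessor op: {op}")


def run_host(program, registers, coprocessor):
    """Host loop: fetch every instruction, handle flow control locally,
    and pass coprocessor ops on together with their operands."""
    pc = 0
    while pc < len(program):
        op, *args = program[pc]             # the host performs the fetch
        pc += 1
        if op in ("FADD", "FMUL"):
            dst, src1, src2 = args
            registers[dst] = coprocessor.execute(
                op, registers[src1], registers[src2])
        elif op == "JUMP_IF_LESS":          # flow control stays on the host
            reg, limit, target = args
            if registers[reg] < limit:
                pc = target
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers


# Hypothetical program: keep doubling x until it reaches at least 100.
PROGRAM = [
    ("FMUL", "x", "x", "two"),
    ("JUMP_IF_LESS", "x", 100.0, 0),
    ("HALT",),
]
```

Running `run_host(PROGRAM, {"x": 1.5, "two": 2.0}, FloatCoprocessor())` leaves `x` at 192.0. The coprocessor never sees the jump or the halt; it only receives operands and returns results.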
![Page 211: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/211.jpg)
Co-processing
Taken from: https://www.doulos.com/knowhow/arm/using_your_c_compiler_to_exploit_neon/Resources/using_your_c_compiler_to_exploit_neon.pdf
Coprocessor
![Page 212: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/212.jpg)
Co-processing
NEON Arm
• With the v7-A architecture, ARM introduced a powerful SIMD implementation called NEON™.
• NEON is a coprocessor which comes with its own instruction set for vector operations.
• Most vector operations carry out the same operation on all elements of their operand vector(s) in parallel.
• Further reading: "Using your C compiler to exploit NEON™ Advanced SIMD" (Doulos).
Taken from: https://youtu.be/xrMUv9ZVKY0
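The idea of applying one operation to every element of a vector can be modeled in plain Python. The sketch below treats a 128-bit register as four unsigned 32-bit lanes and adds two such registers lane by lane. It models the semantics only and is not NEON code: on real hardware the four additions happen in a single instruction, which C code would reach through compiler auto-vectorization or the intrinsics in `arm_neon.h`. The function names are hypothetical.

```python
LANES = 4                  # a 128-bit register viewed as four 32-bit lanes
LANE_MASK = 0xFFFFFFFF     # keeps each lane within 32 bits


def vector_add_u32(a, b):
    """Lane-wise add of two 4-lane vectors; each lane wraps modulo 2**32."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("each operand must have exactly 4 lanes")
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]


def add_arrays(a, b):
    """Add two equal-length arrays four elements per step, like a SIMD loop."""
    if len(a) != len(b) or len(a) % LANES != 0:
        raise ValueError("lengths must match and be a multiple of 4")
    out = []
    for i in range(0, len(a), LANES):
        out.extend(vector_add_u32(a[i:i + LANES], b[i:i + LANES]))
    return out
```

Note that lanes are independent: in `vector_add_u32([1, 2, 3, 0xFFFFFFFF], [10, 20, 30, 1])` the last lane wraps to 0 without carrying into its neighbor, giving `[11, 22, 33, 0]`.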
![Page 213: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/213.jpg)
Co-processing
NEON Arm
• The goal of NEON is to provide a powerful yet comparatively easy-to-program SIMD instruction set that covers integer data types up to 64 bits wide as well as single-precision floating point (32 bit).
• NEON shares its sixteen 128-bit registers with the vector floating point unit.
• Because it executes on the same processor core, NEON performance is influenced by context switching overhead, non-deterministic memory access latency (cache/MMU access), and interrupt handling.
Taken from: https://youtu.be/xrMUv9ZVKY0
![Page 214: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/214.jpg)
Co-processing
NEON Arm
Taken from: https://youtu.be/xrMUv9ZVKY0
![Page 215: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/215.jpg)
Co-processing
NEON Arm
Taken from: https://youtu.be/xrMUv9ZVKY0
![Page 216: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/216.jpg)
Co-processing
NEON Arm
Taken from: https://youtu.be/xrMUv9ZVKY0
![Page 217: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/217.jpg)
Co-processing
NEON Arm
Taken from: https://youtu.be/xrMUv9ZVKY0
![Page 218: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/218.jpg)
Co-processing
NEON Arm
Taken from: https://youtu.be/xrMUv9ZVKY0
![Page 219: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/219.jpg)
Co-processing
NEON Arm
Taken from: https://youtu.be/xrMUv9ZVKY0
![Page 220: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/220.jpg)
Co-processing
DSPs
Taken from: Introduccion a los Sistemas Empotrados Lecture Notes
![Page 221: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/221.jpg)
Co-processing
DSPs
Taken from: M. Aguilar SoC Lecture Notes
![Page 222: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/222.jpg)
Co-processing
DSPs
Taken from: M. Aguilar SoC Lecture Notes
![Page 223: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/223.jpg)
Co-processing
GPU
Taken from: https://www.anandtech.com/show/14101/nvidia-announces-jetson-nano
![Page 224: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/224.jpg)
Co-processing
GPU
Taken from: https://www.anandtech.com/show/14101/nvidia-announces-jetson-nano
![Page 225: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/225.jpg)
Co-processing
Flight controller UAV
Taken from: https://cdn.sparkfun.com/assets/d/d/9/9/3/Pixhawk4-DataSheet.pdf
![Page 226: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/226.jpg)
Co-processing
Flight controller UAV
Taken from: https://cdn.sparkfun.com/assets/d/d/9/9/3/Pixhawk4-DataSheet.pdf
![Page 227: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/227.jpg)
References
[1] Lecture Notes, Tecnologico de Costa Rica, Course SoC.
[2] W. Wolf. High-Performance Embedded Computing: Architectures, Applications
and Methodologies. Elsevier, United States of America, 2007.
[3] E. A. Lee and S. A. Seshia. Introduction to Embedded Systems, 2017.
Lecture notes and materials are available on TEC-Digital and the web portal:
www.ie.tec.ac.cr/sarriola/HPEC
www.ie.tec.ac.cr/joaraya
![Page 228: High Performance Embedded Systems MPSoCs](https://reader030.vdocument.in/reader030/viewer/2022012812/61c3ae6a427037686a209367/html5/thumbnails/228.jpg)