Benchmarks for Analyzing Embedded Multicore Processor
![Page 1: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/1.jpg)
Leveling the Field for Multicore Open Systems Architectures
Markus Levy
President, EEMBC
President, Multicore Association
![Page 2: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/2.jpg)
Analyzing the Multicore Ecosystem
• Embedded Microprocessor Benchmark Consortium® (EEMBC)
• Industry benchmarks since 1997
• Tool for evaluating embedded processors, compilers, systems
• MultiBench for shredding multicore processors
![Page 3: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/3.jpg)
Enabling the Multicore Ecosystem
• Initial engagement began in May 2005
• Industry-wide participation
• Current efforts
• Communications APIs
• Hypervisors
• Multicore Programming Practices
• Resource Management
![Page 4: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/4.jpg)
Multicore Issues to Solve
• Concurrent programming
• Communications, synchronization, resource management between/among cores
• Performance analysis
• Debugging
• Distributed power management
• OS virtualization
• Modeling and simulation
• Load balancing
• Algorithm partitioning
![Page 5: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/5.jpg)
Multicore Benchmarking Rules
• Do not rely on a single answer
• Match your application requirements
– Small or large data sets
– Few or many threads
– Dependencies
– OS overhead
![Page 6: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/6.jpg)
Benchmarking Multicore – What’s Important?
• Measuring scalability
• Memory and I/O bandwidth
• Inter-core communications
• OS scheduling support
• Efficiency of synchronization
• System-level functionality
![Page 7: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/7.jpg)
EEMBC Multicore Strategy
• Evaluation and future development of scalable SMP architectures
– Includes multicore and manycore
• Measure impact of parallelization and scalability across both data processing and computationally-intensive tasks
![Page 8: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/8.jpg)
Some Results
MultiBench Results, Quad Core
[Chart: speedup (y-axis, 0 to 4.5) vs. number of concurrent streams (x-axis: 1, 2, 4, 8); series: 64M-check-reassembly, 64M-cmykw2, 64M-rotatew2, 64M-tcp-mixed]
• Huge drop in performance when oversubscribed
• Nice scaling on networking-only workloads
• Some benchmarks plateau earlier than expected
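A speedup curve like the one on this slide can be derived from raw throughput figures by normalizing to the single-stream run. The sketch below assumes throughput is recorded as iterations per second for each stream count. The numbers are invented for illustration and are not MultiBench results, and the helper names `speedups` and `first_drop` are hypothetical.

```python
def speedups(throughput_by_streams):
    """Return speedup relative to the single-stream run for each stream count."""
    baseline = throughput_by_streams[1]
    return {n: t / baseline for n, t in sorted(throughput_by_streams.items())}


def first_drop(speedup_by_streams):
    """Return the first stream count whose speedup falls below the previous one, or None."""
    previous = None
    for n, s in sorted(speedup_by_streams.items()):
        if previous is not None and s < previous:
            return n
        previous = s
    return None


# Hypothetical iterations-per-second figures, not measured MultiBench data.
measured = {1: 100.0, 2: 190.0, 4: 340.0, 8: 120.0}
curve = speedups(measured)
print(curve)
print("first drop at", first_drop(curve), "streams")
```

With these made-up figures the curve climbs through four streams and falls at eight, which is the shape an oversubscribed quad core would be expected to show.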
![Page 9: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/9.jpg)
Two Processor System Utilizing Single Memory Controller
[Diagram labels: Quad Core Processor 1; Quad Core Processor 2; Intel Front Side Bus; North Bridge; DDR2 Interface]
Processors 1 and 2 must always arbitrate for memory via their front side bus connection through the North Bridge.
![Page 10: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/10.jpg)
Dual Quads vs. Single Quad
1. Find max values for each workload
2. Ratio of all scores
[Chart: score ratio per workload, y-axis 0 to 2.5. Workloads on the x-axis: 64M-check-reassembly, 64M-check-reassembly-tcp, 64M-check-reassembly-tcp-cmykw2-rotatew2, 64M-check-reassembly-tcp-h264w2, 64M-cmykw2, 64M-cmykw2-rotatew2, 64M-rotatew2, 64M-tcp-mixed, 64M-x264-1worker, 64M-x264-2workers, 64M-x264-4workers, 64M-x264-8workers, ippktcheck-64M-1Worker, ipres-72M1worker, ipres-72M2worker, md5-32M1worker, md5-32M2worker, md5-32M4worker, rgbcmyk-5x12M1workers, rgbcmyk-5x12M2workers, rgbcmyk-5x12M4workers, rgbcmyk-5x12M8workers, rotate-16x4Ms1w1, rotate-16x4Ms1w2, rotate-16x4Ms1w4, rotate-16x4Ms1w8, rotate-16x4Ms32w1, rotate-16x4Ms32w2, rotate-16x4Ms32w4, rotate-16x4Ms32w8, rotate-16x4Ms4w1, rotate-16x4Ms4w2, rotate-16x4Ms4w4, rotate-16x4Ms4w8, rotate-34kX512-90deg, rotate-color-4M-90deg]
No Workloads Yield 2x Scaling
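The two steps on this slide (take the best score per workload, then form a ratio) can be sketched as follows. The slide does not spell out the direction of the ratio, so the sketch assumes it is the best dual-processor score divided by the best single-processor score. The scores are invented placeholders and the function names are hypothetical.

```python
def best_scores(runs):
    """Map each workload to its highest score across all tested concurrency levels."""
    return {workload: max(scores) for workload, scores in runs.items()}


def scaling_ratios(dual, single):
    """Divide the best dual-processor score by the best single-processor score."""
    best_dual, best_single = best_scores(dual), best_scores(single)
    return {w: best_dual[w] / best_single[w] for w in best_single if w in best_dual}


# Hypothetical scores per workload (one entry per concurrency level); not EEMBC data.
single_quad = {"64M-rotatew2": [10.0, 18.0, 30.0], "md5-32M4worker": [5.0, 9.0, 16.0]}
dual_quad = {"64M-rotatew2": [10.0, 19.0, 36.0], "md5-32M4worker": [5.0, 10.0, 28.0]}

ratios = scaling_ratios(dual_quad, single_quad)
for workload, ratio in ratios.items():
    print(f"{workload}: {ratio:.2f}x")
```

Using the maximum per workload, rather than the score at a fixed thread count, lets each configuration be judged at its own best operating point.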
![Page 11: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/11.jpg)
Two Processor System Utilizing Dual Memory Controllers
[Diagram labels: Quad Core Processor 1; Quad Core Processor 2; Link; DDR2 Interface (two); Direct Access; Shared Access; Doubly Shared Access]
![Page 12: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/12.jpg)
Two Processor System Utilizing Single Memory Controller
[Diagram labels: Quad Core Processor 1; Quad Core Processor 2; Link; DDR2 Interface]
• Processor 1 must always access memory by traversing link to Processor 2
• Requires arbitration to access Processor 2’s memory
• Processor 2 always has prioritized access to this memory since it is directly attached.
• Affinity can help performance
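As a rough illustration of the affinity point, the sketch below pins the calling process to one CPU using the scheduler-affinity calls in Python's `os` module, which are available on Linux only. Which CPU numbers sit on the processor with the directly attached memory depends on the platform, so the choice of the lowest-numbered allowed CPU is purely an assumption, and the helper name `pin_to_one_cpu` is hypothetical.

```python
import os


def pin_to_one_cpu():
    """Pin the calling process to a single allowed CPU and return the new CPU set.

    Returns None where the affinity calls are unavailable or not permitted.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        allowed = os.sched_getaffinity(0)   # CPUs this process may currently use
        target = min(allowed)               # assumption: lowest-numbered CPU is the one we want
        os.sched_setaffinity(0, {target})   # pid 0 means "the calling process"
        return os.sched_getaffinity(0)
    except OSError:
        return None


# Remember the original mask so the demonstration leaves the process unchanged.
original = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
pinned = pin_to_one_cpu()
print("pinned to:", pinned)
if original is not None and pinned is not None:
    os.sched_setaffinity(0, original)
```

On a real two-socket system the target set would be chosen from the cores of the processor that owns the memory controller, so that the workload avoids traversing the inter-processor link.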
![Page 13: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/13.jpg)
Dual Quad Cores – Single Memory Controller
[Chart: y-axis 0 to 4.5; x-axis 1, 2, 3, 4, 6, 8, 12, 16, 20; series: 64M-cmykw2, 64M-cmykw2-rotatew2, 64M-rotatew2]
Dual Quad Cores – Dual Memory Controllers
[Chart: y-axis 0 to 4.5; x-axis 1, 2, 3, 4, 6, 8, 12, 16, 20; series: 64M-cmykw2, 64M-cmykw2-rotatew2, 64M-rotatew2]
![Page 14: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/14.jpg)
[Chart: y-axis 0 to 4; x-axis 1 to 9; series: rotate-color-4M-90deg (DS), rotate-color-4M-90deg (DD)]
• Rotation by multiple workers cooperating to process a single image.
• Each worker thread acquires slices and writes them to the output buffer.
• Potential bottlenecks related to memory interfaces and synchronization between worker threads.
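The cooperative pattern described above, where workers claim slices of one image and write into a shared output buffer, might look like the following sketch. It is not the MultiBench implementation: the slice size, the lock-protected row counter, and the function name are all assumptions. In CPython the global interpreter lock also keeps these threads from copying pixels in parallel, so the code shows the structure rather than the speedup.

```python
import threading


def rotate_90_clockwise(image, workers=4, slice_rows=2):
    """Rotate a row-major 2D list by 90 degrees using cooperating worker threads."""
    height, width = len(image), len(image[0])
    output = [[None] * height for _ in range(width)]  # rotated shape is width x height
    next_row = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_row
        while True:
            with lock:  # synchronization point: claim the next unprocessed slice
                start = next_row
                next_row += slice_rows
            if start >= height:
                return
            for r in range(start, min(start + slice_rows, height)):
                for c in range(width):
                    # Source pixel (r, c) lands at (c, height - 1 - r).
                    output[c][height - 1 - r] = image[r][c]

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output


small = [[1, 2], [3, 4]]
print(rotate_90_clockwise(small))
```

Each slice maps to a disjoint set of output cells, so the only shared state that needs protection is the slice counter. That counter and the memory traffic of the copy loop correspond to the two bottlenecks named on the slide.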
![Page 15: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/15.jpg)
Dual Quad Cores – Single Memory Controller
[Chart: y-axis 0 to 8; x-axis 1 to 9; series: rotate-16x4Ms32w1, rotate-16x4Ms32w2, rotate-16x4Ms32w4, rotate-16x4Ms32w8]
Dual Quad Cores – Dual Memory Controllers
[Chart: y-axis 0 to 8; x-axis 1 to 9; series: rotate-16x4Ms32w1, rotate-16x4Ms32w2, rotate-16x4Ms32w4, rotate-16x4Ms32w8]
![Page 16: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/16.jpg)
The Multicore Association Roadmap
Communications
– MCAPI: ultra-lightweight
Resource Management
– Memory management
– Basic synchronization
– Resource registration
– Resource partitioning
Task Management
– Task scheduling
The Four MCA Pillars
[Diagram labels: Virtualization (or OS); Communication; Resource Management; Task Management; Debug; Multicore System; Adopted stds; MCA Foundation APIs]
Value Added Functions
• Languages
• Programming Models
• Design Environments
• Application Generators
• Benchmarks
Services
• Load Balancing
• System Mgt.
• Power Mgt.
• Reliability
• Quality of Service
![Page 17: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/17.jpg)
Why Virtualize?
• Hardware consolidation
• Migration and hosting of legacy applications
• Resource management and balancing
• Faster provisioning
• Fault tolerance
![Page 18: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/18.jpg)
Virtualized Multicore System
Fractional CPU assignment for background tasks such as firmware update/system health monitor/power management
Benchmarking involves many system-level elements
[Diagram labels: Hardware Peripherals; ARM Core #1 to ARM Core #4; TRANGO #1 to TRANGO #4; VPU A to VPU G; Core #1 to Core #4; Firmware Update; App (multiple instances); SMP; SMP RTOS; Proprietary Application]
![Page 19: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/19.jpg)
EEMBC Hypermark
Important Metrics
• Overall performance overhead of a hypervisor, i.e., CPU loading
• Static footprint (code and data size)
• Jitter
• Interrupt latency
• Comparison to native performance
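To make a few of these metrics concrete, the sketch below computes hypervisor overhead relative to a native run and summarizes a set of interrupt latency samples. It is not the Hypermark methodology. Jitter is taken here as the peak-to-peak spread of the samples, which is only one of several common definitions, and all numbers and function names are invented for illustration.

```python
import statistics


def overhead_percent(native_score, virtualized_score):
    """Performance lost under the hypervisor, as a percentage of the native score."""
    return 100.0 * (native_score - virtualized_score) / native_score


def latency_summary(latencies_us):
    """Summarize interrupt latency samples given in microseconds."""
    return {
        "mean": statistics.mean(latencies_us),
        "worst": max(latencies_us),
        # Assumption: jitter defined as peak-to-peak spread of the samples.
        "jitter": max(latencies_us) - min(latencies_us),
    }


# Hypothetical benchmark scores and latency samples, not measured data.
overhead = overhead_percent(native_score=200.0, virtualized_score=188.0)
summary = latency_summary([4.0, 5.0, 4.5, 9.0, 4.5])
print(f"overhead: {overhead:.1f}%")
print(summary)
```

Reporting the worst case alongside the mean matters for real-time guests, where a single late interrupt can be more significant than the average.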
![Page 20: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/20.jpg)
Multicore Programming Practices (MPP)
• Long term
– Continue research into languages, methodologies, etc.
• Short term
– How today’s embedded C/C++ code may be written to be “multicore ready”
• Influence of a group of like-minded methodology experts to ensure completeness, usefulness and industry-wide compatibility
• Creation of a standard “best practices” guide through a recognized, neutral industry body
– Based on capturing current best practices
![Page 21: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/21.jpg)
MPP Scope & Approach
[Diagram: Sequential C/C++ mapped to three target architectures]
• Architecture 1, e.g. Shared Memory
• Architecture 2, e.g. Programmer Managed Memory
• Architecture 3, e.g. Message Passing
• Focus on existing C/C++ without extensions, targeting current architectures
• Draw up framework of common pitfalls when transitioning from serial to parallel
• Consider solutions or avoidance tactics
• Analyze performance of solution
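One commonly cited pitfall when moving from serial to parallel code is several workers updating a shared accumulator. One avoidance tactic is to give each worker its own chunk and combine the partial results afterwards. The sketch below illustrates that tactic in Python for brevity, although the MPP effort itself targets C/C++. It is not taken from the MPP guide, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum_of_squares(values):
    """Reference serial implementation."""
    total = 0
    for v in values:
        total += v * v
    return total


def partition(values, parts):
    """Split values into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(values) // parts))  # ceiling division, never zero
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, workers=4):
    """Each worker reduces its own chunk; partial results are combined afterwards.

    No worker writes to shared state, so no lock is needed and no update can be lost.
    """
    chunks = partition(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(serial_sum_of_squares, chunks))
    return sum(partials)


data = list(range(1000))
print(serial_sum_of_squares(data), parallel_sum_of_squares(data))
```

Comparing the parallel result against the serial reference, as the last line does, is a simple way to check that a transformation preserved behavior before analyzing its performance.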
![Page 22: Benchmarks for Analyzing Embedded Multicore Processor](https://reader031.vdocument.in/reader031/viewer/2022013109/5868ddce1a28ab01578bdc08/html5/thumbnails/22.jpg)
Summary
• Performance analysis highlights benefits and pitfalls
• EEMBC licensing and membership
• Portability prevents entrapment
• Multicore Association standards and working groups