design space exploration and application autotuning for runtime...
TRANSCRIPT
![Page 1: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/1.jpg)
Design Space Exploration and Application Autotuning for Runtime Adaptivity in Multicore Architectures
Cristina SilvanoPolitecnico di [email protected]
![Page 2: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/2.jpg)
OutlineResearch challenges in multicore architecturesAutomatic design space explorationApplication autotuning for runtime adaptivitySummary
2
![Page 3: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/3.jpg)
RESEARCH CHALLENGES
![Page 4: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/4.jpg)
Energy efficiency underlies all markets Energy efficiency is of paramount importance for all
application markets (automotive, consumer, mobile, healthcare and beyond) and target systems spanning from sensors, cyber‐physical systems, embedded systems up to servers and HPC systems.
![Page 5: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/5.jpg)
Squeezing of computing cores
201122 nm
200932 nm
200745 nm
200565 nm1.4 mm2
Source:ARM9 STMicroelectronics
201314 nm
5
![Page 6: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/6.jpg)
201122 nm
200932 nm
200745 nm
200565 nm1.4 mm2
Source:ARM9 STmicroelectronics
201314 nm
… entering the multi/many‐core era
5
![Page 7: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/7.jpg)
A wide range of architecture parameters must be tuned to find the best system configuration.
Design space of the target architecture A should consider all possible configurations of each parameters pi :
A = Sp1 x Sp2 x …. x Spn
Large design space: 218 (262 144) design points
The design space in the age of multi/many‐cores
Parameter Min Max
#Processors 2 16
#Threads 1 4
L1 I$ size 2K 16K
L1 D$ size 2K 16K
L1 I$ associativity 1w 8w
L1 D$ associativity 1w 8w
L2 $ size 32K 256K
L2 $ associativity 1w 8w
6
![Page 8: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/8.jpg)
Full search simulation time
FULL SEARCHCan become quickly unfeasible ~10 minutes
per simulation*
262 144 design pointsx
8 data sets=
2 097 152 simulations
10simulation time on 4 parallel cores
* Using a cycle-accurate simulator
Years
8
7
![Page 9: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/9.jpg)
Research Challenges1) What are the best tradeoffs to design many‐cores in terms of
multiple objectives such as energy and performance?Need for automatic Design Space Exploration of many‐core architectures
8
![Page 10: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/10.jpg)
What are the barriers of further scaling?
Transistor density increases ~2x every 2 years
Frequency wall
Power wall
Utilisation wall
… the end of the Dennard scaling… entering the dark silicon era
9
![Page 11: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/11.jpg)
Shift from the power wall……to the programmability wall
STMicrolectronics P2012
The Problem of Computation Offloading on Heterogeneous Parallel Platforms
10
![Page 12: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/12.jpg)
Mobile Application Workloads
* Source: Flurry Analytics
11
![Page 13: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/13.jpg)
Mobile Application Workloads Short bursts of high
intensity computationand power consumption
Long periods of sustained high intensity
Long-use low-intensity
![Page 14: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/14.jpg)
Runtime Resource Management
Amit Kumar Singh, Muhammad Shafique, Akash Kumar, and Jörg Henkel. Mapping on multi/many‐core systems: Survey of current and emerging trends. In Proceedings of the 50th Annual Design Automation Conference (DAC). 2013.
RTRM
App1 App2 App3
13
Allocation
Mapping
44 44 66
…
Target Heterogeneous Platform
![Page 15: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/15.jpg)
Research Challenges1) What are the best trade‐offs to design many‐cores in terms of
multiple objectives such as energy and performance?Need for automatic Design Space Exploration of many‐core architectures
2) Are the system resources used efficiently?Need for runtime resource management
14
![Page 16: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/16.jpg)
Tunable Applications
One or more application parameters, code transformations and code variants (applicationknobs) can be tuned at runtime Adaptivity to adjust the application behavior to the changing operating conditions, usage contexts and resource availability
One or more application parameters, code transformations and code variants (applicationknobs) can be tuned at runtime Adaptivity to adjust the application behavior to the changing operating conditions, usage contexts and resource availability
Application Knobs
15
![Page 17: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/17.jpg)
Tunable Applications
One or more application parameters, code transformations and code variants (applicationknobs) can be tuned at runtime Adaptivity to adjust the application behavior to the changing operating conditions, usage contexts and resource availability
Approximate computing: output just needs to be “good enough” trading off accuracy and throughput
One or more application parameters, code transformations and code variants (applicationknobs) can be tuned at runtime Adaptivity to adjust the application behavior to the changing operating conditions, usage contexts and resource availability
Approximate computing: output just needs to be “good enough” trading off accuracy and throughput
Application Knobs
15
![Page 18: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/18.jpg)
Research Challenges1) What are the best trade‐offs to design many‐cores in terms of
multiple objectives such as energy and performance?Need for automatic Design Space Exploration of many‐core architectures
2) Are we using the system resources efficiently?Need for runtime resource management
3) How can we tune applications at runtime by using application knobs?Need for application autotuning
16
![Page 19: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/19.jpg)
AUTOMATIC DESIGN SPACEEXPLORATION
![Page 20: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/20.jpg)
Objective function: To minimize both energy (x) and execution time (x) of the target application on system configurations x:
where is the design space. The solution is a set of tradeoff configurations p
known as Pareto set
The multi‐objective optimisation problem
(x)
(x)
18
![Page 21: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/21.jpg)
Multi‐objective exploration: Pareto Points
1.5
2
2.5
3
3.5
4
4.5
100 200 300 400 500 600
dominated design point
Pareto point
Multi‐Objective Exploration: best designs are not unique.Pareto points provide tradeoffs with respect to the multiple objectives
Throughput
Error
23
19
![Page 22: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/22.jpg)
The concepts behind the automatic DSE
Input Variables:architecture
parameters that define the
design space
The black box generates the output values accordingly to the inputs.
Output Variables define the objective space
The black box can be:
1) A simulator that models the system behavior and generates output values
2) A set of solvers that modelsthe system behavior andestimates output values
(x)
(x)
![Page 23: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/23.jpg)
MULTICUBE Explorer is an open source Multi‐Objective Design Space Exploration (DSE) framework to tune multi‐core architectures
Efficiency: MULTICUBE Explorer is a design automation and acceleration tool to minimize the numbers of simulations to be executed during the DSE process
Flexibility: MULTICUBE Explorer is a design optimization and exploration tool where you can easily plug‐in your own simulator, optimization algorithms and machine learning techniques.
MULTICUBE Explorer is not “yet another simulator” (there are already many simulators….)
MULTICUBE Explorer: What is it?
wwww.multicube.eu
21
![Page 24: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/24.jpg)
Multicube Explorer: Overview
Architecturesand
Applications
Architecturesand
Applications
Architecturesand
Applications
22
![Page 25: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/25.jpg)
• Efficiency of automatic DSE process can be improved by:
1. Minimizing the numbers of simulations to be executed by using exploration heuristics such as state‐of‐art evolutionary algorithms
2. Speeding up simulations
3. Simulating at higher abstraction levels
4. Defining an analytical response model of the system behavior based on a subset of simulations to predict the unknown system response
Automatic Design Space Exploration: Efficiency
23
![Page 26: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/26.jpg)
1. Design of Experiments (DoEs): To identify the experimentation plan: how to select the design points in the design space to be simulated
2. Optimisation Algorithms:Meta‐heuristics methods inspired by analogies with physics, or with biology to solve multi‐objective optimization problems: simulated annealing, genetic algorithms, evolutionary strategies, etc.
3. Response Surface Modeling (RSM): To use the set of simulated points to obtain an analytical model of the system behavior: linear regression, spline interpolation, artificial neural network, etc.
MULTICUBE Explorer
24
![Page 27: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/27.jpg)
“ReSPIR: A Response Surface‐Based Pareto Iterative Refinement for Application‐Specific Design Space Exploration”, Palermo, G.; Silvano, C.; Zaccaria, V., IEEE Transactions on Computer‐Aided Design of Integrated Circuits and Systems, Volume 28, Issue 12, Dec. 2009 Page(s): 1816 – 1829
How to combine DoEs and Response Surface Models? ReSPIR: RSM‐Support Iterative Pareto Refinement
![Page 28: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/28.jpg)
Why automatic design space exploration?
Faster exploration time
Better quality of results
Benefits
26
![Page 29: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/29.jpg)
Comparison Results: ReSPIR vs MOSA & NSGA‐II
0.0
2.0
4.0
6.0
8.0
10.0
12.0
14.0
16.0
18.0
20.0
1.0 2.0 3.0
MOSA
NSGA-II
ReSPIR
Accuracy[% ADRS]
Design Space Analyzed [%]
27
MIPS‐based shared memory multi‐core Design space composed of 217 design points (131 072) Accuracy in terms of Average Distance from Reference Set:
the lower the better approximated Pareto set
![Page 30: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/30.jpg)
The Multi‐View Case Study: The human eye stereo matching
2 eyes → third dimension
28
Design‐time optimization of OpenCL applications
![Page 31: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/31.jpg)
Quality of Results: Pixel Disparity Error
Left camera Right camera
Reference disparity map
1
2
3
Disparity Error
Disparity Error
Application Knobs
29
stereo-matchingstereo-matching
![Page 32: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/32.jpg)
Design Space Exploration (DSE)
design time
30
![Page 33: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/33.jpg)
Parametric implementation:
Customization of application‐specific and platform parameters
Full search can become quickly unfeasible:
Multi‐dimensional large design space
Slow simulation platforms
31
E. Paone, G. Palermo, V. Zaccaria, C. Silvano, D. Melpignano, G. Haugou and T. Lepley, "An Exploration Methodology for a Customizable OpenCL Stereo-Matching Application Targeted to an Industrial Multi-Cluster Architecture", in Proc. CODES+ISSS 2012.
E. Paone, F. Robino, G. Palermo, V. Zaccaria, I. Sander and C. Silvano, "Customization of OpenCL Applications for Efficient Task Mapping under Heterogeneous Platform Constraints", In Proceedings of DATE 2015.
Design‐time optimization of OpenCL applications
![Page 34: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/34.jpg)
Parametric implementation:
Customization of application‐specific and platform parameters
Full search can become quickly unfeasible:
Multi‐dimensional large design space
Slow simulation platforms
design space
Design‐time optimization of OpenCL applications
![Page 35: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/35.jpg)
Parametric implementation:
Customization of application‐specific and platform parameters
Full search can become quickly unfeasible:
Multi‐dimensional large design space
Slow simulation platforms
OP1OP2
OP3
OP4
OP5
OP6
OP7
design space
Design‐time optimization of OpenCL applications
![Page 36: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/36.jpg)
APPLICATION AUTOTUNING
![Page 37: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/37.jpg)
Autonomous Video‐surveillance System
Video Frame Rate Video Resolution
33
Application knobs can be used at runtime to trade‐off the throughput and the quality of results
![Page 38: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/38.jpg)
Multiple Cameras
Video Frame Rate
Video Resolution
Autonomous Video-surveillance
System
Multiple instances of the same application
34
![Page 39: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/39.jpg)
ARGO Application Autotuning Key idea is that most of the applications are dynamically
configurable in terms of a set of tunable parameters, code transformations and code variants (application knobs) to trade‐off accuracy and latency
Error
Latency
35
![Page 40: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/40.jpg)
design-time
run-time
36
Error
Latency
![Page 41: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/41.jpg)
ARGO Application Autotuning ARGO is a light‐weight application autotuning framework for
multi/many‐core platforms in an adaptive multi‐application environment.
Combination of design‐time and run‐time techniques to create an effective way of “self‐aware” computing with limited runtime overhead.Orthogonality between application autotuning and runtime management of system resources
E. Paone, D. Gadioli, G. Palermo, V. Zaccaria, C. Silvano. “Evaluating Orthogonality between Application Auto-Tuning and Run-Time Resource Management for Adaptive OpenCL Applications”, Proc. of IEEE ASAP 2014.
D. Gadioli, G. Palermo, C. Silvano, “Application Autotuning to Support Runtime Adaptivity in Multicore Architectures”, in Proc. IEEE SAMOS 2015.
37
![Page 42: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/42.jpg)
Design‐time explorationDesign‐time exploration phase based on code profiling to define the effects of application knobs and performance metrics (OPs) used to build a prediction model of the application behavior
38
Working Modesrepresent max
resource allocationassociated to OPs
Operating Pointscombine application
knobs and performance metrics
Powerconsumption
Performance
Accuracy
![Page 43: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/43.jpg)
Monitor
Execu
te
PlanAnalyze
IFTTT
Monitoring some metrics
Monitoring some metrics
Modeling application
knobs to metrics
Modeling application
knobs to metricsConfiguration
selectionConfiguration
selection
Tuning application
knobs
Tuning application
knobs
MAPE feedback loop• Application autotuning enables self‐optimization capabilities
based on Monitor‐Analyze‐Plan‐Execute (MAPE) feedback loop*
Host CPU
…
Heterogeneous Many-Core Platform
* J.O. Kephart, D.M. Chess, “The vision of autonomic computing,” Computer, 2003
39
![Page 44: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/44.jpg)
ARGO Application Autotuning FrameworkPower consumption Performance Accuracy
Application
Goals+/-
Goals+/-
RankRank
OP List
Monitors
Autotuning Framework
40
![Page 45: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/45.jpg)
ARGO Application Autotuning FrameworkPower consumption Performance Accuracy
Application
Goals+/-
Goals+/-
RankRank
OP List
Monitors
Autotuning Framework
MAPE feedback loop
40
![Page 46: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/46.jpg)
ARGO Application Autotuning FrameworkPower consumption Performance Accuracy
Application
Goals+/-
Goals+/-
RankRank
OP List
Monitors
Autotuning Framework
1) Monitor
40
![Page 47: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/47.jpg)
ARGO Application Autotuning FrameworkPower consumption Performance Accuracy
Application
Goals+/-
Goals+/-
RankRank
OP List
Monitors
Autotuning Framework
2) Analyze
40
![Page 48: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/48.jpg)
ARGO Application Autotuning FrameworkPower consumption Performance Accuracy
Application
Goals+/-
Goals+/-
RankRank
OP List
Monitors
Autotuning Framework3) Plan
40
![Page 49: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/49.jpg)
ARGO Application Autotuning FrameworkPower consumption Performance Accuracy
Application
Goals+/-
Goals+/-
RankRank
OP List
Monitors
Autotuning Framework
40
4) Execute
![Page 50: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/50.jpg)
Separation of ConcernsPower consumption Performance Accuracy
Application
Goals+/-
Goals+/-
RankRank
OP List
Monitors
Autotuning FrameworkC++ source codes
41
![Page 51: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/51.jpg)
Separation of ConcernsPower consumption Performance Accuracy
Application
Goals+/-
Goals+/-
RankRank
OP List
Monitors
Autotuning FrameworkKnowledge XML file
41
C++ source codes
![Page 52: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/52.jpg)
Separation of ConcernsPower consumption Performance Accuracy
Application
Goals+/-
Goals+/-
RankRank
OP List
Monitors
Autotuning Framework
Adaption XML file
41
App source codesKnowledge XML fileKnowledge XML file
C++ source codes
![Page 53: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/53.jpg)
Separation of ConcernsPower consumption Performance Accuracy
Application
Goals+/-
Goals+/-
RankRank
OP List
Monitors
Autotuning Framework
Adaption XML file
41
Adaptive ApplicationKnowledge XML fileKnowledge XML file
![Page 54: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/54.jpg)
Target HW Platform
Orthogonality Concept: App Autotuning & RTRM
Platform OS
Req
uests
Res
ourc
es
42
![Page 55: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/55.jpg)
Target HW Platform
Orthogonality Concept: App Autotuning & RTRM
Platform OS
Resource Availability
42
Req
uests
Res
ourc
es
![Page 56: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/56.jpg)
Run-Time Resource Manager
Orthogonality Concept: App Autotuning & RTRM
Platform OS
Target HW Platform
42
Resource Availability
![Page 57: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/57.jpg)
Experimental SetupTarget Platform Intel Xeon QuadCore CPU E5‐1607 @ 3GHz & 8GB RAM Linux 3.5 & OpenCL 1.2 runtime provided by Intel SDK 2013
Dynamic Workload DefinitionSingle application reacting to external changesMultiple instances of the same application
Metrics of interest• Throughput: Number of frames per second [fps]• Quality: Normalized disparity error w.r.t reference• Resources: Percentage of CPU used by the application
43
![Page 58: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/58.jpg)
Application Autotuning Tradeoffs
44
[ASAP2014]
![Page 59: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/59.jpg)
Application Autotuning: Dynamic Adaptation (1)
•First phase:The application isprocessing the images by respecting the requirements
45
[SAMOS2015]
![Page 60: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/60.jpg)
Application Autotuning: Dynamic Adaptation (1)
•First phase:The application isprocessing the images by respecting the requirements.
•Second phase:There is a high priority task in the system requiring CPU resources
45
[SAMOS2015]
![Page 61: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/61.jpg)
Application Autotuning: Dynamic Adaptation (1)
•First phase:The application isprocessing the images by respecting the requirements.
•Second phase:There is a high priority task in the system requiring CPU resources
•Third phase:The task has finished releasing CPU resources
45
[SAMOS2015]
![Page 62: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/62.jpg)
Application Autotuning: Dynamic Adaptation (2)
•First phase:The application isprocessing the images by respecting the requirements.
46
[SAMOS2015]
![Page 63: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/63.jpg)
Application Autotuning: Dynamic Adaptation (2)
•Second phase: Threat detected
46
[SAMOS2015]
![Page 64: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/64.jpg)
Application Autotuning: Dynamic Adaptation (2)
•Third phase:The application isprocessing the images by respecting the requirements.
46
[SAMOS2015]
![Page 65: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/65.jpg)
PLAIN LINUX Multiple ApplicationDynamic Workload
[ASAP2014]
![Page 66: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/66.jpg)
ARGO+LINUXPLAIN LINUX
35us overheadwith 100 OPs
[ASAP2014]
![Page 67: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/67.jpg)
ARGO+LINUX
35us overheadwith 100 OPs
ARGO+RTRM
1ms overheadwith 100 OPs
[ASAP2014]
![Page 68: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/68.jpg)
SummaryMULTICUBE Design Space Exploration framework and ARGO
autotuning framework to support design‐time multi‐objective optimization and runtime adaptivity in manycore architectures
Most recent journal papers: TCAD‐2015 (2), TECS‐2013, PARCO‐2013, TECS‐2012 (2), TCAD‐2012, TCAD‐2009.
Most recent conference papers: SAMOS‐2015, DATE‐2015, SAMOS‐2014, ASAP‐2014, ARC‐2014, DATE‐2014, ASP‐DAC‐2014 (2), FPL‐2013, DATE‐2013 (3), CODES‐ISSS‐2012, ReCoSoC‐2012, SASP‐2011, DAC‐2010, DATE‐2010, SASP‐2009.
EU projects:
48
www.2parma.euwww.multicube.eu
![Page 69: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/69.jpg)
Next challenge: The Green500 scenario
To reach the DARPA’s target of 20MW of Exascale supercomputers projected to 2020, current supercomputers must achieve an energy efficiency “quantum leap”, pushing towards a goal of 50 GFlops/W.
Heterogeneous systems currently dominate the top of the Green500 list and this dominance is expected to be a trend for the next coming years to reach the target of 20MW Exascale supercomputers.
To fulfil this target, energy‐efficient heterogeneous supercomputers need to be coupled with a radically new software stack capable of exploiting the benefits offered by heterogeneity to meet the scalability and energy efficiency required by the Exascale era.
www.antarex-project.eu/
The main goal of the ANTAREX project is to provide a breakthrough approach to express by a Domain Specific Language
the application self‐adaptivity and to runtime manage and autotune applications for green and heterogeneous High
Performance Computing (HPC) systems up to the Exascale level.
49
![Page 70: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/70.jpg)
Research Collaborations
![Page 71: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/71.jpg)
People
![Page 72: Design Space Exploration and Application Autotuning for Runtime …specs/projects/antarex/downloads/... · 2019. 12. 30. · Multicore Architectures”, in Proc. IEEE SAMOS 2015](https://reader033.vdocument.in/reader033/viewer/2022051510/5fedd6c1419b0c037a096186/html5/thumbnails/72.jpg)