DISSERTATION RESEARCH PLAN
Mitesh Meswani
Outline
Dissertation Research Update
- Previous Approach and Results
- Modified Research Plan
- Identifying Resources
- Identifying Signatures
- Performance Counters for Profiling
- Representative Tracing and Validation
Previous Methodology
- Trace selection: trace the steady-state execution of the benchmark suite, using CPI to measure representativeness; one trace per benchmark.
- Simulate the traces under different SMT knob settings, recording the best setting for each pair of traces.
- Use regression modeling techniques to build an analytical model that predicts the best settings for a pair.
- Demonstrate the model's effectiveness by predicting settings for traces from other benchmarks.
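The trace-selection and best-setting steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the method actually used: it assumes the full run is sampled into equal-length intervals with one CPI value each, treats the window whose mean CPI is closest to the whole-run mean as the representative steady-state trace, and scores each knob setting by a hypothetical combined-IPC figure from simulation. The function names and setting labels are hypothetical.

```python
from statistics import mean


def select_trace_window(cpi_samples, window):
    """Return the start index of the contiguous window of `window` intervals
    whose mean CPI is closest to the mean CPI of the whole run.
    (Assumed representativeness test; one CPI value per equal-length interval.)"""
    if not 0 < window <= len(cpi_samples):
        raise ValueError("window must be between 1 and len(cpi_samples)")
    target = mean(cpi_samples)
    best_start, best_err = 0, float("inf")
    for start in range(len(cpi_samples) - window + 1):
        err = abs(mean(cpi_samples[start:start + window]) - target)
        if err < best_err:
            best_start, best_err = start, err
    return best_start


def best_setting_per_pair(results):
    """results maps (trace_a, trace_b) -> {knob_setting: combined_ipc}.
    Combined IPC as the score is an assumption; returns the
    highest-scoring setting for each pair."""
    return {pair: max(by_setting, key=by_setting.get)
            for pair, by_setting in results.items()}
```

For example, with per-interval CPI values [2.0, 1.2, 1.1, 1.3, 1.2, 0.9] and a two-interval window, the window starting at index 3 (mean 1.25) is closest to the run mean of about 1.28. The resulting (pair, best setting) records are the kind of training data a regression or decision-tree model would consume.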
Recap of Previous Results
Models built using decision trees for SPEC CPU2000 and Stream:
- Prediction accuracy for SMT mode: 97.5%
- Prediction accuracy for SMT thread priority: 83%
Modified Plan Summary
- Represent a benchmark's use of the relevant shared resources.
- Identify signatures of shared-resource usage within benchmarks using performance counters.
- Use traces that represent the signatures of shared-resource usage covering 80% of each benchmark's execution.
- Finally, identify the best SMT knob settings for the representative traces.
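The 80% coverage step can be sketched as a greedy selection. This assumes each sampled interval of a benchmark has already been labeled with a signature and that all intervals have equal length; picking signatures in decreasing frequency then yields the fewest signatures that reach the target. The function name is hypothetical.

```python
from collections import Counter


def covering_signatures(interval_signatures, target=0.80):
    """Pick the most frequent signatures until together they account for at
    least `target` of the sampled intervals.

    Returns (chosen signatures, fraction of intervals they cover).
    Assumes every sampled interval has the same length."""
    counts = Counter(interval_signatures)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    for signature, n in counts.most_common():
        if covered / total >= target:
            break  # target already reached
        chosen.append(signature)
        covered += n
    return chosen, covered / total
```

Each chosen signature would then need one representative trace.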
Shared Resources
Shared resources (seven): TLB, Cache Memory (L2, L3), Branch Unit, FP Unit, FXU Unit, Compare-register Unit, Branch prediction hardware (history table).
How many resources to consider? Analyze the current traces and eliminate resources that contribute less than a threshold value to the cycles spent in shared resources:
- The Compare-register Unit is not significant.
- The Branch Unit is also not significant.
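The elimination step can be sketched as a share-of-total filter. The slide does not give the threshold value, so it is a required parameter here; the resource labels are hypothetical and the cycle counts in the test are illustrative, not measured data.

```python
def significant_resources(stall_cycles, threshold):
    """stall_cycles maps resource name -> cycles attributed to that resource.

    Keep resources whose share of all shared-resource cycles is at least
    `threshold` (a fraction, e.g. 0.01 for 1%). Returns sorted names."""
    total = sum(stall_cycles.values())
    if total == 0:
        return []
    return sorted(r for r, c in stall_cycles.items() if c / total >= threshold)
```

With the compare-register and branch units falling below the threshold, five resources remain, which is the basis for the signature levels that follow.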
Signatures
How many? A resource may have a mild, moderate, or high contribution to the cycles spent in shared resources.
Idea: with five resources (the seven above minus the two found insignificant), an equal contribution would be 100/5 = 20% of cycles per resource. Using this as the basis:
- Mild: 1% to 15%
- Moderate: 16% to 24%
- High: greater than 24%
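The level thresholds can be written as a small classifier. The slide leaves two gaps, so this sketch assumes shares between 15% and 16% count as mild and shares below 1% as negligible; the function name is hypothetical.

```python
def contribution_level(share_percent):
    """Map a resource's share (%) of shared-resource cycles to a level.

    Slide thresholds: mild 1-15, moderate 16-24, high > 24.
    Assumptions: < 1% is 'negligible'; 15 < share < 16 counts as 'mild'."""
    if share_percent < 1:
        return "negligible"
    if share_percent < 16:
        return "mild"
    if share_percent <= 24:
        return "moderate"
    return "high"
```

Under these assumptions, five resources with three levels each give 3^5 = 243 possible signatures (4^5 = 1024 if "negligible" is treated as a fourth level).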
Finding Signatures
- Profile the benchmark execution to find the cycles spent in the monitored shared resources.
- Using performance counters, sample the counters periodically.
- Categorize the benchmark execution (SPEC CPU2000) into one of the possible permutations.
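Categorizing sampled execution can be sketched as turning each sampling period into a tuple of per-resource levels and counting the tuples. The five resource labels are hypothetical names for the resources remaining after elimination, and the level thresholds repeat the assumptions made for the Signatures slide.

```python
from collections import Counter

# Hypothetical labels for the five resources left after dropping the
# compare-register unit and the branch unit.
RESOURCES = ("TLB", "Cache", "FPU", "FXU", "BranchHistory")


def level(share_percent):
    """Assumed thresholds: <1 negligible, <16 mild, <=24 moderate, else high."""
    if share_percent < 1:
        return "negligible"
    if share_percent < 16:
        return "mild"
    if share_percent <= 24:
        return "moderate"
    return "high"


def interval_signature(sample):
    """sample maps resource name -> cycles counted in one sampling period.
    The signature is the tuple of per-resource levels, in RESOURCES order."""
    total = sum(sample.get(r, 0) for r in RESOURCES)
    if total == 0:
        return ("negligible",) * len(RESOURCES)
    return tuple(level(100 * sample.get(r, 0) / total) for r in RESOURCES)


def profile_signatures(samples):
    """Count how many sampling periods fall into each signature."""
    return Counter(interval_signature(s) for s in samples)
```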
Finding Signatures (continued)
Profiling benchmark execution: only six counters are allowed per execution. What are the counts for a sample period?
- Merge them from different executions?
- Use the highest sampling rate?
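One way to explore the first option is to merge per-interval samples from several runs by interval index. This sketch assumes every run uses the same fixed sampling period and that execution is deterministic enough for interval i to cover the same part of the program in every run; both are assumptions that would need checking. Runs are truncated to the shortest one, and a counter recorded in more than one run is rejected rather than reconciled.

```python
MAX_COUNTERS_PER_RUN = 6  # limit stated on the slide


def merge_runs(runs):
    """runs: list of runs; each run is a list of per-interval dicts
    {counter_name: count}. Returns one merged dict per interval."""
    if not runs:
        return []
    for run in runs:
        for sample in run:
            if len(sample) > MAX_COUNTERS_PER_RUN:
                raise ValueError("a run can record at most six counters")
    n = min(len(run) for run in runs)  # truncate to the shortest run
    merged = []
    for i in range(n):
        combined = {}
        for run in runs:
            for name, count in run[i].items():
                if name in combined:
                    raise ValueError(f"counter {name} recorded in more than one run")
                combined[name] = count
        merged.append(combined)
    return merged
```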
Perf Counters to Collect Data
Identified counters:
- FP: completion stalls due to FPU (CMPLU_STALLS_FPU)
- FXU: completion stalls due to FXU (CMPLU_STALLS_FXU)
Derived counters:
- LSU stalls = total stalls in LSU - stalls due to d-cache misses - stalls due to d-TLB misses
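The identified and derived counters can be combined into per-resource stall cycles for one sampling period. CMPLU_STALLS_FPU and CMPLU_STALLS_FXU are the names given on the slide; LSU_TOTAL_STALLS, DCACHE_MISS_STALLS and DTLB_MISS_STALLS are hypothetical placeholder keys, not actual POWER5 event names.

```python
def resource_stalls(counts):
    """Per-resource completion-stall cycles for one sampling period.

    LSU stalls = total LSU stalls - d-cache miss stalls - d-TLB miss stalls.
    LSU_TOTAL_STALLS, DCACHE_MISS_STALLS, DTLB_MISS_STALLS are hypothetical keys."""
    lsu = (counts["LSU_TOTAL_STALLS"]
           - counts["DCACHE_MISS_STALLS"]
           - counts["DTLB_MISS_STALLS"])
    return {
        "FPU": counts["CMPLU_STALLS_FPU"],
        "FXU": counts["CMPLU_STALLS_FXU"],
        # Clamp at zero in case the inputs are inconsistent
        # (for example, if they were merged from separate runs).
        "LSU": max(0, lsu),
    }
```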
Perf Counters to Collect Data (continued)
Unsolved:
TLB:
- Total d-TLB misses and total i-TLB misses are known, but the miss resolution sites are not.
- Total cycles spent accessing the d-TLB are known, but they include the cost of both hits and misses.
Caches:
- L2 and L3 hits for data and instructions are known, but the penalty estimated from them may be greater than the actual penalty, because execution overlaps misses, or misses occur down mispredicted branches.
- Maybe use the d-cache miss penalty and i-cache miss penalty counters on POWER5, which are counted only if completion is stalled.
Branch History:
- Affects prediction; a counter is available to count cycles in which a misprediction stalls completion.
Representative Traces
- Collect traces, if required, that represent the signatures found in benchmark profiling.
- Use the performance data from simulation of single traces to verify the signatures.
- Collect data for evaluating SMT knobs on the representative traces.
Validation
- Use scientific applications to verify whether the signatures cover 80% of their execution.
- TO DO: Identify test applications.
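The validation check can be sketched as the fraction of an application's sampled intervals whose signature is already among the representative signatures. This assumes the application is profiled and labeled the same way as the benchmarks, with equal-length intervals; the function names are hypothetical.

```python
def signature_coverage(app_signatures, known_signatures):
    """Fraction of an application's sampled intervals whose signature is one
    of the known (representative) signatures."""
    if not app_signatures:
        return 0.0
    known = set(known_signatures)
    hits = sum(1 for s in app_signatures if s in known)
    return hits / len(app_signatures)


def is_covered(app_signatures, known_signatures, target=0.80):
    """True if at least `target` of the execution is covered."""
    return signature_coverage(app_signatures, known_signatures) >= target
```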