Adapting Convergent Scheduling Using Machine
Learning
Diego Puppin*, Mark Stephenson†, Una-May O’Reilly†, Martin Martin†, and
Saman Amarasinghe†
*Institute for Information Science and Technologies, Italy
†Massachusetts Institute of Technology, USA
Outline
This talk shows how one can apply machine learning techniques to find good phase orderings for an instruction scheduler
First, I’ll introduce the scheduler that we are interested in improving
Then, I’ll discuss genetic programming
Then, I’ll present experimental results
Clustered Architectures
Memory and registers separated into clusters
RAW
Clustered VLIWs
[Figure: R4000-like processor core, operand network]
When scheduling, we try to co-locate data with computation
Convergent Scheduling
Convergent scheduling passes are symmetric
Each pass takes as input a preference map and outputs a preference map
Passes are modular and can be applied in any order
Convergent Scheduling: Preference Maps
[Figure: preference map with axes Instructions, Clusters, and Time]
Each entry is a weight
The weights correspond to the “confidence” of a space-time assignment for a given instruction
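A preference map can be pictured as a three-dimensional table of weights indexed by instruction, cluster, and time slot. The sketch below is a minimal illustration, not the authors' implementation: the function names, the nested-list layout, and the convention of normalizing each instruction's weights to sum to 1 are all assumptions.

```python
def make_preference_map(n_instructions, n_clusters, n_time_slots):
    """Uniform weights: every (cluster, time) slot starts equally preferred."""
    w = 1.0 / (n_clusters * n_time_slots)
    return [[[w] * n_time_slots for _ in range(n_clusters)]
            for _ in range(n_instructions)]

def normalize(pmap):
    """Rescale each instruction's weights so they sum to 1 (assumed convention)."""
    for inst in pmap:
        total = sum(sum(row) for row in inst)
        for row in inst:
            for t in range(len(row)):
                row[t] /= total
    return pmap

def preferred_slot(pmap, i):
    """Return the (cluster, time) pair with the highest weight for instruction i."""
    inst = pmap[i]
    slots = ((c, t) for c in range(len(inst)) for t in range(len(inst[c])))
    return max(slots, key=lambda ct: inst[ct[0]][ct[1]])

pmap = make_preference_map(n_instructions=2, n_clusters=4, n_time_slots=8)
pmap[0][2][5] *= 10  # raise confidence that instruction 0 runs on cluster 2 at time 5
normalize(pmap)
```

Here `preferred_slot(pmap, 0)` picks cluster 2 at time 5, the slot whose weight was raised.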
Example Dependence Graph
[Figure: dependence graph on four clusters, shaded from high confidence to low confidence]
Critical Path Strengthening
Path Propagation
Parallelism Distribute
Path Propagation
Final Schedule
Convergent Scheduling
“Classical” scheduling passes make absolute decisions that can’t be undone
Convergent scheduling passes make soft decisions in the form of preferences
Mistakes made early on can be undone
Passes don’t impose order!
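Because every pass takes a preference map and returns one, a phase ordering amounts to function composition. The sketch below illustrates this with a toy pass; `scale_cluster` and `apply_passes` are hypothetical names, and the pass itself is made up for illustration rather than taken from the scheduler.

```python
from functools import reduce

def scale_cluster(cluster, factor):
    """Build a toy pass that multiplies the weights of one cluster by `factor`."""
    def run(pmap):
        return [[[w * factor if c == cluster else w for w in row]
                 for c, row in enumerate(inst)]
                for inst in pmap]
    return run

def apply_passes(passes, pmap):
    """Apply passes left to right; each maps a preference map to a preference map."""
    return reduce(lambda m, p: p(m), passes, pmap)

# One instruction, two clusters, two time slots, all weights equal.
start = [[[1.0, 1.0], [1.0, 1.0]]]
favor_0 = scale_cluster(0, 2.0)   # soft preference for cluster 0
favor_1 = scale_cluster(1, 4.0)   # a later pass can outweigh the earlier preference
result = apply_passes([favor_0, favor_1], start)
```

After both passes the instruction leans toward cluster 1, even though the first pass favored cluster 0: the earlier preference was soft, so a later pass could override it.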
Double-Edged Sword
The good news: convergent scheduling does not constrain phase order
Nice interface makes writing and integrating passes easy
The bad news: convergent scheduling does not constrain phase order
Limitless number of phase orders to consider, some of which are much better than others
Our Proposal
Use genetic programming to automatically search for a phase ordering that’s tailored to a given
Architecture
Compiler
Our inspiration comes from Cooper’s work [Cooper et al., LCTES 1999]
Genetic Programming
Searching algorithm analogous to Darwinian evolution
Maintain a population of expressions
(sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))
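One way to read such an expression is as a small tree that unfolds into an ordered list of passes. The sketch below encodes the expression above as nested tuples and flattens it. The tuple encoding, the `linearize` helper, and passing the value of `imbalanced` in a dictionary are assumptions made for illustration; in a real scheduler the predicate would presumably be evaluated on the current schedule state.

```python
# The slide's expression as nested tuples: (operator, operand, ...).
EXPR = ("sequence", "INITTIME",
        ("sequence", "PLACE",
         ("if", "imbalanced", "LOAD", "COMM")))

def linearize(expr, conditions):
    """Flatten an expression into the ordered list of pass names it would run.

    `conditions` maps a predicate name (e.g. "imbalanced") to a bool.
    """
    if isinstance(expr, str):
        return [expr]
    op = expr[0]
    if op == "sequence":
        return linearize(expr[1], conditions) + linearize(expr[2], conditions)
    if op == "if":
        _, predicate, then_branch, else_branch = expr
        chosen = then_branch if conditions[predicate] else else_branch
        return linearize(chosen, conditions)
    raise ValueError(f"unknown operator: {op}")

when_imbalanced = linearize(EXPR, {"imbalanced": True})
when_balanced = linearize(EXPR, {"imbalanced": False})
```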
Genetic Programming
Searching algorithm analogous to Darwinian evolution
Maintain a population of expressions
Selection
The fittest expressions in the population are more likely to reproduce
Reproduction
Crossing over subexpressions of two expressions
Mutation
Mutation
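Crossover and mutation on expression trees can be sketched as subtree replacement. The code below is a minimal illustration under stated assumptions: expressions are nested tuples built only from `sequence` nodes and pass names (conditionals are left out so that any subtree can legally replace any other), and all helper names are hypothetical.

```python
import random

def subtree_paths(expr, path=()):
    """Yield the path (tuple of child indices) to every node of an expression."""
    yield path
    if isinstance(expr, tuple):
        # Index 0 holds the operator name, so children start at index 1.
        for i in range(1, len(expr)):
            yield from subtree_paths(expr[i], path + (i,))

def get_subtree(expr, path):
    """Follow `path` down from the root and return the node found there."""
    for i in path:
        expr = expr[i]
    return expr

def replace_subtree(expr, path, new):
    """Return a copy of `expr` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return expr[:i] + (replace_subtree(expr[i], path[1:], new),) + expr[i + 1:]

def crossover(a, b, rng):
    """Copy a random subexpression of `b` into a random position of `a`."""
    target = rng.choice(list(subtree_paths(a)))
    donor = get_subtree(b, rng.choice(list(subtree_paths(b))))
    return replace_subtree(a, target, donor)

def mutate(expr, passes, rng):
    """Replace one random node with a randomly chosen pass name."""
    target = rng.choice(list(subtree_paths(expr)))
    return replace_subtree(expr, target, rng.choice(passes))

rng = random.Random(0)
a = ("sequence", "INITTIME", ("sequence", "PLACE", "COMM"))
b = ("sequence", "LOAD", "DEP")
child = crossover(a, b, rng)
mutant = mutate(a, ["INITTIME", "PLACE", "LOAD", "COMM"], rng)
```

Both operators return new tuples and leave the parents unchanged, so the same parent can be reused to produce several variants.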
General Flow
[Flowchart: Create initial population (initial solutions), Evaluation, Selection, Create Variants, done?]
Randomly generated initial population
General Flow
[Flowchart: Create initial population (initial solutions), Evaluation, Selection, Create Variants, done?]
Compiler is modified to use the given expression as the phase ordering
Each expression is evaluated by compiling and running the benchmark(s)
Fitness is the relative speedup over our original phase ordering on the benchmark(s)
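The fitness computation can be sketched as a ratio of run times, averaged when several benchmarks are used (a later slide states that the average speedup is taken as fitness). Measuring in simulated cycles, the function names, and the cycle counts below are assumptions for illustration only.

```python
from statistics import mean

def relative_speedup(baseline_cycles, candidate_cycles):
    """Speedup of the candidate phase ordering over the baseline (>1 is faster)."""
    return baseline_cycles / candidate_cycles

def fitness(baseline, candidate):
    """Average relative speedup over all benchmarks present in `baseline`."""
    return mean(relative_speedup(baseline[name], candidate[name])
                for name in baseline)

# Hypothetical cycle counts, made up for illustration only.
baseline = {"bench_a": 1000, "bench_b": 2000}
candidate = {"bench_a": 500, "bench_b": 2000}
score = fitness(baseline, candidate)
```

With these made-up numbers the candidate is twice as fast on one benchmark and unchanged on the other, so its fitness is 1.5.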
General Flow
[Flowchart: Create initial population (initial solutions), Evaluation, Selection, Create Variants, done?]
Just as with Natural Selection, the fittest individuals are more likely to survive
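The slides do not say which selection scheme is used. As one common possibility, the sketch below shows tournament selection, where a few individuals are drawn at random and the fittest of them is chosen; treat the scheme and the name `tournament_select` as assumptions.

```python
import random

def tournament_select(population, fitnesses, k, rng):
    """Pick k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]

population = ["expr_a", "expr_b", "expr_c"]   # placeholder individuals
fitnesses = [1.0, 3.0, 2.0]                   # made-up fitness values
winner = tournament_select(population, fitnesses, k=3, rng=random.Random(1))
```

A larger `k` makes selection greedier; with `k` equal to the population size, as here, the fittest individual always wins.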
General Flow
[Flowchart: Create initial population (initial solutions), Evaluation, Selection, Create Variants, done?]
Use crossover and mutation to generate new expressions
And thus, generate new and hopefully improved phase orderings
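Put together, the flow is an evaluate, select, vary loop. The sketch below is a generic toy version, not the authors' system: individuals are fixed-length tuples of pass names instead of expression trees, the fitness function is a stand-in for compiling and running benchmarks, and keeping the top half of the population each generation is an assumed selection rule.

```python
import random

def evolve(initial_population, evaluate, vary, generations, rng):
    """Generic loop: evaluate(ind) -> fitness, vary(a, b, rng) -> new individual."""
    population = list(initial_population)
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        survivors = scored[: max(2, len(scored) // 2)]  # keep the top half (assumed)
        children = [vary(*rng.sample(survivors, 2), rng)
                    for _ in range(len(population) - len(survivors))]
        population = survivors + children
    return max(population, key=evaluate)

PASSES = ["INITTIME", "PLACE", "LOAD", "COMM"]
TARGET = ("INITTIME", "PLACE", "COMM")  # stand-in for the fastest ordering

def toy_fitness(seq):
    """Count positions that match TARGET (replaces compile-and-run)."""
    return sum(a == b for a, b in zip(seq, TARGET))

def toy_vary(a, b, rng):
    """One-point crossover followed by an occasional point mutation."""
    cut = rng.randrange(1, len(a))
    child = list(a[:cut] + b[cut:])
    if rng.random() < 0.3:
        child[rng.randrange(len(child))] = rng.choice(PASSES)
    return tuple(child)

rng = random.Random(0)
start = [tuple(rng.choice(PASSES) for _ in range(3)) for _ in range(20)]
best = evolve(start, toy_fitness, toy_vary, generations=30, rng=rng)
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.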
Experimental Setup
We use an in-house VLIW compiler (SUIF, MachSUIF) and simulator
Compiler and simulator are parameterized so we can easily change VLIW configurations
Experiments presented here are for clustered architectures
Details of the architectures are in the paper
Convergent Scheduling Heuristics
Noise Introduction
Initial Time Assignment
Preplacement
Critical Path Strengthening
Communication Minimization
Parallelism Distribution
Load Balance
Dependence Enforcement
Assignment Strengthening
Functional Unit Distribution
Push to first cluster
Critical Path Distance
Cluster Creation
Register Pressure Reduction in Time
Register Pressure Reduction in Space
Hand-Tuned Results: 4-cluster VLIW, Rich Interconnect
[Bar chart: speedup (axis 0 to 4) of PCC, UAS, and Convergent on vvmul, rbsorf, yuv, tomcatv, mxm, fir, cholesky]
Results: 4-cluster VLIW, Limited Interconnect
Training an Improved Sequence
Goal: find a sequence that works well for all the benchmarks in the last graph (vvmul, rbsorf, yuv, etc.)
Train a sequence using these benchmarks then…
For each expression in the population, compile and run all the benchmarks, and take the average speedup as fitness
The Schedule
Evolved sequence is much more conservative in communication
inittime func dep func load func dep func comm dep func comm place
func reduces weights of instructions on overloaded clusters
dep increases the probability that dependent instructions are scheduled “nearby”
comm tries to keep neighboring instructions in same cluster
Results: 4-cluster VLIW, Limited Interconnect
Results: Leave-One-Out Cross Validation
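In leave-one-out cross validation, a sequence would be trained on all benchmarks but one and then measured on the benchmark that was held out, repeating once per benchmark. The sketch below only generates the splits; the benchmark names come from the earlier results chart, and the `leave_one_out` helper is a hypothetical name.

```python
def leave_one_out(benchmarks):
    """Yield (training_set, held_out) pairs, holding out each benchmark once."""
    for i, held_out in enumerate(benchmarks):
        yield benchmarks[:i] + benchmarks[i + 1:], held_out

BENCHMARKS = ["vvmul", "rbsorf", "yuv", "tomcatv", "mxm", "fir", "cholesky"]
folds = list(leave_one_out(BENCHMARKS))
```

Each of the seven folds pairs six training benchmarks with the one benchmark the evolved sequence has never seen.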
Summary of Results
When we changed the architecture, the hand-tuned sequence failed
UAS and PCC outperform convergent scheduling
Our GP system found a sequence that usually outperforms UAS and PCC
Cross validation suggests that it is possible to find a “general-purpose” sequence
Running Time
Using about 20 machines in a small cluster of workstations it takes about 2 days to evolve a sequence
This is a one-time process!
Performed by the compiler vendor
Disappointing Result
Unfortunately, sequences with conditionals are weeded out of the GP selection process
Our system rewards parsimony
Convergent scheduling passes make soft decisions, so running an extra pass may not be detrimental
We’d like to get to the bottom of this unexpected result
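The slides do not say how parsimony is rewarded. One simple possibility, shown here purely as an assumption, is to break fitness ties in favor of the smaller expression. Under such a rule, a conditional that performs no better than a plain sequence would lose to it simply because it has more nodes.

```python
def size(expr):
    """Count the nodes in a nested-tuple expression (operators and leaves)."""
    if isinstance(expr, tuple):
        return 1 + sum(size(child) for child in expr[1:])
    return 1

def parsimony_key(expr, fitness):
    """Sort key: higher fitness first, then fewer nodes (assumed tie-break)."""
    return (-fitness, size(expr))

plain = ("sequence", "PLACE", "COMM")
conditional = ("sequence", "PLACE", ("if", "imbalanced", "LOAD", "COMM"))

# Made-up scenario: both expressions reach the same fitness.
scored = [(conditional, 1.2), (plain, 1.2)]
ranked = sorted(scored, key=lambda pair: parsimony_key(*pair))
```

Whether this mechanism accounts for the result reported on the slide is not established here.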
Conclusions
Using GP we’re able to find architecture-specific, application-independent sequences
We can quickly retune the compiler when
The architecture changes
The compiler itself changes
Implemented Tests