uppsala architecture stefanos kaxiras, alexandra …stefanos kaxiras, alexandra jimborean which...
TRANSCRIPT
DepartmentofInforma-onTechnology,UppsalaUniversity h9p://it.uu.se/
UARTUppsalaArchitecture
ResearchTeam
HowCanWeImproveDAE?
DAE=DecouplingUsesfromLoads
Selec-ngLoads:Crea-ngAlterna-veAccessPhases
WhyDAE?DAE+FrequencyScalingSavesEnergy
Evalua-ngVersionsatRun-meandSelec-ngtheBestPerformingOne
OvercomingAddressRecomputa-on:Hois-ngLoadstoAccessPhases Info
Problem:Hois-ngallloadsIntoanAccessPhaseIsinfeasible:addresscomputa-onoverheaddiminishesbenefits!
So)wareDecoupledAccess-ExecuteKim-AnhTran,Konstan-nosKoukos,StefanosKaxiras,AlexandraJimborean
Whichloadstoselectforprefetching?
Howtoavoidaddress
recomputa-on?
1
2
1
2
Run>meEvalua>onPhase:evaluateeachcombina-on!
Pickbestcombina-onforremainingitera-ons
A0 E E EA1 EA1 EA1A1 A2 EOrig ...
Run-t
ime
Iteration0 10 20 30 40 50 60slice 0 slice 1 slice 2 slice 3 slice 4 slice 5 ...Loop
Slice
Original
DAE
CPUfrequency fopt
CPUfrequencyfmax
fminExecu�on
Memory-bound:runon lowfrequency
Compute-bound:runonhighfrequency
Originalcode:compromisefrequencytobalancebetweenenergyandperformance.
DAEhelpstosaveenergybyadjus�ngthefrequency.
vs.
Problem:Addressrecomputa3onintheExecutePhaseisexpensive.
CreateoneAccessPhaseforeachlevelofindirec-on:
Orig OriginalCode
Legend
n-thAccessVersion
ExecutePhase(sameforallAn)
An
E
Runeachversion(Originalandalterna-veAccess-Executeversions)foracoupleofitera>ons
wasiden-fiedasthebestperformingversion:se9leonthiscombina-on!
for(...){
}
unroll*2L1=ldx[i]L2=ldy[i]L1+L2
for(...){in-placeDAE
L1=ldx[i]L2=ldy[i]L1+L2
}
L3=ldx[i+1]L4=ldy[i+1]L3+L4
for(...){
L1=ldx[i]L2=ldy[i]
L1+L2
}
L3=ldx[i+1]L4=ldy[i+1]
L3+L4
Register
TransferofdatafromAccesstoExecute
viaregisters
AccessPhase
ExecutePhase
for(...){
}
for (...) {
x[i]
for (...) { for (...) {
Version1:1-indirec�on
a[x[i]]b[a[x[i]]]
y[i]
Indirec�on
Legend
AccessPhase
ExecutePhase
or or
Version2:2-indirec�on
Version3:3-indirec�on
x[i]y[i]
x[i]a[x[i]]
y[i]
}
x[i]a[x[i]]
y[i]
x[i]a[x[i]]
b[a[x[i]]]y[i]
} }for (...) {
}
for (...) {
}
for (...) {
}
for(...){
}
for(...){
}
AccessPhase
ExecutePhase
MemoryAccessComputa�on
Cache
prefetchintocache
usedata
for(...){
}Legend
WhyDAEwithinoneloop?ByapplyingDAEwithinoneloop,wemaytransferdataviaregisters:Accessphaseloadsdataintoaregister,Executephasedirectlyuses
register.
Whyunrolling?UnrollingisrequiredaswenowapplyDAEwithintheloop
Reference:A.Jimborean,K.Koukos,V.Spiliopoulos,D.Black-Schaffer,S.Kaxiras.Fixthecode.Don’ttweakthehardware:AnewcompilerapproachtoVoltage-Frequencyscaling.InProc.ofCGO’14
Contact:[email protected]