TRANSCRIPT
Massively Parallel Landscape-Evolution Modelling using General Purpose Graphical Processing Units
S7331, Tuesday 9th May 2017
GPU Technology Conference
A. Stephen McGough, Darrel Maddy, J. Wainwright, S. Liang, M. Rapoportas, A. Trueman, R. Grey, G. Kumar Vinod, and James Bell
Durham University, UK / Newcastle University, UK
Outline
• What is Landscape Evolution Modelling (LEM)
• Parallelization of LEM
• Preliminary Results
Landscape Evolution Modelling
• Landscapes change over time due to water/weathering
• Physical and chemical weathering require water to break down material
• Higher-energy flowing water both erodes and transports material until decreasing energy conditions result in deposition of material
• These processes take a long time
  • Many glacial-interglacial cycles
  • Cycles are ~100 ka for the last 800 ka; prior to 800 ka, cycles were ~40 ka in length
• We want to use retrodiction to work out how the landscape has changed
Landscape Evolution Modelling
• Use a simulation to model how the landscape changes
• The 3D landscape is discretized as a regular 2D grid (x, y) with cell values representing surface heights (z) derived from a digital elevation model (DEM)
• Cells can be 10 m x 10 m or larger
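The grid representation above can be sketched in a few lines of Python. This is purely illustrative (the heights below echo the example grid on the slide, not the real Upper Thames data, and the `dem`/`cell_size` names are invented here):

```python
import numpy as np

# A DEM discretized as a regular 2D grid: one surface height (z, metres)
# per (x, y) cell. Heights here are illustrative example values.
dem = np.array([
    [31, 22, 32, 33, 34],
    [29, 26, 27, 39, 36],
    [27, 26, 41, 44, 50],
    [45, 44, 40, 51, 55],
    [39, 44, 46, 52, 58],
], dtype=float)

cell_size = 10.0  # metres per cell side; cells can be 10 m x 10 m or larger

print(dem.shape)             # grid dimensions (rows, cols)
print(dem.min(), dem.max())  # height range in metres
```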
[Figure: example DEM height grid with the derived flow directions and accumulation values]
Figure 2.2: The figure illustrates the water accumulation modelling. The amount of water that accumulates on a cell is the sum of the water of all adjacent cells which have an assigned direction towards it. Hence, computing the water accumulation on one cell can only be performed once the water accumulation of all of its inflowing neighbours has been computed. Image courtesy: Gregory E. Tucker and Gregory R. Hancock, 2009.
2.4 Bottlenecks and the Potential of Parallel Solutions
The time complexity of the flow direction computation on non-flat cells is O(n), where n is the total number of cells in the DEM, since the algorithm performs a bounded number of operations on each cell. The run-time efficiency of the water accumulation algorithm, however, depends on the longest drainage path in the DEM. Although the modelling is sufficiently efficient on a small-scale DEM, it faces a severe computational challenge when processing massive grids. The resolution of a DEM has to be high enough to achieve sufficient accuracy, which makes the grid grow substantially when representing a fairly large terrain. Also, geologists expect computers to perform a large number of iterations of the water flow direction and accumulation computations to model the change of the landscape over a very long period. Due to these facts, the spatial and temporal scalability of landscape evolution modelling depends on the computational power of the hardware. Unfortunately, as the free lunch of Moore's law is over, it would be unwise to expect hardware performance improvements to satisfy the computational demand in the near future.
Computer scientists have made several attempts to overcome this computational bottleneck. TerraFlow has implemented the algorithms with significant I/O optimisations for massive DEMs [1]. However, it still takes minutes for a million-cell DEM on a computer with a 500 MHz processor and 1 GB of memory. Since computing the water flow direction on each cell is a totally independent process, and water flow accumulation on one drainage path does not depend on others, these computations can be performed in parallel. Chase Wallis' team has implemented
Landscape Evolution Modelling (simplified)
Each iteration of the simulation:
• Flow Routing
• Flow Accumulation
• Erosion/Deposition
[Figure: example flow-accumulation grid]
How much material will be removed? How much material will be deposited?
• Each step is 'fairly' fast…
• But we want to do lots of them: 120 K to 1 M years
• On landscapes of 6-56 M cells
• If we could simulate 1 year in 1 minute this would take 83-694 days!
  • assuming 1 year = 1 iteration
  • may need more
• The current sequential version is much slower than this…
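The 83-694 day figure follows from simple arithmetic, which a few lines of Python can confirm (variable names invented here for illustration):

```python
# At 1 simulated year per minute of wall-clock time, how many days
# do 120K and 1M model years take? (1 day = 60 * 24 minutes)
minutes_per_year = 1
for years in (120_000, 1_000_000):
    days = years * minutes_per_year / (60 * 24)
    print(f"{years} years -> {days:.0f} days")
```

This prints roughly 83 days for 120 K years and 694 days for 1 M years, matching the slide.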
Execution Analysis of Sequential LEM
• We started from an existing sequential LEM
  • 51x100 cells for just 120 K years took 72 hours
  • estimate for 25 M cells: 64,000 years
• This was non-optimal code
  • Reduced execution time from 72 to 4.7 hours
  • 64,000 years down to 300 years
• But this is still not enough for our needs
Execution Analysis of Sequential LEM
• Performance analysis:
  • ~74% of time spent routing and accumulating
  • Need orders of magnitude speedup
• So focus was on flow routing/accumulation
Outline
• What is Landscape Evolution Modelling (LEM)
• Parallelization of LEM
• Preliminary Results
Parallel Flow Routing
• Each cell can be done independently of all others
• SFD (single flow direction)
  • 100% of flow in the direction of steepest descent (normally the lowest neighbour)
• MFD (multiple flow direction)
  • Flow is proportioned between all lower neighbours
  • Proportional to the slope to each neighbour
• Almost linear speed-up
• Problems with code divergence
  • CUDA warps split when code contains a fork
[Figure: 3x3 neighbourhood example, heights 3 2 4 / 7 5 8 / 7 1 9]
Single flow direction vs multiple flow direction: MFD is 'better' but much more computationally demanding
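The two routing rules can be sketched in plain Python. This is a sequential sketch for a single interior cell, not the CUDA kernel; the `sfd`/`mfd` names, unit cell size, and D8 neighbourhood layout are assumptions made for illustration. The example grid matches the 3x3 neighbourhood on the slide:

```python
import math
import numpy as np

# Neighbour offsets for a D8 (8-direction) neighbourhood.
D8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def sfd(dem, r, c, cell=1.0):
    """Single flow direction: 100% of flow goes to the steepest-descent
    neighbour. Returns its (row, col), or None for a sink/plateau.
    Assumes (r, c) is an interior cell (no bounds checking)."""
    best, best_slope = None, 0.0
    for dr, dc in D8:
        dist = cell * math.hypot(dr, dc)          # diagonals are further away
        slope = (dem[r, c] - dem[r + dr, c + dc]) / dist
        if slope > best_slope:
            best, best_slope = (r + dr, c + dc), slope
    return best

def mfd(dem, r, c, cell=1.0):
    """Multiple flow direction: flow is split over all lower neighbours,
    proportional to the slope towards each. Returns {(row, col): fraction}."""
    slopes = {}
    for dr, dc in D8:
        dist = cell * math.hypot(dr, dc)
        slope = (dem[r, c] - dem[r + dr, c + dc]) / dist
        if slope > 0:
            slopes[(r + dr, c + dc)] = slope
    total = sum(slopes.values())
    return {n: s / total for n, s in slopes.items()}

# The 3x3 neighbourhood from the slide; the centre cell has height 5.
dem = np.array([[3.0, 2.0, 4.0],
                [7.0, 5.0, 8.0],
                [7.0, 1.0, 9.0]])
print(sfd(dem, 1, 1))   # all flow to the '1' below the centre: (2, 1)
print(mfd(dem, 1, 1))   # flow shared between the four lower neighbours
```

The `if slope > 0` / `if slope > best_slope` branches are exactly the kind of data-dependent fork that causes the warp divergence mentioned above when this runs as one CUDA thread per cell.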
Parallel Accumulation: Correct Flow
• Iterate:
  • Do not compute a cell until it has no incorrect cells flowing into it
  • Sum all inputs and add self
• All cells can work independently of each other
  • Some restriction on updates not happening immediately
[Figure: Flow Routing, Accumulation, and Correct flow grids, showing accumulated values building up along the drainage paths]
Cell values are not normally 1, but the initial rainfall on the cell
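The iteration above can be sketched sequentially in Python for the SFD case. On the GPU each sweep over the cells runs in parallel; here the `accumulate`, `downstream`, and `rain` names are invented for illustration, and cells are flattened to a 1D list for brevity:

```python
def accumulate(downstream, rain):
    """Iterative 'correct flow' accumulation sketch (single flow direction).
    downstream[c] is the cell that cell c drains to (None at an outlet);
    rain[c] is the initial rainfall on cell c. A cell is computed only
    once every cell flowing into it is already correct."""
    n = len(rain)
    inflow = [[] for _ in range(n)]          # invert the drainage directions
    for c, d in enumerate(downstream):
        if d is not None:
            inflow[d].append(c)
    acc = [None] * n                          # None = not yet correct
    while any(a is None for a in acc):        # one sweep per 'iteration'
        for c in range(n):
            if acc[c] is None and all(acc[u] is not None for u in inflow[c]):
                acc[c] = rain[c] + sum(acc[u] for u in inflow[c])
    return acc

# A five-cell network: 0 -> 1 -> 2 -> 3 -> outlet, with cell 4 also into 2.
downstream = [1, 2, 3, None, 2]
print(accumulate(downstream, [1, 1, 1, 1, 1]))  # [1, 2, 4, 5, 1]
```

The number of sweeps needed grows with the longest drainage path, which is exactly the slow-down on long rivers discussed later in the deck.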
Not the whole story…
• Sinks and plateaus
• Can't work out flow routing on sinks and plateaus
  • Need to 'fake' a flow routing
• Fill a sink until it can flow out
  • Turned it into a plateau
• Fake flow directions on a plateau to the outlet
Parallel Plateau Routing
• Need to find the outflow of a plateau and flow all water to it
• A common solution is to use a breadth-first search algorithm
• Parallel implementation
  • Though the result does look 'unnatural'
  • Alternative patterns are possible, but acceptable
• We are investigating alternative solutions
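A sequential sketch of the breadth-first search idea, assuming the plateau cells and their outlet(s) have already been identified (the `route_plateau` name and data layout are invented here; the parallel version sweeps the frontier in lock-step instead of using a queue):

```python
from collections import deque

def route_plateau(plateau, outlets):
    """BFS plateau routing sketch: starting from the outlet cells, walk
    inwards across the flat region so every plateau cell gets a fake flow
    direction pointing (via its neighbours) towards an outlet.
    plateau: set of (r, c) flat cells; outlets: cells the plateau drains to.
    Returns {cell: neighbour cell it should flow towards}."""
    direction = {}
    queue = deque(outlets)
    seen = set(outlets)
    while queue:
        r, c = queue.popleft()
        for dr, dc in [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                       (0, 1), (1, -1), (1, 0), (1, 1)]:
            nbr = (r + dr, c + dc)
            if nbr in plateau and nbr not in seen:
                direction[nbr] = (r, c)   # flow towards the cell nearer the outlet
                seen.add(nbr)
                queue.append(nbr)
    return direction

# A 1x4 strip of flat cells draining to an outlet at (0, 0) on the left.
plateau = {(0, 1), (0, 2), (0, 3)}
dirs = route_plateau(plateau, [(0, 0)])
print(dirs)
```

Because every cell simply points at whichever neighbour the search reached first, the resulting drainage pattern is regular, which is the 'unnatural' look noted above.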
Sink Filling
• Dealing with a single sink is (relatively) simple
  • Fill the sink until we end up with a plateau (lake)
• But what if we have multiple nested sinks?
Nested Sink Filling
• Implemented a parallel version of the sink filling algorithm proposed by Arge et al. [2003]
  • Identify each sink (parallel)
  • Determine which cells flow into this sink, i.e. its watershed (parallel)
  • Determine the lowest cell joining each pair of sinks (parallel/sequential)
  • Work out how high cells in each sink need to be raised to allow all cells to flow out of the DEM (sequential)
  • Fill all sink cells to this height (parallel)
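For intuition, the end state of sink filling (nested sinks included) can be reproduced by a much simpler sequential technique, priority-flood, sketched below. To be clear, this is not the parallel Arge et al. [2003] scheme the slide describes; it is a stand-in that yields the same filled surface, with the `fill_sinks` name invented here:

```python
import heapq
import numpy as np

def fill_sinks(dem):
    """Sequential priority-flood sketch: raise every sink cell just high
    enough that all cells can drain off the edge of the DEM, turning
    sinks (nested or not) into plateaus."""
    rows, cols = dem.shape
    filled = dem.copy()
    heap, seen = [], np.zeros(dem.shape, dtype=bool)
    # Seed the priority queue with the boundary cells (the DEM's outlets).
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r, c], r, c))
                seen[r, c] = True
    # Grow inwards from the lowest frontier cell; any lower interior cell
    # reached this way is a sink cell and is raised to the frontier height.
    while heap:
        h, r, c = heapq.heappop(heap)
        for dr, dc in [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                       (0, 1), (1, -1), (1, 0), (1, 1)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not seen[nr, nc]:
                filled[nr, nc] = max(filled[nr, nc], h)
                seen[nr, nc] = True
                heapq.heappush(heap, (filled[nr, nc], nr, nc))
    return filled

# A 1-cell sink (height 1) whose lowest rim cell is 4: it fills to 4.
dem = np.array([[5., 5., 5.],
                [5., 1., 4.],
                [5., 5., 5.]])
print(fill_sinks(dem)[1, 1])  # 4.0
```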
GPGPU Solution: PARALLEM
• Massively parallel version of the LEM
  • For direction (including plateaus and sinks) and accumulation
• Process has now been parallelized
  • on NVIDIA Fermi/Kepler based graphics cards
  • Tesla C2050, GTX 580, K20, K40, K80
• ~Two orders of magnitude speedup over the optimized sequential code (up to 56 M cells)
Outline
• What is Landscape Evolution Modelling (LEM)
• Parallelization of LEM
• Preliminary Results
Results: Performance
• Overall performance
[Figure: log-log plot of time (s) vs DEM size (millions) for CybErosion-slim, Tesla single iteration, 580 single iteration, Tesla average 10, and 580 average 10]
Results: Performance
• Flow Direction
  • Including sink & plateau solution
[Figure: log-log plot of time (s) vs DEM size (millions) for Sequential, Tesla, 580, and Terraflow Flow Direction]
Results: Performance
• Flow Accumulation
[Figure: log-log plot of time (s) vs DEM size (millions) for Sequential, Tesla, 580, and Terraflow Flow Accumulation; later builds add Tesla K20, then Tesla K20 with conditionals removed]
3. PARALLEM pipeline
4. Upper Thames Digital Elevation Model
The Current Simulation
• Core model now extended with processes
  • Most only affect individual cells (weathering, vegetation)
  • Some have cross-DEM effects (mass movement) but can use the same process as before
12. Spatial patterns of landscape change over time
[Figure: landscape change maps for 125-120 ka and 20-15 ka; note white is deposition through to dark for erosion]
The Current Simulation
• Actively running landscape models on K40/K80 GPGPUs
  • Taking ~7 weeks to run our model (MFD)
• Leading to interesting results
  • Not seen before, as models have traditionally been much smaller
• Currently running on just 1 GPGPU
  • Running multiple models simultaneously
• Now have a multi-GPGPU code for running flow accumulation
  • Designed to 'sweep' over the landscape
Upper Thames Valley +120K
[Figure: cross-section profile (elevation 0-100 vs distance 0-5000), series labelled 138]
Problem: Landscape Cutting with SFD
[Figure: successive builds of the cross-section profile (elevation 0-100 vs distance 0-5000), adding series labelled 138, 125, 115, 70, 50, 25, 20, 15, 10, and 5, showing the SFD channel cutting in over time]
Comparing 'cut in' between SFD and MFD
[Figure: cross-section profiles (elevation 70-84 vs distance 0-3000) for 1k, 20k SFD, and 20k MFD]
Problem: Algorithm Slow-down
• Correct flow algorithm requires all input cells to be correct before progressing
• Becomes a problem for rivers
• Correct flow completion profile
[Figure: percentage complete vs iteration (log scale, 1 to 10,000)]
Summary
• Able to show 2+ orders of magnitude speedup with PARALLEM
• Significant potential for further speedup
  • Optimization of the processes
  • Remove the sequentialization of correct flow
• The use of GPGPUs has allowed us to redress the execution restriction which had prevented us from doing MFD, leading to 'better' landscapes
[email protected]@newcastle.ac.uk
J. Wainwright, S. Liang, M. Rapoportas, A. Trueman, R. Grey, G. Kumar Vinod, and James Bell
We are recruiting:
- 2 PostDocs (Machine Learning)
- 1 PostDoc (Parallel Programming)
- Always looking for good PhD candidates