-
Towards Exascale
ROMA
June 2014
PhD Students' Day (Journée des doctorants)
-
The story starts with a box ...
... that contains lots of little boxes.
-
The Titan supercomputer:
• 404 m² (the big box)
• 299,008 processor cores (the small boxes)
• 17.59 PetaFlops
• 8.2 MW
• 693.6 TiB of RAM
• 240 GB/s transfer speed to RAM
Image courtesy of Oak Ridge National Laboratory, U.S. Dept. of Energy
-
Then what is Exascale?
×1000
But in the same box:
-
Linear algebra: problems get bigger and bigger
Code Aster, Carter (e.g., finite elements) → solution of sparse systems Ax = b
Often the most expensive part in numerical simulation codes.
Sparse direct methods to solve Ax = b:
• Decompose A under the form LU, LDL^t or LL^t
• Solve the triangular systems Ly = b, then Ux = y
3D example in earth science: acoustic wave propagation, 27-point finite difference grid.
Current goal [Seiscope project]: LU factorization on the complete Earth, n = N³ = 1000³
Extrapolation on a 1000 × 1000 × 1000 grid: 55 exaflops, 200 TBytes for the factors, 40 TBytes for the active memory!
-
Resilience
The main known problem for Exascale is Resilience.
[Figure: an execution timeline with periodic checkpoints, showing the time to checkpoint between failures]
What if there is 1000× the processing power? It gets worse.
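The slide's point can be made concrete with Young's first-order approximation of the optimal checkpoint period, T_opt ≈ √(2 · C · MTBF). This is a standard rule of thumb, not stated on the slide, and all numbers below are illustrative:

```python
import math

def optimal_period(C, mtbf):
    """Young's first-order approximation of the optimal checkpoint period."""
    return math.sqrt(2 * C * mtbf)

def waste(C, mtbf):
    """First-order fraction of time lost to checkpointing and re-execution
    when checkpointing every optimal_period(C, mtbf) seconds."""
    T = optimal_period(C, mtbf)
    return C / T + T / (2 * mtbf)

# Illustrative numbers: one node fails every 10 years, a checkpoint takes 600 s.
node_mtbf = 10 * 365 * 24 * 3600.0
for nodes in (10_000, 10_000_000):        # today vs. 1000x the parallelism
    platform_mtbf = node_mtbf / nodes     # failures arrive `nodes` times faster
    print(nodes, round(waste(600.0, platform_mtbf), 3))
```

The platform MTBF shrinks linearly with the number of nodes, so the wasted fraction blows up; a first-order value above 1 just means the platform spends more time on fault handling than on useful progress.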
-
Fault-tolerance techniques
• Rollback recovery strategies: all processors periodically stop computing and checkpoint (save the state of the parallel application onto resilient storage).
• Coordinated checkpointing
  + No need to log messages
  − All processors need to roll back
  − I/O congestion
• Uncoordinated checkpointing
  − Need to log messages
  − Slows down failure-free execution and increases checkpoint size/time
  + Faster re-execution with logged messages
• Hierarchical checkpointing
  − Need to log inter-group messages
  + Only processors from the failed group need to roll back
  + Faster re-execution with logged messages
  + Rumor: scales well to very large platforms
-
Replication
Model
• A parallel application comprising n (sequential) processes
• Each process is replicated g ≥ 2 times
• A processing element executes a single replica
• The application fails when all replicas in one replica group have been hit by failures
[Figure: the n replica groups 1, 2, ..., i, ..., n]
Objective
• Show when replication is beneficial compared to periodic checkpointing alone
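A minimal Monte Carlo sketch of this model (the uniform-failure assumption and all numbers are mine, not the slide's): with g = 2, it takes on the order of √n failures, not a single one, before some group loses both of its replicas:

```python
import random

def failures_before_crash(n, g=2, rng=random):
    """Strike uniform-random live replicas with failures until one of the n
    replica groups has lost all g of its copies (the application then fails)."""
    alive = [g] * n
    total = n * g
    hits = 0
    while True:
        k = rng.randrange(total)   # model assumption: failures hit live replicas uniformly
        group = 0
        while k >= alive[group]:   # locate the struck group
            k -= alive[group]
            group += 1
        alive[group] -= 1
        total -= 1
        hits += 1
        if alive[group] == 0:
            return hits

rng = random.Random(0)
trials = [failures_before_crash(1000, rng=rng) for _ in range(200)]
mean = sum(trials) / len(trials)   # grows like sqrt(n), not like 1
```

This is the birthday-problem effect that makes replication attractive despite halving the useful compute power.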
-
Prediction
• Predictor (recall, precision), window-based predictions
• Predictions must be provided at least Cp seconds in advance
[Figures: execution timelines for the regular mode, a prediction without failure, and a prediction with failure, built from checkpoints C, proactive checkpoints Cp, downtime D and recoveries R over periods of length T]
Objective
• Characterize when prediction is useful.
-
Kinds of errors
Hard errors
• Easy to detect
• Easy to localize and characterize
• Expensive to correct
Soft errors
• Hard to detect
• Hard to localize and characterize
• Easy to correct (sometimes)
-
Silent errors
How to spot them
• Add some redundancy
• Error detecting codes
• Selective reliability
How to face them
• Majority vote among the replicas
• Error correcting codes
• Checkpoint recovery
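As an illustration of the "error detecting codes" bullet, here is a minimal ABFT-style checksum sketch (the function name and tolerance are made up): a silent corruption of a matrix-vector product breaks a cheap invariant:

```python
def checksum_ok(A, x, y, tol=1e-9):
    """ABFT-style check for y = A x: the sum of y's entries must equal the dot
    product of A's column sums with x; a silent flip in y breaks the invariant."""
    col_sums = [sum(col) for col in zip(*A)]
    expected = sum(s * xi for s, xi in zip(col_sums, x))
    return abs(sum(y) - expected) <= tol * max(1.0, abs(expected))

A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
y = [sum(a * b for a, b in zip(row, x)) for row in A]   # y = [3.0, 7.0]
ok_before = checksum_ok(A, x, y)    # True: the computation is consistent
y[0] += 0.5                         # inject a silent corruption
ok_after = checksum_ok(A, x, y)     # False: the checksum spots it
```

The check costs one extra dot product, which is why such redundancy is cheap compared to full replication.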
-
Finding the best trade-off
Let us consider an iterative method.
• Correction at each step:
  • increases the cost of a single iteration
  • no time wasted on checkpoints
  • good for low error rates
• Checkpointing + detection at each step:
  • small overhead at each iteration (detection)
  • periodic time loss for checkpointing
  • the checkpoint interval can be tailored to the error rate
Solution: combine the two techniques.
-
Dealing with verifications
It is not always possible to use error detection/correction codes at each step. What if we still want to use checkpoints and recoveries?
Problem
• We don't know when the error occurred
• We don't know if the last checkpoint is valid
We need a verification mechanism to check that there were no silent errors in previous computations and that the checkpoints are correct. But this has a cost!
-
Checkpoints and Verifications
We assume there are no errors during checkpoints (fewer error sources when doing I/O).
Simple approach: perform a verification before each checkpoint to eliminate the risk of corrupted data:
w V C w V C w V C w V C
Is this better?
w C w V C w C w V C w C
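Under a simple model (my assumption, not the slide's: silent errors hit the work at Poisson rate λ, none during V or C), the expected time to obtain one verified checkpoint with the pattern w V C has a closed form, which a quick simulation confirms:

```python
import math, random

def expected_period_time(w, V, C, R, lam):
    """Expected time to get one verified checkpoint with the pattern  w V C,
    when silent errors hit the work at Poisson rate lam (none during V/C)."""
    p = math.exp(-lam * w)        # probability the work is error-free
    retries = 1 / p - 1           # expected number of failed attempts
    return retries * (w + V + R) + (w + V + C)

def simulate_period(w, V, C, R, lam, rng):
    t = 0.0
    while True:
        t += w + V                # do the work, then verify
        if rng.random() < math.exp(-lam * w):
            return t + C          # verification passed: checkpoint
        t += R                    # error detected: recover and redo

rng = random.Random(42)
w, V, C, R, lam = 100.0, 2.0, 5.0, 8.0, 0.002
mc = sum(simulate_period(w, V, C, R, lam, rng) for _ in range(20000)) / 20000
print(expected_period_time(w, V, C, R, lam), mc)   # the two should be close
```

Evaluating this closed form for different amounts of work per verification is exactly how the two patterns above can be compared.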
-
With k checkpoints and one verification
With multiple checkpoints, the problem is to find when the error occurred:
V C w C w C w C w C w V R V R V R V
(when the verification fails, we recover from checkpoints further and further back, re-verifying each time, until we find the last valid one)
Solution
• The problem is very similar with k verifications and one checkpoint
• With constant C, V and R, we can find an optimal solution to this problem (i.e., one that minimizes the expected execution time).
-
What about DAGs?
Let us consider a Directed Acyclic Graph (DAG) where:
• Nodes represent tasks
• Edges correspond to precedence constraints
We make several important assumptions on this model:
• All tasks are executed by all the p processors (which amounts to linearizing the task graph and executing all tasks sequentially)
• Each task has its own indivisible work of size w
Problem: where do we have to place the checkpoints and the verifications in order to minimize the expected time to execute all the tasks?
-
Starting with simple graphs
We have analytical formulas to compute the expected time to successfully execute each of these graphs.
• We can find the optimal expected time to successfully execute the fork graph and the linear chain using a polynomial dynamic programming algorithm.
• The join is probably NP-complete because of the combinatorial explosion of the possibilities.
[Figures: a fork graph, a join graph and a linear chain over tasks T0, T1, ..., Ti, ..., Tn]
Future work: investigate the optimal checkpointing and verification problem for general DAGs.
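For the linear chain, the polynomial dynamic program mentioned above can be sketched as follows. The segment-cost model is my assumption, consistent with the Poisson silent-error model: a segment of work W followed by V and C is retried until its verification passes:

```python
import math
from itertools import accumulate

def segment_cost(W, V, C, R, lam):
    """Expected time to push work W through one verified checkpoint
    (Poisson silent errors at rate lam on the work, none during V/C/R)."""
    e = math.exp(lam * W)
    return e * (W + V) + (e - 1) * R + C

def best_placement(w, V, C, R, lam):
    """O(n^2) DP over a linear chain: after which tasks to verify+checkpoint."""
    n = len(w)
    pre = [0.0] + list(accumulate(w))        # prefix sums of work
    best = [0.0] + [math.inf] * n
    cut = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + segment_cost(pre[j] - pre[i], V, C, R, lam)
            if cand < best[j]:
                best[j], cut[j] = cand, i
    pos, j = [], n                           # recover the checkpoint positions
    while j > 0:
        pos.append(j)
        j = cut[j]
    return best[n], sorted(pos)

cost, checkpoints = best_placement([10.0] * 12, 1.0, 3.0, 5.0, 0.01)
```

The optimum trades re-execution risk (long segments) against checkpoint overhead (short segments), just as in the trade-off slide.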
-
Memory
Another concern: bandwidth to memory.
240 GB/s
When the system grows 10 times, bandwidth to memory should grow 20 times!
Since we are not good at architecture, we focus on algorithms.
-
Pebble Game
[Figure: a DAG whose vertices carry pebble counts 0/3, 0/2, 0/4 and 0/1]
Two moves:
• Add a pebble on a vertex.
• Remove a pebble from a vertex.
One rule:
• To add a pebble on a vertex, each of its predecessors must hold a number of pebbles equal to its own weight.
One goal:
• Every vertex must be filled at least once, and the maximum number of pebbles in use must be minimized.
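The game is easy to mechanize; this sketch replays a move sequence and tracks the peak number of pebbles. The 4-vertex instance is a guess in the spirit of the slides (weights 3, 2, 4, 1), not the exact graph shown:

```python
def play(preds, weight, moves):
    """Replay a pebble-game move sequence and return the peak number of
    pebbles simultaneously in use.  preds maps a vertex to its predecessors;
    moves are ('add', v) or ('remove', v)."""
    pebbles = {v: 0 for v in weight}
    filled = set()
    peak = 0
    for op, v in moves:
        if op == 'add':
            # the rule: every predecessor must currently hold its full weight
            assert all(pebbles[u] == weight[u] for u in preds[v]), f"illegal add on {v}"
            pebbles[v] += 1
        else:
            pebbles[v] -= 1
        if pebbles[v] == weight[v]:
            filled.add(v)
        peak = max(peak, sum(pebbles.values()))
    assert filled == set(weight), "every vertex must be filled at least once"
    return peak

# A guessed instance: a (3) -> c (4), b (2) -> d (1)
preds = {'a': [], 'b': [], 'c': ['a'], 'd': ['b']}
weight = {'a': 3, 'b': 2, 'c': 4, 'd': 1}
moves = ([('add', 'a')] * 3 + [('add', 'c')] * 4   # fill a, then c: 7 pebbles in use
         + [('remove', 'c')] * 4                   # free c's pebbles
         + [('add', 'b')] * 2 + [('add', 'd')])    # fill b, then d
peak = play(preds, weight, moves)                  # peak == 7
```

On this instance the run peaks at 7 pebbles, matching the maximum reached in the slides' animation.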
-
[Animation: an example run reaches a peak of 7 pebbles in use: fill the weight-3 vertex (3 pebbles), then the weight-4 vertex (7 pebbles in total, the maximum), empty the weight-4 vertex, then fill the weight-2 vertex and finally the weight-1 vertex.]
-
Another model
Definition. Let G be a DAG with weighted edges and vertices, and π a topological order.
• We define Me(π, x) (memory edges) as the set of edges e_uv such that π(u) < π(x) ≤ π(v)
• We call the cost of π at vertex v the value
Cost(π, v) = w(v) + Σ_{u ∈ N+(v)} c(e_vu) + Σ_{e_ux ∈ Me(π, v)} c(e_ux)
• We define the cost of an order as:
Cost(π) = max{Cost(π, v), v ∈ G}
Our goal: minimize Cost(π)
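A direct transcription of this definition (the edge costs and the example diamond graph are made up for illustration):

```python
def cost(order, w, c):
    """Cost(pi) from the definition above: while v is processed we hold v's own
    weight w[v], the costs of v's outgoing edges, and the costs of the memory
    edges e_ux with pi(u) < pi(v) <= pi(x).  c maps (u, x) to the edge cost."""
    pos = {v: i for i, v in enumerate(order)}
    peak = 0
    for v in order:
        outputs = sum(cv for (u, x), cv in c.items() if u == v)
        crossing = sum(cv for (u, x), cv in c.items()
                       if pos[u] < pos[v] <= pos[x])
        peak = max(peak, w[v] + outputs + crossing)
    return peak

# A small diamond a -> {b, c} -> d with unit vertex weights (values are made up):
w = {'a': 1, 'b': 1, 'c': 1, 'd': 1}
c = {('a', 'b'): 2, ('a', 'c'): 3, ('b', 'd'): 1, ('c', 'd'): 1}
print(cost(['a', 'b', 'c', 'd'], w, c))   # 7: the peak is reached while processing b
```

Note that the edge a→c is charged while b is processed: it has been produced but not yet consumed, which is exactly what the memory edges Me(π, v) capture.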
-
Another model
[Figures: the sets of already-processed vertices and unprocessed vertices, shown before, during, and after the processing of v]
-
Energy
One last problem: Energy.
[Figures: power densities of 2 W/cm² and 80 W/cm²; Titan draws 8.2 MW]
Thermal wall: we cannot increase the clock frequency of a chip: it would melt.
-
Speed Scaling
One can modify the execution speed f of any task, f ∈ [fmin, fmax].
Let Ti, of weight wi, be executed on processor pj at speed fi:
[Figure: on pj's timeline, Ti occupies a slot of length Exe(wi, fi) at speed fi]
-
The energy consumption of the execution of task Ti at speed fi:
Ei(fi) = Exe(wi, fi) · fi³ = wi · fi²
→ (dynamic part of the classical energy model)
-
Unfortunately, there are some more drawbacks (reliability):
[Figure: the reliability Ri(fi) as a function of the speed fi, with the reference point Ri(frel) at speed frel]
Ri(fi) ≈ 1 − λ0 · e^(−d·fi) · Exe(wi, fi)
-
A solution: two executions!
Ri = 1 − (1 − Ri(fi^(1))) · (1 − Ri(fi^(2)))
[Figure: the two replicas Ti^(1) at speed fi^(1) on p1 and Ti^(2) at speed fi^(2) on p2, both completing by time ti]
Energy consumption with two executions:
Ei = wi · (fi^(1))² + wi · (fi^(2))²
[Figure: the energy curve Ei(fi), with Ei(frel) marked at frel, the point frel/√2, and the two-execution energy wi·fi² + wi·fi² = 2Ei(fi)]
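A quick numeric check of the frel/√2 trick (λ0 and d are illustrative values, not taken from the slides): two executions at frel/√2 cost exactly the energy of one execution at frel, with a higher combined reliability:

```python
import math

def energy(w, f):
    """Dynamic energy of one execution: E_i(f) = Exe(w, f) * f^3 = w * f^2."""
    return w * f * f

def reliability(w, f, lam0=1e-5, d=2.0):
    """Reliability model from the slides, R_i(f) ~ 1 - lam0 * e^(-d f) * Exe(w, f),
    with Exe(w, f) = w / f.  lam0 and d are illustrative values."""
    return 1.0 - lam0 * math.exp(-d * f) * (w / f)

w, f_rel = 100.0, 1.0
single = energy(w, f_rel)                       # one execution at f_rel
double = 2 * energy(w, f_rel / math.sqrt(2))    # two executions at f_rel / sqrt(2)
# 2 * w * (f/sqrt(2))^2 = w * f^2: the two slow copies cost exactly the same energy,
r1 = reliability(w, f_rel / math.sqrt(2))
combined = 1 - (1 - r1) ** 2                    # ...and the combined reliability is higher.
```

Each slow copy is individually less reliable (the e^(−d·f) term grows as f drops), but the product of the two failure probabilities more than compensates.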
-
To sum up
We need to find, for each task:
• the number of executions (one or two)
• their speeds
• their mapping (processor)
in order to minimize the energy consumption under the constraints:
• ∀i, ti ≤ D (bounded makespan)
• ∀i, Ri(Ti) ≥ Ri(frel) (minimum reliability)
-
Two kinds of results
Theoretical:
• an FPTAS for linear chains;
• an inapproximability result for independent tasks;
• with a relaxation β on the makespan constraint, we can approximate the optimal solution within 1 + 1/β², for all β ≥ max(2 − 3/(2p+1), 2 − (p+2)/(4p+2)).
But also simulations for general DAGs.
-
Sparse direct solution: main research issues
[Figures: Code Aster, EDF pump, nuclear backup circuit; frequency-domain seismic modeling, Helmholtz equations, SEISCOPE project: a velocity model with depth, dip and cross axes in km and velocities from 3000 to 6000 m/s]
Extrapolation on a 1000 × 1000 × 1000 grid: 55 exaflops, 200 TBytes for the factors, 40 TBytes for the active memory!
Main algorithmic issues
• Parallel algorithmic issues: synchronization avoidance, mapping irregular data structures, scheduling.
• Performance scalability: time, but also memory per processor, when increasing the number of processors (and the problem size).
• Numerical issues: numerical accuracy, hybrid iterative-direct solvers, application-specific (elliptic PDEs) solvers.
-
Execution of malleable task trees
• One of the problems raised by Exascale
• Motivation: linear algebra, sparse matrix factorizations...
• Principle: many processors are available → we can parallelize the tree, but also the tasks
• Difficulty: parallelization is not perfect; the more processors we allocate to a task, the more losses occur
[Figure: a task tree with root 0, children 1-4, and deeper subtrees]
-
• In the model developed (the time to complete a task of length L with p processors is L/p^α, for 0 < α < 1), the makespan-optimal processor allocation to the tree behaves like the distribution of electrical current → a nice structure to work with
• This model ignores some relevant constraints, such as memory limits or granularity: other models are designed to handle them
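For two sibling subtrees this "electrical" structure is explicit (a sketch under the slide's L/p^α model; all numbers are illustrative): equalizing the finish times fixes the ratio p1/p2, like current splitting between parallel branches:

```python
def split(p, L1, L2, alpha):
    """Optimal continuous split of p processors between two parallel subtrees
    when a task of length L on q processors takes L / q**alpha: equalizing the
    finish times L1/p1**alpha = L2/p2**alpha gives p1/p2 = (L1/L2)**(1/alpha)."""
    r = (L1 / L2) ** (1 / alpha)
    p1 = p * r / (1 + r)
    return p1, p - p1

p1, p2 = split(12, 8.0, 1.0, 0.5)
t1 = 8.0 / p1 ** 0.5      # finish time of the heavy subtree
t2 = 1.0 / p2 ** 0.5      # finish time of the light subtree: equal to t1
```

Since each subtree's finish time decreases in its own share, the makespan (the max of the two) is minimized exactly when the two times are equal, which is why the closed-form ratio is optimal for the continuous relaxation.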
-