The 2D Ising Model on GPU Clusters
DESCRIPTION
I gave this talk today at the DPG Spring Meeting 2010 in Regensburg, in the division "Dynamics and Statistical Physics".
TRANSCRIPT
The 2D Ising Model on GPU Clusters
Benjamin Block
University of Mainz, Institute for Physics
Thanks to: Tobias Preis, Peter Virnau
Overview
• GPUs: Optimized for massively parallel processing
• Previous work: GPU Accelerated Ising Model
• Architecture specific optimization
• GPU clusters are becoming established: a multi-GPU implementation is useful
T. Preis, P. Virnau, W. Paul, J. J. Schneider: GPU Accelerated Monte Carlo Simulation of the 2D and 3D Ising Model, J. Comp. Phys. 228 (2009)
Ising Model (Ferromagnetism)
T >> T_C, T ~ T_C, T << T_C
Lattice of spins
Metropolis Monte Carlo
Perform successive spin flips!
Probability: Metropolis criterion
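The slide names the Metropolis criterion without spelling it out. A minimal sketch of a single spin-flip attempt follows, assuming nearest-neighbour coupling J = 1, no external field, periodic boundaries, a square lattice, and units with k_B = 1. The function names `delta_energy` and `metropolis_step` are illustrative and not taken from the talk.

```python
import math
import random

def delta_energy(spins, i, j, J=1.0):
    """Energy change if spin (i, j) were flipped (periodic boundaries, no field)."""
    n = len(spins)  # assumes a square n x n lattice of +1/-1 values
    neighbours = (spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
                  + spins[i][(j - 1) % n] + spins[i][(j + 1) % n])
    return 2.0 * J * spins[i][j] * neighbours

def metropolis_step(spins, i, j, beta, rng=random):
    """Attempt one spin flip at (i, j); return True if it was accepted."""
    dE = delta_energy(spins, i, j)
    # Metropolis criterion: always accept if the energy does not increase,
    # otherwise accept with probability exp(-beta * dE).
    if dE <= 0.0 or rng.random() < math.exp(-beta * dE):
        spins[i][j] = -spins[i][j]
        return True
    return False
```

Here `beta` is the inverse temperature 1/T, so flips that raise the energy become rare for T << T_C and common for T >> T_C.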
Parallelization of Metropolis Updates
Idea: Update non-interacting domains in parallel
Checkerboard Update
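The checkerboard idea can be sketched in serial form as follows. Sites of one colour interact only with sites of the other colour, so all sites of one colour could be updated at the same time. This is a plain Python illustration under the same assumptions as before (J = 1, no field, periodic boundaries, square lattice with even side length), not the GPU implementation from the talk; `checkerboard_sweep` is a hypothetical name.

```python
import math
import random

def checkerboard_sweep(spins, beta, rng=random, J=1.0):
    """One full lattice update: first all 'black' sites, then all 'white' sites."""
    n = len(spins)  # assumes a square lattice with even side length
    for colour in (0, 1):
        # Sites with (i + j) % 2 == colour have all four neighbours on the
        # other colour, so their updates are independent of each other.
        for i in range(n):
            for j in range(n):
                if (i + j) % 2 != colour:
                    continue
                nb = (spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
                      + spins[i][(j - 1) % n] + spins[i][(j + 1) % n])
                dE = 2.0 * J * spins[i][j] * nb
                if dE <= 0.0 or rng.random() < math.exp(-beta * dE):
                    spins[i][j] = -spins[i][j]
```

On a GPU, the two inner loops over sites of one colour would be replaced by parallel threads; the loop over the two colours stays sequential.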
Programming the GPU
Slow global memory: Store spin lattice
Fast shared memory: Use for local computations
Execute the same code for different data in parallel
Utilize different kinds of memory
Reduce slow memory access
Idea: Store 4x4 spin blocks in 1 unit of GPU memory
For each parallel thread
Access 16 spins with one memory lookup
Perform local computations in (fast) shared memory
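Storing a 4x4 block in one memory unit can be illustrated by packing 16 spins into the bits of a single integer. The sketch below assumes one bit per spin (1 = up, 0 = down) and row-major bit order; both are choices made for this example, since the slides do not state the encoding. `pack_block` and `unpack_block` are hypothetical names.

```python
def pack_block(block):
    """Pack a 4x4 block of spins (+1/-1) into one integer, one bit per spin."""
    word = 0
    for r in range(4):
        for c in range(4):
            if block[r][c] == 1:
                word |= 1 << (4 * r + c)  # bit index 4*r + c holds spin (r, c)
    return word

def unpack_block(word):
    """Inverse of pack_block: recover the 4x4 block of +1/-1 spins."""
    return [[1 if (word >> (4 * r + c)) & 1 else -1 for c in range(4)]
            for r in range(4)]
```

With this layout a single read of `word` delivers all 16 spins of the block, which is the point of the slide: one memory lookup instead of sixteen.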
Update scheme in shared memory
Integer array in shared memory
Perform computations (draw random number, evaluate Metropolis criterion)
Old spins XOR update pattern = new spins
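The XOR step on this slide can be sketched with plain integers: if each bit of a word encodes one spin, XOR with an update pattern flips exactly the spins whose pattern bit is set. The one-bit-per-spin encoding and the names `build_flip_mask` and `apply_update` are assumptions for illustration.

```python
def build_flip_mask(accepted_bits):
    """Build the update pattern from the bit indices of accepted spin flips."""
    mask = 0
    for bit in accepted_bits:
        mask |= 1 << bit
    return mask

def apply_update(old_word, flip_mask):
    """New spins = old spins XOR update pattern."""
    return old_word ^ flip_mask
```

For example, `apply_update(0b1010, build_flip_mask([1, 2]))` gives `0b1100`: bits 1 and 2 are flipped, bits 0 and 3 are untouched. Applying the same pattern twice restores the original word.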
Performance measurement
CPU previous
CPU optimized
GPU previous
GPU optimized
Fair comparison:
Heavily optimized CPU implementation
How to measure performance?
Single spin flips per time unit!
~ 20x
~ 200x
Multi GPU communication
Distribute spin lattice among many GPUs
Border information has to be passed between GPUs after each complete update step
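The border exchange can be sketched on ordinary Python lists. The example assumes the lattice is cut into horizontal strips, one per GPU, with periodic wrap-around between the first and last strip; the slides do not specify the decomposition, so this is only one possible layout. `split_into_strips` and `exchange_borders` are hypothetical names, and on a real cluster the copies would be messages between devices or nodes.

```python
def split_into_strips(lattice, n_parts):
    """Cut the lattice into n_parts horizontal strips (one per GPU)."""
    size = len(lattice) // n_parts  # assumes the row count is divisible by n_parts
    return [[row[:] for row in lattice[k * size:(k + 1) * size]]
            for k in range(n_parts)]

def exchange_borders(strips):
    """Return, for each strip, copies of the (upper, lower) neighbour rows it needs."""
    n = len(strips)
    halos = []
    for k in range(n):
        upper = strips[(k - 1) % n][-1][:]  # last row of the strip above
        lower = strips[(k + 1) % n][0][:]   # first row of the strip below
        halos.append((upper, lower))
    return halos
```

Each strip only needs two rows from its neighbours, so the amount of data exchanged grows with the lattice side length while the work per strip grows with its area.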
Multi-GPU Performance
Measure: Single spin flips per GPU
Communication overhead
Bottleneck for small system sizes
Simulation on GPU Clusters
• On 64 GPUs: 256 GB video memory!
• A lattice of 800,000 x 800,000 spins could be processed.
• Processing the whole lattice on 64 GPUs: 3 seconds!
Tesla S1070 Unit
At NEC Nehalem Cluster Stuttgart
128 GPUs
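The figures on this slide imply a throughput that can be checked with a few lines of arithmetic. The sketch assumes that "processing the whole lattice" means one update attempt per spin; that reading is an interpretation, not something the slide states.

```python
n_gpus = 64
total_memory_gb = 256
side = 800_000             # lattice side length in spins
seconds_per_pass = 3.0     # time to process the whole lattice once

memory_per_gpu_gb = total_memory_gb / n_gpus       # video memory per GPU
n_spins = side * side                              # total number of spins
spins_per_second = n_spins / seconds_per_pass      # aggregate throughput
spins_per_second_per_gpu = spins_per_second / n_gpus
```

Under that assumption the lattice holds 6.4 x 10^11 spins, each GPU has 4 GB of memory, and the cluster processes roughly 2.1 x 10^11 spins per second, or about 3.3 x 10^9 per GPU.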
Conclusion
• Optimization is important (CPU and GPU) for fair comparison
• The 2D Ising model is a good candidate for parallel processing on GPU clusters
• Submitted for publication in Computer Physics Communications
• Source code will be made available at www.tobiaspreis.de