CUDA Implementation of the Lattice Boltzmann Method

CSE 633 Parallel Algorithms

Andrew Leach

University at Buffalo

2 Dec 2010

A. Leach (University at Buffalo) CUDA LBM Nov 2010 1 / 16

Motivation

The Lattice Boltzmann Method (LBM) solves the Navier-Stokes equations accurately and efficiently.

The same update is applied at every lattice point, and this uniformity makes it easy to parallelize.

The high volume of simple calculations makes it well suited to GPGPU computing.

LBM Degrees of Freedom

[Figure: a lattice point with directional components F1 through F8]

Each lattice point has an associated mass density

This mass density is split into components along 9 directions: the 8 neighbor directions (F1 to F8) plus a rest component
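
As a rough illustration of these degrees of freedom, the sketch below stores the nine components of one lattice point and recovers its mass density and momentum by summation. The direction ordering (index 0 at rest, 1 to 4 along the axes, 5 to 8 along the diagonals) and the function names are assumptions made for this example; the numbering used in the slides' figure may differ.

```c
#include <assert.h>
#include <stddef.h>

#define Q 9  /* number of directions per lattice point */

/* Assumed ordering: 0 = rest, 1-4 = axis neighbors, 5-8 = diagonal neighbors. */
static const int CX[Q] = { 0, 1, 0, -1,  0, 1, -1, -1,  1 };
static const int CY[Q] = { 0, 0, 1,  0, -1, 1,  1, -1, -1 };

/* Mass density at one lattice point: the sum of its nine components. */
double lbm_density(const double f[Q])
{
    double rho = 0.0;
    for (int i = 0; i < Q; ++i)
        rho += f[i];
    return rho;
}

/* Momentum at one lattice point: each component weighted by its direction. */
void lbm_momentum(const double f[Q], double *mx, double *my)
{
    assert(mx != NULL && my != NULL);
    *mx = 0.0;
    *my = 0.0;
    for (int i = 0; i < Q; ++i) {
        *mx += f[i] * CX[i];
        *my += f[i] * CY[i];
    }
}
```

Dividing the momentum by the density gives the local flow velocity.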

LBM Stream

[Figure: components F1 through F8 during the stream step]

At each time step, each lattice point receives from each neighbor the mass density component directed toward it
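
The stream step can be sketched in sequential C as below. This is a minimal version under stated assumptions, not the code from the talk: the direction ordering, the array layout (one plane of nx * ny values per direction), and the periodic wrap at the domain edges are all choices made for this example. A real solver would apply its boundary conditions at the edges instead of wrapping.

```c
#include <assert.h>

#define Q 9  /* number of directions per lattice point */

/* Assumed ordering: 0 = rest, 1-4 = axis neighbors, 5-8 = diagonal neighbors. */
static const int CX[Q] = { 0, 1, 0, -1,  0, 1, -1, -1,  1 };
static const int CY[Q] = { 0, 0, 1,  0, -1, 1,  1, -1, -1 };

/* Assumed layout: component i at point (x, y) is f[(i * ny + y) * nx + x]. */
void lbm_stream(const double *f_old, double *f_new, int nx, int ny)
{
    assert(nx > 0 && ny > 0);
    for (int i = 0; i < Q; ++i) {
        for (int y = 0; y < ny; ++y) {
            for (int x = 0; x < nx; ++x) {
                /* The component moving in direction i arrives from the
                   neighbor one step against that direction (periodic wrap). */
                int xs = (x - CX[i] + nx) % nx;
                int ys = (y - CY[i] + ny) % ny;
                f_new[(i * ny + y) * nx + x] = f_old[(i * ny + ys) * nx + xs];
            }
        }
    }
}
```

Reading from f_old and writing to a separate f_new keeps every lattice point independent of the others, which is what allows one thread per point on a GPU.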

LBM Collision

[Figure: components F1 through F8 at a lattice point during collision]

Collision acts on the mass densities received during the stream step

The equilibrium distribution is computed from them

New projected mass densities are assigned
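
One common way to carry out these three steps is the single-relaxation-time (BGK) collision on the D2Q9 lattice, sketched below. The slides do not say which collision operator was used, so the BGK form, the relaxation parameter omega, the standard D2Q9 weights, and the direction ordering are assumptions made for this example.

```c
#include <assert.h>

#define Q 9  /* number of directions per lattice point */

/* Assumed ordering: 0 = rest, 1-4 = axis neighbors, 5-8 = diagonal neighbors. */
static const int CX[Q] = { 0, 1, 0, -1,  0, 1, -1, -1,  1 };
static const int CY[Q] = { 0, 0, 1,  0, -1, 1,  1, -1, -1 };

/* Standard D2Q9 weights for the rest, axis, and diagonal directions. */
static const double W[Q] = {
    4.0 / 9.0,
    1.0 / 9.0,  1.0 / 9.0,  1.0 / 9.0,  1.0 / 9.0,
    1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0
};

/* Relax the nine components of one lattice point toward equilibrium. */
void lbm_collide(double f[Q], double omega)
{
    assert(omega > 0.0 && omega < 2.0);

    /* Density and velocity from the received components. */
    double rho = 0.0, ux = 0.0, uy = 0.0;
    for (int i = 0; i < Q; ++i) {
        rho += f[i];
        ux  += f[i] * CX[i];
        uy  += f[i] * CY[i];
    }
    assert(rho > 0.0);
    ux /= rho;
    uy /= rho;

    /* Equilibrium for each direction, then relaxation toward it. */
    double usq = ux * ux + uy * uy;
    for (int i = 0; i < Q; ++i) {
        double cu  = CX[i] * ux + CY[i] * uy;
        double feq = W[i] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq);
        f[i] += omega * (feq - f[i]);
    }
}
```

The update uses only the values stored at the point itself, so collision needs no communication between lattice points.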

LBM Boundary Conditions

Bounceback is implemented at solid boundaries

The inlet has predetermined mass density

The outlet accepts outward flow
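
Bounceback at a solid point can be sketched as a swap of each component with the one pointing the opposite way. The direction ordering and the OPPOSITE table below are assumptions made for this example (index 0 at rest, 1 to 4 east, north, west, south, 5 to 8 the diagonals); the inlet and outlet conditions are not shown.

```c
#include <assert.h>

#define Q 9  /* number of directions per lattice point */

/* For the assumed ordering, OPPOSITE[i] is the direction pointing against i. */
static const int OPPOSITE[Q] = { 0, 3, 4, 1, 2, 7, 8, 5, 6 };

/* Reverse every moving component at a solid lattice point. */
void lbm_bounce_back(double f[Q])
{
    for (int i = 1; i < Q; ++i) {
        int j = OPPOSITE[i];
        assert(OPPOSITE[j] == i);  /* the table must be its own inverse */
        if (i < j) {               /* swap each pair only once */
            double tmp = f[i];
            f[i] = f[j];
            f[j] = tmp;
        }
    }
}
```

After the swap, the next stream step sends each component back toward the fluid point it came from, which approximates a no-slip wall.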

Code: Data Structures

[Figure: array F1 on the host and on the device]

Data initialized as an array on host

Pitch stores the width of a row in memory, in bytes, as determined by cudaMallocPitch()

Memory for texture access is allocated on the device as a CUDA array with cudaMallocArray(); cudaMallocPitch() allocates pitched linear memory

Array copied from host to device with cudaMemcpy2D()
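
To make the role of the pitch concrete, the following plain C sketch emulates a pitched 2D layout on the host. It does not call the CUDA runtime: padded_pitch(), pitched_elem(), and copy_2d() are hypothetical helpers written for this example, and the alignment value is arbitrary, since the real pitch is chosen by cudaMallocPitch().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round a row width in bytes up to a multiple of `align`. This stands in
   for the pitch that cudaMallocPitch() would report. */
size_t padded_pitch(size_t width_bytes, size_t align)
{
    assert(align > 0);
    return (width_bytes + align - 1) / align * align;
}

/* Address of element (row, col) in a pitched float array: rows are
   `pitch` bytes apart, which may be more than the row's own width. */
float *pitched_elem(void *base, size_t pitch, size_t row, size_t col)
{
    return (float *)((char *)base + row * pitch) + col;
}

/* Row-by-row copy between buffers with different pitches, the same kind
   of job cudaMemcpy2D() does between host and device. */
void copy_2d(void *dst, size_t dpitch, const void *src, size_t spitch,
             size_t width_bytes, size_t height)
{
    assert(width_bytes <= dpitch && width_bytes <= spitch);
    for (size_t r = 0; r < height; ++r)
        memcpy((char *)dst + r * dpitch,
               (const char *)src + r * spitch, width_bytes);
}
```

Inside a kernel the same address arithmetic applies: a row is located by offsetting the base pointer by row * pitch bytes before indexing the column.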

Code: Textures

The stream step requires a lot of data retrieval

Texture memory has fast, cached retrieval but limited space

Use cudaBindTextureToArray() to bind the data, held in a CUDA array, to a texture

Code: Kernels

A kernel is launched on a grid of blocks

Each block consists of threads, each of which runs the kernel independently (SIMD style)

The next slide shows the kernel for the stream() method. This example uses a lock-step texture lookup.
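
The sketch below is not the stream() kernel from the talk. It is a serial C emulation, written for this example, of how a grid of blocks covers a 2D lattice: the outer loops play the role of blockIdx, the inner loops play the role of threadIdx, and the index arithmetic matches what a CUDA kernel would compute. The names blocks_for(), scale_point(), and launch_emulated() are hypothetical.

```c
#include <assert.h>

/* Number of blocks needed so that blocks * block_size >= n. */
int blocks_for(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

/* Stand-in for a kernel body: each "thread" handles one lattice point. */
static void scale_point(float *data, int x, int y, int nx, float s)
{
    data[y * nx + x] *= s;
}

/* Serial emulation of a 2D launch with bx * by threads per block. */
void launch_emulated(float *data, int nx, int ny, int bx, int by, float s)
{
    int gx = blocks_for(nx, bx);
    int gy = blocks_for(ny, by);
    for (int bj = 0; bj < gy; ++bj) {
        for (int bi = 0; bi < gx; ++bi) {
            for (int tj = 0; tj < by; ++tj) {
                for (int ti = 0; ti < bx; ++ti) {
                    int x = bi * bx + ti;  /* blockIdx.x * blockDim.x + threadIdx.x */
                    int y = bj * by + tj;  /* blockIdx.y * blockDim.y + threadIdx.y */
                    if (x < nx && y < ny)  /* guard for partial blocks at the edges */
                        scale_point(data, x, y, nx, s);
                }
            }
        }
    }
}
```

On the GPU the four loops disappear: all blocks and threads run the body concurrently, and the block size (bx, by) becomes a tuning parameter.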

Code: Stream

[Code listing: kernel for the stream() method]

Runtime Analysis

The following slides contain graphs comparing run times for the LBM on a laptop with a 1.3 GHz processor running sequential C code and a single Tesla GPU running parallel code in CUDA. The change in performance based on block size is also explored.

Sequential vs Parallel

[Graphs: run times of the sequential C code and the parallel CUDA code]

Thank You

Thanks to Dr. Graham Pullan from Cambridge University for letting me use and modify his code.

Bibliography

Alexander Wagner, A Practical Introduction to the Lattice Boltzmann Method. North Dakota State University, March 2008.

Graham Pullan, A 2D Lattice Boltzmann Flow Solver Demo. http://www.many-core.group.cam.ac.uk/projects/LBdemo.shtml, University of Cambridge.