GPU Meeting (20 Sept 2011)

Matt Kinsey: Porting the 2D Wave Equation to the GPU

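As presented, the optimal number of threads per block is 32n - 1, where n is a positive integer; in the example shown, 63 threads per block (n = 2) gave the best performance. This differs from the usual CUDA guidance of a multiple of the 32-thread warp size, so it is best read as an empirical result for this code.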
Minimum number of blocks per grid is 32, according to the user’s guide.
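Data must be copied from CPU (host) memory to GPU (device) memory before a kernel can use it, and in the approach described this transfer was repeated for every kernel call. It is therefore best to minimize the number of kernel calls.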
In the 2D wave equation problem, Matt used texture memory to reduce the number of kernels to one. Texture memory is indexed along a space-filling curve, which improves cache locality for accesses that are close together in 2D.
Texture memory also provides built-in linear interpolation and boundary handling (addressing modes such as clamp and wrap), and it can be addressed in 1D, 2D, or 3D.
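
The two texture-memory points above can be illustrated with a small CPU-side sketch in Python. It is not code from the talk and not how the hardware is implemented. The notes only say "space-filling curve", so the Z-order (Morton) curve used here is an assumption chosen for illustration. The names morton_index, resolve_index, and fetch_linear are hypothetical, and the fetch uses unnormalized coordinates with texel centers at i + 0.5 for simplicity.

```python
import math


def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of (x, y) into a Z-order (Morton) index.

    Assumption: Z-order stands in for the unspecified space-filling curve.
    """
    index = 0
    for b in range(bits):
        index |= ((x >> b) & 1) << (2 * b)       # x bits fill the even positions
        index |= ((y >> b) & 1) << (2 * b + 1)   # y bits fill the odd positions
    return index


def resolve_index(i: int, n: int, mode: str) -> int:
    """Map a possibly out-of-range texel index into the range 0 .. n-1."""
    if mode == "clamp":   # reuse the edge texel outside the domain
        return min(max(i, 0), n - 1)
    if mode == "wrap":    # periodic boundary
        return i % n
    raise ValueError(f"unknown addressing mode: {mode}")


def fetch_linear(texels, x: float, mode: str = "clamp") -> float:
    """Linearly interpolated 1D fetch at coordinate x.

    Texel i is centered at i + 0.5, so x = i + 0.5 returns texels[i] exactly.
    """
    n = len(texels)
    xb = x - 0.5
    i = math.floor(xb)    # index of the texel to the left of x
    alpha = xb - i        # fractional distance toward the right texel
    left = texels[resolve_index(i, n, mode)]
    right = texels[resolve_index(i + 1, n, mode)]
    return (1.0 - alpha) * left + alpha * right
```

For example, the four texels of the 2x2 tile at the origin get Morton indices 0 to 3, so they sit next to each other in memory, whereas in a row-major array of width W the vertical neighbors are W elements apart. In fetch_linear, the "wrap" mode behaves like a periodic boundary condition and "clamp" repeats the edge value, which is the sense in which the boundary handling comes for free.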

Computational Cosmology at Georgia Tech