Matt Kinsey: Porting the 2D Wave Equation to the GPU
- Optimal number of threads per block is 32*n-1, where n is an integer. The best performance in the example shown was 63 threads per block.
- Minimum number of blocks per grid is 32, according to the user’s guide.
- Every time a kernel is called, the memory needs to be pushed from the CPU to the GPU. Thus it is optimal to minimize the kernel calls.
- In the 2D wave equation problem, Matt utilized texture memory to reduce the number of kernels to one. The memory is indexed in a space-filling curve. This results in better cache locality.
- With texture memory, one can take advantage of built-in linear interpolation and boundary conditions. Texture memory can be addressed in 1D, 2D, and 3D.