GPU Meeting (20 Sept 2011)

Matt Kinsey: Porting the 2D Wave Equation to the GPU
  • Matt found the optimal number of threads per block to be 32*n - 1, where n is an integer; in the example shown, 63 threads per block performed best.
  • According to the user's guide, a grid should have at least 32 blocks.
  • If data must be copied from the CPU to the GPU every time a kernel is called, each call pays for that transfer, so it is best to minimize the number of kernel calls.
  • In the 2D wave equation problem, Matt used texture memory to reduce the number of kernels to one. Texture memory is indexed along a space-filling curve, which gives better cache locality.
  • Texture memory also provides built-in linear interpolation and boundary handling (addressing modes such as clamp and wrap), and it can be addressed in 1D, 2D, or 3D.
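The thread and block counts in the notes reduce to some simple launch-configuration arithmetic. The following is a minimal Python sketch (not CUDA code); the helper name `launch_config`, the 256 x 256 grid, and the 32-block floor borrowed from the notes are illustrative assumptions. It computes how many blocks are needed to give every grid cell one thread, and how many lanes sit idle because threads are always scheduled in whole warps of 32.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def launch_config(num_cells, threads_per_block, min_blocks=32):
    """Hypothetical helper: blocks needed to cover num_cells, plus warp usage."""
    # Ceiling division: enough blocks that every cell gets one thread.
    blocks = -(-num_cells // threads_per_block)
    # Warps are allocated whole, so a block uses ceil(threads / 32) warps.
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    idle_lanes = warps_per_block * WARP_SIZE - threads_per_block
    return {
        "blocks": blocks,
        "warps_per_block": warps_per_block,
        "idle_lanes_per_block": idle_lanes,
        "meets_min_blocks": blocks >= min_blocks,
    }

# Example: a 256 x 256 grid launched with 63 threads per block.
print(launch_config(256 * 256, 63))
# {'blocks': 1041, 'warps_per_block': 2, 'idle_lanes_per_block': 1, 'meets_min_blocks': True}
```

With the 63-thread blocks from the talk, each block occupies two warps with one idle lane, and the 1041 resulting blocks comfortably clear the 32-block guideline.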
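To see why repeated CPU-to-GPU copies hurt, the following back-of-the-envelope sketch counts the bytes moved to the device. The grid size, the step count, single-precision values, and the assumption that two time levels must be uploaded before each kernel call are all hypothetical; the notes do not give Matt's actual sizes.

```python
def bytes_transferred(nx, ny, steps, fields=2, bytes_per_value=4, copy_every_step=True):
    """Hypothetical estimate of host-to-device traffic for a 2D time-stepping solver."""
    field_bytes = nx * ny * bytes_per_value
    if copy_every_step:
        # Re-upload every field before each kernel call.
        return steps * fields * field_bytes
    # Upload once and keep the fields resident on the GPU for all steps.
    return fields * field_bytes

MiB = 1024 ** 2
per_step = bytes_transferred(1024, 1024, 1000) / MiB
once = bytes_transferred(1024, 1024, 1000, copy_every_step=False) / MiB
print(f"copy every step: {per_step:.0f} MiB, copy once: {once:.0f} MiB")
# copy every step: 8000 MiB, copy once: 8 MiB
```

Keeping the data resident on the device between kernel launches, and copying back only when output is needed, removes nearly all of that traffic.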
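For reference, the update that a single wave-equation kernel would compute can be sketched on the CPU. This assumes the standard second-order central-difference (leapfrog) scheme for u_tt = c^2 (u_xx + u_yy) on a square grid with fixed, zero-valued edges; the notes do not say which discretization or boundary conditions Matt used, and `wave_step` is a hypothetical name. On the GPU, each interior (i, j) update in the double loop would map to one thread.

```python
def wave_step(u_prev, u, r2):
    """One leapfrog step of u_tt = c^2 (u_xx + u_yy) on a square grid.

    u_prev, u: lists of lists holding the field at time levels n-1 and n.
    r2: (c * dt / h) ** 2; this 2D scheme is stable for r2 <= 0.5.
    Edge values stay at zero (assumed fixed boundaries).
    """
    ny, nx = len(u), len(u[0])
    u_next = [[0.0] * nx for _ in range(ny)]
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            # Five-point Laplacian, scaled by h**2.
            lap = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j]
            u_next[i][j] = 2.0 * u[i][j] - u_prev[i][j] + r2 * lap
    return u_next

# A unit spike in the middle of a 5 x 5 grid, crudely started from rest.
n = 5
u0 = [[0.0] * n for _ in range(n)]
u0[2][2] = 1.0
u1 = wave_step(u0, u0, 0.25)
print(u1[2][2], u1[2][1])  # 0.0 0.25
```

Each output value depends only on a cell and its four neighbours at the previous two time levels, which is why the whole step fits naturally into one kernel launch.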
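The space-filling-curve layout mentioned in the notes can be illustrated with a Z-order (Morton) index, which interleaves the bits of the x and y coordinates so that cells close together in 2D tend to be close together in memory. The CUDA documentation describes the texture cache as optimized for 2D spatial locality but does not spell out the exact layout, so Z-order here is an illustrative choice, not a description of the hardware; `part1by1` and `morton2d` are hypothetical names.

```python
def part1by1(v):
    """Spread the low 16 bits of v so a zero bit sits between each original bit."""
    v &= 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton2d(x, y):
    """Z-order index: x bits land in even positions, y bits in odd positions."""
    return part1by1(x) | (part1by1(y) << 1)

# Z-order index of every cell in a 4 x 4 tile, one row (fixed y) per line.
for y in range(4):
    print([morton2d(x, y) for x in range(4)])
# [0, 1, 4, 5]
# [2, 3, 6, 7]
# [8, 9, 12, 13]
# [10, 11, 14, 15]
```

Each 2 x 2 quad occupies four consecutive indices, so a stencil that reads a cell and its neighbours tends to touch fewer distinct cache lines than with plain row-major order on a wide grid.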
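The built-in linear interpolation and boundary handling can be emulated on the CPU to see what a filtered texture fetch returns. This sketch follows the bilinear formula the CUDA programming guide gives for linear filtering with unnormalized coordinates (texel centres at half-integer positions) together with clamp addressing, but it computes the weights in full floating-point precision, whereas the hardware stores them in low-precision fixed point; `fetch_clamp` and `tex2d_linear` are hypothetical names.

```python
import math

def fetch_clamp(tex, i, j):
    """Read tex[j][i] with clamp addressing: out-of-range indices snap to the edge."""
    h, w = len(tex), len(tex[0])
    i = min(max(i, 0), w - 1)
    j = min(max(j, 0), h - 1)
    return tex[j][i]

def tex2d_linear(tex, x, y):
    """Bilinear fetch at unnormalized coordinates (x, y), CUDA linear-filter style."""
    xb, yb = x - 0.5, y - 0.5          # shift so texel centres fall on integers
    i, j = math.floor(xb), math.floor(yb)
    alpha, beta = xb - i, yb - j       # fractional weights (float, not fixed point)
    return ((1 - alpha) * (1 - beta) * fetch_clamp(tex, i, j)
            + alpha * (1 - beta) * fetch_clamp(tex, i + 1, j)
            + (1 - alpha) * beta * fetch_clamp(tex, i, j + 1)
            + alpha * beta * fetch_clamp(tex, i + 1, j + 1))

tex = [[0.0, 1.0],
       [2.0, 3.0]]
print(tex2d_linear(tex, 1.0, 1.0))   # 1.5: average of all four texels
print(tex2d_linear(tex, -5.0, 1.5))  # 2.0: far left of the second row, clamped
```

Because out-of-range reads are resolved by the addressing mode, a stencil kernel can fetch neighbours at the grid edge without explicit bounds checks.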

GPU Papers

  • Fluke et al., Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters
  • Schive et al., GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
  • Wang et al., Adaptive Mesh Fluid Simulations on GPU
  • Jonsson & Primack, Accelerating Dust Temperature Calculations with Graphics Processing Units
  • Bédorf et al., A sparse octree gravitational N-body code that runs entirely on the GPU processor