Tag Archives: GPU

21 Dec 2011

  • (abs, pdf) Turk et al., Magnetic Fields in Population III Star Formation
  • (abs, pdf) Turk & Smith, High-Performance Astrophysical Simulations and Analysis with Python
  • (abs, pdf) Nakasato, Implementation of a Parallel Tree Method on a GPU
  • (abs, pdf) Whalen & Fryer, The Formation of Supermassive Black Holes from Low-Mass Pop III Seeds
  • (abs, pdf) Ciardi et al., The effect of intergalactic helium on hydrogen reionisation: implications for the sources of ionising photons at z > 6

30 Nov 2011

  • (abs, pdf) Haschke et al., Metallicity distribution functions of the old populations of the Magellanic Clouds from RR Lyrae stars
  • (abs, pdf) Hopkins et al., Realistic Stellar Feedback & Bulge Formation in Clumpy Disks
  • (abs, pdf) Salvadori & Ferrara, First stars in Damped Lyman Alpha systems
  • (abs, pdf) Hassan et al., Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization case study
  • (abs, pdf) Yajima et al., Sub-millimeter brightness of early star-forming galaxies

GPU Meeting (04 Oct 2011)

Lionel London (CRA) – GPU Computing in Matlab

  • Matlab Parallel Computing Toolbox (PCT) vs. Jacket
  • PCT allows for multi-CPU and GPU computing. It is limited to 12 cores on the local machine and is very high level.
  • GPU computing with PCT requires an NVIDIA card with compute capability 1.3. GPGPU with Jacket has more relaxed hardware requirements, but it is very expensive ($4k for 5 licenses!)
  • PCT GPU can run external .cu files.
  • Jacket has 10x more CUDA-enabled functions than Matlab, and it is cluster-capable.

GPU Meeting (20 Sept 2011)

Matt Kinsey: Porting the 2D Wave Equation to the GPU

  • The optimal number of threads per block is 32n − 1, where n is an integer. The best performance in the example shown came at 63 threads per block.
  • The user's guide recommends a minimum of 32 blocks per grid.
  • In this implementation, data had to be copied from the CPU to the GPU every time a kernel was called, so it is best to minimize the number of kernel calls.
  • In the 2D wave equation problem, Matt used texture memory to reduce the number of kernel calls to one. Texture memory is indexed along a space-filling curve, which gives better cache locality.
  • With texture memory, one can also take advantage of built-in linear interpolation and boundary handling. Textures can be addressed in 1D, 2D, or 3D.
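The space-filling-curve layout of texture memory is typically a Morton (Z-order) curve. A minimal Python sketch (illustrative only, not Matt's code) of 2D Morton encoding shows why neighbouring texels end up close together in memory:

```python
def _spread_bits(n):
    """Spread the low 16 bits of n so that bit i moves to bit 2i."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton2d(x, y):
    """Interleave the bits of (x, y) into a single Z-order index."""
    return _spread_bits(x) | (_spread_bits(y) << 1)

# A 2x2 tile of texels maps to four consecutive 1D indices,
# which is what gives texture fetches their 2D cache locality.
tile = [(2, 2), (3, 2), (2, 3), (3, 3)]
print([morton2d(x, y) for x, y in tile])  # → [12, 13, 14, 15]
```

Because any small 2D neighbourhood lands in a short contiguous run of addresses, a stencil update like the wave equation touches far fewer cache lines than it would with plain row-major storage.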

GPU Papers

  • (abs, pdf) Fluke et al., Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters
  • (abs, pdf) Schive et al., GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
  • (abs, pdf) Wang et al., Adaptive Mesh Fluid Simulations on GPU
  • (abs, pdf) Jonsson & Primack, Accelerating Dust Temperature Calculations with Graphics Processing Units
  • (abs, pdf) Bédorf et al., A sparse octree gravitational N-body code that runs entirely on the GPU processor