GPU Computation


General-purpose graphics processing units (GPGPUs, or GPUs for short) can speed up computations in a small but rapidly growing number of software packages. At the beginning of 2010, WestGrid established a 16-GPU testbed for visualization and limited GPU-based computation as part of the Checkers cluster. In early 2012, a much larger GPU-based computational capability became available as part of the Parallel cluster, which has a total of 180 GPUs.

A more detailed description of the GPU facilities on Parallel is given below, along with brief instructions on how to request compute nodes with GPUs for interactive visualization or batch-oriented computations. Users who have used the Checkers GPUs for computation in the past should consider migrating their code to Parallel because of the much larger number of GPUs available there.

Users who are specifically looking for remote visualization resources should refer to the Remote Visualization page.

GPU Computing on Parallel


As part of the Parallel cluster, there are 60 special 12-core nodes that have 3 general-purpose GPUs each, for a total of 180 GPUs.

The compute nodes are based on the HP ProLiant SL390 server architecture, with each node having 2 sockets. Each socket has a 6-core Intel Xeon E5649 (Westmere) processor running at 2.53 GHz. The 12 cores on a compute node share 24 GB of RAM.

The GPUs are NVIDIA Tesla M2070s, each with about 5.5 GB of memory and what is known as Compute Capability 2.0. This capability level includes support for 64-bit floating-point calculation, along with other features that make these boards suitable for general-purpose computation beyond graphics applications.

Building CUDA and OpenCL-based applications

To set up the NVIDIA compiler environment use:

module load cuda

Add -I/global/software/cuda/include/ to CPPFLAGS or to your compiler command line, as appropriate, so that the CUDA include files can be found.
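As a rough sketch of how these pieces fit together, the following shell fragment appends the include path to CPPFLAGS and prints the compile command it would run. The source file name vecadd.cu and the executable name vecadd are hypothetical, and the nvcc command is only echoed here (a dry run); on Parallel you would run it after "module load cuda".

```shell
# Include directory taken from the text above.
CUDA_INC=/global/software/cuda/include/

# Append the CUDA include path to any CPPFLAGS already set.
CPPFLAGS="${CPPFLAGS:+$CPPFLAGS }-I$CUDA_INC"
export CPPFLAGS

# Hypothetical source and executable names, for illustration only.
SRC=vecadd.cu
EXE=vecadd

# Print the compile command rather than running it (dry run).
BUILD_CMD="nvcc $CPPFLAGS -o $EXE $SRC"
echo "$BUILD_CMD"
```

Removing the echo and running the command directly would perform the actual build, assuming nvcc is on your PATH.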

Running GPU jobs on Parallel

To request a GPU node on Parallel, select the GPU queue with -q gpu. Also specify the number of GPUs as a modifier to the nodes resource request, as shown in the example below. Note that you would normally request full nodes (ppn=12) when working on Parallel, as indicated in the Parallel QuickStart Guide. However, if you are not using all the GPUs on a node, you can request part of a node by adjusting the ppn and gpus parameters appropriately. When using part of a node, you should also add a mem parameter to specify the memory requirements of the job.

#PBS -S /bin/bash
#PBS -q gpu
#PBS -l nodes=1:ppn=12:gpus=3

echo "Current working directory is `pwd`"

echo "Node file: $PBS_NODEFILE :"
echo "---------------------"
cat "$PBS_NODEFILE"
echo "---------------------"
NUM_PROCS=`/bin/awk 'END {print NR}' "$PBS_NODEFILE"`
echo "Running on $NUM_PROCS processors."

echo "GPU file: $PBS_GPUFILE :"
echo "------------------"
cat "$PBS_GPUFILE"
echo "------------------"
NUM_GPUS=`/bin/awk 'END {print NR}' "$PBS_GPUFILE"`
echo "$NUM_GPUS GPUs assigned."

echo "Starting at `date`"
nvidia-debugdump -l
echo "Finished at `date`"
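
The example above requests a full node. For a job that needs only one of the three GPUs on a node, the request might look like the following sketch. The values ppn=4, gpus=1 and mem=8gb are illustrative assumptions, not site recommendations, and the helper name count_lines is hypothetical; choose values that match your own job.

```shell
#!/bin/bash
#PBS -S /bin/bash
#PBS -q gpu
#PBS -l nodes=1:ppn=4:gpus=1
#PBS -l mem=8gb

# Hypothetical helper: count the lines (entries) in a file.
count_lines() {
    awk 'END {print NR}' "$1"
}

# Report the assigned GPUs only when running inside a batch job.
if [ -n "${PBS_GPUFILE:-}" ] && [ -r "${PBS_GPUFILE:-}" ]; then
    echo "$(count_lines "$PBS_GPUFILE") GPUs assigned."
fi
```

With a request like this, the GPU file would be expected to contain a single entry.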

To monitor the GPU resources being used by a job, you can ssh to a node of interest and run the nvidia-smi command.
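
One way to find the nodes to inspect is to list the distinct host names in the node file from within the job. This is a minimal sketch, assuming the node file holds one host name per line (repeated once per assigned core); the helper name list_job_nodes is hypothetical, and the ssh command is printed rather than executed.

```shell
# Hypothetical helper: print each distinct host name in a node file.
list_job_nodes() {
    sort -u "$1"
}

# Inside a job, show the monitoring command for each assigned node.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "${PBS_NODEFILE:-}" ]; then
    for node in $(list_job_nodes "$PBS_NODEFILE"); do
        echo "ssh $node nvidia-smi"
    done
fi
```

The printed commands can then be run from a login node while the job is active.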


Using GPUs for remote visualization

WestGrid provides the ability to perform remote visualization using a subset of the GPUs on Parallel. For more information on using the remote visualization capabilities, please refer to the WestGrid Remote Visualization page.


Updated 2012-12-07.