Parallel QuickStart Guide

About this QuickStart Guide

This QuickStart guide gives a brief overview of the WestGrid Parallel facility, highlighting some of the features that distinguish it from other WestGrid resources. It is intended to be read by new WestGrid account holders and by current users considering whether to move to the Parallel system. For more detailed information about the Parallel hardware and performance characteristics, available software, usage policies and how to log in and run jobs, follow the links given below.

Introduction

The Parallel cluster adds 7056 cores and 180 general-purpose graphics processing units to the WestGrid computing capabilities. Parallel is intended for multi-node parallel applications that can run in a relatively short time (less than 3 days) and can take advantage of its InfiniBand interconnect or special GPU-based nodes. It can also be used for applications that have license restrictions that prevent them from being run elsewhere. 

Request for access

Unlike most WestGrid systems, Parallel requires a separate request to obtain an account. If you think the software you would like to run is appropriate for the Parallel cluster, please write to accounts@westgrid.ca with a subject line of the form "Parallel account request (your_username)", requesting an account and mentioning the software you propose to use.

Hardware

Processors

In addition to a login node and other machines for job management and file serving, the Parallel cluster consists mainly of multi-core compute nodes.

There are 528 12-core standard nodes and 60 special 12-core nodes that have 3 general-purpose GPUs each.

The compute nodes are based on the HP ProLiant SL390 server architecture, with each node having 2 sockets.  Each socket has a 6-core Intel E5649 (Westmere) processor running at 2.53 GHz.  The 12 cores associated with one compute node share 24 GB of RAM.

The GPUs are NVIDIA Tesla M2070s, each with about 5.5 GB of memory and what is known as Compute Capability 2.0 (which means that 64-bit floating point calculation is supported, along with other capabilities that make these graphics boards suitable for general-purpose use, beyond just graphics applications).
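
If you want to confirm the GPU model and memory for yourself from a session on one of the GPU-enabled nodes, the standard NVIDIA monitoring tool (assuming it is on your path) will report this information:

nvidia-smi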

Interconnect

As mentioned in the introduction, the Parallel cluster is intended for multi-node jobs that make use of the low-latency interconnect. Parallel uses an InfiniBand 4X QDR (Quad Data Rate) 40 Gbit/s switched fabric, with a two to one blocking factor (to reduce purchase costs).

Storage

There are three types of storage locations: home directories, global scratch and local scratch.  The home directories and global scratch are shared among the Breezy, Lattice and Parallel systems.

Home directories

Approximately 6.5 TB of disk space is allocated for home directories. Each home directory has a storage quota of 50 GB, with a 200,000-file limit.  Use your home directory for files that you want to save for longer than 30 days.  Run jobs from your directory in /global/scratch.

You can check the status of the quota for your home directory with

/usr/local/ibrix/bin/ibrix_quota -f /home
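
To see how close you are to the 200,000-file limit, you can also count the files under your home directory directly (this may take a while for a large directory tree), for example:

find $HOME -type f | wc -l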

Global scratch directories

A total of about 154 TB is used for global scratch.  A directory /global/scratch/user_name has been set up for each user. The default quota in /global/scratch for individual users is 450 GB, with a 200,000-file limit. If you need an increased quota, please write to WestGrid support.

You can check the status of the quota for your /global/scratch directory with

/usr/local/ibrix/bin/ibrix_quota -f /global/scratch

30-day policy: Please note that /global/scratch is intended only for files associated with running jobs or waiting for post-processing. Files older than 30 days are subject to deletion.
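
As a rough check on which of your files might be at risk under this policy, a command along the following lines lists files in your scratch directory that have not been modified in the last 30 days (substitute your own user name for user_name):

find /global/scratch/user_name -type f -mtime +30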

Local scratch directories

Each compute node has a local scratch directory, /tmp, with approximately 70 GB of storage space.  If you have an I/O-intensive program, you should consider using /tmp for temporary files associated with your running jobs. Delete files in /tmp at the end of your runs. Note that these local scratch partitions are shared among all users of a given node, so you are not guaranteed that all the space will be available for any given run.
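
A minimal sketch of how a job script might stage I/O-intensive work through node-local /tmp is shown below. The file and program names are only placeholders; adapt them to your own application.

# Create a per-job directory on the node-local disk
MYTMP=/tmp/${PBS_JOBID}
mkdir -p $MYTMP
cd $MYTMP

# ... run your I/O-intensive program here, directing its temporary files to $MYTMP ...

# Copy any results you want to keep back to the submission directory,
# then clean up the local scratch space before the job ends.
cp results.dat $PBS_O_WORKDIR
rm -rf $MYTMP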

Software

See the main WestGrid software page for tables showing the installed application software on Parallel and other WestGrid systems, as well as information about the operating system, compilers, and mathematical and graphical libraries.

Please write to WestGrid support if there is additional software that you would like installed.
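
On many WestGrid systems, software is made available through environment modules. If that is the case for the package you need on Parallel, commands along these lines show what is installed and add a package to your environment (the package name here is just a placeholder):

module avail
module load package_name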

Using Parallel

Getting started

To log in to Parallel, connect to parallel.westgrid.ca using an ssh (secure shell) client. For more information about connecting and setting up your environment, see the QuickStart Guide for New Users.
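
For example, from a terminal on your own machine (replacing your_username with your WestGrid user name):

ssh your_username@parallel.westgrid.ca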

Batch job policies

As on other WestGrid systems, batch jobs are handled by a combination of TORQUE and Moab software. For more information about submitting jobs, see the Running Jobs page.

Unlike most other WestGrid systems, we prefer that the syntax "-l nodes=xx:ppn=12" be used rather than "-l procs=yyy" when requesting processor resources on Parallel.  Parallel is used almost exclusively for large parallel jobs that use whole nodes.  Requesting whole nodes can improve the performance of some jobs and minimizes the impact of a misbehaving job or hardware failure.  Since there are 12 cores per node on Parallel, a ppn (processors per node) parameter of 12 requests all the processors on a node.  It is also recommended that you ask for 22-23 GB of memory per node requested, using the mem parameter.  So a typical job submission on Parallel would look like:

qsub -l nodes=4:ppn=12,mem=88gb,walltime=72:00:00 parallel_diffuse.pbs
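
The same resource requests can instead be placed as directives in the job script itself. The sketch below assumes an MPI program launched with mpiexec; the executable name and the exact launch command are placeholders that depend on the MPI library and application you are using.

#!/bin/bash
#PBS -l nodes=4:ppn=12,mem=88gb,walltime=72:00:00

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# 4 nodes x 12 cores per node = 48 MPI processes
mpiexec -n 48 ./parallel_diffuse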

The following limits are in place for batch jobs submitted to the default queue (that is, if no queue is specified on the qsub command):

Maximum walltime (but see below for other comments related to walltime): 72 hours
Suggested maximum memory resource request (mem) per node: 23 GB
Maximum number of running jobs for a single user: 64
Maximum cores (sum for all jobs) for a single user: 3072 (for example, 256 12-core nodes)
Maximum jobs in the Idle queue: ?

Some nodes have a maximum walltime limit of 24 hours and a few are restricted to just 3 hours.  In particular, most of the GPU-enabled nodes accessed with -q gpu have a 24-hour limit.

Interactive and short test jobs

The login node can be used for short testing and debugging sessions.  However, if you are unsure how much memory your calculations require, are testing parallel code, or need to run tests that last more than a few minutes, you should not do your testing on the login node.  Instead, see the Working Interactively section of the Running Jobs page for a method of reserving processors for interactive use with qsub -I.
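
For example, a request for a one-hour interactive session on a full 12-core node might look something like this (the walltime and memory values are only illustrative):

qsub -I -l walltime=01:00:00,nodes=1:ppn=12,mem=23gb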

If you do not need to work interactively, but just need to run short batch jobs to test your software, specifying a short walltime will increase your chances of getting a quick response.

There are a couple of nodes reserved for jobs of less than 3 hours; they can be accessed by specifying -q interactive and an appropriately short time limit. For example:

qsub -q interactive -l walltime=03:00:00 job_script.pbs

If you require GPU-enabled nodes, these are available through the interactive queue if you specify the gpus resource along with the nodes and processors you require.  For example:

qsub -q interactive -l walltime=03:00:00,nodes=1:ppn=12:gpus=3 job_script.pbs

Please see the GPU Computations page for more information about running programs that require GPUs.  That page mentions using -q gpu to request nodes with GPUs.  However, for short tests with two nodes or fewer, you are probably better off using -q interactive instead of -q gpu.
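
For longer or larger GPU runs submitted to the gpu queue, the request might look something like the following (the walltime and node count are only illustrative):

qsub -q gpu -l walltime=24:00:00,nodes=2:ppn=12:gpus=3 job_script.pbs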

If you require exclusive access to a node while testing, you can add naccesspolicy=singlejob, as shown here:

qsub -l walltime=03:00:00,naccesspolicy=singlejob job_script.pbs

Jobs using GPUs

For information about requesting nodes and running jobs that use GPUs, please see the GPU Computations page.

Updated 2013-01-28.