Snowpatch QuickStart Guide

Update Apr. 11, 2012

The Snowpatch nodes have been integrated into the Bugaboo cluster and the Snowpatch head node will disappear. All users who have used Snowpatch in the past should now submit their jobs from Bugaboo. No changes to job submission scripts are required.

About this QuickStart Guide

This QuickStart guide gives a brief overview of the WestGrid Snowpatch facility, highlighting some of the features that distinguish it from other WestGrid resources. It is intended to be read by new WestGrid account holders and by current users considering whether to move to the Snowpatch system. For more detailed information about the Snowpatch hardware and performance characteristics, available software, usage policies and how to log in and run jobs, follow the links given below.

Introduction

Snowpatch is a Linux Intel 256-core blade cluster with a Gigabit Ethernet interconnect.

Currently, 100% of the processors are allocated to the ATLAS high-energy physics project. This does not mean that WestGrid users cannot use the Snowpatch cluster. It does mean, however, that ATLAS jobs have higher priority than other WestGrid jobs, and that non-ATLAS jobs will run only when there are no ATLAS jobs in the input queue, i.e., when ATLAS is not using all available resources. At the time of writing (Feb. 2011), this is quite often the case, and a significant portion of the Snowpatch cluster continues to be used by non-ATLAS jobs.

Hardware

Processors

Snowpatch consists of two HP C-class c7000 blade chassis, outfitted with 16 BL460c blades each. Each (dual-socket) blade has two Intel Xeon X5355 quad-core processors, giving 256 cores in total. The processors run at 2.66 GHz. There are 16 GB of shared memory (RAM) per node (blade), that is, 2 GB per core.

Interconnect

The blades are connected with a Gigabit Ethernet interconnect.

Storage

Each node has about 113 GB of local scratch space in the /scratch directory.

Home directories are NFS-mounted from Gridstore, so the same home directories are seen as on Gridstore (and Blackhole), Robson and Hydra. On the head node (Snowpatch itself) the data and vault directories from Gridstore (see the Gridstore documentation for details) can be accessed as well. However, data and vault are not mounted on the compute nodes, so any files from the data and/or vault directories that a job requires must be copied to the home directory first. The home directory's primary purpose is to provide storage for jobs running on Snowpatch and Robson; it must not be used for long-term storage. Files that are no longer needed by jobs should therefore be moved to the data or vault directories. This can be done on Snowpatch or, more efficiently, on Gridstore itself.
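
As a sketch of this staging workflow (the paths below are hypothetical placeholders; see the Gridstore documentation for the actual locations of the data and vault directories):

# On the head node, before submitting: copy required input from the data
# directory (not visible on the compute nodes) into the home directory.
cp /path/to/data/$USER/input.dat $HOME/run1/

# After the job: move output that is no longer needed by jobs back to the
# data or vault directory, preferably by logging in to Gridstore itself.
mv $HOME/run1/results.tar /path/to/data/$USER/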

Software

A list of the installed software on Snowpatch is available on the WestGrid software web page. Please write to support@westgrid.ca if there is software that you would like installed.

Using Snowpatch

To log in to Snowpatch, connect to snowpatch.westgrid.ca using an ssh (secure shell) client. For more information about connecting and setting up your environment, see the QuickStart Guide for New Users.
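
For example (replace <username> with your WestGrid username):

ssh <username>@snowpatch.westgrid.ca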

Unified User Environment

Snowpatch is the first WestGrid system to implement the so-called "unified user environment". This includes the following features:

  • Users do not have to modify the PATH environment variable or other environment variables (such as LD_LIBRARY_PATH) to run certain programs. In fact, changing such environment variables is strongly discouraged, as it may actually prevent the system from working properly. Basically, issuing commands should "just work" without any changes to the environment.
  • There are generic commands for the compilers: cc, c++, f77, f90. These commands use the optimized compilers for the particular system, e.g., icc, icpc and ifort on Snowpatch. Thus, the user does not have to know the particular names of the compilers on the system; using the generic names will "just work" (a combined example is given after this list). While the underlying compilers may still differ from one WestGrid system to another, we guarantee that the generic commands work with the common, non-system-specific options. For example, "cc -O3 -o myprog myprog.c" will work on all systems. Furthermore, the environment variables CC, CXX, FC, F77 and F90 have been set to point to these generic compilers. [The specific names, e.g., ifc, ifort, gcc, gfortran, etc., can still be used, if so desired.]
  • Similarly, MPI programs can be compiled using the generic commands mpicc, mpicxx, mpif77 and mpif90 which in turn will use the same generic compilers mentioned above.
  • It is not necessary to specify directories for library locations when linking with software libraries. The generic compilers search all relevant directories and link with the specified libraries, so no -L<directory> options are needed when compiling. In fact, specifying -L<directory> options is discouraged, as it may lead to errors.
  • Similarly, the runtime linker will search all relevant directories to find the necessary libraries at runtime. That is, it is not necessary to compile with flags such as -Wl,-rpath,<directory> or -R<directory>, nor is it necessary to set the LD_LIBRARY_PATH variable.
  • There are generic library names for the most common libraries, in particular the BLAS and LAPACK libraries. That is, the user does not have to know whether the BLAS library is actually Intel's MKL library, AMD's ACML library, IBM's ESSL library, etc. Compiling with -lblas will link with the BLAS library that is most appropriate for the system; similarly, -llapack will link with the LAPACK library optimized for the system.
  • MPI programs are executed using the mpiexec interface defined in the MPI-2 standard:
    mpiexec -n <N> <program> <program arguments>

    where <N> is the number of processes to be used, <program> is the name of the program to be run and <program arguments> are its command line arguments (which may be empty).

  • There are two common ways of requesting processors in a TORQUE submission script:

    #PBS -l procs=N

    will request N processors (cores) anywhere on the cluster. Basically, the scheduler will allocate the next N processors that are available for the job. In contrast,

    #PBS -l nodes=n:ppn=m

    will allocate exactly m processors (cores) on each of n (different) nodes for the job (i.e., the total number of processors will be N = n*m). Note that the first method usually reduces the waiting time of parallel jobs substantially, so the second method should only be used when it is really necessary. Both forms are illustrated in the sketch after this list.
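
As a rough sketch of how these features fit together (the file names mysolver.c, mympi.c and myprog.pbs are hypothetical placeholders, not programs provided on the system):

# Serial program: compile and link against the system's optimized BLAS and
# LAPACK libraries using only the generic names; no -L or -rpath options.
cc -O3 -o mysolver mysolver.c -llapack -lblas

# MPI program: the generic MPI wrapper uses the same generic compiler.
mpicc -O3 -o mympi mympi.c

A minimal submission script for the MPI program (myprog.pbs, submitted with "qsub myprog.pbs") might then look like:

#!/bin/bash
#PBS -l procs=8
##PBS -l nodes=2:ppn=4   # alternative form: exactly 4 cores on each of 2 nodes
cd $PBS_O_WORKDIR
mpiexec -n $PBS_NCPUS ./mympi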

Important: Snowpatch is the first system to implement these ideas, and we may not get everything right immediately. Therefore, if you find something that does not work as expected, please send us (support@westgrid.ca) a bug report! We also encourage WestGrid users to comment on this "unified environment" and to send us suggestions. Thanks!

Parallel Programming

Introduction

The Snowpatch environment can be used for interactive development and for batch runs of parallel programs using MPI. Version 2 of MPICH is available. See Running Jobs for an overview of running batch jobs on WestGrid systems.

Basic commands for compiling MPI parallel programs are given in the following sections.

OpenMP parallel programming techniques may also be used, but an OpenMP job can use only the eight processors within a single blade. Use the -openmp flag when compiling with an Intel compiler and set the OMP_NUM_THREADS environment variable to control the number of threads.
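
As a minimal sketch (the file name omp_prog.c and the job script are hypothetical), an OpenMP program could be compiled with the Intel C compiler as

icc -openmp -O3 -o omp_prog omp_prog.c

and run through a submission script that requests all eight cores of one blade:

#!/bin/bash
#PBS -l nodes=1:ppn=8
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=8   # one thread per requested core
./omp_prog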

Message Passing Interface (MPI)

Compiling

Scripts are provided for compiling MPI programs and linking them with 64-bit MPICH 2 libraries. To use the Intel compilers, use mpif77, mpif90, mpicc or mpicxx according to the language. For the GCC compilers, mpigcc and mpig++ are available.

These scripts add the appropriate flags so that necessary include files and libraries are found. To see the compiler name and the options being used, use a -show flag.  For example:

mpif90 -show
mpig++ -show

Here are some examples using the Intel compilers

mpif90 -g diffuse.f writeppm.f -o diffuse
mpicc -fast pi.c -lm -o pi
mpicxx -fast pi.C -lm -o pi

and the GCC compilers:

mpigcc -O3 pi.c -lm -o pi
mpig++ -O3 pi.C -lm -o pi

Running

If your program allows, compare the results with a single processor to those from a two-processor run. Gradually increase the number of processors to see how performance scales. After you have learned the characteristics of your code, please do not run with more processors than can be efficiently used.
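
One way to organize such a scaling study (a sketch, assuming the pn.pbs script shown below and using the fact, noted under PBS_NCPUS further down, that the procs request can also be given on the qsub command line):

# Submit the same script with increasing processor counts; because the
# script starts mpiexec with -n $PBS_NCPUS, each job automatically uses
# the number of processors it was granted.
for n in 1 2 4 8 16; do
    qsub -l procs=$n pn.pbs
done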

MPI jobs are run by submitting a script containing the mpiexec command to the TORQUE batch job handling system with the qsub command.

Here is an example of a script to run an MPI program, pn, using 6 processors requested with the procs parameter (so they may be allocated anywhere on the cluster). If the script file is named pn.pbs, submit the job with qsub pn.pbs.

#!/bin/bash

#PBS -l procs=6

# Script for running MPI sample program pn on Snowpatch

cd $PBS_O_WORKDIR

echo "Current working directory is `pwd`"

echo "Running on hosts:"
cat $PBS_NODEFILE

echo "Starting run at: `date`"

mpiexec -n $PBS_NCPUS ./pn

echo "Job finished at: `date`"

The form "./pn" is used to ensure that the program can be run even if "." (the current directory) is not in your PATH.

Source code for the pn program itself is pn.f.

Note that the variable PBS_NCPUS is set by the batch system to the number of processors requested with the procs resource parameter on the qsub command line or in directive lines in the script.
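
If processors are requested with the nodes/ppn form instead, the total count can also be obtained from the node file (a sketch, assuming the usual TORQUE behaviour that $PBS_NODEFILE contains one line per allocated core):

NP=$(wc -l < $PBS_NODEFILE)   # total number of allocated cores
mpiexec -n $NP ./pn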

Job Submission Rules

The following rules apply to jobs submitted to the Snowpatch facility:

  • the default walltime is 72 hours for serial jobs and 24 hours for parallel jobs;
  • the maximum walltime is 6 days;
  • the total processor time requested by a user's running and submitted jobs (the product of requested processors and walltime, summed over all jobs) may not exceed 256 processor-days; for example, a single job requesting 32 processors with a 6-day walltime accounts for 192 processor-days. Any additional jobs that exceed this limit will be queued under blocked jobs.

Updated 2009-04-09.