Hungabee QuickStart Guide

About this QuickStart Guide

This QuickStart guide gives a brief overview of the WestGrid Hungabee facility, highlighting some of the features that distinguish it from other WestGrid resources. It is intended to be read by new WestGrid account holders and by current users considering whether to move to the Hungabee system. For more detailed information about the Hungabee hardware and performance characteristics, available software, usage policies and how to log in and run jobs, follow the links given below.

Introduction

Hungabee comprises an SGI UV100 login node and an SGI UV1000 computational node. The UV1000 is a large non-uniform memory access (NUMA) shared-memory multiprocessor with 2048 cores (Intel Xeon E7 family CPUs) and 16 TB of RAM. If you have an application that can take advantage of this architecture, please contact us at accounts@westgrid.ca. For help with system issues, please contact support@westgrid.ca.

[Photo: Hungabee during installation, December 2011.]

Request for access

Unlike most WestGrid systems, Hungabee requires a separate request to obtain an account. If you think the software you would like to run is appropriate for Hungabee, please write to accounts@westgrid.ca with a subject line of the form "Hungabee account request (your_username)", requesting an account and mentioning the software you propose to use.

Hardware

See http://www.sgi.com/products/servers/uv/specs.html if you need detailed technical information about the hardware.

Interconnect

Hungabee is not a cluster and is therefore not interconnected in the same sense as a cluster. Instead, the UV1000's NUMA architecture is implemented using a combination of Intel's QuickPath technology and SGI's NUMAlink. The result is that any core can access any of the installed memory. However, the speed of access is extremely sensitive to the relative location of processor and memory: the closer they are to one another, the better.

Storage

Scratch space on the UV1000 is provided by two SGI IS5000 storage arrays. Each array contains 80 x 2.5 in. 600 GB SAS drives spinning at 10,000 RPM. The arrays are directly attached to the UV1000 via 8 Fibre Channel connections, for a theoretical maximum bidirectional throughput of 6.4 GB/s. After RAID and volume configuration, they provide a single high-performance 50 TB file system. This file system is local to the UV1000 and is currently NFS-exported to the UV100.

A Lustre file system is attached to both the UV100 and the UV1000 (and to the Jasper cluster). It is housed in an SGI IS16000 disk array holding 5 x 50-bay drive enclosures containing 250 x 2 TB SATA drives spinning at 7200 RPM; after RAID and volume configuration this provides a single 355 TB file system. This parallel file system is available to users on Hungabee and Jasper through a QDR (40 Gbit/s) InfiniBand interconnect.

The version of Lustre currently running at the University of Alberta site is 1.8, which restricts data transfer to a single InfiniBand connection for any single compute node. This is a potential bottleneck for an extremely large single node like the UV1000. For this reason, we encourage users to run jobs from the scratch space, copying input files from the Lustre file system at the beginning of the job and copying output files back at the end. At this time the Lustre file system is more appropriate for medium-term storage for Hungabee users.
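A job script can implement this pattern with a stage-in step at the start and a stage-out step at the end. The fragment below is a minimal sketch; the directory and file names under /lustre are illustrative only, and UV_SCRATCH is the scratch directory variable described under Working directory below:

# Stage input data from Lustre to the local scratch file system (paths are illustrative)
cd $UV_SCRATCH
cp /lustre/$USER/myproject/input.dat .

# Run the program against the local copy
./prog input.dat > output.dat

# Stage results back to Lustre for medium-term storage
cp output.dat /lustre/$USER/myproject/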

Software

See the main WestGrid software page for tables showing the installed application software on Hungabee and other WestGrid systems, as well as information about the operating system, compilers, and mathematical and graphical libraries. Please write to WestGrid support if there is additional software that you would like installed.

Using Hungabee

Getting started

Log in to the UV100 by connecting to the host name hungabee.westgrid.ca using an ssh (secure shell) client. For more information about connecting and setting up your environment, see the QuickStart Guide for New Users. In particular, the environment on Hungabee is controlled using modules. Please see Setting up your environment with modules. Modules for the Intel Compilers and SGI's MPI implementation are loaded by default, so you should be able to compile serial, OpenMP, and MPI programs without executing any module commands.
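For example, a first session might look something like the following; your_username is a placeholder, and the exact modules reported by module list depend on the current defaults:

ssh your_username@hungabee.westgrid.ca

# Show the modules loaded by default (Intel compilers and SGI MPT)
module list

# Browse other software available through the module system
module avail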

Home directory

Disk space and file quotas are enforced on Hungabee home directories.  Default limits are as follows:

Disk space:  1.0 TB

File count:  500,000

Quotas can be exceeded by up to 25% for up to 72 hours.

To view your quota information, type:

lfs quota -u <your username> /lustre

If you require more than the default disk space, you should apply for a RAC allocation.  If you require more than the default file count, you should contact support@westgrid.ca.

Working directory

For the best performance, jobs on Hungabee should be submitted from /uv-global/scratch. This is a local disk on the UV1000, and the environment variable UV_SCRATCH points to your personal directory on it. Type

cd $UV_SCRATCH

as soon as you log into Hungabee, or put this step in your login script.

Batch job policies

As on other WestGrid systems, batch jobs are handled by a combination of TORQUE and Moab software. For more information about submitting jobs, see Running Jobs.
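In outline, the workflow is to save a job script, submit it with qsub, and then monitor it with the TORQUE and Moab status commands. The file name job.pbs and the username below are placeholders:

# Submit a job script to the batch system
qsub job.pbs

# List your jobs and their states (TORQUE)
qstat -u your_username

# Show your queued and running jobs as Moab sees them
showq -u your_username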

Although Hungabee is not a cluster, the batch job system treats it as one. This is a limitation of TORQUE and Moab, not of the hardware. From the point of view of the batch system, Hungabee is made up of 256 8-core (virtual) nodes, each with 64 GB of memory. This implies that you cannot request more than 64 GB of memory per core (pmem). The maximum value is actually 65520 MB, which is slightly less than 64 GB (65536 MB).

If you require more memory per core, request more cores. You do not have to use the extra cores. For example, to run a single-core job that requires 4 TB of memory, you would have to request at least 512 cores.
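In terms of the resource request, the total memory available to a job is roughly procs x pmem, so the 4 TB example above corresponds to a request along these lines (the values are illustrative):

#PBS -l procs=512
#PBS -l pmem=8190mb
## roughly 512 x 8 GB = 4 TB of total memory, even though only one process does the work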

 

Resource                                               Policy or limit
Maximum walltime (hours)                               72
Minimum number of cores (procs)                        32*
Maximum memory resource request per core, pmem (MB)    65520
Maximum number of running jobs for a single user       unlimited
Maximum cores (sum for all jobs) for a single user     unlimited
Maximum jobs in Idle queue                             5

 

* Jobs with procs less than or equal to 16 will run on the UV100.

 

Note: Hungabee has a 'special' queue for users whose job requirements do not fit within the default queue policies. Please contact support@westgrid.ca to discuss access to this queue.

Interactive jobs

Except for compiling programs and small tests, interactive use of Hungabee should be through the '-I' option to qsub.
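For example, an interactive session on the compute node might be requested along the following lines; the resource values shown are illustrative only:

qsub -I -l procs=32,pmem=8190mb,walltime=3:00:00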

Compiling and running programs

The latest Intel compilers are available on Hungabee. The compiler commands are icc (C compiler), icpc (C++ compiler), and ifort (Fortran compiler). In the sections below, basic use of the compilers is shown for OpenMP and MPI-based parallel programs. Additional compiler options for optimization or debugging are often appropriate.

OpenMP programs

The Intel compilers include support for shared-memory parallel programs that include parallel directives from the OpenMP standard. Use the -openmp compiler option to enable this support, for example:

icc -o prog -openmp prog.c
ifort -o prog -openmp prog.f90

Before running an OpenMP program, set the OMP_NUM_THREADS environment variable to the desired number of threads using bash-shell syntax:

export OMP_NUM_THREADS=8

or C-shell (tcsh) syntax:

setenv OMP_NUM_THREADS 8

according to the shell you are using. Then, to test your program interactively, launch it like you would any other:

./prog

Here is a sample TORQUE job script for running an OpenMP-based program. Note the use of the omplace command to launch the program. This ensures that successive threads are pinned to unique cores for optimal performance.

#!/bin/bash
#PBS -S /bin/bash

## procs should be a multiple of 16
## pmem (per-core memory) should be slightly below a multiple of 8 GB (e.g. 8190mb rather than 8192mb)

#PBS -l pmem=8190mb
#PBS -l procs=256
#PBS -l walltime=12:00:00
#PBS -m bea
#PBS -M yourEmail@address

cd $PBS_O_WORKDIR

export OMP_NUM_THREADS=$PBS_NP

omplace -nt $OMP_NUM_THREADS ./prog

MPI programs

MPI programs can be compiled using the compiler wrapper scripts mpicc, mpicxx, and mpif90. These scripts invoke the Intel compilers and link against the SGI MPT (Message Passing Toolkit) library. Use the mpirun command to launch an MPI program, for example:

mpicc -o prog prog.c
mpirun -np 8 ./prog

After your program is compiled and tested, you can submit large-scale production runs to the batch job system. Here are some sample TORQUE batch job scripts for MPI-based programs.

#!/bin/bash
#PBS -S /bin/bash

## procs should be a multiple of 16
## pmem (per-core memory) should be slightly below a multiple of 8 GB (e.g. 8190mb rather than 8192mb)

#PBS -l pmem=8190mb
#PBS -l procs=256
#PBS -l walltime=12:00:00
#PBS -m bea
#PBS -M yourEmail@address

cd $PBS_O_WORKDIR

mpirun -np $PBS_NP ./prog > out

This second example is for an MPI job that needs more than 64 GB of memory per MPI process. Twice as many cores as MPI processes are requested, and the dplace command is used to distribute the MPI tasks evenly among the allocated cores and memory nodes. The core list 0-31:2 places the 16 processes on every second core, so each process has roughly two cores' worth of the memory allocation (about 128 GB) available.

#!/bin/bash
#PBS -S /bin/bash

#PBS -l procs=32
#PBS -l pmem=65520mb
#PBS -l walltime=12:00:00

cd $PBS_O_WORKDIR

mpirun -np 16 dplace -s1 -c 0-31:2 ./prog > out

 


Updated 2012-06-05.