Glacier QuickStart Guide

About this QuickStart Guide

This QuickStart guide provides a brief overview of the WestGrid Glacier facility, indicating its role within WestGrid and highlighting some of the features that distinguish it from other WestGrid resources. It is intended to be read by new WestGrid account holders and by current users considering whether to move to the Glacier system.
For more detailed information about the Glacier hardware and performance characteristics, available software, usage policies and how to log in and run jobs, follow the links given below.

Introduction

NOTE: Glacier is a legacy, 32-bit system and, as such, new WestGrid accounts are NOT created on this machine by default. If you would like an account on this machine, please contact support@westgrid.ca and provide a brief description of your project along with some justification for wanting to use this system. If you are unsure about which WestGrid system to use, please read the following: Choosing Which System To Use.

Glacier is an IBM eServer cluster, originally with 840 nodes, connected via a Gigabit Ethernet (GigE) network. The system is most suitable for serial processing and for parallel jobs that do not require a fast interconnect fabric and can run on a 32-bit architecture.

The address glacier.westgrid.ca is an alias for 3 head nodes:

nunatak1.westgrid.ca
nunatak2.westgrid.ca
nunatak3.westgrid.ca

(Nunatak is an Inuktitut word meaning "lonely peak", a rock or mountain rising above ice.)

Hardware

Processors

The Glacier cluster originally comprised 840 computational nodes, each with two 3.06 GHz Intel Xeon 32-bit processors. Over time, some nodes have been retired, so, as of this writing (2012-11-09), the number of active nodes is down to approximately 700. The nodes are assigned names of the form ice{i}_{j}, where i = [1-60] is a chassis number and j = [1-14] denotes a blade inside the chassis. Nodes in Chassis 1-54 have 2 GB of RAM, and nodes in Chassis 55-60 have 4 GB.

Interconnect

The interconnect between blades within a chassis is Gigabit Ethernet (GigE). The chassis are connected through four GigE uplinks.

Storage

Storage space is provided through IBM's General Parallel File System (GPFS) - a high-performance shared-disk file system that can provide fast data access from all nodes. A Storage Area Network (SAN) with almost 14 TB of disk space connected directly to 8 storage nodes (moraine1,...,moraine8) is used to fulfill I/O requests from all nodes.

There are two general-access file systems available on Glacier with different characteristics and purposes, as summarized here:

/global/home

  • /global/home/username is your home directory (assigned to the HOME environment variable).
  • Disk space is limited, so please use this file system to store only your essential data (source code, processed results if "small" in size, etc.).
  • Although the quota command itself may not report this, there is a storage limit of 50 GB in your home directory (see below for a simple way to check your usage).
  • If your code creates large data sets, do not use this file system as a starting directory for your jobs. Please use /global/scratch instead.
  • We back up the /global/home file system every 36 hours, with a 14-day expiration policy.
  • Size: 15 TB.
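Since the quota command may not report the limit, one simple way to check how much space your home directory is using is the standard du utility:

du -sh $HOME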

/global/scratch

  • File system designed for fast-changing "large" data sets and as a work area.
  • Please create a subdirectory of your choice (cd /global/scratch ; mkdir username) and use it as a starting directory for your jobs (see the example following this list).
  • Although the quota command itself may not report this, there is a storage limit of 100 GB for your data in /global/scratch.
  • Note: We do not backup this file system.
  • Size: 8.6 TB.
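For example, assuming username and myjob.pbs are placeholders for your own login name and job script, you would set up your scratch directory once and then submit jobs from it:

cd /global/scratch
mkdir username
cd username
qsub myjob.pbs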

In addition to the above file systems, each compute node has an approximately 35 GB local partition for temporary files associated with running jobs. On the compute nodes, you can access this temporary storage area as either /scratch or /tmp - both directory references point to the same space. For jobs using many small files (a few MB each, say), use this local scratch storage (/scratch or /tmp) instead of /global/scratch, as the latter is optimized for large files.
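As a sketch of this approach (the program and file names below are placeholders, not Glacier-specific paths), a batch job could stage its small input files to the node-local disk, run there, and copy its results back to the global file system before exiting. PBS_JOBID and PBS_O_WORKDIR are standard TORQUE environment variables holding the job identifier and the submission directory, respectively.

#!/bin/bash
#PBS -l walltime=02:00:00
# Create a job-specific work area on the node-local disk.
WORKDIR=/tmp/$PBS_JOBID
mkdir -p $WORKDIR
# Copy the (small) input files from the submission directory.
cp $PBS_O_WORKDIR/input.dat $WORKDIR/
cd $WORKDIR
./myprogram input.dat > results.dat
# Copy the results back to the global file system and clean up.
cp results.dat $PBS_O_WORKDIR/
rm -rf $WORKDIR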

Software

See the main WestGrid software page for tables showing the installed software on Glacier and other WestGrid systems, including information about the operating system and compilers.

Using Glacier

To log in to Glacier, connect to glacier.westgrid.ca using an ssh (secure shell) client. For more information about connecting and setting up your environment, see Setting up Your Computer.
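For example, from a terminal on a Linux or Mac OS X machine (replace username with your WestGrid user name):

ssh username@glacier.westgrid.ca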

For examples of compiling serial and parallel programs on Glacier, along with sample batch job scripts, see the Glacier programming page.

As on other WestGrid systems, batch jobs are handled by a combination of TORQUE and Moab software. For more information about submitting jobs, see Running Jobs. Please note that the maximum walltime limit per job on Glacier is 240 hours.

The maximum number of jobs that a user may have queued to run is 1000.
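As a minimal illustration (myprogram, serial.pbs and username are placeholders; see the Glacier programming page for complete examples), a serial batch script might contain:

#!/bin/bash
#PBS -l walltime=24:00:00
# Run from the directory the job was submitted from.
cd $PBS_O_WORKDIR
./myprogram

and would be submitted and monitored with:

qsub serial.pbs
qstat -u username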

To facilitate testing and debugging, two Glacier nodes are reserved for short jobs (less than 10 minutes). To request these nodes, add the debug Quality of Service (QOS) resource request to your job script:

#PBS -l qos=debug,walltime=00:10:00

To improve the startup time of parallel jobs, use the parallel QOS request:

#PBS -l qos=parallel

Please note the following details regarding the QOS requests:

- debug QOS: Maximum 4 CPUs; maximum walltime 10 minutes; uses nodes ice1_1 and ice1_2 (see the associated memory limits below).

Note that Glacier nodes have only 2 CPUs, so if you need to debug with the maximum of 4 CPUs, add nodes=2:ppn=2 to the -l resource list (see the example following this list).

- parallel QOS: Minimum 4 CPUs; maximum walltime 240 hours; uses all nodes except those reserved for QOS:debug.

- normal QOS: Maximum walltime 240 hours; uses all nodes except those reserved for QOS:debug.
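Putting these together, a debug job requesting the maximum of 4 CPUs on the two reserved nodes could combine the directives on a single resource list (a sketch; the walltime must not exceed 10 minutes):

#PBS -l qos=debug,nodes=2:ppn=2,walltime=00:10:00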

Memory specification

A default memory limit of 768 MB is assigned to each job. To override this value, use the mem resource request on the qsub command line or in the batch job script. For example:

#PBS -l mem=1024mb
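Equivalently, the same request can be made on the qsub command line (myjob.pbs is a placeholder script name):

qsub -l mem=1024mb myjob.pbs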

The maximum value of the mem parameter for a single processor is:

  • 2007mb for the 756 nodes (90% of the cluster) in racks 1-9 (ice1_1,...,ice54_14);
  • 4005mb for the 84 nodes (10% of the cluster) in rack 10 (ice55_1,...,ice60_14).

The mem parameter is the total memory limit for a job. For a parallel job, the pmem parameter can be used to specify a per-process memory requirement. For example:

#PBS -l nodes=10,mem=20gb,pmem=2gb

means that the submitted job needs 10 processors and 20gb of memory in total, with 2gb of RAM per process. Since 2gb (2048 MB) is greater than 2007mb, this job can only be executed on nodes ice55_1,...,ice60_14. One might expect such a job to wait in the input queue longer than a job that could run on one of the smaller-memory nodes.


Updated 2012-11-09.