QuickStart Guide for New Users

About this QuickStart Guide

This QuickStart guide is intended to help new WestGrid users find basic information needed to start using WestGrid. It consists primarily of links to other pages on the WestGrid web site. If you do not have a WestGrid account yet, please read the page for prospective users.

Getting Started

After an application for a WestGrid account has been approved, an email is sent to the new user to direct him or her to some of the key pages on the WestGrid web site. This guide gives a more extensive list. We recommend that you go through all of the topics on this page, exploring the links to more detailed information on those particular subjects that are most relevant to you. It is useful to try things as you go along, asking questions at support@westgrid.ca if you encounter difficulties.

Choosing Which System to Use

After receiving confirmation that your WestGrid account has been approved, or even before you have applied for an account if there is some doubt as to the suitability of WestGrid for your computing, we suggest you write to support@westgrid.ca for advice on which WestGrid system to use.

There are a number of factors to consider when choosing a system, the most important typically being whether or not your job runs in parallel and the amount of memory required per process. If the job can be run in parallel, an important criterion to use in determining where to run it is whether it can make use of multiple cluster nodes (such as when using MPI) or has to be run on a shared-memory machine (such as when OpenMP is used). Some general guidelines are:

  • Small-memory serial jobs or undemanding parallel jobs can be run on the Glacier cluster, although this is an aging system and accounts on Glacier are not automatically set up. Some of the newer clusters, such as Checkers, may be used for serial jobs, depending on the memory requirements (< 16 GB for Checkers, < 24 GB for Hermes, < 48 GB for Grex).
  • OpenMP-based parallel jobs and large-memory (> 48 GB) serial jobs should be run on one of the shared-memory systems, Breezy or Hungabee.
  • For MPI-based parallel programs requiring a high-performance interconnect, try the Bugaboo, Checkers, Grex, Jasper, Lattice, Nestor, Orcinus or Parallel clusters. The memory per process and storage requirements may influence our advice on which system to use. For example, Lattice has the least physical memory per core (1.5 GB/core), whereas Grex has the most (4 GB/core). Bugaboo and Nestor have large associated storage facilities.
  • Jobs that require access to graphics/visualization hardware and software can be run on the visualization nodes of the Parallel cluster.
  • A commercial license for the Gaussian chemistry software is only available on the Grex cluster. In other cases, availability of certain software libraries may dictate the system to use, but it may be possible to work around such issues by installing additional software or substituting one library for another. See the main WestGrid software page for tables showing the installed software on WestGrid systems, including information about the operating system and compilers.

We hope to develop a page that will expand on these ideas for determining which system to use. In the meantime, see the Computing Facilities page or other QuickStart guides for links to details about the computing facilities available through WestGrid.

Setting up Your Computer

To connect to and work with WestGrid systems you may have to install one or more software packages on your own computer. Although web browser-based tools may become available for accessing WestGrid in the future, especially as grid services are developed, most users will continue to log in and work directly on remote systems for some time to come.

Terminal client supporting ssh

The most important piece of software you will need is a terminal (client) program that supports the secure shell (SSH) protocol for network communications to remote servers. Linux and Mac OS X users can typically use the built-in terminal programs, whereas Microsoft Windows users often install an additional SSH client, such as PuTTY. PuTTY can be obtained from http://www.chiark.greenend.org.uk/~sgtatham/putty/ . There is an extensive list of SSH clients at http://en.wikipedia.org/wiki/Comparison_of_SSH_clients .

File transfer client supporting scp and sftp

You will also need software that supports secure transfer of files between your computer and the WestGrid machines. The command-line programs scp and sftp can be used from within terminal programs on Linux or Mac OS X computers. On Microsoft Windows platforms, similar programs, pscp and psftp, come with PuTTY. WinSCP is a free file transfer program for Microsoft Windows that offers a graphical interface.

X Window display server for graphics

To use graphical programs on WestGrid computers and show the results on your monitor, you will need to run an X Window display server (X server) program on your local computer. You start up such a program and leave it running in the background while using your ssh terminal program. When graphics commands are relayed by your ssh client from the remote WestGrid computer to the X Window display server, it will display the appropriate graphics on your screen. Your keyboard and mouse commands can be relayed in the other direction and passed from your ssh client to the graphics program running on the remote system. The process is called X11 tunnelling or forwarding. For this to work, you should look for an option in the settings or preferences of your ssh client program to turn on X11 tunnelling.

Commercial X Window display servers are available, but most users can get by with free programs. Linux users will find the X Window support already installed with most distributions. Modern versions of Mac OS X ship with a program called X11, which is not installed by default but is on the system disks. One option for Microsoft Windows users is to install Xming. If installing Xming, you should also install the optional font package. If your graphics hardware does not work well with Xming, you could try Xming-mesa, from the same site.

Connecting and Logging In

To successfully connect to WestGrid systems, your computer's IP address must be correctly registered in the Domain Name System (DNS). To test whether your IP address is suitable, visit http://westgrid.ca/iptest .

To connect to a WestGrid system, start your ssh client and specify the host name of the chosen system and your user name in the connection dialogue box or on the ssh command line, depending on what type of ssh program you are using. Each WestGrid machine to which you can connect has an Internet address of the form machine_name.westgrid.ca. So, for example, to connect to Orcinus from a command-line ssh program, you could type:

ssh your_username@orcinus.westgrid.ca

If your user name on your local system is the same as on WestGrid, you may omit it and simply type:

ssh orcinus.westgrid.ca

To start a session with X11 forwarding turned on, one can typically use

ssh -X orcinus.westgrid.ca

although from Mac OS X systems, you may have to use

ssh -Y orcinus.westgrid.ca

If you have successfully connected to one of the WestGrid login servers, you will be prompted for a user name and password. The user name is not your full name, nor your email address, but the 2- to 8-character name that was entered in the "Requested Username" box when you applied for a WestGrid account. The password is the one you specified on the same form. The same password is used for all WestGrid systems. For security, it is stored in an encrypted form. Consequently, if you have forgotten your password, WestGrid administrators will not be able to tell you what it is. Also for security reasons, new passwords are not sent via email. Instead, you choose your own new password and enter it on a web form that is validated using a temporary password given to you by telephone. To request a new password, write to support@westgrid.ca and you will be given instructions on whom to telephone.

Working Interactively

The hardware at most of the WestGrid sites is set up with one or more servers to which users have direct login access, with the main computational clusters being accessed indirectly, by submitting batch job scripts. The batch jobs run non-interactively when the scheduling system is able to find a time slot with the computational resources needed for the job. However, interactive sessions are typically needed to prepare the batch scripts and input files, compile and debug programs, manage data and post-process results. Some guidelines for working interactively are given in this section.

The UNIX environment

Each of the WestGrid computers runs some version of the UNIX (or Linux) operating system. The program that responds to your typed commands and allows you to run other programs is called the UNIX shell. Examples of a UNIX shell are bash and tcsh. It is useful to have some knowledge of the shell and a variety of other command-line programs that you can use to manipulate files. If you are new to UNIX systems, we recommend that you work through one of the many online tutorials that are available, such as the UNIX Tutorial for Beginners provided by the University of Surrey. The tutorial covers such fundamental topics, among others, as creating, renaming and deleting files and directories, how to produce a listing of your files and how to tell how much disk space you are using.

The UNIX man command (man for "manual") can be used to get information about other commands. For example, a reference page about the ls command, for listing file names and properties, can be displayed by typing:

man ls

The default environment varies from one WestGrid system to another and also depends on which UNIX shell you selected on your WestGrid account application form. The working environment is partially determined by the commands in one or more startup files that are automatically executed every time you log in. For bash shell users, these files may include .bashrc and .bash_profile. For tcsh users, .login and .cshrc are executed. You can customize your environment by editing these files to change such things as the appearance of the shell prompt (the characters that appear at the start of the line when the shell is waiting for you to type a command) and the command path (a list of directories in which the shell will search for commands). Use caution when modifying these files, as inappropriate changes may prevent you from being able to work on the system.

Please note that binary executable files from Microsoft Windows PCs will not run on the WestGrid systems. In order to work with such programs, you must obtain the source code and recompile for use on UNIX or Linux. Not all programs will have such source code available.

File systems

As on other computer systems, in a UNIX environment there is a file system that provides a hierarchy of directories (called folders on some other systems) for storing files. When you log in, you are working in part of the file system called your home directory. You may create files and subdirectories in your home directory, although on some WestGrid systems there is a quota limiting the amount of space you can use. How you organize your files is up to you, but it might be helpful to create a separate subdirectory for each job that you submit and to have a separate directory for program source code.
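
As a minimal sketch of the layout suggested above, the following commands create one directory for source code and one subdirectory per job, then report the space they occupy. The directory names (project_demo, src, run01, run02) are purely illustrative and are not WestGrid conventions; the example works under a temporary directory so that it does not touch your real files.

```shell
# Hypothetical layout: one directory for source code, one per job.
top="${TMPDIR:-/tmp}/project_demo.$$"
mkdir -p "$top/src" "$top/jobs/run01" "$top/jobs/run02"

# List the job directories that were created, one per line.
ls -1 "$top/jobs"

# Summarize the disk space (in kilobytes) used under the top-level directory.
du -sk "$top"
```

On a system with a quota, running du on your home directory in the same way gives a rough idea of how much of the quota is in use.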

When naming files and directories, you will find it easier to navigate the file hierarchy and to reference files in UNIX commands if you do not use spaces in file names. Also, keep in mind that UNIX is case sensitive in most situations, so, for example, Nobel_Prize.exe and nobel_prize.exe refer to different files. Another difference between UNIX and Microsoft Windows environments is that a file suffix, if present, is of no particular significance to the basic UNIX file manipulation commands. So, for example, there is no requirement for executable programs to have an ".exe" suffix.

Besides your home directory, on most of the WestGrid systems there are additional places (/tmp, /scratch and /global/scratch among others) where you can store files and from which you can run programs. Some file systems have more space than others. Sometimes there are performance reasons for choosing one location vs. another. There may also be different usage policies (how long you can keep files and how big they can be) for the various file systems. See the QuickStart guide for the particular system you are using for more information or write to support@westgrid.ca for advice.

Transferring files

When just starting out on WestGrid systems, you will likely have source code or data to be transferred from your own computer or one at your own institution. Similar to the requirement for a terminal program supporting SSH (Secure Shell), WestGrid requires that you use file transfer software that supports SCP (Secure Copy) or SFTP (SSH File Transfer Protocol). Most ssh packages come with additional programs to support these secure file transfer methods.

Once you have files on a WestGrid system, you may move them between directories using the UNIX mv command, or to other WestGrid sites using scp or sftp. We also provide a utility called gcp (grid copy) that efficiently transfers files between WestGrid systems.
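
For command-line users, the general shape of scp commands is sketched below. The user name, the file names (input.dat, results.out, project_src) and the choice of Orcinus are placeholders, and the example only prints the commands rather than running them, since they need a real account and network access.

```shell
# Placeholders: substitute your own user name, system and file names.
user="your_username"
host="orcinus.westgrid.ca"

# Copy a local file to your home directory on the remote system.
upload="scp input.dat ${user}@${host}:"

# Copy a remote results file back to the current local directory.
download="scp ${user}@${host}:results.out ."

# Copy a whole local directory, including its contents.
upload_dir="scp -r project_src ${user}@${host}:"

# Print the commands; run them yourself once the placeholders are replaced.
printf '%s\n' "$upload" "$download" "$upload_dir"
```

The colon after the host name is significant: text after it is taken as a path on the remote system, relative to your home directory.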

For long term storage of large files, consider using the WestGrid storage facility.

One thing to be aware of when transferring files is that there are different conventions for the characters that terminate each line in a text file on UNIX/Linux, Microsoft Windows and Macintosh computers. File transfer software typically has a transfer mode in which line-ending conversion is done automatically. For example, in Microsoft Windows-based programs, files that have a .txt suffix would be treated as text files for which conversion would likely be done, but C or Fortran source code files having names ending in .c or .f, respectively, might not be recognized as text. You may have to configure your file transfer software to correctly handle files that you commonly use.
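
If a text file arrives with Microsoft Windows line endings (carriage return plus line feed), one way to repair it after the transfer is to strip the carriage returns with the standard tr command. The sketch below builds a small sample file to demonstrate this; the file names are made up for the illustration.

```shell
# Create a small sample file with Windows-style (CRLF) line endings.
work="${TMPDIR:-/tmp}/crlf_demo.$$"
mkdir -p "$work"
printf 'program main\r\nend program main\r\n' > "$work/main_dos.f"

# Strip the carriage returns to get UNIX-style (LF) line endings.
tr -d '\r' < "$work/main_dos.f" > "$work/main_unix.f"

# Count the lines that still contain a carriage return in each file.
cr=$(printf '\r')
dos_count=$(grep -c "$cr" "$work/main_dos.f")
unix_count=$(grep -c "$cr" "$work/main_unix.f" || true)
echo "CR lines before: $dos_count, after: $unix_count"
```

The conversion writes to a new file rather than overwriting the original, so the transferred copy remains available if something goes wrong.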

Editing files

One choice for creating and editing files, to prepare batch scripts or input for your programs, for example, is to transfer files to your own computer to use a local editor with which you are familiar. However, a better choice for most users is to edit the files directly on the WestGrid system on which they will be used. There are several editors available for you to use, as shown on the software page. Two editors commonly used on UNIX systems are emacs and vi. However, if you are coming from a Microsoft Windows background and have set up your computer with X Window software, as described above, you may prefer the nedit editor. This is a graphical editor, with keyboard shortcuts similar to what would be found on PCs. See the next section for comments about running nedit and other interactive programs.

There are also a number of UNIX commands available for looking at the contents of files. For example, to page through an output file, test.pbs.o31416, the more command can be used:

more test.pbs.o31416

Running interactive programs

To run a program on a UNIX system, type the name of the corresponding executable file on the command line at the shell prompt. The UNIX shell searches for the command only in the directories in a list stored in a variable, PATH. You can see this list by typing:

echo $PATH

If you get a "command not found" error, check for a spelling mistake or a letter typed in the wrong case, or confirm that the directory containing the executable file is in your command path. On some WestGrid systems, the current working directory is not part of the default command path. In such a case, you can either change the PATH or type "./" in front of the command, as in:

./my_command
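
The other option, changing the PATH, can be sketched as follows. The directory and the stand-in program my_command are created here only so that the example is self-contained; in practice you would append the directory that holds your own executables.

```shell
# Hypothetical directory holding your own executables.
mybin="${TMPDIR:-/tmp}/path_demo.$$/bin"
mkdir -p "$mybin"

# A tiny stand-in program so there is something to find.
printf '#!/bin/sh\necho "hello from my_command"\n' > "$mybin/my_command"
chmod +x "$mybin/my_command"

# Append the directory to the command path for this shell session.
PATH="$PATH:$mybin"
export PATH

# The shell can now locate the command without a leading "./".
found=$(command -v my_command)
echo "$found"
my_command
```

A change made this way lasts only for the current session. To make it permanent, bash users could place the PATH lines in one of the startup files described earlier, keeping in mind the caution about editing those files.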

Many programs (including UNIX commands) take additional arguments, such as numerical parameters or file names, which are listed on the command line after the name of the executable program. Often the command-line arguments are preceded by a dash. For example, to list the last 40 lines of the file geometry.in, you could use the UNIX tail command:

tail -40 geometry.in

Restrictions on interactive jobs

Since the servers to which you log in are shared by many users, interactive work on those machines should be limited to activities such as editing files, compiling programs or running small, short tests of your program. The memory and number of processors vary among the login servers, so the exact policy on the length and size allowed for test runs varies from machine to machine. On some systems there are special queues with short time limits that are intended for batch jobs for testing and debugging. It is also possible to submit a placeholder batch job to reserve one or more dedicated processors, which may then be used for interactive work without interfering with other users' jobs. See Working Interactively for more information.

Software

Locating installed software

Installed software on WestGrid systems includes the UNIX or Linux operating system and a number of standard utilities that often come with such systems. A number of major commercial and free software packages are also available, as well as compilers and a variety of numerical, graphics and file-manipulation libraries for researchers compiling their own codes. Refer to the main WestGrid software page for details on which packages have been installed on each of the main computational systems. The installation directories have not been standardized, so, please refer to the table at the top of the software page for a list of the directories where software is typically installed on each system.

Installing your own software

You are welcome to install software under your home directory (if the software license allows the software to be used on remote machines that are not under your direct control and which may not be at your home institution). If you need to share a software package with other members of your group, a corresponding UNIX group can be created to control access to the software. Write to support@westgrid.ca for details on how to do this.

See the programming section below for information on compiling your code.

Requesting software installation

If a package was installed for testing or at the request of a limited number of researchers, it may not be listed on the software page. So, if there is a package that you need, there is a chance that it has already been installed but not announced. In any case, please write to support@westgrid.ca to ask whether a given software package is available or can be installed. Normally, commercial software can be requested through an online form, but the software on which that form was based was broken during an "upgrade" in August 2010. Until that can be fixed, please direct software requests to support@westgrid.ca .

Software licensing

Although WestGrid has purchased some commercial software, such as the Gaussian chemistry code, there are other packages, such as ABAQUS and MATLAB, that are run on WestGrid systems using licenses provided by WestGrid institutions rather than WestGrid itself. There are often limitations on such licenses, in terms of where the software may be run and how many simultaneous copies may be used.

Programming

A general introduction to programming on WestGrid systems is given elsewhere. That page includes links to such things as parallel programming tutorials and to a series of pages giving examples of using the main compilers on all the WestGrid systems. Tables on the software page list all the compilers and the numerical (and other) libraries available to link with your code.

If you have used non-standard language features in your code you may need to make some changes in order to get it to run on WestGrid systems. Trying your code with more than one compiler is recommended, as this helps identify non-portable sections of your code that should be improved. Write to support@westgrid.ca if you would like help in porting, debugging or optimizing your code.

Sometimes researchers have chosen to use WestGrid because they want to increase the size of the problem being studied. Running the code on larger data sets can sometimes uncover performance issues or memory access problems. If the code was previously run only on a 32-bit system, moving to a 64-bit environment may require changes if inappropriate assumptions were made regarding the size of some data types, for example.

Another issue that arises when tackling larger problems is the length of time required for the calculation. Some WestGrid systems have job time limits as short as one day. It is recommended that you design your program to include a checkpoint and restart capability. That is, you should periodically write out enough data so that your program can be restarted, if necessary, by reading in that data. That way you can avoid losing the entire calculation if the program doesn't finish before the job time limit is reached.
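
The checkpoint and restart idea can be illustrated with a toy shell loop. This is only a sketch of the pattern: the checkpoint here holds a single step number in a made-up file, whereas a real program would save whatever state it needs (arrays, iteration counters, random number seeds) in its own format.

```shell
# Hypothetical checkpoint file; a real program would save its full state.
ckpt="${TMPDIR:-/tmp}/checkpoint_demo.$$.txt"
total_steps=10

run_steps() {
    # Resume from the last saved step if a checkpoint exists.
    start=1
    if [ -f "$ckpt" ]; then
        start=$(( $(cat "$ckpt") + 1 ))
    fi
    stop=$1
    step=$start
    while [ "$step" -le "$stop" ]; do
        # ... one unit of real work would go here ...
        echo "$step" > "$ckpt"      # record the last completed step
        step=$(( step + 1 ))
    done
}

run_steps 4               # first "job": stops early, as if the time limit hit
first_stop=$(cat "$ckpt")
run_steps "$total_steps"  # second "job": resumes after the saved step
last_step=$(cat "$ckpt")
echo "first run ended at step $first_stop, restart ended at step $last_step"
```

The key design point is that the checkpoint is written only after a unit of work has completed, so a restart never skips work that was interrupted partway through.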

Running Batch Jobs

The batch environment

As mentioned above, the main WestGrid computational clusters are accessed by submitting batch job scripts from a login server. It is usually not necessary (and in some cases not allowed) to log on to the compute nodes directly. The system software that handles your batch job consists of two pieces: a resource manager (TORQUE) and a scheduler (Moab). Documentation for these packages is available through Cluster Resources. However, typical users will not need to study those details.

Batch job scripts

Batch job scripts are UNIX shell scripts (basically text files of commands for the UNIX shell to interpret, similar to what you could execute by typing directly at a keyboard) containing special comment lines that contain TORQUE directives. TORQUE evolved from software called PBS (Portable Batch System). Consequences of that history are that the TORQUE directive lines begin with #PBS, some environment variables contain "PBS" (such as $PBS_O_WORKDIR in the script below) and the script files themselves typically have a .pbs suffix (although that is not required).

There are small but significant differences in the batch job scripts, particularly for parallel jobs, among the various WestGrid systems. Examples for each system, for both serial and parallel jobs, are given in the programming documentation. So, if you begin working on one WestGrid system and switch to another, refer to the documentation before submitting jobs on the second system.

Here is an example job script, diffuse.pbs, for a serial job on the Glacier cluster, to run a program named diffuse.

#!/bin/bash
#PBS -S /bin/bash

# Script for running serial program, diffuse, on glacier

cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"

echo "Starting run at: `date`"
./diffuse
echo "Job finished with exit code $? at: `date`"

Commands for submitting, monitoring and deleting jobs

To submit the script, diffuse.pbs, to the batch job handling system, use the qsub command:

qsub diffuse.pbs

If a job is expected to take longer than the default time limit (typically three hours) or uses more than the default memory, additional arguments may be added to the qsub command line. If diffuse is a parallel program, you also have to specify the number of nodes on which it is to run. For example:

qsub -l walltime=72:00:00,mem=1500mb,nodes=4 diffuse.pbs

Please see the Running Jobs pages or QuickStart guides for the individual WestGrid systems for more information about the walltime, memory and node limits for specific machines.
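
Resource requests can also be written into the job script itself as #PBS -l directive lines, instead of being typed on the qsub command line each time. The sketch below writes such a script for the serial diffuse example to a temporary file and lists the directives it contains. The walltime and memory values are simply those from the example above, not recommendations, and the file name is made up for the illustration.

```shell
# Write a job script with the resource requests embedded as directives.
jobfile="${TMPDIR:-/tmp}/diffuse_demo.$$.pbs"
cat > "$jobfile" <<'EOF'
#!/bin/bash
#PBS -S /bin/bash
#PBS -l walltime=72:00:00
#PBS -l mem=1500mb

cd $PBS_O_WORKDIR
./diffuse
EOF

# Show the directive lines that qsub would read from the script.
directives=$(grep -c '^#PBS' "$jobfile")
grep '^#PBS' "$jobfile"

# To submit on a system where qsub is available:  qsub "$jobfile"
```

Keeping the requests in the script records exactly what each job asked for, which can be helpful when you return to a calculation later.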

When qsub processes the job, it assigns it a job ID and places the job in a queue to await execution. To check on the status of all the jobs on the system, type:

showq

To limit the listing to show just the jobs associated with your user name, type:

showq -u username

To delete a job, use the qdel command with the jobid assigned from qsub:

qdel jobid

On some WestGrid systems it is difficult to directly monitor some aspects of a job's progress, so, it is a good idea to make sure that your program periodically writes output to a file. You can then check the contents of that file to see how the program is doing. In other cases, such as when you need to confirm how much memory your job is using, you may have to write to support@westgrid.ca to request that an administrator check on the job for you.

Post-Processing

After having completed some calculations on the WestGrid machines, most researchers will need to post-process some output files.

Managing files

In some cases, after a preliminary examination of the output, there may be a way to reduce the volume of data by extracting key numbers and then discarding some of the output. The UNIX grep utility may be helpful in simple cases. A more elaborate process using shell scripts or other programs may be needed. Once the data have been consolidated, files should be backed up, either by transferring them to your own computer or by using the WestGrid storage facility, as mentioned in the section on transferring files, above. If you have a large number of small files, you should consider combining and compressing them with the tar and gzip programs.
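
As a small sketch of both steps, the commands below pull one key line out of each of several output files with grep and then bundle the files into a single compressed archive with tar (using its gzip option). The output files and the "Total energy" lines are made up for the illustration; your own programs will write something different.

```shell
work="${TMPDIR:-/tmp}/postproc_demo.$$"
mkdir -p "$work/results"

# Made-up output files standing in for real program output.
printf 'step 1\nTotal energy = -1.50\n' > "$work/results/run01.out"
printf 'step 1\nTotal energy = -1.75\n' > "$work/results/run02.out"

# Extract the key lines from all output files into one summary file.
grep 'Total energy' "$work/results"/*.out > "$work/summary.txt"
summary_lines=$(wc -l < "$work/summary.txt" | tr -d ' ')

# Combine the small files into one gzip-compressed archive.
tar -czf "$work/results.tar.gz" -C "$work" results

# List the archive contents to confirm what was stored.
archived=$(tar -tzf "$work/results.tar.gz" | grep -c '\.out$')
echo "summary lines: $summary_lines, archived output files: $archived"
```

Listing the archive before deleting any originals is a cheap check that everything you meant to keep is actually in it.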

Visualization

For most types of calculation, graphical display of the output can be useful for identifying bugs in programs, interpreting the data and summarizing the results for others. WestGrid has hardware and software at one site specially geared toward remote visualization; however, it is possible to use visualization tools on any of the WestGrid systems. Graphical data analysis needs tend to be quite specific, so you are encouraged to discuss your particular project with WestGrid support analysts. In some cases it may be feasible to produce graphs or images in batch mode. In other cases, where more interactivity is required, we may recommend using the WestGrid visualization server or transferring the data back to your own computer for visualization there.

Usage Guidelines

Job limits

WestGrid comprises a wide range of hardware types, from single-node large shared-memory machines to clusters consisting of many dual-processor small-memory nodes. The maximum time limit allowed, the maximum number of processors that may be requested, the maximum number of jobs that can run simultaneously, etc. have been set by system administrators based on the characteristics of the machines and the role they play in the WestGrid environment. Generally speaking, jobs that request more resources (processors or memory) will have stricter limits than jobs that use less.

How much is reasonable?

In general, you may submit as many jobs as you like as the batch scheduling system will restrict the number that are run at any given time. However, so as not to unnecessarily burden the scheduling system or alarm other users, in most cases you should stage job submission so that you don't have many weeks of work waiting to run. It would be reasonable to submit some tens of jobs, for example, if they last a few days each, or hundreds of jobs if they are only a few hours long. You should plan to monitor your runs regularly.

WestGrid users are also expected to take some responsibility for ensuring that their jobs are running efficiently, through the use of appropriate algorithms and compiler optimization options and linking to optimized libraries when possible. Programs should be tested on small problems before committing to longer runs using more resources. In general, parallel programs run more efficiently on smaller numbers of processors. So, study how the performance of your code depends on the number of processors used and balance the need for quick turnaround of your jobs with overall efficiency (that is, use small numbers of processors unless you have a good reason not to).
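
One simple way to study how performance depends on the number of processors is to time the same test problem on several processor counts and compute the speedup, T(1)/T(p), and the parallel efficiency, speedup/p. The sketch below does the arithmetic with awk. The timings are invented for the illustration and do not describe any WestGrid system or real program.

```shell
# Made-up wall-clock timings (processors, seconds) for illustration only.
timings="1 1000
2 520
4 280
8 170"

# Speedup = T(1)/T(p); efficiency = speedup/p.
report=$(printf '%s\n' "$timings" | awk '
    NR == 1 { t1 = $2 }
    { printf "%d %.2f %.2f\n", $1, t1 / $2, t1 / ($2 * $1) }
')
echo "procs speedup efficiency"
echo "$report"
```

An efficiency that falls well below 1 as processors are added suggests that the extra processors are mostly waiting rather than computing, which is the situation the guideline above is meant to avoid.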

Job priorities and the fairshare policy

Users are often concerned that their jobs may not be progressing in the queue relative to those of other users. There are a number of factors that affect the priority of the jobs waiting to run. The basic mechanism for determining the priority is called fairshare, in which target usage amounts are assigned to each project. When considering which jobs to run, the scheduling software takes into account the past history (typically over a time span of a couple of weeks, with more recent usage weighted more heavily) and compares the amount of processing completed to the target. Priorities of the jobs are raised or lowered so as to try to meet the fairshare targets.

Resource Allocation Committee

In spite of the name, everyone's fair share is not the same. There is a mechanism for requesting enhanced priority if a project's needs for computational or storage resources extend beyond the average. Periodically, applications are solicited for awards from a Resource Allocation Committee (RAC) for this privilege.

Accounting

Project usage statistics are available for viewing by project members by logging on to the Compute Canada Database (CCDB) site.

Getting Help

Beyond this guide, there are a number of resources you can use to get help with WestGrid. These include:

  • WestGrid web site - Explore the WestGrid web site to get information not covered above. See the main WestGrid support page for an overview of technical support and hints about navigating through the site.
  • Support email list - Write to support@westgrid.ca for any kind of support question, such as account or connection problems, advice on system characteristics or which WestGrid system to use, debugging, optimization, parallelization or other programming issues, software enquiries, visualization or data management advice, etc.
  • Training seminars - During the fall and winter, WestGrid offers a series of seminars through video conferencing and, in some cases, by web streaming. Past topics have included an overview of WestGrid facilities, introduction to UNIX, serial and parallel (OpenMP and MPI) programming, submitting jobs and data visualization. See the WestGrid training page for the schedule and list of topics in the next seminar series.
  • Online training - There are numerous online tutorials on topics such as basic UNIX commands, shell scripting and parallel programming. Some of these are referenced on the corresponding WestGrid web pages or you can write to the support list mentioned above for recommendations on material covering specific topics.

Updated 2013-01-23.