Computing Facilities

Introduction

The WestGrid computing facilities are distributed among several resource provider sites, with some specialization at each site. The sites are connected by high-speed networks, so users can use the system that best fits their needs, regardless of where it is physically located.

WestGrid provides several types of computing systems, since different users' programs run best on different kinds of hardware. The systems are designed for high performance computing, so they go well beyond what you would find on a desktop. We have clusters, clusters with a fast interconnect, and shared-memory systems. Use the system that best fits your needs, not necessarily the one closest to you; anything else is less than optimal and wastes valuable resources.

See the QuickStart Guide for New Users for an introduction to choosing the most appropriate system. For more detailed information about the differences between the WestGrid systems, see the pages in this section.

Serial programs run on one CPU or core of a compute cluster. Some researchers have a serial program that they need to run many times; in that case, many copies can be run simultaneously on a cluster.

Parallel programs have multiple processes or threads running at the same time that need to communicate with each other. The important distinction is how much they need to communicate and how quickly they need to do it.
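For illustration, here is a minimal MPI sketch in C of the distributed style described above (a generic example, not a WestGrid-specific recipe): several processes each compute a partial sum and communicate only once to combine the results. The compile and launch commands in the comments are typical assumptions; wrapper and launcher names vary from system to system.

    /* Minimal MPI sketch (generic example): each process computes a
     * partial sum; one MPI_Reduce call combines the results on rank 0.
     * Typically compiled with an MPI wrapper such as "mpicc" and run
     * with a launcher such as "mpirun -np 4 ./sum" (names vary by site). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id         */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

        /* Each process sums a different slice of 1..n. */
        long long n = 1000000, local = 0, total = 0;
        for (long long i = rank + 1; i <= n; i += size)
            local += i;

        /* The only communication: combine the partial sums on rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %lld\n", total);

        MPI_Finalize();
        return 0;
    }

Because these processes exchange almost no data, a program like this runs well even on a regular cluster; programs that communicate frequently, or that move large amounts of data, benefit from a fast interconnect.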

In order of increasing communication demands, such programs can run on a regular cluster, a cluster with a fast interconnect, or a shared-memory machine. The appropriate choice also depends on how the program is written (MPI, OpenMP, threads, etc.). How well a parallel program scales determines how many nodes of a cluster, or how much of a shared-memory machine, the program should be run on.
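The same computation written as a single-node threaded program, for example with OpenMP, relies on shared memory rather than message passing, so it is limited to the cores and memory of one node or one shared-memory machine. A minimal sketch, again generic rather than WestGrid-specific:

    /* Minimal OpenMP sketch (generic example): threads on one node share
     * memory, and the reduction clause handles the only coordination
     * needed.  Typically compiled with e.g. "gcc -fopenmp"; the thread
     * count is usually controlled by the OMP_NUM_THREADS variable. */
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        long long n = 1000000, total = 0;

        /* Threads split the loop iterations among themselves. */
        #pragma omp parallel for reduction(+:total)
        for (long long i = 1; i <= n; i++)
            total += i;

        printf("sum = %lld (up to %d threads)\n",
               total, omp_get_max_threads());
        return 0;
    }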

Other factors also affect the choice of system: for example, the amount of memory available (particularly the amount of memory needed per processor), the software that is installed, and restrictions due to software licensing.

WestGrid also has specialized systems, for example, systems with special visualization capabilities or GPUs. See the QuickStart Guides for more information about each system.

List of facilities by location

  • Simon Fraser University
    • Bugaboo, Storage facility
  • University of Alberta
    • Hungabee, Jasper, Checkers
  • University of British Columbia
    • Glacier, Orcinus
  • University of Calgary
    • Breezy, Lattice, Parallel
  • University of Manitoba
    • Grex
  • University of Saskatchewan
    • Storage facility (Silo and Hopper)
  • University of Victoria
    • Hermes, Nestor, Storage facility

List of facilities by general type

  • Storage
    • USask Storage Facility -- the primary storage site
    • UVic Storage Facility and SFU Storage facility -- for use in special cases where there is a need for large storage close to the compute nodes
  • Shared memory
    • Hungabee
  • Cluster
    • Glacier, Hermes, Breezy (large memory)
  • Cluster with fast interconnect
    • Bugaboo, Checkers, Grex, Jasper, Lattice, Nestor, Orcinus, Parallel
  • Visualization
    • Checkers and Parallel both have special nodes with Graphics Processing Units (GPUs).

Future Plans

Please write to support@westgrid.ca for an update on future plans if you need that information when planning your own research computing activities.

Retired Systems

Some older WestGrid systems have been removed from general service, typically replaced by more energy-efficient machines with greater capability.

Snowpatch (Mar. 2009 - ?)

The Snowpatch cluster of 32 compute nodes, each with 16 GB of memory and 8 cores, was incorporated into the Bugaboo cluster in Apr. 2012. This system was not actually retired and the hardware is still in production; it just no longer exists as a separate cluster.

Gridstore/Blackhole (Gridstore: Jul. 2003 - Aug. 2011; Blackhole: Jul. 2003 - Apr. 2010)

The Gridstore/Blackhole facility provided the primary storage services for WestGrid until that function was moved to Silo. See the WestGrid Data Storage page for details about Silo and other WestGrid storage facilities.

Dendrite/Synapse (Dendrite: April 2005 - June 2010; Synapse: April 2005 - Oct. 2011)

Dendrite, one of a pair of IBM Power5-based machines used for large shared-memory parallel programs, was decommissioned after hardware problems. Synapse, with 256 GB of RAM, was available through the Cortex front end until the end of October 2011, when it was decommissioned as well. Breezy and the soon-to-be-available Hungabee are other machines appropriate for large-memory serial or single-node threaded parallel programs (such as those based on OpenMP).

Hydra (Dec. 2003 - Jan. 2011)

This SGI visualization server was a testbed for remote visualization applications for several years. Visualization services now focus on several GPU-equipped nodes of the Checkers cluster.

Lattice (2003 - Oct. 2009; not to be confused with the current machine of the same name)

Lattice was a cluster consisting of 36 HP ES45 nodes, 19 HP ES40 nodes, and one additional HP ES45 node dedicated to interactive jobs. Each node had 4 Alpha CPUs and 2-8 GB of memory. The CPUs were clocked at 0.67-1.25 GHz and provided good floating point performance for their time. The ES45 nodes used a Quadrics interconnect, which provided much lower latency and higher bandwidth than commodity networks, making Lattice suitable for demanding parallel jobs. The current Lattice cluster is targeted at similar jobs, but has an InfiniBand interconnect.

Lattice was also the home to the commercial Gaussian license for WestGrid.  This service is now provided on Grex.

Matrix (July 2005 - Mar. 2011)

Matrix was a 256-core HP cluster (128 dual-core AMD Opteron-based nodes, running at 2.4 GHz, with 2 GB of RAM per node). It used an InfiniBand interconnect. Its intended use was MPI-based parallel processing. This kind of processing is now provided by Lattice and several other WestGrid clusters.

Nexus (Sept. 2003 - Feb. 28, 2011)

Nexus and related SGI servers provided the main large-memory capability for WestGrid for many years. Current alternatives for large-memory programs include Breezy and, coming in 2012, Hungabee.

Robson (Oct. 2004 - Aug. 2011)

Robson was a small (56-core) cluster based on 1.6 GHz PowerPC 970 processors in an IBM JS20 BladeCentre configuration. Each 4-core compute node (blade) shared 4 GB of RAM. The system was used for a wide range of jobs, including serial jobs and parallel jobs with low interprocess communication requirements. Unique features of Robson included direct access to a large storage facility and a batch environment with a queue for preemptible jobs.

Terminus (July 2008 - Dec. 2012)

Terminus will continue to operate in 2013 at the University of Calgary for researchers there, but will no longer be available for WestGrid use.

Updated 2012-11-01.