University of Saskatchewan Storage Site (Silo) QuickStart Guide

About this QuickStart Guide

This QuickStart guide gives a brief overview of the WestGrid Silo facility at the University of Saskatchewan, highlighting some of its key features. It is intended to be read by new WestGrid account holders and by current users interested in using Silo.

Introduction

Silo is a primary storage facility at WestGrid with over 3.15 PB (3150 TB) of spinning disk. It is an archival facility: use it to store valuable data that you are not actively using for computation. Unlike most other WestGrid systems, some of the data on Silo is backed up. See the Using Silo section for details.

All data stored on WestGrid systems must be directly related to a research project described in the user's WestGrid profile. Information about research projects is provided by the user during the account application process and can be updated by sending email to support@westgrid.ca. Quotas are enforced per account on all filesystems on Silo. Requests for a quota increase on any filesystem should be sent to support@westgrid.ca. Requests for large increases may be escalated to the local or national Resource Allocation Committee.

Two servers are available for accessing the storage facility:

silo.westgrid.ca

This server is for file and data transfer. Access is via gcp, ssh, scp, sftp, or grid-tools, e.g., globus-url-copy. The shell on Silo is restricted; it can only be used for managing and downloading files. You cannot run programs or scripts on Silo. For large file transfers, accessing the storage facility through silo.westgrid.ca gives the best performance.

For a list of commands available on Silo, type:

ls /usr/local/rbin

For best performance when copying large amounts of data to the storage facility from another WestGrid machine, use gcp (not scp) to Silo.
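When gcp is not available, for example on a desktop machine outside WestGrid, scp is one of the access methods listed above. The following is a minimal sketch that only assembles an scp command line for a transfer to silo.westgrid.ca. The username "jsmith", the file name, and the remote directory are placeholders invented for illustration, and the helper name build_scp_command is hypothetical. The script must be run on your own machine, since scripts cannot run on Silo itself.

```python
import shlex
from typing import List

SILO_HOST = "silo.westgrid.ca"  # transfer server named in this guide


def build_scp_command(local_path: str, username: str, remote_dir: str,
                      host: str = SILO_HOST) -> List[str]:
    """Build an scp argument list that copies local_path into remote_dir."""
    # -p asks scp to preserve modification times and file modes.
    return ["scp", "-p", local_path, f"{username}@{host}:{remote_dir}"]


if __name__ == "__main__":
    # Placeholder account and paths: replace them with your own.
    cmd = build_scp_command("results.tar.gz", "jsmith", "/data/jsmith/")
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
    # To perform the transfer, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list rather than a single string avoids quoting problems if a file name contains spaces.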

hopper.westgrid.ca

This is a processing node for the storage facility. No shell restrictions exist on Hopper. All the files that are accessible on Silo are accessible on Hopper as well: Silo and Hopper share filesystems. If you wish to do complex operations on files, including editing and running scripts, use Hopper.

Using Silo

There are two types of online storage on Silo, each meeting a different need.

Every file is retained on disk until the owner of the file deletes it. If files are stored in a backed-up space (e.g., /home) and those files are changed at a later date, multiple versions will be retained on tape. (Note: a new version is only considered to have been created once backups have occurred overnight. For example, if a user edits a file 4 times in the same day, only the most recent of those edits will be added to the retained backups.) Once a file is deleted by its owner, the last N versions are retained on tape for the retention period, where N varies by filesystem as shown below. After a file has been deleted for longer than the retention period, all copies of the file are gone forever.
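The versioning rule described above can be illustrated with a small model. This is a sketch under stated assumptions, not a description of how the backup software works internally: it assumes one backup per night, and the function names are hypothetical. The figures used in the example (3 versions, 45 days) are the /home values given in this guide.

```python
from typing import List


def nightly_versions(edits_per_day: List[List[str]]) -> List[str]:
    """Return one version per day that had edits: the last edit of that day."""
    versions = []
    for day_edits in edits_per_day:
        if day_edits:  # days without edits create no new version
            versions.append(day_edits[-1])
    return versions


def retained_on_tape(versions: List[str], max_versions: int) -> List[str]:
    """Keep only the most recent max_versions versions."""
    if max_versions <= 0:
        return []
    return versions[-max_versions:]


def recoverable(days_since_deletion: int, retention_days: int,
                max_versions: int) -> bool:
    """A deleted file is recoverable only within the retention period."""
    return max_versions > 0 and days_since_deletion <= retention_days


if __name__ == "__main__":
    # Four edits on day 1, none on day 2, then one edit per day.
    history = [["v1", "v2", "v3", "v4"], [], ["v5"], ["v6"], ["v7"]]
    backed_up = nightly_versions(history)
    print(backed_up)
    print(retained_on_tape(backed_up, 3))
    print(recoverable(30, 45, 3))
    print(recoverable(1, 0, 0))
```

In this model the four same-day edits collapse to a single version (v4), and a filesystem with zero retained versions never has a recoverable copy.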

/home

  • Backed up to tape—1 copy kept
  • retention period for deleted files: 45 days
  • number of retained versions of the same filename: 3
  • default quota: 500 GB

Space in /home on Silo is made available to every user by default. /home is a GPFS filesystem. Because the data in /home is backed up to tape, it is somewhat more expensive storage than /data.

/data

  • No tape backup—0 copies kept
  • retention period for deleted files: 0 days
  • number of retained versions of the same filename: 0
  • default quota: 1000 GB

Space in /data on Silo is made available to every user by default. /data is a GPFS filesystem that is NOT backed up. This is the directory in which to store files that could, in an emergency, be regenerated by other means. In the rare case of damage to the disk system, these files cannot be recovered from the storage facility.

This is a good place to store data that meets one or more of the following criteria:

  1. large files that would be convenient to keep but can be re-downloaded
  2. large files that can be recreated or regenerated relatively easily
  3. intermediate results of computation which need to be preserved for a short to medium length of time (as distinct from /scratch or /tmp on clusters)
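Because quotas are enforced on both /home and /data, it can be useful to estimate whether a planned transfer fits before starting it. The sketch below sums file sizes under a directory and compares the result with the default quotas listed in this guide. It rests on several assumptions: that Python is available on Hopper (scripts cannot run on Silo), that a quota gigabyte is 10^9 bytes, and that summed file sizes approximate what the filesystem counts against the quota. The function names are hypothetical, and the filesystem's own accounting remains authoritative.

```python
import os

GB = 10**9  # assumption: quota gigabytes are decimal

# Default quotas from this guide, in GB.
DEFAULT_QUOTA_GB = {"/home": 500, "/data": 1000}


def directory_usage_bytes(root: str) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable: leave it out of the estimate.
                pass
    return total


def fits_in_quota(current_bytes: int, incoming_bytes: int,
                  quota_gb: int) -> bool:
    """Return True if current usage plus the incoming data stays within quota."""
    return current_bytes + incoming_bytes <= quota_gb * GB


if __name__ == "__main__":
    used = directory_usage_bytes(".")  # point this at your own directory
    print(fits_in_quota(used, 50 * GB, DEFAULT_QUOTA_GB["/data"]))
```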

For data in /home, direct incremental backups from the storage array at the UofS are performed frequently (every 2-3 days, or nightly if total backup load allows). WestGrid makes no guarantee that any file can be recovered, regardless of where it is stored. A best effort will be made to ensure reliability for all researcher data. WestGrid also gives no guarantee regarding the security of data. Although every effort is made to ensure data is not compromised, there is no recovery procedure in place for a major disaster (e.g., Acts of God).

Hardware

Storage

Disk System

Disk storage:  total 4.2 PB raw, 3.15 PB usable

  • 600 x 1TB SATA drives
  • 1800 x 2TB SATA drives  
  • RAID 6
  • 2 pairs of dual IBM/DDN DCS9900 controllers
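As a quick consistency check on the disk figures above, the raw capacity can be recomputed from the drive counts. This is only arithmetic on the numbers in this guide; how the difference between raw and usable space is divided between RAID 6 parity and other overhead is not stated here, so the sketch does not attempt to break it down.

```python
# (drive count, size in TB) for each drive type listed above
DRIVES = [(600, 1), (1800, 2)]
USABLE_TB = 3150  # usable capacity quoted in this guide

raw_tb = sum(count * size_tb for count, size_tb in DRIVES)
usable_fraction = USABLE_TB / raw_tb
overhead_tb = raw_tb - USABLE_TB

print(f"raw: {raw_tb} TB")                      # raw: 4200 TB
print(f"usable fraction: {usable_fraction}")    # usable fraction: 0.75
print(f"overhead: {overhead_tb} TB")            # overhead: 1050 TB
```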

Tape System

IBM LTO 3584 tape library

  • 6 frames capable of holding 6000 LTO tapes
  • 6 LTO4 drives
  • 6 LTO5 drives
  • 1460 x LTO4 tapes (averaging >1TB/tape with compression)
  • 760 x LTO5 tapes (averaging 1.5TB/tape with compression)
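The tape figures above give a rough estimate of total tape capacity. The sketch below uses the per-tape averages quoted in the list; since the LTO4 figure is given as ">1TB", the value 1.0 is an assumed lower bound and the result is an estimate, not a measured capacity.

```python
LIBRARY_SLOTS = 6000  # tapes the 6 frames can hold

# tape type -> (tape count, assumed average TB per tape with compression)
TAPES = {
    "LTO4": (1460, 1.0),  # ">1TB" in the list above, so 1.0 is a lower bound
    "LTO5": (760, 1.5),
}

total_tapes = sum(count for count, _tb in TAPES.values())
estimated_tb = sum(count * tb for count, tb in TAPES.values())
free_slots = LIBRARY_SLOTS - total_tapes

print(f"tapes: {total_tapes}")                  # tapes: 2220
print(f"estimated capacity: {estimated_tb} TB") # estimated capacity: 2600.0 TB
print(f"free slots: {free_slots}")              # free slots: 3780
```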

Backup Software

  • IBM Tivoli Storage Manager (TSM)

Updated 2010-09-28