Using Celera Assembler (wgs-assembler) on WestGrid


Celera Assembler (wgs-assembler) is a whole-genome shotgun (WGS) DNA sequence assembler for building up long sequences from fragments.  It includes utilities to convert from several common formats for such data. See the wgs-assembler project home page for more information.

Running wgs-assembler on Breezy

The wgs-assembler software has been installed on Breezy in /global/software/wgs.  There is a version-specific subdirectory with the executables, for example, /global/software/wgs/wgs-6.1/bin. For convenience, there is a link to that from the generic name /global/software/wgs/bin.  If you have concerns about referencing the generic name (as the underlying software may be updated to a newer version without warning), use the version-specific path instead.
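A small sketch of pinning the version in your own scripts is shown here. The function name wgs_bin_dir is hypothetical. It only builds the path from the directory layout described above and does not check that the directory exists.

```shell
#!/bin/bash
# Hypothetical helper (not part of wgs-assembler): print the Breezy
# wgs-assembler bin directory. With a version argument such as 6.1 it
# returns the version-specific path; without one, the generic link.
wgs_bin_dir() {
    local base=/global/software/wgs
    if [ -n "$1" ]; then
        echo "$base/wgs-$1/bin"
    else
        echo "$base/bin"
    fi
}

# Pin the version so that a later update of the generic link has no effect.
WGSDIR=$(wgs_bin_dir 6.1)
echo "Using wgs-assembler executables from $WGSDIR"
```

To see which version the generic name currently points to, run readlink -f /global/software/wgs/bin.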

To use some of the wgs-assembler utilities, you must prepend the site Perl module directory to the PERL5LIB environment variable. For example, in the bash shell:

export PERL5LIB=/global/software/perl/modules/lib/perl5/site_perl:$PERL5LIB
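If the export line ends up in both your shell startup file and a job script, PERL5LIB collects duplicate entries. If PERL5LIB was previously empty, the line also leaves a trailing colon. The following sketch adds the directory only once and omits the colon when there is nothing to separate. The function name prepend_perl5lib is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prepend a directory to PERL5LIB unless it is
# already one of the colon-separated entries.
prepend_perl5lib() {
    case ":${PERL5LIB}:" in
        *":$1:"*) ;;   # already present; leave PERL5LIB unchanged
        # ${PERL5LIB:+:$PERL5LIB} expands to ":<old value>" only when
        # PERL5LIB is non-empty, so no stray colon is added.
        *) export PERL5LIB="$1${PERL5LIB:+:$PERL5LIB}" ;;
    esac
}

prepend_perl5lib /global/software/perl/modules/lib/perl5/site_perl
```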

Some example data for testing the software is in /global/software/wgs/example.
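Before you submit a job, the input files need to be in your own working directory. One way to copy them is sketched here. The function name stage_frg_files and the destination directory are hypothetical, and the sketch assumes that the .frg files sit directly in the example directory.

```shell
#!/bin/bash
# Hypothetical helper: copy the .frg files from SRC into DEST,
# creating DEST if needed. Fails if SRC contains no .frg files.
stage_frg_files() {
    local src=$1 dest=$2
    mkdir -p "$dest" || return 1
    # Replace this function's arguments with the matching files.
    set -- "$src"/*.frg
    # An unmatched glob stays literal, so check that the first entry exists.
    if [ ! -e "$1" ]; then
        echo "No .frg files found in $src" >&2
        return 1
    fi
    cp "$@" "$dest"/
}
```

For example, stage_frg_files /global/software/wgs/example "$HOME/wgs-example" copies the files into $HOME/wgs-example, where you can also place the job script.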

The following batch job script runs the example. It assumes that the input .frg files have been copied to the directory containing the script and that the job is submitted from that directory:

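#PBS -S /bin/bash

# Script for running wgs-assembler
# 2010-12-02 DSP

# Batch jobs start in your home directory, so change to the directory
# the job was submitted from (the one holding the .frg input files).
cd "$PBS_O_WORKDIR"

echo "Running on host $(hostname)"
echo "Current working directory is $(pwd)"
echo "Starting run at: $(date)"

# Directory containing runCA; use /global/software/wgs/wgs-6.1/bin
# instead to pin a specific version.
WGSDIR=/global/software/wgs/bin

export PERL5LIB=/global/software/perl/modules/lib/perl5/site_perl:$PERL5LIB
perl "$WGSDIR/runCA" -p pging -d testassembly_batch porphyromonas_gingivalis_w83.*

echo "Program finished with exit code $? at: $(date)"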

For information about submitting batch job scripts, see the Running Jobs page. Many sequence analyses require large amounts of memory, so you will probably need to request memory explicitly with the mem resource parameter on the qsub command line when you submit these jobs.
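You can also put the memory request in the job script itself as #PBS directives, placed before the first command in the script, alongside #PBS -S /bin/bash. The values below are placeholders only. The memory an assembly needs depends on the size of your data, so adjust them to suit your data set.

```shell
# Placeholder resource requests; adjust for your data set.
#PBS -l mem=8gb
#PBS -l walltime=04:00:00
```

On the command line, the same request would be qsub -l mem=8gb,walltime=04:00:00 followed by the name of your script.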

Running wgs-assembler on Bugaboo

The wgs-assembler software, including the kmer package it uses, has been installed on Bugaboo in /usr/local/wgs, which is identical to the version-specific directory /usr/local/wgs-6.1. The executables are in the default PATH. You can therefore run them by name alone (for example, runCA) without a directory prefix, and this is the recommended approach.

Please note that each Bugaboo node has either 16 GB of memory (8-core nodes) or 24 GB of memory (12-core nodes). If you want to use all of the memory of a 12-core node, request all of its processors by adding

#PBS -l nodes=1:ppn=12

to your TORQUE (batch job) script. An 8-core node cannot satisfy a request for 12 processors, so the job will run on a 24 GB node. You can then use all 12 processors by overriding runCA's default number of threads on its command line:

runCA ovlThreads=$PBS_NP ...

(During installation, the default number of threads used by the runCA script was reduced from two to one. This avoids interfering with other jobs that may be assigned to the same node as yours.)
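The ovlThreads=$PBS_NP form relies on TORQUE setting PBS_NP for the job. If that variable is not set in your environment, you can take the processor count from the file named by PBS_NODEFILE instead. TORQUE writes that file with one line per allocated processor. A sketch of this fallback is shown here. The function name job_nprocs is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the number of processors allocated to the job.
# Prefers PBS_NP; otherwise counts the lines in $PBS_NODEFILE (one line
# per allocated processor); falls back to 1 outside a batch job.
job_nprocs() {
    if [ -n "$PBS_NP" ]; then
        echo "$PBS_NP"
    elif [ -n "$PBS_NODEFILE" ] && [ -r "$PBS_NODEFILE" ]; then
        # tr strips any padding that some wc implementations add.
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        echo 1
    fi
}

echo "Processors available to this job: $(job_nprocs)"
```

In a job script, you would then write runCA ovlThreads=$(job_nprocs) followed by the rest of your runCA arguments.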


Updated 2011-02-28.