UGA RCC: Research Computing Center

RCCBatchBlast

Category(ies): Bioinformatics

Version: 4.0

Author / Distributor:

Yecheng Huang, RCC, UGA

Description:

A semi-automated pipeline for running NCBI BLAST on the RCC rcluster.
It splits a large query file containing many sequences into several smaller input files and runs NCBI blastall on each of them.
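
The splitting step can be pictured with the following shell sketch. It is not the rccbatchblast source: split_fasta and the chunk_N.fasta file names are hypothetical, and the only assumption about the input is that every FASTA record begins with a ">" header line.

```shell
# Hypothetical sketch of the splitting step; NOT the rccbatchblast source.
# split_fasta INPUT [SIZE] writes chunk_1.fasta, chunk_2.fasta, ... to the
# current directory, each holding at most SIZE records (default 1000).
split_fasta() {
  awk -v size="${2:-1000}" '
    /^>/ {                          # a ">" header line starts a new record
      if (n % size == 0) {          # every SIZE records, start a new chunk file
        if (out != "") close(out)
        chunk++
        out = "chunk_" chunk ".fasta"
      }
      n++
    }
    out != "" { print > out }       # copy header and sequence lines; text before the first header is skipped
  ' "$1"
}
```

For example, split_fasta input.fasta 200 would cut input.fasta into files of at most 200 sequences each, the size suggested for tblastx.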

altix: Not available

inQuiry: Not available

pcluster: Not available

rcluster, IOB:

Running Program:

  • Search utilities
    • rccbatchblast: given sequences in FASTA format, finds similar sequences in a BLAST database on rcluster. It splits the input file into chunks and submits all chunks to the queue. It accepts all standard NCBI blastall options, plus two more:
            -s  Number of sequences in each unit. The input sequence file is split into many smaller files, and this option sets how many sequences go into each one.
            -q  Name of the queue the jobs are submitted to. For details about queues, please refer to the RCC queue documentation.
  • Search Result utilities
    • rccbatchblast-check: checks the results of rccbatchblast
      * After submitting your jobs, check that they have finished.
      * If all jobs succeeded, the BLAST results are merged into the output file, and the number of input sequences, the number of result queries, and the total CPU time are summarized in check.report.
      * If any job failed, or a unit contains duplicated results, the suspicious folders are backed up under the original folder name prefixed with e, and the commands for cleanup and resubmission are given in the report.
      * Please check and analyze the errors, then resubmit. All results of the check are written to check.report.
      * In: original FASTA file, name of the output BLAST result
      * Out: check.report,output-blast-result
  • Advanced utilities
Refer to NCBI BLAST, the BLAST databases, and BioTeam BTBatchBlast for more options.

Make a clean directory, copy your FASTA file into it, and work inside that directory.
For example:

mkdir my-new-folder
cp input.fasta my-new-folder
cd my-new-folder
rccbatchblast -i input.fasta -d targetdatabase -p program-name -b bValue -v vValue -size number-of-sequences-per-unit -queue queue-name -m mValue

Default split-unit size: 1000 sequences (for tblastx, we suggest 200).
Default queue-name: r1-96h. Refer to the rcluster queue documentation for other queue-name options.
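
Because every chunk is submitted to the queue, it can be worth estimating how many chunks a given size will produce before running rccbatchblast. A minimal sketch, assuming the chunk count is the number of ">" header lines divided by the unit size and rounded up; count_chunks is a hypothetical helper, not an RCC command:

```shell
# Hypothetical helper: how many chunks would a given split-unit size produce?
# count_chunks FASTA SIZE prints ceil(records / SIZE), counting ">" header lines.
count_chunks() {
  nseq=$(grep -c '^>' "$1")
  size=$2
  # integer ceiling division: (a + b - 1) / b
  echo $(( (nseq + size - 1) / size ))
}
```

For a file of 2500 sequences, count_chunks input.fasta 1000 prints 3, and count_chunks input.fasta 200 prints 13.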

The name of the merged output file is set later, on the rccbatchblast-check command line.

To monitor your jobs, use

bjobs -u your-user-name

your-user-name: the user who ran rccbatchblast above.
To kill all the jobs you submitted, use


bkill -u your-user-name 0


Once all jobs have finished, check and merge the results with

rccbatchblast-check infile outfile

infile: the original input FASTA file that was blasted.
outfile: a name for the merged BLAST result.

If the check is fully successful, all BLAST results are merged into the output file named on the rccbatchblast-check command line, and there is no need to keep the *h folders. Use the following to clean up:

rm -rf *h
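
The merged output file remains after cleanup. For a rough independent cross-check of the input-sequence and result-query counts that check.report summarizes, something like the following can be used. compare_counts is a hypothetical name, and the sketch assumes the default blastall text report (-m 0), in which each query's section starts with a "Query= " line; other -m formats would need a different pattern.

```shell
# Hypothetical quick check, NOT rccbatchblast-check itself.
# compare_counts FASTA RESULT prints the number of input records and of
# "Query=" lines in RESULT, and returns 0 only when the two counts match.
# Assumes the default blastall text report (-m 0), where each query's
# section begins with a line starting "Query= ".
compare_counts() {
  nin=$(grep -c '^>' "$1")
  nout=$(grep -c '^Query=' "$2")
  echo "input sequences: $nin, result queries: $nout"
  [ "$nin" -eq "$nout" ]
}
```

A nonzero exit status means some queries are missing from (or duplicated in) the merged file, which is the situation rccbatchblast-check reports in check.report.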

Documentation:

Note: do NOT run rccbatchblast through the usual "submit job to the queue" procedure.
rccbatchblast is a script that already takes care of submitting to the queue.
Apart from the command name, rccbatchblast takes the same options as NCBI blastall, plus the queue-name and chunk-size options. Please refer to the BLAST documentation.

Installation: iNquiry Package

System(s): Unix
