WU BLAST is a tool for gene and protein identification that performs sensitive, selective, and rapid similarity searches of protein and nucleotide sequence databases. It is based on NCBI BLAST. For more details, please refer to the WU BLAST homepage.
The 2.0 package provides the following data analysis and utility programs:
blasta - the unified database search program, which provides blastp, blastn, blastx, tblastn, and tblastx search functionality.
xdformat - the recommended program for rapidly converting sequences from FASTA format into the native XDF format read by blasta. The program can also append new sequences to an existing database, automatically roll back on errors, provide flexible indexing and verification services, and dump data back into FASTA format.
xdget - a flexible tool for retrieving sequences (or segments thereof) from an indexed XDF database; retrieved sequences are optionally reverse-complemented and translated in the case of nucleotide sequences. xdformat and xdget are actually one-and-the-same program, to ensure their compatibility.
nrdb - a tool for rapidly removing trivial redundancy (i.e., duplicate sequences) from one or more input files in FASTA format. A simple hash table is used, combined with data compression techniques to allow larger nucleotide sequence data sets to be manipulated in memory.
patdb - a tool for rapidly removing trivial redundancy, as well as identifying perfect substrings, from one or more input files in FASTA format. A Patricia Tree is used, combined with a Finite State Automaton. This tool is perhaps most useful when applied to protein sequences, which often differ in their inclusion of the initiator methionine or other post-translational modifications. Patdb can also be more practically applied to protein sequences than to nucleotide sequences, because the data compression techniques of the nrdb program, which are so effective with nucleotide sequences, are not employed by patdb.
wu-blastall - a Perl script for converting an NCBI blastall command line into a roughly equivalent blasta command line and then invoking blasta. The output is still in WU BLAST format. This is primarily intended as a technology demonstration tool but may also assist users in their migration from NCBI BLAST to the more accurate WU BLAST. Benchmarking the two BLASTs may require careful tweaking of parameters, and even with great care, speed benchmarks can still be confounded by inaccuracies in NCBI BLAST.
wu-formatdb - a Perl script for converting an NCBI formatdb command line into the equivalent xdformat command line and then invoking xdformat. This is primarily intended as a technology demonstration tool but may also assist users in their migration from NCBI BLAST to WU BLAST.
pam - a program to compute amino acid substitution scoring matrices having arbitrary scales, using the Dayhoff PAM model.
pressdb.real - the legacy pressdb program for users who are reliant on the NCBI BLAST 1.4 database format for nucleotide sequences.
setdb.real - the legacy setdb program for users who are reliant on the NCBI BLAST 1.4 database format for amino acid sequences.
gb2fasta - a parser to extract nucleotide sequences from GenBank flat files into FASTA format.
gt2fasta - a parser to extract amino acid sequences from CDS features in GenBank flat files and output them in FASTA format.
sp2fasta - a parser to extract protein or nucleotide sequences from EMBL, TrEMBL, or SWISS-PROT database files and output them in FASTA format.
pir2fasta - a parser to extract protein sequences from NBRF PIR database files and output them in FASTA format.
dust - a low-complexity filter for nucleotide sequences (Hancock and Armstrong, 1994; Tatusov and Lipman, unpublished).
xnu - a low-complexity filter for protein sequences (Claverie and States, 1993). The program identifies short-periodicity repeats.
sysblast.sample - a sample configuration file that system administrators may wish to modify and install as /etc/sysblast. Parameter settings in this file can be used to: limit the number of threads employed by each BLAST process; change the default number of threads employed per process; alter the "nice" value for BLAST processes; limit the amount of memory utilized by each BLAST process.
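To illustrate the kind of trivial redundancy removal described for nrdb above, here is a minimal sketch that drops exact duplicate sequences from FASTA input using an awk associative array as the hash table. It is not nrdb itself and does not attempt its data compression; the function name dedup_fasta and the choice to keep only the first header for each distinct sequence are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: remove exact duplicate sequences from FASTA input.
# NOT the nrdb program; dedup_fasta is a hypothetical helper name.
# Comparison is exact (case-sensitive) after joining wrapped lines.
dedup_fasta() {
    awk '
        # Print the buffered record only if its sequence has not been seen.
        function emit() {
            if (header != "" && !(seq in seen)) {
                seen[seq] = 1
                print header
                print seq
            }
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { emit(); header = $0; seq = ""; next }
        # Sequence lines: strip whitespace and append to the current record.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END { emit() }
    ' "$@"
}
```

A call such as `dedup_fasta first.fa second.fa > unique.fa` would write each distinct sequence once, on a single line, under the header it first appeared with. Holding every sequence in memory is acceptable for a sketch but would not scale to large nucleotide sets the way a compressed representation could.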
Format a nucleotide FASTA file into an XDF database: xdformat -n -o mydatabase ./myfasta
Run a BLAST job: bsub -q queueName /usr/local/wublast/blastp ./mywublastdb ./myfasta -o out.QUERY
Matrix files are located in /usr/local/wublast/matrix
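When several query files need to be searched, the bsub command above can be wrapped in a small loop. The following is a sketch under stated assumptions: submit_blastp, run, and the DRY_RUN switch are hypothetical names introduced here, and the script assumes that QUERY in out.QUERY stands for the name of the query file. Only the options already shown above are used.

```shell
#!/bin/sh
# Sketch only: submit one blastp job per query FASTA file.
# submit_blastp, run and DRY_RUN are hypothetical names for this example.
WUBLAST_DIR=${WUBLAST_DIR:-/usr/local/wublast}   # install path from the text above
DRY_RUN=${DRY_RUN:-1}                            # 1 = print commands, do not run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Usage: submit_blastp QUEUE DATABASE QUERY_FILE...
submit_blastp() {
    queue=$1
    db=$2
    shift 2
    for query in "$@"; do
        # Assumption: each output file is named after its query file.
        name=$(basename "$query")
        run bsub -q "$queue" "$WUBLAST_DIR/blastp" "$db" "$query" -o "out.$name"
    done
}
```

With the default DRY_RUN=1, `submit_blastp queueName ./mywublastdb ./myfasta` only prints the bsub command line so that it can be checked first; setting DRY_RUN=0 would submit the jobs.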