fastacmd
Introduction
fastacmd retrieves FASTA-formatted sequences from a blast database,
provided that the database was formatted using formatdb's '-o' option.
Command line options
The fastacmd options are:
fastacmd 2.2.5 arguments:
-d Database [String] Optional
default = nr
-p Type of file
G - guess mode (look for protein, then nucleotide)
T - protein
F - nucleotide [String] Optional
default = G
-s Search string: GIs, accessions and locuses may be used delimited
by comma. [String] Optional
-i Input file with GIs/accessions/locuses for batch
retrieval [String] Optional
-a Retrieve duplicate accessions [T/F] Optional
default = F
-l Line length for sequence [Integer] Optional
default = 80
-t Definition line should contain target gi only [T/F] Optional
default = F
This option is relevant only to non-redundant databases (i.e.,
protein nr and pataa, as provided on the NCBI ftp site)
-o Output file [File Out] Optional
default = stdout
-c Use Ctrl-A's as non-redundant defline separator [T/F] Optional
default = F
-D Dump the entire database in fasta format [T/F] Optional
default = F
-L Range of sequence to extract (Format: start,stop)
0 in 'start' refers to the beginning of the sequence
0 in 'stop' refers to the end of the sequence [String] Optional
default = 0,0
-S Strand on subsequence (nucleotide only): 1 is top, 2 is bottom
[Integer] default = 1
-T Print taxonomic information for requested sequence(s) [T/F]
default = F
-I Print database information only (overrides all other options) [T/F]
default = F
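The -i option has no example in this document, so here is a minimal sketch of batch retrieval. It assumes the input file holds one identifier per line (the layout is not specified above), and the function names write_id_list and fetch_batch are hypothetical helpers, not part of fastacmd.

```shell
# Write identifiers to a list file, one per line.
# The one-per-line layout is an assumption; the option text above
# does not describe the file format.
write_id_list() {
    list_file=$1
    shift
    printf '%s\n' "$@" > "$list_file"
}

# Retrieve every sequence named in the list file.
#   $1 = database, $2 = identifier list file, $3 = output file
fetch_batch() {
    fastacmd -d "$1" -i "$2" -o "$3"
}
```

For instance, write_id_list ids.txt 555 X65215 followed by fetch_batch nt ids.txt batch.fsa would run fastacmd -d nt -i ids.txt -o batch.fsa.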
Usage
1.) Retrieving a sequence by gi:
fastacmd -d nt -s 555
>gi|555|emb|X65215.1|BTMISATN B.taurus microsatellite DNA (624bp)
ACCTCCACTAGCTTTGTTTGTAGTGATGCTCTGTAGCACCACTGGGAAGCCCTTTAATGAATGTG
CCTTTCCGCAAATCACACACACACAAATACACTTATAGAAACAAGGTGATTTTCTTGAAATAATA
AAACAAAATTTGGAAGAAGATTTTTACTGTCTTAGGAAAAGTAAGGCATTGGAAGGTGGCTAGGT
ATGACATATGAAGTTGCATTTTAAAACTGGAATTGGACAACTGATATTCAGTGATATTTATGCTA
CTACCTTCTAGAATCGAGAGCATGCACCCCACTCTGTACTCTTGCCTGGAGAATCCATGATGAGA
GCCTGGTAGGCTGCAGTCCATGGGGTCACACAGAGTCGGACATGACTGAGCGACTTCACTTTCAC
TTTTCAATTTCATGCATTGGAGCCGGAAATGGCAACCCACTCCAGTGTTCTTGCCTGGAGAATCC
CAGGGATGGGGAAGCCTGGTGGGCTGCTGTCTATGGGGTCGCAGAGAGTCAGACACGACTGAAGT
GACTTAGCAGCAACCTTCTGGAATAAACGCCTCAGGCTTTAAACTCTGGCTTGACCATTCACTAG
CCATGGGATCCACTAGAGTCGACCTGCAGGCATGCAAGC
2.) Printing a summary of database statistics:
fastacmd -d nt -I
Database: All GenBank+EMBL+DDBJ+PDB sequences
(but no EST, STS, GSS, or phase 0,1 or 2 HTGS sequences)
1,711,089 sequences; 7,976,531,563 total letters
File name:
/usr/ncbi/db/blast/nt
Date: Mar 26, 2003 10:25 PM Version: 4
Longest sequence: 1,421,559 bp
3.) Obtaining a FASTA file from a blast database:
fastacmd -D -d nt -o nt.fsa
[output removed for brevity]
4.) Retrieving only part of a sequence:
fastacmd -d nt -s 555 -L0,32
>gi|555:1-32 B.taurus microsatellite DNA (624bp)
ACCTCCACTAGCTTTGTTTGTAGTGATGCTCT
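The start,stop rule of -L can be mimicked on a plain string. The sketch below is only an illustration, not how fastacmd is implemented. It assumes positions are 1-based and inclusive, as the "1-32" in the definition line above suggests, and the function name subseq is hypothetical.

```shell
# Illustrative only: apply the -L start,stop rule to a plain string.
# Assumes 1-based, inclusive positions. A start of 0 means the
# beginning of the sequence and a stop of 0 means the end, as in the
# option description.
subseq() {
    sequence=$1
    start=$2
    stop=$3
    if [ "$start" -eq 0 ]; then start=1; fi
    if [ "$stop" -eq 0 ]; then stop=${#sequence}; fi
    printf '%s\n' "$sequence" | cut -c "${start}-${stop}"
}
```

With this sketch, subseq AACCGGTT 0 4 prints AACC and subseq AACCGGTT 5 0 prints GGTT.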
5.) Retrieving taxonomic information for a given sequence:
fastacmd -d nt -s 555 -T
NCBI sequence id: gi|555|emb|X65215.1|BTMISATN
NCBI taxonomy id: 9913
Common name: cow
Scientific name: Bos taurus
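If only the taxonomy id is needed, it can be pulled out of the -T report with standard text tools. This sketch assumes the report keeps the "NCBI taxonomy id:" label shown above; the function name taxid_from_report is hypothetical.

```shell
# Read a fastacmd -T report on standard input and print only the
# taxonomy id. Assumes the "NCBI taxonomy id:" label shown above.
taxid_from_report() {
    sed -n 's/^NCBI taxonomy id:[[:space:]]*//p'
}
```

It would be used as a filter, for example: fastacmd -d nt -s 555 -T | taxid_from_report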
Return values
The following exit values are returned:
0 Completed successfully
1 An error occurred
2 Blast database was not found
3 Failed search (accession, gi, taxonomy info)
4 No taxonomy database was found
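In a script, the exit value can be checked after each call. The following is a minimal sketch; the function names fastacmd_status_message and run_fastacmd are hypothetical, and the messages are copied from the list above.

```shell
# Translate a fastacmd exit value into the meaning listed above.
fastacmd_status_message() {
    case "$1" in
        0) echo "Completed successfully" ;;
        1) echo "An error occurred" ;;
        2) echo "Blast database was not found" ;;
        3) echo "Failed search (accession, gi, taxonomy info)" ;;
        4) echo "No taxonomy database was found" ;;
        *) echo "Unknown exit value: $1" ;;
    esac
}

# Run fastacmd with the given arguments; on failure, report the
# meaning of the exit value on standard error and pass it through.
run_fastacmd() {
    status=0
    fastacmd "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "fastacmd: $(fastacmd_status_message "$status")" >&2
    fi
    return "$status"
}
```

A call such as run_fastacmd -d nt -s 555 then behaves like fastacmd itself, but adds a one-line explanation whenever the exit value is not 0.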
Notes/Troubleshooting
A) Taxonomy information
To access taxonomy information with fastacmd, the blast databases
must have been obtained from the NCBI ftp site
(ftp://ftp.ncbi.nih.gov/blast/db), and an additional set of files
is needed. These files are archived as taxdb.tar.gz in the same
directory as the blast databases on the NCBI ftp site.
Install these files in the same directory as your blast databases,
and update your ncbi configuration file to point to this directory.
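The unpacking step can be sketched as follows. This is a minimal example, assuming taxdb.tar.gz has already been downloaded and that your tar supports the -z and -C options; the function name install_taxdb is hypothetical, and the configuration file update is not covered here.

```shell
# Unpack a downloaded taxdb.tar.gz into the blast database directory.
#   $1 = path to the archive, $2 = blast database directory
install_taxdb() {
    archive=$1
    dbdir=$2
    if [ ! -f "$archive" ]; then
        echo "missing archive: $archive" >&2
        return 1
    fi
    if [ ! -d "$dbdir" ]; then
        echo "missing database directory: $dbdir" >&2
        return 1
    fi
    tar -xzf "$archive" -C "$dbdir"
}
```

For example, install_taxdb taxdb.tar.gz /usr/ncbi/db/blast would unpack the files next to the databases, where the directory is only an example path and should be replaced with your own.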
Here are some of the error messages one might encounter when
accessing the taxonomy information from the blast databases:
fastacmd -d testdb -s 555 -T
[fastacmd] ERROR: Taxonomy information not encoded in your blast database.
This blast database does not have a taxonomy id encoded for this
gi/accession. Only the preformatted blast databases provided by the
NCBI have taxonomy identifiers encoded (formatdb cannot add them).
fastacmd -d patnt -s 412262 -T
[fastacmd] ERROR: Taxonomy information is not available.
Please download it from ftp://ftp.ncbi.nih.gov/blast/db/taxdb.tar.gz
Download the required files and install them as described above.