UGA logo RCC: Research Computing Center
 
 
Home >
 
 
RESOURCES
SERVICES
Application & Code Development
Consulting
Grantwriting Support

fastacmd

Table of Contents


Introduction

fastacmd retrives FASTA formatted sequences from a blast database, as long as it it was successfully formatted using the '-o' option.


Command line options

The fastacmd options are:

fastacmd 2.2.5   arguments:

  -d  Database [String]  Optional
    default = nr
  -p  Type of file
         G - guess mode (look for protein, then nucleotide)
         T - protein   
         F - nucleotide [String]  Optional
    default = G
  -s  Search string: GIs, accessions and locuses may be used delimited
      by comma. [String]  Optional
  -i  Input file wilth GIs/accessions/locuses for batch
      retrieval [String]  Optional
  -a  Retrieve duplicate accessions [T/F]  Optional
    default = F
  -l  Line length for sequence [Integer]  Optional
    default = 80
  -t  Definition line should contain target gi only [T/F]  Optional
    default = F

    This option is only relevant to non-redundant databases only (ie: 
    protein nr and pataa, as provided in the NCBI ftp site)

-o Output file [File Out] Optional default = stdout -c Use Ctrl-A's as non-redundant defline separator [T/F] Optional default = F -D Dump the entire database in fasta format [T/F] Optional default = F -L Range of sequence to extract (Format: start,stop) 0 in 'start' refers to the beginning of the sequence 0 in 'stop' refers to the end of the sequence [String] Optional default = 0,0 -S Strand on subsequence (nucleotide only): 1 is top, 2 is bottom [Integer] default = 1 -T Print taxonomic information for requested sequence(s) [T/F] default = F -I Print database information only (overrides all other options) [T/F] default = F

Usage

1.) Retrieving a sequence by gi:

fastacmd -d nt -s 555
>gi|555|emb|X65215.1|BTMISATN B.taurus microsatellite DNA (624bp)
ACCTCCACTAGCTTTGTTTGTAGTGATGCTCTGTAGCACCACTGGGAAGCCCTTTAATGAATGTG
CCTTTCCGCAAATCACACACACACAAATACACTTATAGAAACAAGGTGATTTTCTTGAAATAATA
AAACAAAATTTGGAAGAAGATTTTTACTGTCTTAGGAAAAGTAAGGCATTGGAAGGTGGCTAGGT
ATGACATATGAAGTTGCATTTTAAAACTGGAATTGGACAACTGATATTCAGTGATATTTATGCTA
CTACCTTCTAGAATCGAGAGCATGCACCCCACTCTGTACTCTTGCCTGGAGAATCCATGATGAGA
GCCTGGTAGGCTGCAGTCCATGGGGTCACACAGAGTCGGACATGACTGAGCGACTTCACTTTCAC
TTTTCAATTTCATGCATTGGAGCCGGAAATGGCAACCCACTCCAGTGTTCTTGCCTGGAGAATCC
CAGGGATGGGGAAGCCTGGTGGGCTGCTGTCTATGGGGTCGCAGAGAGTCAGACACGACTGAAGT
GACTTAGCAGCAACCTTCTGGAATAAACGCCTCAGGCTTTAAACTCTGGCTTGACCATTCACTAG
CCATGGGATCCACTAGAGTCGACCTGCAGGCATGCAAGC

2.) Printing a summary of database statistics:

fastacmd -d nt -I
Database: All GenBank+EMBL+DDBJ+PDB sequences 
(but no EST, STS, GSS, or phase 0,1 or 2 HTGS sequences) 
           1,711,089 sequences; 7,976,531,563 total letters

File name:
/usr/ncbi/db/blast/nt
   Date: Mar 26, 2003 10:25 PM    Version: 4    
   Longest sequence: 1,421,559 bp

3.) Obtaining a FASTA file from a blast database:

fastacmd -D -d nt -o nt.fsa
[output removed for brevity]

4.) Retrieving only part of a sequence:

fastacmd -d nt -s 555 -L0,32
gi|555:1-32 B.taurus microsatellite DNA (624bp)
ACCTCCACTAGCTTTGTTTGTAGTGATGCTCT

5.) Retrieving taxonomic information for a given sequence:

fastacmd -d nt -s 555 -T
NCBI sequence id: gi|555|emb|X65215.1|BTMISATN
NCBI taxonomy id: 9913
Common name: cow
Scientific name: Bos taurus

Return values

The following exit values are returned:

     0     Completed successfully

     1     An error occurred

     2     Blast database was not found

     3     Failed search (accession, gi, taxonomy info)

     4     No taxonomy database was found

Notes/Troubleshooting

A) Taxonomy information

In order to access to the taxonomy information using fastacmd, the blast databases should have been obtained from the NCBI ftp site (ftp://ftp.ncbi.nih.gov/blast/db) and an additional set of files are needed. These files are archived as taxdb.tar.gz under the same directory as the blast databases on the NCBI ftp site.

Please install these files in the same directory as the blast databases (and do not forget to update your ncbi configuration file to point to this directory).

Here are some of the error messages one might encounter when accessing the taxonomy information from the blast databases:

fastacmd -d testdb -s 555 -T
[fastacmd] ERROR: Taxonomy information not encoded in your blast database.

This blast database does not contain the taxonomy id encoded for this gi/accession. Only preformatted blast databases provided by the NCBI contain taxonomy identifiers encoded (formatdb cannot add this).

fastacmd -d patnt -s 412262 -T
[fastacmd] ERROR: Taxonomy information is not available. 
Please download it from ftp://ftp.ncbi.nih.gov/blast/db/taxdb.tar.gz

Download the required files and install them as described above.

 
Partnering with UGA