SSAHA (Sequence Search and Alignment by Hashing Algorithm) is an algorithm for very fast matching and alignment of DNA sequences. It achieves its fast search speed by encoding sequence information in a perfect hash function. SSAHA2 is a package that combines SSAHA with cross_match.
More details are at the project site.
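The hashing idea can be illustrated with a minimal sketch. It assumes the common two-bits-per-base encoding (which gives every k-mer a unique integer), samples the subject every `skip` bases, scans the query at every position, and counts word matches per diagonal. The function names and the diagonal counting are illustrative assumptions, not the SSAHA2 implementation.

```python
from collections import defaultdict

# Assumed two-bit code per base; any fixed assignment gives a collision-free hash.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_kmer(kmer):
    """Pack a k-mer into an integer, two bits per base (None if a base is not A/C/G/T)."""
    value = 0
    for base in kmer.upper():
        if base not in BASE_CODE:
            return None
        value = (value << 2) | BASE_CODE[base]
    return value

def build_index(subject, kmer=12, skip=12):
    """Map each encoded word to its start positions in the subject, stepping by `skip`."""
    index = defaultdict(list)
    for pos in range(0, len(subject) - kmer + 1, skip):
        code = encode_kmer(subject[pos:pos + kmer])
        if code is not None:
            index[code].append(pos)
    return index

def seed_hits(index, query, kmer=12, seeds=5):
    """Count word matches per diagonal (subject pos - query pos); keep those with >= seeds."""
    per_diagonal = defaultdict(int)
    for qpos in range(len(query) - kmer + 1):
        code = encode_kmer(query[qpos:qpos + kmer])
        if code is None:
            continue
        for spos in index.get(code, ()):
            per_diagonal[spos - qpos] += 1
    return {d: n for d, n in per_diagonal.items() if n >= seeds}
```

With the default word and step size of 12, the subject words do not overlap, which keeps the table small; a smaller step stores more positions and makes short queries easier to seed.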
ssaha2Build: This program constructs the hashtable
required by the other ssaha2 programs.
This provides an index for the given subject
sequence.
ssaha2: This program aligns query sequences against a
subject hashtable.
ssahaSNP: ssahaSNP is a polymorphism detection tool.
It detects homozygous SNPs and indels by aligning
shotgun reads to the finished genome sequence.
From the best alignment, SNP candidates are
screened, taking into account the quality value
of the bases with variation as well as the quality
values in the neighbouring bases, using the
neighbourhood quality standard (NQS).
ssaha2Server: This program reads in previously constructed
hashtables and runs the DNA search engine. It accepts
clients over a TCP connection; the clients pass query
sequences from the user.
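The NQS screening described for ssahaSNP can be sketched as follows. The threshold of 23 for the variant base matches the -quality default listed in the options; the window of 5 neighbouring bases on each side and the neighbour threshold of 15 are assumptions taken from the usual description of NQS, not from this page, and rejecting positions near the read ends is a simplification.

```python
def passes_nqs(quals, pos, variant_min=23, flank=5, flank_min=15):
    """Return True if the base at `pos` and its neighbours meet the quality thresholds.

    quals       -- list of per-base quality values for one read
    variant_min -- minimum quality of the variant base (23, the -quality default)
    flank       -- neighbouring bases checked on each side (assumed value)
    flank_min   -- minimum quality of each neighbouring base (assumed value)
    """
    # Simplification: a position without a full window on both sides is rejected.
    if pos - flank < 0 or pos + flank >= len(quals):
        return False
    if quals[pos] < variant_min:
        return False
    neighbours = quals[pos - flank:pos] + quals[pos + 1:pos + flank + 1]
    return all(q >= flank_min for q in neighbours)
```

A candidate SNP at read position `pos` would be kept only when `passes_nqs(quals, pos)` is true; setting -NQS to 0 corresponds to skipping this check and reporting all candidates.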
ssaha2Build [OPTIONS] -save hash_name subject_file
ssaha2 [OPTIONS] -save hash_name query_file
ssaha2 [OPTIONS] subject_file query_file
ssahaSNP [OPTIONS] -save hash_name query_file
ssahaSNP [OPTIONS] subject_file query_file
ssaha2Server [OPTIONS] hash_name
where subject_file and query_file are fasta or fastq files containing
the reference (subject) and query sequences, respectively. hash_name is
the root name of the hash table files created for the reference sequence
using ssaha2Build.
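As an illustration of the usage lines above, the following sketch assembles the argument lists for building a hash table once and then aligning reads against it. It uses only the programs and option keys documented on this page; the helper names and the file names in the usage note are hypothetical, and nothing is executed here.

```python
def build_cmd(hash_name, subject_file):
    """Argument list for: ssaha2Build -save hash_name subject_file"""
    return ["ssaha2Build", "-save", hash_name, subject_file]

def align_cmd(hash_name, query_file, options=None):
    """Argument list for: ssaha2 [OPTIONS] -save hash_name query_file

    `options` maps option keys without the leading dash (e.g. "best") to values.
    """
    cmd = ["ssaha2"]
    for key, value in (options or {}).items():
        cmd += ["-" + key, str(value)]
    cmd += ["-save", hash_name, query_file]
    return cmd
```

For example, `align_cmd("ref", "reads.fastq", {"best": 1, "output": "cigar"})` yields a list that could be passed to `subprocess.run(cmd, check=True)` on a machine where ssaha2 is installed.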
OPTIONS:
-h, -help Print this page.
-v, -version Print version information.
-c, -cookbook Print some example parameter sets, suitable
for common tasks.
All other options, except the -solexa flag and the -output ssaha2 cigar option,
are key-value pairs; values are described for the keys below (defaults in brackets):
-save <FILENAME>
ssaha2Build: Root name of the files to which the
hash table is saved. The files have the
extensions FILENAME.head, FILENAME.body,
FILENAME.name, FILENAME.base and FILENAME.size.
ssaha2, ssahaSNP: Read the hash table from files created by ssaha2Build.
If -save is not specified, the first file name
specifies the reference sequence, for which a hash
table is constructed on the fly.
-kmer Word size for ssaha hashing (12).
-skip Step size for ssaha hashing (12).
-ckmer Word size for cross_match matching (10).
-cmatch Minimum match length for cross_match matching (14).
-cut Number of repeats allowed before this kmer is ignored (10000).
-seeds Number of kmer matches required to flag a hit (5).
-depth Number of hits to consider for alignment (50).
-memory: Memory assigned in MBs for the alignment matrix (200).
-score Minimum score for match to be reported (30).
-identity Minimum identity for match to be reported (50.000000).
-port Port number for server (60000).
-align If set to > 0, output graphical alignment (0).
If set to 2 and -solexa flag is set: output also quality score.
-edge Augment hit by this many bases before alignment (200).
-array: Memory assigned in bytes for frequency arrays (4000000).
-start: First sequence to process in query (0).
-end: Last sequence to process in query; 0 means process all (0).
-sense Allow really patchy hits to go for alignment (0).
-best If set to 1, only report the best alignment for each
match; if several alignments share the best score, report all of them (0).
-454: Tune for 454 reads if set to 1, otherwise tune for ABI reads (0).
-NQS: Use NQS to filter SNPs if set to 1, otherwise output all candidates (1).
-quality: Quality value to use for variation base in NQS (23).
-tags: If set to 1, a prefix is added to output summary lines to
aid parsing. The prefix depends upon the chosen
output format, e.g. if the output is ssaha2 then the
prefix is ALIGNMENT (1).
-output: ssaha2 - original ssaha2 line only (default)
sugar - Simple UnGapped Alignment Report
cigar - Compact Idiosyncratic Gapped Alignment Report
vulgar - Verbose Useful Labelled Gapped Alignment Report
psl - Tab separated format similar to BLAT
- http://genome.ucsc.edu/goldenPath/help/customTrack.html
pslx - Tab separated format with sequence
gff - http://www.sanger.ac.uk/Software/formats/GFF/
ssaha2 cigar - alternate between ssaha2 and cigar format lines
for a full description of output formats see:
http://www.sanger.ac.uk/Software/analysis/SSAHA2/formats.shtml
-name Flag that modifies option '-output cigar' such that read name
and length are also reported when there was no hit found.
-diff: Output all hits within diff of the best (-1).
-udiff: Ignore best hit if second best score within udiff (0).
-fix: If set to 1, fix -edge, -seeds, -score so that they
are not updated according to read length in ssahaSNP (0).
-disk: If set to 1, read hashtable from disk rather than
loading to memory (0).
-weight: If >0, apply this much weighting to rare kmers (0).
-solexa: implies (ssaha2 and ssahaSNP only):
-seeds 2 -score 12 -sense 1 -cmatch 10 -ckmer 6 -skip 1.
Top scoring hits with lower quality at the mismatch positions have
their Smith-Waterman score incremented by 1. Mapping scores are
changed accordingly. In such cases ssahaSNP reports only the top
scoring hit (no Repeat lines).
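The effect of -solexa on the defaults listed above can be sketched as a simple settings merge. The default and preset values are copied from this page; the rule that explicitly given values take precedence over the preset is an assumption made for illustration, and the function name is hypothetical.

```python
# Defaults copied from the option list above (only the keys touched by -solexa, plus -kmer).
DEFAULTS = {"kmer": 12, "skip": 12, "ckmer": 10, "cmatch": 14,
            "seeds": 5, "score": 30, "sense": 0}

# Values implied by the -solexa flag, as listed above.
SOLEXA = {"seeds": 2, "score": 12, "sense": 1, "cmatch": 10, "ckmer": 6, "skip": 1}

def effective_settings(solexa=False, **overrides):
    """Return the settings after applying the preset and any explicit values.

    Assumption: explicit values win over the -solexa preset; the real
    program may resolve such conflicts differently.
    """
    settings = dict(DEFAULTS)
    if solexa:
        settings.update(SOLEXA)
    settings.update(overrides)
    return settings
```

Under this sketch, `effective_settings(solexa=True)` keeps the word size at 12 but lowers the step size to 1 and the minimum score to 12, which reflects how short the reads are compared with capillary reads.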