Software Applications
snowhite
Category: Bioinformatics
Program on
zcluster
Version
1.1.4
Author / Distributor
Citing SnoWhite
For SnoWhite:
Dlugosch KM, Rieseberg LH. SnoWhite: A pipeline for aggressive cleaning of next-generation sequence reads. In prep.
If you use the TagDust option, you should ALSO cite:
Lassmann T, Hayashizaki Y, Daub CO. 2009. TagDust - A program to eliminate artifacts from next generation sequencing data. Bioinformatics 25: 2839-2840.
Description
A cleaning pipeline for next-generation cDNA sequences. More details are available on the SnoWhite website.
Running Program
See also the instructions on submitting jobs to queues.
/usr/local/snowhite/latest/ points to the latest installed version.
Version 1.1.4 is installed at /usr/local/snowhite/1.1.4.
Example of running SnoWhite on a queue with the shell script sub.sh:
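#!/bin/bash
# Change to the directory that holds your input files
cd working_directory
# Replace [options] with the SnoWhite options you need (see Documentation)
time perl /usr/local/snowhite/latest/snowhite_1.1.4.pl [options]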
Submit the job with:
qsub -q queueName ./sub.sh
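For a more concrete job script, the hypothetical sketch below writes sub.sh with explicit options taken from the usage message under Documentation. The input names (reads.fasta, reads.qual, adapters.fasta), the output prefix reads_clean, the working directory $HOME/snowhite_run and the queue name are placeholders to replace with your own. The snippet checks the generated script's syntax with bash -n and submits it only if qsub is found.

```shell
# Hypothetical example: generate a SnoWhite job script and submit it.
# All file names, the directory and the queue name are placeholders.

queue=queueName   # placeholder queue name

# Write the job script; the quoted 'EOF' keeps $HOME unexpanded until the job runs.
cat > sub.sh <<'EOF'
#!/bin/bash
cd "$HOME/snowhite_run"   # placeholder working directory
# -f reads, -q qualities, -v adapters, -o output prefix,
# -e T runs TagDust (requires -v), -m 50 minimum cleaned read length
time perl /usr/local/snowhite/latest/snowhite_1.1.4.pl \
  -f reads.fasta -q reads.qual -v adapters.fasta -o reads_clean -e T -m 50
EOF
chmod +x sub.sh

# Check the generated script for syntax errors.
bash -n sub.sh && echo "sub.sh: syntax OK"

# Submit only where qsub exists, e.g. on the cluster.
if command -v qsub >/dev/null 2>&1; then
  qsub -q "$queue" ./sub.sh
fi
```

Because -e T runs TagDust, the -v adapter file must be supplied, as the usage message notes.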
Documentation
Running the script with -help prints the usage message below:
perl /usr/local/snowhite/latest/snowhite_1.1.4.pl -help
Input <f>
Usage Error:
Run: perl snowhite_1.1.4.pl [OPTIONS]
OPTIONS =
Files:
-f: <FILENAME> fasta sequences (specify path if needed)
-q: <FILENAME> quality file (optional)
-v: <FILENAME> vector/primer/adapter file (optional)
-o: <FILENAME> name for new output folder and file prefixes (default = sequence input filename)
Adapter clipping:
-c: <integer> number of bases to clip off the front of all sequences (default = 0)
-C: <3/5/B/FILENAME/> clip at 3', 5', Both, or according to sequences in FILENAME (default = 5)
SeqClean step:
-m: <integer> minimum sequence length for cleaned reads (default = 50bp, applies to all steps)
-x: <T/F> discard reads with internal vector/primer contaminants? (default = F)
-p: <integer> processor number (optional, default = 1)
Terminal poly trimming (e.g. 3'AAAAAAAAAACGATTAG...):
-l: <integer> minimum length of terminal A/T repeat (min = 1, default = 6)
-a: <3/5/B> poly A at 3', 5', or Both ends (default = 3)
-t: <3/5/B> poly T at 3', 5', or Both ends (default = 5)
Terminal poly trimming inside of cap (e.g. 3'CGAAAAAAAAAAAACGATTAG...):
-b: <integer> number of terminal bases to look beyond for start of terminal poly A/T (default = 0)
-r: <integer> minimum length of A/T repeat inside of -b to consider as poly A/T (min = 2, default = 10)
Internal poly trimming (e.g. 3'...CCGTATAGGAAAAAAAAAAAAAAAAAAAACGATTAGGG...5'):
-i: <integer> minimum length of internal poly A/T repeat to consider as poly A/T (default = 100bp, extreme case)
-k: <T/F> keep the longer end of sequence broken by a single internal polyA/T (default = F)
General poly trimming settings:
-n: <T/F> interpret Ns within A/T repeats as As or Ts (default = T)
-s: <T/F> ignore single alternative bases within A/T repeats (default = T)
TagDust step:
-e: <T/F> execute TagDust, assuming primer/adapter (-v) file is provided (default = F)
-d: <decimal> false discovery rate (default = 0.01)
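The clipping, terminal poly trimming and minimum-length options can be illustrated with a deliberately simplified shell sketch. It is not SnoWhite's implementation. It assumes reads are written 5' to 3' (so the 3' end is the right-hand end of the string) and that -c clips from the 5' end, and it mimics only the defaults -c 0, -l 6, -a 3, -t 5 and -m 50. It ignores the -n/-s handling of Ns and single alternative bases, internal poly trimming, and the SeqClean and TagDust steps. The function name trim_read is hypothetical.

```shell
# Simplified, hypothetical illustration of SnoWhite-style trimming.
# NOT SnoWhite's code. Assumes reads are written 5' to 3'.
# Ns and single alternative bases inside A/T runs (-n/-s) are not handled.

clip=0       # like -c: bases clipped off the front of every read
min_poly=6   # like -l: minimum length of a terminal A/T run
min_len=50   # like -m: minimum length of a cleaned read

# trim_read SEQ: print the cleaned read, or nothing if it is too short.
# The ( ) body runs in a subshell, so seq stays local to the function.
trim_read() (
  seq=$1
  # Clip a fixed number of bases off the front (assumed 5' end).
  seq=$(printf '%s\n' "$seq" | cut -c "$((clip + 1))-")
  # Strip a poly-A run of at least $min_poly bases at the 3' (right) end.
  seq=$(printf '%s\n' "$seq" | sed -E "s/A{$min_poly,}\$//")
  # Strip a poly-T run of at least $min_poly bases at the 5' (left) end.
  seq=$(printf '%s\n' "$seq" | sed -E "s/^T{$min_poly,}//")
  # Print the read only if it still meets the minimum length.
  if [ "${#seq}" -ge "$min_len" ]; then
    printf '%s\n' "$seq"
  fi
)

# Usage: build a 52-base insert, then flank it with poly-T and poly-A runs.
core=
i=0
while [ "$i" -lt 13 ]; do core="${core}ACGT"; i=$((i + 1)); done
trim_read "TTTTTTT${core}AAAAAAAA"   # prints only the 52-base insert
trim_read "ACGTACGT"                 # prints nothing: shorter than 50 bases
```

A run of five A's at the 3' end would be kept here, since it is shorter than the -l default of 6.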
Installation
Source code is available from the SnoWhite website.
System(s)
Unix
