razers3 - Faster, fully sensitive read mapping
Description
RazerS 3 is a versatile, fully sensitive read mapper based on k-mer counting and seeding filters. It
supports single and paired-end mapping, shared-memory parallelism, and optimally parametrizes the filter
based on a user-defined minimal sensitivity. See http://www.seqan.de/projects/razers for more
information.
Input to RazerS 3 is a reference genome file and either one file with single-end reads or two files
containing the left and right mates of paired-end reads, respectively. Use - to read single-end reads
from stdin.
(c) Copyright 2009-2014 by David Weese.
Examples
razers3 -i 96 -tc 12 -o mapped.razers hg18.fa reads.fq
Map single-end reads with 4% error rate using 12 threads.
razers3 -i 95 --no-gaps -o mapped.razers hg18.fa reads.fq.gz
Map single-end gzipped reads with 5% error rate and no indels.
razers3 -i 94 -rr 95 -tc 12 -ll 280 -le 80 -o mapped.razers hg18.fa reads_1.fq reads_2.fq
Map paired-end reads with up to 6% errors, 95% sensitivity, and 12 threads, and only output aligned
pairs with an outer distance of 200-360bp.
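The numbers quoted in these examples follow from simple arithmetic on the option values. The Python
sketch below assumes that the error budget is the read length times (100 - identity)/100, rounded
down, and that the accepted outer distance is the library length plus or minus the tolerance. The
helper names max_errors and pair_window are hypothetical and not part of RazerS 3.

```python
def max_errors(read_length: int, identity: float) -> int:
    """Hypothetical helper: errors allowed for one read at -i <identity>.

    Assumes errors = floor(read_length * (100 - identity) / 100).
    """
    # The small epsilon guards against results such as 3.9999999 from float rounding.
    return int(read_length * (100 - identity) / 100 + 1e-9)


def pair_window(library_length: int, library_error: int) -> tuple[int, int]:
    """Hypothetical helper: outer-distance range implied by -ll and -le."""
    return library_length - library_error, library_length + library_error
```

For 100bp reads, max_errors(100, 94) gives 6, and pair_window(280, 80) gives (200, 360), the range
quoted in the last example. With the defaults (-ll 220, -le 50) the window would be 170-270bp.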
razers3 3.5.8 [tarball] RAZERS3(1)
Formats, Naming, Sorting, And Coordinate Schemes
RazerS 3 supports various output formats. The output format is detected automatically from the file name
suffix.
.razers Razer format
.fa, .fasta Enhanced Fasta format
.eland Eland format
.gff GFF format
.sam SAM format
.bam BAM format
.afg Amos AFG format
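Format selection by suffix amounts to a table lookup. The sketch below transcribes the list above into
a dictionary; output_format is a hypothetical helper, and treating "-" as razers format comes from the
description of the -o option. It does not reproduce how RazerS 3 itself parses file names (for
example, whether suffixes are matched case-insensitively).

```python
import os

# Suffix-to-format table transcribed from the list above.
OUTPUT_FORMATS = {
    ".razers": "Razer",
    ".fa": "Enhanced Fasta",
    ".fasta": "Enhanced Fasta",
    ".eland": "Eland",
    ".gff": "GFF",
    ".sam": "SAM",
    ".bam": "BAM",
    ".afg": "Amos AFG",
}


def output_format(path: str) -> str:
    """Hypothetical helper: choose the output format from the file name suffix."""
    if path == "-":
        # -o - dumps to stdout in razers format.
        return "Razer"
    suffix = os.path.splitext(path)[1]
    if suffix not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output suffix: {suffix!r}")
    return OUTPUT_FORMATS[suffix]
```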
By default, reads and contigs are referred to by their Fasta ids given in the input files. With the
-gn and -rn options this behaviour can be changed:
0 Use Fasta id.
1 Enumerate beginning with 1.
2 Use the read sequence (only for short reads!).
3 Use the Fasta id, do NOT append /L or /R for mate pairs.
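The four read-naming modes can be modelled as a small dispatch function. This is a hypothetical sketch:
read_name and its parameters are illustrative, the index is assumed to be 0-based in the input, and
the /L or /R suffix in mode 0 is inferred from the wording of mode 3 rather than stated directly.

```python
from typing import Optional


def read_name(mode: int, fasta_id: str, index: int, sequence: str,
              mate: Optional[str] = None) -> str:
    """Hypothetical sketch of the -rn naming modes.

    index is the 0-based position of the read in the input file;
    mate is "L" or "R" for paired-end reads and None otherwise.
    """
    if mode == 0:
        # Fasta id; assumed to receive a /L or /R suffix for mate pairs.
        return f"{fasta_id}/{mate}" if mate else fasta_id
    if mode == 1:
        # Enumerate beginning with 1.
        return str(index + 1)
    if mode == 2:
        # The read sequence itself (only sensible for short reads).
        return sequence
    if mode == 3:
        # Fasta id without the /L or /R mate suffix.
        return fasta_id
    raise ValueError(f"unknown naming mode: {mode}")
```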
The way matches are sorted in the output file can be changed with the -so option for the following
formats: razers, fasta, sam, and afg. Primary and secondary sort keys are:
0 1. read number, 2. genome position
1 1. genome position, 2. read number
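The two sort orders differ only in which key is primary. In the following sketch, Match and
sort_matches are hypothetical names, and the genome position is simplified to a single integer (a real
position would also identify the contig).

```python
from operator import attrgetter
from typing import NamedTuple


class Match(NamedTuple):
    read_no: int
    genome_pos: int  # simplified: ignores which contig the match lies on


def sort_matches(matches: list[Match], order: int) -> list[Match]:
    """Sort like -so 0 (read, then position) or -so 1 (position, then read)."""
    if order == 0:
        key = attrgetter("read_no", "genome_pos")
    elif order == 1:
        key = attrgetter("genome_pos", "read_no")
    else:
        raise ValueError(f"unknown sort order: {order}")
    return sorted(matches, key=key)
```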
The coordinate space used for begin and end positions can be changed with the -pf option for the
razer and fasta formats:
0 Gap space. Gaps between characters are counted from 0.
1 Position space. Characters are counted from 1.
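As an illustration, a match covering the first three characters of a contig lies between gap 0 and
gap 3 in gap space. In position space, the same match would be 1..3, assuming the end position names
the last character it covers (the page does not say whether the end is inclusive). The helper names
below are hypothetical, and the sketch assumes begin is less than end.

```python
def gap_to_position(begin: int, end: int) -> tuple[int, int]:
    """Convert a gap-space interval (-pf 0) to position space (-pf 1).

    Gap space counts the gaps between characters from 0, so [begin, end)
    covers characters begin+1 .. end when characters are counted from 1.
    Assumes the position-space end is inclusive.
    """
    return begin + 1, end


def position_to_gap(begin: int, end: int) -> tuple[int, int]:
    """Inverse of gap_to_position under the same assumption."""
    return begin - 1, end
```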
Options
-h, --help
Display the help message.
--version
Display version information.
Main Options:
-i, --percent-identity DOUBLE
Percent identity threshold. In range [50..100]. Default: 95.
-rr, --recognition-rate DOUBLE
Percent recognition rate. In range [80..100]. Default: 100.
-ng, --no-gaps
Allow only mismatches, no indels. Default: allow both.
-f, --forward
Map reads only to forward strands.
-r, --reverse
Map reads only to reverse strands.
-m, --max-hits INTEGER
Output only <NUM> of the best hits. In range [1..inf]. Default: 100.
--unique
Output only unique best matches (-m 1 -dr 0 -pa).
-tr, --trim-reads INTEGER
Trim reads to given length. Default: off. In range [14..inf].
-o, --output OUTPUT_FILE
Mapping result filename (use - to dump to stdout in razers format). Default: <READSFILE>.razers.
Valid filetypes are: .sam, .razers, .gff, .fasta, .fa, .eland, .bam, and .afg.
-v, --verbose
Verbose mode.
-vv, --vverbose
Very verbose mode.
Paired-end Options:
-ll, --library-length INTEGER
Paired-end library length. In range [1..inf]. Default: 220.
-le, --library-error INTEGER
Paired-end library length tolerance. In range [0..inf]. Default: 50.
Output Format Options:
-a, --alignment
Dump the alignment for each match (only razer or fasta format).
-pa, --purge-ambiguous
Purge reads with more than <max-hits> best matches.
-dr, --distance-range INTEGER
Only consider matches with at most NUM more errors compared to the best. Default: output all.
-gn, --genome-naming INTEGER
Select how genomes are named (see the naming schemes above). In range [0..1]. Default: 0.
-rn, --read-naming INTEGER
Select how reads are named (see the naming schemes above). In range [0..3]. Default: 0.
--full-readid
Use the whole read id (don't clip after whitespace).
-so, --sort-order INTEGER
Select how matches are sorted (see the sort orders above). In range [0..1]. Default: 0.
-pf, --position-format INTEGER
Select begin/end position numbering (see the coordinate schemes above). In range [0..1]. Default: 0.
-ds, --dont-shrink-alignments
Disable alignment shrinking in SAM. This is required for generating a gold mapping for Rabema.
Filtration Options:
-fl, --filter STRING
Select k-mer filter. One of pigeonhole or swift. Default: pigeonhole.
-mr, --mutation-rate DOUBLE
Set the percent mutation rate (pigeonhole). In range [0..20]. Default: 5.
-ol, --overlap-length INTEGER
Manually set the overlap length of adjacent k-mers (pigeonhole). In range [0..inf].
-pd, --param-dir STRING
Read user-computed parameter files in the directory <DIR> (swift).
-t, --threshold INTEGER
Manually set minimum k-mer count threshold (swift). In range [1..inf].
-tl, --taboo-length INTEGER
Set taboo length (swift). In range [1..inf]. Default: 1.
-s, --shape STRING
Manually set k-mer shape.
-oc, --overabundance-cut INTEGER
Set k-mer overabundance cut ratio. In range [0..1]. Default: 1.
-rl, --repeat-length INTEGER
Skip simple-repeats of length <NUM>. In range [1..inf]. Default: 1000.
-lf, --load-factor DOUBLE
Set the load factor for the open addressing k-mer index. In range [1..inf]. Default: 1.6.
Verification Options:
-mN, --match-N
N matches all other characters. Default: N matches nothing.
-ed, --error-distr STRING
Write error distribution to FILE.
-mf, --mismatch-file STRING
Write mismatch patterns to FILE.
Misc Options:
-cm, --compact-mult DOUBLE
Multiply compaction threshold by this value after reaching and compacting. In range [0..inf].
Default: 2.2.
-ncf, --no-compact-frac DOUBLE
Don't compact when in this last fraction of the genome. In range [0..1]. Default: 0.05.
Parallelism Options:
-tc, --thread-count INTEGER
Set the number of threads to use (0 to force sequential mode). In range [0..inf]. Default: 1.
-pws, --parallel-window-size INTEGER
Collect candidates in windows of this length. In range [1..inf]. Default: 500000.
-pvs, --parallel-verification-size INTEGER
Verify candidates in packages of this size. In range [1..inf]. Default: 100.
-pvmpc, --parallel-verification-max-package-count INTEGER
Largest number of packages to create for verification per thread-1. In range [1..inf]. Default:
100.
-amms, --available-matches-memory-size INTEGER
Bytes of main memory available for storing matches. In range [-1..inf]. Default: 0.
-mhst, --match-histo-start-threshold INTEGER
When to start histogram. In range [1..inf]. Default: 5.
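The interplay of -m, -dr, and -pa (and their combination in --unique) can be modelled for a single
read as a filter over error counts. This is a hypothetical sketch: select_matches and select_unique
are illustrative names, "best" is assumed to mean fewest errors, and RazerS 3 may apply these filters
at different stages or break ties differently.

```python
from typing import Optional


def select_matches(errors: list[int], max_hits: int = 100,
                   distance_range: Optional[int] = None,
                   purge_ambiguous: bool = False) -> list[int]:
    """Hypothetical model of -m, -dr and -pa for one read.

    errors holds the error count of every match found for the read;
    the result holds the error counts of the matches that would be reported.
    """
    if not errors:
        return []
    ranked = sorted(errors)
    best = ranked[0]
    if purge_ambiguous and ranked.count(best) > max_hits:
        # -pa: purge reads with more than max_hits best matches.
        return []
    if distance_range is not None:
        # -dr: keep matches with at most distance_range more errors than the best.
        ranked = [e for e in ranked if e <= best + distance_range]
    # -m: output at most max_hits of the best matches.
    return ranked[:max_hits]


def select_unique(errors: list[int]) -> list[int]:
    # --unique is documented as shorthand for -m 1 -dr 0 -pa.
    return select_matches(errors, max_hits=1, distance_range=0,
                          purge_ambiguous=True)
```

Under this model, a read with two equally good matches is dropped entirely by --unique rather than
reported at an arbitrary one of the two positions.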
Required Arguments
ARGUMENT 0 INPUT_FILE
A reference genome file. Valid filetypes are: .sam[.*], .raw[.*], .gbk[.*], .frn[.*], .fq[.*],
.fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*], .embl[.*], and .bam, where * is any
of the following extensions: gz, bz2, and bgzf for transparent (de)compression.
READS List of INPUT_FILEs
Either one (single-end) or two (paired-end) read files. Valid filetypes are: .sam[.*], .raw[.*],
.gbk[.*], .frn[.*], .fq[.*], .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*],
.embl[.*], and .bam, where * is any of the following extensions: gz, bz2, and bgzf for transparent
(de)compression.
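The accepted file names combine a base suffix with an optional compression suffix, except .bam, which
is listed without one. The following sketch checks a path against the documented lists; is_valid_input
is a hypothetical helper, and RazerS 3 may also inspect file contents rather than relying on names
alone.

```python
# Base suffixes transcribed from the lists above; .bam is listed without [.*].
SEQUENCE_SUFFIXES = (".sam", ".raw", ".gbk", ".frn", ".fq", ".fna", ".ffn",
                     ".fastq", ".fasta", ".faa", ".fa", ".embl")
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".bgzf")


def is_valid_input(path: str) -> bool:
    """Hypothetical check of an input file name against the documented suffixes."""
    if path.endswith(".bam"):
        return True
    # Strip at most one transparent-compression suffix.
    for comp in COMPRESSION_SUFFIXES:
        if path.endswith(comp):
            path = path[: -len(comp)]
            break
    return path.endswith(SEQUENCE_SUFFIXES)
```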
Synopsis
razers3 [OPTIONS] <GENOMEFILE> <READSFILE>
razers3 [OPTIONS] <GENOMEFILE> <PE-READSFILE1> <PE-READSFILE2>
