art_illumina - Simulation of Illumina sequencers

Author

       This manpage was written by Andreas Tille for the Debian distribution and may be used in any other
       context where the program is used.

art_illumina 3.19.15                              February 2016                                  ART_ILLUMINA(1)

Description

       ART is a set of simulation tools that generate synthetic next-generation sequencing reads. ART simulates
       sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles
       summarized from large recalibrated sequencing data.

       art_illumina can be used to simulate reads from Illumina sequencers.

Examples

       1) single-end read simulation

              art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 10 -o single_dat

       2) paired-end read simulation

              art_illumina -sam -i reference.fa -p -l 150 -ss HS25 -f 20 -m 200 -s 10 -o paired_dat

       3) mate-pair read simulation

              art_illumina -sam -i reference.fa -mp -l 50 -f 20 -m 2500 -s 50 -o matepair_dat

       4) amplicon sequencing simulation with 5' end single-end reads

              art_illumina -amp -sam -na -i amp_reference.fa -l 50 -f 10 -o amplicon_5end_dat

       5) amplicon sequencing simulation with paired-end reads

              art_illumina -amp -p -sam -na -i amp_reference.fa -l 50 -f 10 -o amplicon_pair_dat

       6) amplicon sequencing simulation with mate-pair reads

              art_illumina -amp -mp -sam -na -i amp_reference.fa -l 50 -f 10 -o amplicon_mate_dat

       7) generate an extra SAM file with zero sequencing errors for a paired-end read simulation

              art_illumina -ef -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o paired_twosam_dat

       8) reduce the substitution error rate to one tenth of the default profile

              art_illumina -i reference.fa -qs 10 -qs2 10 -l 50 -f 10 -p -m 500 -s 10 -sam -o reduce_error

       9) turn off the masking of genomic regions with unknown nucleotides 'N'

              art_illumina -nf 0 -sam -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o paired_nomask

       10) mask genomic regions with >=5 'N's within the read length 50

              art_illumina -nf 5 -sam -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o paired_maskN5

Name

       art_illumina - Simulation of Illumina sequencers

Notes

       * By default, ART selects a built-in quality score profile according to the read length specified for the
       run.

       * For single-end simulation, ART requires an input sequence file, an output file prefix, the read length,
       and the read count or fold coverage.

       * For paired-end simulation (except for amplicon sequencing), ART also requires the mean and standard
       deviation of the DNA/RNA fragment lengths.

Options

       -1 --qprof1
              the first-read quality profile

       -2 --qprof2
              the second-read quality profile

       -amp --amplicon
              amplicon sequencing simulation

       -c --rcount
              total number of reads/read pairs to be generated [per amplicon if for amplicon simulation] (must
              not be used together with -f/--fcov)

       -d --id
              the prefix identification tag for read ID

       -ef --errfree
              generate a zero-sequencing-error SAM file as well as the regular one

              NOTE: the reads in the zero-error SAM file have the same alignment positions as those in the
              regular SAM file, but have no sequencing errors

       -f --fcov
              the fold of read coverage to be simulated  or  number  of  reads/read  pairs  generated  for  each
              amplicon
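
              As a rough guide to how fold coverage relates to read count, the sketch below uses the usual
              definition of coverage (reads x read length / reference length). The helper name
              reads_for_coverage is hypothetical and the rounding is an assumption; ART's own calculation is
              not documented here and may differ, particularly for paired-end or amplicon runs.

```shell
# reads_for_coverage FOLD REF_LEN READ_LEN
# Hypothetical helper: estimate the number of single-end reads needed for a
# given fold coverage, using coverage = reads * read_length / reference_length.
reads_for_coverage() {
    awk -v fold="$1" -v ref_len="$2" -v read_len="$3" \
        'BEGIN { printf "%d\n", int(fold * ref_len / read_len + 0.5) }'
}

# 10x coverage of a 1,000,000 bp reference with 150 bp reads
reads_for_coverage 10 1000000 150
```

              An estimate like this can help when choosing between -f and -c, which must not be combined.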

       -h --help
              print out usage information

       -i --in
              the filename of input DNA/RNA reference

       -ir --insRate
              the first-read insertion rate (default: 0.00009)

       -ir2 --insRate2
              the second-read insertion rate (default: 0.00015)

       -dr --delRate
              the first-read deletion rate (default: 0.00011)

       -dr2 --delRate2
              the second-read deletion rate (default: 0.00023)

       -l --len
              the length of reads to be simulated

       -m --mflen
              the mean size of DNA/RNA fragments for paired-end simulations

       -mp --matepair
              indicate a mate-pair read simulation

       -nf --maskN
              the cutoff frequency of 'N' in a window size of the read length for masking genomic regions

              NOTE: default: '-nf 1' to mask all regions with 'N'. Use '-nf 0' to turn off masking
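
              The cutoff can be read as a count of 'N' characters within one read-length window, as in
              example 10 (">=5 'N's within the read length 50"). The sketch below applies that reading to a
              single window. The helper name window_is_masked is hypothetical, and the logic is an
              interpretation of the description above, not ART's actual implementation.

```shell
# window_is_masked WINDOW CUTOFF
# Hypothetical helper: succeed (exit 0) if WINDOW, a read-length stretch of
# the reference, contains at least CUTOFF 'N' characters.
window_is_masked() {
    window=$1
    cutoff=$2
    # '-nf 0' turns masking off, so nothing is ever masked
    if [ "$cutoff" -eq 0 ]; then
        return 1
    fi
    # count the 'N' characters in the window
    n_count=$(( $(printf '%s' "$window" | tr -cd 'N' | wc -c) ))
    [ "$n_count" -ge "$cutoff" ]
}

window_is_masked "ACGTNNNNNACGT" 5 && echo "masked"
window_is_masked "ACGTNNACGT" 5 || echo "kept"
```

              With the default cutoff of 1, any window containing a single 'N' is treated as masked.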

       -na --noALN
              do not output ALN alignment file

       -o --out
              the prefix of output filename

       -p --paired
              indicate a paired-end read simulation or to generate reads from both ends of amplicons

              NOTE: ART will automatically switch to a mate-pair simulation if the given mean fragment size is
              >= 2000
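
              The threshold in this NOTE can be expressed as a small check. The helper name library_type is
              hypothetical and assumes an integer mean fragment size; it only restates the rule given here
              and does not call art_illumina.

```shell
# library_type MEAN_FRAGMENT_SIZE
# Hypothetical helper: report which simulation mode the NOTE implies for a
# '-p' run with the given integer mean fragment size ('-m').
library_type() {
    if [ "$1" -ge 2000 ]; then
        echo "mate-pair"
    else
        echo "paired-end"
    fi
}

library_type 200
library_type 2500
```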

       -q --quiet
              turn off end of run summary

       -qs --qShift
              the amount to shift every first-read quality score by

       -qs2 --qShift2
              the amount to shift every second-read quality score by

              NOTE: For the -qs/-qs2 options, a positive number shifts quality scores up (the max is 93), which
              reduces substitution sequencing errors, and a negative number shifts quality scores down, which
              increases sequencing errors. If shifting scores by x, the error rate will be 1/(10^(x/10)) of the
              default profile.
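
              The scaling factor 1/(10^(x/10)) can be checked numerically before picking a shift. The sketch
              below assumes awk is available; the helper name error_scale is hypothetical.

```shell
# error_scale X
# Hypothetical helper: print the factor 1/(10^(X/10)) by which the
# substitution error rate changes when quality scores are shifted by X.
error_scale() {
    awk -v x="$1" 'BEGIN { printf "%.4f\n", 1 / (10 ^ (x / 10)) }'
}

error_scale 10     # shift quality scores up by 10
error_scale -10    # shift quality scores down by 10
```

              A shift of 10 gives a factor of 0.1000, which matches example 8 (one tenth of the default
              error rate); a shift of -10 gives 10.0000, a tenfold increase.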

       -rs --rndSeed
              the seed for random number generator (default: system time in seconds)

              NOTE: use a fixed seed to generate identical datasets from different runs

       -s --sdev
              the standard deviation of DNA/RNA fragment size for paired-end simulations

       -sam --samout
              indicate to generate SAM alignment file

       -sp --sepProf
              indicate to use separate quality profiles for different bases (ATGC)

       -ss --seqSys
              the name of the Illumina sequencing system of the built-in profile used for simulation

              NOTE: sequencing system ID names are:

              GA1 - Genome Analyzer I, GA2 - Genome Analyzer II

              HS10 - HiSeq 1000, HS20 - HiSeq 2000, HS25 - HiSeq 2500, MS - MiSeq

       -M --cigarM
              indicate to use CIGAR 'M' instead of '=/X' for alignment match/mismatch

Usage

       art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -f <fold_coverage> -ss <sequencing_system>
       -o <outfile_prefix>

       art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -f <fold_coverage> -o <outfile_prefix>

       art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -c <total_num_reads> -o <outfile_prefix>

       art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -f <fold_coverage> -m <mean_fragsize> -s
       <std_fragsize> -o <outfile_prefix>

       art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -c <total_num_reads> -m <mean_fragsize> -s
       <std_fragsize> -o <outfile_prefix>
