
mason_frag_sequencing - Fragment Sequencing Simulation

Description

       Given a FASTA file with fragments, simulate sequencing thereof.

       This program is a more lightweight version of mason_sequencing without support for the application of VCF
       and  fragment  sampling.   Output  of  SAM is also not available.  However, it uses the same code for the
       simulation of the reads as the more powerful mason_simulator.

       You can use mason_frag_sequencing if you want to implement your own fragmentation behaviour, e.g. if you
       have implemented your own bias models.
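
As a rough illustration of "your own fragmentation behaviour", the sketch below samples fragments from a reference string and formats them as FASTA text that could be saved and passed via -i. The function names, the normal length distribution, and the uniform start position are all assumptions made for this example; they are not part of Mason.

```python
import random


def sample_fragments(reference, count, mean_len, stddev_len, rng):
    """Sample `count` fragments with normally distributed lengths (assumed model)."""
    fragments = []
    for _ in range(count):
        # Clamp the sampled length to [1, len(reference)].
        length = int(round(rng.gauss(mean_len, stddev_len)))
        length = max(1, min(length, len(reference)))
        # Uniform start position; this is where a custom bias model would go.
        start = rng.randint(0, len(reference) - length)
        fragments.append(reference[start:start + length])
    return fragments


def to_fasta(fragments, prefix="frag"):
    """Format fragments as FASTA text, one record per fragment."""
    return "".join(f">{prefix}{i}\n{seq}\n" for i, seq in enumerate(fragments))


rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(1000))
fragments = sample_fragments(reference, count=5, mean_len=300, stddev_len=30, rng=rng)
fasta_text = to_fasta(fragments)
```

Writing fasta_text to a file such as fragments.fa (a hypothetical name) gives an input suitable for the -i option.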

Name

       mason_frag_sequencing - Fragment Sequencing Simulation

Options

       -h, --help
              Display the help message.

       --version
              Display version information.

       -q, --quiet
              Low verbosity.

       -v, --verbose
              Higher verbosity.

       -vv, --very-verbose
              Highest verbosity.

       --seed INTEGER
              Seed to use for random number generator. Default: 0.

       -i, --in INPUT_FILE
              Path  to  input  file.  Valid  filetypes  are:  .sam[.*],  .raw[.*],  .gbk[.*], .frn[.*], .fq[.*],
              .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*], .embl[.*], and .bam, where * is any
              of the following extensions: gz, bz2, and bgzf for transparent (de)compression.

       -o, --out OUTPUT_FILE
              Output of single-end/left end reads. Valid filetypes are: .sam[.*], .raw[.*],  .frn[.*],  .fq[.*],
              .fna[.*],  .ffn[.*],  .fastq[.*],  .fasta[.*],  .faa[.*], .fa[.*], and .bam, where * is any of the
              following extensions: gz, bz2, and bgzf for transparent (de)compression.

       -or, --out-right OUTPUT_FILE
              Output of right reads.  Giving this option enables paired-end simulation.  Valid  filetypes  are:
              .sam[.*],  .raw[.*],  .frn[.*],  .fq[.*],  .fna[.*],  .ffn[.*],  .fastq[.*], .fasta[.*], .faa[.*],
              .fa[.*], and .bam, where * is any of the following extensions: gz, bz2, and bgzf  for  transparent
              (de)compression.

       --force-single-end
              Force single-end simulation although --out-right is given.

   Global Read Simulation Options:
       --seq-technology STRING
              Set sequencing technology to simulate. One of illumina, 454, and sanger. Default: illumina.

       --seq-mate-orientation STRING
              Orientation  for  paired  reads.   See section Read Orientation below. One of FR, RF, FF, and FF2.
              Default: FR.

       --seq-strands STRING
              Strands to simulate from, only  applicable  to  paired  sequencing  simulation.  One  of  forward,
              reverse, and both. Default: both.

       --embed-read-info
              Whether or not to embed read information.

       --read-name-prefix STRING
              Read names will have this prefix. Default: simulated..

   BS-Seq Options:
       --enable-bs-seq
              Enable BS-seq simulation.

       --bs-seq-protocol STRING
              Protocol to use for BS-Seq simulation. One of directional and undirectional. Default: directional.

       --bs-seq-conversion-rate DOUBLE
              Conversion rate for unmethylated Cs to become Ts. In range [0..1]. Default: 0.99.
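
The conversion rate can be read as a per-base probability. The sketch below applies it to a single sequence, assuming every C is unmethylated and only the given strand is treated; the function name is hypothetical and this is not Mason's implementation.

```python
import random


def bisulfite_convert(seq, conversion_rate, rng):
    """Turn each C into T with probability `conversion_rate`.

    Assumption: every C is unmethylated and only this strand is converted.
    """
    out = []
    for base in seq:
        if base == "C" and rng.random() < conversion_rate:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(0)
converted = bisulfite_convert("ACCGTCCA", 0.99, rng)
```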

   Illumina Options:
       --illumina-read-length INTEGER
              Read length for Illumina simulation. In range [1..inf]. Default: 100.

       --illumina-error-profile-file INPUT_FILE
              Path  to  file  with  Illumina  error  profile.   The file must be a text file with floating point
              numbers separated by space, each giving a positional error rate. Valid filetype is: .txt.
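
A minimal sketch of producing such a profile file follows. It assumes one value per read position, all values on a single line, and rates within [0, 1]; the man page only states that the numbers are separated by spaces. The helper names and the example rates are made up for illustration.

```python
import os
import tempfile


def write_error_profile(path, rates):
    """Write positional error rates as space-separated floats on one line."""
    for rate in rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"error rate out of range: {rate}")
    with open(path, "w") as handle:
        handle.write(" ".join(f"{rate:.6f}" for rate in rates) + "\n")


def read_error_profile(path):
    """Read the rates back, splitting on any whitespace."""
    with open(path) as handle:
        return [float(token) for token in handle.read().split()]


# Example: 100 positions with a slowly rising error rate (made-up values).
rates = [0.002 + 0.0001 * i for i in range(100)]
profile_path = os.path.join(tempfile.mkdtemp(), "profile.txt")
write_error_profile(profile_path, rates)
```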

       --illumina-prob-insert DOUBLE
              Per-base probability of an insertion  in  Illumina  sequencing.  In  range  [0..1].  Default:
              0.00005.

       --illumina-prob-deletion DOUBLE
              Per-base  probability  of  a deletion  in  Illumina  sequencing.  In range [0..1]. Default:
              0.00005.

       --illumina-prob-mismatch-scale DOUBLE
              Scaling factor for Illumina mismatch probability. In range [0..inf]. Default: 1.0.

       --illumina-prob-mismatch DOUBLE
              Average per-base mismatch probability in Illumina sequencing. In range [0.0..1.0]. Default: 0.004.

       --illumina-prob-mismatch-begin DOUBLE
              Per-base mismatch probability of first base in Illumina sequencing. In range [0.0..1.0].  Default:
              0.002.

       --illumina-prob-mismatch-end DOUBLE
              Per-base  mismatch  probability of last base in Illumina sequencing. In range [0.0..1.0]. Default:
              0.012.

       --illumina-position-raise DOUBLE
              Point where the error curve rises, relative to the read length. In range [0.0..1.0]. Default: 0.66.
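
One way to reconcile the average, begin, end, and raise-position parameters is a two-segment linear curve whose knee height is chosen so that the area under the curve equals the average mismatch probability. The sketch below works under that assumption; the man page does not spell out the exact curve, and the function name is hypothetical.

```python
def mismatch_curve(read_length, p_avg, p_begin, p_end, raise_pos):
    """Per-position mismatch probabilities on an assumed two-segment linear curve."""
    if not 0.0 < raise_pos < 1.0:
        raise ValueError("this sketch needs 0 < raise_pos < 1")
    # Knee height such that the area under the curve on [0, 1] equals p_avg.
    p_knee = 2 * p_avg - raise_pos * p_begin - (1 - raise_pos) * p_end
    probs = []
    for i in range(read_length):
        # Relative position of base i within the read, in [0, 1].
        x = i / (read_length - 1) if read_length > 1 else 0.0
        if x < raise_pos:
            p = p_begin + (p_knee - p_begin) * x / raise_pos
        else:
            p = p_knee + (p_end - p_knee) * (x - raise_pos) / (1 - raise_pos)
        probs.append(p)
    return probs


# Documented defaults: average 0.004, begin 0.002, end 0.012, raise at 0.66.
curve = mismatch_curve(100, 0.004, 0.002, 0.012, 0.66)
```

With these defaults the knee sits at about 0.0026, so the curve stays almost flat for the first two thirds of the read and climbs steeply afterwards.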

       --illumina-quality-mean-begin DOUBLE
              Mean PHRED quality for non-mismatch bases of first base in Illumina sequencing. Default: 40.0.

       --illumina-quality-mean-end DOUBLE
              Mean PHRED quality for non-mismatch bases of last base in Illumina sequencing. Default: 39.5.

       --illumina-quality-stddev-begin DOUBLE
              Standard deviation of PHRED quality for non-mismatch bases of first base in  Illumina  sequencing.
              Default: 0.05.

       --illumina-quality-stddev-end DOUBLE
              Standard  deviation  of  PHRED quality for non-mismatch bases of last base in Illumina sequencing.
              Default: 10.0.

       --illumina-mismatch-quality-mean-begin DOUBLE
              Mean PHRED quality for mismatch bases of first base in Illumina sequencing. Default: 40.0.

       --illumina-mismatch-quality-mean-end DOUBLE
              Mean PHRED quality for mismatch bases of last base in Illumina sequencing. Default: 30.0.

       --illumina-mismatch-quality-stddev-begin DOUBLE
              Standard deviation of PHRED quality for mismatch bases  of  first  base  in  Illumina  sequencing.
              Default: 3.0.

       --illumina-mismatch-quality-stddev-end DOUBLE
              Standard  deviation  of  PHRED  quality  for  mismatch  bases of last base in Illumina sequencing.
              Default: 15.0.
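
The begin/end mean and standard deviation pairs suggest a quality distribution that changes along the read. The sketch below assumes linear interpolation between the first and last base, a normal distribution at each position, and clamping to [0, 40]; all three are assumptions of this example rather than documented behaviour.

```python
import random


def sample_quality(pos, read_length, mean_begin, mean_end, sd_begin, sd_end, rng):
    """Draw one PHRED quality for 0-based position `pos` (assumed model)."""
    x = pos / (read_length - 1) if read_length > 1 else 0.0
    # Linearly interpolate mean and standard deviation along the read.
    mean = mean_begin + (mean_end - mean_begin) * x
    stddev = sd_begin + (sd_end - sd_begin) * x
    quality = int(round(rng.gauss(mean, stddev)))
    # Clamp to a plausible PHRED range; the cap of 40 is an assumption.
    return max(0, min(quality, 40))


rng = random.Random(0)
# Documented defaults for mismatch bases: mean 40.0 -> 30.0, stddev 3.0 -> 15.0.
qualities = [sample_quality(i, 100, 40.0, 30.0, 3.0, 15.0, rng) for i in range(100)]
```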

       --illumina-left-template-fastq INPUT_FILE
              FASTQ file to use for a template for left-end reads.  Valid  filetypes  are:  .sam[.*],  .raw[.*],
              .gbk[.*],  .frn[.*],  .fq[.*],  .fna[.*],  .ffn[.*],  .fastq[.*],  .fasta[.*],  .faa[.*], .fa[.*],
              .embl[.*], and .bam, where * is any of the following extensions: gz, bz2, and bgzf for transparent
              (de)compression.

       --illumina-right-template-fastq INPUT_FILE
              FASTQ file to use for a template for right-end reads. Valid  filetypes  are:  .sam[.*],  .raw[.*],
              .gbk[.*],  .frn[.*],  .fq[.*],  .fna[.*],  .ffn[.*],  .fastq[.*],  .fasta[.*],  .faa[.*], .fa[.*],
              .embl[.*], and .bam, where * is any of the following extensions: gz, bz2, and bgzf for transparent
              (de)compression.

   Sanger Sequencing Options:
       --sanger-read-length-model STRING
              The model to use for sampling the Sanger read length. One of normal and uniform. Default: normal.

       --sanger-read-length-min INTEGER
              The minimal read length when the read length is sampled uniformly.  In  range  [0..inf].  Default:
              400.

       --sanger-read-length-max INTEGER
              The  maximal  read  length  when the read length is sampled uniformly. In range [0..inf]. Default:
              600.

       --sanger-read-length-mean DOUBLE
              The mean read length when the read length is sampled with normal distribution. In range  [0..inf].
              Default: 400.

       --sanger-read-length-error DOUBLE
              The read length standard deviation when the read length is sampled with  normal  distribution.  In
              range [0..inf]. Default: 40.

       --sanger-prob-mismatch-scale DOUBLE
              Scaling factor for Sanger mismatch probability. In range [0..inf]. Default: 1.0.

       --sanger-prob-mismatch-begin DOUBLE
              Per-base mismatch probability of first base in Sanger sequencing. In  range  [0.0..1.0].  Default:
              0.005.

       --sanger-prob-mismatch-end DOUBLE
              Per-base  mismatch  probability  of  last base in Sanger sequencing. In range [0.0..1.0]. Default:
              0.001.

       --sanger-prob-insertion-begin DOUBLE
              Per-base insertion probability of first base in Sanger sequencing. In range  [0.0..1.0].  Default:
              0.0025.

       --sanger-prob-insertion-end DOUBLE
              Per-base  insertion  probability  of last base in Sanger sequencing. In range [0.0..1.0]. Default:
              0.005.

       --sanger-prob-deletion-begin DOUBLE
              Per-base deletion probability of first base in Sanger sequencing. In  range  [0.0..1.0].  Default:
              0.0025.

       --sanger-prob-deletion-end DOUBLE
              Per-base  deletion  probability  of  last base in Sanger sequencing. In range [0.0..1.0]. Default:
              0.005.

       --sanger-quality-match-start-mean DOUBLE
              Mean PHRED quality for non-mismatch bases of first base in Sanger sequencing. Default: 40.0.

       --sanger-quality-match-end-mean DOUBLE
              Mean PHRED quality for non-mismatch bases of last base in Sanger sequencing. Default: 39.5.

       --sanger-quality-match-start-stddev DOUBLE
              Standard deviation of PHRED quality for non-mismatch bases of first base in Sanger sequencing. Default: 0.1.

       --sanger-quality-match-end-stddev DOUBLE
              Standard deviation of PHRED quality for non-mismatch bases of last base in Sanger sequencing. Default: 2.

       --sanger-quality-error-start-mean DOUBLE
              Mean PHRED quality for erroneous bases of first base in Sanger sequencing. Default: 30.

       --sanger-quality-error-end-mean DOUBLE
              Mean PHRED quality for erroneous bases of last base in Sanger sequencing. Default: 20.

       --sanger-quality-error-start-stddev DOUBLE
              Standard deviation of PHRED quality for erroneous bases of first base in Sanger sequencing. Default: 2.

       --sanger-quality-error-end-stddev DOUBLE
              Standard deviation of PHRED quality for erroneous bases of last base in Sanger sequencing. Default: 5.

   454 Sequencing Options:
       --454-read-length-model STRING
              The model to use for sampling the 454 read length. One of normal and uniform. Default: normal.

       --454-read-length-min INTEGER
              The minimal read length when the read length is sampled uniformly. In range [0..inf]. Default: 10.

       --454-read-length-max INTEGER
              The maximal read length when the read length is sampled uniformly.  In  range  [0..inf].  Default:
              600.

       --454-read-length-mean DOUBLE
              The  mean read length when the read length is sampled with normal distribution. In range [0..inf].
              Default: 400.

       --454-read-length-stddev DOUBLE
              The read length standard deviation when the read length is sampled with  normal  distribution.  In
              range [0..inf]. Default: 40.
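
The two read length models can be sketched as follows, using the documented 454 defaults. Rounding to an integer and the lower bound of 1 for the normal model are assumptions of this example, and the function name is hypothetical.

```python
import random


def sample_read_length(model, rng, min_len=10, max_len=600, mean=400.0, stddev=40.0):
    """Sample a read length under the 'uniform' or 'normal' model."""
    if model == "uniform":
        # Uniform over [min_len, max_len], both ends included.
        return rng.randint(min_len, max_len)
    if model == "normal":
        # Round to an integer and keep at least one base (assumed clamping).
        return max(1, int(round(rng.gauss(mean, stddev))))
    raise ValueError(f"unknown read length model: {model}")


rng = random.Random(0)
uniform_lengths = [sample_read_length("uniform", rng) for _ in range(5)]
normal_lengths = [sample_read_length("normal", rng) for _ in range(5)]
```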

       --454-no-sqrt-in-std-dev
              For error model, if set then (sigma = k * r) is used, otherwise (sigma = k * sqrt(r)).

       --454-proportionality-factor DOUBLE
              Proportionality  factor for calculating the standard deviation proportional to the read length. In
              range [0..inf]. Default: 0.15.
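
The two sigma formulas differ only in whether the square root is taken. In the small sketch below, k is read as the proportionality factor and r as the length the error model scales with; the man page does not define r precisely, so treat this reading, and the function name, as assumptions.

```python
import math


def signal_stddev(r, k=0.15, no_sqrt=False):
    """Return sigma = k * r if `no_sqrt` is set, else sigma = k * sqrt(r)."""
    if r < 0:
        raise ValueError("r must be non-negative")
    return k * r if no_sqrt else k * math.sqrt(r)


sigma_default = signal_stddev(4)               # k * sqrt(4)
sigma_linear = signal_stddev(4, no_sqrt=True)  # k * 4
```

For r greater than 1 the linear form grows faster, so setting --454-no-sqrt-in-std-dev yields a larger standard deviation there.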

       --454-background-noise-mean DOUBLE
              Mean of lognormal distribution to use for the noise. In range [0..inf]. Default: 0.23.

       --454-background-noise-stddev DOUBLE
              Standard deviation of lognormal distribution to use for the noise.  In  range  [0..inf].  Default:
              0.15.

Read Orientation

       You can use --seq-mate-orientation to set the relative orientation when doing paired-end sequencing.  The
       valid values are listed below.

       FR     Reads are inward-facing, the same as Illumina paired-end reads: R1 --> <-- R2.

       RF     Reads are outward-facing, the same as Illumina mate-pair reads: R1 <-- --> R2.

       FF     Reads are on the same strand: R1 --> --> R2.

       FF2    Reads are on the same strand but the "right" reads are sequenced to the left of the "left"  reads,
              same as 454 paired: R2 --> --> R1.
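
The four orientations can be made concrete by cutting both reads from the ends of a single fragment. The sketch below is one reading of the arrow diagrams above, where a left-pointing arrow is taken to mean the reverse complement; it assumes error-free reads of a fixed length, and the helper names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mate_pair(fragment, read_length, orientation):
    """Return (R1, R2) for a fragment under the given orientation."""
    left = fragment[:read_length]     # read starting at the fragment's left end
    right = fragment[-read_length:]   # read covering the fragment's right end
    if orientation == "FR":
        return left, revcomp(right)   # R1 --> <-- R2
    if orientation == "RF":
        return revcomp(left), right   # R1 <-- --> R2
    if orientation == "FF":
        return left, right            # R1 --> --> R2
    if orientation == "FF2":
        return right, left            # R2 --> --> R1
    raise ValueError(f"unknown orientation: {orientation}")


pairs = {o: mate_pair("AACCGGTTAC", 3, o) for o in ("FR", "RF", "FF", "FF2")}
```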

mason_frag_sequencing 2.0.9 [tarball]                                                   MASON_FRAG_SEQUENCING(1)

Sequencing Simulation

       Simulation  of  base  qualities  is  disabled  when  writing  out  FASTA files.  Simulation of paired-end
       sequencing is enabled when specifying two output files.
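
These two rules, together with --force-single-end, can be summarised in a few lines. The sketch below guesses the output format from the file extension; the list of extensions treated as FASTA is an assumption of this example, and the function names are hypothetical.

```python
# Extensions treated as FASTA in this sketch (an assumption).
FASTA_SUFFIXES = (".fa", ".fasta", ".fna", ".ffn", ".faa", ".frn")
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".bgzf")


def is_fasta(path):
    """Guess whether `path` names a FASTA file, ignoring a compression suffix."""
    name = path.lower()
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name.endswith(FASTA_SUFFIXES)


def plan_run(out, out_right=None, force_single_end=False):
    """Summarise which simulation features a given set of outputs enables."""
    paired = out_right is not None and not force_single_end
    qualities = not is_fasta(out)
    return {"paired": paired, "qualities": qualities}


plan = plan_run("left.fq", "right.fq")
```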

Synopsis

       mason_frag_sequencing [OPTIONS] -i IN.fa -o OUT.{fa,fq} [-or OUT2.{fa,fq}]
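
For scripted runs, the synopsis can be assembled as an argument list. The sketch below only builds and prints the command, using options documented on this page; the file names and the helper name are placeholders, and nothing is executed.

```python
import shlex


def build_command(in_fa, out, out_right=None, seed=0, read_length=100):
    """Assemble a mason_frag_sequencing argument list from documented options."""
    argv = [
        "mason_frag_sequencing",
        "--seed", str(seed),
        "--illumina-read-length", str(read_length),
        "-i", in_fa,
        "-o", out,
    ]
    if out_right is not None:
        # A second output file enables paired-end simulation.
        argv += ["-or", out_right]
    return argv


command = build_command("fragments.fa", "left.fq", "right.fq", seed=42)
print(shlex.join(command))
```

The list can be handed to subprocess.run as is, which avoids shell quoting problems with unusual file names.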

See Also