

Author

       James Bonfield, Steven Leonard - Wellcome Trust Sanger Institute

                                                   December 10                                      srf2fastq(1)

Description

       srf2fastq extracts sequences and qualities from one or more SRF archives and writes them in Sanger fastq
       format to stdout.
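       For reference, each fastq record is four lines: an '@' header carrying the read name, the bases, a '+'
       separator line, and one quality character per base. The function below is a minimal sketch of writing one
       such record. The name fastq_record is hypothetical, and the bare '+' separator is an assumption; this is
       not necessarily byte-for-byte what srf2fastq emits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one four-line fastq record.
# $1 = read name, $2 = bases, $3 = Sanger-encoded quality string.
fastq_record() {
    local name=$1 seq=$2 qual=$3
    # fastq requires exactly one quality character per base.
    if (( ${#seq} != ${#qual} )); then
        echo "fastq_record: sequence and quality lengths differ" >&2
        return 1
    fi
    # Header, bases, bare '+' separator (assumed), qualities.
    printf '@%s\n%s\n+\n%s\n' "$name" "$seq" "$qual"
}
```

       For example, fastq_record read1 ACGT 'II5!' prints a four-line record whose header is @read1.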

       Note that Illumina also have a fastq format (used in the GERALD directories), which differs slightly in
       its use of log-odds scores for the quality values. The format described here uses the traditional
       Phred-style quality encoding.
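       To make the difference concrete: Sanger fastq stores a Phred score Q as the ASCII character with code
       Q+33, whereas log-odds (Solexa-style) scores Qs relate to Phred scores by the standard mapping
       Q = 10*log10(10^(Qs/10) + 1). The sketch below shows both directions of the Sanger encoding and that
       mapping. The function names are hypothetical, and the sketch starts from integer scores, so it assumes
       nothing about the ASCII offset used in Illumina's own files.

```shell
#!/usr/bin/env bash
# Illustrative sketch of Sanger fastq quality encoding (Phred Q stored as ASCII Q+33).

# Encode one Phred score (0..93) as its Sanger fastq character.
phred_to_char() {
    printf "\\$(printf '%03o' "$(( $1 + 33 ))")"
}

# Decode a Sanger quality string into space-separated Phred scores.
decode_qual() {
    local qual=$1 code i
    local -a scores=()
    for (( i = 0; i < ${#qual}; i++ )); do
        # A leading quote makes printf print the character's ASCII code.
        printf -v code '%d' "'${qual:i:1}"
        scores+=( $(( code - 33 )) )
    done
    echo "${scores[*]}"
}

# Map a log-odds (Solexa-style) score to the nearest Phred score:
#   Q_phred = 10 * log10(10^(Q_solexa / 10) + 1)
solexa_to_phred() {
    awk -v qs="$1" 'BEGIN {
        q = 10 * log(exp(qs / 10 * log(10)) + 1) / log(10)
        printf "%d\n", int(q + 0.5)
    }'
}
```

       The two scales agree closely at high quality (a log-odds score of 40 maps to Phred 40) but diverge at
       low quality (0 maps to Phred 3), which is why the two fastq variants must not be mixed.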

Examples

       To extract only the good-quality sequences from all SRF files in the current directory, using calibrated
       confidence values (if available):

           srf2fastq -c -C *.srf > runX.fastq

       To extract a paired-end run into two separate files, with sequences named name/1 and name/2:

           srf2fastq -s runX -a -n runX.srf

       To extract a paired-end run as a single file, alternating forward and reverse sequences, with the second
       read reverse complemented:

           srf2fastq -S -r 2 runX.srf > runX.fastq
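       If a single interleaved file produced with -S later needs to be split into one file per read end, a
       sketch like the following may help. It assumes every record occupies exactly four lines (no wrapped
       sequence lines) and that the file strictly alternates between the first and second regions of a
       two-region run. The function name deinterleave and the output filenames are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an interleaved two-region fastq file.
# Assumes unwrapped four-line records alternating read 1, read 2, read 1, ...
# $1 = interleaved input, $2 = output for first reads, $3 = output for second reads.
deinterleave() {
    awk -v out1="$2" -v out2="$3" '
        # Lines 1-4 of every 8-line block belong to the first read,
        # lines 5-8 to the second.
        { if ((NR - 1) % 8 < 4) print > out1; else print > out2 }
    ' "$1"
}
```

       For example: deinterleave runX.fastq runX_1.fastq runX_2.fastq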

Name

srf2fastq - Converts SRF files to Sanger fastq format

Options

       -c     Outputs calibrated confidence values using the ZTR CNF1 chunk type, which holds a single quality
              value per base. Without this option, srf2fastq uses the original Illumina _prb.txt values, which
              consist of four quality values per base and are stored in ZTR CNF4 chunks.

       -C     Masks out sequences tagged as bad quality.

       -s root
              Generates files on disk with filenames starting with root, one file per non-explicit element in
              the SRF/ZTR region (REGN) chunk. Typically this results in two files for paired-end runs. The
              filename suffixes come from the names listed in the SRF region chunks. This option conflicts with
              the -S parameter.

       -S     Splits sequences into regions, but writes each sequence region in turn to stdout instead of
              splitting them into separate files on disk. This option conflicts with the -s parameter.

       -n     When using -s, the filename suffixes are simply numbered (starting with 1) instead of using the
              names listed in the SRF region chunks.

       -a     Appends the region index to the sequence names, i.e. generates "name/1" and "name/2" for a paired read.

       -e     Includes any explicit sequence (ZTR region chunk of type 'E') in the sequence output. The explicit
              sequence is also included in the quality line. Currently this is used by ABI SOLiD to store the
              last base of the primer.

       -r regionlist
              Reverse complements the sequence and reverses the quality values for all regions in regionlist,
              a comma-separated list of integers enumerating the regions, starting from 1. Note that this
              option only works when either -s or -S is specified.
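       The options -S, -a and -r combine as follows: a read whose SRF region chunk defines two regions is cut at
       the region boundary, each part is written as its own record, and any region listed with -r is reverse
       complemented with its quality values reversed. The sketch below illustrates that effect for one
       two-region read, roughly as with -S -a -r 2. It is an assumption-laden illustration, not srf2fastq's code:
       the split position is passed in by hand rather than read from a REGN chunk, the function names are
       hypothetical, and only base-space sequences (A, C, G, T, N) are handled.

```shell
#!/usr/bin/env bash
# Sketch of the effect of "-S -a -r 2" on one two-region read.
# The region boundary is supplied by hand here; srf2fastq takes it from the region chunk.

# Reverse a string using bash substrings (no external tools needed).
reverse_str() {
    local s=$1 out='' i
    for (( i = ${#s} - 1; i >= 0; i-- )); do
        out+=${s:i:1}
    done
    printf '%s\n' "$out"
}

# Reverse complement base-space DNA; N stays N.
revcomp() {
    reverse_str "$1" | tr 'ACGTNacgtn' 'TGCANtgcan'
}

# $1 = name, $2 = concatenated bases, $3 = concatenated qualities,
# $4 = length of region 1 (hypothetical input; normally from the region chunk).
split_pair() {
    local name=$1 seq=$2 qual=$3 len1=$4
    local seq1=${seq:0:len1} qual1=${qual:0:len1}
    # Region 2 is in the -r list: reverse complement bases, reverse qualities.
    local seq2 qual2
    seq2=$(revcomp "${seq:len1}")
    qual2=$(reverse_str "${qual:len1}")
    # -a appends the region index to each name.
    printf '@%s/1\n%s\n+\n%s\n' "$name" "$seq1" "$qual1"
    printf '@%s/2\n%s\n+\n%s\n' "$name" "$seq2" "$qual2"
}
```

       For example, split_pair r AACCGGTT ABCDEFGH 4 prints record r/1 with bases AACC and qualities ABCD, then
       record r/2 with bases AACC (the reverse complement of GGTT) and qualities HGFE.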

Synopsis

srf2fastq  [options] srf_archive ...

See Also