
QTLtools - A complete tool set for molecular QTL discovery and analysis

Authors

       Olivier Delaneau (olivier.delaneau@gmail.com), Halit Ongen (halitongen@gmail.com)

QTLtools-v1.3                                      06 May 2020                                       QTLtools(1)

Bugs

       o Versions up to and including 1.2 suffer from a bug in reading missing genotypes in VCF/BCF files.
         This bug affects variants that have a DS field in their genotype's FORMAT and a missing genotype (DS
         field is .) in one of the samples, in which case genotypes for all the samples are set to missing,
         effectively removing this variant from the analyses.  Affected modes: cis, correct, gwas, pca, rep,
         trans, rtc-union.

       Please submit bugs to <https://github.com/qtltools/qtltools>

Citations

       Delaneau O., Ongen H., Brown A. A., et al. A complete tool set for molecular QTL discovery and analysis.
       Nat Commun 8, 15452 (2017).  <https://doi.org/10.1038/ncomms15452>

       Ongen H., Brown A. A., Delaneau O., et al. Estimating the causal tissues for complex traits and diseases.
       Nat Genet 49(12), 1676-1683 (2017).  <https://doi.org/10.1038/ng.3981>

       Fort A., Panousis N. I., Garieri M., et al. MBV: a method to solve sample mislabeling and detect
       technical bias in large combined genotype and sequencing assay datasets. Bioinformatics 33(12), 1895
       (2017).  <https://doi.org/10.1093/bioinformatics/btx074>

Description

       QTLtools  is  a complete tool set for molecular QTL discovery and analysis that is fast, user and cluster
       friendly.  QTLtools performs multiple key tasks such as  checking  the  quality  of  the  sequence  data,
       checking  that  sequence and genotype data match, quantifying and stratifying individuals using molecular
       phenotypes, discovering proximal or distal molQTLs and integrating them with  functional  annotations  or
       GWAS  data,  and  analyzing  allele  specific expression.  It utilizes HTSlib <http://www.htslib.org/> to
       quickly and efficiently handle common genomics file types like VCF, BCF, BAM, SAM, CRAM, BED, and GTF,
       and the Eigen C++ library <http://eigen.tuxfamily.org/> for fast linear algebra.

Example Files

       exons.50percent.chr22.bed.gz  <http://jungle.unige.ch/QTLtools_examples/exons.50percent.chr22.bed.gz>
       exons.50percent.chr22.bed.gz.tbi   <http://jungle.unige.ch/QTLtools_examples/exons.50percent.chr22.bed.gz.tbi>
       gencode.v19.annotation.chr22.gtf.gz     <http://jungle.unige.ch/QTLtools_examples/gencode.v19.annotation.chr22.gtf.gz>
       gencode.v19.exon.chr22.bed.gz <http://jungle.unige.ch/QTLtools_examples/gencode.v19.exon.chr22.bed.gz>
       genes.50percent.chr22.bed.gz  <http://jungle.unige.ch/QTLtools_examples/genes.50percent.chr22.bed.gz>
       genes.50percent.chr22.bed.gz.tbi   <http://jungle.unige.ch/QTLtools_examples/genes.50percent.chr22.bed.gz.tbi>
       genes.covariates.pc50.txt.gz  <http://jungle.unige.ch/QTLtools_examples/genes.covariates.pc50.txt.gz>
       genes.simulated.chr22.bed.gz  <http://jungle.unige.ch/QTLtools_examples/genes.simulated.chr22.bed.gz>
       genes.simulated.chr22.bed.gz.tbi   <http://jungle.unige.ch/QTLtools_examples/genes.simulated.chr22.bed.gz.tbi>
       genotypes.chr22.vcf.gz   <http://jungle.unige.ch/QTLtools_examples/genotypes.chr22.vcf.gz>
       genotypes.chr22.vcf.gz.tbi    <http://jungle.unige.ch/QTLtools_examples/genotypes.chr22.vcf.gz.tbi>
       GWAS.b37.txt   <http://jungle.unige.ch/QTLtools_examples/GWAS.b37.txt>
       HG00381.chr22.bam   <http://jungle.unige.ch/QTLtools_examples/HG00381.chr22.bam>
       HG00381.chr22.bam.bai    <http://jungle.unige.ch/QTLtools_examples/HG00381.chr22.bam.bai>
       hotspots_b37_hg19.bed    <http://jungle.unige.ch/QTLtools_examples/hotspots_b37_hg19.bed>
       results.genes.full.txt.gz     <http://jungle.unige.ch/QTLtools_examples/results.genes.full.txt.gz>
       TFs.encode.bed.gz   <http://jungle.unige.ch/QTLtools_examples/TFs.encode.bed.gz>

File Formats

       .bcf|.vcf|.vcf.gz
              These files are used for genotype data.  The official VCF specification is described at
              <https://samtools.github.io/hts-specs/VCFv4.2.pdf>.  The VCF/BCF files used with QTLtools must
              satisfy this spec's requirements.  BCF files must be indexed with bcftools index in.bcf
              <http://samtools.github.io/bcftools/bcftools.html>.  VCF files should be compressed by bgzip
              <http://www.htslib.org/doc/bgzip.html> and indexed with tabix -p vcf in.vcf.gz
              <http://www.htslib.org/doc/tabix.html>.

       .bed|.bed.gz
              These files are used for phenotype data, and in certain modes they can also be used with the --vcf
              option, which can be used to correlate two molecular phenotypes.  The format used for QTLtools  is
              a  custom  UCSC  BED  format  <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>,  which  has  6
              annotation columns followed by sample columns.  The header line must exist and must begin with a
              #, and columns must be tab separated.  THIS IS A DIFFERENT FILE FORMAT THAN THE ONE USED FOR
              FASTQTL, THUS FASTQTL BED FILES ARE INCOMPATIBLE WITH QTLTOOLS.  Phenotype BED files must be
              compressed by bgzip <http://www.htslib.org/doc/bgzip.html> and indexed with
              tabix -p bed in.bed.gz <http://www.htslib.org/doc/tabix.html>.  Missing values must be coded
              as NA.  Following is an example BED file:

              #chr  start  end    pid    gid    strand  sample1  sample2
              1     9999   10000  exon1  gene1  +       15       234
              1     9999   10000  exon2  gene1  +       11       134
              1     19999  20000  exon1  gene2  -       154      284
              1     19999  20000  exon2  gene2  -       112      301

              BED file's annotation columns' descriptions:
              1   Phenotype chromosome [string]
              2   Start position of the phenotype [integer, 0-based]
              3   End position of the phenotype [integer, 1-based]
              4   Phenotype ID [string]
              5   Phenotype group ID or any type of info about the phenotype [string]
              6   Phenotype strand [+/-]
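
              The layout described above can be read with a few lines of Python.  This is only a
              minimal sketch of the file format, not QTLtools' own parser; the names Phenotype and
              parse_bed are hypothetical, and it assumes an uncompressed, tab-separated input.

```python
from typing import NamedTuple


class Phenotype(NamedTuple):
    chrom: str    # column 1: phenotype chromosome
    start: int    # column 2: start position, 0-based
    end: int      # column 3: end position, 1-based
    pid: str      # column 4: phenotype ID
    gid: str      # column 5: group ID or other info
    strand: str   # column 6: "+" or "-"
    values: dict  # sample name -> float, or None where the file has NA


def parse_bed(lines):
    """Parse the header and data lines of a QTLtools-style phenotype BED."""
    lines = iter(lines)
    header = next(lines).rstrip("\n")
    if not header.startswith("#"):
        raise ValueError("header line must begin with '#'")
    samples = header.split("\t")[6:]  # sample columns follow 6 annotation columns
    records = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6 + len(samples):
            raise ValueError("unexpected number of columns: " + line)
        values = {
            sample: (None if raw == "NA" else float(raw))
            for sample, raw in zip(samples, fields[6:])
        }
        records.append(
            Phenotype(fields[0], int(fields[1]), int(fields[2]),
                      fields[3], fields[4], fields[5], values)
        )
    return samples, records
```

              Missing values (NA) are mapped to None here so that they cannot be confused with a
              measured zero.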

       .bam|.sam|.cram
              These  files  are  used  for  sequence  data.   The  official  SAM  specification  is described at
              <https://samtools.github.io/hts-specs/SAMv1.pdf>.  The SAM/BAM/CRAM files used with QTLtools  must
              satisfy this spec's requirements.  SAM/BAM/CRAM files must be indexed with samtools index in.bam
              <http://www.htslib.org/doc/samtools.html>.

       .gtf   These  files  are  used  for  gene  annotation.   The   file   specification   is   described   at
              <https://www.ensembl.org/info/website/upload/gff.html>.   The GTF files used must comply with this
              spec, and should have the gene_id, transcript_id, gene_name, gene_type, and transcript_type
              attributes.  We recommend using gene annotations from GENCODE <https://www.gencodegenes.org/>.

       covariate files
              The covariate file contains the covariate data in simple text format.  The missing values
              should be encoded as NA.  Both quantitative and qualitative covariates are supported.
              Quantitative covariates are assumed when only numeric values are provided.  Qualitative
              covariates are assumed when only non-numeric values are provided.  In practice, qualitative
              covariates with F factors are converted into F-1 binary covariates.  Following is an example
              of a covariate file:

              id   sample1  sample2  sample3
              PC1  -0.02    0.14     0.16
              PC2  0.01     0.11     0.10
              PC3  0.03     0.05     0.07
              COV  A        B        C
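
              The conversion of a qualitative covariate with F factors into F-1 binary covariates can
              be illustrated as follows.  This is a sketch of the general technique, not QTLtools'
              actual code: the choice of the alphabetically first factor as the baseline, the row
              naming scheme, and the function name encode_covariate are all assumptions.

```python
def is_number(text):
    """Return True if the string parses as a floating point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def encode_covariate(name, values):
    """Encode one covariate row; returns row name -> list of floats (None for NA)."""
    observed = [v for v in values if v != "NA"]
    numeric = [is_number(v) for v in observed]
    if all(numeric):
        # Quantitative: only numeric values were provided.
        return {name: [None if v == "NA" else float(v) for v in values]}
    if any(numeric):
        raise ValueError("covariate %s mixes numeric and non-numeric values" % name)
    # Qualitative: F distinct factors become F-1 indicator rows.
    # Assumption: the first factor in sorted order is the implicit baseline.
    levels = sorted(set(observed))
    encoded = {}
    for level in levels[1:]:
        encoded[name + "_" + level] = [
            None if v == "NA" else float(v == level) for v in values
        ]
    return encoded
```

              With the COV row of the example file, the three factors A, B, and C yield two indicator
              rows, one for B and one for C, while A is represented by zeros in both.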

       include/exclude files
              The various --{include,exclude}-{sites,samples,phenotypes,covariates}  options  require  a  simple
              text  file  which  lists  the  IDs of the desired type, one ID per line.  The include options will
              result in running the analyses only in this subset of IDs, whereas  exclude  options  will  remove
              these  IDs  from  the  analyses.  The IDs for --{include,exclude}-sites refer to the 3rd column in
              VCF/BCF  files,  --{include,exclude}-covariates  refer  to  the   1st   column   in   COV   files,
              --{include,exclude}-phenotypes refer to the 4th column in BED files and when --grp-best option is
              used to the 5th column.  The --include-positions and --exclude-positions options  require  a  text
              file  which lists the chromosomes and positions (separated by a space) of genotypes to be excluded
              or included. One position per line.
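
              The effect of an include list and an exclude list on a set of IDs can be sketched as
              below.  The function names are hypothetical, and the behaviour when both lists are
              given at once (include first, then exclude) is an assumption made for illustration;
              the text above does not specify it.

```python
def read_id_list(lines):
    """Read one ID per line, ignoring blank lines and surrounding whitespace."""
    return {line.strip() for line in lines if line.strip()}


def apply_filters(ids, include=None, exclude=None):
    """Keep IDs that are in `include` (if given) and not in `exclude` (if given)."""
    kept = []
    for identifier in ids:
        if include is not None and identifier not in include:
            continue  # include list restricts the analysis to this subset
        if exclude is not None and identifier in exclude:
            continue  # exclude list removes these IDs
        kept.append(identifier)
    return kept
```

              For phenotypes, the IDs passed in would be taken from the 4th column of the BED file,
              as described above.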

Global Options

       QTLtools can read gzip, bgzip, and bzip2 files, and can output gzip and bzip2 files.  This  is  dependent
       on the input and output files' extension.  E.g., --out output.txt.gz will write a gzipped file.
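
       The extension-based convention can be mimicked in Python as follows.  This sketch only
       illustrates the idea and says nothing about how QTLtools implements it; the helper name
       open_text and the .bz2 extension for bzip2 files are assumptions.  Python's gzip module also
       reads bgzip-compressed files, since they are valid multi-member gzip streams.

```python
import bz2
import gzip


def open_text(path, mode="r"):
    """Open a text file, choosing the codec from the file extension."""
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")  # gzip and bgzip input, gzip output
    if path.endswith(".bz2"):
        return bz2.open(path, mode + "t")   # assumed extension for bzip2 files
    return open(path, mode)                 # plain text otherwise
```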

       The following are common options shared across the modes, although some of them do not apply to
       certain modes.

       --help Produces a description of options for a given mode.

       --seed integer
              Random seed for analyses that utilize randomness.  Useful for generating replicable results.
              Default=15112011.

       --log file
              Dump screen output to this file.

       --silent
              Disable screen output.

       --exclude-samples file
              List of samples to exclude.  One sample name per line.

       --include-samples file
              List of samples to include.  One sample name per line.

       --exclude-sites file
              List of variants to exclude.  One variant ID per line.

       --include-sites file
              List of variants to include.  One variant ID per line.

       --exclude-positions file
              List of positions to exclude from genotypes.  One chr position per line (separated by a space).

       --include-positions file
              List of positions to include from genotypes.  One chr position per line (separated by a space).

       --exclude-phenotypes file
              List of phenotypes to exclude.  One phenotype ID per line.

       --include-phenotypes file
              List of phenotypes to include.  One phenotype ID per line.

       --exclude-covariates file
              List of covariates to exclude.  One covariate name per line.

       --include-covariates file
              List of covariates to include.  One covariate name per line.

Important Notes

       o BED files' start position is 0-based, whereas the end position is 1-based.  Positions in all other
         files  used  in QTLtools are 1-based.  All positions provided as option arguments and filters, even the
         ones referring to BED files, must be 1-based.  1-based means the first base of  the  sequence  has  the
         position 1, whereas in 0-based the first position is 0.

       o Make sure the chromosome names are the same across all files.  If some files have e.g. chr1 and another
         has 1 as a chromosome name then these will be considered different chromosomes.

       o BED files used for FastQTL <http://fastqtl.sourceforge.net/> are not directly compatible with QTLtools.
         To  convert  a  FastQTL BED file to the format used in QTLtools you need to add 2 columns after the 4th
         column.

       o The quan mode in version 1.2 and above is not compatible with the quantifications generated by the
         previous versions.  This is due to bug fixes and slight adjustments to the way we quantify.  Do not
         mix quantifications generated by earlier versions of QTLtools with quantifications from version 1.2
         and above, as this will create a bias in your dataset.

       o Make sure you index all your genotype, phenotype, and sequence files.

       o Use BCF and BAM files for the best performance.
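
       Two of the notes above lend themselves to a short sketch: the coordinate conventions and the
       FastQTL BED conversion.  The function names are hypothetical.  The values written into the two
       new columns (the phenotype ID reused as group ID, and "+" as strand) are placeholders chosen for
       illustration, not something QTLtools prescribes; replace them with the real group and strand
       where they are known.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based inclusive interval to BED (0-based start, 1-based end)."""
    return start - 1, end


def bed_to_one_based(start, end):
    """Convert BED coordinates back to a 1-based inclusive interval."""
    return start + 1, end


def fastqtl_to_qtltools(line, gid=None, strand="+"):
    """Insert the gid and strand columns after the 4th column of a FastQTL BED line."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#"):
        extra = ["gid", "strand"]  # header names as in the QTLtools example
    else:
        # Assumption: fall back to the phenotype ID when no group ID is supplied.
        extra = [gid if gid is not None else fields[3], strand]
    return "\t".join(fields[:4] + extra + fields[4:])
```

       For instance, a phenotype covering only base 10000 in 1-based coordinates is stored in a BED file
       with start 9999 and end 10000.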

Modes

       bamstat  QTLtools bamstat --bam [in.sam|in.bam|in.cram] --bed annotation.bed.gz --out output.txt [OPTIONS]

                    Calculate basic QC metrics for BAM/SAM.

       mbv  QTLtools mbv --bam [in.sam|in.bam|in.cram] --vcf [in.vcf|in.vcf.gz|in.bcf] --out output.txt [OPTIONS]

                    Match BAM to VCF.

       pca  QTLtools pca --vcf [in.vcf|in.vcf.gz|in.bcf] | --bed in.bed.gz --out output.txt [OPTIONS]

                    Calculate principal components for a BED/VCF/BCF file.

       correct  QTLtools correct --vcf [in.vcf|in.vcf.gz|in.bcf] | --bed in.bed.gz --cov covariates.txt | --normal --out output.txt [OPTIONS]

                    Covariate correction of a BED or a VCF file.

       cis  QTLtools cis --vcf [in.vcf|in.vcf.gz|in.bcf|in.bed.gz] --bed quantifications.bed.gz [--nominal float | --permute integer | --mapping in.txt] --out output.txt [OPTIONS]

                    cis QTL analysis.

       trans  QTLtools trans --vcf [in.vcf|in.vcf.gz|in.bcf|in.bed.gz] --bed quantifications.bed.gz [--nominal | --permute | --sample integer | --adjust in.txt] --out output.txt [OPTIONS]

                    trans QTL analysis.

       fenrich  QTLtools fenrich --qtl significanty_genes.bed --tss gene_tss.bed --bed TFs.encode.bed.gz --out output.txt [OPTIONS]

                    Functional enrichment for QTLs.

       fdensity  QTLtools fdensity --qtl significanty_genes.bed --bed TFs.encode.bed.gz --out output.txt [OPTIONS]

                    Functional density around QTLs.

       genrich  QTLtools genrich --qtl significanty_genes.bed --tss gene_tss.bed --vcf 1000kg.vcf --gwas gwas_hits.bed --out output.txt [OPTIONS]

                    GWAS enrichment for QTLs.  This mode is deprecated and not supported; use rtc instead.

       rtc  QTLtools rtc --vcf [in.vcf|in.vcf.gz|in.bcf|in.bed.gz] --bed quantifications.bed.gz --hotspots hotspots_b37_hg19.bed [--gwas-cis | --gwas-trans | --mergeQTL-cis | --mergeQTL-trans] variants_external.txt qtls_in_this_dataset.txt --out output.txt [OPTIONS]

                    Regulatory Trait Concordance score analysis to test if two colocalizing variants are due to
                    the same functional effect.

       rtc-union  QTLtools rtc-union --vcf [in.vcf|in.vcf.gz|in.bcf|in.bed.gz] ... --bed quantifications.bed.gz ... --hotspots hotspots_b37_hg19.bed --results qtl_results_files.txt ... [OPTIONS]

                    Find the union of QTLs from independent datasets.  If there was a QTL in a given
                    recombination interval in one dataset, then find the best QTL (may or may not be genome-wide
                    significant) in the same recombination interval in all other datasets.

       extract  QTLtools extract [--vcf --bed --cov] relevant_file --out output_prefix [OPTIONS]

                    Data extraction mode.  Extract all the data from the provided files into one flat file.

       quan  QTLtools quan --bam [in.sam|in.bam|in.cram] --gtf gene_annotation.gtf --out-prefix output [OPTIONS]

                    Quantify gene and exon expression from RNAseq.

       ase  QTLtools ase --bam [in.sam|in.bam|in.cram] --vcf [in.vcf|in.vcf.gz|in.bcf] --ind sample_name_in_vcf --mapq integer --out output.txt [OPTIONS]

                    Measure allele specific expression from RNAseq at transcribed heterozygous SNPs.

       rep  QTLtools rep --bed quantifications.bed.gz --vcf [in.vcf|in.vcf.gz|in.bcf] --qtl qtls_external.txt --out output.txt [OPTIONS]

                    Replicate QTL associations in an independent dataset.

       gwas  QTLtools gwas --vcf [in.vcf|in.vcf.gz|in.bcf|in.bed.gz] --bed quantifications.bed.gz --out output.txt [OPTIONS]

                    GWAS tests. Correlate all genotypes with all phenotypes.
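
       As an illustration of how a mode and the global options combine on the command line, the sketch
       below assembles an argument list for a cis permutation run.  It only builds the command and does
       not execute it.  The function name is hypothetical, the executable is assumed to be available as
       QTLtools, and the permutation count of 1000 is an arbitrary illustrative value; the file names
       are taken from the example files listed earlier.

```python
import shlex


def cis_permutation_command(vcf, bed, out, permutations=1000, seed=None):
    """Build the argument list for a cis permutation pass (not executed here)."""
    argv = [
        "QTLtools", "cis",
        "--vcf", vcf,
        "--bed", bed,
        "--permute", str(permutations),
        "--out", out,
    ]
    if seed is not None:
        argv += ["--seed", str(seed)]  # global option for replicable results
    return argv


if __name__ == "__main__":
    command = cis_permutation_command(
        "genotypes.chr22.vcf.gz",
        "genes.50percent.chr22.bed.gz",
        "permutations.txt.gz",
        seed=123456,
    )
    print(shlex.join(command))  # shell-quoted form, ready to paste into a terminal
```

       Passing the list to a process launcher, rather than a single string to a shell, avoids quoting
       problems with unusual file names.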

Name

       QTLtools - A complete tool set for molecular QTL discovery and analysis

See Also

QTLtools-bamstat(1),  QTLtools-mbv(1),  QTLtools-pca(1),  QTLtools-correct(1), QTLtools-cis(1), QTLtools-trans(1), QTLtools-fenrich(1), QTLtools-fdensity(1),  QTLtools-rtc(1),  QTLtools-rtc-union(1),  QTLtools-extract(1), QTLtools-quan(1), QTLtools-ase(1), QTLtools-rep(1), QTLtools-gwas(1)

       QTLtools website: <https://qtltools.github.io/qtltools>

Synopsis

QTLtools [MODE] [OPTIONS]
