QTLtools rtc-union - Find the union of QTLs from independent datasets
Bugs
Versions up to and including 1.2 suffer from a bug in reading missing genotypes in VCF/BCF files. This
bug affects variants that have a DS field in their genotype's FORMAT and a missing genotype (DS field is
.) in one of the samples, in which case genotypes for all the samples are set to missing, effectively
removing this variant from the analyses.
Please submit bugs to <https://github.com/qtltools/qtltools>
Citation
Ongen H, Brown AA, Delaneau O, et al. Estimating the causal tissues for complex traits and diseases. Nat Genet. 2017;49(12):1676-1683. doi:10.1038/ng.3981 <https://doi.org/10.1038/ng.3981>
Description
This mode finds the best molQTL (which may or may not be genome-wide significant) in each region flanked
by recombination hotspots (a coldspot), if there is a significant molQTL in the same coldspot in at least
one of the datasets. First we map all the significant molQTLs in all of the datasets to coldspots.
Subsequently, if certain datasets do not have a significant molQTL in a given coldspot for a given
phenotype, we take the most significant variant associated with that phenotype in that coldspot for each
of those datasets.
Example
o Find the union of 3 datasets, correcting for technical covariates and rank normal transforming the
phenotypes, with 20 jobs on a compute cluster (qsub needs to be changed to the job submission system
used [bsub, psub, etc.]):
for j in $(seq 1 20); do
echo "QTLtools rtc-union --bed dataset1.bed.gz dataset2.bed.gz dataset3.bed.gz --vcf dataset1.bcf \
dataset2.bcf dataset3.bcf --cov dataset1.covariates.txt dataset2.covariates.txt \
dataset3.covariates.txt --results dataset1.txt dataset2.txt dataset3.txt --hotspots \
hotspots_b37_hg19.bed --normal --conditional --chunk $j 20 --out-suffix .chunk.$j.20.txt" | qsub
done
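Because the --vcf, --bed, --cov, and --results files must all be listed in the same dataset order, one way to avoid mismatches is to build every file list from a single list of dataset names. The following is only a sketch: the file naming pattern is taken from the example above, and the variable names (datasets, bed, vcf, cov, res, cmd) are illustrative, not part of QTLtools.

```shell
# One ordered list of dataset names drives every file list,
# so all four options share the same dataset order.
datasets="dataset1 dataset2 dataset3"

bed="" vcf="" cov="" res=""
for d in $datasets; do
    bed="$bed $d.bed.gz"          # phenotype quantifications
    vcf="$vcf $d.bcf"             # genotypes
    cov="$cov $d.covariates.txt"  # covariates
    res="$res $d.txt"             # significant QTL results
done

# Assemble the command line; it is only printed here, not executed.
cmd="QTLtools rtc-union --bed$bed --vcf$vcf --cov$cov --results$res --hotspots hotspots_b37_hg19.bed --normal --conditional"
echo "$cmd"
```

The printed command can then be extended with --chunk and --out-suffix and piped to the job submission system as in the example.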
Name
QTLtools rtc-union - Find the union of QTLs from independent datasets
Options
--vcf [in.vcf|in.bcf|in.vcf.gz|in.bed.gz] ...
Genotypes in VCF/BCF format, or another molecular phenotype in BED format. If there is a DS field
in the genotype FORMAT of a variant (dosage of the genotype calculated from genotype
probabilities, e.g. after imputation), then this is used as the genotype. If there is only the GT
field in the genotype FORMAT then this is used and it is converted to a dosage. If a single file
is provided then all datasets are assumed to have the same genotypes, and the samples of all
datasets are included in this file. If multiple files are provided, one for each dataset, then all
--vcf, --bed, --cov, and --results files MUST be in the same order. E.g., if the first vcf file is from
dataset1, then the first bed, cov, and results files must also be from dataset1. REQUIRED.
--bed quantifications.bed.gz ...
Molecular phenotype quantifications in BED format for each of the datasets. All --vcf, --bed,
--cov, and --results files MUST be in the same order. E.g., if the first vcf file is from dataset1,
then the first bed, cov, and results files must also be from dataset1. REQUIRED.
--results significant_qtls.txt ...
Results file with the QTLs in each of the datasets. All --vcf, --bed, --cov, and --results files
MUST be in the same order. E.g., if the first vcf file is from dataset1, then the first bed, cov,
and results files must also be from dataset1. REQUIRED.
--hotspots recombination_hotspots.bed
Recombination hotspots in BED format. REQUIRED.
--out-suffix suffix
If provided, output files will be suffixed with this.
--cov covariates.txt
Covariates to correct the phenotype data with for each of the datasets. All --vcf, --bed, --cov,
and --results files MUST be in the same order. E.g., if the first vcf file is from dataset1, then
the first bed, cov, and results files must also be from dataset1.
--force
If the output file exists, overwrite it.
--normal
Rank normal transform the phenotype data so that each phenotype is normally distributed.
RECOMMENDED.
--conditional
molQTLs contain independent signals so execute the conditional analysis.
--window integer
Size of the cis window flanking each phenotype's start position. DEFAULT=1000000.
RECOMMENDED=1000000.
--pheno-col integer
1-based phenotype id column number. DEFAULT=1
--geno-col integer
1-based genotype id column number. DEFAULT=8
--rank-col integer
1-based conditional analysis rank column number. Only relevant if --conditional is in effect.
DEFAULT=12
--best-col integer
1-based phenotype column number. Only relevant if --conditional is in effect. DEFAULT=21
--chunk integer1 integer2
For parallelization. Divide the data into integer2 number of chunks and process chunk number
integer1. Chunk 0 will print a header. Mutually exclusive with --region. The number of chunks must
be at least the number of chromosomes in the --bed file.
--region chr:start-end
Genomic region to be processed. E.g. chr4:12334456-16334456, or chr5. Mutually exclusive with
--chunk.
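Since the number of chunks given to --chunk has to be at least the number of chromosomes in the --bed file, it can help to count the chromosomes before submitting jobs. The sketch below assumes the BED file is gzip or bgzip compressed, has the chromosome in column 1, and marks header lines with a leading #; count_chromosomes is a hypothetical helper, and the demo file contents are made up.

```shell
# Hypothetical helper: count the distinct chromosomes in a compressed BED file.
count_chromosomes() {
    # Skip '#' header lines and empty lines, then count unique values of column 1.
    gzip -dc "$1" | awk '!/^#/ && NF { seen[$1] = 1 } END { n = 0; for (c in seen) n++; print n }'
}

# Demo on a small made-up phenotype file with two chromosomes.
demo_dir=$(mktemp -d)
printf '#chr\tstart\tend\tpid\tgid\tstrand\tS1\n1\t100\t101\tP1\tG1\t+\t0.5\n1\t500\t501\tP2\tG2\t-\t1.2\n2\t300\t301\tP3\tG3\t+\t0.1\n' \
    | gzip -c > "$demo_dir/demo.bed.gz"

n_chr=$(count_chromosomes "$demo_dir/demo.bed.gz")
echo "Use at least $n_chr chunks"
```

With several datasets, the same count could be taken for each --bed file and the largest value used as the lower bound.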
Output File
output file
Space separated output file with the following columns.
1 Column showing that this is a rtc-union result. Always __UNION__
2 The phenotype ID
3 The genotype ID. This can say __UNION_FILLER_MAX_INDEP__, __UNION_FILLER_MISS_GENO__, or
__UNION_FILLER_MISS_PHENO__ which are fillers for missing cases in one of the datasets.
4 The rank of the best variant in this coldspot. If this was discovered in the rtc-union run then
this would be -1, and if there was already a significant variant in this coldspot then a different
value.
5 Dummy field indicating that this is the best hit per rank
6 The p-value of the association. Will be 0 if this was already significant in the dataset
7 The coldspot ID
8 The coldspot region
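As an illustration of the column layout, the following sketch pulls out the variants that were added by the rtc-union run itself (rank -1 in column 4) and skips the filler genotype IDs. The rows are made up for the demo, including the values shown on the filler row, and the variable names union_demo and new_hits are illustrative.

```shell
# Made-up rows following the eight-column, space separated layout.
union_demo=$(mktemp)
cat > "$union_demo" <<'EOF'
__UNION__ ENSG01 rs111 0 1 0 17 1:1000-5000
__UNION__ ENSG01 rs222 -1 1 0.0031 17 1:1000-5000
__UNION__ ENSG02 __UNION_FILLER_MISS_GENO__ -1 1 1 42 2:7000-9000
EOF

# Keep rows found by the rtc-union run (column 4 is -1), drop filler IDs,
# and print phenotype ID, genotype ID, and p-value.
new_hits=$(awk '$1 == "__UNION__" && $4 == -1 && $3 !~ /^__UNION_FILLER_/ { print $2, $3, $6 }' "$union_demo")
echo "$new_hits"
```

Testing column 1 for __UNION__ also skips the header line that chunk 0 prints when chunk outputs are concatenated.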
See Also
QTLtools(1)
QTLtools website: <https://qtltools.github.io/qtltools>
Synopsis
QTLtools rtc-union --vcf [in.vcf|in.vcf.gz|in.bcf|in.bed.gz] ... --bed quantifications.bed.gz ...
--hotspots hotspots_b37_hg19.bed --results qtl_results_files.txt ... [OPTIONS]