seer - sequence element enrichment analysis
Description
Sequence Element Enrichment Analysis
The .pheno file is tab separated with three columns: two with the sample name and one with the phenotype.
A phenotype column containing only 0 and 1 is treated as binary; any other value causes the phenotype to be
treated as quantitative. Samples with a missing phenotype value should therefore be excluded from this file.
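As a rough illustration of the format described above, the following sketch builds a .pheno file from a two-column table. The input file phenotypes.tsv, its contents, and the "NA" placeholder for a missing value are assumptions made for this example, as is repeating the same sample name in both name columns.

```shell
# Hypothetical input table: sample name and phenotype, tab separated.
# "NA" (an assumed placeholder) marks a missing phenotype.
printf 'sample1\t1\nsample2\t0\nsample3\tNA\nsample4\t1\n' > phenotypes.tsv

# Write the sample name twice, then the phenotype, and skip any sample
# whose phenotype is missing.
awk -F'\t' 'BEGIN { OFS = "\t" }
    $2 != "" && $2 != "NA" { print $1, $1, $2 }' phenotypes.tsv > metadata.pheno

# Report how the phenotype column would be read under the rule above:
# only 0 and 1 means binary, anything else means quantitative.
awk -F'\t' '$3 != "0" && $3 != "1" { other = 1 }
    END { print (other ? "quantitative" : "binary") }' metadata.pheno
```

With the sample values shown, sample3 is left out of metadata.pheno and the last command prints binary.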
Examples
Basic usage:
seer -k dsm_input.txt.gz --pheno metadata.pheno > significant_kmers.txt
To use the kmds output, increase execution speed and give the most complete output:
seer -k filtered.gz --pheno metadata.pheno --struct filtered.dsm --threads 4 --print_samples
Name
seer - sequence element enrichment analysis
Options
Required options:
-k [ --kmers ] arg
dsm kmer output file
-p [ --pheno ] arg
.pheno metadata
Covariate options:
--struct arg
mds values from kmds
--covar_file arg
file containing covariates
--covar_list arg
list of covariate columns to use. Format is 1,2q,3 (use q for quantitative)
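To make the --covar_list format concrete, the sketch below splits a list such as 1,2q,3 and reports which columns carry the q suffix. The function describe_covar_list is a hypothetical helper written for this illustration and is not part of seer; how seer itself handles columns without the suffix is not stated here, so they are only reported as not marked quantitative.

```shell
# describe_covar_list is a hypothetical helper, not part of seer.
describe_covar_list() {
    # Split the comma-separated list into one column spec per line.
    printf '%s\n' "$1" | tr ',' '\n' | while read -r spec; do
        case "$spec" in
            # A trailing "q" marks the column as quantitative.
            *q) echo "column ${spec%q}: quantitative" ;;
            *)  echo "column ${spec}: not marked quantitative" ;;
        esac
    done
}

describe_covar_list "1,2q,3"
```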
Performance options:
--threads arg (=1)
number of threads. Suggested: 4
Filtering options:
--no_filtering
turn off all filtering and perform tests on all input kmers
--max_length arg (=100)
maximum kmer length
--maf arg (=0.01)
minimum kmer frequency
--min_words arg
minimum kmer occurrences. Overrides --maf
--chisq arg (=10e-5)
p-value threshold for initial chi squared test. Set to 1 to show all
--pval arg (=10e-8)
p-value threshold for final logistic test. Set to 1 to show all
Other options:
--print_samples
print the lists of samples in which significant kmers were found
--version
prints version and exits
-h [ --help ]
full help message
