
mirtop - mirtop Documentation

Author

       Lorena Pantano, Thomas Desvignes, Karen Eilbeck, Ioannis Vlachos, Bastian Fromm, Marc K. Halushka,
       Michael Hackenberg, Gianvito Urgese

Documentation For The Code

bam
       Read bam files

       mirtop.bam.bam.read_bam(bam_fn,args,clean=True)
              Read bam file and perform realignment of hits

              Args: bam_fn: a BAM file with alignments to the precursor.

                     precursors: dict with keys being precursor names and values
                            being sequences. Comes from mirtop.mirna.fasta.read_precursor().

                     clean: Use mirtop.filter.clean_hits() to remove lower score hits.

              Returns: reads (dict):
                            keys are read_id and values are mirtop.realign.hits

       mirtop.bam.filter.clean_hits(reads)
              Select only best matches from a list of hits from the same read.

              Args:reads: dictionary as:

                     >>> {'read_id': mirtop.realign.hits, ...}

              Returns:
                 reads: same as input but with best hits only.

       mirtop.bam.filter.tune(seq,precursor,start,cigar)
              The actual function that realigns the sequence to find the nt changes at the 5' and 3' ends
              and the nt variations.

              Args:seq(str): sequence of the read.

                     precursor(str): sequence of the precursor.

                     start(int): start position of sequence on the precursor, +1.

                     cigar(str): similar to SAM CIGAR attribute.

              Returns:
                 list with:
                     subs (list): substitutions

                     add (list): nt added to the end

                     cigar (str): updated cigar

   exporter
       Read GFF files and output isomiRs compatible format

       mirtop.exporter.isomirs.convert(args)
              Main function to convert from GFF3 to isomiRs Bioc Package.

              Reads a GFF file and produces an output file containing expression counts.

              Args: args (namedtuple): arguments parsed from command line with
                            mirtop.libs.parse.add_subparser_counts().

              Returns: file (file): with columns like:
                            UID miRNA Variant Sample1 Sample2 ... Sample N

       Read GFF files and output FASTA format

       mirtop.exporter.fasta.convert(args)
              Main function to convert from GFF3 to FASTA format.

              Args: args: supported options for this sub-command.
                            See mirtop.libs.parse.add_subparser_export().

       mirtop.exporter.vcf.cigar_2_key(cigar,readseq,refseq,pos,var5p,var3p,parent_ini_pos,parent_end_pos,hairpin)
              Args: cigar (str): CIGAR standard of a compressed alignment representation; this CIGAR
                            omits the '1' integer.

                     readseq (str): the read sequence.

                     refseq (str): the reference sequence.

                     pos (str): the start current position.

                     var5p (int): extra nucleotides not in the reference miRNA (5p strand).

                     var3p (int): extra nucleotides not in the reference miRNA (3p strand).

                     parent_ini_pos (int): the start position of the parent miRNA.

                     parent_end_pos (int): the last position of the parent miRNA.

                     hairpin (str): the string of the hairpin for all the miRNA.

              Returns:
                     key_pos (str list): a list with the positions of the variants.

                     key_var (str list): a list with the variant keys found.

                     ref (str): reference base(s).

                     alt (str): altered base(s).

       mirtop.exporter.vcf.convert(args)
              Main function to convert from GFF3 to VCF.

              Args: args: supported options for this sub-command.
                            See mirtop.libs.parse.add_subparser_export().

       mirtop.exporter.vcf.create_vcf(mirgff3,precursor,gtf,vcffile)
              Args: mirgff3 (str): File with mirGFF3 format that will be converted.

                     precursor (str): Fasta format sequences of all miRNA hairpins.

                     gtf (str): Genome coordinates.

                     vcffile: name of the file to be saved.

              Returns:
                     Nothing is returned, instead, a VCF file is generated

   gff
       GFF reader and creator helpers

       mirtop.gff.body.create(reads,database,sample,args,quiet=False)
              Read https://github.com/miRTop/mirtop/issues/9

       mirtop.gff.body.lift_to_genome(line,mapper)
              Function to get a class of type feature from class gff.py and map the precursor
              coordinates to the genomic coordinates.

              Args: line (str): string GFF line.

                     mapper (dict): dict with mirna-precursor-genomic coordinates from the
                            mirna.mapper.read_gtf_to_mirna function.

              Returns: (line): string with GFF line with updated chr, start, end, strand.

       mirtop.gff.body.paste_columns(line,sep='')
              Create GFF/GTF line from read_gff_line

       mirtop.gff.body.read(fn,args)
              Read GTF/GFF file and load into annotate, chrom counts, sample, line

       mirtop.gff.body.read_gff_line(line)
              Read GFF/GTF line and return dictionary with fields
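
To make the idea of "a dictionary with fields" concrete, the following is a minimal sketch and not mirtop's implementation. The name read_gff_line_sketch is hypothetical, the key names follow the nine standard GFF3 columns (mirtop may name them differently), and accepting both "key=value" and "key value" attributes is an assumption.

```python
# Hypothetical sketch: split one tab-separated GFF/GTF line into a dictionary.
GFF_COLUMNS = ["seqid", "source", "type", "start", "end", "score", "strand", "phase"]


def read_gff_line_sketch(line):
    """Return a dict with the eight fixed columns plus an 'attrb' dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    record = dict(zip(GFF_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attrb = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue
        # Assumption: attributes are written as "key=value" or "key value".
        key, sep, value = item.partition("=")
        if not sep:
            key, _, value = item.partition(" ")
        attrb[key.strip()] = value.strip()
    record["attrb"] = attrb
    return record
```

The Variant attribute is kept as a raw string here; parsing it is a separate step.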

       mirtop.gff.body.read_variant(attrb,sep='')
              Read string in variants attribute.

              Args:attrb(str): string in Variant attribute.

              Returns: (gff_dict): dictionary with:

                            >>> {'iso_3p': -3, ...}
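
A minimal sketch of how a Variant string could be turned into the dictionary shown above. It is not mirtop's code: read_variant_sketch is a hypothetical name, and the comma-separated "name:value" layout and the "NA" placeholder are assumptions based on the examples in this page.

```python
def read_variant_sketch(attrb):
    """Parse a string such as 'iso_5p:+1,iso_3p:-3' into {'iso_5p': 1, 'iso_3p': -3}."""
    variants = {}
    for item in attrb.split(","):
        item = item.strip()
        if not item or item == "NA":  # assumption: 'NA' means no variant
            continue
        name, sep, value = item.partition(":")
        if not sep:
            # A variant type without a value is recorded as a flag.
            variants[name] = True
            continue
        try:
            variants[name] = int(value)  # int() accepts '+1' and '-3'
        except ValueError:
            variants[name] = value  # keep non-numeric values as text
    return variants
```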

       mirtop.gff.body.variant_with_nt(line,precursors,matures)
              Return nucleotides changes for each variant type using Variant attribute, precursor sequences  and
              mature position.

       Compare multiple GFF files to a reference

       mirtop.gff.compare.compare(args)
              From a list of GFF files produce comparison with a reference set.

              Args: args (namedtuple): arguments parsed from command line with
                            mirtop.libs.parse.add_subparser_compare(). First file will be considered the
                            reference set.

              Returns:(out_file): comparison of the GFF files with the reference.

       mirtop.gff.compare.read_reference(fn)
              Read GFF into UID:Variant

              Args:fn(str): GFF file.

              Returns: srna (dict): dict with:

                     >>> {'UID': 'iso_snp:-2,...'}

       Helpers to define the header for the GFF file

       mirtop.gff.header.create(samples,database,custom,filter=None)
              Create header for GFF file.

              Args:samples(list): character list with names for samples

                     database(str): name of the database.

                     custom(str): extra lines.

                     filter(list): character list with filter definition.

              Returns:header(str): header string.

       mirtop.gff.header.read_samples(fn)
              Read samples from the header of a GFF file.

              Args:fn(str): GFF file to read.

              Returns:(list): character list with sample names.
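
A minimal sketch of reading sample names from header lines, not mirtop's implementation. It assumes the header carries the names on a comment line of the form "## COLDATA: sample1,sample2"; read_samples_sketch is a hypothetical name and takes an iterable of lines rather than a file name so the example stays self-contained.

```python
def read_samples_sketch(lines):
    """Return the sample names found in the header lines of a GFF file."""
    for line in lines:
        if not line.startswith("#"):
            break  # the header is over once the first data line appears
        if "COLDATA:" in line:  # assumption: names follow this tag
            names = line.split("COLDATA:", 1)[1]
            return [name.strip() for name in names.split(",") if name.strip()]
    return []
```

With a real file, the same function can be called as read_samples_sketch(open(fn)).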

       mirtop.gff.header.read_version(fn)
              Extract mirGFF3 version

       mirtop.gff.merge.merge(dts,samples)
              For dictionary with sample as keys and values as lines merge them into one GFF file.

              Args: dts (dict): dictionary as:

                            >>> {'file': {'mirna': {start: gff_list}}}

                            gff_list has the format as defined in mirtop.gff.body.read().

                     samples (list): character list with sample names.

              Returns: merged_lines (nested dicts): gff_list has the format as defined in mirtop.gff.body.read().

       Produce stats from GFF3 format

       mirtop.gff.stats.stats(args)
              From a list of GFF files produce general isomiRs stats.

              Args: args (namedtuple): arguments parsed from command line with
                            mirtop.libs.parse.add_subparser_stats().

              Returns:(stdout)or(out_file): GFF general stats.

       Update gff3 files to newest version

       mirtop.gff.update.convert(args)
              Update previous GFF3 versions.

              Args: args (namedtuple): arguments parsed from command line with
                            mirtop.libs.parse.add_subparser_update().

              Returns:(out_file): most updated GFF3 file.

       mirtop.gff.update.update_file(gff_file,new_gff_file)
              Update file from file version to current version

       mirtop.gff.validator.check_multiple(args)
              Check GFF3 format.

              Args: args (namedtuple): arguments parsed from command line with
                            mirtop.libs.parse.add_subparser_validator().

              Returns:(std_out): warnings or errors of the files showing issues with the format.

   importer
       Read isomiR GFF files

       mirtop.importer.isomirsea.cigar2variants(cigar,sequence,tag)
              From cigar to Variants in GFF format

       mirtop.importer.isomirsea.header(fn)
              Custom header for isomiR-SEA importer.

              Args:fn(str): file name with isomiR-SEA GFF output

              Returns:(str): isomiR-SEA header string.

       mirtop.importer.isomirsea.read_file(fn,args)
              Read isomiR-SEA file and convert to mirtop GFF format.

              Args:fn(str): file name with isomiR-SEA output information.

                     database(str): database name.

                     args (namedtuple): arguments from command line.
                            See mirtop.libs.parse.add_subparser_gff().

              Returns: reads (nested dicts): gff_list has the format as
                            defined in mirtop.gff.body.read().

       Read prost! files

       mirtop.importer.prost.header()
              Custom header for PROST! importer.

              Returns:(str): PROST! header string.

       mirtop.importer.prost.read_file(fn,hairpins,database,mirna_gtf)
              Read PROST! file and convert to mirtop GFF format.

              Args:fn(str): file name with PROST output information.

                     database(str): database name.

                     args (namedtuple): arguments from command line.
                            See mirtop.libs.parse.add_subparser_gff().

              Returns:reads: dictionary where keys are read_id and values are mirtop.realign.hits

       Read seqbuster files

       mirtop.importer.seqbuster.header()
              Custom header for seqbuster importer.

              Returns:(str): seqbuster header string.

       mirtop.importer.seqbuster.read_file(fn,args)
              Read seqbuster file and convert to mirtop GFF format.

              Args:fn(str): file name with seqbuster output information.

                     database(str): database name.

                     args (namedtuple): arguments from command line.
                            See mirtop.libs.parse.add_subparser_gff().

              Returns:reads: dictionary where keys are read_id and values are mirtop.realign.hits

       Read sRNAbench files

       mirtop.importer.srnabench.read_file(folder,args)
              Read sRNAbench file and convert to mirtop GFF format.

              Args:fn(str): file name with sRNAbench output information.

                     database(str): database name.

                     args (namedtuple): arguments from command line.
                            See mirtop.libs.parse.add_subparser_gff().

              Returns: reads (nested dicts): gff_list has the format as
                            defined in mirtop.gff.body.read().

       Read isomiR GFF files from optimir tool

       mirtop.importer.optimir.read_file(fn,args)
              Read OptimiR file and convert to mirtop GFF format.

              Args:fn(str): file name with isomiR-SEA output information.

                     database(str): database name.

                     args (namedtuple): arguments from command line.
                            See mirtop.libs.parse.add_subparser_gff().

              Returns: reads (nested dicts): gff_list has the format as
                            defined in mirtop.gff.body.read().

       Read Manatee files

       mirtop.importer.manatee.read_file(fn,database,args)
              Read Manatee file and convert to mirtop GFF format.

              Args:fn(str): file name with Manatee output information.

                     database(str): database name.

                     args (namedtuple): arguments from command line.
                            See mirtop.libs.parse.add_subparser_gff().

              Returns: reads (nested dicts): gff_list has the format as
                            defined in mirtop.gff.body.read().

   libs
       Centralize  running  of  external commands, providing logging and tracking. Integrated from bcbio package
       with some changes.

       mirtop.libs.do.find_bash()
              Find bash full path

       mirtop.libs.do.find_cmd(cmd)
              Find command in session

       mirtop.libs.do.run(cmd,data=None,checks=None,region=None,log_error=True,log_stdout=False)
              Run the provided command, logging details and checking for errors.

       Helpers to work with fastq files

       mirtop.libs.fastq.is_fastq(in_file)
              Check whether file is fastq, accepting txt, fq and fastq extensions and understanding
              compression with gzip: .gzip and .gz (copy from bcbio).

              Args:in_file(str): file name.

              Returns:(boolean): whether the file is fastq or not.

       mirtop.libs.fastq.open_fastq(in_file)
              Open a fastq file, using gzip if it is gzipped (from bcbio package).

              Args:in_file(str): file name.

              Returns:(File): file handler.
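
A minimal sketch of the "use gzip if it is gzipped" behaviour, assuming compression is detected from the .gz or .gzip extension mentioned for is_fastq. The name open_fastq_sketch is hypothetical and this is not the bcbio or mirtop code.

```python
import gzip


def open_fastq_sketch(in_file):
    """Return a text-mode file handle, transparently decompressing gzip files."""
    # Assumption: compression is inferred from the file extension only.
    if in_file.endswith((".gz", ".gzip")):
        return gzip.open(in_file, "rt")
    return open(in_file, "r")
```

Both branches return an object usable in a with statement, so callers can read lines the same way in either case.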

       mirtop.libs.fastq.splitext_plus(fn)
              Split on file extensions, allowing for zipped extensions (copy from bcbio).

              Args:fn(str): file name.

              Returns:base,ext(str,str): basename and extension.
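
A minimal sketch of splitting a file name while keeping a compression suffix attached to the extension. The set of compressed suffixes and the name splitext_plus_sketch are assumptions for illustration.

```python
import os

# Assumption: these suffixes are treated as compression extensions.
ZIPPED_EXTENSIONS = (".gz", ".gzip", ".bz2", ".zip")


def splitext_plus_sketch(fn):
    """Split 'sample.fastq.gz' into ('sample', '.fastq.gz')."""
    base, ext = os.path.splitext(fn)
    if ext in ZIPPED_EXTENSIONS:
        # Split once more so the real extension stays with the zip suffix.
        base, inner_ext = os.path.splitext(base)
        ext = inner_ext + ext
    return base, ext
```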

       mirtop.libs.parse.parse_cl(in_args)
              Function to parse the subcommands arguments.

       utils from http://www.github.com/chapmanb/bcbio-nextgen.git

       mirtop.libs.utils.chdir(p)
              Change dir temporarily using with:

              >>> with chdir(temporal):
                      do_something()
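
A context manager with this behaviour can be sketched with the standard library as follows. The name chdir_sketch is hypothetical and the code is an illustration, not the bcbio or mirtop source.

```python
import contextlib
import os


@contextlib.contextmanager
def chdir_sketch(path):
    """Change into path for the duration of the with block, then go back."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the block raised.
        os.chdir(previous)
```

The try/finally is the important part: without it an exception inside the block would leave the process in the temporary directory.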

       mirtop.libs.utils.file_exists(fname)
              Check if a file exists and is non-empty.

       mirtop.libs.utils.safe_dirs(dirs)
              Create folder if it does not exist

       mirtop.libs.utils.safe_remove(fn)
              Remove file skipping

   mirna
       Read bam files

       mirtop.mirna.annotate.annotate(reads,mature_ref,precursors,quiet=False)
              Using coordinates, mismatches and realign to annotate isomiRs

              Args: reads (dicts of hits): dict object that comes from mirtop.bam.bam.read_bam().

                     mirbase_ref (dict of mirna positions): dict object that comes from
                            mirtop.mirna.read_mature().

                     precursors dict object (key : fasta): that comes from
                            mirtop.mirna.fasta.read_precursor().

                     quiet (boolean): verbosity state.

              Returns: reads (dict):
                            dictionary where keys are read_id and values are mirtop.realign.hits

       Read precursor fasta file

       mirtop.mirna.fasta.read_precursor(precursor,sps=None)
              Load precursor file for that species

              Args:precursor(str): file name with fasta sequences

                     sps (str): if any, select species to keep.
                            It'll do a header_sequence.find(sps).

              Returns: hairpin (dict): keys are precursor names and
                            values are precursor sequences.

       Read database information

       mirtop.mirna.mapper.get_primary_transcript(database)
              Get the ID to identify the primary transcript in the GTF file with the miRNA and precursor
              coordinates to be able to parse BAM files with genomic coordinates.

       mirtop.mirna.mapper.guess_database(args)
              Guess database name from GFF file.

              Args: gtf (str): file name with GFF miRNA genomic positions and
                            header lines.

              Returns:database(str): name of the database

              TODO: this needs to be generic to other databases.

       mirtop.mirna.mapper.read_gtf_chr2mirna(gtf)
              Load GTF file with precursor positions on genome.

              Args: gtf (str): file name with GFF miRNA genomic positions and
                            header lines.

              Returns: db_mir (dict): dictionary with keys being chr and values
                            mirna and genomic positions.

       mirtop.mirna.mapper.read_gtf_to_mirna(gtf)
              Load GTF file with precursor positions on genome.

              Args: gtf (str): file name with GFF miRNA genomic positions and
                            header lines.

              Returns: db_mir (dict): dictionary with keys being mirnas and values
                            genomic positions.

       mirtop.mirna.mapper.read_gtf_to_precursor(gtf)
              Load GTF file with precursor positions on genome. Return dict with key being precursor name and
              value a dict of mature miRNA with relative position to precursor.

              Args: gtf (str): file name with GFF miRNA genomic positions and
                            header lines.

              Returns:map_dict(dict):

                     >>> {'parent': {mirna: [start, end]}}

       mirtop.mirna.mapper.read_gtf_to_precursor_mirbase(gtf,format='precursor')
              Load GTF file with precursor positions on genome. Return dict with key being precursor name and
              value a dict of mature miRNA with relative position to precursor. For miRBase and similar GFF3
              files.

              Args: gtf (str): file name with GFF miRNA genomic positions and
                            header lines.

              Returns:map_dict(dict):

                     >>> {'parent': {mirna: [start, end]}}

       mirtop.mirna.mapper.read_gtf_to_precursor_mirgenedb(gtf,format='precursor')
              Load GTF file with precursor positions on genome. Return dict with key being precursor name and
              value a dict of mature miRNA with relative position to precursor. For MirGeneDB and similar GFF3
              files.

              Args: gtf (str): file name with GFF miRNA genomic positions and
                            header lines.

              Returns:map_dict(dict):

                     >>> {'parent': {mirna: [start, end]}}

       mirtop.mirna.realign.align(x,y,local=False)
              Pairwise alignments between two sequences.
              https://medium.com/towards-data-science/pairwise-sequence-alignment-using-biopython-d1a9d0ba861f

              Args: x (str): short sequence.

                     y (str): long sequence.

                     local (boolean): local or global alignment.

              Returns: aligned_x (hit): alignment information, score and positions.

       mirtop.mirna.realign.align_from_variants(sequence,mature,variants)
              Given the sequence read, the mature from get_mature_sequence, and the variant GFF
              annotation: get a list of substitutions.

              Args: sequence (str): read sequence.

                     mature (str): mature sequence from mirtop.mirna.realign.get_mature_sequence().

                     variants(str): string from Variant attribute in GFF file.

              Returns:snp(list): [[pos, target, reference]]

       mirtop.mirna.realign.cigar2snp(cigar,reference)
              From a CIGAR string and reference sequence detect mismatch positions and reference and target
              nucleotides.

              Args:cigar(str): CIGAR string.

                     reference(str): reference sequence.

              Returns:snp(list): position of mismatches (indels included) as:

                     >>> [pos, seq_nt, ref_nt]

       mirtop.mirna.realign.cigar_correction(cigarLine,query,target)
              Read from CIGAR in BAM file to define mismatches.

              Args: cigarLine (str): CIGAR string from BAM file.

                     query(str): read sequence.

                     target(str): target sequence.

              Returns:(list): [query_nts, target_nts]

       mirtop.mirna.realign.expand_cigar(cigar)
              From short CIGAR version to long CIGAR version where each character is each nts in the sequence.

              Args:cigar(str): CIGAR string.

                     >>> 10MA3M

              Returns:cigar_long(str): CIGAR long.

                     >>> MMMMMMMMMMAMMM
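
The expansion shown in the example above can be sketched as follows. This is an illustration and not mirtop's code: expand_cigar_sketch is a hypothetical name, and treating an operation without a leading number as a single position is an assumption drawn from the 10MA3M example.

```python
import re


def expand_cigar_sketch(cigar):
    """Expand a short CIGAR such as '10MA3M' to one character per position."""
    expanded = []
    # Each token is an optional count followed by one operation letter.
    for count, operation in re.findall(r"(\d*)([A-Za-z])", cigar):
        repeats = int(count) if count else 1  # a missing count means 1
        expanded.append(operation * repeats)
    return "".join(expanded)


print(expand_cigar_sketch("10MA3M"))  # MMMMMMMMMMAMMM
```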

       mirtop.mirna.realign.get_mature_sequence(precursor,mature,exact=False,nt=5)
              From precursor and mature positions get mature sequence with +/- 4 flanking nts.

              Args: precursor (str): long sequence.

                     mature (list): [start, end].

                     exact (boolean): do not add +/- 4 flanking nts.

                     nt(int): number of nts to get.

              Returns:(str): mature sequence.

       class mirtop.mirna.realign.hits
              Class with alignment information.

       mirtop.mirna.realign.is_sequence(seq)
              This function check whether the sequence is valid or not.

              Args:seq(str): string acting as a sequence.

              Returns:boolean: whether is or not a valid nucleotide sequence.

       class mirtop.mirna.realign.isomir
              Class to represent isomiRs information.

              format(sep='\t')
                     Create tabular line from variant fields.

              formatGFF()
                     Create Variant attribute.

              format_id(sep='\t')
                     Create simple identifier from variant fields.

              get_score(sc)
                     Get score from variant fields.

              is_iso()
                     Define whether element is isomiR or not.

              set_pos(start,l,strand='+')
                     Set end position

       mirtop.mirna.realign.make_cigar(seq,mature)
              Function that will create CIGAR string from alignment between read and reference sequence.

              Args:seq(str): read sequence.

                     mature(str): short sequence.

              Return:short(str): CIGAR string.

       mirtop.mirna.realign.make_id(seq)
              Create a unique identifier for the sequence from the nucleotides, replacing 5 nts for a unique
              sequence.

              It uses the code from mirtop.mirna.keys().

              Inspired by MINTplate: https://cm.jefferson.edu/MINTbase
              https://github.com/TJU-CMC-Org/MINTmap/tree/master/MINTplates

              Args: seq (str): nucleotides sequences.

              Returns:idName(str): unique identifier for the sequence.

       mirtop.mirna.realign.read_id(idu)
              Read a unique identifier for the sequence and convert it to the nucleotides, replacing a unique
              code for 5 nts.

              It uses the code from mirtop.mirna.keys().

              Inspired by MINTplate: https://cm.jefferson.edu/MINTbase
              https://github.com/TJU-CMC-Org/MINTmap/tree/master/MINTplates

              Args: idu (str): unique identifier for the sequence.

              Returns:seq(str): nucleotides sequences.

       mirtop.mirna.realign.reverse_complement(seq)
              Get reverse complement of a sequence

              Args:seq(str): sequence.

                     >>> GCAT

              Returns:(str): reverse complement sequence:

                     >>> ATGC
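
The GCAT to ATGC example can be reproduced with a short sketch. The name reverse_complement_sketch is hypothetical and passing N through unchanged is an assumption. This is not mirtop's own implementation.

```python
# Translation table for DNA bases; N is kept as N (assumption).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_sketch(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement_sketch("GCAT"))  # ATGC
```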

       mirtop.mirna.realign.variant_to_3p(hairpin,pos,variant)
              From a sequence and a start position get the nts +/- indicated by iso_3p. Pos option is
              0-base-index.

              Args: hairpin (str): long sequence:

                            >>> AAATTTT

                     position (int): >>> 3

                     variant (int): number of nts involved in the variant:

                            >>> -1

              Returns: (str): nucleotide involved in the variant:

                            >>> A

       mirtop.mirna.realign.variant_to_5p(hairpin,pos,variant)
              From a sequence and a start position get the nts +/- indicated by iso_5p. Pos option is
              0-base-index.

              Args: hairpin (str): long sequence:

                            >>> AAATTTT

                     position (int): >>> 3

                     variant (int): number of nts involved in the variant:

                            >>> -1

              Returns: (str): nucleotide involved in the variant:

                            >>> T

       mirtop.mirna.realign.variant_to_add(read,variant)
              From a sequence and a start position get the nts +/- indicated by iso_3p. Pos option is
              0-base-index.

              Args: hairpin (str): long sequence:

                            >>> AAATTTT

                     position (int): >>> 3

                     variant (int): number of nts involved in the variant:

                            >>> 2

              Returns: (str): nucleotide involved in the variant:

                            >>> TT

       mirtop.mirna.snps.create_vcf(isomirs,matures,gtf,vcf_file=None)
              Create VCF file of changes for all samples. PASS will be ones with > 3 isomiRs supporting the
              position and > 30% of reads, otherwise LOW.
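
The PASS/LOW rule stated above can be written as a small predicate. This is a sketch under assumptions: the function name and its three parameters are hypothetical, and how mirtop actually counts supporting isomiRs and reads is not described in this page.

```python
def vcf_filter_sketch(n_isomirs, variant_reads, total_reads):
    """Return 'PASS' when > 3 isomiRs and > 30% of reads support the position."""
    if total_reads <= 0:
        return "LOW"  # no coverage, so nothing can support the position
    fraction = variant_reads / total_reads
    # Both thresholds are strict, as written in the description above.
    if n_isomirs > 3 and fraction > 0.30:
        return "PASS"
    return "LOW"
```

For example, a position supported by 4 isomiRs and 40 of 100 reads would be PASS, while 3 isomiRs with the same reads would be LOW.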

       mirtop.mirna.snps.liftover(pass_pos,matures)
              Make position at precursor scale

       mirtop.mirna.snps.liftover_to_genome(pass_pos,gtf)
              Liftover from precursor to genome

       mirtop.mirna.snps.print_vcf(data)
              Print vcf line following rules.

   classes
       class mirtop.mirna.realign.hits
              Class with alignment information.

       class mirtop.mirna.realign.isomir
              Class to represent isomiRs information.

              format(sep='\t')
                     Create tabular line from variant fields.

              formatGFF()
                     Create Variant attribute.

              format_id(sep='\t')
                     Create simple identifier from variant fields.

              get_score(sc)
                     Get score from variant fields.

              is_iso()
                     Define whether element is isomiR or not.

              set_pos(start,l,strand='+')
                     Set end position

Name

       mirtop - mirtop Documentation

See Also