libpll — Phylogenetic Likelihood Library

Availability

       Source code and binaries are available at <https://github.com/xflouris/libpll>.

Description

libpll is a library for phylogenetics.

       pll_partition_t *pll_partition_create(unsigned int tips, unsigned int clv_buffers, unsigned int states, unsigned int sites, unsigned int rate_matrices, unsigned int prob_matrices, unsigned int rate_cats, unsigned int scale_buffers, unsigned int attributes);
              Creates a partition with either tips character arrays or tips CLV arrays (depending on attributes,
              see Partition Attributes), and, additionally, clv_buffers CLV vectors, for storing conditional
              probabilities at inner nodes.  The partition structure is constructed for states number of  states
              (e.g.  4  for  nucleotide and 20 for amino-acid data) and sufficient space is allocated to host an
              alignment of size sites*tips.  The  number  of  rate  matrices  that  can  be  used  is  given  by
              rate_matrices.  Additionally,  the  function  allocates  space for hosting rate_matrices arrays of
              substitution parameters, frequencies, and auxiliary eigen-decomposition arrays (transparent to the
              user). The parameter prob_matrices dictates the number of probability  matrices  for  which  space
              will  be  allocated. This parameter is typically set to the number of branches the tree has (e.g.,
              2n-3 for unrooted and 2n-2 for rooted,  where  n  is  the  number  of  tips/leaves).  libpll  will
              automatically create space for prob_matrices*rate_cats, where rate_cats is the number of different
              rate  categories.  The  array  of  probability matrices is indexed from 0 to prob_matrices-1. Each
              matrix entry consists of sufficient space to accommodate  rate_cats  matrices,  which  are  stored
              consecutively in memory. Note that libpll does not automatically allocate extra space for the
              different substitution matrices specified by rate_matrices; the user must request it by
              multiplying prob_matrices by the corresponding factor. Finally, scale_buffers sets the number of
              scaling buffers to be allocated, and attributes specifies the hardware acceleration options to be
              used (see Partition Attributes). The function returns a pointer to the allocated pll_partition_t
              structure. Note that rate_matrices is used to address heterotachy, i.e. transition probability
              matrices computed from different rate matrices. For more information, see Updating transition
              probability matrices.
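
              The sizing rules above can be sketched as plain arithmetic. The helpers below are
              hypothetical and not part of libpll; pmatrix_offset() additionally assumes unpadded
              states*states matrices, which may not hold when vectorization attributes are enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Number of branches in a strictly bifurcating tree with n tips:
   2n-3 when unrooted, 2n-2 when rooted (as stated in the text above). */
unsigned int branch_count(unsigned int tips, int rooted)
{
  assert(tips >= 3);
  return rooted ? 2 * tips - 2 : 2 * tips - 3;
}

/* Value to pass as prob_matrices when each branch needs `factor`
   entries (e.g. one per distinct rate-matrix assignment).
   Hypothetical helper: the factor is chosen by the caller. */
unsigned int prob_matrices_needed(unsigned int branches, unsigned int factor)
{
  assert(factor >= 1);
  return branches * factor;
}

/* Offset (in doubles) of the matrix for rate category `cat` inside one
   probability-matrix entry. ASSUMPTION: unpadded states*states matrices
   stored back to back, one per rate category. */
size_t pmatrix_offset(unsigned int states, unsigned int cat)
{
  return (size_t)cat * states * states;
}
```

              For an unrooted tree with 5 tips, branch_count(5, 0) gives 7, so a model that applies two
              different rate matrices to every branch would ask for prob_matrices_needed(7, 2) entries.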

       void pll_partition_destroy(pll_partition_t *partition);
              Deallocates all data associated with the partition pointed to by partition.

       int pll_set_tip_states(pll_partition_t *partition, unsigned int tip_index, const unsigned int *map, const char *sequence);
              Sets the tip CLV (or tip character array) with index tip_index of instance partition, according to
              the character sequence sequence and the conversion table map, which translates (or maps)
              characters to states. For an example see Setting CLV vectors at tips from sequences and maps.
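
              To illustrate what a character-to-state map does, the sketch below builds a hypothetical
              256-entry nucleotide table and expands a sequence into 0/1 tip values. It assumes a bitmask
              encoding (A=1, C=2, G=4, T=8) and a single rate category; the names build_nt_map() and
              encode_tip() are illustrative only and are not libpll functions.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a 256-entry table mapping characters to state bitmasks.
   ASSUMPTION: A=1, C=2, G=4, T=8; 'N' and '-' mean "any state".
   Characters that are not listed map to 0 (no valid state). */
void build_nt_map(unsigned int map[256])
{
  for (size_t i = 0; i < 256; ++i)
    map[i] = 0;
  map['A'] = map['a'] = 1;
  map['C'] = map['c'] = 2;
  map['G'] = map['g'] = 4;
  map['T'] = map['t'] = 8;
  map['N'] = map['n'] = map['-'] = 15;
}

/* Expand `sequence` into a tip vector with one 0.0/1.0 entry per site
   and state (single rate category). Returns 1 on success, or 0 if a
   character has no mapping. `clv` must hold strlen(sequence)*states doubles. */
int encode_tip(const unsigned int map[256], const char *sequence,
               unsigned int states, double *clv)
{
  assert(states >= 1 && states <= 32);
  for (size_t i = 0; sequence[i] != '\0'; ++i)
  {
    unsigned int mask = map[(unsigned char)sequence[i]];
    if (mask == 0)
      return 0;
    for (unsigned int s = 0; s < states; ++s)
      clv[i * states + s] = (double)((mask >> s) & 1u);
  }
  return 1;
}
```

              An ambiguous character such as N sets every state to 1.0, which is how a map can express
              uncertainty about the observed state at a tip.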

       int pll_set_tip_clv(pll_partition_t *partition, unsigned int tip_index, const double *clv);
              Sets the tip CLV with index tip_index of instance partition to the contents of the array clv. For
              an example see Setting CLV vectors manually. Note that this function cannot be used in conjunction
              with the PLL_ATTRIB_PATTERN_TIP attribute (see Partition Attributes).

       void pll_set_subst_params(pll_partition_t *partition, unsigned int params_index, const double *params);
              Sets the parameters for substitution model with index params_index, where params_index ranges from
              0 to rate_matrices-1, as specified in the pll_partition_create() call. Array params should contain
              exactly (states*states-states)/2 parameters of type double.  These values correspond to the  upper
              triangle elements (above the main diagonal) of the rate matrix.
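
              As a quick check of the parameter count, the sketch below computes (states*states-states)/2
              and expands such an array into a full symmetric matrix. It assumes the upper triangle is
              listed row by row, which the text above does not state explicitly; both helper names are
              hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of substitution parameters for a model with `states` states. */
unsigned int subst_param_count(unsigned int states)
{
  assert(states >= 2);
  return (states * states - states) / 2;
}

/* Expand upper-triangle parameters into a full symmetric, row-major
   states x states matrix with a zero diagonal.
   ASSUMPTION: params lists the upper triangle row by row,
   i.e. (0,1), (0,2), ..., (1,2), ... */
void expand_upper_triangle(unsigned int states, const double *params,
                           double *full)
{
  size_t k = 0;
  for (size_t i = 0; i < states; ++i)
  {
    full[i * states + i] = 0.0;
    for (size_t j = i + 1; j < states; ++j)
    {
      full[i * states + j] = params[k];
      full[j * states + i] = params[k];
      ++k;
    }
  }
}
```

              For nucleotide data (4 states) this gives 6 parameters, and for amino-acid data (20 states)
              it gives 190.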

       void pll_set_frequencies(pll_partition_t *partition, unsigned int params_index, const double *frequencies);
              Sets  the  base frequencies for the substitution model with index params_index, where params_index
              ranges from 0 to rate_matrices-1, as specified in the pll_partition_create() call.  The  array  of
              base  frequencies  (frequencies)  is  copied  into  the  instance. The order of bases in the array
              depends on the encoding used when converting tip sequences to CLV. For example, if the  pll_map_nt
              map was used with the pll_set_tip_states() function to describe nucleotide data, then the order is
              A, C, G, T. However, this can be arbitrarily set by adjusting the provided map.

       void pll_set_pattern_weights(pll_partition_t *partition, const unsigned int *pattern_weights);
              Sets  the vector of pattern weights (pattern_weights) for partition. The function reads and copies
              the first partition->sites elements of pattern_weights into partition->pattern_weights.
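
              Pattern weights usually come from collapsing identical alignment columns into one pattern
              and counting how often each occurs. The following is a minimal quadratic-time sketch of that
              idea under a hypothetical name; it is not the algorithm used by
              pll_compress_site_patterns().

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if columns a and b hold the same character in every sequence. */
static int same_column(char **seqs, unsigned int tips, size_t a, size_t b)
{
  for (unsigned int t = 0; t < tips; ++t)
    if (seqs[t][a] != seqs[t][b])
      return 0;
  return 1;
}

/* Compress the alignment in place so that each distinct column appears
   once, and store its number of occurrences in weights[].
   `weights` must have room for `sites` entries.
   Returns the number of distinct patterns. */
unsigned int compress_patterns(char **seqs, unsigned int tips,
                               unsigned int sites, unsigned int *weights)
{
  assert(tips >= 1);
  unsigned int unique = 0;
  for (unsigned int s = 0; s < sites; ++s)
  {
    unsigned int p;
    for (p = 0; p < unique; ++p)
      if (same_column(seqs, tips, p, s))
        break;
    if (p == unique)
    {
      /* New pattern: move it to the next free column (unique <= s). */
      for (unsigned int t = 0; t < tips; ++t)
        seqs[t][unique] = seqs[t][s];
      weights[unique++] = 1;
    }
    else
      weights[p]++;
  }
  for (unsigned int t = 0; t < tips; ++t)
    seqs[t][unique] = '\0';
  return unique;
}
```

              The weights always sum to the original number of sites, and the returned pattern count is
              the value a partition built from the compressed alignment would use as sites.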

       void pll_set_category_rates(pll_partition_t *partition, const double *rates);
              Sets  the  rate  categories  for  partition.   The   function   reads   and   copies   the   first
              partition->rate_cats elements of array rates into partition->rates.

       int pll_update_invariant_sites(pll_partition_t *partition);
              Updates  the  invariant  sites  array  partition->invariant,  according  to  the  sequences in the
              partition. This function is implicitly called by pll_update_invariant_sites_proportion() when  the
              specified  proportion of invariant sites is greater than zero, but it must be explicitly called by
              the client code if the sequences change.

       int pll_update_invariant_sites_proportion(pll_partition_t *partition, unsigned int params_index, double prop_invar);
              Updates the proportion of invariant sites for the rate matrix of partition with index
              params_index. Note that this call does not implicitly update the transition probability matrices
              computed from that rate matrix; this must be done explicitly, for example with a call to
              pll_update_prob_matrices().

       int pll_update_prob_matrices(pll_partition_t *partition, const unsigned int *params_index, const unsigned int *matrix_indices, const double *branch_lengths, unsigned int count);
              Computes the transition probability matrices specified by the count indices in matrix_indices, for
              all rate categories. A matrix with index matrix_indices[i] will be computed using the branch
              length branch_lengths[i]. To compute the matrix for rate category j, the function uses the rate
              matrix with index params_index[j]. Matrices are stored in partition->pmatrix[matrix_indices[i]].
              Note that each such entry holds the matrices for all rate categories, stored consecutively in
              memory.

       int pll_update_eigen(pll_partition_t *partition, unsigned int params_index);
              Updates the eigenvectors (partition->eigenvecs[params_index]), inverse eigenvectors
              (partition->inv_eigenvecs[params_index]), and eigenvalues (partition->eigenvals[params_index]) using
              the substitution parameters (partition->subst_params[params_index]) and base frequencies
              (partition->frequencies[params_index]) specified by params_index.

       void pll_show_pmatrix(pll_partition_t *partition, unsigned int index, unsigned int float_precision);
              Prints  the  transition  probability  matrices for each rate category of partition associated with
              index to standard output. The floating point precision is dictated by float_precision.

       unsigned int pll_count_invariant_sites(pll_partition_t *partition, unsigned int *state_inv_count);
              Returns the number of invariant sites  in  the  sequence  alignment  from  partition.   The  array
              state_inv_count  must  be  of  size partition->states and is filled such that entry i contains the
              count of invariant sites for state i.
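
              The counting described above can be sketched on a plain character alignment. This
              hypothetical helper treats a site as invariant only when every sequence carries the same
              unambiguous character from the given alphabet; how libpll handles ambiguity codes is not
              covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Index of character c in the alphabet, or -1 if it is not a valid state. */
static int state_index(const char *alphabet, unsigned int states, char c)
{
  for (unsigned int s = 0; s < states; ++s)
    if (alphabet[s] == c)
      return (int)s;
  return -1;
}

/* Count sites where all `tips` sequences share the same state.
   state_inv_count must have `states` entries; on return, entry i holds
   the number of invariant sites whose shared state is i.
   ASSUMPTION: ambiguous characters never make a site invariant. */
unsigned int count_invariant_sites(char **seqs, unsigned int tips,
                                   unsigned int sites, const char *alphabet,
                                   unsigned int states,
                                   unsigned int *state_inv_count)
{
  assert(tips >= 1);
  unsigned int total = 0;
  for (unsigned int s = 0; s < states; ++s)
    state_inv_count[s] = 0;
  for (unsigned int i = 0; i < sites; ++i)
  {
    int first = state_index(alphabet, states, seqs[0][i]);
    int invariant = (first >= 0);
    for (unsigned int t = 1; invariant && t < tips; ++t)
      invariant = (seqs[t][i] == seqs[0][i]);
    if (invariant)
    {
      state_inv_count[first]++;
      total++;
    }
  }
  return total;
}
```

              With the alphabet "ACGT", entry 0 of state_inv_count counts the sites that are all A, entry
              1 those that are all C, and so on.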

       void pll_update_partials(pll_partition_t *partition, const pll_operation_t *operations, unsigned int count);
              Updates the count conditional probability vectors (CPV) defined by the entries of  operations,  in
              the  order  they  appear in the array. Each operations entry describes one CPV from partition. See
              also pll_operation_t.

       void pll_show_clv(pll_partition_t *partition, unsigned int clv_index, int scaler_index, unsigned int float_precision);
              Prints to standard output the conditional probability vector for index clv_index from partition,
              using the scale buffer with index scaler_index. If no scale buffer was used, then scaler_index
              must be passed the value PLL_SCALE_BUFFER_NONE. The floating-point precision (number of digits
              after the decimal point) is dictated by float_precision. The output uses square brackets, curly
              braces and parentheses to group the values by site, rate category and state, respectively.
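
              The site / rate category / state nesting of the printed output suggests how a flat CLV array
              can be addressed. The sketch below assumes an unpadded layout ordered by site, then rate
              category, then state; this is an assumption for illustration, and padded layouts (for
              example under vectorization attributes) would differ. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset (in doubles) of one value inside a CLV.
   ASSUMPTION: unpadded layout ordered by site, rate category, state. */
size_t clv_offset(unsigned int rate_cats, unsigned int states,
                  unsigned int site, unsigned int cat, unsigned int state)
{
  assert(cat < rate_cats && state < states);
  return ((size_t)site * rate_cats + cat) * states + state;
}

/* Sum of the conditional values over all states for one site and one
   rate category, using the layout assumed above. */
double clv_category_sum(const double *clv, unsigned int rate_cats,
                        unsigned int states, unsigned int site,
                        unsigned int cat)
{
  double sum = 0.0;
  for (unsigned int s = 0; s < states; ++s)
    sum += clv[clv_offset(rate_cats, states, site, cat, s)];
  return sum;
}
```

              Under this layout, a partition with 4 rate categories and 4 states stores the value for
              site 2, category 1, state 3 at offset (2*4+1)*4+3 = 39.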

       double pll_compute_root_loglikelihood(pll_partition_t *partition, unsigned int clv_index, int scaler_index, const unsigned int *freqs_index, double *persite_lnl);
              Evaluates the log-likelihood of a  rooted  tree,  for  the  vector  of  conditional  probabilities
              (partials)  with index clv_index, scale buffer with index scaler_index (or PLL_SCALE_BUFFER_NONE),
              and base frequencies arrays with indices freqs_index (one per rate category).  If  persite_lnl  is
              not  NULL, then it must be large enough to hold partition->sites double-precision values, and will
              be filled with the per-site log-likelihoods.

       double pll_compute_edge_loglikelihood(pll_partition_t *partition, unsigned int parent_clv_index, int parent_scaler_index, unsigned int child_clv_index, int child_scaler_index, unsigned int matrix_index, const unsigned int *freqs_index, double *persite_lnl);
              Evaluates the log-likelihood of an unrooted tree, given the conditional probability vectors
              (partials) of two nodes that share an edge, with indices parent_clv_index and child_clv_index,
              the scale buffers with indices parent_scaler_index and child_scaler_index (or
              PLL_SCALE_BUFFER_NONE), the transition probability matrix with index matrix_index, and the base
              frequencies arrays with indices freqs_index (one per rate category). If persite_lnl is not NULL,
              then it must be large enough to hold partition->sites double-precision values, and will be filled
              with the per-site log-likelihoods.

Name

       libpll — Phylogenetic Likelihood Library

Synopsis

       Partition management
              pll_partition_t *pll_partition_create(unsigned int tips, unsigned int clv_buffers, unsigned int states, unsigned int sites, unsigned int rate_matrices, unsigned int prob_matrices, unsigned int rate_cats, unsigned int scale_buffers, unsigned int attributes);
              void pll_partition_destroy(pll_partition_t *partition);

       Partition parameters setup
              int pll_set_tip_states(pll_partition_t *partition, unsigned int tip_index, const unsigned int *map, const char *sequence);
              int pll_set_tip_clv(pll_partition_t *partition, unsigned int tip_index, const double *clv);
              void pll_set_pattern_weights(pll_partition_t *partition, const unsigned int *pattern_weights);
              int pll_set_asc_bias_type(pll_partition_t *partition, int asc_bias_type);
              void pll_set_asc_state_weights(pll_partition_t *partition, const unsigned int *state_weights);
              void pll_set_subst_params(pll_partition_t *partition, unsigned int params_index, const double *params);
              void pll_set_frequencies(pll_partition_t *partition, unsigned int params_index, const double *frequencies);
              void pll_set_category_rates(pll_partition_t *partition, const double *rates);
              void pll_set_category_weights(pll_partition_t *partition, const double *rate_weights);

       Transition probability matrices
              int pll_update_prob_matrices(pll_partition_t *partition, const unsigned int *params_index, const unsigned int *matrix_indices, const double *branch_lengths, unsigned int count);
              int pll_update_eigen(pll_partition_t *partition, unsigned int params_index);
              void pll_show_pmatrix(pll_partition_t *partition, unsigned int index, unsigned int float_precision);

       Invariant sites
              unsigned int pll_count_invariant_sites(pll_partition_t *partition, unsigned int *state_inv_count);
              int pll_update_invariant_sites(pll_partition_t *partition);
              int pll_update_invariant_sites_proportion(pll_partition_t *partition, unsigned int params_index, double prop_invar);

       Conditional probability vectors
              void pll_update_partials(pll_partition_t *partition, const pll_operation_t *operations, unsigned int count);
              void pll_show_clv(pll_partition_t *partition, unsigned int clv_index, int scaler_index, unsigned int float_precision);

       Evaluation of log-Likelihood
              double pll_compute_root_loglikelihood(pll_partition_t *partition, unsigned int clv_index, int scaler_index, const unsigned int *freqs_index, double *persite_lnl);
              double pll_compute_edge_loglikelihood(pll_partition_t *partition, unsigned int parent_clv_index, int parent_scaler_index, unsigned int child_clv_index, int child_scaler_index, unsigned int matrix_index, const unsigned int *freqs_index, double *persite_lnl);

       Likelihood function derivatives
              int pll_update_sumtable(pll_partition_t *partition, unsigned int parent_clv_index, unsigned int child_clv_index, const unsigned int *params_indices, double *sumtable);
              int pll_compute_likelihood_derivatives(pll_partition_t *partition, int parent_scaler_index, int child_scaler_index, double branch_length, const unsigned int *params_indices, const double *sumtable, double *d_f, double *dd_f);

       FASTA file handling
              pll_fasta_t *pll_fasta_open(const char *filename, const unsigned int *map);
              int pll_fasta_getnext(pll_fasta_t *fd, char **head, long *head_len, char **seq, long *seq_len, long *seqno);
              void pll_fasta_close(pll_fasta_t *fd);
              long pll_fasta_getfilesize(pll_fasta_t *fd);
              long pll_fasta_getfilepos(pll_fasta_t *fd);
              int pll_fasta_rewind(pll_fasta_t *fd);

       PHYLIP file handling
              pll_msa_t *pll_phylip_parse_msa(const char *filename, unsigned int *msa_count);
              void pll_msa_destroy(pll_msa_t *msa);

       Newick handling
              pll_rtree_t *pll_rtree_parse_newick(const char *filename, unsigned int *tip_count);
              pll_utree_t *pll_utree_parse_newick(const char *filename, unsigned int *tip_count);
              pll_utree_t *pll_utree_parse_newick_string(char *s, unsigned int *tip_count);

       Unrooted tree structure manipulation
              void pll_utree_destroy(pll_utree_t *root);
              void pll_utree_show_ascii(pll_utree_t *tree, int options);
              char *pll_utree_export_newick(pll_utree_t *root);
              int pll_utree_traverse(pll_utree_t *root, int (*cbtrav)(pll_utree_t *), pll_utree_t **outbuffer, unsigned int *trav_size);
              unsigned int pll_utree_query_tipnodes(pll_utree_t *root, pll_utree_t **node_list);
              unsigned int pll_utree_query_innernodes(pll_utree_t *root, pll_utree_t **node_list);
              void pll_utree_create_operations(pll_utree_t **trav_buffer, unsigned int trav_buffer_size, double *branches, unsigned int *pmatrix_indices, pll_operation_t *ops, unsigned int *matrix_count, unsigned int *ops_count);
              int pll_utree_check_integrity(pll_utree_t *root);
              pll_utree_t *pll_utree_clone(pll_utree_t *root);
              pll_utree_t *pll_rtree_unroot(pll_rtree_t *root);
              int pll_utree_every(pll_utree_t *node, int (*cb)(pll_utree_t *));

       Rooted tree structure manipulation
              void pll_rtree_destroy(pll_rtree_t *root);
              void pll_rtree_show_ascii(pll_rtree_t *tree, int options);
              char *pll_rtree_export_newick(pll_rtree_t *root);
              int pll_rtree_traverse(pll_rtree_t *root, int (*cbtrav)(pll_rtree_t *), pll_rtree_t **outbuffer, unsigned int *trav_size);
              unsigned int pll_rtree_query_tipnodes(pll_rtree_t *root, pll_rtree_t **node_list);
              unsigned int pll_rtree_query_innernodes(pll_rtree_t *root, pll_rtree_t **node_list);
              void pll_rtree_create_operations(pll_rtree_t **trav_buffer, unsigned int trav_buffer_size, double *branches, unsigned int *pmatrix_indices, pll_operation_t *ops, unsigned int *matrix_count, unsigned int *ops_count);
              void pll_rtree_create_pars_buildops(pll_rtree_t **trav_buffer, unsigned int trav_buffer_size, pll_pars_buildop_t *ops, unsigned int *ops_count);
              void pll_rtree_create_pars_recops(pll_rtree_t **trav_buffer, unsigned int trav_buffer_size, pll_pars_recop_t *ops, unsigned int *ops_count);

       Topological rearrangement moves
              int pll_utree_spr(pll_utree_t *p, pll_utree_t *r, pll_utree_rb_t *rb, double *branch_lengths, unsigned int *matrix_indices);
              int pll_utree_spr_safe(pll_utree_t *p, pll_utree_t *r, pll_utree_rb_t *rb, double *branch_lengths, unsigned int *matrix_indices);
              int pll_utree_nni(pll_utree_t *p, int type, pll_utree_rb_t *rb);
              int pll_utree_rollback(pll_utree_rb_t *rollback, double *branch_lengths, unsigned int *matrix_indices);

       Parsimony functions
              int pll_set_parsimony_sequence(pll_parsimony_t *pars, unsigned int tip_index, const unsigned int *map, const char *sequence);
              pll_parsimony_t *pll_parsimony_create(unsigned int *tips, unsigned int states, unsigned int sites, double *score_matrix, unsigned int score_buffers, unsigned int ancestral_buffers);
              double pll_parsimony_build(pll_parsimony_t *pars, pll_pars_buildop_t *operations, unsigned int count);
              void pll_parsimony_reconstruct(pll_parsimony_t *pars, const unsigned int *map, pll_pars_recop_t *operations, unsigned int count);
              double pll_parsimony_score(pll_parsimony_t *pars, unsigned int score_buffer_index);
              void pll_parsimony_destroy(pll_parsimony_t *pars);

       Auxiliary functions
              int pll_compute_gamma_cats(double alpha, unsigned int categories, double *output_rates);
              void *pll_aligned_alloc(size_t size, size_t alignment);
              void pll_aligned_free(void *ptr);
              unsigned int *pll_compress_site_patterns(char **sequence, const unsigned int *map, int count, int *length);

       Core functions
              void pll_core_create_lookup(unsigned int states, unsigned int rate_cats, double *lookup, const double *left_matrix, const double *right_matrix, unsigned int *tipmap, unsigned int tipmap_size, unsigned int attrib);
              void pll_core_update_partial_tt(unsigned int states, unsigned int sites, unsigned int rate_cats, double *parent_clv, unsigned int *parent_scaler, const unsigned char *left_tipchars, const unsigned char *right_tipchars, const unsigned int *tipmap, unsigned int tipmap_size, const double *lookup, unsigned int attrib);
              void pll_core_update_partial_ti(unsigned int states, unsigned int sites, unsigned int rate_cats, double *parent_clv, unsigned int *parent_scaler, const unsigned char *left_tipchars, const double *right_clv, const double *left_matrix, const double *right_matrix, const unsigned int *right_scaler, const unsigned int *tipmap, unsigned int attrib);
              void pll_core_update_partial_ii(unsigned int states, unsigned int sites, unsigned int rate_cats, double *parent_clv, unsigned int *parent_scaler, const double *left_clv, const double *right_clv, const double *left_matrix, const double *right_matrix, const unsigned int *left_scaler, const unsigned int *right_scaler, unsigned int attrib);
              int pll_core_update_sumtable_ti(unsigned int states, unsigned int sites, unsigned int rate_cats, const double *parent_clv, const unsigned char *left_tipchars, double **eigenvecs, double **inv_eigenvecs, double **freqs, unsigned int *tipmap, double *sumtable, unsigned int attrib);
              int pll_core_likelihood_derivatives(unsigned int states, unsigned int sites, unsigned int rate_cats, const double *rate_weights, const unsigned int *parent_scaler, const unsigned int *child_scaler, const int *invariant, const unsigned int *pattern_weights, double branch_length, const double *prop_invar, double **freqs, const double *rates, double **eigenvals, const double *sumtable, double *d_f, double *dd_f, unsigned int attrib);
              double pll_core_edge_loglikelihood_ii(unsigned int states, unsigned int sites, unsigned int rate_cats, const double *parent_clv, const unsigned int *parent_scaler, const double *child_clv, const unsigned int *child_scaler, const double *pmatrix, double **frequencies, const double *rate_weights, const unsigned int *pattern_weights, const double *invar_proportion, const int *invar_indices, const unsigned int *freqs_indices, double *persite_lnl, unsigned int attrib);
              double pll_core_edge_loglikelihood_ti(unsigned int states, unsigned int sites, unsigned int rate_cats, const double *parent_clv, const unsigned int *parent_scaler, const unsigned char *tipchars, const unsigned int *tipmap, const double *pmatrix, double **frequencies, const double *rate_weights, const unsigned int *pattern_weights, const double *invar_proportion, const int *invar_indices, const unsigned int *freqs_indices, double *persite_lnl, unsigned int attrib);
              int pll_core_update_pmatrix(double *pmatrix, unsigned int states, double rate, double prop_invar, double branch_length, double *eigenvals, double *eigenvecs, double *inv_eigenvecs, unsigned int attrib);

Version History

       New  features  and  important  modifications  of  libpll  (short  lived  or minor bug releases may not be
       mentioned):

              v0.2.0 released September 9th, 2016
                     First public release.

              v0.3.0 released May 15th, 2017
                     Added faster vectorizations for 20-state and arbitrary-state models,  unweighted  parsimony
                     functions,  randomized  stepwise  addition,  portable  functions  for parsing trees from C-
                     strings, per-rate category scalers for preventing  numerical  underflows.  Modified  newick
                     exporting  function to accept callbacks for custom printing. Fixed derivatives computation,
                     parsing of branch lengths, invariant  sites  computation,  log-likelihood  computation  for
                     cases  where  we  have  scaling and patterns, ascertainment bias computation, per-site log-
                     likelihood computation, memory leaks. Added run-time detection of hardware.

              v0.3.1 released May 17th, 2017
                     Corrected updating of padded eigen-decomposition arrays for models whose number of states
                     is not a power of two. Added portable hardware detection for clang and GCC.

              v0.3.2 released July 12th, 2017
                     Added optional per-rate category scalers for protein and generic kernels.  Improved fix for
                     negative  transition  probability matrices caused by numerics.  Fixed initialization of tip
                     CLVs when using ascertainment bias  correction  with  non-DNA  sequences.  Fixed  excessive
                     memory  allocation when compressing site patterns and issue with PHYLIP parsing when header
                     ends with CRLF.

libpll 0.3.2                                      July 12, 2017                                        libpll(3)

See Also