libpll — Phylogenetic Likelihood Library
Availability
Source code and binaries are available at <https://github.com/xflouris/libpll>.
Copyright
Copyright (C) 2015-2017, Tomas Flouri, Diego Darriba
All rights reserved.
Contact: Tomas Flouri <Tomas.Flouri@h-its.org>, Scientific Computing, Heidelberg Institute for
Theoretical Studies, 69118 Heidelberg, Germany
This software is licensed under the terms of the GNU Affero General Public License version 3.
GNU Affero General Public License version 3
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero
General Public License as published by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General
Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If
not, see <http://www.gnu.org/licenses/>.
Description
libpll is a library for phylogenetics.
pll_partition_t *pll_partition_create(unsigned int tips, unsigned int clv_buffers, unsigned int states, unsigned int sites, unsigned int rate_matrices, unsigned int prob_matrices, unsigned int rate_cats, unsigned int scale_buffers, unsigned int attributes);
Creates a partition with either tips character arrays or tips CLV arrays (depending on attributes,
see Partition Attributes), and, additionally, clv_buffers CLV vectors, for storing conditional
probabilities at inner nodes. The partition structure is constructed for states number of states
(e.g. 4 for nucleotide and 20 for amino-acid data) and sufficient space is allocated to host an
alignment of size sites*tips. The number of rate matrices that can be used is given by
rate_matrices. Additionally, the function allocates space for hosting rate_matrices arrays of
substitution parameters, frequencies, and auxiliary eigen-decomposition arrays (transparent to the
user). The parameter prob_matrices dictates the number of probability matrices for which space
will be allocated. This parameter is typically set to the number of branches the tree has (e.g.,
2n-3 for unrooted and 2n-2 for rooted, where n is the number of tips/leaves). libpll will
automatically create space for prob_matrices*rate_cats, where rate_cats is the number of different
rate categories. The array of probability matrices is indexed from 0 to prob_matrices-1. Each
matrix entry consists of sufficient space to accommodate rate_cats matrices, which are stored
consecutively in memory. Note that libpll will not allocate extra space for the different
substitution matrices specified by rate_matrices; the user must request it by multiplying
prob_matrices by the corresponding factor. Finally, scale_buffers sets the number of scaling
buffers to be allocated, and attributes states the hardware acceleration options to be used (see
Partition Attributes). The function returns a pointer to the allocated pll_partition_t structure.
Note that rate_matrices are used to address heterotachy, i.e. transition probability matrices
computed from different rate matrices. For more information, see Updating transition probability
matrices.
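The sizing rules above reduce to simple arithmetic, sketched below. The helper names are hypothetical and not part of libpll. The storage estimate assumes unpadded states*states matrices, whereas the library may pad them for vectorization, so treat it as a lower bound. The multiplication by rate_matrices assumes that every branch needs one set of matrices per rate matrix.

```c
#include <stddef.h>

/* Hypothetical helpers for choosing pll_partition_create() arguments;
   none of these functions are provided by libpll. */

/* Branches of an unrooted binary tree with n tips (n >= 3): 2n-3. */
unsigned int branches_unrooted(unsigned int n)
{
  return 2 * n - 3;
}

/* Branches of a rooted binary tree with n tips (n >= 2): 2n-2. */
unsigned int branches_rooted(unsigned int n)
{
  return 2 * n - 2;
}

/* Value for prob_matrices when each branch needs one set of matrices
   per rate matrix (assumption: all rate matrices are used on all branches). */
unsigned int prob_matrices_needed(unsigned int branches,
                                  unsigned int rate_matrices)
{
  return branches * rate_matrices;
}

/* Unpadded number of doubles behind prob_matrices entries, each holding
   rate_cats matrices of states*states values. */
size_t pmatrix_doubles(unsigned int prob_matrices,
                       unsigned int rate_cats,
                       unsigned int states)
{
  return (size_t)prob_matrices * rate_cats * states * states;
}
```

For an unrooted tree with 4 tips, 4 rate categories and nucleotide data, this gives 5 probability matrix entries and at least 320 doubles of matrix storage.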
void pll_partition_destroy(pll_partition_t *partition);
Deallocates all data associated with the partition pointed to by partition.
int pll_set_tip_states(pll_partition_t *partition, unsigned int tip_index, const unsigned int *map, const char *sequence);
Set the tip CLV (or tip character array) with index tip_index of instance partition, according to
the character sequence sequence and the conversion table map, which translates (or maps)
characters to states. For an example see Setting CLV vectors at tips from sequences and maps.
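To illustrate what a conversion table does, the sketch below builds a 256-entry map indexed by character code and applies it to a sequence. The one-bit-per-state encoding (A=1, C=2, G=4, T=8) and both function names are assumptions made for illustration; the maps shipped with the library, such as pll_map_nt, remain the authoritative reference.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical map builder: character code -> state bitmask.
   A zero entry marks a character that has no mapping. */
void build_nt_map(unsigned int map[256])
{
  memset(map, 0, 256 * sizeof(unsigned int));
  map['A'] = map['a'] = 1;                  /* bit 0: A */
  map['C'] = map['c'] = 2;                  /* bit 1: C */
  map['G'] = map['g'] = 4;                  /* bit 2: G */
  map['T'] = map['t'] = 8;                  /* bit 3: T */
  map['N'] = map['n'] = map['-'] = 15;      /* fully ambiguous */
}

/* Translates a NUL-terminated sequence into states.
   Returns 1 on success, 0 if an unmapped character is found. */
int encode_sequence(const unsigned int map[256],
                    const char *sequence,
                    unsigned int *states_out)
{
  size_t i;

  for (i = 0; sequence[i] != '\0'; ++i)
  {
    unsigned int state = map[(unsigned char)sequence[i]];
    if (state == 0)
      return 0;
    states_out[i] = state;
  }
  return 1;
}
```

Changing the bit assigned to each character changes the state order, which in turn fixes the order expected for base frequencies and substitution parameters.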
int pll_set_tip_clv(pll_partition_t *partition, unsigned int tip_index, const double *clv);
Set the tip CLV with index tip_index of instance partition, to the contents of the array clv. For
an example see Setting CLV vectors manually. Note that this function cannot be used in conjunction
with the PLL_ATTRIB_PATTERN_TIP attribute (see Partition Attributes).
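A tip CLV can be prepared by hand as in the sketch below. It assumes the clv array holds sites*states doubles laid out site by site, and that each site is described by a bitmask with one bit per state; both are assumptions for illustration, and fill_tip_clv is a hypothetical helper rather than a libpll function.

```c
/* Hypothetical helper: fills clv (sites*states doubles) from per-site
   state bitmasks. Entry (i, s) is 1.0 when bit s is set for site i,
   i.e. when state s is compatible with the observed character. */
void fill_tip_clv(const unsigned int *site_states,
                  unsigned int sites,
                  unsigned int states,
                  double *clv)
{
  unsigned int i, s;

  for (i = 0; i < sites; ++i)
    for (s = 0; s < states; ++s)
      clv[i * states + s] = ((site_states[i] >> s) & 1u) ? 1.0 : 0.0;
}
```

A fully ambiguous site (all bits set) yields a vector of ones, so it does not favour any state.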
void pll_set_subst_params(pll_partition_t *partition, unsigned int params_index, const double *params);
Sets the parameters for substitution model with index params_index, where params_index ranges from
0 to rate_matrices-1, as specified in the pll_partition_create() call. Array params should contain
exactly (states*states-states)/2 parameters of type double. These values correspond to the upper
triangle elements (above the main diagonal) of the rate matrix.
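The parameter count and the position of a given rate inside params can be computed as sketched below. The helpers are hypothetical, and the row-by-row ordering of the upper triangle (for nucleotides: AC, AG, AT, CG, CT, GT) is an assumption that should be checked against the library before relying on it.

```c
/* Hypothetical helpers, not part of libpll. */

/* Number of upper-triangle entries: (states*states - states) / 2. */
unsigned int subst_param_count(unsigned int states)
{
  return (states * states - states) / 2;
}

/* Linear index of rate (i, j), with i < j < states, assuming the upper
   triangle is stored row by row. */
unsigned int subst_param_index(unsigned int states,
                               unsigned int i,
                               unsigned int j)
{
  return i * states - i * (i + 1) / 2 + (j - i - 1);
}
```

With 4 states the array holds 6 values, and with 20 states it holds 190.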
void pll_set_frequencies(pll_partition_t *partition, unsigned int params_index, const double *frequencies);
Sets the base frequencies for the substitution model with index params_index, where params_index
ranges from 0 to rate_matrices-1, as specified in the pll_partition_create() call. The array of
base frequencies (frequencies) is copied into the instance. The order of bases in the array
depends on the encoding used when converting tip sequences to CLV. For example, if the pll_map_nt
map was used with the pll_set_tip_states() function to describe nucleotide data, then the order is
A, C, G, T. However, this can be arbitrarily set by adjusting the provided map.
void pll_set_pattern_weights(pll_partition_t *partition, const unsigned int *pattern_weights);
Sets the vector of pattern weights (pattern_weights) for partition. The function reads and copies
the first partition->sites elements of pattern_weights into partition->pattern_weights.
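Pattern weights count how many alignment columns share the same pattern. The sketch below derives them with a naive quadratic comparison of columns; it is only an illustration of the concept, count_patterns is a hypothetical name, and the library's own pll_compress_site_patterns() may order or encode patterns differently.

```c
/* Hypothetical helper: finds distinct alignment columns.
   seqs holds `tips` sequences of length `sites`.
   On return, weights[k] is the number of columns equal to the k-th
   distinct column (in order of first appearance) and first_site[k] is
   the index of its first occurrence. Both output arrays need room for
   `sites` entries. Returns the number of distinct columns. */
unsigned int count_patterns(const char **seqs,
                            unsigned int tips,
                            unsigned int sites,
                            unsigned int *weights,
                            unsigned int *first_site)
{
  unsigned int patterns = 0;
  unsigned int i, k, t;

  for (i = 0; i < sites; ++i)
  {
    /* look for an already recorded pattern equal to column i */
    for (k = 0; k < patterns; ++k)
    {
      for (t = 0; t < tips; ++t)
        if (seqs[t][i] != seqs[t][first_site[k]])
          break;
      if (t == tips)
        break;                      /* column i matches pattern k */
    }

    if (k == patterns)
    {
      first_site[patterns] = i;     /* new pattern */
      weights[patterns] = 1;
      ++patterns;
    }
    else
      ++weights[k];
  }
  return patterns;
}
```

The resulting weights array, truncated to the number of patterns, is the kind of input pll_set_pattern_weights() expects when the partition was created with that many sites.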
void pll_set_category_rates(pll_partition_t *partition, const double *rates);
Sets the rate categories for partition. The function reads and copies the first
partition->rate_cats elements of array rates into partition->rates.
int pll_update_invariant_sites(pll_partition_t *partition);
Updates the invariant sites array partition->invariant, according to the sequences in the
partition. This function is implicitly called by pll_update_invariant_sites_proportion() when the
specified proportion of invariant sites is greater than zero, but it must be explicitly called by
the client code if the sequences change.
int pll_update_invariant_sites_proportion(pll_partition_t *partition, unsigned int params_index, double prop_invar);
Updates the proportion of invariant sites for the rate matrix of partition with index params_index.
Note that this call does not implicitly update the transition probability matrices computed from
that rate matrix; this must be done explicitly, for example with a call to
pll_update_prob_matrices().
int pll_update_prob_matrices(pll_partition_t *partition, const unsigned int *params_index, const unsigned int *matrix_indices, const double *branch_lengths, unsigned int count);
Computes the transition probability matrices specified by the count indices in matrix_indices, for
all rate categories. A matrix with index matrix_indices[i] will be computed using the branch
length branch_lengths[i]. To compute the matrix for rate category j, the function uses the rate
matrix with index params_index[j]. Matrices are stored in partition->pmatrix[matrix_indices[i]].
Note that each such entry holds the matrices for all rate categories, stored consecutively in
memory.
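Since the matrices of all rate categories sit back to back inside one entry, a single probability can be located with an offset computation such as the one sketched here. The helper is hypothetical. The row_stride parameter stands for the number of doubles per matrix row: it equals states when rows are unpadded, and we assume it may be larger when the library pads rows for vectorization.

```c
#include <stddef.h>

/* Hypothetical helper: offset (in doubles) of the transition probability
   from state `from` to state `to` for rate category `cat`, inside one
   probability matrix entry. Assumes row-major matrices of `states` rows,
   each `row_stride` doubles wide, stored consecutively per category. */
size_t pmatrix_offset(unsigned int states,
                      unsigned int row_stride,
                      unsigned int cat,
                      unsigned int from,
                      unsigned int to)
{
  return (size_t)cat * states * row_stride
         + (size_t)from * row_stride
         + to;
}
```

For unpadded 4-state matrices, the matrix of the third rate category (cat = 2) starts at offset 32.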
int pll_update_eigen(pll_partition_t *partition, unsigned int params_index);
Updates the eigenvectors (partition->eigenvecs[params_index]), inverse eigenvectors
(partition->inv_eigenvecs[params_index]), and eigenvalues (partition->eigenvals[params_index]) using
the substitution parameters (partition->subst_params[params_index]) and base frequencies
(partition->frequencies[params_index]) specified by params_index.
void pll_show_pmatrix(pll_partition_t *partition, unsigned int index, unsigned int float_precision);
Prints the transition probability matrices for each rate category of partition associated with
index to standard output. The floating point precision is dictated by float_precision.
unsigned int pll_count_invariant_sites(pll_partition_t *partition, unsigned int *state_inv_count);
Returns the number of invariant sites in the sequence alignment from partition. The array
state_inv_count must be of size partition->states and is filled such that entry i contains the
count of invariant sites for state i.
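The meaning of the returned count and of state_inv_count can be illustrated on raw sequences, as in the sketch below. It treats a site as invariant only when every tip shows the same unambiguous character, which is a simplifying assumption; how the library handles ambiguity codes is not covered here. The function name and the alphabet argument are hypothetical.

```c
/* Hypothetical helper: counts invariant sites in `tips` (>= 1) sequences
   of length `sites`. alphabet[s] is the character of state s.
   state_inv_count must have room for `states` entries; on return,
   entry s holds the number of invariant sites showing state s.
   Returns the total number of invariant sites. */
unsigned int count_invariant(const char **seqs,
                             unsigned int tips,
                             unsigned int sites,
                             const char *alphabet,
                             unsigned int states,
                             unsigned int *state_inv_count)
{
  unsigned int i, t, s;
  unsigned int total = 0;

  for (s = 0; s < states; ++s)
    state_inv_count[s] = 0;

  for (i = 0; i < sites; ++i)
  {
    /* compare every tip with the first one */
    for (t = 1; t < tips; ++t)
      if (seqs[t][i] != seqs[0][i])
        break;
    if (t < tips)
      continue;                     /* variable site */

    /* record the state shared by all tips, if it is in the alphabet */
    for (s = 0; s < states; ++s)
      if (alphabet[s] == seqs[0][i])
      {
        ++state_inv_count[s];
        ++total;
        break;
      }
  }
  return total;
}
```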
void pll_update_partials(pll_partition_t *partition, const pll_operation_t *operations, unsigned int count);
Updates the count conditional probability vectors (CPV) defined by the entries of operations, in
the order they appear in the array. Each operations entry describes one CPV from partition. See
also pll_operation_t.
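Conceptually, each operation combines the vectors of two child nodes through their transition probability matrices (Felsenstein's pruning step). The sketch below shows that computation for a single site and a single rate category, without scaling or vectorization. The function name and the row-major, unpadded matrix layout are assumptions; the library's kernels are considerably more elaborate.

```c
/* Hypothetical single-site, single-category update:
   parent[s] = (sum_x L[s][x] * left[x]) * (sum_y R[s][y] * right[y])
   Matrices are row-major with `states` rows and columns. */
void update_partial_site(unsigned int states,
                         const double *left_clv,
                         const double *right_clv,
                         const double *left_matrix,
                         const double *right_matrix,
                         double *parent_clv)
{
  unsigned int s, x;

  for (s = 0; s < states; ++s)
  {
    double left_term = 0.0;
    double right_term = 0.0;

    for (x = 0; x < states; ++x)
    {
      left_term  += left_matrix[s * states + x] * left_clv[x];
      right_term += right_matrix[s * states + x] * right_clv[x];
    }
    parent_clv[s] = left_term * right_term;
  }
}
```

With identity matrices (zero-length branches) the parent vector is simply the element-wise product of the two child vectors.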
void pll_show_clv(pll_partition_t *partition, unsigned int clv_index, int scaler_index, unsigned int float_precision);
Prints to standard output the conditional probability vector with index clv_index from partition,
using the scale buffer with index scaler_index. If no scale buffer was used, then scaler_index
must be passed the value PLL_SCALE_BUFFER_NONE. The floating-point precision (number of digits
after the decimal point) is dictated by float_precision. The output uses brackets, curly braces and
round brackets to group the values by site, rate category and state, respectively.
double pll_compute_root_loglikelihood(pll_partition_t *partition, unsigned int clv_index, int scaler_index, const unsigned int *freqs_index, double *persite_lnl);
Evaluates the log-likelihood of a rooted tree, for the vector of conditional probabilities
(partials) with index clv_index, scale buffer with index scaler_index (or PLL_SCALE_BUFFER_NONE),
and base frequencies arrays with indices freqs_index (one per rate category). If persite_lnl is
not NULL, then it must be large enough to hold partition->sites double-precision values, and will
be filled with the per-site log-likelihoods.
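For one site, the quantity whose logarithm enters the sum is a weighted average over rate categories of the frequency-weighted root vector. The sketch below computes that per-site likelihood under simplifying assumptions: no scaling, no invariant sites, an unpadded CLV with rate_cats*states entries per site, and frequency arrays stored back to back in one flat array. All names are hypothetical. The total log-likelihood would then be the sum over sites of the pattern weight times the logarithm of this value.

```c
#include <stddef.h>

/* Hypothetical helper: likelihood of one site at the root.
   site_clv     rate_cats * states conditional probabilities
   freqs        frequency arrays of `states` doubles, stored back to back
   freqs_index  for each rate category, which frequency array to use
   rate_weights weight of each rate category (expected to sum to 1) */
double site_likelihood(unsigned int states,
                       unsigned int rate_cats,
                       const double *site_clv,
                       const double *freqs,
                       const unsigned int *freqs_index,
                       const double *rate_weights)
{
  double likelihood = 0.0;
  unsigned int c, s;

  for (c = 0; c < rate_cats; ++c)
  {
    const double *f = freqs + (size_t)freqs_index[c] * states;
    double cat_likelihood = 0.0;

    for (s = 0; s < states; ++s)
      cat_likelihood += f[s] * site_clv[c * states + s];

    likelihood += rate_weights[c] * cat_likelihood;
  }
  return likelihood;
}
```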
double pll_compute_edge_loglikelihood(pll_partition_t *partition, unsigned int parent_clv_index, int parent_scaler_index, unsigned int child_clv_index, int child_scaler_index, unsigned int matrix_index, const unsigned int *freqs_index, double *persite_lnl);
Evaluates the log-likelihood of an unrooted tree, by providing the conditional probability vectors
(partials) for two nodes that share an edge with indices parent_clv_index resp. child_clv_index,
scale buffers with indices parent_scaler_index resp. child_scaler_index (or PLL_SCALE_BUFFER_NONE),
the transition probability matrix with index matrix_index and base frequencies arrays with indices
freqs_index (one per rate category). If persite_lnl is not NULL, then it must be large enough to
hold partition->sites double-precision values, and will be filled with the per-site
log-likelihoods.
Name
libpll — Phylogenetic Likelihood Library
Synopsis
Partition management
pll_partition_t *pll_partition_create(unsigned int tips, unsigned int clv_buffers, unsigned int states, unsigned int sites, unsigned int rate_matrices, unsigned int prob_matrices, unsigned int rate_cats, unsigned int scale_buffers, unsigned int attributes);
void pll_partition_destroy(pll_partition_t *partition);
Partition parameters setup
int pll_set_tip_states(pll_partition_t *partition, unsigned int tip_index, const unsigned int *map, const char *sequence);
int pll_set_tip_clv(pll_partition_t *partition, unsigned int tip_index, const double *clv);
void pll_set_pattern_weights(pll_partition_t *partition, const unsigned int *pattern_weights);
int pll_set_asc_bias_type(pll_partition_t *partition, int asc_bias_type);
void pll_set_asc_state_weights(pll_partition_t *partition, const unsigned int *state_weights);
void pll_set_subst_params(pll_partition_t *partition, unsigned int params_index, const double *params);
void pll_set_frequencies(pll_partition_t *partition, unsigned int params_index, const double *frequencies);
void pll_set_category_rates(pll_partition_t *partition, const double *rates);
void pll_set_category_weights(pll_partition_t *partition, const double *rate_weights);
Transition probability matrices
int pll_update_prob_matrices(pll_partition_t *partition, const unsigned int *params_index, const unsigned int *matrix_indices, const double *branch_lengths, unsigned int count);
int pll_update_eigen(pll_partition_t *partition, unsigned int params_index);
void pll_show_pmatrix(pll_partition_t *partition, unsigned int index, unsigned int float_precision);
Invariant sites
unsigned int pll_count_invariant_sites(pll_partition_t *partition, unsigned int *state_inv_count);
int pll_update_invariant_sites(pll_partition_t *partition);
int pll_update_invariant_sites_proportion(pll_partition_t *partition, unsigned int params_index, double prop_invar);
Conditional probability vectors
void pll_update_partials(pll_partition_t *partition, const pll_operation_t *operations, unsigned int count);
void pll_show_clv(pll_partition_t *partition, unsigned int clv_index, int scaler_index, unsigned int float_precision);
Evaluation of log-Likelihood
double pll_compute_root_loglikelihood(pll_partition_t *partition, unsigned int clv_index, int scaler_index, const unsigned int *freqs_index, double *persite_lnl);
double pll_compute_edge_loglikelihood(pll_partition_t *partition, unsigned int parent_clv_index, int parent_scaler_index, unsigned int child_clv_index, int child_scaler_index, unsigned int matrix_index, const unsigned int *freqs_index, double *persite_lnl);
Likelihood function derivatives
int pll_update_sumtable(pll_partition_t *partition, unsigned int parent_clv_index, unsigned int child_clv_index, const unsigned int *params_indices, double *sumtable);
int pll_compute_likelihood_derivatives(pll_partition_t *partition, int parent_scaler_index, int child_scaler_index, double branch_length, const unsigned int *params_indices, const double *sumtable, double *d_f, double *dd_f);
FASTA file handling
pll_fasta_t *pll_fasta_open(const char *filename, const unsigned int *map);
int pll_fasta_getnext(pll_fasta_t *fd, char **head, long *head_len, char **seq, long *seq_len, long *seqno);
void pll_fasta_close(pll_fasta_t *fd);
long pll_fasta_getfilesize(pll_fasta_t *fd);
long pll_fasta_getfilepos(pll_fasta_t *fd);
int pll_fasta_rewind(pll_fasta_t *fd);
PHYLIP file handling
pll_msa_t *pll_phylip_parse_msa(const char *filename, unsigned int *msa_count);
void pll_msa_destroy(pll_msa_t *msa);
Newick handling
pll_rtree_t *pll_rtree_parse_newick(const char *filename, unsigned int *tip_count);
pll_utree_t *pll_utree_parse_newick(const char *filename, unsigned int *tip_count);
pll_utree_t *pll_utree_parse_newick_string(char *s, unsigned int *tip_count);
Unrooted tree structure manipulation
void pll_utree_destroy(pll_utree_t *root);
void pll_utree_show_ascii(pll_utree_t *tree, int options);
char *pll_utree_export_newick(pll_utree_t *root);
int pll_utree_traverse(pll_utree_t *root, int (*cbtrav)(pll_utree_t *), pll_utree_t **outbuffer, unsigned int *trav_size);
unsigned int pll_utree_query_tipnodes(pll_utree_t *root, pll_utree_t **node_list);
unsigned int pll_utree_query_innernodes(pll_utree_t *root, pll_utree_t **node_list);
void pll_utree_create_operations(pll_utree_t **trav_buffer, unsigned int trav_buffer_size, double *branches, unsigned int *pmatrix_indices, pll_operation_t *ops, unsigned int *matrix_count, unsigned int *ops_count);
int pll_utree_check_integrity(pll_utree_t *root);
pll_utree_t *pll_utree_clone(pll_utree_t *root);
pll_utree_t *pll_rtree_unroot(pll_rtree_t *root);
int pll_utree_every(pll_utree_t *node, int (*cb)(pll_utree_t *));
Rooted tree structure manipulation
void pll_rtree_destroy(pll_rtree_t *root);
void pll_rtree_show_ascii(pll_rtree_t *tree, int options);
char *pll_rtree_export_newick(pll_rtree_t *root);
int pll_rtree_traverse(pll_rtree_t *root, int (*cbtrav)(pll_rtree_t *), pll_rtree_t **outbuffer, unsigned int *trav_size);
unsigned int pll_rtree_query_tipnodes(pll_rtree_t *root, pll_rtree_t **node_list);
unsigned int pll_rtree_query_innernodes(pll_rtree_t *root, pll_rtree_t **node_list);
void pll_rtree_create_operations(pll_rtree_t **trav_buffer, unsigned int trav_buffer_size, double *branches, unsigned int *pmatrix_indices, pll_operation_t *ops, unsigned int *matrix_count, unsigned int *ops_count);
void pll_rtree_create_pars_buildops(pll_rtree_t **trav_buffer, unsigned int trav_buffer_size, pll_pars_buildop_t *ops, unsigned int *ops_count);
void pll_rtree_create_pars_recops(pll_rtree_t **trav_buffer, unsigned int trav_buffer_size, pll_pars_recop_t *ops, unsigned int *ops_count);
Topological rearrangement moves
int pll_utree_spr(pll_utree_t *p, pll_utree_t *r, pll_utree_rb_t *rb, double *branch_lengths, unsigned int *matrix_indices);
int pll_utree_spr_safe(pll_utree_t *p, pll_utree_t *r, pll_utree_rb_t *rb, double *branch_lengths, unsigned int *matrix_indices);
int pll_utree_nni(pll_utree_t *p, int type, pll_utree_rb_t *rb);
int pll_utree_rollback(pll_utree_rb_t *rollback, double *branch_lengths, unsigned int *matrix_indices);
Parsimony functions
int pll_set_parsimony_sequence(pll_parsimony_t *pars, unsigned int tip_index, const unsigned int *map, const char *sequence);
pll_parsimony_t *pll_parsimony_create(unsigned int *tips, unsigned int states, unsigned int sites, double *score_matrix, unsigned int score_buffers, unsigned int ancestral_buffers);
double pll_parsimony_build(pll_parsimony_t *pars, pll_pars_buildop_t *operations, unsigned int count);
void pll_parsimony_reconstruct(pll_parsimony_t *pars, const unsigned int *map, pll_pars_recop_t *operations, unsigned int count);
double pll_parsimony_score(pll_parsimony_t *pars, unsigned int score_buffer_index);
void pll_parsimony_destroy(pll_parsimony_t *pars);
Auxiliary functions
int pll_compute_gamma_cats(double alpha, unsigned int categories, double *output_rates);
void *pll_aligned_alloc(size_t size, size_t alignment);
void pll_aligned_free(void *ptr);
unsigned int *pll_compress_site_patterns(char **sequence, const unsigned int *map, int count, int *length);
Core functions
void pll_core_create_lookup(unsigned int states, unsigned int rate_cats, double *lookup, const double *left_matrix, const double *right_matrix, unsigned int *tipmap, unsigned int tipmap_size, unsigned int attrib);
void pll_core_update_partial_tt(unsigned int states, unsigned int sites, unsigned int rate_cats, double *parent_clv, unsigned int *parent_scaler, const unsigned char *left_tipchars, const unsigned char *right_tipchars, const unsigned int *tipmap, unsigned int tipmap_size, const double *lookup, unsigned int attrib);
void pll_core_update_partial_ti(unsigned int states, unsigned int sites, unsigned int rate_cats, double *parent_clv, unsigned int *parent_scaler, const unsigned char *left_tipchars, const double *right_clv, const double *left_matrix, const double *right_matrix, const unsigned int *right_scaler, const unsigned int *tipmap, unsigned int attrib);
void pll_core_update_partial_ii(unsigned int states, unsigned int sites, unsigned int rate_cats, double *parent_clv, unsigned int *parent_scaler, const double *left_clv, const double *right_clv, const double *left_matrix, const double *right_matrix, const unsigned int *left_scaler, const unsigned int *right_scaler, unsigned int attrib);
int pll_core_update_sumtable_ti(unsigned int states, unsigned int sites, unsigned int rate_cats, const double *parent_clv, const unsigned char *left_tipchars, double **eigenvecs, double **inv_eigenvecs, double **freqs, unsigned int *tipmap, double *sumtable, unsigned int attrib);
int pll_core_likelihood_derivatives(unsigned int states, unsigned int sites, unsigned int rate_cats, const double *rate_weights, const unsigned int *parent_scaler, const unsigned int *child_scaler, const int *invariant, const unsigned int *pattern_weights, double branch_length, const double *prop_invar, double **freqs, const double *rates, double **eigenvals, const double *sumtable, double *d_f, double *dd_f, unsigned int attrib);
double pll_core_edge_loglikelihood_ii(unsigned int states, unsigned int sites, unsigned int rate_cats, const double *parent_clv, const unsigned int *parent_scaler, const double *child_clv, const unsigned int *child_scaler, const double *pmatrix, double **frequencies, const double *rate_weights, const unsigned int *pattern_weights, const double *invar_proportion, const int *invar_indices, const unsigned int *freqs_indices, double *persite_lnl, unsigned int attrib);
double pll_core_edge_loglikelihood_ti(unsigned int states, unsigned int sites, unsigned int rate_cats, const double *parent_clv, const unsigned int *parent_scaler, const unsigned char *tipchars, const unsigned int *tipmap, const double *pmatrix, double **frequencies, const double *rate_weights, const unsigned int *pattern_weights, const double *invar_proportion, const int *invar_indices, const unsigned int *freqs_indices, double *persite_lnl, unsigned int attrib);
int pll_core_update_pmatrix(double *pmatrix, unsigned int states, double rate, double prop_invar, double branch_length, double *eigenvals, double *eigenvecs, double *inv_eigenvecs, unsigned int attrib);
Version History
New features and important modifications of libpll (short lived or minor bug releases may not be
mentioned):
v0.2.0 released September 9th, 2016
First public release.
v0.3.0 released May 15th, 2017
Added faster vectorizations for 20-state and arbitrary-state models, unweighted parsimony
functions, randomized stepwise addition, portable functions for parsing trees from
C-strings, per-rate category scalers for preventing numerical underflows. Modified newick
exporting function to accept callbacks for custom printing. Fixed derivatives computation,
parsing of branch lengths, invariant sites computation, log-likelihood computation for
cases where we have scaling and patterns, ascertainment bias computation, per-site
log-likelihood computation, memory leaks. Added run-time detection of hardware.
v0.3.1 released May 17th, 2017
Correct updating of padded eigen-decomposition arrays for models whose number of states
is not a power of two. Added portable hardware detection for clang and GCC.
v0.3.2 released July 12th, 2017
Added optional per-rate category scalers for protein and generic kernels. Improved fix for
negative transition probability matrices caused by numerics. Fixed initialization of tip
CLVs when using ascertainment bias correction with non-DNA sequences. Fixed excessive
memory allocation when compressing site patterns and issue with PHYLIP parsing when header
ends with CRLF.
libpll 0.3.2 July 12, 2017 libpll(3)
