PAM Statistics
Introduction
The PAM data structure can be used to generate many diversity statistics. By default, many of the statistics presented in Soberon and Cavner (2015) are generated as well as some phylogenetic diversity metrics if a tree is provided. Additionally, new statistics can be created by decorating functions with the appropriate statistic type and adding them to the stats instance.
Generating Default Statistics
You can generate the base statistics without any special configuration of a PamStats instance. Simply supply the PAM and optionally a tree then request the category of statistics you would like to be returned.
>>> stats = PamStats(pam, tree=my_tree)
>>> site_stats = stats.calculate_site_statistics()
>>> species_stats = stats.calculate_species_statistics()
>>> diversity_stats = stats.calculate_diversity_statistics()
Generate Statistics and Write to GeoJSON File
If you wish to visualize the outputs of the statistics in a GIS software like QGIS, you can do so by writing the outputs as GeoJSON.
>>> import json
>>> from lmpy.matrix import Matrix
>>> from lmpy.statistic.pam_stats import PamStats
>>> from lmpy.spatial.geojsonify import geojsonify_matrix
>>> # Load PAM from file
>>> pam_filename = 'my_pam.lmm'
>>> pam = Matrix.load(pam_filename)
>>> stats = PamStats(pam)
>>> # Create and write GeoJSON
>>> matrix_geojson = geojsonify_matrix(pam, resolution=0.5, omit_values=[0])
>>> with open('matrix.geojson', mode='wt') as out_json:
... json.dump(matrix_geojson, out_json)
Adding New Statistics
It is fairly simple at add new metrics to the computations. The first step is to identify which class of metrics the new metric belongs to, this will determine how the metric is called within the statistics package and what the expected output is.
For example, if we wanted to add a metric that computed the sum of tip lengths for the species present at a site, we would need the tree of those species as input and we would produce a single value for each site. This maps to TreeMetric and so we can define our function with a tree as input and add the TreeMetric decorator.
See: register_metric
@TreeMetric
def sum_tip_lengths(tree):
tip_length_sum = 0
for node in tree.nodes():
if node.is_leaf():
tip_length_sum += node.edge_length
return tip_length_sum
Then we can register this new metric with the statistics package and it will automatically be calculated at the appropriate time with the appropriate inputs.
>>> stats = PamStats(pam, tree=tree)
>>> stats.register_metric('Sum tip lengths', sum_tip_lengths)
>>> site_stats = stats.calculate_site_statistics()
Covariance Matrix Metrics
Covariance matrix metrics take a PAM as input and produce a site by site or species by species matrix of metric values.
sigma_sites
Matrix of covariance of composition of sites. \(\mathbf{\Sigma}_{sites}(j,k) = \frac{1}{S}\sum_{i=1}^{S}\delta_{j,l}\delta_{k,l} - \frac{\alpha_j\alpha_k}{S^2}\)
sigma_species
Matrix of covariance of ranges of species. \(\mathbf{\Sigma}_{species}(h,i) = \frac{1}{N}\sum_{j=i}^{N}\delta_{i,j}\delta_{h,j} - \frac{\omega_i\omega_h}{N^2}\)
Diversity Metrics
Diversity metrics take a PAM as input and produce a single metric value for the entire study.
schluter_species_variance_ratio
Schluter species-ranges covariance. \(V_{species} = \frac{\bar{\psi}^* - N /\beta_W^2}{1/\beta_W - \bar{\varphi}^* / S}\)
schluter_site_variance_ratio
Schluter sites-composition covariance. \(V_{sites} = \frac{\bar{\varphi}^* - S /\beta_W^2}{1/\beta_W - \bar{\psi}^* / N}\)
num_sites
Num sites is the total number of sites in the study area that have any species present.
num_species
Num species is the total number of species in the study that are present at any site.
whittaker
Whittaker’s multaplicative beta diversity metric for a PAM. \(\beta_W = \frac{1}{\bar{\omega}^{*}}\)
lande
Lande is Lande’s addative beta diversity metric for a PAM. \(\beta_A = S(1 - 1/\beta_W)\)
legendre
Legendre is Legendre’s beta diversity metric for a PAM. \(\beta_L = SS(\mathbf{X}) = SN / \beta_W - \left (\sum_{j=1}^{S}\omega_j^2 \right ) / N\)
c_score
C-score is the Stone & Robers checkerboard score for the PAM. \(C = \frac{2}{S(S-1)}\left [ \sum_{i=1}^{N} \sum_{h<i}(\omega_i - \omega_{i,h})(\omega_h - \omega_{i,h}) \right ]\)
Species Matrix Metrics
Species matrix metrics take a PAM as input and return a column of metric values for each species in the study.
omega
Omega is the range size for each species.
omega_proportional
Omega proportional is the range size of each species as a proportion of the total number of sites. \(\omega_i^* = \frac{\bar{\rho}_i}{\bar{\psi}_i^* - \beta_W^{-1}}\)
psi
Psi is the range richness of each species. \(\psi_j = \sum_{i=1}^{N}\delta_{i,j} \alpha_i\)
psi_average_proportional
Psi average proportional is the mean proportional species diversity.
Site Matrix Metrics
Site matrix metrics take a PAM as input and return a column of values for each site in the study area.
alpha
Alpha diversity is the number of species present at each site.
alpha_proportional
Alpha proportional diversity is the ratio of the number of species present at each site to the total number of species in the entire study area. \(\alpha_j^* = \frac{\tau_j}{\bar{\varphi}_j^*-\beta_W^{-1}}\)
phi
Phi is the sum of the range size of the species present at each site. \(\varphi_i = \sum_{j=1}^{S}\delta_{i,j} \omega_j\)
phi_average_proportional
Phi average proportional is the mean proportional range size of the species present at each site.
PAM Distance Matrix Metrics
PAM distance matrix metrics are site-based metrics generated using a PAM and a distance matrix for the tree over the entire study area. These statistics return a single column of values for each site.
pearson_correlation
Pearson correlation is the pearson correlation coefficient for each site.
Tree Metrics
Tree metrics are site-based metrics generated from a phylogenetic tree that only contains tips for species present at a site. These metrics return a single value for the current site.
phylogenetic_diversity
Phylogenetic diversity is the sum of all of the branch lengths in the tree that only contains species present at a site.
Tree Distance Matrix Metrics
Tree distance matrix metrics are site-based statistics generated from a species by species distance matrix for the species present at a particular site. A single value is returned for these metrics for the current site.
mean_nearest_taxon_distance
Mean nearest taxon distance, or MNTD, is the mean of the distance from each tip to the closest tip to it for a tree of all species present at a site.
mean_pairwise_distance
Mean pairwise distance, or MPD, is the mean of the distances of each tip to all other tips in the tree of species present at a site.
sum_pairwise_distance
Sum pairwise distance is the sum of the distances from each tip to all other tips in a tree of the species present at a site.