Wednesday 2 March 2016

Simple models for abundance


Simple models for abundance


Ecology is fundamentally the study of the distribution and abundance of living things, and a core goal is to understand the factors that influence the spatial and temporal variation in distribution and abundance patterns. Over the last few months, I have had a few people asking me how to model abundance, so I decided to make a quick post on the topic.

Abundance is frequently measured as a count, such as the number of individuals of a target species seen in a plot or transect. Commonly, you will also record some features of the site that you think may influence the abundance of the target species – vegetation cover, for example. The Poisson distribution is the fundamental probability distribution to model abundance data, as it supports discrete positive values (e.g., counts). The Poisson-log Generalised Linear Model is the most common regression model for studying the relationship between abundance and the factors of interest. In a Poisson-log GLM, the relationship between abundance and the factors of interest is linear on the log-scale, and the abundance follows a Poisson distribution with mean abundance λi:

            log(λi) = intercept + slope * xi; or equivalently λi = exp(intercept + slope * xi)
            Abundance ~ Poisson(λi)

where xi is the factor of interest. Let’s do an example in R, where we are interested in how the vegetation cover in a plot, measured as a proportion of the plot covered by vegetation, influences the abundance of Australian huntsman spiders.  


cute and adorable huntsman spider on Newland Head Conservation Park, South Australia 

            First of all, let’s simulate some data in R:

                     
            #### sample.size: number of plots surveyed for huntsman, n = 30

  set.seed(1000)
sample.size<-30

            #### Simulate vegetation cover data for the survey plots from a Beta distribution

veg.cov<-rbeta(sample.size, 0.5, 0.5)

#### Now, let’s define the parameters of the Poisson-log GLM for the relationship between
#### the huntsman abundance and the vegetation cover

intercept.sim=-2    #### Intercept  
slope.sim=5  #### Slope

#### Define the mean huntsman abundance λ as a function of the vegetation cover

lambda<-exp(intercept.sim+slope.sim*veg.cov)

#### Finally, obtain the huntsman abundance in each and every survey plot by drawing values
#### from a Poisson distribution with mean λ


abundance<-rpois(sample.size, lambda)

##### Let’s have a look at the relationship between the huntsman abundance and the
#### vegetation cover


plot(abundance~veg.cov, xlab="Vegetation cover (%)", ylab="Huntsman abundance", pch=19, cex=3)

The relationship between the huntsman abundance and the vegetation cover

            Now, it is time for the formal modelling to answer the question, does vegetation cover influences the huntsman abundance? We respond to the question by doing a Poisson-log GLM on the simulated data.

#### Fit a Poisson-log GLM to the huntsman abundance (dependent variable) and
#### vegetation cover data (independent).

fit1<-glm(abundance~veg.cov, family=poisson())

#### Have a look at the summary Table.

summary(fit1)

The ‘Estimate’ column shows that ‘veg.cov’ indeed has a strong positive effect, and the column 'Pr(>|z|)’ indicates that the effect is statistically significant  (p <0.05). Therefore, we conclude that the huntsman abundance increases with increasing vegetation cover.


The summary table for the Poisson-log model (fit1), obtained by running summary(fit1)

We have simulated the data and that means that we know exactly the values of the intercept (intercept.sim = -2) and the slope (slope.sim  = 5) of the Poisson-log GLM. Is the algorithm able to recover these values? That is, how similar are the values for the intercept and the slope estimated by the fitted model to the simulated values? Obviously, the better the best. Having a look at the column ‘Estimate’, we can see that the values compare well. To further confirm the reliability of the fitted model, we can calculate the 95% Confidence Intervals for the intercept and the slope for the fitted model:

#### Calculate the 95% Confidence Intervals

conf.int<-confint(fit1)

conf.int


95% Confidence Intervals for the coefficients of the Poisson-log regression

            Again, we see that our simulated values fall well within the 95% Confidence Intervals drawn from the fitted model. We conclude that our fitted model is good, and that it recovers the coefficients adequately. 

            This was a very simple, quick, example. Modelling abundance can be way more difficult and complicated. For example, here I have assumed that the huntsman abundance is perfectly observed, that means that we have counted all the spiders present in each and every survey plot. The reality is that this is rarely the case and that we commonly do not detect all the individuals present in a place. There are statistical tools to deal with this kind of biases, but those are an entirely different world


Huntsman spiders are very variable in colour, a trait that can influence the capacity of researchers to detect them and in turn biasing abundance estimates. This one was just a few metres away from the spider pictured in the previous photo. 

An annotated R script for conducting the Poisson-log GLM can be found in my GitHub repository: https://github.com/pablogarciadiaz/Code-Blog/blob/master/Poisson-Regression.R


No comments:

Post a Comment