Even Faster Gaussian Processes in STAN Using Functions

In my last post, I discussed how to get predictions from Gaussian Processes in STAN quickly using the analytical solution. I was able to get it down to 3.8 seconds, which is pretty quick. But I can do better.

One issue you may or may not have noticed is that, with the model in which posterior predictions are computed as generated quantities, your computer bogs down with anything over 1000 iterations (I know mine froze up). The problem is that generated quantities saves every variable into memory, including those large matrices \( \boldsymbol{K}_{obs}, \boldsymbol{K}_{obs}^{*}, \boldsymbol{K}^{*} \). We can resolve this, and speed things up, by using functions to calculate the predictive values while keeping the matrices as local variables within the function (so they are never saved to memory).

One obvious solution is to not calculate them within STAN, but to do it externally. Python can do this relatively quickly with judicious use of numpy (whose linear algebra routines are written in C and therefore very fast) and numba's just-in-time compiler. However, since STAN compiles everything to C++, and many of its linear algebra functions are optimized for speed, it makes sense to do it all in STAN. Fortunately, we can write our own functions within STAN, which are then compiled and executed as C++. The code is as follows:

import numpy as np
import matplotlib.pyplot as plt
import pystan as pyst

X = np.arange(-5, 6)
Y_m = np.sin(X)
Y = Y_m + np.random.normal(0, 0.5, len(Y_m))

X_pred = np.linspace(-5, 7, 100)

gp_pred = """
functions{
	// radial basis, or square exponential, function
	matrix rbf(int Nx, vector x, int Ny, vector y, real eta, real rho){
		matrix[Nx,Ny] K;
		for(i in 1:Nx){
			for(j in 1:Ny){
				K[i,j] = eta*exp(-rho*pow(x[i]-y[j], 2));
			}
		}
		return K;
	}

        // all of the posterior calculations are now done within this function
	vector post_pred_rng(real eta, real rho, real sn, int No, vector xo, int Np, vector xp, vector yobs){
			matrix[No,No] Ko;
			matrix[Np,Np] Kp;
			matrix[No,Np] Kop;
			matrix[Np,No] Ko_inv_t;
			vector[Np] mu_p;
			matrix[Np,Np] Tau;
			matrix[Np,Np] L2;
			vector[Np] z;
			vector[Np] Yp;

                        // note: the sn noise is added via a diagonal matrix, removing the need for a for-loop
			Ko = rbf(No, xo, No, xo, eta, rho) + diag_matrix(rep_vector(1, No))*sn ;
			Kp = rbf(Np, xp, Np, xp, eta, rho) + diag_matrix(rep_vector(1, Np))*sn ;
			Kop = rbf(No, xo, Np, xp, eta, rho) ;
			
			Ko_inv_t = Kop' / Ko;
			mu_p = Ko_inv_t * yobs;
			Tau = Kp - Ko_inv_t * Kop;
			L2 = cholesky_decompose(Tau);
			// normal_rng returns a single draw, so fill z with independent N(0,1) draws
			for(n in 1:Np) z[n] = normal_rng(0, 1);
			Yp = mu_p + L2*z;
			return Yp;
	}
}
data{
	int<lower=1> N1;
	int<lower=1> N2;
	vector[N1] X;
	vector[N1] Y;
	vector[N2] Xp ;
}
transformed data{
	vector[N1] mu;
	for(n in 1:N1) mu[n] = 0;
}
parameters{
	real<lower=0> eta_sq;
	real<lower=0> inv_rho_sq;
	real<lower=0> sigma_sq;
}
transformed parameters{
	real<lower=0> rho_sq;
	rho_sq = inv(inv_rho_sq);
}
model{
	matrix[N1,N1] Sigma;
	matrix[N1,N1] L_S;

        // can actually use those functions here, too!!
	Sigma = rbf(N1, X, N1, X, eta_sq, rho_sq) + diag_matrix(rep_vector(1, N1))*sigma_sq;

	L_S = cholesky_decompose(Sigma);
	Y ~ multi_normal_cholesky(mu, L_S);

	eta_sq ~ cauchy(0,5);
	inv_rho_sq ~ cauchy(0,5);
	sigma_sq ~ cauchy(0, 5);
}
generated quantities{
	vector[N2] Ypred;
   
        // this is where the magic happens. note that now we only store Ypred,
        // instead of all those extraneous matrices
	Ypred = post_pred_rng(eta_sq, rho_sq, sigma_sq,
		N1, X, N2, Xp, Y);
}
"""

gp1 = pyst.StanModel(model_code=gp_pred)

data1 = {'N1': len(X), 'X': X, 'Y': Y, 'Xp': X_pred, 'N2': len(X_pred)}
fit_gp1 = gp1.sampling(data1, iter=1000)
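Once sampling finishes, you can pull Ypred out of the fit object and plot the posterior predictions against the data. Here is a minimal sketch; the variable names follow the code above, and the 95% interval is just my choice:

# posterior predictive draws, shape (n_draws, N2)
Ypred_samp = fit_gp1.extract()['Ypred']

# plot the posterior mean and a 95% interval at each prediction point
plt.fill_between(X_pred, np.percentile(Ypred_samp, 2.5, axis=0),
	np.percentile(Ypred_samp, 97.5, axis=0), alpha=0.3)
plt.plot(X_pred, Ypred_samp.mean(axis=0))
plt.plot(X, Y, 'ko')
plt.show()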

There are a couple of tricks here. First, I used a diagonal matrix \(\boldsymbol{I}\sigma_n\) to add the noise variance, removing a for-loop. It has the added benefit of cleaning up the code (I find for-loops messy to read; I like things tidy). Second, by using functions, the generated quantities block stores only the predictions, not all the extra matrices, freeing up tons of memory.

One other thing to note is that the prediction function ends with the suffix ‘_rng’. This is because it is a random number generator, drawing random observations from the posterior distribution. The ‘_rng’ suffix lets the function call other ‘_rng’ functions, chiefly normal_rng, which draws random standard normal deviates (and it means the function can only be used where random number generation is allowed, such as the generated quantities block). Those N(0,1) draws are what the Cholesky trick turns into draws from the posterior distribution (see the last post). One caveat: normal_rng returns only ONE number per call, so you fill a vector with independent draws using a short loop; wrapping a single call in rep_vector( ) would just copy the same draw Np times.

This code is extremely fast. After compilation, it executes 4 chains of 1000 iterations in roughly 1 second (compared to just over 3 seconds in the previous post). Further, since I no longer have memory issues, I can run more iterations. Whereas 5000 iterations froze my computer before, they now execute in 5 seconds. 10,000 iterations, previously unimaginable on my laptop, run in 10 seconds.

Pretty great.

Fast Gaussian Process Models in STAN

As described in an earlier post, Gaussian process models are a fast, flexible tool for making predictions. They’re relatively easy to program if you happen to know the parameters of your covariance function/kernel, but what if you want to estimate them from the data? There are several methods available, but my favorite so far is STAN. True, it requires programming the kernel by hand, but I actually find this easier to understand than trying to parse out the kernel functions from, say, scikit-learn.

STAN can fit GP models quickly, but there are certain tricks you can do that make it lightning fast and accurate. I’ve had trouble getting scikit to converge on a stable/accurate solution, but STAN does this with no problem. Plus, the Hamiltonian Monte-Carlo sampler is very quick for GPs (see the STAN User Manual for more).

Here’s a quick tutorial on how to fit GPs in STAN, and how to speed them up. First, let’s import our modules and simulate some fake data:

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

X = np.arange(-5, 6)
Y_m = np.sin(X)
Y = Y_m + np.random.normal(0, 0.5, len(Y_m))

Now, let’s develop the STAN model. Here’s the boilerplate model for a GP with a squared-exponential kernel (taken right from the STAN manual).

gp_no_pred = """
data{
	int<lower=1> N;
	vector[N] X;
	vector[N] Y;
}
transformed data{
	vector[N] mu;

	for(n in 1:N) mu[n] = 0;
}
parameters{
	real<lower=0> s_f;
	real<lower=0> inv_rho;
	real<lower=0> s_n;
}
transformed parameters{
	real<lower=0> rho;

	rho = inv(inv_rho);
}
model{
	matrix[N,N] Sigma;

	for(i in 1:(N-1)){
		for(j in (i+1):N){
			Sigma[i,j] = s_f * exp(-rho * pow(X[i]-X[j], 2));
                        Sigma[j,i] = Sigma[i,j];
		}
	}

	for(k in 1:N){
		Sigma[k,k] = s_f + s_n;
	}
	
	Y ~ multi_normal(mu, Sigma);
        s_f ~ cauchy(0,5);
        inv_rho ~ cauchy(0,5);
        s_n ~ cauchy(0,5);

}
"""

This model fits the parameters of the kernel. On my computer, it does so in about  0.15 seconds total (1000 iterations). 
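For completeness, compiling and sampling looks like the snippet below; a minimal sketch, where the object names (gp0, data0, fit_gp0) are just placeholders and pystan is imported the same way as in my other posts:

import pystan as pyst

# compile the model and sample with the defaults I used (4 chains, 1000 iterations)
gp0 = pyst.StanModel(model_code=gp_no_pred)
data0 = {'N': len(X), 'X': X, 'Y': Y}
fit_gp0 = gp0.sampling(data=data0, iter=1000)
print(fit_gp0)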

Now that we know what the hyperparameters are, we’d like to predict new values while incorporating the full uncertainty in those hyperparameters. The way the STAN manual says to go about this is to make a second vector containing the locations you’d like to predict, append it to the X values you have, declare a vector of predicted points as parameters, append that to the Y values you have, and feed the whole thing into the multivariate normal distribution as one big mush. The issue is that the covariance matrix, Sigma, gets very large very fast. Large covariance matrices take a while to invert inside the multivariate normal density, and that slows down the sampler.

Here’s the model, after making a vector of 100 prediction points to get a smooth line:

X_pred = np.linspace(-5, 7, 100)

gp_pred1 = """
data{
	int<lower=1> N1;
	int<lower=1> N2;
	vector[N1] X;
	vector[N1] Y;
	vector[N2] Xp;
}
transformed data{
	int<lower=1> N;
	vector[N1+N2] Xf;
	vector[N1+N2] mu;

	N = N1+N2;
	for(n in 1:N) mu[n] = 0;
	for(n in 1:N1) Xf[n] = X[n];
	for(n in 1:N2) Xf[N1+n] = Xp[n];
}
parameters{
	real<lower=0> s_f;
	real<lower=0> inv_rho;
	real<lower=0> s_n;
	vector[N2] Yp;
}
transformed parameters{
	real<lower=0> rho;

	rho = inv(inv_rho);
}
model{
	matrix[N,N] Sigma;
	vector[N] Yf;

	for(n in 1:N1) Yf[n] = Y[n];
	for(n in 1:N2) Yf[N1+n] = Yp[n];

	for(i in 1:(N-1)){
		for(j in (i+1):N){
			Sigma[i,j] = s_f * exp(-rho * pow(Xf[i]-Xf[j], 2));
			Sigma[j,i] = Sigma[i,j];
		}
	}

	for(k in 1:N){
		Sigma[k,k] = s_f + s_n;
	}
	
	Yf ~ multi_normal(mu, Sigma);

	s_f ~ cauchy(0,5);
	inv_rho ~ cauchy(0,5);
	s_n ~ cauchy(0, 5);
}
"""

This is extremely slow, taking about 81 seconds on my computer (up from less than one second). Taking the Cholesky decomposition of Sigma and using multi_normal_cholesky didn’t speed things up, either.
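For reference, that attempt only changes the likelihood; here is a sketch of the modified model block (the data, transformed data, and parameters blocks stay exactly as above):

model{
	matrix[N,N] Sigma;
	matrix[N,N] L_S;
	vector[N] Yf;

	for(n in 1:N1) Yf[n] = Y[n];
	for(n in 1:N2) Yf[N1+n] = Yp[n];

	for(i in 1:(N-1)){
		for(j in (i+1):N){
			Sigma[i,j] = s_f * exp(-rho * pow(Xf[i]-Xf[j], 2));
			Sigma[j,i] = Sigma[i,j];
		}
	}

	for(k in 1:N){
		Sigma[k,k] = s_f + s_n;
	}

	// pass the Cholesky factor to the likelihood instead of the full matrix
	L_S = cholesky_decompose(Sigma);
	Yf ~ multi_normal_cholesky(mu, L_S);

	s_f ~ cauchy(0,5);
	inv_rho ~ cauchy(0,5);
	s_n ~ cauchy(0, 5);
}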

We can speed this up by taking advantage of the analytical form of the solution. That is, once we know the hyperparameters of the kernel from the observed data, we can directly calculate the multivariate normal distribution of the predicted data:

\( P(Y_p | X_p, X_o, Y_o) \sim Normal(\boldsymbol{K}_{obs}^{*'} \boldsymbol{K}_{obs}^{-1} \boldsymbol{y}_{obs}, \boldsymbol{K}^{*} - \boldsymbol{K}_{obs}^{*'} \boldsymbol{K}_{obs}^{-1} \boldsymbol{K}_{obs}^{*}) \)

We can calculate those quantities directly within STAN. Then, as an added trick, we can take advantage of the Cholesky decomposition to generate random samples of \( Y_p \) within STAN as well. Here’s the annotated model:

gp_pred2 = """
data{
	int<lower=1> N1;
	int<lower=1> N2;
	vector[N1] X;
	vector[N1] Y;
	vector[N2] Xp;
}
transformed data{
	vector[N1] mu;

	for(n in 1:N1) mu[n] = 0;
}
parameters{
	real<lower=0> s_f;
	real<lower=0> inv_rho;
	real<lower=0> s_n;
	// This is going to be just a generic (0,1) vector
	// for use in generating random samples
	vector[N2] z; 
}
transformed parameters{
	real<lower=0> rho;

	rho = inv(inv_rho);
}
model{
	// kernel for only the observed data
	matrix[N1,N1] Sigma;

	for(i in 1:(N1-1)){
		for(j in (i+1):N1){
			Sigma[i,j] = s_f * exp(-rho * pow(X[i]-X[j], 2));
			Sigma[j,i] = Sigma[i,j];
		}
	}

	for(k in 1:N1){
		Sigma[k,k] = s_f + s_n;
	}
	
	// sampling statement for only the observed data
	Y ~ multi_normal(mu, Sigma);
	// generic sampling statement for z
	z ~ normal(0,1);

	s_f ~ cauchy(0,5);
	inv_rho ~ cauchy(0,5);
	s_n ~ cauchy(0, 5);
}
generated quantities{
	matrix[N1,N1] Ko;
	matrix[N2,N2] Kp;
	matrix[N1,N2] Kop;
	matrix[N1,N1] L;
	matrix[N1, N1] Ko_inv;
	vector[N2] mu_p;
	matrix[N2,N2] Tau;
	vector[N2] Yp;
	matrix[N2,N2] L2;

	// kernel for observed data
	for(i in 1:(N1-1)){
		for(j in (i+1):N1){
			Ko[i,j] = s_f * exp(-rho * pow(X[i]-X[j], 2));
			Ko[j,i] = Ko[i,j];
		}
	}

	for(k in 1:N1){
		Ko[k,k] = s_f + s_n;
	}

	// kernel for prediction data
	for(i in 1:(N2-1)){
		for(j in (i+1):N2){
			Kp[i,j] = s_f * exp(-rho * pow(Xp[i]-Xp[j], 2));
			Kp[j,i] = Kp[i,j];
		}
	}

	for(k in 1:N2){
		Kp[k,k] = s_f + s_n;
	}

	// kernel for observed-prediction cross
	for(i in 1:N1){
		for(j in 1:N2){
			Kop[i,j] = s_f * exp(-rho * pow(X[i]-Xp[j], 2));
		}
	}

	// Follow Algorithm 2.1 of Rasmussen and Williams
	// cholesky decompose Ko
	L = cholesky_decompose(Ko);
	Ko_inv = inverse(L') * inverse(L);
	// calculate the mean of the Y prediction
	mu_p = Kop' * Ko_inv * Y;
	// calculate the covariance of the Y prediction
	Tau = Kp - Kop' * Ko_inv * Kop;

	// Generate random samples from N(mu,Tau)
	L2 = cholesky_decompose(Tau);
	Yp = mu_p + L2*z;

}
"""

This model runs in about 3.5 seconds, a huge improvement over 81. If you want the predictions, just extract Yp, and that gives you predictions with full uncertainty. We can also compare the two outputs:
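A minimal sketch of running this model and pulling out the predictions, assuming the model string above is stored as gp_pred2 and pystan is imported as pyst (as in the earlier snippet):

gp2 = pyst.StanModel(model_code=gp_pred2)
data2 = {'N1': len(X), 'X': X, 'Y': Y, 'Xp': X_pred, 'N2': len(X_pred)}
fit_gp2 = gp2.sampling(data=data2, iter=1000)

# Yp holds the posterior predictive draws, shape (n_draws, N2)
Yp_samp = fit_gp2.extract()['Yp']
plt.fill_between(X_pred, np.percentile(Yp_samp, 2.5, axis=0),
	np.percentile(Yp_samp, 97.5, axis=0), alpha=0.3)
plt.plot(X_pred, Yp_samp.mean(axis=0))
plt.plot(X, Y, 'ko')
plt.show()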

Note: I have no idea why, but upping the iterations from 1000 to 5000 causes the analytical solution model to freeze my computer. I can’t quite figure that one out.

Gaussian Processes for Machine Learning in Python 1: Rasmussen and Williams, Chapter 2

Gaussian Processes for Machine Learning by Rasmussen and Williams has become the quintessential book for learning Gaussian Processes. They kindly provide their own software, which runs in MATLAB or Octave, for fitting GPs. However, I find it easiest to learn by programming on my own, and my language of choice is Python. This is the first in a series of posts that will go over GPs in Python and how to produce the figures, graphs, and results presented in Rasmussen and Williams.

  • Note, Python has numerous excellent packages for implementing GPs, but here I will work through them myself, “by hand”.

This post will cover the basics presented in Chapter 2. Specifically, we will cover Figures 2.2, 2.4, and 2.5.

  • Note, I’m not covering the theory of GPs here (that’s the subject of the entire book, right?)

Before we get going, we have to set up Python:

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

Figure 2.2

We want to make smooth lines to start, so make 101 evenly spaced \(x\) values:

N_star = 101
x_star = np.linspace(-5, 5, N_star)

Next we have to calculate the covariances between all of the \(x\) values and store them in the matrix \(\boldsymbol{K}\). Here, we use the squared exponential covariance: \(\text{exp}[-\frac{1}{2}(x_i - x_j)^2]\)

K_star = np.empty((N_star, N_star))
for i in range(N_star):
	for j in range(N_star):
		K_star[i,j] = np.exp(-0.5 * (x_star[i] - x_star[j])**2)
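As an aside (not something from the book), the double for-loop is fine for 101 points, but numpy broadcasting builds the same matrix without any Python loops; a quick sketch:

# same squared-exponential kernel, computed via broadcasting
diffs = x_star[:, None] - x_star[None, :]
K_star_vec = np.exp(-0.5 * diffs**2)
print(np.allclose(K_star, K_star_vec))  # True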

We now have our prior distribution with a mean of 0 and a covariance matrix of \(\boldsymbol{K}\). We can draw samples from this prior distribution:

priors = st.multivariate_normal.rvs(mean=[0]*N_star, cov=K_star, size=1000)

for i in range(5):
	plt.plot(x_star, priors[i])

plt.fill_between(x_star, np.percentile(priors, 2.5, axis=0), np.percentile(priors, 97.5, axis=0), alpha=0.2)
plt.show()

Next, let’s add in some observed data:

x_obs = np.array([-4.5, -3.5, -0.5, 0, 1])
y_obs = np.array([-2, 0, 1, 2, -1])

We now need to calculate the covariance between our unobserved data (x_star) and our observed data (x_obs), as well as the covariance among the x_obs points themselves. The first for-loop calculates the observed covariances; the second calculates the observed-new covariances.

N_obs = 5
K_obs = np.empty((N_obs, N_obs))
for i in range(N_obs):
	for j in range(N_obs):
		K_obs[i,j] = np.exp(-0.5 * (x_obs[i] - x_obs[j])**2)

K_obs_star = np.empty((N_obs, N_star))
for i in range(N_obs):
	for j in range(N_star):
		K_obs_star[i,j] = np.exp(-0.5*(x_obs[i] - x_star[j])**2)

We can then get our posterior distributions:

\( \boldsymbol{\mu} = \boldsymbol{K}_{obs}^{*'} \boldsymbol{K}_{obs}^{-1} \boldsymbol{y}_{obs} \)
\( \boldsymbol{\Sigma} = \boldsymbol{K}^{*} - \boldsymbol{K}_{obs}^{*'} \boldsymbol{K}_{obs}^{-1} \boldsymbol{K}_{obs}^{*} \)

and simulate from this posterior distribution.

post_mean = K_obs_star.T.dot(np.linalg.pinv(K_obs)).dot(y_obs)
post_var = K_star - K_obs_star.T.dot(np.linalg.pinv(K_obs)).dot(K_obs_star)

posteriors = st.multivariate_normal.rvs(mean=post_mean, cov=post_var, size=1000)

for i in range(1000):
	plt.plot(x_star, posteriors[i], c='k', alpha=0.01)

plt.plot(x_obs, y_obs, 'ro')
plt.show()
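As another aside, the same posterior can be computed with the Cholesky-based Algorithm 2.1 of Rasmussen and Williams rather than a pseudoinverse; here is a sketch, where the small jitter term is my own addition for numerical stability since this kernel has no noise on its diagonal:

# Cholesky-based posterior (R&W Algorithm 2.1)
jitter = 1e-9  # assumed small constant to keep the factorization stable
L = np.linalg.cholesky(K_obs + jitter * np.eye(N_obs))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
post_mean_chol = K_obs_star.T.dot(alpha)   # matches post_mean above (to numerical precision)
v = np.linalg.solve(L, K_obs_star)
post_var_chol = K_star - v.T.dot(v)        # matches post_var above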

This may not look exactly like Rasmussen and Williams’ Fig. 2.2(b) because I guessed at the data points, so they may not be quite right. As the authors point out, we can also plot what the posterior covariance looks like for different x-values, say \(x = -2, 1, 3\).

plt.plot(x_star, post_var[30], label='x = -2')
plt.plot(x_star, post_var[60], label='x = 1')
plt.plot(x_star, post_var[80], label='x = 3')
plt.ylabel('Posterior Covariance')
plt.legend()
plt.show()

In this case, however, we’ve forced the length-scale to equal 1; that is, you have to be at least one unit away on the x-axis before you begin to see large changes in \(y\). We can incorporate a scale parameter \(\lambda\) to change that. We can use another parameter \(\sigma_f^2\) to control the signal variance (how strongly the function varies around the mean), and we can add noise by assuming measurement error \(\sigma_n^2\) (which controls how closely the line must pass to the observed points).

\(cov(x_i, x_j) = \sigma_f^2 \text{exp}[-\frac{1}{2\lambda^2} (x_i - x_j)^2] + \delta_{ij}\sigma_n^2\)

Let’s make some new data:

x_obs = np.array([-7, -5.5, -5, -4.8, -3, -2.8, -1.5, 
	-1, -0.5, 0.25, 0.5, 1, 2.5, 2.6, 4.5, 4.6, 5, 5.5, 6])
y_obs = np.array([-2, 0, 0.1, -1, -1.2, -1.15, 0.5, 1.5, 1.75, 
	0, -0.9, -1, -2.5, -2, -1.5, -1, -1.5, -1.1, -1])
x_star = np.linspace(-7, 7, N_star)

Next, make a couple of functions to calculate \(\boldsymbol{K}_{obs}\), \(\boldsymbol{K}^{*}\), and \(\boldsymbol{K}_{obs}^{*}\).

def k_star(l, sf, sn, x_s):
	N_star = len(x_s)
	K_star = np.empty((N_star, N_star))
	for i in range(N_star):
		for j in range(N_star):
			K_star[i,j] = sf**2*np.exp(-(1.0 / (2.0*l**2)) * (x_s[i] - x_s[j])**2)
	return K_star

def k_obs(l, sf, sn, x_o):
	N_obs = len(x_o)
	K_obs = np.empty((N_obs, N_obs))
	for i in range(N_obs):
		for j in range(N_obs):
			K_obs[i,j] = sf**2*np.exp(-(1.0 / (2.0*l**2)) * (x_o[i] - x_o[j])**2)
	for i in range(N_obs):
		K_obs[i,i] += sn**2
	return K_obs

def k_obs_star(l, sf, sn, x_o, x_s):
	N_obs = len(x_o)
	N_star = len(x_s)
	K_obs_star = np.empty((N_obs, N_star))
	for i in range(N_obs):
		for j in range(N_star):
			K_obs_star[i,j] = sf**2*np.exp(-(1.0 / (2.0*l**2)) *(x_o[i] - x_s[j])**2)
	return K_obs_star

Then run the code for the various sets of parameters \((\lambda, \sigma_f, \sigma_n)\). Let’s start with (1, 1, 0.1):

K_star = k_star(1, 1, 0.1, x_star)
K_obs = k_obs(1, 1, 0.1, x_obs)
K_obs_star = k_obs_star(1, 1, 0.1, x_obs, x_star)

post_mean = K_obs_star.T.dot(np.linalg.pinv(K_obs)).dot(y_obs)
post_var = K_star - K_obs_star.T.dot(np.linalg.pinv(K_obs)).dot(K_obs_star)

posteriors = st.multivariate_normal.rvs(mean=post_mean, cov=post_var, size=1000)

for i in range(1000):
	plt.plot(x_star, posteriors[i], c='k', alpha=0.01)

plt.plot(x_star, posteriors.mean(axis=0))
plt.plot(x_obs, y_obs, 'ro')
plt.show()

Then try (0.3, 1.08, 0.00005):

K_star = k_star(0.3, 1.08, 0.00005, x_star)
K_obs = k_obs(0.3, 1.08, 0.00005,  x_obs)
K_obs_star = k_obs_star(0.3, 1.08, 0.00005, x_obs, x_star)

post_mean = K_obs_star.T.dot(np.linalg.pinv(K_obs)).dot(y_obs)
post_var = K_star - K_obs_star.T.dot(np.linalg.pinv(K_obs)).dot(K_obs_star)

posteriors = st.multivariate_normal.rvs(mean=post_mean, cov=post_var, size=1000)

for i in range(1000):
	plt.plot(x_star, posteriors[i], c='k', alpha=0.01)

plt.plot(x_star, posteriors.mean(axis=0))

plt.plot(x_obs, y_obs, 'ro')
plt.show()

Then finally (3, 1.16, 0.89):

K_star = k_star(3, 1.16, 0.89, x_star)
K_obs = k_obs(3, 1.16, 0.89,  x_obs)
K_obs_star = k_obs_star(3, 1.16, 0.89, x_obs, x_star)

post_mean = K_obs_star.T.dot(np.linalg.pinv(K_obs)).dot(y_obs)
post_var = K_star - K_obs_star.T.dot(np.linalg.pinv(K_obs)).dot(K_obs_star)

posteriors = st.multivariate_normal.rvs(mean=post_mean, cov=post_var, size=1000)

for i in range(1000):
	plt.plot(x_star, posteriors[i], c='k', alpha=0.01)

plt.plot(x_star, posteriors.mean(axis=0))

plt.plot(x_obs, y_obs, 'ro')
plt.show()

And there you have it! Figs 2.2, 2.4, and 2.5 from Rasmussen and Williams.

Undergraduate Research Paper – Phosphorus and Grasshoppers

I’d like to congratulate my REU student from last summer, Maddi Rode, whose paper “Prospective evidence for independent nitrogen and phosphorus limitation of grasshopper (Chorthippus curtipennis) growth in a tallgrass prairie” was just published in PLOS ONE.

Nitrogen, a critical component of amino acids and proteins, has long been considered the primary limiting nutrient of terrestrial insects. Other nutrients have generally received much less attention. However, phosphorus is a crucial component of larval growth, given the tight coupling between phosphorus-rich RNA and growth rates. Indeed, phosphorus limitation in terrestrial insects is often just as strong as nitrogen limitation. Yet few studies have enriched plants with both nitrogen and phosphorus (separately and together) to determine the relative strengths of nutrient limitation.

Maddi spent the summer at Konza Prairie Biological Station doing just that. She enriched plots of Andropogon gerardii, or big bluestem, with nitrogen, phosphorus, or a combination of the two. We then tracked the growth of the marsh meadow grasshopper, Chorthippus curtipennis, under all conditions.

Chorthippus curtipennis. http://bugguide.net/node/view/699668

She found that nitrogen enrichment led to higher grasshopper growth rates. Surprisingly (or unsurprisingly to us), phosphorus enrichment stimulated grasshopper growth by nearly the exact same amount as nitrogen enrichment. 

This work adds to the building body of literature showing that grasshoppers, and indeed most terrestrial insects, are limited by a suite of nutrients beyond simply nitrogen. What this means for herbivore feeding behavior and climate change remains to be seen…

New Paper on Mutualisms in Ecology Letters

Good news, everyone! My friend and collaborator Andy Shantz and I, along with our PhD advisor Deron Burkepile, just published a new paper in Ecology Letters on the effects of nutrient enrichment (e.g., fertilizer, phosphorus runoff, sewage outfalls) on nutrient-sharing mutualisms.

Nutrient-sharing mutualisms occur when two species cooperate to exchange necessary nutrients between them. The most common example is plants and mycorrhizal fungi. These fungi live on or in the roots of plants and are very good at scavenging rare nutrients, such as nitrogen or phosphorus, from the soils. However, they are not very good at obtaining sugars, carbohydrates, and other carbon-based nutrients. Plants have the opposite problem. They fix sunlight and carbon dioxide into sugars and carbohydrates, but often cannot take up enough nitrogen from the soils. Mycorrhizae and plants share nutrients, with fungi receiving sugars in exchange for nitrogen, to the benefit of both parties. Corals and algae are another common example, wherein corals provide nitrogen to their photosynthetic algae partners in exchange for sugars.

Beneath the snow and dirt lies a vast network of roots and fungi that support this Ponderosa pine forest

The stability of these mutualisms hinges on the relative rarity of important nutrients, in this case nitrogen and phosphorus. There is some evidence that the plant, or more generally the photosynthesizing partner, regulates the amount of nutrients in the trade. Theory predicts that if limiting nutrients, like nitrogen, suddenly become non-limiting, then the mutualism should fall apart. After all, why should the plant continue to trade away valuable sugars when it no longer needs the nitrogen from the fungi?

Andy and I tested this prediction using a meta-analysis, which is a quantitative synthesis of published results. Sure enough, we found that nutrient-sharing mutualisms subject to nutrient enrichment showed signs of collapse. That is, when soils or water were fertilized with nitrogen or phosphorus, the heterotrophic partner (corals, fungi) suffered lower performance because the phototrophic partner (plants, algae) ceased trading sugars for nutrients.

What does this mean? Well, nutrient-sharing symbioses form the foundation of most ecosystems: most plants harbor mutualistic fungi, corals cannot exist without their algal symbionts, and lichens exist as a mutualism between fungi and algae. If these mutualisms weaken, then the foundation of most communities also weakens, although just how much remains to be seen.

Ten easy tips for better science writing

As you may be aware, reading science papers can be difficult. Language is verbose and cluttered, the vocabulary is full of jargon, and trying to make sense of it can make your head hurt. I feel that way and I spend my life reading research papers, so I can’t even imagine how non-scientists feel when they attempt to read primary literature. Actually, I can, because I remember being confused as hell reading papers as an undergraduate. This has led some reputable scientists to explain why most written science is difficult to follow. Of course, other equally reputable scientists had some alternative ideas. I actually agree with Dr. Gelman, author of the blog in the second link. It isn’t just most science writing that stinks (to use Dr. Pinker’s phrasing). Most writing stinks in general. Writing is hard. I’ve spent years honing my skills and they still aren’t great. Here are a few simple tips, in no particular order, I’ve picked up on how to write a clear and understandable research paper.

  1. Write in the active voice. As an undergraduate, I learned that science was to be written in the passive voice at all times (I did that on purpose). I’ve learned as a PhD. student that passive voice is boring and slow. Write actively wherever possible; it’s more engaging and interesting. For example, reword the first sentence of this paragraph: ‘As an undergraduate in the sciences, I learned to write in passive voice at all times’. Much better.
  2. First person is fine. I also learned in undergrad never to use first person. This archaic rule has fallen completely out of style. Feel free to say ‘I did this’ or ‘We did that’. If you have co-authors, always use ‘we’, even if you did all the physical work. Often your co-authors (see: advisors) had a stronger guiding influence than you suspect.
  3. Delete the word ‘the’. ‘The’ has to be one of the most overused words in writing. Read your sentence multiple times, both with and without ‘the’. If it reads just as clearly without it, delete it. This is a BIG one.
  4. Delete the words ‘of’ and ‘that’. Same as above. Don’t write ‘even if you did all of the physical work’. Instead, write ‘even if you did all the physical work’. Sounds better, simpler, more active. Instead of ‘a few tips that I’ve picked up’, try ‘a few tips I’ve picked up’.
  5. Use as few words as possible. I think this is good advice for giving public presentations, writing, and talking in general. Don’t write ‘small changes in temperature’ when ‘small temperature changes’ will do.
  6. Know what ‘As such’ really means. Please look this up, because this has to be pretty high on the list of misused phrases. ‘As such’ commonly appears as a transition, e.g. “Temperatures increase metabolic rates. As such, growth and respiration increase as well”. That is wrong. ‘As such’ directly refers to the subject of the previous sentence, e.g. “I am a PhD candidate. As such, I’ll be unemployed as soon as I graduate”. If you’re confused, replace ‘As such’ with the subject or description from the previous sentence: “As a PhD candidate, I’ll be unemployed as soon as I graduate”. If it works, you’ve used ‘As such’ correctly. If not, try a different word. ‘Accordingly’ is good, but make sure your paper isn’t covered with ‘Accordingly’s.
  7. It’s OK to start sentences with ‘However’ and ‘Therefore’. Technically it isn’t (another rule I learned as an undergraduate). However, for impact, I prefer it. First, the technically correct way: “I agree with my advisor on most things. I find, however, that I strongly disagree with him on others.” Second: “I agree with my advisor on most things. However, I strongly disagree with him on others.” I like the second one better; it gets the point across.
  8. Check your paragraphs. The first sentence of a paragraph is the intro. The last is the outro. You should be able to remove everything in between and get your major point across. The stuff in the middle is just details. Try it with the lead-in paragraph to this post. If you can’t remove the middle without sacrificing several important points, then you have too many main ideas in one paragraph.
  9. Following from #8, keep paragraphs short. Good god, please don’t write a page-long paragraph. Get your point across quickly.
  10. Most importantly, use short and simple sentences. My PhD advisor is king of this, and it works very well. Write like Hemingway. You don’t need to show off your incredible vocabulary or complex, Faulknerian trains of thought. Save that for the Archer fan fiction you keep in binders on your shelf. The stuff you want people to understand needs to be written clearly and concisely in simple language.

Honorable Mentions: COMMA SPLICING! Please cut down on commas. I’ve reviewed a number of papers by both friends and anonymous authors, and people like to put commas everywhere. Only put them where they belong. Also, write with a dog on your lap. A dog on the lap makes everything better.

Atala Season!

Over the past few weeks, Blue Atala caterpillars have been out in force. A single coontie plant in the park by my house could have 15 – 20 caterpillars. A week ago, the caterpillars all formed their chrysalids, so the coontie plants look like Christmas trees, evergreen shrubs with little red ornaments.

Atala chrysalids hanging from a coontie plant.

I can’t wait for the next week or so, when the park will be swarming with Atala butterflies.

What Do Research Grants Look Like? A Successful NSF DDIG Example

As you may or may not know, scientists don’t just get money for research handed to them by federal or state governments, agencies, or universities. We have to write proposals and submit them for review whenever funding is available. Only the best proposals get funded, and the odds of getting your proposal funded for any given opportunity are <10% (usually somewhere between 2 and 5% get funded, depending on the funding source and competition). I thought I’d post my NSF Doctoral Dissertation Improvement Grant, which was successfully funded (on the first try!), for anyone who wants to see it. This is in keeping with a new philosophy of science transparency, and there are some excellent lists of publicly available grant proposals. Hopefully this helps you understand what goes into the beginnings of a research project; after all, no research is possible without funding, and getting funding isn’t easy. If you’re applying for one of these grants, hopefully you can look this one over to see what a successful DDIG looks like.

You can find it here.

Effects of Herbivory on Ecology of Treefall Gaps

Nate Lemoine, FIU PhD candidate and Smithsonian researcher, sprays treefall gaps within the Smithsonian Environmental Research Center with insecticide. Photos by D. Doublet

Naturally occurring treefall gaps are an important part of forest ecology, playing a prominent role in the regeneration of both pioneer and non-pioneer tree species. Nate Lemoine is setting out to understand the role insect herbivory plays in the growth and health of plants in treefall gaps. By caging small plots within gaps, he is deterring deer and other animals from eating the plants. He is also spraying insecticide to deter insects from some plots, which he compares to control plots sprayed with only water.

By Dejeanne Doublet

Caterpillar Food 101

By Dejeanne Doublet, intern

Conducting research with insects means that you must take on the roles and duties of a caretaker.

This summer we worked with the caterpillar species Spodoptera exigua (the beet armyworm) and its close relative Spodoptera frugiperda (the fall armyworm). Both species were shipped to us as sheets of eggs containing roughly 1,000 caterpillars. Most caterpillars start their lives as eggs on leaves. They don’t get to pick and choose their food at the beginning of their lives and are usually forced to eat from the plant or tree where they hatched.

Some caterpillars will enter a wandering stage once they’re big enough. They may wander from plant to plant, picking and choosing what they like best, or they may stay put on one type of plant for the rest of their lives.

  • The first step involves mixing the agar into boiling water until it is completely dissolved. Then, the temperature should be reduced to 85 degrees before adding in the remaining ingredients.

When you’re a caterpillar shipped to a lab with 999 other caterpillars, though, you don’t necessarily get an all-you-can-eat buffet of cuisines. Our lab doesn’t quite function like a Subway, and you can’t always have it your way. However, we do make sure you’re eating all your nutrients and vitamins with a homemade concoction of Lepidoptera food.

The recipe consists of more than a dozen ingredients that are simply mixed into boiling water. Click here for the full recipe. Once the ingredients are mixed together, they form a thick substance that cools down to form a tapioca pudding-like concoction.

We poured the finished food goo into 8 oz. plastic containers, allowed it to cool, then placed small pieces of the egg sheets on top of the food. We then placed the containers in a 30 ºC chamber under lights that mimic a 16-hour day and 8-hour night cycle. The caterpillars usually begin to hatch a day after they arrive as eggs. Within a few days, they’re growing and eating and growing and eating.

Spodoptera exigua at about a week old
Spodoptera exigua at about 10 days old. Photos by D. Doublet