When working with quantitative genetic data, it is often necessary to calculate the genetic variance components that are associated with the trait of interest. Calculating the variance components will allow you to calculate the heritability of the trait, the coefficient of genetic variance for the trait, and genetic correlations between traits. I am going to describe this procedure for the simplest breeding design–one that involves clonal lines. To do the analyses that I will describe below, you need to have a working knowledge of the statistical package R, downloadable here: http://www.r-project.org/
Description of the data: My data includes a single response variable that was measured at two time points. The response was measured on several inbred Drosophila lineages (I will refer to them as ‘lines’). I measured the response for several individuals for each line, and replicated the experiment 8 times. For these analyses, each individual that belongs to each line can be considered a clone. This means that each line represents a family, and each member of the family is genetically identical. Therefore, the variation in the response measured within each line or family represents the influence of environmental variation on the response. Likewise, the variation in the response among the lines or families represents the influence of genetic variation on the response. Because we are interested in things like heritability and genetic variance components, the most important part of this for our purposes is the among line variance.
This is what my data look like:
[Image: example of my data]
This data is stored as a “data.frame” in R. The time variable is stored as a factor, the line is the grouping variable, and the response is the trait that we want to calculate variance components for. Because I ultimately want to compare the responses for each line between the two times, I will separate my data into two new data frames: one for time 1 and one for time 2.
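As a minimal sketch of that split, here is one way it could be done in base R. The data frame name (flies), the line labels, and the response values are all made up for illustration and are not my actual data; only the column names time, line, and response follow the description above.

```r
# Hypothetical toy data: 3 lines, 2 individuals per line, 2 time points
flies <- data.frame(
  time = factor(rep(c(1, 2), each = 6)),
  line = factor(rep(c("A", "B", "C"), times = 4)),
  response = c(0.52, 0.71, 0.40, 0.58, 0.69, 0.45,
               0.61, 0.80, 0.47, 0.66, 0.77, 0.50)
)

# One data frame per time point; droplevels() removes the unused time level
time1 <- droplevels(subset(flies, time == "1"))
time2 <- droplevels(subset(flies, time == "2"))

str(time1)
```

Each new data frame keeps the line and response columns, so the same model code can be run on either one.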
There are two methods that I know of in R to determine the variance components. The code for each is provided below.
Method 1: Run a REML model with random effects. Load the lme4 package. If you haven’t used this package before, you may need to install it using install.packages("lme4") (note that the package name must be in quotes). The code and output for the model are below. To specify that line is a random effect, it is coded as (1|line). If you have variables that are not random (fixed effects), you can add them after the random term. For example, your code might look like this: x <- lmer(response ~ (1|random.variable) + fixed.variable, data = your.data)
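Here is a minimal sketch of Method 1 on simulated data, assuming lme4 is installed. The data frame time1 is generated here with made-up parameters (20 lines, 10 individuals per line), so the numbers it produces will not match the ones from my data. The object names (model1, var_line, var_resid, trait_mean) are just illustrative.

```r
library(lme4)

# Hypothetical data: 20 lines, 10 individuals per line (not the data from this post)
set.seed(1)
n_lines <- 20
n_per_line <- 10
line_effect <- rnorm(n_lines, mean = 0, sd = 0.2)
time1 <- data.frame(line = factor(rep(seq_len(n_lines), each = n_per_line)))
time1$response <- 0.6 + line_effect[as.integer(time1$line)] +
  rnorm(nrow(time1), mean = 0, sd = 0.25)

# Intercept-only model with line as a random effect; lmer() fits by REML by default
model1 <- lmer(response ~ 1 + (1 | line), data = time1)
summary(model1)

# Pull the variance components out of the fitted model instead of reading them off the summary
vc <- as.data.frame(VarCorr(model1))
var_line <- vc$vcov[vc$grp == "line"]       # among-line (genetic) variance
var_resid <- vc$vcov[vc$grp == "Residual"]  # within-line (environmental) variance

# In an intercept-only model, the intercept is the estimated overall mean
trait_mean <- fixef(model1)[["(Intercept)"]]
```

Extracting the components with VarCorr() avoids copying numbers by hand from the printed summary.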
The genetic variance component is listed in the output above in the section “random effects.” In my model, the genetic variance component is 0.04188, and the residual or environmental variance component is 0.05987.
Method 2: Use the function varcomp. Load the ape and nlme packages. You may need to install these packages (see above). First, run a model using the lme function and store the model as an object in R. You will also need to specify a random effect in this model. Because this package was written by different authors, the code for random effects is slightly different (see code below). If you are wondering why you need to run the model using the lme function instead of the lmer function, it is because varcomp only accepts model objects of class lme. This is one of the drawbacks of using a statistical package that uses functions written by lots of different people. Once you have run the model, use the function varcomp to determine the variance components. The code is below.
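A minimal sketch of Method 2, assuming the nlme and ape packages are installed. As before, time1 is simulated with made-up parameters, so the output will differ from my results, and the object names are only illustrative.

```r
library(nlme)
library(ape)

# Hypothetical data: 20 lines, 10 individuals per line (not the data from this post)
set.seed(1)
n_lines <- 20
n_per_line <- 10
line_effect <- rnorm(n_lines, mean = 0, sd = 0.2)
time1 <- data.frame(line = factor(rep(seq_len(n_lines), each = n_per_line)))
time1$response <- 0.6 + line_effect[as.integer(time1$line)] +
  rnorm(nrow(time1), mean = 0, sd = 0.25)

# In lme(), the random effect goes in a separate 'random' argument
model2 <- lme(response ~ 1, random = ~ 1 | line, data = time1)

# varcomp() returns a named vector: one entry per grouping level, plus "Within"
vc <- varcomp(model2)
print(vc)

var_line <- vc[["line"]]      # among-line (genetic) variance
var_within <- vc[["Within"]]  # within-line (environmental) variance
```

Calling varcomp(model2, scale = TRUE) instead returns the components as proportions of the total variance.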
You can see that this method calculated variance components that are very similar to those calculated in method 1. Again, the genetic variance component (for line) is 0.04187855 and the environmental variance component is 0.05987434. The output for this method is a little cleaner.
OK, now that you have the variance components, you can calculate the heritability, coefficient of genetic variation, and genetic correlation pretty easily.
Heritability: The heritability of a trait is generally calculated as the proportion of phenotypic variance that is explained by genetic variance (Lynch and Walsh, 1998). Because of my experimental design, I can only calculate the broad sense heritability, but if you are working with a more complicated breeding design, you would be able to determine the additive genetic variance component using methods similar to those outlined above (see Lynch and Walsh, 1998 for more details). For this data, the heritability will be calculated as the genetic variance component / (the genetic variance component + the environmental variance component).
The heritability of the response is 0.412.
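The arithmetic, using the two variance components reported above for time 1, can be sketched in R as follows (the variable names are just illustrative):

```r
# Variance components from the time 1 model (values reported above)
var_genetic <- 0.04188
var_environment <- 0.05987

# Broad-sense heritability: genetic variance as a proportion of total phenotypic variance
h2 <- var_genetic / (var_genetic + var_environment)
round(h2, 3)  # 0.412
```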
Coefficient of Genetic Variation: The coefficient of genetic variation is a standardized measure of dispersion of the data and is calculated as 100*(sqrt of the genetic variance component)/response mean (Felix et al., 2012). If you used method 1 to determine the variance components, you will notice that the mean is also included in the output: for a model with no other fixed effects, it is the intercept estimate in the fixed effects section.
The coefficient of genetic variation is 32.27. The same calculation can be done for the coefficient of environmental variation to determine the degree to which genetic variation fails to explain phenotypic variation in the data.
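As a small sketch, the formula can be wrapped in a helper function. The function name cv_percent is hypothetical, and the mean in the usage line is a placeholder value, not the mean from my data, so the result will not reproduce the 32.27 above.

```r
# Coefficient of variation, in percent, from a variance component and the trait mean
cv_percent <- function(variance_component, trait_mean) {
  100 * sqrt(variance_component) / trait_mean
}

# Usage with the time 1 genetic variance component and a PLACEHOLDER mean of 0.5
# (substitute the intercept estimate from your own model)
cv_genetic <- cv_percent(0.04188, trait_mean = 0.5)

# The same helper gives the coefficient of environmental variation
cv_environment <- cv_percent(0.05987, trait_mean = 0.5)
```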
Genetic Correlation: The genetic correlation is a measure of how closely the genetic variation in the two responses is linked. So far, I have only provided example calculations for the time1 dataset. To calculate a genetic correlation for this data, the same calculations need to be done for the time2 dataset to determine the genetic variance component of the response at time 2. I did this calculation and determined that the genetic variance component is 0.04051. Additionally, we need to calculate the genetic covariance component between the responses measured at time 1 and time 2.
Now for the disclaimer: I *think* the following information is correct. In fact, I’m pretty sure because I was able to verify the genetic correlations using another program (not R). Please use your own judgment on the following methods. And please, if you have input, comment below. I would love to be corrected if I am wrong.
OK, here is what I worked out:
You need to calculate the covariance component, which takes into account the genetic covariance between the response at time 1 and time 2. Apparently, this can be done using an ANCOVA format, fitted with lme and varcomp in the same way as method 2 above, but using the full dataset with both time points. The model is specified as follows:
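Here is a minimal sketch of one plausible specification, assuming time is nested within line in the random part (line/time) and that time is also included as a fixed effect. The fixed-effect part is my assumption for this sketch. The data are simulated with made-up parameters, and it needs the nlme and ape packages. In a model like this, the line component is the variance shared by both time points, which is what is used as the genetic covariance.

```r
library(nlme)
library(ape)

# Hypothetical data: 20 lines x 2 time points x 10 individuals (not the data from this post)
set.seed(2)
n_lines <- 20
flies <- expand.grid(individual = seq_len(10),
                     time = factor(c(1, 2)),
                     line = factor(seq_len(n_lines)))
line_effect <- rnorm(n_lines, sd = 0.16)                            # shared across times
line_time_effect <- matrix(rnorm(n_lines * 2, sd = 0.12), nrow = n_lines)  # specific to each time
flies$response <- 0.6 + 0.1 * (flies$time == "2") +
  line_effect[as.integer(flies$line)] +
  line_time_effect[cbind(as.integer(flies$line), as.integer(flies$time))] +
  rnorm(nrow(flies), sd = 0.25)

# Time nested within line in the random part; the fixed time effect is an assumption here
model_cov <- lme(response ~ time, random = ~ 1 | line/time, data = flies)

vc <- varcomp(model_cov)
print(vc)

# The among-line component estimates the covariance between the two time points
cov_genetic <- vc[["line"]]
```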
The line/time variable is what specifies the covariate, I believe. When the model is fed into varcomp, the following output results:
The genetic covariance component is 0.02663865. Once you have the genetic covariance component, the calculation of the genetic correlation is simply the genetic covariance component / sqrt( genetic variance component at time 1 * genetic variance component at time 2). See below:
The genetic correlation for the response at time 1 and time 2 is 0.65. In theory, you can do these same machinations for the environmental correlation as well.
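The calculation above, written out in R with the components reported in this post (the variable names are just illustrative):

```r
# Components reported above
cov_genetic <- 0.02663865   # genetic covariance between time 1 and time 2
var_genetic_t1 <- 0.04187855  # genetic variance at time 1
var_genetic_t2 <- 0.04051     # genetic variance at time 2

# Genetic correlation: covariance scaled by the geometric mean of the two variances
r_genetic <- cov_genetic / sqrt(var_genetic_t1 * var_genetic_t2)
round(r_genetic, 2)  # 0.65
```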
Also, if you are interested in the program I used to verify my calculations, it can be found here: http://pages.uoregon.edu/pphil/software.html. The program is called h2boot, and I couldn’t get it to run on a Mac, but that may be user error. It runs great on a PC.
Felix, T.M., K.A. Hughes, E.A. Stone, J.M. Drnevich, and J. Leips. 2012. Age-specific variation in immune response in Drosophila melanogaster has a genetic basis. Genetics 191: 989-1002.
Lynch, M. and B. Walsh. 1998. Genetics and Analysis of Quantitative Traits. Sinauer, Massachusetts.