Downloading a gene list from a sequence interval in FlyBase

This is a quick and easy way to download a list of genes that fall within an interval you may be interested in, for example following QTL mapping.

  1. First, go to
  2. Under Tools, choose Genomic/Map Tools, and then choose CytoSearch.
  3. In the top option menu on the CytoSearch page, choose sequence region.
  4. Enter the sequence coordinates, but don’t forget to convert them first: your coordinates may come from an earlier release of the genome. To convert, go to Tools, choose Retrieve/Convert Tools, and then choose coordinates converter. There, you can enter the coordinates you are starting with and pick the appropriate conversion. For example, if your coordinates are from release 5, you should first convert them to release 6.
  5. Back on the CytoSearch page, once you have the correct coordinates, check all of the options you need.
  6. Click submit query, which redirects you to the list of genes. You could stop here if this is all you need.
  7. To download the list, click HitList Conversion Tools (at the top right above the list).
  8. A window called Export Batch to Download will pop up for a moment; if you wait too long, it might disappear. Choose Genes (or whichever option you want). This redirects you to the batch download page.
  9. Here, next to Field Data, click the format you want for the downloaded file. Tab-separated is a good choice.
  10. Under the Send Results To option menu, choose where you want the data exported.
  11. Click Select Fields (lower right of search box).
  12. Choose the field options you want in the new page that opens.
  13. At the bottom or top of the page, click Get Field Data when you have selected all desired options.
  14. The file should download quickly and is readable in R as a .txt file.
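A minimal sketch of the reading step, assuming the export was saved as tab-separated text with a header row. The column names (SYMBOL, FBID) and the gene entries are hypothetical placeholders, and the example writes its own small stand-in file so it runs on its own. Setting quote = "" keeps an apostrophe in a gene name from being treated as the start of a quoted string, which is read.table()'s default behavior.

```r
# Hypothetical stand-in for a FlyBase batch download: tab-separated,
# header row, and one gene symbol that contains an apostrophe.
flybase_file <- tempfile(fileext = ".txt")
writeLines(c("SYMBOL\tFBID",
             "geneA\tFBgn_example1",
             "geneB'\tFBgn_example2"),
           flybase_file)

# quote = "" turns off quote handling, so ' is read as an ordinary character;
# comment.char = "" likewise stops "#" from being treated as a comment.
genes <- read.table(flybase_file, header = TRUE, sep = "\t",
                    quote = "", comment.char = "",
                    stringsAsFactors = FALSE)
str(genes)
```

With a real download, replace flybase_file with the path to your own file.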

Some things to keep in mind: FBgn numbers have changed for many genes as new information on the sequence becomes available, and for other reasons that are much more arbitrary. If you want to compare the list you have just generated to a list from another study, you may need to use the FBgn aliases (an option in step 12) to build a look-up table of name synonyms. The match() function and the %in% operator in R are a good place to start. Also, some Drosophila gene names contain ’ (apostrophes), and because read.table() treats apostrophes as quote characters by default, R can choke on a .txt (or any other) file that contains them. One fix is to open the .txt document in an editor and remove them before reading the file into R. After this step, R shouldn’t have a problem reading it in.
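To illustrate the match()/%in% idea, here is a minimal sketch with made-up IDs. It assumes you have already built a two-column look-up table (old, current) from the alias field; the column names and every ID below are hypothetical.

```r
# Hypothetical gene lists: this download vs. another study's list.
my_genes    <- c("FBgn_A", "FBgn_B", "FBgn_C")
other_study <- c("FBgn_C", "FBgn_Z", "FBgn_B_old")

# Hypothetical look-up table mapping outdated IDs to current IDs.
lookup <- data.frame(old = "FBgn_B_old", current = "FBgn_B",
                     stringsAsFactors = FALSE)

# match() gives the row of each ID in lookup$old, or NA if it is not there;
# IDs without an entry are kept unchanged.
idx <- match(other_study, lookup$old)
other_current <- ifelse(is.na(idx), other_study, lookup$current[idx])

# %in% returns TRUE/FALSE for each of my genes: is it in the other list?
shared <- my_genes[my_genes %in% other_current]
shared
```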



Labeling panels in R

This is a brief tutorial on how to add labels outside the plotting area, both in base R plots and in ggplot2. I use this to label panels “A”, “B”, “C”, etc., but figuring out how to do this was surprisingly difficult. Here is my solution:


Here are the plots generated using the basic plot function:


basic plot with panels labeled “A” and “B” in R

…and the second example with basic plot. In this plot the top margin is smaller:


basic plot with panels labeled “A” and “B” in R
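The original code isn’t shown here, so this is a hedged sketch of one base-graphics approach, not necessarily the one used for the figures above. The helper label_panel() is hypothetical: it converts a position given in normalized figure coordinates (0 to 1) into plot coordinates with grconvertX()/grconvertY() and draws the label with xpd = NA so it can sit outside the plotting area. For the vertical layout, use par(mfrow = c(2, 1)) instead.

```r
# Hypothetical helper: draws a bold label near the top-left corner of the
# current figure region, i.e. outside the plotting area.
label_panel <- function(label, x = 0.02, y = 0.98, cex = 1.5) {
  # Convert normalized figure coordinates ("nfc", 0 to 1) to user coordinates.
  text(x = grconvertX(x, from = "nfc", to = "user"),
       y = grconvertY(y, from = "nfc", to = "user"),
       labels = label, font = 2, cex = cex,
       adj = c(0, 1),   # left- and top-justify the label at (x, y)
       xpd = NA)        # allow drawing outside the plot region
}

out_file <- file.path(tempdir(), "labelled_panels.pdf")
pdf(out_file, width = 8, height = 4)
par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))   # two panels side by side
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")
label_panel("A")
plot(1:10, sqrt(1:10), xlab = "x", ylab = "square root of x")
label_panel("B")
dev.off()
```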

Here are the plots using ggplot2:


ggplot with panels labeled “A” and “B” in R

…and the vertical version:


ggplot with panels labeled “A” and “B” in R

If you want to tweak the exact location of the plot labels, this is easily done in ggplot2. You simply adjust the x and y values by very small increments, making sure to stay between 0 and 1.
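One way to get this kind of 0-to-1 positioning in ggplot2 is the plot tag, sketched below under the assumption that ggplot2 3.0.0 or later is installed (tags did not exist in earlier versions, so this is likely not the approach used for the figures above). The two numbers in plot.tag.position are the x and y values to nudge. The grid viewports place the two plots side by side; grid.layout(nrow = 2, ncol = 1) would give the vertical version.

```r
library(ggplot2)
library(grid)

# Tag position as c(x, y) relative to the whole plot, each between 0 and 1.
tag_theme <- theme(plot.tag.position = c(0.01, 0.99),
                   plot.tag = element_text(face = "bold", size = 16,
                                           hjust = 0, vjust = 1))

p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
  labs(tag = "A") + tag_theme
p2 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point() +
  labs(tag = "B") + tag_theme

out_file <- file.path(tempdir(), "labelled_ggplots.pdf")
pdf(out_file, width = 8, height = 4)
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2)))
print(p1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
dev.off()
```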

Trick for Counting Ovarioles

If you are looking to learn how to dissect a fruit fly’s ovaries, check out this guy’s video. He gives a very clear demonstration of how it’s done and recommends the solutions and tools needed to complete the process.

The point of my post is to elaborate on how you might go about discerning the individual ovarioles for counting. I work on an Olympus SZX7 scope that has a ring light. I found it very difficult to see the ovarioles when they were lit from above, so I tried several other lighting configurations (including my iPhone screen). The bottom of the scope (on the stage) pops out, leaving a hole, so I decided to try using the ring light to illuminate specimens from below. This worked pretty well. I bought a small sheet of clear acrylic from Home Depot and laid it over the hole so that the slide would sit on something stable, and I propped my microscope up on some lids that were lying around the lab from another project (pictured below). I’ll admit that it looks very silly. All of this trouble is definitely not necessary if you have access to a scope with transmitted light; I’m just finding ways to make do with what I’ve got.

In this picture, there is a piece of paper over the opening in the base. I ended up taking this off when using the scope.

The guy in the video I linked to above clearly has no trouble seeing individual ovarioles. However, with my makeshift microscope, I couldn’t see them well enough to count. To overcome this, I added just a drop of crystal violet dye to the dissection solution (1x PBS and 0.14% Triton). This turned the solution a pretty blue-purple color and allowed me to see individual ovarioles much more clearly (picture below, taken with an iPhone…sorry). The dye stains the cells, so they really stand out against the background. Happy dissecting!

This image shows the ovarioles lightly stained with crystal violet.  I teased the individual ovarioles apart so you could see them better.  Normally, they are all stuck together, making up the ovary.

This is showing the dissecting solution.  The well in the middle is where I dissect the ovaries.  The well nearest the bottom of the picture is undiluted crystal violet for comparison.  You can see the hole in the base of my scope in this picture as well.  The acrylic sheet is holding the depression slide up, and the ring light (off right now) is directly below the hole.

How to Calculate Genetic Variance Components, Coefficient of Genetic Variation, and Genetic Correlations in R

When working with quantitative genetic data, it is often necessary to calculate the genetic variance components associated with the trait of interest. Calculating the variance components will allow you to calculate the heritability of the trait, the coefficient of genetic variation for the trait, and genetic correlations between traits. I am going to describe this procedure for the simplest breeding design: one that involves clonal lines. To do the analyses that I will describe below, you need to have a working knowledge of the statistical package R, downloadable here:

Description of the data: My data include a single response variable that was measured at two time points. The response was measured on several inbred Drosophila lineages (I will refer to them as ‘lines’). I measured the response for several individuals in each line, and replicated the experiment 8 times. For these analyses, each individual within a line can be considered a clone. This means that each line represents a family, and every member of the family is genetically identical. Therefore, the variation in the response within each line (family) represents the influence of environmental variation on the response. Likewise, the variation in the response among lines (families) represents the influence of genetic variation on the response. Because we are interested in things like heritability and genetic variance components, the most important part of this for our purposes is the among-line variance.

This is what my data look like:

my data example

These data are stored as a data.frame in R. The time variable is stored as a factor, line is the grouping variable, and response is the trait that we want to calculate variance components for. Because I ultimately want to compare the responses for each line between the two times, I will separate my data into two new data frames, one for time 1 and one for time 2 (shown below).

Screenshot: splitting the data into time 1 and time 2 data frames
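Because the screenshot isn’t reproduced here, the following is a minimal sketch of the split, assuming a data frame (hypothetically named mydata) with columns time, line, and response as described above. The values are simulated only so the example runs.

```r
# Simulated stand-in for the real data (hypothetical values):
# 6 lines x 10 individuals, each measured at time 1 and time 2.
set.seed(1)
mydata <- data.frame(
  time     = factor(rep(c("1", "2"), each = 60)),
  line     = factor(rep(rep(paste0("line", 1:6), each = 10), times = 2)),
  response = rnorm(120, mean = 0.6, sd = 0.3)
)

# One data frame per time point.
time1 <- subset(mydata, time == "1")
time2 <- subset(mydata, time == "2")

# Equivalent alternative: split() returns a named list of data frames,
# one per level of time.
by_time <- split(mydata, mydata$time)
str(by_time)
```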

There are two methods that I know of in R to determine the variance components.  The code for each is provided below.

Method 1: Fit a mixed model with random effects by REML. Load the lme4 package. If you haven’t used this package before, you may need to install it first with install.packages("lme4"). The code and output for the model are below. To specify that line is the random variable, it is coded as (1|line). If you have fixed (non-random) variables, you can add them after the random variable. For example, your code might look like this: x <- lmer(response ~ (1|random.variable) + fixed.variable, data = time1)

Screenshot: lmer model code and output

The genetic variance component is listed in the output above in the “Random effects” section. In my model, the genetic variance component is 0.04188, and the residual (environmental) variance component is 0.05987.
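The screenshot isn’t reproduced here, so the following is a hedged sketch of method 1 on simulated stand-in data. The numbers of lines and individuals and the effect sizes are assumptions chosen for illustration, and time1 here is a simulated stand-in rather than the real data set. It fits the intercept-only model with line as a random effect and pulls the two variance components out with VarCorr().

```r
library(lme4)

# Simulated stand-in for time1 (hypothetical values): 20 lines, 10 individuals each.
set.seed(42)
n_lines    <- 20
n_per_line <- 10
line_id    <- rep(seq_len(n_lines), each = n_per_line)
line_eff   <- rnorm(n_lines, mean = 0, sd = 0.2)   # among-line (genetic) deviations
time1 <- data.frame(
  line     = factor(paste0("line", line_id)),
  response = 0.6 + line_eff[line_id] +
    rnorm(n_lines * n_per_line, mean = 0, sd = 0.25)   # within-line (environmental) noise
)

# Intercept-only mixed model with line as a random effect.
# lmer() fits by REML unless REML = FALSE is given.
m1 <- lmer(response ~ 1 + (1 | line), data = time1)
summary(m1)   # variances under "Random effects", the mean under "Fixed effects"

# Extract the variance components as plain numbers.
vc <- as.data.frame(VarCorr(m1))
vg <- vc$vcov[vc$grp == "line"]       # genetic (among-line) variance component
ve <- vc$vcov[vc$grp == "Residual"]   # environmental (within-line) variance component
c(genetic = vg, environmental = ve)
```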

Method 2: Use the function varcomp. Load the ape and nlme packages. You may need to install these packages first (see above). First, fit a model using the lme function and store the model as an object in R. You will also need to specify a random variable in this model. Because nlme and lme4 were written by different authors, the syntax for random variables is slightly different (see code below). If you are wondering why you need to fit the model with lme instead of lmer, the reason is that varcomp only accepts models fitted by lme (objects of class “lme”). This is one of the drawbacks of using a statistical package whose functions are written by lots of different people. Once you have fit the model, use varcomp to determine the variance components. The code is below.

Screenshot: lme and varcomp code and output

You can see that this method calculated variance components that are very similar to those calculated in method 1. Again, the genetic variance component (for line) is 0.04187855 and the environmental variance component is 0.05987434. The output for this method is a little cleaner.
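Here is a hedged sketch of method 2 on the same kind of simulated stand-in data (all values are hypothetical). In nlme the random effect goes in its own one-sided formula, and varcomp() from ape returns the random-effect component first and the within-group (“Within”) component last.

```r
library(nlme)
library(ape)

# Simulated stand-in for time1 (hypothetical values): 20 lines, 10 individuals each.
set.seed(42)
n_lines    <- 20
n_per_line <- 10
line_id    <- rep(seq_len(n_lines), each = n_per_line)
line_eff   <- rnorm(n_lines, mean = 0, sd = 0.2)   # among-line (genetic) deviations
time1 <- data.frame(
  line     = factor(paste0("line", line_id)),
  response = 0.6 + line_eff[line_id] +
    rnorm(n_lines * n_per_line, mean = 0, sd = 0.25)   # within-line (environmental) noise
)

# lme() also fits by REML by default.
m2 <- lme(response ~ 1, random = ~ 1 | line, data = time1)

# varcomp() needs an "lme" object.
vc2 <- varcomp(m2)
vc2
vg2 <- vc2[[1]]              # genetic (among-line) variance component
ve2 <- vc2[[length(vc2)]]    # environmental (within-line) variance component
```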

OK, now that you have the variance components, you can calculate the heritability, coefficient of genetic variation, and genetic correlation pretty easily.

Heritability: The heritability of a trait is generally calculated as the proportion of phenotypic variance that is explained by genetic variance (Lynch and Walsh, 1998). Because of my experimental design, I can only calculate the broad-sense heritability (the variance among clonal lines includes all genetic effects, not only additive ones), but if you are working with a more complicated breeding design, you would be able to determine the additive genetic variance component using methods similar to those outlined above (see Lynch and Walsh, 1998 for more details). For these data, the heritability is calculated as the genetic variance component / (the genetic variance component + the environmental variance component).

Screenshot: heritability calculation

The heritability of the response is 0.412.
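As a quick arithmetic check of the heritability, here is the same formula in R using the method 1 variance components reported above for time 1.

```r
vg <- 0.04188   # genetic (among-line) variance component, time 1
ve <- 0.05987   # environmental (within-line) variance component, time 1

# Broad-sense heritability: genetic share of the total phenotypic variance.
H2 <- vg / (vg + ve)
round(H2, 3)   # 0.412
```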

Coefficient of Genetic Variation: The coefficient of genetic variation is a standardized measure of dispersion, the genetic standard deviation expressed as a percentage of the mean, and is calculated as 100*(sqrt of the genetic variance component)/response mean (Felix et al., 2012). If you used method 1 to determine the variance components, you will notice that the mean is also estimated and included in the output (as the intercept under “Fixed effects”).

Screenshot: coefficient of genetic variation calculation

The coefficient of genetic variation is 32.27. The same calculation, using the environmental variance component in place of the genetic one, gives the coefficient of environmental variation, a measure of the phenotypic variation that genetic variation does not explain.
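The formula is easy to wrap in a small function. This sketch uses a hypothetical helper, cv_percent(), and made-up inputs, since the response mean for the data above isn’t given in the text. Passing the genetic variance component gives the coefficient of genetic variation; passing the environmental one gives its environmental counterpart.

```r
# Hypothetical helper: standard deviation as a percentage of the trait mean.
cv_percent <- function(variance_component, trait_mean) {
  100 * sqrt(variance_component) / trait_mean
}

# Made-up inputs, only to show the call:
cv_percent(0.04, 0.5)   # 40
```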

Genetic Correlation: The genetic correlation measures how closely the genetic variation underlying the two responses is linked. So far, I have only provided example calculations for the time1 dataset. To calculate a genetic correlation for these data, the same calculations need to be done on the time2 dataset to determine the genetic variance component of the response at time 2. I did this calculation and determined that the genetic variance component at time 2 is 0.04051. Additionally, we need to calculate the genetic covariance component between the responses measured at time 1 and time 2.

Now for the disclaimer:  I *think* the following information is correct.  In fact, I’m pretty sure because I was able to verify the genetic correlations using another program (not R).  Please use your own judgment on the following methods.  And please, if you have input, comment below.  I would love to be corrected if I am wrong.

OK, here is what I worked out:

You need to calculate the covariance component, which captures the genetic covariance between the response at time 1 and time 2. Apparently, this can be done in an ANCOVA format, using lme and varcomp as in method 2 above. The model is specified as follows:

Screenshot: lme model code for the genetic covariance

The line/time variable is what specifies the covariate, I believe. When the model is fed into varcomp, the following output results:

Screenshot: varcomp output for the covariance model

The genetic covariance component is 0.02663865. Once you have the genetic covariance component, the genetic correlation is simply the genetic covariance component / sqrt(genetic variance component at time 1 * genetic variance component at time 2). See below:

Screenshot: genetic correlation calculation

The genetic correlation for the response at time 1 and time 2 is 0.65. In theory, you can do these same machinations for the environmental correlation as well.
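The same arithmetic in R, using the three components reported above (the method 2 value for time 1, the time 2 value, and the covariance from the ANCOVA-style model):

```r
vg_time1 <- 0.04187855   # genetic variance component, time 1
vg_time2 <- 0.04051      # genetic variance component, time 2
cov_g    <- 0.02663865   # genetic covariance component, time 1 vs time 2

# Genetic correlation: covariance scaled by the two genetic standard deviations.
r_g <- cov_g / sqrt(vg_time1 * vg_time2)
round(r_g, 2)   # 0.65
```

An environmental correlation would follow the same pattern, with environmental components in place of the genetic ones.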

Also, if you are interested in the program I used to verify my calculations, it can be found here:  The program is called h2boot, and I couldn’t get it to run on a Mac, but that may be user error.  It runs great on a PC.

p.s. My apologies for the inconsistent sizes of the R output…I can’t figure out how to resize the images without making them more blurry.


Felix, T.M., K.A. Hughes, E.A. Stone, J.M. Drnevich, and J. Leips. 2012. Age-specific variation in immune response in Drosophila melanogaster has a genetic basis. Genetics 191: 989-1002.

Lynch, M. and B. Walsh. 1998. Genetics and Analysis of Quantitative Traits. Sinauer, Massachusetts.

…/U10.1-MixedModelExample

My Thoughts on Graduate School

I have neglected this blog for some time, so I will first provide some context on where I am in my education.  I am currently in my sixth semester as a Ph.D. student at Kansas State University in the Division of Biology, program of Ecological Genomics. I study age-related change in cold stress tolerance in the model system Drosophila melanogaster.  At this point in my graduate career, I consider myself to be “nearly finished with data collection,” and “seriously considering my future” in terms of post-doc positions and other job opportunities.  Official estimated time of graduation:  May 2017.

There are a lot of blogs on the Internet offering opinions on how one should approach applying to graduate school.  I am going to offer my thoughts on the subject and contribute to this pool.  One thing is certain–every graduate student has had a unique experience, and some experiences are better than others.  The content below is not meant to be advice per se; it is just a reflection on my own experience.


The decision to apply to graduate programs wasn’t difficult for me.  I really enjoyed my undergraduate experience in research (see other parts of this blog for background), and I didn’t mind the long hours and tedium that are often associated with doing research. I still feel the same way.  At the end of my junior year of undergrad, I knew that I wanted to do research in conservation, population genetics, evolution, and ecology.  This is an important list, and I will come back to it briefly.  My list of graduate programs was similarly focused.  The list of criteria I used for picking the programs I would eventually apply to was:

  1. The program included a PI (Principal Investigator) who led research on conservation, population genetics, evolution, and/or ecology.
  2. The lab I was interested in would build on my current education, expanding to genomics research.
  3. The lab studied amphibians or reptiles.

I ultimately picked out four schools with labs that satisfied one or more of my criteria.  At the time, I had had a few conversations (I think literally 3) with faculty and graduate students about how to go about finding a good program.  The one piece of advice that I still remember was that I should contact the PI to find out if they were interested in taking on a new graduate student.  I took that advice to heart and clung to it like my entire future depended on it.  I emailed the people at my four chosen schools and initiated these conversations. This began my filtering process. One PI was concerned that our interests didn’t match up well. The second PI welcomed my application. The third PI mentioned that I was welcome to apply, but was concerned that I hadn’t taken my GRE yet (this was sometime in the fall of my senior year).  He told me to keep in touch.  The fourth PI never replied.  This essentially narrowed my already small list of grad schools to 2. Concerned that 2 schools wasn’t enough, I threw in one more school. I also realized at that point that I was running out of time to take the GRE.

The GRE is one of those tests that academia clings to as if it is the source of all knowledge and will tell you everything you need to know about the person you are considering hiring.  Some PIs hold this opinion, and others don’t.  It just so happened that, of the 3 schools I ended up applying to, 2 PIs thought my GRE scores were too low.  The only acceptance letter I received was from Kansas State University.

As you know, I decided to join the program at KSU.  After discussing my interests and goals with my PI, we decided that I should skip a Master’s and do a Ph.D. out of undergraduate.  I don’t regret this decision in some ways.  For example, I didn’t miss a beat with the research material and the coursework.  My undergraduate experience and education, coupled with an REU, made the transition to Ph.D. work relatively painless.  However, as an “almost ready to start finishing up” graduate student, I am worried about my ability to get back into conservation research with amphibians and reptiles.  While still within the realm of population genetics, evolution, and ecology, my current research really has very little to do with conservation.  And I study fruit flies.  With regard to my original list of research interests, I still have a ways to go. My hope is that I will be able to redirect through a post-doc position.

As much as I hate to admit it, there is more to graduate school than research. As an undergraduate, I was told, “Oh, you’re going to love graduate school,” on more than one occasion. Reasons cited include: You won’t have to take classes that have nothing to do with your major, you can really dive into your research and immerse yourself in science in a way that you couldn’t before, you will have officemates and lab mates that are interested in similar topics, and you will want to discuss papers and ideas with them, your work on your primary research might inspire cool side-projects that you can publish along the way. The list goes on. Like I mentioned above, everyone’s experience in graduate school is different. I have had a slightly different (and less rosy) experience.

If I were to spew out some sweeping statements about grad school to contemplative students, here is what I would tell them: You will probably have to sit through courses that go over everything you learned (and probably still remember) from undergrad. The only reliable way to come by new information in graduate school is to educate yourself through primary literature and by discovering it through your research. Some programs might require you to take classes that are not actually related to what you are studying (mine did). The first two or so years of your program will be crammed with classes and homework and take-home exams, making it surprisingly difficult to dive into your research and immerse yourself in your science. If you have officemates like mine, one will drop out during her second semester, the person who replaces her will be fired, and the third person the department tries to cram into your closet-sized space will decide to work in her own lab space. In other words, you may not actually have officemates. As far as lab mates are concerned, you might find that they become infinitely friendlier after they graduate. If you happen upon the opportunity for a side-project, make sure to fully work out the logistical aspects of the project before you commit to it. These logistical aspects include, but are not limited to, experimental design, costs of carrying out the project, funds for covering the costs of the project, and authorship on any resulting papers. Side-projects can turn into hairy monsters if these aspects are ignored. As a last piece of advice, make friends. Don’t do what I did and turn down most of the invitations to hang out with people. They will stop asking you to hang out. Trust me. Even if you are antisocial, it’s good practice for when you need to interview for your next position.

I don’t dislike grad school, but my experience so far isn’t one that I will daydream about down the road. You might read this post as whiny or regretful, but I want to make this clear: I don’t regret coming to KSU and working on my current research project. I do believe that my resulting education in genomics of complex traits will be very useful in the future, and I am up for the challenge of applying what I have learned at KSU to the conservation of amphibians and reptiles.

Things I think I should have done differently:

  1. I should have applied to more than 3 grad schools, even if I hadn’t talked with any specific PIs. Rotations are nearly always an option. I started out with a list of grad schools that was too short, and ended up with no choices when it came to committing to a program (and I’m the type who was afraid to not commit).
  2. I only took the GRE once and I took it too late.  This is a test that can be taken repeatedly.  I didn’t have time to take it again and meet the application deadlines for the schools I was interested in. If schools insist on using this “tool” to filter through students, you might as well work the system to its full advantage.
  3. I was afraid that taking a year off of school and reapplying for the next term would reduce my chances of getting into grad school, and so I took the only opportunity offered.  Looking back, this option (of taking a year off) is probably not as dire as it seemed at the time.
  4. I didn’t discuss the process of applying to graduate schools much with my undergraduate adviser.  I don’t remember if it even ever occurred to me.  One of my attributes is that I am pretty independent.  One of my faults is that I am pretty independent.  I wish I had asked for help.
  5. I went straight into a Ph.D. program instead of starting with a Master’s.  Let me be clear about this:  I wanted to study conservation genetics of amphibians and reptiles.  I currently study the evolution of complex traits in a model organism.  I am worried I won’t be able to transition back to conservation (and herps) in my future research.


Have you ever felt the need to sit and flip a coin 100 times???

If you have, you are either in a Stats class or very bored.  Never mind the fact that it is evident that I have been thinking about flipping coins.  That’s just the result of working too hard yesterday and having nothing to do today.  Anyway, after much deliberation, I’ve come to the conclusion that it’s a lot of work flipping that coin 100 times…what if there was an easier way???  I decided to take my training in R (minimal though it may be) and use it to work out this simple problem for practice.  Much more fun than actually flipping the coin 100 times.  And I actually don’t have a coin anyway.

Here’s how you do it in R:

Coin Flip Simulation
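The original code appears only as an image, so here is a minimal sketch of one way to simulate 100 flips in R. sample() draws from the two outcomes with replacement; set.seed() just makes the run reproducible (the seed value is arbitrary).

```r
set.seed(2015)   # arbitrary seed, so the same flips come out every run
flips <- sample(c("Heads", "Tails"), size = 100, replace = TRUE)

head(flips)              # the first few flips
table(flips)             # how many heads and how many tails
mean(flips == "Heads")   # proportion of heads

# Shortcut if only the number of heads matters: one binomial draw.
n_heads <- rbinom(1, size = 100, prob = 0.5)
n_heads
```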


Session 1: Getting Started

Please note that I work on a Mac, so all of the screenshots that are included in this blog will be of the Mac version of the R program.  Fortunately, the program runs very similarly on both Windows and Mac.  I will try to point out the differences where they occur.

To download R, go to

1.  To download R for your personal computer, you must choose a mirror (a CRAN server) to download the program from.  As far as I am aware, it doesn’t matter which one you choose. One person recommended the mirror run by Iowa State University, as it seemed to be faster.

2.  Install the program.  On a Windows machine, a shortcut will appear on the desktop.  On a Mac, drag the R icon from the Applications folder to the Dock.

3.  Open the program.  The new version on a Mac appears as shown in Figure 1.

Figure 1.  Screenshot of R program on a Mac.

4.  To begin programming, type at the > prompt.  At the end of the command, press enter or return to run it.  On a Windows machine, commands can be typed either in the Console (pictured in Figure 1) or in the Editor screen.  To open the Editor, go to File, Open, New Document.  To run the commands from the Editor screen on Windows, click the button that looks like it has an arrow between two file cabinets…not the most intuitive button.

The logic behind programming in R rests on the construction of vectors.  A vector has one dimension and is composed of elements.  The elements may be numerical, logical (true vs false), or characters.
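A short illustration of the three element types, with arbitrary example values. class() reports the type of a vector; because a vector holds only one type, mixing types coerces every element to a common type.

```r
num_vec  <- c(1.5, 2, 3.25)            # numeric elements
log_vec  <- c(TRUE, FALSE, TRUE)       # logical elements
char_vec <- c("red", "green", "blue")  # character elements

class(num_vec)    # "numeric"
class(log_vec)    # "logical"
class(char_vec)   # "character"

# Mixing types: the number 1 is converted to the character string "1".
mixed <- c(1, "a")
class(mixed)      # "character"
```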

Example 1:

I want vector x to be composed of the numbers 5, 6, 7, 8, 9, 10.

The command to construct this vector in R is:

x <- c(5, 6, 7, 8, 9, 10)

There are several elements to this command.

  1. x is the name of the vector.  If I wanted to, I could name the vector Victor.  Then Victor would contain the elements 5, 6, 7, 8, 9, 10.  (But that would be ridiculous.)  For another example, a vector might contain the heights of 5 people.  In this case I would name the vector height and set height to contain the elements 1.6, 1.8, 1.3, 1.7, 1.9.
  2. <- is the less-than sign followed by the minus sign.  Think of it as an arrow pointing to the name of the vector, indicating which elements are assigned to that vector.
  3. c(…) is the concatenate function.  This is a very commonly used function that groups elements together.
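The height example from item 1, written out in R. The name Victor is legal too; it just isn’t descriptive.

```r
height <- c(1.6, 1.8, 1.3, 1.7, 1.9)   # five heights in one vector
height           # print the contents
length(height)   # 5
mean(height)     # 1.66

# Any valid name works; Victor holds the same elements as x in Example 1.
Victor <- c(5, 6, 7, 8, 9, 10)
```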

This is what Example 1 looks like in R (Figure 2):

Figure 2.  The first line of code assigns the numbers 5-10 to the vector x.  To verify the assignment, I typed x in the next line and pressed return.  This gives the third line, [1]  5  6  7  8  9 10, which shows the contents of x (the [1] marks the position of the first element on that line).

There are many other ways to make a vector.  Some of them follow:

Figure 3.  Additional ways to make vectors.
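The figure itself isn’t reproduced here; as a sketch, these are a few standard base R ways to build the same 5-to-10 vector (they may not match the exact examples in the figure).

```r
x1 <- 5:10                          # colon operator: whole numbers from 5 to 10
x2 <- seq(5, 10)                    # seq() with just a start and an end
x3 <- seq(5, 10, by = 1)            # seq() with an explicit step
x4 <- seq(5, 10, length.out = 6)    # seq() with a fixed number of elements
x5 <- rep(c(5, 6), times = 3)       # repetition: 5 6 5 6 5 6
x6 <- numeric(6)                    # six zeros, filled in afterwards
x6[1:6] <- 5:10

all(x1 == x3)   # TRUE: same values, even if the storage type differs
```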
