Downloading a gene list from a sequence interval in FlyBase

This is a quick and easy way to download a list of genes that fall within an interval you are interested in, for example after QTL mapping.

  1. First, go to FlyBase.org
  2. Under Tools, choose Genomic/Map Tools, and then choose CytoSearch.
  3. In the top option menu on the CytoSearch page, choose sequence region.
  4. Enter the sequence coordinates, but don’t forget to convert them first if needed. Your coordinates may match an earlier release of the genome. To convert, go to Tools, choose Retrieve/Convert Tools, and then choose the coordinates converter. There, you can enter the coordinates you are starting with and pick the appropriate conversion. For example, if your coordinates are from release 5, you should first convert them to release 6.
  5. Once you have the correct coordinates, go back to the CytoSearch page and check all of the options you need.
  6. Click submit query, which redirects you to the list of genes. You could stop here if this is all you need.
  7. To download the list, click HitList Conversion Tools (at the top right above the list).
  8. A window called Export Batch to Download will pop up for a moment. If you wait too long, it might disappear. Choose Genes (or whatever option you want). This redirects you to the batch download page.
  9. Here, next to Field Data, click the format you want for the downloaded file. Tab-separated is a good choice.
  10. Under the Send Results To option menu, choose where you want the data exported.
  11. Click Select Fields (lower right of search box).
  12. Choose the field options you want in the new page that opens.
  13. When you have selected all desired options, click Get Field Data at the top or bottom of the page.
  14. The file should download quickly and can be read into R as a tab-separated .txt file.
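
As a rough sketch of that last step, here is one way the downloaded file could be read into R. The file below is a stand-in written to a temporary location so the example runs on its own. Its column names, IDs, and gene symbols are made up for illustration, so swap in the path and fields of your own download.

```r
# Stand-in for a FlyBase batch download (assumption: tab-separated with a
# header row). The column names and rows are placeholders, not real FlyBase
# output.
export_file <- tempfile(fileext = ".txt")
writeLines(c(
  "FBgn_ID\tSYMBOL\tLOCATION",
  "FBgn9000001\tgeneA\t2L:100..200",
  "FBgn9000002\tgeneB\t2L:300..400",
  "FBgn9000003\tgeneC\t2L:500..600"
), export_file)

# read.delim() assumes tab-separated fields and a header row.
# quote = "" makes R treat any quote characters in gene names as plain text.
genes <- read.delim(export_file, quote = "", stringsAsFactors = FALSE)

# Quick check of what was read in.
str(genes)
```

For a real download, replace export_file with the path to your file, for example something like "~/Downloads/my_gene_list.txt" (a hypothetical name).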

Some things to keep in mind: FBgn numbers have changed for many genes as new information on the sequence becomes available, and for other reasons that are much more arbitrary. If you want to compare the list you have just generated to a list from another study, you may need to use the FBgn aliases that are an option in step 12 to generate a look-up table of name synonyms. A good place to start for figuring out how to do that is with the match() and %in% functions in R.

Also, some of the gene names for Drosophila have ' (apostrophes) in them. By default, R's read.table() treats apostrophes as quote characters, so it can misread or fail on a .txt file (or any other type of file) that has them in the data. Either open the .txt document in an editor and remove them before reading the file into R, or pass quote = "" to read.table() so that apostrophes are treated as ordinary characters. After this step, R shouldn’t have a problem reading it in.
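
Here is a minimal sketch of both points: reading a file whose gene symbols contain apostrophes, and building a look-up table from old FBgn numbers to current ones with match() and %in%. Everything in the example data is hypothetical, including the column names, the IDs, the symbols, and the comma-separated layout of the old IDs. Check how your own export lays out the alias field before adapting it.

```r
# Hypothetical export: placeholder IDs and symbols, two with apostrophes.
export_file <- tempfile(fileext = ".txt")
writeLines(c(
  "current_id\tsymbol\tsecondary_ids",
  "FBgn9000001\tgeneA\tFBgn8000001,FBgn8000002",
  "FBgn9000002\tgeneB'\tFBgn8000003",
  "FBgn9000003\tgeneC'\tFBgn8000004"
), export_file)

# quote = "" stops R from treating ' or " as quote characters, and
# comment.char = "" does the same for #.
genes <- read.table(export_file, header = TRUE, sep = "\t", quote = "",
                    comment.char = "", stringsAsFactors = FALSE)

# Build a long look-up table with one row per old ID.
old_ids <- strsplit(genes$secondary_ids, ",", fixed = TRUE)
lookup <- data.frame(
  old_id     = unlist(old_ids),
  current_id = rep(genes$current_id, lengths(old_ids)),
  stringsAsFactors = FALSE
)

# IDs from another study (hypothetical): two outdated, one already current,
# one that is not in the interval at all.
other_study <- c("FBgn8000002", "FBgn9000003", "FBgn8000003", "FBgn7000000")

# match() returns the look-up row for each ID, or NA if it is not an old ID.
hit <- match(other_study, lookup$old_id)
updated <- ifelse(is.na(hit), other_study, lookup$current_id[hit])

# %in% tests which of the other study's genes fall in the interval list.
in_interval <- updated %in% genes$current_id

data.frame(other_study, updated, in_interval)
```

IDs with no match in the look-up table are left as they were, so a gene that already uses its current FBgn number passes through untouched.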

Cheers.
