BME 110 Computational Biology Tools

In-class Exercises, April 3

Part A - Using NCBI Resources

1. Retrieve the sequence with the LOCUS accession number AAC60746, and give its (a) length, (b) gene name, (c) source species, and (d) first four letters.   [Nucleotide or Protein]
2. How many journal publications were authored by Sean R Eddy in 1997?  (Hint: search for "eddy sr" AND 1997.)  [PubMed]
3. (a) How many bacterial genomes are currently fully sequenced?
    (b) How many archaeal genomes are fully sequenced?
[Start at http://www.ncbi.nlm.nih.gov/Genomes/, then choose "Microbial" under Genome Resources...]

4. Who sequenced "Pyrobaculum aerophilum"?
5. Looking at the "COGs" table, how many genes have "Function Unknown" or are not in a COG?
[Click on the "C" character at the right side of the table listing all species]

6. Is this species an archaeon, bacterium, or eukaryote?  What is the optimal growth temperature (Opt. Temp) for this species?
[Click on the species name to get a full information page]

7. How many proteins have their best hit to another archaeal gene?  How many to a bacterial gene?
Give the systematic gene name (e.g. PAE0001) for the first gene whose strongest similarity is to
a bacterial gene.
[Click on the "T" character at the right side of the table listing]

8. Give the first 5 letters of the protein with the systematic name "PAE0034"
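
The lookups in questions 1, 2, and 8 can also be scripted. The sketch below is only one possible approach, not part of the exercise: it builds request URLs for the NCBI E-utilities service (efetch for a sequence record, esearch for a PubMed count) and summarizes a FASTA record once you have it. The helper names (efetch_fasta_url, pubmed_count_url, summarize_fasta) and the toy record are made up for illustration, and nothing here contacts the network.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_fasta_url(accession, db):
    """Build an efetch URL that returns one record as FASTA text.

    db is "protein" or "nuccore" (nucleotide); deciding which one
    applies to a given accession is part of the exercise.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def pubmed_count_url(term):
    """Build an esearch URL that reports how many PubMed records match term."""
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def summarize_fasta(fasta_text, n=4):
    """Return the header, sequence length, and first n letters of a FASTA record."""
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    sequence = "".join(lines[1:])
    return {"header": lines[0][1:], "length": len(sequence), "prefix": sequence[:n]}


if __name__ == "__main__":
    # Print both candidate URLs; only one database holds the record.
    for db in ("protein", "nuccore"):
        print(efetch_fasta_url("AAC60746", db))
    print(pubmed_count_url('"eddy sr" AND 1997'))

    # Hypothetical toy record, NOT the answer to any question above.
    toy = ">toy_record hypothetical example\nMKVLA\nGGT\n"
    print(summarize_fasta(toy))
```

Paste a printed URL into a web browser to see the raw result, then compare it with what the NCBI web pages show.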

Part B - Using the UCSC Genome Browser

Using the July 2003 assembly of the human genome in the browser, answer the following questions:

1. Find the location of the "fibrillarin" gene -- what chromosome does it reside on?
[Type this string in the "position/search" window, and look at hits under "Known Genes"]
2. Click on the link to the genome position, and count how many exons are in this gene.
3. Zoom out 3X. What is the name of the gene closest to fibrillarin?
[Use the gene name under the UCSC Known Genes II track]
4. What is the "Genomic Size" of this gene?
[Click on the blue track for this gene, and it will bring up a "details" page -- click on "Outside Link"]

5. What does this gene do in the cell?
[See "RefSeq Summary" on details page]
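
Questions 2 and 4 above (exon count and "Genomic Size") come down to simple arithmetic on a gene's coordinates. The sketch below assumes a gene record laid out like a row of a UCSC gene table: a transcript start and end plus comma-separated exonStarts and exonEnds lists, with 0-based, end-exclusive coordinates. The GeneModel class, the parse_exons helper, and all numbers are hypothetical illustrations, not fibrillarin's real values.

```python
from dataclasses import dataclass
from typing import List, Tuple


def parse_exons(starts: str, ends: str) -> List[Tuple[int, int]]:
    """Turn comma-separated start and end lists into (start, end) pairs.

    A trailing comma (as in "100,300,") is tolerated.
    """
    s = [int(x) for x in starts.split(",") if x]
    e = [int(x) for x in ends.split(",") if x]
    if len(s) != len(e):
        raise ValueError("exon start and end lists differ in length")
    return list(zip(s, e))


@dataclass
class GeneModel:
    name: str
    chrom: str
    tx_start: int                  # 0-based, inclusive
    tx_end: int                    # exclusive
    exons: List[Tuple[int, int]]   # (start, end) pairs, same convention

    def exon_count(self) -> int:
        return len(self.exons)

    def genomic_size(self) -> int:
        # Span on the chromosome, introns included.
        return self.tx_end - self.tx_start

    def exonic_length(self) -> int:
        # Total length of the exons alone.
        return sum(end - start for start, end in self.exons)


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to make the arithmetic easy.
    exons = parse_exons("1000,1500,2200,", "1200,1700,2600,")
    gene = GeneModel("toyGene", "chrN", 1000, 2600, exons)
    print(gene.exon_count(), gene.genomic_size(), gene.exonic_length())
```

The genomic size is larger than the summed exon length because it also counts the introns between exons.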

Using the Archaeal genome browser for Pyrococcus furiosus, answer the following questions:
1.  How long is this genome?  How many genes are in this genome?
[After selecting the species name in the drop-down menu for "genome", this information will appear on the lead page]
2.  What are the genome coordinates for "PF0250"?
[Type this name in the position/search window]
3. What COG does this gene belong to?
[Click on the pink gene bar to go to the details page]
4. What are the first four letters of the protein encoded by this gene?
[Click on "Predicted Protein" under the "Links to sequence" at the bottom of the details page]
5. What is the species name and gene name for the strongest protein hit outside this species?
[Go back to the track view, and click on any bar under "Conservation of proteins with BLASTP";
the hits are ranked from top to bottom, best hit at the top]
6. How many species have a strong DNA sequence alignment with this gene?
[Click on the dark peak at the bottom of the track window under "Pyrococcus 4-way multiz alignments"]

7. What is the gene name and the genome coordinates for the best hit against the species "Sulfolobus tokodaii" in that genome?
[You can use BLAT or the "Conservation of proteins with BLASTP" track to get the answer]
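
When you write down genome coordinates (questions 2 and 7 above), it helps to be able to check the length of the region they cover. The sketch below assumes the coordinates are written the way the browser's position box usually shows them: a sequence name, a colon, then start-end with optional thousands separators, counted from 1 with both ends included. The parse_position and span_length helpers and the example string are hypothetical, not the coordinates of any gene in this exercise.

```python
def parse_position(text):
    """Split a position string like 'chrN:10,001-10,900' into its parts."""
    # Split on the last colon in case the sequence name itself contains one.
    chrom, _, coords = text.strip().rpartition(":")
    if not chrom or "-" not in coords:
        raise ValueError(f"not a position string: {text!r}")
    start_text, end_text = coords.replace(",", "").split("-", 1)
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError("start must not be greater than end")
    return chrom, start, end


def span_length(start, end):
    """Number of bases covered, assuming 1-based coordinates with both ends included."""
    return end - start + 1


if __name__ == "__main__":
    # Hypothetical position string.
    chrom, start, end = parse_position("chrN:10,001-10,900")
    print(chrom, start, end, span_length(start, end))
```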