BME 110 Computational Biology Tools

In-class Exercise, January 8

Using NCBI Resources

  1. Retrieve the sequence with the LOCUS accession number AAC60746 and give (a) its length, (b) its gene name, (c) its source species, and (d) the first four letters of the sequence. [Nucleotide or Protein]

    This refers to a protein - also find the gene sequence encoding it and look at all the information you are given in both records!
  2. How many journal publications were authored by Sean R Eddy in 1997?  (Hint - use: "eddy sr" AND 1997).  [PubMed]

    Try a similar search with another query... e.g. check up on one of your professors from another class :-)
  3. How many archaeal genomes are currently fully sequenced?
    [Start at http://www.ncbi.nlm.nih.gov/Genomes/ and choose "Microbial" under Genome Resources...]

    Now see whether you can get to that same information but starting from the NCBI home page again and choosing the [Genome] database!
  4. Who sequenced "Pyrobaculum aerophilum"?

    So now you know where the sequence is from - can you find where the specific "sequence donor" was collected, and a bit more about it?
    [Hint: the information is just one click away from where you are]

    While you are here, read around and learn a bit about why this species is of interest to biologists! What is the optimal growth temperature for this species? If you read the publication abstract you often gain a little additional insight without having to spend much time.
  5. Looking at the "COGs" table, how many genes have "Function Unknown" or are not in a COG?
    [Click on the "C" character at the right side of the table listing all species]

    Note: COGs are one way to gain an impression of what general life processes are happening inside an organism based on its sequence, even though many proteins are left unassigned (we'll talk later about why this is). Keep in mind that these assignments are made computationally and put on the web automatically, so sometimes they can be wrong. Unless a functional class has many proteins assigned to it in your organism of interest, there is no guarantee that the corresponding process is actually happening there. Always use your biological judgment alongside automatic information.
  6. How many P. aerophilum proteins have their best hit to another archaeal protein? How many to a bacterial one?
    [Click on the "T" character at the right side of the species table from which you linked to the COGs previously]

    Note the systematic gene naming (PAE....), e.g. for the first protein whose strongest similarity is to a eubacterial protein.
  7. Give the first 5 letters of the protein sequence with the systematic name "PAE0034"
    [This can be found in many different ways; one of them is to follow the "P" link in the table (for ProTable) and take it from there. This will take a few clicks!]

Have fun, and feel free to ask us about anything you wondered about during this exercise.
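
Optional: the lookups in questions 1 and 2 can also be scripted instead of clicked through. The following is a minimal sketch, assuming NCBI's public E-utilities web service (efetch for sequence records, esearch for query counts) and Python's standard library. The helper names (efetch_url, esearch_url, parse_fasta, fetch, main) are made up for illustration and are not part of the exercise; the query string is simply the hint from question 2.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities service (assumed here; not named in the handout).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="protein"):
    """Build an efetch URL that requests one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def esearch_url(term, db="pubmed"):
    """Build an esearch URL that asks only for the number of matching records."""
    params = {"db": db, "term": term, "rettype": "count"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = text.strip().splitlines()
    header = lines[0].lstrip(">")
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence


def fetch(url):
    """Download a URL and return the body as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def main():
    # Question 1: header line, length, and first four letters of AAC60746.
    header, seq = parse_fasta(fetch(efetch_url("AAC60746")))
    print(header)
    print(len(seq), seq[:4])
    # Question 2: the reply is a small XML document containing a <Count> element.
    print(fetch(esearch_url('"eddy sr" AND 1997')))
```

Calling main() contacts the live service, so the numbers it prints reflect the databases at the time you run it. The FASTA header does not contain the gene name, so part (b) of question 1 still needs the full record in the browser.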