BME 110 Computational Biology Tools

In-class Exercises, January 22

  1. Get JAVA tools for dot-plotting to run on your computer

    Complement Receptor 1 (CR1) is an important protein in your immune system. It has homologs in primate species that are quite similar in sequence. In addition, parts of its sequence (at the protein and/or DNA level) are similar to each other, i.e. it contains internally repeated domains. Investigate this with dot plots, using the tools Dotlet (a JAVA applet) and JDotter (a JAVA Web Start application).

    Note: Dotlet has a nice "learn by example" page - study that first. Personally, I'd prefer JDotter for actual use but, to be honest, we would not use dot plots (anymore) to investigate a protein (see lecture), and most visual features of these programs seem fairly... useless. Still, please try to get both programs to run; it is important for us to ensure that you can run both JAVA applets and JAVA Web Start applications on your laptops in future exercises (and exams).
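
    To see why internal repeats show up as off-diagonal lines in a self-comparison dot plot, here is a minimal Python sketch. It is not how Dotlet or JDotter work internally; the function names, the window size of 3, the exact-match scoring and the toy sequence are all assumptions made for illustration (the real tools use scoring matrices and adjustable thresholds).

```python
def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of (i, j) start positions whose windows match.

    A dot is placed at (i, j) when at least `min_matches` of the `window`
    residues starting at seq_a[i] and seq_b[j] are identical.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the dot plot as text: '*' for a dot, '.' for no dot."""
    rows = []
    for i in range(len(seq_a)):
        rows.append(
            "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        )
    return "\n".join(rows)


# Hypothetical toy sequence (not CR1): the segment "ABCD" occurs twice.
toy = "ABCDXABCDY"
dots = dot_plot(toy, toy)
print(render(toy, toy, dots))
```

    Comparing the toy sequence against itself gives the main diagonal plus two short parallel diagonals, offset by the distance between the two copies of the repeat. Repeated domains in a real protein should produce the same kind of pattern, only noisier.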

  2. Run a PSI-BLAST search at NCBI

    PSI-BLAST uses the same query form as BLASTP. Your next problem set (homework 2) also guides you through an example, but let's try this one to get started. It will hopefully demonstrate that the increased sensitivity of PSI-BLAST sometimes makes it worth the "hassle" of having to interact with the program and go through several rounds of searching.

    Use human actin as your query sequence (you could get it through NCBI but, just for fun, why don't you retrieve the FASTA-formatted sequence from SwissProt this time: http://www.expasy.org/swissprot - the SwissProt entry name is ACTC_HUMAN). You will obtain a large list of homologs, so change the display parameters so that none are cut off (e.g. so that up to 5000 alignments are displayed). Also, since there are so many homologs, we can search a less populated but better annotated database: SwissProt. Just like in the HW2 instructions, I will use the check boxes to select, in each iteration, only those sequences with E-values better (= lower) than 1E-04. However, please note that you could also do this automatically by setting the PSI-BLAST inclusion threshold to this value, as suggested at the end of the lecture.
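
    The check-box selection described above amounts to a simple filter on E-values. As a rough sketch in Python (the hit names and E-values below are invented for illustration, and the function name is hypothetical; this is not NCBI's code):

```python
INCLUSION_THRESHOLD = 1e-4  # the 1E-04 cutoff used in this exercise


def select_for_next_round(hits, threshold=INCLUSION_THRESHOLD):
    """Keep only hits whose E-value is better (= lower) than the threshold.

    `hits` is a list of (name, evalue) pairs.
    """
    return [name for name, evalue in hits if evalue < threshold]


# Made-up hits, purely illustrative.
hits = [
    ("hit_A", 1e-120),
    ("hit_B", 3e-6),
    ("hit_C", 1e-4),   # exactly at the threshold: not "lower", so excluded
    ("hit_D", 0.02),
]
selected = select_for_next_round(hits)
print(selected)
```

    Only the selected sequences would be used to build the profile for the next iteration; setting the inclusion threshold in the query form makes the program apply the same kind of cutoff for you.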

    If you now run PSI-BLAST over a few iterations until the search converges (i.e. no new sequences appear with E-values as good as or better than your inclusion threshold), you will observe that the distant homologs have names indicating that they play a different biological role than actin... If you don't feel completely familiar with actin's role in eukaryotic cells, or with the roles of these other proteins, then have a look at the Wikipedia entries for "actin" and for "chaperone (protein)".
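
    The convergence criterion can be written down as a short loop: keep iterating until a round adds no new sequences that pass the inclusion threshold. The Python sketch below only illustrates this stopping rule. `fake_search` is a hypothetical stand-in that replays canned results instead of running PSI-BLAST, and the sequence names are invented.

```python
def run_until_converged(search_round, max_rounds=10):
    """Iterate until a round yields no new sequences passing the threshold.

    `search_round(included)` must return the set of sequence names that
    pass the inclusion threshold when searching with the current profile.
    Returns (included, rounds_run, converged).
    """
    included = set()
    for round_number in range(1, max_rounds + 1):
        passing = search_round(included)
        new = passing - included
        if not new:
            return included, round_number, True  # nothing new: converged
        included |= new
    return included, max_rounds, False


def make_fake_search(canned_rounds):
    """Hypothetical stand-in for a search: replays canned result sets."""
    rounds = iter(canned_rounds)

    def fake_search(included):
        return next(rounds)

    return fake_search


# Invented example: one more distant homolog appears in round 2.
canned = [{"seq_1", "seq_2"}, {"seq_1", "seq_2", "seq_3"}, {"seq_1", "seq_2", "seq_3"}]
included, rounds_run, converged = run_until_converged(make_fake_search(canned))
print(sorted(included), rounds_run, converged)
```

    In this toy run the third round finds nothing new, so the loop stops there. In the real exercise you are the loop: you keep pressing the button for the next iteration until the result list stops gaining new members above your threshold.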