In-class Exercises, January 22
Complement Receptor 1 (CR1) is an important protein in your immune system. It has homologs in primate species that are quite similar in sequence. In addition, parts of its sequence (at the protein and/or DNA level) are similar to other parts of the same sequence, i.e. it contains internally repeated domains. Investigate this with dot plots, using the tools Dotlet (a Java applet) and JDotter (a Java Web Start application).
Note: Dotlet has a nice "learn by example" page; study that first. Personally, I'd prefer JDotter, but, to be honest, we would no longer use dot plotting to investigate a protein (see lecture), and most visual features of these programs seem fairly... useless. Still, please try to get both programs to run: it is important for us to ensure that you can run both Java applets and Java Web Start applications on your laptops in future exercises (and exams).
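If you want to see what these programs compute underneath the graphics, here is a minimal Python sketch of a windowed dot plot. It is not how Dotlet or JDotter are implemented: as a simplifying assumption it scores identical characters only, whereas the real tools can score protein windows with substitution matrices. A dot is placed at (i, j) when the window starting at position i of the first sequence and the window starting at position j of the second agree in at least `threshold` positions. The names `dot_plot`, `render`, `window` and `threshold` are made up for this example.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1, threshold: int = 1) -> list[list[bool]]:
    """Windowed identity dot plot.

    Cell (i, j) is True when seq_a[i:i+window] and seq_b[j:j+window]
    agree in at least `threshold` positions (identity scoring only)."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    rows = max(len(seq_a) - window + 1, 0)
    cols = max(len(seq_b) - window + 1, 0)
    plot = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count identical positions between the two windows.
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(matches >= threshold)
        plot.append(row)
    return plot


def render(plot: list[list[bool]]) -> str:
    """Text rendering: '*' for a dot, '.' for an empty cell; rows follow seq_a."""
    return "\n".join("".join("*" if dot else "." for dot in row) for row in plot)


if __name__ == "__main__":
    toy = "ACGTACGTTT"  # made-up sequence containing a tandem repeat of ACGT
    print(render(dot_plot(toy, toy, window=4, threshold=4)))
```

A larger window with a strict threshold removes isolated noise dots, which is roughly the effect of raising the window size or stringency in the graphical tools.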
Compare the mRNA sequence of human CR1 to itself to see internal repeats (to do this, retrieve its nucleotide sequence with accession code NM_000573 from NCBI in FASTA format and use it as input on both axes of your dot plot).
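If you would rather script the retrieval than click through the NCBI web pages, the sketch below builds an NCBI E-utilities efetch URL that asks for a record as FASTA text, and parses FASTA text into a dictionary mapping header to sequence. The efetch endpoint and its db, id, rettype and retmode parameters come from NCBI's E-utilities; the helper names are hypothetical, and the download function needs network access, so it is not exercised here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """URL asking NCBI E-utilities for one record as plain-text FASTA.
    Use db="nuccore" for NM_ accessions and db="protein" for NP_/XP_ ones."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_fasta(accession: str, db: str = "nuccore") -> str:
    """Download the FASTA text (requires network access)."""
    with urlopen(efetch_fasta_url(accession, db), timeout=30) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Used as parse_fasta(fetch_fasta("NM_000573")), this should give a one-entry dictionary whose value is the sequence to put on both axes.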
Now plot the human CR1 protein sequence against itself (its accession code is NP_000564).
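In a self-comparison the main diagonal is always a full line (every position matches itself), so it tells you nothing. Internal repeats show up as additional lines parallel to it, shifted by the distance between the repeated copies. The sketch below counts exact window matches on each off-diagonal of a self dot plot and reports the most populated offset. Exact matching and the function names are simplifying assumptions for illustration; diverged repeats will not match exactly, so on a real protein you may need a shorter window or a similarity score instead of identity.

```python
def diagonal_scores(seq: str, window: int) -> dict[int, int]:
    """Self-comparison: for each offset d > 0, count positions i where
    seq[i:i+window] is identical to seq[i+d:i+d+window]."""
    seq = seq.upper()
    n = len(seq)
    scores: dict[int, int] = {}
    for d in range(1, n - window + 1):
        scores[d] = sum(
            seq[i:i + window] == seq[i + d:i + d + window]
            for i in range(n - d - window + 1)
        )
    return scores


def best_repeat_offset(seq: str, window: int) -> int:
    """Offset of the most populated off-diagonal (ties go to the smallest offset)."""
    scores = diagonal_scores(seq, window)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    toy = "MKLVQ" * 4  # made-up tandem repeat with period 5
    print(best_repeat_offset(toy, window=5))
```

The strongest off-diagonal sits at the repeat period; weaker lines at multiples of it come from comparing copies that are two or more repeats apart.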
Finally, get plots (always with both tools) that compare the human CR1 protein sequence with that of a similar protein in chimpanzee (the accession code for the chimp protein is XP_001166919).
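When two sequences are very similar, the plot is dominated by one long diagonal, and an insertion or deletion in one sequence shows up as a jump of that diagonal to a neighbouring one. The sketch below lists ungapped runs of identical residues along every diagonal that are at least `min_len` long, as (start in a, start in b, length); a change in start_b - start_a between consecutive runs corresponds to such a jump. Identity-only matching, the name `diagonal_runs` and the toy sequences are assumptions made up for this illustration.

```python
def diagonal_runs(a: str, b: str, min_len: int) -> list[tuple[int, int, int]]:
    """Ungapped runs of identical characters along each diagonal j - i = d,
    returned as (start_a, start_b, length), sorted by position in a."""
    runs = []
    for d in range(-(len(a) - 1), len(b)):
        i = max(0, -d)  # first position in a that lies on this diagonal
        run_start = None
        while i < len(a) and i + d < len(b):
            if a[i] == b[i + d]:
                if run_start is None:
                    run_start = i
            else:
                if run_start is not None and i - run_start >= min_len:
                    runs.append((run_start, run_start + d, i - run_start))
                run_start = None
            i += 1
        # Close a run that reaches the end of the diagonal.
        if run_start is not None and i - run_start >= min_len:
            runs.append((run_start, run_start + d, i - run_start))
    return sorted(runs)


if __name__ == "__main__":
    # Made-up pair: the second sequence has one extra G after position 10.
    print(diagonal_runs("MKTAYIAKQRQISFVKSH", "MKTAYIAKQRGQISFVKSH", min_len=5))
```

For this toy pair the function reports (0, 0, 10) and (10, 11, 8): the diagonal moves from offset 0 to offset 1 right where the extra residue was inserted.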
PSI-BLAST uses the same query form as BLASTP. Your next problem set (homework 2) also guides you through an example, but let's try this one to get started. It should demonstrate nicely that the increased sensitivity of PSI-BLAST sometimes makes it worth the "hassle" of interacting with the program and going through several rounds of searching.
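Where does the extra sensitivity come from? After each round, PSI-BLAST builds a position-specific scoring matrix (a profile) from the sequences included so far and searches with that instead of the single query, so positions conserved across the family count for more. The sketch below is a strongly simplified version of the idea: per-column log-odds scores from an ungapped alignment, with a flat pseudocount and a uniform background. Real PSI-BLAST adds refinements such as sequence weighting and substitution-matrix-based pseudocounts, so treat this only as an illustration; the function names and the toy alignment are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def simple_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column log-odds scores (natural log) from equal-length aligned
    sequences, with a flat pseudocount and a uniform background frequency.
    Characters outside AMINO_ACIDS (e.g. gaps '-') are ignored."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            residue = seq[col].upper()
            if residue in counts:
                counts[residue] += 1
        total = sum(counts.values())
        # log(observed frequency / background frequency) for every residue type
        pssm.append({aa: math.log(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return pssm


def profile_score(pssm: list[dict[str, float]], seq: str) -> float:
    """Ungapped score of seq against the profile, one residue per column."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, seq.upper()))


if __name__ == "__main__":
    toy_alignment = ["MKVL", "MRVL", "MKIL"]  # hypothetical aligned hits
    pssm = simple_pssm(toy_alignment)
    print(profile_score(pssm, "MKVL"), profile_score(pssm, "WWWW"))
```

Residues seen at a conserved column get positive scores and unseen ones negative scores, which is why a profile can pick up distant family members that a single query sequence scores poorly.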
Use human actin as your query sequence. You could get it through NCBI but, just for fun, why don't you retrieve the FASTA-formatted sequence from SwissProt this time (http://www.expasy.org/swissprot); the SwissProt accession "number" is ACTC_HUMAN. To make sure you can cope with the large lists of homologs you will obtain, change the parameters accordingly (e.g. so that up to 5000 alignments are displayed). Also, since there are so many homologs, we can use a smaller but better-annotated database: SwissProt. Just like in the HW2 instructions, I will use the check boxes in each iteration to select only those sequences with E-values better (i.e. lower) than 1E-04. However, note that you could also do this automatically by setting the PSI-BLAST inclusion threshold to this value, as suggested at the end of the lecture.
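Ticking the check boxes by hand amounts to the simple filter below: a hit is kept for the next round only when its E-value is strictly lower than 1E-04. The hit list and IDs are hypothetical; E-values are given as strings to show that Python's float() accepts notation such as "0.0", "3e-05" or "1E-04".

```python
def select_for_next_round(hits: list[tuple[str, str]], inclusion_threshold: float = 1e-4) -> list[str]:
    """Keep the IDs of hits whose E-value is better (lower) than the threshold,
    preserving the order in which the hits were listed."""
    return [hit_id for hit_id, evalue in hits if float(evalue) < inclusion_threshold]


if __name__ == "__main__":
    hypothetical_hits = [("hit_A", "0.0"), ("hit_B", "3e-05"), ("hit_C", "1E-04"), ("hit_D", "0.2")]
    print(select_for_next_round(hypothetical_hits))
```

Note that a hit sitting exactly at 1E-04 is not "better than" the threshold and is therefore left out.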
If you now run PSI-BLAST over a few iterations until the search converges (i.e. no new sequences appear with E-values as good as or better than your inclusion threshold), you will observe that distant homologs have names indicating that they play a different biological role than actin... If you are not completely familiar with actin's role in eukaryotic cells, or with the roles of these other proteins, have a look at the Wikipedia entries for "actin" and for "chaperone (protein)".
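The stopping rule above can be written as a small loop. The sketch assumes a hypothetical `search` function that, given the set of sequences included in the previous round, returns (ID, E-value) pairs; it stands in for one PSI-BLAST iteration and does not model the profile itself. The loop stops when no sequence passes the threshold that was not already included in the previous round, or after `max_rounds` as a safety limit.

```python
from typing import Callable, Iterable

Hits = Iterable[tuple[str, float]]


def iterate_until_converged(
    search: Callable[[frozenset[str]], Hits],
    threshold: float = 1e-4,
    max_rounds: int = 10,
) -> tuple[frozenset[str], int]:
    """Repeat the (hypothetical) search until no new sequence passes the
    inclusion threshold; return the final included set and the round number."""
    previous: frozenset[str] = frozenset()
    for round_number in range(1, max_rounds + 1):
        hits = search(previous)
        current = frozenset(seq_id for seq_id, evalue in hits if evalue < threshold)
        if current <= previous:  # nothing new passed the threshold: converged
            return current, round_number
        previous = current
    return previous, max_rounds


if __name__ == "__main__":
    # Hypothetical toy "database": each sequence only becomes significant
    # once the sequence it depends on has been included in the previous round.
    depends_on = {"seq_A": None, "seq_B": "seq_A", "seq_C": "seq_B"}

    def toy_search(included: frozenset[str]) -> list[tuple[str, float]]:
        return [
            (seq_id, 1e-10 if dep is None or dep in included else 1.0)
            for seq_id, dep in depends_on.items()
        ]

    final_set, rounds = iterate_until_converged(toy_search)
    print(sorted(final_set), rounds)
```

The toy run picks up one more sequence per round and converges in round 4, mimicking how distant homologs only become significant after closer ones have shaped the profile.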