BME 110 Computational Biology Tools

Study Section Practice Questions April 22-24

Instructions: Use the sequences at the bottom of the page for the Study Section Practice Questions (seq-A, B).
Similar problems will be assigned for problem set #2 with different sequences.

1.  You notice that there are not many strong hits to a particular protein of interest (SeqA).  Use PSI-BLAST (all default parameters) to find more hits than is possible with BlastP alone.
(a) How many have an initial e-value of 1e-5 or smaller? 
(b) Use the default inclusion cutoff, and run iteration 2.  How many new proteins are found with an E-value better than 1e-5 on this iteration?  How many iterations do you have to run before no new sequences are found?
(c) Click on the "Distance tree of results" at the bottom of the page to see a phylogenetic representation of all the hits found.  Is this protein unique to this species or just Crenarchaea, just Archaea, just Archaea & Bacteria, or is it found in all three domains of life?
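
The counting in parts (a) and (b) can also be checked by hand from a saved hit table. The following is only a minimal sketch, assuming you have copied each iteration's hits into a dictionary of accession -> E-value; the accessions and E-values shown are made up for illustration and are not real PSI-BLAST output.

```python
def count_significant(hits, cutoff=1e-5):
    """Count hits whose E-value is at or below the cutoff."""
    return sum(1 for evalue in hits.values() if evalue <= cutoff)


def new_hits(previous, current, cutoff=1e-5):
    """Return accessions that pass the cutoff now but did not pass it before."""
    before = {acc for acc, evalue in previous.items() if evalue <= cutoff}
    now = {acc for acc, evalue in current.items() if evalue <= cutoff}
    return sorted(now - before)


# Hypothetical hit tables (made-up names and values, for illustration only).
iteration1 = {"hypA": 2e-40, "hypB": 3e-7, "hypC": 0.02}
iteration2 = {"hypA": 1e-60, "hypB": 5e-30, "hypC": 4e-12, "hypD": 1e-6}

print(count_significant(iteration1))
print(new_hits(iteration1, iteration2))
```

Repeating the comparison between each pair of consecutive iterations until new_hits returns an empty list corresponds to the point where the search stops finding new sequences.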

Using ClustalW to Align Proteins & Jalview to Edit Them

2.  Use BlastP with SeqB to collect sequences for an alignment.  You want likely homologs, so search with an E-value threshold of 1e-10, Word size = 3, BLOSUM80 scoring matrix, and restrict your search to "archaea (taxid:2157)" (use the "Organism" box under Choose Search Set, in combination with Database = nr), with the rest left at the defaults.
To get a good collection of sequences, choose the top 11 hits (including the identical self-hit) by checking off boxes, and then click on "Get selected Sequences" just below the summary list of hits.
On the next screen, make sure you have 11 sequences, then save them to a FASTA file.  Copy them into a text editor, and change the names to something helpful - name them by the first letter of the genus and the first four letters of the species (e.g. the sequence from Thermococcus kodakarensis should be named "Tkoda", or if there is more than one sequence for a species, "Pfuri_1", "Pfuri_2").
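
If you would rather not rename the sequences by hand, the naming rule can be scripted. The following is a minimal sketch, assuming each FASTA header ends with the organism name in square brackets (as NCBI protein headers usually do); the function names and the accessions in the example are hypothetical.

```python
import re
from collections import Counter


def short_name(organism):
    """First letter of the genus plus the first four letters of the species."""
    genus, species = organism.split()[:2]
    return genus[0].upper() + species[:4].lower()


def rename_fasta(fasta_text):
    """Replace each header with a short name; number the names that repeat."""
    lines = fasta_text.splitlines()
    names = []
    for line in lines:
        if line.startswith(">"):
            # Assumption: the organism is the last [bracketed] item in the header.
            match = re.search(r"\[([^\]]+)\]\s*$", line)
            if match is None:
                raise ValueError("no [organism] found in header: " + line)
            names.append(short_name(match.group(1)))
    totals = Counter(names)   # how often each short name occurs overall
    seen = Counter()          # how often each short name has been used so far
    name_iter = iter(names)
    out = []
    for line in lines:
        if line.startswith(">"):
            name = next(name_iter)
            if totals[name] > 1:
                seen[name] += 1
                name = f"{name}_{seen[name]}"
            out.append(">" + name)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Made-up accessions and truncated sequences, for illustration only.
example = """>ACC1 hypothetical protein [Thermococcus kodakarensis]
MKAV
>ACC2 hypothetical protein [Pyrococcus furiosus]
MKAI
>ACC3 hypothetical protein [Pyrococcus furiosus]
MKAL
"""
print(rename_fasta(example))
```

Check the renamed file by eye before aligning; headers without a bracketed organism raise an error rather than being renamed silently.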

Go to the ClustalW Web site, and enter your 11 sequences.  Use the defaults (Interactive, Alignment=full, score-type=percent, matrix=blosum). Leave the phylogenetic tree options off for now. Submit.

(a) From the score table, which two sequences were most similar among the 11 (you can sort by score/identity)?
Which were most distant?
(b) Knowing how the ClustalW algorithm works, which pair will be aligned first in the process of actually building the multiple sequence alignment?
(c) Save your alignment file (*.aln) so you can load it in the stand-alone version of Jalview.  Give the "Alignment score" (at the top of the ClustalW page) and paste in your Guide Tree file (*.dnd) so we know how your tree construction looks.  Remember, you should have 11 sequences.
(d) Load your aligned sequences into Jalview, and color them with Clustalx colors.  Now, in preparation for making a gene tree,
delete all columns in which more than half of the sequences have gaps inserted, and delete leading and trailing parts of the alignment
which have fewer than 5 sequences. 
(e) You decide the sequences from P. islandicum and T. neutrophilus do not belong in the alignment.  Delete just these sequences.
(f)  Now, you want to highlight sequence columns with 30% or better conservation.  Adjust the cutoff under "Colour->By Conservation".
(g) When done, save your alignment (1) in Jalview *.jar format (Save As), (2) in Clustal format (*.aln),
and (3) as an image (Export Image) in PNG format.  Keep these files for further phylogenetic analysis.
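
The column-editing rules in part (d) are done interactively in Jalview, but it can help to see them written out. The following is a minimal sketch of the two rules only (it is not how Jalview implements them), assuming the alignment is a list of equal-length strings with "-" for gaps; the function names and the toy alignment are hypothetical.

```python
def drop_gappy_columns(rows, max_gap_fraction=0.5):
    """Remove columns in which more than max_gap_fraction of the rows have a gap."""
    n = len(rows)
    keep = [
        i for i in range(len(rows[0]))
        if sum(row[i] == "-" for row in rows) / n <= max_gap_fraction
    ]
    return ["".join(row[i] for i in keep) for row in rows]


def trim_ends(rows, min_sequences=5):
    """Trim leading and trailing columns that contain fewer than min_sequences residues."""
    width = len(rows[0])
    covered = [
        sum(row[i] != "-" for row in rows) >= min_sequences
        for i in range(width)
    ]
    if not any(covered):
        return ["" for _ in rows]
    start = covered.index(True)
    end = width - covered[::-1].index(True)
    return [row[start:end] for row in rows]


# Toy alignment of four short made-up sequences, for illustration only.
toy = [
    "--MKAV-IL",
    "-AMKAVGIL",
    "--MRAV-IL",
    "---KAV-I-",
]
filtered = drop_gappy_columns(toy)
trimmed = trim_ends(filtered, min_sequences=4)  # 4 instead of 5: the toy set has only 4 rows
print(filtered)
print(trimmed)
```

With the real alignment, keep the default min_sequences=5 from the question; the lower value here is only because the toy alignment has four rows.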


Use these sequences for the Study Section:
>SeqA (study section)
MGVEICRSLLECLGALGRSQRLYAAAGLVDEEGLEAASRAAGELRVLVGD
SGPVPRPVYERWREVVRVYPSLHAKFYIFAEDAGPSAALVGSADLTAGGL
RGNLEAVVLIRGEAARPLADMFNRLWARALPLTEDYVADWEGPEEALRKP
WGEAVKRANERLAEILGVSAHCLSRHDPLNCARLVARAVRSRFEGCGDLP
ENCAARATGVSAKALLSAPPSAVLAGHYVCWARALAARLLEGKVGRLDSG
MEAYEAAVQAGAESCWGEAKRAAEEELERLEDSNYRDNYVRWPIPYRLLF
LAMTLPATGCRILGREVRTKKRGVARVERELYC

>SeqB (study section)
MKAVILAGGFGTRLRPISSTRPKPMVPVLGKPNLQYLLENIEKIPEIDEV
ILSVHYMRGEIREFIDEKMADYPKDIRFVNDPMPLETGGALKNVEEYVSD
DFLVIYGDVFTNFNFRELIEAHRNNDGLITVAVTKVYDPERFGVVETDEN
GKVTHFEEKPHRPKTNLVDAGIYVVNKKVLEEIPKGKEVYFEREVLPKFV
ARGEVYAYRMPRDAYWVDLGTPDDLFYAHQIAMDEIAKDNGYITIKEGAE
VPDDVEIQGPVYIDEGAKIGHGVKIKAYTYIGPNTIVEDKAYLKRSILIG
SDIIKERAELKDTILGEGVVVGKNVIIKENAVVGDYARIADDLVIYGAKV
LPWKKVEEYEAYIKVKLDPTKVRPGQYPDHCPLGLPECIYKKFKAIAGEK
PPCDECIENQWLF