Assignment #1: Exploring Pyrobaculum Expression Data


A) Identify co-expressed genes that responded to temporal changes

1) Open the Temporal Response (Temporal_response_genesis.txt) data set from the file menu in Genesis. 
2) Open the heatmap in the "Expression Images" folder on the left. This shows genes and experimental conditions ordered according to their statistical ranking in response to temporal conditions.
3) In the "Sort" menu select "sort Genes by unique ID" to order genes as they occur in the genome. Note that intergenic regions (genes numbered as PAE.i_###) and some others such as tRNAs and rRNAs will not be listed in their genomic order, but most other loci will. Scroll through the ordered list of loci to identify groups of adjacent loci that show strong co-expression patterns. Compile a list of the three most strikingly co-expressed sets of adjacent loci.
4) In the "Clustering Results" folder in Genesis select the "Tree (Complete Linkage)" icon. Here, loci are clustered hierarchically using the Pearson correlation, with blue dendrograms at left indicating clustering relationships. Are the clusters you identified in (3) also clustered together by hierarchical clustering? If not, select three new clusters of adjacent loci that are most strongly co-expressed.
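
The visual scan for adjacent co-expressed loci in steps 3 and 4 can also be sketched in code. The following is a minimal Python illustration, not part of Genesis: it assumes the loci are already sorted in genome order and that each locus has one expression value per experimental condition. The locus names, the expression values, the function name coexpressed_runs and the 0.9 correlation threshold are all hypothetical choices made for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation is undefined, treat as 0 here
    return cov / (sx * sy)

def coexpressed_runs(loci, profiles, threshold=0.9):
    """Group consecutive loci whose neighbouring profiles correlate at or above threshold."""
    if not loci:
        return []
    runs, current = [], [loci[0]]
    for i in range(1, len(loci)):
        if pearson(profiles[i - 1], profiles[i]) >= threshold:
            current.append(loci[i])
        else:
            if len(current) > 1:
                runs.append(current)
            current = [loci[i]]
    if len(current) > 1:
        runs.append(current)
    return runs

# Hypothetical toy data: five loci in genome order, four conditions each.
loci = ["locus_1", "locus_2", "locus_3", "locus_4", "locus_5"]
profiles = [
    [0.1, 0.9, 1.8, 2.2],
    [0.0, 1.0, 2.1, 2.0],
    [0.2, 0.8, 1.9, 2.4],
    [1.5, -0.5, 0.3, -1.0],
    [-1.0, 0.4, -0.2, 1.1],
]
runs = coexpressed_runs(loci, profiles)
print(runs)
```

In this toy data the first three loci rise together across the conditions, so they form a single run, while the last two do not correlate with their neighbours. A run found this way is only a candidate; whether it also groups together in the hierarchical tree is the question asked in step 4.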

B) Identify potential operons

5) Open the archaeal genome browser (http://archaea.ucsc.edu/) and select the genome browser for Pyrobaculum aerophilum from the species tree. Find the loci in each of your three co-expressed clusters using the gene IDs as search terms. Zoom in or out as necessary using the zoom buttons. Are the loci in each set of co-expressed loci predicted to be transcribed in the same direction?
6) Under the "Genes and Gene Prediction Tracks" controls in the lower part of the browser, set the "ArkinOperons" track to "pack", and hit the "refresh" button. Based on the orientation of the loci and the Arkin operon predictions, do you think the co-expressed loci in each cluster are transcribed as operons, or not?
7) Under "Expression and Regulation" in the browser turn the "Promoter+" and "Promoter-" controls to "full" and hit "refresh". Where are the strongest promoter signals among your co-expressed loci? Are these consistent with your predictions for operonal or independent transcription?

C) Assign putative gene functions

8) Under "Genes and Gene Prediction Tracks" in the browser turn "Pfam domains" to "pack" to visualize conserved domains in genomic context. Click on individual genes to open pages showing RefSeq gene annotations, and record each of these along with any associated Pfam or InterPro domains.
9) From each RefSeq page click the "NCBI Blast Hits" button. Are the annotations of the strongest Blast hits consistent with the P. aerophilum RefSeq annotation? If not, why do you think there are differences?
10) Click on the "Conserved Domain Database hits" entry at the top of the Blast list. If there are any conserved domains, are these consistent with the P. aerophilum RefSeq annotation?
11) Can you assign putative functions for each of your three co-expressed clusters based on annotations or conserved domains? Can you relate these to time-dependent changes in cell cycling or growth conditions?

D) Use gene expression profiles to cluster EXPERIMENTAL CONDITIONS

12) In the "Distance" command menu of Genesis select "Pearson correlation".
13) In the "Analysis" command menu select "Calculate Hierarchical Clustering", or simply hit the "HCL" button.
14) Select "Complete linkage clustering", and check both "cluster genes" and "cluster experiments". Hit "OK". Click on the new "Tree (Complete Linkage)" icon in the "Clustering Results" folder. Dendrograms at the top show clustering of experimental conditions. Dendrograms at the left show clustering of loci. Note that the view can be manipulated by clicking on the dendrogram to select a subset of conditions or genes that are clustered together, then right-clicking and selecting "Flip sub-tree".
15) Which experiments cluster together? Can you draw any conclusions from this?
16) Go back to step 12 and repeat the above clustering steps using each of the different distance metrics in the "Distance" command menu. If you lose track of which tree is which, the distance metric used for each is listed in the "General Information" folder.
17) Is there a consistent pattern of experiments that cluster together under most distance metrics? Does this tell you anything about the data set? Are there any distance metrics that produce markedly different results from the majority?
18) In the file menu select "Save project" to save your clustering results. The saved project consists of a .txt file and a matching .xml file.
19) Open the Respiratory Response (Respiratory_response_genesis.txt) data set from the file menu in Genesis.
20) Cluster experimental conditions with similar expression profiles using a) the Pearson correlation (same as steps 12-14 above), and b) Euclidean distance. Which experiments cluster together now? How does this compare with the clustering pattern obtained with the temporal response data? What does this say about the statistical analyses used to separate loci that responded to respiratory conditions from those that responded to temporal conditions?
21) Can you identify any loci that are in both the temporal response dataset and the respiratory response dataset? If so, how can you explain the occurrence of some loci in both sets?
22) Open the complete set of expression profiles for all loci in the P. aerophilum genome (All_loci_genesis.txt) from the file menu in Genesis. Repeat the clustering of experimental conditions using a) the Pearson correlation and b) Euclidean distance. Which experiments cluster together now? What does this say about the prevalent patterns genome-wide? That is, does the clustering of experiments using the whole-genome data set more strongly reflect temporal responses, respiratory responses, or neither?
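
The clustering of experimental conditions used throughout this section can be sketched in a few lines of Python, which may help when interpreting why different distance metrics give different trees. This is a minimal illustration under stated assumptions, not a description of how Genesis is implemented: the Pearson distance is taken here as 1 minus the correlation coefficient, the expression matrix and experiment names are hypothetical toy values, and the function names are invented for this example.

```python
from math import sqrt

def pearson_distance(x, y):
    """Distance defined here as 1 - Pearson r (an assumed convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / (sx * sy)

def euclidean_distance(x, y):
    """Straight-line distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def complete_linkage(names, vectors, distance):
    """Agglomerative clustering; returns merges as (members_a, members_b, height)."""
    clusters = [[i] for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is that of the farthest pair.
                d = max(distance(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(names[i] for i in clusters[a]),
                       sorted(names[i] for i in clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical toy matrix: rows are loci, columns are experiments.
experiments = ["exp_1", "exp_2", "exp_3", "exp_4"]
matrix = [
    [1.0, 2.0, 1.5, 4.0],
    [2.0, 4.0, 1.5, 3.0],
    [3.0, 6.0, 3.5, 2.0],
    [4.0, 8.0, 3.0, 1.0],
]
# Transpose so that each experiment becomes one vector across all loci.
columns = [list(col) for col in zip(*matrix)]

pearson_merges = complete_linkage(experiments, columns, pearson_distance)
euclid_merges = complete_linkage(experiments, columns, euclidean_distance)
print("Pearson, first merge:  ", pearson_merges[0][:2])
print("Euclidean, first merge:", euclid_merges[0][:2])
```

In this toy matrix exp_2 has exactly the same shape as exp_1 at twice the magnitude, so the Pearson distance joins those two first, whereas Euclidean distance first joins exp_1 with exp_3, which is closer in absolute values. The same contrast between profile shape and profile magnitude is worth keeping in mind when comparing the trees from steps 16, 20 and 22.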