Assignment #1: Exploring Pyrobaculum Expression Data
A) Identify co-expressed genes that responded to temporal changes
1) Open the Temporal Response (Temporal_response_genesis.txt) data
set from the file menu in Genesis.
2) Open the heatmap in the "Expression Images" folder on the left. This
shows genes and experimental conditions ordered according to their
statistical ranking in response to temporal conditions.
3) In the "Sort" menu select "sort Genes by unique ID" to order genes as they
occur in the genome. Note that intergenic regions (genes numbered as
PAE.i_###) and some others such as tRNAs and rRNAs will not be listed
in their genomic order, but most other loci will. Scroll through the
ordered list of loci to identify groups of adjacent loci that show
strong co-expression patterns. Compile a list of the three most
strikingly co-expressed sets of adjacent loci.
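The visual scan in step 3 can also be sketched in code. The following is a minimal illustration, not part of Genesis: it walks loci in genomic order and groups neighbours whose expression profiles have a high Pearson correlation. The locus names, the log-ratio values, and the 0.9 threshold are all hypothetical and chosen only for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat profile: correlation is undefined, treat as 0
    return cov / sqrt(var_x * var_y)

def coexpressed_runs(loci, profiles, threshold=0.9):
    """Group loci (already in genomic order) into runs in which each
    locus correlates with its predecessor at or above `threshold`."""
    runs, current = [], [loci[0]]
    for prev, cur in zip(loci, loci[1:]):
        if pearson(profiles[prev], profiles[cur]) >= threshold:
            current.append(cur)
        else:
            if len(current) > 1:
                runs.append(current)
            current = [cur]
    if len(current) > 1:
        runs.append(current)
    return runs

# Hypothetical toy data: placeholder locus names and made-up log ratios.
loci = ["locus_1", "locus_2", "locus_3", "locus_4", "locus_5"]
profiles = {
    "locus_1": [0.1, 0.9, 1.8, 2.1],
    "locus_2": [0.2, 1.0, 2.0, 2.2],
    "locus_3": [0.0, 0.8, 1.7, 2.0],
    "locus_4": [1.5, -0.2, 0.9, -1.0],
    "locus_5": [-0.5, 0.4, -0.3, 0.6],
}
runs = coexpressed_runs(loci, profiles)
print(runs)
```

On this toy input the first three loci form a single run and the last two are left ungrouped. The sketch only compares each locus with its immediate neighbour, so it is a rough stand-in for judging the heatmap by eye.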
4) In the "Clustering Results" folder in Genesis select the "Tree (Complete
Linkage)" icon. Here, loci are clustered hierarchically using the
Pearson correlation, with blue dendrograms at left indicating
clustering relationships. Are the clusters you identified in (3) also
clustered together by hierarchical clustering? If not, select three new
clusters of adjacent loci that are most strongly co-expressed.
B) Identify potential operons
5) Open the archaeal genome browser (http://archaea.ucsc.edu)
and select the genome browser for Pyrobaculum aerophilum from the
species tree. Find the loci in each of your three co-expressed clusters
using the gene IDs as search terms. Zoom in or out as necessary using
the zoom buttons. Are the loci in each set of co-expressed loci
predicted to be transcribed in the same direction?
6) In the "Genes and Gene Prediction Tracks" controls in the lower part of the
browser, set the "ArkinOperons" track to "pack", and hit the "refresh"
button. Based on the orientation of loci and the Arkin operon
predictions, do you think the co-expressed loci in each cluster are
transcribed as operons, or not?
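The orientation check in steps 5 and 6 can be written down as a small sketch. It only splits a cluster into blocks of neighbouring loci that lie on the same strand; it does not reproduce the Arkin operon predictions shown in the browser. The locus names, coordinates, and strands below are hypothetical placeholders.

```python
from itertools import groupby

def same_strand_blocks(loci):
    """Split loci into maximal runs of neighbours on the same strand.

    `loci` is a list of (name, start, strand) tuples; they are sorted by
    start coordinate before grouping.
    """
    ordered = sorted(loci, key=lambda locus: locus[1])
    return [
        [name for name, _start, _strand in group]
        for _strand_key, group in groupby(ordered, key=lambda locus: locus[2])
    ]

# Hypothetical cluster: (name, start coordinate, strand)
cluster = [
    ("locus_A", 1000, "+"),
    ("locus_B", 1900, "+"),
    ("locus_C", 2700, "+"),
    ("locus_D", 3600, "-"),
]
blocks = same_strand_blocks(cluster)
print(blocks)
```

A cluster that comes back as a single block is at least compatible with transcription as one operon. A cluster split into several blocks cannot be a single transcript.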
7) Under "Expression and
Regulation" in the browser turn the "Promoter+" and "Promoter-"
controls to "full" and hit "refresh". Where are the strongest promoter
signals among your co-expressed loci? Are these consistent with your
predictions for operonal or independent transcription?
C) Assign putative gene functions
8) Under "Genes and Gene Prediction Tracks" in the browser turn "Pfam domains"
to "pack" to visualize conserved domains in genomic context. Click
individual genes to open pages showing the RefSeq gene
record for each of these, along with any associated Pfam or InterPro domains.
9) From each RefSeq page click the "NCBI Blast Hits" button. Are
annotations of the strongest Blast hits consistent with the P.
aerophilum RefSeq annotation? If not, why do you think there are
differences?
10) Click on the "Conserved Domain Database
hits" entry at the top of the Blast list. If there are any conserved
domains, are these consistent with the P. aerophilum RefSeq
annotation?
11) Can you assign putative functions for each of your three co-expressed
clusters based on annotations or conserved domains? Can you relate
these to time-dependent changes in cell cycling or growth conditions?
D) Use gene expression profiles to cluster EXPERIMENTAL CONDITIONS
12) In the "Distance" command menu of Genesis select "Pearson
correlation".
13) In the "Analysis" command menu select "Calculate Hierarchical
Clustering", or just hit the "HCL" button.
14) Select "Complete linkage clustering", and check both "cluster genes"
and "cluster experiments". Hit "OK". Click on the new "Tree (Complete
Linkage)" in the "Clustering Results" folder. Dendrograms at the top show
clustering of experimental conditions. Dendrograms at the left show
clustering of loci. Note that the view can be manipulated by selecting
(click on the dendrogram to select) a subset of conditions or genes
that are clustered together, then right-clicking your mouse and
selecting "Flip sub-tree".
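For readers who want to see what complete linkage clustering does with a Pearson-based distance, here is a minimal sketch in plain Python. It is an illustration of the general technique, not the Genesis implementation. Using 1 - r as the distance is an assumption made for this example, and the condition names and values are hypothetical.

```python
from math import sqrt

def pearson_distance(x, y):
    """Distance defined as 1 - Pearson r (0 = same shape, 2 = opposite)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / (sd_x * sd_y)

def complete_linkage(profiles, distance):
    """Agglomerative clustering with complete linkage.

    Returns the merges in order as (members_a, members_b, height), where
    height is the largest pairwise distance between the two groups.
    """
    clusters = [(name,) for name in profiles]
    merges = []

    def linkage(a, b):
        return max(distance(profiles[i], profiles[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        height, i, j = min(
            (linkage(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical experiments, each a vector of values across four loci.
experiments = {
    "cond_1": [1.0, 2.0, 3.0, 4.0],
    "cond_2": [1.1, 2.1, 2.9, 4.2],
    "cond_3": [4.0, 3.0, 2.0, 1.0],
    "cond_4": [3.9, 3.2, 1.8, 1.1],
}
merges = complete_linkage(experiments, pearson_distance)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Each merge corresponds to one join in the dendrogram, and its height corresponds to the length of the branch. Passing the loci instead of the experiments as `profiles` clusters genes rather than conditions.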
15) Which experiments cluster together? Can you draw any
conclusions from this?
16) Go back to step 12 and repeat the above clustering steps using each of
the different distance metrics in the "Distance" command menu. If you
lose track of which tree is which, the distance metric used for each is
listed in the folder "General Information".
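One reason trees can differ between distance metrics is sketched below: a correlation-based distance compares the shape of two profiles, while Euclidean distance also depends on their magnitude. The three profiles are hypothetical, and 1 - r is used as the Pearson distance as an assumption for this example.

```python
from math import sqrt

def euclidean_distance(x, y):
    """Straight-line distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_distance(x, y):
    """Distance defined as 1 - Pearson r, sensitive to shape only."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / (sd_x * sd_y)

# Hypothetical profiles.
low = [0.1, 0.2, 0.3, 0.4]    # steady increase, small amplitude
high = [1.0, 2.0, 3.0, 4.0]   # same shape as `low`, ten times the amplitude
flat = [0.3, 0.2, 0.3, 0.2]   # similar magnitude to `low`, different shape

for name, other in (("high", high), ("flat", flat)):
    print(
        name,
        "pearson:", round(pearson_distance(low, other), 3),
        "euclidean:", round(euclidean_distance(low, other), 3),
    )
```

Under the Pearson distance `low` is closest to `high`, while under Euclidean distance it is closest to `flat`, so the two metrics would join these profiles in a different order.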
17) Is there a
consistent pattern of experiments that cluster together under most
distance metrics? Does this tell you anything about the data set? Are
there any distance metrics that produce markedly different results from
the others?
18) In the file menu select "Save project" to
save your clustering results. The saved project consists of a .txt file
and a matching .xml file.
19) Open the Respiratory Response
(Respiratory_response_genesis.txt) data set from the file menu in
Genesis.
20) Cluster experimental conditions with similar expression profiles using
a) the Pearson correlation (same as step 12 above), and b) Euclidean
distance. Which experiments cluster together now? How does this compare
with the clustering pattern obtained with the temporal response data?
What does this say about the statistical analyses used to separate loci
that responded to respiratory conditions from those that responded to
temporal conditions?
21) Can you identify any loci that are
in both the temporal response data set and the respiratory response
data set? If so, how can you explain the occurrence of some loci in both
data sets?
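Finding the loci shared by two data sets comes down to a set intersection on the locus IDs. The sketch below assumes a tab-delimited layout with one header row and the locus ID in the first column. That layout is an assumption and should be checked against the actual .txt files. The in-memory tables and locus names are hypothetical stand-ins for the real data.

```python
import csv
import io

def read_locus_ids(handle, id_column=0, header_rows=1):
    """Collect the locus IDs from a tab-delimited table.

    Assumes (not confirmed for the real files) `header_rows` header lines
    and the locus ID in column `id_column`.
    """
    ids = set()
    for row_number, row in enumerate(csv.reader(handle, delimiter="\t")):
        if row_number < header_rows or not row:
            continue  # skip header lines and blank lines
        ids.add(row[id_column].strip())
    return ids

# Hypothetical stand-ins for the two data files.
temporal = io.StringIO(
    "ID\tt1\tt2\n"
    "locus_1\t0.5\t1.2\n"
    "locus_2\t-0.3\t0.1\n"
)
respiratory = io.StringIO(
    "ID\tc1\tc2\n"
    "locus_2\t1.1\t0.9\n"
    "locus_3\t0.2\t-0.4\n"
)
shared = sorted(read_locus_ids(temporal) & read_locus_ids(respiratory))
print(shared)
```

To run this on the real data, replace the `io.StringIO` objects with file handles, for example `open("Temporal_response_genesis.txt", newline="")`, after confirming the column layout.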
22) Open the complete set of expression profiles for
all loci in the P. aerophilum genome (All_loci_genesis.txt) from the
file menu in Genesis. Repeat the clustering of experimental conditions
using a) the Pearson correlation and b) Euclidean distance. Which
experiments cluster together now? What does this say about prevalent
patterns genome-wide, i.e. does the clustering of experiments using the
whole genome data set more strongly reflect temporal responses,
respiratory responses, or neither?