Homework 1 (30 pts total)
Using NCBI Resources, UCSC Genome Browser, and the UCSC Archaeal Browser
You are assigned to re-analyze the genome of Sulfolobus acidocaldarius DSM639, but you don't know anything about it.
Using the resources we practiced in class, answer the following questions:
1. What domain of life, and what sub-domain, is this species from? (2 pts)
2. What are its favorite growth conditions? Give: (4 pts)
(a) its optimal growth temperature range,
(b) its oxygen requirements (aerobic/anaerobic),
(c) what it uses for respiration (i.e. what it "oxidizes" for energy),
(d) its optimum pH range for growth.
3. Read the abstract associated with this genome sequence. Why do the researchers believe this genome's stability and organization are so different from those of the two other Sulfolobus species previously sequenced? (2 pts)
4. What is the systematic name (e.g. Saci_0001) for "reverse gyrase" in S. acidocaldarius? Give its genome coordinates. (2 pts)
5. What Biological Processes (GO terms) are associated with this gene? (2 pts)
6. Give the protein sequence for this gene in FASTA format. (2 pts)
7. What are the species name and the systematic gene name of the gene most similar to this one? Give your evidence. (2 pts)
8. How many species (and what are their names) have genomic DNA alignments for this gene? (2 pts)
---------------------------------------------------------------------------------------
9. Using the UCSC Genome Browser (genome.ucsc.edu), give the following for the dyskerin gene in the human genome (March 2006 assembly): (3 pts)
(a) the chromosome,
(b) the genome coordinates, and
(c) the gene name (i.e. DK....).
10. According to the "RefSeq Genes" track in the human genome browser: (4 pts)
(a) how many exons does this gene have?
(b) how long is the gene in the genome (Genomic Size)?
(c) how long is the mature spliced mRNA (see mRNA/Genome Alignments size)?
(d) what are the first five amino acids of this protein?
11. Turn on the "RNA Genes" track. Are there any RNA genes hidden in the introns of this gene? If so, what are their names? (2 pts)
12. Use BLAT in the UCSC human genome browser (March 2006 assembly) to search with the partial protein sequence below. Give the gene name of the top-scoring hit, the chromosome where it is found, and the disease associated with this gene. (3 pts)
>Protein-q11
RVNHCLTICENIVAQSVRNSPEFQKLLGIAMELFLLCSDDAESDVRMVAD
ECLNKVIKALMDSNLPRLQLELYKEIKKNGAPRSLRAALWRFAELAHLVR
PQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFGNFANDNEI
KVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGL
LVPVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVS
PSAEQLVQVYELTLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAV
GGIGQLTAAKEESGGRSRSGSIVELIAGGGSSCSPVLSRKQKGKVLLGEE
EALEDDSESRSDVSSSALTASVKDEISGELAASSGVSTPGSAGHDIITEQ
PRSQHTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAVPSDPAMDL
NDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLGLQIGQP
QDEDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFV
LRDEATEPGDQENKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGK
NVLVPDRDVRVSVKALALSCVGAAVALHPESFFSKLYKVPLDTTEYPEEQ
YVSDILNYIDHGDPQVRGATAILCGTLICSILSRSRFHVGDWMGTIRTLT
GNTFSLADCIPLLRKTLKDESSVTCKLACTAVRNCVMSLCSSSYSELGLQ
LIIDVLTLRNSSYWLVRTELLETLAEIDFRLVSFLEAKAENLHRGAHHYT
GLLKLQERVLNNVVIHLLGDEDPRVRHVAAASLIRLVPKLFYKCDQGQAD
PVVAVARDQSSVYLKLLMHETQPPSHFSVSTITRIYRGYNLLPSITDVTM
ENNLSRVIAAVSHELITSTTRALTFGCCEALCLLSTAFPVCIWSLGWHCG
VPPLSASDESRKSCTVGMATMILTLLSSAWFPLDLSAHQDALILAGNLLA
ASAPKSLRSSWASEEEANPAATKQEEVWPALGDRALVPMVEQLFSHLLKV
INICAHVLDDVAPGPAIKAALPSLTNPPSLSPIRRKGKEKEPGEQASVPL
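
Note on FASTA format (questions 6 and 12): a FASTA record is a single header line beginning with ">" followed by one or more lines of sequence, as in the Protein-q11 record above. The Python sketch below is one possible way to wrap a pasted sequence into that layout and to read a record back. The function names, the 50-character line width, and the placeholder sequence are illustrative assumptions, not part of the assignment and not an answer to any question.

```python
def format_fasta(header, sequence, width=50):
    """Return a FASTA record: a '>' header line, then the sequence wrapped to `width`."""
    # Drop spaces and line breaks picked up while copying the sequence.
    cleaned = "".join(sequence.split()).upper()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + header + "\n" + "\n".join(lines)


def parse_fasta(text):
    """Return (header, sequence) for a string holding a single FASTA record."""
    rows = [row.strip() for row in text.strip().splitlines() if row.strip()]
    if not rows or not rows[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' header line")
    # Everything after the header is sequence; join the wrapped lines.
    return rows[0][1:], "".join(rows[1:])


if __name__ == "__main__":
    # Hypothetical placeholder sequence, for illustration only.
    record = format_fasta("example_protein", "MKT AYI AKQ RQI SFV KSH FSR QLE", width=10)
    print(record)
    print(parse_fasta(record))
```

Checking a record this way before pasting it into BLAT can catch a missing header line or stray whitespace, but it does not validate that the residues themselves are correct.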