BME 110 Computational Biology Tools

Homework 2 / Study Section Practice Questions

Use NCBI resources (including BLAST and ORF Finder), the UCSC Genome Browser and the UCSC Archaeal Browser, ExPASy tools, Primer3, and the Biology Workbench at SDSC to analyze these sequences.

Instructions: Use the two groups of sequences at the bottom of the page: the first group (mystery-seq-A, B, C, etc.) for the Study Section Practice Questions, and the second group (mystery-seq-1, 2, 3, etc.) for the homework problems. You will not get homework credit if you turn in answers for the study section sequences.

Using Various Flavors of BLAST

1. Use mystery-seq-A / seq-1 to search the non-redundant (NR) database with no filters, using the programs indicated below:
(a) What species and domain of life is this sequence derived from? Which program is best suited to this purpose, assuming you will get a 100% identity match? Why?

For each of the following programs (Expect threshold = 1e-5, database = NR, all other parameters at their defaults), how many hits do you get, what is the most distantly related species (and is it a eukaryote, bacterium, or archaeon), and what is the max identity of that hit?
(b) Megablast (word size = 28)
(c) BlastN (word size = 15)
(d) BlastN (word size = 7)
Please show the hit information for the most distantly related sequence for each of these searches.
Hint: Remember, you can click "Edit and resubmit" at the top of your BLAST results page to repeat a search on the same sequence with modified parameters.
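Why does lowering the word size change what you find? BLAST seeds each alignment with an exact "word" match of length W between query and subject, and only then extends it. The toy sketch below models just that seed step (no extension, scoring, or E-values, so it is not BLAST itself). The "diverged homolog" is made up for illustration: the first 50 nt of mystery-seq-A with a substitution at every 10th base.

```python
from __future__ import annotations


def words(seq: str, w: int) -> set[str]:
    """Return the set of distinct length-w substrings ("words") in seq."""
    seq = seq.upper()
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}


def shared_word_count(query: str, subject: str, w: int) -> int:
    """Count distinct exact words of length w present in both sequences."""
    return len(words(query, w) & words(subject, w))


if __name__ == "__main__":
    # Toy data (assumption): first 50 nt of mystery-seq-A and a hypothetical
    # diverged copy with one substitution at every 10th position.
    query = "ATGACCTTCGTCTACCCCGACGCCATTGTCTCCACCGCCGGCCACGTGGA"
    subject = "".join(
        ("A" if base == "G" else "G") if i % 10 == 9 else base
        for i, base in enumerate(query)
    )
    for w in (28, 15, 7):
        print(f"word size {w}: {shared_word_count(query, subject, w)} shared words")
```

Because the longest exact match between these two toy sequences is 9 bases, you should expect seeds to survive only at the smallest word size. This roughly mirrors why BlastN with word size 7 can pick up more distant hits than Megablast.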

2. Using a more sensitive version of BLAST than BlastN, tell me whether a homolog of the gene from question 1 probably occurs in the human genome (remember: you can limit the search to human sequences only).

(a) Tell me which program you used, which scoring matrix is best to use, and any non-default search parameters you used.
(b) If you find a hit, what are the e-value and % identity of the top hit in the human genome? What is your evidence / criteria for believing this is a true homolog? Please show the top hit score info.
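When judging whether a hit is a true homolog, it helps to remember what the e-value depends on. A minimal sketch of the Karlin-Altschul relationship, assuming the simple form E = m · n · 2^(-S') (m = query length, n = database length in residues, S' = bit score). Real BLAST uses edge-corrected effective lengths and matrix-specific λ and K, and the database sizes below are illustrative assumptions, not the actual sizes of NR or the human subset.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score S to bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring >= bits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Illustrative sizes only (assumptions): a 300-residue query searched
    # against a large database versus a 100-fold smaller organism subset.
    query_len = 300
    for label, db_len in (("whole database", 10**9), ("organism subset", 10**7)):
        for bits in (40, 60):
            e = expect_value(bits, query_len, db_len)
            print(f"{label}: {bits} bits -> E = {e:.2e}")
```

Because E scales with the search space m · n, limiting a search to human sequences lowers the e-value of the same alignment, while each additional bit of score halves it.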

3. Grab the full-length protein sequence (not just the part with a BLAST hit) of your best hit to the human genome in question 2, and use it to find the coordinates of the gene in the March 2006 human genome assembly in the UCSC Genome Browser (if there is more than one that is >95% identical, just give me the top hit).
(a) Give the coordinates, chromosome, gene name, and number of exons according to the RefSeq track. 
(b) Look at the predicted UTRs (untranslated regions) for this gene. Focusing on the "RefSeq Genes" track, is the UTR larger at the 5' or the 3' end of this gene?
(c) You suspect that known classes of repetitive elements have been involved in the evolution or regulation of this gene. Turn on the "RepeatMasker" track (one of the last tracks at the bottom of the page). What is the most numerous class and family of repetitive element? (Class: SINE, LINE, LTR, DNA, Simple, etc. Family: click on a few of the repeat elements in the genome browser to find out the family classification.) Do any of these repetitive elements overlap any of the predicted protein-coding exons?
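Two things in question 3 are easy to get backwards: on a minus-strand gene the 5' UTR lies at the right-hand (higher-coordinate) end, and a repeat "overlapping an exon" only counts here if it hits the protein-coding part. A minimal sketch, assuming 0-based, half-open coordinates like those in UCSC's genePred-style tables (txStart/txEnd, cdsStart/cdsEnd, exon starts/ends); the browser's position box is 1-based. The gene and repeat coordinates are hypothetical.

```python
from __future__ import annotations


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Bases shared by half-open intervals [a_start, a_end) and [b_start, b_end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def utr_lengths(strand, tx_start, tx_end, cds_start, cds_end, exons):
    """Return (5' UTR, 3' UTR) lengths, counting exonic bases only."""
    left = sum(overlap(s, e, tx_start, cds_start) for s, e in exons)
    right = sum(overlap(s, e, cds_end, tx_end) for s, e in exons)
    # On the minus strand the gene is read right to left, so the ends swap.
    return (left, right) if strand == "+" else (right, left)


def repeats_in_cds(repeats, exons, cds_start, cds_end):
    """Names of repeats overlapping the protein-coding part of any exon."""
    hits = []
    for name, r_start, r_end in repeats:
        for s, e in exons:
            # Clip the exon to the CDS; an all-UTR exon yields an empty interval.
            coding_start, coding_end = max(s, cds_start), min(e, cds_end)
            if overlap(r_start, r_end, coding_start, coding_end) > 0:
                hits.append(name)
                break
    return hits


if __name__ == "__main__":
    # Hypothetical minus-strand gene and repeat placements, for illustration only.
    exons = [(1000, 1600), (2500, 2700), (4500, 5000)]
    print(utr_lengths("-", 1000, 5000, 1400, 4800, exons))
    repeats = [("AluY", 1200, 1500), ("L2", 3000, 3300), ("MIR", 4850, 4950)]
    print(repeats_in_cds(repeats, exons, 1400, 4800))
```

For this made-up gene the sketch prints `(200, 400)` (the 3' UTR is larger) and `['AluY']`: the intronic L2 and the UTR-only MIR are not counted.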

4. You notice that there are not many strong hits to a particular protein of interest (mystery-seq-B / seq-2). Use PSI-BLAST to find more hits than BlastP alone finds with default parameters. For your search, use Expect threshold = 0.1, word size = 2, database = NR, and default values for all other parameters.
(a) How many hits have an e-value of 1e-5 or smaller in the first (BlastP) iteration?
(b) Use the default inclusion cutoff and run iteration 2. How many new proteins are found with an e-value better than 1e-5 in this iteration? How many iterations must you run until no new sequences are found (i.e., the search converges)?

Update: If it does not converge after 6 iterations, stop (answer “more than 6”).
(c) Click on "Distance tree of results" at the bottom of the page to see a phylogenetic representation of all the hits found. Is this protein unique to this species, or is it found only in Crenarchaea, only in Archaea, only in Archaea and Bacteria, or in all three domains of life?

Update: (d) Based on the e-value and percent identity of the top-scoring hit among the “newly found” sequences, do you think these hit sequences are orthologs (same function), paralogs (related function), or have unrelated function?
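Keeping track of "new" hits across PSI-BLAST rounds by hand is error-prone. A minimal bookkeeping sketch, assuming you have copied each round's hits into a dictionary of sequence ID to e-value; the IDs and e-values below are hypothetical. Note that the question counts hits at 1e-5, which is stricter than the inclusion threshold PSI-BLAST uses to build its profile, so "converged" here means no new hits at your cutoff.

```python
from __future__ import annotations


def track_iterations(iterations: list[dict[str, float]], cutoff: float = 1e-5):
    """For each round, return (hits at or below cutoff, how many are new)."""
    seen: set[str] = set()
    summary = []
    for hits in iterations:
        significant = {sid for sid, ev in hits.items() if ev <= cutoff}
        new = significant - seen
        summary.append((len(significant), len(new)))
        seen |= significant
    return summary


def converged_at(summary) -> int | None:
    """First round (1-based, after round 1) that added no new significant hits."""
    for round_no, (_, n_new) in enumerate(summary, start=1):
        if round_no > 1 and n_new == 0:
            return round_no
    return None


if __name__ == "__main__":
    # Hypothetical hit tables from three PSI-BLAST rounds.
    rounds = [
        {"P1": 1e-40, "P2": 1e-8, "P3": 0.02},
        {"P1": 1e-45, "P2": 1e-12, "P3": 1e-6, "P4": 1e-7},
        {"P1": 1e-46, "P2": 1e-13, "P3": 1e-7, "P4": 1e-9},
    ]
    summary = track_iterations(rounds)
    print(summary, converged_at(summary))
```

For these made-up rounds it prints `[(2, 2), (4, 2), (4, 0)] 3`: two new hits appear in round 2, and round 3 adds none.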


Calculating Protein Parameters

5. Using the ProtParam tool on the ExPASy tools page mentioned in class, analyze mystery-seq-C / seq-3 and give (a) the length in amino acids, (b) the molecular weight, (c) the isoelectric point (pI), (d) the most frequent amino acid, (e) the least frequent amino acid, (f) whether the protein is stable or unstable according to the instability index, and (g) whether, by the grand average of hydropathicity (GRAVY), it is not hydrophobic (values < 0), slightly hydrophobic (values 0 to 0.5), or highly hydrophobic (values > 0.5).
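GRAVY is simply the mean Kyte-Doolittle hydropathy value over all residues, and parts (d) and (e) are plain counts, so they are easy to sanity-check. A minimal sketch covering only those items; molecular weight, pI, and the instability index need larger lookup tables and are left to ProtParam. Listing ties and zero-count residues is a design choice here, since a residue that never occurs can be the "least frequent".

```python
from __future__ import annotations

from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(protein: str) -> dict[str, int]:
    """Count each standard amino acid, including absent ones (count 0)."""
    counts = Counter(protein.upper())
    return {aa: counts[aa] for aa in KYTE_DOOLITTLE}


def most_and_least_frequent(protein: str) -> tuple[list[str], list[str]]:
    """All residues tied for the highest count, and all tied for the lowest."""
    comp = composition(protein)
    hi, lo = max(comp.values()), min(comp.values())
    return ([aa for aa, n in comp.items() if n == hi],
            [aa for aa, n in comp.items() if n == lo])


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = protein.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydrophobicity_class(g: float) -> str:
    """Bin a GRAVY value using the cut-offs given in question 5(g)."""
    if g < 0:
        return "not hydrophobic"
    if g <= 0.5:
        return "slightly hydrophobic"
    return "highly hydrophobic"
```

Usage: paste the sequence lines of mystery-seq-C / seq-3 (without the header) into one string, then call `most_and_least_frequent(seq)` and `hydrophobicity_class(gravy(seq))` and compare with ProtParam's report.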

6. PSORT is a useful program for predicting the subcellular localization of a protein by running a number of different classification programs, which look for various signals that might indicate where the protein occurs in the cell. Using PSORTb v2.0 (Gram stain = positive), analyze mystery-seq-D / seq-4 and tell me where in the cell it is predicted to occur and which program(s) reported predictions. If the protein is predicted to be in the cytoplasmic membrane, give the number of transmembrane helices found.
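A quick, independent cross-check for transmembrane helices is a Kyte-Doolittle hydropathy scan: stretches where a window of about 19 residues averages above roughly 1.6 are classic candidates for membrane-spanning helices. This is a swapped-in textbook heuristic, not the predictor PSORTb runs, and the window size and threshold are assumptions you can tune.

```python
from __future__ import annotations

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; entry i is the window starting at residue i."""
    values = [KD[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge runs of above-threshold windows into 1-based, inclusive residue ranges."""
    profile = hydropathy_profile(protein, window)
    segments = []
    start = None
    for i, h in enumerate(profile):
        if h >= threshold and start is None:
            start = i
        elif h < threshold and start is not None:
            # Last passing window started at i - 1 and ends at residue i - 1 + window.
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start + 1, len(profile) - 1 + window))
    return segments
```

Each reported range spans every residue covered by the passing windows, so it is wider than the helix itself, and two helices separated by a short loop may merge into one range. Treat the count as a rough check on PSORTb's number, not a replacement for it.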

Using NCBI ORF Finder

7. Using the NCBI ORF finder, look for all the ORFs in all reading frames in mystery-seq-E / seq-5. 
(a) How many ORFs are found with a minimum size of 300 nucleotides?
(b) Do these conflict (overlap) with each other? Use BlastP and a CDD search (part of the BlastP run) to determine which of the overlapping ORFs is most likely to be real, based on similarity to other proteins in the NR database.
(c) OK, now that you've narrowed this down, note that these ORFs do not cover the entire region. You know that this is a prokaryotic genome, so genes should be tightly packed. Drop your minimum ORF size to 100 nucleotides: how many ORFs do you find now?
(d) Give the coordinates of the best set of non-overlapping ORFs (it is OK if they overlap by fewer than 10 nucleotides). You decide what the best minimum ORF size is.
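To see what ORF Finder is doing, here is a minimal six-frame scan: in each frame it takes the first ATG after the previous stop and runs to the next in-frame stop (TAA, TAG, TGA), keeping ORFs at or above a minimum length. Assumptions: ATG-only starts, the standard stop codons, and no partial ORFs at the sequence ends. ORF Finder lets you choose other start-codon rules and genetic codes, so its counts may differ. Coordinates here are 1-based and low-to-high with a strand flag, which may not match how ORF Finder displays minus-strand ORFs. The greedy longest-first selection for part (d) is a hypothetical helper and only a starting point; the BLAST/CDD evidence from part (b) should decide conflicts.

```python
from __future__ import annotations

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 300) -> list[tuple[int, int, str]]:
    """ATG-to-stop ORFs in all six frames as (start, end, strand), 1-based, start < end."""
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if i + 3 - start >= min_len:  # length includes the stop codon
                        if strand == "+":
                            orfs.append((start + 1, i + 3, strand))
                        else:  # map reverse-complement indices back to the forward strand
                            orfs.append((n - (i + 3) + 1, n - start, strand))
                    start = None
    return sorted(orfs)


def pick_non_overlapping(orfs, max_overlap: int = 9):
    """Greedily keep the longest ORFs whose pairwise overlap is at most max_overlap nt."""
    chosen = []
    for orf in sorted(orfs, key=lambda o: o[1] - o[0] + 1, reverse=True):
        if all(min(orf[1], c[1]) - max(orf[0], c[0]) + 1 <= max_overlap
               for c in chosen):
            chosen.append(orf)
    return sorted(chosen)
```

Usage: join the sequence lines of mystery-seq-E / seq-5 into one string and compare `len(find_orfs(seq, 300))` and `len(find_orfs(seq, 100))` with ORF Finder's output, then feed the list to `pick_non_overlapping`.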


Basic Sequence Analyses using the SDSC Biology Workbench

8. Use the SDSC Biology Workbench to determine basic statistics about mystery-seq-E / seq-5.
Using the NASTATS program, find the length, the G + C content (%), and the most numerous nucleotide (A, C, G, or T).
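These three numbers are simple counts, so you can double-check NASTATS yourself. A minimal sketch, assuming a single FASTA record containing only A, C, G, and T; NASTATS may treat ambiguous bases (N, etc.) differently. If two bases tie for most numerous, this sketch reports the first in A, C, G, T order.

```python
from __future__ import annotations

from collections import Counter


def fasta_sequence(text: str) -> str:
    """Join the sequence lines of one FASTA record, skipping '>' header lines."""
    return "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    ).upper()


def nucleotide_stats(seq: str):
    """Return (length, G+C %, per-base counts, most numerous base)."""
    counts = Counter(seq)
    length = len(seq)
    gc = 100.0 * (counts["G"] + counts["C"]) / length
    most = max("ACGT", key=lambda b: counts[b])
    return length, gc, {b: counts[b] for b in "ACGT"}, most


if __name__ == "__main__":
    record = ">toy\nGGCA\nT\n"  # replace with the mystery-seq-E / seq-5 record
    print(nucleotide_stats(fasta_sequence(record)))
```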

Making PCR Primers with Primer3
 
9. You decide you want to do a northern blot to check for expression of the longest ORF from question 7:

Using the Primer3 website, design PCR primers (select "Pick left primer" and "Pick right primer") to make a probe for your northern blot. Use a GC content range of 40-70%, a primer Tm of 45-55 °C (optimum 50 °C), primer sizes of 18-22 bp (optimum 20 bp), and a product size range of 250-800 bp only; use the defaults for the remaining parameters.
(a) Give the top primer pair found (sequences of the left and right primer), the product size, Tm for each primer, and GC% for each primer.
(b) What single parameter (too many Ns, bad GC%, Tm too low, poly X, etc.) caused the greatest number of primers considered to be made unacceptable?
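Before reporting the top pair, it is worth checking a few of Primer3's numbers by hand. A minimal sketch of the basic checks: GC%, the size and GC limits from this question, and the product size, found by locating the left primer and the reverse complement of the right primer on the template. The Tm here uses the rough Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a sanity check; Primer3 uses a nearest-neighbor thermodynamic model, so its Tm values will differ and should be the ones you report.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(primer: str) -> float:
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough Tm in degrees C by the Wallace rule (not Primer3's method)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def passes_basic_filters(primer: str, gc_range=(40.0, 70.0), size_range=(18, 22)) -> bool:
    """Check the size and GC% limits used in question 9 (Tm is left to Primer3)."""
    return (size_range[0] <= len(primer) <= size_range[1]
            and gc_range[0] <= gc_percent(primer) <= gc_range[1])


def product_size(template: str, left: str, right: str) -> int:
    """Amplicon length: left primer start through the end of the right primer's site."""
    template = template.upper()
    start = template.find(left.upper())
    site = template.find(reverse_complement(right))  # right primer binds the minus strand
    if start < 0 or site < 0:
        raise ValueError("primer not found on template")
    return site + len(right) - start
```

Usage: run `product_size(orf_seq, left, right)` with the ORF sequence from question 7 and Primer3's top pair; it should agree with the product size Primer3 reports.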

Use these sequences for the Study Section:

>mystery-seq-A
ATGACCTTCGTCTACCCCGACGCCATTGTCTCCACCGCCGGCCACGTGGA
CCACGGCAAGACGCAGACCACCTACGCCCTCTCCGGAGTCTGGGTCATGA
GGCACAGCGAGGAGATCAAGAAGGCCATGACGATAAAGCTCGGATACGCC
CAGGTGGGCATATACGACTGCGGCGATGAGTACTACTACAGCGATGGCTT
GCTCCAAGACGGCAAGTGTCCCAACGGCAACGAGCCGAAGCTTGTCAGAA
TGATATCCCTCCTCGACGTGCCGGGGCACGAGGTCCTAGTGGCCACCATG
GTCTCCGACGCAGCGGTGGTGGACGGCGCCGTGATGGTGGTGGATGCCTC
CCAGCCCGTGCCCCAGCCCCAGACGGCCGAGCACTTCGCGGTGCTAGACG
TAATCGGCGTGCGGCACATGGTAGTGGCGCAGAACAAGATTGACCTCGTC
ACGAGGGAGAAGGCCATTGAAAACTATCAGCAGATAAGGGAGTTCCTCAG
AGGGACGTGGGCCGAGAAGGCCAAGGTGGTGCCCATCTCGGCCCTCCACA
AGGTGAACATCGACGTCCTCGCCACATACCTCGCCAAGGAGATACCGAGG
AGGGAGGCTGAGACGGGGAAGCCCGCCAGGTTCTCGGTCCTCAGAAGCTT
CAACGTCAACCCCCCGGGCACCCCGCCTGAGAAGCTCAGAGGAGGCGTGC
TAGGCGGCACTCTGCTACAGGGGGTCCTCCGCGTCGGCGACGAGATTGAG
ATTAGGCCAGGCCTCAAGCTGGAGAAGGGCTACCAGCCCATTCGCACAAA
GGTCCTCAGCATTGAGTACAGCGGGCACAAGGTAGAGGAGGCGAGGCCGG
GAGGCCTCGTGGGGATAATGACGGGCCTAGACCCGGCCTTGACCAAGGCC
GACGCGTTGGCAGGCGCCGTGGCTGGGAAGCCGGGCACTCTGCCCCCCGT
GTGGACCGCCGTGGAGATTGAGACAAGGCCCATGCCCCGCATGGGCGGCG
AGAAGCCCGAGCCGATTAAGCAAGGCGAGACTGTGCTCGTGGCCGTCGGC
CCCGCCACGGTCTTCGGCGTGGTGCAGTCGGTGAAAAAAGACGTGGCCAC
GGTGGCCCTTAAGAAGGCTGTATGCGCAGAGCAGGGAGCCAAGGCTGTGC
TCATTCGACAGGTAAAGACTAGGTGGATTGTTACTCACTACGGCGTGCTC
AGAGGCGGCACGGTGGCCCTAGAGTGA

>mystery-seq-B
MWPFPGPYNILYSDSWPLLGVVFISLGVASWFNHIQKPVFYLYAGLSLPI
FIYGVAIAYFHLTQEPEIAAALFMFVGLAGLLSPLLTMGKAGRGAAYLII
AILVVAAIIALFLGINSTFAHIPRWAKWSPWYGKVVVSG
>mystery-seq-C
MKIYLPAYGGNRFYLVGGSITRQDNVNVLILTVLRQAEIRELTPAQQGGS
AVAVDQSPVAEFTYTVSFDIDYLADRLCGLCRSSTGRRNFDICGYYVYRN
GEAGWLSYAVLFLTSCSERQSESQTYAVVKNFGRRGEPWIVAITRNIPLA
KGRSERFQLTISGRHASWSFTATVDAVEKLNVFGKAFVDEVSSFSRSQCL
GGTLELGEAVVEAIYMGSLISLRTKDRAAYLPVVVHYDIVGSSSPRTLHP
SAKYYRIKLNISDDLLYSNVRSVLLDSDKKVAVKLLLQNVVAGSVFRAFG
CGRWVTEREISASIFGREIVITTPIISRILGIYSEIYGSGANYSGSGGLR
SLLKRWQNAVVKYKCRDLSISQERQRWIDYFTDCIENSQTPNDFRSCLMT
RFQYTGRLECRLIESRSKELFAALTLYDTLAFSAILLGSHGVAHLLGKSA
GLGRDEFSERIAVKLNPRKLPMLNCAMFGWESYEPLNIDGLFNIEVNTSL
NSVAEVEITVYDLKGTSMMNCQDLSSVLGKLLQDIGGCGSPRDACKRAWE
EESRRVERMYRVIAAQDQLASSLDQELKRRPQEVTPPRDIFRFMLRDIIR
GIGASAGQSSSKILQYVWPRHIPSCSDGCQRRVLMQRNLCVYPPLHEEMK
VSKRMAIMFLRMLCPQAPQQ

>mystery-seq-D
MRIPLPLLDVVLTLVTAFAIGAVVMWISGYDPIMSYRSMLFTPLLDYTYF
FSALAFSAPIVLTGLTFAVGLRAGLFNIGAEGQVYMGALGAVVAAYITKS
SLALPLALFIGLTLACVWSLVPALLKIWRGVNEVITTIMMNWIAYWVVIM
AVSTVFTNPLQPEESIKTPEPARLTPLVAGTDLTAAVPLSYIIALFVYIF
VKYSVWGYRISISGQNPIVAKSYGIEPMRSIMLSFLIGALTAGVAGVMQV
VAKPPSYSLMRNLANVYGLGFDGITAAMLGRGHPIGVAIAALFLGVLQEG
ARHMQIEAGTPFEFVRIIQGVIILLLAIQMLKKT

>mystery-seq-E
CCTTTTCCACAGATGTTCCACGAATCCTATTTAAATGTTTAAGTTTGTTA
ACTTTTAAATAAATCAGTGTGTCTAAGCGCCATGGCGGTGGAAATAAGGG
CAATAGAGAATGGGCCGTATGAAGTTAAGATAGGCGGGAGGGCCATCTAC
CTCTGCCGCTGCGGGCACTCCGGCTCAAAGCCTCACTGCGACGGCACGCA
TGCAAAAGTGGGGTTTAAGGCGCCTGGCGCCAAAATAGTCTAGCGCCGGG
CGGCGAACCTGGCGTATATTTTTTCGATCTCCAGGGCGAGGGCCACGTCT
TTCTCTGTGATACAGCCCTCGTCGTGGGTTATGAGGTAGACGTCCACCTT
GTTGTACCCCTCTATGCATACGTCTGGGTGGTGGTTAGCCGCGTCGGCCA
CCTTCCTCACCTCTGCCAAGAACTCCACCGCCTCGGCGAAGCTCGCCAGA
GAGAAGCGCTTGTGGAGCCTGGGGGGTTGCCCAGACGTGCCCCAGCCCAC
TGCGCCCCTGCCCGCGAGCTCTGACAAGGCGTCGCTGGGCTTTTTACAGG
CCATGAAAAAGCCGATTTGTGTTTTAAAAAGTGGCCTACTGGGCGGCCAG
CTTGTTCACCGTGTTTATCACCTCGTCGTATGGAGGCTCGTCGAGGGGGT
TGTCGCTGTACCACACGTACTTAATTGTGCCGTCAGGCGCTATGATGTAC
ACGGCGCGTTTGGCGAGGTAGTACAGTGGGAAGTGGAGGAGCTTTGGGAG
CACCACGTCGTAGAGGGCGATGACCTGCCTATTGAAGTCGCTGACCAGGG
TGAAGTGGAGCCTGTTCTTCTCCTTGAACTCCTTCAGCGAGAACGGCGAG
TCCACGGATATTGCTATAACCTCGGCGTTAGCCTTGGCGAGCTTGGCCAT
GTTGTCGCGGAAGGTGCACAGCTCCTTTGTACACACGCTCGTGAACGCCC
CTGGGAAGAACAGAAGCACCACGTACCTCCCCCTCGCCAAAACCTCGCTC
AGCCTCACGGGCTTTAGATCTGTGTCCAACGCCTCGAAGTCCGGTGCCTT
GTCTCCAACTTTCAACGGCATGACACAAGGCCGACTTCCACATAAAAACA
TTGAAAGACAGTAGAAGAGTTGGAGAAAAAAGAAAGTTGTTTTCGATCTA
CCCGCTGACTACCACTTTACCGTACCAGGGGCTCCACTTGGCCCACCTTG
GTATGTGGGCAAAGGTGGAGTTTATGCCGAGGAATAGGGCGATGATGGCG
GCTACCACGAGTATGGCGATGATGAGGTAGGCGGCGCCTCTGCCAGCCTT
GCCCATGGTTAAGAGC

Use these for the Homework:

>mystery-seq-1
GTGTTTAGGACACATCTAGTCTCAGAATTAAATCCTAAATTAGATGGATC
AGAAGTAAAGGTAGCAGGATGGGTTCATAATGTAAGGAATTTAGGTGGAA
AGATATTTATTTTATTAAGAGACAAGAGTGGAATAGGACAAATAGTAGTT
GAAAAAGGTAATAATGCATATGATAAAGTCATAAATATAGGATTGGAATC
GACTATCGTTGTAAATGGTGTAGTTAAAGCTGATGCGAGAGCCCCTAATG
GGGTTGAAGTACACGCAAAAGATATAGAAATACTGTCGTATGCAAGGTCT
CCATTACCGTTAGATGTGACGGGCAAGGTTAAGGCTGATATAGATACTAG
ACTTAGGGAAAGATTACTAGATTTAAGAAGATTGGAGATGCAAGCAGTGT
TAAAAATACAATCGGTAGCTGTGAAATCATTTAGGGAAACATTATATAAA
CATGGATTTGTAGAAGTCTTTACTCCAAAGATAATTGCTAGTGCAACGGA
AGGAGGAGCCCAATTATTTCCAGTATTATACTTTGGAAAAGAGGCATTTT
TAGCTCAGAGTCCGCAATTATACAAGGAATTATTAGCAGGTGCTATAGAA
AGAGTATTTGAAATAGCTCCTGCATGGAGAGCAGAAGAGTCAGACACACC
ATATCATCTCTCAGAGTTCATTAGCATGGACGTAGAAATGGCCTTTGCCG
ATTACAACGATATAATGGCTTTAATAGAACAAATAATTTATAACATGATA
AATGATGTAAAGAGAGAATGTGAAAATGAATTAAAGATATTGAATTATAC
TCCACCTAATGTTAGAATACCTATAAAGAAAGTCTCTTACTCAGATGCAA
TAGAGCTTCTGAAAAGTAAAGGTGTTAATATTAAATTTGGCGATGATATA
GGAACGCCTGAACTGAGGGTATTATATAATGAATTAAAGGAAGATCTTTA
CTTCGTAACTGATTGGCCTTGGCTAAGTAGACCATTTTATACAAAGCAGA
AAAAAGATAATCCGCAGCTAAGCGAGAGCTTTGATTTAATTTTCAGATGG
TTAGAGATTGTTTCTGGAAGTTCAAGAAATCACGTTAAAGAAGTCCTAGA
GAACTCACTTAAAGTAAGAGGACTAAATCCAGAAAGTTTTGAATTCTTCC
TAAAATGGTTTGACTATGGGATGCCACCACACGCCGGTTTTGGAATGGGA
TTAGCAAGAGTAATGTTAATGTTAACTGGTCTTCAGAGCGTGAAGGAAGT
AGTACCATTCCCTAGAGATAAGAAGAGACTAACACCATAG

>mystery-seq-2
MGVEICRSLLECLGALGRSQRLYAAAGLVDEEGLEAASRAAGELRVLVGD
SGPVPRPVYERWREVVRVYPSLHAKFYIFAEDAGPSAALVGSADLTAGGL
RGNLEAVVLIRGEAARPLADMFNRLWARALPLTEDYVADWEGPEEALRKP
WGEAVKRANERLAEILGVSAHCLSRHDPLNCARLVARAVRSRFEGCGDLP
ENCAARATGVSAKALLSAPPSAVLAGHYVCWARALAARLLEGKVGRLDSG
MEAYEAAVQAGAESCWGEAKRAAEEELERLEDSNYRDNYVRWPIPYRLLF
LAMTLPATGCRILGREVRTKKRGVARVERELYC

>mystery-seq-3
MMAAREEARRIRSLLHCYHFKEDITEDAKKVLEAKSREHDELVARVVRIF
RDIIGSECIEENPLVKRGVADAVLRCIDKIYVLEIKSYPFAEINEESYCQ
RGFTRSYIADFEQAVLYGEEIRRRFSSPEVIPVLVYRGIPVEGLENTPFF
IFIDVETARKEGASTTTSAELAYKPGPECARCRNTDCPIREKMRSYIAQG
TAQVRAASAEVAYQELFKTEWCLARHGTRCVKCGPLYIALSQTTLHRDNL
PLTEEHIHYVINSQNASQIAEYLKAQKLLIEKSCYQYRIPLQKIQIDSSG
RYVVAFQVTRNDYLIMLEYVSLLNSLRRRGVRIDVRKLLNGPERCYRTFH
GEVARLIRNISALPGNRRTFEAYYLPIIIDHIGSRDIKVDQISNYFSRNL
WNENWINKLVSNLFDEKGQPITNISGFQNEAIRKIAENITAWLEKSGRSP
LIILTAPPGTGKTLVFLIVAIAVALHGFKAVIMYPSKKLALQQVQQIYYI
VEGLNRHGANISLAVLDGDSKRCGRRPQGGVRALRCDGGRGSLEYQGGQY
ICRTNQAASQVSWFTDCEDDDALSSSIIVTNPYKLSSMLMRSADSAKKLA
NSLALLVIDEVHTMLEPKHLDFFTALLHRLYLLGDVKKYPAIILSSATVT
SSGLPFADRIASDASGAPITFRSVGAVERTEPPDPRMVREFSESLANALL
GERLVQEYAVEPIDYYSMLGSIQGSGSTVTVAKLTAPMVVFTNPAESPSG
TVQEAVVSLMIASSARRKLSAEVLRNFSSVIFFDSKESLSEVEAYVRDRL
VLKEGSPSDKTVTKPFVMNLINGNIASYGLNLVRNILNSGSLAELEDFSH
LTLFCTSLAELNNAYGSAKRAYNSDSSVARGPQCYNMAIDATIDIISDIR
NNGRNIQNRHTLLIHHADLSDNVRYSIEEKLERPGAWSVVLSTSTLELGV
NLTGVGAVGQLGLPKLAENVIQRFGRGGRDKSVLYTALGILFAKHTGEDV
ALIDEDYAVARLFAFKRTPVLPRDESRIISIEQLIAYTTIRALGWFRNSN
VINKVEKVVESSLQFLVSDANQANQLYHVISSRLQTMCSALQALGGSMQQ
VAHRLLRDIVNQINQYIDNIRNVLQNINNTCSLYENLSKIINIYATDPLK
YSWELYFTLGNVLSQLDSLPDLSKCFINNDIRIYNIVRDYVRDALHVARE
LILQYFPRAPSYSGWRTREILLQLTFVPPMPDPRIIEYSLAHYTVESGGI
RRRSRELREAYLKSAPLKTDRYETL

>mystery-seq-4
MKAIRDALVIAWINGWIAAVRGWVWVVTNAITPLSFLVILAVYGGAEGLR
WGLAGGLVWTVASNGISLIGDAAYYRLAIKYQSMLTAAPVSPIGYALGLA
LSSFVFSLPTLAAYVAISLWLGLGLLTPPAAYALITLWLASAGIGFTLSS
LVKHMRYAWSLPQILSTVFTVAAPVYYPASLLPSPYLGVVMPTGAAGILI
QQAAGLAAYDIWLTAAAAVALAIQSISGLYFLIKLAKWREP

>mystery-seq-5
TGCTGACCCTATGATGTATCCTATGGTCATTTATTAAGATGTTATCCTAA
AAAGTATATAACGATTTATTATAGTGTGATAGTAATACCAGAACGAGAAA
TTAGAAAATTGTAAAAAAAGAATTTTAAAATATTATGCGGCTACTTTTCC
TACAGTTTCTGCAATTTTTGCTTCTTCTTCAGCAAATGCGCATAGCATTG
CTTCTATGTCGGCATATCCTTCTGCTTTTGCTGTCATTGCTATTTTGTTG
TATGTGGATATGTGCTCCTCTCCCTCTTTAGTAGCGAAATCAGATAGTAA
CTTCTTTACCTTTAATTCGGTGGCTACTTTTCCTACAGTTTCTGCAATTT
TTGCTTCTTCTTCAGCAAATGCGCATAGCATTGCTTCTATGTCGGCATAT
CCTTCTGCTTTTGCTGTCATTGCTATTTTGTTGTATGTGGATATGTGCTC
CTCTCCCTCTTTTATTGAGAATTCTTCTAATATTTTTTCTACTTTACTGG
ATATCATAGGTATTCGTAATGAATAATCGGAAGGCAAATATATAAGCATT
TGTTAAGCTTTTTTAATACTAAATATAATTAGCATTTTTGTATTTCAACA
AAGTTTGAGATTTTTGTATTACGGAACTAAAAATCCTCTAAAAAACTTAA
CTTGTATATAAAATTCTTTCGTATAATTTCTTTGCCTCTTCATACTTCTC
CTTTGATTTGGTTTCTAATTCGTTCTTCCTTTCAGGAAGTTTTTCAGCTA
ATAGTTTTGAGAGCATATAAACTGTAGCTGTAAGCATAAATTTCTCTTTT
AATTGCTCACTTTCCTTTTTATTTAACTCTTCGAACCAATTGGTTATATA
TTCTAACCCTAAATACTTTATCATTTTCTCTAATATTCCCCTAGCATGAC
CCAATTCAACTAAGGCTTTTTCCCTAATTTTTTCAGATTCTTCCTTTTTA
TTAACCTCCTCCAGCTTTTGAGAGGAGAACAATAGTAATAAATGGTCTTC
GGAGTTAGCCATAAAAAGCTCTTTTAATCCTATTTCAGTCTGTGTCCCCT
TCATCACCTTTAAATTGTATTCATAGCTAATATACTCTTGTTAAAATAAT
GATGACTAACTCCAATACTGACCAATGATGTCGTAACCCGAAACTGAATA
AAAGTAAAATCCTTCCCTACTGAGAATATTTGTATGATAACCTCAAAAAG
AATGAAAGCCCTTGAAATTAATAGCGAAGCATTAGGCGTGCCAACATTAC
TCTTGATGGAAAACGCAGGGAGAAGTGTAAAGGATGAAATAATGAAAAGA
CTGAATTTGGACTATTCTAAAAAGGTTGTAGTATTTGCAGGAACTGGTGG
AAAAGGAGGAGACGGATTAGTAGTAGCAAGGCACCTTGCCTCGGAAGGGT
CAGAGGTTCATGTTTTACTTTTAGGCGAGAACAAACATCCGGACGCAATC
ATTAACTTGAATGCAATATATGAAATGGATTATTCTATTAGAGAAGTTAA
ACTGATAAAAGATACTGACGAATTGCAACCAGTTAAAGCTGACGTGCTTA
TAGATGCCATGTTAGGCACGGGATTTTCTGGTAAAGTTAGAGAACCATTT
AGAACAGCTATTAGAGTATTTAATCAGAGCTCTGGTTTTAAGGTTTCTAT
AGATATACCCTCTGGGATAAATGCAGACGATGAAGAACAGCAGGGAGAAC
ACGTTATTCCCGACCTAATAGTCACCTTTCATGATCTTAAGCCAGGCTTA
AAAAAATTTGAGAGTAAAGTGGTCGTCAAGAAAATAGGTATTCCTAAAGA
GGCTGAAATATATGTTGGTCCCGGTGATGTCATTGTCAATGTGAAGAAAA
GAGAGTATAACACAAAGAAAGGAGATAATGGAAGAGTTTTGATCATTGGA
GGGAATTTTACATTTAGTGGAGCCCCAACTCTATCTGCTTTGGGAGCCTT
AAGGACGGGAGCAGATCTGGTATATGTCGCATCTCCAGAGGAGACAGCTA
AGGTCATCTCTAGCTTTTCCCCTGACCTTATATCTATTAAGCTTAAGGGA
AAGAATATATCTACAGACAATTTGGATGAGCTAAAACCATGGATTGATAA
AGCTGACGTCGTAGTTGTAGGACCTGGTATGGGACAAGAAAGGGAAACTG
TAGATGCTTCCATAGAGATAGTTAGATATCTGAAAGCAAAGAATAAACCT
TCAGTCATAGATGCTGATGCGTTAAAATCAGTGGCAGGTATGGAATTATT
CCCGAATGCAGTAATAACTCCTCATGCAGGAGAATTTAAGATATATTCAG
GGGTTCAGCCTGATTCGAACATGAGAAAAAGAATTGAGCAAGTGAAGGAG
TGCTCACTGAAATGTAATTGTGTAGTACTCCTTAAGGGTTATGTTGATAT
CATAGCAGAAAAGGAAGAATTTAAACTTAATAAGACAGGAAATCCTGGAA
TGGCAGTTGGCGGTACTGGGGATACATTGACAGGAATAATTGCCTCATTT
ATGGCTCAAAAACTATCTCCATTCACTTCTGCTTACTTGGGAGCATTCGT
TAATGGTTTAGCAGGGTCTATAGCATATGAAAAACTTGGCGCACATCTAG
TTGCAACAGATATAATAGAAAACATTCCTAAGGTAATTAATGAACCTTTA
GAAGTGTTCAAGAAAAAAGTGTACAAAAGGATTTTAGATACTTAGGTTTT
ACCCCTAATTCTTTTAATAATCTCAAGTGATTTGTTTGCATGTTCTTCTG
CATTTCCTAGACCGCTCAATACCTCTATAATTTTTCCGTTTTCGTCTATG
ATAAAGGTTACTCTCTGAGCACTTGAGCCTTTCTCGTTTAGAACACCGTA
TAATTTAGCTATTTGTTTATTTGAGTCAGAAACTATAGGAAATCTGGCAC
CGCATTTGTCTGCAAAACTCTTTTGAGTTGAAACTGTATCAACACTAACA
CCTATAACTTCAGCATTTAACTGTTTAAATTGGTCATAAAGTTGTCCAAA
TTTTATGGTCTCTCTAGTACAACCAGGTGTAAACGCCTTAGGATAGAAAT
ATAGTACAACTACAGATTTGCCTCTATATGAAGATAGTTTCAATTTTCCT
ATAGTTGAATCTCCTTCAAAATCAGGAGCTTCATTTCCTTTTTCTAAAGC
CATAGATTATCTGATATAAATATATTCAGTTATGGTTTTTAACCTCTTTT
TCGCTTATGCCTTACA