log_2 (P(X|M+)/P(X|M-)).
This gives the log odds in total bits. What is the difference between this score and the scores plotted in the histogram on page 53 of the text? Where would this sequence go in that histogram?
Anders Krogh has provided us with a set of HMM teaching/experimenting programs for use with the text. The programs are under the directory
/cse/guests/krogh/book/ahmmon the barnyard machines. The documentation is in
/cse/guests/krogh/book/ahmm/AHMM.documentationand the executable code for the DEC alphas is in
/cse/guests/krogh/book/ahmm/bin/OSF1You should add this directory to your UNIX path. The three programs in this directory are decodeahmm, testahmm and trainahmm.
Using these programs you can define the state structure for an HMM, train it on some training sequences, and use it to "decode" other sequences. Here by decoding, we mean assigning a label to each letter in the observation sequence, as described in Chapter 4 of the text. This label usually represents one of the states of the HMM, although in a more general case the same label may be used for more than one state. There are two ways to decode an observation sequence. The first is the Viterbi method, discussed on page 56 of the text. In this method, the most likely sequence of (hidden) states is computed for the entire observation sequence. The second way is posterior decoding, discussed on page 59. In this method, for each individual letter of the observation sequence, the most likely label (or state, if each state has a unique label) is computed separately. The formulas used in the Viterbi and posterior decoding cases are given in the text.
In this part of the assignment we are going to compare the Viterbi and posterior decoding methods. We'll do this using the "occasionally dishonest casino" example on page 54 of the text. I have designed an HMM to represent this hidden Markov model, under the additional assumption that when the HMM starts, the first die chosen is fair with probability 0.99. I have also changed the transition probabilities a bit. For use by Anders' program, this HMM is specified by the following file (see documentation for details):
# HMM based on example of page 54 of text
########## ALPHABET ################################################
alphabet { alphabet AB123456; }
########## STATE begin #############################################
begin { trans
fair: 0.99
loaded: 0.01;
letter NULL;
end 0;
}
########## STATE fair ##############################################
fair { trans
fair: 0.99
loaded: 0.01;
only 1:0.166666666 2:0.166666666 3:0.16666666 4:0.16666666 5:0.16666666
6:0.16666666;
label F;
}
########## STATE loaded ############################################
loaded { trans
fair: 0.1
loaded: 0.9;
only 1:0.1 2:0.1 3:0.1 4:0.1 5:0.1 6:0.5;
label L;
}
Here is what you need to do.
1552433142516636666166646666425341323435241324435166366126666166643523425234142One way to do this is to make a file called data1 for this sequence. (Don't add any blank lines at the end of the file, the software is just quickly made classroom software, and is very brittle.) Then type
decodeahmm -v dice.mod data1
The "-v" option specifies Viterbi decoding.
What is the output? Display the output above the observation sequence.
Is it a reasonable decoding?
data2:
11116161661666666611116111116111111616666166616166666616111166666661616161666666
data3:
11111111116666666666666666111111111166666666661111111In each case, compare the results of the decoding, and say which decoding you feel is more reasonable. Try to explain why they differ the way they do.