This is part 2; for part 1, follow this link.
Application to biological sequences
As seen so far, Markov chains (MC) and hidden Markov models (HMM) are powerful methods that can be used for a wide variety of purposes. For the study of biological sequences, however, we use a special case of HMM called the profile HMM. The following section describes this system and, in doing so, should explain the reasoning behind its use.
Analysis of an MSA
Let us consider a set of functionally related DNA sequences. Our objective is to characterise them as a “family”, and consequently identify other sequences that might belong to the same family [1].
We start by creating a multiple sequence alignment to highlight conserved positions:
A | C | A | – | – | – | A | T | G |
T | C | A | A | C | T | A | T | C |
A | C | A | C | – | – | A | G | C |
A | G | A | – | – | – | A | T | C |
A | C | C | G | – | – | A | T | C |
It is possible to express this set of sequences as a regular expression. The family pattern for this set of sequences is:
[AT][CG][AC][ACGT]*A[TG][GC]
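As a rough sketch of how such a pattern can be read off the alignment, the Python snippet below walks the columns of the MSA and builds the regular expression. The helper name `family_pattern` and the use of `-` as the gap character are assumptions made for illustration. Character classes are emitted in alphabetical order, so `[TG]` appears as `[GT]`, which is an equivalent pattern.

```python
import re

# The alignment from the text, with "-" standing in for the gap character.
alignment = [
    "ACA---ATG",
    "TCAACTATC",
    "ACAC--AGC",
    "AGA---ATC",
    "ACCG--ATC",
]


def family_pattern(rows, gap="-"):
    """Build a regular expression from alignment columns (illustrative helper)."""
    parts = []
    for column in zip(*rows):
        if gap in column:
            # A run of gapped columns collapses into one variable-length region.
            if not parts or parts[-1] != "[ACGT]*":
                parts.append("[ACGT]*")
        else:
            residues = sorted(set(column))
            if len(residues) == 1:
                # Fully conserved column: a single literal nucleotide.
                parts.append(residues[0])
            else:
                parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


pattern = family_pattern(alignment)
family_regex = re.compile(pattern)

# Remove the gaps to recover the raw sequences, then test each one.
members = [row.replace("-", "") for row in alignment]
for sequence in members:
    print(sequence, bool(family_regex.fullmatch(sequence)))
```

Note that a regular expression only answers yes or no: any sequence that fits the pattern is accepted, regardless of how closely it resembles the sequences in the alignment.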