R. Wagner's Home What is Bioinformatics? Intro to Markov Chains Hidden Markov Models Evolutionary Genetics
Background on Dynamic Programming
  • The Longest Common Subseqence Problem

    A subsequence of an n-character sequence, S, is obtained by deleting 0,1,2, or more characters from S. Here is an example of some subsequences of the word, residency: res, sdncy, reden, and many more. Given two sequences, A = [a1, a2, ..., am ] and B = [b1, b2, ..., bn], len(i, j) denotes the longest subsequence common to [a1, a2, ..., ai ] and [b1, b2, ..., bj], (note the subscripts, j <= m and i <= n).

    The dynamic programming formulation of the problem can be shown as a recurrence relation:

  • Computation of the Longest Common Subsequence

    Here is the pseudocode (from [1]) for an O(mn) longest common subsequence script:

    for(i = 0 to i = m) do length(i, 0) = 0
    
    for (i = l to i = n) do length(0, j) = 0
    
    for (i = l to i = m) do
      {
        for (j = l to j = n) do
          {
            if (a[i] = b[j])
              {
                length(i, j) = length(i - 1, j - 1) + 1
                 traceback(i, j) = diag;
               }
             else if( length(i - 1, j) >= length(i, j - 1)
                     {
                       length(i, j) = length(i - 1, j)
                       traceback(i, j) = vertical
                     }
                   else
                     {
                       length(i, j) = length(i, j - 1)
                       traceback(i, j) = horizontal
                     }
            }
        }
    
    return length and traceback
    
    
  • Common Subsequences between residency and redundancy


Sources:
[1] Chao, K.-M., "Dynamic Programming Strategies for Analyzing Biomolecular Sequences." Selected Topics in Post-Genome Knowledge Discovery. Eds. Wong, L. and Zhang, L., Singapore University Press. 2004.