Longest common subsequence : Dynamic programming ~ KodeKnight

Monday, April 26, 2010

Longest common subsequence : Dynamic programming

April 26, 2010 Algorithms, Dynamic Programming / DP, kodeknight, String, subsequence, substring No comments

m: A B C B D A B
yn: B D C A B A

The LCS is B C B A

For arbitrary inputs, it is NP-hard. For constant inputs, DP gives the solution in polynomial time ex: O(n^k). Also, there can be more than one LCSs and the solution for that is exponential time ex: O(k^n).

Here we maintain an LCS matrix of m*n size. Initially, for i= 0 or j = 0, LCSij = 0. Then we use the following optimal substructure.

LCSij = 0                                        for i = 0 or j = 0
          = LCS[i-1,j-1] + 1                for x[i] = y[j]
          = max(LCS[i-1,j],LCS[i , j-1])      for x[i] != y[j]

I thought of the following on my try which also works:

i

Below is an example from wiki:



    | 0 1 2 3 4 5 6 7

    |   M Z J A W X U

----|-----------------

0   | 0 0 0 0 0 0 0 0

1  X | 0 0 0 0 0 0 1 1

2  M | 0 1 1 1 1 1 1 1

3  J | 0 1 1 2 2 2 2 2

4  Y | 0 1 1 2 2 2 2 2

5  A | 0 1 1 2 3 3 3 3

6  U | 0 1 1 2 3 3 3 4

7  Z | 0 1 2 2 3 3 3 4

So the length of the longest subsequence is 4.

getLongestCommonSubsequence(x, y):

    int lcs[m,n] ;   //2 d array 

    for i = 0 to m:

        lcs[i,0] = 0

    for j = 0 to n:

        lcs[0,j] = 0

    for i = 1 to m:

        for j = 1 to n:

            if x[i] = y[j]:

                lcs[i,j] = lcs[i-1,j-1] + 1

            else: 

                lcs[i,j] = max(lcs[i-1,j], lcs[i,j-1])

After LCS matrix is calculated in O(n*m) time, we just need to trace back from LCS[n,m] location.



 printLCS(lcs, x, y, i, j):

    if i = 0 or j = 0: 

        return ""

    else:

        if x[i] = y[j]:

            return printLCS(lcs, x, y, i-1, j-1) + x[i]

        else:

            if lcs[i,j-1] > lcs[i-1,j]: 

                return printLCS(lcs, x, y, i, j-1)

            else:

                    return printLCS(lcs, x, y, i-1, j)

There can be more than one LCS of the same length. The above would print one of them in O(n+m) time. So by DP, we get one LCS in O(n*m) time.

Step by step example

	A	G	C	A	T
0	0	0	0	0	0
G
A
C

KodeKnight

Like Us

Followers

Topics

Books for You

Contributors

Search This Blog

Labels

Report Abuse

Blog Archive

Blogger news

Blogroll

ADs

Blogger news

Monday, April 26, 2010

Longest common subsequence : Dynamic programming

0 comments:

Post a Comment

Labels

Likeus