Monday, July 5, 2010

Longest common substring problem

Given two sequences of letters, such as A = HELLO and B = ALOHA,
find the longest contiguous sequence appearing in both.

One solution:   (assume strings have lengths m and n)
For each of the m starting points of A, check for the longest common string starting at each of the n starting points of B.

The checks could average Θ(m) time → a total of Θ(m^2*n) time.

Dynamic programming solution:
Let Li, j = maximum length of common strings that end at A[i] & B[j].   Then,

Li, j  =  0 if i = 0 or j = 0
        =    Li-1, j-1 + 1 if A[i] = B[j]
        = 0                   if A[i] != B[j]
     

int LONGEST_COMMON_SUBSTRING(A,m,B,n)
{
  if (m==0 || n ==0) return 0;  
  for i = 0 to m
                   L[i,0] = 0
    for j = 0 to n
                  L[0,j] = 0
    len = 0 ;
    answer = L[0,0];
    for i = 1 to m do
       for j = 1 to n do
          if A[i] != B[j]
             L[i,j ]  = 0;
          else
             Li,j = 1 + L[i-1,j-1];
             if L[i,j] > len then
                len := Li,j
               
}
Example:

       A L O H A

    H  0 0 0 1 0
    E  0 0 0 0 0
    L  0 1 0 0 0
    L  0 1 0 0 0
    O  0 0 2 0 0

0 comments:

Post a Comment