Monday, April 26, 2010

String matching

The naive brute force approach:

It is very simple to code but there are little things to look for (as least I have to) with respect to optimization. The outer loop runs for text, the inner loop runs for pattern with one more condition -  i+ len(pattern) <= len(text).


int bruteForce (text, pattern)
    for (i =0; i < len(text); i++):

        for (j = 0; j < len(pattern) && i+ len(pattern) <= len(text); j++): //instead of i+j < len(text)
            if text[i+j] != pattern[j]:
                break;
        if (j==
len(pattern)) return i; //match found
    return -1 //match not found



With i+j < len(text) (2+0 < 3), the inside loop runs this the last, even though not required i.e. i=2 is run.
 

012    i
aab    len(text)= 3
  ax   len(pattern)= 2
  01
   j

With
i+ len(pattern) <= len(text) (1+2 <= 3), the inside loop runs this the last i.e. i=2 is not run.
 

012   i
aab   len(text)= 3
 ax   len(pattern)= 2
 01   j
 


Update: A simpler way is to run the outer loop for condition: i <= len(text) - len(pattern)

0 comments:

Post a Comment