Wednesday, August 4, 2010

Reducing the space for LCS lengths

When you’ve run out of main memory, any estimate of runtime based on big-O analysis becomes useless. The system either crashes or thrashes, paging in virtual memory. By contrast, if we can reduce the memory required by a program, the time analysis should still hold — we never have to page in more time.

Once you’ve watched the dynamic programming solution a few times, you’ll realise that the LCS lengths (the numbers in the grid) are computed row by row, with each row only depending on the row above.

The LC matrix in the naive algorithm grows quadratically with the lengths of the sequences. For two 100-item sequences, a 10,000-item matrix would be needed, and 10,000 comparisons would need to be done. In most real-world cases, especially source code diffs and patches, the beginnings and ends of files rarely change, and almost certainly not both at the same time. If only a few items have changed in the middle of the sequence, the beginning and end can be eliminated. This reduces not only the memory requirements for the matrix, but also the number of comparisons that must be done.


Post a Comment