## Sunday, May 3, 2015

### Problem

Describe the logic for implementing the "diff" command in Linux.
Consider various test cases and explain what will happen in each. The two files are source code and are huge.
For example:
File 1: 1-2-3-4
File 2: 1-3-4-2

### Solution

The operation of diff is based on solving the longest common subsequence (LCS) problem.
In this problem, you have two sequences of items:
a b c d f g h j q z
a b c d e f g i j k r x y z
and you want to find a longest sequence of items that is present in both original sequences in the same order. That is, you want to find a new sequence which can be obtained from the first sequence by deleting some items, and from the second sequence by deleting other items. You also want this sequence to be as long as possible. In this case it is
a b c d f g j z
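
The definition above can be sketched with the textbook dynamic-programming approach. This is a minimal illustration, not how any particular diff tool is implemented; the function name `lcs` and the table layout are assumptions made for this example. The table needs memory proportional to the product of the two lengths, so for the huge files in the problem statement it only shows the idea.

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b."""
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, collecting the matching items.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skipping a[i] loses nothing
        else:
            j += 1  # skipping b[j] loses nothing
    return result


first = "a b c d f g h j q z".split()
second = "a b c d e f g i j k r x y z".split()
print(" ".join(lcs(first, second)))  # a b c d f g j z
```
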
From a longest common subsequence it's only a small step to get diff-like output: if an item is absent from the subsequence but present in the first sequence, it must have been deleted (the '-' marks below). If it is absent from the subsequence but present in the second sequence, it must have been added (the '+' marks).
e h i q k r x y
+ - + - + + + +
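
The step from LCS to diff output can be sketched as follows. This is an illustrative version under the same dynamic-programming assumption, with a hypothetical function name `simple_diff`; it treats each item as one "line" and builds the full table, so it is quadratic in time and memory and would not scale to huge files as is.

```python
def simple_diff(a, b):
    """Return a list of (mark, item) pairs: ' ' kept, '-' deleted, '+' added."""
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    ops = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append((" ", a[i]))  # item is part of the LCS
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", a[i]))  # only in the first sequence
            i += 1
        else:
            ops.append(("+", b[j]))  # only in the second sequence
            j += 1
    # Whatever is left over on either side is a pure deletion or addition.
    ops.extend(("-", x) for x in a[i:])
    ops.extend(("+", x) for x in b[j:])
    return ops


for mark, item in simple_diff("1 2 3 4".split(), "1 3 4 2".split()):
    print(mark, item)
```

For the example from the problem statement (File 1: 1-2-3-4, File 2: 1-3-4-2), the LCS is 1 3 4, so this sketch reports the 2 as deleted from its original position and added again at the end. A diff based on LCS has no notion of a "move"; it can only express one as a deletion plus an addition.
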
