## Tuesday, November 5, 2013

### Maximum weight independent set (WIS) graph problem using DP

Input - A path graph G = (V,E)
with non - negative weights on vertices

eg.Consider the graph with vertex weights - 1,4,5,4

Desired output - subset of non-adjacent vertices - an independent set of maximum total weights.

Solutions
Brute force
polynomial time, so lets do better.

greedy approach
Greedy algos are easy to implement but are sometimes not correct. To apply greedy approach here we have to iteratively choose the max-weight vertex not adjacent to any previously chosen vertex.

So, in the graph above, greedy will fail. Why?
To begin with greedy algo will pick the most weighted vertex, with weight 5, and as it picks 5 it is blocked on both side to pick any other vertex i.e vertices with weight 4 and 4, but it picks 1, which leads to total cost of 6, which is incorrect.

A divide and conquer approach

We divide the path graph in 2 , so max in 1st part is 4 and max in 2nd part is 5, and now as the recursion is complete, we merge back the solution, but OOPS, the nodes are adjacent, which is again incorrect. As the graph is small, we can think of some way of avoiding this, but in very very big graphs, it will make it very complicated. So, greedy is myopic in this sense.

So, all the paradigms have failed till now, but lets see if Dynamic programming can be of any help.

Dynamic programming
Lets now work on dynamic programming.

Dynamic Programming
Critical step - Reason about structure of an optimal solution.

Motivation - This thought experiment narrows down the set of candidates for the optimal solution, can search through small set using brute force search.So, we have to think that we do have optimal solution, i.e. guessing and finally get that solution.

Notation - Let S ⊆ V be a max-weight independent set (WIS)
Let vn = last vertex of path (i.e. the rightmost vertex).

Case1 - vn  is excluded from optimal solution S
Suppose vn  is excluded from optimal solution S.  ie. vn ∉ S.
Let G' = G with vn deleted.

Note : S also an IS of G'.
Note : S must be a max weight IS of G' - if S* was better it would also be better than S in G [contradiction]

Case 2 - Suppose vn ∈ S ..i.e. S does include vn
Note - previous vertex vn-1 ∉ S due to adjacency rule.

Let G'' = G with vn-1 , vn deleted

Note : S - {vn} is an IS of G''.
Must in fact be a max weight IS of G'' - if S* is better than S, in G'' , then S* U {vn} is better than S in G [contradiction]

We did all this because - the optimal solution is to select the case where either vn is there or its not there. So, lets take either case, and see whichever is better. So, if vn is not included, than we have to search for WIS in G' and if it is included we have to search for WIS in G''. This is kind of brute force, but it is far better.

So, now we have a thought process, we have to come up with algorithm.
If we knew whether or not we exclude the last vertex vn from the optimal solution, because optimal soluton depends on it. So, the optimal solution can have 2 possibilities - either it excludes the final vertex or the last vertex and get max WIS from G' or it includes the last vertex then it gets WIS from G''.

So, if we somehow knew which case we are in,we can just solve this by recursing on rest of the graph based on the case.But there is no way to find this way, so lets resort to brute force. We apply crazy solution that we can compute the solution recursively going into both cases and choosing the path which gives the better result.

Proposed algorithm
Now suppose we apply crazy solution that we can compute the solution recursively going into both cases and choosing the path which gives the better result.

```  Recursively compute S1 = max-wt IS of G'
Recursively compute S2 = max-wt IS of G''
return S1 or S2 U {vn} - whichever is better
```

Good news - It will work.
Bad news    - So, this looks like a brute force, as we are going through almost all vertices, hence takes exponential time.

The \$64000 Question(Moving towards memoization)
So, this brings us to important question - Over these exponentially many recursive calls, how many distinct sub-problems are ever get solved by this algorithm? Answer is Î¸ (n).

So, even if there are exponential calls, we have to solve only distinct  Î¸ (n) sub problems.

This is because when we start from right vertex, we will be taking it or leaving it, then we will skip 1 vertex and like wise go through till we pluck all the vertices. So, in first recursive call you may pluck out one vertex from the right, and 2nd recursive call you may pluck out 2, but both from the right. So, we always pluck the vertex from the right.

So, we have to remove the redundancy, as we dont have to solve the same problem more than once. So, we will be using a technique called - memoization, where we will cache the sub problem as soon as we see it.

So, we have to see if we have seen that problem earlier, just skip but if we not then just go recursively and solve it.

But more better way is to shun this top down recursion, and accept bottom up iterative algorithm.

```Let Gi is the subset of G, containing 1st i vertices of G

Plan - Populate array A bottom up, i.e. left to right with A[i] = IS of Gi

Initialization : A[0] = 0 and A[1] = w1

Main loop : For i = 1, 2, ...,n   {

A[i] = max (A[i-1] , A[i-2]+wi)
```
(max of case 1 or case 2)

This is same as recursive , only thing is it skips the redundant cases. Also, note that this algorithm gives us the max weight in the array element A[n] and not the solution containing the vertex path.
Time complexity: O(n)
Applying this algorithm on above problem
So, our path graph is 1,4,5,4 :

So, as initialization we will have A[0] = 0 and A[1] = w1 = 1. Our array looks like:
Now,A[2] = max ( A[i-1], A[i-2]+w2) , where i = 2
A[2] = max( 1, 0 + 4) = 5.
Similarly we go on and array becomes:
So, here we have got 8 as the maximum value, as it is the max value(if we take vertices b and d, with weights 4 and 4). So, we notice here that the algorithm gives us optimal value and  not the optimal solution.

Getting the Optimal solution from the Optimal value
We can do 2 things to get optimal solution:
1. Put in the optimal solution value at the time of construction [Not recommended, hence we didnt do the same in algorithm above]
2. Reconstruct the path as and when required using array A which we created in algorithm above.
So, lets follow point 2. How to trace back? Key point - We know that vertex vi belongs to max weight IS of Gi. So, our last element in the array is either included in optimal solution or not. So, if last element is higher than element just before it, then it is included and similarly for other elements.
```Let A = filled in array A
i = n
Let S = {}

while i>=1{
if(A[i-1]>=A[i-2])
decrease i by 1
else
decrease i by 2

return S
```

Running time : O(n)

Now we can define dynamic programming.Key ingredients of dynamic programming:
1. Identify a small number of sub problems
eg. compute the max-weights IS of Gi, for i=0,1,2,...n
2. Quickly and correctly solve "larger" sub problems given solutions to "smaller sub problems"
usually via recurrence such as A[i] = max(A[i-1],A[i-2]+wi)