## Sunday, August 4, 2013

### Strongly Connected Graph

A strongly connected graph is a graph such that there is a path connecting each vertex in the graph to another vertex in the graph.

The strongly connected components (SCCs) of a directed graph G are the equivalence classes of the relation: u~v <-> path u->v AND path v->u.
The ~ relation is an equivalence class, that is, ~ is reflexive, symmetric, and transitive. (Reflexive => everybody is related to itself, symmetric => if u is related to v, then v is related to u and Transitive => if u is related to v, v to w then w is related to u )
Following is the example of strongly connected components:

Above graph has 3 SCCs in it.

Note: In the post, I assume that the reader knows about DFS or BFS.
Computing the strongly computed components allows you to use the for-free primitives (BFS, DFS, etc) on the components in linear time, which is blazingly fast.

Calling DFS on the right nodes will compute all the SCCs, but on the wrong nodes will just give back the union of multiple SCC's or even the whole graph.

Why DFS or BFS for SCCs?
If we use DFS or BFS on any node in SCC, then it gives us back SCC, but there are some caveats, that if you traverse in some other node, in some other SCC, it will give us multiple SCCs, so if you select wrong node, then it gives us more than 1 SCC.

Kosaraju's Two pass algorithm

With Kosaraju's Two-Pass Algorithm, we can compute the SCC's in linear time (O(V+E)), with just two passes of DFS.

Here are 3 steps in Kosaraju's algorithm

1. Make a new graph with all the edges of original graph reversed.
The naive way of implementing this would to run through the graph linearly and construct a new one. An optimization for this would be to just run DFS on the original graph, but going across arcs/edges backwards. Lets call this graph Grev.
2. Run DFS-loop on Grev
3. Run DFS-loop on G.
The first DFS-loop computes the "magic ordering" of the nodes while the second DFS-loop discovers the SCC's one by one.
Step 2 computes magical ordering of nodes,  such that DFS in step 3 computes SCCs one by one. Step 1 will have a notion of finishing time of first vertex. In 2nd pass  we will be processing the nodes in decreasing order of finishing times, and in second pass we will mark some nodes called leader.

```//Note - it takes input as Graph and not any vertex
DFS-Loop(graph G)

global variable t = 0 //(for finishing times in first pass, number of nodes finished so far)
//(current source vertex, for leaders in second pass)
global variable s = null
//So s keeps track of the most recent vertix for which DFS was initiated

Assume nodes labeled 1 to n
//iterate in decreasing order
for i:=n down to 1
if i not yet explored
s := i
DFS(G, i)

DFS(graph G, node i)
mark i as explored
for each arc(i,j) in G
if j not yet explored
DFS(G, j)
t++
set f(i) := t [i's finishing time]

```

Time compexity -=2*DFS =  O(|V|+|E|) or O(n+m)

Dry running the algorithm
Consider the following graph (used by Tim Roughgarden in Algorithms: Design and Analysis, Part 1 coursera):

DFS - First past
Lets have first pass of DFS. We will get the finishing time of the nodes.
Though finishing time vary, we don't have to worry about it, but rather dry running the algorithm: So lets start with end node i.e. 9. So we start from 9, then go to 6, then we can go to 8 or 3. Lets go to 3 (say). The only outgoing arc for 3 is 9, which is already visited. That means 3 is finished. We increase t by 1, f(3) =1 as 3 is first node to finish. Hence:

Now we backtrack to 6, then move to 8, then 2 and then 5. But has only 1 outgoing arc, which is 8 and is already marked. So we mark 5 as finished, hence f(5) = 2. We backtrack to 2, f(2) = 3, then we backtrack to 8 then to 6 and then 9, and all these nodes are completed. So following is the current status:

So 9 is already explored, so we come otut of loop, then we decrement the value of i by 1, and check for 8, and it is already explored so we decrease i again and check for 7, and it is not explored. From 7, we go to 4, and then to 1. From 1 to 7 we see that 7 is already finished we backtrack, hence 1 is finished, then 4 and in the end 7. See below:

So our ordering is done and based on finishing time lets replace the graph vertices by finishing time.
Now lets rename the nodes from their original names to the finishing times and also their arcs/edges reversed:

So our ordering is 1,2,3,4,5,6,7,8,9 goes to 3,5,2,8,6,9,1,4,7. But once the edges are reversed the ordering becomes - 7,4,1,9,6,8,2,5,3

DFS - 2nd pass
So lets start DFS on this graph from the end node. i.e. 9. From 9 we go to 7, from 7 to 8 and from 8 we are again back to 9 and it is already explored, and hence leader vertex is 9, and you will notice that it one of the SCCs.

Now we come out to outer loop. We go to 8, already seen, then we go to 7, we have already explored then we go to 6, and it is not already explored. From 6 we can go to 9 or 1, say we go to 9, which is already explored, and hence we backtrack, we then go 1, from 1 to 5, and back to 6, so only new node we added were 1, 5 and 6, with leader node at 6. We notice that it is also SCC of graph:

We again come back to outer loop. Last value was 6, we decrement to 5, it is already explored, then 4, which is not explored. From 4 we can go to 5 or 2. Say we go to 5 first, we see it is already explored, we backtrack and go to 2, then 3 then 4 again. So new node added are 2 3 and 4. So it is new SCC:

Then we come out to outer loop, then go to 3, which is explored, then 2 which is explored then 1 which explored and hence the end.

Claim - The SCCs of a directed graph induce a acyclic "meta-graph"
meta-nodes - C1, ..., CK
So our above example becomes :

Source