## Sunday, August 4, 2013

### Topological Sorting

Definition

Topological sort: an ordering of the vertices in a directed acyclic graph, such that: If there is a path from u to v, then v appears after u in the ordering.

A topological ordering of a directed graph G is a labeling f of G's nodes (1...n) such that:
(1) the f(v)'s are the set {1, 2, ... , n}
(2) if (u, v) in G, then f(u) < f(v) ... the graph should move in forward direction.

Lets understand this with example, consider following graph :
Here we can order s,v,w,t as 1,2,3,4:
OR we can order s,w,v,t as 1,2,3,4 i.e. w coming before v.
So, if we have only 2 topological orders. If we put w before s, we will have to go backwards, and hence this is not topological ordering.

Necessary condition for graph to have topological ordering
So, only restriction on directed graph to have topological ordering is that it shouldn't have cycle.

Consider the graph G with directed cycle, suppose we go forward, and when we pass through cycle, we come back to the same point, hence there is no ordering.

So, if there is no cycle we are guarantee to have topological ordering.

Uniqueness of topological ordering
A topological sort will be unique if and only if there is a directed edge between each pair of consecutive vertices in the topological order (i.e., the digraph has a Hamiltonian path). Source

A Hamiltonian path just means that a path between two vertices will only visit each vertex once, it does not mean though that one vertex must have no incoming edges. You can have a Hamiltonian path that is in fact a cycle. This would still generate a unique topological sort (of course it would be a cycle as well if that is important to you).

Uses of topological sort
Scheduling a sequence of tasks based on their dependency.

Algorithm
Algorithm 1 - Using sink vertex
Before we go to algorithm lets define some terms.

Sink vertex - a vertex that has no outgoing edges.

Note - Every directed acyclic graph (DAG) has a sink vertex.
Proof - Suppose that every vertex has atleast outgoing edge, then we keep following outgoing edge  and as the graph size if finite say n, and because of this we see n+1 vertex, which results in directed cycle, and we have already said that it shouldn't exits. So, the last vertex shouldn't return back, otherwise that will result in cycle.

So if graph has to have a topological order, then its the sink vertex where the order ends.

To compute the topological ordering:
```Let v be the sink vertex of Graph G
f(v) = n
recurse on (G-v)
```

so, lets take the example:
Consider the graph below, where t is the sink vertex(marked with orange circle):

We set, f(t) = 4, and remove it from graph. Now we recurse on G-v :
This time we have 2 sink vertices - w and v. Suppose we set f(w)=3, then we remove it from graph. Then we come to f(v) = 2 and remove it from graph and finally f(s) = 1.

So, we take away the sink vertex again and again, until we have no vertex left. It is pretty fast but lets now see the application of DFS.

Topological ordering algorithm 2 - Using DFS
Here is a slick way to do a topological sorting with DFS:

DFS-Loop doesn't take current vertex and has a global variable current_loop is global variable which is initialized to n, and is reduced by1 each time we finish exploring the new vertex.

```DFS-Loop(graph G)

mark all nodes unexplored

current_label = n (number of nodes, to keep track of ordering,which we will count down)

for each v in G
if v not yet explored
DFS(G, v)

DFS(graph G, vertex start)
mark start explored
for every edge(start, v)
if v not yet explored
DFS(G, v)
set f(start) = current_label
current_label--
```

Time Complexity - This algorithm is as fast as DFS, because we have added only 1or 2 steps in it which are very trivial. This will run in linear time because you do a constant amount of time at each node and traverse each edge only once since we keep track of which has or has not been explored. So, we have O(n+m) OR O(|V|+|E|).

Dry running the algorithm:
Consider the following graph :
Lets start dfs at node v, as we can start randomly from any point. If we start from v, only point we can go to is t.
So, we set f(t) = current label, which is currently equal to n, which is number of nodes and it is 4.
Once we are done with t, we backtrack to v, and count down current label by 1, to 3. So, f(v)=3:
Now, we go to s as we have not seen it yet. Now from s, we can go to v or w. But v we have already seen, so we move w, from w to t, but we have been to t. So, we back track to w:
Now from w we backtrack to s:

Correctness
To prove correctness, we need to show that if (u,v) is an edge, then f(u)<f(v)
Case 1 : u visited by DFS before v => recursive call corresponding to v finishes before that of u(since DFS using lifo structure). So, we v will be assigned larger label. Thus f(v) > f(u).

Case 2: If v is visited before u, there is no path (v, u) because otherwise it would be a directed cycle. Then, the recursive call of v gets popped before u even gets put on the stack. Thus f(v) > f(u)

Source