Friday, August 30, 2013

find four elements in array whose sum equal to a given number X

This can be solved efficiently via using HashTables.

We can have a hashtable sums sums will store all possible sums of two different elements. For each sum S it returns pair of indexes i and j such that a[i] + a[j] == S and i != j. But initially it's empty, we'll populate it on the way. So, this can be done in O(n^2) time.


for (int i = 0; i < n; ++i) {
    // 'sums' hastable holds all possible sums a[k] + a[l]
    // where k and l are both less than i

    for (int j = i + 1; j < n; ++j) {
        int current = a[i] + a[j];
        int rest = X - current;
        // Now we need to find if there're different numbers k and l
        // such that a[k] + a[l] == rest and k < i and l < i
        // but we have 'sums' hashtable prepared for that
        if (sums[rest] != null) {
            // found it

    // now let's put in 'sums' hashtable all possible sums
    // a[i] + a[k] where k < i
    for (int k = 0; k < i; ++k) {
        sums[a[i] + a[k]] = pair(i, k);


Sunday, August 25, 2013

DFS (Depth first search) for graph

What is dfs?
 This algorithm explores aggressively and backtracks only when necessary. This is similar to how we play a maze game.

To code DFS iteratively, we can mimic BFS , but use a Stack instead of a Queue. Another way to code this would be to do it recursively. DFS is very naturally phased as recursive example. We will see both of implementation later.

 DFS(graph G, vertex s)
     mark s as explored
     for every edge (s, v):
          if v unexplored
               DFS(G, v)

Example : 
Consider the graph:
So, our strategy will be to explore aggressively and only backtrack when necessary.
Example taken from coursera course by Tim Roughgarden in Algorithms: Design and Analysis, Part 1.

Now, we have starting vertex as s.  From s we have to go to some node, lets go to a.

From as we can't go to b, as it is s's immediate neighbor, but we can go to a's immediate neighbor. So we go to c:

From c, we go to e.
From e we can go to d only as it is the only neighbor which is unexplored. So d is 5th node we see.
From d we can either go to c or we can go b. Lets suppose we go to c. But c is already explored at number 3, so we retreat, and we then go to b.
Now at b we go to s, but we have explored that, then we go to a, we have explored. So we have explored all the options from b, so we retreat to d, then we see that at d also we have explored all the options, so we retreat to e, from e to c, from c to a, and then we come to s back. From s we have not went through edge s-b, so we traverse and again retreat as we have already seen b. Now we have covered all the edges.

Basic DFS Properties
At the end of DFS, all vertices that could have been reached will have been reached.
Time complexity - 
Running time for adjacency list- O(|V|+|E|) OR O(n+m) where |V| and |E| is number of vertices and edges for adjacency list representation of graph.
Running time for adjacency matrix - O(|V|2) OR O(n2) as we have to traverse through the whole row until we find an edge.

Application of DFS
  1. Topological sorting
  2. Finding SCC(strongly connected component)

Dijkstra's algorithm for equal weight edges using BFS

This is the application of BFS (breadth first search)

dist(v) = fewest number of edges on path s to v.

Here is the pseudocode:
Initialize dist(v) = {0 if v=s, infinity if v != s}
foreach vertex v in all vertices
    dist(v) = 0;
    dist(v) = infinity;

Now apply BFS, and hen considering edge (v, w), if w is unexplored, then set dist(w) = dist(v)+1, keeping rest of the BFS same.

Note - BFS computes the shortest path, only when the length of all the edge is same (say 1)

Consider the graph:
Initially, dist(s) = 0 and rest of the nodes have dist = infinity.

So, applying bfs now, where we find a , we add to queue, mark dist(a) = dist(s)+1 = 0+1 = 1.
So, dist(v)=1. Now we add b, and do the same. So dist(b) = 1.

Now we deque a, and add c. While adding c, we will find dist(c) = dist(a) +1, so dist (c) and so on.

In order to retrieve the shortest path from the origin to a node, you need to maintain two items for each node in the graph: it's current shortest distance, and the preceding node in the shortest path. Initially all distances are set to infinity, and all predecessors are set to empty. Consider following example:
A - B - C
 |           |
D          E

In this example, we set A's distance to zero, and then proceed with the BFS. On each step you check if you can improve the distance of a descendant, i.e. the distance from the origin to the predecessor plus the length of the edge that you are exploring is less than the current best distance for the node in question. If you can improve the distance, set the new shortest path, and remember the predecessor through which that path has been acquired. When the BFS queue is empty, pick a node (in your example, it's E) and traverse its predecessors back to the origin. This would give you the shortest path.

Here is the pseudo code of what we talked about:
function Dijkstra(Graph, source):
       for each vertex v in Graph:                                // Initializations
           dist[v] := infinity ;                                  // Unknown distance function from 
                                                                  // source to v
           previous[v] := undefined ;                             // Previous node in optimal path
       end for                                                    // from source
       dist[source] := 0 ;                                        // Distance from source to source
       Q := the set of all nodes in Graph ;                       // All nodes in the graph are
                                                                 // unoptimized – thus are in Q
      while Q is not empty:                                      // The main loop
          u := vertex in Q with smallest distance in dist[] ;    // Source node in first case
          remove u from Q ;
          if dist[u] = infinity:
              break ;                                            // all remaining vertices are
          end if                                                 // inaccessible from source
          for each neighbor v of u:                              // where v has not yet been 
                                                                 // removed from Q.
              alt := dist[u] + dist_between(u, v) ;
              if alt < dist[v]:                                  // Relax (u,v,a)
                  dist[v] := alt ;
                  previous[v] := u ;
                  decrease-key v in Q;                           // Reorder v in the Queue
              end if
          end for
      end while
      return dist;

For detailed algorithm of dijkstra's algorithm, please refer here.

BFS (Breadth first search) on Graph

  • Starting at some arbitrarily chosen vertex s (s stands for start vertex) , we mark v so that we know we've visited it, process v, and 
  • then visit i.e. mark the vertex as visited and process all of v's neighbors. 
  • Now that we've visited and processed all of v's neighbors, 
  • we need to visit and process all of v's neighbors neighbors
BFS(graph G, vertex s)
[all nodes initially unexplored]
-mark s as explored
let Q = queue data structure, initialized with s
while (Q not empty)
     remove the first node of Q, call it v
     for each edge (v, w):
          if w is unexplored
               mark w as explored
               add w to Q (at the end)

We visit each node only once and each edge at most twice (for edge(v, w), we might encounter it when we're at node v and at node w).

Consider the graph
Let s be the starting vertex.
Layer 0 is node s, then a and b are layer 1. So, when layer 0 is processed, we add layer 1 to queue.
Layer n+1 gets added to the queue when we process layer n.

Time Complexity: 

Adjacency list implementation of graph - O(|V|+|E|) OR O(n+m) where |V| and |E| is number of vertices and edges for adjacency list representation of graph.
Running time for adjacency matrix - O(|V|2) OR O(n2)

where V - number of vertices and E number of edges which can be reached from the starting vertex. Also this is time complexity for adjacency list implementation.

Applications of BFS

Saturday, August 24, 2013

Deleting the element from the heap

Here we will discuss deletion of element from min heap.

Pseudocode for deletion from heap
  1. Find the index of the element to be deleted from the heap. This takes O(n) time.(If you already have the index, please skip this step)
  2. Remove the element from the heap
  3. Move the root element to the element to be deleted location
  4. Move the last element to the first position
  5. Decrease array size by 1
  6. Now bubble down on the array(like we did in extract min). Bubbling down (or sifting down) is done as follows : 
    • if current node has no children, it is over;
    • if current node has one child: check, if heap property is broken, then swap current node's value and child value; so we bubble down the child;
    • if current node has two children: find the smaller of them. If heap property is broken, then swap current node's value and selected child value; bubble down the child.
  7. Repeat step 6, until the heap property is satisfied.
Consider the following heap:
Suppose we want to delete element 9. So, we first nullify it and move to 4 to 9th location:

Now, as the root position is empty, we move last element i.e. 13 to root.
Now the process of bubbling down starts. We see that at the root node, 12 is greater than 4 and 8. So we swap it with smaller of children i.e. 4.

But again the heap property is not restored, as 13 is still larger than 4 and 4. So bubbling down again:
But still 13 is greater than 11, and hence heap property is still not satisfied:
Finally, min heap property is satisfied, and we are done.


Application of heap (in computer science)

Following are applications of heap:
  • Heap Sort - fast way to get minimum computation
    SelectionSort() with time complexity - Θ(n2) when uses heap data-structure is used as Heapsort, with time compexity nlogn. Heapsort has 2 steps
    1) Insert all n array elements in the heap O(n)
    2) Extract min to place elements in sorted order n * O(lg n)
    So, it is now close to quicksort.
  • Priority Queue
    A heap can also be called a priority queue. It can be used to schedule events and to check which event is going to occur next, using the timestamp as key. This makes it much faster than keeping an unordered list and finding the max each time.
  • Median maintanence
    Thirdly we can use a heap for "median maintenance." You can given a sequence x1, ... xn and at each time step i, you need to give the median of {x1, ..., xi}. Use O(log i) time at each step i. We can solve this by maintaining two heaps, Hlow and Hhigh to keep track of the smallest half of the elements and the largest half of the elements respectively. Hlow supports extract-max while Hhigh supports extract-min. The key idea is to maintain the invariant that about i/2 smallest (largest) elements in Hlow (Hhigh).

    When you add a new element, if it is smallest than max element in Hlow then add it in there. Otherwise, add to Hhigh. What if there is an imbalance? That is, if Hlow has two more elements than Hhigh? Then you simply extra the max of Hlow and add to the other heap.

    So how do we compute the median? It is the max of the Hlow or the min of Hhigh. If i is even, then they are both medians. Otherwise, the median is just from the heap that has one more element than the other.
  • Speeding up algorithm like djikstra - see here.

Building of heap

To be done soon.

Array Implementation of Heap

We have discussed a heap as binary tree. Normal implementation of tree is via pointer, but in heap it is much more efficient to use arrays.

So, suppose we have following heap (having 9 elements) :

Now, we need array of 9 elements.

Now, we see the heap having 3 levels, level 0 as root, and level 3 which is partially filled.

Now, we start putting the elements one by one.
So, level 0 goes at first position, level 2 goes next and so on.

You, might be wondering why we are not wasting any space, and we dont even require pointers. This is coming from properties of almost complete binary tree. For finding parent of the node, we just need

   if(i is even)
     return i;
     return floor(i/2);

Similarly children of i are:

   return 2*i;

   return 2*i+1;

Find Nth largest element in the rotated sorted array

Question : Find Nth largest element in the rotated sorted array
So for example we have sorted array as  -
2,3,6,12, 15, 18.
Now suppose the array is rotated k times ( we don't know k), such that array becomes
15, 18,2,3,6,12

So, to find the Nth largest element in normal sorted array is a[N]. But in this case it is rotated k, which we don't know. But seeing the above array we cans see that k = 2, which is also the index of minimum element i.e. 2 here. So, 3rd largest element is counting 3rd from 2 i.e. 6. Similarly for Nth largest element, it is k+N-1. But as we can see as the array is rotated, k+N-1 may overflow array, so we need to modulo operator on k+N-1, i.e. Nth largest element lies at = (k+N-1)%arraySize.

So, the problem breaks down in 2 step

  1. Find k, i.e. number of rotations. To do so please refer the post - Find the rotation count in rotated sorted array
  2. Return the arr[(k+N-1)%arr.size]

Here is the pseudocode:
int findNthHigehest(int[] arr, int size, int n)
    int k = findRotationCount(arr, size);
    return arr[(n+k-1)%size];


Find the rotation count in rotated sorted array

Question : Find the minimum element in the rotated sorted array.
So for example we have sorted array as  -
2,3,6,12, 15, 18.
Now suppose the array is rotated k times ( we don't know k), such that array becomes
15, 18,2,3,6,12

This can be solved in linear time and logarithmic time. If we notice the above array, we see the array has been rotated 2 times, which is also the index of smallest element in the array.
So, we need to find the point of inflection where we find a point such that a[i]>a[i+1].

So, finding the minimum element in the rotated sorted array has been implemented here - Find the minimum element in rotated sorted array. Thanks.

Random Contraction Algorithm

This algorithm was designed by Karger in 1990, as phd student at stanford. This algorithm is used to solve the min cut problem, which is :

Input - An undirected graph G = (V,E) ,  (note that parallel edges are allowed)
Goal - Tom compute a cut with fewest number of crossing edges (a min cut)

Karger's Algorithm
We will have just one mail loop, and algorithm terminates when only 2 vertices are remaining. This algorithm showed that random sampling can be extremely effective for graph problems.

We have only 1 while loop. Here we first randomly select edge, and when we see if it is edge, we will fuse the 2 vertices, and this may create parallel edge or self loop. Now, if it is self loop, we delete them.
while there are more than 2 vertices
  pick a remaining edge (u,v) uniformly at random
  merge or contract u and v into a single vertex
  remove self loops
return cut represented by final 2 vertices

Example 1 - Where Karger's algorithm works
Now let's see the example. Consider the following graph:

Now, lets take the edge (s,v) randomly to start with, and fuse it:

Now, we have 2 edges between s and t, and 4 edges overall. Suppose, we take upper parallel edge between and s and t nodes, which will s to t, and remove upper parallel edge, but create a self-loop of lower parallel edge between s and t. Also, we will have parallel edge between s and w:
Finally, we can remove 2 parallel edges between s and w, as we remove the self loop on s :
But, this being random algorithm, we will have different run, each time we run it, as it can randomly select any edge. Lets see example 2.

Example 2 - Where Karger's algorithm don't work
Lets again run the Karger's algorithm on the same graph:
Suppose we fuse s and v, like we did in example 1 :

Now, we have 2 parallel edges between s and t. But this time rather than choosing any parallel edge lets choose non parallel edge, say t-w:
Now, we are left with 2 nodes, which was condition for getting out of loop, but 3 edges as min cut, which is incorrect.

So, here we found out that Karger's algorithm may not work sometimes.

Probability of Success for Karger's algorithm
It is just 1/|V| for this algorithm, but some recent works have improved it by lot.
Proof of 1/|V| to come soon.

Running time
Running time - polynomial in |V| and |E|, though the brute force algorithm takes exponential algorithm, and there are better implementation of this.


Graph Representations

We have already discussed what are graphs here. Now graph can be represented in 2ways:
  1. Adjacency matrix 
  2. Adjacency list
Adjacency Matrix
Undirected graph
Represent G by a nXn, n being number of vertices 0-1 matrix A,
where Aij = 1 for every edge between 2 nodes.

Undirected graph with parallel edges
Aij = b*1 where b is number of parallel edges between i and j.

Weighted Undirected graph
Aij = w for every edge between 2 nodes with weight w

Directed graph
Aij = +1 if edge goes from node i to j, similarly for edge from j to i, it is -1.

Space Complexity
Space required by adjacency matrix is θ (n2), no matter how many edges the graph has. In case of sparse graph it is super-wasteful implementation.

Adjacency lists
Adjacency lists contains:

  • array or list of vertices
  • array or list of edges
  • each edge points to its end points - i.e. each edge will have 2 end points
  • each vertex points to edges incident on it.
Space Complexity
Space required by adjacency list is θ (m+n) OR θ(|V|=|E|).

Which method to use?
It depends on 2 items:

  1. Density of the graph
  2. Operations needed
Usage of adjacency list : 
For sparse matrix, there is lots of space wastage if we use Adjacency matrix, hence we use adjacency list method. In case of Operations we deal with, for graph search operations, adjacency lists suits more as it fast to iterate over adjacency list, but slow when finding the presence or absence of any edge.  

Usage of adjacency matrix:
An adjacency matrix uses O(n*n) memory. It has fast lookups to check for presence or absence of a specific edge, but slow to iterate over all edges.

Thursday, August 22, 2013

Cuts of Graphs

A cut of a graph (V, E) is a partition of a graph into two non-empty sets A and B. Each set contains only the vertices that connect one vertex in the set to another in the set. Meanwhile, there are also edges connecting the vertices from set A to set B and vice versa. For eg.
Similarly we have directed graph. Edges, that cross from A, to B are called are crossing edges.

In a graph with n nodes, you can either put them in set A, or set B. This gives us 2n cuts of the graph. However, since each set must be nonempty, it is actually 2n-2. Meanwhile, a crossing edge of a cut (A, B) is an edge whose endpoint is in A and head is in B.

Also, see the minimum cut problem or min cut problem.

Wednesday, August 14, 2013

Integer Multiplication using Karatsuba algorithm

Normal Integer method
If we consider the integer multiplication, we need 2 integers and output is also integer.

Suppose we have to multiply 5678*1234. In the normal way, we will realize following:
X 1234
  22712     //first number multiply as is i.e. 4*5678
 17034-     //2nd time, shift (left) and then multiply the number as 3*5678
11356--     //shift twice
5678---     //shift thrice

So, each time we multiply, depending on the digit of the 2nd number, we shift and then multiply. Now, lets compute its time complexity.

For the first partial product, we got 22712 by multiplying 4*5678. So, we needed 4 basic operations(i.e. sum) and also we needed to take care of carries, which requires 2*4 operations at max. So, it requires at max 2n operations, depending on the number. So, for all the digits 2n2 operations. Finally we have to add up these numbers, which will again require at max - 2n2.
So, overall operations count <= constant.n2 (constant like 4)

The algorithm designer's mantra - Perhaps the most important principle for the good algorithm designer is to refuse to be content. (By Aho, Hopcroft and Ullman - The design analysis of computer algorithms 1974)

So, is there a better method?

Karatsuba multiplication

So, first half of X is a, and so on.
X = 5678 = ab i.e. a = 56, b=78
Y = 1234 = cd i.e. c = 12, d = 34
n=4, as number of digits is 4 in both X and Y.

Compute 1 = a.c
Compute 2 = b.d
Compute 3 = (a+b).(c+d)
Compute 4 = 3-2-1
Compute 5 = (1 * 10^n)+2 + (4*(10^(n/2))
5 is our answer.

1 = 56*12 =  672
2 = 78*34 = 2652
3 =(56+78)*(12+34) =  6164
4 = 2840
5 =
which equal to our number.

Lets see if it does better than 3rd grade algorithms. Before we go into the details of Karatsuba algorithm, we have to see integer multiplication using recursive algorithms.

Recursive algorithm for 3rd grade algorithm
Write X = 10n/2 a+b, Y = 10n/2c+d
a,b,c,d are n/2 digit numbers and X and Y are n digit number.
example:a=56, b=78,c=12,d=34 for X being 5678 and Y being 1234.

X.Y = 10n a.c. + 10n/2 (ad+bc)+ bd

Idea : Recursively compute ac,ad,bc, and bd until and unless we find the base conditions. So, here we have to compute 4 items recursively. The 2nd term ad+bc is important but the individual term ad and bc are not, only the sum is important. So, lets see if we can address this i.e reducing 4 recursive calls to 3.

Recursive algorithm for Karatsuba algorithm

Now, lets refine this algorithm using Karatsuba algorithm
X.Y = 10n a.c. + 10n/2 (ad+bc)+ bd
So, in the last algorithm we needed 4 recursive calls, but we care more about ad+bc rather than ad and bc.

1. Recursively compute ac
2. Recursively compute bd
3. Recursively compute (a+b)(c+d) = ac+ad+bc+bd
Gauss's trick = 3 - 1 - 2 implies 3 gives us ad+bc

Upshot, only 3 recursive multiplication call. To, compute further we need understanding of master method. Time complexity of Karatsuba algorithm is O(n log23) i.e. O(n1.59). Also, the normal recursion method takes O(n2). So, Karatsuba algorithm is much better.

Tuesday, August 13, 2013

Sunday, August 11, 2013

Data Type in Data Structure

               Before discussing data type in data structure, i am going to describe data, basis of categorization of these data, and advanced categories of data. 

DATA :- Data can be defined as a value or set of values. These values may represent some observations from an experiment, some figures collected during some surveys, marks obtained by a student in an examination, etc.

DATA ITEM :- A data item refers to a single unit of values. For example, roll no., name, date of birth, age, address and marks in each subject are data items. 
                 Data items that can be divided into sub items are called group items whereas those item which cannot be divided into sub - items are called elementary items. For example, an address is a group item as it is usually divided into sub - items such as house no., street no., locality, city, pin code, etc. On the other hand roll no., marks, city, pin code are normally treated as elementary items.

ENTITY :- An entity is something that has certain attributes or properties which may be assigned values. The values assigned may be either numeric or non - numeric. 
                    For example : A student is an entity. The possible attributes for a student can be roll no., name, date of birth and class. Each attribute of an entity has a defined set of values called Domain.

ENTITY SET :- An entity set is a collection of similar entities.
                     For example : A student of a class, employees of an organization, products manufactured by a manufacturing unit etc. forms an entity set.

RECORD :- A record is a collection of related data items.
                     For example : 

Pathological Data Sets and Universal Hashing Motivation

We will go deep inside hash function, where we will find every hash table has its Kryptonite, i.e some pathological data set, which will make it vulnerable.
Note:To continue on this article, please first read the article - Hash Tables.

The Load factor or Load of a hash table
It tells us how populated the hash table is.It is denoted by alpha (α)

α = (number of objects in the hash table) / (number of buckets in the hash table)

The load of a hash table is the total number of elements in the hash table divided by total amount of space in the table. In case of chaining implementation of hash table, the number of elements can be more than bucket size, and hence α > 1, but in case of open addressing, this is not possible.

  1. Alpha=O(1) is necessary for operations to run in constant time for chained implementation.
    With open addressing we need alpha << 1. For good performance, we need to control load.
  2. Secondly, for good performance, we need a good hash function, i.e. spreads the data evenly across buckets. However, a hash table that spreads out every data set does not exist. That is, for every hash function, there is a pathological data set.
We don't know how clients will put in the data, but we surely have control over denominator i..e. number of buckets in the hash table. So, if the data increase some particular limit, you have to increase the size of the buckets, and hence controlling the load

Great hash function possibility is no,no
Can we have super clever hash function which guarantees the spread every data set out evenly?

Answer is no. No matter what function you write, you will always find some data, to which this hash function is vulnerable.

  1. Comparing the hash functions with algorithms, we can't say for sure that the hash function will give good performance no matter what the data is. This is unlike known algorithms, like merge sort, no matter what data you give, time complexity is O(n log n) etc. 
  2.  Some one will create the pathological data set which will exploit the hash function for some purpose, for eg. denial of service attack.

    Refer the paper - Pathological Data in the Real World ,Reference: Crosby and Wallach, USENIX 2003, whcih discusses how people can paralyze several real world systems (i.e. network intrusion detection) by exploiting badly designed hash function)
Two characteristics of these breakable systems were:
  1. Open source
  2. Overly simplistic hash function designed for speed (easy to reverse engineer a pathological data set)
There are many solutions, but here are 2:
  1. Cryptographic hash function e.g. SHA-2
    These cryptographic hash functions have their own pathological data set, but it is in-feasible to figure out what this hash function is. So, they work well in practice.
  2. Use randomization.
    -- We will not design a single hash function, but a family of hash functions and at runtime we will pick up these hash function. Eg., in quick sort there was one way by which quick sort will work as O(n^2) algo, so we have to pick up the pivot randomly.
    So, in this way you can still make your code open source, but people will find it difficult to find what hash function was used in real time.


Saturday, August 10, 2013

Hash Tables

A hash table is primarily used to keep track of an evolving set of stuff by using a key. It is good for inserting, deleting, and lookups. It is also sometimes referred to as a "dictionary."

They are similar to array in a way, that array gives information at the index very fast, if you know the index. Suppose, your friend names are integer, and you save the number in the array. If you need phone number of your friend 173, you can get it as array[173]. But the friend names are strings like Alice, bob, charlie, etc So, we have to have an array with random access to the array with index as strings and not integer.

  • Insert
  • Delete
  • Lookup (very important)
The operations that it supports are all blazingly fast, O(1) running time. However, this is only if it is implemented properly.

Note: A hash table should not be used if you need to keep things in some particular order.
So, when sometimes, hashtable is called dictionary, don't assume any order, its like misnomer.
2 caveats:
  • Hashtable can be easily implemented badly - So, its better to use some standard library.
    Hence, if you have not implemented well, this O(1) guarantee will not work. But, if you are forced to come up with your own hashtable, then you will get O(1) guarantee, only if you implement well.
  • Unlike, most of the problems in data structure, hashtable doesn't enjoy worst case guarantee. So, you can't guarantee that searching for any data will guarantee any worst case guarantee.

De-duplication OR Removing the duplicates: You are given a stream of object, such as a linear scan through a large file or objects arriving in real time. The goal is to remove duplicates (only keep track of unique objects). This could be  used to report only unique visitors to a website and avoid duplicates in search results.

When a new object x arrives, look it up in hash table H. If it is not found, then insert x into H.

The 2-SUM Problem: You are given an array of n integers and a target sum T. The goal is to determine whether or not there are two numbers x,y in A with x+y=T.
(I have already mentioned this problem here, but showing here in summary for application of stack)

Approach 1:
The naive way to do this would be to check all combinations (n choose 2). This exhaustive search is O(n2).

Approach 2: 
 A better way would be to sort the array. This takes O(n log n)
Then for each x in array A, use binary search to look for T-x. This will take O(nlogn).
So, overall search is  O(n log n)

Approach 3 :
The best way would be to insert every element into a hash table (without sorting). This takes O(n) as constant time insertion.
Then for every x, we can just look up its complement, T-x, which is O(1).
Overall it takes will be O(n).

Other application

  • historical application - Hash tables was first designed for and used to symbol table lookups for a compiler. 
  • It can also be used to block network traffic. If the IP is in blacklist, just drop the packet and don't allow to connect.
  • Speeding up search algorithms like in Game theory or Chess. (There are more chess configurations than size of web). You can add a configuration to a hash table to avoid exploring it more than once. 
How to Implement HashTables?
 So how is a hash table implemented?
Setup - Universe U (e.g. all possible IP address, or all possible names or all possible mobile numbers or all possible chess board configuration). You might have noticed that universe U is very large.

Goal - Set S is of reasonable size S⊆ U. So, this set S is of reasonable size, like you keeping phone number of your friends.

Selecting data-structure for implementation
Without data-structure, we will not get good solution. We could use an array for quick insert, deletion, and lookup, but it will take up a lot of space. Also, your friend name should be integer and space requirement is proportional to universe.

On the other hand, we could use a linked list. This would only take up as much as space as there are objects i.e. Set S, but the 3 operations would not be O(1). To resolve this, we can use both.

So, the solution is to use the best of both worlds, i.e. fast lookup of arrays and small storage of size like link list.

Solution 1 :
The first way to implement a hash table could be to use an array of linked lists. Now, the array size will not be the size of universe, but the size proportional to S, the set we are storing. So, let array size be n which is proportional to the size of the S. For the time being lets assume that the size of set S, doesn't fluctuates.
(Note: If size of S increases, we have to re-allocate the array to keep our data structure and vice versa for shrinking of set S)

We will be calling each entry of array as bucket. So, we have n buckets, where
n ∝ S i.e. n = kS where k is some constant.

Now we need to translate between the things we care about like names of the friends to the position in the array.  This is achieved by hash function.
Formal definition of hash function - So, hash function takes as input a key like IP address OR name of your friend and returns the position in this array. This is not specified how you want to implement this function.

If, suppose we have friend Alice,
hash(Alice) = 17
This implies, we have to save Alice has to be stored at the 17th bucket(or position) of the array.

We use a hash function to determine which index the object goes into. If there is already something in that index, this is called a "collision" .
 Suppose, hash(Alice) = 17, and hash(Bob) = 17, then how do we solve this.

How often can collision occur?
 The "Birthday Paradox" states that given 23 people, there is a 50% chance that two people will share the same birthday. By this principle, we can see that it doesn't take relatively many objects to cause a collision. See here for more on birthday paradox.

Resolving Collisions - Using Chaining

For x,y  ∈ U, if h(x)=h(y)
So, this U is very large, and hash function is to somehow map all this universe with buckets n in array A:
Separate Chaining
To resolve this, we just make a chain at that index of the array. So we keep linked list in each bucket
given a key/objectx, perfrom insert/delete/lookup in the list in A[h(x)].

For eg., Suppose we have 4 buckets, in the first bucket we have A (Say alice), and in bucket 3 we have B(say Bob) and D(say Daniel) as there was collision between B.
Open Addressing

The second way to implement a hash table would be to use open addressing. Though it is practically hard to explain mathematically, but is very important solution. Here we don't use pointer or list at all.
You will have 1 object per bucket. So, if you have inserted Bob at bucket, and Daniel collides, you will again probe the hash function to get the next position. So, hash function becomes hash sequence.

When there is a collision, simply put it elsewhere. This could be the next available index, which is called linear probing. So, you go on produce hash until you find next open slot.

Another we solution we have double probing - we have 2 use hash function h1(x) and h2(x). Suppose h1(Daniel) = 17, and h2(Daniel) = 23. If position 17  is empty, you insert Daniel there. But if it is full, then you h2(Daniel) will act as an additive offset So if position 17 is empty, you will place Daniel at 17+23 = 40 th position. Likewise, if 40 is also full, the look for position 40+23 i.e. 63 and so on.

Which solution for collision resolving is better?
If space is problem, then use probing, because chaining uses more space
Deletion is more trickier for probing, and then you can use chaining. 

Implementing Good Hash function
We will discuss about hashtable with chain. Insertion in the hashtable is O(1).

Invoke hash function and see where is our bucket.
Now go to the bucket and traverse the list until we find our element.

Suppose we have 100 elements which are inserted in the hash function, then we have
Lucky hash function - 1 element per bucket
Stupid hash function - all elements in in a[0].

In case of chaining, the lookup time will depend on the link list, and hence depend on the quality of hash function.

In case of open addressing implementation, the time complexity depend on the probe sequence and hence the implementation of hash function.

So, performance of hash function depends on the quality of hash function, no matter what implementation we use.

Properties of Good hash function
  1. Should lead to good performance. So, in case of chain implementation, the length of the chain should be equal. So, we should spread data out. So, the gold standard for spreading data out is completely random function.
    Note - Its also important to remember what is our hash function is. So, when we want to lookup for the element, it will again generate hash and hence we will not be easily. We have to store somewhere that we stored Alice there. So, completely random hash function is not possible.
  2. Should be easy to store and very fast to evaluate. 
Bad Hash Functions
Bad hash function - Example 1
keys = phone numbers (10 digits)
Size of universe = |U| = 1010. Suppose we won't have more than 500 friends or phone numbers which we have to track, so our |S| = 500. So, we choose 1000 buckets, just doubling it. So, n=1000. So our hash function should take mobile number and produce 3 digit hash function.

terrible hash function
h(x) = floor(x/1000 )

So, suppose we use 3 most significant digits of mobile number given the number, and suppose first 3 digits in the mobile number reflects the operator or area code, which is normally the case. Then this is a terrible idea.

mediocre hash function

h(x) = x mod (1000)

So, suppose now we use last 3 digits as the hash to the key provided, it is still better and their is no evidence that last 3 digits are operator or area related, and hence still better. But still collision is possible, as there may be some pattern which will make the hash function vulnerable.

Bad hash function - Example 2 
key = Memory location of the object
In case of memory location it is in power of 2. Our goal is to take big integer which is power of 2 and producing a good hash.
keys = Memory location, and have power of 2 and hence even.

terrible hash function -
h(x) = x mod (1000)

This hash function is incapable of producing odd hash, and hence half of the buckets are wasted.
So, the mediocre hash function in  example 1, is terrible here.

Good hash function
Answering what is good hash function is quite tricky. So, what we discuss now is okay-ish hash function and not good. For good hash function we have to study much more that :)

Suppose the object type is string, like name. So, our first requirement is to get the integer or hashcode from the object
int GetHashCode(string s)
     int sum = 0;
     foreach(Char c in s)
       int x = ascii(c) // get ascii value of char c   
       sum += x;
     return sum;

Once you convert the object to integer, then we have to use some compression function like mod (n) where n is the size of the hash table.

int Compress(int num, int size)
     return num mod (size); //mod is the the function which 
                            //returns remainder after dividing number num by number size.

So overall we do following:
GetBucketIndex(object o)
     int hashCode = GetHashCode(o);
     int compressedValue = Compress(hashCode, sizeOFHashTable);
     return compressedValue;

So, this gives us the index of the object o, where it has to be inserted.

how to choose the n - number of buckets ?

  1. Choose n to be a prime (within constant factor of number of objects in table)
  2. Prime should not be close to the data. like power of 2 or power of 10.
Note that there is no single solution to good function. 
Here is how hashCode() method is implemented in java for strings. Thanks.

Given a dictionary of word - Group all words into set of anagrams of each other

Given a set of words in a dictionary, write a program to group all words which are anagrams of each other in to sets.

input: “bat”, “tar”, “xyz” , “tab”, “tar”     
output:[["bat", tab"], ["tar", rat"],”xyz” ]

Two words are anagrams if one of them has exactly same characters as that of the another word.

Example : Anagram & Nagaram are anagrams (case-insensitive).

The simpler logic would be to :
  1. Use some sort of a hash structure - key being string, and value may be list of words
  2. Sort all words alphabetically and use the sorted word as the key, the value being the non sorted word. So bat will become abt, and it is the key in the dictionary and values will be bat and tab.
  3. If the key is found, keep on adding the original word to the ‘value’ list of strings
Something like this :
Solution 1 : Using hash table
def compare_sort(x):
        sorted_key_list = list(x)
        sorted_str = ','.join(str(n) for n in sorted_key_list)
        sorted_str = sorted_str.replace(',','')
        return sorted_str

source = ['tab','bat','atb','tac','cat','xyz']

target_hash = {}

for x in source:
        sorted_key = compare_sort(x)
        print sorted_key

        if target_hash.has_key(sorted_key):
                existing_list = target_hash[sorted_key]
                target_hash[sorted_key] = existing_list
                target_list =[x]
                target_hash[sorted_key] = target_list

print target_hash     

Complexity: number of words in dictionary
space complexity : hash table entries

Solution 2 - Assigning characters with prime numbers

The obvious solution is to map each character to a prime number and multiply the prime numbers. So if 'a'' -> 2 and 'b' -> 3, then
  • 'ab' -> 6
  • 'ba' -> 6
  • 'bab' -> 18
  • 'abba' -> 36
  • 'baba' -> 36
To minimise the chance of overflow, the smallest primes could be assigned to the more frequent letters (e,t,i,a,n). Note: The 26th prime is 101.So, for all the words where the integer comes out to be same, you have an anagram. Here is more you can read on this.

Friday, August 9, 2013

Splay trees

Boyer Moore Algorithm


What is a sorting algorithm ?

A sorting algorithm is an algorithm that sorts the data into some order, say from ascending to descending.

The sorting problem
Input: Array of numbers , unsorted. Eg.

Output : Same numbers sorted in some order, say increasing order. Eg.

Classification of sorting algorithms
Sorting algorithm can be classified in various ways.

Comparison sorts vs non comparison sorts

Various sorts

Tower of Hanoi Problem

There are 3 pegs source ,auxillary and target. n disks of different sizes are given which can slide onto any peg . In the beginning all of the disks are in the source peg in the order of size with largest disk at the bottom and smallest disk at the top. We have to move all the disks from source peg to target peg such that in the end the target peg will have all the disks in the same order of size.

  1. Only one disk can be moved from one peg to another peg at a time
  2.  A larger disk cannot be placed on top of the smaller disk 
Tower of Hanoi

C++ Program
#include <iostream>
#include <cstdlib>
static int moveCounter = 0;
void towerOfHanoi(int ndisk, char source, char auxillary, char target)
  if(ndisk == 1)
    std::cout << "Move [" << ndisk << "]   [" << source << "] to [" <<
      target << "]" << std::endl;
  //place ndisk - 1 disks from source to auxillary peg
  towerOfHanoi(ndisk - 1, source, target, auxillary);
  //place ndisk to target peg
  std::cout << "Move [" << ndisk << "]   [" << source << "] to [" <<
      target << "]" << std::endl;
  //place ndisk - 1 disks from auxillary to target peg
  towerOfHanoi(ndisk - 1, auxillary, source, target);
int main(int args, char *argv[])
  if(argv[1] == NULL)
    std::cout << "ERROR: Insufficient Arguments\n";
    std::cout << "Usage: ./a.out number_of_disks\n";
  int disks = atoi(argv[1]);
  char peg1 = 'A', peg2 = 'B', peg3 = 'C';
  towerOfHanoi(disks, peg1, peg2, peg3);
  std::cout << "Total Moves = " << moveCounter << std::endl;

Java Program:
public class TowerOfHanoi
  private static final char source = 'A';
  private static final char auxillary = 'B';
  private static final char target = 'C';
  private static int moveCount = 0;
  public static void main(String args[])
    if(args.length < 1)
      System.out.println("ERROR: Insufficient Arguments");
      System.out.println("Usage: java TowerOfHanoi number_of_disks");
    int ndisk = Integer.parseInt(args[0]);
    towerOfHanoi(ndisk, source, auxillary, target);
    System.out.println("Total Moves = " + moveCount);
  public static void towerOfHanoi(int ndisk, char source, char
      auxillary, char target)
    if(ndisk == 1)
      System.out.println("Move [" + ndisk + "] from [" + source +
          "] to [" + target + "]");
    //move ndisk - 1 disks from source peg to auxillary peg
    towerOfHanoi(ndisk - 1, source, target, auxillary);
    System.out.println("Move [" + ndisk + "] from [" + source +
          "] to [" + target + "]");
    //move ndisk - 1 disks from auxillary peg to target peg
    towerOfHanoi(ndisk - 1, auxillary, source, target);

Time Complexity: Let the time required for n disks is T(n). There are 2 recursive call for n-1 disks and one constant time operation to move a disk from source peg to target peg . Let it be c. Therefore, 
T(n) = T(n-1) + c + T(n-1)
T(n) = 2T(n-1) + c
T(0) = k1
T(1) = 2T(0) + c1 = 2k1 + c1
T(2) = 2T(1) + c2 = 2(2T(0)) + c2 = 2(2k1 + c1) + c2 = 4k1 + 2c1 + c2
T(3) = 2T(2) + c3 = 2(4k1 + 2c1 + c2) + c3 = 8k1 + 4c1 + 2c2 + c3
Coefficient of k1 =2n Time complexity is O(2n)  

//Output with 3 pegs
Move [1]   [A] to [C]
Move [2]   [A] to [B]
Move [1]   [C] to [B]
Move [3]   [A] to [C]
Move [1]   [B] to [A]
Move [2]   [B] to [C]
Move [1]   [A] to [C]
Total Moves = 7

//output with 5 pegs
Move [1]   [A] to [C]
Move [2]   [A] to [B]
Move [1]   [C] to [B]
Move [3]   [A] to [C]
Move [1]   [B] to [A]
Move [2]   [B] to [C]
Move [1]   [A] to [C]
Move [4]   [A] to [B]
Move [1]   [C] to [B]
Move [2]   [C] to [A]
Move [1]   [B] to [A]
Move [3]   [C] to [B]
Move [1]   [A] to [C]
Move [2]   [A] to [B]
Move [1]   [C] to [B]
Move [5]   [A] to [C]
Move [1]   [B] to [A]
Move [2]   [B] to [C]
Move [1]   [A] to [C]
Move [3]   [B] to [A]
Move [1]   [C] to [B]
Move [2]   [C] to [A]
Move [1]   [B] to [A]
Move [4]   [B] to [C]
Move [1]   [A] to [C]
Move [2]   [A] to [B]
Move [1]   [C] to [B]
Move [3]   [A] to [C]
Move [1]   [B] to [A]
Move [2]   [B] to [C]
Move [1]   [A] to [C]
Total Moves = 31

Thursday, August 8, 2013

Implementing doubly linked list using single pointer

Rearranging linked list first odd next even numbers

Here is the implemenation in cpp

Wednesday, August 7, 2013

GetMin and GetMax in BST

Binary tree - Types and Properties

Tree is sometimes binary tree (BT) and sometimes binary search tree(BST).
BT has maximum of 2 child nodes.
Binary Search Tree or Ordered tree(sometimes) is BT such that left child has value less than the parent and right child has value greater than parent.
Strictly binary tree - If every non-leaf node in BT has non empty left and right subtrees.
   / \
  B   C
     /   \
    D    E
   / \
  F    G
Complete BT or Perfect BT of depth d is strictly BT all of whose leaves are at level d.
   /    \
  B      C 
/   \   /    \
D   E  F   G
Almost complete BT (or sometimes complete BT) of depth d is BT such that :
1. Any node at level d-1 has 2 sons.
2. For any node in the tree with a right descendant at level d, nd must have a left son and every left descendant of node is either a leaf at level d or 2 sons.
i.e. it is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.
These are note almost complete BT :
a) This has only 1 leave at leavel 1 2 3, violating condition 1.
       /   \
      B     C
    /   \  

   D   E 

/   \  

F   G

b) This satisfies condition 1, since every leaf is either at level 2 or 3. But condition 2 is violated, because A has right descendant at level 3( J ) but also has left descendant (E ) which is a leaf at level 2.
     /    \
    B      C 
   /   \   /    \
  D   E  F   G
/   \   /   \
H   I  J   K   

These are almost complete BT.

     /    \
    B      C 
   /   \   /    \
  D   E  F   G
/   \  
H   I
     /    \
    B      C 
   /    \   /    \
  D    E  F   G
/   \  / 
H   I J

A degenerate tree is a tree where for each parent node, there is only one associated child node. This means that in a performance measurement, the tree will behave like a linked list data structure.

Properties of binary trees
  • The number of nodes n in a perfect binary tree can be found using this formula: n = 2h + 1 − 1 where h is the height of the tree.
  • The number of nodes n in a (almost) complete binary tree is minimum: n = 2h and maximum: n = 2h + 1 − 1h is the height of the tree. where
  • The number of nodes n in a perfect binary tree can also be found using this formula: n = 2L − 1 where L is the number of leaf nodes in the tree.
  • The number of leaf nodes L in a perfect binary tree can be found using this formula: L = 2h where h is the height of the tree.
  • The number of NULL links in a Complete Binary Tree of n-node is (n+1).
  • The number of leaf node in a Complete Binary Tree of n-node is UpperBound(n / 2).
  • For any non-empty binary tree with n0 leaf nodes and n2 nodes of degree 2, n0 = n2 + 1.

Linear Search on Array

Tuesday, August 6, 2013

Randomized Algorithm

Bellman ford algorithm - One source shortest path

Sunday, August 4, 2013

Strongly Connected Graph

A strongly connected graph is a graph such that there is a path connecting each vertex in the graph to another vertex in the graph.

The strongly connected components (SCCs) of a directed graph G are the equivalence classes of the relation: u~v <-> path u->v AND path v->u. 
The ~ relation is an equivalence class, that is, ~ is reflexive, symmetric, and transitive. (Reflexive => everybody is related to itself, symmetric => if u is related to v, then v is related to u and Transitive => if u is related to v, v to w then w is related to u )
Following is the example of strongly connected components:

Above graph has 3 SCCs in it.

Note: In the post, I assume that the reader knows about DFS or BFS.
Computing the strongly computed components allows you to use the for-free primitives (BFS, DFS, etc) on the components in linear time, which is blazingly fast.

Calling DFS on the right nodes will compute all the SCCs, but on the wrong nodes will just give back the union of multiple SCC's or even the whole graph.

Why DFS or BFS for SCCs?
If we use DFS or BFS on any node in SCC, then it gives us back SCC, but there are some caveats, that if you traverse in some other node, in some other SCC, it will give us multiple SCCs, so if you select wrong node, then it gives us more than 1 SCC.

Kosaraju's Two pass algorithm

With Kosaraju's Two-Pass Algorithm, we can compute the SCC's in linear time (O(V+E)), with just two passes of DFS.

Here are 3 steps in Kosaraju's algorithm

  1. Make a new graph with all the edges of original graph reversed.
    The naive way of implementing this would to run through the graph linearly and construct a new one. An optimization for this would be to just run DFS on the original graph, but going across arcs/edges backwards. Lets call this graph Grev.
  2. Run DFS-loop on Grev
  3. Run DFS-loop on G.
    The first DFS-loop computes the "magic ordering" of the nodes while the second DFS-loop discovers the SCC's one by one.
Step 2 computes magical ordering of nodes,  such that DFS in step 3 computes SCCs one by one. Step 1 will have a notion of finishing time of first vertex. In 2nd pass  we will be processing the nodes in decreasing order of finishing times, and in second pass we will mark some nodes called leader.

//Note - it takes input as Graph and not any vertex
DFS-Loop(graph G)

     global variable t = 0 //(for finishing times in first pass, number of nodes finished so far)
     //(current source vertex, for leaders in second pass)
     global variable s = null 
     //So s keeps track of the most recent vertix for which DFS was initiated

     Assume nodes labeled 1 to n
     //iterate in decreasing order
     for i:=n down to 1
          if i not yet explored
               s := i
               DFS(G, i)

DFS(graph G, node i)
     mark i as explored
     set leader(i) := node s//by definition a leader
     for each arc(i,j) in G
          if j not yet explored
               DFS(G, j)
     set f(i) := t [i's finishing time]

Time compexity -=2*DFS =  O(|V|+|E|) or O(n+m)

Dry running the algorithm
Consider the following graph (used by Tim Roughgarden in Algorithms: Design and Analysis, Part 1 coursera):

DFS - First past
Lets have first pass of DFS. We will get the finishing time of the nodes.
Though finishing time vary, we don't have to worry about it, but rather dry running the algorithm: So lets start with end node i.e. 9. So we start from 9, then go to 6, then we can go to 8 or 3. Lets go to 3 (say). The only outgoing arc for 3 is 9, which is already visited. That means 3 is finished. We increase t by 1, f(3) =1 as 3 is first node to finish. Hence:

Now we backtrack to 6, then move to 8, then 2 and then 5. But has only 1 outgoing arc, which is 8 and is already marked. So we mark 5 as finished, hence f(5) = 2. We backtrack to 2, f(2) = 3, then we backtrack to 8 then to 6 and then 9, and all these nodes are completed. So following is the current status:

So 9 is already explored, so we come otut of loop, then we decrement the value of i by 1, and check for 8, and it is already explored so we decrease i again and check for 7, and it is not explored. From 7, we go to 4, and then to 1. From 1 to 7 we see that 7 is already finished we backtrack, hence 1 is finished, then 4 and in the end 7. See below:

So our ordering is done and based on finishing time lets replace the graph vertices by finishing time.
Now lets rename the nodes from their original names to the finishing times and also their arcs/edges reversed:

So our ordering is 1,2,3,4,5,6,7,8,9 goes to 3,5,2,8,6,9,1,4,7. But once the edges are reversed the ordering becomes - 7,4,1,9,6,8,2,5,3

DFS - 2nd pass
So lets start DFS on this graph from the end node. i.e. 9. From 9 we go to 7, from 7 to 8 and from 8 we are again back to 9 and it is already explored, and hence leader vertex is 9, and you will notice that it one of the SCCs.

Now we come out to outer loop. We go to 8, already seen, then we go to 7, we have already explored then we go to 6, and it is not already explored. From 6 we can go to 9 or 1, say we go to 9, which is already explored, and hence we backtrack, we then go 1, from 1 to 5, and back to 6, so only new node we added were 1, 5 and 6, with leader node at 6. We notice that it is also SCC of graph:

We again come back to outer loop. Last value was 6, we decrement to 5, it is already explored, then 4, which is not explored. From 4 we can go to 5 or 2. Say we go to 5 first, we see it is already explored, we backtrack and go to 2, then 3 then 4 again. So new node added are 2 3 and 4. So it is new SCC:

Then we come out to outer loop, then go to 3, which is explored, then 2 which is explored then 1 which explored and hence the end.

Observations about SCCs 
Claim - The SCCs of a directed graph induce a acyclic "meta-graph"
meta-nodes - C1, ..., CK
So our above example becomes :


Topological Sorting


Topological sort: an ordering of the vertices in a directed acyclic graph, such that: If there is a path from u to v, then v appears after u in the ordering.

A topological ordering of a directed graph G is a labeling f of G's nodes (1...n) such that:
(1) the f(v)'s are the set {1, 2, ... , n}
(2) if (u, v) in G, then f(u) < f(v) ... the graph should move in forward direction.

Lets understand this with example, consider following graph :
Here we can order s,v,w,t as 1,2,3,4:
OR we can order s,w,v,t as 1,2,3,4 i.e. w coming before v.
So, if we have only 2 topological orders. If we put w before s, we will have to go backwards, and hence this is not topological ordering.

Necessary condition for graph to have topological ordering
So, only restriction on directed graph to have topological ordering is that it shouldn't have cycle.

Consider the graph G with directed cycle, suppose we go forward, and when we pass through cycle, we come back to the same point, hence there is no ordering.

So, if there is no cycle we are guarantee to have topological ordering.

Uniqueness of topological ordering
A topological sort will be unique if and only if there is a directed edge between each pair of consecutive vertices in the topological order (i.e., the digraph has a Hamiltonian path). Source

A Hamiltonian path just means that a path between two vertices will only visit each vertex once, it does not mean though that one vertex must have no incoming edges. You can have a Hamiltonian path that is in fact a cycle. This would still generate a unique topological sort (of course it would be a cycle as well if that is important to you).

Uses of topological sort
Scheduling a sequence of tasks based on their dependency.

Algorithm 1 - Using sink vertex
Before we go to algorithm lets define some terms.

Sink vertex - a vertex that has no outgoing edges.

Note - Every directed acyclic graph (DAG) has a sink vertex.
Proof - Suppose that every vertex has atleast outgoing edge, then we keep following outgoing edge  and as the graph size if finite say n, and because of this we see n+1 vertex, which results in directed cycle, and we have already said that it shouldn't exits. So, the last vertex shouldn't return back, otherwise that will result in cycle.

So if graph has to have a topological order, then its the sink vertex where the order ends.

To compute the topological ordering:
Let v be the sink vertex of Graph G
f(v) = n
recurse on (G-v)

so, lets take the example:
Consider the graph below, where t is the sink vertex(marked with orange circle):

We set, f(t) = 4, and remove it from graph. Now we recurse on G-v :
This time we have 2 sink vertices - w and v. Suppose we set f(w)=3, then we remove it from graph. Then we come to f(v) = 2 and remove it from graph and finally f(s) = 1.

So, we take away the sink vertex again and again, until we have no vertex left. It is pretty fast but lets now see the application of DFS.

Topological ordering algorithm 2 - Using DFS
Here is a slick way to do a topological sorting with DFS:

DFS-Loop doesn't take current vertex and has a global variable current_loop is global variable which is initialized to n, and is reduced by1 each time we finish exploring the new vertex.

DFS-Loop(graph G)

    mark all nodes unexplored

    current_label = n (number of nodes, to keep track of ordering,which we will count down)

    for each v in G
        if v not yet explored
            DFS(G, v)

DFS(graph G, vertex start)
    mark start explored
    for every edge(start, v)
        if v not yet explored
            DFS(G, v)
    set f(start) = current_label

Time Complexity - This algorithm is as fast as DFS, because we have added only 1or 2 steps in it which are very trivial. This will run in linear time because you do a constant amount of time at each node and traverse each edge only once since we keep track of which has or has not been explored. So, we have O(n+m) OR O(|V|+|E|).

Dry running the algorithm:
Consider the following graph :
Lets start dfs at node v, as we can start randomly from any point. If we start from v, only point we can go to is t.
So, we set f(t) = current label, which is currently equal to n, which is number of nodes and it is 4.
Once we are done with t, we backtrack to v, and count down current label by 1, to 3. So, f(v)=3:
Now, we go to s as we have not seen it yet. Now from s, we can go to v or w. But v we have already seen, so we move w, from w to t, but we have been to t. So, we back track to w:
Now from w we backtrack to s:

To prove correctness, we need to show that if (u,v) is an edge, then f(u)<f(v)
Case 1 : u visited by DFS before v => recursive call corresponding to v finishes before that of u(since DFS using lifo structure). So, we v will be assigned larger label. Thus f(v) > f(u).

Case 2: If v is visited before u, there is no path (v, u) because otherwise it would be a directed cycle. Then, the recursive call of v gets popped before u even gets put on the stack. Thus f(v) > f(u)