Tuesday, January 28, 2014

Algorithm to find top 10 search terms in search engine

Problem : 
"You have been asked to design some software to continuously display the top 10 search terms on Google (or any other search engine). You are given access to a feed that provides an endless real-time stream of search terms currently being searched on Google. Describe what algorithm and data structures you would use to implement this. You are to design two variations: (i) Display the top 10 search terms of all time (i.e. since you started reading the feed). (ii) Display only the top 10 search terms for the past month, updated hourly. You can use an approximation to obtain the top 10 list, but you must justify your choices."

http://stackoverflow.com/questions/3260653/algorithm-to-find-top-10-search-terms

Find the k most frequent words from a file

This updated information has been further expanded upon on my new website. You can find the updated details here: https://k5kc.com/cs/algorithms/top-k-frequent-words-problem/.
Case 1 - Consider the file is not big
The question can be described as: Input: A positive integer K and a text file which can stay in-memory. The text can actually be viewed as word sequence. So we don't have to worry about how to break down it into word sequence. Output: The most frequent K words in the text.
My thinking is like this.
1) use a Hash table to record all words' frequency while traverse the whole word sequence. In this phase, the key is "word" and the value is "word-frequency". This takes O(n) time.
2) sort the (word, word-frequency) pair; and the key is "word-frequency". This takes O(n*lg(n)) time with normal sorting algorithm.
3) After sorting, we just take the first K words. This takes O(K) time.
To summarize, the total time is O(n+n*lg(n)+K), Since K is surely smaller than N, so it is actually O(n*lg(n)).
We can improve this. Actually, we just want top K words. Other words' frequency is not concern for us. So, we can use "partial Heap sorting". For step 2) and 3), we don't just do sorting. Instead, we change it to be
2') build a heap of (word, word-frequency) pair with "word-frequency" as key. It takes O(n) time to build a heap;
3') extract top K words from the heap. Each extraction is O(lg(n)). So, total time is O(k*lg(n)).
To summarize, this solution cost time O(n+k*lg(n)).

Case 2 - When the file size is too big to fit in-memory
Now this can be solved by 2 ways
Solution 1 - Caching
Break the file in to n components, so that they can fit in memory. Now read the first component, and keep it in cache. Start reading the next component, and add the component to the cache where words occur more frequently, for rest of the words there will be cache-miss.

Solution 2 - Map reduce kind of way
Breaking the file into k , create a word to frequency mapping. Now merge the 2 files, and create a max heap, re-heapify as we merge next file with it.




Friday, January 24, 2014

Given a binary tree with extra field "next pointer" in node, populate the next pointer to the nodes inorder successor

This updated information has been further expanded upon on my new website. You can find the updated details here: https://k5kc.com/cs/algorithms/populate-next-pointer-to-inorder-successor-in-binary-tree/.
Problem : Given a Binary tree where each node has an extra field (node * next), write a function that puts the inorder successor of that node in the next pointer.
This is how our binary tree node looks like :
struct node
{
  int data;
  struct node* left;
  struct node* right;
  struct node* next;
}

Initially, all next pointers have NULL values. Your function should fill these next pointers so that they point to inorder successor.

Solution
This can be petty easily done using the general Inorder traversal algorithm of binary trees. Please refer what is inorder successor or the node here.



Solution (Use Reverse Inorder Traversal)
Traverse the given tree in reverse inorder traversal and keep track of previously visited node. When a node is being visited, assign previously visited node as next.
/* Set next of p and all descendents of p by traversing them in reverse Inorder */
void populateNext(struct node* p)
{
    // The first visited node will be the rightmost node
    // next of the rightmost node will be NULL
    static struct node *next = NULL;
 
    if (p)
    {
        // First set the next pointer in right subtree
        populateNext(p->right);
 
        // Set the next as previously visited node in reverse Inorder
        p->next = next;
 
        // Change the prev for subsequent node
        next = p;
 
        // Finally, set the next pointer in left subtree
        populateNext(p->left);
    }
}

We can avoid the use of static variable by passing reference to next as paramater.
// An implementation that doesn't use static variable
 
// A wrapper over populateNextRecur
void populateNext(struct node *root)
{
    // The first visited node will be the rightmost node
    // next of the rightmost node will be NULL
    struct node *next = NULL;
 
    populateNextRecur(root, &next);
}
 
/* Set next of all descendents of p by traversing them in reverse Inorder */
void populateNextRecur(struct node* p, struct node **next_ref)
{
    if (p)
    {
        // First set the next pointer in right subtree
        populateNextRecur(p->right, next_ref);
 
        // Set the next as previously visited node in reverse Inorder
        p->next = *next_ref;
 
        // Change the prev for subsequent node
        *next_ref = p;
 
        // Finally, set the next pointer in right subtree
        populateNextRecur(p->left, next_ref);
    }
}

Time Complexity: O(n)

Thursday, January 23, 2014

Data Structure to Emulate LRU Cache

Problem

Implement the LRU cache

 Solution

Least Recently Used (LRU) Cache is to discard the least recently used items first How do you design and implement such a cache class? The design requirements are as follows:
1) find the item as fast as we can
2) Once a cache misses and a cache is full, we need to replace the least recently used item as fast as possible.

We use two data structures to implement an LRU Cache :
1. A Queue which is implemented using a doubly linked list. The maximum size of the queue will be equal to the total number of frames available (cache size).
The most recently used pages will be near front end and least recently pages will be near rear end.
2. A Hash with page number as key and address of the corresponding queue node as value.
When a page is referenced, the required page may be in the memory. If it is in the memory, we need to detach the node of the list and bring it to the front of the queue.
If the required page is not in the memory, we bring that in memory. In simple words, we add a new node to the front of the queue and update the corresponding node address in the hash. If the queue is full, i.e. all the frames are full, we remove a node from the rear of queue, and add the new node to the front of queue.
Note: Initially no page is in the memory.


A linked list + hashtable of pointers to the linked list nodes is the usual way to implement LRU caches. This gives O(1) operations (assuming a decent hash). Advantage of this (being O(1)): you can do a multithreaded version by just locking the whole structure. You don't have to worry about granular locking etc.
Briefly, the way it works:
On an access of a value, you move the corresponding node in the linked list to the head.
When you need to remove a value from the cache, you remove from the tail end.
When you add a value to cache, you just place it at the head of the linked list.

Assuming hashmap has retrieval complexity of O(1), we can do this using a doubly link list and a Hashmap

    Doubly link list will be used to store indexes sorted by date/timestamp information.
    Keep track of first and last node.
    No sorting effort is needed as each new node will have latest timestamp.
    The Hashmap will retrieve the node searched by key. It will also contain pointer to its respective doubly link list node.


Operations

Here is how we will handle cache operations

READ -

    Search the node by key. Retrieve the node value and index for doubly linklist.
    Delete the node from doubly linked list and add it with the updated date value at the starting of the list.
    Complexity of deleting a node in doubly link list is O(1) as we can reach to it without traversing the list.

INSERT -

    Add the new node in the hashmap and the doubly linked list.
    If the cache is full, use the pointer to the last node in the doubly linked list and delete that node. Set the delete pointer appropriately.
    Inserting at the start of linked list is operation with time O(1).

UPDATE -

    Update operation will need similar operation as READ in terms of accessing the data structure. Hence, similar time complexities apply.

DELETE -

    Deleting the node from the doubly-linked list and hashmap can be done in O(1).





Java's LinkedHashMap, explained well here - http://java-planet.blogspot.com/2005/08/how-to-set-up-simple-lru-cache-using.html

But, what if the environment is multi-threaded and we want a concurrent solution?

http://www.codewalk.com/2012/04/least-recently-used-lru-cache-implementation-java.html
http://javadecodedquestions.blogspot.in/2013/02/java-cache-static-data-loading.html
http://spsneo.com/blog/2011/03/15/a-simple-lru-cache-implementation/
http://stackoverflow.com/questions/15951935/the-best-way-to-implement-lru-cache
http://stackoverflow.com/questions/221525/how-would-you-implement-an-lru-cache-in-java-6
http://www.linkedin.com/groups/One-famous-java-interview-question-70526.S.5810112475999264771
http://www.careercup.com/question?id=17230678
http://www.careercup.com/question?id=13998662
http://www.cs.uml.edu/~jlu1/doc/codes/lruCache.html
http://timday.bitbucket.org/lru.html
http://java-planet.blogspot.in/2005/08/how-to-set-up-simple-lru-cache-using.html
http://www.geeksforgeeks.org/implement-lru-cache/
http://mcicpc.cs.atu.edu/archives/2012/mcpc2012/lru/lru.html
http://stackoverflow.com/questions/2504178/lru-cache-design
http://stackoverflow.com/questions/221525/how-would-you-implement-an-lru-cache-in-java-6
http://stackoverflow.com/questions/11623994/example-using-androids-lrucache
http://www.geeksforgeeks.org/implement-lru-cache/
http://www.careercup.com/question?id=1726905
http://codereview.stackexchange.com/questions/32849/design-lru-cache-interview-questions
http://stackoverflow.com/questions/224868/easy-simple-to-use-lru-cache-in-java
http://www.java-blog.com/creating-simple-cache-java-linkedhashmap-anonymous-class
http://stackoverflow.com/questions/13158657/simple-java-caching-library-or-design-pattern

Wednesday, January 22, 2014

Print nodes at k distance from root

Given a root of a tree, and an integer k. Print all the nodes which are at k distance from root.
For example, in the below tree, 4, 5 & 8 are at distance 2 from root.
            1
          /   \
        2      3
      /  \    /
    4     5  8 

Code:
void printKDistant(node *root , int k)   
{
   if(root == NULL)
      return;
   if( k == 0 )
   {
      printf( "%d ", root->data );
      return ;
   }
   else
   {     
      printKDistant( root->left, k-1 ) ;
      printKDistant( root->right, k-1 ) ;
   }
}

Time Complexity: O(n) where n is number of nodes in the given binary tree.

Find the time period with the maximum number of overlapping intervals

Problem

There is one very famous problem. I am asking the same here. There is number of elephants time span given, here time span means, year of birth to year of death. You have to calculate the period where maximum number of elephants are alive.

Example :
 1990 - 2013
 1995 - 2000
 2010 - 2020
 1992 - 1999

Answer is   1995 - 1999

Solution


Pseudocode
  • Split each date range into start date and end date.
  • Sort the dates. If a start date and an end date are the same, put the end date first (otherwise you could get an empty date range as the best).
  • Start with a count of 0.
  • Iterate through the dates using a sweep-line algorithm:
    • If you get a start date:
      • Increment the count.
      • If the current count is higher than the last best count, set the count, store this start date and set a flag.
    • If you get an end date:
      • If the flag is set, store the stored start date and this end date with the count as the best interval so far.
      • Reset the flag.
      • Decrement the count.
Example:
For input:
1990 - 2013
1995 - 2000
2010 - 2020
1992 - 1999
Split and sorted: (S = start, E = end)
1990 S, 1992 S, 1995 S, 1999 E, 2000 E, 2010 S, 2013 E, 2020 E

Iterating through them:
count = 0
lastStart = N/A
1990: count = 1
      count = 1 > 0, so set flag
        and lastStart = 1990

1992: count = 2
      count = 2 > 0, so set flag
        and lastStart = 1992

1995: count = 3
      count = 3 > 0, so set flag
        and lastStart = 1995

1999: flag is set, so
        record [lastStart (= 1995), 1999] with a count of 3
      reset flag
      count = 2

2000: flag is not set
      reset flag
      count = 1

2010: count = 2
      since count = 2 < 3, don't set flag

2013: flag is not set
      reset flag
      count = 1

2020: flag is not set
      reset flag
      count = 0





What if the number of elephants are very large?
For a (very) large number of elephants, it might be worth skipping the sort. You could create an array indexed by year with +1 for birth, -1 for death. O(e) to fill, and O(n) to sweep, where e is number of elephants and n is date range.


Reference

Given an array of 100,000 pixel color values and each of which is an integer in the range (0,255). Give sorting algorithm.

We should use counting sort.

There are only 256 key value, so aux array will have only 256 values, and in O(n) time and 2 passes we will be able to sort in efficient way.

Sunday, January 19, 2014

Sort a linked list using bubble sort

Merge sort looks like a best choice for sorting the linked list.At a first appearance, merge sort may be a good  selection since the middle element is required to subdivide the given list into 2 halves, but we can easily solve this problem by moving the nodes alternatively to 2 lists.
We have discussed the sorting options for linked list here. Here we will discuss about bubble sort

Pseudocode for bubble sort:
public void sortList()
{
    if (isEmpty())
    {
        System.out.println("An empty list is already sorted");
    }
    else if (getHead().getNext() == null)
    {
        System.out.println("A one-element list is already sorted");
    }
    else
    {
        Node current = getHead();
        boolean swapDone = true;
        while (swapDone)
        {
            swapDone = false;
            while (current != null)
            {
                if (current.getNext() != null && current.getData().getSurname().compareTo(current.getNext().getData().getSurname()) >0)
                {
                    CustomerFile tempDat = current.getData();
                    current.setData(current.getNext().getData());
                    current.getNext().setData(tempDat);
                    swapDone = true;
                }
                current = current.getNext();
            }
            current = getHead();
        }
    }
}

Source-http://stackoverflow.com/questions/7168164/sorting-linked-lists-in-c/7168300#7168300

Selection sort on linked list

Another O(n2) sorting algorithm that can be adapted for linked lists is selection sort. This can be implemented very easily (assuming you have a doubly-linked list) by using this algorithm:
  1. Initialize an empty list holding the result.
  2. While the input list is not empty:
    1. Scan across the linked list looking for the smallest remaining element.
    2. Remove that element from the linked list.
    3. Append that element to the result list.
This also runs in O(n2) time and uses only O(1) space, but in practice it's slower than insertion sort; in particular, it always runs in Θ(n2) time.

Insertion sort on linked list

If you want a simpler algorithm that needs only O(1) space, you can also consider using insertion sort to sort the linked lists. While insertion sort is easier to implement, it runs in O(n2) time in the worst case (though it also has O(n) best-case behavior), so it's probably not a good choice unless you specifically want to avoid merge sort.
The idea behind the insertion sort algorithm is as follows:
  1. Initialize an empty linked list holding the result.
  2. For each of the three linked lists:
    1. While that linked list isn't empty:
      1. Scan across the result list to find the location where the first element of this linked list belongs.
      2. Insert the element at that location.

Sort a linked list using quick sort

Merge sort looks like a best choice for sorting the linked list.At a first appearance, merge sort may be a good  selection since the middle element is required to subdivide the given list into 2 halves, but we can easily solve this problem by moving the nodes alternatively to 2 lists.

We have discussed the sorting options for linked list here. Here we will discuss about quick sort.

Another O(n log n) sort that you can implement on linked lists is a modification of quicksort. Although the linked list version of quicksort is fast (still O(n log n) expected), it isn't nearly as fast as the in-place version that works on arrays due to the lack of locality effects from array elements being stored contiguously. However, it's a very beautiful algorithm as applied to lists.
The intuition behind quicksort is as follows:
  1. If you have a zero- or one-element list, the list is sorted.
  2. Otherwise:
    1. Choose some element of the list to use as a pivot, say first element.
    2. Split the list into three groups - elements less than the pivot, elements equal to the pivot, and elements greater than the pivot.
    3. Recursively sort the smaller and greater elements.
    4. Concatenate the three lists as smaller, then equal, then greater to get back the overall sorted list.
One of the nice aspects of the linked-list version of quicksort is that the partitioning step is substantially easier than in the array case. After you've chosen a pivot (details a bit later), you can do the partitioning step by creating three empty lists for the less-than, equal-to, and greater-than lists, then doing a linear scan over the original linked list. You can then append/prepend each linked list node to the linked list corresponding to the original bucket.
The one challenge in getting this working is picking a good pivot element. It's well known that quicksort can degenerate to O(n2) time if the choice of pivot is bad, but it is also known that if you pick a pivot element at random the runtime is O(n log n) with high probability. In an array this is easy (just pick a random array index), but in the linked list case is trickier. The easiest way to do this is to pick a random number between 0 and the length of the list, then choose that element of the list in O(n) time. Alternatively, there are some pretty cool methods for picking an element at random out of a linked list; one such algorithm is described here.

Bubble sort on double linked list

 Following can be the pseudocode:
public void bubbleSort() {
    boolean done = false;
    while (!done) {
        Node cur = head;
        done = true;
        while(cur != tail) {
            if (cur.getNext().getCount()>cur.getCount()) {
                swap(cur.getNext(),cur);
                done=false;
            }
            cur = cur.getNext();
        }
    }
}

Thanks.
Source : stackoverflow

Sort a linked list using merge sort

 Merge sort looks like a best choice for sorting the linked list.At a first appearance, merge sort may be a good  selection since the middle element is required to subdivide the given list into 2 halves, but we can easily solve this problem by moving the nodes alternatively to 2 lists.

We have discussed the sorting options for linked list here.

Merge sort is very special (in fact many standard sort libraries like in java internally uses a combination of merge sort and insertion sort) because it has the wonderful property of being a stable sort. In fact it is the only stable sort with asymptotic complexity of O(nlogn) but its draw back typically is the use of O(n) extra space. There are wonderful inline merge sort algorithms which I would cover in a different post altogether. Now what is more wonderful is that for sorting linked list, we can use merge sort’s O(nlogn) without the drawback of O(n) space. Again a quick search in net seem to suggest it is a very complicated algorithm to implement , again I am not sure why , this is my version of it (in Java).
The basic idea of merge sort is this basic recursive idea , how most of us learned recursion to start with -

merge_sort(list) {
  split list into two halfs, say first and second ;
  merge_sort(firstHalf);
  merge_sort(secondHalf);
  merge(firstHalf,secondHalf);
}

The merge algorithm on linked lists is really beautiful. The pseudocode works roughly like this:
  1. Initialize an empty linked list holding the result.
  2. As long as both lists aren't empty:
    1. If the first element of the first list is less than the first element of the second list, move it to the back of the result list.
    2. Otherwise, move the first element of the second list to the back of the result list.
  3. Now that exactly one list is empty, move all the elements from the second list to the back of the result list.
This can be made to run in O(n) time, so the overall complexity of the merge sort is O(n log n).
A working implementation in Java :
public Node merge_sort(Node head) {
    if(head == null || head.next == null) { return head; }
    Node middle = getMiddle(head);      //get the middle of the list
    Node sHalf = middle.next; middle.next = null;   //split the list into two halfs

    return merge(merge_sort(head),merge_sort(sHalf));  /recurse on that
}

public Node merge(Node a, Node b) {
    Node dummyHead, curr; dummyHead = new Node(); curr = dummyHead;
    while(a !=null && b!= null) {
        if(a.info <= b.info) { curr.next = a; a = a.next; }
        else { curr.next = b; b = b.next; }
        curr = curr.next;
    }
    curr.next = (a == null) ? b : a;
    return dummyHead.next;
}

public Node getMiddle(Node head) {
    if(head == null) { return head; }
    Node slow, fast; slow = fast = head;
    while(fast.next != null && fast.next.next != null) {
        slow = slow.next; fast = fast.next.next;
    }
    return slow;
}

Thanks

source - http://www.dontforgettothink.com/2011/11/23/merge-sort-of-linked-list/
http://stackoverflow.com/questions/7168164/sorting-linked-lists-in-c/7168300#7168300

What's the fastest algorithm for sorting a linked list?

Merge sort is the best choice. At a first appearance, merge sort may be a good  selection since the middle element is required to subdivide the given list into 2 halves, but we can easily solve this problem by moving the nodes alternatively to 2 lists.

It is reasonable to expect that you cannot do any better than O(N log N) in running time, whenever we use comparison based sorts.

However, the interesting part is to investigate if you can sort it in-place, stably, and so on.

Auxiliary storage requirement is small and constant (i.e. a few variables within the sorting routine). Thanks to the inherently different behaviour of linked lists from arrays, this Mergesort implementation avoids the O(N) auxiliary storage cost normally associated with the algorithm.

It may be possible, depending on a number of factors, it may actually be faster to copy the list to an array and then use a Quicksort.
The reason this might be faster is that an array has much better cache performance than a linked list. If the nodes in the list are dispersed in memory, you may be generating cache misses all over the place. Then again, if the array is large you will get cache misses anyway.
Mergesort parallelises better, so it may be a better choice if that is what you want. It is also much faster if you perform it directly on the linked list.
Since both algorithms run in O(n * log n), making an informed decision would involve profiling them both on the machine you would like to run them on.

 Here are some methods for sorting on linked list:
You can apply the sorting algorithms on double link list as well:

Thanks.

Find increasing 3-tuple (sub-sequence)

Problem:
You given an array:
3, 2, 1, 6, 5, 4, 9, 8, 7
you have to find a 3 tuple which has property a < b < c, also a is before b, b is before c in array.
Answer can have multiple tuples, you have to find any one.
In this array, answer will be 3, 6, 9

Solution:


  1. Simple. Time complexity = O(n ^ 2)
    • Create an array of indexes, and sort the original array keeping track of the indexes in the second array. Now go through the sorted array and for each element try to find a pair of grater elements whose indexes are increasing. Complexity: time = O(n log n) + O(n ^ 2) = O(n ^ 2), space = O(n)
    • Alternatively, traverse the original array starting from the second element and consider it to be the middle of the 3-tuple. Try to find a smaller element at indexes [0..i) and a greater element at (i..n]. Complexity: time = O(n ^ 2), space = O(1)
    • Or, build a BST from the original array. Then find an element having two consecutive right children. They are grater than one another due to the BST property, and they are located one after another in the original array because we added them in this order in the BST. Complexity: time = O(n ^ 2) [worst] + O(n) = O(n ^ 2), space = O(n)
  2. Advanced.
    Search for Longest Increasing Subsequence and stop after finding three elements of the tuple. Time complexity = O(n log n), space = O(n)

From Wikipedia:
Denote the sequence values as X[1], X[2], etc. Then, after processing X[i], the algorithm will have stored values in two arrays:
M[j] — stores the position k of the smallest value X[k] such that there is an increasing subsequence of length j ending at X[k] on the range k ≤ i (note we have j ≤ k ≤ i here, because j represents the length of the increasing subsequence, and k represents the position of its termination. Obviously, we can never have an increasing subsequence of length 13 ending at position 11. k ≤ i by definition).
P[k] — stores the position of the predecessor of X[k] in the longest increasing subsequence ending at X[k].
In addition the algorithm stores a variable L representing the length of the longest increasing subsequence found so far. Note that, at any point in the algorithm, the sequence
X[M[1]], X[M[2]], ..., X[M[L]]
is nondecreasing. For, if there is an increasing subsequence of length i ending at X[M[i]], then there is also a subsequence of length i-1 ending at a smaller value: namely the one ending at X[P[M[i]]]. Thus, we may do binary searches in this sequence in logarithmic time. The algorithm, then, proceeds as follows.
L = 0
for i = 1, 2, ... n:
   binary search for the largest positive j ≤ L
     such that X[M[j]] < X[i] (or set j = 0 if no such value exists)
   P[i] = M[j]
   if j == L or X[i] < X[M[j+1]]:
      M[j+1] = i
      L = max(L, j+1)
The result of this is the length of the longest sequence in L. The actual longest sequence can be found by backtracking through the P array: the last item of the longest sequence is in X[M[L]], the second-to-last item is in X[P[M[L]]], etc. Thus, the sequence has the form
..., X[P[P[M[L]]]], X[P[M[L]]], X[M[L]].

Complexity:
time - O(n log n)
space - O(n)
Links and credits:
http://www.careercup.com/question?id=21602662
http://en.wikipedia.org/wiki/Longest_increasing_subsequence

Merge two arrays efficiently - one sorted, another unsorted

Problem

Given a sorted array of n elements followed by an unsorted array of length n. Sort the 2 list into one efficiently.

Solution

Method 1 - Insert the elements in sorted array using binary search
Since inserting a single element into array and keeping it sorted is O(n), you cannot get better then that.
Thus, for both cases - sorting the smaller array and then using merge(part1,part2) will be O(n), and thus optimal in terms of asymptotic complexity.
  • sorting the smaller array: O(logn*loglog(n)), or O(sqrt(n)*log(sqrt(n)) respectively of the cases.
  • merge(part1,part2): O(n+logn) or O(n+sqrt(n)), which is O(n)1 anyway.
So, the total complexity of both cases is O(n), which is optimal for this problem.




Source


Red Black Tree vs AVL Tree vs B-Tree

B-tree when you're managing more than thousands of items and you're paging them from a disk or some slow storage medium. RB tree when you're doing fairly frequent inserts, deletes and retrievals on the tree. AVL tree when your inserts and deletes are infrequent relative to your retrievals.
think B+ trees are a good general-purpose ordered container data structure, even in main memory. Even when virtual memory isn't an issue, cache-friendliness often is, and B+ trees are particularly good for sequential access - the same asymptotic performance as a linked list, but with cache-friendliness close to a simple array. All this and O(log n) search, insert and delete. B+ trees do have problems, though - such as the items moving around within nodes when you do inserts/deletes, invalidating pointers to those items. I have a container library that does "cursor maintenance" - cursors attach themselves to the leaf node they currently reference in a linked list, so they can be fixed or invalidated automatically. Since there's rarely more than one or two cursors, it works well - but it's an extra bit of work all the same. Another thing is that the B+ tree is essentially just that. I guess you can strip off or recreate the non-leaf nodes depending on whether you need them or not, but with binary tree nodes you get a lot more flexibility. A binary tree can be converted to a linked list and back without copying nodes - you just change the pointers then remember that you're treating it as a different data structure now. Among other things, this means you get fairly easy O(n) merging of trees - convert both trees to lists, merge them, then convert back to a tree. Yet another thing is memory allocation and freeing. In a binary tree, this can be separated out from the algorithms - the user can create a node then call the insert algorithm, and deletes can extract nodes (detach them from the tree, but dont free the memory). In a B-tree or B+-tree, that obviously doesn't work - the data will live in a multi-item node. Writing insert methods that "plan" the operation without modifying nodes until they know how many new nodes are needed and that they can be allocated is a challenge. Red black vs. AVL? I'm not sure it makes any big difference. My own library has a policy-based "tool" class to manipulate nodes, with methods for double-linked lists, simple binary trees, splay trees, red-black trees and treaps, including various conversions. Some of those methods were only implemented because I was bored at one time or another. I'm not sure I've even tested the treap methods. The reason I chose red-black trees rather than AVL is because I personally understand the algorithms better - which doesn't mean they're simpler, it's just a fluke of history that I'm more familiar with them. One last thing - I only originally developed my B+ tree containers as an experiment. It's one of those experiments that never ended really, but it's not something I'd encourage others to repeat. If all you need is an ordered container, the best answer is to use the one that your existing library provides - e.g. std::map etc in C++. My library evolved over years, it took quite a while to get it stable, and I just relatively recently discovered it's technically non-portable (dependent on a bit of undefined behaviour WRT offsetof). http://nathanbelue.blogspot.in/2013/01/avl-vs-red-black-conclusion_6.html
http://attractivechaos.wordpress.com/2008/10/02/comparison-of-binary-search-trees/

Union Find

The Union-Find data structure is used to keep track of a partition of a set and supports two operations. The Union(x, y) operation merges the subsets containing x and y, and the Find(x) operation returns the name of the leader element of the subset to which x belongs.

The implementation is based on a response to this post on StackOverflow, but extended with a makeSet operation, a getNumGroups operation, and tests.

Union-Find can be used to create very efficient implementations of Kruskal’s minimum spanning tree algorithm and the single-link hierarchical agglomerative clustering algorithm.

Bottom-up Merge Sort

Recursive algorithm
In the earlier article I’ve described a recursive version of the Merge Sort Algorithm OR Top Down Merge sort. Of course every recursive algorithm can be written in an iterative manner .

Non recursive algorithm
So today I am going to present the bottom-up version of the same algorithm, written in a non-recursive fashion .
The main idea of the bottom-up merge sort is to sort the array in a sequence of passes . During each pass the array is divided into smaller sub-arrays of a pseudo-fixed size (step or sz) . Initially step is 1, but the value increases with every pass, as every adjacent sub-arrays are merged, thus step doubles .
Top Down vs Bottom up sort

Top downBottom up
RecursiveNon recursive
It is top down, as we break the array into 2 halves, and these halves into further sub halves and so on, sort them and merge againHere we dont break into halves, rather we have a step, which determines the size of the array to be broken. Initially it is one, we sort the array of size equal to step and go further
Has more number of iterationsHas lesser number of iterations
Professor Kevin wayne (of Princeton) replied that in many cases recursion is faster than iteration because of caching improved performances.Algorithm wise it is as fast as top down, but may be slower as current implementation provide faster support for recursion

Example
0. We consider an array with size <= 1 sorted .
1. The array that needs to be sorted is A = { 5, 2, 1, 12, 2, 10, 4, 13, 5} . At this point step = 1 .
2. At the first iteration array A is divided into blocks of size step = 1 . The resulting blocks (sub-arrays) are {5}, {2}, {1}, {12}, {2}, {10}, {4}, {13}, {5} .
3. step *= 2, thus step is now 2 . At this point we have a collection of sorted sub-arrays (because their size is = 1) . We will group the sub-arrays one-by-one and we will start merging them . After the merge, the resulting sub-arrays are: {2, 5}, {1,12}, {2,10}, {4, 13} and {5} . {5} remains unpaired as the array size is an odd number . We will take care of this block later .
4. step *= 2, thus step is now 4 . Now we have a collection of 4 blocks of size two and one block of size one . We will start to merge again the adjacent blocks, so the sub-arrays collection becomes: {1, 2, 5, 12}, {2, 4, 10, 13} and {5} .
5. step *= 2, thus step is now 8 . Now we have a collection of 2 blocks with size 4 and one block with size 1 . We will merge the adjacent blocks so the sub-arrays collection becomes {1, 2, 2, 4, 5, 10, 12, 13} and {5} .
6. We now have two blocks one of size 8 and one of size 1 . We will merge those blocks and obtain the resulting sorted array: {1, 2, 2, 4, 5, 5, 10, 12, 13} .


The pseudo code for the algorithm can be written as follows . We will start by writing the merge function . This is responsible with the merging of two already sorted blocks (sub-arrays) from a given array . The input parameters for this function are : the array itself, and the interval headers (stop, start) for the two already sorted blocks . The sentinel is a concept that simplifies the code . In our case we consider SENTINEL infinity (no value from the array is bigger or equal to the Sentinel) .

Given an integer array of which both first half and second half are sorted. Write a function to merge the two parts to create one single sorted array in place [do not use any extra space].

Given an integer array of which both first half and second half are sorted. Write a function to merge the two parts to create one single sorted array in place [do not use any extra space].
e.g. If input array is [1,3,6,8,-5,-2,3,8] It should be converted to: [-5,-2,1,3,3,6,8,8]

http://stackoverflow.com/questions/6153915/merging-of-two-sorted-halfs-without-extra-memory

Amazon Questions - Set 1

  1. Write code to find Nth last node in a linked list. Solution
  2. Given a binary tree build a vertical sum array.    Solution
  3. Given two strings str1 and str2, write code to find out if all non-unique elements in str2 are in str1? Solution
  4. Given two lists of strings return a list of strings that is an intersection of both of the lists.
    Analyze running time and space complexity.
    Give Test Case scenarios.
  5. There is n node graph, each node having only one root. All nodes are labeled in a random order from int 0 to n-1. The graph is represented in a array format such that the value in the array at index equal to child node label is root node label.
    For root assume the value is -1.
    Ex.
        3
    4   1     2
       0 5
    the array is
    value : 1, 3, 3, -1, 3, 1
    Index : 0, 1, 2, 3, 4, 5
    Find the height of graph. careercup
  6. Given a node in a binary tree, how will you find out if left and right subtrees are mirror images of each other?  Solution
  7. Given a binary tree and a number N, print all paths that start from root and add up to N.
  8. Find the biggest BST in a given binary tree.   http://www.careercup.com/question?id=13000680
  9. Given an array A, build another array B in which each element is the product of all elements in A other than the element at the same position. Write code that does this with and without division.
  10. Find the position of first ’1′ in a sorted array that contains only 0-s and 1-s.
  11. In a dial pad(similar to phone dial pad) where if 2 corresponds to ‘abc’ and 3 corresponds to ‘def’ etc. Write a program to form all possible and meaningful words for the given number.
    Example:
    if 2 is pressed possible words are a,b,c. But ‘a’ is only meaningful word. Hence print ‘a’ and
    if 23 is pressed. The possible words are going to be ad,ae,af,bd,be,bf,cd,ce,cf there is no meaningful word.So,dont print anything.
  12. Given a binary tree which is huge and can not be placed in memory, how do you check if the tree is a mirror image.
  13. Given a book containing 100′s of words., If a word is repeated, delete it. Process such that there are no repetitive words.
  14.  Given coins of denomination 4, 5 and 7. How can you make up 13 ?   http://puddleofriddles.blogspot.com/2012/03/you-are-given-n-types-of-coin.html
  15. Given a program on fork() system call.
    #include <stdio .h>
    #include <unistd .h>
    int main()
    {
       fork();
       fork() && fork() || fork();
       fork(); printf(“forked\n”);
       return 0;
    }
    

    How many processes will be spawned after executing the above program?
  16. Do in-order traversal of tree without using recursion .  http://stackoverflow.com/questions/5502916/please-explain-morris-inorder-tree-traversal-without-using-stacks-or-recursion
  17. 1. There is a web URL which contains %20 in place of space, how will your replace every %20 with space3. Let’s say, on the file there are only 5% URL which contains this %20, how will your replace with space in most efficient way?       http://www.careercup.com/question?id=13032664
  18. Longest Palindrome in a input string
  19. How do you find million smallest numbers from a file that contains billions of numbers ?
  20. Given a binary tree, every node has a int value, return the root node of subtree with the largest sum up value. Java is more preferable. Caution: the return should be a node, not a integer!http://www.careercup.com/question?id=12868663
  21. Let’s say you have a simple function (fibonacci/factorial) that you need to run constantly. The largest number that you will receive as input will be 1,000. How can you improve the performance of this function call?I said not use recursion and cache the results using a data structure (i.e. a Map)
    What else could you do to improve the performance?    http://www.careercup.com/question?id=12903663
  22. Given an array of integers, find second largest element in an array.  http://www.careercup.com/question?id=12897677
  23. Write a function convert a string to an integer. example convert String “1750″ to int 1750. Using Java, not allow use the default function ..http://www.careercup.com/question?id=12903687
  24. Write a function convert a integer from 0 to 1000 to Roman numerals.
    1: I 5: V 10: X 50: L 100: C 500: D 1000: M
    Test Cases:
    XI 11, IX 9, XXX 30, DC 600, CM 900, XCVIII 98         http://www.careercup.com/question?id=12940668
  25. Given an array. Find pairs of numbers that add up to a given value ‘x’. with time complexity less than O(n2) and use no additional space. http://www.careercup.com/question?id=12887674
  26. Write Code: Find an integer array’s maximum value. Trace code. Use Try/catch instead of return.
  27. Given a spreadhseet, what data structure you will use to store it. What will you use for dynamic updation of of Spreadsheet formulas.
  28. Count number of set bits in an Integer
  29. Find first unique character in an array.  http://www.careercup.com/question?id=12960668
  30. Given 2 Arrays A and B, find the intersection of two arrays. Initially without using any other data structure. Later with using additional data structure.  http://www.careercup.com/question?id=12975663
  31. Given a 2D array, all rows and all columns sorted. Find an integer x from the 2D array.  http://www.careercup.com/question?id=12977662
  32. How do you find million smallest numbers from a file that contains billions of numbers ? http://www.careercup.com/question?id=13018675
  33. Convert a sorted doubly linked list to a BST while trying to keep BST as balanced as possible   http://www.careercup.com/question?id=12778667
  34. i got this as an interview question. i was given 2 linked lists of unequal lengths,containing a single digited number in each of their nodes. i was asked to build a 3rd linked list which contains the sum of the two linked lists, again in the form of 1 digit in a node. ex: linked list 1 is 4-7-9-6 linked list 2 is 5-7 then the 3rd linked list would be 4-8-5-3 can someone suggest me Java answer.   http://www.careercup.com/question?id=12743681
  35. Given a string “This is an example” –> Output: “This is an example” There should be equal number of spaces between words.  http://www.careercup.com/question?id=12781696
  36. each of 100 students in a school has a unique ranking 1-100. I will select 50 students randomly .
    Of the 50 randomly selected student i will select a boy, say ranking 40, and wish to know his relative ranking among the 50 students selected. How do you do this efficiently? http://www.careercup.com/question?id=12765663
  37. What data structure you will use if you have to implement google map like utility, you will be able to search the city, you will be able to find shortest path between two nodes. And you have to implement one more feature of auto suggestion which means if user type “Na” then you should show all the cities starting with letter Na in a list. And algo should be very efficient because it should update things as soon as user changes the typed character.  http://www.careercup.com/question?id=12809665
  38. Given two nodes of a tree, find the first common ancestor of these node.
  39. How to store a binary tree in a file so that we can re construct the same tree with that file ..http://www.careercup.com/question?id=12805664
  40. Algorithm to check a linked list is palindrome or not, each node contains a single character. http://www.careercup.com/question?id=12804667
  41. Find the successor of a node in inorder traversal. http://www.careercup.com/question?id=1280566
  42. print all possible combination of given n parentheses. Eg: if n=2 then all possible combinations are: (n stands for number of parentheses pair open-close)
    {}{}
    {{}}   http://www.careercup.com/question?id=12806664
  43. Given an array of N integers with +ve & -ve numbers. Find the maxproduct of 3 numbers ? & Test Cases   http://www.careercup.com/question?id=12806693
  44. Given an n-by-n matrix of 0′s and 1′s where all 1′s in each row come before all 0′s, find the most efficient way to return the row with the maximum number of 0′s.  http://www.careercup.com/question?id=12849666
  45. Write Program to find longest common contiguous intersection from 2 lists provided to the function.
    Example: list1: abcrfghwetf
    list2: abrfghwwetxyab   http://www.careercup.com/question?id=12862670
  46. Construct a Cartesian tree from in order traversal  ..http://www.careercup.com/question?id=12715683
  47. Find if two rectangles overlap in cartesian space..http://www.careercup.com/question?id=12719693
  48. Given an expression tree how do you evaluate it and also how do you evaluate for n-ary operators …http://www.careercup.com/question?id=12714690
  49. The root node in the tree is equal to sum of its all descendants and the leafs are assigned value 0, so if your tree is something like 1020 30 40 50
    output will be
    140
    0 90
    0 0    http://www.careercup.com/question?id=12691679
  50. Given n points in 2D space find k points nearest to origin..http://www.careercup.com/question?id=12691685
  51. Print a binary tree in level by level in Zig-Zag manner …http://www.careercup.com/question?id=12688694
  52. Given a binary tree return the new binary of given depth.
    Example:
    0 – 3 — Should return full binary tree from root to height 33 – 4 — Should return full binary tree from height 3 to 4   http://www.careercup.com/question?id=12642713
  53. IMP—-Write a method to create new tree with same structure but the values of each node will be sum of their descendents (sub tree). The leaf nodes will become 0. So if the tree is 50 30 10 40 60 55 75 (PreOrder) then new tree should be 270 50 0 0 130 0 0(PreOrder)…http://www.careercup.com/question?id=12746661
  54. write a function which takes a string S and a pattern string P and an integer i , and returns the i’th occurrence of P in S.
    example S = “abcabcefgabc ” , P = “abc” , i =3 , ..returns 10 …http://www.careercup.com/question?id=12782664
  55. given an array A which has only 0′s and 1′s in sorted order , find the occurrence of first ’1′ in A.
    example : A [] = 00001111 , returns 5    http://www.careercup.com/question?id=12779666
  56. Convert a BST to sorted doubly linked list.     http://www.careercup.com/question?id=12778666
  57. implement “fill with paint” function which fill the screen(2d array) with respect to a given point and color,fill until border of screen or same color is met.   http://www.careercup.com/question?id=12572675
  58. You are given a linked list. Apart from the normal “Next” pointer, there is one more pointer(random ptr) in each node which points to some random node of the list. How will you create a clone of such a list? (In less than O(n^2))   http://www.careercup.com/question?id=12590664
  59. Write a function to convert an IPv4 Address in string format to an unsigned integer http://www.careercup.com/question?id=12651670
  60. Given a set of unique numbers from 1 to 1000, propose a data structure that allows you to perform the following operations in constant time.
    1- Insertion,
    2- Deletion,
    3- Searching,
    4- Get any random numbe              http://www.careercup.com/question?id=12650665
  61. Given a String – aaaabbbbcc, convert the given string in to a4b4c2 without using extra memory. ( Note that every character appears more than once in input string and the repeated characters are contiguous)    http://www.careercup.com/question?id=12663669
  62. Search an element in a 3-D sorted array without modifying the given array.   http://www.careercup.com/question?id=12666670
  63. Write a program to calculate the number of occurances of letters(chars) in a string. Eg:”Test”.  http://www.careercup.com/question?id=12666683
  64. Print nodes at k distance from root..http://www.geeksforgeeks.org/archives/8615
  65. Given references to roots of two binary trees, how do you short circuit determine whether the sequences of the leaf elements of both the trees are same ? The structure of two BTs may be different. Short circuit : for ex. If the very first leaf element of each tree is different, the algorithm should stop immediately returning false instead of checking all the leaf elements of both trees.  http://www.careercup.com/question?id=12698674
  66. Given two nodes of a BST and all elements in BST are distinct, write code to determine the shortest distance between the two nodes. (unit distance between two adjacent nodes). Nodes dont have parent pointer.http://www.careercup.com/question?id=12696671
  67. Given an array, subarray size k and a number ssum, find the number of subarrays of size k that sum up to ssum. http://www.careercup.com/question?id=12697675
  68. one unsorted array is given.Find out the index i and j ,j> i for which a[j] – a[i] is maximum.perform in linear time complexity http://www.careercup.com/question?id=12705676
  69. Why data members are private when we can access them through getter/setters?. Why can’t we just make them public.[ How will you convince the interviewer] http://www.careercup.com/question?id=12705691
  70. Given two linked lists. Find their intersection point.
  71. Given a binary search tree of n nodes, find two nodes whose sum is equal to a given number k in O(n) time and constant space ..http://www.careercup.com/question?id=12711674
  72. Find the number of nodes in a singly link list, it can be both cyclic and acyclic..http://www.careercup.com/question?id=12631663
  73. you have given 2n+1 numbers in which 2n numbers are repeated means every number is having duplicate value.find that non repeating number in constant space and o(n) time.i told him using XOR.
    then he gave me 2n+2 numbers in which 2n numbers are repeating like above now you have 2 different number.find both number in constant space and o(n) time.(f2f 4th round)   http://www.careercup.com/question?id=12570669
  74. Write function compress(char* strSource)
    it should do the following .
    repeating chars in sequence should be replaced with char & count, in case count is 1 no need to add any integer.
    Example – AAAABBBCXYZEEEEPPPPPKKABC
    should be A4B3CXYZE4P5K2ABC.
    you are supposed to iterate the array only once, and modify the same input parameter, do not create any new string.   http://www.careercup.com/question?id=12514668
  75. You have given two polynomials .. You have to choose a data structure which will be used for addition of both polynomials :-
    Condition :-
    polynomials are very huge ..(n is very big)
    a + bx +cx2 +…… +nxn ..
    c + dx +ex2+…… +nxn ..
    you have to store the polynomial after sum .  http://www.careercup.com/question?id=12514670
  76. Get the top 3 frequently used words in a book. The book contents are given as a single text file.
    I used hashmap solution. The interviewer said its not optimal. Use a combination of two or three data struct.   http://www.careercup.com/question?id=12391700
  77. You are given a function printKDistanceNodes which takes in a root node of a binary tree, a start node and an integer K. Complete the function to print the value of all the nodes (one-per-line) which are a K distance from the given start node in sorted order. Distance can be upwards or downwards.  http://www.careercup.com/question?id=12527666
  78. Given a robot and a Maze in a game. Following our few of the primitives:
    bool IsMazeExit();
    bool Forward();
    void RightTurn();
    Constraints:
    Every room in the maze is a square. Each room has 2 doors. Robot can call IsExitMaze() when in a room and identify the exit. Forward moves the Robot forward few steps and if it finds a wall it returns to its position. Right turn turns the robot right 90 degrees.
    Write a method to find the exit  http://www.careercup.com/question?id=12549691
  79. Given numbers 1 to 1000, suggest a data structure to store them such that following operations can be executed in constant time:
    1- insertion,
    2- deletion,
    3- searching,
    4- get_any_number (means return any number if present in the data-structure otherwise return -1).   http://www.careercup.com/question?id=12559669
  80. Given a root and a node of a BST tree, write a function which print all the nodes which are a ‘k’ distance from the given nodes in sorted order. (distance can be upwards and downwards) [Hints -Recursion]  http://www.careercup.com/question?id=12533682
  81. In given array of elements like [a1,a2,a3,..an,b1,b2,b3,..bn,c1,c2,c3,...cn] Write a program to merge them like [a1,b1,c1,a2,b2,c2,...an,bn,cn]. We have to do it in O(1) extra space. [Hints- D&C or swapping]  http://www.careercup.com/question?id=12549702
  82. You are given a linked list which has the following structureCode:
    linkedList {
    data
    *nextLink
    *randomLink
    }
    *nextLink will always point to next node in the linked list
    *randomLink will point to any random node node it the list (or NULL)Now given a list L write an algorithm to create replica of the list( say L’) in O(n) time and O(n) space. [Hints- Need Double Pass/Hashing]  http://www.careercup.com/question?id=12549703
  83. How to build a heap from inoder traversal ?  http://www.careercup.com/question?id=9735293
  84. Given a function getInorderSuccessor which takes a BST (Binary Search Tree) as it’s parameter. Every node has an extra pointer “next” , which is intialized to null, fill next with node pointers which represent Inorder Successor. [Hints -Recursion/Stack/DP]  http://www.careercup.com/question?id=12557684
  85. An array contain +ve and -ve element, find the longest contiguous sub array whose sum is 0  http://www.careercup.com/question?id=12533685
  86. Given a matrix pxq, You start from top left and have to reach the bottom right. Can only traverse right or bottom. How many ways are there to reach at the bottom right?. So If I allow all 4 way move is possible what will be the ans ? . What happened if i make some Restricted Move ?    http://www.careercup.com/question?id=12549705
  87. Given A Binary Tree of size n , Find Out a Matrix M[n][n], where M[i][j]=1 if i is predecessor of j, else M[i][j]=0. [Hints DP]  http://www.careercup.com/question?id=12560677
  88. How you implemnt a Priority Queue using Heap.  http://www.careercup.com/question?id=12549706
  89. Given a Cache of n String Having length of k each, on Alphabet ALPHA where |ALPHA|=t, Find out number of 2k-length string constructable from the cache, where all sub-string of Resultant sub-string are also in cache ? What Data-structure should you use to Lookup cache? Design an Algorithm to find the count Efficiently? Calculate the Time/Space complexity of the technique. [Hints -Tries ]  http://www.careercup.com/question?id=12549707
  90. Given a Cache of k-length Strings of size n,. Given a String X, We can to convert to another String Y using the following two Rules:
    R1:you can change only one character at a Time in stepwise Transformation
    R2: All intermediate String in the transformation must be also in cache.Cache has also an API : Called Is_Transformable(X,Y) return Ture if it is possible to transform X to Y, without violating the Above rule. Assume that Cache is fixed size and there is a sequence of Query of Checking the possibility of Transformation is coming to The API of Cache.
    the Question is : What Data-Structure U can use to implement Cache ? It might need some Initial Complexity to Organize the data , for efficient lookup, and provide service to APIs.
    How can u implement the API in O(1).
    Suppose we add one more API which calculate minimum number of Steps required to transform X to Y, How do u solve this problem in O(1).   http://www.careercup.com/question?id=12533686
  91. Suppose, Amazon have a Logging system Like This:They Track all logs daily basis, stored in separate log file.Log contains a collection of tuples of Customer ID and Page ID. The length of Customer ID and Page ID is L1 and L2. Suppose We have a log of D-days , and size of each log fine can be Order of n , where n be the number of customer.
    In a most generalized situation, you can assume that a customer can visit the same page multiple times in a day or any number of days. We are interested to find out the number of Distinct customer visited at-least p distinct pages within exactly T days.
    Propose a Data Structure to be use to solve this problem efficiently . Design an Algorithm to solve this problem and Find out the complexity of the algorithm.
    {Hints:- Double Hashing/ {Hashing +Tries/BST }}  http://www.careercup.com/question?id=12560678
  92. Find the character that has maximum frequency in an array of characters. If n is small and dealing with unicode char-space, extra space for hashtable is overhead, can you avoid?  http://www.careercup.com/question?id=12336749
  93. placed two robots and a sensor on the line, write code that will be executed on both machines and make them meet. http://www.careercup.com/question?id=12337726
  94. String represented as as singly Linked list with one letter on each Node. Need to check whether it is a paliandrome or not. Can use only one String variable other than the Linkedlist  http://www.careercup.com/question?id=12364061
  95. Given the number, find the immediate next (larger) palindrome   http://www.careercup.com/question?id=12382912
  96. Find the nth non-repeating number given a streams of characters in the range of A-Z  ..http://www.careercup.com/question?id=12381892
  97. What is the complexity of an algorithm to check whether a binary tree is symetric or not. No need to check the data only the structure needs to be verified.  http://www.careercup.com/question?id=12374673
  98. Given an array (length n), we need to find the subarray (length k) such that the sum of the first j elements of the subarray equals the sum of the remaining (k-j) elements of the subarray.
    For e.g.
    Array: 2,2,13,4,7,3,8,12,9,1,5
    Output: 4,7,3,8,12,9,1 [4+7+3+8=12+9+1]
    Could this be done with a complexity better than O(n^3)   http://www.careercup.com/question?id=12351270
  99. given a linked list of objects.Print the objects in reverse order. I gave a recursive solution as well the normal one.
    But he wanted a solution where no recursion is used and no manipulation of any pointers in linked list.
    You can use any algorithm…any space and any time complexity…but remember linked list is dynamic.
    Could not answer this q     http://www.careercup.com/question?id=12381347
  100. compare two strings arrays and return intersection elements in a string array without using 2 for loops?  http://www.careercup.com/question?id=12402663
  101. GIven a binary tree, print the tree according to the level.
    eg
    01
    0203
    04050607
    0809101112131415
    proceed further to find the mirror image of alternate level
    01
    0203
    07060504
    0809101112131415    http://www.careercup.com/question?id=12419665
  102. given a million integers find the largest k elements  http://www.careercup.com/question?id=12347483
  103. find common ancestor of 2 nodes in
    1. binary tree
    2. binary search tree No parent pointer is given.   http://www.careercup.com/question?id=12374681
  104. Given number n, return true if n is prime otherwise false.    http://www.careercup.com/question?id=12342686
  105. write a function strRemove(char *source, char *remove )
    This function will delete all the chars that exist in string remove from array source, number of iteration should be only 1. Make the searching efficient.
    Example
    (“amazon development center”, “aenr”)
    “mzo dvlpmt ct”.
    Criteria – First parameter should be modified , no need to create an extra string.
    (Answer – Put the second array in a hash table, like array of 256 chars, to search which are the chars needed to be removed with complexity o(1) )   http://www.careercup.com/question?id=12518669
  106. you have two BST , they contain the same amount of elements “N” but there structure is different. how will you cross check that the trees are identical with complexity O(N). Also how will you do the same in case its only Binary tree and not a Binary search tree.  http://www.careercup.com/question?id=12520661
  107. Write code to traverse a binary tree and save all the elements found in a sorted order in double link list.  http://www.careercup.com/question?id=12514667
  108. You have a table which contains huge data may be crores of records and a cache which can contain only 1000 records. Query is done on the basis of some unique ID. When any query happens data should be copied to the cache, but the records which are used least amount of time should be removed and should be occupied with newly added records from table.
    Cache should provide two mechanism , search for record(s) on the basis of a unique id & remove the records which used least amount of time. Cache can contain max 1000 records, and record should be available in cache even after your query is done. So records in cache is refreshed only when new query is done and new record arrives which do not exist in the cache.   http://www.careercup.com/question?id=12449679
  109. You are given a paragraph , which contain n number of words, you are given m threads. What you need to do is , each thread should print one word and give the control to next thread … this way each thread will keep on printing one word , in case last thread come, it iwill invoke the first thread … procedure will repeat until all the words are printed in paragraph. Finally all the thread should exit gracefully. What kind of synchronization will use ? Answer only the logic to implement (not complete code)  http://www.careercup.com/question?id=12419684
  110. you have 100 elements from 1 to 100 but my mistake one of those number overlapped another by repeating itself.
    Ex. in 1—-98,99,100 …. 99 overwrite 2 , so the array becomes … 1,99,3,4,—-99,100 .. Array is not in sorted format , how to find the repeating number ?  http://www.careercup.com/question?id=12515668
  111. Right a program which will perform the following task ->
    In a binary tree , if a root is greater then the sum of two child nodes , then the new value of same root will be sum of child nodes , else the right // left child should reinit to such a value, so that the sum of the child nodes will be equal to the value of root node.
    (I asked him the answer , he told u need to do manipulation twice, 1) while parsing the nodes 2)while unwinding i.e. again recheck the values.   http://www.careercup.com/question?id=12426712
  112. how would you design a rate limiter in a client to control the number of transactions it requests the server.  http://www.careercup.com/question?id=12322728
  113. How do you check if an entered text in a textbox is a student email address (ends with edu). special case xyz@alumni.univ.edu should be rejected  http://www.careercup.com/question?id=12503676
  114. Write code for the following problem:
    find the element in the array that appears consecutively in the array max number of times. Should be O(N)
    eg. 1113333244111 ans. 3  http://www.careercup.com/question?id=12375726
  115. Given pre order traversal of a tree. It has only 2 type of nodes, N & L (non-leaf, leaf).. Also, every node either has zero or two children.
    Produce the tree. Eg: Pre-order NNLLL
    Tree;
    N
    / \
    N L
    / \
    L L  http://www.careercup.com/question?id=12477676
  116. You have the post order traversal of a tree. produce the tree.
    Possible?
    If not, what other info you need?
    Ok you have that info, now produce the tree.
    Write code  http://www.careercup.com/question?id=12493668
  117. Given a Singly Linked lists, write the function that takes as arguments the head of list and a number and then deletes all the nodes with that number.   http://www.careercup.com/question?id=12480668
  118. Given a binary search tree, first find the least common ancestor of two numbers and then generalize it for m numbers.   http://www.careercup.com/question?id=12480667
  119. Given a dynamic stream of integral numbers, write a function that returns its median. The numbers may arrive in bursts at any time   http://www.careercup.com/question?id=12343736
  120. There are two integer arrays ,each in very large files (size of each is larger than RAM). How would you find the common elements in the arrays in linear time.  http://www.careercup.com/question?id=12469661
  121. For an array of integers, find if there are 3 numbers that add up to zero. An algorithm of complexity O(n^2) was required.  http://www.careercup.com/question?id=12468661
  122. How do you partition an array into 2 parts such that the two parts have equal average? Each partition may contain elements that are non-contiguous in the array..  http://www.careercup.com/question?id=1244666
  123. Till Page 7 So far in careercup..Target—till page 20

k-way merge - Merging k sorted arrays of n elements

 Given k sorted arrays of size n each, merge them and print the sorted output.
Example:
Input:
k = 3, n =  4
arr[][] = { {1, 3, 5, 7},
            {2, 4, 6, 8},
            {0, 9, 10, 11}} ;

Output: 0 1 2 3 4 5 6 7 8 9 10 11 

Method 1 - Merging from 1 array to other
It does so by using the "merge" routine central to the merge sort algorithm to merge array 1 to array 2, and then array 3 to this merged array, and so on until all k arrays have merged.

Time complexity - O(n . k^2)
It doesn't traverse each of the k arrays once. The first array is traversed k-1 times, the first as merge(array-1,array-2), the second as merge(merge(array-1, array-2), array-3) ... and so on.
The result is k-1 merges with an average size of n*(k+1)/2 giving a complexity of O(n*(k^2-1)/2) which is O(nk^2).
The mistake you made was forgetting that the merges are done serially rather than in parallel, so the arrays are not all size n.

Method 2 - Merge 2 at a time, again and again.
Step 1: Merge arrays (1 and 2), arrays (3 and 4), and so on. (k/2 array merges of 2n, total work kn).
Step 2: Merge array (1,2 and 3,4), arrays (5,6 and 7,8), and so on (k/4 merges of 4n, total work kn).
Step 3: Repeat...
There will be log(k) such "Steps", each with kn work. Hence total work done = O(k.n.log(k)).
Even otherwise, if we were to just sort all the elements of the array we could still merge everything in O(k.n.log(k.n)) time.

 Method 3 - Maintain k min heaps, and get the min element from all the arrays and saving into one
A simple solution is to create an output array of size n*k and one by one copy all arrays to it. Finally, sort the output array using any O(nLogn) sorting algorithm. This approach takes O(nkLognk) time.
We can merge arrays in O(nk*Logk) time using Mean Heap. Following is detailed algorithm.
1. Create an output array of size n*k.
2. Create a min heap of size k and insert 1st element in all the arrays into a the heap
3. Repeat following steps n*k times.
     a) Get minimum element from heap (minimum is always at root) and store it in output array.
     b) Replace heap root with next element from the array from which the element is extracted. If the array doesn’t have any more elements, then replace root with infinite. After replacing the root, heapify the tree.

Source
http://stackoverflow.com/questions/11026219/why-is-k-way-merge-onk2
http://www.geeksforgeeks.org/merge-k-sorted-arrays/

Saturday, January 18, 2014

Find the maximum repeating number in array where all the elements are in range 0 to k-1, k ∈ [0,N]

:
This updated information has been further expanded upon on my new website. You can find the updated details here: https://k5kc.com/cs/algorithms/find-element-which-appears-maximum-number-of-times-in-the-ranged-array/.
Given an array of size n, the array contains numbers in range from 0 to k-1 where k is a positive integer and k <= n. Find the maximum repeating number in this array. For example, let k be 10 the given array be arr[] = {1, 2, 2, 2, 0, 2, 0, 2, 3, 8, 0, 9, 2, 3}, the maximum repeating number would be 2. Expected time complexity is O(n) and extra space allowed is O(1). Modifications to array are allowed.

Solution 1 - Brute force
The naive approach is to run two loops, the outer loop picks an element one by one, the inner loop counts number of occurrences of the picked element. Finally return the element with maximum count. Time complexity of this approach is O(n^2).

Solution2 - Using element as index in auxiliary array for counting the frequency
A better approach is to create a count array of size k and initialize all elements of count[] as 0. Iterate through all elements of input array, and for every element arr[i], increment count[arr[i]]. Finally, iterate through count[] and return the index with maximum value. This approach takes O(n) time, but requires O(k) space.

Solution 3 - Using modulo operator
Following is the O(n) time and O(1) extra space approach. Let us understand the approach with a simple example where arr[] = {2, 3, 3, 5, 3, 4, 1, 7}, k = 8, n = 8 (number of elements in arr[]).
1) Iterate though input array arr[], for every element arr[i], increment arr[arr[i]%k] by k (arr[] becomes {2, 11, 11, 29, 11, 12, 1, 15 })
2) Find the maximum value in the modified array (maximum value is 29). Index of the maximum value is the maximum repeating element (index of 29 is 3).
3) If we want to get the original array back, we can iterate through the array one more time and do arr[i] = arr[i] % k where i varies from 0 to n-1.
How does the above algorithm work? Since we use arr[i]%k as index and add value k at the index arr[i]%k, the index which is equal to maximum repeating element will have the maximum value in the end. Note that k is added maximum number of times at the index equal to maximum repeating element and all array elements are smaller than k.
Following is C++ implementation of the above algorithm.
#include<iostream>
using namespace std;
 
// Returns maximum repeating element in arr[0..n-1].
// The array elements are in range from 0 to k-1
int maxRepeating(int* arr, int n, int k)
{
    // Iterate though input array, for every element
    // arr[i], increment arr[arr[i]%k] by k
    for (int i = 0; i< n; i++)
        arr[arr[i]%k] += k;
 
    // Find index of the maximum repeating element
    int max = arr[0], result = 0;
    for (int i = 1; i < n; i++)
    {
        if (arr[i] > max)
        {
            max = arr[i];
            result = i;
        }
    }
 
    /* Uncomment this code to get the original array back
       for (int i = 0; i< n; i++)
          arr[i] = arr[i]%k; */
 
    // Return index of the maximum element
    return result;
}
 
// Driver program to test above function
int main()
{
    int arr[] = {2, 3, 3, 5, 3, 4, 1, 7};
    int n = sizeof(arr)/sizeof(arr[0]);
    int k = 8;
 
    cout << "The maximum repeating number is " <<
         maxRepeating(arr, n, k) << endl;
 
    return 0;
}

Output : The max repeating number is 3

What else?
The above solution prints only one repeating element and doesn’t work if we want to print all maximum repeating elements. For example, if the input array is {2, 3, 2, 3}, the above solution will print only 3. What if we need to print both of 2 and 3 as both of them occur maximum number of times. Write a O(n) time and O(1) extra space function that prints all maximum repeating elements. (Hint: We can use maximum quotient arr[i]/n instead of maximum value in step 2).
Note that the above solutions may cause overflow if adding k repeatedly makes the value more than INT_MAX.
This article is compiled by Ashish Anand and reviewed by GeeksforGeeks team. Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.

Finding the max repeated element in an array

This updated information has been further expanded upon on my new website. You can find the updated details here: https://k5kc.com/cs/algorithms/find-most-frequent-element-in-the-array/.
Problem : Find the element which occurs maximum number of times.

METHOD 1 : Sorting the array and scanning the array
The simple solution is to
1) Sort the array
2) Scan the array such that keep the track of the elements which occurred max number of times

METHOD 2 : Using Binary Search Tree

We can have a binary search tree with an extra field count, which indicates the number of times an element appeared in the input. 
Node of the Binary Search Tree (used in this approach) will be as follows.

struct tree
{
  int element;
  int count;
}BST;

1) Insert elements in BST one by one and if an element is already present then increment the count of the node.
2) Now do the inorder traversal on the tree, keeping track of the count and value of max element in the tree.

Time complexity - O(n) + O(n) ≈ O(n). First for contructing the tree and second for inorder traversal.

Space complexity - O(2n)  ≈ O(n), since every node needs 2 extra pointers.

METHOD 3 : Using Hashtable
Use a counter for each elements, as value and key as the element in the hashtable. At the end just retun the element having max counter.

Time complexity - O(n) and space complexity O(n) needed for storing in hashtable.



Print letters followed by their frequency (Run Length Encoding)

Given a string ,Write a program to print letter followed by it’s frequency.This is also knows as Run Length Encoding.

Example:
Input aaabcc
Output a3b1c2

Algorithm:
1.Pick the first character from the string and print it.
2.Count the number of subsequent occurrences of the picked character and then print the count.
3.Pick the next character and repeat step 2 till we reach end of the String.

#include <iostream>
#include<string>
using namespace std;
 
int main()
{
    string str = "aaabcc";
    int count=0;
    int i=0,j;
    int l = str.length();
    while(i<l)
    {
 
        cout << str[i];
                count=1;
        j=i;
        while(j+1<l && str[j+1]==str[i])
        {
            count++;
            j++;
        }
        cout << count;
        i=j+1;
    }
    return 0;
}

Time Complexity : Time complexity may look O(n^2) as there are two while loops but it’s O(n) as we visit each character once.

Remove duplicate characters in a string

This updated information has been further expanded upon on my new website. You can find the updated details here: https://k5kc.com/cs/algorithms/remove-duplicate-characters-in-a-string/.

Problem

Given a string, Write a program to remove duplcate characters from the string.

Input String : kodeknight
Output String : kodenight

Solution

Method 1 : Brute force
void removeDuplicates(char *str)
{   
    if (!str)
        return;    
    int len = strlen(str);
    if (len < 2) 
        return;
     
    int tail = 1;   
    for (int i = 1; i < len; ++i)
    {       
        int j;      
        for (j = 0; j < tail; ++j)           
            if (str[i] == str[j])               
                break;
         
        if (j == tail) 
        {           
            str[tail] = str[i];
            ++tail;
        }   
    }   
    str[tail] = '\0';
}

Time Complexity : O(N^2)


Method 2 : Using sorting
  1. Sort the elements.
  2. Now in a loop, remove duplicates by comparing the current character with previous character.
  3. Remove extra characters at the end of the resultant string.
Note that, this method doesn’t keep the original order of the input string.
Example
  1. Sort the elements - deghikknot
  2. Remove dupes, deghiknott
  3. Remove extra chars , here last t is extra, we have to simply replace it
Time complexity - O(n logn)

Method 3 - Using hash table
1. For each character, check if it is duplicate of already found characters in the hash array of 256 characters, assuming ASCII string
2. Skip duplicate characters and update the non duplicate characters.


void removeDuplicates(char *str)
{
     
    if (!str)           
        return;     
    int len = strlen(str);      
    if (len < 2)         
        return;     
    bool hit[256];      
    for(int i = 0; i < 256; ++i)         
        hit[i] = false;
         
    hit[str[0]] = true;
    int tail = 1;       
    for (int i = 1; i < len; ++i)
    {           
        if (!hit[str[i]])
        {
            str[tail] = str[i];             
            ++tail; 
            hit[str[i]] = true;
        }
    }
    str[tail] = '\0';
}

Time Complexity : O(N)



Putting the constraint of space and ordering
Suppose the problem becomes:
Write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.

Then what solution do we have?Method 2 cannot be used as we want to keep the ordering same.
Method 3 cannot be used, as it uses hashtable, which uses extra spaces.
Well, nothing is stopping us from using method 1, but it is brute force, and can we do better?

Method 4 - Special case of chars being a...z
The answer is yes, if the interviewer confirms, that your string is a constrained by say it can have only small case integers.
Since one or two additional variables are fine but no buffer is allowed, you can simulate the behaviour of a hashmap by using an integer to store bits instead.
This does not work for normal cases. It will only work from a..z case sensitive. No spaces supported. This would work miracles in a 256bit system to process the entire ASCII range. But it does run in O(N). I still +1 this one.

This simple solution runs at O(n), which is faster than yours. Also, it isn't conceptually complicated and in-place :

public static void removeDuplicates(char[] str) {
    int map = 0;
    for (int i = 0; i < str.length; i++) {
        if ((map & (1 << (str[i] - 'a'))) > 0) // duplicate detected
            str[i] = 0;
        else // add unique char as a bit '1' to the map
            map |= 1 << (str[i] - 'a');
    }
}

The drawback is that the duplicates (which are replaced with 0's) will not be placed at the end of the str[] array. However, this can easily be fixed by looping through the array one last time. Also, an integer has the capacity for only regular letters.

References - Stackoverflow