Tuesday, July 30, 2013

Operation time complexity for an ARRAY

The time complexity of handling the operations in a data structure like an ARRAY are as follows:

i. Almost all the operations are O(1) [i.e. omega times one]
ii. Remove/Insert operations that handles element between elements require O(n) [i.e. omega times n] because the elements need to be shifted.
Note! Hence, an array as a data structure is used where remove/insert operations are not required.

Operation time complexity for a LINKED LIST

The time complexity of handling the operations in a data structure like a LINKED LIST are as follows:

i. Most of the operations are O(n) [i.e. omega times n]
ii. Remove/Insert operations that handles element between elements require O(1) [i.e. omega times one] this is due to references/pointers on both the ends of the list, hence the elements don't require to be shifted. However, due to the references/pointers this data structure requires additional memory!

Time complexity - Omega of algorithms

a) if statement - O(n) [Omega times n]
b) for loop - O(n) [Omega times n]
c) for loop with a for loop - O(n*n) [Omega times n squared]
d) for loop within a if loop, this if loop is within a for loop - O(n + n*n) [n plus n squared omega]
e) while loop - O(n) [Omega times n]
f) MergeSort - O(nlogn) [Omega n time logarithmic n]
g) HeapSort - O(nlogn) [Omega n time logarithmic n]
h) KMP algorithm - O(m+n) [m plus n omega]

Time complexity of algorithms - In particular Big Omega of algorithms

Big Omega or so called asymptotic upper bound of algorithms such as Sequential statements, If statements and Looping construct

(i) Sequential statements:
Big Omega of this algorithm equals to maximum time complexities of all individual statements (i.e. sum of maximum time complexities of the individual statements)
Therefore, O[g(n)] = O{f1(n)+f2(n)+ ... +fk(n)}
Conclusion: Overall complexity of this algorithm equates to O[g(n)]

(ii) If statements:
Big Omega of this algorithm equals to maximum time complexities of either one of the alternative statements i.e. then branch -> f(n)/else branch -> g(n)
Therefore, O[h(n)] = O{f(n), g(n)}
Conclusion: Overall complexity of this algorithm equates to O[h(n)]

(iii) Looping construct:
Big Omega of this algorithm equals to aggeration of the maximum time complexities of the following: (a) execution time (i.e. elementary operations execution in a given iteration) -> f(n) and (b) the number of iterations -> g(n)
Therefore, O{f(n) * g(n)}
Conclusion: Overall complexity of this algorithm equates to O{f(n) * g(n)}

String Searching Algorithm - Knuth Morris Pratt or KMP Algorithm

Knuth-Morris-Pratt string searching algorithm:
- This algorithm searches a word->w within the main string text->s
- Such a search must employ the following observation:
i) A given match determines where the next match could begin
ii) The above intelligence is obtained for the search word itself that contains sufficient information for making such a discussion (i.e. determine where the next match begins)

Advantage of this string searching algorithm: Helps to avoid the re-examination of already matched characters

Step1:

returns: No match found!
Search algorithms observation: We notice no string->'s'(i.e. word position->i[0]) occurs between position w[0] & w[8] in the main string text except at w[5]. Hence, the next character matching operation begins at w[5].

Step2:

returns: Match found!
Search algorithms observation: We notice all strings of the word occurs between position w[5] & w[3] in the main string text. The next character matching operation begins at w[4].

Step 3:

returns: No match found!
Search algorithms observation: We notice no string->'s'(i.e. word position->i[0]) occurs between position w[4] & w[9] in the main string text. The character matching operation ends here.

http://dev-faqs.blogspot.in/2010/05/knuth-morris-pratt-algorithm.html
http://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/

Priority Queue

What is priority queue?

A data structure for maintaining items or elements in a queue of priorities, which is usually implemented using an array. A queue of this nature helps to insert every item with an assigned priority and these inserted items could be deleted if required.


Minimum number of queues needed to implement the priority queue?
Two. One queue is used for actual storing of data and another for storing priorities.

  • With priority queue it's possible to insert items then remove max or min.
  • Simple implementation is to have array / linked list. And either keeping elements sorted or iterate all items when removing.
  • This makes some operations cost N.
  • A better implementation is with a binary heap / complete binary tree / a tree that is perfectly balanced except the bottom row. This makes priority queue take log n time.

Implemented as Binary heap

  • Binary heap can be implemented using an array where the root is a[0], 1st level is a[1] and a[2] and so on.
  • Swim Operation - if a child is larger than parent, switch with parents until the root of the tree.
  • Insert is simple. Add to the end of the array and swim if needed.
  • Sink is the opposite - if parent becomes smaller - exchange with larger of two children.
  • To remove item, return head, replace head with last item in the array and sink
  • It's better to make data immutable by copying data to local instance variable.

Immutable

  • Ensures data/keys don't change while they are in the collection.
  • Disadvantage is doubling the data.
  • Classes should be as immutable as possible

Heapsort

  • Construction - start from the leaves - ensure each sub tree is a correct heap
  • Sortdown - Remove highest element and exchange with last element. Sink. Repeat until all elements are out of the heap
  • array should now be sorted.
  • Result: In place sort with NlogN cost in the worse case. Better than Quicksort quadratic time worse case.
  • Cons: Not stable, poor use of cache memory.

Event driven simulation

  • Using priority queue sorted by next collision.
  • Main loop:
    • Pull next collision, advance time to that collision
    • Resolve collision
    • Calculate new collisions for the two participating particles

Symbol Table

  • Storing key/value pairs
  • Properly writing equals method
    • check for null
    • check if they are the same class
    • compare fields
  • Simple implementation - linked list - every operation requires searching the list. Cost: N.
  • Sorted array for keys and using binary search
  • Ordered operations that we can allow: min(), max(), keys(min, max), select(index), floor(time).

Binary Search Tree implementation.

  • Every node's key is:
    • Larger than all keys on the left subtree
    • Smaller than all keys on the right subtree
  • To find a key, keep going down the tree with greater/right, less/left until getting to the place.
  • To insert: similar to get, just insert where we stopped.
  • Issue: key order affects the tree shape.
  • Ordered Operations:
    • min: go to the left most tree
    • max: go to the right most tree
  • Iterate:
    • In-order: add left, add self, add right
  • Delete a node with no children - simple
  • Delete min - promote right subtree if available to replace the min
  • Hibbard Deletion - to delete a node:
    • if there are two children - find min from the right subtree. Delete the min, assign min to replace deleted node, with same left/right sub-trees.
  • Over time the tree will get unbalanced and deletion will be sqrt(N)
 Source
https://github.com/mb-dev/mb-dev.github.io/blob/master/study-notes/algorithms/week4.md

What is a non-linear datastructure?

A non-linear datastrucutre is a datastructure in which the data items in the memory are not allocated contiguously i.e. the data items are dispersed in the memory. The first data item will have a link to the second data item and second data item will have a link to the third data item and so on.

Pros
  • Uses memory efficiently that the free contiguous memory in not an requirement for allocating data items
  • The length of the data items is not necessary to be known prior to allocation
Cons
  • Overhead of the link to the next data item
Examples of non-linear datastructure are Linked List

What is a linear datastructure?

Linear datastructure is one in which the data items are placed in the memory contiguously i.e. one after the other.
Pros
  • If we search the first data item, then it is very easy to find the next data items
Cons
  • Size of the array must be know prior to allocation
  • Requires contiguous memory, however if a free memory space is disjoint then this free memory space is not utilized for memory allocation
Example of linear datastructure is an array.

What is a sorting algorithm?

A sorting algorithm is an algorithm that sorts the data into ascending or descending order.

Examples of such sorting algorithms are:

Insertion sort

The sorting problem
Input: Array of numbers , unsorted. Eg.

Output : Same numbers sorted in some order, say increasing order. Eg.


Insertion sort:
An efficient elementary sort method which places each element in it's proper place among the elements which are already placed

Pseudocode for selection sort


  • Starts by considering the first two elements of the array data, if out of order, swap them 
  • Consider the third element, insert it into the proper position among the first three elements. 
  • Consider the forth element, insert it into the proper

Example
Consider the following array:

We have to sort the array in increasing order. So, first we take first 2 elements i.e. 28 and 18 and swap them as they are out of order.


2nd pass - Now, we consider first 3 elements. As, 7 is smallest element, we bring it in 0th index, and shift the rest of the elements from there on, as they are already sorted. So, the idea is to exchange the element selected in this pass (here 7) to swap with all elements in the left which are higher than that element.
Likewise, the algorithm goes on.


















Pseudocode
public static void sort(int[] array)
{
    int N = array.length;
    for(int i=0;i < N;i++)
        for(int j=0;j>0;j--)
            if(a[j] < a[j-1])
                swap(a[j],a[j-1]);
            else
                break;
}


Implementation in cpp

Implementation in cpp using templates

#include <iostream>
#include <string.h>

template <typename T>
class InsertionSort
{
    public:
        InsertionSort();
        ~InsertionSort();

        void sort(T arr[], int size);
    private:
        void compareExchange(T arr[], int l, int r);
        bool greater(T left, T right);
};

//Constructor
template <typename T>
InsertionSort<T>::InsertionSort(){}

//Destructor
template <typename T>
InsertionSort<T>::~InsertionSort(){}

template <typename T>
void InsertionSort<T>::sort(T arr[], int size)
{
    for(int i = 1; i < size; ++i)
    {
        for(int j = i; j > 0; --j)
        {
            compareExchange(arr, j-1, j);
        }
    }
}

template <typename T>
void InsertionSort<T>::compareExchange(T arr[], int l, int r)
{
    if(greater(arr[l], arr[r]))
    {
        T temp = arr[l];
        arr[l] = arr[r];
        arr[r] = temp;
    }
}

template <typename T>
bool InsertionSort<T>::greater(T left, T right)
{
    return left > right;
}

template <>
bool InsertionSort<const char*>::greater(const char *left, const char *right)
{
    return strcmp(left, right) > 0;
}

template <typename T>
void print(T arr[], int size)
{
    for(int i = 0; i < size; ++i)
        std::cout << arr[i] << " ";
    std::cout << std::endl;
}

template <>
void print(std::string arr[], int size)
{
    for(int i = 0; i < size; ++i)
        std::cout << arr[i].c_str() << " ";
    std::cout << std::endl;
}

template <>
void print(const char *ptrArray, int size)
{
    for(int i = 0; i < size; ++i)
        std::cout << ptrArray[i] << " ";
    std::cout << std::endl;
}

int main()
{
    int arr[] = { 10, 65, 35, 25, 15, 75, 85, 45, 65 };
    InsertionSort<int> isInt;
    isInt.sort(arr, 9);
    print(arr, 9);

    std::string strArr[] = { "pankaj", "paresh", "hello", "world", "ankit", "aditya", "sankalp", "aladdin" };
    InsertionSort<std::string> isString;
    isString.sort(strArr, 8);
    print(strArr, 8);

    const char* ptrArray[] = { "pankaj", "paresh", "hello", "world", "ankit", "aditya", "sankalp", "aladdin", "george"};
    InsertionSort<const char*> isPtr;
    isPtr.sort(ptrArray, 9);
    print(ptrArray, 9);

    return 0;
}

Implementation in java using generics
 import org.junit.Assert;
 import org.junit.Test;
 
 class GenericInsertionSorter
 {
     public <T extends Comparable<T>> void sort(T[] elems) {
         int size = elems.length;
 
         for (int outerLoopIdx = 1; outerLoopIdx < size; ++outerLoopIdx) {
             for (int innerLoopIdx = outerLoopIdx; innerLoopIdx > 0; --innerLoopIdx) {
                 if (elems[innerLoopIdx - 1].compareTo(elems[innerLoopIdx]) > 0) {
                     T temp = elems[innerLoopIdx - 1];
                     elems[innerLoopIdx - 1] = elems[innerLoopIdx];
                     elems[innerLoopIdx] = temp;
                 }
             }
         }
     }
 }
 
 public class InsertionSortTester
 {
     private String[] unsortedNames = new String[] {
             "Pankaj",
             "Paresh",
             "Ankit",
             "Sankalp",
             "Aditya",
             "Prem",
             "Rocket",
             "Singh",
             "Alabama",
             "Alaska",
             "Animal" };
 
     private String[] sortedNames = new String[] {
             "Aditya",
             "Alabama",
             "Alaska",
             "Animal",
             "Ankit",
             "Pankaj",
             "Paresh",
             "Prem",
             "Rocket",
             "Sankalp",
             "Singh" };
 
     @Test
     public void testStringSort() {
         GenericInsertionSorter ss = new GenericInsertionSorter();
         ss.sort(unsortedNames);
         Assert.assertArrayEquals(unsortedNames, sortedNames);
     }
 }

Introduction to Heap

Definition of a heap

A heap is a data structure that contains objects which have comparable keys.

Uses of Heap
Some uses for a heap include: keeping track of employee records where the social security number is the key, network edges, events where there is a priority, etc.


Such a heap is called descending heap. In an ascending heap, on the other hand, the key value stored in a node is smaller than the keys of its children nodes.

Heap ADT (Abstract Data Type)
There are two basic operations for a heap.
  1. Insert - add a new object to the heap
    Running time - O (log n)
  2. Extract-min - remove an object in the heap with a minimum key value (with ties broken arbitrarily). You could always makes this an extract-max operation or negate every key and you will get the maximum when you extract-min.
    Running time - O (log n)
Note that heap can either support extractMin or extractMax, if you want both operations supported simultaneously, then use Binary Search tree.

Other operations supported
  • Heapify or BuildHeap - Initializes the heap in linear time i.e. O(n). Of-course you can use insert function, but then it will take O(log n) time per element, and so n log n time for n elements.
  • Delete -  You can delete any element from the heap in O (log n) time.

The running time for both operations is O(logn) where n is the number of objects in the heap. Some additional operations heap supports Another operation is heapify, which inserts a batch of objects into the heap. This seems like it should run in O(nlogn) but it can actually be done in O(n) by inserting the values in batch. Deletion can also be done in O(logn) time.

Implementing a heap
Conceptually,heap is an almost complete binary tree where the data (key) stored in a node is larger than those of its children nodes (i.e. concept of comparability) .

So, to understand heap, we have to see 2 views of heap - tree and array.

Tree view of heap

Heap must satisfy following properties :
  1. Heap is a binary tree - i.e. each node will have 0 or 1 or 2 children. 
  2. Heap is almost complete tree - All nodes are filled in, except the last level may have some nodes missing toward the right. For eg. :
    Almost Complete Binary Tree
  3. Heap Property - The key at any node is less than or equal to the keys of each of its children (if it has any). Because of this property, root of the minHeap is the smallest element.
    So, at every node x
    key[x] <= all keys of x's children
    (Also, Except root, all the nodes do have the parent.)
    Eg.

    Though, heap property puts some ordering of structure, but in no way guarantees the order for given keys
Array Implementation of Heap

Implementing the Operations of heap
Consider the operations one by one:
Cases when Implementing Binary heaps
Immutability of Keys
  • Assumption - Clients don't change the keys. So, better is to make the key class immutable. Like in Java - String, Integer, Double, Color, Vector, Transaction, Point2D are immutable, but these are mutable - StringBuilder, Stack, Counter, Java array.

Thanks.
Source

Selection Sort

The sorting problem
Input: Array of numbers , unsorted. Eg.

Output : Same numbers sorted in some order, say increasing order. Eg.



Selection Sort is a sorting algorithm that sorts data items into ascending or descending order, which comes under the category of in-place comparison sort.

Pseudocode of selection sort

  1. Get the smallest element and put it in the first position
  2. Get the next smallest element and put it in the 2nd position
  3. ....


Sorting takes place by stepping through all the data items one-by-one while looking for either largest or smallest data item and making only one swap after finding either largest or smallest data item. Hence, this sorting algorithm is referred to as the selection sort because on each pass this algorithm selects either largest or smallest of the remaining unsorted data items and places them in the right order.

For example, consider the following array:

Now we select the first smallest element, i.e. 4 and put it in first location.


And so on with 7,10 and 18.













Finally we have a sorted array:


Implementation in c
Will come soon - comment if you need early
 
Implementation in cpp
 #include <iostream>
 
 class SelectionSort
 {
     public:
         SelectionSort();
         ~SelectionSort();
 
         void sort(int arr[], int size);
 
     private:
         void exchange(int &x, int &y);
 };
 
 //Constructor
 SelectionSort::SelectionSort() {}
 
 //Destructor
 SelectionSort::~SelectionSort() {}
 
 void SelectionSort::sort(int arr[], int size)
 {
     for(int outerLoopIdx = 0; outerLoopIdx < size - 1; ++outerLoopIdx)
     {
         int min = outerLoopIdx;
         for(int innerLoopIdx = outerLoopIdx + 1; innerLoopIdx < size; ++innerLoopIdx)
         {
             if(arr[min] > arr[innerLoopIdx])
             {
                 min = innerLoopIdx;
             }
         }
         exchange(arr[outerLoopIdx], arr[min]);
     }
 }
 
 void SelectionSort::exchange(int &x, int &y)
 {
     int t = x;
     x = y;
     y = t;
 }
 
 void print(int arr[], int size)
 {
     for(int i = 0; i < size; ++i)
         std::cout << arr[i] << " ";
 }
 
 int main()
 {
     int arr[] = { 10, 65, 35, 25, 15, 75, 85, 45, 65 };
     SelectionSort ss;
     ss.sort(arr, 9);
     print(arr, 9);
 }

Selection sort in java
 To come soon - comment if you need early

Selection sort in java using generics
 import org.junit.Assert;
 import org.junit.Test;
 
 class GenericSelectionSorter
 {
     public <T extends Comparable<T>> void sort(T[] elems) {
         int size = elems.length;
 
         for (int outerLoopIdx = 0; outerLoopIdx < size - 1; ++outerLoopIdx) {
             int min = outerLoopIdx;
             for (int innerLoopIdx = outerLoopIdx; innerLoopIdx < size; ++innerLoopIdx) {
                 if (elems[min].compareTo(elems[innerLoopIdx]) > 0) {
                     min = innerLoopIdx;
                 }
             }
 
             // exchange elements at outerIndexLoop and min positions
             T temp = elems[min];
             elems[min] = elems[outerLoopIdx];
             elems[outerLoopIdx] = temp;
         }
     }
 }
 
 public class SelectionSortTester
 {
     private String[] unsortedNames = new String[] {
             "Pankaj",
             "Paresh",
             "Ankit",
             "Sankalp",
             "Aditya",
             "Prem",
             "Rocket",
             "Singh",
             "Alabama",
             "Alaska",
             "Animal" };
 
     private String[] sortedNames = new String[] {
             "Aditya",
             "Alabama",
             "Alaska",
             "Animal",
             "Ankit",
             "Pankaj",
             "Paresh",
             "Prem",
             "Rocket",
             "Sankalp",
             "Singh" };
 
     @Test
     public void testStringSort() {
         GenericSelectionSorter ss = new GenericSelectionSorter();
         ss.sort(unsortedNames);
         Assert.assertArrayEquals(unsortedNames, sortedNames);
     }
 }

Thanks

What is Bucket Sort?

Bucket Sort (alternatively know as bin sort) is an algorithm that sorts numbers and comes under the category of distribution sort.
This sorting algorithm doesn't compare the numbers but distributes them, it works as follows:
  1. Partition a given array into a number of buckets
  2. One-by-one the numbers in the array are placed in the designated bucket
  3. Now, only sort the bucket that bare numbers by recursively applying this Bucket Sorting algorithm
  4. Now, visiting the buckets in order, the numbers should be sorted
Java code:
public class BucketSort{
 
   public static void sort(int[] a, int maxVal) {
      int [] bucket=new int[maxVal+1];
 
      for (int i=0; i<bucket.length; i++) {
         bucket[i]=0;
      }
 
      for (int i=0; i<a.length; i++) {
         bucket[a[i]]++;
      }
 
      int outPos=0;
      for (int i=0; i<bucket.length; i++) {
         for (int j=0; j<bucket[i]; j++) {
            a[outPos++]=i;
         }
      }
   }
 
 
   public static void main(String[] args) {
      int maxVal=5;
      int [] data= {5,3,0,2,4,1,0,5,2,3,1,4}; 
 
      System.out.println("Before: " + Arrays.toString(data));
      sort(data,maxVal);
      System.out.println("After:  " + Arrays.toString(data));
   }
}

Thanks.

What is a Graph data structure?

Graphs consist of 2 ingredients :
  • Vertices (V) , also called nodes and can be pictured as dots
  • Edges (E) , the line connecting the nodes.
Types of Graphs
Depending on the edges, there two types of graphs:
  1. Undirected Graphs - A graph that entail edges with ordered pair of vertices, however it does not have direction define. Example of such a graph is the 'Family tree of the Greek gods'
  2. Directed Graphs (aka Arcs) - Directed Graph: A graph that entail edges with ordered pair of vertices and has direction indicated with an arrow. Example of such a graph is the 'A finite-state machine of a light switch'
So, consider the following figure below:
So, above graph look similar, but 2nd graph is directed one, as it has arrows attached with some direction.



Graph ADT
Graph is an Abstract Data Type (ADT) which is of a non-linear structure entailing sets of nodes and connectors. Here, the connectors helps to establish relationship between nodes. However, from the graph ADT literature point-of-view the nodes are referred to as Vertices and connectors are referred to as Edges.
>Mathematically, a graph G entails vertices V, which are connected by edges E. Hence, a graph is defined as an ordered pair G=(V,E), where V is a finite set and E is a set consisting of two element subsets of V. We will come to ADT after types of graphs.


Graph ADT in java
So this can be represented as graph:

public class Edge {
    private Node a, b;
    private directionEnum direction;     // AB, BA or both 
    private int weight;
    ...
}

public class Vertex
{
    private List edges;
}

It is looking simple right now, and I agree it may vary depending on your requirement. So, this is just one way of representing the graph. Also, we have to come accross adjacency list and adjacency matrix to make it more clear.

For java, you can refer to following graph library:

Examples of Graph
Graphs are used ubiquitously.
  • An obvious example is a map. The places are the vertices while the roads connecting them are the edges. 
  • Web - Individual pages are vertex and linkages among pages are edges
  • Social networks - Vertices are individuals and relations are edges
  • Graphs can also be used to set precedence. That is, you can show which things are prerequisites for others, such as course prerequisite. 
  • Networks - What is network in terms of graph:
    Network refers to a graph in which the edges are bound with weights. Here weights can be numbers assigned to a the edges. Hence, a graph of this nature is referred to as weighted graph or a network.
Number of edges in graph
A graph in one piece with n nodes must have at least n-1 edges. You start out with n distinct pieces and by adding an edge between two edges, you now have n-1 distinct pieces. To get to one piece, you need to do this n-1 times.
This graph will have a maximum number of edges when every vertex is connected to every other vertex. Then there will n(n-1)/2 total edges or nC2, because this is the number of ways of choosing 2 vertices from n of them.

Sparse vs Dense graphs
This distinction is useful as some algorithm are good for sparse graph, and some for dense graph.

Let n( OR |V|) be the # of vertices and m (OR |E|) be the # of edges,
then in most (but not all) applications, we have that m is Ω (n) and O(n2).

A graph is "sparse" if it is closer to the lower bound and "dense" if it is closer to the upper bound.
This is a loose definition.



Resources:
  Source 1, Source 2

Graph Traversal Methods

There are two possible traversal methods for a graph:
i. Breadth-First Traversal
ii. Depth-First Traversal

i. Breadth-First Traversal: Traverse all the vertices starting from a given 'start vertex'. Hence, such a traversal follows the 'neighbour-first' principle by considering the following:
- No vertex is visited more than once
- Only those vertices that can be reached are visited

Importantly here, we make use of the Queue data structure. In effect, we house the vertices in the queue that are to be visited soon in the order which they are added to this queue i.e. first-in first-out.

Visiting a vertex involves returning the data item in the vertex and adding its neighbour vertices to the queue. However, we don't carryout any action under following conditions:
- if the neighbour is already present in the queue or if the neighbour has been already visited


ii. Depth-First Traversal: Traverse all the vertices starting from a given 'start vertex'. Hence, such a traversal follows the 'neighbour-first' principle by considering the following:
- No vertex is visited more than once
- Only those vertices that can be reached are visited

Importantly here, we make use of the Stack data structure. In effect, we stack the vertices so that vertices to be visited soon are in order in which they are popped from the stack i.e. last-in, first-out.

Visiting a vertex involves returning the data item in the vertex and adding its neighbour vertices to the stack. However,
we don't carryout any action if the following occurs:
- neighbour is already present in the stack or if the neighbour has already been visited



 

Find shortest path using Dijkstra's Algorithm

The Dijkstra's algorithm entails the following procedure:
- The breadth-first approach is used to traversal the network, however the difference here is instead of a queue data structure to store the traversed vertex this algorithm uses a priority queue. The priority queue helps to remove the items in order of values rather than in order of being added to the queue.
- Have an account of lowest total path weight (i.e. weightsum), from the start vertex to the target vertex
- Have an account of immediate predecessor in a given path for each vertex
- Have an account of the items in the priority queue

Hashing, Hash Data Structure and Hash Table

Hashing is the process of mapping large amount of data item to a smaller table with the help of a hashing function. The essence of hashing is to facilitate the next level searching method when compared with the linear or binary search. The advantage of this searching method is its efficiency to hand vast amount of data items in a given collection (i.e. collection size).

Due to this hashing process, the result i
s a Hash data structure that can store or retrieve data items in an average time disregard to the collection size.

Hash Table is the result of storing the hash data structure in a smaller table which incorporates the hash function within itself. The Hash Function primarily is responsible to map between the original data item and the smaller table itself. Here the mapping takes place with the help of an output integer in a consistent range produced when a given data item (any data type) is provided for storage and this output integer range determines the location in the smaller table for the data item. In terms of implementation, the hash table is constructed with the help of an array and the indices of this array are associated to the output integer range.

Hash Table Example :
Here, we construct a hash table for storing and retrieving data related to the citizens of a county and the social-security number of citizens are used as the indices of the array implementation (i.e. key). Let's assume that the table size is 12, therefore the hash function would be Value modulus of 12.

Hence, the Hash Function would equate to:
(sum of numeric values of the characters in the data item) %12
Note! % is the modulus operator

Let us consider the following social-security numbers and produce a hashcode:
120388113D => 1+2+0+3+8+8+1+1+3+13=40
Hence, (40)%12 => Hashcode=4

310181312E => 3+1+0+1+8+1+3+1+2+14=34
Hence, (34)%12 => Hashcode=10

041176438A => 0+4+1+1+7+6+4+3+8+10=44
Hence, (44)%12 => Hashcode=8

Therefore, the Hashtable content would be as follows:
-----------------------------------------------------
0:empty
1:empty
2:empty
3:empty
4:occupied Name:Drew Smith SSN:120388113D
5:empty
6:empty
7:empty
8:occupied Name:Andy Conn SSN:041176438A
9:empty
10:occupied Name:Igor Barton SSN:310181312E
11:empty
-----------------------------------------------------

What is a Complete Binary Tree?

A strictly binary tree of depth 'd' in which all the leaves are at level 'd' i.e. there is non empty left or right subtrees and leaves are populated at the same level.
In a complete binary tree all the internal nodes have equal degree, which means that one node at the root level, two nodes at level 2, four nodes at level 3, eight nodes at level 4, etc, which the below figure represents:

Binary tree traversal: Preorder, Inorder, Post-order, BFS, DFS, Level Order, zig zag order

In order to illustrate the 3 traversals  - pre order, in-order and post-order lets take following tree:

Preorder traversal:
To traverse a binary tree in Preorder, following operations are carried-out (i) Visit the root, (ii) Traverse the left subtree, and (iii) Traverse the right subtree.
Therefore, the Preorder traversal of the above tree will outputs:
7, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10

Implementing pre-order traversal

preorder(N *root)
{
  if(root)
  {
    printf("Value : [%d]", root->value);
    preorder(root->left);
    preorder(root->right);
  }
}

More details here.


Inorder traversal:
Inorder traversal prints the binary tree in increasing order. To traverse a binary tree in Inorder, following operations are carried-out (i) Traverse the left most subtree starting at the left external node, (ii) Visit the root, and (iii) Traverse the right subtree starting at the left external node.
Consider the tree below

Therefore, the Inorder traversal of the above tree will outputs:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Implementing inorder traversal(recursive)
inorder(Node *root)
{
  if(root)
  {
    inorder(root->left);
    printf("Value : [%d]", root->value);
    inorder(root->right);
  }
}

More details here.  

Postorder traversal:
To traverse a binary tree in Postorder, following operations are carried-out (i) Traverse all the left external nodes starting with the left most subtree which is then followed by bubble-up all the internal nodes, (ii) Traverse the right subtree starting at the left external node which is then followed by bubble-up all the internal nodes, and (iii) Visit the root.

Consider the tree below

Therefore, the Postorder traversal of the above tree will outputs:
0, 2, 4, 6, 5, 3, 1, 8, 10, 9, 7

Implementing post order traversal

postorder(Node *root)
{
  if(root)
  {
    postorder(root->left);
    postorder(root->right);
    printf("Value : [%d]", root->value);
  }
}

More details here.
Time complexity of above traversals is O(n).

BFS-Breadth first search
This is same as level order traversal, where we go level by level of the tree.

more details here.

DFS - Depth first search
more details here.

Level order traversal

Zig-zag order traversal

more details here.


Monday, July 29, 2013

Data Structure Introduction

The basic building block of DATA STRUCTURE are "data" and "information". So, before discussing data structure, we need to understand what is data and some ground facts about data structure.


many people use the terms "data" and "information" as synonyms but these two terms actually convey very distinct concepts. 

  • "data" is defined as a body of facts or figures, which have been gathered systematically for one or more specific purposes
    • data can exist in the forms of
      • linguistic expressions (e.g. name, age, address, date, ownership)
      • symbolic expressions (e.g. traffic signs)
      • mathematical expressions (e.g. E = mc2)
      • signals (e.g. electromagnetic waves)

  • "information" is defined as data which have been processed into a form that is meaningful to a recipient and is of perceived value in current or prospective decision making
    • although data are ingredients of information, not all data make useful information
      • data not properly collected and organized are a burden rather than an asset to an information user
      • data that make useful information for one person may not be useful to another person
    • information is only useful to its recipients when it is
      • relevant (to its intended purposes and with appropriate level of required detail)
      • reliable, accurate and verifiable (by independent means)
      • up-to-date and timely (depending on purposes)
      • complete (in terms of attribute, spatial and temporal coverage)
      • intelligible (i.e. comprehensible by its recipients)
      • consistent (with other sources of information)
      • convenient/easy to handle and adequately protected

DATA STRUCTURE DEFINITION :-
            Data structure is a collection of data elements whose organization is characterized by accessing functions that are used to store and retrieve individual data elements.
OR
               The logical or mathematical model of a particular organization of data is called data structure.


The two important goals of data structures are first to identify the representation of abstract entities and then to identify the operations, which can be performed with them. The operations help us to determine the class of problems, which can be solved with these entities. The proper construction of a program is influenced by the choice of data structure, which is used. A data structure is a systematic way of organizing and accessing data, and an algorithm is a step- by-step procedure for performing some task in a little amount of time. 

Depending on the organization of the elements, data structures are classified into two types:
  • Linear Data Structure: Elements are accessed in a sequential order but it is not compulsory to store all elements sequentially (say, Linked Lists). Examples are Linked Lists, Stacks and Queues. 
  • Non-Linear Data Structure: Elements of this data structure are stored/accessed in a non-linear order. Examples are Trees and Graphs.
To simplify the process of solving the problems, we generally combine the data structures along with their operations and are called Abstract Data Types (ADTs). An ADT consists of two parts:
  • Declaration of data
  • Declaration of operations