Saturday, December 31, 2011

First non repeating element in the array

Question: Write an algorithm to find the first non-repeated character in a string. For example, the first non-repeated character in the string ‘abcdab’ is ‘c’.

Answer: Seems trivial enough right? If a character is repeated, we should be able to search the string to determine if that character appears again. So if we go about doing this for every character in the string, our worst case run time would O(n^2). Here is some code to show this logic.

char returnFirstNonRepeatedChar( char * str )
{
    int i, repeated = 0;
    int len = strlen(str);

    for( i = 0; i < len; i++ )
    {
        repeated = 0;
        for( j = 0; j < len; j++ )
        {
            if( i != j && str[i] == str[j] ) {
                repeated = 1;
                break;
            }
        }         

        if( repeated == 0 ) { 
 // Found the first non-repeated character
            return str[i];
        }
    }

    return '';
}

This approach seems to work perfectly well. But can we do better? Of course we can! With the use of other data structures we can reduce the run time of this algorithm. Any guesses?

Using binary search will not help. Though it looks like easiest way to improve search efficiency on a set of data is to put it in a data structure that allows more efficient searching. What data structures can be searched more efficiently than O(n)? Binary trees can be searched in O(log(n)). But it will not return the "first" non repeating element.

Using hashtable

A hash table could work really handy here. Although, with this approach, we do sacrifice space for runtime improvement. So we can create a hash table by sequentially reading all the characters in the string and keeping count of the number of times each character appears. Once we’ve created the hash table, we can sequentially read the entries to see which one has a count of one. What is the runtime of this algorithm? We have O(n) to create the hash table and another O(n) to read the entries. This results in a runtime of O(n) + O(n) = O(2n) = O(n).

Arrays and hashtables both have constant time element lookup. Begin by trying to take advantage of an array or hashtable because these data structures offer the greatest potential for improvement. You'll want to be able to quickly determine which data structure is better to use.

Hashtables have a higher lookup overhead than arrays. On the other hand, an array would initially contain random values that you would have to take time to set to zero, whereas a hashtable initially has no values. Perhaps the greatest difference is in memory requirements. An array would need an element for every possible value of a character. This would amount to a relatively reasonable 256 elements if you were processing ASCII strings, but if you had to process Unicode strings you would need more than 65,000 elements (Unicode uses 16-bit characters). In contrast, a hashtable would require storage for only the characters that actually exist in the input string. So, arrays are a better choice for long strings with a limited set of possible character values.
hashtables are more efficient for shorter strings or when there are
many possible character values.


0 comments:

Post a Comment