Given a dictionary of word - Group all words into set of anagrams of each other ~ KodeKnight

Saturday, August 10, 2013

Given a dictionary of word - Group all words into set of anagrams of each other

August 10, 2013 Amazon, anagrams, kodeknight, Microsoft, String No comments

Given a set of words in a dictionary, write a program to group all words which are anagrams of each other in to sets.

input: “bat”, “tar”, “xyz” , “tab”, “tar”
output:[["bat", tab"], ["tar", rat"],”xyz” ]

Anagrams
Two words are anagrams if one of them has exactly same characters as that of the another word.

Example : Anagram & Nagaram are anagrams (case-insensitive).

Approach
The simpler logic would be to :

Use some sort of a hash structure - key being string, and value may be list of words
Sort all words alphabetically and use the sorted word as the key, the value being the non sorted word. So bat will become abt, and it is the key in the dictionary and values will be bat and tab.
If the key is found, keep on adding the original word to the ‘value’ list of strings

Something like this :
Solution 1 : Using hash table

def compare_sort(x):
        sorted_key_list = list(x)
        sorted_key_list.sort()
        sorted_str = ','.join(str(n) for n in sorted_key_list)
        sorted_str = sorted_str.replace(',','')
        return sorted_str

source = ['tab','bat','atb','tac','cat','xyz']

target_hash = {}

for x in source:
        sorted_key = compare_sort(x)
        print sorted_key

        if target_hash.has_key(sorted_key):
                existing_list = target_hash[sorted_key]
                existing_list.append(x)
                target_hash[sorted_key] = existing_list
        else:
                target_list =[x]
                target_hash[sorted_key] = target_list

print target_hash

Complexity: number of words in dictionary
space complexity : hash table entries

Solution 2 - Assigning characters with prime numbers

The obvious solution is to map each character to a prime number and multiply the prime numbers. So if 'a'' -> 2 and 'b' -> 3, then

'ab' -> 6
'ba' -> 6
'bab' -> 18
'abba' -> 36
'baba' -> 36

To minimise the chance of overflow, the smallest primes could be assigned to the more frequent letters (e,t,i,a,n). Note: The 26th prime is 101.So, for all the words where the integer comes out to be same, you have an anagram. Here is more you can read on this.

KodeKnight

Like Us

Followers

Topics

Books for You

Contributors

Search This Blog

Labels

Report Abuse

Blog Archive

Blogger news

Blogroll

ADs

Blogger news

Saturday, August 10, 2013

Given a dictionary of word - Group all words into set of anagrams of each other

0 comments:

Post a Comment

Labels

Likeus