Find the distinct words from a text file ~ KodeKnight

Monday, April 26, 2010

Find the distinct words from a text file

April 26, 2010 Algorithms, kodeknight No comments

The data structure of the trie and its addWord() method remains the same.

class trieNode: trieNode[26] children char c int countPrefix // words with prefix from root to current node int countWord // complete words for this current node void addWord(trieNode node, string word): //end of the word if (word = ""): node.countWord = node.countWord +1 return //do stuff with current node node.countPrefix = node.countPrefix + 1 //go to next node char first = firstChar(word) if (node.children[first] == null): node.children[first] = new trieNode() addWord(node.children[first], firstCharRemoved(word))

Another example is to find the distinct words from a text file. At first, I thought of a Hashmap which acts like a boolean array for the words as index. In which case, the time required is O(n) and space required is that of the distinct words. With Trie, the solution is better and elegant in terms of space required, time required is O(length). In case of words 'anon', 'an', 'a', the Trie formed (''-a-n-o-n) would just need space for a-n-o-n. Thus, the advantages of Trie would show over a Hashmap when many words have overlapping prefixes. Once the Trie is formed, a DFS would print the distinct words as follows.

//do DFS on the trie StringBuilder sb = new StringBuilder() dfs(root, sb) void dfs(trieNode node, StringBuilder sb): sb.append(node.c) if (node.countWord > 0): print(sb) for (trieNode child : node.children) if (child != null): dfs(child, sb) sb.length = sb.length -1

KodeKnight

Like Us

Followers

Topics

Books for You

Contributors

Search This Blog

Labels

Report Abuse

Blog Archive

Blogger news

Blogroll

ADs

Blogger news

Monday, April 26, 2010

Find the distinct words from a text file

0 comments:

Post a Comment

Labels

Likeus