The data structure of the trie and its addWord() method remains the same.
class trieNode:
trieNode[26] children
char c
int countPrefix // words with prefix from root to current node
int countWord // complete words for this current node
void addWord(trieNode node, string word):
//end of the word
if (word = ""):
node.countWord = node.countWord +1
return
//do stuff with current node
node.countPrefix = node.countPrefix + 1
//go to next node
char first = firstChar(word)
if (node.children[first] == null):
node.children[first] = new trieNode()
addWord(node.children[first], firstCharRemoved(word))
Another example is to find the distinct words from a text file. At first, I thought of a Hashmap which acts like a boolean array for the words as index. In which case, the time required is O(n) and space required is that of the distinct words. With Trie, the solution is better and elegant in terms of space required, time required is O(length). In case of words 'anon', 'an', 'a', the Trie formed (''-a-n-o-n) would just need space for a-n-o-n. Thus, the advantages of Trie would show over a Hashmap when many words have overlapping prefixes. Once the Trie is formed, a DFS would print the distinct words as follows.
//do DFS on the trie
StringBuilder sb = new StringBuilder()
dfs(root, sb)
void dfs(trieNode node, StringBuilder sb):
sb.append(node.c)
if (node.countWord > 0):
print(sb)
for (trieNode child : node.children)
if (child != null):
dfs(child, sb)
sb.length = sb.length -1
Monday, April 26, 2010
Find the distinct words from a text file
Subscribe to:
Post Comments (Atom)
0 comments:
Post a Comment