// word_len_histo.cpp : reads words and lists distribution // of word lengths. // Fred Swartz, 2002-09-01 // This would be nice to turn into an OO program, where // a class represented a distribution of values. // Some elements which are globals here would turn into // private member elements in the class (eg, valueCount). //--- includes #include <iostream> #include <iomanip> #include <cctype> using namespace std; //--- prototypes void countValue(int cnt); float getAverage(); //--- constants const int BINS = 21; // how many numbers can be counted //--- globals int valueCount[BINS]; // bins used for counting each number int totalChars = 0; // total number of characters //=========================================================== main int main() { char c; // input character int wordLen = 0; // 0 if not in word, else word length //--- Initialize counts to zero for (int i=0; ivalueCount[i] = 0; } //--- Read chars in loop and decide if in a word or not. while (cin.get(c)) { if (isalpha(c)) { // letters are in words, so wordLen++; // add one to the word length } else { countValue(wordLen); // end of word wordLen = 0; // not in a word, set to zero } } countValue(wordLen); // necessary if word ended in EOF //--- print the number of words of each length cout << "Why does this line disappear?" << endl; cout << "Word length Frequency" << endl; for (int j=1; j cout << setw(6) << right << j << " " << setw(8) << right << valueCount[j] << endl; } //--- print average length cout << "\nAverage word length: " << getAverage() << endl; return 0; }//end main //==================================================== countValue void countValue(int cnt) { if (cnt > 0) { // this must be the end of a word if (cnt > 20) { cnt = 20; // longer than 20 counts as 20 } valueCount[cnt]++; // count in correct bin } totalChars += cnt; }//end countWord //==================================================== getAverage float getAverage() { int totalCount = 0; for (int i=0; i totalCount += valueCount[i]; } if (totalCount > 0) { return (float)totalChars/totalCount; } else { return 0.0; } }//end getAverage
Alternative approach
Suppose that we have a very short paragraph like this "Roses are red. Violets are blue. This verse doesn't rhyme. And neither does this one," how can we find and save all different word lengths and their frequencies of occurrence? For example, "red" is 3 letter long and there are a total of five letters of this length (are, red, are, and, one) in the paragraph. Then one of the items in our cache should be 3 and 5
Just like other problems where we have to keep track of the number of occurrences, we should use a hash table. The algorithm is like this:
public HashMapExplanation:we just loop through the words in the paragraph. For each word, we check to see if its length is already in the hash table. Therefore, there are two cases. 1) If the word's length is already in the table, we increase the frequency count by one. 2) Otherwise, we hash the length into the table and set the frequency to 1 because this is the first time this length is hashed into the table.countWordLengthFrequency(String[] paragraph) { HashMap frequencyTable = new HashMap (); for (int i = 0; i < paragraph.length; i++) { if (!frequencyTable.containsKey(paragraph[i].length())) frequencyTable.put(paragraph[i].length(), 1); else { Integer count = frequencyTable.get(paragraph[i].length()) + 1; frequencyTable.put(paragraph[i].length(), count); } } return frequencyTable; }
Obviously, the time complexity is O(n) because we have to check every word in the paragraph. Moreover, in the worst case, the space complexity is O(n). That's when every word in the paragraph has a different length than the other words. If you know any better algorithm for this problem
0 comments:
Post a Comment