Tuesday, April 1, 2014

Find duplicate elements in the array with 4KB memory only

This updated information has been further expanded upon on my new website. You can find the updated details here: https://k5kc.com/cs/algorithms/find-duplicate-elements-in-the-array-with-4kb-memory-only/.
Problem
You have an array with all the numbers from 1 to N, where N is at most 32,000. The array may have duplicate entries and you do not know what N is. With only 4KB of memory available, how would you print all duplicate elements in the array?
Solution
4KB = 32768 bits.

We can use a byte array to represent if we have seen number i. If so, we make it 1. If we encounter a duplicate, we would have marked the particular position with 1, so we would know we have a duplicate.
We have 4KB of memory which means we can address up to 8 * 4 * (2^10) bits. Note that 32*(2^10) bits is greater than 32000. We can create a bit vector with 32000 bits, where each bit represents one integer.

Then we print it out.
public static void findDuplicates(int[] array) {
    byte[] bitVector = new byte[32000 / 8];
    for (int number : array) {
        int posInArray = number >> 3;
        int posInBit = number % 8;
        if ((bitVector[posInArray] & (1 << posInBit)) > 0)
            // found duplicates
            System.out.println(number);
        else
            bitVector[posInArray] |= (1 << posInBit);
    }
}

Lets use a cleaner approach, by using BitSet in java. Note that bit set starts with 0, but number starts with 1.

public static void checkDuplicates(int[] array) {
    BitSet bs = new BitSet(32000);
    for (int i = 0; i < array.length; i++) {
        int num = array[i];
        int num0 = num - 1; // bitset starts at 0, numbers start at 1
        if (bs.get(num0)) {
            System.out.println(num);
        } else {
            bs.set(num0);
        }
    }
}



0 comments:

Post a Comment