Tuesday, April 1, 2014

Find duplicate elements in the array with 4KB memory only

Problem
You have an array with all the numbers from 1 to N, where N is at most 32,000. The array may have duplicate entries and you do not know what N is. With only 4KB of memory available, how would you print all duplicate elements in the array?
Solution
4KB = 32768 bits.

We can use a byte array to represent if we have seen number i. If so, we make it 1. If we encounter a duplicate, we would have marked the particular position with 1, so we would know we have a duplicate.
We have 4KB of memory which means we can address up to 8 * 4 * (2^10) bits. Note that 32*(2^10) bits is greater than 32000. We can create a bit vector with 32000 bits, where each bit represents one integer.

Then we print it out.
public static void findDuplicates(int[] array) {
    byte[] bitVector = new byte[32000 / 8];
    for (int number : array) {
        int posInArray = number >> 3;
        int posInBit = number % 8;
        if ((bitVector[posInArray] & (1 << posInBit)) > 0)
            // found duplicates
            System.out.println(number);
        else
            bitVector[posInArray] |= (1 << posInBit);
    }
}

Lets use a cleaner approach, by using BitSet in java. Note that bit set starts with 0, but number starts with 1.

public static void checkDuplicates(int[] array) {
    BitSet bs = new BitSet(32000);
    for (int i = 0; i < array.length; i++) {
        int num = array[i];
        int num0 = num - 1; // bitset starts at 0, numbers start at 1
        if (bs.get(num0)) {
            System.out.println(num);
        } else {
            bs.set(num0);
        }
    }
}



0 comments:

Post a Comment