Tuesday, April 1, 2014

Sorting 10 GB file of integer with limited ram.

I have only x GB RAM in computer. I have a text file of 10 GB data.This file contains numbers. x is less than 10. How will I sort them?
Generic algorithm
Here is the generic algorithm:
  • split the file into parts (buffers) that you can sort in-memory
  • then when all buffers are sorted take 2 (or more) at the time and merge them (like merge sort) until there's only 1 buffer remaining which will be the sorted file

Example x = 4
Given a 10GB file of integers and a RAM that can hold only 4GB values, how would you sort the integers in the file. Here is how we will continue:

  • Divide the file in 3 parts(3GB+3GB+4GB,(each part must be less than 4GB )) and sort each chunk using any O(nlogn) sorting algo(as it is less than 4GB we can apply any in-memory sorting algo). 
  • Once each chunk of file sorted, bring 1GB of each chunk in memory, so it'll occupy 3GB(1+1+1), now sort this 3GB data(by 3-way merge sort),
  • Note that we still have 1 GB ram left. Now, we select a minimum number of the sorted elements,add that in remaining 1GB, and while selecting this number, bring a number from corresponding chunk to replace it, finally write sorted 1GB to secondary storage. So, we have functioning of minHeap.
Dry running the algo
let say we divide file in three chunk A,B,C and after sorting individully, content of these parts is like: A{12,18,20,25,33,35,37},B{8,13,14,40,41,45,47},C{2,15,50,70,72,75,78}.
Now suppose in memory, we have {12,18,20,25} {8,13,14,40} {2,15,50,70} respectively from A,B,C. now we'll select 2 from C's part as its minimum, so put this in remaining 1GB and replace 2 in memory by an element of chunk C. i.e. insert 72 in C's part in memory.. next replace 8 by 41 and so on.. we are maintaining a min heap.




Post a Comment