Thursday, January 23, 2014

Data Structure to Emulate LRU Cache


Implement the LRU cache


A Least Recently Used (LRU) cache discards the least recently used items first. How do you design and implement such a cache class? The design requirements are as follows:
1) Find an item as fast as we can.
2) On a cache miss when the cache is full, replace the least recently used item as fast as possible.

We use two data structures to implement an LRU Cache:
1. A queue, implemented using a doubly linked list. The maximum size of the queue equals the total number of frames available (the cache size).
The most recently used pages will be near the front end and the least recently used pages will be near the rear end.
2. A hash with the page number as key and the address of the corresponding queue node as value.
When a page is referenced, it may already be in memory. If it is, we detach its node from the list and move it to the front of the queue.
If the required page is not in memory, we bring it in. In simple words, we add a new node at the front of the queue and record its address in the hash. If the queue is full, i.e. all the frames are in use, we remove the node at the rear of the queue and add the new node at the front.
Note: Initially no page is in the memory.
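The two structures above might be declared along these lines (a minimal sketch; the names Page, PageTable, frames, front, and rear are illustrative, not from the post):

```java
// One possible layout for a cached page (a queue node).
class Page {
    int pageNumber;      // the key
    int data;            // the cached value
    Page prev, next;     // doubly linked list pointers giving the queue order
}

// The two structures side by side: the hash for O(1) lookup,
// and front/rear pointers for the queue.
class PageTable {
    // page number -> its node in the queue
    java.util.HashMap<Integer, Page> frames = new java.util.HashMap<>();
    Page front;          // most recently used
    Page rear;           // least recently used
}
```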

A linked list plus a hashtable of pointers to the linked list nodes is the usual way to implement an LRU cache. This gives O(1) operations (assuming a decent hash). A side benefit of every operation being O(1) is that a multithreaded version can simply lock the whole structure; there is no need for granular locking.
Briefly, the way it works:
On an access of a value, you move the corresponding node in the linked list to the head.
When you need to remove a value from the cache, you remove from the tail end.
When you add a value to cache, you just place it at the head of the linked list.
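The three list moves just described amount to a few pointer updates. Here is one possible sketch; the class and method names (AccessList, addFirst, unlink, removeLast) are my own, not from the post:

```java
class AccessList {
    static class Node {
        int key;
        Node prev, next;
        Node(int key) { this.key = key; }
    }

    Node head, tail; // head = most recently used, tail = least recently used

    // On access or insert: place the node at the head of the list.
    void addFirst(Node n) {
        n.prev = null;
        n.next = head;
        if (head != null) head.prev = n;
        head = n;
        if (tail == null) tail = n;
    }

    // Detach a node in O(1) using its own prev/next pointers.
    void unlink(Node n) {
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        n.prev = n.next = null;
    }

    // Eviction: drop the least recently used node from the tail.
    Node removeLast() {
        Node last = tail;
        if (last != null) unlink(last);
        return last;
    }
}
```

"Move to head" on an access is then just `unlink(n)` followed by `addFirst(n)`.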

Assuming the hashmap has O(1) retrieval, we can implement this with a doubly linked list and a hashmap:

    The doubly linked list stores entries ordered by date/timestamp of last access.
    Keep track of the first and last nodes.
    No sorting effort is needed, since each newly inserted node carries the latest timestamp.
    The hashmap retrieves a node by its key; each entry holds a pointer to the corresponding doubly linked list node.


Here is how we will handle cache operations


    Read: Search the node by key in the hashmap and retrieve its value and its position in the doubly linked list.
    Detach the node from the doubly linked list and re-insert it, with an updated timestamp, at the start of the list.
    Deleting a node from a doubly linked list is O(1), because we reach it directly through the hashmap without traversing the list.


    Write: Add the new node to both the hashmap and the doubly linked list.
    If the cache is full, follow the pointer to the last node in the doubly linked list and delete that node, updating the tail pointer appropriately.
    Inserting at the start of a linked list is an O(1) operation.


    Update: An update accesses the data structures the same way a read does, so the same time complexities apply.


    Delete: Removing the node from both the doubly linked list and the hashmap can be done in O(1).
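Putting the read, write, update, and eviction paths together, one possible self-contained implementation might look like this (a sketch under the assumptions above; class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

class LRUCache {
    private static final class Node {
        int key, value;
        Node prev, next;
        Node(int key, int value) { this.key = key; this.value = value; }
    }

    private final int capacity;
    private final Map<Integer, Node> map = new HashMap<>();
    private Node head, tail; // head = most recently used, tail = least

    LRUCache(int capacity) { this.capacity = capacity; }

    // Read: O(1) hash lookup, then move the node to the front.
    Integer get(int key) {
        Node n = map.get(key);
        if (n == null) return null;    // cache miss
        moveToFront(n);
        return n.value;
    }

    // Write/update: insert or refresh; evict from the tail when full.
    void put(int key, int value) {
        Node n = map.get(key);
        if (n != null) {               // update existing entry in place
            n.value = value;
            moveToFront(n);
            return;
        }
        if (map.size() == capacity) {  // evict the least recently used node
            map.remove(tail.key);
            unlink(tail);
        }
        n = new Node(key, value);
        map.put(key, n);
        linkFront(n);
    }

    private void moveToFront(Node n) {
        if (n == head) return;
        unlink(n);
        linkFront(n);
    }

    private void linkFront(Node n) {
        n.prev = null;
        n.next = head;
        if (head != null) head.prev = n;
        head = n;
        if (tail == null) tail = n;
    }

    private void unlink(Node n) {
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        n.prev = n.next = null;
    }
}
```

Every public operation is a constant number of hash operations plus a constant number of pointer updates, matching the O(1) claims above.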

Java's LinkedHashMap already implements this pattern.
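For reference, a capacity-bounded LRU cache via LinkedHashMap needs only the three-argument constructor with accessOrder = true plus an override of removeEldestEntry (the subclass name below is illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// With accessOrder = true, LinkedHashMap keeps entries in access order;
// removeEldestEntry is consulted after each put and lets us cap the size.
class LinkedLRU<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LinkedLRU(int capacity) {
        super(16, 0.75f, true); // third argument: accessOrder = true
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once we exceed capacity
    }
}
```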

But, what if the environment is multi-threaded and we want a concurrent solution?
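One coarse-grained answer, in line with the earlier remark about locking the whole structure, is to guard every operation with a single lock. A sketch (ConcurrentLRU is an illustrative name; finer-grained designs are possible but considerably trickier):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class ConcurrentLRU<K, V> {
    private final LinkedHashMap<K, V> map;
    private final ReentrantLock lock = new ReentrantLock();

    ConcurrentLRU(int capacity) {
        // Access-ordered LinkedHashMap bounded by capacity, as before.
        map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Every operation takes the single lock, so the whole structure
    // is updated atomically; no granular locking to reason about.
    V get(K key) {
        lock.lock();
        try { return map.get(key); } finally { lock.unlock(); }
    }

    void put(K key, V value) {
        lock.lock();
        try { map.put(key, value); } finally { lock.unlock(); }
    }
}
```

Since each operation is O(1), the lock is held only briefly; under heavy contention, sharded or lock-free designs would be the next step.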


