Friday, July 25, 2014

Skiplists

Goal
To understand skiplist.
Definition
Skiplist is a dynamic search structure, as it is dynamic when it comes to insertion and deletion, which supports search).
Other dynamic search structure:
• Treaps
• Red black tree
• B-Trees (2-3 trees etc)
Operations on skiplist
• Search - O(log n)
• Insert - O(log n)
• Delete - O(log n)
These operations are with high probability, 1/polynomial depending on polynomial number of operation.

Understanding SkipList - Can we search in a sorted linked list in better than O(n) time?
The worst case search time for a sorted linked list is O(n) as we can only linearly traverse the list and cannot skip nodes while searching. For a Balanced Binary Search Tree, we skip almost half of the nodes after one comparison with root. For a sorted array, we have random access and we can apply Binary Search on arrays.

Can we augment sorted linked lists to make the search faster? The answer is Skip List. The idea is simple, we create multiple layers so that we can skip some nodes.
Suppose we are on the metro station, and have to go from station 1 to 16. Now, to reach station 4. How will the train go, it will go from 1 to 2, 2 to 3, 3 to 4. Ohh..finally you have reached. But you don't have to sit in a train which stops at each station.

Now, suppose an express lane is build, such that train steps at every alternate station:
Now, for reaching from 1 to 4, you have to sit in express metro, and you will reach in 2 stops. But what of a person who wants to go to station 12, there will be still many stops.

Likewise we can still skip to every 8th node, and finally 16th node:

Understanding the Operations

Search(x)

So, how can we make search better. We can discuss about insertion and deletion later.

How can we make a search in linked list better?
• I will have to sort the linked list
• More links - I can add pointers to skip some of the nodes in the list.
• Have 2 lists rather than 1, and have pointers between them.
Algorithm
```search(x)
{
step 1 :
Move in the top most list L1(fastest skiplist)
to the right until it is too far
or you found the element
step 2 :
If you find data in L1 its good, otherwise go to list L2 and repeat step 1

}
```

Example (as borrowed from here)
Distribution of elements
Now lets discuss which elements should go in L1 (or S2 above). By L1, it means the first skiplist will more bigger skips. In subway system we know, we put popular stations, and we are done. But, here we don't know, so we have to distribute uniformly, i.e. we have to minimize the search time for all the nodes, whether it is in L1, L2 or any where.

Suppose, we know that size of L1 is fixed, i.e. we can afford only these many subway stops. So, to start with we have to distrbute the data unifromly.

What is the cost of search then?
```cost of search = (length of L1 ) + length of L2  / length of L1 = |L1| + |L2| / |L1|
= |L1| + n/|L1|
```

We want to have cost of search low, so from 1st operand, we would like to have |L1| to be low and from 2nd operand we would like to have |L1| large. To minimize it we should have |L1| = |L2|/|L1|
|L2| is equivalent to elements in skiplist as it is not express list.

|L1| = n / |L1|
|L1| = n ^ 1/2 =  √n

Search time will now be 2
But, is this search good enough? Nope, square root of n is not that good.
So, we will add more of express lists. Now, the search time will be equivalent to |L1| = n ^ 1/3 and search will be like 3 * n ^ 1/3 = 3 * & #8731;n

So if we have kth skiplist we will have k * n ^1/k

So, what should be k's value? We should have k = log n, as we want to have equivalent of binary search.

So, search operation = log n * n ^ 1/log n
Also, a ^ (log base 2 of a) = 2
So, search operation takes 2 * log n

Insertion

How do we balance the elements ratio of 2? Here is basic idea:
```We add all the elements in the lowest list. Then we:
flip a fair coin
if it is head, move the element again and flip again.
Continue this process until you reach the highest list or you get the tail.
```

But we want to start with proper element and list may look very bad if we start from any random location. So, our list will always have -infinity in all the lists. Here are basic steps

1 Find the node as in search. This will return the node that will be to the left of the new node (L).    a) While dropping down to the lower levels maintain the nodes in an update List.
2 Create the new node with a random level by flipping the coin.
```//assuming random.nextDouble() less than 0.5 means heads
private int getRandomLevel()
{
int ret = 1;
while(random.nextDouble() < 0.5 && ret < nMaxLevel++)
ret ++ ;
return ret;
}
```
3 Adjust the pointers from the existing L node to the new node.If the new level is greater than the existing level of the list, then the head of the list should be adjusted to point to the new node at the new level.
4 Set the next pointers of the new node to next pointers of L 5 Set next pointers of L to the new node at that level. This is done for each node where level was changed in step 1.

Deletion

Its not fancy. Search the element and delete it, my changing the pointers.

Java code

This is how the skipnode element should like:

```class SkipNode
{
int element;
SkipNode right;
SkipNode down;

/* Constructor */

public SkipNode(int x)
{
this(x, null, null);
}

/* Constructor */
public SkipNode(int x, SkipNode rt, SkipNode dt)
{
element = x;
right = rt;
down = dt;
}
}
```

This was good for explanation, but why right our own code if Java already have it.

Java 1.6 (Java SE 6) introduced ConcurrentSkipListSet and ConcurrentSkipListMap to the collections framework. So, I'd speculate that someone out there is really using them.
Skiplists tend to offer far less contention for locks in a multithreaded situation, and (probabilistically) have performance characteristics similar to trees.
See the original paper [pdf] by William Pugh
References