Tuesday, December 27, 2011

Finding the Start of a Loop in a Linked List

This updated information has been further expanded upon on my new website. You can find the updated details here: https://k5kc.com/cs/algorithms/linked-list-cycle-2-find-start-of-a-cycle/.

Problem 

Given a circular linked list, implement an algorithm which returns node at the beginning of the loop.
DEFINITION
Circular linked list:
A (corrupt) linked list in which a node’s next pointer points to an earlier node, so as to make a loop in the linked list.
EXAMPLE
input: A -> B -> C -> D -> E -> C [the same C as earlier]
output: C

Example
head -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8
                  ^                        |
                  |                        |
                  +------------------------+

Answer here is 3.

Solution
The problem here is 2 fold - First check if there is any loop or not, if the loop is present , we can simply find the loop in O(n) time.

We can find the loop in the linked list via Floyd’s Cycle-Finding Algorithm, explained here.

The algorithm is pretty straightforward:
  1. We start at the beginning of the linked list with two pointers.
  2. The first pointer is incremented through each node of the list. The second pointer moves twice as fast, and skips every other node.
  3. If the linked list contains a loop, these two pointers will eventually meet at the same node, thus indicating that the linked list contains a loop. 
Consider the linked list :
>head -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8
                  ^                        |
                  |                        |
                  +------------------------+

Incrementing pointers A at rate of 1 and B at 2, they take on the following values:
A   B
1   6
2   7
3   8
4   8
5   4
6   6


Method 1 - Increment the Hare pointer by loopsize

Now we can see that loop is there, as A and B meet at 6 :
                                 AB (this is where A and B
                                 |               first met).
                                 v
head -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8
                  ^                        |
                  |                        |
                  +------------------------+




  • First, advance B and set the loopsize to 1.
  • Second, while A and B are not equal, continue to advance B, increasing the loopsize each time. That gives the size of the loop, six in this case. If the loopsize ends up as 1, you know that you must already be at the start of the loop, so simply return A as the start, and skip the rest of the steps below.
  • Third, simply set both A and B to the first element then advance B exactly loopsize times (to 7 in this case). This gives two pointers that are different by the size of the loop.
  • Lastly, while A and B are not equal, you advance them together. Since they remain exactly loopsize elements apart from each other at all times, A will enter the loop at exactly the same time as B returns to the start of the loop. You can see that with the following walkthrough:
    • loopsize is evaluated as 6
    • set both A and B to 1
    • advance B by loopsize elements to 7
    • 1 and 7 aren't equal so advance both
    • 2 and 8 aren't equal so advance both
    • 3 and 3 are equal so that is your loop start
Now, since each those operations are O(n) and performed sequentially, the whole thing is O(n^2).

Method 2 - Iterate from first pointer and see if nodes are reachable
We know that Floyd’s Cycle detection algorithm terminates when fast and slow pointers meet at a common point. We also know that this common point is one of the loop nodes (2 or 3 or 4 or 5 in the above diagram). We store the address of this in a pointer variable say ptr2. Then we start from the head of the Linked List and check for nodes one by one if they are reachable from ptr2. When we find a node that is reachable, we know that this node is the starting node of the loop in Linked List and we can get pointer to the previous of this node.

Method 3  - Efficient O(n) solution

Once a loop has been found, the following additional steps will give us the starting node of the loop:
    • Once a cycle has been detected, let B remain pointing to the element where the loop for the step above terminated but reset A so that it's pointing back to the head of the list. 
    • Now, move each pointer one element at a time. Since B began inside the loop, it will continue looping. After k steps (equal to the distance of the start of the loop from the head of the list), A and B will meet again. This will give you a reference to the start of the loop.
    To sum it up:
    This method is also dependent on Floyd’s Cycle detection algorithm.
    1. Detect Loop using Floyd’s Cycle detection algo and get the pointer to a loop node.
    2. Count the number of nodes in loop. Let the count be k.
    3. Fix one pointer to the head and another to kth node from head.
    4. Move both pointers at the same pace, they will meet at loop starting node.
    5. Get pointer to the last node of loop and make next of it as NULL.

      So here once loop is detected :
                                       AB (this is where A and B
                                       |               first met).
                                       v
      head -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8
                        ^                        |
                        |                        |
                        +------------------------+
      We move A to start. Now we increment each by 1 till they meet:


      A   B
      1   6
      2   7
      3   8
      4   3
      5   4
      6   6

      This algorithm isn’t too difficult compared to the algorithm for detecting a loop. However, the mental model seems a bit trickier. Why and how does it always find the start of the loop?

      How does the Algorithm work? An intuitive explanation:
      Here’s some explanation which would hopefully help you intuitively understand why the algorithm works, without going into a lot of mathematical detail.
      First, meeting point of two pointers in a loop
      Consider two pointers: a slow pointer S that increments by one node at each step, and a fast pointer F that increments by two nodes at each step (i.e. it is twice as fast as S). Both pointers start at the same time from the beginning of an n-node loop. In the time S covers n nodes. F will have covered 2n nodes and they will both meet at the start of the loop.
      Now, let us say that the slow pointer S starts at the beginning of the loop, and the fast pointer F starts at node k (where k < n) of the loop. As these two pointers move along the loop, they will meet at node (n-x).
      What we really need to do is figure out x, as it will give us the node at which the two pointers meet inside the loop.
      1. When S takes n/2 steps, it will be at node n/2. During the same time, F will have taken 2(n/2) = n steps, and it will be at node (k+n). Since the we are inside a loop, F will be effectively back at node k.
      2. In order for the two pointers to meet at node (n-x), S needs to take a further (n-x-n/2)=(n/2-x) steps and it will end up at node n-x. During the same time, F will have taken 2*(n/2-x)=(n-2x) steps and will be at node k+(n-2x). Given our assumption that both S and F meet at the same node:
      n-x = k+n-2x
      =>   x = k
      This means that if S starts from the start of the loop, and F starts k nodes into the loop, both of them will meet at node (n-k), i.e k nodes from the end of the loop. This is a key insight.

      Circular Linked List
      Now, coming back to the linked list that contains a loop. Suppose the start of the loop is m (e.g. m=3) nodes from the start of the linked list. Both S and F start at the beginning of the linked list [Figure-1].


      By the time S gets to node m (i.e. start of loop), F will be at node 2m [Figure-2]. This means that S will be at the start of the loop and F will be m nodes *into the loop*.

      Based on the discussion above, we already know that if S begins from the start of the loop and F starts from node m, they will meet m nodes from the end of the loop (i.e. the orange-node in [Figure-3]).

      At this point, keep the pointer F at the orange-node where the two pointers met (i.e. m-nodes from the start of the loop), and move the pointer S to the beginning of the linked list [Figure-4]. Now, if we increment both S and F *one node at a time*, it is obvious that they will meet at ‘Node-m’ (red-node) of the list, which is the start of the loop.

      Source code:
      /**
       * Returns the node at the start of a loop in the given circular linked
       * list. A circular list is one in which a node's next pointer points
       * to an earlier node, so as to make a loop in the linked list. For
       * instance:
       *
       * input: A -> B -> C -> D -> E -> C [the same C as earlier]
       * output: C
       *
       *
       * @param linkedList
       *            list to be tested
       * @return the node at the start of the loop if there is a loop, null
       * otherwise
       */
      public static IntegerNode findLoopStart(LinkedList linkedList) {
       if (linkedList == null || linkedList.getHead() == null) {
        return null;
       }
      
       IntegerNode loopStartNode = null;
       IntegerNode slow = linkedList.getHead();
       IntegerNode fast = linkedList.getHead();
      
       while (slow != null && fast != null) {
        slow = slow.getNext();
        if (fast.getNext() == null) {
         loopStartNode = null;
         break;
        }
        fast = fast.getNext().getNext();
      
        // If slow and fast point to the same node, it means that the
        // linkedList contains a loop.
        if (slow == fast) {
      
         slow = linkedList.getHead();
      
         while (slow != fast) {
          // Keep incrementing the two pointers until they both
          // meet again. When this happens, both the pointers will
          // point to the beginning of the loop
          slow = slow.getNext(); // Can't be null, as we have a loop
          fast = fast.getNext(); // Can't be null, as we have a loop
         }
      
         loopStartNode = slow;
         break;
        }
       }
      
       return loopStartNode;
      }
      

      Thanks. Some related problem to this will be delete this loop, which you can see here.

      References

      2 comments: