Thursday, February 13, 2014

Determine if a small tree T2 is the subtree of a huge tree T1

You have two very large binary trees: T1, with millions of nodes, and T2, with hundreds of nodes. Create an algorithm to decide if T2 is a subtree of T1.

Solution
Method 1 - Traversing through T1 and finding subtree at each node, if T1's node = T2.root
Here's what we will do:

• Traverse T1.
• If the current node is equal to the root node of T2, traverse both of the trees (T2 and the current subtree of T1) at the same time.
• Compare the current node. If they are always equal, T2 is a subtree of T1.
```public static boolean isSubtree(BinaryTreeNode<Integer> T1,
BinaryTreeNode<Integer> T2) {
if (T1 == null)
return false;
else if (T1.data.equals(T2.data) && isTwoTreeEqual(T1, T2)) {
return true;
} else {
return isSubtree(T1.left, T2) || isSubtree(T1.right, T2);
}
}

public static boolean isTwoTreeEqual(BinaryTreeNode<Integer> root1,
BinaryTreeNode<Integer> root2) {
if (root1 == null && root2 == null)
return true;
else if (root1.isLeaf() && root2.isLeaf())
return root1.data.equals(root2.data);
else if (!root1.isLeaf() && !root2.isLeaf()) {
if (!root1.data.equals(root2.data))
return false;
return isTwoTreeEqual(root1.left, root2.left)
&& isTwoTreeEqual(root1.right, root2.right);
} else
return false;
}
```

Though this is good, but we may run out of memory stack, as we are recursing in 2 functions, and isTwoTreeEqual can be improved.
```public static boolean containsTree(BinaryTreeNode<Integer> T1,
BinaryTreeNode<Integer> T2) {
if (T2 == null)
return true; // The empty tree is always a subtree
else
return subTree(T1, T2);
}

private static boolean subTree(BinaryTreeNode<Integer> r1,
BinaryTreeNode<Integer> r2) {
if (r1 == null)
return false; // big tree empty & subtree still not found.
if (r1.data == r2.data) {
if (matchTree(r1, r2))
return true;
}
return (subTree(r1.left, r2) || subTree(r1.right, r2));
}

private static boolean matchTree(BinaryTreeNode<Integer> r1,
BinaryTreeNode<Integer> r2) {
if (r2 == null && r1 == null)
return true; // nothing left in the subtree
if (r1 == null || r2 == null)
return false; // big tree empty & subtree still not found
if (r1.data != r2.data)
return false; // data doesn’t match
return (matchTree(r1.left, r2.left) && matchTree(
r1.right, r2.right));
}
```

Time complexity -O(n+m), where n is size of T1, and m of T2
If k is the number of occurrences of T2’s root in T1, the worst case runtime can be characterized as O(n + k * m).

Method 2 -Pre process the tree T1
1. Pre-process the T1 tree, keeping the list of all possible subtree roots (the cache list will have a millions of entries)
2. Sort the cached list of roots by the descending order of data, kept in root. You may choose more elegant sorting function, for example, parse a character tree into string.
3. Use binary search to locate the necessary sub-tree. You may quickly reject subtrees, with the number of nodes, not equal to the T2 number of nodes (or with the different depth).
Note that steps 1) and 2) must be done only once, then you can test many sub-tree candidates, using the same preprocessed result.

Reference - stackoverflow