Wednesday, July 23, 2014

Search a long string for small strings in an array

Problem

Given a string s and an array of smaller strings T, design a method to search s for each small string in T.

Solution

Method 1 - Using suffix tree
We can first get all the suffices of s and check for each element  t in T to see if t is the beginning part of any suffices. To make this more efficient, we can build a suffix tree and do the operations.

For example, s = “mississippi”; T = { “is”, “sip”, “hi”, “sis” }.
The suffices of s
[0] “mississippi”
[1] “ississippi”
[2] “ssissippi”
[3] “sissippi”
[4] “issippi”
[5] “ssippi”
[6] “sippi”
[7] “ippi”
[8] “ppi”
[9] “pi”
[10] “i”

Then for “is” in T, we see it sits in the beginning of suffices [1] and [4]. for “sip”, we see it in [6]. We do not see any “hi” in the suffices but we do see “sis” in [3].

Java code
public static class SuffixTree {
    SuffixTreeNode root = new SuffixTreeNode();
 
    public SuffixTree(String s) {
        for (int i = 0; i < s.length(); ++i) {
            root.insert(s.substring(i), i);
        }
    }
 
    public List<Integer> getIndexes(String s) {
        return root.getIndices(s);
    }
}
 
public static class SuffixTreeNode {
    private char c;
    private List<Integer> indices = new ArrayList<Integer>();;
    private Map<Character, SuffixTreeNode> children = 
        new HashMap<Character, SuffixTreeNode>();
 
    public void insert(String s, int index) {
        indices.add(index);
        if (s != null && s.length() > 0) {
            char character = s.charAt(0);
            if (children.keySet().contains(character)) {
                children.get(character).insert(
                        s.substring(1), index);
            } else {
                SuffixTreeNode child = new SuffixTreeNode();
                children.put(character, child);
                child.insert(s.substring(1), index);
            }
        }
    }
 
    public List<Integer> getIndices(String s) {
        if (s == null || s.length() == 0)
            return indices;
        else {
            char character = s.charAt(0);
            if (children.containsKey(character))
                return children.get(character).getIndices(
                        s.substring(1));
            else
                return null;
        }
    }
}

Thanks.

References

0 comments:

Post a Comment