Problem
Since XML is very verbose, you are given a way of encoding it where each tag gets mapped to a pre-defined integer value The language/grammar is as follows:
Element –> Element Attr* END Element END [aka, encode the element tag, then its attributes, then tack on an END character, then encode its children, then another end tag]
Attr –> Tag Value [assume all values are strings]
END –> 01
Tag –> some predefined mapping to int
Value –> string value END
Write code to print the encoded version of an xml element (passed in as string).
FOLLOW UP
Is there anything else you could do to (in many cases) compress this even further?
Solution
We all know xml has lots of redendency, of-course I don't want to enter into json vs xml here. Its a compression based algorithm.private Map<String, Byte> tagMap; private static final Byte[] END = { 0, 1 }; private List<String> tokens; private int currentTokenIndex; byte[] encode(char[] input) throws IOException { tokenize(input); currentTokenIndex = 0; ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); encodeTokens(outputStream); return outputStream.toByteArray(); } void encodeTokens(ByteArrayOutputStream output) throws IOException { nextToken("<"); // read tag name String tagName = nextToken(); output.write(getTagCode(tagName)); // read attributes while (!hasNextToken(">") && !hasNextTokens("/", ">")) { // read next attribute String key = nextToken(); nextToken("="); String value = nextToken(); output.write(getTagCode(key)); for (char c : value.toCharArray()) { output.write(c); } output.write(END[0]); output.write(END[1]); } // end of attributes output.write(END[0]); output.write(END[1]); // finish this element if (hasNextTokens("/", ">")) { nextToken("/"); nextToken(">"); } else { nextToken(">"); // while not the end tag while (!hasNextTokens("<", "/")) { encodeTokens(output); // encode child } // ending tag nextToken("<"); nextToken("/"); nextToken(tagName); nextToken(">"); } output.write(END[0]); output.write(END[1]); }
Thanks
References
0 comments:
Post a Comment