Encoding of hierarchically organized data for efficient storage and processing
First Claim
1. A method for encoding XML data, the XML data comprising a plurality of tags in a hierarchy, the method comprising:
- encoding said XML data to produce an encoded representation of said XML data that does not include an encoded representation of a particular tag of the plurality of tags;
wherein encoding the XML data comprises:
determining whether a schema that corresponds to the XML data constrains the particular tag within the hierarchy to a particular hierarchical level and a particular order within the level, and
in response to determining that the particular tag is constrained within the hierarchy to the particular hierarchical level and the particular order within the level, omitting an encoded representation of the particular tag from the encoded representation of said XML data;
wherein a constraint specified in the schema associated with the XML data specifies that a particular parent tag in the hierarchy must have only one reference in the hierarchy to a particular child tag followed by only one reference in the hierarchy to the particular tag;
wherein the encoded representation of said XML data:
includes an encoded representation of the particular parent tag,
does not include an encoded representation of the particular child tag, and
does not include an encoded representation of the particular tag;
wherein the method is performed by one or more computing devices.
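The schema-driven omission recited in this claim can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `SCHEMA` dictionary stands in for a real schema, the `("TAG", ...)`, `("VAL", ...)`, and `("END",)` tuples stand in for binary opcodes, and `encode` and `decode` are hypothetical helper names. When the schema fixes both the level and the order of the child tags, only the parent tag and the child values are written, and the decoder restores the child tags from the schema.

```python
# Hypothetical schema: each parent tag maps to the fixed, ordered sequence of
# child tags it must contain, each exactly once.
SCHEMA = {"address": ["street", "city"]}


def encode(parent, children):
    """Encode a parent element; children is a list of (tag, text) pairs."""
    expected = SCHEMA.get(parent)
    tags = [tag for tag, _ in children]
    if expected is not None and tags == expected:
        # Level and order are fully determined by the schema, so the child
        # tags are omitted and only their values are written.
        values = [("VAL", text) for _, text in children]
        return [("TAG", parent)] + values + [("END",)]
    # No usable constraint: every child tag is written explicitly.
    out = [("TAG", parent)]
    for tag, text in children:
        out += [("TAG", tag), ("VAL", text), ("END",)]
    return out + [("END",)]


def decode(tokens):
    """Rebuild (parent, children) from the token list produced by encode."""
    parent = tokens[0][1]
    expected = SCHEMA.get(parent)
    body = tokens[1:-1]
    compact = (
        expected is not None
        and len(body) == len(expected)
        and all(tok[0] == "VAL" for tok in body)
    )
    if compact:
        # Child tags were omitted; recover them from the schema by position.
        return parent, [(tag, tok[1]) for tag, tok in zip(expected, body)]
    # Explicit form: each child occupies three tokens (TAG, VAL, END).
    children = []
    for i in range(0, len(body), 3):
        children.append((body[i][1], body[i + 1][1]))
    return parent, children
```

In this sketch, encoding an `address` element with one `street` followed by one `city` yields a token for the parent tag, two value tokens, and an end token; neither child tag appears in the output, yet `decode` returns the original pairs.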
Abstract
A compact binary encoding technique for information that is logically hierarchically structured, such as XML data, maintains all of the features of XML data in a usable form, such as the hierarchical structure underlying the data. Hence, data encoded in this format can undergo XML-based processing on the fly as it is being received or fetched, as if the data were being processed linearly in its textual character-based format. Processing of data encoded in this format can begin without having to wait for and decode the entire data set. The overhead due to XML tags is greatly reduced. The encoded data can be processed more efficiently because the data is pre-parsed. Values may be stored in their native type formats and, therefore, processing of the encoded data avoids costly type conversions. Further, any available structural constraint information can be effectively exploited.
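Two properties described in the abstract, native-type value storage and processing that starts before the whole data set has arrived, can be sketched in Python as follows. This is only an illustration under stated assumptions: the opcode numbers, the byte layout, and the `write_events` and `read_events` names are hypothetical and are not the format defined by the patent. Integers are written as 4-byte binary values rather than decimal text, and the reader is a generator that yields each event as soon as its bytes have been read.

```python
import io
import struct

# Hypothetical one-byte opcodes; not taken from the patent.
OP_START, OP_INT, OP_TEXT, OP_END = 1, 2, 3, 4


def write_events(out, events):
    """Write ("start", name), ("int", n), ("text", s), ("end",) events."""
    for ev in events:
        kind = ev[0]
        if kind == "start":
            name = ev[1].encode("utf-8")
            out.write(struct.pack(">BH", OP_START, len(name)) + name)
        elif kind == "int":
            # Stored as a native big-endian 32-bit integer, not as text.
            out.write(struct.pack(">Bi", OP_INT, ev[1]))
        elif kind == "text":
            data = ev[1].encode("utf-8")
            out.write(struct.pack(">BH", OP_TEXT, len(data)) + data)
        elif kind == "end":
            out.write(struct.pack(">B", OP_END))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")


def read_events(stream):
    """Yield events one at a time, without reading the whole stream first."""
    while True:
        head = stream.read(1)
        if not head:
            return
        op = head[0]
        if op == OP_START:
            (n,) = struct.unpack(">H", stream.read(2))
            yield ("start", stream.read(n).decode("utf-8"))
        elif op == OP_INT:
            # No string-to-number conversion is needed on the reading side.
            (value,) = struct.unpack(">i", stream.read(4))
            yield ("int", value)
        elif op == OP_TEXT:
            (n,) = struct.unpack(">H", stream.read(2))
            yield ("text", stream.read(n).decode("utf-8"))
        elif op == OP_END:
            yield ("end",)
        else:
            raise ValueError(f"unknown opcode: {op}")
```

Because `read_events` is a generator, a consumer can act on the first `("start", ...)` event while later bytes are still unread, which mirrors the on-the-fly processing the abstract describes.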
84 Citations
9 Claims
1. A method for encoding XML data, the XML data comprising a plurality of tags in a hierarchy, the method comprising:
encoding said XML data to produce an encoded representation of said XML data that does not include an encoded representation of a particular tag of the plurality of tags;
wherein encoding the XML data comprises:
determining whether a schema that corresponds to the XML data constrains the particular tag within the hierarchy to a particular hierarchical level and a particular order within the level, and
in response to determining that the particular tag is constrained within the hierarchy to the particular hierarchical level and the particular order within the level, omitting an encoded representation of the particular tag from the encoded representation of said XML data;
wherein a constraint specified in the schema associated with the XML data specifies that a particular parent tag in the hierarchy must have only one reference in the hierarchy to a particular child tag followed by only one reference in the hierarchy to the particular tag;
wherein the encoded representation of said XML data:
includes an encoded representation of the particular parent tag,
does not include an encoded representation of the particular child tag, and
does not include an encoded representation of the particular tag;
wherein the method is performed by one or more computing devices.
2. A method for encoding XML data, the XML data comprising a plurality of tags in a hierarchy, the method comprising:
encoding the XML data to produce an encoded representation of said XML data;
wherein encoding the XML data includes:
determining whether the plurality of tags comprises a set of consecutive tags that are repeats of each other;
in response to determining that the plurality of tags comprises the set of consecutive tags that are repeats of each other, using, in the encoded representation, a first opcode and associated operand to represent the first tag in the set of consecutive tags, and using, in the encoded representation, a second opcode without an operand to represent all of the second tag and any subsequent tags in the set of consecutive tags;
wherein the second opcode indicates that all of the second tag and any subsequent tags are repeats of the first tag;
wherein the encoded representation, other than the second opcode, does not include any representation of any of the second and any subsequent tags of the set of consecutive tags;
wherein the method is performed by one or more computing devices.
Dependent claims: 8, 9
3. A volatile or non-volatile machine-readable non-transitory storage medium storing one or more sequences of instructions for encoding XML data, the XML data comprising one or more tags in a hierarchy, said instructions, when executed by one or more processors, cause the one or more processors to perform:
encoding said XML data to produce an encoded representation of said XML data that does not include an encoded representation of a particular tag of the plurality of tags;
wherein encoding the XML data comprises:
determining whether a schema that corresponds to the XML data constrains the particular tag within the hierarchy to a particular hierarchical level and a particular order within the level, and
in response to determining that the particular tag is constrained within the hierarchy to the particular hierarchical level and the particular order within the level, omitting an encoded representation of the particular tag from the encoded representation of said XML data;
wherein a constraint specified in the schema associated with the XML data specifies that a particular parent tag in the hierarchy must have only one reference in the hierarchy to a particular child tag followed by only one reference in the hierarchy to the particular tag;
wherein the encoded representation of said XML data:
includes an encoded representation of the particular parent tag,
does not include an encoded representation of the particular child tag, and
does not include an encoded representation of the particular tag.
Dependent claim: 4
5. A volatile or non-volatile machine-readable non-transitory storage medium storing one or more sequences of instructions for encoding XML data, the XML data comprising a plurality of tags in a hierarchy, said instructions, when executed by one or more processors, cause the one or more processors to perform:
encoding the XML data to produce an encoded representation of said XML data;
wherein encoding the XML data includes:
determining whether the plurality of tags comprises a set of consecutive tags that are repeats of each other;
in response to determining that the plurality of tags comprises the set of consecutive tags that are repeats of each other, using, in the encoded representation, a first opcode and associated operand to represent the first tag in the set of consecutive tags, and using, in the encoded representation, a second opcode without an operand to represent all of the second tag and any subsequent tags in the set of consecutive tags;
wherein the second opcode indicates that all of the second tag and any subsequent tags are repeats of the first tag;
wherein the encoded representation, other than the second opcode, does not include any representation of any of the second and any subsequent tags of the set of consecutive tags.
Dependent claims: 6, 7
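The repeat-opcode encoding recited in claims 2 and 5 can be sketched in Python as below. The sketch rests on several assumptions that are not stated in the claims: opcodes are modeled as tuples (`("TAG", name)` carries the tag as its operand, `("REPEAT",)` carries none), each element is a simple `(tag, text)` pair, a single `REPEAT` token covers the second and all later elements of a run, and `encode_run` and `decode_run` are hypothetical names.

```python
def encode_run(elements):
    """Encode sibling (tag, text) pairs, collapsing runs of repeated tags."""
    out = []
    prev_tag = None
    in_run = False
    for tag, text in elements:
        if tag == prev_tag:
            if not in_run:
                # Second opcode: no operand, marks the rest of the run as
                # repeats of the first tag.
                out.append(("REPEAT",))
                in_run = True
        else:
            # First opcode: carries the tag name as its operand.
            out.append(("TAG", tag))
            in_run = False
        out.append(("VAL", text))
        prev_tag = tag
    return out


def decode_run(tokens):
    """Rebuild the (tag, text) pairs from the tokens made by encode_run."""
    elements = []
    current = None
    pending = False    # a TAG has been read and still awaits its value
    repeating = False  # a REPEAT is in effect for the current tag
    for tok in tokens:
        op = tok[0]
        if op == "TAG":
            current, pending, repeating = tok[1], True, False
        elif op == "REPEAT":
            if current is None:
                raise ValueError("REPEAT before any TAG")
            repeating = True
        elif op == "VAL":
            if not (pending or repeating):
                raise ValueError("value without a tag")
            elements.append((current, tok[1]))
            pending = False
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    return elements
```

For three consecutive `item` elements followed by one `note` element, the tag name `item` appears once in the encoded tokens; the second and third `item` elements are represented only by the operand-free `REPEAT` token and their values.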
Specification