Document fidelity with binary XML storage
First Claim
1. A method comprising:
receiving Extensible Markup Language (XML) data that is not encoded in a binary format;
in response to receiving the XML data, using a binary encoding technique to encode the XML data in a binary format to create a binary version of the XML data that is lexically equivalent to the XML data, wherein each character in the XML data is represented in the binary version;
wherein using a binary encoding technique comprises performing at least one of:
encoding one or more unnecessary whitespace characters that are in a particular element tag in the XML data,
encoding a carriage return character and a line feed character that appear consecutively in the XML data, or
encoding an empty XML element that comprises an opening tag and a closing tag by encoding the opening tag and the closing tag in the binary format;
storing the binary version of the XML data in persistent storage;
wherein the method is performed by one or more computing devices.
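The claim requires that every character of the input be represented in the binary version. That includes whitespace inside a tag, a CR LF pair, and both tags of a two-tag empty element.

The following is a minimal sketch in Python. It assumes a hypothetical token-stream layout in which each token is stored as a kind byte, a 4-byte length, and then the token's UTF-8 bytes. This layout is not the format of any particular product. A production binary XML encoder would usually also replace tag names with compact tokens, which this sketch omits for brevity. The names `encode_lexical` and `decode_lexical` are illustrative only.

```python
import re
import struct

# Hypothetical token kinds for this sketch; not a real storage format.
MARKUP, TEXT = 0, 1

# Comments and CDATA come first so a '>' inside them does not end the token early;
# the trailing '<' alternative catches a stray '<' so no character is ever skipped.
_TOKEN = re.compile(r"<!--.*?-->|<!\[CDATA\[.*?\]\]>|<[^>]*>|[^<]+|<", re.DOTALL)


def encode_lexical(xml: str) -> bytes:
    """Encode every character of `xml` as a stream of length-prefixed tokens.

    Layout per token: 1-byte kind, 4-byte big-endian length, UTF-8 bytes.
    Markup is stored verbatim, so whitespace inside tags, CR LF pairs,
    two-tag empty elements and entity references all survive unchanged.
    """
    out = bytearray()
    for m in _TOKEN.finditer(xml):
        raw = m.group(0).encode("utf-8")
        kind = MARKUP if raw.startswith(b"<") else TEXT
        out += struct.pack(">BI", kind, len(raw))
        out += raw
    return bytes(out)


def decode_lexical(blob: bytes) -> str:
    """Rebuild the original text by concatenating the stored tokens in order."""
    parts, pos = [], 0
    while pos < len(blob):
        _kind, length = struct.unpack_from(">BI", blob, pos)  # kind not needed to rebuild text
        pos += 5
        parts.append(blob[pos:pos + length].decode("utf-8"))
        pos += length
    return "".join(parts)


doc = '<root >\r\n  <empty></empty>\r\n  <v>&amp;&#65;</v>\r\n</root>'
assert decode_lexical(encode_lexical(doc)) == doc  # lexically equivalent round trip
```

The encoder never rewrites a token, so the round trip in the last line holds for any input string. That property is the lexical equivalence the claim describes.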
Abstract
Techniques are provided for ensuring lexical fidelity when an XML document is stored in a binary format. Operations on the XML data that would cause a loss of lexical fidelity between the original XML document and its binary-encoded version are not performed. Such operations include removing unnecessary whitespace characters, performing certain data type conversions, CRLF normalization, “collapsing” a two-tag empty element into a single-tag empty element, and replacing entity references or numeric character references with another value. An XML schema to which the XML document conforms may indicate that the document is to be stored in a lexical fidelity mode. Additionally or alternatively, the database statement that, when executed, causes the XML document to be stored in a binary format may so indicate.
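To see why these operations break lexical fidelity, it helps to apply them and observe that distinct inputs collapse to the same output. The sketch below is a minimal Python illustration. The function name `normalize_lossy` is hypothetical, and each step is a simplified regex rewrite rather than a conforming XML processor. The steps cover:

- CRLF normalization
- removal of whitespace before a tag's closing `>`
- empty-element collapsing
- replacement of references in text

Data type conversions are omitted.

```python
import re

# XML's five predefined entities plus decimal and hexadecimal character references.
_PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}
_REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|lt|gt|amp|apos|quot);")


def _replace_ref(m: re.Match) -> str:
    """Replace one reference with its character, re-escaping only '<' and '&'."""
    name = m.group(1)
    if name.startswith("#x"):
        ch = chr(int(name[2:], 16))
    elif name.startswith("#"):
        ch = chr(int(name[1:]))
    else:
        ch = _PREDEFINED[name]
    return {"<": "&lt;", "&": "&amp;"}.get(ch, ch)


def normalize_lossy(xml: str) -> str:
    """Apply rewrites a non-fidelity encoder might perform (regex sketch only)."""
    # CRLF normalization: each CR LF pair becomes a single LF.
    xml = xml.replace("\r\n", "\n")
    # Drop whitespace immediately before '>' or '/>' inside a tag.
    xml = re.sub(r"<([^<>]*?)\s+(/?>)", r"<\1\2", xml)
    # Collapse a two-tag empty element <a ...></a> into <a .../>.
    xml = re.sub(r"<([\w:.-]+)((?:\s[^<>]*)?)></\1>", r"<\1\2/>", xml)
    # Replace references in text between tags with their character values.
    return re.sub(r">([^<]*)<",
                  lambda m: ">" + _REF.sub(_replace_ref, m.group(1)) + "<", xml)


a = '<root >\r\n<e></e><v>&#65;</v></root>'
b = '<root>\n<e/><v>A</v></root>'
assert a != b and normalize_lossy(a) == normalize_lossy(b) == b
```

Because `a` and `b` differ lexically yet normalize to the same string, the original characters of `a` cannot be recovered from the result. A lexical fidelity mode avoids exactly this kind of loss by skipping such rewrites.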
26 Claims
1. A method comprising:
receiving Extensible Markup Language (XML) data that is not encoded in a binary format; in response to receiving the XML data, using a binary encoding technique to encode the XML data in a binary format to create a binary version of the XML data that is lexically equivalent to the XML data, wherein each character in the XML data is represented in the binary version; wherein using a binary encoding technique comprises performing at least one of: encoding one or more unnecessary whitespace characters that are in a particular element tag in the XML data, encoding a carriage return character and a line feed character that appear consecutively in the XML data, or encoding an empty XML element that comprises an opening tag and a closing tag by encoding the opening tag and the closing tag in the binary format; storing the binary version of the XML data in persistent storage; wherein the method is performed by one or more computing devices.
Dependent claims: 2-12.
13. A method comprising:
determining whether a lexical fidelity mode for binary storage has been selected for an XML document that is to be stored in a binary format, wherein if the lexical fidelity mode has been selected for a particular XML document, then each character in the particular XML document must be represented in a binary-encoded representation of the particular XML document; if the lexical fidelity mode has been selected for the XML document, then using a first binary encoding technique to generate an encoded representation of the XML document, that represents all characters in the XML document; and if the lexical fidelity mode has not been selected for the XML document, then using a second binary encoding technique to generate an encoded representation of the XML document, that does not represent all characters in the XML document, wherein the second binary encoding technique is different than the first binary encoding technique; wherein the method is performed by one or more computing devices.
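Claim 13 turns on a single decision: whether lexical fidelity mode applies to the document being stored. The following is a minimal Python sketch built on several assumptions:

- Request flags are hypothetical and stand for the two sources the abstract mentions, the XML schema and the storing database statement.
- Either flag alone is enough to select the mode.
- Both "techniques" are deliberately trivial stand-ins. The first keeps the UTF-8 text unchanged. The second applies CRLF normalization and drops whitespace before a tag's closing `>`.
- A leading byte records which technique produced the blob.

None of these names come from the patent.

```python
import re
from dataclasses import dataclass

# Hypothetical one-byte headers recording which technique produced the blob.
FIDELITY_TECHNIQUE = 0x01
COMPACT_TECHNIQUE = 0x02


@dataclass
class StoreRequest:
    xml: str
    schema_requests_fidelity: bool = False     # hypothetical schema-level annotation
    statement_requests_fidelity: bool = False  # hypothetical option on the storing statement


def fidelity_mode_selected(req: StoreRequest) -> bool:
    # Assumption: either source is sufficient to select lexical fidelity mode.
    return req.schema_requests_fidelity or req.statement_requests_fidelity


def encode_all_characters(xml: str) -> bytes:
    """First technique (stand-in): every character of the input is kept."""
    return bytes([FIDELITY_TECHNIQUE]) + xml.encode("utf-8")


def encode_compact(xml: str) -> bytes:
    """Second technique (stand-in): drops characters that are not data-significant."""
    xml = xml.replace("\r\n", "\n")                     # CRLF normalization
    xml = re.sub(r"<([^<>]*?)\s+(/?>)", r"<\1\2", xml)  # whitespace before '>' in tags
    return bytes([COMPACT_TECHNIQUE]) + xml.encode("utf-8")


def encode_for_storage(req: StoreRequest) -> bytes:
    """Choose the encoding technique based on whether fidelity mode is selected."""
    if fidelity_mode_selected(req):
        return encode_all_characters(req.xml)
    return encode_compact(req.xml)
```

For example, `encode_for_storage(StoreRequest('<a >x</a>\r\n'))` yields `b"\x02<a>x</a>\n"`, losing the space and the carriage return. Setting either fidelity flag instead preserves the input exactly after the one-byte header.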
14. One or more storage media storing instructions which, when executed by one or more processors, cause:
receiving Extensible Markup Language (XML) data that is not encoded in a binary format; in response to receiving the XML data, using a binary encoding technique to encode the XML data in a binary format to create a binary version of the XML data that is lexically equivalent to the XML data, wherein each character in the XML data is represented in the binary version; wherein using a binary encoding technique comprises performing at least one of: encoding one or more unnecessary whitespace characters that are in a particular element tag in the XML data, encoding a carriage return character and a line feed character that appear consecutively in the XML data, or encoding an empty XML element that comprises an opening tag and a closing tag by encoding the opening tag and the closing tag in the binary format; storing the binary version of the XML data in persistent storage.
Dependent claims: 15-25.
26. One or more storage media storing instructions which, when executed by one or more processors, cause:
determining whether a lexical fidelity mode for binary storage has been selected for an XML document that is to be stored in a binary format, wherein if the lexical fidelity mode has been selected for a particular XML document, then each character in the particular XML document must be represented in a binary-encoded representation of the particular XML document; if the lexical fidelity mode has been selected for the XML document, then using a first binary encoding technique to generate an encoded representation of the XML document, that represents all characters in the XML document; and if the lexical fidelity mode has not been selected for the XML document, then using a second binary encoding technique to generate an encoded representation of the XML document, that does not represent all characters in the XML document, wherein the second binary encoding technique is different than the first binary encoding technique; wherein the method is performed by one or more computing devices.
Specification