Byte stream organization with improved random and keyed access to information structures
First Claim
1. A method comprising organizing a byte stream of an information structure, said information structure having a schema and an in-memory representation, said schema having a schema tree representation with a plurality of schema nodes, said schema nodes including at least one leaf and at least one interior node, the step of organizing comprising the steps of:
- computing a layout from the schema tree representation depth-first enumeration of leaf nodes of the schema;
serializing the byte stream from the in-memory representation while grouping together all scalar items from the in-memory representation corresponding to each schema node, wherein the step of serializing the byte stream further comprises the steps of;
retrieving a location in the byte stream for an element of the in-memory representation corresponding to a first schema leaf node in depth first order from the layout;
converting the element to bytes in the byte stream according to a number of elements corresponding to the schema leaf node, storing a result during said converting the element; and
accessing information from the byte stream by using the layout and offset calculations, wherein the step of accessing information further comprises the steps of;
scanning a list of key values representing a table column serialized within the byte stream to determine an index position; and
using the index position in conjunction with offset calculations and offset tables serialized at the start of lists within the byte stream to find information in lists representing non-key table columns.
1 Assignment
0 Petitions
Accused Products
Abstract
The invention improves processing time when accessing information in a byte stream and avoids the step of deserializing unneeded portions of the byte stream when the byte stream encodes an information structure corresponding to a schema with arbitrarily nested lists and tuples. It facilitates efficient keyed access when lists of tuples represent tables with key columns by storing tables in nested column order, which extends the well-known concept of column-order so as to apply to arbitrarily nested tables. Using well-known offset calculation techniques within the nested lists that result from nested column order, the invention achieves greater efficiency by grouping together all scalar information items that correspond to the same node in a tree representation of the schema.
67 Citations
20 Claims
-
1. A method comprising organizing a byte stream of an information structure, said information structure having a schema and an in-memory representation, said schema having a schema tree representation with a plurality of schema nodes, said schema nodes including at least one leaf and at least one interior node, the step of organizing comprising the steps of:
-
computing a layout from the schema tree representation depth-first enumeration of leaf nodes of the schema; serializing the byte stream from the in-memory representation while grouping together all scalar items from the in-memory representation corresponding to each schema node, wherein the step of serializing the byte stream further comprises the steps of; retrieving a location in the byte stream for an element of the in-memory representation corresponding to a first schema leaf node in depth first order from the layout; converting the element to bytes in the byte stream according to a number of elements corresponding to the schema leaf node, storing a result during said converting the element; and accessing information from the byte stream by using the layout and offset calculations, wherein the step of accessing information further comprises the steps of; scanning a list of key values representing a table column serialized within the byte stream to determine an index position; and using the index position in conjunction with offset calculations and offset tables serialized at the start of lists within the byte stream to find information in lists representing non-key table columns. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A computer program product stored in a computer readable storage medium having stored thereon a sequence of instructions which, when executed by a processor, causes the processor to organize a byte stream of an information structure, wherein the computer program product executes the steps of:
-
computing a layout from a schema tree representation by depth-first enumeration of leaf nodes of the schema serializing the byte stream from a in-memory representation while grouping togther all scalar items froms the in-memory representation corresponding to each schema node, wherein the step of serializing the byte stream further comprises the steps of; retrieving a location in the byte stream for an element of the in-memory represention corresponding to a first schema leaf node in depth first order from the layout; comverting the element to bytes in the byte stream according to a number of elements corresponding to the schema leaf node, storing a result during said converting the element; and accessing information from the byte stream by using thelayout and offset calculations, wherein the step of accessing information further comprises the steps of; scanning a list of key values representing a table column serialized within the byte stream to determine an index position; and using the index position in conjunction with offset calculation and offset tables serialized at the start of lists within the byte stream to find information in lists representing non-key table columns.
-
-
12. An apparatus comprising a serializer/deseralizer for a byte stream form of an information structure, said information structure having a schema and an in-memory representation, said schema having a schema tree representation with a plurality of schema nodes, said schema nodes including at least one leaf and at least one interior node, the serializer/deserializer comprising:
-
a processor for computing a layout from the schema tree representation by depth-first enumeration of leaf nodes of the schema; a serializer for serializing the byte stream from the in-memory representation while grouping together all scalar items from the in-memory representation corresponding to each schema node, wherein the serializer further comprises a lookup module to retrieve a location in the byte stream for an element of the in-memory representation corresponding to a first schema leaf node in depth first order from the layout; a converter to convert the element to bytes in the byte stream according to a number of elements corresponding to the schema leaf node, wherein all schema leaf nodes are retrieved and converted in depth-first order, storing a result during said converting the element; and a selective de-serializer for accessing information from the byte stream by using the layout and offset calculations, wherein the selective de-serializer scans a list of key values representing a table column serialized within the byte stream to determine an index position, and uses the index position in conjunction with offset calculations and offset tables serialized at the start of lists within the byte stream to find information in lists representing non-key table columns. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19)
-
-
20. A computer program product stored in a computer readable storage medium having stored thereon a sequence of instructions which, when executed by a processor, causes the processor to organize a byte stream form of an information structure, wherein the computer program product executes the steps of:
-
computing a layout from a schema tree representation by depth-first enumeration of leaf nodes of the schema; serializing the byte stream from a in-memory representation while grouping together all scalar items from the in-memory representation corresponding to each schema node, wherein the step of serializing the byte stream further comprises the steps of; retrieving a location in the byte stream for an element of the in-memory representation corresponding to a first schema leaf node in depth first order from the layout; converting the element to bytes in the byte stream according to a number of elements corresponding to the schema leaf node, storing a result during said converting the element; and accessing information from the byte stream by using the layout and offset calculations, wherein a selective de-serializer scans a list of key values representing a table column serialized within the byte stream to determine an index position, and using the index position in conjunction with offset calculations and offset tables serialized at the start of lists within the byte stream to find information in lists representing non-key table columns.
-
Specification