Byte stream organization with improved random and keyed access to information structures

US 7,216,127 B2
Filed: 12/13/2003
Issued: 05/08/2007
Est. Priority Date: 12/13/2003
Status: Expired due to Fees

Abstract

The invention improves processing time when accessing information in a byte stream and avoids the step of deserializing unneeded portions of the byte stream when the byte stream encodes an information structure corresponding to a schema with arbitrarily nested lists and tuples. It facilitates efficient keyed access when lists of tuples represent tables with key columns by storing tables in nested column order, which extends the well-known concept of column-order so as to apply to arbitrarily nested tables. Using well-known offset calculation techniques within the nested lists that result from nested column order, the invention achieves greater efficiency by grouping together all scalar information items that correspond to the same node in a tree representation of the schema.
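
The idea of nested column order can be illustrated with a minimal Python sketch. It is not the patented serializer: the function names `to_column_order` and `count_leaves`, the `"scalar"` marker, and the sample order data are all assumptions made for illustration. It shows only how a list of tuples containing a nested list of tuples can be regrouped so that all scalar items belonging to the same schema leaf sit together.

```python
def count_leaves(schema):
    """Number of scalar leaves in a (possibly nested) tuple schema."""
    return sum(1 if f == "scalar" else count_leaves(f) for f in schema)


def to_column_order(rows, schema):
    """Regroup a list of tuples (rows) into one list per schema leaf.

    `schema` is a tuple whose entries are either the string "scalar" or a
    nested tuple describing the fields of a nested list of tuples.
    (Hypothetical schema encoding, chosen for brevity.)
    """
    columns = []
    for i, field in enumerate(schema):
        values = [row[i] for row in rows]
        if field == "scalar":
            columns.append(values)
        else:
            # Nested table: convert each row's nested list on its own, then
            # regroup so every nested column becomes a list of lists with
            # one inner list per outer row.
            per_row = [to_column_order(nested, field) for nested in values]
            for j in range(count_leaves(field)):
                columns.append([cols[j] for cols in per_row])
    return columns


# Orders with a nested table of (sku, quantity) line items.
orders = [
    (1, "ann", [("a", 2), ("b", 1)]),
    (2, "bob", [("c", 5)]),
]
order_schema = ("scalar", "scalar", ("scalar", "scalar"))
columns = to_column_order(orders, order_schema)
```

In this sketch the order ids form one list, the names a second, and the nested `sku` and quantity fields each become a list of lists, which is one way to read "nested column order".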

67 Citations

20 Claims

1. A method comprising organizing a byte stream of an information structure, said information structure having a schema and an in-memory representation, said schema having a schema tree representation with a plurality of schema nodes, said schema nodes including at least one leaf and at least one interior node, the step of organizing comprising the steps of:
- computing a layout from the schema tree representation by depth-first enumeration of leaf nodes of the schema;
  
  serializing the byte stream from the in-memory representation while grouping together all scalar items from the in-memory representation corresponding to each schema node, wherein the step of serializing the byte stream further comprises the steps of:
  
  retrieving a location in the byte stream for an element of the in-memory representation corresponding to a first schema leaf node in depth first order from the layout;
  
  converting the element to bytes in the byte stream according to a number of elements corresponding to the schema leaf node, storing a result during said converting the element; and
  
  accessing information from the byte stream by using the layout and offset calculations, wherein the step of accessing information further comprises the steps of:
  
  scanning a list of key values representing a table column serialized within the byte stream to determine an index position; and
  
  using the index position in conjunction with offset calculations and offset tables serialized at the start of lists within the byte stream to find information in lists representing non-key table columns.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method as recited in claim 1, wherein the information structure is a message.
  - 3. The method as recited in claim 1, wherein the step of computing a layout comprises:
    - establishing a fixed length portion of the byte stream, the fixed length portion having a slot for each enumerated schema leaf node; and
      
      establishing a varying length portion of the byte stream following the fixed length portion, the varying length portion having successive areas for any information items requiring varying length encoding.
  - 4. The method as recited in claim 1, wherein the step of computing the layout comprises:
    - establishing a fixed length portion of the byte stream, the fixed length portion having a slot for each enumerated schema leaf node having a predecessor in depth-first numbering requiring varying length encoding; and
      
      establishing a varying length portion of the byte stream following the fixed length portion, the varying length portion having successive areas for said each enumerated schema leaf node.
  - 5. The method as recited in claim 1 wherein the interior nodes of said schema tree representation are restricted to list and tuple nodes, and the leaf nodes comprise scalar types and dynamic types.
  - 6. The method as recited in claim 1, wherein the step of serializing the byte stream comprises:
    - determining a correspondence between the in-memory representation and the schema tree representation;
      
      initializing the byte stream by reserving a fixed length portion and pointing to a beginning of a variable length portion; and
      
      repeating the steps of retrieving and converting for all schema leaf nodes in depth-first order.
  - 7. The method as recited in claim 1, wherein the step of converting elements to bytes comprises recording a nested list of tuples in column order rather than row order, resulting in a set of nested lists.
  - 8. The method as recited in claim 1, wherein the step of converting elements to bytes comprises preceding each list of varying length items with an offset table allowing any element of said each list to be reached in constant time from a head of said each list.
  - 9. The method as recited in claim 1, wherein the schema tree representation is derived from a schema graph representation by truncating recursive definitions and variants and replacing truncated sub-trees with leaf nodes of a dynamic type.
  - 10. The method as recited in claim 1, further comprising performing a preliminary reorganization of the schema to distribute tuples over variants prior to carrying out the steps of computing, serializing and accessing.
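
The layout computation of claims 1 and 3 can be sketched roughly as follows. This is a simplified reading, not the patent's implementation: the `Node` class, the `FIXED_WIDTH` table, the 4-byte `OFFSET_SLOT`, and the rule that any leaf beneath a list node is treated as varying length are all assumptions made for this example. Each leaf, visited in depth-first order, receives one slot in the fixed length portion; a varying-length leaf's slot would hold an offset into the varying length portion that follows.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "tuple", "list", or a scalar kind
    children: list = field(default_factory=list)


FIXED_WIDTH = {"int32": 4, "int64": 8, "float64": 8}  # assumed scalar sizes
OFFSET_SLOT = 4                    # assumed size of an offset slot, in bytes


def leaves_depth_first(node, under_list=False):
    """Yield (leaf, under_list) pairs in depth-first order."""
    if not node.children:
        yield node, under_list
        return
    for child in node.children:
        yield from leaves_depth_first(child, under_list or node.kind == "list")


def compute_layout(root):
    """Assign each leaf a slot in the fixed length portion.

    Returns the per-leaf layout and the total size of the fixed portion,
    which is also where the varying length portion would begin.
    """
    layout, position = [], 0
    for index, (leaf, under_list) in enumerate(leaves_depth_first(root)):
        varying = under_list or leaf.kind not in FIXED_WIDTH
        width = OFFSET_SLOT if varying else FIXED_WIDTH[leaf.kind]
        layout.append({"leaf": index, "kind": leaf.kind,
                       "slot": position, "width": width, "varying": varying})
        position += width
    return layout, position


# tuple(int32, string, list(tuple(int64, string)))
schema = Node("tuple", [
    Node("int32"),
    Node("string"),
    Node("list", [Node("tuple", [Node("int64"), Node("string")])]),
])
layout, fixed_size = compute_layout(schema)
```

Because the layout depends only on the schema, a reader holding the same schema can compute the same slot positions and jump to a given leaf without decoding the leaves that precede it.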

11. A computer program product stored in a computer readable storage medium having stored thereon a sequence of instructions which, when executed by a processor, causes the processor to organize a byte stream of an information structure, wherein the computer program product executes the steps of:
- computing a layout from a schema tree representation by depth-first enumeration of leaf nodes of the schema;
  
  serializing the byte stream from an in-memory representation while grouping together all scalar items from the in-memory representation corresponding to each schema node, wherein the step of serializing the byte stream further comprises the steps of:
  
  retrieving a location in the byte stream for an element of the in-memory representation corresponding to a first schema leaf node in depth first order from the layout;
  
  converting the element to bytes in the byte stream according to a number of elements corresponding to the schema leaf node, storing a result during said converting the element; and
  
  accessing information from the byte stream by using the layout and offset calculations, wherein the step of accessing information further comprises the steps of:
  
  scanning a list of key values representing a table column serialized within the byte stream to determine an index position; and
  
  using the index position in conjunction with offset calculations and offset tables serialized at the start of lists within the byte stream to find information in lists representing non-key table columns.
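
The keyed access steps recited above can be sketched in Python as follows. The byte format here is an assumption invented for the example (little-endian 32-bit counts, a key column of 32-bit integers, and a string column preceded by a table of `count + 1` offsets), and the function names are hypothetical. The sketch scans only the key column to find an index position, then uses the offset table at the head of the non-key column to read one value without decoding the rest.

```python
import struct


def encode_keys(keys):
    """Key column: element count, then fixed-width 32-bit keys."""
    return struct.pack(f"<I{len(keys)}i", len(keys), *keys)


def encode_strings(items):
    """Non-key column: count, offset table (count + 1 entries), then data."""
    blobs = [s.encode("utf-8") for s in items]
    offsets, pos = [0], 0
    for blob in blobs:
        pos += len(blob)
        offsets.append(pos)
    header = struct.pack(f"<I{len(offsets)}I", len(blobs), *offsets)
    return header + b"".join(blobs)


def find_index(buf, keys_at, wanted):
    """Scan the serialized key column; return the index position or None."""
    (count,) = struct.unpack_from("<I", buf, keys_at)
    for i in range(count):
        (key,) = struct.unpack_from("<i", buf, keys_at + 4 + 4 * i)
        if key == wanted:
            return i
    return None


def string_at(buf, list_at, index):
    """Read element `index` directly via the offset table."""
    (count,) = struct.unpack_from("<I", buf, list_at)
    start, end = struct.unpack_from("<2I", buf, list_at + 4 + 4 * index)
    data_at = list_at + 4 + 4 * (count + 1)   # data follows the offset table
    return buf[data_at + start:data_at + end].decode("utf-8")


def lookup(buf, keys_at, values_at, wanted):
    i = find_index(buf, keys_at, wanted)
    return None if i is None else string_at(buf, values_at, i)


key_bytes = encode_keys([10, 20, 30])
buf = key_bytes + encode_strings(["ten", "twenty", "thirty"])
found = lookup(buf, 0, len(key_bytes), 20)
```

The key scan is linear in the number of rows, but each probe touches only fixed-width key bytes, and the final read of the non-key value is a constant-time offset calculation.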

12. An apparatus comprising a serializer/deserializer for a byte stream form of an information structure, said information structure having a schema and an in-memory representation, said schema having a schema tree representation with a plurality of schema nodes, said schema nodes including at least one leaf and at least one interior node, the serializer/deserializer comprising:
- a processor for computing a layout from the schema tree representation by depth-first enumeration of leaf nodes of the schema;
  
  a serializer for serializing the byte stream from the in-memory representation while grouping together all scalar items from the in-memory representation corresponding to each schema node, wherein the serializer further comprises a lookup module to retrieve a location in the byte stream for an element of the in-memory representation corresponding to a first schema leaf node in depth first order from the layout;
  
  a converter to convert the element to bytes in the byte stream according to a number of elements corresponding to the schema leaf node, wherein all schema leaf nodes are retrieved and converted in depth-first order, storing a result during said converting the element; and
  
  a selective de-serializer for accessing information from the byte stream by using the layout and offset calculations, wherein the selective de-serializer scans a list of key values representing a table column serialized within the byte stream to determine an index position, and uses the index position in conjunction with offset calculations and offset tables serialized at the start of lists within the byte stream to find information in lists representing non-key table columns.
- Dependent claims (13, 14, 15, 16, 17, 18, 19)
  - 13. The apparatus as recited in claim 12, wherein the processor comprises:
    - a module for establishing a fixed length portion of the byte stream, the fixed length portion having a slot for each enumerated schema leaf node; and
      
      for establishing a varying length portion of the byte stream following the fixed length portion, the varying length portion having successive areas for any information items requiring varying length encoding.
  - 14. The apparatus as recited in claim 12, wherein the processor comprises:
    - a module for establishing a fixed length portion of the byte stream, the fixed length portion having a slot for each enumerated schema leaf node having a predecessor in depth-first numbering requiring varying length encoding; and
      
      for establishing a varying length portion of the byte stream following the fixed length portion, the varying length portion having successive areas for said each enumerated schema leaf node.
  - 15. The apparatus as recited in claim 12, wherein the serializer comprises:
    - a reconciling module to determine a correspondence between the in-memory representation and the schema tree representation;
      
      an initialization module to initialize the byte stream by reserving a fixed length portion and pointing to a beginning of a variable length portion.
  - 16. The apparatus as recited in claim 12, wherein the converter comprises a recorder to record a nested list of tuples in column order rather than row order, resulting in a set of nested lists.
  - 17. The apparatus as recited in claim 12, wherein the converter precedes each list of varying length items with an offset table allowing any element of said each list to be reached in constant time from a head of said each list.
  - 18. The apparatus as recited in claim 12, wherein the schema tree representation is derived from a schema graph representation by truncating recursive definitions and variants and replacing them with leaf nodes of dynamic type.
  - 19. The apparatus as recited in claim 12, wherein a preliminary reorganization of the schema is performed to distribute tuples over variants prior to carrying out the remaining steps.
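
Claims 9 and 18 describe deriving the schema tree from a schema graph by truncation. One possible reading is sketched below in Python. The graph encoding (a dict mapping a definition name to a `(kind, body)` pair), the function name `to_schema_tree`, and the `("dynamic",)` leaf marker are assumptions for illustration only. A reference back to a definition already on the current expansion path, and any variant, is cut off and replaced by a dynamic leaf, so the result is always a finite tree.

```python
def to_schema_tree(graph, name, path=()):
    """Expand the named definition of a schema graph into a finite tree.

    Variants, and references to a definition already being expanded
    (recursion), are truncated and replaced by a ("dynamic",) leaf.
    """
    if name in path:
        return ("dynamic",)            # recursive definition: truncate
    kind, body = graph[name]
    if kind == "variant":
        return ("dynamic",)            # variant: truncate
    if kind == "scalar":
        return ("scalar", body)
    # "tuple" and "list" are the only interior kinds in this sketch; their
    # body is a list of referenced definition names.
    children = [to_schema_tree(graph, child, path + (name,)) for child in body]
    return (kind, children)


# A person has a name, a list of friends (recursive), and a contact that is
# either a name or a phone number (variant). Hypothetical sample schema.
graph = {
    "Person": ("tuple", ["Name", "Friends", "Contact"]),
    "Name": ("scalar", "string"),
    "Friends": ("list", ["Person"]),
    "Contact": ("variant", ["Name", "Phone"]),
    "Phone": ("scalar", "int64"),
}
tree = to_schema_tree(graph, "Person")
```

Here the recursive `Friends` list keeps its list node but its element becomes a dynamic leaf, and the `Contact` variant collapses to a dynamic leaf outright.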

20. A computer program product stored in a computer readable storage medium having stored thereon a sequence of instructions which, when executed by a processor, causes the processor to organize a byte stream form of an information structure, wherein the computer program product executes the steps of:
- computing a layout from a schema tree representation by depth-first enumeration of leaf nodes of the schema;
  
  serializing the byte stream from an in-memory representation while grouping together all scalar items from the in-memory representation corresponding to each schema node, wherein the step of serializing the byte stream further comprises the steps of:
  
  retrieving a location in the byte stream for an element of the in-memory representation corresponding to a first schema leaf node in depth first order from the layout;
  
  converting the element to bytes in the byte stream according to a number of elements corresponding to the schema leaf node, storing a result during said converting the element; and
  
  accessing information from the byte stream by using the layout and offset calculations, wherein a selective de-serializer scans a list of key values representing a table column serialized within the byte stream to determine an index position, and using the index position in conjunction with offset calculations and offset tables serialized at the start of lists within the byte stream to find information in lists representing non-key table columns.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Auerbach, Joshua S.
Primary Examiner(s): Cottingham, John
Assistant Examiner(s): Pham, Michael
Application Number: US10/738,377
Publication Number: US 20050131917A1
Time in Patent Office: 1,242 Days
Field of Search: 707/100, 707/2, 707/3, 707/102, 707/206, 341/100, 715/502, 711/154, 711/170
US Class Current: 707/741
CPC Class Codes:
G06F 16/2246   Trees, e.g. B+trees
Y10S 707/99932   Access augmentation or opti...
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99943   Generating database or data...
Y10S 707/99953   Recoverability
