Scalable storage and processing of hierarchical documents
Abstract
Large messages in the form of hierarchically structured documents are processed in a streaming fashion using the ultimate consumer read requests as the driving force for the processing. The messages are partitioned into fixed length segments. The segments are processed in pipeline fashion. This processing chain includes simulating random access of hierarchical documents using stream transformations, mapping streams to a transport's native capabilities, composing streams into chains and using pipeline processing on the chains, staging fragments into a database and routing messages when complete messages have been formed, and providing tools to allow the end user to inspect partial messages.
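The consumer-driven processing described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `read_segments`, `decode_segments`, `ChainReader`, and the 8-byte `SEGMENT_SIZE` are hypothetical, and the "decoding" stage is a stand-in (uppercasing) chosen only to make the data flow visible. The point it shows is that stages composed into a chain do no work until the final consumer issues a read request.

```python
import io
from typing import BinaryIO, Iterator

SEGMENT_SIZE = 8  # hypothetical fixed segment length, in bytes


def read_segments(source: BinaryIO, size: int = SEGMENT_SIZE) -> Iterator[bytes]:
    """Yield fixed-length segments from the source; the last one may be shorter."""
    while True:
        chunk = source.read(size)
        if not chunk:
            return
        yield chunk


def decode_segments(segments: Iterator[bytes]) -> Iterator[bytes]:
    """Stand-in decoding stage (assumption): uppercase each segment."""
    for segment in segments:
        yield segment.upper()


class ChainReader:
    """Expose a chain of stages as a readable stream.

    Upstream stages are pulled only when read() needs more bytes,
    so the consumer's read requests drive the whole chain.
    """

    def __init__(self, stages: Iterator[bytes]) -> None:
        self._stages = stages
        self._buffer = b""

    def read(self, n: int) -> bytes:
        # Pull just enough segments to satisfy this request.
        while len(self._buffer) < n:
            segment = next(self._stages, None)
            if segment is None:
                break
            self._buffer += segment
        out, self._buffer = self._buffer[:n], self._buffer[n:]
        return out


source = io.BytesIO(b"<a><b>hello</b></a>")
reader = ChainReader(decode_segments(read_segments(source)))
head = reader.read(5)  # pulls only the first 8-byte segment from the source
```

After the single `read(5)` call, the source has advanced by one segment only; the remaining segments are read and decoded when the consumer asks for more.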
25 Claims
1. A method, implemented at least in part on a computing device, for processing a data stream embodying a hierarchically structured document, said method comprising:
querying the hierarchical structure of the document embodied in the data stream;
determining an offset of the data stream, the offset determined during the querying, and the offset including one or more bits;
partitioning said data stream into respective fixed length segments utilizing said queried hierarchical structure and the offset to determine a respective length of each fixed length segment;
processing said fixed length segments in a pipeline fashion, the processing the fixed length segments including decoding the fixed length segments;
parsing the decoded fixed length segments;
partitioning the parsed fixed length segments into fragments, the fragments having at least one size, the at least one fragment size being cached in a memory of the computing device;
inserting database persistence boundaries between the fragments;
storing the fragments in a storage medium, the storage medium including the database and the at least one fragment size being determined in accordance with characteristics of the database, the characteristics including a native unit for holding data;
creating a database table, the table comprising;
meta data associated with the document;
queries over the document and respective results;
a first fragment of the document;
sizes of all fragments of the document other than the first fragment; and
storing the database table in the database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer-readable storage medium having computer-executable instructions stored thereon, the instructions when executed by a processor causing the processor to implement a method for processing a data stream embodying a hierarchically structured document, the method comprising:
querying the hierarchical structure of the document embodied in the data stream;
determining an offset of the data stream, the offset determined during the querying, and the offset including one or more bits;
partitioning said data stream into respective fixed length segments utilizing said queried hierarchical structure and the offset to determine a respective length of each fixed length segment;
processing said fixed length segments in a pipeline fashion, the processing the fixed length segments including decoding the fixed length segments;
parsing the decoded fixed length segments;
partitioning the parsed fixed length segments into fragments, the fragments having at least one size, the at least one fragment size being cached in a memory of the computing device;
inserting database persistence boundaries between the fragments;
storing the fragments in a storage medium, the storage medium including a database and the at least one fragment size being determined in accordance with characteristics of the database, the characteristics including a native unit for holding data;
creating a database table, the table comprising;
meta data associated with the document;
queries over the document and respective results;
a first fragment of the document;
sizes of all fragments of the document other than the first fragment; and
storing the database table in the database.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A system for processing a data stream embodying a hierarchically structured document, said system comprising:
at least one computing device;
a receive pipeline for;
receiving said data stream;
querying the hierarchical structure of the document embodied in the data stream;
determining an offset of the data stream, the offset determined during the querying, and the offset including one or more bits;
partitioning said data stream into respective fixed length segments utilizing said queried hierarchical structure and the offset to determine a respective length of each fixed length segment;
processing said fixed length segments in a pipeline fashion;
decoding said fixed length segments;
parsing said decoded fixed length segments;
partitioning the parsed fixed length segments into fragments, the fragments having at least one size, the at least one fragment size being cached in a memory of the computing device;
inserting database persistence boundaries between the fragments;
storing the fragments in a storage medium, the storage medium including a database and the at least one fragment size being determined in accordance with characteristics of the database, the characteristics including a native unit for holding data;
creating a database table, the table comprising;
meta data associated with the document;
queries over the document and respective results;
a first fragment of the document;
sizes of all fragments of the document other than the first fragment; and
storing the database table in the database, said storage medium coupled to said receiving pipeline and coupled to a transmit pipeline; and
said transmit pipeline for;
receiving processed data from said storage medium;
processing fixed length segments in a pipeline fashion.
Dependent claims: 24, 25
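As a rough illustration of the fragment and table elements recited in the independent claims, the following Python sketch cuts a parsed payload into fragments sized to a database's native storage unit and builds a table row holding metadata, query results, the first fragment, and the sizes of the remaining fragments. It is a simplified sketch under stated assumptions, not the claimed implementation: `NATIVE_UNIT`, `fragment`, `DocumentRow`, and `build_row` are hypothetical names, the 16-byte unit is arbitrary, and persistence boundaries are modeled simply as the cut points between fragments.

```python
from dataclasses import dataclass
from typing import Dict, List

NATIVE_UNIT = 16  # hypothetical native unit of the database, in bytes


def fragment(data: bytes, unit: int = NATIVE_UNIT) -> List[bytes]:
    """Cut data into fragments no larger than the native unit."""
    return [data[i:i + unit] for i in range(0, len(data), unit)]


@dataclass
class DocumentRow:
    """Hypothetical table row mirroring the elements listed in the claims."""
    metadata: Dict[str, str]
    query_results: Dict[str, str]    # query over the document -> its result
    first_fragment: bytes
    other_fragment_sizes: List[int]  # sizes of every fragment after the first


def build_row(data: bytes,
              metadata: Dict[str, str],
              query_results: Dict[str, str],
              unit: int = NATIVE_UNIT) -> DocumentRow:
    """Fragment the data and assemble the row describing the document."""
    fragments = fragment(data, unit)
    first = fragments[0] if fragments else b""
    sizes = [len(f) for f in fragments[1:]]
    return DocumentRow(metadata, query_results, first, sizes)


row = build_row(
    b"x" * 40,
    metadata={"content-type": "text/xml"},
    query_results={"/order/id": "42"},
)
```

With a 40-byte payload and a 16-byte unit, the row holds a 16-byte first fragment and records the sizes of the two remaining fragments (16 and 8 bytes), which is enough to locate each later fragment without loading it.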
Specification