Scalable storage and processing of hierarchical documents

US 8,028,007 B2
Filed: 02/06/2006
Issued: 09/27/2011
Est. Priority Date: 06/27/2003
Status: Expired due to Fees

Abstract

Large messages in the form of hierarchically structured documents are processed in a streaming fashion using the ultimate consumer read requests as the driving force for the processing. The messages are partitioned into fixed length segments. The segments are processed in pipeline fashion. This processing chain includes simulating random access of hierarchical documents using stream transformations, mapping streams to a transport's native capabilities, composing streams into chains and using pipeline processing on the chains, staging fragments into a database and routing messages when complete messages have been formed, and providing tools to allow the end user to inspect partial messages.
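
The consumer-driven chain described in the abstract can be pictured with lazy generators. The Python sketch below is illustrative only and is not taken from the patent: the function names `segments` and `decode_base64` are hypothetical, and base64 merely stands in for whatever decoding a real pipeline stage would perform. Each stage does work only when the downstream reader asks for more data.

```python
import base64
import io
from typing import BinaryIO, Iterator


def segments(stream: BinaryIO, size: int) -> Iterator[bytes]:
    """Yield fixed-length segments; nothing is read until a consumer asks."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


def decode_base64(chunks: Iterator[bytes]) -> Iterator[bytes]:
    """Decode base64 incrementally.

    Bytes that do not complete a 4-character base64 group are carried
    over to the next segment, so any segment size works.
    """
    carry = b""
    for chunk in chunks:
        data = carry + chunk
        usable = len(data) - len(data) % 4
        carry = data[usable:]
        if usable:
            yield base64.b64decode(data[:usable])
    if carry:
        # Leftover bytes mean the input was not valid base64; let it raise.
        yield base64.b64decode(carry)


if __name__ == "__main__":
    message = b"<order><item>widget</item></order>"
    source = io.BytesIO(base64.b64encode(message))
    chain = decode_base64(segments(source, 8))  # compose stages into a chain
    first_piece = next(chain)                   # the read request drives the work
    print(first_piece, "bytes consumed from source:", source.tell())
```

Because the stages are generators, a single `next` call pulls exactly one segment from the source, which is the sense in which the reader drives the processing in this sketch.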

25 Claims

1. A method, implemented at least in part on a computing device, for processing a data stream embodying a hierarchically structured document, said method comprising:
- querying the hierarchical structure of the document embodied in the data stream;
- determining an offset of the data stream, the offset determined during the querying, and the offset including one or more bits;
- partitioning said data stream into respective fixed length segments utilizing said queried hierarchical structure and the offset to determine a respective length of each fixed length segment;
- processing said fixed length segments in a pipeline fashion, the processing the fixed length segments including decoding the fixed length segments;
- parsing the decoded fixed length segments;
- partitioning the parsed fixed length segments into fragments, the fragments having at least one size, the at least one fragment size being cached in a memory of the computing device;
- inserting database persistence boundaries between the fragments;
- storing the fragments in a storage medium, the storage medium including the database and the at least one fragment size being determined in accordance with characteristics of the database, the characteristics including a native unit for holding data;
- creating a database table, the table comprising:
  - meta data associated with the document;
  - queries over the document and respective results;
  - a first fragment of the document;
  - sizes of all fragments of the document other than the first fragment; and
- storing the database table in the database.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. A method in accordance with claim 1, in which the data stream embodies said hierarchically structured document formatted in a transport protocol.
  - 3. A method in accordance with claim 2, wherein said transport protocol comprises at least one of a hypertext transport protocol (HTTP) and a file transport protocol (FTP).
  - 4. A method in accordance with claim 1, wherein said act of decoding comprises multipurpose mail extensions (MIME) decoding.
  - 5. A method in accordance with claim 1, wherein act of parsing comprises extensible markup language (XML) parsing.
  - 6. A method in accordance with claim 1, further comprising:
    - retrieving said database table from said database; and
    - utilizing said retrieved database table to retrieve all fragments of said document.
  - 7. A method in accordance with claim 1, said act of processing comprising:
    - retrieving said fragments from said storage medium; and
    - serializing said retrieved fragments into fixed length segments.
  - 8. A method in accordance with claim 1, further comprising encoding said serialized fixed length segments.
  - 9. A method in accordance with claim 8, wherein said act of encoding comprises multipurpose mail extensions (MIME) encoding.
  - 10. A method in accordance with claim 8, further comprising converting said encoded fixed length segments into a transport protocol.
  - 11. A method in accordance with claim 10, said transport protocol comprises at least one of a hypertext transport protocol (HTTP) and a file transport protocol (FTP).
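
To make the storage steps of claims 1 and 6 more concrete, here is a minimal Python sketch using the standard `sqlite3` module. Everything in it is an assumption for illustration rather than the patented implementation: the schema, the function names `store_document` and `load_document`, the use of the SQLite page size as the "native unit for holding data", and the reading of "persistence boundaries" as one committed transaction per fragment.

```python
import json
import sqlite3
from typing import Dict, List

# Hypothetical schema: one summary row per document plus one row per later fragment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    doc_id TEXT PRIMARY KEY,
    metadata TEXT,        -- meta data associated with the document (JSON)
    query_results TEXT,   -- queries over the document and their results (JSON)
    first_fragment BLOB,  -- the first fragment is kept inline
    other_sizes TEXT      -- sizes of all fragments other than the first (JSON list)
);
CREATE TABLE IF NOT EXISTS fragments (
    doc_id TEXT, seq INTEGER, body BLOB,
    PRIMARY KEY (doc_id, seq)
);
"""


def split_fragments(data: bytes, size: int) -> List[bytes]:
    """Cut parsed data into fragments of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_document(db: sqlite3.Connection, doc_id: str, data: bytes,
                   metadata: Dict[str, str], query_results: Dict[str, str]) -> None:
    db.executescript(SCHEMA)
    # Assumption: the page size plays the role of the database's native unit.
    # It is read once and kept in memory for the whole document.
    unit = db.execute("PRAGMA page_size").fetchone()[0]
    fragments = split_fragments(data, unit)
    first, rest = (fragments[0], fragments[1:]) if fragments else (b"", [])
    for seq, body in enumerate(rest, start=1):
        with db:  # assumption: each fragment is committed in its own transaction
            db.execute("INSERT INTO fragments VALUES (?, ?, ?)", (doc_id, seq, body))
    with db:
        db.execute(
            "INSERT INTO documents VALUES (?, ?, ?, ?, ?)",
            (doc_id, json.dumps(metadata), json.dumps(query_results),
             first, json.dumps([len(f) for f in rest])),
        )


def load_document(db: sqlite3.Connection, doc_id: str) -> bytes:
    """Use the summary row to retrieve and check every fragment (compare claim 6)."""
    row = db.execute(
        "SELECT first_fragment, other_sizes FROM documents WHERE doc_id = ?",
        (doc_id,),
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    first, sizes = bytes(row[0]), json.loads(row[1])
    bodies = [bytes(r[0]) for r in db.execute(
        "SELECT body FROM fragments WHERE doc_id = ? ORDER BY seq", (doc_id,))]
    if [len(b) for b in bodies] != sizes:
        raise ValueError("stored fragments do not match the recorded sizes")
    return first + b"".join(bodies)
```

Keeping the first fragment and the remaining sizes in the summary row means a reader can start on the head of the document from a single lookup and knows in advance how many further fragments to expect, which is one plausible motivation for the table layout the claim recites.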

12. A computer-readable storage medium having computer-executable instructions stored thereon, the instructions when executed by a processor causing the processor to implement a method for processing a data stream embodying a hierarchically structured document, the method comprising:
- querying the hierarchical structure of the document embodied in the data stream;
- determining an offset of the data stream, the offset determined during the querying, and the offset including one or more bits;
- partitioning said data stream into respective fixed length segments utilizing said queried hierarchical structure and the offset to determine a respective length of each fixed length segment;
- processing said fixed length segments in a pipeline fashion, the processing the fixed length segments including decoding the fixed length segments;
- parsing the decoded fixed length segments;
- partitioning the parsed fixed length segments into fragments, the fragments having at least one size, the at least one fragment size being cached in a memory of the computing device;
- inserting database persistence boundaries between the fragments;
- storing the fragments in a storage medium, the storage medium including a database and the at least one fragment size being determined in accordance with characteristics of the database, the characteristics including a native unit for holding data;
- creating a database table, the table comprising:
  - meta data associated with the document;
  - queries over the document and respective results;
  - a first fragment of the document;
  - sizes of all fragments of the document other than the first fragment; and
- storing the database table in the database.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22):
  - 13. The computer-readable storage medium of claim 12, in which the data stream embodies said hierarchically structured document formatted in a transport protocol.
  - 14. The computer-readable storage medium of claim 13, wherein said transport protocol comprises at least one of a hypertext transport protocol (HTTP) and a file transport protocol (FTP).
  - 15. The computer-readable storage medium of claim 12, wherein said decoding comprises multipurpose mail extensions (MIME) decoding.
  - 16. The computer-readable storage medium of claim 12, wherein the parsing comprises extensible markup language (XML) parsing.
  - 17. The computer-readable storage medium of claim 12, wherein the method further comprises:
    - retrieving said database table from said database; and
    - utilizing said retrieved database table to retrieve all fragments of said document.
  - 18. The computer-readable storage medium of claim 12, said processing comprising:
    - retrieving said fragments from said storage medium; and
    - serializing said retrieved fragments into fixed length segments.
  - 19. The computer-readable storage medium of claim 12, wherein the method further comprises encoding said serialized fixed length segments.
  - 20. The computer-readable storage medium of claim 19, wherein said encoding comprises multipurpose mail extensions (MIME) encoding.
  - 21. The computer-readable storage medium of claim 19, wherein the method further comprises converting said encoded fixed length segments into a transport protocol.
  - 22. The computer-readable storage medium of claim 21, said transport protocol comprising at least one of a hypertext transport protocol (HTTP) and a file transport protocol (FTP).

23. A system for processing a data stream embodying a hierarchically structured document, said system comprising:
- at least one computing device;
- a receive pipeline for:
  - receiving said data stream;
  - querying the hierarchical structure of the document embodied in the data stream;
  - determining an offset of the data stream, the offset determined during the querying, and the offset including one or more bits;
  - partitioning said data stream into respective fixed length segments utilizing said queried hierarchical structure and the offset to determine a respective length of each fixed length segment;
  - processing said fixed length segments in a pipeline fashion;
  - decoding said fixed length segments;
  - parsing said decoded fixed length segments;
  - partitioning the parsed fixed length segments into fragments, the fragments having at least one size, the at least one fragment size being cached in a memory of the computing device;
  - inserting database persistence boundaries between the fragments;
  - storing the fragments in a storage medium, the storage medium including a database and the at least one fragment size being determined in accordance with characteristics of the database, the characteristics including a native unit for holding data;
  - creating a database table, the table comprising:
    - meta data associated with the document;
    - queries over the document and respective results;
    - a first fragment of the document;
    - sizes of all fragments of the document other than the first fragment; and
  - storing the database table in the database, said storage medium coupled to said receiving pipeline and coupled to a transmit pipeline; and
- said transmit pipeline for:
  - receiving processed data from said storage medium;
  - processing fixed length segments in a pipeline fashion.
- Dependent claims (24, 25):
  - 24. A system in accordance with claim 23, wherein said hierarchically structured document and said processed hierarchically structured document are formatted in accordance with a transport protocol.
  - 25. A system in accordance with claim 23, said transmit pipeline comprising:
    - a serializer for converting data received from said storage medium into fixed length segments; and
    - an encoder coupled to said serializer for encoding said serialized fixed length segments.
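
The transmit side recited in claim 25 (a serializer feeding an encoder) can be sketched in the same hedged way. The Python below is an illustration under stated assumptions, not the patented design: `serialize` and `encode` are hypothetical names, fragments are assumed to arrive as byte strings, and base64 is used as a stand-in for the encoder.

```python
import base64
from typing import Iterable, Iterator


def serialize(fragments: Iterable[bytes], segment_size: int) -> Iterator[bytes]:
    """Re-buffer variable-sized fragments into fixed-length segments.

    Every segment has exactly `segment_size` bytes except possibly the last.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    buffer = bytearray()
    for fragment in fragments:
        buffer.extend(fragment)
        while len(buffer) >= segment_size:
            yield bytes(buffer[:segment_size])
            del buffer[:segment_size]
    if buffer:
        yield bytes(buffer)  # final, possibly shorter, segment


def encode(segments: Iterable[bytes]) -> Iterator[bytes]:
    """Encode each segment independently (base64 as a stand-in encoder)."""
    for segment in segments:
        yield base64.b64encode(segment)


if __name__ == "__main__":
    stored = [b"<doc>", b"<a>1</a>", b"<b>22</b>", b"</doc>"]
    for encoded_segment in encode(serialize(stored, 6)):
        print(encoded_segment)
```

One design point worth noting in this sketch: when the segment size is a multiple of 3, each full segment encodes to base64 without padding, so the encoded segments can simply be concatenated into a valid encoding of the whole document.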

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Maybee, Paul; Mehta, Bimal; Saha, Sanjib; Graber, Lee; Somasekaran, Anandhi; Sutjahjo, Siunie; Malhi, Balinder; Zhang, Allen; Lo, Wei-Lun; Sagar, Akash; Levanoni, Yossi
Primary Examiner(s): Kim, Paul
Application Number: US11/348,038
Publication Number: US 20060129524A1
Time in Patent Office: 2,059 Days
Field of Search: 707/600, 707/790, 1/1
US Class Current: 707/811
CPC Class Codes:
- G06F 16/835   Query processing
- G06F 16/8373   Query execution
- G06F 16/9574   of access to content, e.g. ...
- Y10S 707/99933   Query processing, i.e. sear...
