PROCESSING STRUCTURED DATA
Abstract
The present invention provides a fast and efficient way of processing structured data by using an intermediate file to store the structural information. The structured data may be processed into a Binary mask Format (BMF) file, which may serve as a starting point for post-processing. A tree structure built on top of the BMF file may be constructed very quickly and takes up less space than a DOM tree. Additionally, BMF records may reside entirely in memory and contain structural information, allowing SAX-like sequential data access.
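To make the idea of SAX-like sequential access over in-memory records concrete, here is a minimal Python sketch. The record layout (a 1-byte type, a 4-byte offset and a 4-byte length), the type codes, and the `events` helper are all assumptions made for illustration; the abstract does not specify the actual BMF layout.

```python
import struct

# Hypothetical fixed-width record: 1-byte type, 4-byte offset, 4-byte length.
# This is NOT the real BMF layout, only an illustrative stand-in.
RECORD = struct.Struct("<BII")
START_TAG, TEXT, END_TAG = 1, 2, 3  # assumed type codes


def events(source: bytes, records: bytes):
    """Walk the packed records in order and yield (type, text) pairs.

    Each record stores only a position and a length, so the text itself
    is read from the original source rather than copied into the records.
    """
    for kind, offset, length in RECORD.iter_unpack(records):
        yield kind, source[offset:offset + length].decode("utf-8")


source = b"<a>hi</a>"
# Hand-built records standing in for the output of a parser.
records = b"".join([
    RECORD.pack(START_TAG, 1, 1),  # tag name "a"
    RECORD.pack(TEXT, 3, 2),       # content "hi"
    RECORD.pack(END_TAG, 7, 1),    # tag name "a" in the closing tag
])

result = list(events(source, records))
```

In this sketch each record occupies nine bytes regardless of how long the referenced text is, which is one plausible reading of why a record-based structure could be more compact than a tree of node objects.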
21 Claims
1. (canceled)
2. A method for efficiently processing a structured data file, the structured data file including one or more pieces of content, comprising:
parsing the structured data file by:
creating a first record in an intermediate file, the first record corresponding to a starting tag in the structured data file and having information regarding the starting tag and a content type field identifying the first record as corresponding to a starting tag;
creating a second record in the intermediate file, the second record corresponding to a piece of content in the structured data file, the second record containing one or more descriptors containing information regarding the property name, wherein the second record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a third record in the intermediate file, the third record corresponding to a piece of content in the structured data file, the third record containing one or more descriptors containing information regarding the property value, wherein the third record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fourth record in the intermediate file, the fourth record corresponding to a piece of content in the structured data file, the fourth record containing one or more descriptors containing information regarding the content, wherein the fourth record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fifth record in the intermediate file, the fifth record corresponding to an ending tag in the structured data file and having information regarding the ending tag and a content type field identifying the fifth record as corresponding to an ending tag; and
formatting said intermediate file in a way that allows data from the structured data file to be accessed using both said intermediate file and the structured data file together.
Dependent claims: 3, 4, 5, 6, 7, 8, 9.
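The five record types recited in claim 2 can be illustrated with a small Python sketch. This is not the patented implementation: the `Record` type, the kind labels, the regular expressions, and the `parse` and `text_of` helpers are hypothetical names chosen for the example, and the scanner handles only simple tags with double-quoted properties.

```python
import re
from typing import NamedTuple


class Record(NamedTuple):
    kind: str    # plays the role of the content type field
    offset: int  # position relative to the start of the source
    length: int  # number of characters the record refers to


# Simplified tokenizer: a tag (start or end) or a run of text.
TOKEN = re.compile(r"<(?P<end>/)?(?P<name>\w+)(?P<attrs>[^>]*)>|(?P<text>[^<]+)")
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse(source: str) -> list[Record]:
    """Emit one record per tag, property name, property value and content."""
    records = []
    for m in TOKEN.finditer(source):
        if m.group("text") is not None:
            if m.group("text").strip():  # skip whitespace-only runs
                records.append(Record("content", m.start("text"), len(m.group("text"))))
        elif m.group("end"):
            records.append(Record("end_tag", m.start("name"), len(m.group("name"))))
        else:
            records.append(Record("start_tag", m.start("name"), len(m.group("name"))))
            base = m.start("attrs")  # attribute offsets are relative to this
            for a in ATTR.finditer(m.group("attrs")):
                records.append(Record("prop_name", base + a.start(1), len(a.group(1))))
                records.append(Record("prop_value", base + a.start(2), len(a.group(2))))
    return records


def text_of(source: str, rec: Record) -> str:
    """Resolve a record against the source; the record holds no text itself."""
    return source[rec.offset:rec.offset + rec.length]


source = '<item id="7">pen</item>'
records = parse(source)
pairs = [(r.kind, text_of(source, r)) for r in records]
```

Because every record carries only a kind, an offset and a length, reading any value requires both the record list and the original source, which mirrors the final step of the claim: data is accessed using the intermediate file and the structured data file together.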
10. An apparatus for efficiently processing structured data, comprising:
a peripheral component interface (PCI) interface;
a direct memory access (DMA) engine coupled to said PCI interface;
a text processor coupled to said PCI interface, the text processor configured to parse a structured data file by:
creating a first record in an intermediate file, the first record corresponding to a starting tag in the structured data file and having information regarding the starting tag and a content type field identifying the first record as corresponding to a starting tag;
creating a second record in the intermediate file, the second record corresponding to a piece of content in the structured data file, the second record containing one or more descriptors containing information regarding the property name, wherein the second record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a third record in the intermediate file, the third record corresponding to a piece of content in the structured data file, the third record containing one or more descriptors containing information regarding the property value, wherein the third record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fourth record in the intermediate file, the fourth record corresponding to a piece of content in the structured data file, the fourth record containing one or more descriptors containing information regarding the content, wherein the fourth record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fifth record in the intermediate file, the fifth record corresponding to an ending tag in the structured data file and having information regarding the ending tag and a content type field identifying the fifth record as corresponding to an ending tag; and
formatting said intermediate file in a way that allows data from the structured data file to be accessed using both said intermediate file and the structured data file together.
Dependent claims: 11, 12, 13, 14, 15, 16, 17.
18. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform a method for efficiently processing a structured data file, the structured data file including one or more pieces of content, the method comprising:
parsing the structured data file by:
creating a first record in an intermediate file, the first record corresponding to a starting tag in the structured data file and having information regarding the starting tag and a content type field identifying the first record as corresponding to a starting tag;
creating a second record in the intermediate file, the second record corresponding to a piece of content in the structured data file, the second record containing one or more descriptors containing information regarding the property name, wherein the second record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a third record in the intermediate file, the third record corresponding to a piece of content in the structured data file, the third record containing one or more descriptors containing information regarding the property value, wherein the third record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fourth record in the intermediate file, the fourth record corresponding to a piece of content in the structured data file, the fourth record containing one or more descriptors containing information regarding the content, wherein the fourth record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fifth record in the intermediate file, the fifth record corresponding to an ending tag in the structured data file and having information regarding the ending tag and a content type field identifying the fifth record as corresponding to an ending tag; and
formatting said intermediate file in a way that allows data from the structured data file to be accessed using both said intermediate file and the structured data file together.
Dependent claims: 19, 20, 21.
Specification