Processing structured data
First Claim
1. A method for efficiently processing a structured data file, the structured data file including one or more pieces of content, comprising:
- parsing the structured data file by:
creating a first record in an intermediate file, the first record corresponding to a starting tag in the structured data file and having information regarding the starting tag and a content type field identifying the first record as a starting tag type;
creating a second record in the intermediate file, the second record corresponding to a piece of content in the structured data file, the second record containing one or more descriptors containing information regarding the property name, wherein the second record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a third record in the intermediate file, the third record corresponding to a piece of content in the structured data file, the third record containing one or more descriptors containing information regarding the property value, wherein the third record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fourth record in the intermediate file, the fourth record corresponding to a piece of content in the structured data file, the fourth record containing one or more descriptors containing information regarding the content, wherein the fourth record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fifth record in the intermediate file, the fifth record corresponding to an ending tag in the structured data file and having information regarding the ending tag and a content type field identifying the fifth record as an ending tag type; and
formatting said intermediate file in a way that allows data from the structured data file to be accessed using both said intermediate file and the structured data file together.
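The five record types recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Record` layout, the numeric type codes, the `length` field, and the regular-expression tokenizer are all assumptions made for illustration, and the tokenizer only handles simple tags with double-quoted attribute values. The point it shows is that each record stores a type and an offset into the source rather than a copy of the text, so content is recovered by reading the intermediate records and the original data together.

```python
import re
from dataclasses import dataclass

# Hypothetical content-type codes; the claim only requires that a type field exists.
START_TAG, PROP_NAME, PROP_VALUE, CONTENT, END_TAG = range(5)


@dataclass(frozen=True)
class Record:
    kind: int    # content type field
    offset: int  # position of the piece, relative to the start of the source
    length: int  # number of characters covered (an assumed extra field)


# A tag (optionally closing, with raw attribute text) or a run of character data.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)([^<>]*)>|([^<]+)")
# name="value" pairs inside a start tag.
ATTR = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"')


def parse(source):
    """Emit offset-based records for a small, well-formed XML-like string."""
    records = []
    for m in TOKEN.finditer(source):
        if m.group(4) is not None:
            # Character data: record it unless it is whitespace only.
            if m.group(4).strip():
                records.append(Record(CONTENT, m.start(4), len(m.group(4))))
        elif m.group(1):
            records.append(Record(END_TAG, m.start(2), len(m.group(2))))
        else:
            records.append(Record(START_TAG, m.start(2), len(m.group(2))))
            # Scan only the attribute region of this tag; offsets stay absolute.
            for a in ATTR.finditer(source, m.start(3), m.end(3)):
                records.append(Record(PROP_NAME, a.start(1), len(a.group(1))))
                records.append(Record(PROP_VALUE, a.start(2), len(a.group(2))))
    return records


def text_of(source, record):
    """Resolve a record against the original data; the record holds no text."""
    return source[record.offset:record.offset + record.length]


source = '<item id="7">hello</item>'
records = parse(source)
for r in records:
    print(r.kind, r.offset, text_of(source, r))
```

For the sample input, the sketch yields one record of each type in document order: starting tag, property name, property value, content, ending tag.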
Abstract
The present invention provides a fast and efficient way of processing structured data by utilizing an intermediate file to store the structural information. The structured data may be processed into a Binary Mask Format (BMF) file, which may serve as a starting point for post-processing. A tree structure built on top of the BMF file may be constructed very quickly and takes up less space than a DOM tree. Additionally, BMF records may reside entirely in memory and contain structural information, allowing SAX-like sequential data access.
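The two access styles mentioned in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the BMF format itself: the `(kind, offset, length)` tuple layout, the hand-written `RECORDS` list, and the `events` and `build_tree` helpers are hypothetical. `events` walks the records in order, in the manner of a SAX-style reader, and `build_tree` links record indices into a parent/child map without copying any text out of the source.

```python
START, CONTENT, END = "start", "content", "end"

SOURCE = "<a><b>hi</b><c>yo</c></a>"

# Hand-built (kind, offset, length) records for SOURCE; a parser would emit these.
RECORDS = [
    (START, 1, 1),     # a
    (START, 4, 1),     # b
    (CONTENT, 6, 2),   # hi
    (END, 10, 1),      # b
    (START, 13, 1),    # c
    (CONTENT, 15, 2),  # yo
    (END, 19, 1),      # c
    (END, 23, 1),      # a
]


def events(source, records):
    """Yield (kind, text) pairs in document order, SAX style."""
    for kind, offset, length in records:
        yield kind, source[offset:offset + length]


def build_tree(records):
    """Map each parent record index to its child record indices."""
    children = {}
    stack = []  # indices of the currently open start-tag records
    for i, (kind, _, _) in enumerate(records):
        if kind == END:
            stack.pop()  # close the innermost open element
            continue
        if stack:
            children.setdefault(stack[-1], []).append(i)
        if kind == START:
            stack.append(i)
    return children


for kind, text in events(SOURCE, RECORDS):
    print(kind, text)
print(build_tree(RECORDS))
```

Because the tree holds only integer indices into the record list, the text itself is stored once, in the original data.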
20 Claims
1. A method for efficiently processing a structured data file, the structured data file including one or more pieces of content, comprising:
- parsing the structured data file by:
creating a first record in an intermediate file, the first record corresponding to a starting tag in the structured data file and having information regarding the starting tag and a content type field identifying the first record as a starting tag type;
creating a second record in the intermediate file, the second record corresponding to a piece of content in the structured data file, the second record containing one or more descriptors containing information regarding the property name, wherein the second record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a third record in the intermediate file, the third record corresponding to a piece of content in the structured data file, the third record containing one or more descriptors containing information regarding the property value, wherein the third record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fourth record in the intermediate file, the fourth record corresponding to a piece of content in the structured data file, the fourth record containing one or more descriptors containing information regarding the content, wherein the fourth record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fifth record in the intermediate file, the fifth record corresponding to an ending tag in the structured data file and having information regarding the ending tag and a content type field identifying the fifth record as an ending tag type; and
formatting said intermediate file in a way that allows data from the structured data file to be accessed using both said intermediate file and the structured data file together.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An apparatus for efficiently processing structured data, comprising:
- a peripheral component interface (PCI) interface;
- a direct memory access (DMA) engine coupled to said PCI interface;
- a text processor coupled to said PCI interface, the text processor configured to parse a structured data file by:
creating a first record in an intermediate file, the first record corresponding to a starting tag in the structured data file and having information regarding the starting tag and a content type field identifying the first record as a starting tag type;
creating a second record in the intermediate file, the second record corresponding to a piece of content in the structured data file, the second record containing one or more descriptors containing information regarding the property name, wherein the second record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a third record in the intermediate file, the third record corresponding to a piece of content in the structured data file, the third record containing one or more descriptors containing information regarding the property value, wherein the third record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fourth record in the intermediate file, the fourth record corresponding to a piece of content in the structured data file, the fourth record containing one or more descriptors containing information regarding the content, wherein the fourth record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fifth record in the intermediate file, the fifth record corresponding to an ending tag in the structured data file and having information regarding the ending tag and a content type field identifying the fifth record as an ending tag type; and
formatting said intermediate file in a way that allows data from the structured data file to be accessed using both said intermediate file and the structured data file together.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A non-transitory storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform a method for efficiently processing a structured data file, the structured data file including one or more pieces of content, the method comprising:
- parsing the structured data file by:
creating a first record in an intermediate file, the first record corresponding to a starting tag in the structured data file and having information regarding the starting tag and a content type field identifying the first record as a starting tag type;
creating a second record in the intermediate file, the second record corresponding to a piece of content in the structured data file, the second record containing one or more descriptors containing information regarding the property name, wherein the second record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a third record in the intermediate file, the third record corresponding to a piece of content in the structured data file, the third record containing one or more descriptors containing information regarding the property value, wherein the third record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fourth record in the intermediate file, the fourth record corresponding to a piece of content in the structured data file, the fourth record containing one or more descriptors containing information regarding the content, wherein the fourth record includes an offset indicating a position for said piece of content relative to a point in said structured data file;
creating a fifth record in the intermediate file, the fifth record corresponding to an ending tag in the structured data file and having information regarding the ending tag and a content type field identifying the fifth record as an ending tag type; and
formatting said intermediate file in a way that allows data from the structured data file to be accessed using both said intermediate file and the structured data file together.
Dependent claims: 18, 19, 20
Specification