Efficient data structures for parsing and analyzing a document
Abstract
Some embodiments provide a method that parses an unstructured document that includes a number of primitive elements. The method stores the primitive elements in a random order in a first storage. The method stores references to the primitive elements in a second storage in an order based on locations of the primitive elements in the unstructured document. The method receives instructions to perform a document reconstruction operation. The method performs the received instructions without storing any new references to the primitive elements.
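The two-storage arrangement described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `Glyph`, `pool`, `order`, `split_lines`, and `text_of` are hypothetical, and the reading-order rule (sort by y, then x) is an assumption made for illustration. The primitive elements sit in arbitrary order in one list, a second list holds references (here, indices) sorted by location, and a reconstruction step such as line grouping is expressed as (start, count) ranges over the existing reference list, so no new references are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """A primitive element: one character plus its page position (hypothetical)."""
    char: str
    x: float
    y: float


# First storage: primitive elements in arbitrary (e.g. file) order.
pool = [Glyph("b", 10, 0), Glyph("c", 0, 12), Glyph("a", 0, 0), Glyph("d", 10, 12)]

# Second storage: references (indices into pool) ordered by location.
# Assumption: reading order is top-to-bottom, then left-to-right.
order = sorted(range(len(pool)), key=lambda i: (pool[i].y, pool[i].x))


def split_lines(pool, order):
    """Group references into lines as (start, count) ranges over `order`.

    Only ranges are created; no reference is copied or added.
    """
    ranges = []
    start = 0
    for k in range(1, len(order) + 1):
        # Close the current line at the end of input or when y changes.
        if k == len(order) or pool[order[k]].y != pool[order[start]].y:
            ranges.append((start, k - start))
            start = k
    return ranges


def text_of(pool, order, rng):
    """Resolve a (start, count) range back to its characters."""
    start, count = rng
    return "".join(pool[order[i]].char for i in range(start, start + count))


lines = split_lines(pool, order)
line_texts = [text_of(pool, order, r) for r in lines]
```

In this sketch each line costs two integers regardless of its length, and the `order` list is the only place where references to the primitive elements live.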
25 Claims
1. A method comprising:
- receiving an unstructured document comprising a plurality of primitive elements, wherein each of the plurality of primitive elements is a character;
- performing a plurality of different processes for analyzing and manipulating the unstructured document in order to generate a structured document from the unstructured document; and
- storing a plurality of references associated with the plurality of primitive elements, wherein each of the references refers to a different primitive element of the plurality of primitive elements, wherein at least some of the references are stored in a separate memory space from the plurality of different processes and are shared by at least two different processes of the plurality of different processes, wherein the two different processes access the plurality of references by use of objects that refer to the plurality of references, wherein the plurality of references are not replicated by the two different processes.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
- a set of processors for executing sets of instructions; and
- a non-transitory machine readable medium storing a program for execution by at least one of the processors, the program comprising sets of instructions for:
  - receiving an unstructured document comprising a plurality of primitive elements, wherein each of the plurality of primitive elements is a character;
  - performing a plurality of different processes for analyzing and manipulating the unstructured document in order to generate a structured document from the unstructured document; and
  - storing a plurality of references associated with the plurality of primitive elements, wherein each of the references refers to a different primitive element of the plurality of primitive elements, wherein at least some of the references are stored in a separate memory space from the plurality of different processes and are shared by at least two different processes of the plurality of different processes, wherein the two different processes access the plurality of references by use of objects that refer to the plurality of references, and wherein the plurality of references are not replicated by the two different processes.
Dependent claims: 11, 12, 13, 14, 15
16. A system comprising:
- a set of processors for executing sets of instructions; and
- a non-transitory machine readable medium storing a program for execution by at least one of the processors, the program comprising sets of instructions for:
  - receiving an unstructured document comprising a plurality of primitive elements, wherein each of the plurality of primitive elements is a character;
  - performing a plurality of different processes for analyzing and manipulating the unstructured document in order to generate a structured document from the unstructured document, wherein the plurality of different processes comprises (i) a first process for associating sets of characters as a line of text and storing the line of text as a first string and (ii) a second process for associating sets of characters as a word and storing the word as a second string; and
  - storing data associated with the primitive elements, wherein at least some of the data is stored in a separate memory space from the processes and is shared by at least two different processes, wherein the processes access the data by use of references to the data, and wherein the data is not replicated by the processes, wherein the first and second strings reference a same piece of data associated with the primitive elements, wherein the first string comprises a reference to the same piece of data and a first count of a number of characters in the first string, wherein the second string comprises the reference to the same piece of data and a second count of a number of characters in the second string.
Dependent claims: 17, 18, 19
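The string representation recited in claim 16 (a reference to shared data plus a character count) can be sketched as follows. This is an illustrative reading only, not the claimed implementation: the class name `SharedString` and its fields are hypothetical, and the "reference" is modeled as a shared buffer plus a start offset, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedString:
    """A string stored as (reference, count) over shared character data."""
    data: list   # shared character storage; the same object for every string
    start: int   # reference: offset of the first character in `data`
    count: int   # number of characters in this string

    def text(self) -> str:
        # Characters are materialized only on demand; the stored
        # representation itself never copies the shared data.
        return "".join(self.data[self.start:self.start + self.count])


# One piece of character data, held once.
chars = list("hello world")

# A line-building process and a word-building process both refer to it.
line = SharedString(chars, 0, 11)   # first string: the whole line
word = SharedString(chars, 0, 5)    # second string: the first word

shares_storage = line.data is word.data
```

Here the line and the word start at the same character and differ only in their counts, so neither process duplicates the underlying characters.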
20. A non-transitory machine readable medium storing a program for execution by at least one processor, the program comprising sets of instructions for:
- receiving an unstructured document comprising a plurality of primitive elements, wherein each of the plurality of primitive elements is a character;
- performing a plurality of different processes for analyzing and manipulating the unstructured document in order to generate a structured document from the unstructured document; and
- storing a plurality of references associated with the plurality of primitive elements, wherein each of the references refers to a different primitive element of the plurality of primitive elements, wherein at least some of the references are stored in a separate memory space from the plurality of different processes and are shared by at least two different processes of the plurality of different processes, wherein the two different processes access the plurality of references by use of objects that refer to the plurality of references, and wherein the plurality of references are not replicated by the two different processes.
Dependent claims: 21, 22, 23, 24, 25
Specification