QUERY AND INDEX OVER DOCUMENTS
Abstract
A document index is generated from a set of documents and is used to identify documents that match one or more queries. A tree is generated for each document, with a node corresponding to each object of the document. The nodes of the generated trees are merged or combined to generate the document index, which is itself a tree. In addition, an inverted index is generated for each node of the index that identifies the tree(s) that the node originated from. When a query is received, it is first executed against the document index tree; during this execution, the appropriate set operations are applied to the inverted indices associated with the nodes matched by the query. The resulting set identifies the documents that may match the query. The query is then executed on the identified documents.
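The index construction described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each document tree is a nested dict keyed by object label, and the names `IndexNode`, `add_document`, and `candidates` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    """One node of the merged index tree (hypothetical structure)."""
    children: dict = field(default_factory=dict)  # label -> IndexNode
    doc_ids: set = field(default_factory=set)     # inverted index for this node


def add_document(index_root, doc_id, tree):
    """Merge one document tree (nested dicts, label -> subtree) into the index."""
    index_root.doc_ids.add(doc_id)
    for label, subtree in tree.items():
        # Nodes with the same label under the same parent are merged into one.
        child = index_root.children.setdefault(label, IndexNode())
        add_document(child, doc_id, subtree)


def candidates(index_root, path):
    """Return ids of documents whose tree contains the given label path."""
    node = index_root
    for label in path:
        node = node.children.get(label)
        if node is None:
            return set()
    return set(node.doc_ids)


docs = {
    "d1": {"book": {"title": {}, "author": {}}},
    "d2": {"book": {"title": {}, "price": {}}},
    "d3": {"article": {"title": {}}},
}
root = IndexNode()
for doc_id, tree in docs.items():
    add_document(root, doc_id, tree)

print(sorted(candidates(root, ["book", "title"])))  # ['d1', 'd2']
print(sorted(candidates(root, ["book", "price"])))  # ['d2']
```

Because same-labeled siblings are merged, the index tree is never larger than the combined document trees, and a path lookup touches one node per path step regardless of how many documents were indexed.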
131 Citations
20 Claims
1. A method comprising:
receiving a plurality of documents by a computing device, wherein each document comprises a plurality of objects;
for each document, generating a graph representing the document by the computing device, wherein each graph comprises a node corresponding to each object of the represented document;
generating a document index by merging the nodes of the generated graphs by the computing device, wherein each node in the document index includes an identifier of one or more graphs that include the node;
receiving a query by the computing device;
identifying one or more documents of the plurality of documents that are responsive to the query using the generated document index by the computing device;
running the query on the identified one or more documents to generate a subset of the identified one or more documents by the computing device; and
providing the subset of the one or more identified documents in response to the query by the computing device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method comprising:
receiving a document index by a computing device, wherein the document index comprises a plurality of nodes, each node includes identifiers of one or more graphs of a plurality of graphs, and each graph represents a document of a plurality of documents;
receiving a query by the computing device, wherein the query comprises a plurality of sub-queries;
for each sub-query:
    determining nodes in the document index that match the sub-query by the computing device; and
    for each determined node that matches the sub-query, determining the one or more graphs identified by the identifiers of the determined node by the computing device;
determining a union of the determined one or more graphs for each sub-query by the computing device;
identifying documents that are represented by the graphs of the determined union of the graphs by the computing device;
running the query on the identified documents to generate a subset of the identified documents; and
providing the subset of the identified documents in response to the query by the computing device.
Dependent claims: 10, 11, 12, 13
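The candidate-selection steps of claim 9 can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed method itself: the index is modeled as a flat dict keyed by label path, a sub-query is a single label, a node "matches" when its final label equals that label, and one union is taken across all sub-queries. The names `matching_nodes` and `candidate_graphs` are hypothetical.

```python
# Hypothetical index: each node is keyed by its label path and stores
# the identifiers of the graphs (documents) that contain it.
index = {
    ("book",): {"g1", "g2"},
    ("book", "title"): {"g1", "g2"},
    ("book", "author"): {"g1"},
    ("article",): {"g3"},
    ("article", "author"): {"g3"},
    ("article", "year"): {"g3"},
}


def matching_nodes(index, sub_query):
    """Nodes whose final label equals the sub-query (an assumed matching rule)."""
    return [path for path in index if path[-1] == sub_query]


def candidate_graphs(index, sub_queries):
    """Union, over all sub-queries, of the graphs named by matching nodes."""
    result = set()
    for sub_query in sub_queries:
        for path in matching_nodes(index, sub_query):
            result |= index[path]
    return result


print(sorted(candidate_graphs(index, ["author", "year"])))  # ['g1', 'g3']
print(sorted(candidate_graphs(index, ["price"])))           # []
```

The union is deliberately permissive: a graph that satisfies only one sub-query is still a candidate, which is why the claim goes on to run the full query on the identified documents.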
14. A system comprising:
a computing device;
an index engine adapted to generate a document index from a plurality of documents, wherein the document index includes a plurality of nodes, and each node identifies one or more documents; and
a query engine adapted to:
    receive a query;
    identify one or more documents of the plurality of documents that are responsive to the query using the generated document index;
    run the query on the identified one or more documents to generate a subset of the one or more documents; and
    provide the subset of the one or more documents in response to the query.
Dependent claims: 15, 16, 17, 18, 19, 20
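The division of labor between the two engines in claim 14 can be sketched as follows. This is a simplified illustration under assumptions not found in the claim: documents are flat dicts of field names to values, index nodes are field names, and the only supported query is "field equals value". The class and method names are hypothetical.

```python
class IndexEngine:
    """Builds an index whose nodes (here: field names) identify documents."""

    def build(self, documents):
        index = {}
        for doc_id, fields in documents.items():
            for name in fields:
                index.setdefault(name, set()).add(doc_id)
        return index


class QueryEngine:
    """Answers queries in two phases: index lookup, then full evaluation."""

    def __init__(self, documents, index):
        self.documents = documents
        self.index = index

    def run(self, name, value):
        """Answer the hypothetical query 'field `name` equals `value`'."""
        # Phase 1: the index narrows the search to documents that have the field.
        candidates = self.index.get(name, set())
        # Phase 2: the query itself runs only on those candidates.
        return sorted(d for d in candidates if self.documents[d][name] == value)


documents = {
    "d1": {"title": "Indexing", "year": 2010},
    "d2": {"title": "Querying", "year": 2012},
    "d3": {"title": "Untitled"},
}
engine = QueryEngine(documents, IndexEngine().build(documents))
print(engine.run("year", 2012))  # ['d2']
print(engine.run("pages", 10))   # []
```

In this sketch the index returns both `d1` and `d2` for a query on `year`, and only the second phase discards `d1`; `d3` is never examined at all.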
Specification