Indexing system
First Claim
1. A method comprising:
- receiving, using at least one processor of a first leaf node in a distributed environment, updates to a hybrid-sharded index, the hybrid-sharded index including document-sharded posting lists and term-sharded posting lists;
generating, using the at least one processor of the first leaf node that received an update, replacement posting lists and change information for a respective second leaf node;
dividing the replacement posting lists into portions, a portion having associated change information and being associated with a respective one of the second leaf nodes;
sending the portions to respective second leaf nodes; and
at a particular leaf node of the second leaf nodes;
merging, using at least one processor of the particular leaf node, a received portion into an updated posting list portion,swapping the updated posting list portion into memory, andduring the swap, using the change information and the updated posting list portion to respond to a query with an older version of the hybrid-sharded index.
2 Assignments
0 Petitions
Accused Products
Abstract
A hybrid-sharded index includes document-sharded posting lists and term-sharded posting lists. Implementations include systems and methods for updating a hybrid-sharded index. For example, a method may include receiving updates to the hybrid-sharded index and generating, at a first leaf node, replacement posting lists and change information for a respective second leaf node. The method may also include dividing the replacement posting lists into portions, a portion having associated change information and being associated with a respective one of the second leaf nodes and sending the portions to respective leaf nodes. At a particular leaf node of the second leaf nodes, the method includes merging a received portion into an updated posing list portion, swapping the updated posting list portion into memory. During the swap, the change information and the updated posting list portion are used to respond to a query with an older version of the hybrid-sharded index.
240 Citations
25 Claims
-
1. A method comprising:
-
receiving, using at least one processor of a first leaf node in a distributed environment, updates to a hybrid-sharded index, the hybrid-sharded index including document-sharded posting lists and term-sharded posting lists; generating, using the at least one processor of the first leaf node that received an update, replacement posting lists and change information for a respective second leaf node; dividing the replacement posting lists into portions, a portion having associated change information and being associated with a respective one of the second leaf nodes; sending the portions to respective second leaf nodes; and at a particular leaf node of the second leaf nodes; merging, using at least one processor of the particular leaf node, a received portion into an updated posting list portion, swapping the updated posting list portion into memory, and during the swap, using the change information and the updated posting list portion to respond to a query with an older version of the hybrid-sharded index. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A system comprising:
-
distributed computing devices represented as leaf nodes and a root node; and an index of documents, the index being distributed across the leaf nodes, the documents being assigned to respective leaf nodes, and wherein a first leaf node includes; memory storing document-sharded posting lists for some or all terms associated with documents in a first set of documents that are assigned to the first leaf node, and memory storing term-sharded posting lists for terms assigned to the first leaf node without regard to leaf node assignments for documents identified in the term-sharded posting lists, wherein the first leaf node includes; at least one processor, memory storing instructions that, when executed by the at least one processor, cause the first leaf node to; receive an update for documents assigned to the first leaf node, determine that the update affects the at least one document-sharded posting list and, responsive to the determining, generate an updated document-sharded posting list for the at least one document-sharded posting list, and determine that the update affects a term-sharded posting list for a term assigned to a second leaf node, the term being associated with at least one document in a second set of documents that are assigned to the first leaf and not in the first set of documents and, responsive to the determining; generate change information for the documents associated with the term, generate an updated term-sharded posting list for the term, and provide the change information and the updated term-sharded posting list to the second leaf node. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
-
-
22. A system comprising:
-
distributed computing devices represented as leaf nodes and a root node; an index of documents, the index being distributed across the leaf nodes, the documents being assigned to respective leaf nodes, and wherein a first leaf node includes; memory storing document-sharded posting lists for some or all terms associated with documents in a first set of documents that are assigned to the first leaf node, and memory storing term-sharded posting lists for terms assigned to the first leaf node without regard to leaf node assignments for documents identified in the term-sharded posting lists, wherein the first leaf node includes; at least one processor, memory storing instructions that, when executed by the at least one processor, cause the first leaf node to; receive an update for documents in the first set of documents and, responsive to the receiving, update at least some of the document-sharded posting lists; receive an updated term-sharded posting list portion for a first term from a second leaf node, the first term being assigned to the first leaf node; receive an updated term-sharded posting list portion for the first term from a third leaf node; and generate a new term-sharded posting list for the first term using the portion from the third leaf node and the portion from the second leaf node. - View Dependent Claims (23, 24, 25)
-
Specification