Updating a bit vector search index
First Claim
1. A method of indexing information about documents in a bit vector search index, the method comprising:
- indexing information about a plurality of documents using bit vectors on a first accumulation buffer storage device, each bit vector on the first accumulation buffer storage device comprising an array of bits in which each bit represents whether one or more documents from the plurality of documents includes one or more terms; and
when a threshold is satisfied, indexing the information about the plurality of documents using bit vectors on a subsequent storage device based on the information indexed in the first accumulation buffer storage device, each bit vector on the subsequent storage device comprising an array of bits in which each bit represents whether one or more documents from a set of documents larger than the plurality of documents includes one or more terms.
1 Assignment
0 Petitions
Accused Products
Abstract
The technology described herein provides for indexing information in a bit vector search index. The bit vector search index comprises a data structure for indexing data about terms from a corpus of documents. The data structure includes a number of bit vectors. Each bit vector comprises an array of bits and corresponds to a different set of terms. Bits in the bit vector are used to represent whether at least one document corresponding to the bit includes at least one term from the set of terms corresponding to the bit vector. The bit vector search index is stored by first indexing information about documents using bit vectors on a first accumulation buffer storage device. When a threshold is satisfied, the information is transferred to bit vectors on a subsequent storage device.
-
Citations
20 Claims
-
1. A method of indexing information about documents in a bit vector search index, the method comprising:
-
indexing information about a plurality of documents using bit vectors on a first accumulation buffer storage device, each bit vector on the first accumulation buffer storage device comprising an array of bits in which each bit represents whether one or more documents from the plurality of documents includes one or more terms; and when a threshold is satisfied, indexing the information about the plurality of documents using bit vectors on a subsequent storage device based on the information indexed in the first accumulation buffer storage device, each bit vector on the subsequent storage device comprising an array of bits in which each bit represents whether one or more documents from a set of documents larger than the plurality of documents includes one or more terms. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising:
-
adding term information about a plurality of documents to bit vectors on a first accumulation buffer storage device until a threshold is satisfied, each bit vector on the first accumulation buffer storage device corresponding with one or more terms from a set of terms and each bit in each bit vector on the first accumulation buffer storage device identifying whether one or more documents corresponding with the bit contains at least one of the one or more terms corresponding with the bit vector on the first accumulation buffer storage device; and when the threshold is satisfied, adding the term information about the documents to bit vectors on a subsequent storage device, each bit vector on the subsequent storage device corresponding with one or more terms from the set of terms and each bit in each bit vector on the subsequent storage device identifying whether one or more documents corresponding with the bit contains at least one of the one or more terms corresponding with the bit vector on the subsequent storage device. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising:
-
identifying a plurality of terms contained within a document; identifying a bit corresponding to the document in each of a plurality of bit vectors stored on a storage device, a first bit vector from the plurality of bit vectors comprising an array of bits with each bit of the first bit vector corresponding to all terms from a plurality of terms assigned to the first bit vector and each bit representing whether at least one of one or more documents contains any term from the plurality of terms assigned to the first bit vector; and setting bits identified for the document in bit vectors corresponding with terms contained within the document. - View Dependent Claims (20)
-
Specification