Storage and retrieval of data from a bit vector search index
First Claim
1. One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising:
- receiving a search query;
identifying a term based on the search query;
accessing one or more data structures for determining storage locations of bit vectors for terms indexed in a search index, each bit vector comprising an array of bits, each of at least a portion of the bits representing whether at least one document contains at least one term from a corresponding set of terms, the one or more data structures including explicit mappings for a subset of the terms indexed in the search index, the explicit mappings identifying actual storage locations for each term in the subset of terms, the one or more data structures also including ad hoc information mapping term characteristics to mapping algorithms for calculating storage locations of bit vectors for terms with the term characteristics;
if an explicit mapping for the term is provided in the one or more data structures, identifying storage locations of bit vectors for the term from the explicit mapping; and
if an explicit mapping for the term is not provided in the one or more data structures, using ad hoc information to identify storage locations of bit vectors for the term.
1 Assignment
0 Petitions
Accused Products
Abstract
The technology described herein provides for storing and retrieving data in a bit vector search index. The bit vector search index stores data about terms from documents using bit vectors. Each bit vector comprises an array of bits and corresponds to a different set of terms. Each bit in the bit vector is used to represent whether a document includes at least one term from the set of terms. A band table is used to store bit vector configurations for bands of terms having similar term characteristics. Each term is indexed in the bit vector search index according to a bit vector configuration for a band to which it belongs. When identifying bit vector storage locations for terms, explicit mappings are used for some terms and ad hoc approaches used for other terms. Explicit mappings provide specific locations for terms, while ad hoc approaches use mapping algorithms assigned to bands.
-
Citations
20 Claims
-
1. One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising:
-
receiving a search query; identifying a term based on the search query; accessing one or more data structures for determining storage locations of bit vectors for terms indexed in a search index, each bit vector comprising an array of bits, each of at least a portion of the bits representing whether at least one document contains at least one term from a corresponding set of terms, the one or more data structures including explicit mappings for a subset of the terms indexed in the search index, the explicit mappings identifying actual storage locations for each term in the subset of terms, the one or more data structures also including ad hoc information mapping term characteristics to mapping algorithms for calculating storage locations of bit vectors for terms with the term characteristics; if an explicit mapping for the term is provided in the one or more data structures, identifying storage locations of bit vectors for the term from the explicit mapping; and if an explicit mapping for the term is not provided in the one or more data structures, using ad hoc information to identify storage locations of bit vectors for the term. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method comprising:
-
assigning term characteristics to each of a plurality of bands; assigning a bit vector configuration to each of the plurality of bands, each bit vector configuration specifying a number and length of bit vectors; and storing, on one or more computer storage media, a data structure mapping the term characteristics to bit vector configuration for each of the bands. - View Dependent Claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A system comprising:
-
one or more processors; and one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to; identify a term based on a search query; accessing one or more data structures for determining the storage locations of bit vectors for terms indexed in a search index, each bit vector comprising an array of bits, each of at least a portion of the bits representing whether at least one document contains at least one term from a corresponding set of terms, the one or more data structures including explicit mappings for a subset of the terms indexed in the search index, the explicit mappings identifying actual storage locations for each term in the subset of terms, the one or more data structures also including ad hoc information mapping term characteristics to mapping algorithms for calculating storage locations of bit vectors for terms with the term characteristics; if an explicit mapping for the term is provided in the one or more data structures, identifying storage locations of bit vectors for the term from the explicit mapping; if an explicit mapping for the term is not provided in the one or more data structures, using ad hoc information to identify storage locations of bit vectors for the term by;
determining one or more term characteristics of the term, identifying mapping algorithms for the one or more term characteristics from one or more data structures, and employing the mapping algorithms to determine storage locations for bit vectors for the term;accessing the bit vectors for the term; and intersecting the bit vectors for the term to identify matching documents for the search query. - View Dependent Claims (20)
-
Specification