System and method for identifying data records using solution bitmasks
Abstract
A system and method for information retrieval includes an input device, a storage device, an output device, and a data file stored in the storage device including n data records, and one or more index files having data corresponding to a key value comprising information derived from a data record, and the record number of the data record containing the data from which the key value is derived. The retrieval system further includes memory for storing a temporary solution bitmask n bits in length where each bit corresponds to a record in the data file and logic for accessing the data file ascertaining the record number for each data record corresponding to a key value which satisfies the search criteria, and logic for setting the bit corresponding to that record number in the temporary solution bitmask. The system also preferably includes logic for analyzing individual search criteria in a search query containing a plurality of search criteria to determine the extent to which the search is optimizable using the present invention, and logic for combining each of the temporary solution bitmasks ascertained for particular search criteria in a query to obtain a final solution bitmask representative of the set of all data records satisfying the query.
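The bitmask mechanism described in the abstract can be illustrated with a minimal sketch. It assumes each index is simply a list of (key value, record number) pairs and uses a Python integer as the n-bit mask; the function names and sample data are hypothetical and are not taken from the patent.

```python
def criterion_bitmask(index, predicate):
    """Build a temporary solution bitmask for one search criterion.

    index: iterable of (key_value, record_number) pairs (assumed layout).
    Bit i of the returned integer is 1 when record i satisfies the predicate.
    Only the index is read; the data file itself is never touched.
    """
    mask = 0
    for key_value, record_number in index:
        if predicate(key_value):
            mask |= 1 << record_number
    return mask


def matching_records(mask, n_records):
    """List the record numbers whose bit is set in an n-bit mask."""
    return [i for i in range(n_records) if (mask >> i) & 1]


# Hypothetical data file with n = 6 records and two index files.
n = 6
city_index = [("Austin", 3), ("Boston", 0), ("Boston", 4),
              ("Denver", 1), ("Denver", 5), ("Miami", 2)]
age_index = [(22, 1), (31, 0), (35, 4), (40, 2), (47, 5), (58, 3)]

city_mask = criterion_bitmask(city_index, lambda c: c == "Boston")
age_mask = criterion_bitmask(age_index, lambda a: a >= 35)

# Combine the temporary masks into a final solution bitmask.
final_and = city_mask & age_mask   # city == "Boston" AND age >= 35
final_or = city_mask | age_mask    # city == "Boston" OR age >= 35
```

Under these assumptions, a query with several criteria reduces to one pass over each relevant index followed by bitwise AND/OR operations on the resulting masks.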
28 Claims
1. A method in a computer system for identifying a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record identifier and containing data organized into one or more fields, the search criterion relating to a selected one of the fields, the method comprising the steps of:
receiving a search query specifying the search criterion;
before receiving the search query, providing an index identifying the contents of the selected field of at least a portion of the data records, the index containing, for each data record, a data record identifier usable to access the data record and an indication of the contents of the selected field, the data record identifiers in the index being separated by a uniform offset; and
after receiving the search query, using the index to select the data record identifiers of the data records that satisfy the search criterion without accessing the data records.
(Dependent claims: 2, 3, 4, 5, 6)
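As an illustration of an index whose record identifiers are separated by a uniform offset, the sketch below packs every index entry into a fixed 12-byte layout (a 4-byte record identifier followed by an 8-byte padded key). The layout, the names `ENTRY`, `build_fixed_index` and `select_ids`, and the sample records are all assumptions made for the example; the claim does not prescribe them.

```python
import struct

# Assumed entry layout: 4-byte record identifier + 8-byte NUL-padded key.
# Every identifier therefore starts ENTRY.size bytes after the previous one.
ENTRY = struct.Struct("<I8s")


def build_fixed_index(records, field):
    """Build the index ahead of any query, sorted by the selected field."""
    entries = sorted((rec[field], rec_id) for rec_id, rec in records.items())
    buf = bytearray()
    for key, rec_id in entries:
        buf += ENTRY.pack(rec_id, key.encode("ascii"))
    return bytes(buf)


def select_ids(index_bytes, wanted):
    """Select matching record identifiers by reading the index only."""
    wanted_key = wanted.encode("ascii").ljust(8, b"\0")
    ids = []
    for offset in range(0, len(index_bytes), ENTRY.size):
        rec_id, key = ENTRY.unpack_from(index_bytes, offset)
        if key == wanted_key:
            ids.append(rec_id)
    return ids


# Hypothetical data records keyed by record identifier.
records = {0: {"city": "Boston"}, 1: {"city": "Denver"}, 2: {"city": "Boston"}}
city_index = build_fixed_index(records, "city")
boston_ids = select_ids(city_index, "Boston")
```

Because the stride is constant, the position of the i-th identifier is simply `i * ENTRY.size`, so the selection step never needs to open the data records themselves.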
7. A method in a computer system for identifying a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record identifier and containing data organized into one or more fields, the search criterion relating to a selected one of the fields, the method comprising the steps of:
receiving a search query specifying the search criterion;
before receiving the search query providing an index identifying the contents of the selected field of at least a portion of the data records, the index containing, for each data record, a data record identifier usable to access the data record and an indication of the contents of the selected field, the data record identifiers in the index being separated by a uniform offset;
after receiving the search query, using the index to select the data record identifiers of the data records that satisfy the search criterion;
creating a mask identifying the data records having the selected data record identifiers, the mask comprised of distinct entries and containing exactly one entry corresponding to each of the plurality of data records, each entry being initialized to a first value; and
for each selected index record, setting the entry in the mask that corresponds to the data record identified by the record identifier to a second value, whereby each entry in the mask set to the second value identifies one of the subset of the plurality of stored data records that satisfies the search criterion, and whereby each entry of the mask set to the first value identifies one of the subset of the plurality of stored data records that does not satisfy the search criterion,
and wherein the length of the data record identifiers is fixed across data records,
and wherein the indications of the contents of the selected field contained in the index comprise compressed field contents and decompression information for decompressing the compressed field contents, the length of the compressed field contents being variable across data records and the length of the decompression information being fixed across data records,
and wherein the fixed-length data record identifier and decompression information for each data record are stored contiguously in a first part of the index such that data record identifiers alternate with decompression information,
and wherein the variable-length compressed field contents for each data record are stored in a second part of the index.
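The two-part index layout and the mask-setting step can be sketched as follows. The claim does not name a compression scheme, so this example swaps in simple front coding (each key stores only the suffix that differs from the previous sorted key) purely as an assumption; the fixed-length "decompression information" is assumed here to be a shared-prefix length plus a suffix length. All names and data are hypothetical.

```python
import struct

# Part 1 entry (fixed length): 4-byte record identifier, then 2 bytes of
# assumed decompression information (shared-prefix length, suffix length).
FIXED = struct.Struct("<IBB")


def build_two_part_index(pairs):
    """pairs: (key, record_id) tuples. Returns (part1, part2) byte strings."""
    part1, part2 = bytearray(), bytearray()
    prev = b""
    for key, rec_id in sorted(pairs):
        raw = key.encode("ascii")
        shared = 0
        while shared < min(len(prev), len(raw)) and prev[shared] == raw[shared]:
            shared += 1
        suffix = raw[shared:]
        part1 += FIXED.pack(rec_id, shared, len(suffix))  # fixed-length part
        part2 += suffix                                   # variable-length part
        prev = raw
    return bytes(part1), bytes(part2)


def iter_index(part1, part2):
    """Yield (key, record_id), rebuilding each key from its compressed form."""
    prev, pos = b"", 0
    for offset in range(0, len(part1), FIXED.size):
        rec_id, shared, suffix_len = FIXED.unpack_from(part1, offset)
        key = prev[:shared] + part2[pos:pos + suffix_len]
        pos += suffix_len
        prev = key
        yield key.decode("ascii"), rec_id


def select_mask(part1, part2, n_records, predicate):
    """One mask entry per data record: 0 (first value) or 1 (second value)."""
    mask = [0] * n_records
    for key, rec_id in iter_index(part1, part2):
        if predicate(key):
            mask[rec_id] = 1
    return mask


pairs = [("smith", 2), ("smart", 0), ("smythe", 1), ("jones", 3)]
p1, p2 = build_two_part_index(pairs)
sm_mask = select_mask(p1, p2, 4, lambda k: k.startswith("sm"))
```

In this sketch the record identifiers in the first part sit at a constant stride of `FIXED.size` bytes, alternating with the decompression information, while the variable-length suffixes are concatenated in the second part.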
8. A computer-readable medium whose contents cause a computer system to identify a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record identifier and containing data organized into one or more fields, the search criterion relating to a selected one of the fields, by performing the steps of:
receiving a search query specifying the search criterion;
before receiving the search query, providing an index identifying the contents of the selected field of at least a portion of the data records, the index containing, for each data record, a data record identifier usable to access the data record and an indication of the contents of the selected field, the data record identifiers in the index being separated by a uniform offset; and
after receiving the search query, using the index to select the data record identifiers of the data records that satisfy the search criterion without accessing the data records.
(Dependent claims: 9, 10)
11. A computer-readable medium whose contents cause a computer system to identify a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record identifier and containing data organized into one or more fields, the search criterion relating to a selected one of the fields, by performing the steps of:
receiving a search query specifying the search criterion;
before receiving the search query providing an index identifying the contents of the selected field of at least a portion of the data records, the index containing, for each data record, a data record identifier usable to access the data record and an indication of the contents of the selected field, the data record identifiers in the index being separated by a uniform offset;
after receiving the search query, using the index to select the data record identifiers of the data records that satisfy the search criterion;
creating a mask identifying the data records having the selected data record identifiers, the mask comprised of distinct entries and containing exactly one entry corresponding to each of the plurality of data records each entry being initialized to a first value; and
for each selected index record, setting the entry in the mask that corresponds to the data record identified by the record identifier to a second value, whereby each entry in the mask set to the second value identifies one of the subset of the plurality of stored data records that satisfies the search criterion, and whereby each entry of the mask set to the first value identifies one of the subset of the plurality of stored data records that does not satisfy the search criterion,
and wherein the length of the data record identifiers is fixed across data records,
and wherein the indications of the contents of the selected field contained in the index comprise compressed field contents and decompression information for decompressing the compressed field contents, the length of the compressed field contents being variable across data records and the length of the decompression information being fixed across data records,
and wherein the fixed-length data record identifier and decompression information for each data record are stored contiguously in a first part of the index such that data record identifiers alternate with decompression information,
and wherein the variable-length compressed field contents for each data record are stored in a second part of the index.
12. An apparatus for identifying a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record identifier and containing data organized into one or more fields, the search criterion relating to a selected one of the fields, comprising:
a search query receiving subsystem that receives a search query specifying the search criterion;
an index provision subsystem that provides an index identifying the contents of the selected field of at least a portion of the data records before receiving the search query, the index containing, for each data record, a data record identifier usable to access the data record and an indication of the contents of the selected field, the data record identifiers in the index being separated by a uniform offset; and
a record selection subsystem that uses the index to select the data record identifiers of the data records that satisfy the search criterion after receiving the search query without accessing the data records.
(Dependent claims: 13, 14)
15. An apparatus for identifying a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record identifier and containing data organized into one or more fields, the search criterion relating to a selected one of the fields, comprising:
a search query receiving subsystem that receives a search query specifying the search criterion;
an index provision subsystem that provides an index identifying the contents of the selected field of at least a portion of the data records before receiving the search query, the index containing, for each data record, a data record identifier usable to access the data record and an indication of the contents of the selected field, the data record identifiers in the index being separated by a uniform offset;
a record selection subsystem that uses the index to select the data record identifiers of the data records that satisfy the search criterion after receiving the search query;
a mask creation subsystem that creates a mask identifying the data records having the selected data record identifiers, the mask comprised of distinct entries and containing exactly one entry corresponding to each of the plurality of data records, each entry being initialized to a first value; and
a mask setting subsystem that, for each selected index record, sets the entry in the mask that corresponds to the data record identified by the record identifier to a second value, whereby each entry in the mask set to the second value identifies one of the subset of the plurality of stored data records that satisfies the search criterion, and whereby each entry of the mask set to the first value identifies one of the subset of the plurality of stored data records that does not satisfy the search criterion,
and wherein the length of the data record identifiers is fixed across data records,
and wherein the indications of the contents of the selected field contained in the index comprise compressed field contents and decompression information for decompressing the compressed field contents, the length of the compressed field contents being variable across data records and the length of the decompression information being fixed across data records,
and wherein the fixed-length data record identifier and decompression information for each data record are stored contiguously in a first part of the index such that data record identifiers alternate with decompression information,
and wherein the variable-length compressed field contents for each data record are stored in a second part of the index.
16. A method in a computer system for identifying a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record number and containing data organized into one or more fields, the search criterion relating to a selected one of the fields, the method comprising the steps of:
receiving a search query specifying the search criterion;
before receiving the search query, providing an index identifying the contents of the selected field of at least a portion of the data records, the index containing, for each data record, a data record number usable to access the data record and an indication of the contents of the selected field, a data record number size value further being stored in conjunction with the index that indicates the amount of space occupied in the index by each data record number; and
after receiving the search query, using the index to select the data record numbers of the data records that satisfy the search criterion.
(Dependent claims: 17, 18)
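The idea of storing a record number size value alongside the index can be sketched as below. The example assumes the size value is a single header byte giving the width, in bytes, of every record number in the index, and that each entry is a record number followed by a length-prefixed key; the claim fixes neither detail, and the function names are hypothetical.

```python
def build_sized_index(pairs, n_records):
    """Build an index whose first byte records the record-number width.

    pairs: (key, record_number) tuples; n_records: size of the data file.
    The width is the smallest whole number of bytes able to hold the
    largest record number (n_records - 1).
    """
    width = max(1, (max(n_records - 1, 0).bit_length() + 7) // 8)
    body = bytearray()
    for key, rec_no in sorted(pairs):
        raw = key.encode("ascii")
        body += rec_no.to_bytes(width, "little") + bytes([len(raw)]) + raw
    return bytes([width]) + bytes(body)


def select_record_numbers(index_bytes, predicate):
    """Read the size value first, then decode each record number with it."""
    width = index_bytes[0]
    pos, selected = 1, []
    while pos < len(index_bytes):
        rec_no = int.from_bytes(index_bytes[pos:pos + width], "little")
        key_len = index_bytes[pos + width]
        key_start = pos + width + 1
        key = index_bytes[key_start:key_start + key_len].decode("ascii")
        if predicate(key):
            selected.append(rec_no)
        pos = key_start + key_len
    return selected


# A small data file needs 1 byte per record number, a larger one needs 3.
small = build_sized_index([("ann", 5), ("bob", 70)], 200)
big = build_sized_index([("ann", 5), ("bob", 69999)], 70000)
bob_in_big = select_record_numbers(big, lambda k: k == "bob")
```

Under these assumptions the same reader code handles both indexes, because it takes the record number width from the stored size value rather than hard-coding it.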
19. A computer-readable medium whose contents cause a computer system to identify a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record number and containing data organized into one or more fields, the search criterion relating to a selected one of the fields, by performing the steps of:
receiving a search query specifying the search criterion;
before receiving the search query, providing an index identifying the contents of the selected field of at least a portion of the data records, the index containing, for each data record, a data record number usable to access the data record and an indication of the contents of the selected field, a data record number size value further being stored in conjunction with the index that indicates the amount of space occupied in the index by each data record number; and
after receiving the search query, using the index to select the data record numbers of the data records that satisfy the search criterion.
(Dependent claims: 20, 21)
22. An apparatus for identifying a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record number and containing data organized into one or more fields, the search criterion relating to a selected one of the fields, comprising:
a search query receiving subsystem that receives a search query specifying the search criterion;
an index provision subsystem that provides an index identifying the contents of the selected field of at least a portion of the data records before receiving the search query, the index containing, for each data record, a data record number usable to access the data record and an indication of the contents of the selected field, a data record number size value further being stored in conjunction with the index that indicates the amount of space occupied in the index by each data record number; and
a record selection subsystem that uses the index to select the data record numbers of the data records that satisfy the search criterion after receiving the search query.
(Dependent claims: 23, 24)
25. A method in a computer system for identifying a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record identifier and containing data organized into one or more fields, the search criteria relating to a selected one of the fields, the method comprising the steps of:
receiving a search query specifying the search criterion;
before receiving the search query, providing an index identifying the contents of the selected field of at least a portion of the data records, the index containing, for each data record, a data record identifier usable to access the data record and an indication of the contents of the selected field comprising both compressed field contents for the field and decompression information for decompressing the compressed field contents, the beginning of the data record identifiers in the index being separated by a uniform offset, the beginning of the indications of the contents of the selected field being separated by a non-uniform offset; and
after receiving the search query, using the index to select the data record identifiers of the data records that satisfy the search criterion without accessing the data records.
(Dependent claims: 26)
27. A computer-readable medium whose contents cause a computer system to identify a subset of a plurality of data records that satisfy a search criterion, each of the plurality of data records having a record identifier and containing data organized into one or more fields, the search criteria relating to a selected one of the fields, by performing the steps of:
receiving a search query specifying the search criterion;
before receiving the search query, providing an index identifying the contents of the selected field of at least a portion of the data records, the index containing, for each data record, a data record identifier usable to access the data record and an indication of the contents of the selected field comprising both compressed field contents for the field and decompression information for decompressing the compressed field contents, the beginning of the data record identifiers in the index being separated by a uniform offset, the beginning of the indications of the contents of the selected field being separated by a non-uniform offset; and
after receiving the search query, using the index to select the data record identifiers of the data records that satisfy the search criterion without accessing the data records.
(Dependent claims: 28)
Specification