Preventing staleness in query results when using asynchronously updated indexes
First Claim
1. A computer program product embodied in a computer readable storage medium, wherein the computer readable storage medium is not a transitory signal, the computer program product comprising:
- program instructions that are stored on the computer readable storage medium are executed by a computer system to perform a method comprising;
receiving an asynchronously updated index corresponding to a main dataset in a database system;
receiving time-sequenced log data of modifications made to the main dataset after a cutoff time of a last asynchronous index update, wherein the time-sequenced log data is read once by the database system for joining the main dataset with the time-sequenced log data and filtering out updated dataset entries and deleted dataset entries from the asynchronously updated index;
receiving, from an end user, a proximity-based query directed to the main dataset;
joining the main dataset with the time-sequenced log data resulting in a first intermediate result comprising a first one or more entries of the main dataset made after the cutoff time;
processing the proximity-based query to determine a second one or more entries satisfying the proximity-based query by emulating a function of the last asynchronous index update resulting in a second intermediate result, wherein the second intermediate result includes updated and deleted entries of a base table that are retrieved by the proximity-based query using an outdated asynchronously updated index, wherein the processing the proximity-based query further comprises receiving a staleness acceptability criterion; and
determining, based at least in part on the staleness acceptability criterion, that one or more query results are acceptable;
filtering out the updated dataset entries from the asynchronously updated index using the time-sequenced log data to generate a lookup table as index table;
processing the proximity-based query against the main dataset using the lookup table resulting in a third intermediate result; and
building a union of the second intermediate result and the third intermediate result, to generate a final result of the proximity-based query.
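The claim says a staleness acceptability criterion is received and used to decide whether query results are acceptable, but it does not say what form the criterion takes. The sketch below assumes one possible reading: the criterion is a maximum tolerated lag between the index cutoff time and the query time. `StalenessCriterion`, `max_lag_seconds`, and `choose_plan` are hypothetical names used only for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StalenessCriterion:
    # Assumption: the criterion is a tolerated index age in seconds.
    max_lag_seconds: float


def choose_plan(criterion, now, cutoff, pending_changes):
    """Pick a query plan from the index lag and the number of logged changes.

    Returns "index_only" when the outdated index is considered acceptable,
    otherwise "hybrid" (index results corrected with the modification log).
    """
    lag = now - cutoff
    if pending_changes == 0 or lag <= criterion.max_lag_seconds:
        return "index_only"
    return "hybrid"


criterion = StalenessCriterion(max_lag_seconds=60)
plan_recent = choose_plan(criterion, now=100, cutoff=70, pending_changes=3)
plan_old = choose_plan(criterion, now=100, cutoff=10, pending_changes=3)
```

With these sample values the index is 30 seconds old in the first call and 90 seconds old in the second, so only the second call falls back to the log-corrected plan.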
Abstract
A method, computer program product, and computer system for optimizing query processing are provided. An asynchronously updated index is provided for a main dataset. A time-sequenced log of data modifications to the main dataset is provided. A query of the main dataset is received. The main dataset is joined with the time-sequenced log data, resulting in a first intermediate result. The query is processed by keeping one or more entries satisfying the query by emulating a function of the asynchronously updated index, resulting in a second intermediate result. Updated and deleted dataset entries are filtered out of the asynchronously updated index. The query is then processed using the filtered index, resulting in a third intermediate result. A union of the second intermediate result and the third intermediate result is built, defining a final result.
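The flow in the abstract can be sketched as a small in-memory example. This is a minimal illustration under stated assumptions, not the patented implementation: the main dataset is a dict of key to text, the index is a term-to-keys map, the query is a single-term match, and the names `LogEntry`, `tokenize`, `build_index`, and `hybrid_query` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    ts: int    # modification time
    key: int   # primary key of the modified row
    op: str    # "insert", "update", or "delete"


def tokenize(text):
    return set(text.lower().split())


def build_index(table):
    """Term -> keys map; stands in for the asynchronous index build."""
    index = {}
    for key, text in table.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(key)
    return index


def hybrid_query(term, table, stale_index, log, cutoff):
    term = term.lower()
    # The log is read once; keep only modifications after the cutoff.
    changed = {e.key for e in log if e.ts > cutoff}
    # First intermediate result: current rows that were modified after the cutoff.
    first = {k: table[k] for k in changed if k in table}
    # Second intermediate result: emulate the index function on those rows.
    second = {k for k, text in first.items() if term in tokenize(text)}
    # Filter updated and deleted entries out of the stale index (lookup table).
    lookup = stale_index.get(term, set()) - changed
    # Third intermediate result: unchanged rows found through the lookup table.
    third = {k for k in lookup if k in table}
    # Final result: union of the second and third intermediate results.
    return second | third


# Index built at cutoff time 10 from the dataset as it was then.
old_table = {1: "red apple", 2: "green pear", 3: "red plum", 5: "red grape"}
stale_index = build_index(old_table)

# Modifications made after the cutoff, and the resulting current dataset.
log = [LogEntry(11, 1, "update"), LogEntry(12, 3, "delete"), LogEntry(13, 4, "insert")]
table = {1: "yellow banana", 2: "green pear", 4: "red cherry", 5: "red grape"}

result = hybrid_query("red", table, stale_index, log, cutoff=10)  # {4, 5}
```

Querying the stale index alone would return keys 1, 3, and 5, two of which no longer match; the corrected result drops the updated and deleted rows and picks up the newly inserted one.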
13 Citations
12 Claims
7. A computer system, the computer system comprising:
- one or more computer processors;
- one or more computer readable storage media, wherein the one or more computer readable storage media is not a transitory signal;
- program instructions that are stored on the computer readable storage media for execution by at least one of the one or more processors, the program instructions comprising instructions to perform a method comprising;
receiving an asynchronously updated index corresponding to a main dataset in a database system;
receiving time-sequenced log data of modifications made to the main dataset after a cutoff time of a last asynchronous index update, wherein the time-sequenced log data is read once by the database system for joining the main dataset with the time-sequenced log data and filtering out updated dataset entries and deleted dataset entries from the asynchronously updated index;
receiving, from an end user, a proximity-based query directed to the main dataset;
joining the main dataset with the time-sequenced log data resulting in a first intermediate result comprising a first one or more entries of the main dataset made after the cutoff time;
processing the proximity-based query to determine a second one or more entries satisfying the proximity-based query by emulating a function of the last asynchronous index update to generate a second intermediate result, wherein the second intermediate result includes updated and deleted entries of a base table that are retrieved by the proximity-based query using an outdated asynchronously updated index, wherein the processing the proximity-based query further comprises receiving a staleness acceptability criterion; and
determining, based at least in part on the staleness acceptability criterion, that one or more query results are acceptable;
filtering out the updated dataset entries from the asynchronously updated index using the time-sequenced log data to generate a lookup table as index table;
processing the proximity-based query against the main dataset using the lookup table to generate a third intermediate result; and
building a union of the second intermediate result and the third intermediate result, to generate a final result of the proximity-based query.
Specification