Preventing staleness in query results when using asynchronously updated indexes
First Claim
1. A computer implemented method, the method comprising:
receiving, by one or more processors, an asynchronously updated index corresponding to a main dataset in a database system;
receiving, by the one or more processors, time-sequenced log data of modifications made to the main dataset after a cutoff time of a last asynchronous index update, wherein the time-sequenced log data is read once by the database system for joining the main dataset with the time-sequenced log data and filtering out updated dataset entries and deleted dataset entries from the asynchronously updated index;
receiving, by the one or more processors, from an end user, a proximity-based query directed to the main dataset;
joining, by the one or more processors, the main dataset with the time-sequenced log data resulting in a first intermediate result comprising a first one or more entries of the main dataset made after the cutoff time;
processing, by the one or more processors, the proximity-based query to determine a second one or more entries satisfying the proximity-based query by emulating a function of the last asynchronous index update resulting in a second intermediate result, wherein the second intermediate result includes updated and deleted entries of a base table that are retrieved by the proximity-based query using an outdated asynchronously updated index, wherein the processing the proximity-based query further comprises receiving a staleness acceptability criterion; and
determining, based at least in part on the staleness acceptability criterion, that one or more query results are acceptable;
filtering out, by the one or more processors, the updated dataset entries from the asynchronously updated index using the time-sequenced log data to generate a lookup table as index table;
processing, by the one or more processors, the proximity-based query against the main dataset using the lookup table resulting in a third intermediate result; and
building, by the one or more processors, a union of the second intermediate result and the third intermediate result, to generate a final result of the proximity-based query.
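The steps of the claim can be illustrated with a minimal in-memory sketch. Everything named here is a hypothetical assumption for illustration and is not taken from the patent: the `LogRecord` type, the `within` and `query_fresh` functions, the use of plain dictionaries for the main dataset and the index, and the choice of a Euclidean radius search as the proximity-based query. The index is modeled as a snapshot of key-to-position pairs taken at the cutoff time, and the staleness acceptability criterion is not modeled.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class LogRecord:
    """One entry of the time-sequenced modification log (hypothetical shape)."""
    ts: int   # modification timestamp
    op: str   # "insert", "update", or "delete"
    key: int  # key of the affected entry in the main dataset


def within(point, center, radius):
    """Proximity predicate: is `point` within `radius` of `center`?"""
    return hypot(point[0] - center[0], point[1] - center[1]) <= radius


def query_fresh(main, index, log, cutoff, center, radius):
    """Answer a radius query without returning stale index hits.

    main  : dict key -> current position (the main dataset)
    index : dict key -> position as of `cutoff` (the asynchronously updated index)
    log   : iterable of LogRecord describing changes to `main`
    """
    # Read the log once: collect every key modified after the cutoff.
    changed = {rec.key for rec in log if rec.ts > cutoff}

    # First intermediate result: join the main dataset with the log.
    # Deleted keys are no longer in `main`, so they drop out of the join.
    first = {k: main[k] for k in changed if k in main}

    # Second intermediate result: evaluate the predicate directly on the
    # joined rows, standing in for what an up-to-date index would return.
    second = {k for k, pos in first.items() if within(pos, center, radius)}

    # Lookup table: the index with updated and deleted entries filtered out.
    lookup = {k: pos for k, pos in index.items() if k not in changed}

    # Third intermediate result: run the query through the lookup table.
    third = {k for k, pos in lookup.items() if within(pos, center, radius)}

    # Final result: union of the second and third intermediate results.
    return second | third


# Example data: key 2 moved away, key 3 was deleted, key 4 is new.
main = {1: (0.0, 0.0), 2: (10.0, 10.0), 4: (0.5, 0.5)}
index = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
log = [LogRecord(11, "update", 2), LogRecord(12, "delete", 3), LogRecord(13, "insert", 4)]

result = query_fresh(main, index, log, cutoff=10, center=(0.0, 0.0), radius=2.0)
```

In this example the outdated index alone would match keys 1, 2, and 3 for the query; the combined result contains only keys 1 and 4, because the moved and deleted entries are filtered out and the newly inserted entry is picked up from the log join.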
Abstract
A method, computer program product, and computer system for optimizing query processing are provided. An asynchronously updated index is provided for a main dataset. A time-sequenced log of data modifications to the main dataset is provided. A query of the main dataset is received. The main dataset is joined with the time-sequenced log data, resulting in a first intermediate result. The query is processed by keeping one or more entries that satisfy the query, emulating a function of the asynchronously updated index, resulting in a second intermediate result. Updated and deleted dataset entries are filtered out of the asynchronously updated index. The query is then processed against the filtered index, resulting in a third intermediate result. A union of the second intermediate result and the third intermediate result is built, defining a final result.
4 Claims
Specification