Source query caching as fault prevention for federated queries
First Claim
1. A method of processing a federated query, comprising:
receiving an indication that a first set of source queries embedded in a first federated query failed to execute successfully, each source query specifying a set of source tables stored in a target autonomous data source of a plurality of target autonomous data sources belonging to a federation, at least two source queries of the first set of source queries being specific to different data sources, and the first federated query being sent from a client;
storing the first set of source queries and metadata associated with the first set of source queries into a data structure, the first set of source queries including a first source query;
for each source query of the first set of source queries that is determined to be stored in the data structure, updating metadata of each entry corresponding to the respective source query stored in the data structure, the metadata including a number of times the respective source query has failed and further including a timestamp of the respective failure;
selecting a second set of source queries from the data structure, the second set of source queries including the first source query and having a higher probability of failure than a third set of source queries stored in the data structure;
submitting the second set of source queries to one or more target data sources;
for each result of a source query of the second set of source queries, storing the result in a cache external to the federation;
receiving an indication that the first source query embedded in a second federated query failed to execute successfully, the second federated query including a second source query, and the first source query specifying a first set of source tables stored in a first target autonomous data source;
generating a third source query by replacing a first set of source table names included in the first source query with a second set of source table names that identifies a second set of source tables, the second set of source tables being stored in the cache and storing data cached from the first set of source tables, and the first set of source table names being different from the second set of source table names;
generating a third federated query including the second and third source queries;
submitting each source query embedded in the third federated query to one or more data sources, the third source query specifying the second set of source tables and being submitted to the cache; and
sending a combined result set responsive to the third federated query to the client, the combined result set including a first result set responsive to the second source query and further including a cached result set responsive to the third source query, and the cached result set being stored in the cache.
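The last five steps of the claim (replacing source table names, building the third federated query, and combining results) can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `rewrite_for_cache`, `build_fallback`, `run_plan`, and the `"cache"` target label are hypothetical, a federated query is modeled simply as a list of (target, SQL) pairs, and the combined result set is modeled as a plain concatenation of per-source rows, whereas a real federation engine would typically join or merge them.

```python
import re
from typing import Callable

CACHE = "cache"  # hypothetical label for the cache that sits outside the federation


def rewrite_for_cache(source_query: str, table_map: dict[str, str]) -> str:
    """Replace each source table name with the name of its cached copy.

    Assumption: a whole-word regex substitution is good enough for a sketch.
    It would also rewrite matches inside string literals, so a real system
    would more likely rewrite a parsed query tree.
    """
    if not table_map:
        return source_query
    # Longest names first so that a longer name wins over a shorter prefix.
    names = sorted(table_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: table_map[m.group(1)], source_query)


def build_fallback(
    source_queries: list[tuple[str, str]],
    failed_sql: str,
    table_map: dict[str, str],
) -> list[tuple[str, str]]:
    """Build the replacement federated query: the failed source query is
    rewritten and routed to the cache; the others keep their original target."""
    plan = []
    for target, sql in source_queries:
        if sql == failed_sql:
            plan.append((CACHE, rewrite_for_cache(sql, table_map)))
        else:
            plan.append((target, sql))
    return plan


def run_plan(
    plan: list[tuple[str, str]],
    execute: Callable[[str, str], list],
) -> list:
    """Submit every source query and concatenate the result rows."""
    combined = []
    for target, sql in plan:
        combined.extend(execute(target, sql))
    return combined
```

For example, `build_fallback([("crm", "SELECT id FROM customers"), ("erp", "SELECT id FROM orders")], "SELECT id FROM customers", {"customers": "cached_customers"})` routes the first query to the cache with the table name swapped and leaves the second query pointed at its original source.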
Abstract
An example system for processing a federated query includes a query proxy that receives a federated query including a plurality of source queries and receives an indication that a failed set of one or more source queries failed to execute successfully. Each source query is specific to an autonomous data source belonging to a federation. The system also includes a data federation engine that identifies a plurality of autonomous data sources to which to send the plurality of source queries. The plurality of autonomous data sources belong to the federation. The system further includes a query fail analyzer that updates a data structure to reflect the unsuccessful execution of one or more source queries of the failed set.
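As a rough illustration of the query fail analyzer, the sketch below keeps one entry per source query holding a failure count and the timestamp of the latest failure, the two metadata items named in claim 1. The class and method names are hypothetical, and ranking by failure count (ties broken by recency) is only an assumed stand-in for "probability of failure"; the text here does not say how that probability is estimated.

```python
import time
from dataclasses import dataclass


@dataclass
class FailureEntry:
    fail_count: int = 0        # number of times the source query has failed
    last_failure: float = 0.0  # timestamp of the most recent failure


class QueryFailAnalyzer:
    """Hypothetical in-memory failure log keyed by source query text."""

    def __init__(self) -> None:
        self.entries: dict[str, FailureEntry] = {}

    def record_failures(self, failed_queries, now=None) -> None:
        """Store new source queries and update metadata for known ones."""
        ts = time.time() if now is None else now
        for query in failed_queries:
            entry = self.entries.setdefault(query, FailureEntry())
            entry.fail_count += 1
            entry.last_failure = ts

    def select_for_caching(self, k: int) -> list[str]:
        """Return the k queries judged most likely to fail again.

        Assumption: more past failures, then more recent failure, means a
        higher probability of failure.
        """
        ranked = sorted(
            self.entries.items(),
            key=lambda item: (item[1].fail_count, item[1].last_failure),
            reverse=True,
        )
        return [query for query, _ in ranked[:k]]
```

The queries returned by `select_for_caching` would then be submitted to their data sources ahead of time and the results stored in the external cache.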
54 Citations
18 Claims
-
1. A method of processing a federated query, comprising:
-
The steps of claim 1 are recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
-
-
13. A system for processing a federated query, comprising:
-
a non-transitory memory; and
one or more hardware processors coupled to the non-transitory memory and configured to read instructions from the non-transitory memory to cause the system to perform operations comprising:
receiving an indication that a first set of source queries embedded in a first federated query failed to execute successfully, each source query specifying a set of source tables stored in a target autonomous data source of a plurality of target autonomous data sources belonging to a federation, at least two source queries of the first set of source queries being specific to different data sources, and the first federated query being sent from a client;
storing the first set of source queries and metadata associated with the first set of source queries into a data structure, the first set of source queries including a first source query;
for each source query of the first set of source queries that is determined to be stored in the data structure, updating metadata of each entry corresponding to the respective source query stored in the data structure, the metadata including a number of times the respective source query has failed and further including a timestamp of the respective failure;
selecting a second set of source queries from the data structure, the second set of source queries including the first source query and having a higher probability of failure than a third set of source queries stored in the data structure;
submitting the second set of source queries to one or more target data sources;
for each result of a source query of the second set of source queries, storing the result in a cache external to the federation;
receiving an indication that the first source query embedded in a second federated query failed to execute successfully, the second federated query including a second source query, and the first source query specifying a first set of source tables stored in a first target autonomous data source;
generating a third source query by replacing a first set of source table names included in the first source query with a second set of source table names that identifies a second set of source tables, the second set of source tables being stored in the cache and storing data cached from the first set of source tables, and the first set of source table names being different from the second set of source table names;
generating a third federated query including the second and third source queries;
submitting each source query embedded in the third federated query to one or more data sources, the third source query specifying the second set of source tables and being submitted to the cache; and
sending a combined result set responsive to the third federated query to the client, the combined result set including a first result set responsive to the second source query and further including a cached result set responsive to the third source query, and the cached result set being stored in the cache.
Dependent claims: 14, 15, 16, 17
-
-
18. A non-transitory machine-readable medium comprising a plurality of machine-readable instructions that when executed by one or more processors is adapted to cause the one or more processors to perform a method comprising:
-
receiving an indication that a first set of source queries embedded in a first federated query failed to execute successfully, each source query specifying a set of source tables stored in a target autonomous data source of a plurality of target autonomous data sources belonging to a federation, at least two source queries of the first set of source queries being specific to different data sources, and the first federated query being sent from a client;
storing the first set of source queries and metadata associated with the first set of source queries into a data structure, the first set of source queries including a first source query;
for each source query of the first set of source queries that is determined to be stored in the data structure, updating metadata of each entry corresponding to the respective source query stored in the data structure, the metadata including a number of times the respective source query has failed and further including a timestamp of the respective failure;
selecting a second set of source queries from the data structure, the second set of source queries including the first source query and having a higher probability of failure than a third set of source queries stored in the data structure;
submitting the second set of source queries to one or more target data sources;
for each result of a source query of the second set of source queries, storing the result in a cache external to the federation;
receiving an indication that the first source query embedded in a second federated query failed to execute successfully, the second federated query including a second source query, and the first source query specifying a first set of source tables stored in a first target autonomous data source;
generating a third source query by replacing a first set of source table names included in the first source query with a second set of source table names that identifies a second set of source tables, the second set of source tables being stored in the cache and storing data cached from the first set of source tables, and the first set of source table names being different from the second set of source table names;
generating a third federated query including the second and third source queries;
submitting each source query embedded in the third federated query to one or more data sources, the third source query specifying the second set of source tables and being submitted to the cache; and
sending a combined result set responsive to the third federated query to the client, the combined result set including a first result set responsive to the second source query and further including a cached result set responsive to the third source query, and the cached result set being stored in the cache.
-
Specification