Systems and methods for detecting missing data in query results
Abstract
Techniques provided herein allow for estimating the amount of data missing from query results that are returned in response to queries performed on data managed by a data management system. If one or more leaf nodes are unable to process a query or are unavailable, the final query result provided in response to the original query may be missing data that exists on those leaf nodes. A data accounting service monitors which managed data is being stored on the leaf nodes and on which leaf node each portion is stored. The data accounting service can then estimate how much data is missing from a final query result when one or more of the leaf nodes are unable to process a query or are unavailable.
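The estimate described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `AccountingService` class, the `estimate_missing` function, the table and leaf names, and the use of row counts as the measure of "amount of data" are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AccountingService:
    """Hypothetical tracker of how many rows of each table sit on each leaf."""
    rows: dict = field(default_factory=dict)  # (table, leaf) -> row count

    def record(self, table: str, leaf: str, n_rows: int) -> None:
        # Called whenever a data subset is stored on a leaf node.
        key = (table, leaf)
        self.rows[key] = self.rows.get(key, 0) + n_rows

    def total(self, table: str) -> int:
        # Rows of the table stored across all leaves.
        return sum(n for (t, _), n in self.rows.items() if t == table)

    def on_leaves(self, table: str, leaves: set) -> int:
        # Rows of the table stored on the given leaves only.
        return sum(n for (t, leaf), n in self.rows.items()
                   if t == table and leaf in leaves)


def estimate_missing(acct: AccountingService, table: str, responding: set):
    """Return (missing row count, missing fraction) for a query on `table`
    that only the `responding` leaves were able to process."""
    total = acct.total(table)
    seen = acct.on_leaves(table, responding)
    missing = total - seen
    fraction = missing / total if total else 0.0
    return missing, fraction


# Assumed example: three leaves hold the table, and leaf-b is unavailable.
acct = AccountingService()
acct.record("events", "leaf-a", 400)
acct.record("events", "leaf-b", 350)
acct.record("events", "leaf-c", 250)
missing_rows, missing_fraction = estimate_missing(acct, "events", {"leaf-a", "leaf-c"})
```

In this assumed setup, 350 of the 1000 recorded rows live on the unavailable leaf, so the query result would be reported as missing roughly 35% of the data.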
17 Claims
1. A computer system comprising:
at least one processor; and
a memory storing instructions configured to instruct the at least one processor to perform:
receiving a data set for storage;
storing a data subset of the data set at a set of leaf nodes of a plurality of leaf nodes;
storing data accounting information at the set of leaf nodes, wherein the data accounting information tracks data being stored at the set of leaf nodes, wherein the data accounting information includes one or more identifiers that correspond to the data subset being stored;
receiving an initial query configured to be performed on the data set;
submitting a first query on the data set to the set of leaf nodes, wherein the first query is based on the initial query;
receiving a respective first result and a respective second result in response to the first query from at least a portion of leaf nodes in the set of leaf nodes, wherein the second result is based on a second query performed on the data accounting information determined based at least in part on one or more respective identifiers that correspond to the data included in the first result, the respective second result providing data accounting information that indicates the amount of data stored in the set of leaf nodes based on the identifier;
aggregating the respective first results that were received from the portion of leaf nodes to determine a final result;
aggregating the respective second results that were received from the set of leaf nodes; and
determining, based on the aggregated second results and the final result, an estimate for an amount of data missing from the final result.
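The flow recited in claim 1 can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: the `Leaf` class and `run_query` function are hypothetical, each leaf is assumed to hold a copy of an accounting record giving the number of rows stored across the whole set for a table identifier, copies are assumed to lag only by undercounting (so the largest reported value is taken as the aggregate), and the first result is assumed to carry the number of rows scanned alongside the answer.

```python
class Leaf:
    """Hypothetical leaf node holding a data subset plus accounting information."""

    def __init__(self, name, rows, accounting, up=True):
        self.name = name
        self.rows = rows              # list of dicts, each tagged with a "table" identifier
        self.accounting = accounting  # {table identifier: rows stored across the set}
        self.up = up                  # False models an unavailable leaf

    def query(self, table, predicate):
        """Return (first result, second result), or None if the leaf is unavailable."""
        if not self.up:
            return None
        local = [r for r in self.rows if r["table"] == table]
        # First result: partial answer to the query plus how many rows were scanned.
        first = {"matches": sum(1 for r in local if predicate(r)),
                 "scanned": len(local)}
        # Second result: accounting lookup keyed by the identifier of the queried data.
        second = self.accounting.get(table, 0)
        return first, second


def run_query(leaves, table, predicate):
    """Fan the query out, aggregate both kinds of result, and estimate missing rows."""
    replies = [rep for rep in (leaf.query(table, predicate) for leaf in leaves)
               if rep is not None]
    final = {"matches": sum(f["matches"] for f, _ in replies),
             "scanned": sum(f["scanned"] for f, _ in replies)}
    # Assumption: accounting copies only ever undercount, so take the largest report.
    accounted = max((s for _, s in replies), default=0)
    missing = max(accounted - final["scanned"], 0)
    return final, missing


# Assumed example: six rows across three leaves, with leaf "c" unavailable.
acct = {"events": 6}
leaves = [
    Leaf("a", [{"table": "events", "ok": True},
               {"table": "events", "ok": True},
               {"table": "events", "ok": False}], acct),
    Leaf("b", [{"table": "events", "ok": True},
               {"table": "events", "ok": False}], acct),
    Leaf("c", [{"table": "events", "ok": True}], acct, up=False),
]
final_result, missing_rows = run_query(leaves, "events", lambda r: r["ok"])
```

With leaf "c" down, the responding leaves scan five of the six accounted rows, so the final result is flagged as missing an estimated one row.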
16. A non-transitory computer-storage medium storing computer-executable instructions that, when executed, cause a computer system to perform a computer-implemented method comprising:
receiving a data set for storage;
storing a data subset of the data set at a set of leaf nodes of a plurality of leaf nodes;
storing data accounting information at the set of leaf nodes, wherein the data accounting information tracks data being stored at the set of leaf nodes, wherein the data accounting information includes one or more identifiers that correspond to the data subset being stored;
receiving an initial query configured to be performed on the data set;
submitting a first query on the data set to the set of leaf nodes, wherein the first query is based on the initial query;
receiving a respective first result and a respective second result in response to the first query from at least a portion of leaf nodes in the set of leaf nodes, wherein the second result is based on a second query performed on the data accounting information determined based at least in part on one or more respective identifiers that correspond to the data included in the first result, the respective second result providing data accounting information that indicates the amount of data stored in the set of leaf nodes based on the identifier;
aggregating the respective first results that were received from the portion of leaf nodes to determine a final result;
aggregating the respective second results that were received from the set of leaf nodes; and
determining, based on the aggregated second results and the final result, an estimate for an amount of data missing from the final result.
17. A computer-implemented method comprising:
receiving, by a computer system, a data set for storage;
storing, by the computer system, a data subset of the data set at a set of leaf nodes of a plurality of leaf nodes;
storing, by the computer system, data accounting information at the set of leaf nodes, wherein the data accounting information tracks data being stored at the set of leaf nodes, wherein the data accounting information includes one or more identifiers that correspond to the data subset being stored;
receiving, by the computer system, an initial query configured to be performed on the data set;
submitting, by the computer system, a first query on the data set to the set of leaf nodes, wherein the first query is based on the initial query;
receiving, by the computer system, a respective first result and a respective second result in response to the first query from at least a portion of leaf nodes in the set of leaf nodes, wherein the second result is based on a second query performed on the data accounting information determined based at least in part on one or more respective identifiers that correspond to the data included in the first result, the respective second result providing data accounting information that indicates the amount of data stored in the set of leaf nodes based on the identifier;
aggregating, by the computer system, the respective first results that were received from the portion of leaf nodes to determine a final result;
aggregating, by the computer system, the respective second results that were received from the set of leaf nodes; and
determining, by the computer system, based on the aggregated second results and the final result, an estimate for an amount of data missing from the final result.
Specification