Systems and methods for detecting missing data in query results

US 9,501,521 B2
Filed: 07/25/2013
Issued: 11/22/2016
Est. Priority Date: 07/25/2013
Status: Active Grant

Abstract

Techniques provided herein allow for estimating data missing from query results returned for queries performed on data managed by a data management system. When one or more leaf nodes are unable or unavailable to process a query, the final query result provided in response to the original query may be missing data that exists on those leaf nodes. A data accounting service tracks which managed data is stored on the leaf nodes and on which leaf node it resides. The data accounting service can therefore estimate how much data is missing from a final query result when one or more of the leaf nodes are unable or unavailable to process a query.
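
As a rough illustration of the idea in the abstract, the sketch below keeps a ledger of how many rows each leaf node holds and sums the rows on the leaves that did not answer a query. The names `ledger` and `estimate_missing_rows`, the leaf identifiers, and the use of row counts are all hypothetical; the patent does not prescribe this data structure.

```python
from typing import Dict, Set


def estimate_missing_rows(ledger: Dict[str, int], responded: Set[str]) -> int:
    """Sum the rows held by leaves that did not answer the query."""
    return sum(rows for leaf, rows in ledger.items() if leaf not in responded)


# Hypothetical ledger: leaf node id -> number of rows stored on that leaf.
ledger = {"leaf-1": 400, "leaf-2": 350, "leaf-3": 250}

# leaf-3 was unavailable, so its rows are absent from the final result.
missing = estimate_missing_rows(ledger, responded={"leaf-1", "leaf-2"})
print(missing)
```

A real accounting service would also need to scope the ledger to the data the query actually touches, which this sketch leaves out.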

17 Claims

1. A computer system comprising:
- at least one processor; and
  
  a memory storing instructions configured to instruct the at least one processor to perform:
  
  receiving a data set for storage;
  
  storing a data subset of the data set at a set of leaf nodes of a plurality of leaf nodes;
  
  storing data accounting information at the set of leaf nodes, wherein the data accounting information tracks data being stored at the set of leaf nodes, wherein the data accounting information includes one or more identifiers that correspond to the data subset being stored;
  
  receiving an initial query configured to be performed on the data set;
  
  submitting a first query on the data set to the set of leaf nodes, wherein the first query is based on the initial query;
  
  receiving a respective first result and a respective second result in response to the first query from at least a portion of leaf nodes in the set of leaf nodes, wherein the second result is based on a second query performed on the data accounting information determined based at least in part on one or more respective identifiers that correspond to the data included in the first result, the respective second result providing data accounting information that indicates the amount of data stored in the set of leaf nodes based on the identifier;
  
  aggregating the respective first results that were received from the portion of leaf nodes to determine a final result;
  
  aggregating the respective second results that were received from the set of leaf nodes; and
  
  determining an estimate for an amount of data missing from the final result based on the aggregated second result and the final result.
- Dependent claims (2-15):
  - 2. The computer system of claim 1, wherein a first result includes one or more query results.
  - 3. The computer system of claim 1, wherein the data accounting information comprises storage information.
  - 4. The computer system of claim 3, further comprising receiving the storage information from the at least one leaf node.
  - 5. The computer system of claim 1, further comprising generating the data accounting information.
  - 6. The computer system of claim 1, wherein the storing the data accounting information comprises storing the data accounting information at one or more data accounting nodes that are separate from the plurality of leaf nodes.
  - 7. The computer system of claim 6, wherein the performing the second query on the data accounting information comprises:
    - submitting the second query to the data accounting nodes; and
      
      receiving the second result from the data accounting nodes.
  - 8. The computer system of claim 6, wherein at least a portion of the data accounting nodes are synchronized.
  - 9. The computer system of claim 1, wherein the data accounting information further comprises an identifier for a table associated with a data element of the data subset.
  - 10. The computer system of claim 1, wherein the data accounting information further comprises a timestamp associated with a data element of the data subset.
  - 11. The computer system of claim 1, wherein the data set comprises log data associated with operation of a social networking system.
  - 12. The computer system of claim 11, wherein the log data comprises one or more time-stamped data elements regarding user activity occurring on the social networking system.
  - 13. The computer system of claim 1, wherein determining an estimate for an amount of data missing from the first result based at least in part on the second result further comprises:
    - determining an estimated number of bytes missing from the first result.
  - 14. The computer system of claim 1, wherein determining an estimate for an amount of data missing from the first result based at least in part on the second result further comprises:
    - determining an estimated percentage of data missing from the first result.
  - 15. The computer system of claim 1, wherein determining an estimate for an amount of data missing from the first result based at least in part on the second result further comprises:
    - determining an estimated number of data elements missing from the first result.
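
The dependent claims above mention accounting information that carries a table identifier and a timestamp (claims 9 and 10) and estimates expressed as bytes, a percentage, or a count of data elements (claims 13 to 15). The following is a minimal sketch of how those pieces could fit together. `AccountingRecord`, `MissingEstimate`, `estimate`, and every field name are hypothetical, and computing the percentage over row counts is an assumption, not something the claims specify.

```python
from dataclasses import dataclass
from typing import Iterable, NamedTuple, Set


@dataclass(frozen=True)
class AccountingRecord:
    leaf_id: str     # leaf node holding the batch
    table: str       # table identifier (claim 9)
    timestamp: int   # Unix seconds for the stored batch (claim 10)
    rows: int        # number of data elements in the batch
    size_bytes: int  # bytes occupied by the batch


class MissingEstimate(NamedTuple):
    missing_bytes: int   # claim 13
    missing_pct: float   # claim 14 (assumed here to be a share of rows)
    missing_rows: int    # claim 15


def estimate(records: Iterable[AccountingRecord], table: str,
             start: int, end: int, responded: Set[str]) -> MissingEstimate:
    """Estimate what a query over [start, end) on `table` failed to see."""
    # Only batches the query would have touched are relevant.
    relevant = [r for r in records
                if r.table == table and start <= r.timestamp < end]
    # Batches on leaves that did not respond are treated as missing.
    lost = [r for r in relevant if r.leaf_id not in responded]
    total_rows = sum(r.rows for r in relevant)
    missing_rows = sum(r.rows for r in lost)
    missing_bytes = sum(r.size_bytes for r in lost)
    pct = 100.0 * missing_rows / total_rows if total_rows else 0.0
    return MissingEstimate(missing_bytes, pct, missing_rows)


records = [
    AccountingRecord("leaf-1", "clicks", 1000, 300, 30_000),
    AccountingRecord("leaf-2", "clicks", 1500, 100, 10_000),
    AccountingRecord("leaf-2", "views", 1500, 999, 99_900),   # other table
    AccountingRecord("leaf-3", "clicks", 5000, 50, 5_000),    # outside range
]
result = estimate(records, "clicks", 0, 2000, responded={"leaf-1"})
print(result)
```

In this example only leaf-1 responds, so the single relevant batch on leaf-2 is counted as missing while the out-of-range and other-table batches are ignored.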

16. A non-transitory computer-storage medium storing computer-executable instructions that, when executed, cause a computer system to perform a computer-implemented method comprising:
- receiving a data set for storage;
  
  storing a data subset of the data set at a set of leaf nodes of a plurality of leaf nodes;
  
  storing data accounting information at the set of leaf nodes, wherein the data accounting information tracks data being stored at the set of leaf nodes, wherein the data accounting information includes one or more identifiers that correspond to the data subset being stored;
  
  receiving an initial query configured to be performed on the data set;
  
  submitting a first query on the data set to the set of leaf nodes, wherein the first query is based on the initial query;
  
  receiving a respective first result and a respective second result in response to the first query from at least a portion of leaf nodes in the set of leaf nodes, wherein the second result is based on a second query performed on the data accounting information determined based at least in part on one or more respective identifiers that correspond to the data included in the first result, the respective second result providing data accounting information that indicates the amount of data stored in the set of leaf nodes based on the identifier;
  
  aggregating the respective first results that were received from the portion of leaf nodes to determine a final result;
  
  aggregating the respective second results that were received from the set of leaf nodes; and
  
  determining an estimate for an amount of data missing from the final result based on the aggregated second result and the final result.

17. A computer-implemented method comprising:
- receiving, by a computer system, a data set for storage;
  
  storing, by the computer system, a data subset of the data set at a set of leaf nodes of a plurality of leaf nodes;
  
  storing, by the computer system, data accounting information at the set of leaf nodes, wherein the data accounting information tracks data being stored at the set of leaf nodes, wherein the data accounting information includes one or more identifiers that correspond to the data subset being stored;
  
  receiving, by the computer system, an initial query configured to be performed on the data set;
  
  submitting, by the computer system, a first query on the data set to the set of leaf nodes, wherein the first query is based on the initial query;
  
  receiving, by the computer system, a respective first result and a respective second result in response to the first query from at least a portion of leaf nodes in the set of leaf nodes, wherein the second result is based on a second query performed on the data accounting information determined based at least in part on one or more respective identifiers that correspond to the data included in the first result, the respective second result providing data accounting information that indicates the amount of data stored in the set of leaf nodes based on the identifier;
  
  aggregating, by the computer system, the respective first results that were received from the portion of leaf nodes to determine a final result;
  
  aggregating, by the computer system, the respective second results that were received from the set of leaf nodes; and
  
  determining, by the computer system, an estimate for an amount of data missing from the final result based on the aggregated second result and the final result.
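
The method steps above can be sketched end to end as a scatter-gather query in which each responding leaf returns both a partial query result (the first result) and a copy of its accounting information (the second result). This is only one possible reading of the claim. The classes `Leaf` and `LeafReply`, the function `run_query`, the choice of a count as the query, and the rule of merging accounting copies by taking the largest row count per leaf are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class LeafReply:
    first_result: int              # partial count of rows matching the query
    second_result: Dict[str, int]  # accounting copy: leaf id -> rows stored


class Leaf:
    def __init__(self, leaf_id: str, rows: List[int],
                 accounting: Dict[str, int], up: bool = True):
        self.leaf_id = leaf_id
        self.rows = rows              # the data subset stored on this leaf
        self.accounting = accounting  # accounting info stored on this leaf
        self.up = up

    def query(self, predicate: Callable[[int], bool]) -> Optional[LeafReply]:
        """Answer the first query; None models an unavailable leaf."""
        if not self.up:
            return None
        matches = sum(1 for row in self.rows if predicate(row))
        return LeafReply(matches, dict(self.accounting))


def run_query(leaves: List[Leaf],
              predicate: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (final result, estimated number of rows missing from it)."""
    replies = {leaf.leaf_id: leaf.query(predicate) for leaf in leaves}
    answered = {k: v for k, v in replies.items() if v is not None}

    # Aggregate the first results into the final result.
    final_result = sum(reply.first_result for reply in answered.values())

    # Aggregate the second results: merge the accounting copies,
    # keeping the largest row count reported for each leaf (assumption).
    merged: Dict[str, int] = {}
    for reply in answered.values():
        for leaf_id, rows in reply.second_result.items():
            merged[leaf_id] = max(merged.get(leaf_id, 0), rows)

    expected = sum(merged.values())
    covered = sum(rows for leaf_id, rows in merged.items()
                  if leaf_id in answered)
    return final_result, expected - covered


accounting = {"leaf-1": 3, "leaf-2": 2, "leaf-3": 4}
leaves = [
    Leaf("leaf-1", [1, 2, 3], accounting),
    Leaf("leaf-2", [4, 5], accounting),
    Leaf("leaf-3", [6, 7, 8, 9], accounting, up=False),  # unavailable
]
final_result, missing_rows = run_query(leaves, lambda row: row % 2 == 0)
print(final_result, missing_rows)
```

Because every leaf carries a copy of the accounting information in this sketch, the coordinator can still learn how much data sits on leaf-3 even though leaf-3 never replies. If no leaf answers at all, the sketch has no accounting data to work with and reports zero missing rows, which a real system would need to handle differently.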

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Barykin, Oleksandr; Metzler, Josh
Primary Examiner(s): Truong, Dennis
Application Number: US13/951,438
Publication Number: US 20150032726A1
Time in Patent Office: 1,216 Days
Field of Search: 707/706, 707/610, 707/722, 714/25, 717/124
US Class Current: 1/1
CPC Class Codes: G06F 16/245 Query processing
