VERIFYING ANALYTICS RESULTS
First Claim
1. A method performed by one or more computers, the method comprising:
receiving a request for validation of an analytics process configured to execute on a distributed computing system comprising a plurality of physical computers;
receiving a raw subset of a dataset;
processing the raw subset of the dataset to generate a first output subset, including executing a test script specifying an expression of the analytics process and supplying the raw subset as an input to the test script;
receiving, from the distributed computing system that is executing the analytics process on the dataset, a second output subset that is a portion of an output of the analytics process executing on the distributed computing system, the second output subset resulting from the distributed computing system processing the raw subset;
comparing the first output subset to the second output subset; and
outputting, prior to the distributed computing system completing execution of the analytics process on the whole dataset, a validation result based on comparing the first output subset to the second output subset.
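The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that the analytics process counts events per user, that outputs are plain dictionaries, and that the distributed system's output for the subset is available through a caller-supplied function. The names `count_events_per_user`, `validate`, and `fetch_second_output` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
Output = Dict[str, int]


@dataclass
class ValidationResult:
    passed: bool
    # key -> (value from test script, value from distributed system)
    mismatches: Dict[str, Tuple[Optional[int], Optional[int]]] = field(default_factory=dict)


def count_events_per_user(records: List[Record]) -> Output:
    """Hypothetical test script: a single-machine expression of the analytics process."""
    return dict(Counter(r["user"] for r in records))


def validate(
    raw_subset: List[Record],
    test_script: Callable[[List[Record]], Output],
    fetch_second_output: Callable[[List[Record]], Output],
) -> ValidationResult:
    # First output subset: run the test script locally on the raw subset.
    first_output = test_script(raw_subset)
    # Second output subset: the portion of the distributed job's output that
    # corresponds to the same raw subset (assumed to be retrievable early).
    second_output = fetch_second_output(raw_subset)

    # Compare the two output subsets key by key.
    mismatches = {}
    for key in sorted(first_output.keys() | second_output.keys()):
        expected = first_output.get(key)
        actual = second_output.get(key)
        if expected != actual:
            mismatches[key] = (expected, actual)
    return ValidationResult(passed=not mismatches, mismatches=mismatches)


raw_subset = [
    {"user": "a", "event": "click"},
    {"user": "a", "event": "view"},
    {"user": "b", "event": "click"},
]

# Stand-ins for the distributed system: one agreeing output, one faulty output.
good = validate(raw_subset, count_events_per_user, lambda _: {"a": 2, "b": 1})
bad = validate(raw_subset, count_events_per_user, lambda _: {"a": 2, "b": 3})
print(good.passed, bad.passed, bad.mismatches)
```

Because only the raw subset is processed locally, a result like `bad` can be reported while the full job is still running, which is the point of the final step of the claim.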
1 Assignment
0 Petitions
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for validating analytics results. One of the methods includes processing a subset of a dataset, polling an analytics system for a corresponding output subset, and comparing the two subsets to validate the analytics system.
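The polling step mentioned in the abstract could look roughly like the following Python sketch. The source does not specify a polling interval, a timeout, or an interface to the analytics system, so `poll_for_output`, its parameters, and the fake `fetch` function are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional

Output = Dict[str, int]


def poll_for_output(
    fetch: Callable[[], Optional[Output]],
    interval_s: float = 1.0,
    timeout_s: float = 30.0,
) -> Optional[Output]:
    """Call fetch() repeatedly until it returns an output subset or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result  # the analytics system has produced the subset
        if time.monotonic() >= deadline:
            return None  # give up; the caller decides how to report this
        time.sleep(interval_s)


def make_fake_fetch(ready_after: int, output: Output) -> Callable[[], Optional[Output]]:
    """Hypothetical stand-in for the analytics system: ready after N unsuccessful polls."""
    calls = {"n": 0}

    def fetch() -> Optional[Output]:
        calls["n"] += 1
        return output if calls["n"] > ready_after else None

    return fetch


polled = poll_for_output(make_fake_fetch(2, {"a": 2, "b": 1}), interval_s=0.0, timeout_s=1.0)
timed_out = poll_for_output(lambda: None, interval_s=0.0, timeout_s=0.0)
print(polled, timed_out)
```

The returned subset would then be compared against the locally computed subset; a `None` return signals that no corresponding output appeared within the assumed timeout.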
12 Citations
24 Claims
1. A method performed by one or more computers, the method comprising:
receiving a request for validation of an analytics process configured to execute on a distributed computing system comprising a plurality of physical computers;
receiving a raw subset of a dataset;
processing the raw subset of the dataset to generate a first output subset, including executing a test script specifying an expression of the analytics process and supplying the raw subset as an input to the test script;
receiving, from the distributed computing system that is executing the analytics process on the dataset, a second output subset that is a portion of an output of the analytics process executing on the distributed computing system, the second output subset resulting from the distributed computing system processing the raw subset;
comparing the first output subset to the second output subset; and
outputting, prior to the distributed computing system completing execution of the analytics process on the whole dataset, a validation result based on comparing the first output subset to the second output subset.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising one or more physical computers configured to perform operations comprising:
receiving a request for validation of an analytics process configured to execute on a distributed computing system comprising a plurality of physical computers;
receiving a raw subset of a dataset;
processing the raw subset of the dataset to generate a first output subset, including executing a test script specifying an expression of the analytics process and supplying the raw subset as an input to the test script;
receiving, from the distributed computing system that is executing the analytics process on the dataset, a second output subset that is a portion of an output of the analytics process executing on the distributed computing system, the second output subset resulting from the distributed computing system processing the raw subset;
comparing the first output subset to the second output subset; and
outputting, prior to the distributed computing system completing execution of the analytics process on the whole dataset, a validation result based on comparing the first output subset to the second output subset.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by a distributed computing system of a plurality of physical computers causes the distributed computing system to perform operations comprising:
receiving a request for validation of an analytics process configured to execute on a distributed computing system comprising a plurality of physical computers;
receiving a raw subset of a dataset;
processing the raw subset of the dataset to generate a first output subset, including executing a test script specifying an expression of the analytics process and supplying the raw subset as an input to the test script;
receiving, from the distributed computing system that is executing the analytics process on the dataset, a second output subset that is a portion of an output of the analytics process executing on the distributed computing system, the second output subset resulting from the distributed computing system processing the raw subset;
comparing the first output subset to the second output subset; and
outputting, prior to the distributed computing system completing execution of the analytics process on the whole dataset, a validation result based on comparing the first output subset to the second output subset.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification