Cloud process for rapid data investigation and data integrity analysis
Abstract
A system and method for rapid data investigation and data integrity analysis are disclosed. A server computer receives a data set from one or more client computers connected with it via a communications network, and the data set is stored in a distributed storage memory. One or more analytical processes are executed on the data set from the distributed storage memory to generate statistics based on each of the analytical processes. The statistics are stored in a random access memory that is accessible by one or more compute nodes, and the compute nodes generate a graphical representation of at least some of the stored statistics. The graphical representation is then formatted for transmission to and display by the one or more client computers.
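The claims describe each compute node computing partial statistics for its own data part, with the partial statistics then combined into summary statistics that represent the full dataset. The following is a minimal sketch of one way such a combination could work, not the patented implementation. It assumes numeric, non-empty data parts and uses the standard pairwise merge formula for count, mean, and sum of squared deviations. The names `PartialStats`, `partial_stats`, `combine`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class PartialStats:
    """Statistics for one data part (hypothetical structure)."""
    count: int
    mean: float
    m2: float        # sum of squared deviations from the mean
    minimum: float
    maximum: float


def partial_stats(values):
    """Compute statistics for the data part held by a single node."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return PartialStats(n, mean, m2, min(values), max(values))


def combine(a, b):
    """Merge two partial results without revisiting the raw data."""
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.count * b.count / n
    return PartialStats(n, mean, m2,
                        min(a.minimum, b.minimum),
                        max(a.maximum, b.maximum))


def summarize(parts):
    """Combine per-part statistics into statistics for the full dataset."""
    return reduce(combine, (partial_stats(p) for p in parts))


# Three data parts, standing in for data held on three compute nodes.
parts = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
summary = summarize(parts)
sample_variance = summary.m2 / (summary.count - 1)
```

Because `combine` needs only the partial results, each node can send a few numbers to the server rather than its raw data, and the merged result matches what a single pass over the whole dataset would give.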
20 Claims
1. A method comprising:
receiving summary statistics computed by at least executing one or more analytical processes on a dataset stored in parts across a set of memory based compute nodes, each compute node finding partial statistics of a data part stored on the respective compute node, the partial statistics representative of a respective data part;
storing the summary statistics in a random access memory associated with a server computer, the random access memory being accessible by at least one of the compute nodes, the summary statistics being a combination of the partial statistics and representative of a full dataset;
identifying, for pre-model building data understanding, outlier data by comparing subsets of data in the dataset, the identified outlier data accessible to a predictive model;
generating a graphical representation of at least some summary statistics stored in the random access memory; and
formatting the graphical representation of at least some summary statistics for transmission to and display by one or more client computers.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
at least one data processor and memory storing instructions, which when executed, cause the at least one data processor to perform operations comprising:
receiving summary statistics computed by at least executing one or more analytical processes on a dataset stored in parts across a set of memory based compute nodes, each compute node finding partial statistics of a data part stored on the respective compute node, the partial statistics representative of a respective data part;
storing the summary statistics in a random access memory associated with a server computer, the random access memory being accessible by at least one of the compute nodes, the summary statistics being a combination of the partial statistics and representative of a full dataset;
identifying, for pre-model building data understanding, outlier data by comparing subsets of data in the dataset, the identified outlier data accessible to a predictive model;
generating a graphical representation of at least some summary statistics stored in the random access memory; and
formatting the graphical representation of at least some summary statistics for transmission to and display by one or more client computers.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product comprising a non-transitory machine-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising:
receiving summary statistics computed by at least executing one or more analytical processes on a dataset stored in parts across a set of memory based compute nodes, each compute node finding partial statistics of a data part stored on the respective compute node, the partial statistics representative of a respective data part;
storing the summary statistics in a random access memory associated with a server computer, the random access memory being accessible by at least one of the compute nodes, the summary statistics being a combination of the partial statistics and representative of the full dataset;
identifying, for pre-model building data understanding, outlier data by comparing subsets of data in the dataset, the identified outlier data accessible to a predictive model;
generating a graphical representation of at least some summary statistics stored in the random access memory; and
formatting the graphical representation of at least some summary statistics for transmission to and display by one or more client computers.
Dependent claims: 16, 17, 18, 19, 20
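The independent claims recite identifying outlier data by comparing subsets of data in the dataset, but do not fix a particular comparison. As an illustration only, the sketch below compares subsets by their means and flags any subset whose mean lies far from the others, using a modified z-score based on the median and the median absolute deviation. That scoring rule, the 3.5 threshold, the function name `outlier_subsets`, and the sample data are all assumptions made for this example and are not taken from the patent.

```python
from statistics import mean, median


def outlier_subsets(subsets, threshold=3.5):
    """Return the names of subsets whose mean deviates strongly from the rest.

    `subsets` maps a subset name to a list of numeric values.
    """
    means = {name: mean(values) for name, values in subsets.items()}
    center = median(means.values())
    # Median absolute deviation of the subset means around their median.
    mad = median(abs(m - center) for m in means.values())
    if mad == 0:
        # At least half of the subset means coincide; nothing is flagged.
        return []
    return [name for name, m in means.items()
            if 0.6745 * abs(m - center) / mad > threshold]


# Hypothetical subsets of one column, split by an illustrative grouping key.
subsets = {
    "group_a": [10, 11, 9],
    "group_b": [10, 12, 11],
    "group_c": [9, 10, 10],
    "group_d": [11, 10, 12],
    "group_e": [95, 102, 99],
}
flagged = outlier_subsets(subsets)
```

Here `flagged` contains only `"group_e"`, whose values sit far from those of the other four groups. A robust center and spread are used so that the outlying subset does not distort the baseline it is compared against.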
Specification