Global benchmarking and statistical analysis at scale
Abstract
An apparatus in one embodiment comprises at least one processing device having a processor coupled to a memory. The processing device is configured to receive results of intermediate statistical computations performed on respective ones of a plurality of datasets in respective ones of a plurality of distributed processing nodes configured to communicate over at least one network. The processing device is further configured to perform at least one global statistical computation based at least in part on the results of the intermediate statistical computations, and to utilize a result of the global statistical computation to perform one or more benchmarking operations for specified parameters relating to the plurality of datasets. The distributed processing nodes are associated with respective distinct data zones in which the respective datasets are locally accessible to the respective distributed processing nodes. At least a subset of the receiving, performing and utilizing are repeated in each of a plurality of iterations.
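The abstract's iterative flow can be sketched for the global standard deviation recited in the claims. This is a minimal illustration, not the patented implementation. The `LocalSummary` format, the helper names, the sample data and the z-score benchmark are all hypothetical. The sketch also assumes that the per-node "sums of differences" are sums of *squared* differences, taken relative to the global average from the previous iteration, because that is what a standard deviation requires. Only aggregates leave each data zone.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LocalSummary:
    """Hypothetical intermediate result a node sends in iteration 1."""
    total: float
    count: int


def local_summary(values: Sequence[float]) -> LocalSummary:
    # Runs inside a node's data zone; raw values never leave the zone.
    return LocalSummary(total=float(sum(values)), count=len(values))


def local_sum_sq_diff(values: Sequence[float], global_mean: float) -> float:
    # Iteration 2: squared differences relative to the GLOBAL average
    # that the coordinator computed in iteration 1.
    return sum((v - global_mean) ** 2 for v in values)


def global_mean(summaries: List[LocalSummary]) -> float:
    n = sum(s.count for s in summaries)
    return sum(s.total for s in summaries) / n


def global_std(sums_sq: List[float], n: int) -> float:
    # Population standard deviation over the union of all datasets.
    return math.sqrt(sum(sums_sq) / n)


# Hypothetical datasets held in three separate data zones.
zones = [[10.0, 12.0, 14.0], [20.0, 22.0], [16.0]]

# Iteration 1: global average from per-zone (sum, count) pairs.
summaries = [local_summary(z) for z in zones]
mu = global_mean(summaries)

# Iteration 2: global standard deviation from per-zone sums of squared differences.
n_total = sum(s.count for s in summaries)
sigma = global_std([local_sum_sq_diff(z, mu) for z in zones], n_total)

# Hypothetical benchmarking step: z-score of each zone's local mean.
for i, s in enumerate(summaries):
    print(f"zone {i}: z = {(s.total / s.count - mu) / sigma:+.2f}")
```

Sending the global average back out for a second pass avoids shipping raw values. The cost is one extra round trip per parameter.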
20 Claims
1. A method comprising:
receiving results of intermediate statistical computations performed on respective ones of a plurality of datasets in respective ones of a plurality of distributed processing nodes configured to communicate over at least one network;
performing at least one global statistical computation based at least in part on the results of the intermediate statistical computations; and
utilizing a result of the global statistical computation to perform one or more benchmarking operations for specified parameters relating to the plurality of datasets;
wherein the distributed processing nodes are associated with respective distinct data zones in which the respective datasets are locally accessible to the respective distributed processing nodes;
wherein the global statistical computation comprises at least one of:
computing a global standard deviation of values for a specified parameter based at least in part on sums of differences of the values for the specified parameter determined for respective ones of the datasets as part of respective ones of the intermediate statistical computations and wherein the intermediate statistical computations determine the sums of differences relative to a global average of values for the specified parameter as determined in another global statistical computation performed in a previous iteration; and
computing a global histogram of values for a specified parameter based at least in part on histogram pair lists of the values for the specified parameter determined for respective ones of the datasets as part of respective ones of the intermediate statistical computations wherein a given one of the histogram pair lists comprises a list of histogram slices with corresponding numbers of items in those histogram slices and wherein the intermediate statistical computations determine the histogram pair lists based at least in part on inputs including a minimum value, a maximum value and a number of histogram slices to be included in the corresponding histogram; and
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
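The global histogram limb of claim 1 can be sketched as follows. Each node bins its local values using the shared minimum, maximum and slice count. It then reports only its non-empty slices as (slice, count) pairs, and the coordinator adds the counts slice by slice. This is a minimal sketch under stated assumptions. The pair-list encoding, the equal-width slices, placing the maximum value in the last slice, and dropping out-of-range values are illustrative choices, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical "histogram pair list" encoding: (slice index, item count).
PairList = List[Tuple[int, int]]


def local_pair_list(values: Sequence[float], lo: float, hi: float, slices: int) -> PairList:
    """Node-side step: bin local values into `slices` equal-width slices over [lo, hi]."""
    width = (hi - lo) / slices
    counts: Dict[int, int] = {}
    for v in values:
        if v < lo or v > hi:
            continue  # this sketch ignores out-of-range values
        idx = min(int((v - lo) / width), slices - 1)  # v == hi lands in the last slice
        counts[idx] = counts.get(idx, 0) + 1
    return sorted(counts.items())  # only non-empty slices are sent


def global_histogram(pair_lists: List[PairList], slices: int) -> List[int]:
    """Coordinator step: sum item counts per slice across all nodes."""
    totals = [0] * slices
    for pairs in pair_lists:
        for idx, count in pairs:
            totals[idx] += count
    return totals


lo, hi, slices = 0.0, 100.0, 4
zones = [[5.0, 30.0, 55.0], [70.0, 99.0, 100.0], [26.0]]
hist = global_histogram([local_pair_list(z, lo, hi, slices) for z in zones], slices)
print(hist)  # [1, 2, 2, 2]
```

Every node must use the same minimum, maximum and slice count, or the slice indices will not line up when merged.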
8. A method comprising:
receiving results of intermediate statistical computations performed on respective ones of a plurality of datasets in respective ones of a plurality of distributed processing nodes configured to communicate over at least one network;
performing at least one global statistical computation based at least in part on the results of the intermediate statistical computations; and
utilizing a result of the global statistical computation to perform one or more benchmarking operations for specified parameters relating to the plurality of datasets;
wherein the distributed processing nodes are associated with respective distinct data zones in which the respective datasets are locally accessible to the respective distributed processing nodes;
wherein the global statistical computation comprises computing at least one of a global minimum set and a global maximum set of values for a specified parameter based at least in part on at least one of minimum sets and maximum sets of the values for the specified parameter determined for respective ones of the datasets as part of respective ones of the intermediate statistical computations; and
wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
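Claim 8's global minimum and maximum sets can be merged from per-node sets. The sketch below assumes, and this is an interpretation that claim 8 does not spell out, that a "minimum set" is the k smallest values and a "maximum set" is the k largest. Under that assumption, each node only needs to send its own k smallest and k largest values. Any value among the global k smallest is necessarily among the k smallest of the zone that holds it. `K` and the helper names are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

MinMaxSets = Tuple[List[float], List[float]]


def local_min_max_sets(values: Sequence[float], k: int) -> MinMaxSets:
    """Node-side step: the k smallest and k largest values in one data zone."""
    return heapq.nsmallest(k, values), heapq.nlargest(k, values)


def global_min_max_sets(local_sets: List[MinMaxSets], k: int) -> MinMaxSets:
    """Coordinator step: merge per-zone sets into the global k smallest / k largest."""
    mins = heapq.nsmallest(k, (v for zone_mins, _ in local_sets for v in zone_mins))
    maxs = heapq.nlargest(k, (v for _, zone_maxs in local_sets for v in zone_maxs))
    return mins, maxs


K = 3  # hypothetical size of each minimum/maximum set
zones = [[7.0, 3.0, 9.0, 1.0], [4.0, 8.0], [2.0, 10.0, 6.0, 5.0]]
mins, maxs = global_min_max_sets([local_min_max_sets(z, K) for z in zones], K)
print(mins, maxs)  # [1.0, 2.0, 3.0] [10.0, 9.0, 8.0]
```

Each node transmits at most 2k values regardless of dataset size. That is the main reason to merge partial sets rather than raw data.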
15. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device:
to receive results of intermediate statistical computations performed on respective ones of a plurality of datasets in respective ones of a plurality of distributed processing nodes configured to communicate over at least one network;
to perform at least one global statistical computation based at least in part on the results of the intermediate statistical computations; and
to utilize a result of the global statistical computation to perform one or more benchmarking operations for specified parameters relating to the plurality of datasets;
wherein the distributed processing nodes are associated with respective distinct data zones in which the respective datasets are locally accessible to the respective distributed processing nodes; and
wherein the global statistical computation comprises at least one of:
computing a global standard deviation of values for a specified parameter based at least in part on sums of differences of the values for the specified parameter determined for respective ones of the datasets as part of respective ones of the intermediate statistical computations and wherein the intermediate statistical computations determine the sums of differences relative to a global average of values for the specified parameter as determined in another global statistical computation performed in a previous iteration; and
computing a global histogram of values for a specified parameter based at least in part on histogram pair lists of the values for the specified parameter determined for respective ones of the datasets as part of respective ones of the intermediate statistical computations wherein a given one of the histogram pair lists comprises a list of histogram slices with corresponding numbers of items in those histogram slices and wherein the intermediate statistical computations determine the histogram pair lists based at least in part on inputs including a minimum value, a maximum value and a number of histogram slices to be included in the corresponding histogram.
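The claims leave open what a "benchmarking operation" does with a global result. One hypothetical use of a global histogram is to estimate where a single node's value ranks against the data in all zones. The sketch below assumes equal-width slices over [lo, hi] and items spread uniformly within each slice, which allows linear interpolation inside a slice. It is an illustrative choice, not a method stated in the claims, and `percentile_rank` is a hypothetical name.

```python
from typing import Sequence


def percentile_rank(hist: Sequence[int], lo: float, hi: float, value: float) -> float:
    """Estimate the percentage (0-100) of all items below `value` from a global histogram.

    Assumes equal-width slices over [lo, hi] and a uniform spread of items
    inside each slice (linear interpolation).
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    if value <= lo:
        return 0.0
    if value >= hi:
        return 100.0
    width = (hi - lo) / len(hist)
    idx = int((value - lo) / width)              # slice containing `value`
    below = sum(hist[:idx])                      # items in slices fully below it
    frac = (value - lo - idx * width) / width    # share of this slice below `value`
    return 100.0 * (below + frac * hist[idx]) / total


# A node benchmarking a local value of 62.5 against a global histogram [1, 2, 2, 2] over [0, 100].
print(f"{percentile_rank([1, 2, 2, 2], 0.0, 100.0, 62.5):.1f}")  # 57.1
```

More slices tighten the estimate, but they also enlarge each node's pair list.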
18. An apparatus comprising:
at least one processing device having a processor coupled to a memory;
wherein said at least one processing device is configured:
to receive results of intermediate statistical computations performed on respective ones of a plurality of datasets in respective ones of a plurality of distributed processing nodes configured to communicate over at least one network;
to perform at least one global statistical computation based at least in part on the results of the intermediate statistical computations; and
to utilize a result of the global statistical computation to perform one or more benchmarking operations for specified parameters relating to the plurality of datasets;
wherein the distributed processing nodes are associated with respective distinct data zones in which the respective datasets are locally accessible to the respective distributed processing nodes; and
wherein the global statistical computation comprises at least one of:
computing a global standard deviation of values for a specified parameter based at least in part on sums of differences of the values for the specified parameter determined for respective ones of the datasets as part of respective ones of the intermediate statistical computations and wherein the intermediate statistical computations determine the sums of differences relative to a global average of values for the specified parameter as determined in another global statistical computation performed in a previous iteration; and
computing a global histogram of values for a specified parameter based at least in part on histogram pair lists of the values for the specified parameter determined for respective ones of the datasets as part of respective ones of the intermediate statistical computations wherein a given one of the histogram pair lists comprises a list of histogram slices with corresponding numbers of items in those histogram slices and wherein the intermediate statistical computations determine the histogram pair lists based at least in part on inputs including a minimum value, a maximum value and a number of histogram slices to be included in the corresponding histogram.
Specification