Systems and methods for extracting signal differences from sparse data sets
Abstract
The present disclosure provides systems and methods for extracting signal differences from sparse data sets. Data sets for comparison, including a control data set and one or more test data sets, may be normalized and separated into subsets or groupings via a MapReduce function. Normalization may account for large values present in both control and test data sets that would otherwise reduce the significance of smaller correlated values, creating false negatives. The MapReduce may provide identification and analysis of correlations between sets via related entities. Accordingly, via the systems and methods discussed herein, a computing device may extract statistically significant differences between data sets, without requiring extensive entity by entity comparison (or entity to every entity comparison, which, for a data set of millions of entities, may be too computationally expensive or take too long), reducing memory footprint and processor requirements.
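The grouping step described in the Abstract can be sketched as explicit map, shuffle, and reduce phases. This is a minimal single-process illustration, not the disclosed implementation: the function names, the `(entity, value)` record shape, and the `group_of` key function are all assumptions made for the example, and no distributed MapReduce framework is involved.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, group_of):
    """Emit (group, value) pairs, skipping null (None) entries."""
    for entity, value in records:
        if value is not None:
            yield group_of(entity), value


def shuffle(pairs):
    """Collect all values emitted under the same group key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values of each group into a single total."""
    return {key: reduce(lambda x, y: x + y, values)
            for key, values in grouped.items()}


def group_shares(records, group_of):
    """Per-group totals divided by the total of all non-null values."""
    totals = reduce_phase(shuffle(map_phase(records, group_of)))
    grand_total = sum(totals.values())
    if grand_total == 0:
        return totals  # nothing to normalize against
    return {key: total / grand_total for key, total in totals.items()}


# Hypothetical records: entity names carry their group as a prefix.
control = [("x/1", 3), ("x/2", None), ("y/1", 1)]
shares = group_shares(control, lambda entity: entity.split("/")[0])
```

Running the same function over a control set and a test set yields one share per group for each, so the later comparison is made group by group rather than entity by entity.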
20 Claims
1. A method for calculating signal differences from a sparse data set signal, comprising:
generating, by a signal aggregator executed by a measurement engine of a computing device, a first data set comprising a first number of null values and a second number of non-null values, the first number at least an order of magnitude larger than the second number;
normalizing, by a signal correlator executed by the measurement engine, each non-null value of the second number of non-null values relative to a total of the non-null values of the first data set;
separating, by the signal correlator, the first data set into a plurality of subsets comprising one or more null values and one or more non-null values; and
for at least one subset of the plurality of subsets:
calculating a difference value, by the signal correlator, between the at least one subset and a corresponding subset of a second plurality of subsets generated from a second data set, the difference value indicating a level that first characteristics of the at least one subset differs from second characteristics of the corresponding subset; and
writing an identification of the subset as including a signal to a memory device, responsive to the difference value exceeding a threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
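The steps recited in claim 1 can be walked through with a short sketch. Several choices here are assumptions for illustration only: the data set is a dict mapping entity names to values with `None` as the null value, subsets are formed by a caller-supplied `key_of` function, the compared "characteristic" of a subset is the sum of its normalized non-null values, and a plain list stands in for the memory device. The claim itself does not fix any of these, and the sketch does not enforce that every subset contains both null and non-null values.

```python
from collections import defaultdict


def is_sparse(values):
    """True when nulls outnumber non-nulls by at least a factor of ten."""
    nulls = sum(v is None for v in values)
    non_nulls = len(values) - nulls
    return nulls >= 10 * non_nulls


def normalize(data):
    """Divide each non-null value by the total of all non-null values."""
    total = sum(v for v in data.values() if v is not None)
    if total == 0:
        return dict(data)  # no non-null mass to normalize against
    return {k: (None if v is None else v / total) for k, v in data.items()}


def split(data, key_of):
    """Separate the data set into subsets keyed by key_of(entity)."""
    subsets = defaultdict(dict)
    for entity, value in data.items():
        subsets[key_of(entity)][entity] = value
    return subsets


def subset_mass(subset):
    """Assumed subset characteristic: sum of its non-null values."""
    return sum(v for v in subset.values() if v is not None)


def find_signals(first, second, key_of, threshold):
    """Return identifiers of subsets whose difference exceeds the threshold."""
    first_subsets = split(normalize(first), key_of)
    second_subsets = split(normalize(second), key_of)
    signals = []  # stands in for the memory device
    for key, subset in first_subsets.items():
        other = second_subsets.get(key, {})
        difference = abs(subset_mass(subset) - subset_mass(other))
        if difference > threshold:
            signals.append(key)
    return signals
```

Any other difference measure (for example a ratio or a statistical test) could replace `subset_mass` and the absolute difference without changing the overall flow.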
9. A system for calculating signal differences from a sparse data set signal, comprising:
a processor executing a measurement engine comprising a signal aggregator and a signal correlator, and a memory device storing a first data set and a second data set;
wherein the signal aggregator is configured to generate the first data set comprising a first number of null values and a second number of non-null values, the first number at least an order of magnitude larger than the second number; and
wherein the signal correlator is configured to:
normalize each non-null value of the second number of non-null values relative to a total of the non-null values of the first data set,
separate the first data set into a plurality of subsets comprising one or more null values and one or more non-null values, and
for at least one subset of the plurality of subsets:
calculate a difference value between the at least one subset and a corresponding subset of a second plurality of subsets generated from a second data set, the difference value indicating a level that first characteristics of the at least one subset differs from second characteristics of the corresponding subset; and
write an identification of the subset as including a signal to the memory device, responsive to the difference value exceeding a threshold.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to:
generate a first data set comprising a first number of null values and a second number of non-null values, the first number at least an order of magnitude larger than the second number;
normalize each non-null value of the second number of non-null values relative to a total of the non-null values of the first data set;
separate the first data set into a plurality of subsets comprising one or more null values and one or more non-null values;
for at least one subset of the plurality of subsets:
calculate a difference value between the at least one subset and a corresponding subset of a second plurality of subsets generated from a second data set, the difference value indicating a level that first characteristics of the at least one subset differs from second characteristics of the corresponding subset; and
write an identification of the subset as including a signal to a memory device, responsive to the difference value exceeding a threshold; and
for at least one second subset of the plurality of subsets:
calculate a second difference value between the second subset and a second corresponding subset of the second plurality of subsets generated from the second data set, the second difference value indicating a second level that third characteristics of the at least one second subset differs from fourth characteristics of the second corresponding subset; and
write an identification of the second subset as not including a second signal to the memory device, responsive to the second difference value not exceeding the threshold.
Dependent claims: 18, 19, 20
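Claim 17 records an outcome for both branches of the threshold test: a subset is identified as including a signal when its difference value exceeds the threshold, and as not including one otherwise. A minimal sketch of that labeling follows, assuming the difference values have already been computed and are held in a dict keyed by a hypothetical subset identifier; the label strings are likewise illustrative.

```python
def label_subsets(differences, threshold):
    """Label each subset by whether its difference value exceeds the threshold."""
    return {
        subset_id: ("signal" if difference > threshold else "no signal")
        for subset_id, difference in differences.items()
    }


# Hypothetical difference values for two subsets.
labels = label_subsets({"s1": 0.4, "s2": 0.05}, threshold=0.1)
```

A difference value exactly equal to the threshold does not exceed it, so such a subset is labeled as not including a signal.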
Specification