METHODS FOR ENHANCING RAPID DATA ANALYSIS
First Claim
1. A method for enhancing rapid data analysis comprising:
receiving a set of data;
storing the set of data in a first set of data shards;
wherein the first set of data shards is sharded, according to a set of shard partitioning rules, by a first field; and
identifying anomalous data from the set of data, wherein the set of anomalous data is a subset of the set of data, by:
monitoring a range of shard indices associated with a first shard of the first set of data shards;
detecting that the range of shard indices is smaller than an expected range by a threshold value; and
identifying data of the first shard as anomalous data.
Abstract
A method for enhancing rapid data analysis includes receiving a set of data; storing the set of data in a first set of data shards sharded by a first field; and identifying anomalous data from the set of data by monitoring a range of shard indices associated with a first shard of the first set of data shards, detecting that the range of shard indices is smaller than an expected range by a threshold value, and identifying data of the first shard as anomalous data.
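The detection step summarized above can be illustrated with a minimal Python sketch. It is not taken from the patent. It assumes that a shard's "range of shard indices" is the span between the smallest and largest shard-key index stored in that shard, and that "smaller than an expected range by a threshold value" means the shortfall is at least the threshold. The names Shard, find_anomalous_shards, expected_range, and threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical shard record; the field names are assumptions."""
    name: str
    min_index: int  # smallest shard-key index stored in this shard
    max_index: int  # largest shard-key index stored in this shard

    @property
    def index_range(self) -> int:
        # Width of the span of shard-key indices covered by this shard.
        return self.max_index - self.min_index


def find_anomalous_shards(shards, expected_range, threshold):
    """Return shards whose index range falls short of expected_range
    by at least threshold (one assumed reading of the claim language)."""
    return [s for s in shards if expected_range - s.index_range >= threshold]


shards = [
    Shard("shard-0", 0, 1000),
    Shard("shard-1", 1001, 1990),
    Shard("shard-2", 1991, 2010),  # very narrow range of indices
]

anomalous = find_anomalous_shards(shards, expected_range=1000, threshold=500)
print([s.name for s in anomalous])
```

Under these assumptions, a shard that fills up while covering only a narrow span of indices suggests that a few key values contribute an unusually large share of the data, so the data of that shard is flagged.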
21 Claims
1. A method for enhancing rapid data analysis comprising:
receiving a set of data;
storing the set of data in a first set of data shards;
wherein the first set of data shards is sharded, according to a set of shard partitioning rules, by a first field; and
identifying anomalous data from the set of data, wherein the set of anomalous data is a subset of the set of data, by:
monitoring a range of shard indices associated with a first shard of the first set of data shards;
detecting that the range of shard indices is smaller than an expected range by a threshold value; and
identifying data of the first shard as anomalous data.
Dependent claims: 2, 3, 4, 5
6. A method for enhancing rapid data analysis comprising:
receiving a set of data;
storing the set of data as data shards;
wherein the data shards are sharded, according to a set of shard partitioning rules, by a first field;
receiving and interpreting a query;
wherein interpreting the query comprises identifying a first set of the data shards containing data relevant to the query;
collecting a first data sample from the first set of the data shards;
identifying anomalous data in the first data sample;
calculating a result to the query based on analysis of the first data sample;
wherein the analysis of the first data sample ignores the anomalous data.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15
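The query flow recited in claim 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: shards are plain dictionaries with hypothetical min_key, max_key, and rows entries, the query is a mean of a value field over a range of shard keys, sampling is uniform per shard, and the anomaly test is a caller-supplied predicate, since the claim does not say how anomalous data is identified.

```python
import random
from statistics import mean


def relevant_shards(shards, lo, hi):
    """Interpret the query: keep shards whose key range overlaps [lo, hi]."""
    return [s for s in shards if s["min_key"] <= hi and s["max_key"] >= lo]


def collect_sample(shards, per_shard, rng):
    """Draw up to per_shard rows uniformly at random from each shard."""
    sample = []
    for s in shards:
        k = min(per_shard, len(s["rows"]))
        sample.extend(rng.sample(s["rows"], k))
    return sample


def answer_query(shards, lo, hi, is_anomalous, per_shard=100, seed=0):
    """Estimate the mean of 'value' for users in [lo, hi], ignoring rows
    that the (assumed) is_anomalous predicate flags."""
    rng = random.Random(seed)
    sample = collect_sample(relevant_shards(shards, lo, hi), per_shard, rng)
    kept = [r for r in sample
            if lo <= r["user"] <= hi and not is_anomalous(r)]
    return mean(r["value"] for r in kept) if kept else None


shards = [
    {"min_key": 0, "max_key": 9,
     "rows": [{"user": 1, "value": 10}, {"user": 2, "value": 20},
              {"user": 7, "value": 9000}]},
    {"min_key": 10, "max_key": 19, "rows": [{"user": 12, "value": 30}]},
    {"min_key": 20, "max_key": 29, "rows": [{"user": 25, "value": 500}]},
]

# Hypothetical rule: user 7 has already been identified as anomalous.
result = answer_query(shards, 0, 19, is_anomalous=lambda r: r["user"] == 7)
print(result)
```

In this toy data the sample size exceeds the number of rows per shard, so the sample covers every relevant row. With larger shards the result would be an estimate that depends on the seed.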
16. A method for enhancing rapid data analysis comprising:
receiving a set of data;
storing the set of data in a first set of data shards;
wherein the first set of data shards is sharded, according to a set of shard partitioning rules, by a first field; and
identifying anomalous data from the set of data, wherein the set of anomalous data is a subset of the set of data, by:
analyzing the set of data to determine a statistical distribution of data element counts, each data element count associated with a value of a range of values of a second field of the set of data, across the range of values;
determining that a value of the range of values is a statistical outlier based on the number of data element counts associated with the value; and
identifying data associated with the value as anomalous data.
Dependent claims: 17, 18, 19, 20, 21
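The count-based outlier test of claim 16 can be illustrated with a short Python sketch. The claim does not name a statistical test. The modified z-score based on the median absolute deviation used here is a swapped-in, commonly used choice, and the 3.5 cutoff, the function name outlier_values, and the ip field are all assumptions made for the example.

```python
from collections import Counter
from statistics import median


def outlier_values(records, field, cutoff=3.5):
    """Return the values of `field` whose record count is a statistical
    outlier, using a modified z-score (an assumed test, not the patent's)."""
    # Distribution of data element counts across the values of the field.
    counts = Counter(r[field] for r in records)
    med = median(counts.values())
    # Median absolute deviation of the counts.
    mad = median(abs(c - med) for c in counts.values())
    if mad == 0:
        return set()  # counts too uniform for this test to separate values
    return {v for v, c in counts.items()
            if 0.6745 * abs(c - med) / mad > cutoff}


# Toy data: five ordinary values of the second field and one heavy hitter.
per_value = {"a": 9, "b": 10, "c": 10, "d": 11, "e": 12, "bot": 500}
records = [{"ip": ip} for ip, n in per_value.items() for _ in range(n)]

flagged = outlier_values(records, "ip")
anomalous = [r for r in records if r["ip"] in flagged]
print(flagged, len(anomalous))
```

A median-based score is used because, with only a handful of distinct values, a single very large count inflates the mean and standard deviation enough that an ordinary z-score may fail to flag it.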
Specification