METHODS FOR ENHANCING RAPID DATA ANALYSIS

US 20160241577A1
Filed: 02/12/2016
Published: 08/18/2016
Est. Priority Date: 02/12/2015
Status: Active Grant


Abstract

A method for enhancing rapid data analysis includes receiving a set of data; storing the set of data in a first set of data shards sharded by a first field; and identifying anomalous data from the set of data by monitoring a range of shard indices associated with a first shard of the first set of data shards, detecting that the range of shard indices is smaller than an expected range by a threshold value, and identifying data of the first shard as anomalous data.

21 Claims

1. A method for enhancing rapid data analysis comprising:
- receiving a set of data;
  
  storing the set of data in a first set of data shards;
  
  wherein the first set of data shards is sharded, according to a set of shard partitioning rules, by a first field; and
  
  identifying anomalous data from the set of data, wherein the set of anomalous data is a subset of the set of data, by:
  
  monitoring a range of shard indices associated with a first shard of the first set of data shards;
  
  detecting that the range of shard indices is smaller than an expected range by a threshold value; and
  
  identifying data of the first shard as anomalous data.
- Dependent claims (2, 3, 4, 5)
  - 2. The method of claim 1, further comprising flagging the anomalous data using metadata.
  - 3. The method of claim 1, further comprising restructuring the anomalous data in response to identification of the anomalous data.
  - 4. The method of claim 3, wherein restructuring the anomalous data comprises using a partitioning algorithm to partition the anomalous data into subsets based on values of a second field of the set of data.
  - 5. The method of claim 1, further comprising generating an aggregate of the anomalous data in response to identification of the anomalous data.
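
The detection step of claim 1 can be illustrated with a minimal sketch. It assumes one possible reading of the claim: each shard covers a contiguous range of index values of the sharding field, so a shard whose range is much narrower than expected is filled by unusually few distinct values. The `Shard` class, `find_anomalous_shards`, and all numeric values are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical shard covering shard-field indices min_index..max_index."""
    name: str
    min_index: int
    max_index: int
    metadata: dict = field(default_factory=dict)

    @property
    def index_range(self) -> int:
        # Width of the index interval this shard covers.
        return self.max_index - self.min_index


def find_anomalous_shards(shards, expected_range, threshold):
    """Flag and return shards whose range is smaller than expected by >= threshold."""
    anomalous = []
    for shard in shards:
        shortfall = expected_range - shard.index_range
        if shortfall >= threshold:
            # Flagging via metadata, in the spirit of claim 2.
            shard.metadata["anomalous"] = True
            anomalous.append(shard)
    return anomalous


shards = [
    Shard("shard-0", 0, 990),
    Shard("shard-1", 991, 1010),  # narrow range: few index values fill the shard
    Shard("shard-2", 1011, 2005),
]
anomalous = find_anomalous_shards(shards, expected_range=1000, threshold=500)
print([s.name for s in anomalous])
```

Here only `shard-1` falls short of the expected range by more than the threshold, so only its data would be treated as anomalous.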

6. A method for enhancing rapid data analysis comprising:
- receiving a set of data;
  
  storing the set of data as data shards;
  
  wherein the data shards are sharded, according to a set of shard partitioning rules, by a first field;
  
  receiving and interpreting a query;
  
  wherein interpreting the query comprises identifying a first set of the data shards containing data relevant to the query;
  
  collecting a first data sample from the first set of the data shards;
  
  identifying anomalous data in the first data sample;
  
  calculating a result to the query based on analysis of the first data sample;
  
  wherein the analysis of the first data sample ignores the anomalous data.
- Dependent claims (7, 8, 9, 10, 11, 12, 13, 14, 15)
  - 7. The method of claim 6, further comprising restructuring the anomalous data in response to identification of the anomalous data.
  - 8. The method of claim 7, wherein restructuring the anomalous data comprises using a partitioning algorithm to partition the anomalous data into subsets based on values of a second field of the set of data.
  - 9. The method of claim 6, wherein identifying anomalous data comprises identifying anomalous data by, during collection of the first data sample, determining that a data element count associated with a value of a second field of the set of data exceeds a threshold data element count.
  - 10. The method of claim 9, wherein the threshold data element count is set according to a statistical distribution of data element counts, each data element count associated with a value of a range of values of a second field of the set of data, across the range of values.
  - 11. The method of claim 10, wherein the second field is the first field.
  - 12. The method of claim 6, wherein the query contains custom query code;
    - wherein interpreting the query comprises pre-processing the custom query code by converting the custom query code from a foreign language to a native query language.
  - 13. The method of claim 12, wherein the foreign language is a natural language;
    - wherein interpreting the query comprises interpreting the natural language using lexical analysis.
  - 14. The method of claim 12, further comprising compiling and executing the custom query code during execution of the query.
  - 15. The method of claim 6, further comprising generating an aggregate of the anomalous data in response to identification of the anomalous data.
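
As a rough illustration of claims 6 and 9, the sketch below computes a query result over a data sample while ignoring rows whose value in a chosen field is over-represented. It is a simplification under stated assumptions: the counts are taken after the sample has been collected rather than during collection, the query is a plain average, and the function name `query_sample`, the field names, and the threshold are all hypothetical.

```python
from collections import Counter


def query_sample(sample, count_field, threshold_count, metric):
    """Average `metric` over the sample, ignoring over-represented values.

    A value of `count_field` is treated as anomalous when its data element
    count in the sample exceeds `threshold_count`.
    """
    counts = Counter(row[count_field] for row in sample)
    anomalous_values = {v for v, n in counts.items() if n > threshold_count}
    kept = [row for row in sample if row[count_field] not in anomalous_values]
    result = sum(row[metric] for row in kept) / len(kept) if kept else None
    return result, anomalous_values


# Hypothetical sample: one value ("bot") dominates the collected rows.
sample = [{"user": "bot", "latency_ms": 1} for _ in range(50)] + [
    {"user": "alice", "latency_ms": 100},
    {"user": "bob", "latency_ms": 200},
    {"user": "alice", "latency_ms": 300},
]
result, anomalous_values = query_sample(
    sample, count_field="user", threshold_count=10, metric="latency_ms"
)
print(result, anomalous_values)
```

With the dominant value excluded, the average is taken over the three remaining rows only.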

16. A method for enhancing rapid data analysis comprising:
- receiving a set of data;
  
  storing the set of data in a first set of data shards;
  
  wherein the first set of data shards is sharded, according to a set of shard partitioning rules, by a first field; and
  
  identifying anomalous data from the set of data, wherein the set of anomalous data is a subset of the set of data, by:
  
  analyzing the set of data to determine a statistical distribution of data element counts, each data element count associated with a value of a range of values of a second field of the set of data, across the range of values;
  
  determining that a value of the range of values is a statistical outlier based on the number of data element counts associated with the value; and
  
  identifying data associated with the value as anomalous data.
- Dependent claims (17, 18, 19, 20, 21)
  - 17. The method of claim 16, further comprising flagging the anomalous data using metadata.
  - 18. The method of claim 16, further comprising restructuring the anomalous data in response to identification of the anomalous data.
  - 19. The method of claim 18, wherein restructuring the anomalous data comprises using a partitioning algorithm to partition the anomalous data into subsets based on values of a third field of the set of data.
  - 20. The method of claim 16, further comprising generating an aggregate of the anomalous data in response to identification of the anomalous data.
  - 21. The method of claim 16, wherein the second field is the first field.
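
The outlier test of claim 16 can be sketched as follows. The claim does not say which statistical rule is used, so this example assumes a simple one: a value is an outlier when its data element count exceeds the mean count by more than `k` population standard deviations. The function name `outlier_values`, the parameter `k`, and the sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def outlier_values(rows, field_name, k=2.0):
    """Return values of `field_name` whose row count is a statistical outlier.

    Assumed rule (not specified in the claim): count > mean + k * stddev,
    computed over the per-value data element counts.
    """
    counts = Counter(row[field_name] for row in rows)
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    return {v for v, n in counts.items() if n > mu + k * sigma}


# Hypothetical data: six ordinary users and one value with far more rows.
per_user_counts = {"a": 3, "b": 4, "c": 5, "d": 4, "e": 3, "f": 4, "bot": 100}
rows = [{"user": u} for u, n in per_user_counts.items() for _ in range(n)]

outliers = outlier_values(rows, "user")
anomalous_rows = [row for row in rows if row["user"] in outliers]
print(outliers, len(anomalous_rows))
```

A mean-and-deviation rule is easy to read but is itself skewed by large outliers; a median-based rule would be a reasonable alternative under the same claim language.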

Current Assignee: Scuba Analytics, Inc.
Original Assignee: Interana, Inc.
Inventors: Suhan, Alex; Johnson, Robert; Barykin, Oleksandr; Abraham, Lior; Fossgreen, Don

Granted Patent: US 10,296,507 B2
US Class Current: 1/1
CPC Class Codes

G06F 16/24554   Unary operations; Data part...

G06F 16/278   Data partitioning, e.g. hor...

H04L 63/1425   Traffic logging, e.g. anoma...
