Systems and methods for rapid data analysis
4 Assignments
0 Petitions
Abstract
A method for rapid data analysis includes receiving and interpreting a first query operating on a first dataset partitioned into shards by a first field; collecting a first data sample from a first set of data shards; calculating a first result to the first query based on analysis of the first data sample; and partitioning a second dataset into shards by a second field based on the first result.
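The sampling step in the abstract can be illustrated with a minimal Python sketch. It assumes hash partitioning on the first field and a fixed per-shard sampling fraction. The names `shard_for`, `partition_by_field`, `sample_shards`, and `estimate_count`, the `sample_rate` parameter, and the event records are hypothetical illustrations, not the patented implementation.

```python
import random
import zlib
from collections import defaultdict


def shard_for(value, num_shards):
    # Stable hash (crc32) so a given field value always maps to the same shard.
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def partition_by_field(rows, field, num_shards):
    # Hypothetical helper: group rows into shards by hashing the partitioning field.
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[field], num_shards)].append(row)
    return dict(shards)


def sample_shards(shards, shard_ids, sample_rate, seed=0):
    # Draw a fixed fraction (at least one row) from *each* relevant shard,
    # so every shard in the set contributes to the sample.
    rng = random.Random(seed)
    samples = {}
    for sid in shard_ids:
        rows = shards.get(sid, [])
        if not rows:
            continue
        k = max(1, round(len(rows) * sample_rate))
        samples[sid] = (rng.sample(rows, k), len(rows))
    return samples


def estimate_count(samples, predicate):
    # Scale each shard's match count by (shard size / sample size) and sum.
    total = 0.0
    for sample_rows, shard_size in samples.values():
        matches = sum(1 for row in sample_rows if predicate(row))
        total += shard_size * matches / len(sample_rows)
    return total
```

A query that filters on `user == "u7"` only needs the shard set `[shard_for("u7", 8)]`, while an aggregate over all users passes every shard ID. With `sample_rate=1.0` the estimate equals the exact count; smaller rates trade accuracy for speed.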
92 Citations
20 Claims
1. A method for rapid data analysis comprising:
receiving and interpreting a first query, wherein interpreting the first query comprises identifying a first set of data shards of a first dataset containing data relevant to the first query;
wherein the first dataset is partitioned by a first field;
for a first query pass of the first query, collecting a first data sample from the first set of data shards, wherein collecting the first data sample comprises collecting data from each of the first set of data shards;
for the first query pass, calculating a first result to the first query based on analysis of the first data sample; and
for a second query pass of the first query that uses the first result as input, partitioning a second dataset based on a second field, wherein the second dataset contains data identical to the first dataset, wherein the second field is identified by a set of shard partitioning rules, based on the first field;
wherein the second field is non-identical to the first field.
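The two-pass flow of claim 1 can be sketched as follows. This assumes the shard partitioning rules are a simple lookup table from the first field to the second field, and that the first pass returns a set of values (here, the most active users) that the second pass filters on. For brevity the first pass counts over all rows rather than over a sample. `SHARD_PARTITIONING_RULES`, `partition`, `first_pass`, `second_pass`, and the `user`/`country` fields are hypothetical.

```python
import zlib
from collections import Counter, defaultdict

# Hypothetical shard partitioning rules: for a dataset partitioned by the key
# field, the value names the field its second copy should be partitioned by.
SHARD_PARTITIONING_RULES = {"user": "country", "country": "user"}


def partition(rows, field, num_shards):
    """Hash-partition rows into shards on `field` (stable crc32 hash)."""
    shards = defaultdict(list)
    for row in rows:
        shard_id = zlib.crc32(str(row[field]).encode("utf-8")) % num_shards
        shards[shard_id].append(row)
    return dict(shards)


def first_pass(shards_by_user, top_n):
    """Pass 1: return the `top_n` users with the most events."""
    counts = Counter()
    for rows in shards_by_user.values():
        counts.update(row["user"] for row in rows)
    return {user for user, _ in counts.most_common(top_n)}


def second_pass(rows, first_field, first_result, num_shards):
    """Pass 2: repartition an identical copy of the data by the second field
    chosen by the rules, then aggregate using pass 1's result as a filter."""
    second_field = SHARD_PARTITIONING_RULES[first_field]
    if second_field == first_field:
        raise ValueError("second field must differ from the first field")
    shards_by_second = partition(rows, second_field, num_shards)
    totals = Counter()
    for shard in shards_by_second.values():
        # All rows sharing a second-field value land in one shard, so each
        # shard's partial counts are already final for the values it holds.
        totals.update(row[second_field] for row in shard
                      if row["user"] in first_result)
    return second_field, dict(totals)
```

With three US events and one DE event for `alice`, two DE events for `bob`, and one FR event for `carol`, `first_pass(partition(rows, "user", 4), 2)` returns `{"alice", "bob"}`, and `second_pass(rows, "user", that_result, 4)` reports field `"country"` with 3 events for US and 3 for DE.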
11. A system for rapid data analysis comprising:
an event database comprising first and second datasets;
wherein the first and second datasets contain identical data;
wherein the first dataset is partitioned by a first field;
a string lookup database that stores information linking strings to integers that uniquely identify the strings;
a string translator that converts strings in incoming data to integer identifiers using the string lookup database;
a query engine that processes queries on the event database and returns at least a first query result for a first query pass of a first query, and a second query result for a second query pass of the first query, wherein the second query pass uses the first query result as input; and
a data manager that, based on a second field, partitions the second dataset;
wherein the second field is identified by a set of shard partitioning rules, based on the first field;
wherein the second field is non-identical to the first field.
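The string lookup database and string translator of claim 11 amount to dictionary encoding. A minimal in-memory sketch follows, assuming integer IDs are assigned sequentially in first-seen order and that every string-valued field is translated; `StringLookup`, `get_or_assign`, `lookup`, and `translate_event` are hypothetical names.

```python
class StringLookup:
    """In-memory stand-in for the string lookup database: string <-> unique int."""

    def __init__(self):
        self._to_id = {}
        self._to_str = []

    def get_or_assign(self, s):
        # Return the existing ID, or assign the next unused integer.
        if s not in self._to_id:
            self._to_id[s] = len(self._to_str)
            self._to_str.append(s)
        return self._to_id[s]

    def lookup(self, ident):
        # Reverse mapping, e.g. for presenting query results as strings.
        return self._to_str[ident]


def translate_event(event, lookup):
    # String translator: replace every string value with its integer ID,
    # leaving non-string values (such as timestamps) untouched.
    return {k: lookup.get_or_assign(v) if isinstance(v, str) else v
            for k, v in event.items()}
```

Translating `{"user": "alice", "action": "click", "ts": 17}` and then `{"user": "bob", "action": "click", "ts": 18}` with a fresh `StringLookup` yields `{"user": 0, "action": 1, "ts": 17}` and `{"user": 2, "action": 1, "ts": 18}`. Fixed-width integers are typically cheaper to store, hash, and compare than variable-length strings.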
Specification