Distributed high performance analytics store
First Claim
1. A computer-implemented method for generating a result set among a plurality of distributed locations, the method comprising:
receiving, at a computing device, raw data;
segmenting the raw data into a set of time-stamped event records;
dividing the set of time-stamped event records into two or more partitions of event records;
indexing and storing each time-stamped event record of each of the two or more partitions of event records, wherein each of the two or more partitions of event records are stored at a different one of a plurality of distributed locations in a distributed indexed data store;
for each partition of event records, generating a summarization table that:
identifies one or more field-value combinations, wherein a field-value combination includes a field and a value that appears in one or more of the event records of that partition for that field, and wherein a data model or a command is used to identify one or more fields for inclusion in the summarization table; and
for each field-value combination, identifies a set of one or more posting values of event records of that partition that have the value for the field, wherein a posting value provides a way to retrieve the event record to which it corresponds from the distributed indexed data store;
receiving a query;
generating one or more partial results for the query, wherein the one or more partial results are generated using the summarization table for a partition of event records, and wherein the one or more partial results are generated without evaluating each individual event record in the partition of event records; and
generating a result set responsive to the query, wherein the result set is generated using the one or more partial results.
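The summarization step recited above can be illustrated with a minimal sketch. It is not the patented implementation: the event layout, the helper names (`build_summary`, `partial_count`, `merge`), and the use of an integer offset as the posting value are all assumptions made for illustration. The sketch builds one table per partition that maps each field-value combination to posting values, then answers a count query from posting-list lengths without reading the event records again.

```python
from collections import Counter, defaultdict

# Hypothetical event layout: (posting_value, {field: value}).
# The posting value is assumed to be an offset usable to fetch the event.

def build_summary(partition, fields):
    """Map each (field, value) pair to the posting values of matching events."""
    table = defaultdict(list)
    for posting, event in partition:
        for field in fields:
            if field in event:
                table[(field, event[field])].append(posting)
    return dict(table)

def partial_count(summary, field):
    """Count events per value of `field` using posting-list lengths only."""
    return Counter({value: len(postings)
                    for (f, value), postings in summary.items() if f == field})

def merge(partials):
    """Combine per-partition partial results into one result set."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

partition_a = [(0, {"status": "200", "host": "web1"}),
               (1, {"status": "500", "host": "web1"}),
               (2, {"status": "200", "host": "web2"})]
partition_b = [(0, {"status": "200", "host": "web3"}),
               (1, {"status": "404", "host": "web3"})]

# One summarization table per partition, restricted to the "status" field.
summaries = [build_summary(p, ["status"]) for p in (partition_a, partition_b)]
result = merge(partial_count(s, "status") for s in summaries)
print(result)
```

In this sketch the list of fields passed to `build_summary` stands in for the data model or command that selects fields for inclusion in the table.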
Abstract
Embodiments are directed towards the transparent summarization of events. Queries directed towards summarizing and reporting on event records may be received at a search head. Search heads may be associated with one or more indexers containing event records. The search head may forward the query to the indexers that can resolve the query for concurrent execution. If a query is a collection query, indexers may generate summarization information based on event records located on the indexers. Event record fields included in the summarization information may be determined based on terms included in the collection query. If a query is a stats query, each indexer may generate a partial result set from previously generated summarization information, returning the partial result sets to the search head. Collection queries may be saved and scheduled to run and periodically update the summarization information.
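The query flow in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the described system: the `Indexer` and `SearchHead` classes, the `"collect"` and `"stats"` query kinds, and the per-field count summary are hypothetical, and a thread pool stands in for concurrent execution across indexers.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

class Indexer:
    """Hypothetical indexer holding its own event records and summary."""
    def __init__(self, events):
        self.events = events                  # list of {field: value} dicts
        self.summary = defaultdict(Counter)   # field -> value -> count

    def collect(self, fields):
        # Collection query: (re)build summarization info for the named fields.
        for field in fields:
            self.summary[field] = Counter(
                e[field] for e in self.events if field in e)

    def stats(self, field):
        # Stats query: answer from the stored summary, not from raw events.
        return Counter(self.summary[field])

class SearchHead:
    """Hypothetical search head that fans a query out to its indexers."""
    def __init__(self, indexers):
        self.indexers = indexers

    def run(self, kind, arg):
        with ThreadPoolExecutor() as pool:
            if kind == "collect":
                list(pool.map(lambda ix: ix.collect(arg), self.indexers))
                return None
            partials = list(pool.map(lambda ix: ix.stats(arg), self.indexers))
        # Merge the partial result sets returned by the indexers.
        return dict(sum(partials, Counter()))

head = SearchHead([Indexer([{"user": "ann"}, {"user": "bob"}]),
                   Indexer([{"user": "ann"}])])
head.run("collect", ["user"])        # fields come from the collection query
totals = head.run("stats", "user")   # served from summaries only
print(totals)
```

Rerunning the `"collect"` call on a schedule would correspond to the saved collection queries that periodically update the summarization information.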
23 Claims
1. A computer-implemented method for generating a result set among a plurality of distributed locations, the method comprising:
receiving, at a computing device, raw data;
segmenting the raw data into a set of time-stamped event records;
dividing the set of time-stamped event records into two or more partitions of event records;
indexing and storing each time-stamped event record of each of the two or more partitions of event records, wherein each of the two or more partitions of event records are stored at a different one of a plurality of distributed locations in a distributed indexed data store;
for each partition of event records, generating a summarization table that:
identifies one or more field-value combinations, wherein a field-value combination includes a field and a value that appears in one or more of the event records of that partition for that field, and wherein a data model or a command is used to identify one or more fields for inclusion in the summarization table; and
for each field-value combination, identifies a set of one or more posting values of event records of that partition that have the value for the field, wherein a posting value provides a way to retrieve the event record to which it corresponds from the distributed indexed data store;
receiving a query;
generating one or more partial results for the query, wherein the one or more partial results are generated using the summarization table for a partition of event records, and wherein the one or more partial results are generated without evaluating each individual event record in the partition of event records; and
generating a result set responsive to the query, wherein the result set is generated using the one or more partial results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-implemented system for generating a result set among a plurality of distributed locations, said system comprising:
one or more processors;
one or more non-transitory computer-readable storage mediums containing instructions configured to cause the one or more processors to perform operations including:
receiving raw data;
segmenting the raw data into a set of time-stamped event records;
dividing the set of time-stamped event records events into two or more partitions of event records;
indexing and storing each time-stamped event record of each of the two or more partitions of event records, wherein each of the two or more partitions of event records are stored at a different one of a plurality of distributed locations in a distributed indexed data store;
for each partition of event records, generating a summarization table that:
identifies one or more field-value combinations, wherein a field-value combination includes a field and a value that appears in one or more of the event records of that partition for that field, and wherein a data model or a command is used to identify one or more fields for inclusion in the summarization table; and
for each field-value combination, identifies a set of one or more posting values of event records of that partition that have the value for the field, wherein a posting value provides a way to retrieve the event record to which it corresponds from the distributed indexed data store;
receiving a query;
generating one or more partial results for the query, wherein the one or more partial results are generated using the summarization table for a partition of event records, and wherein the one or more partial results are generated without evaluating each individual event record in the partition of event records; and
generating a result set responsive to the query, wherein the result set is generated using the one or more partial results.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer-program product for generating a result set among a plurality of distributed locations, tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause a data processing apparatus to:
receive raw data;
segment the raw data into a set of time-stamped event records;
divide the set of time-stamped event records into two or more partitions of event records;
index and store each time-stamped event record of each of the two or more partitions of event records wherein each of the two or more partitions of event records are stored at a different one of a plurality of distributed locations in a distributed indexed data store;
for each partition of event records, generate a summarization table that:
identifies one or more field-value combinations, wherein a field-value combination includes a field and a value that appears in one or more of the event records of that partition for that field, and wherein a data model or a command is used to identify one or more fields for inclusion in the summarization table; and
for each field-value combination, identifies a set of one or more posting values of event records of that partition that have the value for the field, wherein a posting value provides a way to retrieve the event record to which it corresponds from the distributed indexed data store;
receive a query;
generate one or more partial results for the query, wherein the one or more partial results are generated using the summarization table for a partition of event records, and wherein the one or more partial results are generated without evaluating each individual event record in the partition of event records; and
generate a result set responsive to the query, wherein the result set is generated using the one or more partial results.
Dependent claims: 18, 19, 20, 21, 22, 23
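The ingestion steps shared by the independent claims (segmenting raw data into time-stamped event records, then dividing them into partitions) might look like the sketch below. The claims do not specify a segmentation rule or a partitioning scheme, so both choices here are assumptions: each line of raw text is treated as one event that begins with an ISO 8601 timestamp, and events are dealt round-robin into partitions. The function names `segment` and `divide` are hypothetical.

```python
from datetime import datetime

def segment(raw):
    """Split raw text into (timestamp, text) event records, one per line."""
    events = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stamp, _, text = line.partition(" ")
        events.append((datetime.fromisoformat(stamp), text))
    return events

def divide(events, n):
    """Deal events round-robin into n partitions (an assumed scheme)."""
    parts = [[] for _ in range(n)]
    for i, event in enumerate(events):
        parts[i % n].append(event)
    return parts

raw = """2013-01-31T10:00:00 login user=ann
2013-01-31T10:00:05 login user=bob
2013-01-31T10:01:00 logout user=ann"""

# Each partition would then be indexed and stored at a different location.
partitions = divide(segment(raw), 2)
print([len(p) for p in partitions])
```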
Specification