Sampling of events to use for developing a field-extraction rule for a field to use in event searching
Abstract
Embodiments are directed towards generating a representative sampling as a subset of a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including a data source and one or more desired subset types: latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subsets may be obtained by generating clusters from an initial selection of records drawn from the larger dataset. An iteration analysis then determines whether the number of clusters and/or cluster types generated exceeds at least one threshold. If it does not, additional records are clustered. From the resulting clusters and/or the results for the other subset types, a subset of records is obtained as the representative sampling.
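The iteration analysis described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The similarity key (`signature`, which masks digit runs), the batch size, the single cluster-count threshold, and all function names are hypothetical choices made for illustration.

```python
import random
import re
from collections import defaultdict


def signature(record: str) -> str:
    # Hypothetical similarity key: mask digit runs so records that differ only
    # in numbers (timestamps, IDs, counts) fall into the same cluster.
    return re.sub(r"\d+", "#", record)


def cluster(records):
    """Group records by signature; each group is one cluster."""
    clusters = defaultdict(list)
    for r in records:
        clusters[signature(r)].append(r)
    return dict(clusters)


def iterative_clustering(dataset, batch_size=100, cluster_threshold=5,
                         max_rounds=10, seed=0):
    """Cluster an initial random batch. While the cluster count does not
    exceed cluster_threshold, add another batch and re-cluster."""
    pool = list(dataset)
    random.Random(seed).shuffle(pool)
    selected, clusters = [], {}
    for i in range(max_rounds):
        batch = pool[i * batch_size:(i + 1) * batch_size]
        if not batch:
            break  # no records left to add
        selected.extend(batch)
        clusters = cluster(selected)
        if len(clusters) > cluster_threshold:
            break  # threshold exceeded: enough clusters for sampling
    return clusters
```

Re-clustering everything selected so far on each round keeps the sketch simple. A production system would more likely update clusters incrementally.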
20 Claims
1. A computer-implemented method, comprising:
receiving machine data at a computing device;
generating a plurality of events, wherein each event in the plurality of events includes a portion of the machine data;
associating a time with each event in the plurality of events, the time for each event extracted from the machine data included in that event;
storing the plurality of events in a data store such that they are searchable at least by their associated times;
receiving from a user a selection of one or more event selection parameters;
wherein each event selection parameter corresponds to a distinct process for identifying events for inclusion in a set;
wherein the one or more event selection parameters selected by the user include at least one of diverse, outlier, random, earliest, and latest event selection processes;
for each of the received one or more event selection parameters, identifying events for inclusion in the set using the corresponding distinct processes; and
displaying one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.

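A minimal Python sketch of the claim 1 pipeline follows. It rests on several assumptions:

- There is one event per line.
- Timestamps use a single `YYYY-MM-DD HH:MM:SS` layout.
- A regex with a named group stands in for a field-extraction rule.

`Event`, `EventStore`, `select`, and `search_by_field` are hypothetical names. Only the earliest, latest, and random processes are shown, and the graphical user interface is omitted.

```python
import bisect
import random
import re
from dataclasses import dataclass
from datetime import datetime

# Assumed timestamp layout; real machine data needs broader format detection.
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


@dataclass(frozen=True)
class Event:
    time: datetime  # time extracted from the event's own machine data
    raw: str        # the portion of machine data the event covers


def make_events(machine_data: str) -> list:
    """One event per line; lines without a recognizable timestamp are skipped."""
    events = []
    for line in machine_data.splitlines():
        m = TS_PATTERN.search(line)
        if m:
            ts = datetime.strptime(m.group(), "%Y-%m-%d %H:%M:%S")
            events.append(Event(ts, line))
    return events


class EventStore:
    """Keeps events sorted by time so they can be searched by time range."""

    def __init__(self, events):
        self.events = sorted(events, key=lambda e: e.time)
        self.times = [e.time for e in self.events]

    def between(self, start, end):
        lo = bisect.bisect_left(self.times, start)
        hi = bisect.bisect_right(self.times, end)
        return self.events[lo:hi]


def select(store, params, k=2, seed=0):
    """Run the process for each selected parameter and merge results into one set.
    Only earliest/latest/random are sketched; diverse/outlier need clustering."""
    processes = {
        "earliest": lambda: store.events[:k],
        "latest": lambda: store.events[-k:] if k else [],
        "random": lambda: random.Random(seed).sample(
            store.events, min(k, len(store.events))),
    }
    chosen = []
    for p in params:
        for e in processes[p]():
            if e not in chosen:  # an event joins the set at most once
                chosen.append(e)
    return chosen


def search_by_field(events, rule, field, value):
    """Apply a field-extraction rule (a regex with a named group) to each event
    and keep the events whose extracted value equals `value`."""
    matches = []
    for e in events:
        m = rule.search(e.raw)
        if m and m.group(field) == value:
            matches.append(e)
    return matches
```

For example, `search_by_field(store.events, re.compile(r"user=(?P<user>\w+)"), "user", "alice")` returns the events whose `user` field is `alice`.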
15. A computer-implemented method, comprising:
receiving machine data at a computing device;
generating a plurality of events, wherein each event in the plurality of events includes a portion of the machine data;
associating a time with each event in the plurality of events, the time for each event extracted from the machine data included in that event;
storing the plurality of events in a data store such that they are searchable at least by their associated times;
selecting a set of events from the plurality of events using a process to identify diverse events, the process including:
performing a clustering algorithm on a group of events from the plurality of events to form a plurality of clusters, the clustering algorithm placing two events into a same cluster based on similarities in the machine data included in each of the two events; and
selecting events from one or more most populous clusters in the plurality of clusters; and
displaying one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field.
Dependent claims: 16, 17.

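The diverse-event process of claim 15 might be sketched as below. The claim only requires that similar events share a cluster. The digit-masking `signature` used here is an assumed, deliberately crude similarity measure, and `n_clusters` and `per_cluster` are hypothetical parameters.

```python
import re
from collections import defaultdict


def signature(raw: str) -> str:
    # Hypothetical similarity measure: events that are identical once digit runs
    # are masked are treated as similar and placed in the same cluster.
    return re.sub(r"\d+", "#", raw)


def cluster_events(group):
    """Return the clusters (lists of raw events), in first-seen order."""
    clusters = defaultdict(list)
    for raw in group:
        clusters[signature(raw)].append(raw)
    return list(clusters.values())


def select_diverse(group, n_clusters=2, per_cluster=1):
    """Take up to per_cluster events from each of the n_clusters most populous clusters."""
    ranked = sorted(cluster_events(group), key=len, reverse=True)
    chosen = []
    for c in ranked[:n_clusters]:
        chosen.extend(c[:per_cluster])
    return chosen
```

Drawing one event from each of several large clusters yields samples that cover the common event shapes. This is what a field-extraction rule most needs to handle.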
18. A computer-implemented method, comprising:
receiving machine data at a computing device;
generating a plurality of events, wherein each event in the plurality of events includes a portion of the machine data;
associating a time with each event in the plurality of events, the time for each event extracted from the machine data included in that event;
storing the plurality of events in a data store such that they are searchable at least by their associated times;
selecting a set of events from the plurality of events using a process to identify outlier events, the process including:
performing a clustering algorithm on a group of events from the plurality of events to form a plurality of clusters, the clustering algorithm placing two events into a same cluster based on similarities in the machine data included in each of the two events; and
selecting events from one or more least populous clusters in the plurality of clusters; and
displaying one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field.
Dependent claims: 19, 20.

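The outlier process of claim 18 differs from claim 15 only in drawing from the least populous clusters. In this sketch, cluster sizes are counted with `collections.Counter`. The digit-masking similarity key and the parameter names are assumptions for illustration, not the claimed clustering algorithm.

```python
import re
from collections import Counter


def signature(raw: str) -> str:
    # Hypothetical similarity measure: mask digit runs; equal signatures share a cluster.
    return re.sub(r"\d+", "#", raw)


def select_outliers(group, n_clusters=1, per_cluster=1):
    """Take up to per_cluster events from each of the n_clusters least populous clusters."""
    sizes = Counter(signature(r) for r in group)  # cluster -> population
    # Ascending by size; ties keep first-seen order because sorted() is stable.
    rare = [sig for sig, _ in sorted(sizes.items(), key=lambda kv: kv[1])[:n_clusters]]
    chosen = []
    for sig in rare:
        chosen.extend([r for r in group if signature(r) == sig][:per_cluster])
    return chosen
```

Rare clusters surface unusual event formats that a field-extraction rule built only from common events might fail to match.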
Specification