Sampling Events for Rule Creation with Process Selection
Abstract
Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including a data source and one or more desired subset types: latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis determines whether the number of clusters and/or cluster types generated exceeds at least one threshold; when it does not, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
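The iteration analysis described in the abstract can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration: the `signature` similarity key, the `batch_size` and `cluster_threshold` parameters, and the choice to count clusters rather than cluster types. The patent text does not prescribe any of these.

```python
import re
from collections import defaultdict


def signature(record: str) -> str:
    """Hypothetical similarity key: mask digit runs and words, keep punctuation."""
    masked = re.sub(r"\d+", "0", record)
    return re.sub(r"[A-Za-z]+", "a", masked)


def cluster_until_sufficient(records, batch_size=100, cluster_threshold=10):
    """Cluster records batch by batch until the cluster count exceeds the threshold.

    Returns the clusters and the number of records consumed. If the records
    run out first, whatever clusters exist at that point are returned.
    """
    clusters = defaultdict(list)
    consumed = 0
    while consumed < len(records):
        batch = records[consumed:consumed + batch_size]
        consumed += len(batch)
        for record in batch:
            clusters[signature(record)].append(record)
        # Iteration analysis: stop once enough clusters have been generated.
        if len(clusters) > cluster_threshold:
            break
    return dict(clusters), consumed


records = [
    "GET /a 200",
    "GET /b 404",
    "GET /c 200",
    "error: disk full",
    "GET /d 500",
]
clusters, consumed = cluster_until_sufficient(
    records, batch_size=2, cluster_threshold=1
)
```

With these sample records, the first batch yields a single cluster, so a second batch is pulled before the threshold is exceeded.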
30 Claims
1. A computer-implemented method, comprising:
accessing a plurality of events, wherein each event in the plurality of events includes a portion of raw machine data;
receiving, from a user, a selection of one or more processes for identifying which events to include in a set;
wherein the one or more processes selected by the user include at least one of the following:
a diverse event-identification process, an outlier event-identification process, a random event-identification process, an earliest event-identification process, and a latest event-identification process;
for each selected process, identifying events for inclusion in the set using the process; and
causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field.
Dependent claims: 2-22
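The "for each selected process" step of claim 1 can be sketched as a dispatch over user-selected process names. This is an illustrative sketch only: the event shape (a `(timestamp, raw_text)` tuple), the `PROCESSES` registry, the `per_process` limit, and the de-duplication step are all assumptions, and only the earliest, latest, and random processes are shown because the diverse and outlier processes need a clustering step.

```python
import random

# Hypothetical event shape: (timestamp, raw_text) tuples.


def earliest(events, n):
    """Pick the n events with the smallest timestamps."""
    return sorted(events, key=lambda e: e[0])[:n]


def latest(events, n):
    """Pick the n events with the largest timestamps."""
    return sorted(events, key=lambda e: e[0], reverse=True)[:n]


def random_pick(events, n, seed=0):
    """Pick n events uniformly at random (seeded so runs are repeatable)."""
    return random.Random(seed).sample(list(events), min(n, len(events)))


# Hypothetical registry mapping a user-facing name to a process.
PROCESSES = {"earliest": earliest, "latest": latest, "random": random_pick}


def build_sample(events, selected, per_process=2):
    """Run each selected process and merge the results without duplicates."""
    sample, seen = [], set()
    for name in selected:
        for event in PROCESSES[name](events, per_process):
            if event not in seen:
                seen.add(event)
                sample.append(event)
    return sample


events = [(3.0, "c"), (1.0, "a"), (2.0, "b"), (4.0, "d")]
sample = build_sample(events, ["earliest", "latest"], per_process=1)
```

In this sketch an unknown process name raises `KeyError`; a real interface would presumably restrict the user to the offered choices.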
23. A computer-implemented method, comprising:
selecting a set of events from a plurality of events using a process to identify diverse events, the process including:
performing a clustering algorithm on a group of events from the plurality of events to form a plurality of clusters, the clustering algorithm placing two events into a same cluster based on similarities in machine data included in each of the two events; and
selecting events from one or more most populous clusters in the plurality of clusters; and
causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field.
Dependent claims: 24-26
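As one way to picture the diverse-event process of claim 23, the sketch below clusters events greedily by Jaccard similarity of their word tokens and then draws from the most populous clusters. The claim does not name a clustering algorithm, so the tokenizer, the Jaccard measure, the `threshold` value, and the `top_clusters` and `per_cluster` parameters are all assumptions chosen for brevity.

```python
import re


def tokens(raw):
    """Hypothetical feature set: the alphabetic words in the raw event text."""
    return set(re.findall(r"[A-Za-z_]+", raw))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_events(events, threshold=0.4):
    """Greedy clustering: join the first cluster whose seed event is similar enough."""
    clusters = []  # each entry: {"tokens": seed token set, "events": members}
    for raw in events:
        toks = tokens(raw)
        for cluster in clusters:
            if jaccard(toks, cluster["tokens"]) >= threshold:
                cluster["events"].append(raw)
                break
        else:
            clusters.append({"tokens": toks, "events": [raw]})
    return [cluster["events"] for cluster in clusters]


def diverse_sample(events, top_clusters=2, per_cluster=1):
    """Select events from the most populous clusters."""
    ranked = sorted(cluster_events(events), key=len, reverse=True)
    return [e for cluster in ranked[:top_clusters] for e in cluster[:per_cluster]]


events = [
    "user=alice action=login status=ok",
    "user=bob action=login status=ok",
    "user=carol action=logout status=ok",
    "kernel: disk sda1 full",
    "kernel: disk sdb1 full",
    "panic: out of memory",
]
sample = diverse_sample(events)
```

Taking one event per large cluster gives the rule author one representative of each common event shape, which is the apparent point of sampling from the most populous clusters.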
27. A computer-implemented method, comprising:
selecting a set of events from a plurality of events using a process to identify outlier events, the process including:
performing a clustering algorithm on a group of events from the plurality of events to form a plurality of clusters, the clustering algorithm placing two events into a same cluster based on similarities in machine data included in each of the two events; and
selecting events from one or more least populous clusters in the plurality of clusters; and
causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field.
Dependent claims: 28-30
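The outlier process of claim 27 differs from the diverse process only in which clusters are sampled. A minimal sketch, assuming a deliberately simple clustering in which two events share a cluster when their punctuation skeletons match exactly; the `shape` key and the `bottom_clusters` and `per_cluster` parameters are hypothetical and not taken from the patent.

```python
import re
from collections import defaultdict


def shape(raw):
    """Hypothetical similarity key: replace each alphanumeric run with 'w'."""
    return re.sub(r"[A-Za-z0-9]+", "w", raw)


def outlier_sample(events, bottom_clusters=1, per_cluster=1):
    """Cluster events by shape, then select from the least populous clusters."""
    clusters = defaultdict(list)
    for raw in events:
        clusters[shape(raw)].append(raw)
    # Smallest clusters first; ties keep first-seen order (sorted is stable).
    ranked = sorted(clusters.values(), key=len)
    return [e for cluster in ranked[:bottom_clusters] for e in cluster[:per_cluster]]


events = ["a=1 b=2", "a=3 b=4", "a=5 b=6", "ERROR [db] timeout"]
outliers = outlier_sample(events)
```

Here the three key-value events share one shape, so the lone bracketed error line is the event surfaced as an outlier.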
Specification