Sampling events for rule creation with process selection
First Claim
1. A computer-implemented method, comprising:
accessing a plurality of events, wherein each event in the plurality of events includes a portion of raw machine data;
receiving, from a user, a selection of one or more processes for identifying which events to include in a set;
wherein the one or more processes selected by the user include at least one of the following:
a diverse event-identification process, an outlier event-identification process, a random event identification process, an earliest event-identification process, and a latest event-identification process;
for each selected process, identifying events for inclusion in the set using the process;
causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the raw machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field; and
wherein identifying events for inclusion in the set includes using a process to identify diverse events, and wherein the process to identify diverse events includes:
performing a clustering algorithm on a group of events from the plurality of events to form a plurality of clusters, the clustering algorithm placing two events into a same cluster based on similarities in the machine data included in each of the two events; and
selecting events from one or more most populous clusters in the plurality of clusters.
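The diverse event-identification steps recited above (cluster by similarity, then select from the most populous clusters) can be illustrated with a minimal Python sketch. The claim does not name a clustering algorithm or a similarity measure; the greedy single-pass clustering, the Jaccard similarity over word tokens, and the names `tokens`, `jaccard`, `cluster_events`, `diverse_events`, `threshold`, `top_clusters`, and `per_cluster` are all assumptions made for illustration only.

```python
import re


def tokens(raw):
    """Reduce raw machine data to a set of lowercase alphabetic tokens (assumed similarity basis)."""
    return set(re.findall(r"[a-z_]+", raw.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_events(events, threshold=0.5):
    """Greedy single-pass clustering: an event joins the first cluster whose
    representative (its first member) is similar enough, else starts a new one."""
    clusters = []  # list of (representative_tokens, members)
    for raw in events:
        t = tokens(raw)
        for rep, members in clusters:
            if jaccard(rep, t) >= threshold:
                members.append(raw)
                break
        else:
            clusters.append((t, [raw]))
    return [members for _, members in clusters]


def diverse_events(events, top_clusters=2, per_cluster=1, threshold=0.5):
    """Pick a few events from each of the most populous clusters."""
    clusters = sorted(cluster_events(events, threshold), key=len, reverse=True)
    picked = []
    for members in clusters[:top_clusters]:
        picked.extend(members[:per_cluster])
    return picked
```

With a handful of login, disk-error, and cache-warning log lines, this sketch would return one login line and one disk-error line, because those two clusters hold the most events.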
Abstract
Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including a data source and one or more desired subset types: latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis determines whether a sufficient number of clusters and/or cluster types have been generated to exceed at least one threshold; when the threshold is not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
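As a rough illustration of how user-selected subset types might be combined into one sample, the following Python sketch covers the latest, earliest, and random types only. It assumes each record is a hypothetical `(timestamp, raw_text)` tuple; the names `PROCESSES`, `build_sample`, and `n_per_process`, and the fixed random seed, are illustrative and do not come from the patent.

```python
import random


def earliest(events, n):
    """The n records with the smallest timestamps."""
    return sorted(events, key=lambda e: e[0])[:n]


def latest(events, n):
    """The n records with the largest timestamps."""
    return sorted(events, key=lambda e: e[0], reverse=True)[:n]


def random_sample(events, n, seed=0):
    """Up to n records chosen uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(events, min(n, len(events)))


# Hypothetical registry mapping a subset type chosen in the GUI to its process.
PROCESSES = {"earliest": earliest, "latest": latest, "random": random_sample}


def build_sample(events, selected, n_per_process=2):
    """Run each selected process and merge the results, dropping duplicates."""
    sample, seen = [], set()
    for name in selected:
        for ev in PROCESSES[name](events, n_per_process):
            if ev not in seen:
                seen.add(ev)
                sample.append(ev)
    return sample
```

Merging with de-duplication is one possible design choice: a record that is both among the earliest and picked at random then appears only once in the representative sample.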
29 Claims
1. A computer-implemented method, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 24, 28
10. A computer-implemented method, comprising:
accessing a plurality of events, wherein each event in the plurality of events includes a portion of raw machine data;
receiving, from a user, a selection of one or more processes for identifying which events to include in a set;
wherein the one or more processes selected by the user include at least one of the following:
a diverse event-identification process, an outlier event-identification process, a random event identification process, an earliest event-identification process and a latest event-identification process;
for each selected process, identifying events for inclusion in the set using the process;
causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the raw machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field; and
wherein identifying events for inclusion in the set includes using a process to identify diverse events, and wherein the process to identify diverse events includes:
performing a clustering algorithm on a group of events from the plurality of events to form a plurality of clusters, the clustering algorithm placing two events into a same cluster based on similarities in the machine data included in each of the two events; and
selecting events from one or more most populous clusters in the plurality of clusters.
Dependent claims: 11, 12, 13, 25, 29
14. A computer-implemented method, comprising:
accessing a plurality of events, wherein each event in the plurality of events includes a portion of raw machine data;
receiving, from a user, a selection of one or more processes for identifying which events to include in a set;
wherein the one or more processes selected by the user include at least one of the following:
a diverse event-identification process, an outlier event-identification process, a random event identification process, an earliest event-identification process, and a latest event-identification process;
for each selected process, identifying events for inclusion in the set using the process;
causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the raw machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field; and
wherein identifying events for inclusion in the set includes using a process to identify diverse events or using a process to identify outlier events, and wherein the process includes:
clustering a group of events in the plurality of events to form a plurality of clusters;
determining that a number of clusters in the plurality of clusters is not big enough; and
clustering a larger group of events in the plurality of events than the group of events.
Dependent claims: 15, 16, 17, 22, 26
18. A computer-implemented method, comprising:
accessing a plurality of events, wherein each event in the plurality of events includes a portion of raw machine data;
receiving, from a user, a selection of one or more processes for identifying which events to include in a set;
wherein the one or more processes selected by the user include at least one of the following:
a diverse event-identification process, an outlier event-identification process, a random event identification process, an earliest event-identification process, and a latest event-identification process;
for each selected process, identifying events for inclusion in the set using the process;
causing display of one or more events in the set of events in a graphical user interface that enables development of a field-extraction rule that specifies how to extract, from the raw machine data included in each of the one or more events, a value for a field that is defined for each of the one or more events, wherein each of the one or more events is searchable using the field; and
wherein identifying events for inclusion in the set includes using a process to identify diverse events or using a process to identify outlier events, and wherein the process includes:
clustering a group of events in the plurality of events to form a plurality of clusters;
determining that a number of events in one of the clusters in the plurality of clusters is not big enough; and
clustering a larger group of events in the plurality of events than the group of events.
Dependent claims: 19, 20, 21, 23, 27
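Claims 14 and 18 both recite clustering a group of events, finding the result "not big enough" (too few clusters in claim 14, too few events in a cluster in claim 18), and then clustering a larger group. A minimal Python sketch of that loop follows. The claims do not say how the group grows or what the thresholds are, so the digit-masking stand-in clustering, the doubling growth, the reading of "one of the clusters" as the largest cluster, and the names `cluster_by_signature`, `cluster_until_enough`, `min_clusters`, `min_members`, and `growth` are all assumptions.

```python
import re
from collections import defaultdict


def cluster_by_signature(events):
    """Stand-in clustering: events that differ only in digits share a cluster."""
    clusters = defaultdict(list)
    for raw in events:
        clusters[re.sub(r"\d+", "#", raw)].append(raw)
    return list(clusters.values())


def cluster_until_enough(events, initial_size=2, min_clusters=1,
                         min_members=1, growth=2):
    """Cluster a prefix of the events; while there are too few clusters or the
    largest cluster is too small, re-cluster a larger prefix.
    Returns (clusters, number_of_events_clustered)."""
    if growth < 2:
        raise ValueError("growth must be at least 2 so the group keeps growing")
    size = max(1, min(initial_size, len(events)))
    while True:
        clusters = cluster_by_signature(events[:size])
        largest = max((len(c) for c in clusters), default=0)
        enough = len(clusters) >= min_clusters and largest >= min_members
        # Stop when the thresholds are met or no more events remain to add.
        if enough or size >= len(events):
            return clusters, size
        size = min(size * growth, len(events))
```

Setting only `min_clusters` corresponds to the claim 14 condition, and setting only `min_members` corresponds to the claim 18 condition under the stated interpretation.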
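Each independent claim above recites a field-extraction rule that specifies how to extract a field value from raw machine data, with events then searchable by that field. The claims do not prescribe a rule format; the Python sketch below assumes, purely for illustration, that a rule is a regular expression with a named group. The `status` field, the `RULE` pattern, and the helper names `extract_field` and `search_by_field` are hypothetical.

```python
import re

# Hypothetical rule: the field "status" is the three digits after "status=".
RULE = re.compile(r"status=(?P<status>\d{3})")


def extract_field(raw, rule=RULE, field="status"):
    """Apply the rule to one event's raw data; return the value or None."""
    match = rule.search(raw)
    return match.group(field) if match else None


def search_by_field(events, value, rule=RULE, field="status"):
    """Return the events whose extracted field equals the given value."""
    return [raw for raw in events if extract_field(raw, rule, field) == value]
```

Previewing the rule against a diverse sample rather than only the latest events makes it more likely that a pattern failing on an unusual event format is noticed before the rule is saved.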
Specification