Managing selection of a representative data subset according to user-specified parameters with clustering
First Claim
1. A computer implemented method for managing selection of a representative data subset, comprising:
receiving, from a user via a graphical user interface, selections of:
(i) a data source type from which to generate the representative data subset,
(ii) one or a combination of subset types, of a plurality of defined event subset types, for identifying events to include in the subset, and
(iii) a number of desired representative events to be included in the subset;
retrieving events from the selected data source according to the received selection of subset type;
clustering to identify similarities between the retrieved events to determine whether the particular events can be characterized as forming a group;
extracting from the retrieved, clustered events a number of events corresponding to the user-selected number of desired representative events, wherein the events are extracted based on a field-extraction rule that specifies how to extract values from raw machine data included in each of the one or more events; and
causing display of the subset of representative events in the graphical user interface.
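The retrieval and field-extraction steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Event` type, the function names, the subset-type strings, and the choice to model a field-extraction rule as a regular expression with named groups are all assumptions made for illustration.

```python
import random
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int  # assumed: epoch seconds
    raw: str        # raw machine data, e.g. one log line


def retrieve_events(events, subset_type, count, seed=0):
    """Return up to `count` events for one subset type (hypothetical names)."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    if subset_type == "latest":
        return ordered[max(0, len(ordered) - count):]
    if subset_type == "earliest":
        return ordered[:count]
    if subset_type == "random":
        rng = random.Random(seed)  # seeded so the sketch is repeatable
        return rng.sample(ordered, min(count, len(ordered)))
    raise ValueError(f"unsupported subset type: {subset_type}")


def extract_fields(event, rule):
    """Apply a field-extraction rule, modelled here as a regex with named groups."""
    match = re.search(rule, event.raw)
    return match.groupdict() if match else {}
```

For example, `extract_fields(Event(5, "status=200 user=bob"), r"status=(?P<status>\d+)")` yields a dictionary with a single `status` field, and an event that does not match the rule yields an empty dictionary.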
Abstract
Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
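The iterative clustering described in the abstract can be sketched roughly as follows. The abstract does not name a clustering algorithm or a similarity measure, so the greedy single-pass clustering, the token-based Jaccard similarity, the batch size, and every function and parameter name below are assumptions chosen only to keep the example small.

```python
def tokens(record):
    """Split a raw record into a set of whitespace-separated tokens."""
    return frozenset(record.split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def add_to_clusters(clusters, records, min_similarity=0.5):
    """Greedy clustering: join the first cluster whose first member is
    similar enough, otherwise start a new cluster."""
    for record in records:
        toks = tokens(record)
        for cluster in clusters:
            if jaccard(toks, tokens(cluster[0])) >= min_similarity:
                cluster.append(record)
                break
        else:
            clusters.append([record])


def cluster_until_enough(records, batch_size, cluster_threshold):
    """Cluster batch by batch; stop once the cluster count exceeds the threshold."""
    clusters = []
    for start in range(0, len(records), batch_size):
        add_to_clusters(clusters, records[start:start + batch_size])
        if len(clusters) > cluster_threshold:
            break
    return clusters


def pick_representatives(clusters, count, subset_type):
    """'diverse' takes one record from each of the largest clusters,
    'outlier' takes one record from each of the smallest clusters."""
    ordered = sorted(clusters, key=len, reverse=(subset_type == "diverse"))
    return [cluster[0] for cluster in ordered[:count]]
```

In this sketch, records that never reach a batch are simply not clustered, which mirrors the idea of performing additional clustering only while the threshold has not been exceeded.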
30 Claims
1. A computer implemented method for managing selection of a representative data subset, comprising:
receiving, from a user via a graphical user interface, selections of:
(i) a data source type from which to generate the representative data subset,
(ii) one or a combination of subset types, of a plurality of defined event subset types, for identifying events to include in the subset, and
(iii) a number of desired representative events to be included in the subset;
retrieving events from the selected data source according to the received selection of subset type;
clustering to identify similarities between the retrieved events to determine whether the particular events can be characterized as forming a group;
extracting from the retrieved, clustered events a number of events corresponding to the user-selected number of desired representative events, wherein the events are extracted based on a field-extraction rule that specifies how to extract values from raw machine data included in each of the one or more events; and
causing display of the subset of representative events in the graphical user interface.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A non-transitory, computer-readable storage medium storing instructions, an execution of which in a computer system causes the computer system to perform operations comprising:
receiving, from a user via a graphical user interface, selections of:
(i) a data source type from which to generate the representative data subset,
(ii) one or a combination of subset types, of a plurality of defined event subset types, for identifying events to include in the subset, and
(iii) a number of desired representative events to be included in the subset;
retrieving events from the selected data source according to the received selection of subset type;
clustering to identify similarities between the retrieved events to determine whether the particular events can be characterized as forming a group;
extracting from the retrieved, clustered events a number of events corresponding to the user-selected number of desired representative events, wherein the events are extracted based on a field-extraction rule that specifies how to extract values from raw machine data included in each of the one or more events; and
causing display of the subset of representative events in the graphical user interface.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A computer system comprising:
computer memory for storing machine data; and
a processor for:
receiving, from a user via a graphical user interface, selections of:
(i) a data source type from which to generate the representative data subset,
(ii) one or a combination of subset types, of a plurality of defined event subset types, for identifying events to include in the subset, and
(iii) a number of desired representative events to be included in the subset;
retrieving events from the selected data source according to the received selection of subset type;
clustering to identify similarities between the retrieved events to determine whether the particular events can be characterized as forming a group;
extracting from the retrieved, clustered events a number of events corresponding to the user-selected number of desired representative events, wherein the events are extracted based on a field-extraction rule that specifies how to extract values from raw machine data included in each of the one or more events; and
causing display of the subset of representative events in the graphical user interface.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification