GENERATION OF A SEARCH QUERY TO APPROXIMATE REPLICATION OF A CLUSTER OF EVENTS
Abstract
A processing device performs a preliminary grouping of data items in a dataset to define one or more clusters and, for each cluster, identifies a set of search terms for a search query that would retrieve the data items in that cluster upon execution of the search query against the dataset.
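The grouping step can be illustrated with a minimal sketch. Claim 1 says only that each data item yields an ordered list of keywords and that items share a cluster when their lists meet a similarity threshold. Everything else here is an assumption made for illustration: the letters-only tokenizer, the use of `difflib.SequenceMatcher` as the similarity measure, the greedy single-pass assignment against each cluster's first member, and the 0.7 threshold.

```python
import re
from difflib import SequenceMatcher


def keywords(raw: str) -> list:
    """Ordered list of keywords parsed from one raw data item.

    Hypothetical tokenizer: lowercased runs of letters/underscores, so
    timestamps and other digit-only fragments are ignored.
    """
    return re.findall(r"[a-z_]+", raw.lower())


def similarity(a: list, b: list) -> float:
    """Order-sensitive similarity in [0, 1] between two keyword lists."""
    return SequenceMatcher(None, a, b).ratio()


def cluster(items: list, threshold: float = 0.7) -> list:
    """Greedy grouping: an item joins the first cluster whose representative
    (its first member) is at least `threshold` similar; otherwise it starts
    a new cluster. Returns lists of indices into `items`.
    """
    clusters = []
    for i, item in enumerate(items):
        kws = keywords(item)
        for members in clusters:
            representative = keywords(items[members[0]])
            if similarity(kws, representative) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


logs = [
    "2024-01-01 10:00:01 ERROR disk full on /dev/sda1",
    "2024-01-01 10:00:05 ERROR disk full on /dev/sdb2",
    "2024-01-01 10:01:00 INFO user alice logged in",
    "2024-01-01 10:02:00 INFO user bob logged in",
]
groups = cluster(logs)
```

With these sample lines, the two disk errors share five of six keywords in order and the two login lines share four of five, so each pair clears the assumed threshold while the pairs stay apart.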
30 Claims
1. A method comprising:
accessing data items in a dataset, each data item containing a portion of raw machine-generated data in textual form generated by a component in an information-technology environment;
applying a clustering algorithm to the data items to group the data items into two or more clusters, the clustering algorithm generating an ordered list of keywords for each data item that is parsed from that data item and grouping data items into a same cluster when their respective ordered lists of keywords meet a similarity threshold; and
for each cluster, identifying a set of one or more search terms providing criteria for a search query that substantially reproduces the cluster upon execution of the search query against the dataset, wherein execution of the search query against the dataset comprises evaluating the search terms against the raw machine-generated data in textual form in the data items;
wherein each of the search terms requires a presence of a particular keyword in the data items, requires an absence of a particular keyword in the data items, or includes a criterion pertaining to a field in the data items;
wherein the method is performed by one or more processing devices.
Dependent claims: 2-26
27. A system comprising:
a memory; and
a processing device coupled to the memory, the processing device to:
access data items in a dataset, each data item containing a portion of raw machine-generated data in textual form generated by a component in an information-technology environment;
apply a clustering algorithm to the data items to group the data items into two or more clusters, the clustering algorithm generating an ordered list of keywords for each data item that is parsed from that data item and grouping data items into a same cluster when their respective ordered lists of keywords meet a similarity threshold; and
for each cluster, identify a set of one or more search terms providing criteria for a search query that substantially reproduces the cluster upon execution of the search query against the dataset, wherein execution of the search query against the dataset comprises evaluating the search terms against the raw machine-generated data in textual form in the data items;
wherein each of the search terms requires a presence of a particular keyword in the data items, requires an absence of a particular keyword in the data items, or includes a criterion pertaining to a field in the data items.
Dependent claims: 28
29. A non-transitory computer-readable storage medium storing instructions which, when executed by a processing device, cause the processing device to perform operations comprising:
accessing data items in a dataset, each data item containing a portion of raw machine-generated data in textual form generated by a component in an information-technology environment;
applying a clustering algorithm to the data items to group the data items into two or more clusters, the clustering algorithm generating an ordered list of keywords for each data item that is parsed from that data item and grouping data items into a same cluster when their respective ordered lists of keywords meet a similarity threshold; and
for each cluster, identifying a set of one or more search terms providing criteria for a search query that substantially reproduces the cluster upon execution of the search query against the dataset, wherein execution of the search query against the dataset comprises evaluating the search terms against the raw machine-generated data in textual form in the data items;
wherein each of the search terms requires a presence of a particular keyword in the data items, requires an absence of a particular keyword in the data items, or includes a criterion pertaining to a field in the data items;
wherein the operations are performed by one or more processing devices.
Dependent claims: 30
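The search-term step recited in the independent claims can also be sketched, under stated assumptions. The claims say that each term requires a keyword's presence, requires its absence, or tests a field, and that the resulting query "substantially reproduces" the cluster when evaluated against the raw text. They do not say how terms are chosen. The heuristic below is an illustrative assumption: require every keyword common to all cluster members, then exclude one distinguishing keyword per false positive. The names `tokens`, `matches`, `derive_terms`, `run_query`, and `format_query`, and the rendered query syntax, are hypothetical. Field criteria are omitted.

```python
import re


def tokens(raw: str) -> set:
    """Keywords present in one raw data item (hypothetical tokenizer)."""
    return set(re.findall(r"[a-z_]+", raw.lower()))


def matches(raw: str, required: set, excluded: set) -> bool:
    """Evaluate presence and absence terms against the raw text of one item."""
    found = tokens(raw)
    return required <= found and not (excluded & found)


def derive_terms(dataset: list, member_idx: list) -> tuple:
    """Pick presence and absence terms for the cluster given by `member_idx`."""
    members = set(member_idx)
    member_tokens = [tokens(dataset[i]) for i in member_idx]
    required = set.intersection(*member_tokens)   # present in every member
    seen_in_cluster = set.union(*member_tokens)   # present in any member
    excluded = set()
    for i, raw in enumerate(dataset):
        if i in members or not matches(raw, required, excluded):
            continue
        # False positive: exclude a keyword that no cluster member contains.
        candidates = tokens(raw) - seen_in_cluster
        if candidates:
            excluded.add(min(candidates))  # min() keeps the choice deterministic
    return required, excluded


def run_query(dataset: list, required: set, excluded: set) -> list:
    """Indices of data items retrieved by the query."""
    return [i for i, raw in enumerate(dataset) if matches(raw, required, excluded)]


def format_query(required: set, excluded: set) -> str:
    """Render the terms in an illustrative (not product-specific) syntax."""
    return " ".join(sorted(required)) + "".join(f" NOT {k}" for k in sorted(excluded))


dataset = [
    "ERROR disk full on /dev/sda1",
    "ERROR disk full on /dev/sdb2",
    "ERROR disk full on /dev/sda1 retry scheduled",
    "INFO user alice logged in",
]
required, excluded = derive_terms(dataset, [0, 1])
retrieved = run_query(dataset, required, excluded)
query = format_query(required, excluded)
```

In this sample the third line contains every required keyword, so the sketch adds an absence term for `retry`, after which the query retrieves exactly the two cluster members. A false positive with no distinguishing keyword would remain in the results, which is one way a query could reproduce a cluster only approximately.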
Specification