CLUSTERING AND LABELING STREAMED DATA
First Claim
1. A computer system, the computer system comprising:
one or more hardware processors;
system memory coupled to the one or more hardware processors, the system memory storing instructions that are executable by the one or more hardware processors;
the one or more hardware processors executing the instructions stored in the system memory to cluster streamed data, including the following:
receive streamed log data over a network connection;
select relevant features from within the streamed log data, wherein the relevant features are relevant to a condition at a device where the streamed log data originated;
for one or more previously formed log pattern clusters, determine a similarity between the relevant features and the one or more previously formed log pattern clusters; and
assign the streamed log data to a log pattern cluster based on the determined similarities.
Abstract
Aspects extend to methods, systems, and computer program products for clustering streamed or batch data. Aspects of the invention include dynamic clustering and labeling of streamed data and/or batch data, including failure and error logs (user, platform, etc.), latency logs, warning logs, information logs, Virtual Machine (VM) creation data logs, template logs, etc., for use in analysis (e.g., error log analysis). A clustering system can learn from previously identified patterns and use that information to group newer information dynamically as it is generated. The clustering system can leverage domain knowledge about the streamed data and/or batch data for preprocessing. In one aspect, a clustering system uses a similarity measure. Based on a similarity threshold (e.g., one configured by users), the clustering system assigns streamed data and/or batch data to groups (e.g., automatically).
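The threshold-based grouping described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not name a similarity measure, so Jaccard similarity over whitespace-separated tokens is assumed here, and the names `StreamClusterer`, `jaccard`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass, field


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class StreamClusterer:
    # Assumed stand-in for the user-configured similarity threshold.
    threshold: float = 0.5
    # One representative token set per previously formed cluster.
    patterns: list = field(default_factory=list)
    # Log lines assigned to each cluster, index-aligned with `patterns`.
    members: list = field(default_factory=list)

    def assign(self, line: str) -> int:
        """Assign one streamed log line to a cluster; return the cluster index."""
        features = frozenset(line.lower().split())
        best_idx, best_sim = -1, 0.0
        # Compare the new line against every previously formed cluster.
        for idx, pattern in enumerate(self.patterns):
            sim = jaccard(features, pattern)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= self.threshold:
            self.members[best_idx].append(line)
            return best_idx
        # No existing cluster is similar enough: form a new one.
        self.patterns.append(features)
        self.members.append([line])
        return len(self.patterns) - 1


clusterer = StreamClusterer(threshold=0.5)
ids = [
    clusterer.assign("disk write failed on node 7"),
    clusterer.assign("disk write failed on node 9"),
    clusterer.assign("vm creation timed out"),
]
```

In this sketch the first line of each cluster serves as its fixed representative pattern; a real system might instead update the pattern as members arrive.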
20 Claims
1. A computer system, the computer system comprising:
one or more hardware processors;
system memory coupled to the one or more hardware processors, the system memory storing instructions that are executable by the one or more hardware processors;
the one or more hardware processors executing the instructions stored in the system memory to cluster streamed data, including the following:
receive streamed log data over a network connection;
select relevant features from within the streamed log data, wherein the relevant features are relevant to a condition at a device where the streamed log data originated;
for one or more previously formed log pattern clusters, determine a similarity between the relevant features and the one or more previously formed log pattern clusters; and
assign the streamed log data to a log pattern cluster based on the determined similarities.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A method for use at a computer system, the method for clustering streamed data, the method comprising:
receiving streamed log data from a computer system over a network connection;
selecting relevant features from within the streamed log data, wherein the relevant features are relevant to an error at the computer system;
determining a similarity between the relevant features and a first previously formed log pattern cluster;
determining a similarity between the relevant features and a second previously formed log pattern cluster; and
automatically assigning the streamed log data to a log pattern cluster based on the determined similarities between the relevant features and the first and second previously formed log pattern clusters.
Dependent claims: 16, 17, 18, 19
20. A computer program product for use at a computer system, the computer program product for implementing a method for clustering streamed data, the computer program product comprises one or more computer storage devices having stored thereon computer-executable instructions that, when executed at a processor, cause the computer system to perform the method, including the following:
receive streamed error log data from a device over a network connection;
select relevant features from within the streamed error log data, wherein the relevant features are relevant to an error at the device;
for each of a plurality of previously formed error log pattern clusters, determine a similarity between the relevant features and the previously formed error log pattern cluster; and
assign the streamed log data to an error log pattern cluster based on the determined similarities.
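The "select relevant features" element recited in the claims can be sketched as follows. The claims do not say how features are selected, so this example assumes one common preprocessing approach: masking variable fields (IP addresses, hexadecimal codes, numbers) so that only the stable wording of the message remains. The masking rules and the name `select_features` are hypothetical.

```python
import re

# Hypothetical masking rules, ordered so more specific patterns run first.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]


def select_features(line: str) -> list[str]:
    """Return the tokens of a log line with variable fields masked out."""
    masked = line
    for pattern, placeholder in _MASKS:
        masked = pattern.sub(placeholder, masked)
    # Keep placeholders and alphabetic words; drop punctuation.
    return re.findall(r"<\w+>|[A-Za-z_]+", masked.lower())


first = select_features("ERROR 2021 conn to 10.0.0.5 failed code 0x1F")
second = select_features("ERROR 77 conn to 192.168.1.20 failed code 0xA0")
```

Under these assumptions, two error lines that differ only in their variable fields yield the same feature list, which makes them easy to match against the same error log pattern cluster.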
Specification