Method and system for implementing efficient classification and exploration of data
First Claim
1. A method comprising:
receiving a plurality of log records from a processing system, the plurality of log records comprising one or more first log records and one or more second log records;
comparing the one or more first log records to the one or more second log records to determine how similar the one or more first log records are to the one or more second log records, the one or more first log records compared to the one or more second log records by independently tokenizing the one or more first log records into a first plurality of tokens and the one or more second log records into a second plurality of tokens, where a similarity value is generated that corresponds to a degree of overlap, in terms of both token content and position, between the first plurality of tokens and the second plurality of tokens;
classifying the one or more first log records and the one or more second log records into a group based at least in part on the similarity value;
storing, for the group, a signature comprising one or more overlapping portions that are shared by both the one or more first log records and the one or more second log records, and one or more variable portions that differ between the one or more first log records and the one or more second log records;
detecting a problem on the processing system based at least in part on how the one or more first log records and the one or more second log records were classified; and
performing at least one operation responsive to detecting the problem on the processing system.
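Claim 1 does not fix the tokenizer, the similarity formula, the grouping threshold, or how a signature is encoded. The following is a minimal Python sketch of the compare, classify, and store-signature steps under stated assumptions: whitespace tokenization, similarity taken as the share of positions holding identical tokens (so both content and position count), a hypothetical threshold of 0.6, greedy assignment of each record to the first sufficiently similar group, and `<*>` as a hypothetical marker for a variable portion. The names `tokenize`, `similarity`, `merge_signature`, `Group`, and `classify` are illustrative only and do not come from the patent.

```python
from dataclasses import dataclass, field

WILDCARD = "<*>"  # hypothetical marker for a variable portion of a signature


def tokenize(record: str) -> list[str]:
    # Assumption: split on whitespace; the claims do not name a tokenizer.
    return record.split()


def similarity(a: list[str], b: list[str]) -> float:
    """Share of token positions where both records hold the same token."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / longest


def merge_signature(sig: list[str], tokens: list[str]) -> list[str]:
    """Keep tokens shared at the same position; mark every other position as variable."""
    merged = []
    for i in range(max(len(sig), len(tokens))):
        x = sig[i] if i < len(sig) else None
        y = tokens[i] if i < len(tokens) else None
        merged.append(x if x == y else WILDCARD)
    return merged


@dataclass
class Group:
    representative: list[str]  # tokens of the first record placed in the group
    signature: list[str]       # overlapping tokens plus WILDCARD for variable portions
    members: list[str] = field(default_factory=list)


def classify(records: list[str], threshold: float = 0.6) -> list[Group]:
    groups: list[Group] = []
    for record in records:
        tokens = tokenize(record)
        # Greedy assignment (an assumption): join the first group whose
        # representative is similar enough, otherwise start a new group.
        for group in groups:
            if similarity(tokens, group.representative) >= threshold:
                group.members.append(record)
                group.signature = merge_signature(group.signature, tokens)
                break
        else:
            groups.append(Group(tokens, list(tokens), [record]))
    return groups


if __name__ == "__main__":
    logs = [
        "Connection from 10.0.0.5 closed after 30 s",
        "Connection from 10.0.0.9 closed after 12 s",
        "Disk /dev/sda1 usage at 91 percent",
    ]
    for g in classify(logs):
        print(" ".join(g.signature), len(g.members))
    # prints:
    # Connection from <*> closed after <*> s 2
    # Disk /dev/sda1 usage at 91 percent 1
```

The two connection records match in 5 of 7 positions (about 0.71), so they share a group whose signature keeps the common tokens and marks the address and duration as variable. Comparing against a fixed representative rather than the evolving signature keeps the sketch simple; other policies are equally consistent with the claim wording.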
Abstract
Disclosed is a system, method, and computer program product for analyzing sets of data in an efficient manner, such that analytics can be effectively performed over that data. Classification operations can be performed to generate groups of similar log records. This permits classification of the log records in a cohesive and informative manner.
33 Claims
12. A non-transitory computer readable medium having stored thereon a sequence of instructions which, when executed by a processor, causes the processor to execute a method comprising:
receiving a plurality of log records from a processing system, the plurality of log records comprising one or more first log records and one or more second log records;
comparing the one or more first log records to the one or more second log records to determine how similar the one or more first log records are to the one or more second log records, the one or more first log records compared to the one or more second log records by independently tokenizing the one or more first log records into a first plurality of tokens and the one or more second log records into a second plurality of tokens, where a similarity value is generated that corresponds to a degree of overlap, in terms of both token content and position, between the first plurality of tokens and the second plurality of tokens;
classifying the one or more first log records and the one or more second log records into a group based at least in part on the similarity value;
storing, for the group, a signature comprising one or more overlapping portions that are shared by both the one or more first log records and the one or more second log records, and one or more variable portions that differ between the one or more first log records and the one or more second log records;
detecting a problem on the processing system based at least in part on how the one or more first log records and the one or more second log records were classified; and
performing at least one operation responsive to detecting the problem on the processing system.
23. A system, comprising:
a processor;
a memory having stored thereon a sequence of instructions which, when executed by a processor, causes the processor to execute operations comprising:
receiving a plurality of log records from a processing system, the plurality of log records comprising one or more first log records and one or more second log records;
comparing the one or more first log records to the one or more second log records to determine how similar the one or more first log records are to the one or more second log records, the one or more first log records compared to the one or more second log records by independently tokenizing the one or more first log records into a first plurality of tokens and the one or more second log records into a second plurality of tokens, where a similarity value is generated that corresponds to a degree of overlap, in terms of both token content and position, between the first plurality of tokens and the second plurality of tokens;
classifying the one or more first log records and the one or more second log records into a group based at least in part on the similarity value;
storing, for the group, a signature comprising one or more overlapping portions that are shared by both the one or more first log records and the one or more second log records, and one or more variable portions that differ between the one or more first log records and the one or more second log records;
detecting a problem on the processing system based at least in part on how the one or more first log records and the one or more second log records were classified; and
performing at least one operation responsive to detecting the problem on the processing system.
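The independent claims recite detecting a problem "based at least in part on how" the log records were classified, and then performing an operation, but they do not state the detection criterion or the operation. As one hypothetical illustration only, the Python sketch below flags a group signature that is new relative to a baseline window, or whose count grew by at least `surge_factor`, ignoring groups with fewer than `min_count` records; the "operation" is a caller-supplied alert callback that defaults to `print`. The functions `detect_problems` and `respond` and both thresholds are assumptions, not taken from the patent.

```python
from collections import Counter
from typing import Callable, Iterable


def detect_problems(
    baseline: Iterable[str],
    current: Iterable[str],
    surge_factor: float = 3.0,  # hypothetical growth threshold
    min_count: int = 5,         # hypothetical floor to ignore rare groups
) -> list[str]:
    """Return group signatures that are new or surging versus the baseline window."""
    before = Counter(baseline)  # records per group signature in the baseline window
    now = Counter(current)      # records per group signature in the current window
    flagged = []
    for sig, count in now.items():
        if count < min_count:
            continue
        prior = before.get(sig, 0)
        if prior == 0 or count >= surge_factor * prior:
            flagged.append(sig)
    return sorted(flagged)


def respond(flagged: list[str], alert: Callable[[str], None] = print) -> int:
    """Hypothetical responsive operation: emit one alert per flagged group."""
    for sig in flagged:
        alert(f"possible problem: log group '{sig}' is new or surging")
    return len(flagged)
```

Each input is the sequence of group signatures assigned to the records in a time window, so the detector works on classification results rather than on raw log text. Counting per group, rather than per raw message, is what lets many superficially different records (differing only in their variable portions) add up to one visible spike.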
Specification