METHOD AND SYSTEM FOR IMPLEMENTING EFFICIENT CLASSIFICATION AND EXPLORATION OF DATA
Abstract
Disclosed is a system, method, and computer program product for analyzing sets of data in an efficient manner, such that analytics can be effectively performed over that data. Classification operations can be performed to generate groups of similar log records. This permits classification of the log records in a cohesive and informative manner.
33 Claims
1. A method comprising:
receiving a plurality of log records from a processing system, the plurality of log records comprising one or more first log records and one or more second log records;
comparing the one or more first log records to the one or more second log records to determine how similar the one or more first log records are to the one or more second log records, the one or more first log records compared to the one or more second log records by independently tokenizing the one or more first log records into a first plurality of tokens and the one or more second log records into a second plurality of tokens, where a similarity value is generated that corresponds to a degree of overlap, in terms of both token content and position, between the first plurality of tokens and the second plurality of tokens; and
classifying the one or more first log records and the one or more second log records into either same or different groups based at least in part on whether the similarity value is less than a threshold level.
12. A computer readable medium having stored thereon a sequence of instructions which, when executed by a processor causes the processor to execute a method comprising:
receiving a plurality of log records from a processing system, the plurality of log records comprising one or more first log records and one or more second log records;
comparing the one or more first log records to the one or more second log records to determine how similar the one or more first log records are to the one or more second log records, the one or more first log records compared to the one or more second log records by independently tokenizing the one or more first log records into a first plurality of tokens and the one or more second log records into a second plurality of tokens, where a similarity value is generated that corresponds to a degree of overlap, in terms of both token content and position, between the first plurality of tokens and the second plurality of tokens; and
classifying the one or more first log records and the one or more second log records into either same or different groups based at least in part on whether the similarity value is less than a threshold level.
23. A system, comprising:
a processor;
a memory having stored thereon a sequence of instructions which, when executed by a processor causes the processor to execute operations comprising:
receiving a plurality of log records from a processing system, the plurality of log records comprising one or more first log records and one or more second log records;
comparing the one or more first log records to the one or more second log records to determine how similar the one or more first log records are to the one or more second log records, the one or more first log records compared to the one or more second log records by independently tokenizing the one or more first log records into a first plurality of tokens and the one or more second log records into a second plurality of tokens, where a similarity value is generated that corresponds to a degree of overlap, in terms of both token content and position, between the first plurality of tokens and the second plurality of tokens; and
classifying the one or more first log records and the one or more second log records into either same or different groups based at least in part on whether the similarity value is less than a threshold level.
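By way of non-limiting illustration only, the receiving, comparing, and classifying operations recited in the independent claims may be sketched as follows. The sketch is an assumption-laden example and not the claimed implementation: the function names `tokenize`, `similarity`, and `classify`, the whitespace tokenizer, the scoring rule (count of positions at which both token sequences hold the same token, divided by the longer sequence length), and the threshold value of 0.5 are all hypothetical choices made for this example. It further assumes that a record joins an existing group when its similarity value is not less than the threshold, and starts a new group otherwise.

```python
from typing import List


def tokenize(record: str) -> List[str]:
    """Split one log record into tokens (assumed: whitespace delimiters)."""
    return record.split()


def similarity(first: List[str], second: List[str]) -> float:
    """Score overlap in both token content and token position.

    Assumed rule: number of positions where both sequences hold the same
    token, divided by the length of the longer sequence.
    """
    longest = max(len(first), len(second))
    if longest == 0:
        return 1.0  # two empty records are treated as identical
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return matches / longest


def classify(records: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Place each record in the first group whose representative is similar enough.

    The first record of each group serves as its representative (an assumption).
    """
    groups: List[List[str]] = []
    for record in records:
        tokens = tokenize(record)  # each record is tokenized independently
        for group in groups:
            if similarity(tokens, tokenize(group[0])) >= threshold:
                group.append(record)
                break
        else:
            # similarity was less than the threshold for every existing group
            groups.append([record])
    return groups


logs = [
    "Server db01 started at 10:00",
    "Server db02 started at 10:05",
    "Connection refused by host 10.0.0.7",
]
groups = classify(logs)
```

In this example the first two records agree at three of five token positions and therefore fall into the same group, while the third record shares no token at any position with the first and is placed in a group of its own.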
Specification