Computer system programmed to identify common subsequences in logs
First Claim
1. A method comprising:
using a computer, receiving a stream of digital data comprising a plurality of objects;
using programmed tokenizer instructions executed using the computer, in response to receiving a first object of the plurality of objects, tokenizing the first object to create a first tokenized object and electronically digitally storing the first tokenized object in a token database that comprises a plurality of other tokenized objects and using an electronic digital storage device;
using the computer, comparing the first tokenized object to the plurality of other tokenized objects stored in the token database, computing a first pattern including the first tokenized object having a constant sequence placed within characters of the first tokenized object, wherein the constant sequence identifies a subsequence of characters in at least two objects of the plurality of objects that are different, and storing the first pattern in a pattern database that comprises a plurality of patterns;
using the computer, managing a size of the pattern database by:
identifying, from the plurality of patterns, a subset of patterns that are eligible for deletion from the pattern database based on an age of each pattern and storing in computer memory data identifying the subset of patterns;
ranking each pattern of the subset based on a quality metric and a popularity metric, by marking the data identifying the subset of patterns with rank values, wherein the popularity metric is a hit count of an associated pattern, and wherein the quality metric comprises a function of a length of the constant sequence placed within characters of a tokenized object of an associated pattern;
identifying a lowest ranked pattern from the subset for deletion including determining a longest length of the constant sequence placed within characters of a tokenized object of an associated pattern; and
repeating the tokenizing, comparing and storing using the updated database;
wherein the method is executed using one or more computing devices.
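The size-management steps recited above (age-based eligibility, ranking by quality and popularity, removal of the lowest-ranked pattern) can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Pattern` record, the `min_age` threshold, the weights `w_quality` and `w_popularity`, and the normalized scoring formula are all assumptions made for illustration; the claim requires only that the quality metric be a function of the constant-sequence length and that the popularity metric be a hit count.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    # Hypothetical record layout; the claim does not prescribe one.
    constant_sequence: str  # subsequence shared by at least two objects
    hit_count: int          # popularity metric named in the claim
    created_at: float       # used to derive the pattern's age


def quality(p: Pattern, longest: int) -> float:
    # Assumed quality function: constant-sequence length normalized by
    # the longest constant sequence among the eligible patterns.
    return len(p.constant_sequence) / longest if longest else 0.0


def evict_lowest_ranked(patterns, now, min_age, w_quality=1.0, w_popularity=1.0):
    """Remove and return the lowest-ranked eligible pattern, or None."""
    # Step 1: eligibility is based on age (the threshold rule is an assumption).
    eligible = [p for p in patterns if now - p.created_at >= min_age]
    if not eligible:
        return None
    # Step 2: determine the longest constant sequence in the subset.
    longest = max(len(p.constant_sequence) for p in eligible)
    max_hits = max(p.hit_count for p in eligible) or 1
    # Step 3: mark each eligible pattern with a rank value.
    ranked = [
        (w_quality * quality(p, longest) + w_popularity * p.hit_count / max_hits, p)
        for p in eligible
    ]
    # Step 4: delete the lowest-ranked pattern to produce the updated database.
    _, victim = min(ranked, key=lambda pair: pair[0])
    patterns.remove(victim)
    return victim


db = [
    Pattern("ERROR disk full on", hit_count=40, created_at=0.0),
    Pattern("GET", hit_count=2, created_at=10.0),
    Pattern("login ok", hit_count=1, created_at=95.0),
]
victim = evict_lowest_ranked(db, now=100.0, min_age=60.0)
print(victim.constant_sequence)
```

In this example the third pattern is too young to be eligible, so only the first two are ranked, and the short, rarely hit pattern is the one removed.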
Abstract
A data processing method includes receiving a stream of digital data with a plurality of objects and, in response to receiving an object, tokenizing the object to create a tokenized object and storing the tokenized object in a token database. The method further includes comparing the tokenized object to a plurality of other tokenized objects stored in the token database, computing a pattern associated with the tokenized object, storing the pattern in a pattern database, and managing a size of the pattern database by identifying a subset of patterns that are eligible for deletion from the pattern database based on an age of each pattern, ranking each pattern of the subset based on a quality metric and a popularity metric, identifying, based on the ranking and from the subset, a second pattern, and deleting the second pattern from the pattern database to produce an updated database.
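The tokenizing and pattern-computing steps summarized in the abstract can be illustrated with a short Python sketch. It rests on several assumptions: the regular-expression tokenizer, the `WILDCARD` placeholder, and the helper names `tokenize` and `compute_pattern` are hypothetical, and the standard-library `difflib.SequenceMatcher` is used here only as a convenient stand-in for whatever comparison the patent's specification actually describes. The sketch also compares tokens rather than individual characters.

```python
import re
from difflib import SequenceMatcher

WILDCARD = "*"  # hypothetical placeholder for the variable portions


def tokenize(obj: str) -> list[str]:
    # Assumed tokenizer: split a log line into words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", obj)


def compute_pattern(a: list[str], b: list[str]) -> list[str]:
    """Keep tokens common to both objects; collapse differing runs into a wildcard."""
    pattern = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            # Tokens shared by both objects form the constant sequence.
            pattern.extend(a[i1:i2])
        elif not pattern or pattern[-1] != WILDCARD:
            # Any differing run becomes a single wildcard.
            pattern.append(WILDCARD)
    return pattern


first = tokenize("user alice logged in")
second = tokenize("user bob logged in")
print(compute_pattern(first, second))  # ['user', '*', 'logged', 'in']
```

The two log lines differ only in the user name, so the resulting pattern keeps the shared tokens as the constant sequence and marks the differing position with the wildcard.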
17 Claims
1. A method (independent claim; full text appears under First Claim above). Dependent claims: 2, 3, 4, 5, 6, 7
8. A method comprising:
using a computer, managing a size of a pattern database that stores a plurality of patterns, each pattern of the plurality of patterns including a tokenized object having a constant sequence placed within characters of the first tokenized object, wherein the constant sequence identifies a subsequence of characters in at least two objects that are different, by:
identifying, from the plurality of patterns, a subset of patterns that are eligible for deletion from the pattern database based on an age of each pattern and storing in computer memory data identifying the subset of patterns;
ranking each pattern of the subset based on a quality metric and a popularity metric, by marking the data identifying the subset of patterns with rank values, wherein the popularity metric is a hit count of an associated pattern, and wherein the quality metric comprises a function of a length of the constant sequence placed within characters of a tokenized object of an associated pattern;
identifying a lowest ranked pattern from the subset for deletion including determining a longest length of the constant sequence placed within characters of a tokenized object of an associated pattern; and
deleting the pattern from the pattern database to produce an updated database;
wherein the method is executed using one or more computing devices.
Dependent claims: 9, 10
11. A computing device comprising:
one or more processors; and
one or more non-transitory computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to:
receive a stream of digital data comprising a plurality of objects;
in response to receiving a first object of the plurality of objects, tokenize the first object to create a first tokenized object and electronically digitally store the first tokenized object in a token database that comprises a plurality of other tokenized objects and using an electronic digital storage device;
compare the first tokenized object to the plurality of other tokenized objects stored in the token database, computing a first pattern including the first tokenized object having a constant sequence placed within characters of the first tokenized object, wherein the constant sequence identifies a subsequence of characters in at least two objects of the plurality of objects that are different, and storing the first pattern in a pattern database that comprises a plurality of patterns;
manage a size of the pattern database by:
identifying, from the plurality of patterns, a subset of patterns that are eligible for deletion from the pattern database based on an age of each pattern and storing in computer memory data identifying the subset of patterns;
ranking each pattern of the subset based on a quality metric and a popularity metric, by marking the data identifying the subset of patterns with rank values, wherein the popularity metric is a hit count of an associated pattern, and wherein the quality metric comprises a function of a length of the constant sequence placed within characters of a tokenized object of an associated pattern;
identifying a lowest ranked pattern from the subset for deletion including determining a longest length of the constant sequence placed within characters of a tokenized object of an associated pattern; and
repeating the tokenizing, comparing and storing using the updated database.
Dependent claims: 12, 13, 14, 15, 16, 17
Specification