
Computer system programmed to identify common subsequences in logs

  • US 10,664,481 B2
  • Filed: 09/29/2015
  • Issued: 05/26/2020
  • Est. Priority Date: 09/29/2015
  • Status: Active Grant
First Claim

1. A method comprising:

  • using a computer, receiving a stream of digital data comprising a plurality of objects;

    using programmed tokenizer instructions executed using the computer, in response to receiving a first object of the plurality of objects, tokenizing the first object to create a first tokenized object and electronically digitally storing the first tokenized object in a token database that comprises a plurality of other tokenized objects and using an electronic digital storage device;

    using the computer, comparing the first tokenized object to the plurality of other tokenized objects stored in the token database, computing a first pattern including the first tokenized object having a constant sequence placed within characters of the first tokenized object, wherein the constant sequence identifies a subsequence of characters in at least two objects of the plurality of objects that are different, and storing the first pattern in a pattern database that comprises a plurality of patterns;

    using the computer, managing a size of the pattern database by:

    identifying, from the plurality of patterns, a subset of patterns that are eligible for deletion from the pattern database based on an age of each pattern and storing in computer memory data identifying the subset of patterns;

    ranking each pattern of the subset based on a quality metric and a popularity metric, by marking the data identifying the subset of patterns with rank values, wherein the popularity metric is a hit count of an associated pattern, and wherein the quality metric comprises a function of a length of the constant sequence placed within characters of a tokenized object of an associated pattern;

    identifying a lowest ranked pattern from the subset for deletion including determining a longest length of the constant sequence placed within characters of a tokenized object of an associated pattern; and

    repeating the tokenizing, comparing and storing using the updated database;

    wherein the method is executed using one or more computing devices.
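The claim states the method only as steps, so the Python below is a minimal sketch under stated assumptions. It is not the patented implementation. Every name in it is hypothetical: `tokenize`, `compute_pattern`, `PatternEntry`, `PatternMiner`, the `WILDCARD` marker `"<*>"`, the `max_patterns` and `min_age` limits, and the `rank` weights. The claim does not say how objects are compared. This sketch uses `difflib.SequenceMatcher` from the Python standard library to keep the tokens two different objects share and to replace each differing run with a wildcard. The quality metric is assumed to be the character length of the constant part. The popularity metric is the hit count, as the claim says. The claim does not specify how the two metrics combine, so this sketch assumes a weighted sum.

```python
import re
import time
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional

WILDCARD = "<*>"  # hypothetical marker for the variable part of a pattern
TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # assumed tokenizer: words and single punctuation marks


def tokenize(obj: str) -> list[str]:
    """Split one incoming object (e.g. a log line) into tokens."""
    return TOKEN_RE.findall(obj)


def compute_pattern(a: list[str], b: list[str]) -> Optional[tuple[str, ...]]:
    """Keep tokens shared by a and b in order; replace each differing gap with WILDCARD."""
    if a == b:
        return None  # the constant sequence must come from two *different* objects
    parts: list[str] = []
    ia = ib = 0
    for m in SequenceMatcher(a=a, b=b, autojunk=False).get_matching_blocks():
        if m.a > ia or m.b > ib:
            parts.append(WILDCARD)  # tokens that differ between a and b
        parts.extend(a[m.a:m.a + m.size])  # a constant run common to both
        ia, ib = m.a + m.size, m.b + m.size
    if all(t == WILDCARD for t in parts):
        return None  # no constant sequence at all
    return tuple(parts)


@dataclass
class PatternEntry:
    pattern: tuple[str, ...]
    created: float  # when the pattern was first computed; used for the age check
    hits: int = 1   # popularity metric: how often the pattern has been produced

    def constant_length(self) -> int:
        """Assumed quality metric: characters in the non-wildcard tokens."""
        return sum(len(t) for t in self.pattern if t != WILDCARD)


def rank(e: PatternEntry, w_quality: float = 1.0, w_popularity: float = 1.0) -> float:
    # Hypothetical linear combination; the claim only says "based on" both metrics.
    return w_quality * e.constant_length() + w_popularity * e.hits


class PatternMiner:
    def __init__(self, max_patterns: int, min_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_patterns = max_patterns  # hypothetical size limit for the pattern DB
        self.min_age = min_age            # hypothetical age before a pattern may be deleted
        self.clock = clock
        self.token_db: list[list[str]] = []
        self.pattern_db: dict[tuple[str, ...], PatternEntry] = {}

    def ingest(self, obj: str) -> None:
        """Tokenize, compare against stored tokens, store patterns, then prune."""
        tokens = tokenize(obj)
        now = self.clock()
        for other in self.token_db:
            p = compute_pattern(tokens, other)
            if p is None:
                continue
            if p in self.pattern_db:
                self.pattern_db[p].hits += 1
            else:
                self.pattern_db[p] = PatternEntry(p, created=now)
        self.token_db.append(tokens)
        self.prune(now)

    def prune(self, now: float) -> list[tuple[str, ...]]:
        """Delete the lowest-ranked pattern old enough to be eligible until under the limit."""
        deleted = []
        while len(self.pattern_db) > self.max_patterns:
            eligible = [e for e in self.pattern_db.values()
                        if now - e.created >= self.min_age]
            if not eligible:
                break  # every pattern is too young; let the DB exceed the limit for now
            victim = min(eligible, key=rank)
            del self.pattern_db[victim.pattern]
            deleted.append(victim.pattern)
        return deleted
```

For example, ingesting `"user alice logged in"` and then `"user bob logged in"` yields the pattern `("user", "<*>", "logged", "in")`. Each call to `ingest` corresponds to one pass of the "repeating" step. This sketch compares each new object against every stored token list and never bounds `token_db`, which keeps it short. A production system would likely need to cap or index that store as well.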
