
Field content based pattern generation for heterogeneous logs

  • US 10,678,669 B2
  • Filed: 04/18/2018
  • Issued: 06/09/2020
  • Est. Priority Date: 04/21/2017
  • Status: Active Grant
First Claim

1. A system for pattern discovery in input heterogeneous logs having unstructured text content and one or more fields, the system comprising:

  • a memory; and

    a processor in communication with the memory, wherein the processor runs program code to:

    preprocess the input heterogeneous logs to obtain pre-processed logs by splitting the input heterogeneous logs into tokens;

    generate seed patterns from the preprocessed logs; and

    generate final patterns by specializing a selected set of fields in each of the seed patterns to generate a final pattern set;

    wherein the processor generates the seed patterns by running program code to:

    identify semantics of the tokens by assigning one of a plurality of semantic datatypes to the tokens based on Regular Expression rules;

    generate seed-pattern signatures, wherein a seed-pattern signature is generated for each of the heterogeneous input logs by position-wise concatenating the semantic datatypes of the tokens therein with spaces; and

    identify unique seed-pattern signatures from the seed-pattern signatures using an index, wherein each index entry includes the seed-pattern signature as an index key and associated metadata obtained as a counter value as an index value;

    wherein the processor generates the seed patterns by running code to:

    search the index for a given seed-pattern signature;

    discard the given seed-pattern signature responsive to a matching one being found in the index and increasing the counter value; and

    add the given seed-pattern signature to a database of seed-pattern signatures responsive to an absence of the matching one in the index.
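
The seed-pattern steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The datatype names (`IP`, `NUMBER`, `WORD`, `NOTSPACE`), the regular expressions, whitespace tokenization, the function names, and the initial counter value of 1 are all assumptions made for illustration; the claim only requires "Regular Expression rules", space-joined datatypes, and an index keyed by signature with a counter as its value.

```python
import re

# Hypothetical datatype rules, tried in order. The claim does not list
# specific datatypes or expressions; these are illustrative only.
DATATYPE_RULES = [
    ("IP", re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")),
    ("NUMBER", re.compile(r"[+-]?\d+(?:\.\d+)?")),
    ("WORD", re.compile(r"[A-Za-z]+")),
]
FALLBACK_DATATYPE = "NOTSPACE"  # assumed catch-all for any other token


def tokenize(log):
    """Preprocess a log line by splitting it into tokens (assumed: on whitespace)."""
    return log.split()


def datatype_of(token):
    """Assign the first datatype whose rule matches the whole token."""
    for name, rule in DATATYPE_RULES:
        if rule.fullmatch(token):
            return name
    return FALLBACK_DATATYPE


def signature_of(log):
    """Concatenate the tokens' datatypes position-wise, separated by spaces."""
    return " ".join(datatype_of(token) for token in tokenize(log))


def discover_seed_signatures(logs):
    """Return (database, index) of unique seed-pattern signatures.

    index maps signature -> counter. A signature already present in the
    index is discarded and its counter is increased; otherwise it is added
    to the database (and, as an assumption, indexed with a counter of 1).
    """
    index = {}
    database = []
    for log in logs:
        signature = signature_of(log)
        if signature in index:
            index[signature] += 1
        else:
            index[signature] = 1
            database.append(signature)
    return database, index
```

For example, the two lines `connect from 10.0.0.1 port 22` and `connect from 10.0.0.2 port 8080` share the signature `WORD WORD IP WORD NUMBER` under these assumed rules, so only one database entry is kept and its counter reaches 2. A hash-based index is used here so that each lookup is a single dictionary probe rather than a scan over known signatures.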
