
Automatically mining patterns for rule based data standardization systems

  • US 10,163,063 B2
  • Filed: 03/07/2012
  • Issued: 12/25/2018
  • Est. Priority Date: 03/07/2012
  • Status: Active Grant
First Claim

1. A system for mining sub-patterns within a text data set, the system comprising:

  • a data source to store the text data set; and

  • a processor configured with logic to:

    find a set of N frequently occurring sub-patterns within the data set;

    extract the N sub-patterns from the data set; and

    cluster the extracted sub-patterns into K groups such that each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity based upon a longest common substring between the sub-pattern and every other sub-pattern within the same group and also based upon values associated with characters or symbols for the sub-pattern and every other sub-pattern within the same group;

    wherein the processor is configured to determine the distance value D between any two sub-patterns s1 and s2 of the N sub-patterns based upon the following equation:
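The claimed steps (find and extract N frequent sub-patterns, then cluster them into K groups using a distance D built on the longest common substring and on per-symbol values) can be illustrated with a small Python sketch. The claim's actual equation for D is not reproduced in this excerpt, so the `distance` function below is a hypothetical stand-in, as are the symbol weights, the token-class pattern strings (`^` for a numeric token, `+` for an alphabetic word, `T` for a street-type keyword) and every function name. Grouping uses complete-linkage agglomerative clustering, chosen because the claim compares each sub-pattern with every other member of its group; it is one plausible reading, not necessarily the patented method.

```python
from collections import Counter


def frequent_subpatterns(patterns, n, min_len=2):
    """Return the n sub-patterns (contiguous substrings of length >= min_len)
    that occur in the most pattern strings; ties are broken alphabetically."""
    counts = Counter()
    for p in patterns:
        # Count each distinct sub-pattern at most once per input pattern.
        subs = {p[i:j] for i in range(len(p)) for j in range(i + min_len, len(p) + 1)}
        counts.update(subs)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sp for sp, _ in ranked[:n]]


def longest_common_substring(a, b):
    """Standard O(len(a) * len(b)) dynamic program; returns the substring itself."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the diagonal run
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def distance(s1, s2, weights=None):
    """HYPOTHETICAL stand-in for the claimed D:
    1 - w(LCS) / max(w(s1), w(s2)), where w() sums per-symbol weights
    (symbols missing from `weights` count as 1.0)."""
    weights = weights or {}

    def w(s):
        return sum(weights.get(c, 1.0) for c in s)

    denom = max(w(s1), w(s2))
    if denom == 0:
        return 0.0
    return 1.0 - w(longest_common_substring(s1, s2)) / denom


def cluster(subpatterns, k, weights=None):
    """Complete-linkage agglomerative clustering into k groups: the distance
    between two groups is the largest D between any pair of their members."""
    if k < 1:
        raise ValueError("k must be at least 1")
    groups = [[s] for s in subpatterns]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = max(distance(a, b, weights) for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i].extend(groups.pop(j))  # merge the closest pair of groups
    return groups


if __name__ == "__main__":
    # Hypothetical token-class patterns for address-like records.
    patterns = ["^+T", "^++T", "^+T^", "^+"]
    subs = frequent_subpatterns(patterns, n=4)
    print(subs)  # ['^+', '+T', '^+T', '++']
    print(cluster(subs, k=2))  # [['^+', '^+T', '+T'], ['++']]
```

Passing a `weights` mapping such as `{"T": 2.0}` makes a shared street-type symbol count more toward similarity than a shared word token, which is one way "values associated with characters or symbols" could enter D; the naive merge loop is cubic in N and is meant only to show the idea.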
