AUTOMATICALLY MINING PATTERNS FOR RULE BASED DATA STANDARDIZATION SYSTEMS

  • US 20170147688A1
  • Filed: 02/07/2017
  • Published: 05/25/2017
  • Est. Priority Date: 03/07/2012
  • Status: Active Grant
First Claim

1. A system for mining for sub-patterns within a text data set, the system comprising:

  • a data source to store the text data set; and

    a processor configured with logic to:

    find a set of N frequently occurring sub-patterns within the text data set;

    extract the N sub-patterns from the data set; and

    cluster pairs of the extracted sub-patterns into K groups such that each pair of the extracted sub-patterns is placed within the same group with other pairs of the extracted sub-patterns based upon a respective distance value D of each pair of the extracted sub-patterns that determines a degree of similarity based upon a respective longest common substring between a respective first sub-pattern and a respective second sub-pattern of the each pair of the extracted sub-patterns within the same group and also based upon values associated with characters or symbols for the respective first sub-pattern and the respective second sub-pattern of the each pair of the extracted sub-patterns within the same group, wherein the value of each of the characters or the symbols is related to a respective frequency of occurrence of each of the characters or the symbols in the text data set such that a character or a symbol having a frequency of occurrence that is higher than a frequency of occurrence of a second character or a second symbol has a lower value associated therewith than a value associated with the second character or the second symbol.
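
The steps recited in the claim can be illustrated with a minimal Python sketch. The claim does not give concrete formulas, so everything specific here is an assumption made for illustration only: the function names, the use of whitespace-token n-grams as sub-patterns, the character value `1 + log(total / count)` (chosen only because it decreases as frequency rises), the normalization of the distance D, and the single-link agglomerative merge used to reach K groups. It is not the patented implementation.

```python
from collections import Counter
import math


def frequent_subpatterns(records, n, max_len=3):
    """Return the n most frequent token n-grams (assumed sub-pattern form)."""
    counts = Counter()
    for rec in records:
        tokens = rec.split()
        for size in range(1, max_len + 1):
            for start in range(len(tokens) - size + 1):
                counts[" ".join(tokens[start:start + size])] += 1
    return [pattern for pattern, _ in counts.most_common(n)]


def char_values(text):
    """Give each character a value that decreases as its frequency increases."""
    counts = Counter(text)
    total = sum(counts.values())
    # Assumed weighting: rarer characters get larger values; always >= 1.
    return {ch: 1.0 + math.log(total / c) for ch, c in counts.items()}


def longest_common_substring(a, b):
    """Classic dynamic-programming longest common substring."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def weight(s, values, default):
    """Sum of character values; unseen characters get the default value."""
    return sum(values.get(ch, default) for ch in s)


def distance(a, b, values):
    """Assumed form of D: 1 - weight(LCS) / max(weight(a), weight(b))."""
    default = max(values.values(), default=1.0)
    denom = max(weight(a, values, default), weight(b, values, default))
    if denom == 0:
        return 0.0  # both strings are empty
    shared = weight(longest_common_substring(a, b), values, default)
    return 1.0 - shared / denom


def cluster(patterns, k, values):
    """Single-link agglomerative merge until k groups remain (assumed method)."""
    groups = [[p] for p in patterns]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(distance(a, b, values)
                        for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i].extend(groups.pop(j))
    return groups
```

In this sketch a pair such as "Main St" and "Main Rd" shares the substring "Main ", so its distance is small and the two sub-patterns tend to land in the same group. Because frequent characters such as spaces carry low values, they contribute little to the similarity, which matches the inverse frequency-to-value relation stated in the claim.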
