
Detection of patterns in data records

  • US 7,814,111 B2
  • Filed: 01/03/2007
  • Issued: 10/12/2010
  • Est. Priority Date: 01/03/2006
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for processing data comprising character strings, the method comprising:

  • receiving an input from a user comprising positive examples and negative examples of a specified data type, the positive examples comprising first character strings that belong to the specified data type, and the negative examples comprising second character strings that do not belong to the specified data type, wherein receiving the input comprises displaying a collection of the character strings, and accepting from the user an indication of which of the character strings to include in the specified data type and which of the character strings to exclude from the specified data type;

    processing the first and second character strings to create a set of attributes that characterize the positive examples, wherein the processing comprises:

    assigning a different character code from character codes to each character type of characters forming the first and second character strings; and

    encoding the first and second character strings as sequences of the character codes without distinguishing between sequential characters of the same type, wherein one character code is assigned to letters and another character code is assigned to digits;

    building a decision tree, based on the attributes, which when applied to the first and second strings, distinguishes the positive examples from the negative examples; and

    applying the decision tree to the data so as to classify a character string of the character strings as belonging to the specified data type.
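
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The code alphabet (`A` for letters, `9` for digits, `_` for whitespace, punctuation kept as itself), the attribute set produced by the hypothetical `attributes` helper, and the Gini-based split rule in `build_tree` are all assumptions made for illustration; the claim itself only requires that letters and digits receive different codes, that runs of same-type characters are not distinguished, and that a decision tree built on the attributes separates the positive examples from the negative ones.

```python
from collections import Counter

def char_code(ch):
    # Assumed code alphabet: the claim only requires that letters and
    # digits get different codes; the remaining choices are illustrative.
    if ch.isalpha():
        return "A"
    if ch.isdigit():
        return "9"
    if ch.isspace():
        return "_"
    return ch  # punctuation is kept as its own code (assumption)

def encode(s):
    """Encode s as a sequence of character codes, collapsing runs of one type."""
    codes = []
    for ch in s:
        c = char_code(ch)
        if not codes or codes[-1] != c:
            codes.append(c)
    return "".join(codes)

def attributes(s):
    """Hypothetical attribute set: the codes present plus the whole pattern."""
    pattern = encode(s)
    return {f"has:{c}" for c in pattern} | {f"pattern:{pattern}"}

def gini(labels):
    # Gini impurity of a list of boolean labels.
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(examples):
    """examples: list of (attribute_set, label). Returns a bool leaf or
    a tuple (attribute, subtree_if_present, subtree_if_absent)."""
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    best, best_score = None, None
    for attr in sorted(set().union(*(a for a, _ in examples))):
        yes = [label for a, label in examples if attr in a]
        no = [label for a, label in examples if attr not in a]
        if not yes or not no:
            continue  # attribute does not split the examples
        score = gini(yes) * len(yes) + gini(no) * len(no)
        if best_score is None or score < best_score:
            best, best_score = attr, score
    if best is None:
        return majority  # examples are indistinguishable under these attributes
    return (best,
            build_tree([(a, l) for a, l in examples if best in a]),
            build_tree([(a, l) for a, l in examples if best not in a]))

def classify(tree, s):
    """Apply the tree to a new string; True means it belongs to the data type."""
    attrs = attributes(s)
    while isinstance(tree, tuple):
        attr, present, absent = tree
        tree = present if attr in attrs else absent
    return tree

# User-supplied examples of a hypothetical "phone number" data type.
positives = ["555-1234", "212-9876"]
negatives = ["john@example.com", "Main Street 12"]
examples = ([(attributes(s), True) for s in positives]
            + [(attributes(s), False) for s in negatives])
tree = build_tree(examples)
```

With these four examples, both positives encode to the same code sequence while the negatives encode to different ones, so a single split is enough to separate them. Calling `classify(tree, "867-5309")` then applies the tree to a string that was not among the examples.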
