System and method for normalization of a string of words

  • US 7,822,598 B2
  • Filed: 02/28/2005
  • Issued: 10/26/2010
  • Est. Priority Date: 02/27/2004
  • Status: Active Grant
First Claim

1. In a computer system, a method for use in a predetermined categorization scheme, comprising:

  • normalizing a string of words utilizing a computer configured to perform the steps of;

    receiving an input string of text;

    tagging the string of text by annotating a string of words with labels marking the start and end of relevant portions of text;

    comparing said tagged strings of text to a literal index, the literal index including a plurality of predetermined text sequences;

    determining if the string of text matches at least one of the plurality of predetermined text sequences within the literal index;

    if the string of words does not match at least one of the plurality of predetermined text sequences;

    determining a baseform transform of the input string, said baseform transform derived by removing of noise words and stemming the remaining words using de-derivation and uninflection, said baseform transform including at least one baseform associated with the input string;

    preparing a sorted version of the baseform transform;

    comparing the at least one baseform to a baseform index, the baseform index including a plurality of predetermined baseform sequences;

    determining a score for each of the plurality of predetermined baseform sequences that substantially match the at least one baseform and outputting feedback for any baseforms that exceed a predetermined threshold score;

    if no baseforms exceed the predetermined threshold score;

    computing a feature transformation of the input string, the feature transform including at least one feature associated with the input string;

    comparing the at least one feature to a feature index, the feature index including a plurality of predetermined feature sequences;

    determining a score for each of the plurality of predetermined feature sequences that substantially match the at least one feature; and

    outputting a hit list of candidate sequence matches based on the input string, and if no feature sequences are found based on the input string, outputting an indication that no predetermined text sequences were found within the predetermined categorization scheme wherein the method is performed by a computer executing stored instructions.
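
The cascade recited in claim 1 (literal index, then baseform index, then feature index) can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not specify the noise-word list, the suffix-stripping rule, character trigrams as features, Jaccard overlap as the score, or the 0.6 threshold, so all of these are assumptions chosen for illustration, as are the names `Normalizer`, `baseforms`, and `features`. The tagging step is omitted.

```python
import re

# Assumed noise-word list; the claim does not enumerate one.
NOISE_WORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "with"}
# Crude stand-in for the de-derivation / uninflection stemming named in the claim.
SUFFIXES = ("ing", "es", "s")


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def baseforms(text):
    """Baseform transform: drop noise words, stem the rest, return them sorted."""
    return sorted(stem(w) for w in tokens(text) if w not in NOISE_WORDS)


def features(text):
    """Feature transform (assumption): the set of character trigrams."""
    s = " ".join(tokens(text))
    return {s[i : i + 3] for i in range(len(s) - 2)}


def overlap(a, b):
    """Jaccard overlap of two collections (assumed scoring function)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


class Normalizer:
    def __init__(self, entries, threshold=0.6):
        self.threshold = threshold  # assumed value for the baseform tier
        self.literal_index = {" ".join(e.lower().split()): e for e in entries}
        self.baseform_index = {e: baseforms(e) for e in entries}
        self.feature_index = {e: features(e) for e in entries}

    def _ranked(self, query, index, minimum):
        """Score every index entry and keep those scoring above `minimum`."""
        scored = ((entry, overlap(query, value)) for entry, value in index.items())
        return sorted((p for p in scored if p[1] > minimum), key=lambda p: -p[1])

    def normalize(self, text):
        """Return (tier, hit_list); tier is 'none' when nothing matches."""
        # Tier 1: exact match against the literal index.
        key = " ".join(text.lower().split())
        if key in self.literal_index:
            return ("literal", [(self.literal_index[key], 1.0)])
        # Tier 2: baseform matches that exceed the threshold score.
        hits = self._ranked(baseforms(text), self.baseform_index, self.threshold)
        if hits:
            return ("baseform", hits)
        # Tier 3: any feature overlap yields a candidate hit list.
        hits = self._ranked(features(text), self.feature_index, 0.0)
        if hits:
            return ("feature", hits)
        # No tier matched: indicate that nothing was found in the scheme.
        return ("none", [])


# Hypothetical categorization scheme of three job titles.
normalizer = Normalizer(["Software Engineer", "Registered Nurse", "Truck Driver"])
print(normalizer.normalize("software engineer"))   # resolved by the literal tier
print(normalizer.normalize("Drivers of trucks"))   # resolved by the baseform tier
print(normalizer.normalize("sofware enginer"))     # resolved by the feature tier
print(normalizer.normalize("zzz qqq"))             # no match in any tier
```

Each tier runs only when the previous one yields nothing, which mirrors the conditional structure of the claim: the cheap exact lookup comes first, and the looser feature comparison is a last resort before reporting that no predetermined text sequence was found.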
