System and method for normalization of a string of words
Abstract
The present invention relates generally to a system and method for categorization of strings of words. More specifically, the present invention relates to a system and method for normalizing a string of words for use in a system for categorization of words in a predetermined categorization scheme. A method for adaptive categorization of words in a predetermined categorization scheme may include receiving a string of text, tagging the string of text, and normalizing the string of text. Normalization may be performed with a three-stage algorithm including a literal match processing stage, an approximation match processing stage, and a nearest neighbor match processing stage. The normalized string of text can be compared to a number of sequences of text in the predetermined categorization scheme.
16 Claims
1. A method for adaptive categorization of strings of text in a predetermined categorization scheme, comprising the steps of:
receiving a string of text;
tagging the string of text;
normalizing the string of text to create a normalized string of text;
comparing the normalized string of text to a plurality of sequences of text within the predetermined categorization scheme;
if the normalized string of text substantially matches at least one of the plurality of the sequences of text in the predetermined categorization scheme, output data associated with the normalized string of text based on the comparison of the normalized string of text with the sequences of text within the predetermined categorization scheme; and
if the normalized string of text does not substantially match at least one of the plurality of the sequences of text within the predetermined categorization scheme, classify the string of text within the predetermined categorization scheme.
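The control flow of claim 1 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the `SCHEME` table, its categories, and the helpers `tag`, `normalize`, and `classify` are hypothetical placeholders, and an exact dictionary lookup stands in for the "substantially matches" test.

```python
from dataclasses import dataclass

# Hypothetical categorization scheme: normalized sequence -> associated data.
SCHEME = {
    "running shoes": "FOOTWEAR",
    "coffee maker": "KITCHEN",
}


@dataclass
class Result:
    normalized: str
    category: str
    matched: bool  # True if found in the scheme, False if sent to classify()


def tag(text: str) -> list[str]:
    """Placeholder tagger: lowercase the text and split it into word tokens."""
    return text.lower().split()


def normalize(tokens: list[str]) -> str:
    """Placeholder normalizer: strip edge punctuation and rejoin the tokens."""
    cleaned = [t.strip(".,;:!?") for t in tokens]
    return " ".join(t for t in cleaned if t)


def classify(normalized: str) -> str:
    """Assumed fallback for unmatched strings: assign a holding category."""
    return "UNCATEGORIZED"


def categorize(text: str) -> Result:
    """Receive, tag, normalize, compare, then output data or classify."""
    normalized = normalize(tag(text))
    if normalized in SCHEME:  # exact match used in place of "substantially matches"
        return Result(normalized, SCHEME[normalized], True)
    return Result(normalized, classify(normalized), False)
```

Calling `categorize("Running Shoes.")` takes the matching branch, while a string absent from `SCHEME` falls through to `classify`.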
2. A method for normalizing a string of words for use in a predetermined categorization scheme, comprising the steps of:
receiving an input string of text;
comparing the input string to a literal index, the literal index including a plurality of predetermined text sequences;
determining if the string of text matches at least one of the plurality of predetermined text sequences within the literal index;
if the string of words does not match at least one of the plurality of predetermined text sequences;
determining a baseform transform of the input string, the baseform transform including at least one baseform associated with the input string;
preparing a sorted version of the baseform transform;
comparing the at least one baseform to a baseform index, the baseform index including a plurality of predetermined baseform sequences;
determining a score for each of the plurality of predetermined baseform sequences that substantially match the at least one baseform and outputting feedback for any baseforms that exceed a predetermined threshold score;
if no baseforms exceed the predetermined threshold score;
computing a feature transformation of the input string, the feature transform including at least one feature associated with the input string;
comparing the at least one feature to a feature index, the feature index including a plurality of predetermined feature sequences;
determining a score for each of the plurality of predetermined feature sequences that substantially match the at least one feature; and
outputting a hit list of candidate sequence matches based on the input string, and if no feature sequences are found based on the input string, outputting an indication that no predetermined text sequences were found within the predetermined categorization scheme.
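The three-stage cascade recited in claim 2 (literal match, then baseform match, then feature match) might look roughly like the sketch below. Several choices here are assumptions made for illustration and are not taken from the claims: the sample index contents, the plural-stripping `baseform` function (a stand-in for a real stemmer or lemmatizer), character trigrams as the "features", Jaccard overlap as the score, and the `THRESHOLD` value.

```python
LITERAL_INDEX = {"running shoes", "leather wallet", "coffee maker"}
THRESHOLD = 0.6  # hypothetical threshold score for the baseform stage


def baseform(word):
    """Crude stand-in for a stemmer: strip a plural 's' from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def baseform_transform(text):
    """Sorted tuple of baseforms, so word order does not affect matching."""
    return tuple(sorted(baseform(w) for w in text.lower().split()))


def trigrams(text):
    """Character trigrams of the space-padded text, used as features."""
    padded = f" {text} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a, b):
    """Overlap score between two sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Both secondary indexes are derived from the literal index in this sketch.
BASEFORM_INDEX = {baseform_transform(s): s for s in LITERAL_INDEX}
FEATURE_INDEX = {s: trigrams(s) for s in LITERAL_INDEX}


def ranked(hits):
    """Order a hit list by descending score."""
    return sorted(hits, key=lambda h: h[1], reverse=True)


def normalize(text):
    """Return (stage, hit_list); stage is 'none' when nothing is found."""
    s = " ".join(text.lower().split())

    # Stage 1: literal match against the literal index.
    if s in LITERAL_INDEX:
        return "literal", [(s, 1.0)]

    # Stage 2: baseform match; keep only scores that exceed the threshold.
    forms = set(baseform_transform(s))
    hits = [(seq, jaccard(forms, set(key))) for key, seq in BASEFORM_INDEX.items()]
    hits = [h for h in hits if h[1] > THRESHOLD]
    if hits:
        return "baseform", ranked(hits)

    # Stage 3: feature match; any nonzero overlap becomes a candidate.
    feats = trigrams(s)
    hits = [(seq, jaccard(feats, f)) for seq, f in FEATURE_INDEX.items()]
    hits = [h for h in hits if h[1] > 0]
    if hits:
        return "feature", ranked(hits)

    # No stage produced a candidate sequence.
    return "none", []
```

In this sketch a reordered input such as "shoe running" is resolved at the baseform stage because the baseforms are sorted before comparison, while a misspelling such as "cofee makr" shares no baseforms with any indexed sequence and is only caught by the trigram features.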
3. A system normalizing a string of words for use in a predetermined categorization scheme, the system comprising:
a computer having a computer code mechanism programmed to receive a string of text, tag the string of text, create a normalized string of text and compare the normalized string of text to a plurality of sequences of text within the predetermined categorization scheme.
Dependent claims: 4, 5, 6
7. An apparatus for normalizing a string of words for use in a predetermined categorization scheme, comprising:
a computer having a computer code mechanism programmed to receive an input string of text, compare the input string to a literal index, where the literal index includes a plurality of predetermined text sequences, and determine if the string of text matches at least one of the plurality of predetermined text sequences within the literal index.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16
Specification