System for spelling correction in which the context of a target word in a sentence is utilized to determine which of several possible words was intended
Abstract
A system is provided for spelling correction in which the context of a word in a sentence is utilized to determine which of several alternative or possible words was intended. The probability that a particular alternative was the intended word is determined through Bayesian analysis utilizing multiple kinds of features of the context of the target word, such as the presence of certain characteristic words within some distance of the target word, or the presence of certain characteristic patterns of words and part-of-speech tags around the target word. The system combines multiple types of features via Bayesian analysis by providing means for resolving egregious interdependencies among features. The system first recognizes the interdependencies and then resolves them by deleting all but the strongest feature involved in each interdependency, thereby allowing it to make its decisions based on the strongest non-conflicting set of features. In addition, the robustness of the system's decisions is enhanced by pruning, or deleting from consideration, certain features: first, features for which there is insufficient evidence in the training corpus to support reliable decision-making, and second, features that are uninformative at discriminating among the alternative spellings of the target word under consideration.
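The Bayesian combination of context features described above can be illustrated with a minimal sketch. It assumes that the features that survive conflict resolution are treated as conditionally independent given the intended word (the usual naive-Bayes simplification) and that probabilities are estimated from raw training-corpus counts with add-one smoothing. The smoothing scheme, the function name `posterior`, and the layout of the count tables are illustrative assumptions, not the patented implementation.

```python
import math
from typing import Dict, List


def posterior(
    prior_counts: Dict[str, int],
    feature_counts: Dict[str, Dict[str, int]],
    matched: List[str],
) -> Dict[str, float]:
    """Return P(word | matched features) for each word in the confusion set.

    prior_counts[w]: occurrences of word w in the training corpus (assumed > 0).
    feature_counts[f][w]: occurrences of w whose context matched feature f.
    matched: features that matched the target context and survived
             conflict resolution (hypothetical representation: feature names).
    """
    total = sum(prior_counts.values())
    log_scores: Dict[str, float] = {}
    for word, n_word in prior_counts.items():
        score = math.log(n_word / total)  # log P(w)
        for f in matched:
            # Add-one (Laplace) smoothing for a binary feature, an assumption
            # made here so that unseen (feature, word) pairs are not zero.
            p_f_given_w = (feature_counts[f].get(word, 0) + 1) / (n_word + 2)
            score += math.log(p_f_given_w)  # add log P(f | w)
        log_scores[word] = score
    # Normalize in log space to avoid underflow when many features match.
    best = max(log_scores.values())
    unnorm = {w: math.exp(s - best) for w, s in log_scores.items()}
    z = sum(unnorm.values())
    return {w: v / z for w, v in unnorm.items()}
```

With hypothetical counts of 60 for "desert" and 40 for "dessert", and a context-word feature `ctx:arid` seen 20 times near "desert" and never near "dessert", `posterior(...)` gives "desert" a probability of about 0.955.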
9 Claims
1. A system for spelling correction in which the context of a target word in a sentence is utilized to determine which of several possible words was intended, comprising:
a training corpus containing a set of sentences;
a dictionary of part-of-speech tags of words in said training corpus;
a confusion set including a list of possible words that could have been intended for said target word;
an ordered list of features usable to discriminate among words in said confusion set to correct instances in which one word in said confusion set has been incorrectly substituted for another; and
means responsive to said training corpus, said dictionary, said confusion set, said ordered list of features, and said target word for determining the intended spelling of said target word from context, said means for determining the intended spelling of said target word including means for assigning a probability to each word in said confusion set, means for obtaining a feature from said ordered list of features, means for ascertaining if said obtained feature matches the context of said target word in said sentence, thereby to provide a list of matched features, and means for determining if a feature from said ordered list conflicts with a previously obtained feature, said conflict-determining means including means for establishing if there is an egregious interdependency between said obtained features.
Dependent claims: 2, 3, 4, 5.
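The matching and conflict-checking steps of claim 1 (walk the ordered feature list, keep features that match the target's context, and reject any feature that conflicts with one already accepted) might be sketched as follows. The claim does not define an "egregious interdependency" in operational terms; this sketch assumes, as a stand-in, that two features conflict when they rely on any of the same token positions, as with the context word "arid" and the collocation "the arid _". The names `Feature`, `select_features`, `context_word`, and `left_collocation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Sequence, Set

# A matcher returns the token positions it relied on, or None if no match.
Matcher = Callable[[Sequence[str], int], Optional[FrozenSet[int]]]


@dataclass(frozen=True)
class Feature:
    name: str
    match: Matcher


def context_word(word: str, k: int) -> Feature:
    """Hypothetical feature: `word` occurs within k tokens of the target."""
    def m(tokens: Sequence[str], i: int) -> Optional[FrozenSet[int]]:
        lo, hi = max(0, i - k), min(len(tokens), i + k + 1)
        hits = frozenset(j for j in range(lo, hi) if j != i and tokens[j] == word)
        return hits or None  # an empty set means the feature did not match
    return Feature(f"ctx:{word}", m)


def left_collocation(words: Sequence[str]) -> Feature:
    """Hypothetical feature: `words` appear immediately left of the target."""
    n = len(words)
    def m(tokens: Sequence[str], i: int) -> Optional[FrozenSet[int]]:
        if i >= n and list(tokens[i - n:i]) == list(words):
            return frozenset(range(i - n, i))
        return None
    return Feature("coll:" + " ".join(words) + " _", m)


def select_features(ordered: List[Feature], tokens: Sequence[str], i: int) -> List[str]:
    """Greedy pass over features ordered strongest first.

    A matching feature is accepted only if it shares no token position with
    an already accepted (stronger) feature; this overlap test is the assumed
    proxy for an egregious interdependency.
    """
    accepted: List[str] = []
    used: Set[int] = set()
    for feat in ordered:
        positions = feat.match(tokens, i)
        if positions is None:
            continue  # feature does not match this context
        if positions & used:
            continue  # conflicts with a stronger accepted feature
        accepted.append(feat.name)
        used |= positions
    return accepted
```

For the sentence "we crossed the arid desert yesterday" with the target at index 4, the collocation "the arid _" is accepted first, the weaker context word "arid" is then rejected as conflicting, and the context word "crossed" is kept.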
6. A system for spelling correction in which the context of a target word in a sentence is utilized to determine which of several possible words was intended, comprising:
a training corpus containing a set of sentences;
a dictionary of part-of-speech tags of words in said training corpus;
a confusion set including a list of possible words that could have been intended for said target word;
an ordered list of features usable to discriminate among words in said confusion set to correct instances in which one word in said confusion set has been incorrectly substituted for another;
means responsive to said training corpus, said dictionary, said confusion set, said ordered list of features, and said target word for determining the intended spelling of said target word from context; and
means for providing said ordered list including means for providing a pruned list of features, said means for providing a pruned list of features including means responsive to said training corpus, said confusion set, and said dictionary for proposing all possible features as candidate features; means for providing a count of the occurrences of each candidate feature in said training corpus; means responsive to said count for enumerating features having a count below a predetermined threshold; and
means for eliminating features that are not informative at discriminating among the words in said confusion set.
Dependent claims: 7, 8, 9.
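The two pruning steps of claim 6 (drop candidate features whose training-corpus count falls below a threshold, then drop features that do not discriminate among the confusion-set words) can be sketched as below. The claim does not say how informativeness is measured; this sketch assumes a chi-square test of independence between "feature present / absent" and the intended word, keeping a feature only if the statistic reaches a chosen threshold. The function name `prune_features`, its parameters, and the count-table layout are hypothetical.

```python
from typing import Dict, List


def prune_features(
    feature_counts: Dict[str, Dict[str, int]],
    word_counts: Dict[str, int],
    min_count: int,
    chi2_threshold: float,
) -> List[str]:
    """Return the candidate features that survive both pruning steps.

    feature_counts[f][w]: occurrences of word w whose context matched f.
    word_counts[w]: total occurrences of w in the training corpus.
    """
    total = sum(word_counts.values())
    kept: List[str] = []
    for f, by_word in feature_counts.items():
        present = sum(by_word.get(w, 0) for w in word_counts)
        if present < min_count:
            continue  # too little evidence in the training corpus
        absent = total - present
        # Chi-square statistic over the 2 x k table (present/absent x word).
        chi2 = 0.0
        for w, n_w in word_counts.items():
            obs_present = by_word.get(w, 0)
            obs_absent = n_w - obs_present
            exp_present = present * n_w / total
            exp_absent = absent * n_w / total
            if exp_present > 0:
                chi2 += (obs_present - exp_present) ** 2 / exp_present
            if exp_absent > 0:
                chi2 += (obs_absent - exp_absent) ** 2 / exp_absent
        if chi2 >= chi2_threshold:
            kept.append(f)  # feature's word distribution departs from the prior
    return kept
```

For a two-word confusion set the table has one degree of freedom, so a threshold of 3.841 corresponds to the 5% significance level. A feature whose occurrences split between the words in the same proportion as the words themselves yields a statistic of zero and is eliminated as uninformative.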
Specification