Algorithm for context sensitive spelling correction
Abstract
A method for determining whether a target word used in a text is a correct word. The method is carried out in a data processor that is configured according to a given operating system for reading data from and writing data to a storage medium and for presenting data on a display screen. The method includes the steps of:
(a) identifying one or more features residing in the vicinity of the target word in a text, the features being associated with said target word in a database stored on said storage medium;
(b) using the features identified in step (a) to retrieve information from the database, the information being indicative of the likelihood of said target word being in context with the features identified in step (a); and
(c) using the information retrieved in step (b) as a criterion for indicating whether the target word is likely to be the correct word or should be replaced within said text.
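Steps (a) to (c) can be illustrated with a minimal sketch. It is not the patented implementation: the in-memory `WEIGHTS` table, the two-word context window, the summed-weight score, and the names `context_features`, `score`, and `check` are all assumptions made for illustration, and the weight values are invented.

```python
from typing import Dict, List

# Hypothetical stand-in for the database: for each member of a set of
# confusable words, a weight per context feature (values are invented).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "weather": {"rain": 2.0, "forecast": 1.5, "or": 0.2},
    "whether": {"or": 2.0, "not": 1.5, "rain": 0.1},
}


def context_features(tokens: List[str], index: int, window: int = 2) -> List[str]:
    """Step (a): collect the words within `window` positions of the target."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return [tokens[i].lower() for i in range(lo, hi) if i != index]


def score(word: str, features: List[str]) -> float:
    """Step (b): retrieve and sum the stored weights of the nearby features."""
    table = WEIGHTS.get(word, {})
    return sum(table.get(f, 0.0) for f in features)


def check(tokens: List[str], index: int, candidates: List[str]) -> str:
    """Step (c): keep the target unless another candidate scores strictly higher."""
    features = context_features(tokens, index)
    scores = {w: score(w, features) for w in candidates}
    target = tokens[index].lower()
    best = max(scores, key=scores.get)
    return best if scores[best] > scores.get(target, 0.0) else target
```

For example, `check("I wonder weather or not it works".split(), 2, ["weather", "whether"])` suggests replacing the target, because the nearby words "or" and "not" carry more weight for the other candidate in this invented table.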
24 Claims
1. A database which stores information for allowing a computer program to evaluate the usage of a target word within the context of surrounding words in a target text, the database comprising:
a computer-readable medium;
a database structure stored on the computer-readable medium, the database structure including:
a plurality of target words, each one of said target words including at least two members so as to constitute a target word cloud that represents said target word; and
a plurality of features and associated weight values stored with respect to each of said at least two members, the plurality of features reside in the vicinity of said one target word within the target text and are essentially common to all of said members;
wherein the plurality of weight values indicate a contextual relationship at least between the target word and the plurality of features.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 23, 24
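The database structure recited in claim 1 can be pictured with a small in-memory sketch. The class names `WordCloud` and `Database`, the dictionary layout, and the default weight of 1.0 are hypothetical choices made for illustration; the claim itself does not prescribe any of them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class WordCloud:
    """A target word represented by at least two members sharing one feature set."""
    members: Tuple[str, ...]
    features: List[str]
    # weights[member][feature] -> weight value stored for that member
    weights: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.members) < 2:
            raise ValueError("a word cloud needs at least two members")
        # Every member gets an entry for every shared feature.
        # The initial value 1.0 is an assumption, not taken from the claim.
        for member in self.members:
            row = self.weights.setdefault(member, {})
            for feature in self.features:
                row.setdefault(feature, 1.0)

    def weight(self, member: str, feature: str) -> float:
        return self.weights[member][feature]


@dataclass
class Database:
    """Maps each target word to the cloud that represents it."""
    clouds: Dict[str, WordCloud] = field(default_factory=dict)

    def add(self, cloud: WordCloud) -> None:
        for member in cloud.members:
            self.clouds[member] = cloud

    def lookup(self, word: str) -> Optional[WordCloud]:
        return self.clouds.get(word)
```

In this layout the features are held once per cloud and the weights once per member, which mirrors the claim language that the features are "essentially common to all of said members" while the weight values are stored "with respect to each" member.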
13. A method for determining whether a target word used in a text is a correct word, the method being carried out in a data processor and utilizing a database structure, the data processor being configured according to a given operating system for reading data from, writing data to a storage medium and presenting data on a display screen, the data structure including:
(i) a plurality of target words;
(ii) each one of said target words has at least two members so as to constitute a target word cloud that represents said target word, a plurality of features and associated weight values stored with respect to each of said at least two members, the plurality of features reside in the vicinity of said one target word within the target text, and are essentially common to all of said members, wherein the plurality of weight values indicated a contextual relationship at least between the target word and the plurality of features;
the method comprising:
(a) identifying at least one feature residing in the vicinity of said target word in a text, said target word being in the database stored on said storage medium;
(b) using the at least one feature identified in step (a) to retrieve information from the database by application of the at least two members in the target word cloud, the information being indicative as to the likelihood of said target word being in context with the at least one feature; and
(c) evaluating the information retrieved in step (b) to determine whether the target word is likely to be either the correct word or if it should be replaced within said text.
Dependent claims: 14, 15, 16, 18
19. A training method for determining whether a target word used in a training corpus text is a correct word, the method being carried out in a data processor and utilizing a database structure on a storage medium, the data processor being configured according to a given operating system for reading data from, writing data to the storage medium and presenting data on a display screen, the data structure including:
(i) a plurality of target words;
(ii) each one of said target words has at least two members so as to constitute a target word cloud that represents said target word, a plurality of features and associated weight values stored with respect to each of said at least two members, the plurality of features reside in the vicinity of said one target word within the target text, and are essentially common to all of said members, wherein the plurality of weight values indicated a contextual relationship at least between the target word and the plurality of features;
the method comprising:
(a) identifying at least one feature residing in the vicinity of said target word in a text, said target word being in the data structure stored on said storage medium;
(b) using the at least one feature identified in step (a) to acquire information from the database by application of the at least two members in the target word cloud, the information being indicative as to the likelihood of said target word being in context with the at least one feature; and
(c) using the information retrieved in step (b) as a criterion for predicting whether the target word is likely to be either the correct word or an incorrect word that should be replaced within said text and in the latter case, altering the information in said data structure.
Dependent claims: 20, 21, 22
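The training loop of claim 19 (predict, compare, and alter the stored information on a misprediction) might look like the sketch below. The claim text shown here does not say how the information is altered; the multiplicative `promote` and `demote` factors, the default weight of 1.0, the summed-weight score, and the assumption that the word appearing in the training corpus is the correct one are all illustrative choices, not taken from the claim.

```python
from typing import Dict, List

Weights = Dict[str, Dict[str, float]]  # weights[member][feature]


def predict(weights: Weights, members: List[str], features: List[str]) -> str:
    """Return the member whose stored weights best match the context features."""
    def score(member: str) -> float:
        # Features with no stored weight count as 1.0 (an assumed default).
        return sum(weights[member].get(f, 1.0) for f in features)
    return max(members, key=score)


def train_step(weights: Weights, members: List[str], features: List[str],
               correct: str, promote: float = 2.0, demote: float = 0.5) -> bool:
    """Alter the stored weights only when the prediction disagrees with the corpus.

    Returns True if the weights were changed.
    """
    guess = predict(weights, members, features)
    if guess == correct:
        return False
    for f in features:
        # Strengthen the features for the word the corpus actually used...
        weights[correct][f] = weights[correct].get(f, 1.0) * promote
        # ...and weaken them for the wrongly predicted member.
        weights[guess][f] = weights[guess].get(f, 1.0) * demote
    return True
```

Repeating `train_step` over every occurrence of a target word in the corpus leaves the weights untouched whenever the current prediction already agrees with the text, so only mistakes change the stored information.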
Specification