Methods and system for fast, adaptive correction of misspells
Abstract
Embodiments are directed to a spellcheck module for an enterprise search engine. The spellcheck module includes a candidate suggestion generation module that generates a number of candidate words that may be the correction of a misspelled word. The candidate suggestion generation module implements an algorithm for indexing, searching, and storing terms from an index with a constrained edit distance, using words in a collection of documents. The spellcheck module further includes a candidate suggestion ranking module. In one embodiment, a non-contextual approach using a linear combination of distance and probability scores is utilized, while in another embodiment, a context-sensitive approach that accounts for real-word misspells and adopts deep learning models is utilized. In use, a query is provided to the spellcheck module, which generates results in the form of a ranked list of candidate entries, each of which may be the entry the user accidentally misspelled.
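The non-contextual embodiment mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the weights or the exact form of the two scores, so the function name `rank_non_contextual`, the weight `alpha`, and the normalizations below are assumptions chosen only for illustration.

```python
def rank_non_contextual(candidates, total_count, alpha=0.5, max_distance=2):
    """Rank candidates by a linear combination of two scores.

    candidates: list of (word, edit_distance, corpus_count) tuples.
    alpha and both normalizations are illustrative assumptions,
    not values taken from the patent.
    """
    scored = []
    for word, distance, count in candidates:
        # Smaller edit distance gives a higher distance score.
        distance_score = 1.0 - distance / (max_distance + 1)
        # Relative corpus frequency serves as the probability score.
        probability_score = count / total_count
        combined = alpha * distance_score + (1.0 - alpha) * probability_score
        scored.append((combined, word))
    # Highest combined score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored]


ranked = rank_non_contextual(
    [("starch", 2, 50), ("search", 1, 900), ("sea", 2, 10)],
    total_count=1000,
)
```

Here `search` ranks first because it is both closer to the query and more frequent in the hypothetical corpus counts.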
14 Claims
1. A computer-implemented method for adaptive correction of misspelling, the method comprising:
pre-training, by a processor, a pre-trained word vector;
receiving, at the processor from a user device connected to the processor, a text for spelling analysis;
creating, by the processor, a table of entries for each correctly spelled word in a corpus, wherein each table of entries includes alternative words of a particular correctly spelled word, each alternative word having one or more characters less than the particular correctly spelled word, the number of occurrences of each alternative word in the corpus, and links of the alternative words to the particular correctly spelled word;
comparing, by the processor, a particular misspelled word in the text to the table of entries having an edit distance from the particular misspelled word and a minimum frequency of occurrence in the corpus to form a candidate set of entries;
mapping, by the processor, each word in the text to the pre-trained word vector;
obtaining, by the processor, a first vector representing a left context of the particular misspelled word and a second vector representing a right context of the particular misspelled word using a recurrent neural network (RNN);
inputting, by the processor, the first vector and the second vector to a fully connected layer through the RNN, and inputting, by the processor, a third vector representing the particular misspelled word directly to the fully connected layer;
replacing, by the processor, the particular misspelled word with each candidate in the candidate set of entries;
outputting, by the processor, a context sensitive score from a logistic unit for each candidate, wherein the logistic unit is connected to the fully connected layer;
ranking, by the processor, the candidate set of entries utilizing the context sensitive score so that each candidate has a ranking;
ordering, by the processor, at least some of the candidates based on the ranking to identify corrections to the particular misspelled word; and
displaying, to a user, the corrections to the particular misspelled word.
Dependent claims: 2, 3, 4, 5.
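The table-creation and comparison steps of claim 1 resemble a deletion-based lookup. The following is a minimal sketch under stated assumptions: every corpus word is treated as correctly spelled, and the helper names `deletions`, `build_table`, and `find_candidates`, as well as the `max_deletes` and `min_count` parameters, are hypothetical. The lookup is approximate; a fuller implementation would verify the true edit distance of each hit.

```python
from collections import Counter, defaultdict
from itertools import combinations


def deletions(word, max_deletes):
    """Strings formed by deleting 1..max_deletes characters from word."""
    variants = set()
    for k in range(1, min(max_deletes, len(word) - 1) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_table(corpus_words, max_deletes=1):
    """Return corpus counts and links from shortened variants to words."""
    counts = Counter(corpus_words)      # occurrences of every corpus token
    links = defaultdict(set)            # variant -> words it was derived from
    for word in counts:
        for variant in deletions(word, max_deletes):
            links[variant].add(word)
    return counts, links


def find_candidates(query, counts, links, max_deletes=1, min_count=1):
    """Collect corpus words reachable from the query via shared deletions."""
    found = set()
    for probe in {query} | deletions(query, max_deletes):
        if probe in counts:             # the probe is itself a corpus word
            found.add(probe)
        found |= links.get(probe, set())
    # Keep only words that meet the minimum corpus frequency.
    return sorted(w for w in found if counts[w] >= min_count)


corpus = "the search engine returns search results for the search query".split()
counts, links = build_table(corpus)
suggestions = find_candidates("serch", counts, links)
```

Storing only deletions keeps the table small, because insertions and substitutions in the query are matched by deleting characters on both sides rather than by enumerating every possible edit.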
6. A system for adaptive correction of misspelling, the system comprising:
a processor coupled to one or more user devices to receive user-generated search queries from the one or more user devices, the processor configured to:
pre-train a pre-trained word vector;
receive, from a first user device of the one or more user devices, a text for spelling analysis;
create a table of entries for each correctly spelled word in a corpus, wherein each table of entries includes alternative words of a particular correctly spelled word, each alternative word having one or more characters less than the particular correctly spelled word, the number of occurrences of each alternative word in the corpus, and links of the alternative words to the particular correctly spelled word;
compare a particular misspelled word in the text to the table of entries having an edit distance from the particular misspelled word and a minimum frequency of occurrence in the corpus to form a candidate set of entries;
map each word in the text to the pre-trained word vector;
obtain a first vector representing a left context and a second vector representing a right context using a recurrent neural network (RNN);
output the first vector and the second vector from the RNN to a fully connected layer, and output a third vector representing the particular misspelled word in the text directly to the fully connected layer;
replace the particular misspelled word with each candidate in the candidate set of entries;
output a context sensitive score from a logistic unit for each candidate, wherein the logistic unit is connected to the fully connected layer;
rank the candidate set of entries utilizing the context sensitive score so that each candidate has a ranking;
order at least some of the candidates based on the ranking to identify corrections to the particular misspelled word; and
display, to a user, the corrections to the particular misspelled word.
Dependent claims: 7, 8, 9, 10.
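The data flow recited in the claim (RNN-encoded left and right contexts, a word vector fed directly to a fully connected layer, and a logistic output unit) can be sketched in plain Python. This is only an illustration of the wiring: the weights are random and untrained, the class name `ContextScorer`, the layer sizes, the tanh activations, the reversed reading of the right context, and the placement of the candidate's vector in the word slot are all assumptions rather than details from the claims.

```python
import math
import random


def _matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def _matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ContextScorer:
    """Untrained toy model: two RNN context vectors plus a word vector."""

    def __init__(self, embeddings, dim, hidden=4, seed=0):
        rng = random.Random(seed)
        self.embeddings, self.dim, self.hidden = embeddings, dim, hidden
        self.w_in = _matrix(hidden, dim, rng)        # input-to-hidden weights
        self.w_rec = _matrix(hidden, hidden, rng)    # hidden-to-hidden weights
        self.w_fc = _matrix(hidden, 2 * hidden + dim, rng)  # fully connected
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def _embed(self, word):
        # Unknown words map to a zero vector (an assumption of this sketch).
        return self.embeddings.get(word, [0.0] * self.dim)

    def _encode(self, words):
        # Simple recurrent update; the final hidden state is the context vector.
        state = [0.0] * self.hidden
        for word in words:
            drive = _matvec(self.w_in, self._embed(word))
            memory = _matvec(self.w_rec, state)
            state = [math.tanh(d + m) for d, m in zip(drive, memory)]
        return state

    def score(self, left, candidate, right):
        left_vec = self._encode(left)                    # first vector
        right_vec = self._encode(list(reversed(right)))  # second vector
        # The word vector bypasses the RNN and joins at the dense layer.
        features = left_vec + right_vec + self._embed(candidate)
        fc = [math.tanh(z) for z in _matvec(self.w_fc, features)]
        logit = sum(w * x for w, x in zip(self.w_out, fc))
        return 1.0 / (1.0 + math.exp(-logit))            # logistic unit


embeddings = {
    "enterprise": [0.2, 0.1, 0.0],
    "search": [0.3, -0.1, 0.4],
    "engine": [0.0, 0.5, -0.2],
}
scorer = ContextScorer(embeddings, dim=3)
score = scorer.score(["enterprise"], "search", ["engine"])
```

Because the weights are not trained, the resulting score carries no linguistic meaning; the sketch only shows how the three vectors meet at the fully connected layer before the logistic unit.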
11. A computer program product for adaptive correction of misspelling, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor coupled to one or more user devices to receive user-generated search queries from the one or more user devices to cause the processor to:
pre-train a pre-trained word vector;
receive, from a first user device of the one or more user devices, a text for spelling analysis;
create a table of entries for each correctly spelled word in a corpus, wherein each table of entries includes alternative words of a particular correctly spelled word, each alternative word having one or more characters less than the particular correctly spelled word, the number of occurrences of each alternative word in the corpus, and links of the alternative words to the particular correctly spelled word;
compare a particular misspelled word in the text to the table of entries having an edit distance from the particular misspelled word and a minimum frequency of occurrence in the corpus to form a candidate set of entries;
map each word in the text to the pre-trained word vector;
obtain a first vector representing a left context and a second vector representing a right context using a recurrent neural network (RNN);
output the first vector and the second vector from the RNN to a fully connected layer, and output a third vector representing the particular misspelled word in the text directly to the fully connected layer;
replace the particular misspelled word with each candidate in the candidate set of entries;
output a context sensitive score from a logistic unit for each candidate, wherein the logistic unit is connected to the fully connected layer;
rank the candidate set of entries utilizing the context sensitive score so that each candidate has a ranking;
order at least some of the candidates based on the ranking to identify corrections to the particular misspelled word; and
display, to a user, the corrections to the particular misspelled word.
Dependent claims: 12, 13, 14.
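The replace, score, rank, and order steps shared by the independent claims can be sketched as a short loop. The function name `rank_corrections`, the `top_k` cutoff, and the toy bigram-count scorer standing in for the context-sensitive model are all hypothetical; the claims do not specify how many candidates are kept or how ties are resolved.

```python
def rank_corrections(tokens, position, candidates, score_fn, top_k=3):
    """Substitute each candidate at `position`, score it, and order the results."""
    scored = []
    for candidate in candidates:
        # Replace the misspelled token with the candidate.
        replaced = tokens[:position] + [candidate] + tokens[position + 1:]
        scored.append((score_fn(replaced, position), candidate))
    # Highest score first; ties are broken alphabetically (an assumption).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored[:top_k]]


# Toy stand-in for the context-sensitive score: hypothetical bigram counts.
BIGRAMS = {("enterprise", "search"): 5, ("search", "engine"): 9}


def bigram_score(tokens, position):
    left = BIGRAMS.get((tokens[position - 1], tokens[position]), 0) if position > 0 else 0
    right = (
        BIGRAMS.get((tokens[position], tokens[position + 1]), 0)
        if position + 1 < len(tokens)
        else 0
    )
    return left + right


corrections = rank_corrections(
    ["enterprise", "serch", "engine"], 1, ["serf", "search", "perch"],
    bigram_score, top_k=2,
)
```

Passing the scorer in as `score_fn` keeps the ordering logic independent of whichever model produces the scores.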
Specification