Spelling and grammar checking system
First Claim
1. A method of correcting a misspelled word in input text, the method comprising steps of:
- storing one or more lexicon finite state machines (FSM), each of the lexicon FSMs representing plural reference words, wherein a representation of a reference word comprises one or more states and one or more arcs, each arc comprising a character in the reference word;
detecting a misspelled word in the input text, wherein the detecting comprises comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any words in the dictionary database;
generating an additional FSM comprising a plurality of states, each state including information identifying a state of a lexicon FSM and a position in the input word and a cost, wherein the cost is used to select states of the additional FSM that are to be expanded; and
determining a list of alternative words for the misspelled word, wherein the determining comprises selecting one or more words from the additional FSM.
2 Assignments
0 Petitions
Accused Products
Abstract
A system of correcting misspelled words in input text detects a misspelled word in the input text, determines a list of alternative words for the misspelled word, and ranks the list of alternative words based on a context of the input text. The system then selects one of the alternative words from the list, and replaces the misspelled word in the text with the selected one of the alternative words.
In certain embodiments of the invention finite state machines are utilized in the spelling and grammar correction process. Thus according to certain embodiments the invention stores one or more lexicon finite state machines (FSM), each of which represents a set of correctly spelled reference words. Storing the lexicon as one or more finite state machines facilitates those embodiments of the invention employing a client-server architecture. The input text to be corrected may also be encoded as a finite state machine, which includes alternative word(s) for word(s) in need of correction along with associated weights. The weights are determined by a process that involves assessing the number and type of changes that would be required in order to transform an incorrect word, e.g., a misspelled word, into a correct word. The invention adjusts the weights by taking into account the grammatical context in which the word appears in the input text. In certain embodiments of the invention the modification is performed by applying a second finite state machine to the finite state machine that was generated for the input text, where the second finite state machine encodes a grammatically correct sequence of words, thereby generating an additional finite state machine.
-
Citations
40 Claims
-
1. A method of correcting a misspelled word in input text, the method comprising steps of:
-
storing one or more lexicon finite state machines (FSM), each of the lexicon FSMs representing plural reference words, wherein a representation of a reference word comprises one or more states and one or more arcs, each arc comprising a character in the reference word;
detecting a misspelled word in the input text, wherein the detecting comprises comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any words in the dictionary database;
generating an additional FSM comprising a plurality of states, each state including information identifying a state of a lexicon FSM and a position in the input word and a cost, wherein the cost is used to select states of the additional FSM that are to be expanded; and
determining a list of alternative words for the misspelled word, wherein the determining comprises selecting one or more words from the additional FSM. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
ranking the list of alternative words.
-
-
7. The method of claim 6, further comprising the step of:
selecting one of the alternative words as a correct spelling for the misspelled word based on the ranking.
-
8. The method of claim 1, 2, 5, or 3, wherein each of the alternative words corresponds to a final state of the additional FSM, further including the step of:
ranking the list of alternative words based on the costs of the final states corresponding to the alternative words.
-
9. The method of claim 1, wherein the states further include information indicating whether a character transposition has occurred in the input word.
-
10. The method of claim 1, wherein the generating step comprises applying one or more states of the additional FSM to one or more modules selected from the list consisting of:
- (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module, wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the misspelled word, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the misspelled word, (iii) the character insertion module determines whether a character inserted in the misspelled word causes at least part of the incorrect word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted in the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (v) the character replacement module replaces characters in the incorrect word with characters in the reference word in order to determine whether at least part of the misspelled word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the misspelled word and compares a changed character in the misspelled word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the misspelled word which were not compared by the character transposition module in order to determine if at least part of the misspelled word matches at least part of the reference word.
-
11. A method of correcting a misspelled word in input text, the method comprising steps of:
-
storing one or more lexicon finite state machines (FSM), each of the lexicon FSMs representing plural reference words together with a phonetic representation of each reference word, wherein a representation of a reference word together with its phonetic representation comprises one or more states and one or more arcs, each arc comprising a pair of characters, one of which is a character in the reference word and the other of which is a phonetic representation thereof;
detecting a misspelled word in the input text, wherein the detecting comprises comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any words in the dictionary database;
generating an input FSM for the misspelled word, the input FSM representing the misspelled word together with a phonetic representation of the misspelled word, wherein the representation of the misspelled word together with its phonetic representation comprises one or more arcs, each arc comprising a pair of characters, one of which is a character in the reference word and the other of which is a phonetic representation thereof;
generating an additional FSM comprising a plurality of states, each state including information identifying a state of a lexicon FSM and a state of the input FSM and a cost, wherein the cost is used to select states of the additional FSM that are to be expanded; and
determining a list of alternative words for the misspelled word, wherein the determining comprises selecting one or more words from the additional FSM. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
ranking the list of alternative words.
-
-
17. The method of claim 11, further comprising the step of:
selecting one of the alternative words as a correct spelling for the misspelled word based on the ranking.
-
18. The method of claim 11, 13, 14, or 15, wherein each of the alternative words corresponds to a final state of the additional FSM, further including the step of:
ranking the list of alternative words based on the costs of the final states corresponding to the alternative words.
-
19. The method of claim 11, wherein the states further include information indicating whether a character transposition has occurred in the input word.
-
20. The method of claim 11, wherein the generating step comprises applying one or more states of the additional FSM to one or more modules selected from the list consisting of:
- (i) a character identity module, (ii) a phonetic identity module, (iii) a character insertion module, (iv) a character deletion module, (v) a character replacement module, (vi) a character transposition module, and (vii) a character transposition completion module, wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the misspelled word, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the misspelled word, (iii) the character insertion module determines whether a character inserted in the misspelled word causes at least part of the incorrect word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted in the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (v) the character replacement module replaces characters in the incorrect word with characters in the reference word in order to determine whether at least part of the misspelled word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the misspelled word and compares a changed character in the misspelled word to a corresponding character in the reference word, and (vii) the character transposition completion module compares characters in the misspelled word which were not compared by the character transposition module in order to determine if at least part of the misspelled word matches at least part of the reference word.
-
21. A method of correcting a misspelled word in input text, the method comprising steps of:
-
storing one or more lexicon finite state machines (FSM), each of the lexicon FSMs representing plural reference words, wherein a representation of a reference word comprises one or more states and one or more arcs, each arc comprising a character in the reference word;
detecting a misspelled word in the input text, wherein the detecting comprises comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any words in the dictionary database;
generating an additional FSM comprising a plurality of states, each state including information identifying a state of a lexicon FSM and a position in the input word and a cost indicating the likelihood that a current state is on a path leading to a correct spelling of the misspelled word, wherein the cost is used to select states of the additional FSM that are to be expanded;
determining a list of alternative words for the misspelled word, wherein the determining comprises selecting one or more words from the additional FSM; and
ranking the list of alternative words based on a context of the input text. - View Dependent Claims (22)
-
-
23. A method of retrieving text from a source, the method comprising the steps of:
-
inputting a search word;
correcting a spelling of the search word through the steps of storing one or more lexicon finite state machines (FSM), each of the lexicon FSMs representing plural reference words, wherein a representation of a reference word comprises one or more states and one or more arcs, each arc comprising a character in the reference word, comparing the search word to a dictionary database and characterizing the search word as a misspelled search word when the search word does not match any words in the dictionary database, generating an additional FSM comprising a plurality of states, each state including information identifying a state of a lexicon FSM and a position in the search word and a cost indicating the likelihood that a current state is on a path leading to a correct spelling of the misspelled search word, wherein the cost is used to select states of the additional FSM that are to be expanded, and determining a list of alternative words for the misspelled search word, wherein the determining comprises selecting one or more words from the additional FSM; and
retrieving text from the source that includes the corrected search word. - View Dependent Claims (24, 25, 26, 27)
-
-
28. A method of retrieving text from a source, the method comprising the steps of:
-
inputting a search phrase comprised of a plurality of words, at least one of the plurality of words being a misspelled word;
replacing the misspelled word in the search phrase with a corrected word in order to produce a corrected search phrase through the steps of storing one or more lexicon finite state machines (FSM), each of the lexicon FSMs representing plural reference words, wherein a representation of a reference word comprises one or more states and one or more arcs, each arc comprising a character in the reference word, comparing each word in the search phrase to a dictionary database and characterizing a word as a misspelled word when the word does not match any words in the dictionary database, generating an additional FSM comprising a plurality of states, each state including information identifying a state of a lexicon FSM and a position in the input search word and a cost indicating that a current state is on a path leading to a correct spelling of the misspelled word, wherein the cost is used to select states of the additional FSM that are to be expanded, and determining a list of alternative words for the misspelled word, wherein the determining comprises selecting one or more words from the additional FSM; and
retrieving text from the source based on the corrected search phrase. - View Dependent Claims (29, 30, 31, 32)
-
-
33. A method of correcting misspelled words in input text sequences received from a plurality of different clients, the method comprising the steps of:
-
storing, in a memory on a server, a single shared lexicon comprised of a plurality of reference words, wherein the single shared lexicon comprises one or more lexicon finite state machines (“
FSM”
), each of the lexicon FSMs representing plural reference words together with a phonetic representation of each reference word, wherein a representation of a reference word comprises one or more states and one or more arcs, each arc comprising a pair of characters, one of which is a character in the reference word and the other of which is a phonetic representation thereof;
receiving the input text sequences from the plurality of different clients;
spell-checking the input text sequences using the reference words in the single shared lexicon, wherein the spell-checking step comprises a correcting step for correcting misspelled words in each of the input text sequences substantially in parallel using the single shared lexicon comprised of one or more lexicon FSMs; and
outputting spell-checked text sequences to the plurality of different clients. - View Dependent Claims (34, 35, 36)
generating an additional FSM comprising a plurality of states, each state including information identifying a state of a lexicon FSM and a position in the input word and a cost, wherein the cost is used to select states of the additional FSM that are to be expanded;
selecting one or more reference words from the lexicon FSMs based on the additional FSM; and
replacing the misspelled word in the text sequence with a selected one of the one or reference words.
-
-
35. The method of claim 34, further comprising the step of:
generating an input FSM for a misspelled word in the text sequence, wherein the input FSM comprises one or more states and one or more arcs, each arc comprising a character in the reference word.
-
36. The method of claim 34, further comprising the step of:
generating an input FSM for a misspelled word in the text sequence, wherein the input FSM comprises one or more states and one or more arcs, each arc comprising a pair of characters, one of which is a character in the reference word and the other of which is a phonetic representation thereof.
-
37. A method of spell-checking input text, the method comprising the steps of:
-
detecting a misspelled word in the input text;
storing one or more lexicon finite state machines (“
FSM”
) in a memory, each of the lexicon FSMs including plural reference words;
generating an input FSM for the misspelled word;
selecting one or more reference words from the lexicon FSMs based on the input FSM, the one or more reference words substantially corresponding to a spelling of the misspelled word, wherein the selecting step further comprises the steps of generating an additional FSM comprising a plurality of states, each state identifying a state of a lexicon FSM and a position in the input word and a cost, wherein the cost is used to select states of the additional FSM that are to be expanded, and selecting the one or more reference words from the lexicon FSMs using the additional FSM; and
otuputting selected ones of the one or more reference words. - View Dependent Claims (38, 39)
wherein (i) the character identity module determines whether characters of a reference word in the lexicon FSM match characters of the misspelled word in the input FSM, (ii) the phonetic identity module determines whether characters of the reference word are pronounced the same as characters of the misspelled word, (iii) the character insertion module determines whether a character inserted in the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (iv) the character deletion module determines whether a character deleted from the misspelled word causes at least part of the misspelled word to match at least part of the reference word, (v) the character replacement module replaces characters in the misspelled word with characters in the reference word in order to determine whether at least part of the misspelled word matches at least part of the reference word, (vi) the character transposition module changes the order of two or more characters in the misspelled word and compares a changed character in the misspelled word to a corresponding character in the reference word, and (vi) the character transposition completion module compares characters in the misspelled word which were not compared by the character transposition module in order to determine if at least part of the misspelled word matches at least part of the reference word.
-
-
39. The method of claim 37,
wherein each of the lexicon FSMs also includes a phonetic representation of each reference word; -
wherein the input FSM also includes a phonetic representation of the misspelled word; and
wherein the selecting step selects reference words from the lexicon FSMs which also substantially correspond to the phonetic representation of the misspelled word.
-
-
40. A method of correcting a misspelled word in input text, the method comprising the steps of:
-
storing one or more lexicon finite state machines (FSM), each of the lexicon FSMs representing plural reference words together with a phonetic representation of each reference word, wherein a representation of a reference word together with its phonetic representation comprises one or more states and one or more arcs, each arc comprising a pair of characters, one of which is a character in the reference word and the other of which is a phonetic representation thereof;
detecting a misspelled word in the input text, wherein the detecting comprises comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any words in the dictionary database;
generating an input FSM for the misspelled word, the input FSM representing the misspelled word together with a phonetic representation of the misspelled word, wherein the representation of the misspelled word together with its phonetic representation comprises one or more arcs, each arc comprising a pair of characters, one of which is a character in the reference word and the other of which is a phonetic representation thereof;
generating an additional FSM comprising a plurality of states, each state including information identifying a state of a lexicon FSM and a state of the input FSM and a cost indicating of the likelihood that a current state is on a path leading to a correct spelling of the misspelled word, wherein the cost is used to select states of the additional FSM that are to be expanded;
determining a list of alternative words for the misspelled word, wherein the determining comprises selecting one or more words from the additional FSM; and
ranking the list of the alternative words based on a context of the input text.
-
Specification