Spelling and grammar checking system
First Claim
1. A computer implemented method of correcting a misspelled word in input text, the method comprising the steps of:
- detecting a misspelled word in the input text, wherein the detecting comprises comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any words in the dictionary database;
determining a list of alternative words for the misspelled word; and
ranking the list of alternative words based on a context of the input text, wherein the alternative words yield correct parts of speech sequences according to the context, wherein the ranking step comprises;
generating a first finite state machine (“
FSM”
) for the input text, the first FSM having a plurality of arcs which include the alternative words and weights associated therewith, where a weight of each alternative word corresponds to a likelihood that the alternative word, taken out of grammatical context, comprises a correctly-spelled version of the misspelled word; and
applying a second FSM to the first FSM wherein the second FSM encodes a set of grammatically correct sequences of words.
0 Assignments
0 Petitions
Accused Products
Abstract
System of correcting misspelled words in input text detects a misspelled word in the input text, determines a list of alternative words for the misspelled word, and ranks the list of alternative words based on a context of the input text. In certain embodiments, finite state machines (FSMs) are utilized in the spelling and grammar correction process, storing one or more lexicon FSMs, each of which represents a set of correctly spelled reference words. Storing the lexicon as one or more FSMs facilitates those embodiments of the invention employing a clinet-server architecture. The input text to be corrected may also be encoded as a FSM, which includes alternative word(s) for word(s) in need of correction along with associated weights. The invention adjusts the weights by taking into account the grammatical context in which the word appears in the input text. In certain embodiments the modification is performed by applying a second FSM to the FSM that was generated for the input text, where the second FSM encodes a grammatically correct sequence of words, thereby generating an additional FSM.
210 Citations
8 Claims
-
1. A computer implemented method of correcting a misspelled word in input text, the method comprising the steps of:
-
detecting a misspelled word in the input text, wherein the detecting comprises comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any words in the dictionary database; determining a list of alternative words for the misspelled word; and ranking the list of alternative words based on a context of the input text, wherein the alternative words yield correct parts of speech sequences according to the context, wherein the ranking step comprises; generating a first finite state machine (“
FSM”
) for the input text, the first FSM having a plurality of arcs which include the alternative words and weights associated therewith, where a weight of each alternative word corresponds to a likelihood that the alternative word, taken out of grammatical context, comprises a correctly-spelled version of the misspelled word; andapplying a second FSM to the first FSM wherein the second FSM encodes a set of grammatically correct sequences of words. - View Dependent Claims (4, 5)
-
-
2. A computer implemented method of correcting a misspelled word in input text, a method comprising the steps of:
-
detecting a misspelled word in the input text, wherein the detecting comprises comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any word in the dictionary database; determining a list of alternative words for the misspelled word; and ranking the list of alternative words based on a context of the input text, wherein the alternative words yield correct parts of speech sequences according to the context, wherein the list of alternative words includes a best alternative word, the method further comprising the step of replacing the misspelled word with the best alternative word, wherein the ranking step comprises; generating a first finite state machine (“
FSM”
) for the input text, the first FSM having a plurality of arcs which include the alternative words and weights associated therewith, where a weight of each alternative word corresponds to a likelihood that the alternative word, taken out of grammatical context, comprises a correctly-spelled version of a misspelled word; andapplying a second FSM to the first FSM, wherein the second FSM encodes a set of grammatically correct sequences of words.
-
-
3. A computer implemented method of correcting a misspelled word in input text, the method comprising the steps of:
-
detecting a misspelled word in the input text, wherein the detecting comprises comparing each word in the input text to a dictionary database and characterizing a word as misspelled when the word does not match any words in the dictionary database; determining a list of alternative words for the misspelled word; and ranking the list of alternative words based on a context of the input text, wherein the alternative words yield correct parts of speech sequences according to the context and further comprising the steps of; determining whether an alternative word is an element of a compound word or lexical phrase; and wherein the ranking step comprises modifying the rank of one or more of the alternative words if the alternative word is an element of a compound word or lexical phrase, wherein the ranking step comprises; generating a first finite state machine (“
FSM”
) for the input text, the first FSM having a plurality of arcs which include the alternative words and weights associated therewith, where a weight of each alternative word corresponds to a likelihood that the alternative word, taken out of grammatical context, comprises a correctly-spelled version of the misspelled word, andapplying a second FSM to the first FSM wherein the second FSM encodes a set of grammatically correct sequences of words.
-
-
6. A computer implemented method of correcting misspelled words in input text sequences received from a plurality of different clients, the method comprising the steps of:
-
storing, in a memory on a server, a single shared lexicon comprised of a plurality of reference words; receiving the input text sequences from the plurality of different clients; spell-checking the input text sequences using the reference words in the single shared lexicon; and outputting spell-checked text sequences to the plurality of different clients, wherein the single shared lexicon comprises one or more lexicon finite state machines (“
FSM”
), each of the lexicon FSMs representing plural reference words, wherein a representation of a reference word comprises one or more states and one or more arcs, each arc comprising a character in the reference word; andwherein the spell-checking step comprises a correcting step for correcting misspelled words in each of the input text sequences substantially in parallel using the single shared lexicon comprised of one or more lexicon FSMs, wherein, for each text sequence, the correcting step comprises; generating an additional FSM comprising a plurality of states, each state including information identifying a state of a lexicon FSM and a position in the input word and a cost, wherein the cost is used to select states of the additional FSM that are to be expanded; selecting one or more reference words from the lexicon FSMs based on the additional FSM; and replacing the misspelled word in the text sequence with a selected one of the one or more reference words.
-
-
7. A computer implemented method of correcting misspelled words in input text sequences received from a plurality of different clients, the method comprising the steps of:
-
storing, in a memory on a server, a single shared lexicon comprised of a plurality of reference words; receiving the input text sequences from the plurality of different clients; spell-checking the input text sequences using the reference words in the single shared lexicon; and outputting spell-checked text sequences to the plurality of different clients, wherein the single shared lexicon comprises one or more lexicon finite state machines (“
FSM”
), each of the lexicon FSMs representing plural reference words, wherein a representation of a reference word comprises one or more states and one or more arcs, each arc comprising a character in the reference word; andwherein the spell-checking step comprises a correcting step for correcting misspelled words at each of the input text sequences substantially in parallel using the single shared lexicon comprised of one or more lexicon FSMs, further comprising the step of; generating an FSM for a misspelled word in the text sequence, wherein the input FSM comprises one or more states and one or more arcs, each arc comprising a character in the reference word.
-
-
8. A computer implemented method of correcting misspelled words in input text sequences received from a plurality of different clients, the method comprising the steps of:
-
storing, in a memory on a sewer, a single shared lexicon comprised of a plurality of reference words; receiving the input text sequences from the plurality of different clients; spell-checking the input text sequences using the reference words in the single shared lexicon; and outputting spell-checked text sequences to the plurality of different clients, wherein the single shared lexicon comprises one or more lexicon finite state machines (“
FSM”
), each of the lexicon FSMs representing plural reference words, wherein a representation of a reference word comprises one or more states and one or more arcs, each arc comprising a character in the reference word; andwherein the spell-checking step comprises a correcting step for correcting misspelled words in each of the input text sequences substantially in parallel using the single shared lexicon comprised of one or more lexicon FSMs, farther comprising the step of; generating an input FSM for a misspelled word in the text sequence, wherein the input FSM comprises one or more states and one or more arcs, each arc comprising a pair of characters, one of which is a character in the reference word and the other of which is a phonetic representation thereof.
-
Specification