Systems and methods for improved spell checking
First Claim
1. A system comprising a computer readable storage having computer readable instructions stored thereon, when executed causes a computer to facilitate spell checking comprising:
- a component that receives input data containing text; and
a spell checking component that identifies a set of potentially misspelled substrings in the text and proposes at least one alternative spelling for the substring set based on at least one query log;
the query log comprising data utilized by users to query a data collection over a time frame, the spell checking component employs an iterative process where alternative spellings for potentially misspelled stop words in the substring are identified in an iteration after the iteration where alternative spellings for potentially misspelled words that are not stop words in the same substring are identified, wherein a Viterbi search is employed to facilitate in determining the optimum set of alternative spellings according to the context model in each iteration;
the Viterbi search can employ fringes to restrict a search for alternative spellings in an iteration such that for every pair of adjacent substrings, if any of the substrings is in at least one trusted lexicon, then only one of the substrings is allowed to change in that iteration.
2 Assignments
0 Petitions
Accused Products
Abstract
The present invention leverages iterative transformations of search query strings along with statistics extracted from search query logs and/or web data to provide possible alternative spellings for the search query strings. This provides for spell checking that can be influenced to provide individualized suggestions for each user. By utilizing search query logs, the present invention can account for substrings not found in a lexicon but still acceptable as a search query of interest. This allows for a higher quality proposal for alternative spellings, beyond the content of the lexicon. One instance of the present invention operates at a substring level by utilizing word unigram and/or bigram statistics extracted from query logs combined with an iterative search. This provides substantially better spelling alternatives for a given query than employing only substring matching. Other instances can receive input data from sources other than a search query input.
122 Citations
36 Claims
-
1. A system comprising a computer readable storage having computer readable instructions stored thereon, when executed causes a computer to facilitate spell checking comprising:
-
a component that receives input data containing text; and a spell checking component that identifies a set of potentially misspelled substrings in the text and proposes at least one alternative spelling for the substring set based on at least one query log;
the query log comprising data utilized by users to query a data collection over a time frame, the spell checking component employs an iterative process where alternative spellings for potentially misspelled stop words in the substring are identified in an iteration after the iteration where alternative spellings for potentially misspelled words that are not stop words in the same substring are identified, wherein a Viterbi search is employed to facilitate in determining the optimum set of alternative spellings according to the context model in each iteration;
the Viterbi search can employ fringes to restrict a search for alternative spellings in an iteration such that for every pair of adjacent substrings, if any of the substrings is in at least one trusted lexicon, then only one of the substrings is allowed to change in that iteration. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 36)
-
-
30. A method of facilitating spell checking, comprising:
-
receiving input data containing text; identifying a set of potentially misspelled substrings in the text;
generating a set of alternative spellings that are substrings in at least one selected from the group consisting of at least one query log comprising data utilized by users to query a data collection over a time frame and at least one lexicon, the set of alternative spellings comprising a set of alternative spellings determined via an iterative correction process that includes searching for an optimum set of alternative spellings via utilization of a statistical context model;
the statistical context model comprising substring occurrence and co-occurrence statistics extracted from the at least one query log, the iterative correction process includes identifying alternative spellings for potentially misspelled stop words in the substrings in an iteration after an iteration identifying alternative spellings for potentially misspelled words that are not stop words in the same substrings;employing a Viterbi search to facilitate in determining the optimum set of alternative spellings according to the context model in each iteration;
the Viterbi search can employ fringes to restrict a search for alternative spellings in an iteration such that for every pair of adjacent substrings, if any of the substrings is in at least one trusted lexicon, then only one of the substrings is allowed to change in that iteration; andproposing at least one alternative spelling for the substring set. - View Dependent Claims (31, 32, 33, 35)
-
-
34. A system comprising a computer readable storage having computer readable instructions stored thereon, when executed causes a computer to facilitate spell checking queries to a search engine comprising:
-
means for receiving input data containing text; and means for identifying a set of potentially misspelled substrings in the text and proposing at least one alternative spelling for the substring set based on at least one query log;
the query log comprising data utilized by users to query a data collection over a time frame, the means employing an iterative process that identifies alternative spellings for potentially misspelled stop words in the substrings in an iteration after an iteration identifies alternative spellings for potentially misspelled words that are not stop words in the same substrings, and the means employing a Viterbi search to facilitate in determining the optimum set of alternative spellings according to the context model in each iteration;
the Viterbi search can employ fringes to restrict a search for alternative spellings in an iteration such that for every pair of adjacent substrings, if any of the substrings is in at least one trusted lexicon, then only one of the substrings is allowed to change in that iteration.
-
Specification