Self learning contextual spell corrector
First Claim
1. A computer-implemented method comprising:
receiving a group of keywords, wherein each keyword includes one or more words;
forming a word list from the group of keywords, where the word list includes a list of each word in the group of keywords;
determining that a first word in the word list is a misspelling of a second word in the word list by:
determining correct spelling candidate words in the word list;
computing misspelling confidence scores for correcting the first word to the correct spelling candidate words; and
in at least one instance, choosing, as the second word, an individual correct spelling candidate word having a misspelling confidence score that exceeds a misspelling confidence score threshold and that has a highest misspelling confidence score;
correcting the first word by spelling the first word like the second word; and
outputting corrected keywords that include the corrected first word.
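The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how candidates are selected or how the misspelling confidence score is computed, so everything specific here is an assumption made for illustration: the function names, the candidate rule (any more frequent word in the same list), the score (string similarity from `difflib.SequenceMatcher` multiplied by the candidate's relative frequency), and the threshold of 0.6.

```python
from collections import Counter
from difflib import SequenceMatcher


def build_word_list(keywords):
    """Count every word that appears in the group of keywords."""
    return Counter(word for kw in keywords for word in kw.lower().split())


def misspelling_confidence(word, candidate, freq):
    """Hypothetical score: string similarity weighted by relative frequency."""
    similarity = SequenceMatcher(None, word, candidate).ratio()
    dominance = freq[candidate] / (freq[candidate] + freq[word])
    return similarity * dominance


def correct_misspellings(keywords, threshold=0.6):
    """Return the keywords with likely misspellings replaced."""
    freq = build_word_list(keywords)
    corrections = {}
    for word in freq:
        # Assumed candidate rule: other, more frequent words in the same list.
        candidates = [c for c in freq if c != word and freq[c] > freq[word]]
        if not candidates:
            continue
        # Pick the candidate with the highest score, then apply the threshold.
        best = max(candidates, key=lambda c: misspelling_confidence(word, c, freq))
        if misspelling_confidence(word, best, freq) > threshold:
            corrections[word] = best
    return [
        " ".join(corrections.get(w, w) for w in kw.lower().split())
        for kw in keywords
    ]


keywords = [
    "digital camera",
    "digital camera bag",
    "cheap digital camera",
    "digtal camera",
    "camera lens",
]
corrected = correct_misspellings(keywords)
```

In this sketch the word list is the only dictionary: "digtal" is corrected to "digital" because "digital" is both similar and more frequent within the same group of keywords, while rare but dissimilar words such as "lens" are left alone.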
Abstract
A group of keywords is received, wherein each keyword includes one or more words. A word list is formed from the group of keywords, where the word list includes a list of each word in the group of keywords. A misspelled keyword is corrected using analysis of the words in the word list. The corrected keyword is output.
18 Claims
1. A computer-implemented method as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6.
7. A computer-implemented method comprising:
receiving a group of keywords, wherein each keyword includes one or more words;
forming a word list from the group of keywords, where the word list includes a list of each word in the group of keywords;
determining that a first word in the word list is a portion of a second word in the word list by:
combining the first word with other words in the word list to form combination candidate words;
computing separation confidence scores for the combination candidate words; and
in at least one instance, choosing, as the second word, an individual combination candidate word having a separation confidence score that exceeds a separation confidence score threshold and that has a highest separation confidence score;
correcting the first word by spelling the first word like the second word; and
outputting corrected keywords that include the corrected first word.
Dependent claims: 8, 9, 10, 11, 12.
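The wrongly separated word case of claim 7 can be sketched in the same way. The claim leaves the separation confidence score unspecified, so the formula below (the share of the joined form among joined and split uses), the 0.6 threshold, and all function names are hypothetical. The sketch also assumes that a combination candidate only counts when the joined word itself occurs in the word list, and that the two fragments are merged where they appear next to each other in a keyword.

```python
from collections import Counter


def build_word_list(keywords):
    """Count every word that appears in the group of keywords."""
    return Counter(word for kw in keywords for word in kw.lower().split())


def separation_confidence(first, other, freq):
    """Hypothetical score: share of the joined form among joined and split uses."""
    joined = freq[first + other]
    split = min(freq[first], freq[other])
    return joined / (joined + split)


def find_joins(keywords, threshold=0.6):
    """Map each wrongly separated first word to (other word, joined word)."""
    freq = build_word_list(keywords)
    joins = {}
    for first in freq:
        # Combination candidates: first word + another listed word, kept only
        # when the combination itself occurs in the word list (assumption).
        candidates = [o for o in freq if o != first and first + o in freq]
        if not candidates:
            continue
        best = max(candidates, key=lambda o: separation_confidence(first, o, freq))
        if separation_confidence(first, best, freq) > threshold:
            joins[first] = (best, first + best)
    return joins


def correct_separations(keywords, threshold=0.6):
    """Return the keywords with adjacent fragments merged into one word."""
    joins = find_joins(keywords, threshold)
    corrected = []
    for kw in keywords:
        words, out, i = kw.lower().split(), [], 0
        while i < len(words):
            nxt = words[i + 1] if i + 1 < len(words) else None
            if words[i] in joins and joins[words[i]][0] == nxt:
                out.append(joins[words[i]][1])  # merge the two fragments
                i += 2
            else:
                out.append(words[i])
                i += 1
        corrected.append(" ".join(out))
    return corrected


keywords = ["notebook computer", "cheap notebook", "notebook bag", "note book sale"]
corrected = correct_separations(keywords)
```

Here "note" combined with "book" yields "notebook", which already occurs three times in the word list, so "note book sale" is output as "notebook sale".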
13. A computer-implemented method comprising:
receiving an order including keywords for an online advertising system, wherein individual keywords include at least two words comprising a first word and a second word;
breaking the keywords into the at least two words;
forming a word list from the at least two words by sorting the at least two words decreasingly by frequency of occurrence in the order;
determining that correcting the first word to the second word meets misspelling candidate criteria, the misspelling candidate criteria including:
a frequency of occurrence of the second word is higher than a frequency of occurrence of the first word,
stemming forms of the first word and the second word are not the same,
an edit distance from the first word to the second word is less than a first threshold,
a ratio of the length of the first word to the edit distance is larger than or equal to a second threshold, and
the frequency of occurrence of the first word is less than a third threshold;
in an instance when the misspelling candidate criteria are met, computing a misspelling confidence score and correcting spelling of the first word using the second word when the misspelling confidence score exceeds a fourth threshold; and
outputting corrected keywords that include the corrected first word.
Dependent claims: 14, 15, 16, 17, 18.
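Claim 13 names five concrete candidate criteria, which can be written down almost directly. The following Python sketch is an illustration under stated assumptions, not the patented implementation: the threshold values `t1` to `t4` are placeholders, `naive_stem` is a crude stand-in for a real stemmer, and the `confidence` formula is hypothetical because the claim does not define the misspelling confidence score.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def naive_stem(word):
    """Stand-in stemmer (assumption): strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def is_candidate(first, second, freq, t1=3, t2=3.0, t3=3):
    """The five misspelling candidate criteria, with placeholder thresholds."""
    dist = edit_distance(first, second)
    return (
        freq[second] > freq[first]                    # second word is more frequent
        and naive_stem(first) != naive_stem(second)   # not just an inflected form
        and 0 < dist < t1                             # edit distance below first threshold
        and len(first) / dist >= t2                   # length-to-distance ratio
        and freq[first] < t3                          # first word is rare
    )


def confidence(first, second, freq):
    """Hypothetical score: frequency dominance of the second word."""
    return freq[second] / (freq[second] + freq[first])


def correct_order(keywords, t4=0.7):
    """Return the keywords of an order with candidate misspellings corrected."""
    freq = Counter(w for kw in keywords for w in kw.lower().split())
    word_list = [w for w, _ in freq.most_common()]  # sorted by decreasing frequency
    corrections = {}
    for first in reversed(word_list):  # rarest words first
        for second in word_list:       # most frequent candidates first
            if is_candidate(first, second, freq) and confidence(first, second, freq) > t4:
                corrections[first] = second
                break
    return [
        " ".join(corrections.get(w, w) for w in kw.lower().split())
        for kw in keywords
    ]


order = [
    "running shoes",
    "running shoes sale",
    "trail running shoes",
    "runing shoes",
    "running shoe",
]
corrected = correct_order(order)
```

With these placeholder settings, "runing" is corrected to "running", while "shoe" is left unchanged: it shares a stem with "shoes", so the stemming criterion treats it as a legitimate word form rather than a misspelling.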
Specification