Method of spell-checking search queries
First Claim
1. A computer-implemented method for detecting spelling errors in a target text-string, comprising:
- comparing the target text-string to a database of contexts and determining from the comparison, a set of contexts characterized as X contexts, where each of the X contexts are characterized as including a correct spelling of the target text-string, and Y contexts, where each of the Y contexts are characterized as including an incorrect spelling of a reference text-string; and
computing a likelihood that the target text-string is a misspelling of the reference text-string as a function of one of X or Y, relative to a combination of X and Y;
characterizing a context as including an incorrect spelling of the reference text-string in response to occurrences of the reference text-string in the context being at least equal to a pre-determined minimum quantity threshold, and a ratio of reference text-string occurrences in the context to target text-string occurrences in the context being greater than a pre-determined ratio threshold.
Abstract
A computer-implemented method for determining whether a target text-string is correctly spelled is provided. The target text-string is compared to a corpus to determine a set of contexts which each include an occurrence of the target text-string. Using heuristics, each context of the set is characterized based on occurrences in the corpus of the target text-string and a reference text-string. Contexts are characterized as including a correct spelling of the target text-string, an incorrect spelling of the reference text-string, or including an indeterminate usage of the target text-string. A likelihood that the target text-string is a misspelling of the reference text-string is computed as a function of the quantity of contexts including a correct spelling of the target text-string and the quantity of contexts including an incorrect spelling of a reference text-string. In one application, the target text-string is received in a search query, the search executed following a spell-check.
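The likelihood computation described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the source only says the likelihood is a function of the counts of correct-spelling contexts (X) and incorrect-spelling contexts (Y), so the specific formula Y / (X + Y), the label constants, and the function name `misspelling_likelihood` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical labels for the three context categories named in the abstract.
CORRECT = "correct"              # context includes a correct spelling of the target
INCORRECT = "incorrect"          # context includes an incorrect spelling of the reference
INDETERMINATE = "indeterminate"  # context gives no usable evidence


def misspelling_likelihood(labels):
    """Return Y / (X + Y), or None when no context is decisive.

    Assumption: the source does not fix the exact function; this ratio is one
    reading of "a function of one of X or Y, relative to a combination of X and Y".
    """
    counts = Counter(labels)
    x = counts[CORRECT]
    y = counts[INCORRECT]
    if x + y == 0:
        return None  # only indeterminate contexts, so no estimate is possible
    return y / (x + y)


labels = [INCORRECT, INCORRECT, INCORRECT, CORRECT, INDETERMINATE, INDETERMINATE]
print(misspelling_likelihood(labels))
```

Indeterminate contexts are counted but deliberately left out of the denominator, which matches the abstract's description of the likelihood as depending only on the two decisive categories.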
32 Claims
1. A computer-implemented method for detecting spelling errors in a target text-string, comprising:
- comparing the target text-string to a database of contexts and determining from the comparison, a set of contexts characterized as X contexts, where each of the X contexts are characterized as including a correct spelling of the target text-string, and Y contexts, where each of the Y contexts are characterized as including an incorrect spelling of a reference text-string; and
- computing a likelihood that the target text-string is a misspelling of the reference text-string as a function of one of X or Y, relative to a combination of X and Y;
- characterizing a context as including an incorrect spelling of the reference text-string in response to occurrences of the reference text-string in the context being at least equal to a pre-determined minimum quantity threshold, and a ratio of reference text-string occurrences in the context to target text-string occurrences in the context being greater than a pre-determined ratio threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
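The context test in the final limb of claim 1 can be illustrated with a short sketch. The claim only requires "pre-determined" thresholds, so the default values below (`min_ref_count=5`, `min_ratio=10.0`), the function name, and the example counts are hypothetical.

```python
def is_misspelling_context(ref_count, target_count, min_ref_count=5, min_ratio=10.0):
    """Characterize one context as including an incorrect spelling of the reference.

    ref_count:    occurrences of the reference text-string in this context
    target_count: occurrences of the target text-string in this context
    Both thresholds are illustrative assumptions, not values from the source.
    """
    if target_count <= 0:
        # Contexts are drawn from occurrences of the target, so this sketch
        # treats a zero count as a caller error rather than guessing a ratio.
        raise ValueError("target_count must be positive")
    enough_evidence = ref_count >= min_ref_count
    reference_dominates = (ref_count / target_count) > min_ratio
    return enough_evidence and reference_dominates


# Hypothetical counts for one context: the reference appears 300 times, the target 3 times.
print(is_misspelling_context(ref_count=300, target_count=3))
```

Both conditions must hold: the minimum-count check guards against deciding on too little evidence, and the ratio check requires the reference to dominate the target in that context.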
22. A computer-implemented method for detecting spelling errors in a target text-string, comprising:
- selecting a reference text-string having characteristics corresponding to the target text-string;
- computing from a first database, a first ratio of occurrences of the reference text-string relative to occurrences of the target text-string;
- computing from a second database, a second ratio of occurrences of the reference text-string relative to occurrences of the target text-string; and
- determining a likelihood that the target text-string is misspelled as a function of the first ratio and the second ratio,
- wherein the first and second databases are each a corpus including naturally occurring text that are similar in patterns of content, the second database being a better-spelled corpus than the first database;
- wherein the characteristics corresponding to the target text-string include an edit distance between the target text-string and the reference text-string, and an edit distance between the target text-string and any previously identified misspellings of the reference text-string, and
- wherein the target text-string is misspelled when the second ratio is greater than a first comparison threshold, and the ratio of the second ratio to the first ratio is greater than a second comparison threshold.
Dependent claims: 23, 24, 25, 26
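The two-corpus decision rule at the end of claim 22 can be sketched as below. The claim does not give threshold values or say how zero counts are handled, so the defaults (`first_threshold=10.0`, `second_threshold=2.0`), the function names, and the choice to reject non-positive counts are assumptions for illustration only.

```python
def occurrence_ratio(ref_count, target_count):
    """Ratio of reference occurrences to target occurrences in one corpus."""
    if ref_count <= 0 or target_count <= 0:
        # Assumption: this sketch only covers strings that occur in both corpora.
        raise ValueError("both counts must be positive in this sketch")
    return ref_count / target_count


def is_misspelled(first_counts, second_counts, first_threshold=10.0, second_threshold=2.0):
    """Apply the claim 22 test to (ref_count, target_count) pairs.

    first_counts:  counts from the first corpus
    second_counts: counts from the second, better-spelled corpus
    """
    first_ratio = occurrence_ratio(*first_counts)
    second_ratio = occurrence_ratio(*second_counts)
    return (second_ratio > first_threshold
            and (second_ratio / first_ratio) > second_threshold)


# Hypothetical counts: the target is much rarer, relative to the reference,
# in the better-spelled corpus than in the first corpus.
print(is_misspelled(first_counts=(1000, 50), second_counts=(900, 3)))
```

The intuition is that a genuine misspelling should thin out in better-spelled text, so the reference-to-target ratio rises sharply between the two corpora, while a legitimate but rare word keeps roughly the same ratio in both.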
27. A computer-implemented method for detecting spelling errors in a target text-string, comprising:
- comparing the target text-string to a first and second database of contexts and determining from the comparisons, a set of contexts including contexts characterized as X contexts, each of the X contexts characterized as including a correct spelling of the target text-string, Y contexts, each of the Y contexts characterized as including an incorrect spelling of a reference text-string, and Z contexts, each of the Z contexts characterized as an indeterminate context; and
- computing a likelihood that the target text-string is a misspelling of the reference text-string as a function of one of X and Y, relative to X plus Y;
- wherein a context is characterized as including an incorrect spelling of the reference text-string or characterized as including a correct spelling of the target text-string in response to (1) a first ratio, determined from the first database, of occurrences of the reference text-string in the context to occurrences of the target text-string in the context, (2) a second ratio determined from the second database, of occurrences of the reference text-string in the context to occurrences of the target text-string in the context, and (3) a third ratio of the second ratio to the first ratio, the first and second databases each being a corpus including naturally occurring text that are similar in patterns of content, the second database being a better-spelled corpus than the first database.
Dependent claims: 28, 29, 30, 31, 32
Specification