System and method for correcting spelling errors in search queries
First Claim
1. In a computer system that implements a search engine that is accessible to users over a computer network, a method of handling misspelled search terms in search queries, the method comprising the computer-implemented steps of:
- (a) processing search queries submitted to the search engine by a plurality of users over a period of time to generate correlation data, the correlation data indicating correlations between search terms based at least upon frequencies of prior occurrences of the search terms within the same search query;
(b) receiving a search query from a user over the computer network, the search query comprising a plurality of search terms and being directed to an informational database to be searched;
(c) identifying within the search query a non-matching search term which does not produce a match within the informational database, and at least one matching search term which produces a match within the informational database;
(d) using at least the correlation data to identify a plurality of additional terms that are deemed to be related to the at least one matching search term; and
(e) comparing the additional terms identified in step (d) to the non-matching term to identify an additional term that is a candidate correctly-spelled replacement term for the non-matching term.
6 Assignments
0 Petitions
Accused Products
Abstract
A search engine is disclosed that uses correlations between search terms to correct misspelled terms within search queries. The correlations are based at least in-part on historical query submissions to the search engine. Preferably, the correlations reflect the frequencies with which the search terms have historically appeared together within the same query, and are stored within a correlation table using related terms lists. In one embodiment, the correlation table is generated periodically from the M (e.g. 10) most recent days of entries in a query log, and thus reflects the current preferences of users. In operation, when a query that includes both matching and non-matching search terms is submitted to the search engine, a spelling correction process accesses the correlation table to generate a list of terms that are deemed to be related to the matching term(s). The spellings of these related terms are then compared to the spelling of each non-matching term using a spelling comparison function that compares two character strings and generates a similarity score. If a suitable replacement is found for a given non-matching term, the non-matching term is replaced with the similar related term. The modified query is then used to perform the search, and the user is notified of the modification(s) made to the query. In the disclosed embodiment, the search engine is used on the Web site of an online merchant to assist users in locating book titles, music titles, and other types of products.
283 Citations
31 Claims
-
1. In a computer system that implements a search engine that is accessible to users over a computer network, a method of handling misspelled search terms in search queries, the method comprising the computer-implemented steps of:
-
(a) processing search queries submitted to the search engine by a plurality of users over a period of time to generate correlation data, the correlation data indicating correlations between search terms based at least upon frequencies of prior occurrences of the search terms within the same search query; (b) receiving a search query from a user over the computer network, the search query comprising a plurality of search terms and being directed to an informational database to be searched; (c) identifying within the search query a non-matching search term which does not produce a match within the informational database, and at least one matching search term which produces a match within the informational database; (d) using at least the correlation data to identify a plurality of additional terms that are deemed to be related to the at least one matching search term; and (e) comparing the additional terms identified in step (d) to the non-matching term to identify an additional term that is a candidate correctly-spelled replacement term for the non-matching term. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 29, 30, 31)
-
-
16. In a computer system that implements a search engine that is accessible to users over a computer network, a method of processing a search query that includes at least one misspelled search term, the method comprising the computer-implemented steps of:
-
(a) receiving the search query from a user over the computer network, the search query comprising a plurality of search terms and being directed to an informational database to be searched; (b) identifying within the search query a non-matching search term which does not produce a match within the informational database, and at least one matching search term which produces a match within the informational database; (c) using search term correlation data to identify a plurality of additional terms that are deemed to be related to the at least one matching search term, the search term correlation data based at least on historical query submissions; and (d) comparing the additional terms identified in step (c) to the non-matching term to identify an additional term that is a candidate correctly-spelled replacement for the non-matching term. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23)
-
-
24. A search engine for allowing users of a computer network to conduct searches of a database of items, the search engine comprising:
-
a computer system which has search term correlation data stored in a memory thereof, the search term correlation data indicating correlations between search terms based at least upon prior query submissions of users; and a query server which runs on the computer system, the query server adapted to search the database of items using search queries received from users, the query server configured to process a multi-term search query that includes both matching and non-matching terms by at least; accessing the search term correlation data to identify a plurality of additional terms that are deemed to be related to the matching terms(s) of the search query; and comparing the spellings of the additional terms to the spelling of at least one non-matching term of the query to determine whether any of the additional terms is a candidate replacement for the non-matching term. - View Dependent Claims (25, 26, 27, 28)
-
Specification