Generating synonyms based on query log data
Abstract
An approach is described for generating synonyms to supplement at least one information item, such as, in one case, a set of related items. The approach can involve an expansion phase, a clean-up phase, and a reduction phase. In the expansion phase, the approach identifies, for each related item, a set of initial synonym candidates. In the clean-up phase, the approach removes noise from the set of initial synonym candidates (if such noise exists), to provide a set of filtered synonym candidate items. In the reduction phase, the approach ranks and applies a threshold (or thresholds) to the set of filtered synonym candidate items, to generate, for each information item, a set of selected synonyms. The approach uses query log data at various points in its operation. The selected synonyms can be used to improve the effectiveness of user searches.
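The three phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the described implementation: the toy `QUERY_LOG` of (query, accessed page) pairs, the `SEARCH_RESULTS` lookup standing in for a search module, the treatment of very short strings as noise, and the shared-page count with a `min_overlap` threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical query log: (query submitted, page the user then accessed).
QUERY_LOG = [
    ("canon eos 350d", "page1"),
    ("digital rebel xt", "page1"),
    ("digital rebel xt", "page2"),
    ("rebel xt", "page2"),
    ("rebel xt", "page1"),
    ("camera", "page1"),
    ("a", "page2"),
]

# Hypothetical stand-in for a search module: item -> pages returned for it.
SEARCH_RESULTS = {"canon eos 350d": ["page1", "page2"]}


def expand(item, log, results):
    """Expansion: collect other queries whose users accessed the item's pages."""
    pages = set(results.get(item, []))
    hits = defaultdict(set)  # candidate query -> pages of the item it led to
    for query, page in log:
        if page in pages and query != item:
            hits[query].add(page)
    return dict(hits)


def clean_up(candidates, min_length=2):
    """Clean-up: drop candidates treated as noise (assumed: very short strings)."""
    return {q: p for q, p in candidates.items() if len(q.strip()) >= min_length}


def reduce_candidates(candidates, min_overlap=2):
    """Reduction: rank by number of shared pages, then apply a threshold."""
    ranked = sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [q for q, pages in ranked if len(pages) >= min_overlap]


def generate_synonyms(item, log=QUERY_LOG, results=SEARCH_RESULTS):
    """Run expansion, clean-up, and reduction in sequence for one item."""
    return reduce_candidates(clean_up(expand(item, log, results)))


print(generate_synonyms("canon eos 350d"))  # ['digital rebel xt', 'rebel xt']
```

In this toy run, "a" is dropped as noise in the clean-up phase and "camera" is dropped in the reduction phase because it shares only one page with the input item.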
20 Claims
1. A method performed by at least one computing device, the method comprising:
receiving an input information item;
identifying a set of page items that are returned when the input information item is used in a first query to at least one search module;
identifying one or more second queries, other than the first query, that have also been submitted to the at least one search module by users that accessed individual page items from the set of page items, wherein the one or more second queries are identified using query log data reflecting associations between the one or more second queries and the individual page items accessed by the users;
providing a set of initial synonym candidates that were previously submitted to the at least one search module in the one or more second queries by the users that accessed the individual page items from the set of page items;
removing noise from the set of initial synonym candidates, when the noise is present and can be identified, to obtain a set of filtered synonym candidates;
reducing the set of filtered synonym candidates to a set of selected synonyms that are deemed appropriate proxies for the input information item by:
  obtaining metric factor values associated with the set of filtered synonym candidates, and
  applying a threshold to the metric factor values to select the set of selected synonyms from the set of filtered synonym candidates; and
outputting synonym-expanded data that is formed based on the reducing.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
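The reducing step of claim 1 pairs a metric factor with a threshold but does not say which metric is used. As a hedged sketch, the example below assumes one possible metric: the share of a candidate query's logged clicks that land on the input item's page set. The function names, the `click_log` data, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter


def click_ratio(candidate, page_set, click_log):
    """Share of the candidate's logged clicks that fall on pages in page_set."""
    clicks = Counter(page for query, page in click_log if query == candidate)
    total = sum(clicks.values())
    if total == 0:
        return 0.0  # no log evidence for this candidate
    on_target = sum(n for page, n in clicks.items() if page in page_set)
    return on_target / total


def select_by_threshold(candidates, page_set, click_log, threshold=0.5):
    """Compute the assumed metric per candidate and keep those at or above it."""
    values = {c: click_ratio(c, page_set, click_log) for c in candidates}
    selected = [c for c in candidates if values[c] >= threshold]
    return selected, values


# Hypothetical (query, accessed page) records.
click_log = [
    ("rebel xt", "page1"), ("rebel xt", "page2"), ("rebel xt", "page9"),
    ("camera", "page1"), ("camera", "page7"),
    ("camera", "page8"), ("camera", "page9"),
]
selected, values = select_by_threshold(
    ["rebel xt", "camera"], {"page1", "page2"}, click_log
)
print(selected)  # ['rebel xt']
```

Here "rebel xt" sends two of its three clicks to the input item's pages and passes the threshold, while the broader query "camera" sends only one of four and is rejected.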
9. A computing device comprising:
at least one storage device storing instructions of a synonym-generating module; and
at least one processing device configured to execute the instructions,
wherein the instructions of the synonym-generating module comprise:
  an input module configured to receive an input information item;
  an expansion module configured to expand the input information item into a set of initial synonym candidates based on query log data, the query log data reflecting associations between prior queries submitted by users and page items accessed by the users in response to the prior submitted queries, the set of initial synonym candidates corresponding to the prior submitted queries in the query log data;
  a clean-up module configured to remove identifiable noise from the set of initial synonym candidates, wherein an output of the expansion module and the clean-up module comprises a set of filtered synonym candidates;
  a reduction module configured to select a set of synonyms from the set of filtered synonym candidates based on the query log data to provide a set of selected synonyms that are deemed appropriate proxies for the input information item; and
  an output module configured to output synonym-expanded data that is formed based on the set of synonyms selected by the reduction module,
wherein the reduction module comprises:
  logic configured to obtain, based on the query log data, at least one metric factor associated with each filtered synonym candidate in the set of filtered synonym candidates to provide metric factors;
  logic configured to rank the set of filtered synonym candidates based on the metric factors to obtain a set of ranked synonym candidates; and
  logic configured to select the set of selected synonyms from the set of ranked synonym candidates based on the metric factors.
Dependent claims: 10, 11
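One way to picture the module arrangement of claim 9 is as a small set of composable objects. The sketch below is an assumption-laden illustration, not the claimed device: the class names, the use of plain callables for the expansion and clean-up modules, the precomputed `scores` standing in for metric factors derived from query log data, and the `top_k` cutoff are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ReductionModule:
    metric: Callable[[str], float]  # hypothetical metric factor lookup
    top_k: int = 2                  # assumed selection rule: keep the best k

    def rank(self, candidates: List[str]) -> List[str]:
        # Highest metric first; ties broken alphabetically for determinism.
        return sorted(candidates, key=lambda c: (-self.metric(c), c))

    def select(self, candidates: List[str]) -> List[str]:
        return self.rank(candidates)[: self.top_k]


@dataclass
class SynonymGenerator:
    expansion: Callable[[str], List[str]]
    clean_up: Callable[[List[str]], List[str]]
    reduction: ReductionModule

    def run(self, item: str) -> Dict[str, List[str]]:
        candidates = self.expansion(item)           # initial synonym candidates
        filtered = self.clean_up(candidates)        # identifiable noise removed
        selected = self.reduction.select(filtered)  # ranked, then selected
        return {item: selected}                     # synonym-expanded output


# Hypothetical metric factors and module stand-ins.
scores = {"rebel xt": 0.9, "digital rebel xt": 0.8, "camera": 0.1}
generator = SynonymGenerator(
    expansion=lambda item: ["camera", "rebel xt", "", "digital rebel xt"],
    clean_up=lambda cs: [c for c in cs if c.strip()],
    reduction=ReductionModule(metric=lambda c: scores.get(c, 0.0), top_k=2),
)
result = generator.run("canon eos 350d")
print(result)  # {'canon eos 350d': ['rebel xt', 'digital rebel xt']}
```

Keeping ranking and selection as separate methods mirrors the claim's separate "logic configured to rank" and "logic configured to select" elements, so a threshold-based selection could replace `top_k` without touching the ranking.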
12. A method performed by at least one computing device, the method comprising:
receiving an input information item;
identifying a set of page items that are returned as first search results when the input information item is used in a first query to at least one search module;
identifying one or more second queries, other than the first query, that have also been submitted to the at least one search module by users that accessed individual page items from the set of page items when the individual page items were returned as second search results for the one or more second queries, wherein the one or more second queries are identified using query log data reflecting that the one or more second queries were used by the users to access the individual page items;
providing a set of synonym candidates that were submitted to the at least one search engine in the one or more second queries by the users that accessed the individual page items from the set of page items;
identifying, from the set of synonym candidates, a set of selected synonyms that are deemed appropriate proxies for the input information item, wherein the set of selected synonyms are identified by:
  obtaining metric factor values associated with the set of synonym candidates, and
  at least one of applying a threshold to the metric factor values or ranking the synonym candidates according to the metric factor values; and
outputting the set of selected synonyms.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification