Identifying entity synonyms
Abstract
Embodiments for identifying an entity synonym of an entity are described. A query log is stored in a database located on at least one computing device. A candidate generation module can select a candidate query in the query log that shares a click on a URL with the entity. A correlated tag module can generate a set of phrase-tag pairs for the entity and the candidate query and measure a mutual information value for each phrase-tag pair. A candidate filtering module can determine a click similarity value between the candidate query and the entity based on a set of URLs selected in the search engine results and a tag similarity value based on the mutual information values. A candidate query is selected as an entity synonym if the click similarity value and the tag similarity value are greater than predetermined thresholds respectively.
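The candidate generation and final filtering steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how click similarity is computed, so the weighted Jaccard overlap used here is an assumption, as are the function names, the `query_log` layout (query mapped to per-URL click counts), and the threshold defaults of 0.5.

```python
from collections import Counter


def generate_candidates(entity, query_log):
    """Return queries that share at least one clicked URL with the entity.

    query_log is assumed to map each query string to a Counter of
    {url: click_count}; this layout is illustrative only.
    """
    entity_urls = set(query_log.get(entity, ()))
    return [
        query
        for query, clicks in query_log.items()
        if query != entity and entity_urls & set(clicks)
    ]


def click_similarity(clicks_a, clicks_b):
    """Weighted Jaccard overlap of two click-count distributions.

    This particular measure is an assumption; the source only says the
    value is based on the URLs selected in the search results.
    """
    urls = set(clicks_a) | set(clicks_b)
    shared = sum(min(clicks_a[u], clicks_b[u]) for u in urls)
    total = sum(max(clicks_a[u], clicks_b[u]) for u in urls)
    return shared / total if total else 0.0


def is_entity_synonym(click_sim, tag_sim, click_threshold=0.5, tag_threshold=0.5):
    """Accept a candidate only if both similarities exceed their thresholds."""
    return click_sim > click_threshold and tag_sim > tag_threshold


# Hypothetical query log used only for illustration.
query_log = {
    "canon eos 350d": Counter({"u1": 8, "u2": 2}),
    "digital rebel xt": Counter({"u1": 6, "u2": 2, "u3": 2}),
    "nikon d40": Counter({"u9": 5}),
}
candidates = generate_candidates("canon eos 350d", query_log)
similarity = click_similarity(
    query_log["canon eos 350d"], query_log["digital rebel xt"]
)
```

Requiring both thresholds to be exceeded mirrors the abstract's final sentence: a candidate with strong click overlap is still rejected when its tag similarity is low.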
20 Claims
1. A method performed by a computing device, the method comprising:
selecting, from a query log stored in a database, a first query associated with a first set of Uniform Resource Locators (URLs) returned by a search engine responsive to the first query and a second query associated with a second set of URLs returned by the search engine responsive to the second query;
measuring a click similarity between the first query and the second query, the click similarity being measured based on similarity of first click behavior of users with respect to the first set of URLs to second click behavior of the users with respect to the second set of URLs;
measuring first mutual information values between first phrases and first tags of the first query based on the first click behavior, wherein the first phrases and the first tags comprise different tokens of the first query;
measuring second mutual information values between second phrases and second tags of the second query based on the second click behavior, wherein the second phrases and the second tags comprise different tokens of the second query;
measuring a tag similarity between the first query and the second query using the first mutual information values and the second mutual information values;
determining whether the first query and the second query are synonyms using both the click similarity and the tag similarity; and
when the first query and the second query are determined to be synonyms, storing the second query as a query synonym of the first query.
Dependent claims: 2, 3, 4, 5, 6, 7
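The mutual information and tag similarity steps recited in claim 1 can be sketched as follows. This is a hedged illustration, not the claimed method: it substitutes pointwise mutual information (PMI) computed from click-weighted co-occurrence counts, and compares tag vectors with cosine similarity. The `(phrase, tag, click_count)` input format, the decision to keep only positively correlated tags, and all function names are assumptions made for the example.

```python
import math
from collections import defaultdict


def phrase_tag_pmi(observations):
    """Compute PMI for each (phrase, tag) pair.

    observations is an iterable of (phrase, tag, click_count) triples,
    assumed to be pre-split so that phrase and tag are different tokens
    of the same query.
    """
    joint = defaultdict(float)
    phrase_total = defaultdict(float)
    tag_total = defaultdict(float)
    grand_total = 0.0
    for phrase, tag, clicks in observations:
        joint[(phrase, tag)] += clicks
        phrase_total[phrase] += clicks
        tag_total[tag] += clicks
        grand_total += clicks
    # PMI(p, t) = log( P(p, t) / (P(p) * P(t)) )
    return {
        (p, t): math.log(n * grand_total / (phrase_total[p] * tag_total[t]))
        for (p, t), n in joint.items()
        if n > 0
    }


def tag_vector(pmi, phrase):
    """Collect the positively correlated tags of one phrase as {tag: PMI}."""
    return {t: v for (p, t), v in pmi.items() if p == phrase and v > 0}


def tag_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse tag vectors."""
    dot = sum(v * vec_b.get(t, 0.0) for t, v in vec_a.items())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical click-weighted observations used only for illustration.
observations = [
    ("canon eos 350d", "manual", 4),
    ("canon eos 350d", "battery", 4),
    ("digital rebel xt", "manual", 4),
    ("digital rebel xt", "battery", 4),
    ("nikon d40", "lens", 8),
]
pmi = phrase_tag_pmi(observations)
canon = tag_vector(pmi, "canon eos 350d")
rebel = tag_vector(pmi, "digital rebel xt")
nikon = tag_vector(pmi, "nikon d40")
```

In this toy data the two camera names attract the same tags with equal strength, so their tag vectors point in the same direction, while the third phrase shares no tags with either.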
8. A system comprising:
a machine-readable memory device or storage device storing instructions; and
a hardware processor configured to execute the instructions, wherein the instructions, when executed by the hardware processor, cause the hardware processor to:
select, from a query log stored in a database, a first query associated with a first set of Uniform Resource Locators (URLs) returned by a search engine responsive to the first query and a second query associated with a second set of URLs returned by the search engine responsive to the second query;
measure a click similarity between the first query and the second query, the click similarity being measured based on similarity of first click behavior of users with respect to the first set of URLs to second click behavior of the users with respect to the second set of URLs;
measure first mutual information values between first phrases and first tags of the first query based on the first click behavior, wherein the first phrases and the first tags comprise different tokens of the first query;
measure second mutual information values between second phrases and second tags of the second query based on the second click behavior, wherein the second phrases and the second tags comprise different tokens of the second query;
measure a tag similarity between the first query and the second query using the first mutual information values and the second mutual information values;
determine whether the first query and the second query are synonyms using both the click similarity and the tag similarity; and
when the first query and the second query are determined to be synonyms, store the second query as a query synonym of the first query.
Dependent claims: 9, 10, 11
12. A hardware memory device or hardware storage device comprising instructions which, when executed by a hardware processor, cause the hardware processor to perform acts comprising:
selecting, from a query log stored in a database, a first query associated with a first set of Uniform Resource Locators (URLs) returned by a search engine responsive to the first query and a second query associated with a second set of URLs returned by the search engine responsive to the second query;
measuring a click similarity between the first query and the second query, the click similarity being measured based on similarity of first click behavior of users with respect to the first set of URLs to second click behavior of the users with respect to the second set of URLs;
measuring first mutual information values between first phrases and first tags of the first query based on the first click behavior, wherein the first phrases and the first tags comprise different tokens of the first query;
measuring second mutual information values between second phrases and second tags of the second query based on the second click behavior, wherein the second phrases and the second tags comprise different tokens of the second query;
measuring a tag similarity between the first query and the second query using the first mutual information values and the second mutual information values;
determining whether the first query and the second query are synonyms using both the click similarity and the tag similarity; and
when the first query and the second query are determined to be synonyms, storing the second query as a query synonym of the first query.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification