Robust discovery of entity synonyms using query logs
Abstract
A similarity analysis framework is described herein which leverages two or more similarity analysis functions to generate synonyms for an entity reference string r_e. The functions are selected such that the synonyms that are generated by the framework satisfy a core set of synonym-related properties. The functions operate by leveraging query log data. One similarity analysis function takes into consideration the strength of similarity between a particular candidate string s_e and an entity reference string r_e even in the presence of sparse query log data, while another function takes into account the classes of s_e and r_e. The framework also provides indexing mechanisms that expedite its computations. The framework also provides a reduction module for converting long entity reference strings into shorter strings, where each shorter string (if found) contains a subset of the terms in its longer counterpart.
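The joint use of two similarity functions over query log data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the specification: the log format (query text paired with a clicked document id), the two stand-in functions (Jaccard overlap of clicked documents as a strength-of-similarity signal, and Jaccard overlap of trailing query words as a crude class signal), the thresholds, and all function names.

```python
# Hypothetical query log: (query text, clicked document id) pairs.
QUERY_LOG = [
    ("microsoft excel", "d1"),
    ("microsoft excel", "d2"),
    ("ms excel", "d1"),
    ("ms excel", "d2"),
    ("ms excel download", "d3"),
    ("microsoft excel download", "d3"),
    ("excel tutorial", "d4"),
    ("ms word", "d5"),
    ("ms word download", "d6"),
]


def clicked_docs(log, s):
    """Documents clicked for queries that are exactly the string s."""
    return {doc for query, doc in log if query == s}


def contexts(log, s):
    """Words that follow s in longer queries (assumed stand-in for class)."""
    found = set()
    for query, _ in log:
        if query != s and query.startswith(s + " "):
            found.update(query[len(s):].split())
    return found


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def click_similarity(log, s_e, r_e):
    """Assumed first function: overlap of the documents clicked for each string."""
    return jaccard(clicked_docs(log, s_e), clicked_docs(log, r_e))


def context_similarity(log, s_e, r_e):
    """Assumed second function: overlap of the words that follow each string."""
    return jaccard(contexts(log, s_e), contexts(log, r_e))


def valid_synonyms(log, r_e, candidates, click_min=0.5, context_min=0.5):
    """Keep only candidates that pass BOTH (hypothetical) thresholds."""
    return [
        s_e for s_e in candidates
        if click_similarity(log, s_e, r_e) >= click_min
        and context_similarity(log, s_e, r_e) >= context_min
    ]


print(valid_synonyms(QUERY_LOG, "microsoft excel",
                     ["ms excel", "ms word", "excel tutorial"]))
# prints ['ms excel']
```

In this toy log, "ms word" has the same trailing context as "microsoft excel" ("download") but shares no clicked documents with it, so it passes the context test and fails the click test. Requiring both scores to clear their thresholds is what rejects it, which is the sense in which the properties are provided jointly.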
20 Claims
1. A method, implemented by one or more computer processors, comprising:
providing an entity reference string r_e that is associated with an entity;
providing a set of candidate strings S_e, comprising at least one candidate string s_e;
generating, using query log data, similarity score information for respective pairs of individual candidate strings of the set of candidate strings S_e and the entity reference string r_e using at least two similarity analysis functions; and
determining, using the similarity score information, whether the individual candidate strings are valid synonyms of the entity reference string r_e, where a valid synonym refers to the entity and satisfies a core set of synonym-related properties jointly provided by said at least two similarity analysis functions.
Dependent claims: 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 16, 18, 19
2. A computer-implemented system comprising:
a similarity analysis framework including at least two similarity analysis modules and a filtering module, wherein:
the similarity analysis framework is configured to provide an entity reference string that is associated with an entity and provide a set of candidate strings,
the at least two similarity analysis modules are configured to generate, using query log data, similarity score information for respective pairs of the entity reference string and individual candidate strings of the set of candidate strings, and
the filtering module is configured to determine, using the similarity score information, whether the individual candidate strings are valid synonyms of the entity reference string, where a valid synonym refers to the entity and satisfies a core set of synonym-related properties jointly provided by said at least two similarity analysis modules; and
a processor that executes computer-executable instructions associated with the similarity analysis framework.
Dependent claims: 10, 15, 17, 20
3. Computer readable storage media storing computer readable instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform acts comprising:
providing an entity reference string that is associated with an entity;
providing a set of candidate strings;
generating, using query log data, similarity score information for respective pairs of the entity reference string and individual candidate strings of the set of candidate strings using at least two similarity analysis functions; and
determining, using the similarity score information, whether the individual candidate strings are valid synonyms of the entity reference string, where a valid synonym refers to the entity and satisfies a core set of synonym-related properties jointly provided by said at least two similarity analysis functions.
Specification