Assigning terms of interest to an entity
Abstract
The subject matter of this specification can be embodied in, among other things, a method that includes identifying resources relating to an entity, where each resource includes multiple terms and is included in a corpus of resources relating to multiple entities. Candidate terms from the resources, for potentially associating with the entity, are identified, as is a category associated with the entity. A relative frequency of the candidate terms in the identified resources is compared to a frequency of the candidate terms associated with other entities. Each of the candidate terms is weighted, for example, based on a source of the candidate term and the relative frequency of the candidate term. A weighted frequency of each candidate term is calculated based on the weights, and candidate terms are selected as representative terms for the entity based on the weighted frequency.
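The abstract adds a detail the claims do not spell out: a term's weight may depend on the source it came from. One way to combine source weights with the relative frequency is sketched below. The source names, their weight values, and the rule of summing source weights over a term's occurrences and then multiplying by its relative frequency are all assumptions made for illustration, not the method described in the specification.

```python
from collections import defaultdict

# Hypothetical per-source weights; the abstract does not name sources or values.
SOURCE_WEIGHTS = {"entity_website": 1.0, "directory_listing": 0.75, "user_review": 0.5}


def weighted_frequencies(
    occurrences: list[tuple[str, str]],
    relative_frequency: dict[str, float],
    source_weights: dict[str, float] = SOURCE_WEIGHTS,
) -> dict[str, float]:
    """Sum source weights per term, then scale by the term's relative frequency.

    occurrences holds one (term, source) pair per appearance of a candidate term.
    """
    source_score: dict[str, float] = defaultdict(float)
    for term, source in occurrences:
        # Assumption: a source missing from the table gets a neutral weight of 1.0.
        source_score[term] += source_weights.get(source, 1.0)
    return {term: score * relative_frequency.get(term, 0.0)
            for term, score in source_score.items()}


def select_representative(weighted: dict[str, float], k: int = 3) -> list[str]:
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    return sorted(weighted, key=lambda t: (-weighted[t], t))[:k]
```

For example, with occurrences [("sushi", "entity_website"), ("sushi", "user_review"), ("parking", "user_review"), ("omakase", "entity_website")] and relative frequencies {"sushi": 4.0, "parking": 0.5, "omakase": 10.0}, the weighted frequencies come out as 6.0, 0.25, and 10.0, so select_representative(weighted, 2) returns ["omakase", "sushi"].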
18 Claims
1. A computer-implemented method for determining characteristic terms for an entity, the method comprising:
identifying one or more resources relating to the entity, each resource including a plurality of terms;
identifying a plurality of candidate terms from the plurality of terms;
identifying a business type associated with the entity;
identifying one or more resources associated with other entities having the same business type;
determining, via one or more processors and for each of the plurality of candidate terms:
a first frequency with which the candidate term appears in the one or more resources related to the entity;
a second frequency with which the candidate term appears in the one or more resources associated with other entities in the identified business type;
a relative frequency for the candidate term based on the first frequency and the second frequency; and
a weighted relative frequency for the candidate term based on the relative frequency of the candidate term and how frequently the entity is selected as a search result when presented in response to a search query that includes the candidate term;
identifying, via one or more processors, one or more of the candidate terms as characteristic terms for the entity based on the weighted relative frequency of each candidate term; and
associating the identified characteristic terms with the entity in a data repository.
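The steps of claim 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not say how terms are extracted or filtered, how the frequencies are normalized, how the relative frequency is formed, or how the selection rate enters the weight. Here, hypothetically, each frequency is a term count divided by the total term count, the relative frequency is their ratio with add-one smoothing on the peer side (so terms unseen among peers do not divide by zero), the weight multiplies the relative frequency by (1 + selection rate), where the selection rate is times selected divided by times shown for queries containing the term, and the data repository is a plain dictionary. STOPWORDS, tokenize, counts_for, characteristic_terms, and the sample data are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; the claim does not say how candidate terms are chosen.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens, a stand-in for the unspecified term extraction."""
    return re.findall(r"[a-z]+", text.lower())


def counts_for(resources: list[str]) -> Counter:
    """Count every term across a list of resource texts."""
    return Counter(tok for doc in resources for tok in tokenize(doc))


def characteristic_terms(
    entity_docs: list[str],
    peer_docs: list[str],
    selections: dict[str, tuple[int, int]],
    k: int = 3,
) -> list[str]:
    """Rank candidate terms by selection-weighted relative frequency."""
    ent, peer = counts_for(entity_docs), counts_for(peer_docs)
    ent_total, peer_total = sum(ent.values()), sum(peer.values())
    vocab = len(ent.keys() | peer.keys())
    scores: dict[str, float] = {}
    for term, count in ent.items():
        if term in STOPWORDS or len(term) < 3:
            continue  # not treated as a candidate term
        first = count / ent_total                             # first frequency
        second = (peer[term] + 1) / (peer_total + vocab)      # smoothed second frequency
        relative = first / second                             # relative frequency
        selected, shown = selections.get(term, (0, 0))
        rate = selected / shown if shown else 0.0             # no data: no boost
        scores[term] = relative * (1.0 + rate)                # weighted relative frequency
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Invented sample data: one entity's resources and resources of same-type peers.
entity_docs = ["Omakase sushi bar", "Omakase and kaiseki tasting, sushi bar"]
peer_docs = ["Sushi bar with ramen", "Sushi and ramen bar", "Sushi bar"]
# term -> (times the entity was selected, times it was shown) for queries with the term
selections = {"omakase": (40, 100), "sushi": (10, 100)}

repository: dict[str, list[str]] = {}  # stand-in for the data repository
repository["entity-1"] = characteristic_terms(entity_docs, peer_docs, selections)
print(repository)  # {'entity-1': ['omakase', 'kaiseki', 'tasting']}
```

Terms such as "sushi" and "bar" are frequent for this entity but equally common among its peers, so their relative frequency stays near 1; the comparison against same-type entities is what pushes distinctive terms like "omakase" to the top.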
14. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations including:
identifying a set of candidate terms for potentially associating with an entity, wherein the set of candidate terms are included within one or more resources relating to the entity;
identifying at least one of an entity type category for the entity or an entity location of the entity;
determining a first frequency with which each of the candidate terms appears in the one or more resources relating to the entity;
determining a second frequency with which each of the candidate terms appears in one or more resources relating to the identified entity type category or the identified entity location;
determining a relative frequency for each of the candidate terms based on the first frequency and the second frequency;
determining a weighted relative frequency for each of the candidate terms based on the relative frequency of each candidate term and how frequently the entity is selected as a search result when presented in response to a respective search query that includes each candidate term;
selecting one or more candidate terms as being characteristic terms for the entity based at least in part on the determined weighted relative frequency of each candidate term; and
storing data identifying the selected characteristic terms for the entity as being associated with the entity.
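Claim 14 differs from claim 1 mainly in the comparison corpus: the second frequency may be computed over resources tied to the entity's type category or to its location. A minimal sketch, assuming a hypothetical Entity record with category, location, and resources fields, whitespace tokenization, and, mirroring claim 1's "other entities", exclusion of the target entity's own resources from the reference set:

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity record; the claim names no such structure."""
    entity_id: str
    category: str
    location: str
    resources: list[str] = field(default_factory=list)


def reference_resources(entities: list[Entity], target: Entity,
                        by: str = "category") -> list[str]:
    """Resources of other entities sharing the target's category or location."""
    if by not in ("category", "location"):
        raise ValueError("by must be 'category' or 'location'")
    key = getattr(target, by)
    return [doc
            for e in entities
            if e.entity_id != target.entity_id and getattr(e, by) == key
            for doc in e.resources]


def second_frequency(term: str, resources: list[str]) -> float:
    """Share of whitespace-separated tokens in resources that equal term."""
    tokens = [tok for doc in resources for tok in doc.lower().split()]
    return tokens.count(term) / len(tokens) if tokens else 0.0


# Invented sample entities.
a = Entity("a", "restaurant", "Austin", ["tacos and brisket"])
b = Entity("b", "restaurant", "Austin", ["brisket plates"])
c = Entity("c", "restaurant", "Denver", ["green chile"])
d = Entity("d", "hardware", "Austin", ["nails and brisket smokers"])
everyone = [a, b, c, d]
print(second_frequency("brisket", reference_resources(everyone, a, "category")))  # 0.25
print(second_frequency("brisket", reference_resources(everyone, a, "location")))  # 2 of 6 tokens
```

The choice of reference set changes the baseline: "brisket" is a rarer term among Austin businesses of every type than one might expect among restaurants alone, so which corpus is used affects which terms look characteristic.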
16. A system comprising:
one or more memories storing instructions; and
one or more processors configured to execute the instructions stored in the one or more memories in order to:
identify a set of candidate terms for potentially associating with an entity, wherein the set of candidate terms are included within one or more resources that include information about the entity; and
identify at least one of an entity type category for the entity or an entity location of the entity;
determine a first frequency with which each of the candidate terms appears in the one or more resources relating to the entity;
determine a second frequency with which each of the candidate terms appears in one or more resources relating to the identified entity type category or the identified entity location;
determine a relative frequency for each of the candidate terms based on the first frequency and the second frequency;
determine a weighted relative frequency for each of the candidate terms based on the relative frequency of each candidate term and how frequently the entity is selected as a search result when presented in response to a respective search query that includes each candidate term;
select one or more candidate terms as being characteristic terms for the entity based on the weighted relative frequency of each candidate term; and
store data associating the characteristic terms with information about the entity in a data repository.
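The final element of claim 16 stores data associating the characteristic terms with information about the entity in a data repository. The claim does not name a storage technology; the sketch below assumes a relational store and uses Python's built-in sqlite3 module with a hypothetical two-table schema (entity, characteristic_term) and a hypothetical reverse lookup, entities_for_term, that lists entities associated with a term, highest score first.

```python
import sqlite3


def open_repository(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a hypothetical two-table schema and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS entity (
            entity_id TEXT PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS characteristic_term (
            entity_id TEXT NOT NULL REFERENCES entity(entity_id),
            term      TEXT NOT NULL,
            score     REAL NOT NULL,
            PRIMARY KEY (entity_id, term)
        );
        """
    )
    return conn


def store_terms(conn: sqlite3.Connection, entity_id: str, name: str,
                scored_terms: dict[str, float]) -> None:
    """Associate scored characteristic terms with an entity."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT OR REPLACE INTO entity VALUES (?, ?)", (entity_id, name))
        conn.executemany(
            "INSERT OR REPLACE INTO characteristic_term VALUES (?, ?, ?)",
            [(entity_id, term, score) for term, score in scored_terms.items()],
        )


def entities_for_term(conn: sqlite3.Connection, term: str) -> list[str]:
    """Names of entities associated with term, highest score first."""
    rows = conn.execute(
        "SELECT e.name FROM characteristic_term t JOIN entity e USING (entity_id) "
        "WHERE t.term = ? ORDER BY t.score DESC",
        (term,),
    )
    return [name for (name,) in rows]


conn = open_repository()
store_terms(conn, "e1", "Kaiseki House", {"omakase": 5.6, "kaiseki": 2.0})
store_terms(conn, "e2", "Sushi Stop", {"omakase": 1.2, "sushi": 3.0})
print(entities_for_term(conn, "omakase"))  # ['Kaiseki House', 'Sushi Stop']
```

Keeping the weighted score next to each term, rather than only the term, lets a later lookup rank entities that share a characteristic term.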
Specification