Identifying topical entities
First Claim
1. A computer-implemented method comprising:
identifying a plurality of entities that are associated with a particular web resource, each entity being a word or a phrase; and
determining whether one entity from the plurality of entities is a topical entity that represents a predominant topic of the resource, wherein the determining comprises:
generating respective search queries for each of the identified entities;
obtaining, from a search engine, respective search results responsive to each of the generated search queries, wherein each search result has a respective ranking score;
determining that a search result referencing the particular web resource appears above a specified rank in a ranking of the search results responsive to at least one generated search query for at least one entity; and
in response to determining that a search result referencing the particular web resource appears above a specified rank in the ranking of the search results responsive to at least one generated search query for at least one entity:
if the particular web resource appears above a specified rank in the ranking of the search results responsive to the generated search query for exactly one entity, designating the one entity as the topical entity; and
if a search result referencing the particular web resource appears above a specified rank in the ranking of multiple search results responsive to multiple generated search queries, each search query and responsive results associated with more than one entity, designating one entity that generated a search result with a highest ranking score, among the multiple search results referencing the particular web resource, as the topical entity for the particular web resource.
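The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is only one possible reading of the claim language, not the patented implementation: the names `SearchResult`, `find_topical_entity`, and `max_rank` are hypothetical, the query is assumed to be the entity text itself, and "above a specified rank" is assumed to mean "among the first `max_rank` results of a best-first list".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class SearchResult:
    """Hypothetical search result: the referenced URL and its ranking score."""
    url: str
    score: float


# Hypothetical search-engine interface: query in, results out (best first).
SearchFn = Callable[[str], Sequence[SearchResult]]


def find_topical_entity(
    resource_url: str,
    entities: Sequence[str],
    search: SearchFn,
    max_rank: int = 3,
) -> Optional[str]:
    """Return the entity designated as topical for resource_url, or None."""
    hits: Dict[str, float] = {}
    for entity in entities:
        query = entity  # assumption: the query is simply the entity text
        results = search(query)
        # Look only at results ranked above the specified rank.
        for result in results[:max_rank]:
            if result.url == resource_url:
                hits[entity] = result.score
                break
    if not hits:
        return None  # the resource never ranked highly enough
    if len(hits) == 1:
        return next(iter(hits))  # exactly one entity qualified
    # Several entities qualified: pick the one with the highest ranking score.
    return max(hits, key=hits.get)
```

In this sketch the search engine is passed in as a plain callable, so the selection logic can be exercised with canned results instead of a live service.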
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for identifying topical entities. In one aspect, a method includes obtaining a plurality of entities that are associated with a first resource; for one or more of the identified entities, receiving search results for a search query derived from the entity; determining that search results for a search query including a particular entity include a specific type of search results; and determining that the particular entity is a topical entity of the first resource based at least in part on the particular entity appearing in a title or a resource locator of the first resource, wherein the topical entity of the first resource represents a predominant topic of the first resource.
12 Claims
4. A computer-implemented method comprising:
identifying a topical entity for a particular web resource, comprising:
identifying multiple entities that are associated with the particular web resource, each entity being a word or a phrase;
generating respective search queries for each of the identified entities;
obtaining, from a search engine, respective search results responsive to each of the generated search queries, wherein each search result has a respective ranking score;
determining that, in multiple different search result sets responsive to respective different search queries generated from the entities associated with the particular web resource, at least one search result references the particular web resource within a threshold number of top-ranked search results, wherein the search result is responsive to a search query generated for a particular entity associated with the particular web resource;
in response to determining that, in multiple different search result sets responsive to respective search queries generated from the entities associated with the particular web resource, a single search result references the particular web resource within the threshold number of top-ranked search results, wherein the search result is responsive to a search query generated for a particular entity associated with the particular web resource, designating the particular entity as the topical entity for the particular web resource; and
in response to determining that, in multiple different search result sets responsive to respective search queries generated from the entities associated with the particular web resource, a respective search result references the particular web resource within the threshold number of top-ranked search results, wherein the respective search result is responsive to the respective one of multiple search queries, each generated for a respective entity associated with the particular web resource, designating one entity that generated a search result with a highest ranking score, among the multiple search results referencing the particular web resource, as the topical entity for the particular web resource.
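Claim 4 frames the test as a threshold on top-ranked results across several result sets. The sketch below isolates that threshold check and the two designation branches. It is a hedged illustration only: the `(url, score)` tuple shape, the function names `top_n_hit` and `designate`, and the default threshold of 10 are all assumptions, not details taken from the patent.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical result shape: (url, ranking score), lists ordered best first.
Result = Tuple[str, float]


def top_n_hit(results: List[Result], resource_url: str, n: int) -> Optional[Result]:
    """Return the result referencing resource_url if it is within the top n."""
    for url, score in results[:n]:
        if url == resource_url:
            return (url, score)
    return None


def designate(
    result_sets: Dict[str, List[Result]],
    resource_url: str,
    n: int = 10,
) -> Optional[str]:
    """result_sets maps each entity to the results for its generated query."""
    scores: Dict[str, float] = {}
    for entity, results in result_sets.items():
        hit = top_n_hit(results, resource_url, n)
        if hit is not None:
            scores[entity] = hit[1]
    if not scores:
        return None
    if len(scores) == 1:
        # Single qualifying result set: designate its entity directly.
        return next(iter(scores))
    # Multiple qualifying result sets: highest ranking score wins.
    return max(scores, key=scores.get)
```

Keeping `top_n_hit` separate makes the threshold easy to vary without touching the designation logic.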
8. A computer-implemented method for associating content comprising:
identifying a topical entity for a particular web resource, comprising:
identifying multiple entities that are associated with the particular web resource, each entity being a word or a phrase;
generating respective search queries for each of the identified entities;
obtaining, from a search engine, respective search results responsive to each of the generated search queries, wherein each search result has a respective ranking score;
determining that search results responsive to a search query generated from a particular entity include a particular type of search result;
determining that the particular entity occurs in a title or a resource locator of the particular web resource;
in response to determining that the search results responsive to the search query generated from the particular entity include the particular type of search result and that the particular entity occurs in the title or the resource locator of the particular web resource, designating the particular entity as a topical entity of the particular web resource; and
in response to determining that, in multiple different search result sets responsive to respective search queries generated from the entities associated with the particular web resource, multiple search results reference the particular web resource within the threshold number of top-ranked search results, wherein the web resource is responsive to multiple search queries, each generated for a particular entity associated with the particular web resource, designating one entity that generated a search result with a highest ranking score, among the multiple search results referencing the particular web resource, as the topical entity for the particular web resource.
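Claim 8 adds a second route to designation: the entity's search results include a particular type of result, and the entity occurs in the resource's title or resource locator. A minimal sketch of that combined check follows. The claim does not say which result type is meant or how the occurrence is matched, so the `kind` label, the case-insensitive substring match, and the treatment of `-` and `_` in URLs as word separators are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence
from urllib.parse import unquote, urlparse


@dataclass(frozen=True)
class TypedResult:
    """Hypothetical search result carrying a result-type label."""
    url: str
    score: float
    kind: str  # e.g. "web", "image", "news"; labels are illustrative only


def occurs_in_title_or_url(entity: str, title: str, url: str) -> bool:
    """Case-insensitive check for the entity in the title or the URL."""
    needle = entity.casefold()
    if needle in title.casefold():
        return True
    parsed = urlparse(url)
    haystack = unquote(parsed.netloc + parsed.path).casefold()
    # Assumption: URLs often use '-' or '_' where the entity has spaces.
    normalized = haystack.replace("-", " ").replace("_", " ")
    return needle in haystack or needle in normalized


def is_topical_by_type(
    entity: str,
    title: str,
    url: str,
    results: Sequence[TypedResult],
    required_kind: str,
) -> bool:
    """True when both conditions of the type-based route are met."""
    has_kind = any(r.kind == required_kind for r in results)
    return has_kind and occurs_in_title_or_url(entity, title, url)
```

Both conditions must hold together in this reading; either one alone does not designate the entity.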
Specification