Finding related entity results for search queries
First Claim
1. A computer-implemented search system, comprising:
an extraction component configured to extract named entities from documents having corresponding document identifiers;
an indexing component configured to create and store:
a document index that indexes the named entities by the document identifiers to indicate whether the named entities occur in the documents; and
an inverted index that indexes the document identifiers by word identifiers of words that occur in the documents, wherein the document index and the inverted index are at least partly created in advance of receiving a keyword query from a user;
a document-based search component configured to:
receive the keyword query from the user, the keyword query comprising keywords; and
provide the keywords to the inverted index to identify individual document identifiers corresponding to individual matching documents that match one or more of the keywords of the keyword query;
a retrieval component configured to:
retrieve direct matching document entities that directly match at least one of the keywords of the keyword query; and
provide the individual document identifiers to the document index and retrieve co-occurring related named entities from the document index, wherein the co-occurring related named entities do not directly match the keywords of the keyword query but occur in the individual matching documents with the direct matching document entities; and
a scoring component configured to:
compute aggregate relevance scores for the co-occurring related named entities that occur in the individual matching documents; and
return one or more of the co-occurring related named entities in response to the keyword query based on the aggregate relevance scores; and
at least one processing unit configured to execute one or more of the extraction component, the indexing component, the document-based search component, the retrieval component, or the scoring component.
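The indexing and query-time flow of claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. Entities are supplied pre-extracted. Tokenization is lowercase whitespace splitting. A document matches if it contains any keyword. An entity "directly matches" when one of its tokens equals a keyword. `DOCS`, `tokenize`, `build_indexes`, and `find_related_entities` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical corpus: docID -> (text, entities already found by an extractor).
DOCS = {
    1: ("Satya Nadella leads Microsoft from Redmond",
        ["Satya Nadella", "Microsoft", "Redmond"]),
    2: ("Microsoft released a new Windows build", ["Microsoft", "Windows"]),
    3: ("Seattle weather report", ["Seattle"]),
}


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def build_indexes(docs):
    """Build both indexes offline, before any keyword query arrives."""
    vocab = {}                         # word -> word identifier
    document_index = {}                # docID -> set of named entities
    inverted_index = defaultdict(set)  # word identifier -> set of docIDs
    for doc_id, (text, entities) in docs.items():
        document_index[doc_id] = set(entities)
        for word in tokenize(text):
            word_id = vocab.setdefault(word, len(vocab))
            inverted_index[word_id].add(doc_id)
    return vocab, document_index, inverted_index


def find_related_entities(query, vocab, document_index, inverted_index):
    """Return (matching docIDs, direct entities, co-occurring related entities)."""
    keywords = set(tokenize(query))
    matching = set()
    for kw in keywords:  # a document matches if it contains any keyword
        word_id = vocab.get(kw)
        if word_id is not None:
            matching |= inverted_index[word_id]
    direct, related = set(), set()
    for doc_id in matching:
        entities = document_index[doc_id]
        # An entity matches directly if any of its tokens is a keyword.
        doc_direct = {e for e in entities if keywords & set(tokenize(e))}
        direct |= doc_direct
        if doc_direct:  # related entities must co-occur with a direct match
            related |= entities - doc_direct
    return matching, direct, related
```

With this toy corpus, the query `"Microsoft"` matches documents 1 and 2, finds `Microsoft` as the direct match, and returns `Satya Nadella`, `Redmond`, and `Windows` as co-occurring related entities. The query `"weather"` matches document 3 but returns no related entities, because that document contains no entity that directly matches the query.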
Abstract
Architecture for finding related entities for web search queries. An extraction component takes a document as input and outputs all the mentions (or occurrences) of named entities, such as names of people, organizations, locations, and products, in the document, as well as entity metadata. An indexing component takes a document identifier (docID) and the set of mentions of named entities, and stores and indexes the information for retrieval. A document-based search component takes a keyword query and returns the docIDs of the top documents matching the query. A retrieval component takes a docID as input, accesses the information stored by the indexing component, and returns the set of mentions of named entities in the document. This information is then passed to an entity scoring and thresholding component that computes an aggregate score for each entity and selects the entities to return to the user.
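The extraction component in the abstract is described only by its inputs and outputs. As a minimal sketch, assuming a fixed dictionary (gazetteer) of known names in place of the statistical named-entity recognizer a real system would likely use, the following Python returns each mention together with two pieces of metadata: its entity type and its character offsets. `Mention`, `GAZETTEER`, and `extract_mentions` are hypothetical names, and the entity types are illustrative.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One occurrence of a named entity plus simple metadata."""
    entity: str
    entity_type: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


# Hypothetical gazetteer standing in for a trained named-entity recognizer.
GAZETTEER = {
    "Satya Nadella": "PERSON",
    "Microsoft": "ORGANIZATION",
    "Redmond": "LOCATION",
    "Windows": "PRODUCT",
}


def extract_mentions(text, gazetteer=GAZETTEER):
    """Return every gazetteer match in `text`, ordered by position."""
    mentions = []
    for name, entity_type in gazetteer.items():
        # Word boundaries keep a name from matching inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            mentions.append(Mention(name, entity_type, match.start(), match.end()))
    return sorted(mentions, key=lambda m: m.start)
```

A lookup table keeps the sketch self-contained. The output shape (one record per mention, repeats kept) is what an indexing component would store under the document's docID.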
20 Claims
10. A method comprising:
extracting named entities from documents having corresponding document identifiers;
creating and storing:
a document index that indexes the named entities by the document identifiers to indicate whether the named entities occur in the documents; and
an inverted index that indexes the document identifiers by word identifiers of words that occur in the documents, wherein the document index and the inverted index are at least partly created in advance of receiving a keyword query from a user;
receiving the keyword query from the user, the keyword query comprising keywords;
providing the keywords to the inverted index to identify individual document identifiers corresponding to individual matching documents that match one or more of the keywords of the keyword query;
retrieving direct matching document entities that directly match at least one of the keywords of the keyword query;
providing the individual document identifiers to the document index and retrieving co-occurring related named entities from the document index, wherein the co-occurring related named entities do not directly match the keywords of the keyword query but occur in the individual matching documents with the direct matching document entities;
computing aggregate relevance scores for the co-occurring related named entities that occur in the individual matching documents; and
returning one or more of the co-occurring related named entities in response to the keyword query based on the aggregate relevance scores, wherein at least the computing the aggregate relevance scores is performed using a processing unit.
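Neither the claims nor the abstract above fix a formula for the aggregate relevance score. The following is a minimal sketch under an assumed rule. Each matching document carries a relevance score from the document-based search, and an entity's aggregate score is the sum, over those documents, of the document score times the entity's mention count in that document. A threshold and a top-k cutoff then mirror the scoring and thresholding component described in the abstract. `score_entities`, `threshold`, and `k` are hypothetical names.

```python
from collections import Counter


def score_entities(top_docs, mentions_by_doc, threshold=1.0, k=10):
    """Aggregate entity scores over top documents, then threshold and rank.

    top_docs: list of (doc_id, doc_score) pairs from the document search.
    mentions_by_doc: doc_id -> list of entity mentions (repeats allowed).
    Assumed rule: score(entity) = sum of doc_score * mentions in that doc.
    """
    totals = Counter()
    for doc_id, doc_score in top_docs:
        for entity, count in Counter(mentions_by_doc.get(doc_id, [])).items():
            totals[entity] += doc_score * count
    # Thresholding: drop entities whose aggregate score is too low.
    kept = [(entity, score) for entity, score in totals.items() if score >= threshold]
    # Highest score first; ties broken alphabetically for stable output.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:k]
```

Suppose the inputs are document scores `[(1, 0.9), (2, 0.6), (4, 0.2)]`, mentions `{1: ["Microsoft", "Satya Nadella", "Microsoft"], 2: ["Microsoft", "Windows"], 4: ["Windows"]}`, and `threshold=0.85`. The sketch then keeps `Microsoft` (about 2.4) and `Satya Nadella` (0.9) and drops `Windows` (about 0.8). In the claimed flow, the mentions passed in would presumably be limited to the co-occurring related entities.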
15. One or more computer-readable memory devices or storage devices having computer-readable instructions stored thereon that, when executed by one or more computing devices, cause the one or more computing devices to perform:
extracting named entities from documents having corresponding document identifiers;
creating and storing:
a document index that indexes the named entities by the document identifiers to indicate whether the named entities occur in the documents; and
an inverted index that indexes the document identifiers by word identifiers of words that occur in the documents, wherein the document index and the inverted index are at least partly created in advance of receiving a keyword query from a user;
receiving the keyword query from the user, the keyword query comprising keywords;
providing the keywords to the inverted index to identify individual document identifiers corresponding to individual matching documents that match one or more of the keywords of the keyword query;
retrieving direct matching document entities that directly match at least one of the keywords of the keyword query;
providing the individual document identifiers to the document index and retrieving co-occurring related named entities from the document index, wherein the co-occurring related named entities do not directly match the keywords of the keyword query but occur in the individual matching documents with the direct matching document entities;
computing aggregate relevance scores for the co-occurring related named entities that occur in the individual matching documents; and
returning one or more of the co-occurring related named entities in response to the keyword query based on the aggregate relevance scores.
Specification