Finding Related Entities For Search Queries

US 20080306908A1
Filed: 06/05/2007
Published: 12/11/2008
Est. Priority Date: 06/05/2007
Status: Active Grant


2 Assignments

0 Petitions

Abstract

Architecture for finding related entities for web search queries. An extraction component takes a document as input and outputs all the mentions (or occurrences) of named entities in the document, such as names of people, organizations, locations, and products, as well as entity metadata. An indexing component takes a document identifier (docID) and the set of mentions of named entities, and stores and indexes the information for retrieval. A document-based search component takes a keyword query and returns the docIDs of the top documents matching the query. A retrieval component takes a docID as input, accesses the information stored by the indexing component, and returns the set of mentions of named entities in the document. This information is then passed to an entity scoring and thresholding component that computes an aggregate score for each entity and selects the entities to return to the user.
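The component flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the gazetteer in `KNOWN_ENTITIES`, the class `EntityIndex`, the term-count ranking in `search`, and the `min_mentions` threshold are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from collections import Counter, defaultdict

# Hypothetical gazetteer standing in for a real named-entity extractor.
KNOWN_ENTITIES = {
    "Microsoft": "organization",
    "Seattle": "location",
    "Bill Gates": "person",
}

def extract_mentions(text):
    """Extraction: return (name, type, character offset) for every mention."""
    mentions = []
    for name, etype in KNOWN_ENTITIES.items():
        start = text.find(name)
        while start != -1:
            mentions.append((name, etype, start))
            start = text.find(name, start + 1)
    return sorted(mentions, key=lambda m: m[2])

class EntityIndex:
    """Indexing and retrieval: mentions stored and looked up by docID."""
    def __init__(self):
        self._by_doc = defaultdict(list)

    def add(self, doc_id, mentions):
        self._by_doc[doc_id].extend(mentions)

    def retrieve(self, doc_id):
        return list(self._by_doc.get(doc_id, []))

def search(docs, query, top_k=2):
    """Toy document search: rank docIDs by count of query-term occurrences."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        hits = sum(words.count(t) for t in terms)
        if hits:
            scored.append((hits, doc_id))
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [doc_id for _, doc_id in scored[:top_k]]

def related_entities(docs, index, query, min_mentions=2):
    """Scoring and thresholding: here simply a mention count with a cutoff."""
    counts = Counter()
    for doc_id in search(docs, query):
        for name, _etype, _pos in index.retrieve(doc_id):
            counts[name] += 1
    return [(n, c) for n, c in counts.most_common() if c >= min_mentions]

docs = {
    "d1": "Microsoft was founded by Bill Gates and is based near Seattle software",
    "d2": "Bill Gates discussed software at Microsoft",
    "d3": "Seattle weather is rainy",
}
index = EntityIndex()
for doc_id, text in docs.items():
    index.add(doc_id, extract_mentions(text))

print(related_entities(docs, index, "software"))
```

Note that the entity lookup never re-reads document text at query time; it only needs the docIDs returned by the search step, which is the point of storing mentions under the docID.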

111 Citations

20 Claims

1. A computer-implemented search system, comprising:
- an indexing component for storing and indexing document entities of documents, the documents associated with corresponding document identifiers;
  
  a document search component for processing a keyword query and returning document identifiers of documents associated with results of the query; and
  
  a retrieval component for retrieving the document entities based on the document identifiers.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The system of claim 1, wherein the document entities include names of at least one of people, organizations, locations, or products.
  - 3. The system of claim 1, wherein the document entities are associated with at least one of an entity name, entity type, or entity position in the document.
  - 4. The system of claim 1, wherein the document search component processes the keyword query and returns top documents of the query results.
  - 5. The system of claim 4, wherein the retrieval component retrieves the document entities for the top documents and number of mentions of the document entities.
  - 6. The system of claim 5, further comprising a scoring and threshold component for receiving the document entities and number of mentions of the document entities for the top documents and computing an aggregate score for each of the document entities.
  - 7. The system of claim 1, further comprising a scoring and threshold component for selecting one or more entities to return for presentation, and a machine learning and reasoning component for computing a threshold on which selection of the one or more entities to return is based.
  - 8. The system of claim 1, further comprising a scoring and threshold component for computing a score based on proximity of an occurrence of an entity to a query keyword in a document.
  - 9. The system of claim 1, further comprising an extraction component for extracting all occurrences of the entities and entity metadata.
  - 10. The system of claim 9, wherein the entity metadata includes entity name, entity type and position of entity in the document relative to a query keyword.
  - 11. The system of claim 1, further comprising a direct matching component for processing the query.
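Claims 3, 8, and 10 refer to entity metadata (name, type, position) and to a score based on how close an entity occurrence is to a query keyword. A rough sketch of one way to represent this is shown below. The `Mention` record, the token-offset positions, and the `1 / (1 + distance)` decay are assumptions made for illustration; the claims do not specify a particular proximity function.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    name: str      # entity name
    etype: str     # entity type, e.g. "organization"
    position: int  # token offset of the mention in the document

def proximity_score(mention_pos, keyword_positions):
    """Assumed form: 1 / (1 + token distance to the nearest query keyword)."""
    if not keyword_positions:
        return 0.0
    nearest = min(abs(mention_pos - k) for k in keyword_positions)
    return 1.0 / (1 + nearest)

def document_contribution(mentions, tokens, query_terms):
    """Sum the proximity scores of each entity's mentions within one document."""
    terms = {t.lower() for t in query_terms}
    keyword_positions = [i for i, tok in enumerate(tokens) if tok.lower() in terms]
    totals = {}
    for m in mentions:
        score = proximity_score(m.position, keyword_positions)
        totals[m.name] = totals.get(m.name, 0.0) + score
    return totals

tokens = ["the", "search", "engine", "from", "Microsoft", "competes", "in", "Seattle"]
mentions = [
    Mention("Microsoft", "organization", 4),
    Mention("Seattle", "location", 7),
]
contribution = document_contribution(mentions, tokens, ["search"])
print(contribution)
```

In this example "Microsoft" sits three tokens from the keyword and "Seattle" six, so the nearer entity receives the larger contribution.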

12. A computer-implemented method of searching, comprising:
- extracting occurrences of the entities and entity metadata from web documents;
  
  storing and indexing the entities and entity metadata in association with corresponding document identifiers;
  
  processing a query and returning document identifiers associated with document results of the query;
  
  retrieving a set of the entity occurrences based on the document identifiers;
  
  generating scores for the entities; and
  
  selecting the entities with the highest scores.
- Dependent claims (13, 14, 15, 16, 17, 18, 19):
  - 13. The method of claim 12, wherein the set of occurrences retrieved is associated with document identifiers of top ranked documents.
  - 14. The method of claim 12, further comprising storing the entities and entity metadata in a metadata field of a document index.
  - 15. The method of claim 12, further comprising generating contextual descriptions of top ranked documents for presentation and piggybacking retrieval of the set of entity occurrences based on generation of the contextual descriptions.
  - 16. The method of claim 15, further comprising employing document-based matching that finds entities related to a keyword query by obtaining the documents relevant to the query and exploiting the relationship information to obtain the related entities.
  - 17. The method of claim 12, further comprising returning the entities with the highest scores to an application for consumption.
  - 18. The method of claim 12, further comprising computing scores for each entity as a combination of document importance among top ranked documents, contribution of the document to an overall score, and a function that aggregates all scores across the documents.
  - 19. The method of claim 18, further comprising computing the contribution of the document to the overall score of an entity based on proximity of the occurrences of the entity to the query keywords in the document.
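Claim 18 describes the entity score as a combination of three ingredients: document importance among the top ranked documents, the contribution of each document, and a function that aggregates across documents. One possible reading is sketched below. The reciprocal-rank importance, the multiplication of importance by contribution, the use of `sum` as the aggregator, and the `k` and `threshold` parameters are all assumptions; the claim leaves these choices open.

```python
def aggregate_scores(ranked_doc_ids, contributions, importance=None, agg=sum):
    """Combine per-document contributions into one score per entity.

    ranked_doc_ids: docIDs in rank order, best first.
    contributions:  {doc_id: {entity: contribution of that document}}.
    importance:     maps a 1-based rank to a weight (assumed 1/rank by default).
    agg:            aggregates the weighted values across documents.
    """
    importance = importance or (lambda rank: 1.0 / rank)
    per_entity = {}
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        for entity, c in contributions.get(doc_id, {}).items():
            per_entity.setdefault(entity, []).append(importance(rank) * c)
    return {entity: agg(values) for entity, values in per_entity.items()}

def select_top(scores, k=2, threshold=0.0):
    """Keep entities scoring at least the threshold, then return the k highest."""
    ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
    return [(e, s) for e, s in ranked if s >= threshold][:k]

ranked = ["d1", "d2", "d3"]
contributions = {
    "d1": {"Microsoft": 0.5, "Seattle": 0.25},
    "d2": {"Microsoft": 0.5},
    "d3": {"Seattle": 0.75, "Redmond": 0.3},
}
scores = aggregate_scores(ranked, contributions)
top = select_top(scores, k=2, threshold=0.2)
print(top)
```

Swapping `agg=max` for `sum` would favor an entity with one strong document over an entity spread thinly across many, which is the kind of trade-off the aggregation function controls.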

20. A computer-implemented system, comprising:
- computer-implemented means for extracting occurrences of the entities and entity metadata in web documents;
  
  computer-implemented means for storing and indexing the entities and entity metadata in association with corresponding document identifiers;
  
  computer-implemented means for processing a query and returning document identifiers associated with document results of the query;
  
  computer-implemented means for retrieving a set of the occurrences based on the document identifiers;
  
  computer-implemented means for generating scores for the entities; and
  
  computer-implemented means for selecting the entities with the highest scores.

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Chaudhuri, Surajit; Ganti, Venkatesh; Agrawal, Sanjay; Chakrabarti, Kaushik

Granted Patent

US 8,195,655 B2
Field of Search
US Class Current

707/3
CPC Class Codes

G06F 16/951 Indexing; Web crawling tech...

G06F 40/295 Named entity recognition
