Entity Assessment and Ranking

US 20100169375A1
Filed: 12/29/2008
Published: 07/01/2010
Est. Priority Date: 12/29/2008
Status: Active Grant

Abstract

General entity retrieval and ranking is described. A first set of documents is retrieved from one or more document repositories based on a query formed according to a topic. The first set of documents is characterized based on its first set of metadata values. One or more candidate entities are identified based on the first set of documents and the original query is thereafter augmented according to a candidate entity. The second set of documents resulting from the augmented query is then characterized in a similar manner. For each candidate entity, the first and second document set characterizations are compared to determine their degree of similarity. Increasingly similar document set characterizations indicate that the candidate entity is increasingly relevant to the original query. Repeating this process for each of the one or more candidate entities can give rise to rankings according to the respective degrees of similarity.
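The flow summarized in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the in-memory `REPO`, the substring-based `search`, the relative-frequency `characterize`, and the use of cosine similarity are all assumptions made for the example, and every function name is hypothetical.

```python
from collections import Counter
from math import sqrt

def search(repo, terms):
    """Return documents whose text contains every term (case-insensitive).
    Assumption: a toy stand-in for querying a real document repository."""
    return [d for d in repo if all(t.lower() in d["text"].lower() for t in terms)]

def characterize(docs, attributes):
    """Relative frequency of each (attribute, value) pair across a document set."""
    counts = Counter((a, d["meta"][a]) for d in docs for a in attributes if a in d["meta"])
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assess_entities(repo, query_terms, attributes):
    """Rank candidate entities by how closely their augmented-query
    result set resembles the original query's result set."""
    first = search(repo, query_terms)
    first_char = characterize(first, attributes)
    candidates = sorted({e for d in first for e in d["entities"]})
    scores = {}
    for entity in candidates:
        # Augment the original query with the candidate entity.
        second = search(repo, list(query_terms) + [entity])
        scores[entity] = cosine(first_char, characterize(second, attributes))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy repository used only for illustration.
REPO = [
    {"text": "Alice on machine learning for retail",
     "meta": {"venue": "KDD", "year": 2007}, "entities": ["Alice"]},
    {"text": "Alice and Bob: machine learning at scale",
     "meta": {"venue": "KDD", "year": 2008}, "entities": ["Alice", "Bob"]},
    {"text": "Alice surveys machine learning",
     "meta": {"venue": "ICML", "year": 2008}, "entities": ["Alice"]},
    {"text": "Bob on gardening",
     "meta": {"venue": "Blog", "year": 2008}, "entities": ["Bob"]},
]

ranking = assess_entities(REPO, ["machine learning"], ["venue", "year"])
```

In this toy data, the documents retrieved for "machine learning" plus "Alice" are the same as those for the original query, so Alice's characterization matches the original one and she ranks above Bob, whose augmented query returns a single document.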

18 Claims

1. In a processing device in communication with a document repository, a method for assessing entities, the method comprising:
- retrieving a first set of documents from the document repository based on a query, the first set of documents having first metadata values corresponding to a plurality of metadata attributes;
- characterizing the first set of documents based on the first metadata values to provide a first document set characterization;
- determining at least one candidate entity based on the first set of documents;
- for each of the at least one candidate entity, retrieving a second set of documents from the document repository based on the query and the candidate entity, the second set of documents having second metadata values corresponding to the plurality of metadata attributes;
- for each of the at least one candidate entity, characterizing the second set of documents based on the second metadata values to provide a second document set characterization; and
- for each of the at least one candidate entity, comparing the second document set characterization with the first document set characterization to determine a corresponding degree of similarity between the first document set characterization and the second document set characterization.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1, further comprising:
    - for each of the at least one candidate entity, determining a ranking for the candidate entity based on the corresponding similarity between the first document set characterization and the second document set characterization.
  - 3. The method of claim 2, further comprising:
    - providing an ordered listing of the rankings for the at least one candidate entity.
  - 4. The method of claim 1, wherein the first document set characterization and the second document set characterization each comprise a vector representation in which each of the first and second metadata values is a separate dimension, and wherein comparing the second document set characterization with the first document set characterization further comprises determining a distance between respective vector representations of the second document set characterization and the first document set characterization.
  - 5. The method of claim 4, further comprising:
    - applying at least one weighting factor to a given metadata value of the vector representation when determining the distance between the respective vector representations of the second document set characterization and the first document set characterization.
  - 6. The method of claim 1, further comprising:
    - determining an entity type of a plurality of entity types, wherein the at least one candidate entity is determined based on the entity type.
  - 7. The method of claim 1, wherein determining the at least one candidate entity further comprises:
    - determining a citation frequency for each of the at least one candidate entity in at least some of the first set of documents; and
    - identifying a plurality of entities referenced in the first set of documents;
    - determining, for each entity of the plurality of entities, a citation frequency for the entity in at least some of the first set of documents to provide a plurality of citation frequencies; and
    - selecting, as the at least one candidate entity, at least a portion of the plurality of entities corresponding to that portion of the plurality of citation frequencies having largest values.
  - 8. The method of claim 1, wherein the at least one candidate entity is determined based on the first metadata values for at least some of the first set of documents.
  - 9. The method of claim 1, wherein the at least one candidate entity is determined based on text-based entity extraction on at least some of the first set of documents.
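
Claims 2 through 5 describe ranking candidates by a distance between vector representations, optionally with weighting factors on individual metadata values. A minimal sketch of that idea is shown below. The choice of weighted Euclidean distance, the sparse-dictionary vectors, and the names `weighted_distance` and `rank_candidates` are assumptions for illustration; the claims do not prescribe a particular distance function.

```python
from math import sqrt

def weighted_distance(first, second, weights=None):
    """Weighted Euclidean distance between two sparse vectors.
    Each key is one metadata value (one dimension); a missing key counts as 0.
    Dimensions without an explicit weight default to a weight of 1.0."""
    weights = weights or {}
    dims = set(first) | set(second)
    return sqrt(sum(
        weights.get(k, 1.0) * (first.get(k, 0.0) - second.get(k, 0.0)) ** 2
        for k in dims
    ))

def rank_candidates(first_char, second_chars, weights=None):
    """Order candidate entities from smallest to largest distance,
    so that the most similar characterization comes first."""
    distances = {
        entity: weighted_distance(first_char, char, weights)
        for entity, char in second_chars.items()
    }
    return sorted(distances, key=distances.get)

# Hypothetical characterizations used only for illustration.
first_char = {"venue=KDD": 0.6, "venue=ICML": 0.4}
second_chars = {
    "Carol": {"venue=ICML": 1.0},
    "Alice": {"venue=KDD": 0.6, "venue=ICML": 0.4},
    "Bob": {"venue=KDD": 1.0},
}

ordered = rank_candidates(first_char, second_chars)
```

Here a smaller distance is treated as a higher degree of similarity, so the ordered listing starts with the candidate whose second document set characterization is closest to the first.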

10. An apparatus comprising:
- at least one processor; and
- at least one storage device comprising instructions that, when executed, cause the at least one processor to:
  - retrieve a first set of documents from a document repository based on a query, the first set of documents having first metadata values corresponding to a plurality of metadata attributes;
  - characterize the first set of documents based on the first metadata values to provide a first document set characterization;
  - determine at least one candidate entity based on the first set of documents;
  - for each of the at least one candidate entity, retrieve a second set of documents from the document repository based on the query and the candidate entity, the second set of documents having second metadata values corresponding to the plurality of metadata attributes;
  - for each of the at least one candidate entity, characterize the second set of documents based on the second metadata values to provide a second document set characterization; and
  - for each of the at least one candidate entity, compare the second document set characterization with the first document set characterization to determine a corresponding degree of similarity between the first document set characterization and the second document set characterization.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18):
  - 11. The apparatus of claim 10, the at least one storage device further comprising instructions that, when executed, cause the at least one processor to:
    - for each of the at least one candidate entity, determine a ranking for the candidate entity based on the corresponding similarity between the first document set characterization and the second document set characterization.
  - 12. The apparatus of claim 11, the at least one storage device further comprising instructions that, when executed, cause the at least one processor to:
    - provide an ordered listing of the rankings for the at least one candidate entity.
  - 13. The apparatus of claim 10, wherein the first document set characterization and the second document set characterization each comprise a vector representation in which each of the first and second metadata values is a separate dimension, and wherein the instructions that, when executed, cause the at least one processor to compare the second document set characterization with the first document set characterization are further operative to determine a distance between respective vector representations of the second document set characterization and the first document set characterization.
  - 14. The apparatus of claim 13, the at least one storage device further comprising instructions that, when executed, cause the at least one processor to:
    - apply at least one weighting factor to a given metadata value of the vector representation when determining the distance between the respective vector representations of the second document set characterization and the first document set characterization.
  - 15. The apparatus of claim 10, the at least one storage device further comprising instructions that, when executed, cause the at least one processor to:
    - determine an entity type of a plurality of entity types, wherein the at least one candidate entity is determined based on the entity type.
  - 16. The apparatus of claim 10, wherein the instructions that, when executed, cause the at least one processor to determine the at least one candidate entity are further operative to:
    - determine a citation frequency for each of the at least one candidate entity in at least some of the first set of documents; and
    - identify a plurality of entities referenced in the first set of documents;
    - determine, for each entity of the plurality of entities, a citation frequency for the entity in at least some of the first set of documents to provide a plurality of citation frequencies; and
    - select, as the at least one candidate entity, at least a portion of the plurality of entities corresponding to that portion of the plurality of citation frequencies having largest values.
  - 17. The apparatus of claim 10, wherein the instructions that, when executed, cause the at least one processor to determine the at least one candidate entity are further operative to determine the at least one candidate entity based on the first metadata values for at least some of the first set of documents.
  - 18. The apparatus of claim 10, wherein the instructions that, when executed, cause the at least one processor to determine the at least one candidate entity are further operative to determine the at least one candidate entity based on text-based entity extraction on at least some of the first set of documents.
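
The citation-frequency selection recited in claims 7 and 16 might be sketched as below. This is only one possible reading: "citation frequency" is assumed here to mean the number of times an entity appears in the per-document entity lists, and the function name `select_candidates`, the `top_n` cutoff, and the document layout are hypothetical.

```python
from collections import Counter

def select_candidates(first_set, top_n=2):
    """Identify the entities referenced in the first document set, count how
    often each is cited, and keep the top_n entities with the largest counts."""
    frequencies = Counter(
        entity for doc in first_set for entity in doc["entities"]
    )
    selected = [entity for entity, _ in frequencies.most_common(top_n)]
    return selected, frequencies

# Hypothetical first document set used only for illustration.
first_set = [
    {"entities": ["Alice", "Bob"]},
    {"entities": ["Alice"]},
    {"entities": ["Alice", "Carol", "Bob"]},
]

candidates, frequencies = select_candidates(first_set, top_n=2)
```

The selected candidates would then feed the augmented queries described in the independent claims; the cutoff could equally be a frequency threshold rather than a fixed count.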

Current Assignee: Accenture Global Services Limited (Accenture PLC)
Original Assignee: Accenture Global Services GmbH (Accenture PLC)
Inventors: PROBST, Katherine; CUMBY, Chad; GHANI, Rayid
Granted Patent: US 8,639,682 B2
US Class Current: 707/780
CPC Class Codes:
- G06F 16/24578: using ranking
- G06F 16/3338: Query expansion
- G06F 16/3346: using probabilistic model
