Annotating Entities Using Cross-Document Signals

US 20130325849A1
Filed: 08/16/2012
Published: 12/05/2013
Est. Priority Date: 05/29/2012
Status: Active Grant

Abstract

Techniques for annotating an entity in a document corpus using cross-document signals. A method includes determining which documents in a document corpus mention an entity of interest, clustering the documents that mention an entity of interest according to a temporal signal, a structural signal and/or a content signal, thereby forming at least one cluster of documents, and annotating at least one document in the at least one cluster of documents by marking each occurrence of the entity in the at least one document.
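
The three steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the corpus, the `[[mention|id]]` marker format, the entity ID `E1`, and the use of the ISO calendar week as the temporal signal are all assumptions made for illustration. Choosing which entity a cluster actually refers to is left out.

```python
import re
from collections import defaultdict
from datetime import date

def mentions(text, entity):
    """Return True if the entity name occurs in the text (whole word, case-insensitive)."""
    return re.search(rf"\b{re.escape(entity)}\b", text, re.IGNORECASE) is not None

def cluster_by_week(docs):
    """Group documents by ISO (year, week), a stand-in for the temporal signal."""
    clusters = defaultdict(list)
    for doc in docs:
        iso = doc["date"].isocalendar()
        clusters[(iso[0], iso[1])].append(doc)
    return dict(clusters)

def annotate(text, entity, entity_id):
    """Mark every occurrence of the entity with a hypothetical [[mention|id]] marker."""
    pattern = re.compile(rf"\b{re.escape(entity)}\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"[[{m.group(0)}|{entity_id}]]", text)

# Hypothetical corpus, invented for this example.
corpus = [
    {"id": "d1", "date": date(2012, 5, 28), "text": "Jaguar unveils a new sedan."},
    {"id": "d2", "date": date(2012, 5, 30), "text": "The Jaguar sedan sells well; Jaguar profits rise."},
    {"id": "d3", "date": date(2012, 8, 16), "text": "A jaguar was spotted near the river."},
    {"id": "d4", "date": date(2012, 8, 16), "text": "Local weather stays mild."},
]

# Step 1: find the documents that mention the entity of interest.
matching = [d for d in corpus if mentions(d["text"], "Jaguar")]
# Step 2: cluster those documents by the temporal signal.
clusters = cluster_by_week(matching)
# Step 3: mark each occurrence of the entity in every clustered document.
annotated = {d["id"]: annotate(d["text"], "Jaguar", "E1") for d in matching}
```

In this sketch `d1` and `d2` fall in the same calendar week and share a cluster, `d3` forms its own cluster, and `d4` is dropped because it never mentions the entity.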

19 Claims

1. A method for annotating an entity in a document corpus using cross-document signals, the method comprising:
- determining which documents in a document corpus mention an entity of interest;
  
  clustering the documents that mention the entity of interest according to a temporal signal, a structural signal and/or a content signal, thereby forming at least one cluster of documents; and
  
  annotating at least one document in the at least one cluster of documents by marking each occurrence of the entity in the at least one document;
  
  wherein at least one of the steps is carried out by a computer device.
  - 2. The method of claim 1, comprising marking an occurrence of the entity in the at least one document as not applicable, in order to opt out of an incorrect and/or uncertain annotation.
  - 3. The method of claim 1, comprising applying a model to each cluster that provides weights to features of a signal to guide said annotating.
  - 4. The method of claim 1, wherein a temporal signal corresponds to a situation in which a document is from the same time epoch as another document.
  - 5. The method of claim 4, wherein the time epoch is measured via at least one granularity including minutes, hours, days, weeks, and/or months.
  - 6. The method of claim 1, wherein a structural signal corresponds to a situation in which a document is part of a larger document arrangement.
  - 7. The method of claim 1, wherein said determining comprises using a dictionary of entities.
  - 8. The method of claim 7, wherein the dictionary contains a description corresponding to each entity.
  - 9. The method of claim 1, comprising calculating an annotation score for each entity mention inside a document inside a cluster.
  - 10. The method of claim 1, comprising training a set of documents with labeled entities to determine a threshold for clustering documents and/or a document similarity weight.
  - 11. The method of claim 1, comprising annotating a single entity in a single document.
  - 12. The method of claim 11, comprising determining an entity-to-context match for a mention of the entity from the document in an entity database without considering a signal from other documents or other entities in the document.
  - 13. The method of claim 1, comprising annotating multiple entities in a single document.
  - 14. The method of claim 13, comprising determining an entity-to-context match for each mention of the multiple entities from the document in an entity database without considering a signal from other documents.
  - 15. The method of claim 1, comprising annotating a single entity in multiple documents.
  - 16. The method of claim 15, comprising determining an entity-to-context match and/or a document-to-document match for a mention of the entity from the document in an entity database, taking into account similarity and temporal proximity to other documents without considering a signal from other entities in the document.
  - 17. The method of claim 1, comprising annotating multiple entities in multiple documents.
  - 18. The method of claim 17, wherein said determining includes considering signals from multiple entities in each document as well as temporal and textual similarity to other documents in the corpus.
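
Claims 2, 3 and 9 together suggest a weighted scoring step with an opt-out. The sketch below is one possible reading, not the method disclosed in the specification: the feature names, weights, candidate entity IDs, threshold values, and the `N/A` label are all hypothetical.

```python
NOT_APPLICABLE = "N/A"  # hypothetical label for the opt-out in claim 2

def annotation_score(features, weights):
    """Weighted sum of signal features for one candidate entity (claims 3 and 9)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def choose_annotation(candidates, weights, threshold):
    """Pick the best-scoring candidate, or opt out if no score reaches the threshold."""
    if not candidates:
        return NOT_APPLICABLE, 0.0
    scored = {eid: annotation_score(f, weights) for eid, f in candidates.items()}
    best = max(scored, key=scored.get)
    if scored[best] < threshold:
        return NOT_APPLICABLE, scored[best]
    return best, scored[best]

# Hypothetical per-cluster weights over temporal, structural and content features.
weights = {"context_match": 0.5, "same_epoch": 0.25, "same_arrangement": 0.25}

# Hypothetical feature values for one mention of "Jaguar" in one document.
candidates = {
    "Jaguar_Cars": {"context_match": 1.0, "same_epoch": 1.0, "same_arrangement": 0.0},
    "Jaguar_animal": {"context_match": 0.0, "same_epoch": 1.0, "same_arrangement": 0.0},
}

confident = choose_annotation(candidates, weights, threshold=0.5)
cautious = choose_annotation(candidates, weights, threshold=0.9)
```

With the lower threshold the mention is labeled `Jaguar_Cars`. With the stricter threshold the same mention is marked not applicable rather than given an uncertain label.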

19. A method for annotating an entity in a document corpus, the method comprising:
- processing each document in a corpus of documents obtained from at least one online source to identify which documents mention an entity of interest;
  
  using a description of the entity derived from a database to generate at least one context feature for the entity;
  
  processing each of the documents that mention the entity of interest by comparing text from each document with the at least one context feature and grouping the documents with a comparison similarity above a pre-determined threshold into a cluster of documents;
  
  annotating at least one document in the cluster of documents by marking each occurrence of the entity in the at least one document; and
  
  outputting the at least one annotated document to a user.
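
The context-feature comparison in claim 19 can be sketched as follows. The claim does not say how features are derived or compared, so this example substitutes a simple measure: the fraction of description tokens that also appear in the document. The stopword list, entity description, documents, and threshold are assumptions, and mention detection is simplified to single-word entity names.

```python
import re

# Hypothetical minimal stopword list.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "that", "by"}

def tokens(text):
    """Lowercased word set with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def similarity(doc_text, features):
    """Fraction of the entity's context features that appear in the document."""
    if not features:
        return 0.0
    return len(tokens(doc_text) & features) / len(features)

def cluster_for_entity(docs, entity, description, threshold):
    """Return IDs of documents that mention the entity and score above the threshold."""
    name = tokens(entity)
    features = tokens(description) - name  # context features from the description
    return [
        doc_id
        for doc_id, text in docs.items()
        if name <= tokens(text) and similarity(text, features) > threshold
    ]

# Hypothetical documents and database description.
docs = {
    "d1": "Jaguar, the British maker of luxury cars, opened a plant.",
    "d2": "Jaguar luxury cars sold well.",
    "d3": "A jaguar hunts in the rainforest.",
    "d4": "British luxury cars are popular.",
}
cluster = cluster_for_entity(docs, "Jaguar", "British maker of luxury cars", threshold=0.4)
```

Here `d1` and `d2` form the cluster. `d3` mentions the name but shares no context features, and `d4` matches the context without mentioning the entity. Annotating and outputting the clustered documents are omitted.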

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Visweswariah, Karthik; De, Sushovan; Singh, Amit K.

Granted Patent
US 9,465,865 B2
US Class Current
707/722
CPC Class Codes
G06F 16/355 Class or cluster creation o...
