Annotating Entities Using Cross-Document Signals
First Claim
1. A method for annotating an entity in a document corpus using cross-document signals, the method comprising:
determining which documents in a document corpus mention an entity of interest;
clustering the documents that mention the entity of interest according to a temporal signal, a structural signal and/or a content signal, thereby forming at least one cluster of documents; and
annotating at least one document in the at least one cluster of documents by marking each occurrence of the entity in the at least one document;
wherein at least one of the steps is carried out by a computer device.
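The three steps of claim 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes that the temporal signal is a document's publication date, that a "mention" is a whole-word, case-insensitive match of the entity's name, and that a new cluster starts whenever the gap between consecutive documents exceeds a fixed number of days. The `Document` type, the `max_gap_days` parameter and the `[[...]]` marker are hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    published: date  # hypothetical: publication date used as the temporal signal
    text: str


def mention_pattern(entity: str) -> re.Pattern:
    # Whole-word, case-insensitive match on the entity's surface form.
    return re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)


def find_mentions(corpus: list[Document], entity: str) -> list[Document]:
    # Step 1: keep only the documents that mention the entity of interest.
    pattern = mention_pattern(entity)
    return [doc for doc in corpus if pattern.search(doc.text)]


def cluster_by_time(docs: list[Document], max_gap_days: int = 7) -> list[list[Document]]:
    # Step 2 (temporal signal): sort by date and open a new cluster whenever
    # the gap to the previous document exceeds max_gap_days.
    clusters: list[list[Document]] = []
    for doc in sorted(docs, key=lambda d: d.published):
        if clusters and (doc.published - clusters[-1][-1].published).days <= max_gap_days:
            clusters[-1].append(doc)
        else:
            clusters.append([doc])
    return clusters


def annotate(doc: Document, entity: str) -> str:
    # Step 3: mark every occurrence with [[...]], preserving original casing.
    return mention_pattern(entity).sub(lambda m: f"[[{m.group(0)}]]", doc.text)


corpus = [
    Document("a", date(2011, 3, 1), "Jaguar unveils a new model."),
    Document("b", date(2011, 3, 4), "Analysts praise the jaguar design."),
    Document("c", date(2011, 6, 20), "The jaguar is a big cat."),
    Document("d", date(2011, 3, 2), "Stock markets fell."),
]
mentions = find_mentions(corpus, "jaguar")
clusters = cluster_by_time(mentions)
annotated = [annotate(doc, "jaguar") for doc in clusters[0]]
print(annotated)
```

Here documents "a" and "b" fall into one cluster (three days apart) and "c" into another, so only the first cluster is annotated. A gap-based rule was chosen because it needs no fixed number of clusters; a real system might instead use time windows or a learned model.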
Abstract
Techniques for annotating an entity in a document corpus using cross-document signals. A method includes determining which documents in a document corpus mention an entity of interest, clustering the documents that mention an entity of interest according to a temporal signal, a structural signal and/or a content signal, thereby forming at least one cluster of documents, and annotating at least one document in the at least one cluster of documents by marking each occurrence of the entity in the at least one document.
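The abstract lists a content signal as one alternative clustering criterion. One possible reading is sketched below, assuming that content is represented as a lowercased word set, that similarity is Jaccard overlap, and that documents are grouped greedily against the first member (seed) of each cluster. The Jaccard measure, the seed strategy and the `threshold` value of 0.3 are swapped-in, hypothetical choices; the patent text here does not specify them.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercased word set; a crude stand-in for a real content representation.
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Size of the intersection divided by size of the union.
    return len(a & b) / len(a | b) if a or b else 0.0


def cluster_by_content(texts: list[str], threshold: float = 0.3) -> list[list[int]]:
    # Greedy clustering: compare each document with the seed of every existing
    # cluster and join the first one that is similar enough; otherwise start
    # a new cluster seeded by this document.
    clusters: list[list[int]] = []
    seeds: list[set[str]] = []
    for i, text in enumerate(texts):
        words = tokens(text)
        for cluster, seed in zip(clusters, seeds):
            if jaccard(words, seed) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
            seeds.append(words)
    return clusters


texts = [
    "jaguar car engine speed",
    "jaguar car engine price",
    "jaguar cat jungle prey",
]
print(cluster_by_content(texts))
```

The first two texts share three of five distinct words (similarity 0.6) and are grouped, while the third shares only "jaguar" with the seed (1 of 7 words), so the two senses of the ambiguous name end up in separate clusters.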
15 Citations
19 Claims
19. A method for annotating an entity in a document corpus, the method comprising:
processing each document in a corpus of documents obtained from at least one online source to identify which documents mention an entity of interest;
using a description of the entity derived from a database to generate at least one context feature for the entity;
processing each of the documents that mention the entity of interest by comparing text from each document with the at least one context feature and grouping the documents with a comparison similarity above a pre-determined threshold into a cluster of documents;
annotating at least one document in the cluster of documents by marking each occurrence of the entity in the at least one document; and
outputting the at least one annotated document to a user.
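Claim 19 can be illustrated with a small sketch, assuming that the "context feature" is a bag of words built from the entity's database description and that the comparison is cosine similarity between term-count vectors. The stopword list, the `threshold` of 0.2, the example description and the `[[...]]` marker are all hypothetical; the claim does not say how context features are built or compared.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "the", "to"}


def bag_of_words(text: str) -> Counter:
    # Lowercased term counts with stopwords removed.
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_by_context(description: str, docs: dict[str, str], threshold: float = 0.2) -> list[str]:
    # The context feature is a bag of words from the entity's database
    # description; documents scoring above the threshold form the cluster.
    context = bag_of_words(description)
    return [doc_id for doc_id, text in docs.items()
            if cosine(bag_of_words(text), context) > threshold]


def annotate(text: str, entity: str) -> str:
    # Mark each whole-word, case-insensitive occurrence of the entity.
    pattern = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"[[{m.group(0)}]]", text)


description = "Jaguar is a British manufacturer of luxury cars and sports cars."
docs = {
    "d1": "Jaguar recalls luxury cars built by the British manufacturer.",
    "d2": "The jaguar hunts prey in the rainforest.",
}
cluster = cluster_by_context(description, docs)
output = {doc_id: annotate(docs[doc_id], "jaguar") for doc_id in cluster}
print(output)  # stands in for "outputting the annotated document to a user"
```

Document "d1" shares most of its vocabulary with the description (similarity about 0.76) and joins the cluster; "d2" shares only the name (about 0.17) and falls below the threshold, so only "d1" is annotated and output.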
Specification