Annotating Entities Using Cross-Document Signals
Abstract
A method, an apparatus and an article of manufacture for annotating an entity in a document corpus using cross-document signals. The method includes determining which documents in a document corpus mention an entity of interest, clustering the documents that mention an entity of interest according to a temporal signal, a structural signal and/or a content signal, thereby forming at least one cluster of documents, and annotating at least one document in the at least one cluster of documents by marking each occurrence of the entity in the at least one document.
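The three steps in the abstract can be illustrated with a minimal sketch. It assumes that each document is a dict with "id", "date" and "text" keys. It uses only the temporal signal: documents whose dates fall within a window of one another end up in the same cluster. Occurrences of the entity are marked with a hypothetical `[[...]]` markup. The function names, the `window_days` parameter and the markup are illustrative assumptions and are not taken from the patent.

```python
import re
from datetime import date, timedelta

# Assumption: documents are dicts with "id", "date" (datetime.date) and "text".
# Assumption: the entity name begins and ends with a word character, so \b works.

def _pattern(entity: str):
    """Case-insensitive, whole-word pattern for the entity name."""
    return re.compile(rf"\b({re.escape(entity)})\b", re.IGNORECASE)

def mentions(text: str, entity: str) -> bool:
    """Step 1: does the document mention the entity of interest?"""
    return _pattern(entity).search(text) is not None

def cluster_by_time(docs: list, window_days: int = 3) -> list:
    """Step 2 (temporal signal only): walk the documents in date order and start
    a new cluster whenever the gap to the previous document exceeds window_days."""
    clusters = []
    for doc in sorted(docs, key=lambda d: d["date"]):
        if clusters and doc["date"] - clusters[-1][-1]["date"] <= timedelta(days=window_days):
            clusters[-1].append(doc)
        else:
            clusters.append([doc])
    return clusters

def annotate(text: str, entity: str) -> str:
    """Step 3: mark every occurrence of the entity with hypothetical [[...]] markup,
    keeping the original casing of each occurrence."""
    return _pattern(entity).sub(r"[[\1]]", text)

def annotate_corpus(corpus: list, entity: str, window_days: int = 3) -> list:
    """Run the three steps and return a list of clusters of annotated documents."""
    relevant = [d for d in corpus if mentions(d["text"], entity)]
    return [
        [{**d, "text": annotate(d["text"], entity)} for d in cluster]
        for cluster in cluster_by_time(relevant, window_days)
    ]

if __name__ == "__main__":
    corpus = [
        {"id": 1, "date": date(2010, 3, 1), "text": "Acme Corp announced a merger."},
        {"id": 2, "date": date(2010, 3, 2), "text": "Analysts doubt acme corp can close the deal."},
        {"id": 3, "date": date(2010, 6, 9), "text": "Acme Corp reported earnings."},
    ]
    for cluster in annotate_corpus(corpus, "Acme Corp"):
        print([d["text"] for d in cluster])
```

A gap-based window is one of the simplest ways to use a temporal signal. The structural and content signals named in the abstract would need additional cues that this sketch does not model.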
33 Claims
2-19. (canceled)
20. An article of manufacture comprising a computer readable storage medium having computer readable instructions tangibly embodied thereon which, when implemented, cause a computer to carry out a plurality of method steps comprising:
determining which documents in a document corpus mention an entity of interest;
clustering the documents that mention an entity of interest according to a temporal signal, a structural signal and/or a content signal, thereby forming at least one cluster of documents; and
annotating at least one document in the at least one cluster of documents by marking each occurrence of the entity in the at least one document.
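The phrase "a temporal signal, a structural signal and/or a content signal" allows the signals to be used alone or together. The sketch below illustrates one way to combine them, under stated assumptions. Two documents are linked if any selected signal fires, and the resulting connected components, found with a union-find structure, form the clusters. The three signal functions and their thresholds are hypothetical. In particular, treating "same source" as the structural signal is an assumption made only for illustration.

```python
from datetime import date
from itertools import combinations

# Assumption: documents are dicts with "id", "date", "source" and "text" keys.

def temporal_signal(a, b, max_days=2):
    """Temporal cue: published within max_days of each other."""
    return abs((a["date"] - b["date"]).days) <= max_days

def structural_signal(a, b):
    """Hypothetical structural cue: both documents come from the same source."""
    return a["source"] == b["source"]

def content_signal(a, b, threshold=0.3):
    """Content cue: Jaccard overlap of lower-cased word sets >= threshold."""
    ta, tb = set(a["text"].lower().split()), set(b["text"].lower().split())
    union = ta | tb
    return bool(union) and len(ta & tb) / len(union) >= threshold

def cluster(docs, signals):
    """Link two documents if ANY of the chosen signals fires ("and/or"), then
    return the connected components found with a union-find structure."""
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if any(signal(docs[i], docs[j]) for signal in signals):
            parent[find(i)] = find(j)

    groups = {}
    for i, doc in enumerate(docs):
        groups.setdefault(find(i), []).append(doc)
    return list(groups.values())

if __name__ == "__main__":
    docs = [
        {"id": 1, "date": date(2010, 3, 1), "source": "news", "text": "acme corp merger talks"},
        {"id": 2, "date": date(2010, 3, 2), "source": "blog", "text": "acme corp shares fall"},
        {"id": 3, "date": date(2010, 8, 1), "source": "news", "text": "acme corp earnings"},
    ]
    print([[d["id"] for d in g] for g in cluster(docs, [temporal_signal, structural_signal])])
```

Passing a single signal reproduces the "or" case with one cue. Passing several widens the clusters, because any matching cue is enough to link two documents.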
25. A system for annotating an entity in a document corpus using cross-document signals, comprising:
at least one distinct software module, each distinct software module being embodied on a tangible computer-readable medium;
a memory; and
at least one processor coupled to the memory and operative for:
determining which documents in a document corpus mention an entity of interest;
clustering the documents that mention an entity of interest according to a temporal signal, a structural signal and/or a content signal, thereby forming at least one cluster of documents; and
annotating at least one document in the at least one cluster of documents by marking each occurrence of the entity in the at least one document.
29-1. The article of manufacture of claim 20, wherein the method steps comprise annotating a single entity in multiple documents.
33. An article of manufacture comprising a computer readable storage medium having computer readable instructions tangibly embodied thereon which, when implemented, cause a computer to carry out a plurality of method steps comprising:
processing each document in a corpus of documents obtained from at least one online source to identify which documents mention an entity of interest;
using a description of the entity derived from a database to generate at least one context feature for the entity;
processing each of the documents that mention the entity of interest by comparing text from each document with the at least one context feature and grouping the documents with a comparison similarity above a pre-determined threshold into a cluster of documents;
annotating at least one document in the cluster of documents by marking each occurrence of the entity in the at least one document; and
outputting the at least one annotated document to a user.
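The context-feature and threshold steps of claim 33 can be sketched as follows, under stated assumptions. The context features are the distinct non-stopword tokens of the entity's database description, with the entity's own name removed. Similarity is the fraction of those features that appear in a document. Documents scoring strictly above the threshold form the cluster. The stopword list, the similarity measure, the `threshold` default and all function names are hypothetical choices, not the patented method.

```python
import re

# Assumption: a tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "by", "for", "in", "is", "of", "on", "the", "to"}

def tokens(text):
    """Lower-cased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def context_features(description, entity):
    """Derive context features from a database description of the entity:
    its distinct non-stopword tokens, minus the entity's own name tokens
    (every candidate document mentions the name, so it carries no signal)."""
    return set(tokens(description)) - STOPWORDS - set(tokens(entity))

def similarity(doc_text, features):
    """Fraction of the context features that occur in the document."""
    if not features:
        return 0.0
    return len(features & set(tokens(doc_text))) / len(features)

def cluster_by_context(docs, entity, description, threshold=0.25):
    """Keep documents that mention the entity (crude substring check) and whose
    similarity to the context features is strictly above the threshold."""
    features = context_features(description, entity)
    mentioning = [d for d in docs if entity.lower() in d["text"].lower()]
    return [d for d in mentioning if similarity(d["text"], features) > threshold]
```

With a description such as "Jaguar is a large cat native to the Americas", a document about the animal shares features like "cat" and "americas". A car-industry article that also says "Jaguar" shares none of them, so only the first document passes the threshold.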
Specification