Collecting Training Data using Anomaly Detection
First Claim
1. A method implemented by an information handling system that includes a memory and a processor, the method comprising:
detecting a multi-entity co-occurrence anomaly in a set of documents, wherein the multi-entity co-occurrence anomaly corresponds to an amount of co-occurrences of at least a first entity and a second entity;
determining that at least one document in the set of documents includes a title comprising at least one connecting verb that grammatically connects the first entity to the second entity;
collecting a plurality of document segments from the set of documents in response to the determination, wherein each of the collected plurality of document segments includes the first entity, the second entity, and the at least one connecting verb; and
training a relation-based classifier using the collected plurality of document segments.
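The claim does not say how the co-occurrence anomaly is detected. As a minimal sketch of one possibility, the code below counts, per time period, the documents that mention both entities and flags a period whose count is well above the earlier periods. The period bucketing, the substring matching, the function names, and the `threshold` and `min_history` parameters are all assumptions made for illustration and do not come from the patent.

```python
from collections import Counter
from statistics import mean, pstdev


def cooccurrence_counts(documents, entity_a, entity_b):
    """Count, per period, the documents whose text mentions both entities.

    `documents` is an iterable of (period, text) pairs (assumed layout).
    Matching is a naive case-insensitive substring test.
    """
    a, b = entity_a.lower(), entity_b.lower()
    counts = Counter()
    for period, text in documents:
        lowered = text.lower()
        # Adding 0 still records the period, so quiet periods enter the history.
        counts[period] += int(a in lowered and b in lowered)
    return counts


def anomalous_periods(counts, threshold=2.0, min_history=3):
    """Flag periods whose count exceeds the mean of all earlier periods
    by more than `threshold` standard deviations (assumed rule)."""
    flagged = []
    periods = sorted(counts)
    for i, period in enumerate(periods):
        history = [counts[p] for p in periods[:i]]
        if len(history) < min_history:
            continue  # not enough earlier periods to form a baseline
        mu, sigma = mean(history), pstdev(history)
        # Use a floor of 1.0 on the spread so a flat history does not
        # make every small increase look anomalous.
        if counts[period] > mu + threshold * max(sigma, 1.0):
            flagged.append(period)
    return flagged
```

A flagged period would then trigger the title check and segment collection described in the remaining steps of the claim.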
Abstract
An approach is provided in which an information handling system detects a multi-entity co-occurrence anomaly within a set of documents, where the anomaly corresponds to the number of times that a first entity and a second entity co-occur in the set of documents. The information handling system then determines that at least one of the documents includes a title having a verb that grammatically connects the first entity to the second entity. In response, the information handling system collects document segments from the set of documents that contain the first entity, the second entity, and the connecting verb. The information handling system then uses the collected document segments to train a relation-based classifier.
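The title check and the segment collection described above could be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: entities are single words, a segment is a sentence, the verb lexicon `CANDIDATE_VERBS` is hypothetical, and a verb counts as "connecting" simply when it sits between the two entities in a title. A real system would more likely rely on a syntactic parser. Classifier training is not shown.

```python
import re

# Hypothetical verb lexicon, used only for illustration.
CANDIDATE_VERBS = {"acquires", "buys", "sues", "hires", "partners"}


def tokens(text):
    """Lowercased alphanumeric tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def connecting_verb(title, entity_a, entity_b):
    """Return a lexicon verb located between the two entities in the
    title, or None if the title does not connect them this way."""
    words = tokens(title)
    a, b = entity_a.lower(), entity_b.lower()
    if a not in words or b not in words:
        return None
    lo, hi = sorted((words.index(a), words.index(b)))
    for word in words[lo + 1:hi]:
        if word in CANDIDATE_VERBS:
            return word
    return None


def collect_segments(documents, entity_a, entity_b):
    """Collect sentences that contain both entities and a connecting verb.

    `documents` is a list of (title, body) pairs (assumed layout).
    Nothing is collected unless at least one title supplies a verb.
    """
    verbs = set()
    for title, _ in documents:
        verb = connecting_verb(title, entity_a, entity_b)
        if verb is not None:
            verbs.add(verb)
    if not verbs:
        return []

    wanted = {entity_a.lower(), entity_b.lower()}
    segments = []
    for _, body in documents:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", body):
            words = set(tokens(sentence))
            if wanted <= words and words & verbs:
                segments.append(sentence.strip())
    return segments
```

The returned sentences would serve as training examples for the relation-based classifier; how they are labeled and which classifier is used is left open here.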
44 Citations
20 Claims
1. A method implemented by an information handling system that includes a memory and a processor, the method comprising:
detecting a multi-entity co-occurrence anomaly in a set of documents, wherein the multi-entity co-occurrence anomaly corresponds to an amount of co-occurrences of at least a first entity and a second entity;
determining that at least one document in the set of documents includes a title comprising at least one connecting verb that grammatically connects the first entity to the second entity;
collecting a plurality of document segments from the set of documents in response to the determination, wherein each of the collected plurality of document segments includes the first entity, the second entity, and the at least one connecting verb; and
training a relation-based classifier using the collected plurality of document segments.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An information handling system comprising:
one or more processors;
a memory coupled to at least one of the processors; and
a set of computer program instructions stored in the memory and executed by at least one of the processors in order to perform actions of:
detecting a multi-entity co-occurrence anomaly in a set of documents, wherein the multi-entity co-occurrence anomaly corresponds to an amount of co-occurrences of at least a first entity and a second entity;
determining that at least one document in the set of documents includes a title comprising at least one connecting verb that grammatically connects the first entity to the second entity;
collecting a plurality of document segments from the set of documents in response to the determination, wherein each of the collected plurality of document segments includes the first entity, the second entity, and the at least one connecting verb; and
training a relation-based classifier using the collected plurality of document segments.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product stored in a computer readable storage medium, comprising computer program code that, when executed by an information handling system, causes the information handling system to perform actions comprising:
detecting a multi-entity co-occurrence anomaly in a set of documents, wherein the multi-entity co-occurrence anomaly corresponds to an amount of co-occurrences of at least a first entity and a second entity;
determining that at least one document in the set of documents includes a title comprising at least one connecting verb that grammatically connects the first entity to the second entity;
collecting a plurality of document segments from the set of documents in response to the determination, wherein each of the collected plurality of document segments includes the first entity, the second entity, and the at least one connecting verb; and
training a relation-based classifier using the collected plurality of document segments.
Dependent claims: 16, 17, 18, 19, 20
Specification