Collecting Training Data using Anomaly Detection

US 20170262429A1
Filed: 03/12/2016
Published: 09/14/2017
Est. Priority Date: 03/12/2016
Status: Active Grant

Abstract

An approach is provided in which an information handling system detects a multi-entity co-occurrence anomaly within a set of documents, where the anomaly corresponds to the number of times that a first entity and a second entity co-occur in the set of documents. The information handling system then determines that at least one of the documents includes a title having a verb that grammatically connects the first entity to the second entity. In response, the information handling system collects document segments from the set of documents that contain the first entity, the second entity, and the connecting verb, and uses the collected document segments to train a relation-based classifier.
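The detection step in the abstract can be illustrated with a minimal sketch. The record does not say how an anomaly is scored, so the z-score test, the `(day, text)` document shape, the substring matching, and the names `cooccurrence_counts` and `detect_anomalies` are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev


def cooccurrence_counts(documents, entity_a, entity_b):
    """Count, per day, the documents that mention both entities.

    `documents` is an iterable of (day, text) pairs (a hypothetical shape).
    Matching is a plain case-insensitive substring test, not real entity
    recognition.
    """
    a, b = entity_a.lower(), entity_b.lower()
    counts = Counter()
    for day, text in documents:
        lowered = text.lower()
        if a in lowered and b in lowered:
            counts[day] += 1
    return counts


def detect_anomalies(counts, days, z_threshold=2.0):
    """Return the days whose count lies more than z_threshold standard
    deviations above the mean of the whole series (an assumed scoring rule)."""
    series = [counts.get(day, 0) for day in days]
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:  # a flat series has no outliers
        return []
    return [day for day, value in zip(days, series)
            if (value - mu) / sigma > z_threshold]


# Nine quiet days with one co-occurring document each, then a burst on day 10.
docs = [(day, "Acme and Globex were both mentioned.") for day in range(1, 10)]
docs += [(10, "Acme acquires Globex")] * 20
docs.append((10, "Acme reports earnings"))  # only one entity, not counted
counts = cooccurrence_counts(docs, "Acme", "Globex")
print(detect_anomalies(counts, list(range(1, 11))))
```

A production system would presumably replace the substring test with entity recognition and might use a rolling baseline rather than a global mean.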

20 Claims

1. A method implemented by an information handling system that includes a memory and a processor, the method comprising:
- detecting a multi-entity co-occurrence anomaly in a set of documents, wherein the multi-entity co-occurrence anomaly corresponds to an amount of co-occurrences of at least a first entity and a second entity;
  
  determining that at least one document in the set of documents includes a title comprising at least one connecting verb that grammatically connects the first entity to the second entity;
  
  collecting a plurality of document segments from the set of documents in response to the determination, wherein each of the collected plurality of document segments includes the first entity, the second entity, and the at least one connecting verb; and
  
  training a relation-based classifier using the collected plurality of document segments.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1 wherein the first entity is a first entity type and the second entity is a second entity type, and wherein both the first entity and the second entity are devoid of an entity name.
  - 3. The method of claim 1 further comprising:
    - determining, based upon a co-occurrence threshold, an anomaly duration of the multi-entity co-occurrence anomaly; and
      
      wherein each of the set of documents includes a time stamp within the anomaly duration.
  - 4. The method of claim 1 further comprising:
    - aggregating a plurality of connecting verbs within the set of documents, wherein each of the plurality of connecting verbs grammatically connects the first entity to the second entity, the at least one connecting verb included in the plurality of connecting verbs;
      
      selecting a set of relevant connecting verbs, from the plurality of connecting verbs, based upon an aggregation amount of each of the plurality of connecting verbs and their relevance; and
      
      performing the collection of the plurality of document segments based upon the set of relevant connecting verbs.
  - 5. The method of claim 4 wherein each of the plurality of connecting verbs forms a subject-verb-object (SVO) relation with the first entity and the second entity.
  - 6. The method of claim 1 wherein the information handling system is a question answer system, the method further comprising:
    - generating an alert in response to detecting the multi-entity co-occurrence anomaly, wherein the alert includes the first entity, the second entity, and the at least one connecting verb.
  - 7. The method of claim 1 wherein the first entity and the second entity are selected from the group consisting of an entity type and an entity name.
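Claim 3 ties the document set to an anomaly duration derived from a co-occurrence threshold. One way to read that is sketched below, assuming the duration is the contiguous run of time buckets around the peak whose counts stay at or above the threshold. The function names, the `(timestamp, count)` series, and the `timestamp` key on each document are hypothetical.

```python
def anomaly_duration(series, threshold):
    """Return (start, end) timestamps of the contiguous run around the peak
    in which every count is >= threshold, or None if the peak is below it.

    `series` is a list of (timestamp, count) pairs sorted by timestamp.
    """
    if not series:
        return None
    peak = max(range(len(series)), key=lambda i: series[i][1])
    if series[peak][1] < threshold:
        return None
    lo = hi = peak
    # Grow the window outward while neighbouring buckets stay above threshold.
    while lo > 0 and series[lo - 1][1] >= threshold:
        lo -= 1
    while hi < len(series) - 1 and series[hi + 1][1] >= threshold:
        hi += 1
    return series[lo][0], series[hi][0]


def documents_in_duration(documents, duration):
    """Keep only documents whose time stamp falls within the duration."""
    start, end = duration
    return [doc for doc in documents if start <= doc["timestamp"] <= end]


series = [(1, 2), (2, 3), (3, 12), (4, 30), (5, 15), (6, 4), (7, 11)]
duration = anomaly_duration(series, threshold=10)
docs = [{"timestamp": t, "title": f"doc {t}"} for t in (2, 3, 5, 7)]
selected = documents_in_duration(docs, duration)
print(duration, [d["timestamp"] for d in selected])
```

Bucket 7 is above the threshold but is separated from the peak by a quiet bucket, so this reading leaves it out; treating disjoint bursts as one anomaly would be an equally defensible choice.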

8. An information handling system comprising:
- one or more processors;
  
  a memory coupled to at least one of the processors; and
  
  a set of computer program instructions stored in the memory and executed by at least one of the processors in order to perform actions of:
  
  detecting a multi-entity co-occurrence anomaly in a set of documents, wherein the multi-entity co-occurrence anomaly corresponds to an amount of co-occurrences of at least a first entity and a second entity;
  
  determining that at least one document in the set of documents includes a title comprising at least one connecting verb that grammatically connects the first entity to the second entity;
  
  collecting a plurality of document segments from the set of documents in response to the determination, wherein each of the collected plurality of document segments includes the first entity, the second entity, and the at least one connecting verb; and
  
  training a relation-based classifier using the collected plurality of document segments.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The information handling system of claim 8 wherein the first entity is a first entity type and the second entity is a second entity type, and wherein both the first entity and the second entity are devoid of an entity name.
  - 10. The information handling system of claim 8 wherein at least one of the one or more processors performs additional actions comprising:
    - determining, based upon a co-occurrence threshold, an anomaly duration of the multi-entity co-occurrence anomaly; and
      
      wherein each of the set of documents includes a time stamp within the anomaly duration.
  - 11. The information handling system of claim 8 wherein at least one of the one or more processors performs additional actions comprising:
    - aggregating a plurality of connecting verbs within the set of documents, wherein each of the plurality of connecting verbs grammatically connects the first entity to the second entity, the at least one connecting verb included in the plurality of connecting verbs;
      
      selecting a set of relevant connecting verbs, from the plurality of connecting verbs, based upon an aggregation amount of each of the plurality of connecting verbs and their relevance; and
      
      performing the collection of the plurality of document segments based upon the set of relevant connecting verbs.
  - 12. The information handling system of claim 11 wherein each of the plurality of connecting verbs forms a subject-verb-object (SVO) relation with the first entity and the second entity.
  - 13. The information handling system of claim 8 wherein the information handling system is a question answer system, and wherein at least one of the one or more processors performs additional actions comprising:
    - generating an alert in response to detecting the multi-entity co-occurrence anomaly, wherein the alert includes the first entity, the second entity, and the at least one connecting verb.
  - 14. The information handling system of claim 8 wherein the first entity and the second entity are selected from the group consisting of an entity type and an entity name.
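The verb aggregation and selection recited in claim 11 (and in claims 4 and 18) can be sketched as follows. The claims call for verbs that grammatically connect the two entities, which would normally need a syntactic parser. As a crude stand-in, this sketch assumes the connecting verb is the single word sitting directly between the two entity mentions. The `GENERIC_WORDS` stop list, the `min_count` cutoff, and both function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop list standing in for a real relevance measure.
GENERIC_WORDS = {"is", "was", "has", "and", "or", "with", "vs"}


def connecting_verbs(sentences, entity_a, entity_b):
    """Aggregate the single word found between entity_a and entity_b.

    This approximates a subject-verb-object pattern with a regular
    expression; it does not perform grammatical analysis.
    """
    pattern = re.compile(
        rf"\b{re.escape(entity_a)}\s+(\w+)\s+{re.escape(entity_b)}\b",
        re.IGNORECASE,
    )
    counts = Counter()
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            counts[match.group(1).lower()] += 1
    return counts


def select_relevant_verbs(counts, min_count=2, irrelevant=GENERIC_WORDS):
    """Keep verbs that occur often enough and are not on the stop list."""
    return {verb for verb, n in counts.items()
            if n >= min_count and verb not in irrelevant}


titles = [
    "Acme acquires Globex in record deal",
    "Report: Acme acquires Globex",
    "Acme sues Globex",
    "Acme and Globex share a city",
    "Acme and Globex again",
    "Acme acquires Globex, sources say",
]
verb_counts = connecting_verbs(titles, "Acme", "Globex")
relevant = select_relevant_verbs(verb_counts)
print(verb_counts, relevant)
```

Here "sues" is dropped for low frequency and "and" for relevance, which mirrors the two selection criteria in the claim (aggregation amount and relevance) without claiming to reproduce the patented method.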

15. A computer program product stored in a computer readable storage medium, comprising computer program code that, when executed by an information handling system, causes the information handling system to perform actions comprising:
- detecting a multi-entity co-occurrence anomaly in a set of documents, wherein the multi-entity co-occurrence anomaly corresponds to an amount of co-occurrences of at least a first entity and a second entity;
  
  determining that at least one document in the set of documents includes a title comprising at least one connecting verb that grammatically connects the first entity to the second entity;
  
  collecting a plurality of document segments from the set of documents in response to the determination, wherein each of the collected plurality of document segments includes the first entity, the second entity, and the at least one connecting verb; and
  
  training a relation-based classifier using the collected plurality of document segments.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The computer program product of claim 15 wherein the first entity is a first entity type and the second entity is a second entity type, and wherein both the first entity and the second entity are devoid of an entity name.
  - 17. The computer program product of claim 15 wherein the information handling system performs additional actions comprising:
    - determining, based upon a co-occurrence threshold, an anomaly duration of the multi-entity co-occurrence anomaly; and
      
      wherein each of the set of documents includes a time stamp within the anomaly duration.
  - 18. The computer program product of claim 15 wherein the information handling system performs additional actions comprising:
    - aggregating a plurality of connecting verbs within the set of documents, wherein each of the plurality of connecting verbs grammatically connects the first entity to the second entity, the at least one connecting verb included in the plurality of connecting verbs;
      
      selecting a set of relevant connecting verbs, from the plurality of connecting verbs, based upon an aggregation amount of each of the plurality of connecting verbs and their relevance; and
      
      performing the collection of the plurality of document segments based upon the set of relevant connecting verbs.
  - 19. The computer program product of claim 18 wherein each of the plurality of connecting verbs forms a subject-verb-object (SVO) relation with the first entity and the second entity.
  - 20. The computer program product of claim 15 wherein the information handling system is a question answer system, and wherein the information handling system performs additional actions comprising:
    - generating an alert in response to detecting the multi-entity co-occurrence anomaly, wherein the alert includes the first entity, the second entity, and the at least one connecting verb.
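The collection step shared by the independent claims, together with the alert of claims 6, 13, and 20, might look like the sketch below. It assumes that a document segment is a sentence, that entities are single words, and that an alert is a plain dictionary; none of these choices comes from the record, and `collect_segments` and `build_alert` are hypothetical names. The returned segments are what would then be passed to whatever relation-based classifier is being trained.

```python
import re


def collect_segments(documents, entity_a, entity_b, verbs):
    """Return the sentences that contain both entities and at least one
    of the selected connecting verbs (case-insensitive word match)."""
    wanted = {verb.lower() for verb in verbs}
    a, b = entity_a.lower(), entity_b.lower()
    segments = []
    for text in documents:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            words = set(re.findall(r"\w+", sentence.lower()))
            if a in words and b in words and words & wanted:
                segments.append(sentence)
    return segments


def build_alert(entity_a, entity_b, verbs):
    """Bundle the two entities and the connecting verbs into an alert."""
    return {
        "first_entity": entity_a,
        "second_entity": entity_b,
        "connecting_verbs": sorted(verbs),
    }


documents = [
    "Acme acquires Globex. The deal closes in May. Globex shares rose.",
    "Analysts say Acme acquires Globex for cash! Acme declined to comment.",
]
segments = collect_segments(documents, "Acme", "Globex", {"acquires"})
alert = build_alert("Acme", "Globex", {"acquires"})
print(segments)
print(alert)
```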

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Harper, Devin R.; Lakshmanan, Pawan K.; Schoeninger, Gregory W.; Turner, Elliot B.

Granted Patent

US 10,078,632 B2
CPC Class Codes

G06F 16/3329   Natural language query form...

G06F 16/35   Clustering; Classification

G06F 40/253   Grammatical analysis; Style...

G06F 40/289   Phrasal analysis, e.g. fini...

G06F 40/40   Processing or translation o...
