Methods and systems relating to information extraction
Abstract
The invention relates to information extraction systems having discriminative models which utilize hierarchical cluster trees and active learning to enhance training.
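The active-learning part of the abstract can be illustrated with a minimal sketch. It assumes a per-token logistic-regression tagger as the discriminative model and uncertainty sampling (probability closest to 0.5) as the test for "ambiguous" word strings; neither choice is stated in the claims. The names `features`, `LogisticTagger`, `most_ambiguous`, and `trainer`, and the toy corpora, are hypothetical.

```python
import math
from collections import defaultdict

def features(tokens, i):
    """Hypothetical feature set: the word, its neighbours, and capitalisation."""
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [f"w={tokens[i].lower()}", f"prev={prev_w.lower()}",
            f"next={next_w.lower()}", f"cap={tokens[i][0].isupper()}"]

class LogisticTagger:
    """Binary discriminative model: P(token is part of a name | features)."""
    def __init__(self):
        self.w = defaultdict(float)

    def prob(self, feats):
        z = sum(self.w[f] for f in feats)
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, examples, epochs=20, lr=0.5):
        # Plain stochastic gradient ascent on the log-likelihood.
        for _ in range(epochs):
            for feats, y in examples:
                g = y - self.prob(feats)
                for f in feats:
                    self.w[f] += lr * g

def to_examples(sentences):
    """Flatten (tokens, labels) pairs into per-token training examples."""
    return [(features(toks, i), y)
            for toks, labels in sentences for i, y in enumerate(labels)]

def most_ambiguous(model, pool, k=1):
    """Rank unannotated sentences by their least-confident token."""
    def uncertainty(tokens):
        return min(abs(model.prob(features(tokens, i)) - 0.5)
                   for i in range(len(tokens)))
    return sorted(pool, key=uncertainty)[:k]

def trainer(tokens):
    """Stand-in for the human trainer; labels come from a toy answer key."""
    gold = {"Jones"}
    return [1 if t in gold else 0 for t in tokens]

# First corpus (annotated) and a pool of unannotated sentences (toy data).
annotated = [(["Mr.", "Smith", "left"], [0, 1, 0]),
             (["the", "bank", "closed"], [0, 0, 0])]
pool = [["Mr.", "Jones", "spoke"], ["the", "river", "rose"]]

model = LogisticTagger()
model.fit(to_examples(annotated))
for _ in range(2):
    if not pool:
        break
    query = most_ambiguous(model, pool, k=1)[0]   # word string sent to the trainer
    pool.remove(query)
    annotated.append((query, trainer(query)))
    model = LogisticTagger()                       # update by retraining from scratch
    model.fit(to_examples(annotated))
```

Retraining from scratch after each annotation keeps the sketch short; an incremental update would serve the same role.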
30 Claims
1. A method of training an information extraction system comprising:
employing a first corpus of annotated text;
automatedly extracting information from a second corpus of unannotated text;
automatedly populating a discriminative information extraction model based on the information extracted from the second corpus and the first corpus of annotated text;
automatedly identifying from at least one of the first corpus, the second corpus, and a third corpus one or more word strings including words having an ambiguous relationship and providing the one or more word strings to a trainer for annotation; and
automatedly updating the discriminative information extraction model based on annotations to the one or more word strings provided by the trainer.
Dependent claims: 2, 3, 4
5. A storage medium including computer readable instructions for carrying out a method of training an information extraction system comprising:
employing a first corpus of annotated text;
automatedly extracting information from a second corpus of unannotated text;
automatedly populating a discriminative information extraction model based on the information extracted from the second corpus and the first corpus of annotated text;
automatedly identifying from at least one of the first corpus, the second corpus, and a third corpus one or more word strings including words having an ambiguous relationship and providing the one or more word strings to a trainer for annotation; and
automatedly updating the discriminative information extraction model based on annotations to the one or more word strings provided by the trainer.
Dependent claims: 6, 7, 8
9. A method of training an information extraction system comprising:
employing a first corpus of annotated text;
automatedly parsing a second corpus of text based on relative positions of words in the second corpus and generating a hierarchical cluster tree indicative thereof;
automatedly populating a discriminative information extraction model based on the hierarchical cluster tree and the first corpus of annotated text;
automatedly identifying from at least one of the first corpus, the second corpus, and a third corpus one or more word strings including words having an ambiguous relationship and providing the one or more word strings to a trainer for annotation; and
automatedly updating the discriminative information extraction model based on annotations to the one or more word strings provided by the trainer.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A storage medium including computer readable instructions for carrying out a method of training an information extraction system comprising:
employing a first corpus of annotated text;
automatedly parsing a second corpus of text based on relative positions of words in the second corpus and generating a hierarchical cluster tree indicative thereof;
automatedly populating a discriminative information extraction model based on the hierarchical cluster tree and the first corpus of annotated text;
automatedly identifying from at least one of the first corpus, the second corpus, and a third corpus one or more word strings including words having an ambiguous relationship and providing the one or more word strings to a trainer for annotation; and
automatedly updating the discriminative information extraction model based on annotations to the one or more word strings provided by the trainer.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28
29. A method of training an information extraction system comprising:
receiving a first corpus of annotated text;
receiving a hierarchical cluster tree indicative of the relative positions of words in a second corpus of text;
automatedly populating a discriminative information extraction model based on the hierarchical cluster tree and the first corpus of annotated text;
automatedly identifying from at least one of the first corpus, the second corpus, and a third corpus one or more word strings including words having an ambiguous relationship and providing the one or more word strings to a trainer for annotation; and
automatedly updating the discriminative information extraction model based on annotations to the one or more word strings provided by the trainer.
30. A storage medium including computer readable instructions for carrying out a method of training an information extraction system comprising:
employing a first corpus of annotated text;
receiving a hierarchical cluster tree indicative of the relative positions of words in a second corpus of text;
automatedly populating a discriminative information extraction model based on the hierarchical cluster tree and the first corpus of annotated text;
automatedly identifying from at least one of the first corpus, the second corpus, and a third corpus one or more word strings including words having an ambiguous relationship and providing the one or more word strings to a trainer for annotation; and
automatedly updating the discriminative information extraction model based on annotations to the one or more word strings provided by the trainer.
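Claims 9 through 30 rely on a hierarchical cluster tree built from the relative positions of words. The following is a minimal sketch of one way such a tree could be produced and consumed, assuming greedy agglomerative clustering over left and right neighbour counts with cosine similarity, and bit-string path prefixes as model features. These are illustrative assumptions, not the procedure recited in the claims; all function names and the toy corpus are hypothetical, and the input is assumed to be non-empty.

```python
from collections import Counter, defaultdict

def context_vectors(sentences):
    """Count left and right neighbours of every word (relative positions)."""
    vecs = defaultdict(Counter)
    for toks in sentences:
        padded = ["<s>"] + toks + ["</s>"]
        for i in range(1, len(padded) - 1):
            vecs[padded[i]][("L", padded[i - 1])] += 1
            vecs[padded[i]][("R", padded[i + 1])] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity of two Counters (missing keys count as 0)."""
    dot = sum(v * b[k] for k, v in a.items())
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def cluster_tree(vecs):
    """Greedy bottom-up merging; returns a nested-tuple binary tree of words."""
    nodes = [(w, Counter(v)) for w, v in sorted(vecs.items())]
    while len(nodes) > 1:
        pairs = ((i, j) for i in range(len(nodes))
                 for j in range(i + 1, len(nodes)))
        i, j = max(pairs, key=lambda p: cosine(nodes[p[0]][1], nodes[p[1]][1]))
        (ta, va), (tb, vb) = nodes[i], nodes[j]
        merged = ((ta, tb), va + vb)   # merged cluster pools its members' contexts
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]

def bit_paths(tree, prefix=""):
    """Map each word to the bit string of its path from the root."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}
    left, right = tree
    return {**bit_paths(left, prefix + "0"), **bit_paths(right, prefix + "1")}

def prefix_features(word, paths, lengths=(1, 2, 3)):
    """Cluster-path prefixes usable as features in a discriminative model."""
    path = paths.get(word)
    return [] if path is None else [f"cluster[:{n}]={path[:n]}" for n in lengths]

# Second corpus (unannotated, toy data).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"],
          ["a", "cat", "ran"], ["a", "dog", "ran"]]
paths = bit_paths(cluster_tree(context_vectors(corpus)))
cat_feats = prefix_features("cat", paths)
```

In this toy corpus, words that occur in identical contexts ("cat" and "dog", "the" and "a") end up as siblings in the tree, so they share every path prefix except the final bit.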
Specification