DYNAMIC CREATION OF DOMAIN SPECIFIC CORPORA
Abstract
A model of a domain is received, wherein the model has a plurality of elements. A corpus of select documents covering the plurality of elements of the model is also received. A plurality of select topics is generated from the corpus of select documents. Topics of an additional document are compared to the plurality of select topics to calculate a distance between the topics of the additional document and the plurality of select topics. Upon the distance meeting a threshold value, a new corpus is generated to include the additional document. The new document is annotated with the plurality of elements of the model.
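The selection step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method as claimed: the text here does not name a topic model or a distance measure, so normalized term frequencies stand in for topics and cosine distance stands in for the distance. The function names, the stopword list, and the `max_distance` threshold are all hypothetical, and treating "meeting a threshold" as "at or below the threshold" is an assumption. Annotation of the added document with the model elements is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, minimal stopword list used only for this illustration.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "for", "with", "on"})


def term_weights(text):
    """Normalized term frequencies: a crude stand-in for a document's topics."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def select_topics(select_docs):
    """Average the term weights of the select documents into one topic profile."""
    profile = Counter()
    for doc in select_docs:
        for word, weight in term_weights(doc).items():
            profile[word] += weight / len(select_docs)
    return dict(profile)


def cosine_distance(p, q):
    """1 - cosine similarity; returns 1.0 when either vector is empty."""
    dot = sum(weight * q.get(word, 0.0) for word, weight in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0


def build_corpus(select_docs, candidates, max_distance=0.8):
    """Add each candidate whose distance to the select topics is within the threshold."""
    profile = select_topics(select_docs)
    new_corpus = []
    for doc in candidates:
        # Assumed selection criterion: distance at or below a fixed threshold.
        if cosine_distance(term_weights(doc), profile) <= max_distance:
            new_corpus.append(doc)
    return new_corpus
```

In practice the term-frequency profile would likely be replaced by the output of a real topic model, and the single threshold by whatever set of selection criteria the application defines.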
20 Claims
1. A computer implemented method for generating a domain-relevant corpus for use in a question answering (QA) application, the method comprising:
receiving a corpus of select documents corresponding to a plurality of elements of a model of a domain;
generating a plurality of select topics based on the corpus of select documents;
comparing topics of an additional document to the plurality of select topics to obtain a distance measure between the topics of the additional document and the plurality of select topics; and
upon the distance measure matching a set of selection criteria, adding the additional document to a new corpus.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer system for generating a domain-relevant corpus for use in a question answering (QA) application, comprising:
a computer having a processor and a computer-readable storage device;
a program embodied on the storage device for execution by the processor, the program having a plurality of program modules, the program modules including:
a receiving module configured to receive a corpus of select documents corresponding to a plurality of elements of a model of a domain;
a generating module configured to generate a plurality of select topics based on the corpus of select documents;
a comparing module configured to compare topics of an additional document to the plurality of select topics to obtain a distance measure between the topics of the additional document and the plurality of select topics; and
an adding module configured to add the additional document to a new corpus upon the distance measure matching a set of selection criteria.
Dependent claims: 9, 10, 11, 12, 13
14. A computer program product for generating a domain-relevant corpus for use in a question answering (QA) application, comprising a computer-readable storage medium having program code embodied therewith, the program code executable by a processor of a computer to perform a method comprising:
receiving, by the processor, a corpus of select documents corresponding to a plurality of elements of a model of a domain;
generating, by the processor, a plurality of select topics based on the corpus of select documents;
comparing, by the processor, topics of an additional document to the plurality of select topics to obtain a distance measure between the topics of the additional document and the plurality of select topics; and
upon the distance measure matching a set of selection criteria, adding, by the processor, the additional document to a new corpus.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification