Process for identifying completion of domain adaptation dictionary activities
First Claim
1. An apparatus comprising:
a memory; and
a processor coupled to the memory and configured to:
identify a corpus of documents of an evaluation domain;
generate a first lexicon based on the corpus of documents of the evaluation domain;
determine a threshold that indicates a sufficiency of domain adaptation of the evaluation domain based at least in part on the first lexicon;
identify a corpus of documents of a client domain;
generate a second lexicon based on the corpus of documents of the client domain;
determine a metric associated with the corpus of documents of the client domain and the second lexicon by determining a ratio of newly extracted and unique domain terms extracted from the corpus of documents of the client domain for inclusion in the second lexicon to a total number of domain terms extracted from the corpus of documents of the client domain;
determine that domain adaptation of the client domain is complete when the metric exceeds the threshold;
receive a first question for processing according to natural language processing; and
perform first natural language processing to determine a first answer to the first question based at least in part on the second lexicon.
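The metric and the completion test recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify how terms are extracted, how the threshold is derived from the first lexicon, or whether the "total number of domain terms extracted" counts repeated occurrences. The sketch below therefore assumes the extracted terms are already available as a list, counts every extracted occurrence in the total, and takes the threshold as a given number. The function names `new_term_ratio` and `adaptation_complete` are hypothetical and do not come from the patent.

```python
from typing import Iterable, Set, Tuple


def new_term_ratio(extracted_terms: Iterable[str],
                   lexicon: Set[str]) -> Tuple[float, Set[str]]:
    """Return (metric, updated_lexicon).

    metric = (# unique extracted terms not yet in the lexicon)
             / (total # of extracted terms, repeats included).
    Counting repeats in the denominator is an assumption of this sketch.
    """
    terms = list(extracted_terms)
    if not terms:
        # Nothing extracted: define the metric as 0.0 to avoid dividing by zero.
        return 0.0, set(lexicon)
    new_terms = set(terms) - lexicon
    return len(new_terms) / len(terms), lexicon | new_terms


def adaptation_complete(metric: float, threshold: float) -> bool:
    """Follow the claim wording: complete when the metric exceeds the threshold."""
    return metric > threshold


if __name__ == "__main__":
    # Hypothetical client-domain terms and an initial second lexicon.
    extracted = ["stent", "catheter", "stent", "angioplasty"]
    metric, lexicon = new_term_ratio(extracted, {"stent"})
    print(metric, sorted(lexicon), adaptation_complete(metric, 0.4))
```

The comparison direction (metric greater than threshold) is taken directly from the claim language; how the threshold value itself would be computed from the evaluation-domain lexicon is left open here because the claim does not say.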
Abstract
An apparatus comprises a memory and a processor configured for semi-autonomous activities related to natural language processing domain adaptation. The processor is coupled to the memory and configured to identify a corpus of documents of an evaluation domain, generate a first lexicon based on the corpus of documents of the evaluation domain, and determine a threshold that indicates a sufficiency of domain adaptation of the evaluation domain based at least in part on the first lexicon. The processor is further configured to identify a corpus of documents of a client domain, generate a second lexicon based on the corpus of documents of the client domain, determine a metric associated with the corpus of documents of the client domain and the second lexicon, and determine that domain adaptation of the client domain is complete when the metric exceeds the threshold.
20 Claims
1. An apparatus comprising:
a memory; and
a processor coupled to the memory and configured to:
identify a corpus of documents of an evaluation domain;
generate a first lexicon based on the corpus of documents of the evaluation domain;
determine a threshold that indicates a sufficiency of domain adaptation of the evaluation domain based at least in part on the first lexicon;
identify a corpus of documents of a client domain;
generate a second lexicon based on the corpus of documents of the client domain;
determine a metric associated with the corpus of documents of the client domain and the second lexicon by determining a ratio of newly extracted and unique domain terms extracted from the corpus of documents of the client domain for inclusion in the second lexicon to a total number of domain terms extracted from the corpus of documents of the client domain;
determine that domain adaptation of the client domain is complete when the metric exceeds the threshold;
receive a first question for processing according to natural language processing; and
perform first natural language processing to determine a first answer to the first question based at least in part on the second lexicon.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product for performing domain adaptation of a domain, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
identify a corpus of documents from within a client domain;
divide the corpus of documents into a plurality of sub-corpora;
extract at least one domain term from each of the plurality of sub-corpora, wherein domain terms extracted from one of the plurality of sub-corpora form a lexicon for that respective sub-corpora of the plurality of sub-corpora;
determine a metric having a relationship to the lexicon for that respective sub-corpora of the plurality of sub-corpora by determining a ratio of newly extracted and unique domain terms extracted from that respective sub-corpora of the plurality of sub-corpora for inclusion in the lexicon for that respective sub-corpora of the plurality of sub-corpora to a total number of domain terms extracted from that respective sub-corpora of the plurality of sub-corpora;
determine, based at least in part on the metric, that sufficient domain adaptation of the client domain has been performed;
receive a question for processing according to natural language processing; and
perform the natural language processing to determine a first answer to the first question based at least in part on the lexicon.
Dependent claims: 9, 10, 11, 12, 13
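The sub-corpus procedure of claim 8 can be sketched as follows. Several details are assumptions made only for illustration: the corpus is split round-robin, the term extractor is a placeholder that treats lowercase alphabetic tokens longer than three characters as domain terms, and a term counts as "newly extracted" when it did not appear in any earlier sub-corpus. The claim fixes none of these choices, and the names `split_corpus`, `extract_terms`, and `sub_corpus_metrics` are hypothetical.

```python
import re
from typing import Callable, List, Set


def split_corpus(documents: List[str], parts: int) -> List[List[str]]:
    """Divide documents round-robin into at most `parts` non-empty sub-corpora."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    chunks = [documents[i::parts] for i in range(parts)]
    return [chunk for chunk in chunks if chunk]


def extract_terms(document: str) -> List[str]:
    """Placeholder extractor (assumption): alphabetic tokens longer than 3 letters."""
    return [t for t in re.findall(r"[a-z]+", document.lower()) if len(t) > 3]


def sub_corpus_metrics(
    sub_corpora: List[List[str]],
    extract: Callable[[str], List[str]] = extract_terms,
) -> List[float]:
    """For each sub-corpus, compute new unique terms / total extracted terms.

    'New' is judged against terms seen in earlier sub-corpora, which is an
    assumption of this sketch rather than a requirement stated in the claim.
    """
    seen: Set[str] = set()
    metrics: List[float] = []
    for docs in sub_corpora:
        terms = [t for doc in docs for t in extract(doc)]
        lexicon = set(terms)  # lexicon for this sub-corpus
        new_terms = lexicon - seen
        metrics.append(len(new_terms) / len(terms) if terms else 0.0)
        seen |= lexicon
    return metrics


if __name__ == "__main__":
    docs = ["stent catheter", "stent balloon", "stent catheter", "stent balloon"]
    print(sub_corpus_metrics(split_corpus(docs, 2)))
```

The sketch stops at the per-sub-corpus metrics because claim 8 only requires that sufficiency be determined "based at least in part on the metric" and does not recite a specific comparison.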
14. A computer-implemented method, comprising:
identifying a corpus of documents of an evaluation domain;
generating a first lexicon based on the corpus of documents of the evaluation domain;
determining a threshold that indicates a sufficiency of domain adaptation of the evaluation domain based at least in part on the first lexicon;
identifying a corpus of documents of a client domain;
generating a second lexicon based on the corpus of documents of the client domain;
determining a metric associated with the corpus of documents of the client domain and the second lexicon by determining a ratio of newly extracted and unique domain terms extracted from the corpus of documents of the client domain for inclusion in the second lexicon to a total number of domain terms extracted from the corpus of documents of the client domain;
determining that domain adaptation of the client domain is complete when the metric exceeds the threshold;
receiving a first question for processing according to natural language processing; and
performing first natural language processing to determine a first answer to the first question based at least in part on the second lexicon.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification