Scalable ground truth disambiguation
Abstract
Methods, computer program products, and systems are presented. The methods include, for instance: obtaining an utterance input from a user agent, and collecting context data of the utterance input. A context tag is generated based on the context data, and one or more ground truths whose respective utterances are semantically identical to the utterance input are selected. The semantic relationship between the context tag and the intent of each selected ground truth is examined, and the selected ground truth is updated with the context tag.
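The flow summarized in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the `GroundTruth` record, the device-based rule in `make_context_tag`, the `TAG_TO_INTENTS` consistency table, and the use of string normalization as a stand-in for "semantically identical" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GroundTruth:
    """One training example: an utterance, its intent, and any attached context tags."""
    utterance: str
    intent: str
    context_tags: set = field(default_factory=set)


# Hypothetical table of which intents count as consistent with which context tag.
TAG_TO_INTENTS = {
    "in_vehicle": {"open_car_window"},
    "at_desk": {"open_app_window"},
}


def normalize(text: str) -> str:
    # Crude stand-in for semantic identity: ignore case and extra whitespace.
    return " ".join(text.lower().split())


def make_context_tag(context: dict) -> str:
    # Hypothetical rule: derive the tag from the device reported by the user agent.
    return "in_vehicle" if context.get("device") == "car_console" else "at_desk"


def select_ground_truths(training_data, utterance, tag):
    # Keep ground truths whose utterance matches and whose intent fits the tag.
    wanted = normalize(utterance)
    allowed = TAG_TO_INTENTS.get(tag, set())
    return [gt for gt in training_data
            if normalize(gt.utterance) == wanted and gt.intent in allowed]


def disambiguate(training_data, utterance, context):
    # Generate the tag, select matching ground truths, and attach the tag to each.
    tag = make_context_tag(context)
    selected = select_ground_truths(training_data, utterance, tag)
    for gt in selected:
        gt.context_tags.add(tag)
    return tag, selected
```

With two ground truths that share the utterance "open the window" but carry different intents, calling `disambiguate` with a context of `{"device": "car_console"}` tags only the one whose intent is listed under `in_vehicle`, leaving the other untouched.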
19 Claims
1. A computer implemented method for disambiguating training data in natural language classification (NLC), comprising:
- obtaining, by one or more processor of a computer, an utterance input from a user agent;
- collecting, by the one or more processor, context data of the utterance input from the user agent, wherein the context data describes circumstances of the utterance input;
- generating, by the one or more processor, a context tag of one or more context tag based on the context data, wherein the one or more context tag corresponds to the utterance input;
- selecting, by the one or more processor, one or more ground truth from the training data by use of the utterance input and the context tag, wherein each of the one or more ground truth respectively includes an utterance and an intent, wherein the utterance of each ground truth is semantically identical to the utterance input, and wherein the intent of each ground truth is semantically consistent with the context tag; and
- updating, by the one or more processor, the one or more ground truth by attaching the context tag, wherein the selecting is performed by invoking a machine learning process with the utterance input and the context tag so that the machine learning process provides a first ground truth having a first utterance and a first intent, wherein the updating the one or more ground truth by attaching the context tag includes updating the first ground truth so that the first ground truth includes the context tag, and training the machine learning process using first training data, wherein the first training data used to train the machine learning process includes the first ground truth tagged with the context tag and having the first utterance and the first intent.
Dependent claims: 2, 3, 4, 5, 6, 7, 16, 17, 18, 19
8. A computer program product comprising:
a computer readable storage medium readable by one or more processor and storing instructions for execution by the one or more processor for performing a method for disambiguating training data in natural language classification, comprising:
- obtaining an utterance input from a user agent;
- collecting context data of the utterance input from the user agent, wherein the context data describes circumstances of the utterance input;
- generating a context tag of one or more context tag based on the context data, wherein the one or more context tag corresponds to the utterance input;
- selecting one or more ground truth from the training data by use of the utterance input and the context tag, wherein each of the one or more ground truth respectively includes an utterance and an intent, wherein the utterance of each ground truth is semantically identical to the utterance input, and wherein the intent of each ground truth is semantically consistent with the context tag; and
- updating the one or more ground truth by attaching the context tag, wherein the selecting is performed by invoking a machine learning process with the utterance input and the context tag so that the machine learning process provides a first ground truth having a first utterance and a first intent, wherein the updating the one or more ground truth by attaching the context tag includes updating the first ground truth so that the first ground truth includes the context tag, and training the machine learning process using first training data, wherein the first training data used to train the machine learning process includes the first ground truth tagged with the context tag and having the first utterance and the first intent.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system comprising:
- a memory;
- one or more processor in communication with the memory; and
- program instructions executable by the one or more processor via the memory to perform a method for disambiguating training data in natural language classification, comprising:
  - obtaining an utterance input from a user agent;
  - collecting context data of the utterance input from the user agent, wherein the context data describes circumstances of the utterance input;
  - generating a context tag of one or more context tag based on the context data, wherein the one or more context tag corresponds to the utterance input;
  - selecting one or more ground truth from the training data by use of the utterance input and the context tag, wherein each of the one or more ground truth respectively includes an utterance and an intent, wherein the utterance of each ground truth is semantically identical to the utterance input, and wherein the intent of each ground truth is semantically consistent with the context tag; and
  - updating the one or more ground truth by attaching the context tag, wherein the selecting is performed by invoking a machine learning process with the utterance input and the context tag so that the machine learning process provides a first ground truth having a first utterance and a first intent, wherein the updating the one or more ground truth by attaching the context tag includes updating the first ground truth so that the first ground truth includes the context tag, and training the machine learning process using first training data, wherein the first training data used to train the machine learning process includes the first ground truth tagged with the context tag and having the first utterance and the first intent.
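The final limitation shared by the independent claims, training on ground truth that carries its context tag, can be illustrated with a toy model. The sketch below is an assumption-laden stand-in and not the claimed machine learning process: `train` and `classify` are hypothetical names, the "model" is only a pair of frequency tables, and exact lowercase matching substitutes for semantic comparison.

```python
from collections import Counter, defaultdict


def train(examples):
    """Build a lookup model from (utterance, intent, context_tags) triples.

    Hypothetical toy learner: it counts how often each intent was seen for an
    (utterance, tag) pair and, separately, for the bare utterance.
    """
    by_tag = defaultdict(Counter)
    by_text = defaultdict(Counter)
    for utterance, intent, tags in examples:
        key = utterance.lower().strip()
        by_text[key][intent] += 1
        for tag in tags:
            by_tag[(key, tag)][intent] += 1
    return by_tag, by_text


def classify(model, utterance, tag=None):
    """Return the most frequent intent, preferring counts that match the tag."""
    by_tag, by_text = model
    key = utterance.lower().strip()
    # Fall back to tag-free counts when no tagged example matches.
    counts = by_tag.get((key, tag)) or by_text.get(key)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Trained on two examples that share the utterance "open the window" but differ in intent and tag, this toy model returns the intent that matches the supplied tag. Without a tag the two intents are tied in the fallback table, which is the ambiguity the context tag is meant to resolve.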
Specification