Scalable ground truth disambiguation

US 10,572,826 B2
Filed: 04/18/2017
Issued: 02/25/2020
Est. Priority Date: 04/18/2017
Status: Active Grant

Abstract

Methods, computer program products, and systems are presented. The methods include, for instance: obtaining an utterance input from a user agent, and collecting context data of the utterance input. A context tag is generated based on the context data, and one or more ground truth having respective utterance semantically identical to the utterance input is selected. Semantical relationship between the context tag and an intent of the selected ground truth is examined and the selected ground truth is updated with the context tag.
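
The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `GroundTruth` class, the `TAG_TO_INTENTS` table, the "tablet" example data, and all function names are hypothetical. Normalized string equality stands in for "semantically identical", and a fixed lookup table stands in for the machine learning process that the claims use to judge whether an intent is consistent with a context tag.

```python
from dataclasses import dataclass, field


@dataclass
class GroundTruth:
    """One training example: an utterance, its intent, and any attached context tags."""
    utterance: str
    intent: str
    context_tags: set = field(default_factory=set)


# Hypothetical table: which intents each context tag is consistent with.
# A real system would use a learned model here; this is only a stand-in.
TAG_TO_INTENTS = {
    "department:electronics": {"buy_tablet_device"},
    "department:pharmacy": {"buy_tablet_medication"},
}


def normalize(text):
    """Crude stand-in for semantic identity: case- and whitespace-insensitive match."""
    return " ".join(text.lower().split())


def generate_context_tag(context_data):
    """Build a context tag from context data (assumed to carry the page's department)."""
    return f"department:{context_data['department'].lower()}"


def select_ground_truths(training_data, utterance, tag):
    """Pick ground truths whose utterance matches and whose intent fits the tag."""
    allowed = TAG_TO_INTENTS.get(tag, set())
    return [
        gt for gt in training_data
        if normalize(gt.utterance) == normalize(utterance) and gt.intent in allowed
    ]


def disambiguate(training_data, utterance, context_data):
    """Generate a tag, select matching ground truths, and attach the tag to them."""
    tag = generate_context_tag(context_data)
    selected = select_ground_truths(training_data, utterance, tag)
    for gt in selected:
        gt.context_tags.add(tag)
    return tag, selected
```

With two ground truths that share the utterance "I need a tablet" but carry different intents, calling `disambiguate` with context data from a pharmacy page tags only the medication ground truth and leaves the device one untouched.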

40 Citations

19 Claims

1. A computer implemented method for disambiguating training data in natural language classification (NLC), comprising:
- obtaining, by one or more processor of a computer, an utterance input from a user agent;
  
  collecting, by the one or more processor, context data of the utterance input from the user agent, wherein the context data describes circumstances of the utterance input;
  
  generating, by the one or more processor, a context tag of one or more context tag based on the context data, wherein the one or more context tag corresponds to the utterance input;
  
  selecting, by the one or more processor, one or more ground truth from the training data by use of the utterance input and the context tag, wherein each of the one or more ground truth respectively includes an utterance and an intent, wherein the utterance of each ground truth is semantically identical to the utterance input, and wherein the intent of each ground truth is semantically consistent with the context tag; and
  
  updating, by the one or more processor, the one or more ground truth by attaching the context tag, wherein the selecting is performed by invoking a machine learning process with the utterance input and the context tag so that the machine learning process provides a first ground truth having a first utterance and a first intent, wherein the updating the one or more ground truth by attaching the context tag includes updating the first ground truth so that the first ground truth includes the context tag, and training the machine learning process using first training data, wherein the first training data used to train the machine learning process includes the first ground truth tagged with the context tag and having the first utterance and the first intent.
- - 2. The computer implemented method of claim 1, wherein the context data of the utterance input is selected from the group consisting of:
    - a manual input, metadata of a page from which the utterance input has been provided, and login information of a user.
  - 3. The computer implemented method of claim 1, wherein the user agent runs in a retail website, and wherein the context data of the utterance is selected from the group consisting of:
    - a department name of a page from which the utterance input has been provided, a product type of the page, previous search terms used by a user, and search details selected by the user.
  - 4. The computer implemented method of claim 1, the generating comprising:
    - selecting an instance from the context data for the context tag, wherein the instance is associated with a first intent of the utterance input, wherein the instance distinguishes the first intent from a second intent, wherein the utterance input means both the first intent and the second intent; and
      
      assigning the context tag for the utterance input with the instance.
  - 5. The computer implemented method of claim 1, the selecting comprising:
    - discovering one or more ground truth that has respective utterance identical to the utterance input;
      
      ascertaining that an intent of a first ground truth from the discovering is semantically relevant to the context tag by examining respective intent of the one or more ground truth from the discovering; and
      
      determining the first ground truth as a ground truth corresponding to the utterance input and the context tag.
  - 6. The computer implemented method of claim 1, wherein the selecting is performed by use of a machine learning process.
  - 7. The computer implemented method of claim 1, wherein a ground truth of the one or more ground truth from the updating includes an utterance, the context tag, and an intent, such that the ground truth disambiguate the utterance according to the context tag and such that the intent of the ground truth is utilized to interpret the utterance input that is identical to the utterance of the ground truth.
  - 16. The computer implemented method of claim 1, wherein the context data is absent of data derived from the utterance input.
  - 17. The computer implemented method of claim 1, wherein the context data is absent of data derived from the utterance input, and wherein the context data of the utterance includes manually input data, metadata of a page from which the utterance input has been provided, and login information of a user.
  - 18. The computer implemented method of claim 1, further including updating the training data of the NLC so that the training data of the NLC includes the one or more ground truth as tagged with the context tag.
  - 19. The computer implemented method of claim 1, wherein the context data is absent of data derived from the utterance input, and wherein the context data of the utterance includes manually input data, metadata of a page from which the utterance input has been provided, and login information of a user, wherein the user agent runs in a retail website, and wherein the context data of the utterance includes a department name of a page from which the utterance input has been provided, a product type of the page, previous search terms used by a user, and search details selected by the user, wherein the generating comprises selecting an instance from the context data for the context tag, wherein the instance is associated with a first intent of the utterance input, wherein the instance distinguishes the first intent from a second intent, wherein the utterance input means both the first intent and the second intent, and assigning the context tag for the utterance input with the instance, wherein the selecting comprises discovering one or more ground truth that has respective utterance identical to the utterance input, ascertaining that an intent of a certain ground truth from the discovering is semantically relevant to the context tag by examining respective intent of the one or more ground truth from the discovering, and determining the certain ground truth as a ground truth corresponding to the utterance input and the context tag.
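
The generating step recited in claim 4 (choosing a context instance that separates a first intent from a second intent of the same utterance) can be sketched as follows. This is an illustration only, under stated assumptions: the `INSTANCE_INTENTS` association table, the example field names, and `pick_distinguishing_instance` are hypothetical and do not come from the patent. The sketch treats an instance as distinguishing when it is associated with exactly one of the candidate intents.

```python
from typing import Optional

# Hypothetical associations between context instances (field, value) and intents.
INSTANCE_INTENTS = {
    ("department", "pharmacy"): {"buy_tablet_medication"},
    ("department", "electronics"): {"buy_tablet_device"},
    ("login", "returning_customer"): {"buy_tablet_medication", "buy_tablet_device"},
}


def pick_distinguishing_instance(context_data: dict, candidate_intents: set) -> Optional[tuple]:
    """Return (instance, intent) for the first context instance that is
    associated with exactly one candidate intent, or None if no instance
    separates the candidates."""
    for key, value in context_data.items():
        instance = (key, value)
        matched = INSTANCE_INTENTS.get(instance, set()) & candidate_intents
        if len(matched) == 1:
            # This instance points at one intent only, so it can serve as the context tag.
            return instance, next(iter(matched))
    return None
```

In this sketch the login instance is skipped because it is associated with both intents, while the department instance is chosen because it is associated with only one.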

8. A computer program product comprising:
- a computer readable storage medium readable by one or more processor and storing instructions for execution by the one or more processor for performing a method for disambiguating training data in natural language classification, comprising:
  
  obtaining an utterance input from a user agent;
  
  collecting context data of the utterance input from the user agent, wherein the context data describes circumstances of the utterance input;
  
  generating a context tag of one or more context tag based on the context data, wherein the one or more context tag corresponds to the utterance input;
  
  selecting one or more ground truth from the training data by use of the utterance input and the context tag, wherein each of the one or more ground truth respectively includes an utterance and an intent, wherein the utterance of each ground truth is semantically identical to the utterance input, and wherein the intent of each ground truth is semantically consistent with the context tag; and
  
  updating the one or more ground truth by attaching the context tag, wherein the selecting is performed by invoking a machine learning process with the utterance input and the context tag so that the machine learning process provides a first ground truth having a first utterance and a first intent, wherein the updating the one or more ground truth by attaching the context tag includes updating the first ground truth so that the first ground truth includes the context tag, and training the machine learning process using first training data, wherein the first training data used to train the machine learning process includes the first ground truth tagged with the context tag and having the first utterance and the first intent.
- - 9. The computer program product of claim 8, wherein the context data of the utterance input is selected from the group consisting of:
    - a manual input, metadata of a page from which the utterance input has been provided, and login information of a user.
  - 10. The computer program product of claim 8, wherein the user agent runs in a retail website, and wherein the context data of the utterance is selected from the group consisting of:
    - a department name of a page from which the utterance input has been provided, a product type of the page, previous search terms used by a user, and search details selected by the user.
  - 11. The computer program product of claim 8, the generating comprising:
    - selecting an instance from the context data for the context tag, wherein the instance is associated with a first intent of the utterance input, wherein the instance distinguishes the first intent from a second intent, wherein the utterance input means both the first intent and the second intent; and
      
      assigning the context tag for the utterance input with the instance.
  - 12. The computer program product of claim 8, the selecting comprising:
    - discovering one or more ground truth that has respective utterance identical to the utterance input;
      
      ascertaining that an intent of a first ground truth from the discovering is semantically relevant to the context tag by examining respective intent of the one or more ground truth from the discovering; and
      
      determining the first ground truth as a ground truth corresponding to the utterance input and the context tag.
  - 13. The computer program product of claim 8, wherein the selecting is performed by use of a machine learning process.
  - 14. The computer program product of claim 8, wherein a ground truth of the one or more ground truth from the updating includes an utterance, the context tag, and an intent, such that the ground truth disambiguate the utterance according to the context tag and such that the intent of the ground truth is utilized to interpret the utterance input that is identical to the utterance of the ground truth.

15. A system comprising:
- a memory;
  
  one or more processor in communication with the memory; and
  
  program instructions executable by the one or more processor via the memory to perform a method for disambiguating training data in natural language classification, comprising:
  
  obtaining an utterance input from a user agent;
  
  collecting context data of the utterance input from the user agent, wherein the context data describes circumstances of the utterance input;
  
  generating a context tag of one or more context tag based on the context data, wherein the one or more context tag corresponds to the utterance input;
  
  selecting one or more ground truth from the training data by use of the utterance input and the context tag, wherein each of the one or more ground truth respectively includes an utterance and an intent, wherein the utterance of each ground truth is semantically identical to the utterance input, and wherein the intent of each ground truth is semantically consistent with the context tag; and
  
  updating the one or more ground truth by attaching the context tag, wherein the selecting is performed by invoking a machine learning process with the utterance input and the context tag so that the machine learning process provides a first ground truth having a first utterance and a first intent, wherein the updating the one or more ground truth by attaching the context tag includes updating the first ground truth so that the first ground truth includes the context tag, and training the machine learning process using first training data, wherein the first training data used to train the machine learning process includes the first ground truth tagged with the context tag and having the first utterance and the first intent.

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Altaf, Faheem; Deluca, Lisa Seacat; Srinivas, Raghuram
Primary Examiner(s)
Serrou, Abdelali

Application Number
US15/490,081
Publication Number
US 20180301141A1
Time in Patent Office
1,043 Days
CPC Class Codes

G06F 16/953   Querying, e.g. by the use o...

G06F 40/20   Natural language analysis s...

G06N 20/00   Machine learning

G06N 5/025   Extracting rules from data
