System for automatically annotating training data for a natural language understanding system
First Claim
1. A method of generating annotated training data to train a natural language understanding (NLU) system having one or more models, comprising:
generating a proposed annotation with the NLU system for each of one or more units of unannotated training data;
displaying the proposed annotations for user verification or correction to obtain a user-confirmed annotation;
training the NLU system with the user-confirmed annotation; and
displaying an indication of a volume of training data used to train a plurality of different portions of the one or more models of the natural language understanding system;
wherein displaying the proposed annotations for user verification or correction comprises:
receiving a user input indicative of a user-identified portion of the proposed annotation; and
displaying a plurality of alternative proposed annotations for the user-identified portion;
wherein the one or more models impose model constraints and wherein displaying the plurality of alternative proposed annotations comprises displaying an alternative proposed annotation for the user-identified portion of data only if the alternative proposed annotation can lead to an overall annotation for the unit that is consistent with the model constraints;
wherein the proposed annotation includes parent and child nodes and wherein displaying a plurality of alternative proposed annotations includes displaying a user actuable delete node input which, when actuated, deletes a child node, and a user actuable add node input which, when actuated, adds a child node, and displaying the plurality of alternative proposed annotations in response to a user deleting a child node associated with the user-identified portion of data;
wherein displaying a plurality of alternative proposed annotations comprises displaying a portion of the unit of data not covered by the proposed annotation, and displaying a plurality of alternative proposed annotations for the portion of data not covered by the proposed annotation;
wherein the user is enabled to select a segment of the portion of data not covered by the proposed annotation and wherein displaying alternative proposed annotations comprises displaying a plurality of alternative proposed annotations for the user-selected segment; and
wherein the user is enabled to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and the user-selected alternative proposed annotation is incorporated into the annotated training data.
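As an illustration (not part of the claims), the constraint check recited above can be sketched as follows. The label names and the parent-to-child rule table are invented stand-ins; the patent does not specify a concrete constraint representation.

```python
# Hypothetical sketch: filter alternative annotations by model constraints.
# Allowed parent -> child annotation types stand in for "model constraints".
ALLOWED_CHILDREN = {
    "ShowFlight": {"Origin", "Destination", "DepartTime"},
    "BookHotel": {"City", "CheckIn", "Nights"},
}

def consistent_alternatives(parent_type, candidate_labels):
    """Keep only candidates that can yield an overall annotation
    consistent with the model constraints for this parent node."""
    allowed = ALLOWED_CHILDREN.get(parent_type, set())
    return [lab for lab in candidate_labels if lab in allowed]

# "City" is not a valid child of ShowFlight, so it is not displayed.
alts = consistent_alternatives("ShowFlight", ["Destination", "City", "Origin"])
```

Only the surviving alternatives would be displayed to the annotator, so the user can never select a correction that the models cannot represent.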
Abstract
The present invention uses a natural language understanding system that is currently being trained to assist in annotating training data for training that natural language understanding system. Unannotated training data is provided to the system and the system proposes annotations to the training data. The user is offered an opportunity to confirm or correct the proposed annotations, and the system is trained with the corrected or verified annotations.
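The propose-confirm-retrain loop of the abstract can be sketched as below. The "model" here is a toy word-to-label dictionary chosen for brevity; the actual NLU models are not specified at this level.

```python
# Minimal sketch of the annotate-verify-train loop (illustrative only).

def propose(model, sentence):
    """Propose an annotation: label each word the model has seen, else None."""
    return [(w, model.get(w.lower())) for w in sentence.split()]

def train(model, confirmed):
    """Fold the user-confirmed annotation back into the model."""
    for word, label in confirmed:
        if label is not None:
            model[word.lower()] = label

model = {"boston": "City"}
proposal = propose(model, "fly to Boston from Seattle")
# Simulated user correction: the user supplies the label the model missed.
confirmed = [(w, lab if w != "Seattle" else "City") for w, lab in proposal]
train(model, confirmed)
```

After one round, the retrained model covers the corrected word, so the next proposal for similar data needs less manual correction.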
64 Citations
36 Claims
1. (Independent claim; text set forth above as the First Claim.) - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
19. A method of generating annotated training data to train a natural language understanding (NLU) system having one or more models, comprising:
generating a proposed annotation with the NLU system for each of one or more units of unannotated training data;
displaying the proposed annotations for user verification or correction to obtain a user-confirmed annotation, comprising:
displaying a plurality of alternative proposed annotations to data portions associated with a child node in response to that child node being deleted;
wherein the user is enabled to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and the user-selected alternative proposed annotation is incorporated into the annotated training data;
training the NLU system with the user-confirmed annotation; and
displaying an indication of a volume of training data used to train a plurality of different portions of the one or more models of the natural language understanding system, wherein displaying an indication of a volume of training data comprises:
displaying a representation of the one or more models; and
visually contrasting portions of the one or more models that have been trained with a threshold volume of training data.
- View Dependent Claims (20)
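The training-volume display of claim 19 can be illustrated as follows. Treating each annotation label as a "portion" of the model, and using an ASCII marker for the visual contrast, are simplifications invented for this sketch.

```python
from collections import Counter

# Hypothetical sketch: track how much confirmed training data each model
# portion (here, each annotation label) has received, and visually contrast
# the portions trained with at least a threshold volume of data.
counts = Counter()
batches = [[("Boston", "City")], [("Seattle", "City")], [("9am", "Time")]]
for confirmed in batches:
    for _, label in confirmed:
        counts[label] += 1

THRESHOLD = 2  # illustrative threshold volume

def render(label):
    mark = "*" if counts[label] >= THRESHOLD else " "  # '*' = well trained
    return f"[{mark}] {label}: {counts[label]}"

lines = [render(lab) for lab in sorted(counts)]
```

In a real interface the contrast would be graphical (e.g. shading nodes of the model display), but the bookkeeping is the same: a per-portion count compared against a threshold.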
21. A method of generating annotated training data to train a natural language understanding (NLU) system having one or more models, comprising:
generating a proposed annotation with the NLU system for each of one or more units of unannotated training data;
displaying the proposed annotations for user verification or correction to obtain a user-confirmed annotation;
training the NLU system with the user-confirmed annotation;
identifying inconsistencies between the user-confirmed annotation and prior annotations;
displaying a user actuable delete node input which, when actuated, deletes a child node;
displaying a user actuable add node input which, when actuated, adds a child node;
displaying a plurality of alternative proposed annotations to data portions associated with a child node in response to that child node being deleted, such that the user is enabled to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and the user-selected alternative proposed annotation is incorporated into the annotated training data; and
displaying an indication of a volume of training data used to train a plurality of different portions of the one or more models of the natural language understanding system.
- View Dependent Claims (22)
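One simple way to realize the "identifying inconsistencies" step of claim 21 is a direct comparison of the new user-confirmed annotation against prior annotations of the same text. The word-level representation below is an invented simplification.

```python
# Hypothetical sketch: flag a new user-confirmed annotation that conflicts
# with how the same text was annotated previously.
prior = {"boston": "City", "9am": "Time"}  # prior text -> confirmed label

def inconsistencies(confirmed, prior):
    """Return (text, old_label, new_label) for each conflicting annotation."""
    return [(t, prior[t.lower()], lab)
            for t, lab in confirmed
            if t.lower() in prior and prior[t.lower()] != lab]

conflicts = inconsistencies([("Boston", "Airport"), ("9am", "Time")], prior)
```

Each conflict could then be surfaced to the user for resolution before the conflicting example is used for training.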
23. A computing environment comprising a processor, the computing environment being configured to execute a user interface for training a natural language understanding (NLU) system that has one or more models, the user interface comprising:
a first portion displaying a model display representative of the one or more models;
a second portion displaying unannotated training inputs;
one or more user-actuable inputs configured to be actuable by a user to indicate a user-selected one of the unannotated training inputs, the computing environment comprising a processor that is configured to receive the user-selected unannotated training inputs and provide an output comprising a plurality of proposed annotations for the user-selected unannotated training inputs;
a third portion displaying the proposed annotations for a selected one of the unannotated training inputs;
a fourth portion displaying a sample of the unannotated training input not covered by the proposed annotations;
one or more user-actuable inputs configured to be actuable by a user to indicate a user-selected segment of the sample not covered, such that the input indicating the user-selected segment is received by the processor;
a fifth portion displaying a plurality of alternative proposed annotations for the user-selected segment, provided by the processor in response to the input indicating the user-selected segment; and
a sixth portion displaying an indication of a volume of training data used to train a plurality of different portions of the one or more models of the natural language understanding system;
such that the fifth portion displaying the plurality of alternative proposed annotations further includes:
displaying one or more user actuable alternative annotation node inputs;
displaying a user actuable delete node input which, when actuated, deletes a child node;
displaying a user actuable add node input which, when actuated, adds a child node;
displaying a plurality of alternative proposed annotations to data portions associated with a child node in response to that child node being deleted; and
enabling the user to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and using the user-selected alternative proposed annotation for training the natural language understanding (NLU) system.
- View Dependent Claims (24, 25, 26, 27, 28)
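The delete-node interaction recited in claims 23 and elsewhere can be sketched with an annotation tree held as nested dictionaries. The tree shape, labels, and the fixed alternatives table are all invented for illustration.

```python
# Hypothetical sketch: deleting a child node frees its data portion, and
# alternative proposed annotations are then displayed for that portion.
tree = {"type": "ShowFlight",
        "children": [{"type": "Origin", "text": "Seattle"},
                     {"type": "Destination", "text": "Boston"}]}

# Stand-in for the NLU system's proposals for an uncovered data portion.
ALTERNATIVES = {"Seattle": ["Origin", "Destination", "City"]}

def delete_child(tree, index):
    """Delete a child node; return alternative annotations for its text."""
    freed = tree["children"].pop(index)
    return ALTERNATIVES.get(freed["text"], [])

alts = delete_child(tree, 0)  # user actuates the delete-node input
```

The user would then pick one of `alts` (or actuate the add-node input) to re-cover the freed text, and the selection would be folded into the annotated training data.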
29. A method of generating annotated training data for training a natural language understanding (NLU) system having at least one model, comprising:
generating a proposed annotation for a unit of unannotated training data;
calculating a confidence measure for a plurality of different portions of the proposed annotation;
displaying the proposed annotation by visually contrasting portions that have a corresponding confidence measure that falls below a threshold level;
displaying user actuable inputs for user correction or verification of the proposed annotation, the user actuable inputs comprising:
one or more user actuable node inputs for annotation alternatives;
a user actuable delete node input which, when actuated, deletes a child node; and
a user actuable add node input which, when actuated, adds a child node;
displaying a plurality of alternative proposed annotations to data portions associated with the child node in response to that child node being deleted, such that the user is enabled to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and the user-selected alternative proposed annotation is incorporated into the annotated training data; and
displaying an indication of a volume of training data used to train a plurality of different portions of the at least one model of the natural language understanding system.
- View Dependent Claims (30, 31, 32)
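The confidence-contrast display of claim 29 can be illustrated as follows. The per-portion scores and the `?...?` text markup are invented; a real implementation would use model-derived scores and graphical highlighting.

```python
# Hypothetical sketch: visually contrast portions of a proposed annotation
# whose confidence measure falls below a threshold level.
proposal = [("Boston", "City", 0.95), ("leave", "Command", 0.40)]
THRESHOLD = 0.6  # illustrative threshold level

def render(portion):
    text, label, conf = portion
    tagged = f"{text}/{label}"
    # '?...?' marks a low-confidence portion needing user attention.
    return f"?{tagged}?" if conf < THRESHOLD else tagged

display = " ".join(render(p) for p in proposal)
```

Drawing the annotator's eye to the low-confidence portions first is what makes the verification pass fast: high-confidence portions can usually be accepted at a glance.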
33. A method of generating annotated training data for training a natural language understanding (NLU) system having at least one model, comprising:
generating, with the NLU system, a proposed annotation for a unit of unannotated training data;
displaying the proposed annotation with user actuable inputs for user correction or verification of the proposed annotation to obtain a user-confirmed annotation;
training the model with the user-confirmed annotation; and
checking for inconsistencies among the user-confirmed annotation and data already used to train the model by determining whether the model accurately predicts the prior user-confirmed annotations;
the user actuable inputs comprising:
one or more user actuable node inputs for annotation alternatives;
a user actuable delete node input which, when actuated, deletes a child node; and
a user actuable add node input which, when actuated, adds a child node;
the method further comprising:
displaying a plurality of alternative proposed annotations to data portions associated with the child node in response to that child node being deleted, such that the user is enabled to select one of the alternative proposed annotations from among the plurality of alternative proposed annotations, and the user-selected alternative proposed annotation is incorporated into the annotated training data; and
displaying an indication of a volume of training data used to train a plurality of different portions of the at least one model of the natural language understanding system.
- View Dependent Claims (34, 35, 36)
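The consistency check of claim 33, which re-predicts prior user-confirmed annotations with the current model, can be sketched as follows. The toy word-to-label model is an invented stand-in.

```python
# Hypothetical sketch: after training, verify that the model still predicts
# the previously confirmed annotations; any it misses are flagged.
model = {"boston": "City", "seattle": "Airport"}  # toy word -> label model
prior_confirmed = [("Boston", "City"), ("Seattle", "City")]

def inconsistent_priors(model, prior_confirmed):
    """Prior confirmed annotations the current model no longer predicts."""
    return [(w, lab) for w, lab in prior_confirmed
            if model.get(w.lower()) != lab]

stale = inconsistent_priors(model, prior_confirmed)
```

A flagged prior annotation may indicate either an earlier labeling mistake or a genuine ambiguity worth resolving before further training.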
Specification