COMPUTER-AIDED NATURAL LANGUAGE ANNOTATION

US 20090276380A1
Filed: 05/06/2009
Published: 11/05/2009
Est. Priority Date: 05/10/2002
Status: Active Grant


1 Assignment


0 Petitions


Abstract

The present invention uses a natural language understanding system that is currently being trained to assist in annotating training data for training that natural language understanding system. Unannotated training data is provided to the system and the system proposes annotations to the training data. The user is offered an opportunity to confirm or correct the proposed annotations, and the system is trained with the corrected or verified annotations.
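The loop in the abstract (propose, let the user confirm or correct, train on the result) can be sketched as follows. This is a minimal illustration, not the patented implementation: `ToyIntentModel` is a hypothetical stand-in for the NLU system that simply counts word/label co-occurrences, and `review` is a hypothetical callback playing the role of the human annotator.

```python
from collections import Counter, defaultdict
from typing import Callable, DefaultDict, List, Tuple


class ToyIntentModel:
    """Hypothetical stand-in for the NLU system: word/label co-occurrence counts."""

    def __init__(self) -> None:
        self.counts: DefaultDict[str, Counter] = defaultdict(Counter)

    def propose(self, sentence: str) -> str:
        # Sum, per label, how often each word of the sentence was seen with that label.
        scores: Counter = Counter()
        for word in sentence.lower().split():
            scores.update(self.counts[word])
        return scores.most_common(1)[0][0] if scores else "UNKNOWN"

    def train(self, sentence: str, label: str) -> None:
        # Incremental update from one user-verified or user-corrected example.
        for word in sentence.lower().split():
            self.counts[word][label] += 1


def annotate_corpus(
    model: ToyIntentModel,
    sentences: List[str],
    review: Callable[[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Propose a label, let the reviewer confirm or correct it, then train on the result."""
    log = []
    for sentence in sentences:
        proposal = model.propose(sentence)
        final = review(sentence, proposal)  # user confirms (same label) or corrects it
        model.train(sentence, final)
        log.append((sentence, proposal, final))
    return log


# Usage: a dictionary of gold labels plays the role of the human reviewer.
gold = {
    "book a flight to boston": "BookFlight",
    "show flights to denver": "ShowFlight",
    "book a flight to denver": "BookFlight",
}
log = annotate_corpus(ToyIntentModel(), list(gold), lambda s, p: gold[s])
for sentence, proposal, final in log:
    print(f"{sentence!r}: proposed {proposal}, stored {final}")
```

In this toy run the first sentence gets no proposal (the model is untrained), the second is mis-proposed as `BookFlight` and corrected by the reviewer, and the third is proposed correctly because the model has already been trained on the first two annotations.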

39 Citations


14 Claims

1. A method, performed using one or more processors, of generating annotated training data for training a natural language understanding system, comprising:
- generating, with the natural language understanding system running on one or more of the processors, a proposed annotation for each of a plurality of units of unannotated training data received via one or more input components;
- calculating, with one or more of the processors, a confidence measure for each of the proposed annotations for a given unit of the training data;
- displaying on an output component at least some of the proposed annotations in an order based on the confidence measures of the proposed annotations, with one or more input components providing one or more user-actuable inputs configured for verification of the proposed annotations and one or more user-actuable inputs configured for deletion of the proposed annotations;
- responding, with one or more of the processors, to an input for verification of one of the proposed annotations by storing the verified annotation for the given unit of the training data; and
- responding, with one or more of the processors, to an input for deletion of one of the proposed annotations by presenting on an output component at least some of the remaining proposed annotations in an order based on the confidence measures of the remaining proposed annotations.

2. The method of claim 1 in which displaying comprises:
- displaying the proposed annotations in order based on ascending value of the corresponding confidence measures.

3. The method of claim 1 in which displaying comprises:
- displaying the proposed annotations in order based on descending value of the corresponding confidence measures.

4. The method of claim 1 in which re-displaying comprises:
- displaying the user-confirmed annotations by visually contrasting portions in which the NLU system has a calculated confidence measure below a threshold.

5. The method of claim 1 and further comprising:
- training the NLU system with the user-selected annotation.
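The review flow of claims 1 to 4 (order candidates by confidence, store a verified one, re-present the rest after a deletion, and set off low-confidence portions) can be sketched as below. The claims do not say how the confidence measure is computed, so the scores here are hypothetical inputs; `Proposal`, `ReviewSession`, and `mark_low_confidence` are illustrative names, and the `[[...]]` brackets are an assumed stand-in for "visually contrasting".

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Proposal:
    annotation: str
    confidence: float  # hypothetical score; the claims do not say how it is computed


class ReviewSession:
    """Review state for one unit of training data (claims 1-3)."""

    def __init__(self, unit: str, proposals: Sequence[Proposal], descending: bool = True):
        self.unit = unit
        self.remaining: List[Proposal] = list(proposals)
        self.descending = descending  # claim 3 (descending) vs. claim 2 (ascending)

    def displayed(self) -> List[Proposal]:
        # Present the candidates ordered by their confidence measures.
        return sorted(self.remaining, key=lambda p: p.confidence, reverse=self.descending)

    def delete(self, annotation: str) -> List[Proposal]:
        # Remove the rejected candidate and re-present the rest, still ordered by confidence.
        self.remaining = [p for p in self.remaining if p.annotation != annotation]
        return self.displayed()

    def verify(self, annotation: str, store: Dict[str, str]) -> None:
        # Store the verified annotation for this unit of training data.
        if annotation not in {p.annotation for p in self.remaining}:
            raise ValueError(f"{annotation!r} is not a remaining proposal")
        store[self.unit] = annotation


def mark_low_confidence(spans: Sequence[Tuple[str, float]], threshold: float) -> str:
    """Claim 4 style: bracket the portions whose confidence is below the threshold."""
    return " ".join(f"[[{text}]]" if conf < threshold else text for text, conf in spans)


store: Dict[str, str] = {}
session = ReviewSession(
    "list flights from seattle",
    [Proposal("ShowFlight", 0.35), Proposal("BookFlight", 0.55), Proposal("ShowAirfare", 0.10)],
)
print([p.annotation for p in session.displayed()])  # ['BookFlight', 'ShowFlight', 'ShowAirfare']
print([p.annotation for p in session.delete("BookFlight")])  # ['ShowFlight', 'ShowAirfare']
session.verify("ShowFlight", store)
print(store)  # {'list flights from seattle': 'ShowFlight'}
print(mark_low_confidence([("list flights", 0.9), ("from seattle", 0.4)], threshold=0.5))
```

Sorting on every call keeps the ordering rule in one place, so a deletion automatically yields the re-ordered remainder that the final step of claim 1 describes.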

6. A method, performed using one or more processors, of generating annotated training data for training a natural language understanding (NLU) system, comprising:
- generating, with the NLU system running on one or more of the processors, a proposed annotation for each of a plurality of units of unannotated training data, each proposed annotation having a type;
- displaying on an output component at least some of the proposed annotations in an order based on the type of each of the proposed annotations, with one or more input components providing one or more user-actuable inputs configured for verification of the proposed annotations and one or more user-actuable inputs configured for deletion of the proposed annotations;
- responding, with one or more of the processors, to an input for verification of one of the proposed annotations by storing the verified annotation for the given unit of the training data; and
- responding, with one or more of the processors, to an input for deletion of one of the proposed annotations by presenting at least some of the remaining proposed annotations in an order based on the type of each of the remaining proposed annotations.

7. The method of claim 6 in which displaying comprises:
- sorting the proposed annotations based on type; and
- displaying the proposed annotations with those of a similar type grouped together.

8. The method of claim 6 in which re-displaying comprises:
- displaying the user-confirmed annotations by visually contrasting portions in which the NLU system has a calculated confidence measure below a threshold.

9. The method of claim 6 and further comprising:
- training the NLU system with the user-verified annotations.
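Claims 6 and 7 order and group proposals by annotation type instead of by confidence. A minimal sketch follows; the claims leave "type" undefined, so treating it as the top-level intent label (`ann_type`) is an assumption, as are all the names and the string form of the annotations.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class TypedProposal:
    unit: str        # the unannotated sentence
    annotation: str  # proposed annotation, in a hypothetical string form
    ann_type: str    # hypothetical "type", e.g. the top-level intent of the annotation


def display_by_type(proposals: Sequence[TypedProposal]) -> Dict[str, List[str]]:
    """Sort by type and group proposals of the same type together (claim 7)."""
    ordered = sorted(proposals, key=lambda p: p.ann_type)  # stable: keeps input order within a type
    return {t: [p.unit for p in group] for t, group in groupby(ordered, key=lambda p: p.ann_type)}


def delete(
    proposals: Sequence[TypedProposal], unit: str
) -> Tuple[List[TypedProposal], Dict[str, List[str]]]:
    """Drop the deleted proposal and re-present the remainder, again ordered by type."""
    remaining = [p for p in proposals if p.unit != unit]
    return remaining, display_by_type(remaining)


def verify(proposal: TypedProposal, store: Dict[str, str]) -> None:
    """Store the verified annotation for its unit of training data."""
    store[proposal.unit] = proposal.annotation


proposals = [
    TypedProposal("book a flight to boston", "BookFlight(dest=boston)", "BookFlight"),
    TypedProposal("show flights to denver", "ShowFlight(dest=denver)", "ShowFlight"),
    TypedProposal("book a flight to denver", "BookFlight(dest=denver)", "BookFlight"),
]
print(display_by_type(proposals))
remaining, view = delete(proposals, "show flights to denver")
print(view)
store: Dict[str, str] = {}
verify(remaining[0], store)
print(store)
```

Grouping similar types lets a reviewer confirm a run of same-intent sentences in one pass, which is the practical motivation for ordering by type.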

10. A method, performed using one or more processors, of generating annotated training data for training a natural language understanding (NLU) system employing a plurality of different natural language training techniques, comprising:
- generating, with the natural language understanding system running on one or more of the one or more processors, a different proposed annotation with each of a plurality of the natural language training techniques for a unit of unannotated training data, to obtain a plurality of proposed annotations generated by the natural language training techniques used for that unit of unannotated training data;
- displaying on an output component one or more of the proposed annotations, with user actuable inputs for user rejection or user selection of each of the displayed proposed annotations;
- responding, with one or more of the one or more processors, to an input for user rejection of one of the proposed annotations by displaying on an output component one or more of any remaining proposed annotations; and
- responding, with one or more of the processors, to an input for user selection of one of the proposed annotations by storing the selected annotation for the given unit of the training data.

11. The method of claim 10 in which displaying comprises:
- displaying the proposed annotations with corresponding indications of the natural language training technique with which one or more of the proposed annotations were generated.

12. The method of claim 10 in which displaying the proposed annotations comprises:
- sorting the proposed annotations based on the natural language training techniques with which the proposed annotations were generated; and
- displaying the proposed annotations with those generated with the same natural language training techniques grouped together.

13. The method of claim 10 in which displaying comprises:
- displaying the proposed annotations in order based on ascending value of corresponding confidence measures.

14. The method of claim 10 and further comprising:
- training the NLU system with the user-selected annotations.
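Claims 10 and 11 describe several training techniques each proposing an annotation for the same unit, with the producing technique shown beside each proposal. The sketch below assumes two deliberately trivial, hypothetical techniques (`keyword_rules` and `majority_prior`); the claims do not name the actual techniques, and all function names here are illustrative.

```python
from typing import Callable, Dict, List, Tuple

Technique = Callable[[str], str]


def keyword_rules(unit: str) -> str:
    """Hypothetical rule-based technique."""
    return "BookFlight" if "book" in unit.split() else "ShowFlight"


def majority_prior(unit: str) -> str:
    """Hypothetical baseline technique that always predicts the most frequent label."""
    return "ShowFlight"


def propose_with_techniques(unit: str, techniques: Dict[str, Technique]) -> List[Tuple[str, str]]:
    # One proposal per technique, tagged with the technique that produced it (claim 11).
    return [(name, technique(unit)) for name, technique in techniques.items()]


def reject(proposals: List[Tuple[str, str]], index: int) -> List[Tuple[str, str]]:
    # User rejection: the remaining proposals are displayed again.
    return proposals[:index] + proposals[index + 1:]


def select(proposals: List[Tuple[str, str]], index: int, unit: str, store: Dict[str, str]) -> None:
    # User selection: store the chosen annotation for this unit.
    store[unit] = proposals[index][1]


unit = "book a flight to boston"
techniques = {"keyword_rules": keyword_rules, "majority_prior": majority_prior}
proposals = propose_with_techniques(unit, techniques)
for name, annotation in proposals:
    print(f"[{name}] {annotation}")
proposals = reject(proposals, 1)  # the user rejects the baseline's ShowFlight
store: Dict[str, str] = {}
select(proposals, 0, unit, store)
print(store)
```

Keeping the technique name alongside each annotation is what makes claim 12's grouping possible: sorting the tagged pairs on their first element would place proposals from the same technique together.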


Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Acero, Alejandro; Wang, Ye-Yi; Wong, Leon

Granted Patent
US 7,983,901 B2
US Class Current
706/12
CPC Class Codes

G06F 40/169 Annotation, e.g. comment da...

G06F 40/30 Semantic analysis
