Methods and systems for natural language understanding using human knowledge and collected data

US 9,792,904 B2
Filed: 07/23/2014
Issued: 10/17/2017
Est. Priority Date: 07/25/2005
Status: Active Grant

Abstract

Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.
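
As a rough illustration of the continuum described in the abstract, the sketch below mixes a knowledge-based tag distribution with a data-derived one, shifting weight toward the data-derived estimate as more annotated data becomes available. The weighting schedule `n / (n + k)`, the constant `k`, and all function names are assumptions made for illustration; the patent text shown here only states that the weights depend on the amount of annotated data available.

```python
def data_weight(num_annotated: int, k: float = 100.0) -> float:
    """Weight for the data-derived estimate.

    Hypothetical schedule n / (n + k): 0 with no annotated data,
    approaching 1 as the amount of annotated data grows.
    """
    if num_annotated < 0:
        raise ValueError("num_annotated must be non-negative")
    return num_annotated / (num_annotated + k)


def combine(knowledge_probs: dict, data_probs: dict,
            num_annotated: int, k: float = 100.0) -> dict:
    """Linearly interpolate two tag distributions and renormalize."""
    lam = data_weight(num_annotated, k)
    tags = set(knowledge_probs) | set(data_probs)
    mixed = {
        t: (1.0 - lam) * knowledge_probs.get(t, 0.0) + lam * data_probs.get(t, 0.0)
        for t in tags
    }
    total = sum(mixed.values())
    # Guard against an all-zero mixture (assumed edge case).
    return {t: p / total for t, p in mixed.items()} if total > 0 else mixed
```

With no annotated data the result equals the knowledge-based distribution; with `num_annotated == k` the two sources are weighted equally.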

20 Claims

1. A method comprising:
- operating, via a processor of a computing device, a natural language understanding application with a statistical model that generates a sequence of tags assigned to a first sequence of words;
  
  developing a first part without human knowledge, the first part being first data used to formulate a replacement statistical model;
  
  developing a second part using the human knowledge and annotated words, the second part being a second data used to formulate the replacement statistical model;
  
  assigning weighted versions of the first part and the second part to yield the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
  
  during execution of the natural language understanding application in which a second sequence of words is received as speech for processing using the natural language understanding application, assigning, via the processor of the computer device executing the natural language understanding application, a new sequence of tags to the second sequence of words, the new sequence of tags being generated by the replacement statistical model.
- Dependent claims:
  - 2. The method of claim 1, wherein developing of the second part further comprises using a language model executor.
  - 3. The method of claim 2, wherein the language model executor is configured in run time to output the sequence of tags for an inputted sequence of words by using a statistical classifier model and a language model.
  - 4. The method of claim 1, wherein the new sequence of tags comprises tags from a predefined set of tags relating to different types of named entities.
  - 5. The method of claim 1, wherein developing of the replacement statistical model further comprises:
    - enumerating a phrase based on human knowledge for each tag in a predetermined set of possible tags for the natural language understanding application, to yield an enumerated phrase; and
      
      developing a language model for each tag in the predetermined set of tags based on the enumerated phrase.
  - 6. The method of claim 1, further comprising replacing the statistical model with the replacement statistical model.
  - 7. The method of claim 1, further comprising determining when the replacement statistical model is sufficiently different than the statistical model to replace the statistical model.
  - 8. The method of claim 7, wherein determining when the replacement statistical model is sufficiently different than the statistical model comprises evaluating if a sufficiently large body of annotated collected data has been collected.
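
Claim 5 describes enumerating phrases for each tag from human knowledge and developing a language model per tag. A minimal sketch of that idea follows, assuming add-one-smoothed unigram models and a best-score decision rule. The model type, the example tags and phrases, and the function names are all hypothetical and not taken from the patent.

```python
import math
from collections import Counter


def build_tag_models(phrases_by_tag: dict) -> dict:
    """Build one add-one-smoothed unigram model per tag from enumerated phrases."""
    vocab = {
        w
        for phrases in phrases_by_tag.values()
        for p in phrases
        for w in p.lower().split()
    }
    models = {}
    for tag, phrases in phrases_by_tag.items():
        counts = Counter(w for p in phrases for w in p.lower().split())
        # Denominator: observed tokens + vocabulary size + 1 slot for unseen words.
        denom = sum(counts.values()) + len(vocab) + 1
        models[tag] = (counts, denom)
    return models


def log_prob(model, phrase: str) -> float:
    """Log-probability of a phrase under one tag's unigram model."""
    counts, denom = model
    return sum(math.log((counts[w] + 1) / denom) for w in phrase.lower().split())


def best_tag(models: dict, phrase: str) -> str:
    """Pick the tag whose language model scores the phrase highest."""
    return max(models, key=lambda tag: log_prob(models[tag], phrase))


# Hypothetical enumerated phrases supplied by a human expert.
enumerated = {
    "CITY": ["new york", "san francisco"],
    "DATE": ["next monday", "july fourth"],
}
models = build_tag_models(enumerated)
```

Here `best_tag(models, "new york")` selects the `CITY` model because both words were enumerated under that tag. A production tagger would score whole word sequences rather than isolated phrases.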

9. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  
  operating a natural language understanding application with a statistical model that generates a first sequence of tags assigned to a sequence of words;
  
  developing a first part without human knowledge, the first part being first data used to formulate a replacement statistical model;
  
  developing a second part using the human knowledge and annotated words, the second part being a second data used to formulate the replacement statistical model;
  
  assigning weighted versions of the first part and the second part to yield the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
  
  during execution of the natural language understanding application in which a second sequence of words is received as speech for processing using the natural language understanding application, assigning a new sequence of tags to the second sequence of words, the new sequence of tags being generated by the replacement statistical model.
- Dependent claims:
  - 10. The system of claim 9, wherein developing of the second part further comprises using a language model executor.
  - 11. The system of claim 10, wherein the language model executor is configured in run time to output the new sequence of tags for an inputted sequence of words by using a statistical classifier model and a language model.
  - 12. The system of claim 9, wherein the new sequence of tags comprises tags from a predefined set of tags relating to different types of named entities.
  - 13. The system of claim 9, wherein developing of the replacement statistical model further comprises:
    - enumerating a phrase based on human knowledge for each tag in a predetermined set of possible tags for the natural language understanding application, to yield an enumerated phrase; and
      
      developing a language model for each tag in the predetermined set of tags based on the enumerated phrase.
  - 14. The system of claim 9, the computer-readable storage medium having additional instructions stored which result in operations comprising replacing the statistical model with the replacement statistical model.
  - 15. The system of claim 9, the computer-readable storage medium having additional instructions stored which result in operations comprising determining when the replacement statistical model is sufficiently different than the statistical model to replace the statistical model.
  - 16. The system of claim 15, wherein determining when the replacement statistical model is sufficiently different than the statistical model comprises evaluating if a sufficiently large body of annotated collected data has been collected.
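
Claims 15 and 16 (like claims 7 and 8) tie the decision to replace the statistical model to whether a sufficiently large body of annotated collected data exists. The sketch below shows one simple way such a gate could look. The threshold value, the `ModelSlot` container, and the function names are assumptions for illustration only; the claims do not specify how "sufficiently large" is measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSlot:
    """Holds the model in use and an optional replacement candidate (hypothetical)."""
    active: str
    candidate: Optional[str] = None


def should_replace(num_annotated: int, min_annotated: int = 1000) -> bool:
    """True once the annotated corpus reaches an assumed size threshold."""
    return num_annotated >= min_annotated


def maybe_swap(slot: ModelSlot, num_annotated: int, min_annotated: int = 1000) -> bool:
    """Promote the candidate model if enough annotated data has been collected."""
    if slot.candidate is not None and should_replace(num_annotated, min_annotated):
        slot.active, slot.candidate = slot.candidate, None
        return True
    return False
```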

17. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- operating a natural language understanding application with a statistical model that generates a first sequence of tags assigned to a sequence of words;
  
  developing a first part without human knowledge, the first part being first data used to formulate a replacement statistical model;
  
  developing a second part using the human knowledge and annotated words, the second part being a second data used to formulate the replacement statistical model;
  
  assigning weighted versions of the first part and the second part to yield the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
  
  during execution of the natural language understanding application in which a second sequence of words is received as speech for processing using the natural language understanding application, assigning a new sequence of tags to the second sequence of words, the new sequence of tags being generated by the replacement statistical model.
- Dependent claims:
  - 18. The computer-readable storage device of claim 17, wherein developing of the second part further comprises using a language model executor.
  - 19. The computer-readable storage device of claim 18, wherein the language model executor is configured in run time to output the new sequence of tags for an inputted sequence of words by using a statistical classifier model and a language model.
  - 20. The computer-readable storage device of claim 17, wherein the new sequence of tags comprises tags from a predefined set of tags relating to different types of named entities.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Bangalore, Srinivas; Gilbert, Mazin; Gupta, Narendra K.
Primary Examiner(s): Shah, Paras D
Application Number: US14/338,602
Publication Number: US 20140330555A1
Time in Patent Office: 1,182 Days
Field of Search: 704/1-10, 704/231, 704/257, 715/231, 715/233, 715/256
CPC Class Codes:
G06F 40/44   Statistical methods, e.g. p...
G10L 15/14   using statistical models, e...
G10L 15/183   using context dependencies,...
G10L 15/19   Grammatical context, e.g. d...
