Methods and systems for natural language understanding using human knowledge and collected data

US 8,798,990 B2
Filed: 04/30/2013
Issued: 08/05/2014
Est. Priority Date: 07/25/2005
Status: Active Grant

5 Assignments

0 Petitions

Abstract

Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.

29 Citations

17 Claims

1. A method comprising:
- developing, via a processor, a statistical model for a natural language understanding application exclusive of annotated data;
  
  during a first execution of the natural language understanding application, assigning a sequence of tags to a first received sequence of words using the statistical model, to yield annotated words;
  
  developing a replacement statistical model for the natural language understanding application using the annotated words, wherein developing the replacement statistical model comprises:
  
  developing a first part without human knowledge;
  
  developing a second part using the human knowledge and the annotated words; and
  
  assigning weighted versions of the first part and the second part to the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
  
  during a second execution of the natural language understanding application, assigning a second sequence of tags to a second received sequence of words by using the replacement statistical model.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein developing the replacement statistical model comprises using both human knowledge and the annotated words.
  - 3. The method of claim 1, wherein the sequence of tags comprises tags from a predefined set of tags relating to different types of named entities.
  - 4. The method of claim 1, wherein developing the replacement statistical model comprises:
    - enumerating a phrase based on human knowledge for each tag in a predetermined set of possible tags for the natural language understanding application, to yield an enumerated phrase; and
      
      developing a language model for each tag in the predetermined set of tags based on the enumerated phrase.
  - 5. The method of claim 1, further comprising replacing the statistical model with the replacement statistical model.
  - 6. The method of claim 1, further comprising determining if the replacement statistical model is sufficiently different than the statistical model to replace the statistical model.
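
Claim 1 says only that the weights on the two parts "are determined based on an amount of annotated data that is available." The Python sketch below assumes one simple schedule, λ = n / (n + k), where n is the number of annotated examples and k is a hypothetical smoothing constant. It then linearly interpolates two probability distributions over the same outcomes (for example, tag probabilities for a word). The schedule, the value of k, and the names `data_weight` and `combine_parts` are illustrative assumptions, not taken from the patent.

```python
from typing import Dict


def data_weight(num_annotated: int, k: float = 100.0) -> float:
    """Weight on the part built without human knowledge.

    Hypothetical schedule: 0.0 when no annotated data exists, approaching 1.0
    as the annotated corpus grows; k controls how quickly the weight shifts.
    """
    if num_annotated < 0:
        raise ValueError("num_annotated must be non-negative")
    return num_annotated / (num_annotated + k)


def combine_parts(
    data_part: Dict[str, float],
    knowledge_part: Dict[str, float],
    num_annotated: int,
    k: float = 100.0,
) -> Dict[str, float]:
    """Linearly interpolate two distributions defined over the same outcomes."""
    lam = data_weight(num_annotated, k)
    outcomes = set(data_part) | set(knowledge_part)
    return {
        o: lam * data_part.get(o, 0.0) + (1.0 - lam) * knowledge_part.get(o, 0.0)
        for o in outcomes
    }


if __name__ == "__main__":
    # Hypothetical estimate from human knowledge plus annotated words.
    knowledge = {"CITY": 0.8, "DATE": 0.2}
    # Hypothetical estimate from annotated words alone.
    data = {"CITY": 0.4, "DATE": 0.6}
    for n in (0, 100, 10_000):
        print(n, combine_parts(data, knowledge, n))
```

With no annotated data the combined model equals the knowledge-based part; as annotations accumulate, the data-driven part dominates. Any monotone schedule, or weights tuned on held-out data, would fit the claim language equally well.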

7. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising:
  
  developing a statistical model for a natural language understanding application exclusive of annotated data;
  
  during a first execution of the natural language understanding application, assigning a sequence of tags to a first received sequence of words using the statistical model, to yield annotated words;
  
  developing a replacement statistical model for the natural language understanding application using the annotated words, wherein developing the replacement statistical model comprises:
  
  developing a first part without human knowledge;
  
  developing a second part using the human knowledge and the annotated words; and
  
  assigning weighted versions of the first part and the second part to the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
  
  during a second execution of the natural language understanding application, assigning a second sequence of tags to a second received sequence of words by using the replacement statistical model.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The system of claim 7, wherein developing the replacement statistical model comprises using both human knowledge and the annotated words.
  - 9. The system of claim 7, wherein the sequence of tags comprises tags from a predefined set of tags relating to different types of named entities.
  - 10. The system of claim 7, wherein developing the replacement statistical model comprises:
    - enumerating a phrase based on human knowledge for each tag in a predetermined set of possible tags for the natural language understanding application, to yield an enumerated phrase; and
      
      developing a language model for each tag in the predetermined set of tags based on the enumerated phrase.
  - 11. The system of claim 7, the computer-readable storage medium having additional instructions stored which result in the operations further comprising replacing the statistical model with the replacement statistical model.
  - 12. The system of claim 7, the computer-readable storage medium having additional instructions stored which result in the operations further comprising determining if the replacement statistical model is sufficiently different than the statistical model to replace the statistical model.
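
The claims do not say how "sufficiently different" is measured. One simple reading, assumed here purely for illustration, is to run both models over a sample of received word sequences and adopt the replacement only when the fraction of word positions on which their tags disagree exceeds a threshold. The `Tagger` interface, `disagreement_rate`, `should_replace`, and the 5% default threshold are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a tagger maps a word sequence to one tag per word.
Tagger = Callable[[Sequence[str]], List[str]]


def disagreement_rate(old: Tagger, new: Tagger,
                      sentences: Sequence[Sequence[str]]) -> float:
    """Fraction of word positions where the two taggers assign different tags."""
    total = 0
    differ = 0
    for words in sentences:
        old_tags, new_tags = old(words), new(words)
        if len(old_tags) != len(words) or len(new_tags) != len(words):
            raise ValueError("each tagger must return one tag per word")
        for a, b in zip(old_tags, new_tags):
            total += 1
            differ += a != b  # bool counts as 0 or 1
    return differ / total if total else 0.0


def should_replace(old: Tagger, new: Tagger,
                   sentences: Sequence[Sequence[str]],
                   threshold: float = 0.05) -> bool:
    """Swap in the replacement model only if it changes enough tagging decisions."""
    return disagreement_rate(old, new, sentences) > threshold


if __name__ == "__main__":
    old = lambda ws: ["OTHER"] * len(ws)
    new = lambda ws: ["CITY" if w == "boston" else "OTHER" for w in ws]
    sample = [["fly", "to", "boston"], ["book", "a", "flight"]]
    print(should_replace(old, new, sample))  # True: 1 of 6 positions differs
```

A disagreement rate says nothing about which model is more accurate; where held-out annotated words exist, comparing tagging accuracy would be a reasonable alternative criterion.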

13. A non-transitory computer readable storage medium having instructions stored which, when executed by a computing device, result in the computing device performing operations comprising:
- developing a statistical model for a natural language understanding application exclusive of annotated data;
  
  during a first execution of the natural language understanding application, assigning a sequence of tags to a first received sequence of words using the statistical model, to yield annotated words;
  
  developing a replacement statistical model for the natural language understanding application using the annotated words, wherein developing the replacement statistical model comprises:
  
  developing a first part without human knowledge;
  
  developing a second part using the human knowledge and the annotated words; and
  
  assigning weighted versions of the first part and the second part to the replacement statistical model, wherein the weighted versions are determined based on an amount of annotated data that is available; and
  
  during a second execution of the natural language understanding application, assigning a second sequence of tags to a second received sequence of words by using the replacement statistical model.
- Dependent claims (14, 15, 16, 17):
  - 14. The non-transitory computer readable storage medium of claim 13, wherein developing the replacement statistical model comprises using both human knowledge and the annotated words.
  - 15. The non-transitory computer readable storage medium of claim 13, wherein the sequence of tags comprises tags from a predefined set of tags relating to different types of named entities.
  - 16. The non-transitory computer readable storage medium of claim 13, wherein developing the replacement statistical model comprises:
    - enumerating a phrase based on human knowledge for each tag in a predetermined set of possible tags for the natural language understanding application, to yield an enumerated phrase; and
      
      developing a language model for each tag in the predetermined set of tags based on the enumerated phrase.
  - 17. The non-transitory computer readable storage medium of claim 13, having additional instructions stored which result in the operations further comprising replacing the statistical model with the replacement statistical model.
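
Claims 4, 10, and 16 describe enumerating phrases from human knowledge for each tag and building one language model per tag. The sketch below assumes the simplest version: an add-one smoothed unigram model per tag, with each word then tagged independently by the tag whose model assigns it the highest probability. The patent does not specify model order, smoothing, or decoding; the phrase lists and the names `train_tag_lms` and `tag_words` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

UNK = "<unk>"  # bucket for words never seen in any enumerated phrase


def train_tag_lms(phrases_by_tag: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Build an add-one smoothed unigram model for each tag from enumerated phrases."""
    vocab = {w for phrases in phrases_by_tag.values() for p in phrases for w in p.split()}
    size = len(vocab) + 1  # shared vocabulary plus the unseen-word bucket
    lms: Dict[str, Dict[str, float]] = {}
    for tag, phrases in phrases_by_tag.items():
        counts = Counter(w for p in phrases for w in p.split())
        total = sum(counts.values())
        lm = {w: (counts[w] + 1) / (total + size) for w in vocab}
        lm[UNK] = 1 / (total + size)
        lms[tag] = lm  # probabilities sum to 1 over vocab plus UNK
    return lms


def tag_words(words: Sequence[str], lms: Dict[str, Dict[str, float]]) -> List[str]:
    """Tag each word independently with the tag whose model scores it highest."""
    return [max(lms, key=lambda t: lms[t].get(w, lms[t][UNK])) for w in words]


if __name__ == "__main__":
    phrases = {  # hypothetical phrases a developer might enumerate per tag
        "CITY": ["new york", "boston", "san francisco"],
        "DATE": ["next monday", "june first", "monday"],
    }
    lms = train_tag_lms(phrases)
    print(tag_words(["boston", "monday"], lms))  # ['CITY', 'DATE']
```

Forcing every word into one of the enumerated tags is a simplification; a working system would presumably also need a background tag for words outside any entity and a sequence model over tags rather than independent per-word decisions.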

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors
Bangalore, Srinivas; Gilbert, Mazin; Gupta, Narendra K.
Primary Examiner(s)
Shah, Paras D

Application Number
US13/873,548
Publication Number
US 20130311170A1
Time in Patent Office
462 Days
Field of Search
704/1-10, 704/231, 704/257, 715/233, 715/231, 715/256
US Class Current
704/9
CPC Class Codes
G06F 40/44   Statistical methods, e.g. p...
G10L 15/14   using statistical models, e...
G10L 15/183   using context dependencies,...
G10L 15/19   Grammatical context, e.g. d...
