Methods and systems for natural language understanding using human knowledge and collected data

US 8,433,558 B2
Filed: 07/25/2005
Issued: 04/30/2013
Est. Priority Date: 07/25/2005
Status: Active Grant


Assignments: 5
Petitions: 0

Abstract

Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.

12 Claims

1. A method comprising:
- developing, via a processor, a statistical model for a natural language understanding application using human knowledge exclusive of any annotated data;
- during a first execution of the natural language understanding application via the processor:
  - receiving a sequence of words to yield a received sequence of words;
  - assigning a sequence of tags to the received sequence of words by using the statistical model, to yield annotated words; and
  - when sufficient annotated words become available, updating the statistical model;
- developing a replacement statistical model for the natural language understanding application using the annotated words by:
  - developing a first part of the replacement model without the human knowledge;
  - developing a second part of the replacement model from both the human knowledge and the annotated words; and
  - when the received and subsequently annotated data is sufficient, weighting a contribution from the first part more to the assigning than when the received and subsequently annotated data is insufficient; and
- during a second execution of the natural language understanding application, receiving a second sequence of words and assigning a second sequence of tags to the second sequence of words by using the replacement statistical model.
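
The weighting step recited in claim 1 can be illustrated with a short sketch. The claim does not give a formula, so everything here is an assumption made for illustration: the linear interpolation, the function names `first_part_weight` and `combined_probability`, and the constant `k` that controls how quickly the data-only part takes over.

```python
def first_part_weight(num_annotated: int, k: float = 100.0) -> float:
    """Hypothetical weight for the data-only first part.

    Returns 0.0 when no annotated words exist and approaches 1.0 as the
    amount of annotated data grows. The form n / (n + k) is an assumption,
    not something stated in the claims.
    """
    if num_annotated < 0:
        raise ValueError("num_annotated must be non-negative")
    return num_annotated / (num_annotated + k)


def combined_probability(p_first_part: float, p_second_part: float,
                         num_annotated: int, k: float = 100.0) -> float:
    """Linearly interpolate the two parts of the replacement model.

    p_first_part:  probability from the part built without human knowledge.
    p_second_part: probability from the part built from human knowledge
                   and annotated words.
    """
    lam = first_part_weight(num_annotated, k)
    return lam * p_first_part + (1.0 - lam) * p_second_part
```

With no annotated data the result equals the second part's probability, and the first part contributes more as annotated words accumulate, which is the qualitative behavior the claim describes.
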

2. The method of claim 1, wherein developing the replacement statistical model comprises:
- developing the replacement model using both the human knowledge and the annotated words.

3. The method of claim 1, wherein the developing comprises:
- enumerating a phrase based on human knowledge for each tag in a predetermined set of possible tags for the natural language understanding application, to yield an enumerated phrase; and
- using the enumerated phrase to develop one language model for each tag in the predetermined set of possible tags.

4. The method of claim 1, wherein the sequence of tags comprises tags from a predefined set of tags relating to different types of named entities.
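
Claim 3 describes building one language model per tag from phrases enumerated by a person. A minimal sketch of that idea follows, assuming a bigram model with add-one smoothing. The model order, the smoothing, the `<s>` and `</s>` boundary markers, and the names `build_tag_language_models` and `word_probability` are illustrative choices that do not come from the claims.

```python
from collections import defaultdict


def build_tag_language_models(phrases_by_tag):
    """Build one bigram count model per tag from hand-enumerated phrases.

    phrases_by_tag maps a tag name to a list of example phrases written
    by a person (no annotated corpus is used).
    Returns (models, vocab), where models[tag] = (bigram_counts, context_counts).
    """
    vocab = {"</s>"}  # shared vocabulary, including the end-of-phrase marker
    for phrases in phrases_by_tag.values():
        for phrase in phrases:
            vocab.update(phrase.lower().split())

    models = {}
    for tag, phrases in phrases_by_tag.items():
        bigrams = defaultdict(int)
        contexts = defaultdict(int)
        for phrase in phrases:
            words = ["<s>"] + phrase.lower().split() + ["</s>"]
            for prev, word in zip(words, words[1:]):
                bigrams[(prev, word)] += 1
                contexts[prev] += 1
        models[tag] = (dict(bigrams), dict(contexts))
    return models, vocab


def word_probability(models, vocab, tag, prev, word):
    """Add-one-smoothed estimate of P(word | prev, tag)."""
    bigrams, contexts = models[tag]
    return (bigrams.get((prev, word), 0) + 1) / (contexts.get(prev, 0) + len(vocab))


# Usage: the CITY model prefers "york" after "new"; the DATE model does not.
models, vocab = build_tag_language_models({
    "CITY": ["new york", "new orleans", "boston"],
    "DATE": ["next monday", "july fourth"],
})
p_city = word_probability(models, vocab, "CITY", "new", "york")
p_date = word_probability(models, vocab, "DATE", "new", "york")
```

Smoothing keeps every probability nonzero, so a tag model built from only a handful of phrases can still score words it has never seen.
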

5. A system comprising:
- a processor; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing a method comprising:
  - during a first execution of the natural language understanding application:
    - receiving a sequence of words to yield a received sequence of words;
    - assigning a sequence of tags to the received sequence of words by using the statistical model, to yield annotated words; and
    - when sufficient annotated words become available, updating the statistical model;
  - developing a replacement statistical model for the natural language understanding application using the annotated words by:
    - developing a first part of the replacement model without the human knowledge;
    - developing a second part of the replacement model from both the human knowledge and the annotated words; and
    - when the received and subsequently annotated data is sufficient, weighting a contribution from the first part more to the assigning than when the received and subsequently annotated data is insufficient; and
  - during a second execution of the natural language understanding application, receiving a second sequence of words and assigning a second sequence of tags to the second sequence of words by using the replacement statistical model.

6. The system of claim 5, wherein the first part and the second part are determined empirically when there is sufficient available data received and subsequently annotated.

7. The system of claim 5, the computer-readable storage medium having additional instructions stored which result in the method further comprising:
- when no data has been previously received, the predetermined first part is zero and the assigning assigns the sequence of tags by using only the second part of the statistical model.

8. The system of claim 5, wherein the first part comprises a projection based Markov Model configured to provide a probability of a tag being assigned to a word in the received sequence of words.

9. The system of claim 5, wherein the second part comprises a language model configured to provide a probability of a word occurring which is associated with a predetermined tag based on a previous word.

10. The system of claim 5, wherein the first part of the statistical model is developed to model $P(t_i \mid f(w_i), t_i, \ldots, t_{i-1})$ and the second part is developed to model $P(w_i \mid w_{i-n}, \ldots, w_{i-1}, t_{i-n}, \ldots, t_i)$, wherein variable $t$ represents tags.

11. The system of claim 5, wherein the sequences of tags comprises tags from a predefined set of tags relating to different types of named entities.

12. The system of claim 5, wherein assigning comprises use of a dynamically programmed model executor.
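
Claim 12 refers to a dynamically programmed model executor without describing it further. One common dynamic-programming technique for assigning a tag sequence is Viterbi decoding, sketched below under that assumption. The function `viterbi`, the first-order dependence on the previous tag and word, and the `toy_score` scorer are hypothetical; in practice the local score could be any positive combination of the two model parts.

```python
import math


def viterbi(words, tags, local_score):
    """Return the highest-scoring tag sequence under a first-order model.

    local_score(prev_tag, tag, prev_word, word) must return a probability
    in (0, 1]; prev_tag and prev_word are None at the first position.
    """
    if not words:
        return []
    # best[i][t] = (log score of the best path ending in tag t at i, backpointer)
    best = [{t: (math.log(local_score(None, t, None, words[0])), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            column[t] = max(
                (best[i - 1][p][0]
                 + math.log(local_score(p, t, words[i - 1], words[i])), p)
                for p in tags
            )
        best.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def toy_score(prev_tag, tag, prev_word, word):
    """Hypothetical scorer: city names favor the CITY tag, other words favor O."""
    cities = {"boston", "denver"}
    emit = 0.9 if (word in cities) == (tag == "CITY") else 0.1
    return emit * 0.5  # uniform transition probability over two tags


result = viterbi(["fly", "to", "boston"], ["O", "CITY"], toy_score)
```

Working in log space avoids numeric underflow on long word sequences, and the table of best partial paths keeps the cost linear in sequence length instead of exponential in the number of tags.
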

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Gupta, Narendra K.; Bangalore, Srinivas; Gilbert, Mazin
Primary Examiner(s): Shah, Paras D
Application Number: US11/188,825
Publication Number: US 20070033004A1
Time in Patent Office: 2,836 Days
Field of Search: 704/1-10, 715/233, 715/231, 715/256
US Class Current: 704/9
CPC Class Codes:
- G06F 40/44   Statistical methods, e.g. p...
- G10L 15/14   using statistical models, e...
- G10L 15/183   using context dependencies,...
- G10L 15/19   Grammatical context, e.g. d...
