Methods and systems for natural language understanding using human knowledge and collected data

US 20070033004A1
Filed: 07/25/2005
Published: 02/08/2007
Est. Priority Date: 07/25/2005
Status: Active Grant

Abstract

Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.

20 Claims

1. A method of natural language understanding, comprising:
- developing a statistical model for a natural language understanding application using human knowledge exclusive of any data that is collected during execution of said application; and
  
  during execution of said application receiving a sequence of words and assigning a sequence of tags to said received sequence of words by using said developed model.
- - 2. The method of claim 1, further comprising:
    - annotating said received sequence of words;
      
      developing a replacement statistical model for said natural language understanding application using at least said annotated received sequence of words; and
      
      during execution of said application, receiving a sequence of words and assigning a sequence of tags to said received sequence of words by using said developed replacement model.
  - 3. The method of claim 2, wherein said developing said replacement model includes:
    - developing said replacement model using both human knowledge and data received during execution of said application and subsequently annotated.
  - 4. The method of claim 2, wherein said developing said replacement model includes:
    - developing a first part of said replacement model without human knowledge and developing a second part of said replacement model from both human knowledge and data received during execution of said application and subsequently annotated, and wherein said using said developed replacement model includes:
      
      if said received and subsequently annotated data is sufficient, allowing said first part to contribute more to said assigning than if said received and subsequently annotated data is insufficient.
  - 5. The method of claim 1, wherein said developing includes:
    - enumerating from human knowledge at least one phrase related to each tag in a predetermined set of possible tags for said application and using said enumerated phrases to develop one language model for said each tag.
  - 6. The method of claim 1, wherein said sequence of tags includes tags from a predefined set of tags relating to different types of named entities.
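
As an illustration of claim 5, the sketch below builds one bigram language model per tag from hand-enumerated phrases and uses those models to pick a tag for a word, with no collected data involved. The tag names, the phrases, the add-one smoothing, and the helper names (`build_bigram_lm`, `log_prob`, `best_tag`) are all assumptions made for the example; the claims do not prescribe them.

```python
import math
from collections import Counter

# Hypothetical tags and phrases, enumerated by hand (not taken from the patent).
PHRASES = {
    "city": ["new york", "san francisco", "boston"],
    "date": ["next monday", "june fifth", "tomorrow"],
}

def build_bigram_lm(phrases):
    """Count bigrams over one tag's phrases, using <s> as the start symbol."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for phrase in phrases:
        prev = "<s>"
        for word in phrase.split():
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
            vocab.add(word)
            prev = word
    return bigrams, contexts, vocab

def log_prob(lm, prev, word, vocab_size):
    """Add-one smoothed log P(word | prev) under one tag's model (smoothing is an assumption)."""
    bigrams, contexts, _ = lm
    return math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))

# One language model per tag, as in claim 5.
MODELS = {tag: build_bigram_lm(phrases) for tag, phrases in PHRASES.items()}
# Shared vocabulary size, plus one slot for unseen words.
VOCAB_SIZE = len(set().union(*(m[2] for m in MODELS.values()))) + 1

def best_tag(prev, word):
    """Return the tag whose language model gives the word the highest probability."""
    return max(MODELS, key=lambda tag: log_prob(MODELS[tag], prev, word, VOCAB_SIZE))

print(best_tag("new", "york"))      # city
print(best_tag("<s>", "tomorrow"))  # date
```

A real application would likely score whole sequences rather than single words; the per-word choice here only shows how enumerated phrases alone can yield a usable statistical model.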

7. A system for natural language understanding, comprising:
- means for receiving sequences of words;
  
  means for developing a statistical model for natural language understanding using human knowledge and optionally using data previously received by said receiving means and subsequently annotated; and
  
  means, using said developed statistical model, for assigning sequences of tags to sequences of words received by said receiving means.
- - 8. The system of claim 7, wherein said means for developing a statistical model includes:
    - first means for developing a first part of said statistical model using data received by said receiving means and subsequently annotated, and second means for developing a second part of said statistical model using at least one selected from a group consisting of:
      
      human knowledge and data received by said receiving means and subsequently annotated.
  - 9. The system of claim 8, wherein said means for assigning by using said developed statistical model includes means for weighting a contribution from said first part in accordance with a predetermined first proportion and means for weighting a contribution from said second part in accordance with a predetermined second proportion.
  - 10. The system of claim 9, wherein said first proportion and said second proportion are determined empirically when there is sufficient available data received by said receiving means and subsequently annotated.
  - 11. The system of claim 9, wherein when no data has been previously received by said receiving means, said predetermined first proportion is zero and said means for assigning by using said developed statistical model assigns said sequence of tags by using only said second part of said statistical model.
  - 12. The system of claim 8, wherein said first part includes at least one Projection based Markov Model configured to provide a probability or function thereof of a tag being assigned to a word in said received sequence based on at least one feature of said word and at least one tag assigned to at least one previous word.
  - 13. The system of claim 8, wherein said second part includes at least one language model configured to provide a probability or function thereof of a word occurring which is associated with a predetermined tag based on at least one previous word.
  - 14. The system of claim 8, wherein said first part of said statistical model is developed to model $P(t_i \mid f(w_i), t_{i-n}, \ldots, t_{i-1})$ and said second part is developed to model $P(w_i \mid w_{i-n}, \ldots, w_{i-1}, t_{i-n}, \ldots, t_i)$.
  - 15. The system of claim 7, wherein said sequences of tags includes tags from a predefined set of tags relating to different types of named entities.
  - 16. The system of claim 7, wherein said means for assigning includes a dynamically programmed model executor.
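
Claims 9, 11 and 16 together describe a dynamic-programming executor that weights the two model parts by fixed proportions and falls back to the second part alone when the first proportion is zero. One way this could look is the Viterbi-style sketch below. The function `viterbi`, the scorer signatures, the first-order history, and the toy scorers are assumptions for illustration, not the patented implementation.

```python
import math

def viterbi(words, tags, part1, part2, w1, w2):
    """Return the highest-scoring tag sequence for words.

    part1(tag, word, prev_tag) and part2(tag, word, prev_word) return log scores.
    w1 and w2 are the fixed proportions; when w1 is 0, part1 is never called.
    """
    if not words:
        return []

    def local(tag, i, prev_tag):
        prev_word = words[i - 1] if i > 0 else "<s>"
        score = w2 * part2(tag, words[i], prev_word)
        if w1 > 0:
            score += w1 * part1(tag, words[i], prev_tag)
        return score

    # best[t] holds (score, path) for the best sequence ending in tag t.
    best = {t: (local(t, 0, "<s>"), [t]) for t in tags}
    for i in range(1, len(words)):
        new_best = {}
        for t in tags:
            score, path = max(
                ((best[p][0] + local(t, i, p), best[p][1]) for p in tags),
                key=lambda item: item[0],
            )
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Toy scorers (hypothetical): a language-model part built from human knowledge,
# and a classifier part that does not exist yet because no data was collected.
TAGS = ["city", "other"]
CITY_WORDS = {"boston", "denver"}

def lm_score(tag, word, prev_word):
    in_city = word in CITY_WORDS
    if tag == "city":
        return math.log(0.9 if in_city else 0.1)
    return math.log(0.1 if in_city else 0.9)

def untrained_classifier(tag, word, prev_tag):
    raise RuntimeError("no annotated data yet")

result = viterbi("fly to boston".split(), TAGS, untrained_classifier, lm_score, w1=0.0, w2=1.0)
print(result)  # ['other', 'other', 'city']
```

Setting `w1=0.0` mirrors the no-data case of claim 11: the untrained classifier is never consulted, so the same executor runs unchanged before and after annotated data becomes available.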

17. A system for natural language understanding, comprising:
- a language model building tool configured to use tag-related phrases to build at least one n-gram language model, wherein said phrases are obtained from at least one selected from a group consisting of:
  
  human knowledge and annotated collected data;
  
  a statistical classifier training tool configured to train a classifier model using a body of annotated collected data to model the dependency of a tag for a word on at least one feature of said word and on at least one tag of at least one previous word; and
  
  a model executor configured in run time to output a sequence of tags for an inputted sequence of words by using said statistical classifier model and said at least one language model in accordance with predetermined proportions.
- - 18. The system of claim 17, wherein said sequence of tags includes tags from a predefined set of tags relating to different types of named entities.
  - 19. The system of claim 17, wherein if no annotated collected data is available, said classifier model is not trained, said predetermined proportion corresponding to said classifier model is zero, and said model executor uses only said at least one language model.
  - 20. The system of claim 17, wherein if sufficient annotated collected data is available, said classifier model is trained and said proportions are determined empirically.
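
Claims 19 and 20 say only that the classifier proportion is zero when no annotated data exists and that the proportions are "determined empirically" once enough data is available. The sketch below shows one plausible reading: a grid search over the classifier weight on held-out annotated examples. The function name `pick_proportion`, the grid, the constraint that the two weights sum to one, and the sample scores are all assumptions, not details from the patent.

```python
def pick_proportion(examples, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Choose the classifier weight with the best held-out accuracy.

    examples is a list of (gold_tag, classifier_scores, lm_scores), where each
    scores dict maps tag -> log score. The language-model weight is taken to be
    1 - classifier weight (an assumption). With no examples, the classifier
    weight is 0, matching the no-data case of claim 19.
    """
    if not examples:
        return 0.0

    def accuracy(lam):
        correct = 0
        for gold, clf, lm in examples:
            pred = max(clf, key=lambda t: lam * clf[t] + (1 - lam) * lm[t])
            correct += pred == gold
        return correct / len(examples)

    # max() keeps the first best value, so ties go to the smaller classifier weight.
    return max(grid, key=accuracy)

# Hypothetical held-out examples with made-up log scores.
HELD_OUT = [
    ("city", {"city": -0.1, "other": -2.3}, {"city": -1.2, "other": -0.4}),
    ("other", {"city": -0.5, "other": -0.9}, {"city": -2.0, "other": -0.2}),
]

print(pick_proportion(HELD_OUT))  # 0.5
print(pick_proportion([]))        # 0.0
```

In this toy data neither part alone tags both examples correctly, while an even mix does, which is the situation in which tuning the proportions on annotated data pays off.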

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
AT&T Corporation (AT&T, Inc.)
Inventors
Gilbert, Mazin; Bangalore, Srinivas; Gupta, Narendra

Granted Patent

US 8,433,558 B2
Field of Search
US Class Current

704/9
CPC Class Codes

G06F 40/44   Statistical methods, e.g. p...

G10L 15/14   using statistical models, e...

G10L 15/183   using context dependencies,...

G10L 15/19   Grammatical context, e.g. d...
