Use of a unified language model

US 6,865,528 B1
Filed: 06/01/2000
Issued: 03/08/2005
Est. Priority Date: 06/01/2000
Status: Active Grant

Abstract

A language processing system includes a unified language model. The unified language model comprises a plurality of context-free grammars having non-terminal tokens representing semantic or syntactic concepts and terminals, and an N-gram language model having non-terminal tokens. A language processing module capable of receiving an input signal indicative of language accesses the unified language model to recognize the language. The language processing module generates hypotheses for the received language as a function of words of the unified language model and/or provides an output signal indicative of the language and at least some of the semantic or syntactic concepts contained therein.
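
One way to picture how the two parts of the unified language model combine is the toy scorer below. It is a minimal sketch, not the patented implementation: the bigram table, the grammar tables, the token names `<NAME>` and `<TIME>`, and every probability are invented for illustration. The N-gram model scores a sequence of tokens in which some positions are non-terminal tokens, and each non-terminal contributes the probability of the terminal it covers.

```python
import math

# Hypothetical toy model: every probability below is made up for illustration.
# Bigram probabilities over a vocabulary that mixes plain words with
# non-terminal tokens such as <NAME> and <TIME>.
BIGRAM = {
    ("<s>", "meet"): 0.2,
    ("meet", "<NAME>"): 0.5,
    ("<NAME>", "at"): 0.4,
    ("at", "<TIME>"): 0.6,
    ("<TIME>", "</s>"): 0.5,
}

# Context-free grammars, flattened here to terminal -> probability tables.
CFG = {
    "<NAME>": {"john": 0.7, "mary": 0.3},
    "<TIME>": {"noon": 0.4, "three": 0.6},
}

def score(hypothesis):
    """Return the log-probability of a hypothesis.

    `hypothesis` is a list of (token, terminal) pairs. For a plain word the
    terminal is None; for a non-terminal token it is the word the CFG covers.
    """
    logp = 0.0
    prev = "<s>"
    for token, terminal in hypothesis + [("</s>", None)]:
        logp += math.log(BIGRAM[(prev, token)])     # N-gram over tokens
        if terminal is not None:
            logp += math.log(CFG[token][terminal])  # CFG terminal probability
        prev = token
    return logp

hyp = [("meet", None), ("<NAME>", "john"), ("at", None), ("<TIME>", "three")]
print(math.exp(score(hyp)))
```

Working in log space keeps the product of many small probabilities numerically stable; a real recognizer would also need smoothing for unseen bigrams, which this sketch omits.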

133 Citations

19 Claims

1. A language processing system comprising:
- a unified language model comprising:
  - a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals, wherein each of the terminals include a probability value assigned by using non-uniform probability values derived from a terminal based language model and normalizing said values using the set of terminals constrained by the plurality of context-free grammars; and
  - a N-gram language model having the non-terminal tokens; and
- a language processing module capable of receiving an input signal indicative of language and accessing the unified language model to recognize the language and predict non-terminal tokens contained therein, the language processing module further generating hypotheses for the received language as a function of words in the unified language model corresponding to the non-terminal tokens predicted and calculating a language model score for each of the hypotheses using the associated probability value for each terminal present therein and obtained from the plurality of context-free grammars.
- Dependent claims (2):
  - 2. The language processing system of claim 1 wherein the language processing module provides an output signal indicative of the language and at least some of the semantic or syntactic concepts contained therein.
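
The normalization step in claim 1 can be read as follows: take each terminal's probability from a word-based language model, then divide by the total over only those terminals that the grammar allows, so the values sum to one within the grammar. The sketch below illustrates that reading under stated assumptions; the `WORD_LM` table, its numbers, and the function name `normalize_terminals` are hypothetical and do not come from the patent.

```python
# Hypothetical unigram probabilities from a word-based (terminal-based)
# language model; the numbers are invented for illustration.
WORD_LM = {"john": 0.004, "mary": 0.003, "peter": 0.001, "meeting": 0.02}

def normalize_terminals(terminals, word_lm):
    """Assign P(terminal | non-terminal) by renormalizing word-LM
    probabilities over only the terminals the grammar allows."""
    total = sum(word_lm[t] for t in terminals)
    return {t: word_lm[t] / total for t in terminals}

# Terminals permitted by a hypothetical <NAME> grammar.
name_probs = normalize_terminals(["john", "mary", "peter"], WORD_LM)
print(name_probs)
```

The resulting values are non-uniform (a frequent name keeps a larger share than a rare one), which is the contrast the claim draws with assigning every terminal the same probability.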

3. A method for recognizing language and providing an output signal indicative thereof, the method comprising:
- receiving an input signal indicative of language;
- accessing a unified language model to recognize the language and predict non-terminal tokens contained therein, the unified language model comprising a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals, and a N-gram language model having the non-terminal tokens; and
- generating hypotheses for the language by exploring each of the terminals in the unified language model associated with the non-terminal tokens predicted based on a probability value for each terminal, wherein at least one terminal has a different probability value than one other terminal in the same context-free grammar.
- Dependent claims (4, 5, 6):
  - 4. The method of claim 3 wherein each of the terminals of the plurality of context-free grammars include a probability value, and wherein the method further comprises calculating a language model score for each of the hypotheses using the associated probability value for each terminal present therein and obtained from the plurality of context-free grammars.
  - 5. The method of claim 4 and further comprising:
    - assigning probability values of at least some of the terminals of the context-free grammars from a terminal-based language model and normalizing said values using the set of terminals constrained by the context-free grammars.
  - 6. The method of claim 3 and further comprising:
    - providing an output signal indicative of the language and at least some of the semantic or syntactic concepts contained therein.
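
The hypothesis-generation step of claim 3 can be sketched as expanding each predicted non-terminal token into its terminals and carrying each terminal's probability along. The example below is an illustrative simplification: it enumerates every combination exhaustively, whereas a practical recognizer would presumably prune during search. The grammar tables and probabilities are hypothetical.

```python
from itertools import product

# Hypothetical grammars with non-uniform terminal probabilities.
CFG = {
    "<NAME>": {"john": 0.7, "mary": 0.3},
    "<TIME>": {"noon": 0.4, "three": 0.6},
}

def expand(tokens, cfg):
    """Yield (words, probability) for every way of replacing each predicted
    non-terminal token with one of its terminals."""
    choices = [
        list(cfg[t].items()) if t in cfg else [(t, 1.0)]
        for t in tokens
    ]
    for combo in product(*choices):
        words = [w for w, _ in combo]
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield words, prob

# Rank the hypotheses for a predicted token sequence, most probable first.
hypotheses = sorted(expand(["meet", "<NAME>", "at", "<TIME>"], CFG),
                    key=lambda h: h[1], reverse=True)
for words, prob in hypotheses:
    print(" ".join(words), prob)
```

Because the terminals within one grammar carry different probabilities, the ranking distinguishes hypotheses that a uniform assignment would leave tied.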

7. A computer readable medium including instructions readable by a computer which, when implemented execute a method to perform language processing, the method comprising:
- receiving an input signal indicative of language;
- accessing a unified language model to recognize the language and predict non-terminal tokens contained therein, the unified language model comprising a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals wherein each of the terminals of the plurality of context-free grammars include a probability value, and a N-gram language model having the non-terminal tokens;
- assigning probability values of at least some of the terminals of the context-free grammars from a terminal-based language model, wherein at least one terminal has a probability value different than one other terminal in the same context-free grammar and normalizing said values using the set of terminals constrained by the context-free grammars;
- generating hypotheses for the language as a function of words in the unified language model corresponding to the non-terminal tokens predicted; and
- calculating a language model score for each of the hypotheses using the associated probability value for each terminal present therein and obtained from the plurality of context-free grammars.
- Dependent claims (8):
  - 8. The computer readable medium of claim 7 and further comprising:
    - providing an output signal indicative of the language and at least some of the semantic or syntactic concepts contained therein.

9. A language processing system comprising:
- a unified language model comprising:
  - a topic identification context-free grammar comprising non-terminal tokens representing semantic or syntactic concepts related to actions to be performed using slots and a plurality of informational context-free grammars associated with the slots of the topic identification context-free grammar, each informational context-free grammar having terminals associated with a slot; and
  - a N-gram language model having the non-terminal tokens; and
- a language processing module capable of receiving an input signal indicative of language and accessing the unified language model to recognize the language and predict an action associated with the topic identification context-free grammar and a terminal associated with one of the slots, the language processing module providing an output signal indicative of the language, the action and an indication of the informational context-free grammar having the terminal associated with one of the slots.
- Dependent claims (10, 11, 12, 13):
  - 10. The language processing system of claim 9 wherein information of the output signal indicative of at least some of the semantic or syntactic concepts includes information indicative of the non-terminals.
  - 11. The language processing system of claim 9 wherein the semantic or syntactic concepts relate to at least one of an action, a subject and an object.
  - 12. The language processing system of claim 9 wherein the output signal comprises terminals and non-terminal tokens embedded therein.
  - 13. The language processing system of claim 9 wherein the output signal comprises a first output signal comprising terminals of the language and a second output signal comprising non-terminals tokens indicating terminals of the first output signal indicative of semantic or syntactic concepts.
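
Claims 12 and 13 describe two shapes for the output signal: one stream with non-terminal tokens embedded among the terminals, or two streams where the second points back into the first. The sketch below shows one possible encoding of each. The tag syntax, the index-based labels, and both function names are assumptions made for illustration; the claims do not prescribe a format.

```python
def embedded_output(hypothesis):
    """Single output: terminals with non-terminal tokens embedded as tags
    (hypothetical tag syntax)."""
    parts = []
    for token, terminal in hypothesis:
        if terminal is None:
            parts.append(token)
        else:
            name = token.strip("<>")
            parts.append(f"<{name}>{terminal}</{name}>")
    return " ".join(parts)

def parallel_output(hypothesis):
    """Two outputs: the plain words, plus (word index, non-terminal) pairs
    that say which words carry a semantic or syntactic concept."""
    words = [terminal or token for token, terminal in hypothesis]
    labels = [(i, token) for i, (token, terminal) in enumerate(hypothesis)
              if terminal is not None]
    return words, labels

# A recognized hypothesis as (token, terminal) pairs; terminal is None for
# plain words and the covered word for non-terminal tokens.
hyp = [("meet", None), ("<NAME>", "john"), ("at", None), ("<TIME>", "three")]
print(embedded_output(hyp))
print(parallel_output(hyp))
```

The embedded form is convenient for a single downstream consumer, while the parallel form leaves the word stream untouched for components that only need the transcript.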

14. A method for recognizing language and providing an output signal indicative thereof, the method comprising:
- receiving an input signal indicative of language;
- accessing a unified language model to recognize the language and predict non-terminal tokens contained therein, the unified language model comprising a plurality of related context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals, the plurality of related context-free grammars arranged in a hierarchical structure such that some of the non-terminal tokens of at least one of the plurality of the related context-free grammars are defined by another of the plurality of related context-free grammars and a N-gram language model having the non-terminal tokens; and
- providing an output signal indicative of the language and an indication of the plurality of related context-free grammars used in recognizing the language, wherein one of the used context-free grammars has a non-terminal token defined by another of the used context-free grammars.
- Dependent claims (15, 16):
  - 15. The method of claim 14 wherein information of the output signal indicative of at least some of the semantic or syntactic concepts includes information indicative of the non-terminals.
  - 16. The method of claim 14 wherein the semantic or syntactic concepts relate to at least one of an action, a subject and an object.
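
The hierarchical arrangement in claim 14, where a non-terminal in one grammar is defined by another grammar, can be illustrated with a small recursive matcher that also records which grammars took part in a match. This is a toy sketch under stated assumptions: the grammar names and rules are invented, the matcher takes the first alternative that fits and does not backtrack, and probabilities are left out.

```python
# Hypothetical grammars: each maps a non-terminal to alternative right-hand
# sides; a right-hand side is a list of terminals and/or non-terminals.
GRAMMARS = {
    "<MEETING>": [["meet", "<NAME>", "<WHEN>"]],
    "<WHEN>": [["at", "<TIME>"], ["tomorrow"]],
    "<NAME>": [["john"], ["mary"]],
    "<TIME>": [["noon"], ["three"]],
}

def parse(symbol, words, pos, used):
    """Try to match `symbol` at words[pos:]. Return the new position or None.
    `used` collects the non-terminals (grammars) that took part in the match."""
    if symbol not in GRAMMARS:  # terminal: must equal the next word
        return pos + 1 if pos < len(words) and words[pos] == symbol else None
    for rhs in GRAMMARS[symbol]:
        cur, trial = pos, []
        for part in rhs:
            cur = parse(part, words, cur, trial)
            if cur is None:
                break           # this alternative failed, try the next one
        else:
            used.append(symbol)
            used.extend(trial)  # keep grammars used by the nested matches
            return cur
    return None

used = []
words = "meet john at three".split()
end = parse("<MEETING>", words, 0, used)
print(end == len(words), used)
```

Here `<MEETING>` relies on `<WHEN>`, which in turn relies on `<TIME>`, so the recorded list of grammars reflects the nesting that the claim's output signal is meant to indicate.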

17. A computer readable medium including instructions readable by a computer which, when implemented execute a method to perform language processing, the method comprising:
- receiving an input signal indicative of language;
- accessing a unified language model to recognize the language and predict non-terminal tokens contained therein, the unified language model comprising:
  - a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals, wherein some of the non-terminal tokens correspond to actions having a plurality of slots corresponding to information related to the action, the slots being defined by other context-free grammars; and
  - a N-gram language model having the non-terminal tokens; and
- providing an output signal indicative of the language, one of the actions and information corresponding to the action that is associated with one of the plurality of slots.
- Dependent claims (18, 19):
  - 18. The computer readable medium of claim 17 wherein information of the output signal indicative of at least some of the semantic or syntactic concepts includes information indicative of the non-terminals.
  - 19. The computer readable medium of claim 17 wherein the semantic or syntactic concepts relate to at least one of an action, a subject and an object.
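
The output described in claim 17 (the recognized language, an action, and the information filling the action's slots) resembles a filled frame. The sketch below shows one hypothetical representation; the `Frame` class, the `ACTIONS` table, the action name `<SCHEDULE_MEETING>`, and the function `build_frame` are illustrative assumptions rather than structures named in the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """Hypothetical container for the output described in the claim."""
    text: str                                  # the recognized language
    action: str                                # non-terminal naming the action
    slots: dict = field(default_factory=dict)  # slot non-terminal -> word

# Hypothetical action definition: which slot grammars belong to the action.
ACTIONS = {"<SCHEDULE_MEETING>": ["<NAME>", "<TIME>"]}

def build_frame(action, hypothesis):
    """Collect the words and fill only the slots the action declares."""
    allowed = ACTIONS[action]
    words, slots = [], {}
    for token, terminal in hypothesis:
        words.append(terminal or token)
        if terminal is not None and token in allowed:
            slots[token] = terminal
    return Frame(" ".join(words), action, slots)

hyp = [("meet", None), ("<NAME>", "john"), ("at", None), ("<TIME>", "three")]
frame = build_frame("<SCHEDULE_MEETING>", hyp)
print(frame)
```

An application receiving such a frame could act on the slot values directly instead of re-parsing the recognized text.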

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Mahajan, Milind V.; Huang, Xuedong D.; Wang, Ye-Yi; Mou, Xiaolong
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Harper, V Paul
Application Number: US09/585,834
Time in Patent Office: 1,741 Days
Field of Search: 704/1, 704/10, 704/2, 704/226, 704/231, 704/251, 704/255, 704/3, 704/9, 704/257, 704/275, 709/201, 709/203
US Class Current: 704/9
CPC Class Codes:
G10L 15/193 Formal grammars, e.g. finit...
G10L 15/197 Probabilistic grammars, e.g...
