Use of a unified language model
Abstract
A language processing system includes a unified language model. The unified language model comprises a plurality of context-free grammars having non-terminal tokens representing semantic or syntactic concepts and terminals, and an N-gram language model having non-terminal tokens. A language processing module capable of receiving an input signal indicative of language accesses the unified language model to recognize the language. The language processing module generates hypotheses for the received language as a function of words of the unified language model and/or provides an output signal indicative of the language and at least some of the semantic or syntactic concepts contained therein.
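The abstract's architecture can be sketched minimally in Python (the class name, the `<DAY>` token, the words, and all probabilities below are invented for illustration, not taken from the patent): the N-gram model treats non-terminal tokens as ordinary vocabulary items, and each context-free grammar expands a non-terminal into terminals carrying their own probabilities.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the unified language model described in the abstract:
# an N-gram model whose vocabulary also contains non-terminal tokens, plus a
# set of context-free grammars that expand each non-terminal into terminals.

@dataclass
class UnifiedLanguageModel:
    # N-gram probabilities over words AND non-terminal tokens,
    # keyed by (context, next_token). Illustrative values only.
    ngram: dict = field(default_factory=dict)
    # Each CFG maps a non-terminal token to its terminal expansions,
    # each expansion carrying its own probability.
    cfgs: dict = field(default_factory=dict)

model = UnifiedLanguageModel(
    ngram={
        (("meet", "on"), "<DAY>"): 0.30,  # non-terminal predicted by the N-gram
        (("meet", "on"), "time"): 0.05,
    },
    cfgs={
        "<DAY>": {"Monday": 0.4, "Tuesday": 0.35, "Friday": 0.25},
    },
)

# P(w | context) when w is covered by a predicted non-terminal:
# P(<DAY> | context) * P(w | <DAY>)
p = model.ngram[(("meet", "on"), "<DAY>")] * model.cfgs["<DAY>"]["Monday"]
print(round(p, 3))  # 0.12
```

The key design point is that the same token `<DAY>` appears both in the N-gram's vocabulary and as a grammar's start symbol, which is what unifies the two model types.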
18 Claims
1. A computer readable medium including instructions readable by a computer which, when executed, perform a method of language processing for recognizing language and providing an output signal indicative thereof, the method comprising:
receiving an input signal indicative of language;
accessing a unified language model to recognize the language and predict non-terminal tokens contained therein, the unified language model comprising a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals, and an N-gram language model having the non-terminal tokens; and
generating hypotheses for the language by exploring each of the terminals in the unified language model associated with the non-terminal tokens predicted based on a probability value for each terminal, wherein at least one terminal has a different probability value than one other terminal in the same context-free grammar. (Dependent claims: 2, 3, 4, 14.)
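Claim 1's generating step can be illustrated with a small sketch (the `<CITY>` grammar, words, and probabilities are invented for illustration): when the N-gram predicts a non-terminal, each terminal allowed by the corresponding context-free grammar is explored as a separate hypothesis, weighted by that terminal's own, non-uniform probability.

```python
import math

# Hypothetical CFG: one non-terminal with non-uniform terminal probabilities,
# satisfying claim 1's requirement that "at least one terminal has a different
# probability value than one other terminal in the same context-free grammar".
cfg = {"<CITY>": {"Seattle": 0.5, "Boston": 0.3, "Redmond": 0.2}}

def expand_hypotheses(prefix, prefix_logp, nonterminal):
    """Explore each terminal of the predicted non-terminal, producing one
    extended hypothesis per terminal, weighted by that terminal's probability."""
    return [
        (prefix + [terminal], prefix_logp + math.log(p))
        for terminal, p in cfg[nonterminal].items()
    ]

hyps = expand_hypotheses(["fly", "to"], 0.0, "<CITY>")
best = max(hyps, key=lambda h: h[1])
print(best[0])  # ['fly', 'to', 'Seattle']
```

Because the terminal probabilities differ, the decoder can rank "Seattle" above "Redmond" even though both are licensed by the same grammar.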
5. A language processing system comprising:
a unified language model comprising:
a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals; and
an N-gram language model having the non-terminal tokens; and
a language processing module capable of receiving an input signal indicative of language and accessing the unified language model to recognize the language and predict non-terminal tokens contained therein, the language processing module further adapted to generate hypotheses for the language by exploring each of the terminals in the unified language model associated with the non-terminal tokens predicted based on a probability value for each terminal, wherein at least one terminal has a different probability value than one other terminal in the same context-free grammar. (Dependent claims: 6, 7, 8.)
9. A method to perform language processing comprising:
receiving an input signal indicative of language;
accessing a unified language model to recognize the language and predict non-terminal tokens contained therein, the unified language model comprising a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals, wherein each of the terminals of the plurality of context-free grammars includes a probability value, and an N-gram language model having the non-terminal tokens;
assigning probability values of at least some of the terminals of the context-free grammars from a terminal-based language model, wherein at least one terminal has a probability value different than one other terminal in the same context-free grammar and normalizing said values using the set of terminals constrained by the context-free grammars;
generating hypotheses for the language as a function of words in the unified language model corresponding to the non-terminal tokens predicted; and
calculating a language model score for each of the hypotheses using the associated probability value for each terminal present therein and obtained from the plurality of context-free grammars. (Dependent claim: 10.)
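The assigning, normalizing, and scoring steps of claim 9 can be sketched as follows (the words and probabilities are invented, and a simple unigram model stands in for the "terminal-based language model"): terminal probabilities are taken from an ordinary word-based model, renormalized over only those terminals the grammar actually allows, and each hypothesis is scored by summing log probabilities of its terminals.

```python
import math

# Hypothetical terminal-based language model (here, word unigrams).
word_lm = {"Monday": 0.010, "Tuesday": 0.006, "Friday": 0.004, "cat": 0.02}

# The set of terminals constrained by a context-free grammar for <DAY>.
cfg_terminals = ["Monday", "Tuesday", "Friday"]

# Assign probabilities from the terminal-based LM, then normalize over the
# CFG-constrained set so the values sum to 1 within the grammar. Note the
# resulting values are non-uniform: Monday outranks Friday.
total = sum(word_lm[t] for t in cfg_terminals)
cfg_probs = {t: word_lm[t] / total for t in cfg_terminals}

def lm_score(hypothesis_terminals):
    """Language model score: sum of log probabilities of the terminals,
    using the normalized values obtained from the CFG."""
    return sum(math.log(cfg_probs[t]) for t in hypothesis_terminals)

print(round(cfg_probs["Monday"], 2))   # 0.5
print(round(lm_score(["Monday"]), 3))  # -0.693
```

Normalizing over only the grammar's terminals (rather than the whole vocabulary) is what lets the CFG reuse corpus-derived word statistics while remaining a proper probability distribution.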
11. A language processing system comprising:
a unified language model comprising:
a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals, wherein each of the terminals of the plurality of context-free grammars includes a probability value; and
an N-gram language model having the non-terminal tokens; and
a language processing module capable of receiving an input signal indicative of language and accessing the unified language model to recognize the language and predict non-terminal tokens contained therein, the language processing module further adapted to assign probability values of at least some of the terminals of the context-free grammars from a terminal-based language model, wherein at least one terminal has a probability value different than one other terminal in the same context-free grammar and normalize said values using the set of terminals constrained by the context-free grammars, and adapted to generate hypotheses for the language as a function of words in the unified language model corresponding to the non-terminal tokens predicted and calculate a language model score for each of the hypotheses using the associated probability value for each terminal present therein and obtained from the plurality of context-free grammars. (Dependent claim: 12.)
13. A computer readable medium having instructions to process information, the instructions comprising:
a unified language model comprising:
a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals, wherein each of the terminals includes a probability value assigned by using non-uniform probability values derived from a terminal-based language model and normalizing said values using the set of terminals constrained by the plurality of context-free grammars; and
an N-gram language model having the non-terminal tokens; and
a language processing module capable of receiving an input signal indicative of language and accessing the unified language model to recognize the language and predict non-terminal tokens contained therein, the language processing module further generating hypotheses for the received language as a function of words in the unified language model corresponding to the non-terminal tokens predicted and calculating a language model score for each of the hypotheses using the associated probability value for each terminal present therein and obtained from the plurality of context-free grammars.
15. A method to perform language processing comprising:
receiving an input signal indicative of language;
accessing a unified language model to recognize the language and predict non-terminal tokens contained therein, the unified language model comprising a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals, wherein each of the terminals includes a probability value assigned by using non-uniform probability values derived from a terminal-based language model, said values being normalized using the set of terminals constrained by the plurality of context-free grammars, and an N-gram language model having the non-terminal tokens; and
generating hypotheses for the received language as a function of words in the unified language model corresponding to the non-terminal tokens predicted and calculating a language model score for each of the hypotheses using the associated probability value for each terminal present therein and obtained from the plurality of context-free grammars. (Dependent claim: 16.)
17. A computer readable medium including instructions readable by a computer which, when executed, perform a method of language processing for recognizing language and providing an output signal indicative thereof, the method comprising:
receiving an input signal indicative of language;
accessing a unified language model to recognize the language and predict non-terminal tokens contained therein, the unified language model comprising a plurality of context-free grammars comprising non-terminal tokens representing semantic or syntactic concepts and terminals, wherein each of the terminals includes a probability value assigned by using non-uniform probability values derived from a terminal-based language model, said values being normalized using the set of terminals constrained by the plurality of context-free grammars, and an N-gram language model having the non-terminal tokens; and
generating hypotheses for the received language as a function of words in the unified language model corresponding to the non-terminal tokens predicted and calculating a language model score for each of the hypotheses using the associated probability value for each terminal present therein and obtained from the plurality of context-free grammars. (Dependent claim: 18.)