Fully expanded context-dependent networks for speech recognition
Abstract
A large vocabulary speech recognizer including a combined weighted network of transducers reflecting fully expanded context-dependent modeling of pronunciations and language that can be used with a single-pass Viterbi or other decoder based on sequences of labels provided by feature analysis of input speech.
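The single-pass Viterbi decoding mentioned in the abstract can be illustrated with a toy sketch. The network layout, labels, and weights below are hypothetical, and negative log costs stand in for both arc weights and per-frame acoustic scores; this is an illustration of the search idea, not the patent's implementation.

```python
def viterbi(arcs, start, finals, observations):
    """Single-pass Viterbi search over a fully expanded weighted network.

    arcs: dict state -> list of (label, next_state, neg_log_weight)
    observations: one dict per frame, label -> neg_log acoustic score
    Returns (best_total_cost, best_label_sequence) or None.
    """
    # Each surviving hypothesis: state -> (cost so far, labels so far).
    beam = {start: (0.0, [])}
    for obs in observations:
        nxt = {}
        for state, (cost, labels) in beam.items():
            for label, dest, w in arcs.get(state, []):
                if label not in obs:
                    continue  # no acoustic evidence for this label
                c = cost + w + obs[label]
                if dest not in nxt or c < nxt[dest][0]:
                    nxt[dest] = (c, labels + [label])
        beam = nxt
    # Best-scoring hypothesis that ends in a final state.
    return min(((c, lab) for s, (c, lab) in beam.items() if s in finals),
               default=None)

# Hypothetical two-frame example.
arcs = {0: [('a', 1, 0.1), ('b', 1, 1.0)], 1: [('a', 2, 0.5)]}
best = viterbi(arcs, 0, {2}, [{'a': 0.2, 'b': 0.3}, {'a': 0.1}])
```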
20 Claims
1. A method for making a combined weighted speech transducer for a large-vocabulary context-dependent speech recognizer based on signals representing:
(i) the inverse, C⁻¹, of a context-dependency transducer;
(ii) a word pronunciation transducer, L; and
(iii) a language model transducer, G;
the method comprising the steps of:
generating signals representing transducer C, the inverse of a determinized version of transducer C⁻¹,
generating signals representing transducer P′, a determinized version of the composition L′ ∘ G′, which composition is a composition of disambiguated versions of each of said transducers L and G,
generating signals representing a transducer P, a minimized version of transducer P′, and
generating signals representing said combined speech transducer as the composition C ∘ P.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
generating signals representing a transducer L′, a disambiguated version of L,
generating signals representing transducer G′, a disambiguated version of G,
generating signals representing a transducer L′ ∘ G′ that is a determinized version of the composition of L′ and G′.
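The determinization step applied to the composition of L′ and G′ (and to C⁻¹) can be illustrated, in a much-simplified form, by classical subset construction on a small unweighted, ε-free acceptor; the patent's construction generalizes this to weighted transducers. The arc encoding below is an illustrative choice, not taken from the patent.

```python
from itertools import count

def determinize(arcs, start, finals):
    """Subset-construction determinization of a small, unweighted,
    epsilon-free acceptor.

    arcs: dict state -> list of (label, next_state)
    Returns (deterministic arcs, start state 0, final states).
    """
    init = frozenset([start])
    ids = {init: 0}          # subset of states -> new state id
    fresh = count(1)
    queue = [init]
    d_arcs, d_finals = {}, set()
    while queue:
        subset = queue.pop()
        sid = ids[subset]
        if subset & finals:
            d_finals.add(sid)
        # Merge all same-label arcs leaving any state in the subset.
        by_label = {}
        for s in subset:
            for label, dest in arcs.get(s, []):
                by_label.setdefault(label, set()).add(dest)
        for label, dests in by_label.items():
            d = frozenset(dests)
            if d not in ids:
                ids[d] = next(fresh)
                queue.append(d)
            d_arcs.setdefault(sid, []).append((label, ids[d]))
    return d_arcs, 0, d_finals
```

For example, an acceptor with two 'a'-arcs leaving the start state is merged into a single deterministic 'a'-arc to a subset state.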
4. The method of claim 3, wherein said step of generating signals representing L′ comprises labeling with auxiliary labels those paths in L that map input strings to outputs in excess of a first output.
5. The method of claim 3, wherein said step of generating signals representing G′ comprises labeling with auxiliary labels those paths that map input strings to context sequences in excess of a first context sequence.
6. The method of claim 4, wherein said step of generating signals representing G′ comprises labeling with auxiliary labels those paths that map input strings to context sequences in excess of a first context sequence.
7. The method of claim 6, wherein said step of generating signals representing P comprises the steps of modifying said transducer P′ by replacing said auxiliary labels by ε, and removing ε-arcs in said modified version of P′.
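The two steps of claim 7, replacing auxiliary disambiguation labels by ε and then removing ε-arcs, can be sketched on a toy unweighted machine; the label set and arc encoding below are hypothetical, and ε is represented by None.

```python
def remove_auxiliary(arcs, aux_labels, finals):
    """Replace auxiliary labels by epsilon, then remove epsilon arcs.

    arcs: dict state -> list of (label, next_state); epsilon is None.
    Returns (epsilon-free arcs, new final states).
    """
    EPS = None
    # Step 1: map every auxiliary label to epsilon.
    arcs = {s: [(EPS if l in aux_labels else l, d) for l, d in a]
            for s, a in arcs.items()}
    states = set(arcs) | {d for a in arcs.values() for _, d in a}

    def eps_closure(s):
        # All states reachable from s via epsilon arcs only.
        seen, stack = {s}, [s]
        while stack:
            for l, d in arcs.get(stack.pop(), []):
                if l is EPS and d not in seen:
                    seen.add(d)
                    stack.append(d)
        return seen

    # Step 2: splice each state's closure to drop the epsilon arcs.
    out_arcs, out_finals = {}, set()
    for s in states:
        cl = eps_closure(s)
        if cl & finals:
            out_finals.add(s)
        for q in cl:
            for l, d in arcs.get(q, []):
                if l is not EPS:
                    out_arcs.setdefault(s, []).append((l, d))
    return out_arcs, out_finals
```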
8. The method of claim 1 wherein said language model, G, is an n-gram model, where n is a positive integer.
9. The method of claim 8 wherein n=2.
10. The method of claim 8 wherein n=3.
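For claims 8 through 10, an n-gram language model G with n=2 can be sketched as a weighted acceptor with one state per history word and arc weights of −log P(word | history). The maximum-likelihood estimation, absence of smoothing, and sentence markers below are illustrative choices, not taken from the patent.

```python
import math
from collections import Counter

def bigram_lm_arcs(sentences):
    """Build bigram (n=2) language-model acceptor arcs.

    Returns dict: history state -> list of (input, output, weight),
    where input == output (acceptor) and weight = -log P(w | history).
    """
    bigrams, unigrams = Counter(), Counter()
    for sent in sentences:
        words = ['<s>'] + sent + ['</s>']
        for h, w in zip(words, words[1:]):
            bigrams[(h, w)] += 1
            unigrams[h] += 1
    arcs = {}
    for (h, w), c in bigrams.items():
        weight = -math.log(c / unigrams[h])  # MLE, no smoothing
        arcs.setdefault(h, []).append((w, w, weight))
    return arcs

# Hypothetical two-sentence corpus.
arcs = bigram_lm_arcs([['a', 'b'], ['a', 'a']])
```

A trigram model (claim 10, n=3) would use the same scheme with one state per pair of history words.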
11. The method of claim 1 wherein said context dependency transducer, said inverse transducer C⁻¹, the determinized version of C⁻¹, and the inverse of the determinized version of C⁻¹ are cross-word context transducers.
12. The method of claim 1 wherein said combined weighted transducer is fully expanded.

13. The method of claim 1 wherein G is a weighted transducer.

14. The method of claim 1 wherein L is a weighted transducer.

15. The method of claim 1 wherein G and L are weighted transducers.
16. A combined weighted speech transducer for use in a large-vocabulary context-dependent speech recognizer, said transducer stored in a memory system and being based on signals representing:
(i) the inverse, C⁻¹, of a context-dependency transducer;
(ii) a word pronunciation transducer, L; and
(iii) a language model transducer, G;
said transducer comprising signals representing C ∘ P, the composition of transducers C and P, where transducer C comprises signals representing the inverse of a determinized version of transducer C⁻¹, and transducer P comprises signals representing a minimized version of a transducer P′, where transducer P′ is a determinized version of the composition L′ ∘ G′ of disambiguated versions of each of said transducers L and G.
17. A large-vocabulary, context-dependent speech recognizer comprising:
a. a feature extractor for extracting features of input speech signals and applying sequences of one or more labels to said features,
b. a combined weighted speech transducer for use in a speech recognizer, said transducer being stored in a memory system and being based on signals representing:
(i) the inverse, C⁻¹, of a context-dependency transducer;
(ii) a word pronunciation transducer, L; and
(iii) a language model transducer, G;
said combined speech transducer comprising signals representing C ∘ P, the composition of transducers C and P, where transducer C comprises signals representing the inverse of a determinized version of transducer C⁻¹, and transducer P comprises signals representing a minimized version of a transducer P′, where transducer P′ is a determinized version of the composition L′ ∘ G′ of disambiguated versions of each of said transducers L and G, and
c. a decoder for outputting decisions about said input speech signals based on said sequences of labels and said combined speech transducer.
(Dependent claims: 18, 19, 20)
Specification