Fully expanded context-dependent networks for speech recognition
Abstract
A large vocabulary speech recognizer including a combined weighted network of transducers reflecting fully expanded context-dependent modeling of pronunciations and language that can be used with a single-pass Viterbi or other decoder based on sequences of labels provided by feature analysis of input speech.
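The single-pass Viterbi decoding mentioned in the abstract can be illustrated with a toy sketch. The network layout, labels, and weights below are hypothetical, and negative log costs stand in for both arc weights and per-frame acoustic scores; this is an illustration of the search idea, not the patent's implementation.

```python
def viterbi(arcs, start, finals, observations):
    """Single-pass Viterbi search over a fully expanded weighted network.

    arcs: dict state -> list of (label, next_state, neg_log_weight)
    observations: one dict per frame, label -> neg_log acoustic score
    Returns (best_total_cost, best_label_sequence) or None.
    """
    # Each surviving hypothesis: state -> (cost so far, labels so far).
    beam = {start: (0.0, [])}
    for obs in observations:
        nxt = {}
        for state, (cost, labels) in beam.items():
            for label, dest, w in arcs.get(state, []):
                if label not in obs:
                    continue  # no acoustic evidence for this label
                c = cost + w + obs[label]
                if dest not in nxt or c < nxt[dest][0]:
                    nxt[dest] = (c, labels + [label])
        beam = nxt
    # Best-scoring hypothesis that ends in a final state.
    return min(((c, lab) for s, (c, lab) in beam.items() if s in finals),
               default=None)

# Hypothetical two-frame example.
arcs = {0: [('a', 1, 0.1), ('b', 1, 1.0)], 1: [('a', 2, 0.5)]}
best = viterbi(arcs, 0, {2}, [{'a': 0.2, 'b': 0.3}, {'a': 0.1}])
```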
20 Claims
1. A method for making a combined weighted speech transducer for a large-vocabulary context-dependent speech recognizer based on signals representing:
(i) the inverse, C⁻¹, of a context-dependency transducer;
(ii) a word pronunciation transducer, L; and
(iii) a language model transducer, G;
the method comprising the steps of:
generating signals representing transducer C, the inverse of a determinized version of transducer C⁻¹,
generating signals representing transducer P′, a determinized version of the composition L′ ∘ G′, which composition is a composition of disambiguated versions of each of said transducers L and G,
generating signals representing a transducer P, a minimized version of transducer P′, and
generating signals representing said combined speech transducer as the composition C ∘ P.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
generating signals representing a transducer L′, a disambiguated version of L,
generating signals representing transducer G′, a disambiguated version of G,
generating signals representing a transducer L′ ∘ G′ that is a determinized version of the composition of L′ and G′.
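The determinization step applied to the composition of L′ and G′ (and to C⁻¹) can be illustrated, in a much-simplified form, by classical subset construction on a small unweighted, ε-free acceptor; the patent's construction generalizes this to weighted transducers. The arc encoding below is an illustrative choice, not taken from the patent.

```python
from itertools import count

def determinize(arcs, start, finals):
    """Subset-construction determinization of a small, unweighted,
    epsilon-free acceptor.

    arcs: dict state -> list of (label, next_state)
    Returns (deterministic arcs, start state 0, final states).
    """
    init = frozenset([start])
    ids = {init: 0}          # subset of states -> new state id
    fresh = count(1)
    queue = [init]
    d_arcs, d_finals = {}, set()
    while queue:
        subset = queue.pop()
        sid = ids[subset]
        if subset & finals:
            d_finals.add(sid)
        # Merge all same-label arcs leaving any state in the subset.
        by_label = {}
        for s in subset:
            for label, dest in arcs.get(s, []):
                by_label.setdefault(label, set()).add(dest)
        for label, dests in by_label.items():
            d = frozenset(dests)
            if d not in ids:
                ids[d] = next(fresh)
                queue.append(d)
            d_arcs.setdefault(sid, []).append((label, ids[d]))
    return d_arcs, 0, d_finals
```

For example, an acceptor with two 'a'-arcs leaving the start state is merged into a single deterministic 'a'-arc to a subset state.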
4. The method of claim 3, wherein said step of generating signals representing L′ comprises labeling with auxiliary labels those paths in L that map input strings to outputs in excess of a first output.
5. The method of claim 3, wherein said step of generating signals representing G′ comprises labeling with auxiliary labels those paths that map input strings to context sequences in excess of a first context sequence.
6. The method of claim 4, wherein said step of generating signals representing G′ comprises labeling with auxiliary labels those paths that map input strings to context sequences in excess of a first context sequence.
7. The method of claim 6, wherein said step of generating signals representing P comprises the steps of modifying said transducer P′ by replacing said auxiliary labels by ε, and removing ε-arcs in said modified version of P′.
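The two steps of claim 7, replacing auxiliary disambiguation labels by ε and then removing ε-arcs, can be sketched on a toy unweighted machine; the label set and arc encoding below are hypothetical, and ε is represented by None.

```python
def remove_auxiliary(arcs, aux_labels, finals):
    """Replace auxiliary labels by epsilon, then remove epsilon arcs.

    arcs: dict state -> list of (label, next_state); epsilon is None.
    Returns (epsilon-free arcs, new final states).
    """
    EPS = None
    # Step 1: map every auxiliary label to epsilon.
    arcs = {s: [(EPS if l in aux_labels else l, d) for l, d in a]
            for s, a in arcs.items()}
    states = set(arcs) | {d for a in arcs.values() for _, d in a}

    def eps_closure(s):
        # All states reachable from s via epsilon arcs only.
        seen, stack = {s}, [s]
        while stack:
            for l, d in arcs.get(stack.pop(), []):
                if l is EPS and d not in seen:
                    seen.add(d)
                    stack.append(d)
        return seen

    # Step 2: splice each state's closure to drop the epsilon arcs.
    out_arcs, out_finals = {}, set()
    for s in states:
        cl = eps_closure(s)
        if cl & finals:
            out_finals.add(s)
        for q in cl:
            for l, d in arcs.get(q, []):
                if l is not EPS:
                    out_arcs.setdefault(s, []).append((l, d))
    return out_arcs, out_finals
```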
8. The method of claim 1 wherein said language model, G, is an n-gram model, where n is a positive integer.
9. The method of claim 8 wherein n=2.
10. The method of claim 8 wherein n=3.
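For claims 8 through 10, an n-gram language model G with n=2 can be sketched as a weighted acceptor with one state per history word and arc weights of −log P(word | history). The maximum-likelihood estimation, absence of smoothing, and sentence markers below are illustrative choices, not taken from the patent.

```python
import math
from collections import Counter

def bigram_lm_arcs(sentences):
    """Build bigram (n=2) language-model acceptor arcs.

    Returns dict: history state -> list of (input, output, weight),
    where input == output (acceptor) and weight = -log P(w | history).
    """
    bigrams, unigrams = Counter(), Counter()
    for sent in sentences:
        words = ['<s>'] + sent + ['</s>']
        for h, w in zip(words, words[1:]):
            bigrams[(h, w)] += 1
            unigrams[h] += 1
    arcs = {}
    for (h, w), c in bigrams.items():
        weight = -math.log(c / unigrams[h])  # MLE, no smoothing
        arcs.setdefault(h, []).append((w, w, weight))
    return arcs

# Hypothetical two-sentence corpus.
arcs = bigram_lm_arcs([['a', 'b'], ['a', 'a']])
```

A trigram model (claim 10, n=3) would use the same scheme with one state per pair of history words.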
11. The method of claim 1 wherein said context dependency transducer, said inverse transducer C⁻¹, the determinized version of C⁻¹, and the inverse of the determinized version of C⁻¹ are cross-word context transducers.
12. The method of claim 1 wherein said combined weighted transducer is fully expanded.

13. The method of claim 1 wherein G is a weighted transducer.

14. The method of claim 1 wherein L is a weighted transducer.

15. The method of claim 1 wherein G and L are weighted transducers.
16. A combined weighted speech transducer for use in a large-vocabulary context-dependent speech recognizer, said transducer stored in a memory system and being based on signals representing:
(i) the inverse, C⁻¹, of a context-dependency transducer;
(ii) a word pronunciation transducer, L; and
(iii) a language model transducer, G;
said transducer comprising signals representing C ∘ P, the composition of transducers C and P, where transducer C comprises signals representing the inverse of a determinized version of transducer C⁻¹, and transducer P comprises signals representing a minimized version of a transducer P′, where transducer P′ is a determinized version of the composition L′ ∘ G′ of disambiguated versions of each of said transducers L and G.
17. A large-vocabulary, context-dependent speech recognizer comprising:
a. a feature extractor for extracting features of input speech signals and applying sequences of one or more labels to said features,
b. a combined weighted speech transducer for use in a speech recognizer, said transducer being stored in a memory system and being based on signals representing:
(i) the inverse, C⁻¹, of a context-dependency transducer;
(ii) a word pronunciation transducer, L; and
(iii) a language model transducer, G;
said combined speech transducer comprising signals representing C ∘ P, the composition of transducers C and P, where transducer C comprises signals representing the inverse of a determinized version of transducer C⁻¹, and transducer P comprises signals representing a minimized version of a transducer P′, where transducer P′ is a determinized version of the composition L′ ∘ G′ of disambiguated versions of each of said transducers L and G, and
c. a decoder for outputting decisions about said input speech signals based on said sequences of labels and said combined speech transducer.
(Dependent claims: 18, 19, 20)
Specification