Method and apparatus for a parameter sharing speech recognition system
Abstract
A method and an apparatus for a parameter sharing speech recognition system are provided. Speech signals are received into a processor of a speech recognition system. The speech signals are processed using a speech recognition system hosting a shared hidden Markov model (HMM) produced by generating a number of phoneme models, some of which are shared. The phoneme models are generated by retaining as a separate phoneme model any triphone model having a number of trained frames available that exceeds a prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having a common biphone exceeds the prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models for which the number of trained frames having an equivalent effect on a phonemic context exceeds the prespecified threshold. A shared phoneme model is generated to represent each of the groups of triphone phoneme models having the same center context. The generated phoneme models are trained, and shared phoneme model states are generated that are shared among the phoneme models. Shared probability distribution functions are generated that are shared among the phoneme model states. Shared probability sub-distribution functions are generated that are shared among the phoneme model probability distribution functions. The shared phoneme model hierarchy is reevaluated for further sharing in response to the shared probability sub-distribution functions. Signals representative of the received speech signals are generated.
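The four-level back-off described above (separate triphone, common biphone, equivalent context, same center phone) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: triphones are `(left, center, right)` tuples, `frame_counts` holds the number of trained frames per triphone, "exceeds" is read as strictly greater than, the common-biphone step tries the left biphone before the right biphone, and the hypothetical `phone_class` mapping (phone to a broad class such as "stop") stands in for contexts that have an equivalent effect. All function and level names are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Triphone = Tuple[str, str, str]  # (left context, center phone, right context)
ModelId = Tuple[str, ...]  # (sharing level, *context key); hypothetical naming


def share_phoneme_models(
    frame_counts: Dict[Triphone, int],
    threshold: int,
    phone_class: Dict[str, str],
) -> Dict[Triphone, ModelId]:
    """Map every triphone to the (possibly shared) model that will represent it."""
    assignment: Dict[Triphone, ModelId] = {}

    # Level 1: a triphone with enough trained frames keeps its own model.
    remaining: List[Triphone] = []
    for tri, n in frame_counts.items():
        if n > threshold:
            assignment[tri] = ("triphone",) + tri
        else:
            remaining.append(tri)

    # Levels 2-3: pool the leftovers under progressively coarser context keys.
    # A group becomes one shared model once its pooled frame count exceeds the
    # threshold. Assumption: left biphone is tried before right biphone, and
    # phones missing from phone_class are treated as their own class.
    stages: List[Tuple[str, Callable[[Triphone], Tuple[str, ...]]]] = [
        ("left_biphone", lambda t: (t[0], t[1])),
        ("right_biphone", lambda t: (t[1], t[2])),
        ("context_class", lambda t: (phone_class.get(t[0], t[0]), t[1],
                                     phone_class.get(t[2], t[2]))),
    ]
    for level, key_fn in stages:
        groups: Dict[Tuple[str, ...], List[Triphone]] = defaultdict(list)
        for tri in remaining:
            groups[key_fn(tri)].append(tri)
        remaining = []
        for key, members in groups.items():
            if sum(frame_counts[t] for t in members) > threshold:
                for t in members:
                    assignment[t] = (level,) + key
            else:
                remaining.extend(members)

    # Level 4: whatever is left shares one model per center phone.
    for tri in remaining:
        assignment[tri] = ("center", tri[1])
    return assignment
```

For instance, with `threshold=100`, the triphones `("s", "ih", "n")` (60 frames) and `("s", "ih", "m")` (50 frames) each fall short alone but pool to 110 frames under the left biphone `("s", "ih")`, so they end up sharing one model.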
60 Citations
45 Claims

1. A method for recognizing speech comprising the steps of:
receiving speech signals into a processor;
processing the received speech signals using a speech recognition system produced by generating a plurality of phoneme models, wherein at least one of the plurality of phoneme models is shared among a plurality of phonemes, and at least a first one of the plurality of phoneme models is shared with at least a second one of the plurality of phoneme models; and
generating signals representative of the received speech signals.

16. An apparatus for speech recognition comprising:
an input for receiving speech signals into a processor;
a processor configured to recognize the received speech signals using a speech recognition system to generate a signal representative of the received speech signal, the speech recognition system produced by generating and training a plurality of phoneme models, wherein at least one of the plurality of phoneme models is shared among a plurality of phonemes, and at least a first one of the plurality of phoneme models is shared with at least a second one of the plurality of phoneme models; and
an output for providing a signal representative of the received speech signal.

21. A speech recognition process comprising a statistical learning technique that uses a model, the model produced by:
generating and training a plurality of phoneme models, wherein at least one of the plurality of phoneme models is shared among a plurality of phonemes;
generating a plurality of shared probability sub-distribution functions from the trained plurality of phoneme models; and
evaluating the plurality of phoneme models for further sharing in response to the plurality of shared probability sub-distribution functions.
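The loop in claim 21, which builds shared probability sub-distribution functions and then reevaluates the phoneme models for further sharing, can be illustrated with a small Python sketch. It assumes a tied-mixture (semi-continuous) form that the claim does not specify: every model's PDF is a weighted sum over one shared codebook of 1-D Gaussian sub-distributions, the codebook is taken as given, and two models are tied when the symmetric KL divergence between their mixture weights falls below a hypothetical `max_divergence`. The greedy, best-trained-first merge order and all names are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

# Assumption: one shared codebook of 1-D Gaussian sub-distributions,
# each given as (mean, variance). Every model's PDF reuses these.
Codebook = List[Tuple[float, float]]


def mixture_density(x: float, weights: List[float], codebook: Codebook) -> float:
    """Evaluate one model's PDF as a weighted sum of the shared sub-distributions."""
    total = 0.0
    for w, (mean, var) in zip(weights, codebook):
        total += w * math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return total


def symmetric_kl(p: List[float], q: List[float], eps: float = 1e-9) -> float:
    """KL(p||q) + KL(q||p) between two mixture-weight vectors (eps avoids log(0))."""
    return sum((a - b) * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


def merge_similar_models(
    weights: Dict[str, List[float]],
    counts: Dict[str, int],
    max_divergence: float,
) -> Dict[str, str]:
    """Greedily tie models whose weights over the shared codebook are close.

    Returns a map from each model name to the representative model it now shares.
    """
    rep_of = {name: name for name in weights}
    pooled = {name: list(w) for name, w in weights.items()}
    pooled_counts = dict(counts)
    reps: List[str] = []
    # Visit the best-trained models first so they become representatives.
    for name in sorted(weights, key=lambda n: -counts[n]):
        best: Optional[str] = None
        best_d = max_divergence
        for rep in reps:
            d = symmetric_kl(pooled[rep], weights[name])
            if d < best_d:
                best, best_d = rep, d
        if best is None:
            reps.append(name)  # nothing close enough: keep as its own model
            continue
        # Tie `name` to `best` and re-estimate the pooled weights by frame count.
        n_rep, n_new = pooled_counts[best], counts[name]
        pooled[best] = [
            (n_rep * a + n_new * b) / (n_rep + n_new)
            for a, b in zip(pooled[best], weights[name])
        ]
        pooled_counts[best] = n_rep + n_new
        rep_of[name] = best
    return rep_of
```

Because models in this sketch differ only in their mixture weights, comparing two models is cheap, and the Gaussian sub-distributions themselves are never duplicated.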

27. A method for generating a plurality of phoneme models for use in a speech recognition system, the method comprising the steps of:
retaining as a separate phoneme model a triphone phoneme model for which a number of trained frames exceeds a threshold;
generating at least one shared phoneme model to represent a plurality of triphone phoneme models for which the number of trained frames having a common biphone exceeds the threshold;
generating at least one shared phoneme model to represent a plurality of triphone phoneme models for which the number of trained frames having an equivalent effect on a phonemic context exceeds the threshold; and
generating at least one shared phoneme model to represent a plurality of triphone phoneme models having the same center context.

29. A computer readable medium containing executable instructions which, when executed in a processing system, cause the system to perform the steps for recognizing speech comprising:
receiving speech signals into a processor;
processing the received speech signals using a speech recognition system comprising a plurality of phoneme models, wherein at least one of the plurality of phoneme models is shared among a plurality of phonemes and at least a first one of the plurality of phoneme models is shared with at least a second one of the plurality of phoneme models; and
generating signals representative of the received speech signals.

33. A method for recognizing speech comprising the steps of:
receiving speech signals into a processor;
processing the received speech signals using a model comprising a plurality of phoneme models, wherein at least one of the plurality of phoneme models is shared among a plurality of phonemes, and at least a first one of the plurality of phoneme models is shared with at least a second one of the plurality of phoneme models; and
generating signals representative of the received speech signals.

37. An apparatus for speech recognition comprising:
an input configured to receive speech signals into a processor;
a processor configured to process the received speech signals using a model comprising a plurality of phoneme models, wherein at least one of the plurality of phoneme models is shared among a plurality of phonemes, and at least a first one of the plurality of phoneme models is shared with at least a second one of the plurality of phoneme models; and
an output configured to provide a signal representative of the received speech signal.

40. A computer readable medium containing executable instructions which, when executed in a processing system, cause the system to perform the steps for recognizing speech comprising:
receiving speech signals into a processor;
processing the received speech signals using a model comprising a plurality of context dependent phoneme models, wherein at least one of the plurality of phoneme models is shared among a plurality of phonemes, and at least a first one of the plurality of phoneme models is shared with at least a second one of the plurality of phoneme models; and
providing output signals representative of the received speech signals.

44. A system for recognizing speech comprising:
means for receiving speech signals into a processor;
means for processing the received speech signals using a speech recognition system produced by generating a plurality of phoneme models, wherein at least one of the plurality of phoneme models is shared among a plurality of phonemes, and at least a first one of the plurality of phoneme models is shared with at least a second one of the plurality of phoneme models; and
means for generating signals representative of the received speech signals.

45. A system for generating a plurality of phoneme models for use in a speech recognition system, comprising:
means for retaining as a separate phoneme model a triphone phoneme model for which a number of trained frames exceeds a threshold;
means for generating at least one shared phoneme model to represent a plurality of triphone phoneme models for which the number of trained frames having a common biphone exceeds the threshold;
means for generating at least one shared phoneme model to represent a plurality of triphone phoneme models for which the number of trained frames having an equivalent effect on a phonemic context exceeds the threshold; and
means for generating at least one shared phoneme model to represent a plurality of triphone phoneme models having the same center context.

Specification