Gaussian model-based dynamic time warping system and method for speech processing
Abstract
The Gaussian Dynamic Time Warping model provides a hierarchical statistical model for representing an acoustic pattern. The first layer of the model represents the general acoustic space, the second layer represents each speaker space, and the third layer represents the temporal structure information contained in each enrollment speech utterance, based on equally spaced time intervals. The three layers are developed hierarchically: the second layer is derived from the first, and the third layer is derived from the second. The model is useful in speech processing applications, particularly word and speaker recognition using a spotting recognition mode.
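The three-layer derivation described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes scalar features, a shared fixed variance, mean-only EM for the background model, and MAP adaptation of means with a hypothetical relevance factor. All function names, parameter values, and data below are illustrative assumptions that do not come from the source.

```python
import math

VAR = 1.0  # assumed variance shared by every Gaussian component


def posteriors(x, weights, means, var=VAR):
    """Responsibility of each mixture component for a scalar feature x."""
    logs = [math.log(w) - 0.5 * (x - m) ** 2 / var for w, m in zip(weights, means)]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, prior_means, relevance=4.0):
    """Pull the prior means toward the frames; relevance weights the prior."""
    counts = [0.0] * len(prior_means)
    sums = [0.0] * len(prior_means)
    for x in frames:
        for i, p in enumerate(posteriors(x, weights, prior_means)):
            counts[i] += p
            sums[i] += p * x
    return [
        (relevance * prior_means[i] + sums[i]) / max(relevance + counts[i], 1e-12)
        for i in range(len(prior_means))
    ]


def train_background(frames, k=2, iters=10):
    """Layer 1: acoustic space model fitted on speech pooled from many speakers."""
    ordered = sorted(frames)
    means = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # relevance=0 turns the MAP update into a plain EM mean update
        means = map_adapt_means(frames, weights, means, relevance=0.0)
    return weights, means


def build_model(background_frames, enrollment, n_segments=3):
    weights, acoustic_means = train_background(background_frames)
    # Layer 2: speaker model adapted from the acoustic space model
    speaker_means = map_adapt_means(enrollment, weights, acoustic_means)
    # Layer 3: one model per equally spaced time interval, adapted from layer 2
    seg_len = len(enrollment) / n_segments
    frame_models = []
    for s in range(n_segments):
        segment = enrollment[int(s * seg_len):int((s + 1) * seg_len)]
        frame_models.append(map_adapt_means(segment, weights, speaker_means))
    return {
        "weights": weights,
        "acoustic_means": acoustic_means,
        "speaker_means": speaker_means,
        "frame_models": frame_models,
    }


# Toy data: background speech clusters near 0 and 5; the enrollment
# utterance moves from the low cluster to the high cluster over time.
background = [0.0, 0.5, -0.5, 0.2, -0.2, 5.0, 5.5, 4.5, 5.2, 4.8]
enrollment = [0.2, 0.1, 0.3, 2.5, 2.4, 2.6, 5.2, 5.1, 5.3]
model = build_model(background, enrollment, n_segments=3)
```

In this sketch each segment model starts from the speaker model rather than from scratch, so a segment with few frames stays close to layer 2. The ordered list `frame_models` is what keeps the temporal structure of the enrollment utterance.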
33 Claims
1. A method for constructing a speech model, comprising:
constructing an acoustic space model from a plurality of utterances obtained from a plurality of speakers;
constructing a speaker model by adapting the acoustic space model using enrollment speech from at least one speaker;
identifying a temporal structure associated with said enrollment speech; and
constructing a speech model based on said speaker model and on the enrollment speech while preserving the temporal structure of said enrollment speech in said speech model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method for constructing a speech model, comprising:
constructing an acoustic space model from a plurality of utterances obtained from a plurality of speakers;
constructing a speaker model by adapting the acoustic space model using enrollment speech from at least one speaker;
constructing a temporal structure information model by representing said speaker model as a plurality of frame dependent models that correspond to sequential time intervals associated with said enrollment speech; and
constructing said speech model by adapting the temporal structure information model using said enrollment speech, said speaker model and said acoustic space model.
Dependent claims: 14, 15, 16, 17
18. A hierarchical speech model comprising:
a first layer for representing an acoustic space;
a second layer for representing a speaker space;
a third layer for representing temporal structure of enrollment speech according to a predetermined frame structure.
Dependent claims: 19, 20, 21, 22, 23, 24
25. A speech processing system comprising:
a speech recognizer having a set of probabilistic models against which an input speech utterance is tested;
said set of probabilistic models being configured to contain:
a first layer for representing an acoustic space;
a second layer for representing a speaker space;
a third layer for representing temporal structure of speech according to a predetermined frame structure.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33
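As a rough illustration of testing an input utterance against frame-structured probabilistic models, the Python sketch below aligns input frames to an ordered list of Gaussians with dynamic time warping, using the negative log-likelihood as the local distance. It is a simplified assumption rather than the claimed recognizer: each frame model is a single scalar Gaussian, the step pattern is the basic three-way one, and the whole utterance is aligned rather than spotted. The function names and values are hypothetical.

```python
import math


def neg_log_likelihood(x, mean, var):
    """Local distance: negative log density of x under one Gaussian."""
    return 0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def dtw_score(frames, frame_models):
    """Accumulated cost of the best monotonic alignment (lower is better)."""
    n, m = len(frames), len(frame_models)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mean, var = frame_models[j - 1]
            local = neg_log_likelihood(frames[i - 1], mean, var)
            # Allowed moves: advance both, repeat the model, repeat the frame
            best_prev = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = local + best_prev
    return cost[n][m]


# Hypothetical frame-dependent models as (mean, variance) pairs in time order
frame_models = [(0.0, 1.0), (2.5, 1.0), (5.0, 1.0)]
matching = [0.1, 0.2, 2.4, 2.6, 5.1, 4.9]
reversed_order = list(reversed(matching))

score_match = dtw_score(matching, frame_models)
score_reversed = dtw_score(reversed_order, frame_models)
```

Because the alignment is monotonic, an utterance with the same frames in the wrong temporal order receives a higher cost. A speaker-only model with no temporal layer would not make that distinction.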
Specification