LOW LATENCY REAL-TIME VOCAL TRACT LENGTH NORMALIZATION
First Claim
1. A computer-implemented method for training an automatic speech recognition system, the method comprising:
separating training data into speaker specific segments; and
performing, for each speaker specific segment, the acts of;
generating spectral data representative of the speaker specific segment;
selecting a first warping factor as a best warping factor, and generating a warped spectral data representation of the spectral data;
comparing the warped spectral data representation to a predetermined speech model; and
iteratively performing, until an end condition is satisfied, the acts of;
selecting an other warping factor and generating an other warped spectral data representation;
comparing the warped spectral data representation to a respective speech model for a given iteration; and
if the other warping factor produces a closer match to the respective speech model, saving the other warping factor as the best warping factor for the respective speaker specific segment.
Abstract
A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments and, for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating other warped spectral data, comparing the other warped spectral data with the speech model, and, if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
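The per-segment search summarized in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the linear rescaling of the frequency axis in `warp_spectrum`, the squared-distance `match_score`, the use of a fixed candidate list as the end condition, the single fixed model, and all function names are hypothetical choices made for the example.

```python
from typing import Iterable, List, Sequence


def warp_spectrum(spectrum: Sequence[float], alpha: float) -> List[float]:
    """Resample a spectrum on a frequency axis scaled by alpha.

    Assumption: plain linear scaling with linear interpolation between
    bins; positions past the last bin are clamped to the last bin.
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        src = min(k * alpha, n - 1)      # position on the original axis
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def match_score(warped: Sequence[float], model: Sequence[float]) -> float:
    """Hypothetical closeness measure: negative squared distance.

    A higher score means a closer match to the speech model.
    """
    return -sum((w - m) ** 2 for w, m in zip(warped, model))


def best_warping_factor(
    spectrum: Sequence[float],
    model: Sequence[float],
    first_factor: float,
    other_factors: Iterable[float],
) -> float:
    """Keep the warping factor whose warped spectrum best matches the model."""
    # Start with the first factor as the current best.
    best = first_factor
    best_score = match_score(warp_spectrum(spectrum, best), model)
    # Assumed end condition: the candidate list is exhausted.
    for alpha in other_factors:
        score = match_score(warp_spectrum(spectrum, alpha), model)
        if score > best_score:           # closer match: save as new best
            best, best_score = alpha, score
    return best
```

In a training setup, a function like `best_warping_factor` would be called once per speaker specific segment, and the saved factor would then be associated with that segment.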
19 Claims
1. A computer-implemented method for training an automatic speech recognition system, the method comprising:
separating training data into speaker specific segments; and
performing, for each speaker specific segment, the acts of;
generating spectral data representative of the speaker specific segment;
selecting a first warping factor as a best warping factor, and generating a warped spectral data representation of the spectral data;
comparing the warped spectral data representation to a predetermined speech model; and
iteratively performing, until an end condition is satisfied, the acts of;
selecting an other warping factor and generating an other warped spectral data representation;
comparing the warped spectral data representation to a respective speech model for a given iteration; and
if the other warping factor produces a closer match to the respective speech model, saving the other warping factor as the best warping factor for the respective speaker specific segment.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for training an automatic speech recognition system, the system comprising:
a processor;
a module configured to control the processor to generate spectral data from at least a portion of training data;
a module configured to control the processor to generate a plurality of warped spectral axes for the spectral data using a range of warping factors;
a module configured to control the processor to determine which one of the plurality of warped spectral axes best matches one of a generic speech model or a Vocal Tract Length Normalized acoustic model;
a module configured to control the processor to generate the Vocal Tract Length Normalized Acoustic model using a warping factor corresponding to the determined one of the plurality of warped spectral axes; and
a module configured to control the processor to rescore lattices based on the Vocal Tract Length Normalized Acoustic model.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
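As a rough illustration of generating a plurality of warped spectral axes from a range of warping factors, the sketch below builds one warped frequency axis per factor. The factor range (0.80 to 1.20 in 21 steps), the linear warp clamped at the Nyquist frequency, and every name used are assumptions for the example; the claim does not specify them. Selecting the best-matching axis and the later model-building and lattice-rescoring steps are not shown.

```python
from typing import Dict, List, Sequence


def warping_factor_range(lo: float = 0.80, hi: float = 1.20,
                         steps: int = 21) -> List[float]:
    """Evenly spaced candidate warping factors (assumed range and count)."""
    return [round(lo + i * (hi - lo) / (steps - 1), 4) for i in range(steps)]


def warped_axis(freqs_hz: Sequence[float], alpha: float,
                nyquist_hz: float) -> List[float]:
    """Scale each frequency by alpha, clamping at Nyquist (assumed warp)."""
    return [min(alpha * f, nyquist_hz) for f in freqs_hz]


def build_warped_axes(freqs_hz: Sequence[float], factors: Sequence[float],
                      nyquist_hz: float) -> Dict[float, List[float]]:
    """Map each candidate warping factor to its warped frequency axis."""
    return {a: warped_axis(freqs_hz, a, nyquist_hz) for a in factors}
```

Each axis in the returned dictionary could then be scored against a speech model, with the factor of the best-scoring axis kept for the later steps.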
19. A tangible computer-readable storage medium storing a computer program having instructions for training an automatic speech recognition system, the instructions comprising:
separating training data into speaker specific segments; and
performing, for each speaker specific segment, the acts of;
generating spectral data representative of the speaker specific segment;
selecting a first warping factor as a best warping factor, and generating a warped spectral data representation of the spectral data;
comparing the warped spectral data representation to a predetermined speech model; and
iteratively performing, until an end condition is satisfied, the acts of;
selecting an other warping factor and generating an other warped spectral data representation;
comparing the warped spectral data representation to a respective speech model for a given iteration; and
if the other warping factor produces a closer match to the respective speech model, saving the other warping factor as the best warping factor for the respective speaker specific segment.
Specification