Method and apparatus for constructing voice templates for a speaker-independent voice recognition system
First Claim
1. A method of creating speech templates for use in a speaker-independent speech recognition system, the method comprising:
segmenting each utterance of a first plurality of utterances to generate a plurality of time-clustered segments for each utterance, each time-clustered segment being represented by a spectral mean;
quantizing the plurality of spectral means for all of the first plurality of utterances to generate a plurality of template vectors;
comparing each one of the plurality of template vectors with a second plurality of utterances using a dynamic time warping calculation to generate at least one comparison result;
matching the first plurality of utterances with the plurality of template vectors if the at least one comparison result exceeds at least one predefined threshold value, to generate an optimal matching path result;
partitioning the first plurality of utterances in time in accordance with the optimal matching path result; and
repeating the quantizing, comparing, matching, and partitioning until the at least one comparison result does not exceed any at least one predefined threshold value.
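The segmenting step can be illustrated with a minimal sketch. It assumes each utterance is already a list of per-frame spectral feature vectors, and it splits the frames into equal-length contiguous runs as a stand-in for the time-clustering criterion, which the claim does not spell out. The names `segment_means` and `Frame` are hypothetical and do not come from the patent.

```python
from typing import List

# Assumed representation: one spectral feature vector per analysis frame.
Frame = List[float]


def segment_means(frames: List[Frame], num_segments: int) -> List[Frame]:
    """Split frames into contiguous runs and return the mean vector of each run."""
    n = len(frames)
    if not 0 < num_segments <= n:
        raise ValueError("need 1 <= num_segments <= number of frames")
    means = []
    for s in range(num_segments):
        # Equal-length boundaries: an illustrative assumption, not the
        # patent's time-clustering rule.
        start = s * n // num_segments
        end = (s + 1) * n // num_segments
        run = frames[start:end]
        dim = len(run[0])
        means.append([sum(f[d] for f in run) / len(run) for d in range(dim)])
    return means


# Toy utterance with four 2-dimensional frames, reduced to two segments.
utterance = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]]
spectral_means = segment_means(utterance, 2)
```

Each utterance is reduced to a fixed number of mean vectors regardless of its duration, which is what lets the quantizing step pool the means across all utterances.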
Abstract
A method and apparatus for constructing voice templates for a speaker-independent voice recognition system includes segmenting a training utterance to generate time-clustered segments, each segment being represented by a mean. The means for all utterances of a given word are quantized to generate template vectors. Each template vector is compared with testing utterances to generate a comparison result. The comparison is typically a dynamic time warping computation. The training utterances are matched with the template vectors if the comparison result exceeds at least one predefined threshold value, to generate an optimal path result, and the training utterances are partitioned in accordance with the optimal path result. The partitioning is typically a K-means segmentation computation. The partitioned utterances may then be re-quantized and re-compared with the testing utterances until the at least one predefined threshold value is not exceeded.
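The abstract names dynamic time warping as the typical comparison. The following is a minimal sketch of a textbook DTW distance between a sequence of template vectors and a sequence of utterance frames. The Euclidean local distance, the unconstrained step pattern, and the function names are assumptions made for illustration. The abstract does not specify them.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Local distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template: List[Sequence[float]],
                 utterance: List[Sequence[float]]) -> float:
    """Accumulated cost of the cheapest monotonic alignment of the two sequences."""
    n, m = len(template), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning the first i template
    # vectors with the first j utterance frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(template[i - 1], utterance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # template advances
                                 cost[i][j - 1],      # utterance advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# A time-stretched copy of the template aligns at zero cost.
stretched = dtw_distance([[0.0], [1.0]], [[0.0], [0.0], [1.0]])
```

Recording which of the three predecessors was chosen at each cell would additionally yield the alignment path, which corresponds to the optimal path result used for partitioning.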
36 Citations
26 Claims
1. A method of creating speech templates for use in a speaker-independent speech recognition system, the method comprising:
segmenting each utterance of a first plurality of utterances to generate a plurality of time-clustered segments for each utterance, each time-clustered segment being represented by a spectral mean;
quantizing the plurality of spectral means for all of the first plurality of utterances to generate a plurality of template vectors;
comparing each one of the plurality of template vectors with a second plurality of utterances using a dynamic time warping calculation to generate at least one comparison result;
matching the first plurality of utterances with the plurality of template vectors if the at least one comparison result exceeds at least one predefined threshold value, to generate an optimal matching path result;
partitioning the first plurality of utterances in time in accordance with the optimal matching path result; and
repeating the quantizing, comparing, matching, and partitioning until the at least one comparison result does not exceed any at least one predefined threshold value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An apparatus configured to create speech templates for use in a speaker-independent speech recognition system, the apparatus comprising:
means for segmenting each utterance of a first plurality of utterances to generate a plurality of time-clustered segments for each utterance, each time-clustered segment being represented by a spectral mean;
means for quantizing the plurality of spectral means for all of the first plurality of utterances to generate a plurality of template vectors;
means for using a dynamic time warping calculation to compare each one of the plurality of template vectors with a second plurality of utterances to generate at least one comparison result;
means for matching the first plurality of utterances with the plurality of template vectors if the at least one comparison result exceeds at least one predefined threshold value, to generate an optimal matching path result;
means for partitioning the first plurality of utterances in time in accordance with the optimal matching path result; and
means for repeating the quantizing, comparing, matching, and partitioning until the at least one comparison result does not exceed any at least one predefined threshold value.
10. An apparatus configured to create speech templates for use in a speaker-independent speech recognition system, the apparatus comprising:
segmentation logic configured to segment each utterance of a first plurality of utterances to generate a plurality of time-clustered segments for each utterance, each time-clustered segment being represented by a spectral mean;
a quantizer coupled to the segmentation logic and configured to quantize the plurality of spectral means for all of the first plurality of utterances to generate a plurality of template vectors;
a convergence tester coupled to the quantizer and configured to compare each one of the plurality of template vectors with a second plurality of utterances using a dynamic time warping calculation to generate at least one comparison result; and
partitioning logic coupled to the quantizer and the convergence tester, and configured to match the first plurality of utterances with the plurality of template vectors if the at least one comparison result exceeds at least one predefined threshold value, to generate an optimal matching path result, and to partition the first plurality of utterances in time in accordance with the optimal matching path result, wherein the quantizer, the convergence tester, and the partitioning logic are further configured to repeat the quantizing, comparing, matching, and partitioning until the at least one comparison result does not exceed any at least one predefined threshold value.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
18. An apparatus configured to create speech templates for use in a speaker-independent speech recognition system, the apparatus comprising:
a processor; and
a storage medium coupled to the processor and containing a set of instructions executable by the processor to segment each utterance of a first plurality of utterances to generate a plurality of time-clustered segments for each utterance, each time-clustered segment being represented by a spectral mean, quantize the plurality of spectral means for all of the first plurality of utterances to generate a plurality of template vectors, compare each one of the plurality of template vectors with a second plurality of utterances using a dynamic time warping calculation to generate at least one comparison result, match the first plurality of utterances with the plurality of template vectors if the at least one comparison result exceeds at least one predefined threshold value, to generate an optimal matching path result, partition the first plurality of utterances in time in accordance with the optimal matching path result, and repeat the quantizing, comparing, matching, and partitioning until the at least one comparison result does not exceed any at least one predefined threshold value.
Dependent claims: 19, 20, 21, 22, 23, 24, 25
26. A processor-readable medium containing a set of instructions executable by a processor to:
segment each utterance of a first plurality of utterances to generate a plurality of time-clustered segments for each utterance, each time-clustered segment being represented by a spectral mean;
quantize the plurality of spectral means for all of the first plurality of utterances to generate a plurality of template vectors;
compare each one of the plurality of template vectors with a second plurality of utterances using a dynamic time warping calculation to generate at least one comparison result;
match the first plurality of utterances with the plurality of template vectors if the at least one comparison result exceeds at least one predefined threshold value, to generate an optimal matching path result;
partition the first plurality of utterances in time in accordance with the optimal matching path result; and
repeat the quantizing, comparing, matching, and partitioning until the at least one comparison result does not exceed any at least one predefined threshold value.
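The repeat-until-threshold control flow recited in the independent claims can be sketched as follows. This shows only the loop structure: the quantizer, the comparison, and the repartitioning are passed in as caller-supplied functions. The name `refine_templates`, its parameters, and the `max_iters` safety cap are hypothetical and not taken from the claims. The toy callables in the usage lines are numeric stand-ins, not speech processing.

```python
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")  # a partition of the training utterances
T = TypeVar("T")  # a set of template vectors


def refine_templates(partition: P,
                     quantize: Callable[[P], T],
                     compare: Callable[[T], float],
                     repartition: Callable[[P, T], P],
                     threshold: float,
                     max_iters: int = 20) -> Tuple[T, float, int]:
    """Quantize, compare, and repartition until the result no longer exceeds threshold."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    for iteration in range(1, max_iters + 1):
        templates = quantize(partition)    # build template vectors
        result = compare(templates)        # e.g. a DTW-based score on test utterances
        if result <= threshold:            # threshold not exceeded: stop
            break
        # Threshold exceeded: match, then partition along the matching path.
        partition = repartition(partition, templates)
    return templates, result, iteration


# Toy stand-ins: the "partition" is a single number that is moved halfway
# toward a target of 10 on every pass.
templates, result, passes = refine_templates(
    partition=2.0,
    quantize=lambda p: p,
    compare=lambda t: abs(t - 10.0),
    repartition=lambda p, t: p + (10.0 - t) / 2,
    threshold=1.0,
)
```

The iteration cap is an added safeguard for the sketch. The claims state only the threshold condition as the stopping rule.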
Specification