System and method for improving robustness of speech recognition using vocal tract length normalization codebooks
First Claim
1. A method comprising:
estimating, based on a voice sample from a user, a vocal tract length;
generating, via a processor, a codebook comprising the vocal tract length;
receiving an utterance from the user;
adjusting the codebook based on environmental conditions, to yield a modified codebook;
normalizing the utterance using the modified codebook, to yield a normalized utterance; and
recognizing the utterance using the normalized utterance.
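Claim 1 recites the pipeline without fixing how each step is computed. The Python sketch below shows one possible reading under stated assumptions, not the patent's method. It estimates the vocal tract length (VTL) from measured formant frequencies with the textbook uniform-tube model, F_k = (2k - 1)c / (4L). It treats the codebook as a hypothetical dict holding that length and some speech vectors, and models the environmental adjustment as an assumed additive offset on those vectors. Normalization is a linear frequency warp by alpha = L / L_ref. The helper names, the 17 cm reference length, and the offset model are illustrative assumptions. Recognition itself is out of scope.

```python
import numpy as np

SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in the vocal tract (cm/s)
REFERENCE_VTL_CM = 17.0        # hypothetical reference speaker's tract length (cm)


def estimate_vtl(formants_hz):
    """Estimate vocal tract length (cm) from formant frequencies F1, F2, ...

    Assumes a uniform tube closed at the glottis and open at the lips, whose
    k-th resonance is F_k = (2k - 1) c / (4 L). Each formant yields one estimate
    of L, and the estimates are averaged.
    """
    estimates = [(2 * k - 1) * SPEED_OF_SOUND_CM_S / (4.0 * f)
                 for k, f in enumerate(formants_hz, start=1)]
    return float(np.mean(estimates))


def generate_codebook(vtl_cm, speech_vectors):
    """Hypothetical codebook layout: the estimated VTL plus speech vectors."""
    return {"vtl": vtl_cm, "vectors": np.asarray(speech_vectors, dtype=float)}


def adjust_codebook(codebook, noise_offset):
    """Assumed environmental adjustment: add an estimated noise/channel offset
    (for example a log-spectral mean) to every codebook vector.

    Returns a new codebook and leaves the input codebook unchanged.
    """
    return {"vtl": codebook["vtl"], "vectors": codebook["vectors"] + noise_offset}


def normalize_utterance(spectrum, sample_rate, codebook):
    """Linearly warp the frequency axis of one magnitude-spectrum frame.

    A longer tract lowers the formants, so frequencies are scaled up by
    alpha = L / L_ref to map the speaker onto the reference speaker.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    alpha = codebook["vtl"] / REFERENCE_VTL_CM
    freqs = np.linspace(0.0, sample_rate / 2.0, len(spectrum))
    # The output bin at frequency f takes the input value at f / alpha, so an
    # input peak at F appears at alpha * F. Values past Nyquist are clamped.
    return np.interp(freqs / alpha, freqs, spectrum)
```

In this sketch only the stored VTL drives the warp. The adjusted vectors would feed whatever downstream matcher performs recognition.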
5 Assignments
0 Petitions
Abstract
Disclosed are systems, methods, and computer-readable media for performing speech recognition. The method embodiment comprises (1) selecting, from a plurality of codebooks, a codebook with a minimal acoustic distance to a received speech sample, the plurality of codebooks being generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector; (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition; and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
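The codebook construction and selection steps in the abstract can be sketched in Python as follows. Several details are assumptions not taken from the patent. Speech vectors are treated as fixed-length feature arrays, such as cepstral frames, and the clustering is plain k-means. Each entry's optional weight is taken to be the fraction of the speaker's vectors assigned to it. "Acoustic distance" is read as the mean, over the sample's frames, of the squared Euclidean distance to the nearest codebook entry. The helper names are hypothetical. As a simplification, the weights are stored but not used in the distance, and applying the selected vocal tract length as a frequency warp is not shown.

```python
import numpy as np


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means on an (n, d) float array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centroids never modifies vectors.
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iterations):
        # Squared distance from every vector to every centroid: shape (n, k).
        d = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = vectors[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def build_codebook(vtl_cm, speech_vectors, k):
    """Cluster one speaker's vectors into k entries and attach the speaker's VTL."""
    vectors = np.asarray(speech_vectors, dtype=float)
    centroids, labels = kmeans(vectors, k)
    # Optional per-entry weight: fraction of the speaker's vectors in that cluster.
    weights = np.bincount(labels, minlength=k) / len(vectors)
    return {"vtl": vtl_cm, "vectors": centroids, "weights": weights}


def acoustic_distance(codebook, sample_vectors):
    """Mean over sample frames of the squared distance to the nearest entry."""
    x = np.asarray(sample_vectors, dtype=float)
    d = ((x[:, None, :] - codebook["vectors"][None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).mean())


def select_codebook(codebooks, sample_vectors):
    """Return the codebook with the minimal acoustic distance to the sample."""
    return min(codebooks, key=lambda cb: acoustic_distance(cb, sample_vectors))
```

The selected codebook's `"vtl"` value is then the length that step (2) would apply to normalize the received sample.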
14 Citations
20 Claims
1. A method comprising:
estimating, based on a voice sample from a user, a vocal tract length;
generating, via a processor, a codebook comprising the vocal tract length;
receiving an utterance from the user;
adjusting the codebook based on environmental conditions, to yield a modified codebook;
normalizing the utterance using the modified codebook, to yield a normalized utterance; and
recognizing the utterance using the normalized utterance.
8. A system comprising:
a processor; and
a computer-readable storage device having instructions stored which, when executed on the processor, perform operations comprising:
estimating, based on a voice sample from a user, a vocal tract length;
generating a codebook comprising the vocal tract length;
receiving an utterance from the user;
adjusting the codebook based on environmental conditions, to yield a modified codebook;
normalizing the utterance using the modified codebook, to yield a normalized utterance; and
recognizing the utterance using the normalized utterance.
15. A computer-readable storage device having instructions stored which, when executed on a computing device, cause the computing device to perform operations comprising:
estimating, based on a voice sample from a user, a vocal tract length;
generating a codebook comprising the vocal tract length;
receiving an utterance from the user;
adjusting the codebook based on environmental conditions, to yield a modified codebook;
normalizing the utterance using the modified codebook, to yield a normalized utterance; and
recognizing the utterance using the normalized utterance.
Specification