SYSTEM AND METHOD FOR IMPROVING ROBUSTNESS OF SPEECH RECOGNITION USING VOCAL TRACT LENGTH NORMALIZATION CODEBOOKS
First Claim
1. A method of performing speech recognition, the method comprising:
selecting a codebook, from a plurality of codebooks, indicating a vocal tract length based on an acoustic distance to a received speech, wherein the plurality of codebooks comprises codebooks for each of a plurality of speakers and is generated based on a respective vocal tract length for each of the plurality of speakers;
normalizing the received speech based on the vocal tract length to yield a normalized received speech; and
recognizing the normalized received speech.
Abstract
Disclosed are systems, methods, and computer-readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
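The steps summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `Codebook`, `build_codebook`, `select_codebook`, and `warp_spectrum` are hypothetical, k-means is assumed as the clustering method, the acoustic distance is assumed to be the mean Euclidean distance from each frame to its nearest codebook vector, and the vocal tract length is assumed to be stored as a linear frequency-warp factor.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Codebook:
    vtl: float              # speaker's vocal tract length (assumed: a warp factor)
    vectors: np.ndarray     # clustered speech vectors, shape (k, dim)
    weights: np.ndarray     # optional per-vector weights, shape (k,)


def build_codebook(vtl, speech_vectors, k, iters=20, seed=0):
    """Cluster one speaker's speech vectors (k-means, an assumption)."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(speech_vectors), size=k, replace=False)
    centroids = speech_vectors[picks].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(
            speech_vectors[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = speech_vectors[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Recompute assignments once more so the weights match the final centroids.
    dists = np.linalg.norm(
        speech_vectors[:, None, :] - centroids[None, :, :], axis=2
    )
    labels = dists.argmin(axis=1)
    weights = np.bincount(labels, minlength=k) / len(speech_vectors)
    return Codebook(vtl=vtl, vectors=centroids, weights=weights)


def acoustic_distance(codebook, frames):
    """Mean distance from each frame to its nearest codebook vector (assumed metric)."""
    dists = np.linalg.norm(
        frames[:, None, :] - codebook.vectors[None, :, :], axis=2
    )
    return dists.min(axis=1).mean()


def select_codebook(codebooks, frames):
    """Pick the codebook with the minimal acoustic distance to the received speech."""
    return min(codebooks, key=lambda cb: acoustic_distance(cb, frames))


def warp_spectrum(spectrum, alpha):
    """Linearly warp the frequency axis by alpha (one assumed form of normalization)."""
    bins = np.arange(len(spectrum))
    return np.interp(bins * alpha, bins, spectrum)
```

In this sketch, a recognizer would call `select_codebook` on the feature frames of the received speech, read `vtl` from the result, normalize each spectrum with `warp_spectrum`, and pass the normalized features to the recognition step. No per-utterance search over warp factors is needed, because the factor comes directly from the nearest speaker codebook.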
20 Claims
1. A method of performing speech recognition, the method comprising:
selecting a codebook, from a plurality of codebooks, indicating a vocal tract length based on an acoustic distance to a received speech, wherein the plurality of codebooks comprises codebooks for each of a plurality of speakers and is generated based on a respective vocal tract length for each of the plurality of speakers;
normalizing the received speech based on the vocal tract length to yield a normalized received speech; and
recognizing the normalized received speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 20.
8. A system for performing speech recognition, the system comprising:
a processor;
a first module configured to control the processor to select a codebook, from a plurality of codebooks, indicating a vocal tract length based on an acoustic distance to a received speech, wherein the plurality of codebooks comprises codebooks for each of a plurality of speakers and is generated based on a respective vocal tract length for each of the plurality of speakers;
a second module configured to control the processor to normalize the received speech based on the vocal tract length to yield a normalized received speech; and
a third module configured to control the processor to recognize the normalized received speech.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device, cause the computing device to perform speech recognition, the instructions comprising:
selecting a codebook, from a plurality of codebooks, indicating a vocal tract length based on an acoustic distance to a received speech, wherein the plurality of codebooks comprises codebooks for each of a plurality of speakers and is generated based on a respective vocal tract length for each of the plurality of speakers;
normalizing the received speech based on the vocal tract length to yield a normalized received speech; and
recognizing the normalized received speech.
Dependent claims: 16, 17, 18, 19.
Specification