SYSTEM AND METHOD FOR IMPROVING ROBUSTNESS OF SPEECH RECOGNITION USING VOCAL TRACT LENGTH NORMALIZATION CODEBOOKS

US 20100324893A1
Filed: 08/26/2010
Published: 12/23/2010
Est. Priority Date: 06/20/2007
Status: Active Grant

Abstract

Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
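
The codebook generation steps (a) to (c) and the minimal-distance selection described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the `Codebook` class, the plain k-means clustering, the Euclidean nearest-vector distance, and the representation of vocal tract length as a ratio to a reference length are all assumptions made for illustration, and the speech vectors are synthetic.

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Codebook:
    # Hypothetical container; the abstract only says a codebook holds these entries.
    warp_factor: float                     # speaker's vocal tract length as a ratio (assumed)
    vectors: np.ndarray                    # (k, dim) clustered speech vectors
    weights: Optional[np.ndarray] = None   # optional (k,) weight per speech vector


def cluster(frames: np.ndarray, k: int, iters: int = 10,
            seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Plain k-means (an assumed clustering choice).

    Returns the centroids and the fraction of frames assigned to each one.
    """
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members):               # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    return centroids, counts / counts.sum()


def build_codebook(frames: np.ndarray, warp_factor: float, k: int = 4) -> Codebook:
    """Steps (b) and (c): cluster one speaker's vectors and store them with the VTL."""
    centroids, weights = cluster(frames, k)
    return Codebook(warp_factor, centroids, weights)


def acoustic_distance(frames: np.ndarray, cb: Codebook) -> float:
    """Mean Euclidean distance from each frame to its nearest codebook vector."""
    dists = np.linalg.norm(frames[:, None, :] - cb.vectors[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


def select_codebook(frames: np.ndarray, codebooks: list) -> Codebook:
    """Step (1): pick the codebook with minimal acoustic distance to the input."""
    return min(codebooks, key=lambda cb: acoustic_distance(frames, cb))


# Synthetic speech vectors for two speakers and one received sample.
rng = np.random.default_rng(1)
speaker_a = rng.normal(loc=2.0, scale=0.1, size=(200, 3))
speaker_b = rng.normal(loc=-2.0, scale=0.1, size=(200, 3))
codebooks = [build_codebook(speaker_a, 0.9), build_codebook(speaker_b, 1.1)]

received = rng.normal(loc=-2.0, scale=0.1, size=(20, 3))
chosen = select_codebook(received, codebooks)
print(chosen.warp_factor)  # the received sample sits near speaker_b, so this prints 1.1
```

The vocal tract length stored in the selected codebook is what steps (2) and (3) would then apply to normalize the received sample before recognition.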

20 Claims

1. A method of performing speech recognition, the method comprising:
- selecting a codebook, from a plurality of codebooks, indicating a vocal tract length based on an acoustic distance to a received speech, wherein the plurality of codebooks comprises codebooks for each of a plurality of speakers and is generated based on a respective vocal tract length for each of the plurality of speakers;
  
  normalizing the received speech based on the vocal tract length to yield a normalized received speech; and
  
  recognizing the normalized received speech.
  - 2. The method of claim 1, further comprising:
    - identifying an additional speech vector in the received speech; and
      
      selecting an additional codebook indicating an additional vocal tract length from the plurality of codebooks based on the additional speech vector.
  - 3. The method of claim 2, further comprising:
    - normalizing the received speech based on the vocal tract length and on the additional vocal tract length to yield additional normalized received speech; and
      
      recognizing the additional normalized received speech.
  - 4. The method of claim 1, wherein the method is performed frame by frame.
  - 5. The method of claim 1, wherein recognizing the normalized received speech occurs in real time.
  - 6. The method of claim 1, wherein the plurality of codebooks covers vocal tract lengths from approximately 0.8 to 1.2 times an ideal vocal tract length.
  - 7. The method of claim 1, wherein selecting the codebook from the plurality of codebooks is based on a likelihood calculation.
  - 20. The non-transitory computer-readable storage medium of claim 1, wherein selecting the codebook from the plurality of codebooks is based on a likelihood calculation.
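
Claims 4, 6, and 7 together suggest a per-frame loop: score each codebook with a likelihood calculation, take the vocal tract length of the best one, and normalize that frame. The sketch below is one possible reading under stated assumptions. The claims do not specify a likelihood model or a warping function, so the isotropic Gaussian mixture in `log_likelihood`, the linear frequency warping in `warp_spectrum`, the direction of the warp, and the `(factor, vectors, weights)` tuple layout are all hypothetical. The factors 0.8, 1.0, and 1.2 follow the approximate range in claim 6.

```python
import numpy as np


def log_likelihood(frame, vectors, weights, sigma=1.0):
    """Log-likelihood of one frame under an isotropic Gaussian mixture
    centred on the codebook vectors (an assumed model)."""
    sq = np.sum((vectors - frame) ** 2, axis=1)
    dim = frame.shape[0]
    log_comp = (np.log(weights)
                - 0.5 * sq / sigma ** 2
                - 0.5 * dim * np.log(2 * np.pi * sigma ** 2))
    m = log_comp.max()                       # log-sum-exp for numerical stability
    return float(m + np.log(np.exp(log_comp - m).sum()))


def warp_spectrum(spectrum, alpha):
    """Linear frequency warping (assumed): output bin i reads input position i * alpha.
    Positions past the last bin are clamped to the last value by np.interp."""
    bins = np.arange(len(spectrum), dtype=float)
    return np.interp(bins * alpha, bins, spectrum)


def normalize_frames(spectra, features, codebooks):
    """Frame by frame: choose the most likely codebook, then warp with its factor."""
    normalized, factors = [], []
    for spec, feat in zip(spectra, features):
        scores = [log_likelihood(feat, vecs, wts) for (_, vecs, wts) in codebooks]
        alpha = codebooks[int(np.argmax(scores))][0]
        factors.append(alpha)
        normalized.append(warp_spectrum(spec, alpha))
    return np.array(normalized), factors


# Hypothetical codebooks: (warp factor, speech vectors, vector weights).
codebooks = [
    (0.8, np.array([[0.0, 0.0], [1.0, 0.0]]), np.array([0.5, 0.5])),
    (1.0, np.array([[5.0, 5.0], [6.0, 5.0]]), np.array([0.5, 0.5])),
    (1.2, np.array([[-5.0, -5.0], [-6.0, -5.0]]), np.array([0.5, 0.5])),
]

# Two frames: a feature vector for codebook selection and a spectrum to warp.
features = np.array([[5.2, 5.1], [-5.5, -4.9]])
spectra = np.array([[0.0, 1.0, 2.0, 3.0, 4.0],
                    [0.0, 1.0, 2.0, 3.0, 4.0]])

normalized, factors = normalize_frames(spectra, features, codebooks)
print(factors)         # [1.0, 1.2]
print(normalized[1])   # [0.  1.2 2.4 3.6 4. ]
```

Because each frame selects its own codebook, the warp factor can change within an utterance, which is consistent with claims 2 and 3 selecting an additional codebook for an additional speech vector. A recognizer would consume the normalized frames in place of the raw ones.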

8. A system for performing speech recognition, the system comprising:
- a processor;
  
  a first module configured to control the processor to select a codebook, from a plurality of codebooks, indicating a vocal tract length based on an acoustic distance to a received speech, wherein the plurality of codebooks comprises codebooks for each of a plurality of speakers and is generated based on a respective vocal tract length for each of the plurality of speakers;
  
  a second module configured to control the processor to normalize the received speech based on the vocal tract length to yield a normalized received speech; and
  
  a third module configured to control the processor to recognize the normalized received speech.
  - 9. The system of claim 8, further comprising:
    - a fourth module configured to control the processor to identify an additional speech vector in the received speech; and
      
      a fifth module configured to control the processor to select an additional codebook indicating an additional vocal tract length from the plurality of codebooks based on the additional speech vector.
  - 10. The system of claim 9, further comprising:
    - a sixth module configured to control the processor to normalize the received speech based on the vocal tract length and on the additional vocal tract length to yield additional normalized received speech; and
      
      a seventh module configured to control the processor to recognize the additional normalized received speech.
  - 11. The system of claim 8, wherein the system operates frame by frame.
  - 12. The system of claim 8, wherein the third module is further configured to control the processor to recognize the normalized received speech in real time.
  - 13. The system of claim 8, wherein the plurality of codebooks covers vocal tract lengths from approximately 0.8 to 1.2 times an ideal vocal tract length.
  - 14. The system of claim 8, wherein the first module is further configured to control the processor to select the codebook from the plurality of codebooks based on a likelihood calculation.

15. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device, cause the computing device to perform speech recognition, the instructions comprising:
- selecting a codebook, from a plurality of codebooks, indicating a vocal tract length based on an acoustic distance to a received speech, wherein the plurality of codebooks comprises codebooks for each of a plurality of speakers and is generated based on a respective vocal tract length for each of the plurality of speakers;
  
  normalizing the received speech based on the vocal tract length to yield a normalized received speech; and
  
  recognizing the normalized received speech.
  - 16. The non-transitory computer-readable storage medium of claim 15, further comprising:
    - identifying an additional speech vector in the received speech;
      
      selecting an additional codebook indicating an additional vocal tract length from the plurality of codebooks based on the additional speech vector;
  - 17. The non-transitory computer-readable storage medium of claim 16, further comprising:
    - normalizing the received speech based on the vocal tract length and on the additional vocal tract length to yield additional normalized received speech;
      
      recognizing the additional normalized received speech.
  - 18. The non-transitory computer-readable storage medium of claim 15, wherein recognizing the normalized speech is performed frame by frame.
  - 19. The non-transitory computer-readable storage medium of claim 15, wherein recognizing the normalized received speech occurs in real time.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: GILBERT, Mazin
Granted Patent: US 8,160,875 B2
US Class Current: 704/234
CPC Class Codes: G10L 15/07 to the speaker
