SYSTEM AND METHOD FOR IMPROVING ROBUSTNESS OF SPEECH RECOGNITION USING VOCAL TRACT LENGTH NORMALIZATION CODEBOOKS

US 20120203547A1
Filed: 04/13/2012
Published: 08/09/2012
Est. Priority Date: 06/20/2007
Status: Active Grant

First Claim

1. A method comprising:

estimating, based on a voice sample from a user, a vocal tract length;

generating, via a processor, a codebook specific to the user comprising the vocal tract length;

receiving an utterance from the user;

normalizing the utterance using the codebook, to yield a normalized utterance; and

recognizing the utterance using the normalized utterance.
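The claim does not say how the vocal tract length is estimated from the voice sample. One common textbook approach, assumed here purely for illustration, models the vocal tract as a uniform tube that is closed at the glottis and open at the lips. Its n-th resonance is F_n = (2n - 1) c / (4L), so each measured formant gives its own estimate of L. The sketch below assumes the formant frequencies have already been extracted. The function name `estimate_vtl_cm` and the speed-of-sound constant are hypothetical choices, not taken from the patent.

```python
from statistics import mean

# Assumed speed of sound inside the vocal tract, in cm/s (hypothetical constant).
SPEED_OF_SOUND_CM_S = 35000.0


def estimate_vtl_cm(formants_hz):
    """Estimate vocal tract length (cm) from measured formants F1, F2, ...

    Assumes a uniform tube closed at the glottis and open at the lips, whose
    n-th resonance is F_n = (2n - 1) * c / (4 * L). Each formant yields one
    estimate of L, and the estimates are averaged.
    """
    if not formants_hz:
        raise ValueError("need at least one formant")
    estimates = []
    for n, f in enumerate(formants_hz, start=1):
        if f <= 0:
            raise ValueError("formant frequencies must be positive")
        estimates.append((2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * f))
    return mean(estimates)


# Idealized neutral-vowel formants for a 17.5 cm tract.
print(estimate_vtl_cm([500.0, 1500.0, 2500.0]))  # 17.5
```

Averaging the per-formant estimates is only one option. A real system might weight the lower formants more heavily, because they are usually measured more reliably.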

5 Assignments

0 Petitions

Abstract

Disclosed are systems, methods, and computer-readable media for performing speech recognition. The method embodiment comprises (1) selecting, from a plurality of codebooks, the codebook with a minimal acoustic distance to a received speech sample, the plurality of codebooks being generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector; (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition; and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
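The codebook construction and selection steps in the abstract can be sketched as follows. The abstract does not name a clustering algorithm, a distance measure, or a weighting scheme, so this sketch makes three assumptions. Clustering uses plain k-means. The "acoustic distance" is the average squared Euclidean distance from each frame to its nearest codebook vector. The optional weight is the share of training frames assigned to each centroid. All names (`Codebook`, `build_codebook`, `select_codebook`, and so on) are hypothetical, and the 2-D "speech vectors" are synthetic stand-ins for real acoustic features.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


@dataclass
class Codebook:
    speaker_id: str
    vtl_cm: float          # the speaker's vocal tract length from step (a)
    vectors: List[Vector]  # step (b): centroids of the speaker's speech vectors
    weights: List[float]   # optional weight: share of training frames per centroid


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(frame: Sequence[float], centroids: List[Vector]) -> int:
    return min(range(len(centroids)), key=lambda j: _sq_dist(frame, centroids[j]))


def kmeans(frames: List[Vector], k: int, iters: int = 20, seed: int = 0):
    """Plain k-means; returns (centroids, cluster index of each frame)."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        assign = [_nearest(f, centroids) for f in frames]
        for j in range(k):
            members = [f for f, a in zip(frames, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    assign = [_nearest(f, centroids) for f in frames]
    return centroids, assign


def build_codebook(speaker_id: str, vtl_cm: float, frames: List[Vector], k: int = 4) -> Codebook:
    """Step (c): one codebook per speaker holding VTL, centroids and weights."""
    centroids, assign = kmeans(frames, k)
    weights = [assign.count(j) / len(frames) for j in range(k)]
    return Codebook(speaker_id, vtl_cm, centroids, weights)


def acoustic_distance(frames: List[Vector], cb: Codebook) -> float:
    """Average squared distance from each frame to its nearest codebook vector."""
    return sum(min(_sq_dist(f, v) for v in cb.vectors) for f in frames) / len(frames)


def select_codebook(frames: List[Vector], codebooks: List[Codebook]) -> Codebook:
    """Step (1): pick the codebook with minimal acoustic distance to the sample."""
    return min(codebooks, key=lambda cb: acoustic_distance(frames, cb))


# Toy 2-D "speech vectors" for two enrolled speakers (synthetic data).
cb_a = build_codebook("A", 15.0, [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]], k=2)
cb_b = build_codebook("B", 17.5, [[5.0, 5.0], [5.1, 4.9], [6.0, 6.0], [6.1, 5.9]], k=2)

best = select_codebook([[5.05, 5.0], [6.0, 6.05]], [cb_a, cb_b])
print(best.speaker_id, best.vtl_cm)  # B 17.5
```

In this sketch the weights are stored but not used in the distance, which matches their "optional" status in the abstract. The vocal tract length of the selected codebook (17.5 here) is the value that step (2) would apply to normalize the incoming sample.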

0 Citations

20 Claims

1. A method comprising:

estimating, based on a voice sample from a user, a vocal tract length;

generating, via a processor, a codebook specific to the user comprising the vocal tract length;

receiving an utterance from the user;

normalizing the utterance using the codebook, to yield a normalized utterance; and

recognizing the utterance using the normalized utterance.

2. The method of claim 1, wherein the codebook further comprises a speech vector.

3. The method of claim 2, wherein the speech vector further comprises a weight.

4. The method of claim 1, wherein normalizing the utterance using the codebook comprises modifying the utterance based on a difference in the vocal tract length and a detected vocal tract length of the utterance.

5. The method of claim 1, further comprising modifying the codebook upon receiving additional voice samples.

6. The method of claim 1, wherein recognizing the utterance using the normalized utterance occurs in real time.

7. The method of claim 1, further comprising inserting the codebook into a plurality of codebooks.
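Claim 4 says that normalization modifies the utterance based on how the codebook's vocal tract length differs from the length detected in the utterance, but it does not specify the transform. The sketch below assumes linear frequency warping, a widely used form of vocal tract length normalization. Under a uniform-tube model, formant frequencies scale as 1 / VTL. The "difference" is therefore expressed here as the ratio alpha = detected VTL / codebook VTL, and the frequency axis is rescaled by alpha so the speaker's formants land where the codebook speaker's would. This operates on a single magnitude-spectrum frame with linear interpolation between bins. A practical recognizer would more likely warp its filterbank. `warp_factor` and `warp_spectrum` are hypothetical names.

```python
import math
from typing import List


def warp_factor(detected_vtl_cm: float, codebook_vtl_cm: float) -> float:
    """Frequency-scaling factor that maps the speaker onto the codebook's VTL.

    Assumes formants scale as 1 / VTL (uniform-tube model). A speaker whose
    detected VTL is longer than the codebook's has proportionally lower
    formants, so the frequency axis is stretched by this ratio.
    """
    if detected_vtl_cm <= 0 or codebook_vtl_cm <= 0:
        raise ValueError("vocal tract lengths must be positive")
    return detected_vtl_cm / codebook_vtl_cm


def warp_spectrum(spectrum: List[float], alpha: float) -> List[float]:
    """Linear frequency warping: out[k] = S(k / alpha), linearly interpolated.

    Output bins whose source position lies past the last input bin are 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(spectrum)
    out = []
    for k in range(n):
        src = k / alpha
        lo = math.floor(src)
        if lo >= n - 1:
            out.append(spectrum[-1] if src == n - 1 else 0.0)
            continue
        frac = src - lo
        out.append((1.0 - frac) * spectrum[lo] + frac * spectrum[lo + 1])
    return out


# Synthetic magnitude spectrum with one formant-like peak at bin 8.
spectrum = [max(0.0, 1.0 - abs(k - 8) / 3) for k in range(32)]

# Speaker detected at 17.5 cm; selected codebook built for a 14.0 cm tract.
alpha = warp_factor(17.5, 14.0)
warped = warp_spectrum(spectrum, alpha)
peak_bin = max(range(len(warped)), key=warped.__getitem__)
print(alpha, peak_bin)  # 1.25 10
```

The longer-tract speaker's peak moves up from bin 8 to bin 10 (8 x 1.25), toward where the shorter-tract codebook speaker's formant would sit. A warped utterance could then be scored against models trained on the codebook speaker's acoustics.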

8. A system comprising:

a processor; and

a non-transitory computer-readable storage medium storing instructions which, when executed on the processor, perform a method comprising:

estimating, based on a voice sample from a user, a vocal tract length;

generating a codebook specific to the user comprising the vocal tract length;

receiving an utterance from the user;

normalizing the utterance using the codebook, to yield a normalized utterance; and

recognizing the utterance using the normalized utterance.

9. The system of claim 8, wherein the codebook further comprises a speech vector.

10. The system of claim 9, wherein the speech vector further comprises a weight.

11. The system of claim 8, wherein normalizing the utterance using the codebook comprises modifying the utterance based on a difference in the vocal tract length and a detected vocal tract length of the utterance.

12. The system of claim 8, the non-transitory computer-readable storage medium storing additional instructions which, when executed on the processor, perform a method comprising modifying the codebook upon receiving additional voice samples.

13. The system of claim 8, wherein recognizing the utterance using the normalized utterance occurs in real time.

14. The system of claim 8, the non-transitory computer-readable storage medium storing additional instructions which, when executed on the processor, perform a method comprising inserting the codebook into a plurality of codebooks.

15. A non-transitory computer-readable storage medium storing instructions which, when executed on a computing device, cause the computing device to perform a method comprising:

estimating, based on a voice sample from a user, a vocal tract length;

generating a codebook specific to the user comprising the vocal tract length;

receiving an utterance from the user;

normalizing the utterance using the codebook, to yield a normalized utterance; and

recognizing the utterance using the normalized utterance.

16. The non-transitory computer-readable storage medium of claim 15, wherein the codebook further comprises a speech vector.

17. The non-transitory computer-readable storage medium of claim 16, wherein the speech vector further comprises a weight.

18. The non-transitory computer-readable storage medium of claim 15, wherein normalizing the utterance using the codebook comprises modifying the utterance based on a difference in the vocal tract length and a detected vocal tract length of the utterance.

19. The non-transitory computer-readable storage medium of claim 15 storing additional instructions which, when executed on the computing device, perform a method comprising modifying the codebook upon receiving additional voice samples.

20. The non-transitory computer-readable storage medium of claim 15, wherein recognizing the utterance using the normalized utterance occurs in real time.

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors
Gilbert, Mazin

Granted Patent

US 8,600,744 B2
US Class Current

704/222
CPC Class Codes

G10L 15/07 (Adaptation to the speaker)
