SYSTEM AND METHOD FOR ADAPTING AUTOMATIC SPEECH RECOGNITION PRONUNCIATION BY ACOUSTIC MODEL RESTRUCTURING

US 20100312560A1
Filed: 06/09/2009
Published: 12/09/2010
Est. Priority Date: 06/09/2009
Status: Active Grant

Abstract

Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation through acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker, resulting in collected speech, and transcribes the collected speech to generate a lattice of plausible phonemes. The method then creates a custom speech model that represents each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method recognizes, via a processor, additional speech from the new speaker using the custom speech model.
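The weighted-sum idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the one-dimensional Gaussians, the phoneme labels `iy` and `ih`, the weight values, and the names `NATIVE_MODELS`, `CUSTOM_WEIGHTS`, and `custom_likelihood` are all assumptions chosen for illustration. The pronouncing dictionary is untouched; only the acoustic score of each dictionary phoneme changes.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; a stand-in for a full acoustic model."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Hypothetical native-speech acoustic models, one per phoneme: (mean, variance).
NATIVE_MODELS = {"iy": (2.0, 1.0), "ih": (0.0, 1.0)}

# Hypothetical weights: dictionary phoneme -> {plausible phoneme: weight}.
# Here the speaker is assumed to realize dictionary "iy" partly like native "ih".
CUSTOM_WEIGHTS = {
    "iy": {"iy": 0.6, "ih": 0.4},
    "ih": {"ih": 1.0},
}

def custom_likelihood(dict_phoneme, x):
    """Score feature x for a dictionary phoneme as a weighted sum of native models."""
    return sum(
        weight * gaussian_pdf(x, *NATIVE_MODELS[q])
        for q, weight in CUSTOM_WEIGHTS[dict_phoneme].items()
    )
```

In this sketch a recognizer would call `custom_likelihood("iy", x)` wherever it previously scored the native `iy` model alone, so dictionary entries that contain `iy` need no new pronunciation variants.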

20 Claims

1. A computer-implemented method of recognizing speech, the method comprising:
  identifying an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect;

  collecting speech from a new speaker resulting in collected speech;

  transcribing the collected speech to generate a lattice of plausible phonemes which depend on the properties of the target dialect;

  creating a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech; and

  recognizing via a processor additional speech from the new speaker using the custom speech model.

  - 2. The computer-implemented method of claim 1, wherein the custom speech model is a Gaussian mixture model.
  - 3. The computer-implemented method of claim 1, wherein the typical native speech represents a class of speakers.
  - 4. The computer-implemented method of claim 1, wherein the target dialect is at least one of a regional dialect and a foreign accent.
  - 5. The computer-implemented method of claim 1, wherein creating the custom speech model is based on optimizing an objective function.
  - 6. The computer-implemented method of claim 1, wherein the transcribing is further based on reference transcriptions or recognition output.
  - 7. The computer-implemented method of claim 1, wherein creating the custom speech model is performed iteratively.
  - 8. The computer-implemented method of claim 1, wherein a weighted average of phonemes is used instead of the weighted sum of phonemes.
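Claims 5, 7, and 8 mention optimizing an objective function, iterating, and using a weighted average. The claims do not name the objective, so the sketch below assumes maximum likelihood and uses expectation-maximization (EM) re-estimation of mixture weights over fixed component models, a standard technique swapped in here for illustration. The function name `estimate_weights`, the one-dimensional features, and the example models are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; a stand-in for a full acoustic model."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def estimate_weights(frames, models, iterations=20):
    """EM re-estimation of mixture weights; the component models stay fixed.

    frames: 1-D features assumed to be aligned to one dictionary phoneme.
    models: {phoneme: (mean, variance)} native acoustic models (hypothetical).
    """
    if not frames:
        raise ValueError("need at least one frame")
    weights = {q: 1.0 / len(models) for q in models}  # uniform starting point
    for _ in range(iterations):
        counts = {q: 0.0 for q in models}
        for x in frames:
            scores = {q: weights[q] * gaussian_pdf(x, *models[q]) for q in models}
            total = sum(scores.values())
            for q in models:
                counts[q] += scores[q] / total  # posterior share of this frame
        # New weights are average posteriors, so they always sum to one.
        weights = {q: counts[q] / len(frames) for q in models}
    return weights
```

Because the re-estimated weights sum to one, the resulting combination is a weighted average in the sense of claim 8. A production system would work in the log domain to avoid underflow when a frame lies far from every component.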

9. A system for recognizing speech, the system comprising:
  a processor;

  a module configured to control the processor to identify an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect;

  a module configured to control the processor to collect speech from a new speaker resulting in collected speech;

  a module configured to control the processor to transcribe the collected speech to generate a lattice of plausible phonemes which depend on the properties of the target dialect;

  a module configured to control the processor to create a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech; and

  a module configured to control the processor to recognize additional speech from the new speaker using the custom speech model.

  - 10. The system of claim 9, wherein the custom speech model is a Gaussian mixture model.
  - 11. The system of claim 9, wherein the typical native speech represents a class of speakers.
  - 12. The system of claim 9, wherein the target dialect is at least one of a regional dialect and a foreign accent.
  - 13. The system of claim 9, wherein the module configured to control the processor to create the custom speech model is based on optimizing an objective function.
  - 14. The system of claim 9, wherein the module configured to control the processor to transcribe further operates based on reference transcriptions or recognition output.
  - 15. The system of claim 9, wherein the module configured to control the processor to create a custom speech model operates iteratively.
  - 16. The system of claim 9, wherein a weighted average of phonemes is used instead of the weighted sum of phonemes.
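As one possible way to connect the transcribing module to the model-creating module, the sketch below turns lattice evidence into per-phoneme weights. It rests on several assumptions not stated in the claims: the lattice has been flattened into aligned segments, each segment carries posterior probabilities for its plausible phonemes, and weights are obtained by accumulating and normalizing those posteriors. The name `weights_from_lattice` and the data layout are hypothetical.

```python
from collections import defaultdict

def weights_from_lattice(segments):
    """Accumulate lattice posteriors into normalized per-phoneme weights.

    segments: iterable of (dictionary_phoneme, {plausible_phoneme: posterior}),
    a simplified, flattened stand-in for a real phoneme lattice.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for dict_phoneme, alternatives in segments:
        for q, posterior in alternatives.items():
            totals[dict_phoneme][q] += posterior
    weights = {}
    for p, acc in totals.items():
        norm = sum(acc.values())
        weights[p] = {q: value / norm for q, value in acc.items()}
    return weights

# Hypothetical segments for a speaker who often realizes "iy" close to "ih".
segments = [
    ("iy", {"iy": 0.7, "ih": 0.3}),
    ("iy", {"iy": 0.5, "ih": 0.5}),
    ("ih", {"ih": 1.0}),
]
weights = weights_from_lattice(segments)
```

Only phonemes that actually appear in the lattice receive a weight, which keeps each weighted sum restricted to the plausible phonemes.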

17. A tangible computer-readable medium storing a computer program having instructions for recognizing speech, the instructions comprising:
  identifying an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect;

  collecting speech from a new speaker resulting in collected speech;

  transcribing the collected speech to generate a lattice of plausible phonemes which depend on the properties of the target dialect;

  creating a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech; and

  recognizing via a processor additional speech from the new speaker using the custom speech model.

  - 18. The tangible computer-readable medium of claim 17, wherein the custom speech model is a Gaussian mixture model.
  - 19. The tangible computer-readable medium of claim 17, wherein the typical native speech represents a class of speakers.
  - 20. The tangible computer-readable medium of claim 17, wherein creating the custom speech model is based on optimizing an objective function.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: CONKIE, Alistair D.; LJOLJE, Andrej; SYRDAL, Ann K.

Granted Patent: US 8,548,807 B2
US Class Current: 704/254
CPC Class Codes:
G10L 15/063   Training
G10L 15/07   to the speaker
G10L 15/14   using statistical models, e...
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/26   Speech to text systems G10L...
G10L 15/30   Distributed recognition, e....
G10L 17/14   Use of phonemic categorisat...
G10L 2015/025   Phonemes, fenemes or fenone...
