System and Method for Adapting Automatic Speech Recognition Pronunciation by Acoustic Model Restructuring

US 20140032214A1
Filed: 10/01/2013
Published: 01/30/2014
Est. Priority Date: 06/09/2009
Status: Active Grant

Abstract

Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
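The weighted-sum idea in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each native phoneme is modeled by a single diagonal-covariance Gaussian; the names `NATIVE`, `WEIGHTS`, `gaussian_pdf`, and `custom_likelihood` and all numeric values are hypothetical and do not come from the patent.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    norm = np.prod(2.0 * np.pi * var) ** -0.5
    return float(norm * np.exp(-0.5 * np.sum((x - mean) ** 2 / var)))

# Hypothetical native acoustic models: phoneme -> (mean, variance). Toy values.
NATIVE = {
    "iy": (np.array([0.0]), np.array([1.0])),
    "ih": (np.array([2.0]), np.array([1.0])),
}

# Hypothetical speaker-specific weights:
# dictionary phoneme -> {native phoneme: weight}.
WEIGHTS = {
    "iy": {"iy": 0.6, "ih": 0.4},  # this speaker often realizes /iy/ like /ih/
    "ih": {"ih": 1.0},
}

def custom_likelihood(dict_phoneme, x):
    """Score x under the custom model: a weighted sum of native phoneme models."""
    return sum(
        w * gaussian_pdf(x, *NATIVE[native_ph])
        for native_ph, w in WEIGHTS[dict_phoneme].items()
    )
```

The pronouncing dictionary still refers to "iy" and "ih"; only the acoustic score behind each symbol changes, which matches the abstract's statement that the dictionary itself is not modified.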

20 Claims

1. A method comprising:
- identifying an acoustic model, wherein the acoustic model is trained on native speech in a target dialect;
- transcribing collected speech from a speaker, to yield a lattice of plausible phonemes which depend on a property of the target dialect; and
- replacing a phoneme in the acoustic model with a modified phoneme, wherein the modified phoneme is chosen based on the lattice of plausible phonemes.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1, wherein upon replacing each phoneme in the acoustic model, the acoustic model is a Gaussian mixture model.
  - 3. The method of claim 1, wherein the native speech represents a class of speakers.
  - 4. The method of claim 1, wherein the target dialect comprises one of a regional dialect and a foreign accent.
  - 5. The method of claim 1, wherein the replacing of each phoneme in the acoustic model is based on evaluating an objective function.
  - 6. The method of claim 1, wherein the transcribing of the collected speech is based on one of reference transcriptions and recognition output.
  - 7. The method of claim 1, wherein the replacing of each phoneme in the acoustic model is performed iteratively.
  - 8. The method of claim 1, wherein a weighted average of the plausible phonemes is used instead of the weighted sum of the plausible phonemes.
  - 9. The method of claim 1, wherein the modified phoneme further comprises a weighted sum of plausible phonemes in the lattice of plausible phonemes.
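One way to picture how a modified phoneme could be "chosen based on the lattice" is to accumulate, for each dictionary phoneme, the posterior mass of the lattice phonemes aligned to it, then normalize. This is only a sketch under assumptions: the claims do not specify the estimator, and the input format (triples of dictionary phoneme, lattice phoneme, posterior) and the name `estimate_weights` are hypothetical. Normalizing so the weights sum to one corresponds to the weighted-average variant of claim 8.

```python
from collections import defaultdict

def estimate_weights(aligned_arcs):
    """Turn lattice evidence into per-phoneme mixing weights.

    aligned_arcs: iterable of (dictionary_phoneme, lattice_phoneme, posterior),
    a hypothetical format for lattice arcs aligned to the expected phoneme.
    Returns {dictionary_phoneme: {lattice_phoneme: weight}}, with the weights
    for each dictionary phoneme summing to one.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for dict_ph, lattice_ph, posterior in aligned_arcs:
        totals[dict_ph][lattice_ph] += posterior

    weights = {}
    for dict_ph, counts in totals.items():
        mass = sum(counts.values())
        if mass > 0.0:  # skip phonemes with no usable evidence
            weights[dict_ph] = {ph: c / mass for ph, c in counts.items()}
    return weights

# Toy example: the speaker's /iy/ is sometimes recognized as /ih/.
arcs = [
    ("iy", "iy", 0.75), ("iy", "ih", 0.25),
    ("iy", "iy", 0.5), ("iy", "ih", 0.5),
]
weights = estimate_weights(arcs)
```

In this toy input the accumulated mass is 1.25 for "iy" and 0.75 for "ih", so the resulting weights are 0.625 and 0.375.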

10. A system comprising:
- a processor; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  - identifying an acoustic model, wherein the acoustic model is trained on native speech in a target dialect;
  - transcribing collected speech from a speaker, to yield a lattice of plausible phonemes which depend on a property of the target dialect; and
  - replacing each phoneme in the acoustic model with a modified phoneme, wherein the modified phoneme is a weighted sum of plausible phonemes in the lattice of plausible phonemes.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17)
  - 11. The system of claim 10, wherein upon replacing each phoneme in the acoustic model, the acoustic model is a Gaussian mixture model.
  - 12. The system of claim 10, wherein the native speech represents a class of speakers.
  - 13. The system of claim 10, wherein the target dialect comprises one of a regional dialect and a foreign accent.
  - 14. The system of claim 10, wherein the replacing of each phoneme in the acoustic model is based on evaluating an objective function.
  - 15. The system of claim 10, wherein the transcribing of the collected speech is based on one of reference transcriptions and recognition output.
  - 16. The system of claim 10, wherein the replacing of each phoneme in the acoustic model is performed iteratively.
  - 17. The system of claim 10, wherein a weighted average of the plausible phonemes is used instead of the weighted sum of the plausible phonemes.
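Claims 14 and 16 mention an objective function and iterative replacement without naming either in this excerpt. As a hedged illustration only, the sketch below assumes a maximum-likelihood objective and uses the standard EM update for mixture weights over fixed component densities. The function names and the input matrix (per-frame likelihoods under each native phoneme model) are hypothetical.

```python
import numpy as np

def log_likelihood(weights, frame_likelihoods):
    """Assumed objective: total log-likelihood of the frames under the mixture."""
    L = np.asarray(frame_likelihoods, dtype=float)
    return float(np.sum(np.log(L @ np.asarray(weights, dtype=float))))

def em_reweight(frame_likelihoods, n_iter=20):
    """Iteratively re-estimate mixing weights with the native models held fixed.

    frame_likelihoods[t, q] is p(x_t | native phoneme q) for the frames
    aligned to one dictionary phoneme. Returns weights that sum to one.
    """
    L = np.asarray(frame_likelihoods, dtype=float)
    w = np.full(L.shape[1], 1.0 / L.shape[1])  # start from uniform weights
    for _ in range(n_iter):
        resp = L * w                            # unnormalized responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)                   # EM update for mixture weights
    return w

# Toy likelihoods for three frames under two native phoneme models.
frames = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.7]])
w = em_reweight(frames)
```

Each EM pass cannot decrease the assumed objective, so the returned weights score at least as well as the uniform starting point on the same frames.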

18. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- identifying an acoustic model, wherein the acoustic model is trained on native speech in a target dialect;
- transcribing collected speech from a speaker, to yield a lattice of plausible phonemes which depend on a property of the target dialect; and
- replacing each phoneme in the acoustic model with a modified phoneme, wherein the modified phoneme is a weighted sum of plausible phonemes in the lattice of plausible phonemes.
- Dependent Claims (19, 20)
  - 19. The computer-readable storage device of claim 18, wherein upon replacing each phoneme in the acoustic model, the acoustic model is a Gaussian mixture model.
  - 20. The computer-readable storage device of claim 18, wherein the native speech represents a class of speakers.
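Claims 2, 11, and 19 state that the acoustic model is a Gaussian mixture model after replacement. A weighted sum of Gaussian mixtures is itself a Gaussian mixture, which the sketch below shows by flattening the sum into a single component list. The data layout (tuples of component weight, mean, variance) and the name `merge_gmms` are assumptions made for illustration, not the patent's representation.

```python
def merge_gmms(phoneme_weights, native_gmms):
    """Flatten a weighted sum of GMMs into one GMM.

    phoneme_weights: {native_phoneme: weight}
    native_gmms: {native_phoneme: [(component_weight, mean, variance), ...]}
    Returns a list of (component_weight, mean, variance) tuples in which each
    component weight has been scaled by its phoneme's weight.
    """
    merged = []
    for phoneme, w in phoneme_weights.items():
        for comp_weight, mean, var in native_gmms[phoneme]:
            merged.append((w * comp_weight, mean, var))
    return merged

# Toy one-dimensional native models (hypothetical values).
native_gmms = {
    "iy": [(0.5, 0.0, 1.0), (0.5, 0.5, 2.0)],
    "ih": [(1.0, 2.0, 1.0)],
}
merged = merge_gmms({"iy": 0.6, "ih": 0.4}, native_gmms)
```

If the phoneme weights and each native model's component weights sum to one, the merged component weights also sum to one, so the result can be scored by the same GMM code as the original model.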

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal

Granted Patent: US 8,812,315 B2
US Class Current: 704/235
CPC Class Codes:
- G10L 15/063   Training
- G10L 15/07   to the speaker
- G10L 15/14   using statistical models, e...
- G10L 15/187   Phonemic context, e.g. pron...
- G10L 15/26   Speech to text systems G10L...
- G10L 15/30   Distributed recognition, e....
- G10L 17/14   Use of phonemic categorisat...
- G10L 2015/025   Phonemes, fenemes or fenone...
