System and Method for Adapting Automatic Speech Recognition Pronunciation by Acoustic Model Restructuring
Abstract
Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation through acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model that represents each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes. The pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method includes recognizing, via a processor, additional speech from the target speaker using the custom speech model.
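The weighted-sum restructuring described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the patent text: the lattice is reduced to arcs of the form (dictionary phoneme, plausible phoneme, posterior), each native acoustic model is a toy one-dimensional Gaussian, and the names `estimate_weights`, `custom_likelihood`, and `NATIVE_MODELS` are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; a stand-in for a native phoneme's acoustic model."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Toy native acoustic models (assumed values): phoneme -> (mean, variance).
NATIVE_MODELS = {"iy": (0.0, 1.0), "ih": (2.0, 1.0), "eh": (4.0, 1.0)}


def estimate_weights(lattice_arcs):
    """Turn lattice arcs into mixture weights for each dictionary phoneme.

    Each arc is (dictionary_phoneme, plausible_phoneme, posterior): the phoneme
    the dictionary expected, a phoneme the lattice found plausible at that
    position, and its posterior score (assumed format). The weights for each
    dictionary phoneme are normalized to sum to 1.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for expected, plausible, posterior in lattice_arcs:
        totals[expected][plausible] += posterior
    weights = {}
    for expected, scores in totals.items():
        norm = sum(scores.values())
        weights[expected] = {q: s / norm for q, s in scores.items()}
    return weights


def custom_likelihood(x, dictionary_phoneme, weights):
    """Likelihood of feature x under the restructured model.

    The dictionary phoneme keeps its label, but its acoustic model becomes a
    weighted sum of the native phoneme models.
    """
    return sum(
        w * gaussian_pdf(x, *NATIVE_MODELS[q])
        for q, w in weights[dictionary_phoneme].items()
    )


# Hypothetical lattice: this speaker mostly realizes dictionary "iy" as native "ih".
arcs = [("iy", "iy", 0.25), ("iy", "ih", 0.75), ("eh", "eh", 1.0)]
weights = estimate_weights(arcs)
```

In this toy setup, a feature value near the native "ih" model scores higher for the dictionary phoneme "iy" under the custom model than under the unmodified native "iy" model, which is the effect the restructuring aims for while the pronouncing dictionary itself stays unchanged.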
20 Claims
1. A method comprising:
  identifying an acoustic model, wherein the acoustic model is trained on native speech in a target dialect;
  transcribing collected speech from a speaker, to yield a lattice of plausible phonemes which depend on a property of the target dialect; and
  replacing a phoneme in the acoustic model with a modified phoneme, wherein the modified phoneme is chosen based on the lattice of plausible phonemes.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9

10. A system comprising:
  a processor; and
  a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
    identifying an acoustic model, wherein the acoustic model is trained on native speech in a target dialect;
    transcribing collected speech from a speaker, to yield a lattice of plausible phonemes which depend on a property of the target dialect; and
    replacing each phoneme in the acoustic model with a modified phoneme, wherein the modified phoneme is a weighted sum of plausible phonemes in the lattice of plausible phonemes.
Dependent claims: 11, 12, 13, 14, 15, 16, 17

18. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
  identifying an acoustic model, wherein the acoustic model is trained on native speech in a target dialect;
  transcribing collected speech from a speaker, to yield a lattice of plausible phonemes which depend on a property of the target dialect; and
  replacing each phoneme in the acoustic model with a modified phoneme, wherein the modified phoneme is a weighted sum of plausible phonemes in the lattice of plausible phonemes.
Dependent claims: 19, 20

Specification