SYSTEM AND METHOD FOR ADAPTING AUTOMATIC SPEECH RECOGNITION PRONUNCIATION BY ACOUSTIC MODEL RESTRUCTURING
Abstract
Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker, resulting in collected speech, and transcribes the collected speech to generate a lattice of plausible phonemes. The method then creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method recognizes, via a processor, additional speech from the new speaker using the custom speech model.
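The weighted-sum restructuring described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the phoneme labels, the one-dimensional Gaussian standing in for each native acoustic model, the weight values, and the names NATIVE_MODELS, WEIGHTS, gaussian_pdf and custom_likelihood. The point it shows is that the dictionary phoneme label stays the same while its acoustic model becomes a weighted sum of the native phoneme models.

```python
import math

# Hypothetical native acoustic models: one 1-D Gaussian (mean, std) per phoneme.
# Real recognizers use HMM states over multi-dimensional feature vectors.
NATIVE_MODELS = {"ih": (0.0, 1.0), "iy": (2.0, 1.0), "eh": (-2.0, 1.0)}

# Assumed weights for one new speaker: this speaker often realizes the
# dictionary phoneme "ih" with a sound close to native "iy".
WEIGHTS = {
    "ih": {"ih": 0.3, "iy": 0.7},
    "iy": {"iy": 1.0},
    "eh": {"eh": 1.0},
}


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def custom_likelihood(x, phoneme, weights=WEIGHTS, models=NATIVE_MODELS):
    """Likelihood of frame x under the restructured model of a dictionary phoneme.

    The dictionary entry still says `phoneme`; only its acoustic model changes,
    becoming a weighted sum of the unchanged native phoneme models.
    """
    return sum(w * gaussian_pdf(x, *models[p]) for p, w in weights[phoneme].items())


frame = 2.0  # a frame that sounds like native "iy"
native_score = gaussian_pdf(frame, *NATIVE_MODELS["ih"])
adapted_score = custom_likelihood(frame, "ih")
print(native_score < adapted_score)
```

Under these assumed values, a frame that a native model would score poorly as "ih" scores well under the restructured "ih" model, so dictionary words containing "ih" remain reachable without editing their pronunciations.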
37 Citations
20 Claims
1. A computer-implemented method of recognizing speech, the method comprising:
identifying an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect;
collecting speech from a new speaker resulting in collected speech;
transcribing the collected speech to generate a lattice of plausible phonemes which depend on the properties of the target dialect;
creating a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech; and
recognizing via a processor additional speech from the new speaker using the custom speech model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for recognizing speech, the system comprising:
a processor;
a module configured to control the processor to identify an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect;
a module configured to control the processor to collect speech from a new speaker resulting in collected speech;
a module configured to control the processor to transcribe the collected speech to generate a lattice of plausible phonemes which depend on the properties of the target dialect;
a module configured to control the processor to create a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech; and
a module configured to control the processor to recognize additional speech from the new speaker using the custom speech model.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A tangible computer-readable medium storing a computer program having instructions for recognizing speech, the instructions comprising:
identifying an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect;
collecting speech from a new speaker resulting in collected speech;
transcribing the collected speech to generate a lattice of plausible phonemes which depend on the properties of the target dialect;
creating a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech; and
recognizing via a processor additional speech from the new speaker using the custom speech model.
Dependent claims: 18, 19, 20
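The transcribing and creating steps recited in the independent claims leave open how the weights are obtained from the lattice. One plausible approach, sketched below, accumulates the lattice posteriors of each plausible phoneme against the dictionary phoneme it is aligned with, then normalizes. This is an assumption for illustration only: the claims do not recite this estimator, and the aligned (dictionary phoneme, posteriors) input format, the example values, and the name estimate_weights are hypothetical.

```python
from collections import defaultdict


def estimate_weights(aligned_lattice):
    """Estimate per-phoneme mixture weights from an aligned phoneme lattice.

    aligned_lattice: iterable of (dictionary_phoneme, posteriors) pairs, where
    posteriors maps each plausible phoneme in that lattice segment to its
    posterior probability. The alignment itself is assumed to be given.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for dict_phoneme, posteriors in aligned_lattice:
        for plausible, prob in posteriors.items():
            totals[dict_phoneme][plausible] += prob

    weights = {}
    for dict_phoneme, counts in totals.items():
        norm = sum(counts.values())
        if norm > 0.0:
            # Normalize so the weights for each dictionary phoneme sum to 1.
            weights[dict_phoneme] = {p: c / norm for p, c in counts.items()}
    return weights


# Hypothetical lattice segments from a new speaker's collected speech.
lattice = [
    ("ih", {"ih": 0.2, "iy": 0.8}),
    ("ih", {"ih": 0.4, "iy": 0.6}),
    ("eh", {"eh": 1.0}),
]
weights = estimate_weights(lattice)
print(weights)
```

Normalizing to a sum of 1 is a design choice of this sketch that keeps each restructured model a valid mixture; a practical system would likely also smooth or floor the weights when little speech has been collected.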
Specification