Differential acoustic model representation and linear transform-based adaptation for efficient user profile update techniques in automatic speech recognition
First Claim
1. A method comprising:
- adapting, by a computing device, an initial speech recognition acoustic model to produce a speaker adapted acoustic model using speech recognition data from a particular speaker;
- developing, by the computing device, a speaker differential acoustic model representing one or more differences between the initial speech recognition acoustic model and the speaker adapted acoustic model;
- optimizing different quantization ranges for parameter subsets of the speaker adapted acoustic model using a scoring function;
- using the different quantization ranges to minimize an acoustic model difference measure of the speaker adapted acoustic model against the initial speech recognition acoustic model; and
- storing the speaker differential acoustic model for subsequent speech recognition with the particular speaker.
4 Assignments
0 Petitions
Abstract
A computer-implemented method is described for speaker adaptation in automatic speech recognition. Speech recognition data from a particular speaker is used for adaptation of an initial speech recognition acoustic model to produce a speaker adapted acoustic model. A speaker dependent differential acoustic model is determined that represents differences between the initial speech recognition acoustic model and the speaker adapted acoustic model. In addition, an approach is also disclosed to estimate speaker-specific feature or model transforms over multiple sessions. This is achieved by updating the previously estimated transform using only adaptation statistics of the current session.
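The differential representation described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the models are plain dictionaries of float lists, the names `make_differential`, `apply_differential`, and `threshold` are hypothetical, and dropping near-zero differences is one possible way to keep the stored profile small, not something the abstract specifies.

```python
# Illustrative sketch only: a "model" is a dict mapping a parameter
# name to a list of floats (an assumed layout, not the patent's).

def make_differential(initial, adapted, threshold=1e-3):
    """Keep only the entries whose change exceeds `threshold`."""
    diff = {}
    for name, base_values in initial.items():
        for idx, (base, new) in enumerate(zip(base_values, adapted[name])):
            delta = new - base
            if abs(delta) > threshold:
                diff[(name, idx)] = delta
    return diff


def apply_differential(initial, diff):
    """Rebuild a speaker adapted model from the initial model plus deltas."""
    restored = {name: list(values) for name, values in initial.items()}
    for (name, idx), delta in diff.items():
        restored[name][idx] += delta
    return restored


initial = {"mean_0": [0.0, 1.0, 2.0], "mean_1": [5.0, 5.0, 5.0]}
adapted = {"mean_0": [0.0, 1.25, 2.0], "mean_1": [5.0, 5.0, 4.5]}

diff = make_differential(initial, adapted)
restored = apply_differential(initial, diff)
```

In this toy case only two of the six parameters changed, so the stored differential holds two entries while the initial model is shared across speakers.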
90 Citations
20 Claims
1. A method comprising:
- adapting, by a computing device, an initial speech recognition acoustic model to produce a speaker adapted acoustic model using speech recognition data from a particular speaker;
- developing, by the computing device, a speaker differential acoustic model representing one or more differences between the initial speech recognition acoustic model and the speaker adapted acoustic model;
- optimizing different quantization ranges for parameter subsets of the speaker adapted acoustic model using a scoring function;
- using the different quantization ranges to minimize an acoustic model difference measure of the speaker adapted acoustic model against the initial speech recognition acoustic model; and
- storing the speaker differential acoustic model for subsequent speech recognition with the particular speaker.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
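One way to read the quantization steps of claim 1 is as a search, per parameter subset, for the clipping range that loses the least information. The sketch below is a toy under stated assumptions: it quantizes the differences uniformly with 4-bit signed codes, uses squared reconstruction error as the scoring function, and picks from a fixed list of candidate ranges. The function names, the subset names, and the choice of scoring function are all hypothetical; the claim does not say which scoring function or difference measure is used.

```python
def quantize(values, limit, bits=4):
    """Uniformly quantize values, clipped to [-limit, limit], to signed codes."""
    levels = 2 ** (bits - 1) - 1          # 7 code steps per side for 4 bits
    step = limit / levels
    codes = [max(-levels, min(levels, round(v / step))) for v in values]
    return codes, step


def dequantize(codes, step):
    return [c * step for c in codes]


def score(values, limit, bits=4):
    """Assumed scoring function: squared error after a quantize round trip."""
    codes, step = quantize(values, limit, bits)
    return sum((v - d) ** 2 for v, d in zip(values, dequantize(codes, step)))


def best_range(values, candidates, bits=4):
    """Pick the candidate range with the lowest score for this subset."""
    return min(candidates, key=lambda limit: score(values, limit, bits))


# Two parameter subsets whose differences live on very different scales.
subsets = {
    "mean_deltas": [0.02, -0.01, 0.03, 0.00],
    "variance_deltas": [0.7, -1.4, 0.35, 1.05],
}
candidates = [0.035, 0.35, 1.4, 3.5]

ranges = {name: best_range(vals, candidates) for name, vals in subsets.items()}
```

With these numbers the small-scale subset selects the narrow range 0.035 and the large-scale subset selects 1.4, which illustrates why a single shared range would waste resolution on one subset or clip the other.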
9. A method comprising:
- loading a user profile for a particular speaker including an initial user feature transform representing speaker adapted speech recognition acoustic models;
- performing speech recognition, by a computing device, for a session of speech utterances from the particular speaker using the initial user feature transform and a plurality of speaker independent speech recognition acoustic models;
- determining a session update transform for a linear transform-based speaker adaptation of the initial user feature transform based on speech recognition data from the session;
- producing an updated user feature transform by combining the initial user feature transform and the session update transform; and
- storing in the user profile the updated user feature transform for subsequent speech recognition with the particular speaker.
Dependent claims: 10, 11, 12, 13, 14
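The combining step of claim 9 can be sketched under one plausible reading: both transforms are affine maps on feature vectors, and the session update is estimated on features that already passed through the initial transform, so combining amounts to composing the two maps. That reading, the `(matrix, bias)` tuple layout, and the helper names `combine` and `apply` are assumptions made for this sketch; the claim itself does not fix how the transforms are combined.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, x):
    return [sum(row[k] * x[k] for k in range(len(x))) for row in a]


def apply(transform, x):
    """Apply an affine transform (A, b) to a feature vector: A x + b."""
    a, b = transform
    return [y + bias for y, bias in zip(matvec(a, x), b)]


def combine(initial, update):
    """Compose transforms so that combined(x) == update(initial(x))."""
    a0, b0 = initial
    a1, b1 = update
    combined_bias = [y + bias for y, bias in zip(matvec(a1, b0), b1)]
    return matmul(a1, a0), combined_bias


initial = ([[2.0, 0.0], [0.0, 1.0]], [1.0, 0.0])    # loaded from the profile
update = ([[1.0, 0.5], [0.0, 1.0]], [0.0, -1.0])    # estimated this session
updated = combine(initial, update)                  # stored back to the profile

x = [1.0, 2.0]
```

Under this reading only one matrix and one bias vector need to be kept in the user profile after each session, however many sessions have contributed to it.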
15. A non-transitory computer-readable medium storing computer-readable instructions that, when executed by a processor, cause a device to:
- load a user profile for a particular speaker including an initial user feature transform representing speaker adapted speech recognition acoustic models;
- perform speech recognition for a session of speech utterances from the particular speaker using the initial user feature transform and a plurality of speaker independent speech recognition acoustic models;
- determine a session update transform for a linear transform-based speaker adaptation of the initial user feature transform based on speech recognition data from the session;
- produce an updated user feature transform by combining the initial user feature transform and the session update transform; and
- store in the user profile the updated user feature transform for subsequent speech recognition with the particular speaker.
Dependent claims: 16, 17, 18, 19, 20
Specification