Lattice-based unsupervised maximum likelihood linear regression for speaker adaptation
Abstract
Methods and arrangements using lattice-based information for unsupervised speaker adaptation. By performing adaptation against a word lattice, correct models are more likely to be used in estimating a transform. Further, a particular type of lattice proposed herein enables the use of a natural confidence measure given by the posterior occupancy probability of a state, that is, the statistics of a particular state will be updated with the current frame only if the a posteriori probability of the state at that particular time is greater than a predetermined threshold.
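The confidence gating described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `gated_accumulate`, the default threshold of 0.9, and the choice to weight each accepted frame by its posterior (rather than counting it with weight 1) are all assumptions made for the example.

```python
import numpy as np

def gated_accumulate(frames, posteriors, threshold=0.9):
    """Accumulate per-state statistics from confident frames only.

    frames:     (T, D) array of feature vectors, one row per frame.
    posteriors: (T, S) array, posteriors[t, s] = P(S_t = s | all frames).
    threshold:  hypothetical confidence cut-off; a state is updated with
                frame t only if its posterior at t exceeds this value.

    Returns (occupancy, first_order): the summed posterior weight per
    state, shape (S,), and the posterior-weighted sum of frames per
    state, shape (S, D).
    """
    frames = np.asarray(frames, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    # Zero out every (frame, state) pair that fails the confidence test.
    weights = np.where(posteriors > threshold, posteriors, 0.0)
    occupancy = weights.sum(axis=0)
    first_order = weights.T @ frames
    return occupancy, first_order
```

Statistics gathered this way could then feed a transform estimate; frames whose state assignment is uncertain simply contribute nothing.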
4 Claims
1. A method of providing speaker adaptation in speech recognition, said method comprising the steps of:
providing at least one speech recognition model;
accepting speaker data;
generating a word lattice having a plurality of paths based on the speaker data, wherein the step of generating the word lattice comprises considering language model probabilities by incorporating the language model probabilities into a transition probability; and
adapting at least one of the speaker data and the at least one speech recognition model with respect to the generated word lattice in a manner to maximize the likelihood of the speaker data,
wherein said step of generating a word lattice comprises generating a maximum a-posteriori probability word lattice,
wherein said step of generating a maximum a-posteriori probability word lattice comprises;
determining posterior state occupancy probabilities for each state in the speaker data at each time;
determining posterior word occupancy probabilities by summing over all states interior to each word in the speaker data; and
determining at least one likeliest word at each frame of the speaker data,
wherein said step of determining posterior state occupancy probabilities for each state in the speaker data at each time comprises the use of the following formula;
where $\alpha_s^t = P(y_1^t, S_t = s)$ and $\beta_s^t = P(y_{t+1}^T \mid S_t = s)$ for states $s$ and a set of observations $T$, and where $y_t^T$ represents $T$ observation frames of adaptation data.
where $w_i$ is the set of states in word $W_i$.
3. An apparatus for providing speaker adaptation in speech recognition, said apparatus comprising:
at least one speech recognition model;
an accepting arrangement which accepts speaker data;
a lattice generator which generates a word lattice having a plurality of paths based on the speaker data, wherein the generation of the word lattice comprises consideration of language model probabilities by incorporating the language model probabilities into a transition probability; and
a processing arrangement which adapts at least one of the speaker data and the at least one speech recognition model with respect to the generated word lattice in a manner to maximize the likelihood of the speaker data,
wherein said generator is adapted to generate a maximum a-posteriori probability word lattice,
wherein said generator is adapted to;
determine posterior state occupancy probabilities for each state in the speaker data at each time;
determine posterior word occupancy probabilities by summing over all states interior to each word in the speaker data; and
determine at least one likeliest word at each frame of the speaker data,
wherein said determining posterior state occupancy probabilities for each state in the speaker data at each time comprises the use of the following formula;
where $\alpha_s^t = P(y_1^t, S_t = s)$ and $\beta_s^t = P(y_{t+1}^T \mid S_t = s)$ for states $s$ and a set of observations $T$, and where $y_t^T$ represents $T$ observation frames of adaptation data.
where $w_i$ is the set of states in word $W_i$.
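The three steps recited in the claims (posterior state occupancy, posterior word occupancy by summing over a word's states, and the likeliest word per frame) can be sketched as below. The claimed formulas are not reproduced in the text above, so this sketch assumes the standard forward-backward identity, in which the state posterior at time t is proportional to the product of the forward and backward terms, normalized over all states. The function names, the dense transition matrix `trans` (into which language model probabilities are assumed to be already folded), and the `word_states` mapping are hypothetical names chosen for illustration.

```python
import numpy as np

def state_posteriors(init, trans, obs_lik):
    """Posterior state occupancy P(S_t = s | y_1..y_T) via forward-backward.

    init:    (S,) initial state probabilities.
    trans:   (S, S), trans[i, j] = P(S_{t+1} = j | S_t = i); assumed to
             already include any language model probabilities.
    obs_lik: (T, S), obs_lik[t, s] = p(y_t | S_t = s).
    """
    T, S = obs_lik.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    # Forward pass; each row is rescaled to avoid numerical underflow.
    alpha[0] = init * obs_lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * obs_lik[t]
        alpha[t] /= alpha[t].sum()
    # Backward pass, rescaled the same way.
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (obs_lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    # Per-frame normalization cancels the scaling constants.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

def word_posteriors(gamma, word_states):
    """Sum state posteriors over each word's states; pick the best word per frame.

    word_states: dict mapping a word label to the list of state indices
                 interior to that word (hypothetical layout).
    """
    words = list(word_states)
    post = np.stack([gamma[:, word_states[w]].sum(axis=1) for w in words], axis=1)
    best = [words[i] for i in post.argmax(axis=1)]
    return words, post, best
```

Keeping more than one high-posterior word per frame, rather than only the argmax, would yield a lattice with several alternative paths; the sketch returns the full `post` matrix so that such a selection rule can be applied to it.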
Specification