Lattice-based unsupervised maximum likelihood linear regression for speaker adaptation

US 7,216,077 B1
Filed: 09/26/2000
Issued: 05/08/2007
Est. Priority Date: 09/26/2000
Status: Expired due to Fees

First Claim

1. A method of providing speaker adaptation in speech recognition, said method comprising the steps of:

providing at least one speech recognition model;

accepting speaker data;

generating a word lattice having a plurality of paths based on the speaker data, wherein the step of generating the word lattice comprises considering language model probabilities by incorporating the language model probabilities into a transition probability; and

adapting at least one of the speaker data and the at least one speech recognition model with respect to the generated word lattice in a manner to maximize the likelihood of the speaker data, wherein said step of generating a word lattice comprises generating a maximum a-posteriori probability word lattice, wherein said step of generating a maximum a-posteriori probability word lattice comprises:

determining posterior state occupancy probabilities for each state in the speaker data at each time;

determining posterior word occupancy probabilities by summing over all states interior to each word in the speaker data; and

determining at least one likeliest word at each frame of the speaker data, wherein said step of determining posterior state occupancy probabilities for each state in the speaker data at each time comprises the use of the following formula:

$P(S_{t} = s \mid y_{1}^{T}) = \frac{\alpha_{s}^{t} \beta_{s}^{t}}{P(y_{1}^{T})}$

where $\alpha_{s}^{t} = P(y_{1}^{t}, S_{t} = s)$ and $\beta_{s}^{t} = P(y_{t+1}^{T} \mid S_{t} = s)$ for states s and a set of observations T, and where $y_{t}^{T}$ represents T observation frames of adaptation data.
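
The posterior state occupancy formula in claim 1 is the standard forward-backward quantity for a hidden Markov model. The following is a minimal sketch of that computation, not the patented implementation: it assumes a small HMM given by plain Python lists, and the names `init`, `trans`, and `obs_lik` are hypothetical inputs chosen for illustration. It works with unscaled probabilities, so it is only suitable for short sequences; a practical system would use scaling or log arithmetic.

```python
def state_posteriors(init, trans, obs_lik):
    """Return gamma[t][s] = P(S_t = s | y_1^T) for a small HMM.

    init[s]       : P(S_1 = s)                   (hypothetical input name)
    trans[r][s]   : P(S_{t+1} = s | S_t = r)     (hypothetical input name)
    obs_lik[t][s] : P(y_t | S_t = s), one row per observation frame
    """
    T, n = len(obs_lik), len(init)

    # Forward pass: alpha[t][s] = P(y_1^t, S_t = s)
    alpha = [[0.0] * n for _ in range(T)]
    for s in range(n):
        alpha[0][s] = init[s] * obs_lik[0][s]
    for t in range(1, T):
        for s in range(n):
            reach = sum(alpha[t - 1][r] * trans[r][s] for r in range(n))
            alpha[t][s] = reach * obs_lik[t][s]

    # Backward pass: beta[t][s] = P(y_{t+1}^T | S_t = s); 1 at the last frame
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in range(n):
            beta[t][s] = sum(
                trans[s][r] * obs_lik[t + 1][r] * beta[t + 1][r] for r in range(n)
            )

    # Total likelihood P(y_1^T), the denominator of the posterior
    total = sum(alpha[T - 1])

    return [[alpha[t][s] * beta[t][s] / total for s in range(n)] for t in range(T)]


# Toy example with two states and three frames (all numbers are made up).
gamma = state_posteriors(
    init=[0.6, 0.4],
    trans=[[0.7, 0.3], [0.4, 0.6]],
    obs_lik=[[0.9, 0.2], [0.1, 0.8], [0.5, 0.5]],
)
```

Each row of `gamma` is a distribution over states for one frame, so its entries sum to one.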

2 Assignments

0 Petitions

Abstract

Methods and arrangements using lattice-based information for unsupervised speaker adaptation. By performing adaptation against a word lattice, correct models are more likely to be used in estimating a transform. Further, a particular type of lattice proposed herein enables the use of a natural confidence measure given by the posterior occupancy probability of a state, that is, the statistics of a particular state will be updated with the current frame only if the a posteriori probability of the state at that particular time is greater than a predetermined threshold.
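
The confidence measure described in the abstract can be illustrated with a short sketch. This is an assumption-laden illustration rather than the method of the patent: it assumes scalar observations, accumulates only an occupancy count and a first-order sum per state, and weights each accepted frame by its posterior. The function name `accumulate_stats` and the default `threshold` value are hypothetical.

```python
def accumulate_stats(gamma, frames, threshold=0.5):
    """Accumulate per-state statistics, gated by posterior occupancy.

    gamma[t][s] : posterior probability of state s at frame t
    frames[t]   : scalar observation at frame t (assumed scalar for brevity)
    threshold   : hypothetical confidence threshold; a state is updated with
                  a frame only if its posterior exceeds this value
    """
    n_states = len(gamma[0])
    occupancy = [0.0] * n_states   # sum of accepted posteriors per state
    first_order = [0.0] * n_states  # posterior-weighted sum of observations

    for post_t, y_t in zip(gamma, frames):
        for s, p in enumerate(post_t):
            if p > threshold:  # skip frames where the state is not confident
                occupancy[s] += p
                first_order[s] += p * y_t
    return occupancy, first_order


# Toy posteriors for two states over three frames (made-up values).
occ, first = accumulate_stats(
    gamma=[[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]],
    frames=[1.0, 2.0, 3.0],
    threshold=0.5,
)
```

In this toy run the third frame contributes to neither state, because no posterior strictly exceeds the threshold.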

81 Citations

4 Claims

2. The method according to claim 1, wherein said step of determining posterior word occupancy probabilities by summing over all states interior to each word in the speaker data comprises a determination using the following formula at each time frame:

3. An apparatus for providing speaker adaptation in speech recognition, said apparatus comprising:

at least one speech recognition model;

an accepting arrangement which accepts speaker data;

a lattice generator which generates a word lattice having a plurality of paths based on the speaker data, wherein the generation of the word lattice comprises consideration of language model probabilities by incorporating the language model probabilities into a transition probability; and

a processing arrangement which adapts at least one of the speaker data and the at least one speech recognition model with respect to the generated word lattice in a manner to maximize the likelihood of the speaker data, wherein said generator is adapted to generate a maximum a-posteriori probability word lattice, wherein said generator is adapted to:

determine posterior state occupancy probabilities for each state in the speaker data at each time;

determine posterior word occupancy probabilities by summing over all states interior to each word in the speaker data; and

determine at least one likeliest word at each frame of the speaker data, wherein said determining posterior state occupancy probabilities for each state in the speaker data at each time comprises the use of the following formula:

$P(S_{t} = s \mid y_{1}^{T}) = \frac{\alpha_{s}^{t} \beta_{s}^{t}}{P(y_{1}^{T})}$

where $\alpha_{s}^{t} = P(y_{1}^{t}, S_{t} = s)$ and $\beta_{s}^{t} = P(y_{t+1}^{T} \mid S_{t} = s)$ for states s and a set of observations T, and where $y_{t}^{T}$ represents T observation frames of adaptation data.

4. The apparatus according to claim 3, wherein said determining posterior word occupancy probabilities by summing over all states interior to each word in the speaker data comprises a determination using the following formula at each time frame:
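
The formula recited in claims 2 and 4 is not reproduced in this record. Independently of that formula, the two steps named in claims 1 and 3 (summing state posteriors over the states interior to each word, then picking the likeliest word or words at each frame) might be sketched as follows. The mapping `state_to_word` and both function names are hypothetical, and the state posteriors are taken as given.

```python
def word_posteriors(gamma, state_to_word):
    """Sum state posteriors over the states belonging to each word.

    gamma[t][s]      : posterior probability of state s at frame t
    state_to_word[s] : label of the word that state s lies inside
                       (hypothetical mapping, for illustration only)
    Returns one dict per frame mapping word -> posterior word occupancy.
    """
    per_frame = []
    for post_t in gamma:
        frame = {}
        for s, p in enumerate(post_t):
            word = state_to_word[s]
            frame[word] = frame.get(word, 0.0) + p
        per_frame.append(frame)
    return per_frame


def likeliest_words(word_post, k=1):
    """Return the k highest-posterior words at each frame."""
    return [sorted(frame, key=frame.get, reverse=True)[:k] for frame in word_post]


# Toy example: states 0 and 1 are inside word "a", state 2 inside word "b".
posts = word_posteriors(
    gamma=[[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]],
    state_to_word=["a", "a", "b"],
)
best = likeliest_words(posts, k=1)
```

Keeping more than one word per frame (`k > 1`) yields several alternatives at each frame rather than a single best sequence.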

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Padmanabhan, Mukund; Zweig, Geoffrey G.; Saon, George A.
Primary Examiner(s): Lerner, Martin
Application Number: US09/670,251
Time in Patent Office: 2,415 Days
Field of Search: 704/236, 704/240, 704/243, 704/244, 704/246, 704/250
US Class Current: 704/240
CPC Class Codes: G10L 15/065 Adaptation
