On-line background noise adaptation of parallel model combination HMM with discriminative learning using weighted HMM for noisy speech recognition

US 6,188,982 B1
Filed: 12/01/1997
Issued: 02/13/2001
Est. Priority Date: 12/01/1997
Status: Expired due to Term

Abstract

A system for adaptively generating a composite noisy speech model to process speech in, e.g., a nonstationary environment comprises a speech recognizer, a re-estimation circuit, a combiner circuit, a classifier circuit, and a discrimination learning circuit. In particular, the speech recognizer generates frames of current input utterances based on received speech data and determines which of the generated frames are aligned with noisy states to produce a current noise model. The re-estimation circuit re-estimates the produced current noise model by interpolating the number of frames in the current noise model with parameters from a previous noise model. The combiner circuit combines the parameters of the current noise model with model parameters of a corresponding current clean speech model to generate model parameters of a composite noisy speech model. The classifier circuit determines a discrimination function by generating a weighted PMC HMM model. The discrimination learning circuit determines a distance function by measuring the degree of mis-recognition based on the discrimination function, determines a loss function based on the distance function, which is approximately equal to the distance function, determines a risk function representing the mean value of the loss function, and generates a current discriminative noise model based in part on the risk function, such that the input utterances correspond more accurately with the predetermined model parameters of the composite noisy speech model.

147 Citations

20 Claims

1. A method of generating a composite noisy speech model, comprising the steps of:
- generating frames of current input utterances based on received speech data,
- determining which of said generated frames are aligned with noisy states to produce a current noise model,
- re-estimating the produced current noise model by interpolating the number of frames in said current noise model with parameters from a previous noise model,
- combining the parameters of said current noise model with templates of a corresponding current clean speech model to generate templates of a composite noisy speech model,
- determining a discrimination function by generating a weighted current noise model based on said composite noisy speech model,
- determining a distance function by measuring the degree of mis-recognition based on said discrimination function,
- determining a loss function based on said distance function, said loss function being approximately equal to said distance function,
- determining a risk function representing the mean value of said loss function, and
- generating a current discriminative noise model based in part on said risk function, such that the input utterances correspond more accurately with the predetermined templates of the composite noisy speech model.
2. The method of claim 1, wherein said step of re-estimating being based on the equation:
- $λ(n + k) = \frac{n}{n + k} λ(n) + \frac{k}{n + k} λ(k),$
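The interpolation in claim 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record does not define the symbols, so the sketch assumes that λ(n) is a previous noise parameter vector estimated from n frames and λ(k) is the current one estimated from k frames. The function name `reestimate_noise_params` is hypothetical.

```python
def reestimate_noise_params(prev_params, n, curr_params, k):
    """Frame-count-weighted interpolation of two noise parameter vectors.

    Assumption: prev_params was estimated from n frames and
    curr_params from k frames; both have the same length.
    """
    if n < 0 or k < 0 or n + k == 0:
        raise ValueError("frame counts must be non-negative and not both zero")
    if len(prev_params) != len(curr_params):
        raise ValueError("parameter vectors must have the same length")
    total = n + k
    # lambda(n + k) = n/(n + k) * lambda(n) + k/(n + k) * lambda(k)
    return [(n / total) * p + (k / total) * c
            for p, c in zip(prev_params, curr_params)]


# Example: 30 earlier noise frames, 10 new ones.
updated = reestimate_noise_params([1.0, 2.0], 30, [3.0, 6.0], 10)
print(updated)
```

Under this reading, the model backed by more frames dominates the result, so a short burst of new noise frames shifts the estimate only slightly.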
3. The method of claim 2, wherein said generated frames aligned with noisy states are determined by a Viterbi decoding scheme.
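As a rough illustration of how frames aligned with noisy states might yield a current noise model, the sketch below assumes the Viterbi alignment has already been computed and is available as one state label per frame. It reduces the noise model to a mean vector plus a frame count, which is a simplification. The label `"sil"` and the function name are hypothetical.

```python
def noise_model_from_alignment(frames, alignment, noise_states):
    """Average the frames whose aligned state is a noise state.

    frames:       list of feature vectors (lists of floats)
    alignment:    one state label per frame (assumed to come from a
                  Viterbi decoder that is not shown here)
    noise_states: set of labels treated as noise states
    Returns (mean_vector, frame_count), or (None, 0) if no frame matches.
    """
    selected = [f for f, s in zip(frames, alignment) if s in noise_states]
    k = len(selected)
    if k == 0:
        return None, 0
    dim = len(selected[0])
    mean = [sum(f[d] for f in selected) / k for d in range(dim)]
    return mean, k


frames = [[0.0, 2.0], [4.0, 4.0], [2.0, 0.0], [9.0, 9.0]]
alignment = ["sil", "s1", "sil", "s2"]  # hypothetical state labels
mean, k = noise_model_from_alignment(frames, alignment, {"sil"})
print(mean, k)
```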
4. The method of claim 3, wherein said combining the parameters of the re-estimated current noise model with parameters of a corresponding current clean speech model to generate a composite noisy speech model is done by using a method of parallel model combination.
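The claim names parallel model combination without saying which variant is used. One common simplification is the log-add approximation, which combines only the mean vectors in the log-spectral domain. The sketch below shows that approximation as an assumption, not as the patent's procedure. The `gain` parameter and the function name are hypothetical.

```python
import math


def log_add_combine(clean_log_means, noise_log_means, gain=1.0):
    """Log-add approximation for combining clean-speech and noise means.

    Assumption: both inputs are mean vectors in the log-spectral domain.
    Speech and noise are treated as additive in the linear spectral
    domain, so each component becomes log(exp(speech) + gain * exp(noise)).
    Variances are left untouched by this approximation.
    """
    if len(clean_log_means) != len(noise_log_means):
        raise ValueError("mean vectors must have the same length")
    return [math.log(math.exp(s) + gain * math.exp(n))
            for s, n in zip(clean_log_means, noise_log_means)]


# Where noise is far below speech, the combined mean stays close to speech.
print(log_add_combine([0.0, 5.0], [0.0, -50.0]))
```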
5. The method of claim 4, wherein said discrimination function being:
- $g_{j}(O, S_{j}; Λ) = \sum_{i = 1}^{K} (w_{j, i} \cdot {SC}_{j, i})$ where $O = o_{1}, o_{2}, \ldots, o_{T}$ represents an input feature vector of T number of frames, K is the total number of states, ${SC}_{j, i}$ represents the corresponding accumulated log probability of state i in class j, and $w_{j, i}$ represents the corresponding weight of state i in class j.
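The discrimination function of claim 5 is a weighted sum of per-state accumulated log probabilities. A minimal sketch follows, assuming the state scores for one class have already been accumulated along its best state sequence. The function name and the sample values are hypothetical.

```python
def discrimination_score(weights, state_scores):
    """Weighted sum over K states: g_j = sum_i w_{j,i} * SC_{j,i}.

    weights:      w_{j,i} for class j, one per state
    state_scores: SC_{j,i}, accumulated log probability per state
    """
    if len(weights) != len(state_scores):
        raise ValueError("need one weight per state score")
    return sum(w * sc for w, sc in zip(weights, state_scores))


# Three states; the last state is weighted more heavily than the others.
g = discrimination_score([1.0, 0.5, 2.0], [-10.0, -4.0, -3.0])
print(g)
```

With every weight set to 1, the score reduces to the plain sum of the state log probabilities, so the weights are the only new degrees of freedom.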
6. The method of claim 1, wherein the current parameter is generated by the steps of:
- determining a distance function by measuring the degree of mis-recognition based on the discrimination function, determining a loss function based on the distance function, determining a risk function for representing the mean value of the loss function, and generating the current weighted parameters based in part on the risk function.
7. The method of claim 6, wherein said distance function being:
8. The method of claim 6, wherein said loss function being:
- $l(d_{αβ}(O)) = \begin{cases} \tan^{-1} \frac{d_{αβ}}{d_{0}}, & d_{αβ} < 0 \\ 0, & \text{otherwise} \end{cases}$ where $d_{0}$ is a positive function.
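The piecewise loss of claim 8 can be written directly as code. The sketch follows the condition exactly as printed in the claim (the arctangent branch applies when the distance is negative) and, as an assumption, treats d₀ as a positive scalar. The function name is hypothetical.

```python
import math


def arctan_loss(d, d0):
    """Loss as printed in claim 8: atan(d / d0) when d < 0, else 0.

    Assumption: d0 is a positive scalar.
    """
    if d0 <= 0:
        raise ValueError("d0 must be positive")
    return math.atan(d / d0) if d < 0 else 0.0


print(arctan_loss(-2.0, 2.0))  # arctangent branch
print(arctan_loss(1.0, 2.0))   # zero branch
```

Because the arctangent is bounded, a single utterance with a very large distance cannot dominate the averaged risk.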
9. The method of claim 6, wherein said risk function being:
- $\overline{R}(O; Λ) = \frac{1}{N} \sum_{k = 1}^{N} l(d(O^{k})),$ where $O = O^{1}, O^{2}, \ldots, O^{N}$, and $O^{k}$ represents the k-th training speech data.
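The risk function of claim 9 is the mean of the loss over N training utterances. The sketch below takes the per-utterance distances and a loss callable as inputs. The example uses the identity as a placeholder loss, in line with claim 1's statement that the loss is approximately equal to the distance. The function name is hypothetical.

```python
def empirical_risk(distances, loss_fn):
    """Mean loss over N utterances: R = (1/N) * sum_k l(d(O^k)).

    distances: d(O^k) for each training utterance k
    loss_fn:   callable mapping a distance to a loss value
    """
    if not distances:
        raise ValueError("need at least one training utterance")
    return sum(loss_fn(d) for d in distances) / len(distances)


# Identity loss used purely as a placeholder for illustration.
risk = empirical_risk([-1.0, 0.5, 2.0], lambda d: d)
print(risk)
```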
10. The method of claim 9, wherein said current discriminative noise model being represented by:
- $\begin{matrix} Λ_{l + 1} = Λ_{l} + Δ Λ_{l}, \text{ if } d(O) < τ \\ Δ Λ_{l} = - ε(l) U \nabla \overline{R}_{1}(O; Λ_{l}), \end{matrix}$ where τ (τ > 0) is a preset margin, ε(l) is a learning constant that is a decreasing function of l, and U is a positive-definite matrix, such as an identity matrix.
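The update rule of claim 10 is a gradient-descent step that is applied only when the distance falls below the margin τ. The sketch below takes U to be the identity matrix, which the claim offers as an example, and assumes the gradient of the risk is supplied by the caller. The schedule `eps0 / (1 + step_index)` is just one possible decreasing learning constant, and all names are hypothetical.

```python
def update_parameters(params, risk_gradient, d, tau, step_index, eps0=0.1):
    """One step of Lambda_{l+1} = Lambda_l - eps(l) * U * grad R, with U = I.

    The step is taken only if d < tau; otherwise the parameters are
    returned unchanged.
    """
    if tau <= 0:
        raise ValueError("tau must be a positive margin")
    if len(params) != len(risk_gradient):
        raise ValueError("gradient must match the parameter vector")
    if d >= tau:
        return list(params)
    eps = eps0 / (1 + step_index)  # assumed decreasing schedule
    return [p - eps * g for p, g in zip(params, risk_gradient)]


new_params = update_parameters([1.0, 2.0], [0.5, -1.0],
                               d=0.1, tau=1.0, step_index=0)
print(new_params)
```

Gating the step on the margin means utterances that are already recognized with a comfortable distance leave the model untouched.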

11. A system for generating a composite noisy speech model, comprising:
- a speech recognizer for generating frames of current input utterances based on received speech data, and for determining which of said generated frames are aligned with noisy states to produce a current noise model,
- a re-estimation circuit for re-estimating the produced current noise model by interpolating the number of frames in said current noise model with parameters from a previous noise model,
- a combiner circuit for combining the parameters of said current noise model with templates of a corresponding current clean speech model to generate templates of a composite noisy speech model,
- a classifier circuit for determining a discrimination function by generating a weighted current noise model based on said composite noisy speech model, and
- a discrimination learning circuit, for determining a distance function by measuring the degree of mis-recognition based on said discrimination function, for determining a loss function based on said distance function, said loss function being approximately equal to said distance function, for determining a risk function representing the mean value of said loss function, and for generating a current discriminative noise model based in part on said risk function, such that the input utterances correspond more accurately with the predetermined templates of the composite noisy speech model.
12. The system of claim 11, wherein said step of re-estimating being based on the equation:
- $λ(n + k) = \frac{n}{n + k} λ(n) + \frac{k}{n + k} λ(k),$
13. The system of claim 12, wherein said generated frames aligned with noisy states are determined by a Viterbi decoding scheme.
14. The system of claim 13, wherein said combining the parameters of the re-estimated current noise model with parameters of a corresponding current clean speech model to generate a composite noisy speech model is done by using a method of parallel model combination.
15. The system of claim 11, wherein the current parameter is generated by the steps of:
- determining a distance function by measuring the degree of mis-recognition based on the discrimination function, determining a loss function based on the distance function, determining a risk function for representing the mean value of the loss function, and generating the current weighted parameters based in part on the risk function.
16. The system of claim 14, wherein said discrimination function being:
- $g_{j}(O, S_{j}; Λ) = \sum_{i = 1}^{K} (w_{j, i} \cdot {SC}_{j, i})$ where $O = o_{1}, o_{2}, \ldots, o_{T}$ represents an input feature vector of T number of frames, K is the total number of states, ${SC}_{j, i}$ represents the corresponding accumulated log probability of state i in class j, and $w_{j, i}$ represents the corresponding weight of state i in class j.
17. The system of claim 15, wherein said distance function being:
18. The system of claim 15, wherein said loss function being:
- $l(d_{αβ}(O)) = \begin{cases} \tan^{-1} \frac{d_{αβ}}{d_{0}}, & d_{αβ} < 0 \\ 0, & \text{otherwise} \end{cases}$ where $d_{0}$ is a positive function.
19. The system of claim 15, wherein said risk function being:
- $\overline{R}(O; Λ) = \frac{1}{N} \sum_{k = 1}^{N} l(d(O^{k})),$ where $O = O^{1}, O^{2}, \ldots, O^{N}$, and $O^{k}$ represents the k-th training speech data.
20. The system of claim 19, wherein said current discriminative noise model being represented by:
- $\begin{matrix} Λ_{l + 1} = Λ_{l} + Δ Λ_{l}, \text{ if } d(O) < τ \\ Δ Λ_{l} = - ε(l) U \nabla \overline{R}_{1}(O; Λ_{l}), \end{matrix}$ where τ (τ > 0) is a preset margin, ε(l) is a learning constant that is a decreasing function of l, and U is a positive-definite matrix, such as an identity matrix.

Current Assignee
Industrial Technology Research Institute
Original Assignee
Industrial Technology Research Institute
Inventors
Chiang, Tung-Hui
Primary Examiner(s)
Šmits, Tālivaldis I.

Application Number

US08/982,136
Time in Patent Office

1,170 Days
Field of Search

704/233, 704/256
US Class Current

704/256.5
CPC Class Codes

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/144   Training of HMMs

G10L 15/20   Speech recognition techniqu...
