Penalized maximum likelihood estimation methods, the baum welch algorithm and diagonal balancing of symmetric matrices for the training of acoustic models in speech recognition

US 6,374,216 B1
Filed: 09/27/1999
Issued: 04/16/2002
Est. Priority Date: 09/27/1999
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A nonparametric family of density functions formed by histogram estimators for modeling acoustic vectors is used in automatic speech recognition. A Gaussian kernel is set forth in the density estimator. Once the densities have been found for all the basic sounds in a training stage, an acoustic vector is assigned the phoneme label with the highest likelihood, which forms the basis for decoding acoustic vectors into text.
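
The decoding rule in the abstract (pick the label with the highest likelihood) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each phoneme label already has a trained density of the form $\sum_i c_i k(x, x^i)$ with a normalized isotropic Gaussian kernel of bandwidth `h`. The labels `aa` and `iy`, the training points, the weights, and `h` are all hypothetical, and the toy classifies a single vector in isolation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, h: float) -> float:
    """Normalized isotropic Gaussian kernel in R^d with bandwidth h (assumed choice)."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * h * h)) / ((2.0 * math.pi) ** (d / 2) * h ** d)


def density(x: Point, centers: List[Point], weights: List[float], h: float) -> float:
    """f(x) = sum_i c_i k(x, x^i), with c_i >= 0 summing to 1."""
    return sum(c * gaussian_kernel(x, xi, h) for c, xi in zip(weights, centers))


def classify(x: Point, models: Dict[str, Tuple[List[Point], List[float]]], h: float) -> str:
    """Return the label whose trained density gives x the highest likelihood."""
    scores = {label: density(x, centers, weights, h)
              for label, (centers, weights) in models.items()}
    return max(scores, key=scores.get)


# Hypothetical per-label models: (training points x^i, weights c_i).
models = {
    "aa": ([(0.0, 0.0), (0.2, 0.1)], [0.5, 0.5]),
    "iy": ([(3.0, 3.0), (3.1, 2.9)], [0.5, 0.5]),
}
print(classify((0.1, 0.0), models, h=0.5))  # prints "aa"
```

Comparing raw densities is equivalent to comparing likelihoods here; with many frames one would normally sum log-densities instead to avoid underflow.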

25 Citations

15 Claims

1. A computer implemented method for machine recognition of speech, comprising the steps of:
- inputting acoustic data;
- forming a nonparametric density estimator $f_{n}(x) = \sum_{i \in Z_{n}} c_{i} k(x, x^{i})$, $x \in \mathbb{R}^{d}$, where $Z_{n} = \{1, 2, \dots, n\}$, $k(x, y)$ is some specified positive kernel function, $c_{i} \geq 0$, $i \in Z_{n}$, $\sum_{i=1}^{n} c_{i} = 1$ are parameters to be chosen, and $\{x^{i}\}_{i \in Z_{n}}$ is a given set of training data;
- setting a kernel for the estimator;
- selecting a statistical criterion to be optimized to find values for parameters defining the nonparametric density estimator; and
- iteratively computing the density estimator for finding a maximum likelihood estimation of acoustic data.
2. The computer implemented method for machine recognition of speech as recited in claim 1, wherein the statistical criterion to be optimized is selected from a penalized maximum likelihood criteria or a maximum likelihood criteria.
3. The computer implemented method for machine recognition of speech as recited in claim 2, wherein the maximum likelihood criteria is selected.
4. The computer implemented method for machine recognition of speech as recited in claim 3, wherein a process of degree raising of polynomials is used for global maximization.
5. The computer implemented method for machine recognition of speech as recited in claim 3, wherein the step of iteratively computing employs a process of diagonal balancing of matrices to maximize the likelihood.
6. The computer implemented method for machine recognition of speech as recited in claim 5, wherein a form of the iteratively computing is $\hat{c} = \frac{c}{n} \cdot A^{T}\left((Ac)^{-1}\right)$, where $c = (c_{1}, c_{2}, \dots, c_{n})$, $c_{i} \geq 0$, $\sum_{i=1}^{n} c_{i} = 1$.
7. The computer implemented method for machine recognition of speech as recited in claim 2, wherein the penalized maximum likelihood criteria is selected.
8. The computer implemented method for machine recognition of speech as recited in claim 7, wherein the step of iteratively computing employs a process of diagonal balancing of matrices to maximize the penalized likelihood.
9. The computer implemented method for machine recognition of speech as recited in claim 8, wherein the step of iteratively computing the density estimator uses an update of parameters given as a unique vector $v \in \operatorname{int} S^{n}(b)$ satisfying $v \cdot Kv = K\left((Kc)^{-1}\right) \cdot c - \gamma v \cdot b$, where $b_{i} = \int_{\mathbb{R}^{d}} k(x, x^{i})\,dx$, $b = (b_{1}, b_{2}, \dots, b_{n})$, $S^{n}(b) = \{c : c \in \mathbb{R}^{d}, b^{T}c = 1\}$ and $\gamma = n - v^{T}Kv$.
10. The computer implemented method for machine recognition of speech as recited in claim 9, wherein the update parameter is given as $v = \dfrac{c \cdot K\left((Kc)^{-1} - \sigma e\right)}{n - \sigma c^{T}Kc}$, where $\sigma > 0$ is a parameter chosen to yield a best possible performance.
11. The computer implemented method for machine recognition of speech as recited in claim 1, wherein the kernel is a Gaussian kernel.
12. The computer implemented method for machine recognition of speech as recited in claim 1, wherein the kernel is given by the formula $k(x, y) = \dfrac{1}{(1 + \|x - y\|^{2})^{2}}$, $x, y \in \mathbb{R}^{d}$, where $k(x, y)$, $x, y \in \mathbb{R}^{d}$, is a reproducing kernel for a Hilbert space of functions on $\mathbb{R}^{d}$.
13. The computer implemented method for machine recognition of speech as recited in claim 1, wherein $c = \frac{e}{n}$ and $k(x, y) = \frac{1}{h} k\left(\frac{x - y}{h}\right)$.
14. The computer implemented method for machine recognition of speech as recited in claim 1, further comprising the step of assigning the maximum likelihood estimation to a phoneme label.
15. The computer implemented method for machine recognition of speech as recited in claim 1, wherein the non-parametric density estimator has the form $f_{n}(x) = \frac{1}{nh^{d}} \sum_{i \in Z_{n}} k\left(\frac{x - x^{i}}{h}\right)$, $x \in \mathbb{R}^{d}$, where $Z_{n} = \{1, \dots, n\}$, $k$ is some specified function, and $\{x^{i} : i \in Z_{n}\}$ is a set of observations in $\mathbb{R}^{d}$ of some unknown random variable.
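
The fixed-point iteration in claim 6 can be sketched as below. Claim 6 does not define $A$, so this sketch assumes $A_{ji} = k(x^{j}, x^{i})$ (the kernel evaluated at the training points) and reads both the product with $c/n$ and the inverse $(Ac)^{-1}$ componentwise. Under those assumptions the step has the same form as the standard EM update for mixture weights with fixed components, so the weights stay nonnegative and sum to 1, and the training log-likelihood $\sum_j \log (Ac)_j$ does not decrease. The helper names, the 1-D data, and the Gaussian bandwidth are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def kernel_matrix(xs: List[float], h: float) -> Matrix:
    """Hypothetical A[j][i] = k(x^j, x^i) using a 1-D Gaussian kernel of bandwidth h."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * h)
    return [[norm * math.exp(-((xj - xi) ** 2) / (2.0 * h * h)) for xi in xs] for xj in xs]


def log_likelihood(A: Matrix, c: List[float]) -> float:
    """Training log-likelihood sum_j log (Ac)_j, i.e. sum_j log f_n(x^j)."""
    return sum(math.log(sum(a * ci for a, ci in zip(row, c))) for row in A)


def update(A: Matrix, c: List[float]) -> List[float]:
    """One step of c_hat = (c / n) . A^T((Ac)^{-1}), with '.' and ^{-1} componentwise."""
    n = len(A)
    inv_Ac = [1.0 / sum(a * ci for a, ci in zip(row, c)) for row in A]           # (Ac)^{-1}
    At_inv = [sum(A[j][i] * inv_Ac[j] for j in range(n)) for i in range(len(c))]  # A^T (Ac)^{-1}
    return [ci / n * t for ci, t in zip(c, At_inv)]


xs = [0.0, 0.1, 0.3, 2.0, 2.2, 5.0]  # hypothetical 1-D training data
A = kernel_matrix(xs, h=0.5)
c = [1.0 / len(xs)] * len(xs)        # start from uniform weights c = e / n
history = [log_likelihood(A, c)]
for _ in range(50):
    c = update(A, c)
    history.append(log_likelihood(A, c))
```

After the loop, `sum(c)` remains 1 up to rounding, because $\sum_i \hat{c}_i = \frac{1}{n} \sum_j (Ac)_j / (Ac)_j = 1$. The list `history` is non-decreasing.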

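Claim 12 states that its kernel is a reproducing kernel, which implies that every Gram matrix $[k(x^{i}, x^{j})]$ is positive semidefinite. The function $1/(1 + \|x - y\|^{2})^{2}$ is an inverse multiquadric, a family known to be strictly positive definite on $\mathbb{R}^{d}$, so a Cholesky factorization of its Gram matrix on distinct points should succeed. The code below is a numerical spot check on five hypothetical points, not a proof. The names `k12`, `gram`, and `cholesky` are illustrative.

```python
import math
from typing import List, Sequence


def k12(x: Sequence[float], y: Sequence[float]) -> float:
    """Kernel of claim 12: k(x, y) = 1 / (1 + ||x - y||^2)^2."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 / (1.0 + sq) ** 2


def gram(points: List[Sequence[float]]) -> List[List[float]]:
    """Gram matrix G[i][j] = k12(x^i, x^j)."""
    return [[k12(p, q) for q in points] for p in points]


def cholesky(M: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with M = L L^T; raises ValueError if M is not positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.5)]  # hypothetical
L = cholesky(gram(points))  # succeeds because the Gram matrix is positive definite
```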

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Micchelli, Charles A., Olsen, Peder A.
Primary Examiner(s)
Dorvil, Richemond
Assistant Examiner(s)
Nolan, Daniel A.

Application Number
US09/404,995
Time in Patent Office
932 Days
Field of Search
704/251-256, 704/236-240
US Class Current
704/236
CPC Class Codes
G10L 15/063 Training
