Penalized maximum likelihood estimation methods, the Baum-Welch algorithm and diagonal balancing of symmetric matrices for the training of acoustic models in speech recognition
Abstract
A nonparametric family of density functions formed by histogram estimators is used to model acoustic vectors in automatic speech recognition. A Gaussian kernel is set forth in the density estimator. Once the densities have been found for all the basic sounds in a training stage, an acoustic vector is assigned the phoneme label with the highest likelihood, which serves as the basis for decoding acoustic vectors into text.
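The decoding rule described in the abstract can be illustrated with a minimal sketch. It assumes one fixed-bandwidth Gaussian kernel estimate per phoneme, of the form (1/(n h^d)) Σ k((x − x_i)/h). The function names, the bandwidth `h`, the labels "AA" and "IY", and the toy training vectors are all hypothetical and are not taken from the patent.

```python
import math

def gaussian_kernel(u):
    """Standard d-dimensional Gaussian density evaluated at the vector u."""
    d = len(u)
    sq = sum(t * t for t in u)
    return math.exp(-0.5 * sq) / (2.0 * math.pi) ** (d / 2.0)

def kde(x, samples, h):
    """Kernel density estimate at x: (1 / (n h^d)) * sum_i k((x - x_i) / h)."""
    d = len(x)
    total = sum(
        gaussian_kernel([(a - b) / h for a, b in zip(x, s)]) for s in samples
    )
    return total / (len(samples) * h ** d)

def classify(x, training, h=1.0):
    """Return the label whose estimated density at x is largest."""
    return max(training, key=lambda label: kde(x, training[label], h))

# Hypothetical two-dimensional "acoustic vectors" for two phoneme labels.
training = {
    "AA": [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]],
    "IY": [[3.0, 3.1], [2.9, 3.0], [3.2, 2.8]],
}

print(classify([0.1, 0.0], training))  # prints AA
print(classify([3.0, 3.0], training))  # prints IY
```

Each label keeps its own training vectors, so adding a phoneme only adds an entry to the dictionary; the bandwidth is shared across labels purely to keep the sketch short.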
15 Claims
1. A computer implemented method for machine recognition of speech, comprising the steps of:
inputting acoustic data;
forming a nonparametric density estimator $f_n(x) = \sum_{i \in Z_n} c_i\, k(x, x_i)$, $x \in R^d$, where $k(x, y)$ is some specified positive kernel function, $c_i$, $i \in Z_n$, are parameters to be chosen, and $\{x_i\}_{i \in Z_n}$ is a given set of training data;
setting a kernel for the estimator;
selecting a statistical criterion to be optimized to find values for parameters defining the nonparametric density estimator; and
iteratively computing the density estimator for finding a maximum likelihood estimation of acoustic data.
is an initial choice for the parameters, $\hat{c}$ is the updated parameter choice, and $K = \{k(x_i, x_j)\}_{i,j \in Z_n}$, $A = cK$, where $c$ is chosen such that $Ae = e$ for $e = (1, 1, \ldots, 1)$.
7. The computer implemented method for machine recognition of speech as recited in claim 2, wherein the penalized maximum likelihood criterion is selected.
8. The computer implemented method for machine recognition of speech as recited in claim 7, wherein the step of iteratively computing employs a process of diagonal balancing of matrices to maximize the penalized likelihood.
9. The computer implemented method for machine recognition of speech as recited in claim 8, wherein the step of iteratively computing the density estimator uses an update of parameters given as a unique vector $v \in \operatorname{int} S_n(b)$ satisfying $v \cdot Kv = K((Kc)^{-1}) \cdot c - \gamma\, v \cdot b$, where $b_i = \int_{R^d} k(x, x_i)\,dx$, $b = (b_1, b_2, \ldots, b_n)$, $S_n(b) = \{c;\ c \in R^d,\ b^T c = 1\}$ and $\gamma = n - v^T K v$.
10. The computer implemented method for machine recognition of speech as recited in claim 9, wherein the update parameter is given as $\dfrac{K((Kc)^{-1} - \sigma e)}{n - \sigma\, c^T K c}$, where $\sigma > 0$ is a parameter chosen to yield a best possible performance.
11. The computer implemented method for machine recognition of speech as recited in claim 1, wherein the kernel is a Gaussian kernel.
12. The computer implemented method for machine recognition of speech as recited in claim 1, wherein the kernel is given by the formula $k(x, y) = \dfrac{1}{(1 + \|x - y\|^2)^2}$, $x, y \in R^d$, where $k(x, y)$, $x, y \in R^d$, is a reproducing kernel for a Hilbert space of functions on $R^d$.
13. The computer implemented method for machine recognition of speech as recited in claim 1, wherein
and $k(x, y) = \dfrac{1}{h}\, k\!\left(\dfrac{x - y}{h}\right)$.
14. The computer implemented method for machine recognition of speech as recited in claim 1, further comprising the step of assigning the maximum likelihood estimation to a phoneme label.
15. The computer implemented method for machine recognition of speech as recited in claim 1, wherein the non-parametric density estimator has the form $f_n(x) = \dfrac{1}{n h^d} \sum_{i \in Z_n} k\!\left(\dfrac{x - x_i}{h}\right)$, $x \in R^d$, where $Z_n = \{1, \ldots, n\}$, $k$ is some specified function, and $\{x_i;\ i \in Z_n\}$ is a set of observations in $R^d$ of some unknown random variable.
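The iterative maximum likelihood computation named in claim 1 can be illustrated with a small sketch that reuses the symbols K (the matrix of kernel values on the training data) and c (the parameter vector) from claims 9 and 10, together with the kernel of claim 12. The update shown is the classical EM (Baum-Welch-style) re-estimation of mixture weights, swapped in here for illustration; it is not asserted to be the update recited in the claims. The constraint that the weights sum to one, the uniform starting point, the iteration count, and the toy one-dimensional data are all assumptions.

```python
import math

def rational_kernel(x, y):
    """Kernel of claim 12: k(x, y) = 1 / (1 + ||x - y||^2)^2."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 / (1.0 + sq) ** 2

def gram_matrix(points, kernel):
    """K = {k(x_i, x_j)} over all pairs of training vectors."""
    return [[kernel(p, q) for q in points] for p in points]

def matvec(K, c):
    """Matrix-vector product Kc."""
    return [sum(kij * cj for kij, cj in zip(row, c)) for row in K]

def log_likelihood(K, c):
    """Sum over the training points of log (Kc)_i."""
    return sum(math.log(v) for v in matvec(K, c))

def em_step(K, c):
    """One multiplicative step: c_j <- c_j * (K (Kc)^{-1})_j / n.

    (Kc)^{-1} is the componentwise reciprocal. Because K is symmetric,
    the updated weights again sum to one (an assumed constraint).
    """
    n = len(c)
    inv = [1.0 / v for v in matvec(K, c)]
    grad = matvec(K, inv)
    return [cj * gj / n for cj, gj in zip(c, grad)]

# Hypothetical one-dimensional training data.
points = [[0.0], [0.5], [1.0], [4.0], [4.2]]
K = gram_matrix(points, rational_kernel)
c = [1.0 / len(points)] * len(points)

history = [log_likelihood(K, c)]
for _ in range(20):
    c = em_step(K, c)
    history.append(log_likelihood(K, c))

print(sum(c))
print(history[0], history[-1])
```

The log-likelihood recorded in `history` should not decrease from one step to the next, which is the usual property of EM re-estimation; the penalty term and the normalization vector b from claim 9 are deliberately left out of this sketch.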
Specification