Speech recognition with mixtures of Bayesian networks
Abstract
The invention performs speech recognition using an array of mixtures of Bayesian networks. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN, but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable, and each HSBN models the world under the hypothesis that the common external hidden variable is in a corresponding one of those states. In accordance with the invention, the MBNs encode the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech. Each of the HSBNs encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech and given a hidden common variable being in a particular state. Each HSBN has nodes corresponding to the elements of the acoustic observations. These nodes store probability parameters corresponding to the probabilities, and causal links between the nodes represent their dependencies.
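Read as a generative model, the architecture is direct: each MBN scores an acoustic observation under one part of speech by mixing the likelihoods of its HSBNs, one HSBN per state of the common external hidden variable, and recognition selects the part of speech whose MBN scores highest. The Python sketch below illustrates this reading; the class layout and the diagonal-Gaussian stand-in for a full HSBN are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

class HSBN:
    """One hypothesis-specific Bayesian network: models P(obs | part of speech,
    C = c) for one state c of the common hidden variable. A real HSBN factors
    this density over its nodes along the learned causal links; a diagonal
    Gaussian stands in here purely for illustration."""
    def __init__(self, mean, var):
        self.mean = np.asarray(mean, dtype=float)
        self.var = np.asarray(var, dtype=float)

    def likelihood(self, obs):
        diff = np.asarray(obs, dtype=float) - self.mean
        return float(np.exp(-0.5 * np.sum(diff ** 2 / self.var))
                     / np.sqrt(np.prod(2.0 * np.pi * self.var)))

class MBN:
    """Mixture of HSBNs, one per state of the common external hidden variable."""
    def __init__(self, hsbns, state_priors):
        assert len(hsbns) == len(state_priors)   # one HSBN per hidden state
        self.hsbns = hsbns
        self.state_priors = np.asarray(state_priors, dtype=float)  # P(C = c)

    def likelihood(self, obs):
        # The combiner: P(obs | pos) = sum_c P(C = c) * P(obs | pos, C = c)
        return float(sum(w * h.likelihood(obs)
                         for w, h in zip(self.state_priors, self.hsbns)))

def recognize(obs, mbns_by_pos):
    """Infer the part of speech whose MBN best explains the observation."""
    return max(mbns_by_pos, key=lambda pos: mbns_by_pos[pos].likelihood(obs))
```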
54 Claims
1. A speech recognition network for inferring parts of speech from acoustic observations having n elements, with a common hidden variable having plural discrete states, comprising:
a plurality of mixtures of Bayesian networks (MBNs), each of said MBNs encoding the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech;
each of said MBNs comprising:
a plurality of hypothesis-specific Bayesian networks (HSBNs), each of said HSBNs encoding the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech and given the hidden common variable being in a respective one of its states;
a combiner which combines outputs of said HSBNs to produce an MBN output of said MBN;
wherein each one of said HSBNs comprises:
plural nodes, each of said nodes corresponding to one of said n elements of the acoustic observations, at least some of said plural nodes having dependencies with others of said plural nodes within the one HSBN, a combiner connected to outputs of said nodes, said nodes receiving at their inputs the state of a respective one of the n elements of a current one of the acoustic observations.
there are 33 Cepstrum parameters, 6000 senones and said common hidden variable has 20 states.
8. The mixture of Bayesian networks of claim 1 wherein said common hidden variable is external in that it is not represented by any one of the nodes in said mixture of Bayesian networks.
9. The mixture of Bayesian networks of claim 1 wherein each HSBN is associated with an HSBN score, wherein:
each of said HSBNs further comprises an inference input defining observed data corresponding to said acoustic observations and an inference output corresponding to the likelihood of an acoustic observation given the utterance of a corresponding part of speech and given said common hidden variable being in one of its states corresponding to said HSBN; and
said mixture of Bayesian networks further comprises a weight multiplier which weights the inference output of each HSBN by a corresponding HSBN score and combines the weighted HSBN inference outputs into a single inference output of said mixture of Bayesian networks.
10. The mixture of Bayesian networks of claim 9 wherein said HSBN score corresponds to the likelihood of said common external hidden variable being in the corresponding one of the states of said common hidden variable.
11. The mixture of Bayesian networks of claim 10 wherein said HSBN score reflects the goodness of the corresponding HSBN at predicting observed data representing states of said observed variables.
12. The mixture of Bayesian networks of claim 11 wherein said HSBN score is computed by said mixture of Bayesian networks.
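Claims 9 through 12 make the combiner concrete: each HSBN's inference output is multiplied by an HSBN score, which tracks both the likelihood of the corresponding hidden-variable state (claim 10) and the HSBN's goodness at predicting observed data (claim 11). A minimal sketch follows, reusing the `HSBN` stub from the earlier example and assuming, plausibly but not per the patent's text, that the score is a normalized data log-likelihood.

```python
import numpy as np

def hsbn_scores(hsbns, data):
    """Score each HSBN by how well it predicts the observed data (claims 10-11).
    Normalized scores play the role of P(C = c). Log-space avoids underflow."""
    logs = np.array([sum(np.log(h.likelihood(x)) for x in data) for h in hsbns])
    w = np.exp(logs - logs.max())
    return w / w.sum()

def weighted_inference(hsbns, scores, obs):
    """The weight multiplier of claim 9: weight each HSBN's inference output
    by its HSBN score and combine into a single MBN inference output."""
    return float(sum(s * h.likelihood(obs) for s, h in zip(scores, hsbns)))
```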
13. The mixture of Bayesian networks of claim 1 wherein the number of said HSBNs in said mixture of Bayesian networks is selected to optimize the goodness of said mixture of Bayesian networks at predicting observed data representing states of said observed variables.
14. The mixture of Bayesian networks of claim 1 wherein the nodes of different ones of said HSBNs represent the same set of hidden and observed variables.
15. The mixture of Bayesian networks of claim 14 wherein said probability parameters in different ones of said HSBNs differ to reflect different states of said common external hidden variable represented by the different ones of said HSBNs.
16. The mixture of Bayesian networks of claim 15 wherein the causal links in different ones of said HSBNs differ in reflecting different states of said common external hidden variable represented by the different ones of said HSBNs.
17. A method using a set of observed data for training a speech recognition network including a plurality of mixtures of Bayesian networks (MBNs), each of said MBNs encoding the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech and having a plurality of hypothesis-specific Bayesian networks (HSBNs), each of said HSBNs encoding the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech and given a hidden common variable being in a respective one of its states, each one of said HSBNs having plural nodes corresponding to the plural elements of the acoustic observations and storing probability parameters corresponding to said probabilities with causal links representing dependencies between ones of said nodes, said method of training comprising:
for each one of said HSBNs conducting a parameter search for a set of changes in said probability parameters which improves the goodness of said one HSBN in predicting said observed data, and modifying the probability parameters of said one HSBN accordingly;
for each one of said HSBNs, computing a structure score of said one HSBN reflecting the goodness of said one HSBN in predicting said observed data, conducting a structure search for a change in said causal links which improves said structure score, and modifying the causal links of said one HSBN accordingly.
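The training method alternates, per HSBN, a parameter search over the probability parameters and a structure search over the causal links, each judged by how well the network predicts the observed data. The schematic loop below captures that alternation; `parameter_step`, `structure_step`, and `score` are hypothetical stand-ins for the patent's procedures, and the stopping tests echo the convergence criteria of the dependent claims that follow.

```python
def train_hsbn(hsbn, data, parameter_step, structure_step, score,
               max_rounds=20, tol=1e-6):
    """Alternate parameter and structure search for one HSBN (claim 17)."""
    best = score(hsbn, data)
    for _ in range(max_rounds):
        parameter_step(hsbn, data)            # adjust probability parameters
        changed = structure_step(hsbn, data)  # try changes to causal links
        current = score(hsbn, data)
        # Stop if no link changed (claim 27) or the structure score stopped
        # improving since the prior repetition (claim 26).
        if not changed or current <= best + tol:
            break
        best = current
    return hsbn
```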
18. The method of claim 17 wherein the step of computing said structure score comprises:
computing from said observed data expected complete model sufficient statistics (ECMSS);
computing from said ECMSS sufficient statistics for said one HSBN;
computing said structure score from said sufficient statistics.
19. The method of claim 18 wherein the step of computing said ECMSS comprises:
computing the probability of each combination of the states of the discrete hidden and observed variables;
forming a vector for each observed case in said set of observed data, each entry in said vector corresponding to a particular one of the combinations of the states of said discrete variables; and
summing the vectors over plural cases of said observed data.
20. The method of claim 19 wherein the step of forming a vector is such that each entry in said vector is formed to have plural sub-entries comprising:
(a) the probability of the one combination of the states of the discrete variables, (b) sub-entry vectors representing the states of the continuous variables.
21. The method of claim 20 wherein each sub-entry is formed such that said sub-entry vector has a vector multiplier corresponding to the probability of the one combination of the states of the discrete variables.
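Claims 19 through 21 spell out the ECMSS computation: for each training case, form a vector with one entry per joint configuration of the discrete variables, each entry holding that configuration's probability (obtained by inference in the mixture, per claim 23) together with probability-weighted copies of the continuous values; then sum these vectors over the cases. The sketch below follows that recipe; `posterior` is a hypothetical inference routine, and the weighted outer-product accumulator is an added assumption needed for the scatter statistic of claim 22.

```python
import numpy as np
from itertools import product

def compute_ecmss(cases, discrete_card, posterior):
    """Expected complete-model sufficient statistics (claims 19-21).

    cases         -- list of (discrete_obs, continuous_values) training cases
    discrete_card -- cardinality of each discrete hidden or observed variable
    posterior     -- posterior(d_obs, config): probability of a full discrete
                     configuration given the case (inference in the MBN)
    """
    configs = list(product(*(range(k) for k in discrete_card)))
    n = len(cases[0][1])
    # Per configuration: [probability mass, weighted sum, weighted outer sum]
    ecmss = {c: [0.0, np.zeros(n), np.zeros((n, n))] for c in configs}
    for d_obs, x in cases:
        x = np.asarray(x, dtype=float)
        for c in configs:
            p = posterior(d_obs, c)
            ecmss[c][0] += p                   # sub-entry (a): probability
            ecmss[c][1] += p * x               # sub-entry (b): weighted values
            ecmss[c][2] += p * np.outer(x, x)  # assumed: needed for the scatter
    return ecmss
```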
22. The method of claim 21 wherein the step of computing sufficient statistics from said ECMSS comprises computing from said ECMSS the following:
(a) mean, (b) scatter, (c) sample size.
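Assuming the three accumulators of the ECMSS sketch above, claim 22's statistics fall out in closed form:

```python
import numpy as np

def sufficient_statistics(entry):
    """Mean, scatter, and sample size (claim 22) from one ECMSS entry,
    assumed to hold [mass, weighted_sum, weighted_outer] as sketched above."""
    mass, wsum, wouter = entry
    mean = wsum / mass                               # (a) mean
    scatter = wouter - mass * np.outer(mean, mean)   # (b) scatter about the mean
    return mean, scatter, mass                       # (c) effective sample size
```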
23. The method of claim 20 wherein the probability of the one combination of the states of the discrete variables is computed by inference in said mixture of Bayesian networks.
24. The method of claim 17 wherein the steps of conducting a parameter search and modifying said probability parameters are repeated consecutively until a parameter search convergence criterion is met.
25. The method of claim 23 further comprising:
repeating the steps of conducting a parameter search, computing the structure score and conducting a structure search until a structure search convergence criterion is met.
26. The method of claim 25 wherein said structure search convergence criterion comprises a determination of whether the structure score has worsened since a prior repetition of said structure search step.
27. The method of claim 25 wherein said structure search convergence criterion comprises a determination of whether a current performance of the structure search has changed any of said causal links in the one HSBN.
28. The method of claim 17 further comprising:
repeating the steps of conducting a parameter search, computing the structure score and conducting a structure search until a structure search convergence criterion is met.
29. The method of claim 28 wherein said parameter search convergence criterion is a determination of whether the parameter search has converged at a local optimum.
30. The method of claim 28 wherein said parameter search convergence criterion is a determination of whether the parameter search has been repeated a certain number of times.
31. The method of claim 30 wherein said certain number of times is a set number.
32. The method of claim 30 wherein said certain number of times is a function of the number of times the structure search has been repeated.
33. The method of claim 30 wherein said parameter search convergence criterion limits the repetition of said parameter search to a limited number of repetitions and wherein said parameter search is repeated after convergence of said structure search.
34. The method of claim 17 wherein the step of conducting a structure search comprises:
attempting different modifications to said causal links at each node of said one HSBN;
for each one of said different modifications, computing the structure score of the one HSBN;
saving those modifications providing improvements to said structure score.
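Claim 34 describes the structure search as greedy local search: attempt link modifications at each node, rescore, and keep only the improvements. A schematic version follows, in which `candidate_changes`, `apply_change`, `undo_change`, and `structure_score` are hypothetical hooks rather than the patent's actual routines.

```python
def structure_search(hsbn, data, candidate_changes, apply_change,
                     undo_change, structure_score):
    """Try causal-link modifications at each node; keep improvements (claim 34)."""
    improved = False
    best = structure_score(hsbn, data)
    for node in hsbn.nodes:
        for change in candidate_changes(hsbn, node):  # e.g. add/remove/reverse
            apply_change(hsbn, change)
            s = structure_score(hsbn, data)
            if s > best:
                best, improved = s, True     # save the improving modification
            else:
                undo_change(hsbn, change)    # revert a non-improving change
    return improved
```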
35. The method of claim 17 further comprising computing a combined score of said mixture of Bayesian networks from the structure scores of the individual HSBNs.
36. The method of claim 35 further comprising associating said mixture of Bayesian networks with said combined score.
37. The method of claim 36 further comprising choosing a different number of states of said discrete hidden and observed variables and repeating said parameter and structure search steps, to generate different mixtures of Bayesian networks and scores thereof for different numbers of states of said discrete variables.
38. The method of claim 37 further comprising choosing the mixture of Bayesian networks having the highest score.
39. The method of claim 37 further comprising weighting inference outputs of the different mixtures of Bayesian networks in accordance with their individual scores.
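Claims 37 through 39 add model selection over the number of hidden states: train one candidate MBN per state count, then either keep the highest-scoring candidate (claim 38) or weight the candidates' inference outputs by their scores (claim 39). A sketch, with `train_mbn` as a hypothetical trainer returning a (model, score) pair:

```python
def select_or_blend(state_counts, train_mbn, data, obs, blend=False):
    """Model selection over hidden-state counts (claims 37-39)."""
    candidates = [train_mbn(n, data) for n in state_counts]  # (mbn, score)
    if not blend:
        best_mbn, _ = max(candidates, key=lambda ms: ms[1])  # claim 38
        return best_mbn.likelihood(obs)
    total = sum(s for _, s in candidates)
    return sum((s / total) * m.likelihood(obs)               # claim 39
               for m, s in candidates)
```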
40. The method of claim 17 wherein said parameter search is repeated whenever a performance of said structure search results in a change in the structure of said causal links.
41. The method of claim 40 wherein the parameter search is repeated a limited number of times while the structure search is always carried out to convergence.
42. The method of claim 40 wherein the parameter search is repeated to convergence and thereafter the structure search is repeated to convergence.
43. The method of claim 40 wherein the parameter search is repeated a number of times which is a function of the number of times the structure search has been repeated.
44. The method of claim 40 wherein said parameter search is repeated a fixed number of times and said structure search is repeated a fixed number of times.
45. The method of claim 40 wherein the parameter search is repeated to convergence while the structure search is repeated a limited number of times.
46. The method of claim 40 wherein said parameter search is repeated a number of times which is a function of the number of structure searches performed thus far, while the structure search is repeated a fixed number of times.
47. The method of claim 17 further comprising repeating the steps of performing said parameter search and said structure search and interleaving repetitions of said parameter search and said structure search.
48. The method of claim 17 wherein the step of initializing said HSBNs comprises, for each HSBN:
defining a causal link from each hidden variable node to each continuous observed variable node;
initializing the probability parameters in each node.
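Claims 48 through 50 give the searches their starting point: a causal link from every hidden node to every continuous observed node, plus (per claim 50) links from each discrete hidden node to each discrete observed node, with the same initial parameters at every node (claim 49). A sketch over a hypothetical node representation:

```python
def initialize_hsbn(hsbn, initial_params):
    """Initial structure and parameters for one HSBN (claims 48-50)."""
    for h in hsbn.hidden_nodes:
        for o in hsbn.observed_nodes:
            if o.continuous or (h.discrete and o.discrete):
                hsbn.add_link(h, o)         # causal link hidden -> observed
    for node in hsbn.nodes:
        node.params = dict(initial_params)  # same initial values (claim 49)
```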
49. The method of claim 48 wherein the step of initializing the probability parameters employs the same initial probability parameters from node to node.
50. The method of claim 48 wherein the step of defining a causal link further comprises defining a causal link from each discrete hidden variable node to each discrete observed variable node.
51. The method of claim 17 wherein the step of performing the parameter search comprises searching for a change in the probability parameters in each node which improves the performance of said one HSBN in predicting said observed data.
52. The method of claim 17 wherein one of said hidden variables is a common external discrete hidden variable not represented by any node in said mixture of Bayesian networks, and wherein the number of HSBNs in said mixture of Bayesian networks is equal to the number of states of said common external discrete hidden variable.
53. The method of claim 17 further comprising, for each MBN, determining an optimum number m of HSBNs in the MBN, whereby m is different for each MBN.
54. A computer-readable medium storing computer-readable instructions for carrying out the steps of claim 17.