Mixtures of Bayesian networks
Abstract
One aspect of the invention is the construction of mixtures of Bayesian networks. Another aspect of the invention is the use of such mixtures of Bayesian networks to perform inferencing. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN, but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable, and each HSBN is based upon the hypothesis that the common external hidden variable is in a corresponding one of those states. In one mode of the invention, the MBN having the highest MBN score is selected for use in performing inferencing. In another mode of the invention, some or all of the MBNs are retained as a collection of MBNs which perform inferencing in parallel, their outputs being weighted in accordance with the corresponding MBN scores and the MBN collection output being the weighted sum of all the MBN outputs. In one application of the invention, collaborative filtering may be performed by defining the observed variables to be choices made among a sample of users and the hidden variables to be the preferences of those users.
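The weighted-combination inference mode described in the abstract can be sketched briefly; this is a hypothetical illustration (function and variable names are ours, not the patent's), assuming each retained MBN has already produced an output and a score:

```python
# Hedged sketch of the MBN-collection mode: the collection output is the
# sum of per-MBN outputs weighted by normalized MBN scores.
# All names are illustrative; the patent does not prescribe this API.

def mbn_collection_output(outputs, scores):
    """Return the score-weighted sum of per-MBN inference outputs."""
    total = float(sum(scores))
    return sum((s / total) * o for s, o in zip(scores, outputs))
```

For example, two MBNs predicting 0.9 and 0.5 with scores 3 and 1 combine to 0.8, since the normalized weights are 0.75 and 0.25.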
82 Claims
1. In a decision support system that receives a set of observed data as an input on a signal-bearing medium, an apparatus for use in applying said set of observed data for improving the structure and parameters of a mixture of Bayesian networks comprising plural hypothesis-specific Bayesian networks (HSBNs) having nodes corresponding to hidden and observed variables, each of said nodes storing a set of parameters and structure representing dependence relationships among said nodes, comprising:
a processor;
a memory having executable instructions stored therein, wherein said processor, in response to said executable instructions:
chooses a number of said HSBNs;
chooses a number of states of said discrete variables, and initializes said HSBNs;
for each one of said HSBNs conducts a parameter search for a set of changes in said parameters which improves the goodness of said one HSBN in predicting said observed data, and modifies the parameters of said one HSBN accordingly;
for each one of said HSBNs, computes a structure score of said one HSBN reflecting the goodness of said one HSBN in predicting said observed data, conducts a structure search for a change in said structure which improves said structure search score, and modifies the structure of said one HSBN accordingly.

(Dependent claims: 2–40)
2. The apparatus of claim 1 wherein the processor computes said structure score of said one HSBN, in that the processor:
computes from said observed data expected complete model sufficient statistics (ECMSS);
computes from said ECMSS sufficient statistics for said one HSBN;
computes said structure score from said sufficient statistics.
3. The apparatus of claim 2 wherein said processor computes said ECMSS, in that the processor:
computes the probability of each combination of the states of the discrete hidden and observed variables;
forms a vector for each observed case in said set of observed data, each entry in said vector corresponding to a particular one of the combinations of the states of said discrete variables; and
sums the vectors over plural cases of said observed data.
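The three ECMSS steps above can be sketched as follows; a minimal illustration assuming a `posterior` callable (supplied by inference over the mixture) that maps an observed case to one probability per combination of discrete-variable states:

```python
def compute_ecmss(cases, posterior):
    """Sketch of the ECMSS computation: form a vector per observed case whose
    entries are the probabilities of each combination of discrete-variable
    states, then sum those vectors over all cases.
    `posterior` is an assumed callable, not part of the patent."""
    totals = []
    for case in cases:
        vec = posterior(case)            # one probability per state combination
        if not totals:
            totals = [0.0] * len(vec)
        totals = [t + v for t, v in zip(totals, vec)]
    return totals
```

The per-case vectors are summed elementwise, so the result has one accumulated fractional count per state combination.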
4. The apparatus of claim 3 wherein said processor forms a vector such that each entry in said vector is formed to have plural sub-entries comprising:
(a) the probability of the one combination of the states of the discrete variables;
(b) sub-entry vectors representing the states of the continuous variables.
5. The apparatus of claim 4 wherein the processor computes the probability of the one combination of the states of the discrete variables by inference in said mixture of Bayesian networks.

6. The apparatus of claim 4 wherein the processor forms each sub-entry such that said sub-entry vector has a vector multiplier corresponding to the probability of the one combination of the states of the discrete variables.
7. The apparatus of claim 6 wherein the processor computes sufficient statistics from said ECMSS in that the processor computes:
(a) mean, (b) scatter, (c) sample size.
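For a continuous variable, the three statistics of claim 7 can be computed from fractionally weighted observations, where each weight is the probability contributed by the ECMSS. A minimal sketch (our names, not the patent's):

```python
def sufficient_statistics(values, weights):
    """Sketch of claim 7's statistics for one continuous variable:
    (a) mean, (b) scatter (weighted sum of squared deviations),
    (c) sample size (sum of the fractional weights)."""
    n = sum(weights)                                        # (c) sample size
    mean = sum(w * x for w, x in zip(weights, values)) / n  # (a) mean
    scatter = sum(w * (x - mean) ** 2                       # (b) scatter
                  for w, x in zip(weights, values))
    return mean, scatter, n
```

With unit weights this reduces to the ordinary sample mean, sum of squared deviations, and count.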
8. The apparatus of claim 1 wherein the processor conducts a parameter search and modifies said parameters repeatedly and consecutively until a parameter search convergence criteria is met.

9. The apparatus of claim 1, in that the processor repeatedly conducts a parameter search, computes the structure score and conducts a structure search until a structure search convergence criteria is met.

10. The apparatus of claim 8 in that the processor repeatedly conducts a parameter search, computes the structure score and conducts a structure search until a structure search convergence criteria is met.

11. The apparatus of claim 9 wherein the processor determines whether the parameter search has converged to a local optimum by said parameter search convergence criteria.

12. The apparatus of claim 9 wherein said processor determines whether the parameter search has been repeated a certain number of times by said parameter search convergence criteria.

13. The apparatus of claim 12 wherein said certain number of times is a set number.

14. The apparatus of claim 12 wherein said certain number of times is a function of the number of times the structure search has been repeated.

15. The apparatus of claim 12 wherein said parameter search convergence criteria limits the repetition of said parameter search to a limited number of repetitions and wherein said parameter search is repeated after convergence of said structure search.

16. The apparatus of claim 10 wherein said structure search convergence criteria comprises a determination of whether the structure score has worsened since a prior repetition of said structure search step.

17. The apparatus of claim 10 wherein said structure search criteria comprises a determination of whether a current performance of the structure search has changed any of said structure in the one HSBN.
18. The apparatus of claim 1 wherein the processor conducts a structure search, in that the processor:
attempts different modifications to said structure at each node of said one HSBN;
computes the structure score of the one HSBN for each one of said different modifications; and
saves those modifications providing improvements to said structure score.
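The loop in claim 18 — attempt a modification, rescore, and save it only on improvement — amounts to a greedy hill climb. A sketch, assuming the candidate edits and the scoring function are supplied externally (all names are illustrative):

```python
def greedy_structure_search(structure, candidate_edits, score):
    """Sketch of claim 18: attempt each candidate modification, compute the
    structure score of the result, and save only modifications that improve
    the score. Each entry of `candidate_edits` is an assumed callable that
    returns a modified copy of the structure."""
    best, best_score = structure, score(structure)
    for edit in candidate_edits:
        trial = edit(best)
        trial_score = score(trial)
        if trial_score > best_score:      # save only improving modifications
            best, best_score = trial, trial_score
    return best, best_score
```

Edits that fail to raise the score are simply discarded, so the accepted structure's score never decreases.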
19. The apparatus of claim 1 further comprising instructions, in response to which the processor computes a combined score of said mixture of Bayesian networks from the structure scores of the individual HSBNs.

20. The apparatus of claim 19 further comprising instructions in response to which the processor associates said mixture of Bayesian networks with said combined score.

21. The apparatus of claim 20 further comprising instructions in response to which the processor chooses a different number of states of said discrete hidden and observed variables, repeats said parameter and structure search steps, to generate a different mixture of Bayesian networks and scores thereof for different numbers of states of said discrete variables.

22. The apparatus of claim 21 further comprising instructions in response to which the processor chooses one of the mixtures of Bayesian networks having the highest score.

23. The apparatus of claim 21 further comprising instructions in response to which the processor weights inference outputs of the different mixtures of Bayesian networks in accordance with their individual scores.

24. The apparatus of claim 1 wherein the processor conducts said parameter search in that said parameter search is repeated whenever a performance of said structure search results in a change in the structure of said HSBN.

25. The apparatus of claim 24 wherein the processor conducts said parameter search in that said parameter search is repeated a limited number of times while the structure search is always carried out to convergence.

26. The apparatus of claim 24 wherein the processor conducts said parameter search in that said parameter search is repeated to convergence and thereafter the structure search is repeated to convergence.

27. The apparatus of claim 24 wherein the processor conducts said parameter search in that said parameter search is repeated by a number of times which is a function of the number of times the structure search has been repeated.

28. The apparatus of claim 24 wherein the processor conducts said parameter search in that said parameter search is repeated a fixed number of times and said structure search is repeated a fixed number of times.

29. The apparatus of claim 24 wherein said processor conducts said parameter search in that said parameter search is repeated to convergence while the structure search is repeated a limited number of times.

30. The apparatus of claim 24 wherein said processor conducts said parameter search in that said parameter search is repeated a number of times which is a function of the number of structure searches performed thus far, while the structure search is repeated a fixed number of times.
31. The apparatus of claim 1 further comprising instructions in response to which the processor repeats the steps of performing said parameter search and said structure search and interleaves repetitions of said parameter search and said structure search.
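Claims 8–17 and 24–31 describe alternating the two searches under various stopping rules. One possible interleaving is sketched below; the step functions and the `changed` flag convention are assumptions of ours, not the patent's:

```python
def train_hsbn(model, parameter_step, structure_step, max_rounds=100):
    """Sketch of the interleaved search of claim 31: alternate a parameter
    search and a structure search, stopping when the structure search no
    longer changes the model (cf. claim 17) or a repetition limit is hit.
    Each step function is assumed to return (new_model, changed_flag)."""
    for _ in range(max_rounds):
        model, _ = parameter_step(model)          # parameter search
        model, changed = structure_step(model)    # structure search
        if not changed:                           # convergence criterion
            break
    return model
```

Other stopping rules in the claims (score worsening, fixed repetition counts, counts that depend on the number of structure searches so far) would replace the `changed` test.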
32. The apparatus of claim 1 wherein said processor initializes said HSBNs, in that for each HSBN the processor:

defines a structural link from each discrete hidden variable node to each observed variable node and from each continuous hidden variable node to each continuous observed variable node; and
initializes the parameters in each node.

33. The apparatus of claim 32 wherein the processor initializes the parameters in that said processor employs the same initial parameters from node to node.

34. The apparatus of claim 32 wherein said processor initializes said parameters, in that the processor:

removes hidden nodes and adjacent arcs from the HSBN;
determines the maximum a posteriori (MAP) configuration data;
creates a conjugate distribution for said MAP parameters;
for each observed node in said HSBN and for each MAP configuration of the observed node's parents, initializes the parameters of the local distribution family of said observed node from said conjugate distribution;
for each hidden discrete node in said HSBN and for each MAP configuration of said hidden discrete node's parents, if any, initializes the parameters of the local distribution family of said hidden discrete node to be a fixed distribution.

35. The apparatus of claim 34 wherein said HSBN contains no hidden continuous variables.

36. The apparatus of claim 32 wherein the processor initializes said parameters in that the processor initializes said parameters randomly.

37. The apparatus of claim 36 wherein said HSBN contains at least one hidden continuous variable.

38. The apparatus of claim 36 wherein the processor initializes said parameters randomly in that the processor:

(a) sets the parameters of said HSBN to be equal; and
(b) draws the parameters from a Dirichlet distribution.
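Drawing a discrete node's parameter vector from a Dirichlet distribution, as in claim 38(b), can be sketched with the standard normalized-gamma construction (a hypothetical helper, not from the patent):

```python
import random

def draw_dirichlet(k, alpha=1.0, rng=None):
    """Sketch of claim 38(b): draw a length-k parameter vector from a
    symmetric Dirichlet(alpha) distribution by normalizing k independent
    gamma draws. The resulting entries are positive and sum to one, so
    they form a valid discrete probability table."""
    rng = rng or random.Random()
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]
```

With `alpha = 1.0` this samples uniformly over the probability simplex; larger `alpha` concentrates the draws near the uniform distribution.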
39. The apparatus of claim 1 wherein the processor performs a parameter step in that the processor searches for a change in the parameters in each node which improves the performance of said one HSBN in predicting said observed data.

40. The apparatus of claim 1 wherein one of said hidden variables is a common external discrete hidden variable not represented by any node in said mixture of Bayesian networks, and wherein the number of HSBNs in said mixture of Bayesian networks is equal to the number of states of said common external discrete hidden variable.

41. In a decision support system that receives a set of observed data as an input on a signal-bearing medium, an apparatus for finding the likeliest number of states of hidden discrete variables in a mixture of Bayesian networks comprising plural hypothesis-specific Bayesian networks (HSBNs) having nodes corresponding to hidden and observed variables, each of said nodes storing a structure and a set of parameters representing causal relationships among said nodes, said apparatus comprising:

a processor;
a memory having executable instructions stored therein; and
wherein said processor, in response to said executable instructions:
chooses successive numbers of states of said discrete hidden and observed variables, and for each one of said successive numbers of states:
initializes said HSBNs;
conducts, for each one of said HSBNs, a parameter search for a set of changes in said parameters which improves the goodness of said one HSBN in predicting said observed data, and modifies the parameters of said one HSBN accordingly;
computes, for each one of said HSBNs, a structure score of said one HSBN reflecting the goodness of said one HSBN in predicting said observed data, conducts a structure search for a change in said structure which improves said structure search score, and modifies the structure of said one HSBN accordingly;
computes a combined score of the mixture of Bayesian networks corresponding to the current number of states of said discrete variables; and
chooses the mixture of Bayesian networks having the best score.

(Dependent claims: 42–73)
42. The apparatus of claim 41 wherein the processor computes said structure score of said one HSBN, in that the processor:
computes from said observed data expected complete model sufficient statistics (ECMSS);
computes from said ECMSS sufficient statistics for said one HSBN;
computes said structure score from said sufficient statistics.
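The outer loop of claim 41 — train a mixture for each candidate number of hidden states, score it, and keep the best — reduces to an argmax over candidates. A sketch, with `train_and_score` an assumed callable standing in for the whole per-count training procedure:

```python
def best_number_of_states(candidate_counts, train_and_score):
    """Sketch of claim 41's outer loop: for each candidate number of states
    of the hidden variable, build and score a mixture of Bayesian networks,
    keeping the mixture with the best combined score.
    `train_and_score(k)` is assumed to return (mixture, combined_score)."""
    best_model, best_score = None, float("-inf")
    for k in candidate_counts:
        model, score = train_and_score(k)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```

Because the number of HSBNs equals the number of states of the common external hidden variable, this loop simultaneously selects the number of component networks in the mixture.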
44. The apparatus of claim 41 wherein the processor computes said ECMSS in that the processor:

computes the probability of each combination of the states of the discrete hidden and observed variables;
forms a vector for each observed case in said set of observed data, each entry in said vector corresponding to a particular one of the combinations of the states of said discrete variables; and
sums the vectors over plural cases of said observed data.

45. The apparatus of claim 44 wherein the processor forms a vector such that each entry in said vector is formed to have plural sub-entries comprising:

(a) the probability of the one combination of the states of the discrete variables;
(b) sub-entry vectors representing the states of the continuous variables.

46. The apparatus of claim 45 wherein the probability of the one combination of the states of the discrete variables is computed by inference in said mixture of Bayesian networks.

47. The apparatus of claim 45 wherein each sub-entry is formed such that said sub-entry vector has a vector multiplier corresponding to the probability of the one combination of the states of the discrete variables.

48. The apparatus of claim 43 wherein the processor computes sufficient statistics from said ECMSS in that the processor computes:

(d) mean, (e) scatter, (f) sample size.
49. The apparatus of claim 41 wherein the processor conducts a parameter search and modifies said parameters repeatedly and consecutively until a parameter search convergence criteria is met.

50. The apparatus of claim 41, in that the processor repeatedly conducts a parameter search, computes the structure score and conducts a structure search until a structure search convergence criteria is met.

51. The apparatus of claim 45 in that the processor repeatedly conducts a parameter search, computes the structure score and conducts a structure search until a structure search convergence criteria is met.

52. The apparatus of claim 50 wherein the processor determines whether the parameter search has converged to a local optimum by said parameter search convergence criteria.

53. The apparatus of claim 50 wherein said processor determines whether the parameter search has been repeated a certain number of times by said parameter search convergence criteria.

54. The apparatus of claim 53 wherein said certain number of times is a set number.

55. The apparatus of claim 53 wherein said certain number of times is a function of the number of times the structure search has been repeated.

56. The apparatus of claim 53 wherein said parameter search convergence criteria limits the repetition of said parameter search to a limited number of repetitions and wherein said parameter search is repeated after convergence of said structure search.

57. The apparatus of claim 51 wherein said structure search convergence criteria comprises a determination of whether the structure score has worsened since a prior repetition of said structure search step.

58. The apparatus of claim 51 wherein said structure search criteria comprises a determination of whether a current performance of the structure search has changed any of said structure in the one HSBN.

59. The apparatus of claim 41 wherein the processor conducts a structure search, in that the processor:

attempts different modifications to said structure at each node of said one HSBN;
computes the structure score of the one HSBN for each one of said different modifications; and
saves those modifications providing improvements to said structure score.
60. The apparatus of claim 41 further comprising instructions, in response to which the processor computes a combined score of said mixture of Bayesian networks from the structure scores of the individual HSBNs.

61. The apparatus of claim 60 further comprising instructions in response to which the processor associates said mixture of Bayesian networks with said combined score.

62. The apparatus of claim 60 further comprising instructions in response to which the processor chooses a different number of states of said discrete hidden and observed variables, repeats said parameter and structure search steps, to generate a different mixture of Bayesian networks and scores thereof for different numbers of states of said discrete variables.

63. The apparatus of claim 62 further comprising instructions in response to which the processor chooses one of the mixtures of Bayesian networks having the highest score.

64. The apparatus of claim 62 further comprising instructions in response to which the processor weights inference outputs of the different mixtures of Bayesian networks in accordance with their individual scores.

65. The apparatus of claim 41 wherein the processor conducts said parameter search in that said parameter search is repeated whenever a performance of said structure search results in a change in the structure of said HSBN.

66. The apparatus of claim 65 wherein the processor conducts said parameter search in that said parameter search is repeated a limited number of times while the structure search is always carried out to convergence.

67. The apparatus of claim 65 wherein the processor conducts said parameter search in that said parameter search is repeated to convergence and thereafter the structure search is repeated to convergence.

68. The apparatus of claim 65 wherein the processor conducts said parameter search in that said parameter search is repeated by a number of times which is a function of the number of times the structure search has been repeated.

69. The apparatus of claim 65 wherein the processor conducts said parameter search in that said parameter search is repeated a fixed number of times and said structure search is repeated a fixed number of times.

70. The apparatus of claim 65 wherein said processor conducts said parameter search in that said parameter search is repeated to convergence while the structure search is repeated a limited number of times.

71. The apparatus of claim 65 wherein said processor conducts said parameter search in that said parameter search is repeated a number of times which is a function of the number of structure searches performed thus far, while the structure search is repeated a fixed number of times.

72. The apparatus of claim 41 further comprising instructions in response to which the processor repeats the steps of performing said parameter search and said structure search and interleaves repetitions of said parameter search and said structure search.

73. The apparatus of claim 41 wherein said processor performs a parameter search in that the processor searches for a change in the parameters in each node which improves the performance of said one HSBN in predicting said observed data.
74. In a decision support system that receives a set of observed data as an input on a signal-bearing medium, an apparatus for converting said set of observed data into complete statistics for training a mixture of Bayesian networks comprising plural hypothesis-specific Bayesian networks (HSBNs) having nodes corresponding to hidden and observed variables, each of said nodes storing a structure and a set of parameters representing causal relationships among said nodes, said apparatus comprising:

a processor;
a memory having executable instructions stored therein; and
wherein said processor, in response to instructions stored in memory:
chooses a number of states of said discrete hidden and observed variables, and initializes said HSBNs;
conducts a parameter search for each one of said HSBNs for a set of changes in said parameters which improves the goodness of said one HSBN in predicting said observed data, and modifies the parameters of said one HSBN accordingly;
computes a structure score for each one of said HSBNs reflecting the goodness of said one HSBN in predicting said observed data;
conducts a structure search for a change in said structure which improves said structure search score, and modifies the structure of said one HSBN accordingly;
wherein the processor computes a structure score of said one HSBN in that the processor computes from said observed data expected complete model sufficient statistics (ECMSS), computes from said ECMSS sufficient statistics for said one HSBN, computes said structure score from said sufficient statistics;
wherein said processor computes said ECMSS, in that the processor computes the probability of each combination of the states of the discrete hidden and observed variables;
forms a vector for each observed case in said set of observed data, each entry in said vector corresponding to a particular one of the combinations of the states of said discrete variables; and
sums the vectors over plural cases of said observed data whereby to render a complete set of information.

(Dependent claims: 75–78)
75. The apparatus of claim 74 wherein the processor forms a vector such that each entry in said vector is formed to have plural sub-entries comprising:
(a) the probability of the one combination of the states of the discrete variables;
(b) sub-entry vectors representing the states of the continuous variables.
76. The apparatus of claim 75 wherein the probability of the one combination of the states of the discrete variables is computed by inference in said mixture of Bayesian networks.

77. The apparatus of claim 76 wherein each sub-entry is formed such that said sub-entry vector has a vector multiplier corresponding to the probability of the one combination of the states of the discrete variables.

78. The apparatus of claim 77 wherein the processor computes sufficient statistics from said ECMSS in that the processor computes:

(c) mean, (d) scatter, (e) sample size.
79. In a decision support system that receives a set of observed data as an input on a signal-bearing medium, a method for applying the set of observed data for improving the structure and parameters of a mixture of Bayesian networks comprising plural hypothesis-specific Bayesian networks (HSBNs) having nodes corresponding to hidden and observed variables, each of said nodes storing a set of parameters and structure representing dependence relationships among said nodes, said method comprising:
storing the mixture of Bayesian networks in a memory;
receiving the set of observed data as an input on the signal-bearing medium;
choosing a number of said HSBNs;
choosing a number of states of said discrete variables, and initializing said HSBNs;
for each one of said HSBNs conducting a parameter search for a set of changes in said parameters which improves the goodness of said one HSBN in predicting said observed data, and modifying the parameters of said one HSBN accordingly;
for each one of said HSBNs, computing a structure score of said one HSBN reflecting the goodness of said one HSBN in predicting said observed data, conducting a structure search for a change in said structure which improves said structure search score, and modifying the structure of said one HSBN accordingly.

(Dependent claim: 80)
81. In a decision support system that receives a set of observed data as an input on a signal-bearing medium, a method for finding the likeliest number of states of hidden discrete variables in a mixture of Bayesian networks comprising plural hypothesis-specific Bayesian networks (HSBNs) having nodes corresponding to hidden and observed variables, each of said nodes storing a structure and a set of parameters representing causal relationships among said nodes, said method comprising:
storing the mixture of Bayesian networks in a memory;
receiving the set of observed data as an input on the signal-bearing medium;
choosing successive numbers of states of said discrete hidden and observed variables, and for each one of said successive numbers of states;
initializing said HSBNs;
conducting, for each one of said HSBNs, a parameter search for a set of changes in said parameters which improves the goodness of said one HSBN in predicting said observed data, and modifying the parameters of said one HSBN accordingly;
computing, for each one of said HSBNs, a structure score of said one HSBN reflecting the goodness of said one HSBN in predicting said observed data, conducting a structure search for a change in said structure which improves said structure search score, and modifying the structure of said one HSBN accordingly;
computing a combined score of the mixture of Bayesian networks corresponding to the current number of states of said discrete variables; and
choosing the mixture of Bayesian networks having the best score.

(Dependent claim: 82)
Specification