Pattern recognition using generalized association rules
Abstract
A method and system for predicting an unknown value of an attribute of interest of a given item from a population of items, each item in the population having a plurality of variable attributes including the attribute of interest. Known attribute values regarding a training sample of items within the population including the attribute of interest are stored in a memory. The stored attribute values are processed to determine association rules regarding the training sample, including at least one generalized association rule, each association rule including one or more conditions on one or more respective attribute values of the items predictive of the value of the attribute of interest, and the at least one generalized rule including a logical combination of a plurality of such conditions using at least one logical operation from the group consisting of disjunction and negation. Data are received from an input device, the data including values of at least some of the attributes of the given item. The association rules including the at least one generalized association rule are applied to the values included in the data so as to predict the unknown value of the attribute of interest of the given item.
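The distinction drawn in the abstract between simple rules (conditions joined by conjunction) and generalized rules (conditions joined by disjunction or negation) can be illustrated with a minimal sketch. The names `Rule`, `equals`, `all_of`, `any_of` and `negate`, the example attributes, and the probabilities are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Item = Dict[str, object]
Condition = Callable[[Item], bool]


def equals(attr: str, value: object) -> Condition:
    """Condition on a single attribute value (hypothetical helper)."""
    return lambda item: item.get(attr) == value


def all_of(*conds: Condition) -> Condition:
    """Conjunction, as used in simple association rules."""
    return lambda item: all(c(item) for c in conds)


def any_of(*conds: Condition) -> Condition:
    """Disjunction, one of the two operations that make a rule generalized."""
    return lambda item: any(c(item) for c in conds)


def negate(cond: Condition) -> Condition:
    """Negation, the other operation that makes a rule generalized."""
    return lambda item: not cond(item)


@dataclass
class Rule:
    condition: Condition
    predicted_value: object
    probability: float  # illustrative number, not derived from real data


# Simple rule: color == red AND size == large  ->  True
simple = Rule(all_of(equals("color", "red"), equals("size", "large")), True, 0.8)
# Generalized rule: color == red OR NOT (size == small)  ->  True
generalized = Rule(
    any_of(equals("color", "red"), negate(equals("size", "small"))), True, 0.7
)

item = {"color": "blue", "size": "large"}
simple_fires = simple.condition(item)
generalized_fires = generalized.condition(item)
```

For this item the simple rule does not apply because the color condition fails, while the generalized rule applies because the size is not small.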
108 Claims
-
1. A method for predicting an unknown value of an attribute of interest of a given item from a population of items, each item in the population having a plurality of variable attributes including the attribute of interest, comprising:
-
storing in a memory known attribute values regarding a training sample of items within the population including the attribute of interest; and
processing the stored attribute values to determine association rules regarding the training sample, including at least one generalized association rule, each association rule comprising one or more conditions on one or more respective attribute values of the items predictive of the value of the attribute of interest, and the at least one generalized rule comprising a logical combination of a plurality of such conditions using at least one logical operation from a group consisting of disjunction and negation.
Dependent claims: 2-45
-
-
13. A method according to claim 12, wherein applying the association rules comprises applying both the simple and the at least one generalized association rules jointly to predict the unknown value.
-
14. A method according to claim 13, wherein applying the rules jointly comprises computing a weighted sum of values of the attribute of interest predicted by the rules.
-
15. A method according to claim 14, wherein computing the weighted sum comprises computing probabilities respectively associated with the simple and generalized rules, and weighting the predicted values by the respective probabilities.
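The weighting described in claims 14 and 15 can be sketched as follows. The claims do not specify a formula, so the normalization by the sum of the probabilities is an assumption, and `weighted_prediction` is a hypothetical name.

```python
from typing import List, Tuple


def weighted_prediction(fired: List[Tuple[float, float]]) -> float:
    """Combine (predicted_value, probability) pairs from the rules that apply.

    Assumption: the weights are the rule probabilities, normalized to sum to 1.
    """
    total_weight = sum(prob for _, prob in fired)
    if total_weight == 0:
        raise ValueError("no applicable rule carries a non-zero probability")
    return sum(value * prob for value, prob in fired) / total_weight


# Three applicable rules: two predict 1.0, one predicts 0.0.
fired = [(1.0, 0.9), (0.0, 0.6), (1.0, 0.5)]
score = weighted_prediction(fired)  # (0.9 + 0.5) / 2.0
```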
-
16. A method according to claim 11, wherein finding the association rules comprises finding the at least one generalized association rule by combining a plurality of the simple association rules.
-
17. A method according to claim 16, wherein combining the plurality of the simple association rules comprises combining the rules to find a generalized rule which includes a disjunction of two or more of the simple rules.
-
18. A method according to claim 16, wherein combining the plurality of the simple association rules comprises combining the rules to find a generalized rule which includes a negation of one or more of the simple rules.
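As an illustration of claims 16 to 18, the sketch below combines two simple conditions into a disjunction and a negation and measures each on a toy training sample. The sample, the attribute names and the helper `rule_stats` are hypothetical; how the patent selects which rules to combine is not stated in the claims.

```python
from typing import Callable, Dict, List, Tuple

Item = Dict[str, object]


def rule_stats(
    sample: List[Item], condition: Callable[[Item], bool], target: str, value: object
) -> Tuple[float, int]:
    """Return (probability, number of matching items with the predicted value)."""
    matching = [item for item in sample if condition(item)]
    hits = sum(1 for item in matching if item[target] == value)
    probability = hits / len(matching) if matching else 0.0
    return probability, hits


def cond_a(item: Item) -> bool:
    return item["a"] == 1


def cond_b(item: Item) -> bool:
    return item["b"] == 1


def either(item: Item) -> bool:
    """Generalized condition: disjunction of two simple conditions."""
    return cond_a(item) or cond_b(item)


def neither(item: Item) -> bool:
    """Generalized condition: negation of the disjunction."""
    return not either(item)


sample = [
    {"a": 1, "b": 0, "y": True},
    {"a": 0, "b": 1, "y": True},
    {"a": 1, "b": 1, "y": True},
    {"a": 0, "b": 0, "y": False},
]

stats_a = rule_stats(sample, cond_a, "y", True)
stats_either = rule_stats(sample, either, "y", True)
stats_neither = rule_stats(sample, neither, "y", False)
```

In this toy sample the disjunction covers three items where each simple condition covers two, which is the kind of gain in coverage a generalized rule can offer.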
-
19. A method according to claim 11, wherein determining the simple association rules comprises determining substantially all simple association rules pertaining to the sample having respective probability and support greater than predetermined minimum values thereof.
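A brute-force sketch of the exhaustive search in claim 19 follows. It assumes that support means the number of training items satisfying both the conditions and the predicted value, that thresholds are inclusive, and that rules are limited to `max_conditions` conjoined conditions; none of these details is given in the claims, and `find_simple_rules` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def find_simple_rules(
    sample: List[Dict[str, object]],
    target: str,
    min_support: int,
    min_probability: float,
    max_conditions: int = 2,
) -> list:
    """Enumerate conjunctions of attribute-value conditions that pass both thresholds."""
    attrs = sorted({a for item in sample for a in item if a != target})
    rules = []
    for k in range(1, max_conditions + 1):
        for subset in combinations(attrs, k):
            cond_counts: Counter = Counter()
            joint_counts: Counter = Counter()
            for item in sample:
                # Skip items with unknown values for this attribute subset.
                if target not in item or any(a not in item for a in subset):
                    continue
                cond = tuple((a, item[a]) for a in subset)
                cond_counts[cond] += 1
                joint_counts[(cond, item[target])] += 1
            for (cond, value), support in joint_counts.items():
                probability = support / cond_counts[cond]
                if support >= min_support and probability >= min_probability:
                    rules.append((cond, value, probability, support))
    return rules


sample = [
    {"a": 1, "b": 0, "y": True},
    {"a": 1, "b": 1, "y": True},
    {"a": 1, "b": 0, "y": False},
    {"a": 0, "b": 1, "y": False},
]
rules = find_simple_rules(sample, "y", min_support=2, min_probability=0.6)
```

On this sample only the rule "a == 1 predicts y == True" survives, with support 2 and probability 2/3.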
-
20. A method according to claim 1, wherein processing the attribute values comprises encoding values of the attributes according to the frequency of their occurrence in the training sample.
-
21. A method according to claim 20, wherein encoding the values comprises calculating hash functions.
-
22. A method according to claim 20, wherein encoding the values comprises assigning a distinguishable code to values occurring at less than a predetermined frequency in the training sample, whereby such values are substantially excluded from the determination of the association rules.
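The frequency-based encoding of claims 20 and 22 might look like the sketch below. The code assignment (rank by frequency, with a reserved code for rare values) is an assumption, as are the names `RARE_CODE`, `build_frequency_codes` and `encode`; the hash functions of claim 21 are not modeled here.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable

RARE_CODE = 0  # hypothetical reserved code for infrequent values


def build_frequency_codes(values: Iterable[Hashable], min_count: int) -> Dict[Hashable, int]:
    """Assign codes 1, 2, ... in order of decreasing frequency.

    Values seen fewer than min_count times get no entry and fall back to RARE_CODE.
    """
    counts = Counter(values)
    frequent = [v for v, n in counts.most_common() if n >= min_count]
    return {v: rank + 1 for rank, v in enumerate(frequent)}


def encode(value: Hashable, codes: Dict[Hashable, int]) -> int:
    return codes.get(value, RARE_CODE)


values = ["NY", "NY", "NY", "LA", "LA", "SF"]
codes = build_frequency_codes(values, min_count=2)
encoded = [encode(v, codes) for v in values]
```

A rule search can then skip any condition whose code equals `RARE_CODE`, which is one way to exclude rare values from rule determination.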
-
23. A method according to claim 1, and comprising:
-
receiving data from an input device, the data including values of at least some of the attributes of the given item; and
applying the association rules including the at least one generalized association rule to the values included in the data so as to predict the unknown value of the attribute of interest of the given item.
-
-
24. A method according to claim 23, wherein applying the association rules comprises applying a subset of the rules consisting of rules whose one or more conditions are fulfilled by known values of attributes of the given item other than the attribute of interest.
-
25. A method according to claim 23, wherein processing the attribute values comprises finding probabilities corresponding to the determined association rules, and wherein applying the association rules comprises applying the probabilities to compute a cumulative probability that the attribute of interest has a given value.
-
26. A method according to claim 25, wherein computing the cumulative probability comprises computing a weighted sum of the probabilities corresponding respectively to the association rules applied in predicting the value.
-
27. A method according to claim 25, and comprising determining a probability decision point such that when the cumulative probability is greater than the decision point, the attribute of interest is predicted to have a first value, and when the probability of interest is less than the decision point, the attribute of interest is predicted to have a different, second value.
-
28. A method according to claim 27, wherein determining the decision point comprises defining an ambiguity range of probabilities including the decision point in which the predicted value is ambiguous.
-
29. A method according to claim 28, wherein defining the ambiguity range comprises comparing the training sample and at least a portion of the overall population from which the given item is taken, and determining an extent of the ambiguity range responsive to a measure of the similarity of the training sample and the at least portion of the overall population.
-
30. A method according to claim 27, wherein determining the decision point comprises defining a point such that a total number of prediction errors is minimized.
-
31. A method according to claim 27, wherein an error cost is assigned to each of a plurality of types of prediction errors, and wherein determining the decision point comprises defining a point such that a total cost of prediction errors is minimized.
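The decision point of claims 27, 30 and 31 can be sketched as a threshold search over scored training items. The candidate thresholds (midpoints between observed probabilities, plus 0 and 1) and the function name `choose_decision_point` are assumptions; with equal costs the search minimizes the number of errors as in claim 30, and with unequal costs it minimizes total cost as in claim 31.

```python
from typing import List, Tuple


def choose_decision_point(
    scored: List[Tuple[float, bool]],
    cost_false_positive: float = 1.0,
    cost_false_negative: float = 1.0,
) -> Tuple[float, float]:
    """Return (decision_point, total_cost) minimizing the total cost of errors.

    scored holds (cumulative_probability, true_value) pairs for training items.
    """
    probs = sorted({p for p, _ in scored})
    midpoints = [(lo + hi) / 2 for lo, hi in zip(probs, probs[1:])]
    best_point, best_cost = 0.0, float("inf")
    for point in [0.0] + midpoints + [1.0]:
        cost = 0.0
        for p, truth in scored:
            predicted = p > point  # "greater than the decision point"
            if predicted and not truth:
                cost += cost_false_positive
            elif not predicted and truth:
                cost += cost_false_negative
        if cost < best_cost:
            best_point, best_cost = point, cost
    return best_point, best_cost


scored = [(0.2, False), (0.4, False), (0.6, True), (0.9, True)]
point, cost = choose_decision_point(scored)
```

An ambiguity range as in claim 28 could be layered on top by declining to predict when the cumulative probability falls within some margin of `point`; the width of that margin is not specified in the claims.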
-
32. A method according to claim 23, wherein the items comprise records in a database, and the attributes comprise fields in the records, and wherein applying the association rules comprises predicting the unknown value of a database field.
-
33. A method according to claim 32, wherein predicting the unknown value comprises predicting a Boolean value.
-
34. A method according to claim 23, wherein the items comprise sounds, and the attribute values comprise characteristics of sound signals corresponding to the sounds, and wherein applying the association rules comprises identifying a sound signal.
-
35. A method according to claim 34, wherein identifying the sound signal comprises finding a word corresponding to the signal.
-
36. A method according to claim 34, wherein identifying the sound signal comprises identifying a speaker who generated the sound signal.
-
37. A method according to claim 34, wherein receiving the data comprises receiving data from a microphone.
-
38. A method according to claim 23, wherein the items comprise images, and the attribute values comprise image features, and wherein applying the association rules comprises processing an image.
-
39. A method according to claim 38, wherein processing the image comprises identifying a subject of the image.
-
40. A method according to claim 38, wherein receiving the data comprises receiving data from a camera.
-
41. A method according to claim 38, wherein receiving the data comprises receiving data from a scanner.
-
42. A method according to claim 23, and comprising outputting an indication of the predicted value to an output device.
-
43. A method according to claim 42, wherein outputting the indication comprises displaying the predicted value and a probability thereof.
-
44. A method according to claim 42, wherein outputting the indication comprises controlling an access responsive to the predicted value.
-
45. A method according to claim 42, wherein outputting the indication comprises sorting the given item responsive to the predicted value.
-
46. A method for predicting an unknown value of an attribute of interest of a given item from a population of items, each item in the population having a plurality of variable attributes including the attribute of interest, comprising:
-
storing in a memory known attribute values regarding a training sample of items within the population including the attribute of interest;
processing the attribute values to determine simple association rules regarding the training sample, each simple association rule comprising one or more conditions on one or more respective attribute values of the items predictive of the value of the attribute of interest, such that if the simple association rule includes more than one such condition, the conditions are combined using a logical conjunction operation in defining the conditions of the rule, wherein substantially all simple association rules applicable to the sample having respective probability and support greater than predetermined minimum values thereof are determined.
Dependent claims: 47, 48
receiving data from an input device, the data including values of at least some of the attributes of the given item; and
applying the association rules to the values included in the data so as to predict the unknown value of the attribute of interest of the given item.
-
-
49. A method for predicting an unknown value of an attribute of interest of a given item from a population of items, each item in the population having a plurality of variable attributes including the attribute of interest, comprising:
-
storing in a memory known attribute values regarding a training sample of items within the population including the attribute of interest;
processing the attribute values to determine association rules regarding the training sample, each association rule comprising one or more conditions on one or more respective attribute values of the items predictive of the value of the attribute of interest, wherein the attribute values are processed by constructing a contingency table, each of whose entries corresponds to the number of items in the sample having a given value of the attribute of interest and satisfying a given, respective condition on one or more of the attributes other than the attribute of interest, and wherein the association rules are determined with respect to the contingency table.
Dependent claims: 50-54
receiving data from an input device, the data including values of at least some of the attributes of the given item; and
applying the association rules to the values included in the data so as to predict the unknown value of the attribute of interest of the given item.
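The contingency table of claim 49 can be sketched as a mapping from (condition, value of the attribute of interest) to a count of training items. The dictionary layout, the condition labels and the name `build_contingency_table` are hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Item = Dict[str, object]


def build_contingency_table(
    sample: List[Item],
    target: str,
    conditions: Dict[str, Callable[[Item], bool]],
) -> Dict[Tuple[str, object], int]:
    """Count items per (condition label, target value) pair."""
    table: Dict[Tuple[str, object], int] = defaultdict(int)
    for item in sample:
        for label, predicate in conditions.items():
            if predicate(item):
                table[(label, item[target])] += 1
    return dict(table)


sample = [
    {"a": 1, "b": 0, "y": True},
    {"a": 1, "b": 1, "y": True},
    {"a": 1, "b": 0, "y": False},
    {"a": 0, "b": 1, "y": False},
]
conditions = {
    "a == 1": lambda item: item["a"] == 1,
    "b == 1": lambda item: item["b"] == 1,
}
table = build_contingency_table(sample, "y", conditions)

# Probability of the rule "a == 1 predicts y == True", read off the table.
prob_a_true = table[("a == 1", True)] / (
    table[("a == 1", True)] + table[("a == 1", False)]
)
```

Once the counts are in the table, rule probabilities and supports can be computed without another pass over the training sample.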
-
-
55. A system for predicting an unknown value of an attribute of interest of a given item from a population of items, each item in the population having a plurality of variable attributes including the attribute of interest, comprising:
-
an input device, which receives data indicative of values of at least some of the attributes of the given item;
a memory, which stores association rules regarding the population, the association rules including at least one generalized association rule, each association rule comprising one or more conditions on one or more respective attribute values of the items predictive of the value of the attribute of interest, and the at least one generalized rule comprising a logical combination of such conditions using at least one logical operation from a group consisting of disjunction and negation in defining the conditions of the rule; and
a processor, which receives the data from the input device and reads the association rules from the memory, and which applies the association rules including the at least one generalized association rule to the values included in the data so as to predict the unknown value of the attribute of interest and which generates an output responsive to the prediction.
Dependent claims: 56-80
-
-
81. A system for determining association rules for prediction of an unknown value of an attribute of interest of a given item from a population of items, each item in the population having a plurality of variable attributes including the attribute of interest, comprising:
-
an input device, which receives data indicative of values of attributes of a training sample of items within the population including the attribute of interest;
a memory, which stores the values of the attributes; and
a computer, which reads the values from the memory and determines association rules regarding the population, the association rules including at least one generalized association rule, each association rule comprising one or more conditions on one or more respective attribute values of the items predictive of the value of the attribute of interest, and the at least one generalized rule comprising a logical combination of such conditions using at least one logical operation from a group consisting of disjunction and negation in defining the conditions of the rule, and which stores the association rules in the memory.
Dependent claims: 82-101
-
-
102. A system for determining association rules for prediction of an unknown value of an attribute of interest of a given item from a population of items, each item in the population having a plurality of variable attributes including the attribute of interest, comprising:
-
an input device, which receives data indicative of values of attributes of a training sample of items within the population including the attribute of interest;
a memory, which stores the values of the attributes; and
a computer, which reads the values from the memory and determines simple association rules regarding the population, each simple association rule comprising one or more conditions on one or more respective attribute values of the items predictive of the value of the attribute of interest, such that if the simple association rule includes more than one such condition, the conditions are combined using a logical conjunction operation in defining the conditions of the rule, and which stores the association rules in the memory, wherein substantially all simple association rules applicable to the sample having respective probability and support greater than predetermined minimum values thereof are determined.
Dependent claims: 103
-
-
104. A system for determining association rules for prediction of an unknown value of an attribute of interest of a given item from a population of items, each item in the population having a plurality of variable attributes including the attribute of interest, comprising:
-
an input device, which receives data indicative of values of attributes of a training sample of items within the population including the attribute of interest;
a memory, which stores the values of the attributes; and
a computer, which reads the values from the memory and determines association rules regarding the population, each association rule comprising one or more conditions on one or more respective attribute values of the items predictive of the value of the attribute of interest, and which stores the association rules in the memory, wherein the computer determines the association rules by constructing one or more contingency tables, each of whose entries corresponds to the number of items in the sample having a given value of the attribute of interest and satisfying a given, respective condition on one or more of the attributes other than the attribute of interest.
Dependent claims: 105-108
-
Specification