Mining of generalized disjunctive association rules
Abstract
The present invention provides a system and a method for mining a new kind of association rule, called disjunctive association rules, where the antecedent or the consequent of a rule may contain disjuncts of terms (X ∨ Y or X ⊕ Y). Such rules are a natural generalisation of the kind of rules that have been mined hitherto. Furthermore, disjunctive association rules are generalised in the sense that the algorithm also mines rules which have disjunctions of conjuncts (C ⇒ (A ∧ B) ∨ (D ∧ E)). Since the number of combinations of disjuncts is explosive, we use clustering to find a generalized subset. The clustering is preferably performed using agglomerative clustering methods for finding the greedy subset.
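The rule forms named in the abstract can be made concrete with a small sketch. It evaluates a rule whose consequent is a disjunction of conjuncts, reading the abstract's example as "C implies (A and B) or (D and E)", and a rule with an exclusive disjunction. The transaction data, the helper names (`has`, `conj`, `disj`, `xor`) and the conventional definitions of support and confidence used here are illustrative assumptions, not taken from the specification.

```python
from typing import Callable, FrozenSet, List

Transaction = FrozenSet[str]
Predicate = Callable[[Transaction], bool]


def has(item: str) -> Predicate:
    """Term: the transaction contains the item."""
    return lambda t: item in t


def conj(*items: str) -> Predicate:
    """Conjunct: the transaction contains every listed item."""
    return lambda t: all(i in t for i in items)


def disj(*preds: Predicate) -> Predicate:
    """Inclusive disjunction: at least one predicate holds."""
    return lambda t: any(p(t) for p in preds)


def xor(p: Predicate, q: Predicate) -> Predicate:
    """Exclusive disjunction: exactly one of the two predicates holds."""
    return lambda t: p(t) != q(t)


def support(ts: List[Transaction], ante: Predicate, cons: Predicate) -> float:
    """Fraction of all transactions satisfying antecedent and consequent."""
    return sum(1 for t in ts if ante(t) and cons(t)) / len(ts)


def confidence(ts: List[Transaction], ante: Predicate, cons: Predicate) -> float:
    """Fraction of antecedent-satisfying transactions that satisfy the consequent."""
    covered = [t for t in ts if ante(t)]
    if not covered:
        return 0.0
    return sum(1 for t in covered if cons(t)) / len(covered)


# Hypothetical toy data, invented for illustration only.
TRANSACTIONS: List[Transaction] = [
    frozenset({"C", "A", "B"}),
    frozenset({"C", "D", "E"}),
    frozenset({"C", "A"}),
    frozenset({"C", "A", "B", "D", "E"}),
    frozenset({"A", "B"}),
]

plain = confidence(TRANSACTIONS, has("C"), conj("A", "B"))
general = confidence(TRANSACTIONS, has("C"), disj(conj("A", "B"), conj("D", "E")))
exclusive = confidence(TRANSACTIONS, has("C"), xor(has("A"), has("D")))
print(plain, general, exclusive)  # 0.5 0.75 0.75
```

On this toy data the conjunctive rule "C implies A and B" holds in half of the transactions containing C, while adding the second conjunct as a disjunct raises the confidence to 0.75, which illustrates why a disjunctive consequent can capture a relationship that a purely conjunctive rule misses.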
38 Citations
42 Claims
1. A method for mining data, wherein said method generates generalized disjunctive association rules to capture local relationships between data items with reference to a given context comprising any arbitrary subset of a set of transactions in order to provide improved data analysis independently of taxonomies, said method comprising:
- generating a list of all possible data items that can influence said context,
- discovering disjunctive association rules for data items in said list that co-occur based on a defined overlap threshold within said context,
- using a cutoff parameter to eliminate trivial data items from said list, wherein said trivial data items create trivial disjunctive association rules having disjunctive antecedents or consequents greater than disjunctive antecedents or consequents of said disjunctive association rules occurring in said defined overlap threshold,
- clustering said data items to form a set of generalized disjunctive rules based on a defined confidence/support threshold, and
- iterating the above steps until all items in said list are covered.
Dependent claims: 2-14
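The steps recited in claim 1 can be sketched as a greedy loop. This is only one possible reading, not the patented procedure: the claim does not define the overlap threshold, the cutoff parameter or the confidence threshold numerically, so the sketch assumes (a) the context is the set of transactions containing one chosen item, (b) overlap is the Jaccard overlap between an item's transactions and those already covered by the growing disjunction, (c) the cutoff drops items that occur in too small a share of the context, and (d) confidence is the share of the context covered by the disjunction. The function name `mine_disjunctive_rules` and all parameter names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def mine_disjunctive_rules(
    transactions: List[FrozenSet[str]],
    context_item: str,
    max_overlap: float = 0.5,  # assumed reading of the "overlap threshold"
    cutoff: float = 0.1,       # assumed reading of the "cutoff parameter"
    min_conf: float = 0.8,     # assumed reading of the "confidence threshold"
) -> List[List[str]]:
    """Return item clusters, each read as: context_item => i1 OR i2 OR ..."""
    # Context: the subset of transactions that contain the context item.
    context = [t for t in transactions if context_item in t]
    if not context:
        return []
    n = len(context)

    # List every other item seen in the context, with the indices of the
    # context transactions in which it occurs.
    cover: Dict[str, Set[int]] = {}
    for idx, t in enumerate(context):
        for item in t:
            if item != context_item:
                cover.setdefault(item, set()).add(idx)

    # Cutoff: drop items that occur in too small a share of the context.
    remaining = {i: c for i, c in cover.items() if len(c) / n >= cutoff}

    rules: List[List[str]] = []
    # Iterate until every listed item has been consumed.
    while remaining:
        # Seed the cluster with the widest-covering item (ties broken by name).
        seed = max(sorted(remaining), key=lambda i: len(remaining[i]))
        cluster = [seed]
        covered = set(remaining.pop(seed))

        # Greedy agglomeration: add the low-overlap item with the largest gain.
        while len(covered) / n < min_conf:
            best, best_gain = None, 0
            for item in sorted(remaining):
                c = remaining[item]
                overlap = len(c & covered) / len(c | covered)
                gain = len(c - covered)
                if overlap <= max_overlap and gain > best_gain:
                    best, best_gain = item, gain
            if best is None:
                break  # no admissible item improves coverage
            cluster.append(best)
            covered |= remaining.pop(best)

        # Keep the cluster only if the disjunction is confident enough.
        if len(covered) / n >= min_conf:
            rules.append(cluster)
    return rules


# Hypothetical toy data, invented for illustration only.
TRANSACTIONS = [
    frozenset({"C", "A", "B"}),
    frozenset({"C", "D", "E"}),
    frozenset({"C", "A"}),
    frozenset({"C", "A", "B", "D", "E"}),
    frozenset({"A", "B"}),
]

print(mine_disjunctive_rules(TRANSACTIONS, "C"))  # [['A', 'D']]
```

On the toy data the result reads as the rule "C implies A or D". Clusters that never reach the confidence threshold are discarded along with their items, which is a simplification chosen to guarantee termination rather than something the claim states.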
15. A system for mining data, and operable for generalized disjunctive association rules to capture local relationships between data items with reference to a given context comprising any arbitrary subset of a set of transactions in order to provide improved data analysis independently of taxonomies, said system comprising:
- means for generating a list of all possible data items that can influence said context,
- means for discovering disjunctive association rules for data items in said list that co-occur based on a defined overlap threshold within said context,
- means for using a cutoff parameter to eliminate trivial data items from said list, wherein said trivial data items create trivial disjunctive association rules having disjunctive antecedents or consequents greater than disjunctive antecedents or consequents of said disjunctive association rules occurring in said defined overlap threshold,
- means for clustering said data items to form a set of generalized disjunctive rules based on a defined confidence threshold, and
- means for iterating the above steps until all items in said list are covered.
Dependent claims: 16-28
29. A computer program product comprising computer readable program code stored on computer readable storage medium embodied therein for mining data, wherein said method generates generalized disjunctive association rules to capture local relationships between data items with reference to a given context comprising any arbitrary subset of a set of transactions in order to provide improved data analysis independently of taxonomies, said computer program product comprising:
- computer readable program code means configured for generating a list of all possible data items that can influence said context,
- computer readable program code means configured for discovering disjunctive association rules for data items in said list that co-occur based on a defined overlap threshold within said context,
- computer readable program code means configured for using a cutoff parameter to eliminate trivial data items from said list wherein said trivial data items create trivial disjunctive association rules having disjunctive antecedents or consequents greater than disjunctive antecedents or consequents of said disjunctive association rules occurring in said defined overlap threshold,
- computer readable program code means configured for clustering of said data items to form a set of generalized disjunctive rules based on a defined confidence threshold, and
- computer readable program code means configured for iterating the above steps until all items in said list are covered.
Dependent claims: 30-42
Specification