Method and apparatus for deriving an association rule between data
Abstract
A method is described for finding correlations among a plurality of data, each having two kinds of numerical attributes and a true-false attribute. The method comprises the steps of constituting a plane whose axes are the two numerical attributes, dividing the plane into meshes, and counting, for each mesh (also called a "bucket"), the number of data and the number of data whose true-false attribute represents true. If each mesh is treated as a pixel, the plane can be regarded as a plane image in which the number of data corresponds to brilliance and the number of data whose true-false attribute represents true corresponds to saturation. The method further includes the step of segmenting, according to a predetermined condition θ, an admissible image that is convex along an axis of the plane, in order to find an area with strong correlation. If the segmented area satisfies a condition such as the maximized support rule, the method presents the area to the user. In addition, necessary attributes of the data included in the area are extracted from a database, as required.
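The counting step described in the abstract can be illustrated with a minimal Python sketch. It assumes equal-width buckets over a caller-supplied value range, which the text does not specify; the names `bucket_index` and `build_bucket_counts` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# One datum: (first numerical attribute, second numerical attribute, true-false attribute)
Record = Tuple[float, float, bool]


def bucket_index(x: float, lo: float, hi: float, n: int) -> int:
    """Map x to one of n equal-width buckets over [lo, hi].

    Equal-width bucketing is an assumption of this sketch. Values outside
    the range are clamped to the nearest edge bucket.
    """
    if hi <= lo or n <= 0:
        raise ValueError("need hi > lo and n > 0")
    k = int((x - lo) / (hi - lo) * n)
    return min(max(k, 0), n - 1)


def build_bucket_counts(
    records: Sequence[Record],
    n: int,
    m: int,
    a_range: Tuple[float, float],
    b_range: Tuple[float, float],
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (u, v) for an N x M plane.

    u[i][j] is the number of data in bucket (i, j);
    v[i][j] is the number of those data whose true-false attribute is true.
    """
    u = [[0] * m for _ in range(n)]
    v = [[0] * m for _ in range(n)]
    for a, b, flag in records:
        i = bucket_index(a, a_range[0], a_range[1], n)
        j = bucket_index(b, b_range[0], b_range[1], m)
        u[i][j] += 1
        if flag:
            v[i][j] += 1
    return u, v
```

Calling `build_bucket_counts(records, n, m, a_range, b_range)` makes a single pass over the data and yields the two count grids, which correspond to the u(i, j) and v(i, j) values that the claims store per bucket.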
23 Claims
1. A method for deriving an association rule between a plurality of data in a database, each said data including at least two kinds of numerical attributes and one kind of true-false attribute, comprising the steps of:
- constituting a plane, said plane having two axes corresponding to said two kinds of numerical attributes, divided into N×M buckets, and storing a number u(i, j) of data included in each bucket (i, j) and a number v(i, j) of data whose true-false attribute represents true in each bucket;
- inputting a condition θ;
- segmenting an area S from the plane, wherein the buckets included in said area S maximize ##EQU23##; and
- outputting data included in said segmented area S.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An apparatus for deriving an association rule between a plurality of data in a database, each said data including at least two kinds of numerical attributes and one kind of true-false attribute, comprising:
- means for constituting a plane, said plane having two axes corresponding to said two kinds of numerical attributes, divided into N×M buckets, and storing a number u(i, j) of data included in each bucket (i, j) and a number v(i, j) of data whose true-false attribute represents true in each bucket;
- means for inputting a condition θ;
- means for segmenting an area S from the plane, wherein the buckets included in said area S maximize ##EQU30##; and
- means for outputting data included in said segmented area S.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
15. The apparatus for deriving an association rule as set forth in claim 10, further comprising:
- means for inputting the minimum support number Umin, wherein Umin is the minimum number of data included in the area to be segmented;
- means for comparing the number of data U(S) included in said segmented area S with the minimum support number Umin;
- if said comparison indicates Umin = U(S), then means for outputting said area S as the area to be segmented; and
- if said comparison indicates Umin > U(S) or Umin < U(S), then means for directing said area segmenting means to operate under a new condition θ5.
16. The apparatus for deriving an association rule as set forth in claim 10, further comprising:
- means for inputting minconf, which is the ratio of said number of data whose true-false attribute represents true in the area to be segmented to the number of data in the area to be segmented;
- if minconf = V(S)/U(S) for said segmented area S, where U(S) is the number of data included in said area S, V(S) is the number of data included in said area S and whose true-false attribute represents true, then means for outputting said area S; and
- if minconf < V(S)/U(S) or minconf > V(S)/U(S), then means for directing said area segmenting means to operate under a new condition θ8.
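The quantities U(S) and V(S) that claims 15 and 16 compare against Umin and minconf can be computed directly from the per-bucket counts. The following Python sketch assumes that a segmented area S is represented as a collection of bucket indices (i, j); that representation and the function names are hypothetical, not taken from the claims.

```python
from typing import Iterable, List, Tuple

Bucket = Tuple[int, int]


def region_counts(
    u: List[List[int]], v: List[List[int]], region: Iterable[Bucket]
) -> Tuple[int, int]:
    """Return (U(S), V(S)) for the buckets in `region`.

    U(S) sums u(i, j) over the area; V(S) sums v(i, j) over the area.
    """
    total_u = 0
    total_v = 0
    for i, j in region:
        total_u += u[i][j]
        total_v += v[i][j]
    return total_u, total_v


def confidence(total_u: int, total_v: int) -> float:
    """Return V(S)/U(S), the ratio that claim 16 compares with minconf."""
    if total_u == 0:
        raise ValueError("the ratio is undefined for an area containing no data")
    return total_v / total_u
```

In this sketch the support test of claim 15 reduces to comparing the first returned value with Umin, and the test of claim 16 to comparing `confidence(...)` with minconf; how θ is changed when a comparison fails is not modeled here.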
17. The apparatus for deriving an association rule as set forth in claim 10, further comprising:
- means, for said segmented area S, for calculating an entropy by ##EQU32## where Usum is the number of data over said entire plane, and Vsum is the number of data included in said entire plane whose true-false attribute represents true, and for storing the entropy value in correspondence to said area S;
- means for directing said area segmenting means and said entropy calculating means to operate with a modified condition θ; and
- means for outputting an area S which makes f(U(S), V(S)), stored in said entropy calculating means, maximized.
18. The apparatus for deriving an association rule as set forth in claim 10, further comprising:
- means, for said segmented area S, for calculating an interclass variance by ##EQU33## where Usum is the number of data over said entire plane, and Vsum is the number of data included in said entire plane whose true-false attribute represents true, and for storing the interclass variance value in correspondence to said area S;
- means for directing said area segmenting means and said interclass variance calculating means to operate with a modified condition θ; and
- outputting an area S which makes f(U(S), V(S)), stored in said interclass variance calculating means, maximized.
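Claims 17 and 18 share one control pattern: segment under a condition θ, score the resulting area with a function f(U(S), V(S)), repeat with a modified θ, and output the area with the largest stored score. The Python sketch below illustrates only that loop. The segmentation routine and the scoring function are passed in as parameters because the expressions behind ##EQU30##, ##EQU32## and ##EQU33## are not reproduced in this text; `segment`, `score`, and `sweep_theta` are hypothetical names.

```python
from typing import Callable, Iterable, List, Set, Tuple

Bucket = Tuple[int, int]
Region = Set[Bucket]


def sweep_theta(
    u: List[List[int]],
    v: List[List[int]],
    thetas: Iterable[float],
    segment: Callable[[float], Region],
    score: Callable[[int, int], float],
) -> Tuple[float, float, Region]:
    """Try each theta and return (best score, theta, area) for the best area.

    `segment(theta)` stands in for the area segmenting means and
    `score(U, V)` for f(U(S), V(S)); both are supplied by the caller.
    """
    stored = []  # one (score, theta, area) entry per condition tried
    for theta in thetas:
        region = segment(theta)
        total_u = sum(u[i][j] for i, j in region)
        total_v = sum(v[i][j] for i, j in region)
        stored.append((score(total_u, total_v), theta, region))
    if not stored:
        raise ValueError("at least one theta is required")
    # Compare on the score only, so areas themselves are never ordered.
    return max(stored, key=lambda entry: entry[0])
```

Keeping the scoring function as a parameter lets the same loop serve either claim once the entropy or interclass variance expression is available from the full specification.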
19. A storage device comprising program code means for causing a computer to derive an association rule between a plurality of data in a database, each said data including at least two kinds of numerical attributes and one kind of true-false attribute, said program code means comprising:
- plane constituting program code means, said plane having two axes corresponding to said two kinds of numerical attributes, and divided into N×M buckets, said plane constituting program code means causing the computer to store a number u(i, j) of data included in each bucket (i, j) and a number v(i, j) of data whose true-false attribute represents true in each bucket;
- input program code means for causing the computer to input a condition θ;
- area segmentation program code means for causing the computer to segment an area S from the plane, wherein the buckets included in said area S maximize ##EQU34##
Dependent claims: 20, 21, 22, 23
Specification