Computation of frequent data values
Abstract
Frequent value statistics, such as the most frequent values in a data column, are computed in a database management system. In one aspect, a list of at least N data values is generated from a data set that comprises data values and associated counts, where each count represents the frequency of occurrence of its data value. For a selected data value, the associated count is compared with a threshold. If the count is greater than the threshold and the list has N data values, the least frequently occurring data value and its associated count in the list are replaced with the selected data value and its associated count, and the threshold is modified.
22 Claims
1. A method for generating a list of at least N frequent data values obtained from a data set comprising a plurality of data values and associated counts representative of frequencies of occurrence of said data values, the method comprising:
(a) comparing the associated count of a selected data value with a threshold; and
(b) if said count is greater than said threshold and said list comprises N data values, replacing the least frequently occurring data value and associated count in said list with said selected data value and associated count, and modifying said threshold.
Dependent claims: 2, 3, 4, 5, 6, 7.
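Steps (a) and (b) of claim 1 can be illustrated with a minimal Python sketch. The function name `offer`, the representation of the list as `(value, count)` pairs, and the rule that the modified threshold equals the smallest count left in the list are all assumptions made for illustration; the claim itself does not say how the threshold is modified.

```python
def offer(top, n, value, count, threshold):
    """Apply steps (a) and (b) once and return the (possibly modified) threshold.

    `top` is a list of (value, count) pairs that is updated in place.
    Assumption: the modified threshold is the smallest count left in the list.
    """
    # (a) compare the selected value's count with the threshold
    if count > threshold and len(top) == n:
        # (b) replace the least frequently occurring entry ...
        i = min(range(len(top)), key=lambda k: top[k][1])
        top[i] = (value, count)
        # ... and modify the threshold (assumed rule)
        threshold = min(c for _, c in top)
    return threshold


top = [("a", 5), ("b", 2), ("c", 9)]
threshold = offer(top, 3, "d", 7, threshold=2)
# "b" (count 2) is replaced by "d" (count 7); the threshold rises to 5.
```

Under this assumed rule, any later value whose count is 5 or less is rejected by the single comparison in step (a), without scanning the list.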
8. A method for generating a list of frequent data values obtained from a data set, said data set comprising data values and associated counts, said counts representative of the frequency of occurrence of each said data value in said data set, the method comprising:
(a) comparing said count associated with a selected data value with a threshold; and
(b) if said count is greater than said threshold and said list is full, replacing the least frequently occurring data value and associated count in said list with said selected data value and associated count, and obtaining a new threshold to replace said threshold.
Dependent claims: 9, 10, 12, 13.
11. A method for determining the frequency of data values in a set of data values comprising:
(a) obtaining a data value from among data values in a set of data values;
(b) mapping the obtained data value to a position in an array of counts and incrementing a count value associated with the position;
(c) obtaining the next data value if the count value associated with the obtained data value is less than or equal to a threshold value; and
(d) if the associated count value is greater than the threshold value:
    (i) if a list of most frequent values is not full, writing the obtained data value and associated count value to the list, and if the list is now full, obtaining a new threshold value;
    (ii) if the list of most frequent values is full:
        (A) copying the associated count value of the selected data value to the count value associated with a matching data value found in the list, and if the selected data value is not already in the list, replacing the least frequent data value and associated count value in the list with the selected data value and associated count value; and
        (B) obtaining a new threshold value;
    (iii) obtaining the next data value and returning to step (b).
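The loop in claim 11 can be sketched in Python as follows. This is one possible reading, not the patented implementation: the name `frequent_values`, the use of `hash(value) % array_size` as the mapping in step (b), the starting threshold of 0, and the rule that a new threshold is the smallest count in the list are all assumptions. The list of most frequent values is held in a dictionary so that writing a value that is already present updates its count.

```python
def frequent_values(values, n, array_size=1024):
    """Return up to n (value, count) pairs, most frequent first."""
    counts = [0] * array_size   # array of counts from step (b)
    top = {}                    # list of most frequent values: value -> count
    threshold = 0               # assumed starting threshold

    for value in values:                    # (a) and (iii): obtain each value
        pos = hash(value) % array_size      # (b) assumed mapping; values that
        counts[pos] += 1                    #     collide share one counter
        count = counts[pos]
        if count <= threshold:              # (c) not frequent enough, move on
            continue
        if len(top) < n:                    # (d)(i) list is not full
            top[value] = count
            if len(top) == n:               # list is now full
                threshold = min(top.values())
        else:                               # (d)(ii) list is full
            if value not in top:            # (A) evict the least frequent entry
                least = min(top, key=top.get)
                del top[least]
            top[value] = count              # (A) copy the count into the list
            threshold = min(top.values())   # (B) obtain a new threshold
    return sorted(top.items(), key=lambda kv: kv[1], reverse=True)


result = frequent_values([1, 2, 1, 3, 1, 2, 3, 3, 3, 3], n=2)
# result == [(3, 5), (1, 3)]
```

Because a value enters the list only once its running count exceeds the threshold, most values in a skewed data set are dismissed by the comparison in step (c) alone.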
14. A computer system comprising:
means for selecting a data value and comparing a count associated with said selected unique data value with a threshold;
means for inserting said selected data value and associated count into a list if said count is greater than said threshold and said list is not full;
means for replacing the least frequently occurring data value and associated count in said list with said selected data value and associated count if said count is greater than said threshold, and said list is full; and
means for modifying said threshold.
Dependent claims: 15, 16, 17.
18. A computer readable medium including program instructions for determining a list of frequent data values in a database management system, the program instructions for implementing steps comprising:
selecting a data value and comparing a count associated with said selected data value with a threshold;
inserting said selected data value and associated count into said list if said count is greater than said threshold and said list is not full;
replacing the least frequently occurring data value and associated count in said list with said selected data value and associated count, and modifying said threshold, if said count is greater than said threshold and said list is full.
Dependent claims: 19, 20, 21, 22.
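The insert-or-replace steps of claim 18 can also be sketched over a data set of precomputed `(value, count)` pairs. The sketch below swaps in a binary min-heap from Python's standard `heapq` module to find the least frequent entry; the claim does not name a data structure, so the heap, the function name `top_n`, the starting threshold of 0, and the rule that the threshold becomes the smallest count in a full list are assumptions. Setting the threshold as soon as the list fills is borrowed from step (d)(i) of claim 11.

```python
import heapq


def top_n(pairs, n, threshold=0):
    """Return up to n (count, value) tuples, most frequent first."""
    heap = []  # min-heap of (count, value); heap[0] is the least frequent entry
    for value, count in pairs:
        if count <= threshold:
            continue                              # fails the threshold comparison
        if len(heap) < n:
            heapq.heappush(heap, (count, value))  # list not full: insert
            if len(heap) == n:
                threshold = heap[0][0]            # list just filled (assumed rule)
        else:
            heapq.heapreplace(heap, (count, value))  # list full: replace least frequent
            threshold = heap[0][0]                   # modify the threshold
    return sorted(heap, reverse=True)


pairs = [("red", 4), ("blue", 9), ("green", 1), ("black", 6), ("white", 5)]
best = top_n(pairs, 3)
# best == [(9, "blue"), (6, "black"), (5, "white")]
```

With a heap, each replacement costs O(log N) rather than a scan of the whole list, which matters only when N is large.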
Specification