Computing frequency distribution for many fields in one pass in parallel
First Claim
1. A computer-implemented method for determining a frequency distribution for a set of records, comprising:
building a count table of frequency distributions in memory for each field in the set of records, wherein each record of each count table includes a field identifier, a field value, and a count of a number of times the field value occurs in the set of records, and wherein the field identifier concatenated with the field value comprises a composite key value;
determining that at least one count table of frequency distributions is approaching a maximum amount of memory allocated to that count table; and
sending the records of the at least one count table that is approaching the maximum amount of memory for sorting and additional counting, wherein the records include composite key values.
Abstract
Provided are techniques for determining a frequency distribution for a set of records. A count table of frequency distributions is built in memory for each field in the set of records, wherein each record of each count table includes a field identifier, a field value, and a count of a number of times the field value occurs in the set of records, and wherein the field identifier concatenated with the field value comprises a composite key value. It is determined that at least one count table of frequency distributions is approaching a maximum amount of memory allocated to that count table. The records of the at least one count table that is approaching the maximum amount of memory are sent for sorting and additional counting, wherein the records include composite key values.
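The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `count_in_one_pass`, `sort_and_count`, `MAX_ENTRIES`, and `SEP` are hypothetical, and the sketch makes simplifying assumptions: a cap on the number of entries stands in for the per-table memory limit, records are dictionaries mapping a field identifier to a value, values are compared by their string form, and field identifiers never contain the separator character.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

MAX_ENTRIES = 3  # hypothetical per-table entry cap standing in for a memory budget
SEP = "\x1f"     # hypothetical separator placed between field identifier and value


def composite_key(field_id, value):
    # Field identifier concatenated with the field value.
    return f"{field_id}{SEP}{value}"


def count_in_one_pass(records, max_entries=MAX_ENTRIES):
    """Build one count table per field; spill a table when it reaches its cap."""
    tables = defaultdict(dict)  # field_id -> {composite key -> count}
    spilled = []                # (composite key, partial count) pairs sent downstream
    for record in records:
        for field_id, value in record.items():
            table = tables[field_id]
            key = composite_key(field_id, value)
            table[key] = table.get(key, 0) + 1
            if len(table) >= max_entries:      # table has reached its cap
                spilled.extend(table.items())  # send its records for sorting
                table.clear()
    for table in tables.values():              # flush what remains at end of input
        spilled.extend(table.items())
    return spilled


def sort_and_count(spilled):
    """Sort by composite key and sum the partial counts of equal keys."""
    ordered = sorted(spilled, key=itemgetter(0))
    result = {}
    for key, group in groupby(ordered, key=itemgetter(0)):
        field_id, value = key.split(SEP, 1)
        result[(field_id, value)] = sum(count for _, count in group)
    return result
```

Because the spilled records carry composite keys, partial counts from every field can share one sort: equal keys end up adjacent, so the additional counting reduces to summing each run. Calling `sort_and_count(count_in_one_pass(records))` yields a dictionary keyed by `(field identifier, value)` pairs.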
24 Claims
1. A computer-implemented method for determining a frequency distribution for a set of records, comprising:
building a count table of frequency distributions in memory for each field in the set of records, wherein each record of each count table includes a field identifier, a field value, and a count of a number of times the field value occurs in the set of records, and wherein the field identifier concatenated with the field value comprises a composite key value;
determining that at least one count table of frequency distributions is approaching a maximum amount of memory allocated to that count table; and
sending the records of the at least one count table that is approaching the maximum amount of memory for sorting and additional counting, wherein the records include composite key values.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer program product for determining a frequency distribution for a set of records, comprising a computer useable medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to:
build a count table of frequency distributions in memory for each field in the set of records, wherein each record of each count table includes a field identifier, a field value, and a count of a number of times the field value occurs in the set of records, and wherein the field identifier concatenated with the field value comprises a composite key value;
determine that at least one count table of frequency distributions is approaching a maximum amount of memory allocated to that count table; and
send the records of the at least one count table that is approaching the maximum amount of memory for sorting and additional counting, wherein the records include composite key values.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A system for determining a frequency distribution for a set of records, comprising:
logic capable of performing operations, the operations comprising:
building a count table of frequency distributions in memory for each field in the set of records, wherein each record of each count table includes a field identifier, a field value, and a count of a number of times the field value occurs in the set of records, and wherein the field identifier concatenated with the field value comprises a composite key value;
determining that at least one count table of frequency distributions is approaching a maximum amount of memory allocated to that count table; and
sending the records of the at least one count table that is approaching the maximum amount of memory for sorting and additional counting, wherein the records include composite key values.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification