Method, system, and computer program product for computing histogram aggregations
Abstract
A data record transformation that computes histograms and aggregations quickly for an incoming record stream. The data record transformation computes histograms and aggregations in one step, thereby avoiding the creation of a large intermediate result. The data record transformation operates in a streaming fashion on each record in an incoming record stream. Little memory is required because only one record or a few records are operated on at a time. According to a first embodiment, a method, system, and computer program product for transforming sorted data records is provided. A data transformation unit includes a binning module and a histogram aggregation module. The histogram aggregation module processes each binned and sorted record to form an aggregate record in a histogram format in one step. Data received in each incoming binned and sorted record is expanded and accumulated in an aggregate record for matching group-by fields. According to a second embodiment, a method, system, and computer program product for transforming unsorted data records is provided. An associative data structure holds a collection of partially aggregated histogram records. A histogram aggregation module processes each binned record to form an aggregate record in a histogram format in one step. Input records from the unordered record stream are matched against the collection of partially aggregated histogram records and are expanded and accumulated into the aggregate histogram record having matching group-by fields.
18 Claims
1. A system for transforming a stream of sorted input data records into an aggregate record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each aggregate record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the aggregate record group-by field, wherein the input data records are sorted by the data in the group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:
a binning module that, for each input data record in the stream, evaluates data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and
a histogram aggregation module that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the aggregate record group-by field.
(Dependent claim: 2)
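By way of illustration only, the sorted-stream arrangement recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the record layout `(group_by, field_to_bin, value)`, the bin edges in `BIN_EDGES`, summation as the aggregation, and the names `bin_index` and `aggregate_sorted` are all hypothetical. Because the input is assumed to be sorted by the group-by field, only one aggregate record needs to be held at a time.

```python
from bisect import bisect_right
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (group_by, field_to_bin, value).
Record = Tuple[str, float, float]

# Hypothetical bin edges: bin 0 is < 10, bin 1 is [10, 20), bin 2 is >= 20.
BIN_EDGES = [10.0, 20.0]

# Sentinel meaning "no aggregate record has been started yet".
_NO_KEY = object()


def bin_index(field_to_bin: float) -> int:
    """Binning step: map the field to be binned to a histogram location."""
    return bisect_right(BIN_EDGES, field_to_bin)


def aggregate_sorted(records: Iterable[Record]) -> Iterator[Tuple[str, List[float]]]:
    """Yield one (group_by, histogram) aggregate per run of equal group-by keys.

    Assumes the records arrive sorted by the group-by field, so a change of
    key means the current aggregate record is complete and can be emitted.
    """
    current_key: object = _NO_KEY
    histogram: List[float] = []
    for group_by, field_to_bin, value in records:
        if group_by != current_key:
            if current_key is not _NO_KEY:
                yield current_key, histogram  # previous group is finished
            current_key = group_by
            histogram = [0.0] * (len(BIN_EDGES) + 1)
        # Aggregation step (summation is an assumption of this sketch).
        histogram[bin_index(field_to_bin)] += value
    if current_key is not _NO_KEY:
        yield current_key, histogram  # emit the final group


stream = [
    ("east", 5.0, 1.0),
    ("east", 12.0, 2.0),
    ("east", 7.0, 3.0),
    ("west", 25.0, 4.0),
]
for key, hist in aggregate_sorted(stream):
    print(key, hist)
```

In this sketch the aggregate record is emitted as soon as the group-by value changes, which is what allows the memory footprint to stay at a single histogram regardless of the length of the stream.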
3. A method for transforming a stream of sorted input data records into an aggregate record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each aggregate record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the aggregate record group-by field, wherein the input data records are sorted by the data in the group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the method comprising the steps of:
evaluating, for each input data record in the stream, data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and
for each input data record in the stream, aggregating the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the aggregate record group-by field.
(Dependent claim: 4)
5. A computer program product comprising a computer useable medium having computer program logic recorded thereon for enabling a processor in a computer system to transform a stream of sorted input data records into an aggregate record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each aggregate record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the aggregate record group-by field, wherein the input data records are sorted by the data in the group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the computer program logic comprising:
means for enabling the processor, for each input data record in the stream, to evaluate data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and
means for enabling the processor, for each input data record in the stream, to aggregate the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the aggregate record group-by field.
(Dependent claim: 6)
7. A system for transforming a stream of input data records into a partially aggregated record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each partially aggregated record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the partially aggregated record group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:
a binning module that, for each input data record in the stream, evaluates data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and
a histogram aggregation module that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the partially aggregated record group-by field.
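By way of illustration only, the unsorted-stream arrangement might be sketched with an associative data structure that holds the partially aggregated records. The sketch below rests on assumptions not found in the claims: a Python `dict` stands in for the associative structure, bins are fixed-width (`BIN_WIDTH`, `NUM_BINS`), out-of-range values are clamped to the first or last bin, summation is the aggregation, and the names `bin_index` and `aggregate_unsorted` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record layout: (group_by, field_to_bin, value).
Record = Tuple[str, float, float]

# Hypothetical fixed-width binning: [0, 10), [10, 20), [20, ...).
BIN_WIDTH = 10.0
NUM_BINS = 3


def bin_index(field_to_bin: float) -> int:
    """Binning step: fixed-width bins, clamped to the valid index range."""
    raw = int(field_to_bin // BIN_WIDTH)
    return max(0, min(raw, NUM_BINS - 1))


def aggregate_unsorted(
    records: Iterable[Record],
    partial: Optional[Dict[str, List[float]]] = None,
) -> Dict[str, List[float]]:
    """Fold records into a dict of partially aggregated histograms.

    The dict maps each group-by value to its histogram. Records may arrive
    in any order; each one is matched against the dict by its group-by value.
    """
    if partial is None:
        partial = {}
    for group_by, field_to_bin, value in records:
        histogram = partial.get(group_by)
        if histogram is None:
            # First record for this group: start a new partial aggregate.
            histogram = [0.0] * NUM_BINS
            partial[group_by] = histogram
        # Aggregation step (summation is an assumption of this sketch).
        histogram[bin_index(field_to_bin)] += value
    return partial


partial = aggregate_unsorted([
    ("west", 25.0, 4.0),
    ("east", 5.0, 1.0),
    ("west", 3.0, 2.0),
    ("east", 12.0, 2.0),
])
print(partial)
```

Unlike the sorted case, memory use here grows with the number of distinct group-by values, since every partially aggregated record must remain available until the stream ends. Passing an existing dict as `partial` continues accumulating into it.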
8. A method for transforming a stream of input data records into a partially aggregated record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each partially aggregated record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the partially aggregated record group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the method comprising the steps of:
evaluating, for each input data record in the stream, data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and
for each input data record in the stream, aggregating the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the partially aggregated record group-by field.
(Dependent claim: 9)
10. A system for transforming a stream of sorted input data records into an aggregate record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each aggregate record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the aggregate record group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:
evaluating means for evaluating each input data record in the stream, the evaluating means evaluates data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into;
histogram aggregation means that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the aggregate record group-by field; and
a graphical user-interface means for providing parameters for the histogram aggregation means, the parameters identifying a selected histogram aggregation operation and selected group-by fields.
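By way of illustration only, the parameters supplied to the histogram aggregation means might be represented as a small configuration object. Everything in the sketch below is an assumption: the claim names only a selected aggregation operation and selected group-by fields, so the `HistogramParams` class, its additional fields (`bin_field`, `value_field`, `num_bins`), the operation names in `OPERATIONS`, and the dictionary-based records are hypothetical. For brevity the sketch keeps its aggregates in a `dict` rather than relying on sort order, and it does not model the user interface itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

# An operation combines the current bin contents (None if empty) with a value.
Combine = Callable[[Optional[float], float], float]

# Hypothetical set of selectable aggregation operations.
OPERATIONS: Dict[str, Combine] = {
    "sum": lambda acc, v: v if acc is None else acc + v,
    "count": lambda acc, v: 1.0 if acc is None else acc + 1.0,
    "max": lambda acc, v: v if acc is None else max(acc, v),
}


@dataclass(frozen=True)
class HistogramParams:
    """Hypothetical parameter set, e.g. as collected from a user interface."""
    operation: str                    # key into OPERATIONS
    group_by_fields: Tuple[str, ...]  # selected group-by fields
    bin_field: str                    # assumed: field to be binned
    value_field: str                  # assumed: field holding the value
    num_bins: int                     # assumed: histogram length


def aggregate(
    records: Iterable[Dict[str, Any]],
    params: HistogramParams,
    bin_index: Callable[[Any], int],
) -> Dict[Tuple[Any, ...], List[Optional[float]]]:
    """Aggregate records into per-group histograms according to params."""
    combine = OPERATIONS[params.operation]
    result: Dict[Tuple[Any, ...], List[Optional[float]]] = {}
    for rec in records:
        key = tuple(rec[f] for f in params.group_by_fields)
        hist = result.setdefault(key, [None] * params.num_bins)
        i = bin_index(rec[params.bin_field])
        hist[i] = combine(hist[i], rec[params.value_field])
    return result


records = [
    {"region": "east", "age": 5, "sales": 1.0},
    {"region": "east", "age": 7, "sales": 3.0},
    {"region": "east", "age": 12, "sales": 2.0},
    {"region": "west", "age": 30, "sales": 4.0},
]
params = HistogramParams("max", ("region",), "age", "sales", num_bins=2)
result = aggregate(records, params, lambda age: 0 if age < 10 else 1)
print(result)
```

Keeping the operation and the group-by fields in one immutable object means the aggregation loop does not change when a different operation or grouping is selected. Empty bins are left as `None` here so that an operation such as `max` is not given an arbitrary starting value.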
11. A system for transforming a stream of input data records into a partially aggregated record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each partially aggregated record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the partially aggregated record group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:
an evaluating means that, for each input data record in the stream, evaluates data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and
a histogram aggregation means that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the partially aggregated record group-by field.
(Dependent claims: 12, 13, 14)
15. A computer program product comprising a computer useable medium having computer program logic recorded thereon for enabling a processor in a computer system to transform a stream of input data records into a partially aggregated record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each partially aggregated record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the partially aggregated record group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the computer program logic comprising:
means for enabling the processor, for each input data record in the stream, to evaluate data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and
means for enabling the processor, for each input data record in the stream, to aggregate the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the partially aggregated record group-by field.
(Dependent claim: 16)
17. A system for transforming a stream of sorted input data records into an aggregate record, each input data record including at least one field to be binned and at least one value field, and each aggregate record including an aggregation result field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:
a binning module that, for each input data record in the stream, evaluates data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and
a histogram aggregation module that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value.
18. A system for transforming a stream of input data records into a partially aggregated record, each input data record including at least one field to be binned and at least one value field, and each partially aggregated record including an aggregation result field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:
a binning module that, for each input data record in the stream, evaluates data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and
a histogram aggregation module that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value.
Specification