×

System and method for multiple distinct aggregate queries

  • US 8,005,868 B2
  • Filed: 03/07/2008
  • Issued: 08/23/2011
  • Est. Priority Date: 03/07/2008
  • Status: Active Grant
First Claim
Patent Images

1. A data processor implemented method of executing multiple distinct aggregate type queries, comprising:

  • providing at least one Counting Bloom Filter for each distinct column of an input data stream;

    reviewing count values in the at least one Counting Bloom Filter for the existence of duplicates in each distinct column;

    using a distinct hash operator to remove duplicates from each distinct column of the input data stream, thereby removing the need for replicating the input data stream and minimizing distinct hash operator processing;

    in a build phase, during execution of a group by operation, creating for each group of tuples a Counting Bloom Filter for each distinct column, hashing the values of each distinct column into their respective Counting Bloom Filters, and incrementing the Counting Bloom Filter bit settings; and

    in a probe phase, once the group by operation is finished, reviewing the count values in the Counting Bloom Filter for each distinct column, and if the count value for a distinct column is greater than one, then sending the identified duplicate values to the distinct hash operator,wherein in the probe phase, reviewing the count values in the Counting Bloom Filter comprises;

    for each tuple in a group, querying the values of the distinct columns in their respective Counting Bloom Filters;

    when the lowest of the counters for a given value is one, then determining that the value is unique, and passing the value to the aggregate operator and bypassing the distinct hash operator;

    probing a value into a distinct hash table if the value is not bypassed, andwhen a match is found, discarding the value;

    when a match is not found, turning the probing into an insertion;

    after finishing the processing of the tuples in group, traversing the distinct hash tables and flowing the distinct values up to the aggregate operator, andwherein the distinct hash operator comprises a set of hash tables, one per distinct column, and the method further comprises;

    sizing the hash table depending on an estimated number of distinct values per group, per distinct column, that are not bypassed to the aggregate operator; and

    when the estimated number of incoming distinct values for the next group is different, then resizing the hash table in dependence upon the estimated number.

View all claims
  • 4 Assignments
Timeline View
Assignment View
    ×
    ×