Parallel processing of count distinct values

US 20070239663A1
Filed: 04/06/2006
Published: 10/11/2007
Est. Priority Date: 04/06/2006
Status: Abandoned Application

Abstract

A system and method for efficiently determining the number of distinct values in a column of source data is disclosed. Source data (e.g., source table) may be in the form of rows and columns that represent information. From the source table a count distinct function may be carried out to determine the number of distinct values in one or more columns of the source table. Results from an in memory count distinct function performed by a plurality of parallel query processors may be placed into a results grid. Another aspect of the invention relates to determining how many distinct values fall into each cell of the results grid.

18 Claims

1. A method for performing a count distinct function on values in at least one column of data comprising:
- a) splitting the data into chunks based on the values in the at least one column of data upon which the count distinct function is to be performed, where no value appears in more than one chunk;
  
  b) determining if each chunk is of a size that enables it to fit into available memory, and i) if not, recursively splitting the oversized chunks until each chunk is of a size that enables it to fit into available memory; and
  
  c) performing an in memory count distinct function on each chunk and summing a number of distinct values from each chunk for display in at least one cell of a results grid.
- Dependent claims:
  - 2. The method of claim 1, wherein the at least one cell of the results grid represents one or more rows of the at least one column of data.
  - 3. The method of claim 1, further comprising hashing the data in at least one column of data according to value before splitting the data into chunks.
  - 4. The method of claim 1, wherein a number of cells is 2^n-1, wherein n is a number of dimensions of the results grid.
  - 5. The method of claim 1, wherein the in-memory count distinct function further includes hashing the data in the chunks by value.
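
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: in-memory lists stand in for chunks that a real system would presumably spill to disk, Python's built-in `hash` stands in for whatever hash function a database would use, and the names `count_distinct`, `max_chunk_rows`, `fanout`, and `max_depth` are hypothetical.

```python
from collections import defaultdict


def count_distinct(values, max_chunk_rows, fanout=4, depth=0, max_depth=16):
    """Count distinct values by recursive, value-based partitioning.

    max_chunk_rows, fanout and max_depth are hypothetical tuning knobs,
    not parameters named in the claims.
    """
    values = list(values)

    # Step (b): a chunk small enough for "memory" is counted directly.
    # The max_depth guard stops the recursion for a chunk dominated by one
    # heavily repeated value, which no value-based split can shrink.
    if len(values) <= max_chunk_rows or depth >= max_depth:
        return len(set(values))  # step (c): in-memory count distinct

    # Step (a): route each value to a chunk by hashing it. Equal values hash
    # equally, so no value can land in more than one chunk. Mixing the depth
    # into the hash makes each recursion level redistribute differently.
    chunks = defaultdict(list)
    for value in values:
        chunks[hash((depth, value)) % fanout].append(value)

    # Step (b)(i) and (c): recurse into oversized chunks and sum the counts.
    return sum(
        count_distinct(chunk, max_chunk_rows, fanout, depth + 1, max_depth)
        for chunk in chunks.values()
    )


data = [i % 50 for i in range(1000)]  # 50 distinct values, 20 copies each
print(count_distinct(data, max_chunk_rows=8))  # 50
```

Because the chunks are disjoint by value, the per-chunk counts can simply be added, and each chunk could in principle be handed to a separate worker. The sketch runs them sequentially to stay short.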

6. A method for performing a count distinct function on values in at least one column of data from source data having at least one or more rows and one or more columns, comprising:
- a) assigning a row of the source data to a cell in a grid;
  
  b) creating a hash table based on a value in a column of the row and the cell assigned to the row;
  
  c) splitting the hash table of cell-value pairs into chunks based on the values, where no value appears in more than one chunk;
  
  d) determining if each chunk is of a size that enables it to fit into available memory, and i) if not, recursively splitting the oversized chunks until each chunk is of a size that enables it to fit into available memory; and
  
  e) performing an in memory count distinct function on each chunk and summing a number of distinct values from each chunk for display in at least one cell of a results grid.
- Dependent claims:
  - 7. The method of claim 6, wherein the at least one cell of the results grid represents one or more rows of the at least one column of data.
  - 8. The method of claim 6, wherein a number of cells is 2^n-1, wherein n is a number of dimensions of the results grid.
  - 9. The method of claim 6, wherein the in-memory count distinct function further includes creating another hash table for the chunks by value.
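
The cell-value variant in claim 6 can be sketched in the same hedged way. The example below assumes rows are `(region, customer)` tuples, treats the region as the grid cell, and omits the recursive splitting step for brevity. `count_distinct_by_cell`, `cell_of`, `value_of`, and `num_chunks` are hypothetical names, and a per-chunk Python `set` plays the role of the second hash table mentioned in claim 9.

```python
from collections import defaultdict


def count_distinct_by_cell(rows, cell_of, value_of, num_chunks=4):
    """Return {cell: number of distinct values seen in that cell}."""
    # Assign each row to a cell and route its (cell, value) pair to a chunk
    # by hashing the value only, so all pairs sharing a value (in any cell)
    # end up in the same chunk.
    chunks = defaultdict(list)
    for row in rows:
        cell, value = cell_of(row), value_of(row)
        chunks[hash(value) % num_chunks].append((cell, value))

    # In-memory count distinct per chunk: a set removes repeated pairs, and
    # each surviving pair adds one distinct value to its cell's total.
    totals = defaultdict(int)
    for chunk in chunks.values():
        for cell, _value in set(chunk):
            totals[cell] += 1
    return dict(totals)


rows = [
    ("east", "alice"), ("east", "bob"), ("east", "alice"),
    ("west", "alice"), ("west", "carol"), ("west", "carol"),
]
result = count_distinct_by_cell(rows, cell_of=lambda r: r[0], value_of=lambda r: r[1])
print(sorted(result.items()))  # [('east', 2), ('west', 2)]
```

Splitting on the value alone, not on the cell, is what lets a single pass over each chunk contribute to several cells of the results grid at once.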

10. A relational database system having data storage and one or more processors for performing a count distinct function on values in at least one column of data comprising:
- a) means for splitting the data into chunks based on the values in the column(s) of data upon which the count distinct function is to be performed so that no value appears in more than one chunk;
  
  b) means for determining if each chunk is of a size that enables it to fit into available memory, and i) if not, recursively splitting the chunks until each chunk is of a size that enables it to fit into available memory; and
  
  c) means for performing an in memory count distinct function on each chunk and summing a number of distinct values from each chunk for display in at least one cell of a results grid.
- Dependent claims:
  - 11. The system of claim 10, wherein the at least one cell of the results grid represents one or more rows of the at least one column of data.
  - 12. The system of claim 10, further comprising means for hashing the data in at least one column of data according to value before splitting the data into chunks.
  - 13. The system of claim 10, wherein a number of cells is 2^n-1, wherein n is a number of dimensions of the results grid.
  - 14. The system of claim 10, wherein the means for performing an in-memory count distinct function further includes means for hashing the data in the chunks by value.

15. A relational database system having data storage and one or more processors for performing a count distinct function on values in at least one column of data from source data having at least one or more rows and one or more columns, comprising:
- a) means for assigning a row of the source data to a cell in a grid;
  
  b) means for creating a hash table based on a value in a column of the row and the cell assigned to the row;
  
  c) means for splitting the hash table of cell-value pairs into chunks based on the values, where no value appears in more than one chunk;
  
  d) means for determining if each chunk is of a size that enables it to fit into available memory, and i) if not, a means for recursively splitting the oversized chunks until each chunk is of a size that enables it to fit into available memory; and
  
  e) means for performing an in memory count distinct function on each chunk and summing a number of distinct values from each chunk for display in at least one cell of a results grid.
- Dependent claims:
  - 16. The system of claim 15, wherein the at least one cell of the results grid represents one or more rows of the at least one column of data.
  - 17. The system of claim 15, wherein a number of cells is 2^n-1, wherein n is a number of dimensions of the results grid.
  - 18. The system of claim 15, wherein the in-memory count distinct function further includes means for creating another hash table for the chunks by value.

Current Assignee: Clareos, Inc.
Original Assignee: Clareos, Inc.
Inventors: Dyskant, Raymi
Application Number: US11/398,596
Publication Number: US 20070239663A1
US Class Current: 1/1
CPC Class Codes: G06F 16/24532 of parallel queries
