Distributed hash group-by cooperative processing
First Claim
1. A computer-implemented method for parallel and cooperative processing of data in a distributed data processing system having at least one host central processing unit (CPU) and one or more input/output processors (IOPs), wherein the data is a table of a relational database and wherein a coordinator process on the host CPU cooperates with one or more agent processes on the IOPs, comprising the steps of:
off loading by the coordinator process a portion of a data processing function to the agent processes;
responding to the portion of the data processing function off loaded by the coordinator process by reading and processing data from the table of the relational database by the agent processes;
accumulating, by each of the agent processes, partial results of the processing performed by the particular agent process;
responsive to statistics collected on the content of the data processed, returning some of the partial results from the agent processes to the coordinator process; and
iteratively repeating the previous steps until the portion of the data processing function off loaded by the coordinator process has been completed and all partial results of the off loaded data processing function have been returned to the coordinator process.
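The iterative offload, accumulate and return loop of claim 1 can be sketched in Python as below. This is a minimal, single-threaded illustration under stated assumptions, not the patented implementation: the `Agent` class, the `coordinate` function, the `(group_key, value)` row layout and the `max_groups` threshold are all hypothetical, and the "statistic" that triggers an early return is assumed to be the number of distinct groups held in the agent's local table (the claim does not say which statistics are used).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # hypothetical (group_key, value) record layout


class Agent:
    """Hypothetical agent process: accumulates partial SUMs per group key."""

    def __init__(self, max_groups: int) -> None:
        self.max_groups = max_groups  # assumed threshold on the collected statistic
        self.partial: Dict[str, float] = {}
        self.rows_seen = 0  # a simple statistic on the content processed

    def process(self, rows: Iterable[Row]) -> List[Dict[str, float]]:
        """Process one offloaded unit of work; return any partial results flushed."""
        flushed: List[Dict[str, float]] = []
        for key, value in rows:
            self.rows_seen += 1
            self.partial[key] = self.partial.get(key, 0.0) + value
            # Statistics-driven return (assumed rule): once the number of
            # distinct groups exceeds the threshold, ship the partial result.
            if len(self.partial) > self.max_groups:
                flushed.append(self.partial)
                self.partial = {}
        return flushed

    def finish(self) -> Dict[str, float]:
        """Return whatever partial result remains when the work is complete."""
        remaining, self.partial = self.partial, {}
        return remaining


def coordinate(work_units: List[List[Row]], agents: List[Agent]) -> Dict[str, float]:
    """Hypothetical coordinator: offloads work and merges partial results."""
    totals: Dict[str, float] = defaultdict(float)

    def merge(partial: Dict[str, float]) -> None:
        for key, value in partial.items():
            totals[key] += value

    # Offload work units round-robin, merging partial results as they arrive.
    for i, unit in enumerate(work_units):
        for partial in agents[i % len(agents)].process(unit):
            merge(partial)
    # Finish only when every remaining partial result has been returned.
    for agent in agents:
        merge(agent.finish())
    return dict(totals)


units = [[("a", 1.0), ("b", 2.0)], [("a", 3.0), ("c", 4.0)], [("b", 5.0)]]
print(coordinate(units, [Agent(max_groups=1), Agent(max_groups=1)]))
# → {'a': 4.0, 'b': 7.0, 'c': 4.0}
```

For a decomposable aggregate such as SUM, the merged totals do not depend on when each agent chose to return its partial results, which is what makes early, statistics-driven returns safe in this sketch.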
Abstract
A method is provided for parallel and cooperative processing of data in a system wherein a coordinator process cooperates with one or more agent processes to which portions of the data processing function are offloaded. The agent processes read and process the data and accumulate partial results. Each agent process, responsive to statistics collected on the content of the data processed, returns a partial result of the processing to the coordinator process. These steps are repeated iteratively until the processing has been completed. In a specific application, data processing performance is improved by speeding up database group-by queries. The group-by operation processing is distributed between the host central processing unit (CPU) and the input/output (I/O) processors (IOPs). The IOPs are sent group-by requests to be performed on a set of disk blocks (extents), together with a predicate that selects the tuples relevant to the query. Each IOP builds a hash table whose entries pair a group-by element with a running aggregation function (for example, a sum). The IOPs retrieve the extents, extract the records, select records using the specified predicate, enter each group-by element in the hash table if it is not already there, and apply the corresponding aggregation function.
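The IOP-side group-by described in the abstract can be illustrated with a short Python sketch. It is a hypothetical model, not the patent's code: extents are assumed to be lists of disk blocks whose records are already decoded into dictionaries, and the names `iop_group_by`, `group_col`, `agg_col`, `agg` and `init` are illustrative. The aggregation function is passed in so that SUM, MAX and similar running aggregates fit the same loop.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]  # hypothetical decoded tuple: column name -> value
Block = List[Record]     # hypothetical disk block holding decoded records
Extent = List[Block]     # an extent: a run of disk blocks


def iop_group_by(
    extents: Iterable[Extent],
    group_col: str,
    agg_col: str,
    predicate: Callable[[Record], bool],
    agg: Callable[[float, float], float],
    init: float,
) -> Dict[Any, float]:
    """Hypothetical IOP-side group-by: one hash-table entry per group element."""
    table: Dict[Any, float] = {}
    for extent in extents:                 # retrieve each extent
        for block in extent:               # walk its disk blocks
            for record in block:           # extract the records
                if not predicate(record):  # select records with the query predicate
                    continue
                key = record[group_col]
                if key not in table:       # enter the group element if absent
                    table[key] = init
                # apply the running aggregation function
                table[key] = agg(table[key], float(record[agg_col]))
    return table


extents = [
    [[{"dept": "A", "salary": 10, "age": 30}, {"dept": "B", "salary": 20, "age": 20}]],
    [[{"dept": "A", "salary": 5, "age": 45}]],
]
print(iop_group_by(extents, "dept", "salary", lambda r: r["age"] > 25, lambda acc, v: acc + v, 0.0))
# → {'A': 15.0}
```

Passing `max` with an initial value of `float("-inf")` instead yields a per-group maximum; a Python dict plays the role of the IOP's hash table here.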
128 Citations
10 Claims
1. (Identical to the first claim reproduced above.) Dependent claims: 2, 3, 4, 5, 6, 8, 9.
7. A distributed data processing system for parallel and cooperative processing of data in the system, wherein a coordinator process on the system cooperates with one or more agent processes, comprising:
one or more input/output processors (IOPs), each said input/output processor including an input/output central processing unit and an input/output memory, an agent process running on said input/output processor;
a host central processing unit (CPU) on which a coordinator process is run, said coordinator process cooperating with each said agent process, said coordinator process off loading a portion of a data processing function to the agent processes running on the input/output processors;
an input/output bus connecting said host central processing unit with said plurality of input/output processors; and
at least one direct access storage device connected to each of said plurality of input/output processors, said direct access storage device storing the data as a table of a relational database, each said agent process reading and processing data, accumulating partial results of the processing, and responsive to statistics collected on the content of the data processed, returning some of the partial results from the agent process to the coordinator process, each said agent process iteratively repeating the reading and processing data, accumulating partial results and returning some of the partial results until the processing has been completed and all partial results of the off loaded data processing function have been returned to the coordinator process.
Dependent claim: 10.
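The system of claim 7 can be modeled in Python with one thread per IOP. In this hypothetical sketch each thread stands in for an agent on an IOP scanning only the rows on its own direct access storage device, a `queue.Queue` stands in for the input/output bus, and the host-side `coordinator` merges partial results until every agent has signalled completion. The `agent` and `coordinator` names, the `(key, value)` row format and the `max_groups` return threshold are assumptions made for illustration; Python threads do not reproduce the hardware parallelism of real IOPs.

```python
import queue
import threading
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int]  # hypothetical (group key, value) tuple stored on a DASD
Bus = "queue.Queue[Optional[Dict[str, int]]]"  # models the input/output bus


def agent(dasd: List[Row], bus: Bus, max_groups: int) -> None:
    """Hypothetical agent on one IOP: scans only its own DASD."""
    partial: Dict[str, int] = {}
    for key, value in dasd:
        partial[key] = partial.get(key, 0) + value
        if len(partial) >= max_groups:  # assumed statistic: distinct groups held
            bus.put(partial)            # return a partial result over the "bus"
            partial = {}
    if partial:
        bus.put(partial)                # flush what remains
    bus.put(None)                       # sentinel: this agent has finished


def coordinator(dasds: List[List[Row]], max_groups: int = 2) -> Dict[str, int]:
    """Hypothetical host-side coordinator: one agent per IOP, merges results."""
    bus: Bus = queue.Queue()
    threads = [threading.Thread(target=agent, args=(d, bus, max_groups)) for d in dasds]
    for t in threads:
        t.start()
    totals: "Counter[str]" = Counter()
    finished = 0
    while finished < len(threads):  # loop until every agent reports completion
        msg = bus.get()
        if msg is None:
            finished += 1
        else:
            totals.update(msg)      # Counter.update adds the partial sums
    for t in threads:
        t.join()
    return dict(totals)


print(sorted(coordinator([[("x", 1), ("y", 2), ("x", 3)], [("y", 4), ("z", 5)]]).items()))
# → [('x', 4), ('y', 6), ('z', 5)]
```

Because each agent puts its `None` sentinel on the FIFO queue after its last partial result, the coordinator knows that all partial results have been returned once it has seen one sentinel per agent, mirroring the claim's termination condition.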
Specification