Distributed hash group-by cooperative processing

US 5,655,080 A
Filed: 08/14/1995
Issued: 08/05/1997
Est. Priority Date: 08/14/1995
Status: Expired due to Fees

Abstract

A method is provided for parallel and cooperative processing of data in a system wherein a coordinator process cooperates with one or more agent processes to which portions of the data processing function are off loaded. The agent processes read and process the data and accumulate a partial result. Each agent process, responsive to statistics collected on the content of the data processed, returns a partial result of the processing to the coordinator process. These steps are repeated iteratively until the processing has been completed. In a specific application, the performance of data processing systems is improved by speeding up database group-by queries. The group-by operation processing is distributed between the host central processing unit (CPU) and the input/output (I/O) processors (IOPs). Essentially, the IOPs are sent group-by requests to be performed on a set of disk blocks (extents), along with a predicate that selects the tuples qualifying for the query. Each IOP builds a hash table whose entries hold a group-by element and a running aggregate (a sum, for example). The IOPs retrieve the extents, extract the records, select records using the specified predicate, enter each element in the hash table if it is not already there, and apply the corresponding aggregation function.
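The per-IOP work described in the abstract (scan extents, filter by predicate, hash on the group-by element, update a running sum) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `agent_group_by`, the dictionary record layout, and the sample field names (`dept`, `pay`, `active`) are all assumptions made for illustration, and a Python `dict` stands in for the hash table.

```python
from typing import Callable, Dict, Hashable, List

Record = Dict[str, object]
Extent = List[Record]  # hypothetical stand-in for a set of disk blocks


def agent_group_by(
    extents: List[Extent],
    predicate: Callable[[Record], bool],
    key: str,
    value: str,
) -> Dict[Hashable, float]:
    """Scan extents, keep records matching the predicate, and sum `value` per `key`."""
    table: Dict[Hashable, float] = {}  # hash table: group-by element -> running sum
    for extent in extents:             # retrieve each extent
        for record in extent:          # extract the records
            if not predicate(record):  # select using the predicate
                continue
            group = record[key]
            if group not in table:     # enter the element if not already there
                table[group] = 0
            table[group] += record[value]  # apply the aggregation function (sum)
    return table


# Illustrative data only.
extents = [
    [{"dept": "A", "pay": 10, "active": True}, {"dept": "B", "pay": 5, "active": True}],
    [{"dept": "A", "pay": 7, "active": False}, {"dept": "B", "pay": 3, "active": True}],
]
result = agent_group_by(extents, lambda r: r["active"], "dept", "pay")
print(result)
```

In this sketch the inactive record for group `A` is filtered out before it reaches the hash table, so only selected tuples contribute to the aggregate.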

128 Citations

10 Claims

1. A computer-implemented method for parallel and cooperative processing of data in a distributed data processing system having at least one host central processing unit (CPU) and one or more input/output processors (IOPs), wherein the data is a table of a relational database and wherein a coordinator process on the host CPU cooperates with one or more agent processes on the IOPs, comprising the steps of:
- off loading by the coordinator process a portion of a data processing function to the agent processes;
- responding to the portion of the data processing function off loaded by the coordinator process by reading and processing data from the table of the relational database by the agent processes;
- accumulating, by each of the agent processes, partial results of the processing performed by the particular agent process;
- responsive to statistics collected on the content of the data processed, returning some of the partial results from the agent processes to the coordinator process; and
- iteratively repeating the previous steps until the portion of the data processing function off loaded by the coordinator process has been completed and all partial results of the off loaded data processing function have been returned to the coordinator process.
- Dependent claims (2, 3, 4, 5, 6, 8, 9):
  - 2. The method of claim 1, wherein the result of the processing is a set of elements determined by a group-by query, and wherein the statistics include a partial count of a number of tuples corresponding to each of the elements in the set to be finally returned.
  - 3. The method of claim 2, wherein the elements with the smallest partial counts are returned to the coordinator process.
  - 4. The method of claim 1, wherein at least some of the agent processes are executed on different nodes of the system.
  - 5. The method of claim 4, wherein at least one of the agent processes is executed on a different node than the coordinator process.
  - 6. The method of claim 5, wherein the coordinator process is executed on a central processing unit and the agent processes are executed on one or more input/output processing units.
  - 8. The distributed computer system as recited in claim 1 wherein the result of the processing by each said agent process is a set of elements determined by a group-by query, and wherein the statistics include a partial count of a number of tuples corresponding to each of the elements in the set to be finally returned.
  - 9. The distributed computer system as recited in claim 8 wherein the elements with the smallest partial counts are returned to the coordinator process by each said agent process.
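
Claims 1 to 3 can be read as an agent that keeps a bounded hash table, tracks a partial tuple count per group, and hands the groups with the smallest counts back to the coordinator, repeating until everything has been returned. The Python sketch below illustrates one such reading. The trigger used here (flush when the table is full and a new group arrives), the parameters `capacity` and `flush_n`, and the names `run_agent` and `merge` are assumptions for illustration; the claims only state that the return is responsive to the collected statistics.

```python
from typing import Dict, Iterable, Iterator, Tuple

Partial = Tuple[int, int]  # (running sum, partial tuple count)


def merge(into: Dict[str, Partial], batch: Dict[str, Partial]) -> None:
    """Coordinator side: fold a batch of partial results into the running totals."""
    for key, (s, c) in batch.items():
        prev_s, prev_c = into.get(key, (0, 0))
        into[key] = (prev_s + s, prev_c + c)


def run_agent(
    rows: Iterable[Tuple[str, int]], capacity: int, flush_n: int
) -> Iterator[Dict[str, Partial]]:
    """Agent side: yield batches of partial results as the bounded table fills."""
    table: Dict[str, Partial] = {}
    for key, value in rows:
        if key not in table and len(table) >= capacity:
            # Assumed policy: return the flush_n groups with the smallest
            # partial counts, keeping frequently seen groups in memory.
            victims = sorted(table, key=lambda k: table[k][1])[:flush_n]
            yield {k: table.pop(k) for k in victims}
        s, c = table.get(key, (0, 0))
        table[key] = (s + value, c + 1)
    if table:
        yield table  # final iteration: return everything that is left


# Illustrative data only.
rows = [("a", 1), ("a", 2), ("b", 10), ("c", 5), ("a", 3), ("b", 1), ("d", 4)]
batches = list(run_agent(rows, capacity=2, flush_n=1))
totals: Dict[str, Partial] = {}
for batch in batches:
    merge(totals, batch)
print(totals)
```

Because a group flushed early (such as `b` here) may reappear later in the scan, the coordinator has to merge partial results by key rather than treat each batch as final.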

7. A distributed data processing system for parallel and cooperative processing of data in the system, wherein a coordinator process on the system cooperates with one or more agent processes, comprising:
- one or more input/output processors (IOPs), each said input/output processor including an input/output central processing unit and an input/output memory, an agent process running on said input/output processor;
- a host central processing unit (CPU) on which a coordinator process is run, said coordinator process cooperating with each said agent process, said coordinator process off loading a portion of a data processing function to the agent processes running on the input/output processors;
- an input/output bus connecting said main central processing unit with said plurality of input/output processors; and
- at least one direct access storage device connected to each of said plurality of input/output processors, said direct access storage device storing the data as a table of a relational database, each said agent process reading and processing data, accumulating partial results of the processing, and responsive to statistics collected on the content of the data processed, returning some of the partial results from the agent process to the coordinator process, each said agent process iteratively repeating the reading and processing data, accumulating partial results and returning some of the partial results until the processing has been completed and all partial results of the off loaded data processing function have been returned to the coordinator process.
- Dependent claims (10):
  - 10. The distributed computer system recited in claim 7 wherein there are a plurality of input/output processors, each said input/output processor running an agent process cooperating with said coordinator process.
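
The arrangement in claims 7 and 10, with one coordinator and several agents each working on its own share of the stored table, can be approximated in software as follows. This is a rough sketch under stated assumptions: worker threads stand in for IOPs, the round-robin split of extents is a hypothetical assignment policy, and the names `agent_process` and `coordinator` are illustrative. It does not model the I/O bus, the storage devices, or the iterative, statistics-driven return of partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Extent = List[Tuple[str, int]]  # hypothetical (group key, value) rows


def agent_process(extents: List[Extent]) -> Counter:
    """Agent: build a partial sum per group over the extents it was given."""
    partial: Counter = Counter()
    for extent in extents:
        for key, value in extent:
            partial[key] += value
    return partial


def coordinator(extents: List[Extent], n_agents: int) -> Dict[str, int]:
    """Coordinator: off load extents to the agents, then merge their partial results."""
    # Assumed policy: deal the extents out round-robin, one share per agent.
    shares = [extents[i::n_agents] for i in range(n_agents)]
    final: Counter = Counter()
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        for partial in pool.map(agent_process, shares):
            final.update(partial)  # Counter.update adds counts key by key
    return dict(final)


# Illustrative data only.
all_extents = [[("x", 1), ("y", 2)], [("x", 3)], [("y", 4), ("z", 5)], [("z", 1)]]
final_sums = coordinator(all_extents, n_agents=2)
print(final_sums)
```

The merge step is order-independent for a sum, so the coordinator can fold in partial results in whatever order the agents deliver them.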

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Polyzois, Christos Alkiviadis; Dias, Daniel Manual; Hoffman, Roy Louis; King, Richard Pervin; Pinnow, Kurt Walter; Egan, Randy Lynn
Primary Examiner(s)
SHAH, ALPESH

Application Number
US08/514,543
Time in Patent Office
722 Days
Field of Search
395/200.01, 395/200.03, 395/200.09, 395/200.15, 395/200.18, 395/821, 395/842, 395/439, 395/600, 395/650, 395/375, 395/700, 395/800, 395/200.05, 395/825-827, 364/131, 364/133, 364/591.01, 364/554
US Class Current
709/202
CPC Class Codes
G06F 16/244: Grouping and aggregation
G06F 2209/509: Offload
G06F 9/5027: the resource being a machin...
