Asymmetric streaming record data processor method and apparatus
Abstract
An asymmetric data record processor and method include host computers and Job Processing Units (JPUs) coupled together on a network. Each host computer and JPU forms a node on the network. A plurality of software operators allow each node to process streams of records. For each operator in a given sequence within nodes and across nodes, output of the operator is input to a respective succeeding operator. Data processing follows a logical data flow based on readiness of a record. As soon as a record is ready, it is passed for processing from one part to the next part in the logical data flow. The flow of records during data processing is substantially continuous, in a streaming fashion.
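The record-at-a-time flow described in the abstract can be illustrated with a push-based chain of operators. Each operator hands a record to its successor the moment the record is ready, with no intermediate buffer between operators. This is a minimal sketch under stated assumptions, not the patented implementation. The class names `Operator`, `Filter` and `Sum`, the function `scan`, and the `(id, region, amount)` record layout are all hypothetical. The shared `log` exists only to show the order in which records move through the chain.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical record layout for illustration: (id, region, amount).
Record = Tuple[int, str, float]


class Operator:
    """Push-based operator: hands each ready record straight to its successor."""

    def __init__(self, log: List[Tuple[str, int]]) -> None:
        self.next_op: Optional["Operator"] = None
        self.log = log  # shared trace of which operator saw which record, in order

    def then(self, op: "Operator") -> "Operator":
        """Connect a successor and return it so operators can be chained."""
        self.next_op = op
        return op

    def emit(self, record: Record) -> None:
        # No intermediate buffer: the record is forwarded as soon as it is ready.
        if self.next_op is not None:
            self.next_op.push(record)

    def push(self, record: Record) -> None:
        raise NotImplementedError


class Filter(Operator):
    def __init__(self, log: List[Tuple[str, int]], pred: Callable[[Record], bool]) -> None:
        super().__init__(log)
        self.pred = pred

    def push(self, record: Record) -> None:
        self.log.append(("filter", record[0]))
        if self.pred(record):
            self.emit(record)


class Sum(Operator):
    def __init__(self, log: List[Tuple[str, int]]) -> None:
        super().__init__(log)
        self.total = 0.0

    def push(self, record: Record) -> None:
        self.log.append(("sum", record[0]))
        self.total += record[2]


def scan(records: List[Record], head: Operator) -> None:
    """Source operator: push each record downstream as soon as it is read."""
    for record in records:
        head.push(record)


log: List[Tuple[str, int]] = []
head = Filter(log, lambda r: r[1] == "east")
sink = head.then(Sum(log))
scan([(1, "east", 10.0), (2, "west", 99.0), (3, "east", 2.5)], head)
print(sink.total)  # 12.5
print(log)         # record 1 reaches Sum before record 2 is even filtered
```

The trace shows that the pipeline never materializes the scanned records as a collection before filtering. Each record travels the whole chain before the source reads the next one.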
43 Claims
1. An asymmetric data processor comprising:
one or more host computers, each including a memory, a network interface and at least one CPU, each host computer being responsive to requests from end users and applications to process data;
a plurality of Job Processing Units (JPUs), each having a memory, a network interface, one or more storage devices, and at least one CPU, each JPU being responsive to requests from host computers and from other JPUs to process data;
a network enabling the host computers and the JPUs to communicate between and amongst each other, each of the host computers and JPUs forming a respective node on the network; and
a plurality of software operators configured to process data at the nodes according to a logical data flow, wherein (i) for each operator in a given sequence of operators in the logical data flow, output of the operator is input to a respective succeeding operator in the sequence in a manner free of necessarily materializing data, and (ii) data processing at each operator is based on readiness of a record, such that the operator transmits ready record data for processing at a successive operator in the logical data flow independent of transmission at other operators, the transmission of ready record data during data processing being substantially continuous so as to form a stream of record processing from operator to operator within nodes and across nodes of the network;
wherein record data are processed at intermediate parts on the logical data flow as a collection of data field values in a manner free of being materialized as whole records between two successive operators; and
wherein the plurality of operators includes one or more join operators, each join operator having multiple input streams and an output stream with references to original records in their packed form, and the output stream for the operator referring to data field values within the record data of the input streams at known offsets from a base pointer to a start of a packed record.
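The join element of claim 1 can be illustrated with a small example. The join output carries references to the original packed records rather than copies, and each field is read at a known offset from the start of its record. This is a minimal sketch under stated assumptions, not the claimed implementation. The byte layouts, the field tables `ORDER_FIELDS` and `CUST_FIELDS`, and the helpers `pack`, `field` and `hash_join` are all hypothetical. The "base pointer" is modeled as a byte offset into a `bytes` buffer, and the hash join is just one plausible join strategy. Field reads use the standard-library `struct.unpack_from`.

```python
import struct
from typing import Dict, Iterator, List, Tuple

# Hypothetical packed layouts ("<" = little-endian, no padding), for illustration only.
ORDER_FMT = struct.Struct("<iid")  # order_id, cust_id, amount -> 16 bytes
CUST_FMT = struct.Struct("<ii")    # cust_id, region -> 8 bytes

# Known byte offset (and format) of each field from the base of its packed record.
ORDER_FIELDS = {"order_id": (0, "<i"), "cust_id": (4, "<i"), "amount": (8, "<d")}
CUST_FIELDS = {"cust_id": (0, "<i"), "region": (4, "<i")}


def pack(fmt: struct.Struct, rows: List[tuple]) -> bytes:
    """Pack rows back to back into one contiguous buffer."""
    return b"".join(fmt.pack(*row) for row in rows)


def field(buf: bytes, base: int, layout: Dict[str, Tuple[int, str]], name: str):
    """Read one field value at its known offset from the record's base offset."""
    offset, fmt = layout[name]
    return struct.unpack_from(fmt, buf, base + offset)[0]


def hash_join(orders: bytes, custs: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (order_base, cust_base) references instead of copying whole records."""
    # Build side: join key -> base offsets of matching packed customer records.
    table: Dict[int, List[int]] = {}
    for base in range(0, len(custs), CUST_FMT.size):
        key = field(custs, base, CUST_FIELDS, "cust_id")
        table.setdefault(key, []).append(base)
    # Probe side: stream orders and yield a reference pair for each match found.
    for base in range(0, len(orders), ORDER_FMT.size):
        key = field(orders, base, ORDER_FIELDS, "cust_id")
        for cust_base in table.get(key, []):
            yield base, cust_base


orders = pack(ORDER_FMT, [(100, 1, 20.0), (101, 2, 5.5), (102, 1, 7.25), (103, 9, 1.0)])
custs = pack(CUST_FMT, [(1, 40), (2, 50)])

# Downstream operators dereference only the fields they need.
result = [
    (field(orders, o, ORDER_FIELDS, "order_id"), field(custs, c, CUST_FIELDS, "region"))
    for o, c in hash_join(orders, custs)
]
print(result)  # [(100, 40), (101, 50), (102, 40)]
```

Because the join emits only pairs of offsets, a consumer that needs just `order_id` and `region` never builds a combined output record. It reads those two values directly from the original packed buffers.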
Dependent claims: 2-27.
28. A method of data processing comprising the steps of:
providing one or more host computers, each including a memory, a network interface and at least one CPU, each host computer being responsive to requests from end users and applications to process data;
providing a plurality of Job Processing Units (JPUs), each having a memory, a network interface, one or more storage devices, and at least one CPU, each JPU being responsive to requests from host computers and from other JPUs to process data;
networking the host computers and the JPUs to communicate between and amongst each other, each of the host computers and JPUs forming a respective node on the network;
using a plurality of software operators, processing data according to a logical data flow, wherein (i) for each operator in a given sequence of said operators in the logical data flow, output of the operator is input to a respective succeeding operator in the sequence in a manner free of necessarily materializing data, and (ii) data processing at each operator is based on readiness of a record, such that the operator transmits ready record data for processing at a successive operator along the logical data path independent of transmission at other operators, the transmission of ready record data on the logical data path during data processing being substantially continuous so as to form a stream of record processing from operator to operator across nodes and within nodes of the network; and
processing record data at intermediate locations on the logical data path as a collection of data field values, in a manner free of being materialized as whole records between two successive operators;
wherein the plurality of operators includes one or more join operators, each join operator having multiple input streams and an output stream with references to original records in their packed form, and the output stream of the join operator referring to data field values within the record data of the input stream at known offsets from a base pointer to a start of a packed record.
Dependent claims: 29-43.
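The method of claim 28 streams records from operator to operator across nodes and passes intermediate results as field values rather than whole records. A rough illustration uses two threads standing in for a JPU node and a host node, joined by a bounded queue that stands in for the network. This is a minimal sketch under stated assumptions, not the claimed method. The functions `jpu_node` and `host_node`, the `END` sentinel, the four-field record layout, and the use of `queue.Queue` as a network link are all hypothetical simplifications.

```python
import queue
import threading
from typing import Dict, List, Tuple

# Hypothetical full record: (id, region, amount, note). Only two fields travel onward.
Record = Tuple[int, str, float, str]
END = None  # sentinel marking the end of the stream


def jpu_node(records: List[Record], link: queue.Queue) -> None:
    """JPU-side operators (scan -> filter -> project), sending each result at once."""
    for _id, region, amount, _note in records:  # scan
        if amount > 0:                           # filter
            link.put((region, amount))           # project: ship field values only
    link.put(END)


def host_node(link: queue.Queue, out: Dict[str, float]) -> None:
    """Host-side aggregate that consumes field values as they arrive over the link."""
    while True:
        item = link.get()
        if item is END:
            break
        region, amount = item
        out[region] = out.get(region, 0.0) + amount


records: List[Record] = [
    (1, "east", 10.0, "a"),
    (2, "west", -3.0, "b"),
    (3, "east", 2.5, "c"),
    (4, "west", 4.0, "d"),
]
# A small bound on the link keeps the producer from running far ahead of the consumer.
link: queue.Queue = queue.Queue(maxsize=2)
totals: Dict[str, float] = {}
producer = threading.Thread(target=jpu_node, args=(records, link))
consumer = threading.Thread(target=host_node, args=(link, totals))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(totals)  # {'east': 12.5, 'west': 4.0}
```

The bounded link keeps the flow continuous, because the aggregate starts consuming while the scan is still running. The projection step means the `note` field never crosses the simulated network.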
Specification