Custom operators for a parallel query engine
First Claim
1. At a computer system including a processor and a memory, in a computer networking environment including a plurality of computing systems, a computer-implemented method for implementing one or more custom operators in a query for a parallel query engine, the method comprising:
an act of receiving a portion of partitioned input data at a parallel query engine, wherein the parallel query engine is configured to process the partitioned input data using a sequence of operators, including one or more built-in operators that are part of the parallel query engine;
an act of incorporating at least one user-defined custom operator into the sequence of operators for processing the partitioned input data, the at least one user-defined custom operator being provided to the parallel query engine by a user and being configured for processing along with the one or more built-in operators by being configured to:
poll a predecessor operator of the at least one user-defined custom operator in the sequence of operators to determine the predecessor operator's output information, the output information including (i) a number of partitions of the input data that are requestable by the at least one user-defined custom operator and (ii) one or more ordering guarantees that apply to output of the at least one user-defined custom operator;
repartition the input data by adding or reducing the number of partitions of the input data during processing of the input data; and
determine whether changes have occurred that affect the one or more ordering guarantees and, when changes have occurred that affect the one or more ordering guarantees, modify at least a portion of the one or more ordering guarantees that apply to the output of the at least one user-defined custom operator;
an act of accessing the sequence of operators to determine how the partitioned input data is to be processed, wherein the at least one user-defined custom operator is accessed in the same manner as the built-in operators as part of the sequence of operators; and
an act of processing the sequence of operators, including processing both the built-in operators and the at least one user-defined custom operator, according to the determination indicating how the data is to be processed.
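The three behaviors that claim 1 requires of a custom operator (polling the predecessor, repartitioning, and adjusting the ordering guarantee) can be illustrated with a minimal Python sketch. It assumes that the only ordering guarantee is "each partition is sorted" and that repartitioning is a round-robin redistribution; `OutputInfo`, `Source`, and `RoundRobinRepartition` are hypothetical names, not the API of any particular engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OutputInfo:
    """Hypothetical record an operator reports when polled by its successor."""
    partition_count: int  # (i) number of partitions the successor may request
    sorted_within: bool   # (ii) assumed ordering guarantee: each partition is sorted


class Source:
    """Hypothetical predecessor holding input data that is already partitioned."""

    def __init__(self, partitions: List[List[int]]):
        self.partitions = partitions

    def output_info(self) -> OutputInfo:
        is_sorted = all(p == sorted(p) for p in self.partitions)
        return OutputInfo(len(self.partitions), is_sorted)

    def partition(self, i: int) -> List[int]:
        return list(self.partitions[i])


class RoundRobinRepartition:
    """Hypothetical user-defined operator that adds or removes partitions."""

    def __init__(self, predecessor, target_count: int):
        self.predecessor = predecessor
        self.target_count = target_count

    def output_info(self) -> OutputInfo:
        upstream = self.predecessor.output_info()  # poll the predecessor
        changed = upstream.partition_count != self.target_count
        # Redistributing elements across a different number of partitions can
        # break per-partition sort order, so the guarantee is dropped.
        return OutputInfo(self.target_count, upstream.sorted_within and not changed)

    def partition(self, i: int) -> List[int]:
        upstream = self.predecessor.output_info()
        if upstream.partition_count == self.target_count:
            return self.predecessor.partition(i)  # nothing to repartition
        items = [x for p in range(upstream.partition_count)
                 for x in self.predecessor.partition(p)]
        return items[i::self.target_count]  # element k goes to partition k % n
```

With `Source([[1, 5, 9], [2, 6]])` as the predecessor, a target of three partitions yields `[1, 2]`, `[5, 6]`, and `[9]` and reports `sorted_within=False`, while a target of two passes the partitions through and keeps the guarantee. Dropping the guarantee whenever the count changes is a conservative choice; an engine could instead inspect the data before deciding.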
Abstract
Embodiments are directed to implementing custom operators in a query for a parallel query engine and to generating a partitioned representation of a sequence of query operators in a parallel query engine. A computer system receives a portion of partitioned input data at a parallel query engine, where the parallel query engine is configured to process data queries in parallel, and where the queries include a sequence of built-in operators. The computer system incorporates a custom operator into the sequence of built-in operators for a query and accesses the sequence of operators to determine how the partitioned input data is to be processed. The custom operator is accessed in the same manner as the built-in operators. The computer system also processes the sequence of operators including both the built-in operators and at least one custom operator according to the determination indicating how the data is to be processed.
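The idea that a custom operator is "accessed in the same manner as the built-in operators" can be pictured as a shared interface that the engine calls without distinguishing operator types. The following Python sketch assumes per-partition operators only (no repartitioning) and uses a thread pool to process partitions in parallel; `Operator`, `Where`, `Select`, `RunningTotal`, and `run_query` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


class Operator:
    """Hypothetical interface shared by built-in and user-defined operators."""

    def apply(self, partition: List[int]) -> List[int]:
        raise NotImplementedError


class Where(Operator):  # stand-in for a built-in filter
    def __init__(self, pred: Callable[[int], bool]):
        self.pred = pred

    def apply(self, partition: List[int]) -> List[int]:
        return [x for x in partition if self.pred(x)]


class Select(Operator):  # stand-in for a built-in projection
    def __init__(self, fn: Callable[[int], int]):
        self.fn = fn

    def apply(self, partition: List[int]) -> List[int]:
        return [self.fn(x) for x in partition]


class RunningTotal(Operator):  # hypothetical user-defined custom operator
    def apply(self, partition: List[int]) -> List[int]:
        out, total = [], 0
        for x in partition:
            total += x
            out.append(total)
        return out


def run_query(partitions: Sequence[List[int]],
              operators: Sequence[Operator]) -> List[List[int]]:
    """Push every partition through the same operator sequence in parallel."""
    def process(partition: List[int]) -> List[int]:
        for op in operators:  # no special case for custom operators
            partition = op.apply(partition)
        return partition

    with ThreadPoolExecutor() as pool:
        return list(pool.map(process, partitions))
```

For example, `run_query([[1, 2, 3], [4, 5, 6]], [Where(lambda x: x % 2 == 1), RunningTotal(), Select(lambda x: x * 10)])` returns `[[10, 40], [50]]`. Note that `RunningTotal` only sees its own partition; a global running total would need a merge step that this sketch omits.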
11 Claims
6. A computer program product for implementing a method for generating a partitioned representation of a sequence of query operators in a parallel query engine, the computer program product comprising one or more hardware computer-readable storage devices having stored thereon computer-executable instructions that, when executed by one or more processors of a computing system, cause the computing system to perform the method, the method comprising:
an act of accessing a sequence of operators configured to process a portion of partitioned input data in a parallel query system, the sequence of operators comprising one or more built-in operators that are part of a parallel query engine and at least one user-defined custom operator, the at least one user-defined custom operator being provided to the parallel query engine by a user and being configured for processing along with the one or more built-in operators by being configured to:
poll a predecessor operator of the at least one user-defined custom operator in the sequence of operators to determine the predecessor operator's output information, the output information including (i) a number of partitions of the input data that are requestable by the at least one user-defined custom operator and (ii) one or more ordering guarantees that apply to output of the at least one user-defined custom operator;
repartition the input data by adding or reducing the number of partitions of the input data during processing of the input data; and
determine whether changes have occurred that affect the one or more ordering guarantees and, when changes have occurred that affect the one or more ordering guarantees, modify at least a portion of the one or more ordering guarantees that apply to the output of the at least one user-defined custom operator;
an act of generating a list of partitions into which the input data has been partitioned;
an act of determining a number of partitions that will be made during a repartitioning operation at an indicated built-in operator in the sequence of operators; and
an act of generating a partitioned representation of the sequence of operators, wherein the partitioned representation of the sequence of operators provides internal information regarding the processing of the input data by the indicated built-in operator, the internal information enabling processing of the input data by one of the at least one user-defined custom operator.
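One way to picture the partitioned representation of claim 6 is as a per-stage plan that records how many partitions each operator consumes and produces, so that a custom operator can learn what a repartitioning built-in upstream will hand it. This Python sketch assumes round-robin splitting of the input and a fixed target count for the repartitioning stage; `Stage`, `StagePlan`, `split`, `partitioned_representation`, and the stage names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    builtin: bool
    repartition_to: Optional[int] = None  # None: keep the incoming partition count


@dataclass(frozen=True)
class StagePlan:
    name: str
    builtin: bool
    input_partitions: int
    output_partitions: int


def split(data: List[int], n: int) -> List[List[int]]:
    """Generate the list of partitions (round-robin split into n parts)."""
    return [data[i::n] for i in range(n)]


def partitioned_representation(partitions: List[List[int]],
                               stages: List[Stage]) -> List[StagePlan]:
    """Annotate each stage with the partition counts it consumes and produces."""
    count = len(partitions)
    plan = []
    for stage in stages:
        # A repartitioning built-in decides how many partitions flow downstream.
        out = stage.repartition_to if stage.repartition_to is not None else count
        plan.append(StagePlan(stage.name, stage.builtin, count, out))
        count = out
    return plan
```

For input split into two partitions and stages `Where`, `GroupBy` (repartitioning to four), and `MyCustomOp`, the plan records `(2, 2)`, `(2, 4)`, and `(4, 4)`, so the custom operator can see from its own entry that it will receive four partitions.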
11. A computer system comprising the following:
one or more processors;
system memory; and
one or more computer-readable storage media having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to perform a method for implementing one or more custom operators in a query for a parallel query engine, the method comprising the following:
an act of receiving a portion of partitioned input data at a parallel query engine, wherein the parallel query engine is configured to process the partitioned input data using a sequence of operators, including one or more built-in operators that are part of the parallel query engine;
an act of incorporating at least one user-defined custom operator into the sequence of operators for processing the partitioned input data, the at least one user-defined custom operator being provided to the parallel query engine by a user and being configured for processing along with the one or more built-in operators by being configured to:
poll a predecessor operator of the at least one user-defined custom operator in the sequence of operators to determine the predecessor operator's output information, the output information including (i) a number of partitions of the input data that are requestable by the at least one user-defined custom operator and (ii) one or more ordering guarantees that apply to output of the at least one user-defined custom operator;
repartition the input data by adding or reducing the number of partitions of the input data during processing of the input data; and
determine whether changes have occurred that affect the one or more ordering guarantees and, when changes have occurred that affect the one or more ordering guarantees, modify at least a portion of the one or more ordering guarantees that apply to the output of the at least one user-defined custom operator;
an act of accessing the sequence of operators to determine how the partitioned input data is to be processed, wherein the at least one user-defined custom operator is accessed in the same manner as the built-in operators as part of the sequence of operators; and
an act of processing the sequence of operators, including processing both the built-in operators and the at least one user-defined custom operator, wherein processing the sequence of operators includes processing the sequence of operators in reverse order such that a last operator in the sequence of operators polls its predecessor operator in the sequence of operators to determine the predecessor operator's output information.
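Claim 11 adds reverse-order processing: the last operator polls its predecessor, which polls its own predecessor, and so on back to the data source. A minimal Python sketch, assuming a pull model in which each node holds a reference only to its immediate predecessor; `Source`, `Op`, and the `log` list used to record polling order are hypothetical.

```python
from typing import List, Optional, Tuple


class Source:
    """Hypothetical head of the chain: input already split into partitions."""

    def __init__(self, partitions: int):
        self.name = "source"
        self.partitions = partitions

    def output_info(self, log: List[str]) -> Tuple[int, bool]:
        log.append(self.name)
        return self.partitions, True  # (partition count, ordering guaranteed?)


class Op:
    """Hypothetical operator node that only knows its immediate predecessor."""

    def __init__(self, name: str, predecessor, repartition_to: Optional[int] = None,
                 preserves_order: bool = True):
        self.name = name
        self.predecessor = predecessor
        self.repartition_to = repartition_to
        self.preserves_order = preserves_order

    def output_info(self, log: List[str]) -> Tuple[int, bool]:
        log.append(self.name)  # record that this operator is being polled
        count, ordered = self.predecessor.output_info(log)  # poll the predecessor
        if self.repartition_to is not None:
            count = self.repartition_to
        return count, ordered and self.preserves_order
```

Polling the last node of `source -> where -> custom -> select`, where `custom` repartitions to two partitions and does not preserve order, returns `(2, False)` and leaves the log as `["select", "custom", "where", "source"]`, i.e. the sequence is visited from last operator to first.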
Specification