Formal language and translator for parallel processing of data
Abstract
The present invention, in an example embodiment, provides a special-purpose formal language and translator for the parallel processing of large databases in a distributed system. The special-purpose language has features of both a declarative programming language and a procedural programming language and supports the co-grouping of tables, each with an arbitrary alignment function, and the specification of procedural operations to be performed on the resulting co-groups. The language's translator translates a program in the language into optimized structured calls to an application programming interface for implementations of functionality related to the parallel processing of tasks over a distributed system. In an example embodiment, the application programming interface includes interfaces for MapReduce functionality, whose implementations are supplemented by the embodiment.
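To make the co-grouping idea in the abstract concrete, here is a minimal Python sketch. It is not the patented language or its translator; the table representation (lists of dicts) and the helper names `cogroup` and `for_each_cogroup` are assumptions chosen for illustration. Each table gets its own alignment function, rows that align to the same key land in the same co-group, and a procedural operation is then applied to each co-group.

```python
from collections import defaultdict


def cogroup(tables, align_fns):
    """Co-group tables: rows whose alignment keys match share a co-group.

    tables    -- list of tables, each a list of row dicts (assumed layout)
    align_fns -- one alignment function per table, mapping a row to a key
    Returns {key: [rows_from_table_0, rows_from_table_1, ...]}.
    """
    if len(tables) != len(align_fns):
        raise ValueError("need exactly one alignment function per table")
    groups = defaultdict(lambda: [[] for _ in tables])
    for index, (table, align) in enumerate(zip(tables, align_fns)):
        for row in table:
            groups[align(row)][index].append(row)
    return dict(groups)


def for_each_cogroup(groups, operation):
    """Apply a user-specified procedural operation to every co-group."""
    return {key: operation(key, *per_table) for key, per_table in groups.items()}


# Illustrative data only; not taken from the patent.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
clicks = [
    {"user": 1, "url": "/a"},
    {"user": 1, "url": "/b"},
    {"user": 3, "url": "/c"},
]

# Declarative part: which tables to co-group and how each one aligns.
groups = cogroup([users, clicks], [lambda r: r["id"], lambda r: r["user"]])

# Procedural part: count the rows each table contributed to a co-group.
counts = for_each_cogroup(groups, lambda key, us, cs: (len(us), len(cs)))
```

A key that appears in only one table still forms a co-group, with an empty row list for the other table, which is what distinguishes co-grouping from an inner join.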
20 Claims
1. A method, comprising:
- accepting as input a program written in a formal language, wherein the input program comprises a plurality of operators that enable a declarative co-grouping of one or more tables, each with an alignment function, and a specification of zero or more procedural operations to be performed on each resulting co-group;
- co-grouping one or more tables referenced in the program into one or more co-groups according to one or more operators of the formal language used in the program;
- determining zero or more user-specified operations to be performed on each co-group according to the one or more operators of the formal language used in the program; and
- a translator corresponding to the formal language program translating the program into one or more jobs according to the one or more operators of the formal language used in the program and based on the one or more co-groups and the zero or more operations to be performed on each co-group, wherein each job comprises one or more structured calls to an application programming interface for encoded logic that is operable to generate a plurality of tasks for the parallel processing of the job on one or more data processing devices in a distributed system.
Dependent claims: 2, 3, 4, 5, 6, 7
8. One or more computer-readable non-transitory storage media embodying software operable when executed by one or more computer systems to:
- accept as input a program written in a formal language, wherein the input program comprises a plurality of operators that enable a declarative co-grouping of one or more tables, each with an alignment function, and a specification of zero or more procedural operations to be performed on each resulting co-group;
- co-group one or more tables referenced in the program into one or more co-groups according to one or more operators of the formal language used in the program;
- determine zero or more user-specified operations to be performed on each co-group according to the one or more operators of the formal language used in the program; and
- translate the program into one or more jobs according to the one or more operators of the formal language used in the program and based on the one or more co-groups and the zero or more operations to be performed on each co-group, wherein each job comprises one or more structured calls to an application programming interface for encoded logic that is operable to generate a plurality of tasks for the parallel processing of the job on one or more data processing devices in a distributed system.
Dependent claims: 9, 10, 11, 12, 13, 14
15. An apparatus comprising:
a processor and a non-transitory memory storing executable instructions which when executed perform a plurality of steps comprising:
- accepting as input a program written in a formal language, wherein the input program comprises a plurality of operators that enable a declarative co-grouping of one or more tables, each with an alignment function, and a specification of zero or more procedural operations to be performed on each resulting co-group;
- co-grouping one or more tables referenced in the program into one or more co-groups according to one or more operators of the formal language used in the program;
- determining zero or more user-specified operations to be performed on each co-group according to the one or more operators of the formal language used in the program;
- translating the program into one or more jobs according to the one or more operators of the formal language used in the program and based on the one or more co-groups and the zero or more operations to be performed on each co-group, wherein each job comprises one or more structured calls to an application programming interface for encoded logic that is operable to generate a plurality of tasks for the parallel processing of the job on one or more data processing devices in a distributed system;
- assigning the tasks to data-processing devices in a distributed system; and
- processing the tasks in parallel.
Dependent claims: 16, 17, 18, 19, 20
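The translation and execution steps recited in the claims can be pictured with a small Python sketch. It rests on stated assumptions: the function `run_job`, its parameters, and the use of a local thread pool in place of data-processing devices in a distributed system are all hypothetical, and the code does not reproduce the patented translator or any real MapReduce API. It only shows the general shape of turning a co-group specification into map tasks, a shuffle by alignment key, and reduce tasks that run in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_job(tables, align_fns, operation, num_reducers=2, workers=4):
    """Run one co-group job as map tasks, a shuffle, and reduce tasks.

    The thread pool is a stand-in (an assumption) for the devices of a
    distributed system; a real deployment would schedule tasks remotely.
    """

    def map_task(index):
        # One map task per table: tag each row with its alignment key.
        align = align_fns[index]
        return [(align(row), index, row) for row in tables[index]]

    def reduce_task(partition):
        # One reduce task per partition: run the operation on each co-group.
        return {key: operation(key, *rows) for key, rows in partition.items()}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Assign map tasks to workers and process them in parallel.
        mapped = list(pool.map(map_task, range(len(tables))))

        # Shuffle: route every key to exactly one reducer partition.
        partitions = [
            defaultdict(lambda: [[] for _ in tables]) for _ in range(num_reducers)
        ]
        for emitted in mapped:
            for key, index, row in emitted:
                partitions[hash(key) % num_reducers][key][index].append(row)

        # Assign reduce tasks to workers and process them in parallel.
        reduced = list(pool.map(reduce_task, partitions))

    result = {}
    for part in reduced:
        result.update(part)
    return result


# Illustrative data only; not taken from the patent.
orders = [
    {"cust": 1, "total": 10},
    {"cust": 2, "total": 5},
    {"cust": 1, "total": 7},
]
refunds = [{"cust": 1, "amount": 3}]

net = run_job(
    [orders, refunds],
    [lambda r: r["cust"], lambda r: r["cust"]],
    lambda key, ords, refs: sum(o["total"] for o in ords)
    - sum(r["amount"] for r in refs),
)
```

Because every key is routed to a single partition, each co-group is assembled in full before the user-specified operation sees it, regardless of how many reducers are configured.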
Specification