Integrating map-reduce into a distributed relational database
Abstract
A computer readable storage medium includes executable instructions to define a map-reduce document that coordinates processing of data in a distributed database. The map-reduce document complies with a map-reduce specification that integrates map-reduce functions with queries in a query language. The operations specified by the map-reduce document are executed in the distributed database.
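As an illustration only, the map-reduce document described in the abstract might be modeled as a small data structure that pairs a SQL input source with map and reduce functions written in a scripting language. The sketch below uses Python for both the container and the embedded functions. The key names (`input`, `query`, `map`, `reduce`, `language`, `source`), the table name `documents`, and the helper `validate_document` are all assumptions made for this example, not the format defined by the patent's map-reduce specification.

```python
# Hypothetical layout of a map-reduce document. Every key name here is an
# assumption for illustration; the patent's specification may differ.
MAP_REDUCE_DOCUMENT = {
    # Input source: a query written in SQL.
    "input": {
        "name": "docs",
        "query": "SELECT doc_id, body FROM documents",
    },
    # Map function, defined in a scripting language other than SQL.
    "map": {
        "name": "tokenize",
        "language": "python",
        "source": (
            "def tokenize(doc_id, body):\n"
            "    for word in body.split():\n"
            "        yield word.lower(), 1\n"
        ),
    },
    # Reduce function, applied to the results of the map function.
    "reduce": {
        "name": "count",
        "language": "python",
        "source": (
            "def count(key, values):\n"
            "    return key, sum(values)\n"
        ),
    },
}


def validate_document(doc):
    """Return a list of problems; an empty list means the sketch is well formed."""
    problems = []
    query = doc.get("input", {}).get("query", "")
    if not query.strip().upper().startswith("SELECT"):
        problems.append("input source must carry a SQL query")
    for role in ("map", "reduce"):
        fn = doc.get(role)
        if fn is None:
            problems.append(f"missing {role} function definition")
        elif fn.get("language", "").lower() == "sql":
            problems.append(f"{role} function must use a non-SQL scripting language")
    return problems
```

Calling `validate_document(MAP_REDUCE_DOCUMENT)` returns an empty list for the document above. The check on `language` mirrors the requirement that the map and reduce functions be defined in a scripting language different from SQL.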
18 Claims
1. A distributed database comprising:
a plurality of segment hosts each comprising one or more processors; and
a master host comprising one or more processors, wherein:
the master host is programmed to perform operations comprising:
submitting a map-reduce document as an input to a map-reduce program executing on the master host, the map-reducing program configured to cause operations specified in the map-reduce document to be executed in the distributed database system in parallel, wherein the map-reduce document comprises an input source and a map-reduce function definition, wherein:
the input source includes a query in Structured Query Language (SQL), and
the map-reduce function definition defines, in a scripting language that is different from SQL, a map function to be performed on the input source and a reduce function to be performed on results of the map function; and
distributing, using the map-reduce program, the map function and reduce function to the segment hosts as tasks; and
each of the segment hosts is programmed to perform the tasks, including executing, as SQL queries, both the map function and reduce function defined in the map-reduce function definition and the query of the input source.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method comprising:
submitting, by a master host of a distributed database system, a map-reduce document as an input to a map-reduce program executing on the master host, the map-reducing program configured to cause operations specified in the map-reduce document to be executed in the distributed database system in parallel, wherein the map-reduce document comprises an input source and a map-reduce function definition, wherein:
the input source includes a query in Structured Query Language (SQL), and
the map-reduce function definition defines, in a scripting language that is different from SQL, a map function to be performed on the input source and a reduce function to be performed on results of the map function; and
distributing, by the master host using the map-reduce program, the map function and reduce function to a plurality of segment hosts of the distributed database system as tasks; and
performing the tasks by the segment hosts, including executing, as SQL queries, both the map function and reduce function defined in the map-reduce function definition and the query of the input source, wherein each host of the distributed database system includes one or more processors.
Dependent claims: 9, 10, 11, 12
13. A computer readable non-transitory storage medium storing instructions that, when executed by a distributed database system, causes the distributed database system to perform operations comprising:
submitting, by a master host of a distributed database system, a map-reduce document as an input to a map-reduce program executing on the master host, the map-reducing program configured to cause operations specified in the map-reduce document to be executed in the distributed database system in parallel, wherein the map-reduce document comprises an input source and a map-reduce function definition, wherein:
the input source includes a query in Structured Query Language (SQL), and
the map-reduce function definition defines, in a scripting language that is different from SQL, a map function to be performed on the input source and a reduce function to be performed on results of the map function; and
distributing, by the master host using the map-reduce program, the map function and reduce function to a plurality of segment hosts of the distributed database system as tasks; and
performing the tasks by the segment hosts, including executing, as SQL queries, both the map function and reduce function defined in the map-reduce function definition and the query of the input source, wherein each host of the distributed database system includes one or more processors.
Dependent claims: 14, 15, 16, 17, 18
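The flow recited in the independent claims (a master host hands map and reduce functions to segment hosts, which run them as SQL queries over a SQL input source) can be approximated in a few lines. The following is a toy, single-process sketch and not the patented implementation: each "segment host" is an in-memory SQLite database, the map and reduce functions are registered as a user-defined SQL function and aggregate, and segments are visited sequentially rather than in parallel. The names `make_segment`, `run_task`, `master_run`, `word_count`, and `SumReduce`, the `documents` table, and the final merge by summation (valid only because this particular reduce is a sum) are all assumptions for illustration.

```python
import sqlite3


def word_count(body):
    """Map function (scripting language, not SQL): words in one row's text."""
    return len(body.split())


class SumReduce:
    """Reduce function: sums the mapped values within each group."""

    def __init__(self):
        self.total = 0

    def step(self, value):
        self.total += value

    def finalize(self):
        return self.total


def make_segment(rows):
    """Create one stand-in segment host holding a slice of the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id INTEGER, topic TEXT, body TEXT)")
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?)", rows)
    return conn


def run_task(conn, input_query):
    """Segment side: run map, reduce, and the input query as a single SQL query."""
    conn.create_function("map_fn", 1, word_count)
    conn.create_aggregate("reduce_fn", 1, SumReduce)
    sql = (
        "SELECT topic, reduce_fn(map_fn(body)) "
        f"FROM ({input_query}) AS src GROUP BY topic"
    )
    return conn.execute(sql).fetchall()


def master_run(segments, input_query):
    """Master side: send the task to every segment and merge partial results."""
    totals = {}
    for conn in segments:  # sequential here; a real system would run in parallel
        for topic, partial in run_task(conn, input_query):
            # Merging by addition is an assumption tied to the sum-style reduce.
            totals[topic] = totals.get(topic, 0) + partial
    return totals


if __name__ == "__main__":
    segments = [
        make_segment([(1, "db", "map reduce in sql"), (2, "ml", "gradient descent")]),
        make_segment([(3, "db", "parallel query plans")]),
    ]
    print(master_run(segments, "SELECT topic, body FROM documents"))
```

Registering the scripting-language functions with the SQL engine is one way to let a single query combine the input source, the map step, and the reduce step, which is the general idea behind executing both functions "as SQL queries" on each segment.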
Specification