Parallel streaming of external data
First Claim
1. A computer-implemented method comprising:
- receiving, by a first distributed system that comprises multiple segment nodes, a query that requests rows of an external table representing data stored in multiple data fragments on multiple respective data nodes in a second distributed system, wherein each of the data nodes of the second distributed system are distinct from the segment nodes of the first distributed system, wherein the first distributed system operates under control of a first master node and the second distributed system operates under control of a second master node that is distinct from the first master node, and wherein the second distributed system stores data in a second format that is different from a first format of data stored by the first distributed system;
initiating communication by a plurality of extension services between nodes of the first distributed system and nodes of the second distributed system, including a first extension service communicating with the master nodes of the first and second distributed system and a plurality of second extension services communicating between data nodes of the second distributed system and segment nodes of the first distributed system;
requesting, by the master node of the first distributed system from the first extension service, location information for a plurality of the data fragments in the second distributed system represented by the external table;
allocating each data fragment of the plurality of data fragments to a respective one of the plurality of second extension services and each of the second extension services to one of a plurality of segment nodes of the first distributed system, including providing each segment node with location information for respective fragments of the plurality of fragments allocated to the segment node;
receiving, by each of the plurality of segment nodes of the first distributed system, a stream of converted data in the first format, wherein each segment node communicates with a respective extension service to;
obtain fragments in the second format from one or more data nodes of the second distributed system according to the location information for the respective fragments,convert the obtained fragments from the second format to the first format, andprovide to the segment node of the first distributed system a stream of converted data corresponding to the rows of the external table; and
computing a result for the received query using the stream of converted data corresponding to the rows of the external table.
2 Assignments
0 Petitions
Accused Products
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for streaming external data in parallel from a second distributed system to a first distributed system. One of the methods includes receiving a query that requests a join of first rows of a first table in a first distributed system with second rows of an external table, the external table representing data in a second distributed system. Each of the segment nodes communicates with a respective extension service that obtains fragments from one or more data nodes of the second distributed system according to location information for the respective fragments, and provides to the segment node a stream of data corresponding to second rows of the external table. Each of the segment nodes computes joined rows between the first rows of the first table and the stream of data corresponding to second rows of the external table.
-
Citations
20 Claims
-
1. A computer-implemented method comprising:
-
receiving, by a first distributed system that comprises multiple segment nodes, a query that requests rows of an external table representing data stored in multiple data fragments on multiple respective data nodes in a second distributed system, wherein each of the data nodes of the second distributed system are distinct from the segment nodes of the first distributed system, wherein the first distributed system operates under control of a first master node and the second distributed system operates under control of a second master node that is distinct from the first master node, and wherein the second distributed system stores data in a second format that is different from a first format of data stored by the first distributed system; initiating communication by a plurality of extension services between nodes of the first distributed system and nodes of the second distributed system, including a first extension service communicating with the master nodes of the first and second distributed system and a plurality of second extension services communicating between data nodes of the second distributed system and segment nodes of the first distributed system; requesting, by the master node of the first distributed system from the first extension service, location information for a plurality of the data fragments in the second distributed system represented by the external table; allocating each data fragment of the plurality of data fragments to a respective one of the plurality of second extension services and each of the second extension services to one of a plurality of segment nodes of the first distributed system, including providing each segment node with location information for respective fragments of the plurality of fragments allocated to the segment node; receiving, by each of the plurality of segment nodes of the first distributed system, a stream of converted data in the first format, wherein each segment node communicates with a respective extension service to; obtain fragments in the second format from one or more data nodes of the second distributed system according to the location information for the respective fragments, convert the obtained fragments from the second format to the first format, and provide to the segment node of the first distributed system a stream of converted data corresponding to the rows of the external table; and computing a result for the received query using the stream of converted data corresponding to the rows of the external table. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising; receiving, by a first distributed system that comprises multiple segment nodes, a query that requests rows of an external table representing data stored in multiple data fragments on multiple respective data nodes in a second distributed system, wherein each of the data nodes of the second distributed system are distinct from the segment nodes of the first distributed system, wherein the first distributed system operates under control of a first master node and the second distributed system operates under control of a second master node that is distinct from the first master node, and wherein the second distributed system stores data in a second format that is different from a first format of data stored by the first distributed system; initiating communication by a plurality of extension services between nodes of the first distributed system and nodes of the second distributed system, including a first extension service communicating with the master nodes of the first and second distributed system and a plurality of second extension services communicating between data nodes of the second distributed system and segment nodes of the first distributed system; requesting, by the master node of the first distributed system from the first extension service, location information for a plurality of the data fragments in the second distributed system represented by the external table; allocating each data fragment of the plurality of data fragments to a respective one of the plurality of second extension services and each of the second extension services to one of a plurality of segment nodes of the first distributed system, including providing each segment node with location information for respective fragments of the plurality of fragments allocated to the segment node; receiving, by each of the plurality of segment nodes of the first distributed system, a stream of converted data in the first format, wherein each segment node communicates with a respective extension service to; obtain fragments in the second format from one or more data nodes of the second distributed system according to the location information for the respective fragments, convert the obtained fragments from the second format to the first format, and provide to the segment node of the first distributed system a stream of converted data corresponding to the rows of the external table; and computing a result for the received query using the stream of converted data corresponding to the rows of the external table. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
-
19. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
-
receiving, by a first distributed system that comprises multiple segment nodes, a query that requests rows of an external table representing data stored in multiple data fragments on multiple respective data nodes in a second distributed system, wherein each of the data nodes of the second distributed system are distinct from the segment nodes of the first distributed system, wherein the first distributed system operates under control of a first master node and the second distributed system operates under control of a second master node that is distinct from the first master node, and wherein the second distributed system stores data in a second format that is different from a first format of data stored by the first distributed system; initiating communication by a plurality of extension services between nodes of the first distributed system and nodes of the second distributed system, including a first extension service communicating with the master nodes of the first and second distributed system and a plurality of second extension services communicating between data nodes of the second distributed system and segment nodes of the first distributed system; requesting, by the master node of the first distributed system from the first extension service, location information for a plurality of the data fragments in the second distributed system represented by the external table; allocating each data fragment of the plurality of data fragments to a respective one of the plurality of second extension services and each of the second extension services to one of a plurality of segment nodes of the first distributed system, including providing each segment node with location information for respective fragments of the plurality of fragments allocated to the segment node; receiving, by each of the plurality of segment nodes of the first distributed system, a stream of converted data in the first format, wherein each segment node communicates with a respective extension service to; obtain fragments in the second format from one or more data nodes of the second distributed system according to the location information for the respective fragments, convert the obtained fragments from the second format to the first format, and provide to the segment node of the first distributed system a stream of converted data corresponding to the rows of the external table; and computing a result for the received query using the stream of converted data corresponding to the rows of the external table. - View Dependent Claims (20)
-
Specification