Shuffle embedded distributed storage system supporting virtual merge and method thereof
Abstract
Provided herein is a shuffle embedded distributed storage system and method supporting virtual merge. The system and method include a distributed shared storage configured to store a virtual merged file; a plurality of map servers connected to the distributed shared storage via a network, and configured to perform a map function and record map result data computed as a result of the map function in the distributed shared storage by means of a map result file; and a plurality of reduce servers connected to the distributed shared storage and the map servers via the network. The virtual merged file includes a list of the map result files recorded by the plurality of map servers, and an identifier of the reduce server to which the virtual merged file is to be transmitted.
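As a rough illustration of the abstract's central data structure, the following Python sketch models a virtual merged file as pure metadata: a list of map result file identifiers plus the identifier of the target reduce server. It is not the patented implementation. All class, function, and identifier names (`SharedStorage`, `MapResultFile`, `VirtualMergedFile`, `map_server_write`, the `vmf-<reducer>` naming scheme) are hypothetical, the "distributed shared storage" is an in-memory dictionary, and "aligned format" is assumed here to mean sorted by key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # hypothetical (key, value) pair emitted by a map function


@dataclass
class MapResultFile:
    file_id: str
    map_server: str
    records: List[Record]  # stored in aligned (assumed: key-sorted) order


@dataclass
class VirtualMergedFile:
    # Metadata only: the registered map result files and the identifier
    # of the reduce server that will receive them.
    vmf_id: str
    reducer_id: str
    map_result_files: List[str] = field(default_factory=list)


class SharedStorage:
    """Toy in-memory stand-in for the distributed shared storage."""

    def __init__(self) -> None:
        self.files: Dict[str, MapResultFile] = {}
        self.vmfs: Dict[str, VirtualMergedFile] = {}

    def record_map_result(self, f: MapResultFile) -> None:
        self.files[f.file_id] = f

    def register(self, reducer_id: str, file_id: str) -> str:
        # Add the file to the reducer's virtual merged file (created on first
        # use). Only the file id is appended; no record data is copied.
        vmf_id = f"vmf-{reducer_id}"  # hypothetical naming scheme
        vmf = self.vmfs.setdefault(vmf_id, VirtualMergedFile(vmf_id, reducer_id))
        vmf.map_result_files.append(file_id)
        return vmf_id


def map_server_write(storage: SharedStorage, server: str, file_id: str,
                     records: List[Record], reducer_id: str) -> str:
    # Map-server side: sort the output (assumed meaning of "aligned"), record
    # it as a map result file, then register it in the virtual merged file.
    storage.record_map_result(MapResultFile(file_id, server, sorted(records)))
    return storage.register(reducer_id, file_id)


storage = SharedStorage()
vmf_id = map_server_write(storage, "map-1", "m1-r0", [("b", 1), ("a", 1)], "reduce-0")
map_server_write(storage, "map-2", "m2-r0", [("a", 2)], "reduce-0")
print(storage.vmfs[vmf_id])
```

In this sketch, registering a map result file only appends its identifier, so building a virtual merged file costs the same regardless of how much data each map server produced.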
6 Claims
1. A shuffle embedded distributed storage system supporting virtual merge, the system comprising:
a distributed shared storage configured to store a virtual merged file;
a plurality of map servers connected to the distributed shared storage via a network, and configured to perform a map function and record map result data computed as a result of the map function in an aligned format in the distributed shared storage by means of a map result file; and
a plurality of reduce servers connected to the distributed shared storage and the map servers via the network for performing a reduce function on the map result files received from the virtual merged file of the distributed shared storage,
wherein the virtual merged file comprises a list of the map result files generated by the plurality of map servers, and an identifier of one of the plurality of reduce servers to which the virtual merged file is to be transmitted,
wherein the map result files are registered in the virtual merged file,
wherein one or more of the plurality of map servers transmits an identifier of the virtual merged file to one or more of the plurality of reduce servers, and
wherein, in response to receiving a request for data reading from a selected one of the plurality of reduce servers, the distributed shared storage searches for the virtual merged file having an identifier identical to that of the selected reduce server, reads and aligns the data of the map result files included in the found virtual merged file consecutively, and transmits the aligned data to the selected reduce server without merging the map result files.
2. A shuffle embedded distributed storage method supporting virtual merge, the method comprising:
reading, by a plurality of map servers, a map input file from a distributed shared storage, and performing a map function;
recording map result data computed by the map function in an aligned format in the distributed shared storage by means of a map result file;
registering information on the map result files recorded by the plurality of map servers in a virtual merged file, wherein the virtual merged file comprises a list of the map result files generated by the plurality of map servers, and an identifier of one of a plurality of reduce servers to which the virtual merged file is to be transmitted;
transmitting, by one or more of the plurality of map servers, an identifier of the virtual merged file to one or more of the plurality of reduce servers; and
in response to receiving a request for data reading from a selected one of the plurality of reduce servers, searching for the virtual merged file having an identifier identical to that of the selected reduce server, reading and aligning the data of the map result files included in the found virtual merged file consecutively, and transmitting the aligned data to the selected reduce server without merging the map result files.
Dependent claims: 3, 4, 5, 6.
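The read path recited in claims 1 and 2 can be sketched in Python under similar assumptions: the storage is in memory, all names (`SharedStorage`, `read_for_reducer`, `reduce_sum`, the file and reducer identifiers) are hypothetical, and "reads ... consecutively" is interpreted as streaming each listed map result file in turn, in registration order. The sketch begins where a reduce server issues its read request; the storage looks up the virtual merged file whose reducer identifier matches the requester and yields records without ever writing a merged copy. The per-key sum used as the reduce function is only an example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # hypothetical (key, value) pair


@dataclass
class VirtualMergedFile:
    vmf_id: str
    reducer_id: str              # reduce server the file is destined for
    map_result_files: List[str]  # registered map result file ids, in order


class SharedStorage:
    """Toy in-memory stand-in for the distributed shared storage."""

    def __init__(self, files: Dict[str, List[Record]],
                 vmfs: List[VirtualMergedFile]) -> None:
        self.files = files  # map result file id -> records in aligned order
        self.vmfs = vmfs

    def read_for_reducer(self, reducer_id: str) -> Iterator[Record]:
        # Search for the virtual merged file whose reducer identifier matches
        # the requesting reduce server; fail eagerly if there is none.
        matches = [v for v in self.vmfs if v.reducer_id == reducer_id]
        if not matches:
            raise KeyError(f"no virtual merged file for {reducer_id!r}")
        return self._stream(matches[0])

    def _stream(self, vmf: VirtualMergedFile) -> Iterator[Record]:
        # Read the listed map result files one after another and pass their
        # records straight through; no merged file is materialized.
        for file_id in vmf.map_result_files:
            yield from self.files[file_id]


def reduce_sum(stream: Iterable[Record]) -> Dict[str, int]:
    # Example reduce function: sum the values for each key.
    totals: Dict[str, int] = {}
    for key, value in stream:
        totals[key] = totals.get(key, 0) + value
    return totals


storage = SharedStorage(
    files={"m1-r0": [("a", 1), ("b", 1)], "m2-r0": [("a", 2), ("c", 5)]},
    vmfs=[VirtualMergedFile("vmf-0", "reduce-0", ["m1-r0", "m2-r0"])],
)
stream = list(storage.read_for_reducer("reduce-0"))
print(reduce_sum(stream))  # {'a': 3, 'b': 1, 'c': 5}
```

Because consecutive reading does not interleave records across files, the same key can appear in more than one run of the stream (here `a`), so this example reduce function aggregates with a dictionary instead of assuming globally grouped input.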
Specification