Efficient data deployment for a parallel data processing system
Abstract
This document describes techniques for efficient data deployment for a parallel data processing system. In one embodiment, a virtualization platform running a parallel processing application that includes one or more virtual data nodes receives a first command to write a data block to a storage device. The platform then determines whether the first command was sent by a first virtual data node. If it was, the platform 1) writes the data block to a first location in the storage device; 2) returns the first location to the first virtual data node; and 3) determines whether the data should be replicated. If the data should be replicated, the platform instructs the storage device to make a copy of the data block to a second location in the storage device and stores the second location in a tracking structure.
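The write path described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `StorageDevice`, `VirtualizationPlatform`, `handle_write`, `copy_internal`, and the `replicate` flag are all hypothetical. The storage device is a toy in-memory model. The replication decision is reduced to a boolean policy. The fallback for a sender that is not a virtual data node, which the abstract does not describe, is assumed to be a plain write. For simplicity, the sketch returns the first location after the copy has been recorded.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    """Toy in-memory stand-in for a storage device (hypothetical)."""
    blocks: dict = field(default_factory=dict)
    next_location: int = 0

    def write(self, data: bytes) -> int:
        # Store the block at the next free location and report where it went.
        location = self.next_location
        self.blocks[location] = data
        self.next_location += 1
        return location

    def copy_internal(self, source: int) -> int:
        # The copy is made inside the device; the caller does not resend the data.
        return self.write(self.blocks[source])


@dataclass
class VirtualizationPlatform:
    """Hypothetical platform that intercepts write commands."""
    device: StorageDevice
    data_node_ids: set
    replicate: bool = True  # assumed replication policy, reduced to one flag
    tracking: dict = field(default_factory=dict)  # first location -> second location

    def handle_write(self, sender_id: str, data: bytes) -> int:
        first = self.device.write(data)
        # Determine whether the command came from a virtual data node.
        if sender_id not in self.data_node_ids:
            return first  # assumption: other senders get an ordinary write
        # Determine whether the data should be replicated.
        if self.replicate:
            second = self.device.copy_internal(first)
            self.tracking[first] = second  # record the second location
        return first  # the first location goes back to the virtual data node
```

In this sketch, a write from a registered virtual data node produces two stored blocks and one tracking entry, while the data itself crosses from the node to the device only once.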
18 Claims
1. A method for deploying a data block comprising:
  at a virtualization platform running a parallel processing application that includes one or more virtual data nodes:
    receiving a first command to write a data block to a storage device;
    determining whether the first command was sent by a first virtual data node; and
    if the first command was sent by a first virtual data node:
      writing the data block to a first location in the storage device,
      returning the first location to the first virtual data node,
      determining whether the data should be replicated, and
      if the data should be replicated, instructing the storage device to internally make a copy of the data block to a second location in the storage device and storing the second location in a tracking structure.
  Dependent claims: 2, 3, 4, 5, 6

7. A computer system for deploying a data block comprising:
  a processor;
  a volatile memory;
  a nonvolatile storage device; and
  a non-transitory computer readable storage medium having stored thereon program code that, when executed by the processor, causes the processor to:
    at a virtualization platform running a parallel processing application that includes one or more virtual data nodes:
      receiving a first command to write a data block to a storage device;
      determining whether the first command was sent by a first virtual data node; and
      if the first command was sent by a first virtual data node:
        writing the data block to a first location in the storage device,
        returning the first location to the first virtual data node,
        determining whether the data should be replicated, and
        if the data should be replicated, instructing the storage device to internally make a copy of the data block to a second location in the storage device and storing the second location in a tracking structure.
  Dependent claims: 8, 9, 10, 11, 12

13. A non-transitory computer readable storage medium having stored thereon program code executable by computer system, the program code embodying a method for deploying a data block comprising:
  at a virtualization platform running a parallel processing application that includes one or more virtual data nodes:
    receiving a first command to write a data block to a storage device;
    determining whether the first command was sent by a first virtual data node; and
    if the first command was sent by a first virtual data node:
      writing the data block to a first location in the storage device,
      returning the first location to the first virtual data node,
      determining whether the data should be replicated, and
      if the data should be replicated, instructing the storage device to internally make a copy of the data block to a second location in the storage device and storing the second location in a tracking structure.
  Dependent claims: 14, 15, 16, 17, 18
Specification