High performance Hadoop with new generation instances
Abstract
The present invention is generally directed to a distributed computing system comprising a plurality of computational clusters, each computational cluster comprising a plurality of compute optimized instances, each instance comprising local instance data storage and in communication with reserved disk storage, wherein processing hierarchy provides priority to local instance data storage before providing priority to reserved disk storage.
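As an illustrative reading of the storage hierarchy described in the abstract, the following Python sketch picks a directory for a write by trying the local instance directories first and falling back to the reserved disk directories only when no local directory has enough room. The names `choose_directory` and `free_bytes`, the `free_fn` parameter, and the idea of measuring "insufficient space" as free bytes are assumptions made for this sketch; they are not taken from the patent.

```python
import shutil
from typing import Callable, Optional, Sequence


def free_bytes(path: str) -> int:
    """Return the free space, in bytes, of the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def choose_directory(
    local_dirs: Sequence[str],
    reserved_dirs: Sequence[str],
    needed_bytes: int,
    free_fn: Callable[[str], int] = free_bytes,
) -> Optional[str]:
    """Pick a directory for a write of `needed_bytes` (hypothetical helper).

    Local instance directories are tried first, in order. Reserved disk
    directories are consulted only if no local directory has enough room.
    Returns None when nothing can hold the request.
    """
    for directory in local_dirs:
        if free_fn(directory) >= needed_bytes:
            return directory
    for directory in reserved_dirs:
        if free_fn(directory) >= needed_bytes:
            return directory
    return None
```

Passing a custom `free_fn` (for example, a lookup into a dictionary of made-up free-space figures) lets the selection order be exercised without touching real disks.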
17 Claims
1. A distributed computing system comprising a plurality of computational clusters, each computational cluster utilized in a MapReduce model and comprising a plurality of compute optimized instances, each instance comprising local instance data storage and in communication with reserved disk storage, wherein processing hierarchy is configured to use local instance data storage unless there is insufficient space on the local instance data storage, thereby providing priority to local instance data storage before providing priority to reserved disk storage,
wherein intermediate data files within the MapReduce model are stored at least in part on the compute optimized instances comprising a list of directories residing on the local instance data storage and a list of directories residing on the reserved disk storage, wherein the directories on the reserved disk storage are accessed when a processing request cannot be handled by the local instance data storage; and
wherein the distributed computer system is configured to auto-scale;
upon adding an instance to a cluster, mounting a reserved disk storage associated with the instance is delayed until disk utilization on a cluster exceeds a predetermined threshold; and
upon terminating an instance from a cluster, terminating any reserved disk storage associated with the instance.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
12. An auto-scaling distributed computing system comprising:
a plurality of computational clusters, each computational cluster comprising a plurality of compute optimized instances, each instance comprising local instance data storage and in communication with persistent reserved disk storage, wherein:
the distributed computing system is configured to use local instance data storage unless there is insufficient space on the local instance data storage;
upon adding an instance to a cluster, mounting a reserved disk storage associated with the instance is delayed until disk utilization on a cluster exceeds a predetermined threshold; and
upon terminating an instance from a cluster, terminating any reserved disk storage associated with the instance.
(Dependent claims: 13, 14, 15)
16. An auto-scaling distributed computing system comprising:
a plurality of computational clusters in a MapReduce and/or a scalable distributed file system model, each computational cluster comprising a plurality of C3 compute optimized instances offered by a cloud services platform, each instance comprising local instance data storage and in communication with persistent reserved disk storage, wherein:
the distributed computing system is configured to use local instance data storage unless there is insufficient space on the local instance data storage;
upon adding an instance to a cluster, mounting a reserved disk storage associated with the instance is delayed until disk utilization on a cluster exceeds a predetermined threshold; and
upon terminating an instance from a cluster, terminating any reserved disk storage associated with the instance.
17. A method of providing an auto-scaling distributed computing system comprising a plurality of computational clusters in a MapReduce and/or a scalable distributed file system model, each computational cluster comprising a plurality of C3 compute optimized instances offered by a cloud services platform, each instance comprising local instance data storage and in communication with persistent reserved disk storage, the method comprising:
receiving a processing request;
using local instance data storage for the processing request, unless:
there is insufficient space on the local instance data storage; or
upon a determination that intermediate data files are to be stored at least in part on directories residing on the reserved disk storage; and
auto-scaling the system by:
upon adding an instance to a cluster, mounting a reserved disk storage associated with the instance is delayed until disk utilization on a cluster exceeds a predetermined threshold; and
upon terminating an instance from a cluster, terminating any reserved disk storage associated with the instance.
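The auto-scaling behavior recited in the independent claims can be pictured with a small in-memory model. The sketch below is one possible reading, not the patented implementation: the `Instance` and `Cluster` classes, the `refresh` method, and the choice to measure cluster disk utilization as used local bytes divided by total local capacity are all assumptions introduced for illustration. No cloud API is called; mounting and terminating a reserved disk are represented by boolean flags.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Instance:
    """Hypothetical model of one compute instance and its reserved disk."""
    name: str
    local_capacity: int            # bytes of local instance storage
    local_used: int = 0            # bytes currently used on local storage
    reserved_mounted: bool = False
    reserved_terminated: bool = False


@dataclass
class Cluster:
    """Hypothetical cluster that defers reserved-disk mounts."""
    threshold: float               # utilization fraction, e.g. 0.8
    instances: Dict[str, Instance] = field(default_factory=dict)

    def utilization(self) -> float:
        # Assumption: utilization = used local bytes / total local capacity.
        capacity = sum(i.local_capacity for i in self.instances.values())
        used = sum(i.local_used for i in self.instances.values())
        return used / capacity if capacity else 0.0

    def add_instance(self, instance: Instance) -> None:
        # Adding an instance does NOT mount its reserved disk.
        self.instances[instance.name] = instance

    def refresh(self) -> None:
        # Mount deferred reserved disks only once the threshold is exceeded.
        if self.utilization() > self.threshold:
            for instance in self.instances.values():
                instance.reserved_mounted = True

    def terminate_instance(self, name: str) -> Instance:
        # Terminating an instance also terminates its reserved disk.
        instance = self.instances.pop(name)
        instance.reserved_mounted = False
        instance.reserved_terminated = True
        return instance
```

In this model a caller would invoke `refresh` periodically, or after each write, so that reserved disks stay unmounted while local storage is sufficient.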
Specification