Multi-tenant distributed computing and database
First Claim
1. A method for executing a distributed computing application within a virtualized computing environment, the method comprising:
instantiating a first plurality of virtual machines (VMs) on a plurality of hosts as data nodes of a first distributed file system, wherein the data nodes of the first distributed file system are concurrently accessible by a plurality of compute VMs including a first compute VM and a second compute VM, wherein each compute VM is configured to request and obtain data blocks from one or more of the data nodes of the first distributed file system, the data blocks containing a portion of an input data set, and process the portion of the input data set stored in the first distributed file system, wherein the first compute VM and the second compute VM are configured to concurrently process respectively a first portion of the input data set and a second portion of the input data set, and wherein data nodes and compute VMs are separate VMs; and
instantiating a second plurality of VMs on the plurality of hosts as data nodes of a second distributed file system, wherein data nodes of the second distributed file system are concurrently accessible by a plurality of region server nodes associated with a distributed database application, the plurality of region server nodes including a first region server node and a second region server node, wherein each region server node is a virtual machine configured to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of a data table and to perform database operations on the portion of a data table stored in the second distributed file system, wherein the first region server node and the second region server node are configured to concurrently process respectively a first portion of the data table and a second portion of the data table, and wherein the data nodes and the region server nodes are separate VMs;
wherein each compute VM is coupled to at least one of the plurality of region server nodes for access to a portion of the data table which the at least one of the plurality of region server nodes is configured to serve; and
wherein the access provided by the coupling allows each compute VM to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of the data table, and to process as part of the distributed computing application the portion of the data table which the at least one of the plurality of region server nodes is configured to process.
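The arrangement recited in claim 1 can be modeled as a small in-memory sketch. It is not the patented implementation: plain Python objects stand in for VMs, a string stands in for each block read, and every class, function, VM name and host name below (DFS, RegionServer, run_task, host-1 and so on) is hypothetical. It shows the two separate file-system instances, the requirement that data nodes, compute VMs and region servers are distinct VMs, and how each compute VM, through its coupled region server, reads table blocks directly from the second file system's data nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    host: str


@dataclass
class DFS:
    """One distributed file system instance: its data-node VMs and a block map."""
    name: str
    data_nodes: list
    block_locations: dict  # block id -> name of the data-node VM holding it

    def read(self, block_id: str) -> str:
        # Stand-in for a network read: report which data node served the block.
        return f"{self.name}/{self.block_locations[block_id]}/{block_id}"


@dataclass
class RegionServer:
    vm: VM
    dfs: DFS       # the second DFS, which stores the data table
    regions: dict  # table region it serves -> block ids in that DFS


@dataclass
class ComputeVM:
    vm: VM
    input_blocks: list        # its portion of the input data set (first DFS)
    coupled_to: RegionServer  # region server whose table portion it may process


def separate_vms(dfs1, dfs2, computes, region_servers) -> bool:
    """Data nodes, compute VMs and region servers must all be distinct VMs."""
    names = [vm.name for vm in dfs1.data_nodes + dfs2.data_nodes]
    names += [c.vm.name for c in computes] + [r.vm.name for r in region_servers]
    return len(names) == len(set(names))


def run_task(compute: ComputeVM, dfs1: DFS) -> list:
    # Read this VM's input portion from the first DFS...
    reads = [dfs1.read(b) for b in compute.input_blocks]
    # ...then, via the coupling, read the blocks of the table regions that the
    # coupled region server serves, directly from the second DFS's data nodes.
    rs = compute.coupled_to
    for blocks in rs.regions.values():
        reads += [rs.dfs.read(b) for b in blocks]
    return reads


dfs1 = DFS("dfs1", [VM("dn1-a", "host-1"), VM("dn1-b", "host-2")],
           {"in-0": "dn1-a", "in-1": "dn1-b"})
dfs2 = DFS("dfs2", [VM("dn2-a", "host-1"), VM("dn2-b", "host-2")],
           {"tbl-r0": "dn2-a", "tbl-r1": "dn2-b"})
rs1 = RegionServer(VM("rs-1", "host-1"), dfs2, {"r0": ["tbl-r0"]})
rs2 = RegionServer(VM("rs-2", "host-2"), dfs2, {"r1": ["tbl-r1"]})
c1 = ComputeVM(VM("cvm-1", "host-1"), ["in-0"], rs1)
c2 = ComputeVM(VM("cvm-2", "host-2"), ["in-1"], rs2)

assert separate_vms(dfs1, dfs2, [c1, c2], [rs1, rs2])
# The two compute VMs process their portions concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda c: run_task(c, dfs1), [c1, c2]))
```

Here `results[0]` lists one read from `dfs1` (the input portion) and one from `dfs2` (table region `r0`, served by `rs-1`), mirroring the two data paths the claim recites for each compute VM.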
2 Assignments
0 Petitions
Abstract
A distributed computing application is described that provides a highly elastic and multi-tenant platform for Hadoop applications and other workloads running in a virtualized environment. Deployments of a distributed computing application, such as Hadoop, may be executed concurrently with a distributed database application, such as HBase, using a shared instance of a distributed filesystem, or in other cases, multiple instances of the distributed filesystem. Computing resources allocated to region server nodes executing as VMs may be isolated from compute VMs of the distributed computing application, as well as from data nodes executing as VMs of the distributed filesystem.
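The abstract's two deployment options (one shared file-system instance or separate instances) and its resource isolation between region-server, compute and data-node VMs can be illustrated with a minimal sketch. Modeling isolation as fixed per-group CPU and memory reservations is an assumption made here for illustration; this section does not say how isolation is enforced, and ResourcePool, plan_filesystems and the pool names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """Hypothetical CPU/memory reservation for one group of VMs."""
    cpu_mhz: int
    mem_mb: int
    used_cpu: int = 0
    used_mem: int = 0

    def admit(self, cpu: int, mem: int) -> bool:
        # Admit a VM only if it fits within this pool's remaining reservation;
        # a full pool never borrows capacity from another group's pool.
        if self.used_cpu + cpu > self.cpu_mhz or self.used_mem + mem > self.mem_mb:
            return False
        self.used_cpu += cpu
        self.used_mem += mem
        return True


def plan_filesystems(shared: bool) -> dict:
    """Map each application to the DFS instance its VMs read from."""
    if shared:
        return {"hadoop": "dfs-shared", "hbase": "dfs-shared"}
    return {"hadoop": "dfs-1", "hbase": "dfs-2"}


# Separate pools for region-server VMs, compute VMs and data-node VMs.
pools = {
    "region-servers": ResourcePool(cpu_mhz=8000, mem_mb=32768),
    "compute": ResourcePool(cpu_mhz=8000, mem_mb=16384),
    "data-nodes": ResourcePool(cpu_mhz=4000, mem_mb=8192),
}

pools["compute"].admit(4000, 8192)          # True
pools["compute"].admit(4000, 8192)          # True: compute pool now fully used
pools["compute"].admit(1000, 1024)          # False: exceeds the compute CPU reservation
pools["region-servers"].admit(4000, 16384)  # True: unaffected by compute load
```

Because each group admits VMs only against its own reservation, a saturated compute pool does not reduce the capacity available to region servers or data nodes.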
20 Citations
20 Claims
1. A method for executing a distributed computing application within a virtualized computing environment, the method comprising:
instantiating a first plurality of virtual machines (VMs) on a plurality of hosts as data nodes of a first distributed file system, wherein the data nodes of the first distributed file system are concurrently accessible by a plurality of compute VMs including a first compute VM and a second compute VM, wherein each compute VM is configured to request and obtain data blocks from one or more of the data nodes of the first distributed file system, the data blocks containing a portion of an input data set, and process the portion of the input data set stored in the first distributed file system, wherein the first compute VM and the second compute VM are configured to concurrently process respectively a first portion of the input data set and a second portion of the input data set, and wherein data nodes and compute VMs are separate VMs; and
instantiating a second plurality of VMs on the plurality of hosts as data nodes of a second distributed file system, wherein data nodes of the second distributed file system are concurrently accessible by a plurality of region server nodes associated with a distributed database application, the plurality of region server nodes including a first region server node and a second region server node, wherein each region server node is a virtual machine configured to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of a data table and to perform database operations on the portion of a data table stored in the second distributed file system, wherein the first region server node and the second region server node are configured to concurrently process respectively a first portion of the data table and a second portion of the data table, and wherein the data nodes and the region server nodes are separate VMs;
wherein each compute VM is coupled to at least one of the plurality of region server nodes for access to a portion of the data table which the at least one of the plurality of region server nodes is configured to serve; and
wherein the access provided by the coupling allows each compute VM to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of the data table, and to process as part of the distributed computing application the portion of the data table which the at least one of the plurality of region server nodes is configured to process.
8. A non-transitory computer-readable storage medium comprising instructions that, when executed in a computing device, execute a distributed computing application within a virtualized computing environment, by performing the steps of:
instantiating a first plurality of virtual machines (VMs) on a plurality of hosts as data nodes of a first distributed file system, wherein the data nodes of the first distributed file system are concurrently accessible by a plurality of compute VMs including a first compute VM and a second compute VM, wherein each compute VM is configured to request and obtain data blocks from one or more of the data nodes of the first distributed file system, the data blocks containing a portion of an input data set, and process the portion of the input data set stored in the first distributed file system, wherein the first compute VM and the second compute VM are configured to concurrently process respectively a first portion of the input data set and a second portion of the input data set, and wherein data nodes and compute VMs are separate VMs; and
instantiating a second plurality of VMs on the plurality of hosts as data nodes of a second distributed file system, wherein data nodes of the second distributed file system are concurrently accessible by a plurality of region server nodes associated with a distributed database application, the plurality of region server nodes including a first region server node and a second region server node, and wherein each region server node is a virtual machine configured to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of a data table and to perform database operations on the portion of a data table stored in the second distributed file system, wherein the first region server node and the second region server node are configured to concurrently process respectively a first portion of the data table and a second portion of the data table, wherein the data nodes and the region server nodes are separate VMs;
wherein each compute VM is coupled to at least one of the plurality of region server nodes for access to a portion of the data table which the at least one of the plurality of region server nodes is configured to serve; and
wherein the access provided by the coupling allows each compute VM to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of the data table, and to process as part of the distributed computing application the portion of the data table which the at least one of the plurality of region server nodes is configured to process.
15. A host computer system for executing a distributed computing application within a virtualized computing environment, the host computer system comprising:
a storage device having a first virtual disk and a second virtual disk;
a processor programmed to carry out the steps of:
executing a first virtual machine (VM) on the host computer system, wherein the first VM includes the first virtual disk and is one of a plurality of data nodes of a first distributed file system, the plurality of data nodes of the first distributed file system being concurrently accessible by a plurality of compute VMs including a first compute VM and a second compute VM, wherein each compute VM is configured to request from and obtain data blocks from one or more of the data nodes of the first distributed file system, the data blocks containing a portion of an input data set, and process the portion of the input data set stored in the first distributed file system, and wherein data nodes and compute VMs are separate VMs;
executing, on the host computer system, the first compute VM and the second compute VM, wherein the first compute VM and the second compute VM are configured to concurrently process respectively a first portion of the input data set and a second portion of the input data set stored in the first virtual disk; and
executing a third VM on the host computer system, wherein the third VM includes the second virtual disk and is one of a plurality of data nodes of a second distributed file system, wherein the data nodes of the second distributed file system are concurrently accessible by a plurality of region server nodes associated with a distributed database application, the plurality of region server nodes including a first region server node and a second region server node, wherein each region server node is a virtual machine configured to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of a data table and to perform database operations on the portion of a data table stored in the second distributed file system, wherein the first region server node and the second region server node are configured to concurrently process respectively a first portion of the data table and a second portion of the data table, and wherein the data nodes and the region server nodes are separate VMs;
wherein each compute VM is coupled to at least one of the plurality of region server nodes for access to a portion of the data table which the at least one of the plurality of region server nodes is configured to serve; and
wherein the access provided by the coupling allows each compute VM to process the portion of the data table which the at least one of the plurality of region server nodes is configured to serve as part of the distributed computing application.
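Claim 15 narrows the arrangement to a single host whose storage device holds two virtual disks. The following sketch checks a proposed per-host layout against that structure. It assumes exactly one data-node VM per file system on the host, and the role labels, disk paths and the valid_host_layout helper are hypothetical; it models only the placement, not any hypervisor API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VirtualDisk:
    path: str                        # hypothetical backing file on the host's storage device
    blocks: frozenset = frozenset()  # ids of the data blocks stored on this disk


@dataclass(frozen=True)
class HostVM:
    name: str
    role: str                           # hypothetical labels: "dfs1-data", "dfs2-data", "compute"
    disk: Optional[VirtualDisk] = None
    input_blocks: tuple = ()            # compute VMs only: their portion of the input data set


def valid_host_layout(vms: list) -> bool:
    """Check one host's VMs against the layout sketched from claim 15."""
    dn1 = [v for v in vms if v.role == "dfs1-data"]
    dn2 = [v for v in vms if v.role == "dfs2-data"]
    computes = [v for v in vms if v.role == "compute"]
    if len(dn1) != 1 or len(dn2) != 1 or len(computes) < 2:
        return False
    disk1, disk2 = dn1[0].disk, dn2[0].disk
    # Each data-node VM needs its own virtual disk on the storage device.
    if disk1 is None or disk2 is None or disk1.path == disk2.path:
        return False
    # The portions the compute VMs process must be stored on the first virtual disk.
    return all(set(c.input_blocks) <= disk1.blocks for c in computes)


disk1 = VirtualDisk("disk-1.img", frozenset({"in-0", "in-1"}))
disk2 = VirtualDisk("disk-2.img", frozenset({"tbl-r0"}))
layout = [
    HostVM("dn-1", "dfs1-data", disk1),   # the "first VM" of the claim
    HostVM("dn-2", "dfs2-data", disk2),   # the "third VM" of the claim
    HostVM("cvm-1", "compute", input_blocks=("in-0",)),
    HostVM("cvm-2", "compute", input_blocks=("in-1",)),
]
print(valid_host_layout(layout))  # True
```

Attaching both data-node VMs to the same virtual disk would make the check fail, since the claim gives each of them its own disk.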
Specification