Multi-tenant distributed computing and database

US 10,642,800 B2
Filed: 07/11/2014
Issued: 05/05/2020
Est. Priority Date: 10/25/2013
Status: Active Grant



Abstract

A distributed computing application is described that provides a highly elastic and multi-tenant platform for Hadoop applications and other workloads running in a virtualized environment. Deployments of a distributed computing application, such as Hadoop, may be executed concurrently with a distributed database application, such as HBase, using a shared instance of a distributed filesystem, or in other cases, multiple instances of the distributed filesystem. Computing resources allocated to region server nodes executing as VMs may be isolated from compute VMs of the distributed computing application, as well as from data nodes executing as VMs of the distributed filesystem.
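The two deployment options named in the abstract (one shared filesystem instance, or separate instances for the compute workload and the database) can be sketched as a small placement model. This is only an illustrative sketch: the names `VM`, `FileSystem`, and `build_deployment`, and the one-VM-of-each-role-per-host layout, are assumptions made for the example and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    role: str  # "data_node", "compute", or "region_server"
    host: str


@dataclass
class FileSystem:
    name: str
    data_nodes: list = field(default_factory=list)


def build_deployment(hosts, shared_fs):
    """Place one compute VM and one region server VM per host (hypothetical layout).

    With shared_fs=True both workloads use the same filesystem instance;
    otherwise the database gets its own instance with its own data node VMs.
    """
    compute_fs = FileSystem("dfs-compute")
    database_fs = compute_fs if shared_fs else FileSystem("dfs-database")
    compute_vms, region_servers = [], []
    for host in hosts:
        # Data nodes are always their own VMs, separate from compute VMs
        # and region server VMs.
        compute_fs.data_nodes.append(VM(f"dn-a-{host}", "data_node", host))
        if not shared_fs:
            database_fs.data_nodes.append(VM(f"dn-b-{host}", "data_node", host))
        compute_vms.append(VM(f"compute-{host}", "compute", host))
        region_servers.append(VM(f"rs-{host}", "region_server", host))
    return compute_fs, database_fs, compute_vms, region_servers
```

Calling `build_deployment(["h1", "h2"], shared_fs=False)` yields two filesystem objects with two data node VMs each, while `shared_fs=True` returns the same filesystem object for both workloads.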


20 Claims

1. A method for executing a distributed computing application within a virtualized computing environment, the method comprising:
- instantiating a first plurality of virtual machines (VMs) on a plurality of hosts as data nodes of a first distributed file system, wherein the data nodes of the first distributed file system are concurrently accessible by a plurality of compute VMs including a first compute VM and a second compute VM, wherein each compute VM is configured to request and obtain data blocks from one or more of the data nodes of the first distributed file system, the data blocks containing a portion of an input data set, and process the portion of the input data set stored in the first distributed file system, wherein the first compute VM and the second compute VM are configured to concurrently process respectively a first portion of the input data set and a second portion of the input data set, and wherein data nodes and compute VMs are separate VMs; and
  
  instantiating a second plurality of VMs on the plurality of hosts as data nodes of a second distributed file system, wherein data nodes of the second distributed file system are concurrently accessible by a plurality of region server nodes associated with a distributed database application, the plurality of region server nodes including a first region server node and a second region server node, wherein each region server node is a virtual machine configured to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of a data table and to perform database operations on the portion of a data table stored in the second distributed file system, wherein the first region server node and the second region server node are configured to concurrently process respectively a first portion of the data table and a second portion of the data table, and wherein the data nodes and the region server nodes are separate VMs;
  
  wherein each compute VM is coupled to at least one of the plurality of region server nodes for access to a portion of the data table which the at least one of the plurality of region server nodes is configured to serve; and
  
  wherein the access provided by the coupling allows each compute VM to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of the data table, and to process as part of the distributed computing application the portion of the data table which the at least one of the plurality of region server nodes is configured to process.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein instantiating the second plurality of VMs on the plurality of hosts to form the second distributed file system comprises:
    - instantiating a first VM comprising the first region server node and a first data node executing on the same VM, wherein the first data node is configured to store a portion of the data table associated with the first region server node.
  - 3. The method of claim 2, wherein the plurality of compute VMs comprises a first resource pool associated with the distributed computing application, and the second plurality of VMs comprises a second resource pool associated with the distributed database application, wherein the second resource pool has a higher priority for computing resources than the first resource pool.
  - 4. The method of claim 1, wherein instantiating the second plurality of VMs on the plurality of hosts to form the second distributed file system comprises:
    - instantiating a first VM comprising the first region server node; and
      
      instantiating a second VM comprising a first data node configured to store a portion of the data table associated with the first region server node, wherein the first VM and the second VM are executing on the same host.
  - 5. The method of claim 4, wherein the plurality of compute VMs comprises a first resource pool associated with the distributed computing application, wherein the first VM is a member of a second resource pool associated with the distributed database application, wherein the second VM is a member of a third resource pool associated with the second distributed file system.
  - 6. The method of claim 4, further comprising:
    - responsive to an indication to expand the distributed database application, instantiating a third VM comprising the second region server node configured to store a portion of the data table associated with the second region server node within the second VM; and
      
      responsive to an indication to shrink the distributed database application, powering off the first VM comprising the first region server node.
  - 7. The method of claim 1, further comprising:
    - executing an HBase query on at least one of the plurality of region server nodes while concurrently executing a MapReduce job on the plurality of compute VMs.
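The resource pool priority of claim 3 and the expand and shrink steps of claim 6 can be illustrated with a toy model. This is a sketch under stated assumptions: modeling priority as proportional `shares`, and the names `ResourcePool`, `allocate_cpu`, `expand`, and `shrink`, are hypothetical choices for the example; the claims do not prescribe any particular scheduling mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    name: str
    shares: int  # hypothetical weight; a larger value stands for higher priority
    powered_on: list = field(default_factory=list)


def allocate_cpu(pools, total_mhz):
    """Split contended CPU among pools in proportion to their shares."""
    total_shares = sum(pool.shares for pool in pools)
    return {pool.name: total_mhz * pool.shares // total_shares for pool in pools}


def expand(pool, vm_name):
    """Respond to an expand indication by bringing up one more VM in the pool."""
    pool.powered_on.append(vm_name)


def shrink(pool, vm_name):
    """Respond to a shrink indication by powering off a VM in the pool."""
    pool.powered_on.remove(vm_name)


# The database pool gets twice the weight of the compute pool.
compute_pool = ResourcePool("compute", shares=1000)
database_pool = ResourcePool("database", shares=2000)
allocation = allocate_cpu([compute_pool, database_pool], total_mhz=6000)
```

With these weights the database pool is entitled to twice the CPU of the compute pool under contention, which is one simple way to express "higher priority for computing resources".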

8. A non-transitory computer-readable storage medium comprising instructions that, when executed in a computing device, execute a distributed computing application within a virtualized computing environment, by performing the steps of:
- instantiating a first plurality of virtual machines (VMs) on a plurality of hosts as data nodes of a first distributed file system, wherein the data nodes of the first distributed file system are concurrently accessible by a plurality of compute VMs including a first compute VM and a second compute VM, wherein each compute VM is configured to request and obtain data blocks from one or more of the data nodes of the first distributed file system, the data blocks containing a portion of an input data set, and process the portion of the input data set stored in the first distributed file system, wherein the first compute VM and the second compute VM are configured to concurrently process respectively a first portion of the input data set and a second portion of the input data set, and wherein data nodes and compute VMs are separate VMs; and
  
  instantiating a second plurality of VMs on the plurality of hosts as data nodes of a second distributed file system, wherein data nodes of the second distributed file system are concurrently accessible by a plurality of region server nodes associated with a distributed database application, the plurality of regions server nodes including a first region server node and a second region server node, and wherein each region server node is a virtual machine configured to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of a data table and to perform database operations on the portion of a data table stored in the second distributed file system, wherein the first region server node and the second region server node are configured to concurrently process respectively a first portion of the data table and a second portion of the data table, wherein the data nodes and the region server nodes are separate VMs;
  
  wherein each compute VM is coupled to at least one of the plurality of region server nodes for access to a portion of the data table which the at least one of the plurality of region server nodes is configured to serve; and
  
  wherein the access provided by the coupling allows each compute VM to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of the data table, and to process as part of the distributed computing application the portion of the data table which the at least one of the plurality of region server nodes is configured to process.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The non-transitory computer-readable storage medium of claim 8, wherein the step of instantiating the second plurality of VMs on the plurality of hosts to form the second distributed file system comprises:
    - instantiating a first VM comprising the first region server node and a first data node executing on the same VM, wherein the first data node is configured to store a portion of the data table associated with the first region server node.
  - 10. The non-transitory computer-readable storage medium of claim 9, wherein the plurality of compute VMs comprises a first resource pool associated with the distributed computing application, and the second plurality of VMs comprises a second resource pool associated with the distributed database application, wherein the second resource pool has a higher priority for computing resources than the first resource pool.
  - 11. The non-transitory computer-readable storage medium of claim 8, wherein the step of instantiating the second plurality of VMs on the plurality of hosts to form the second distributed file system comprises:
    - instantiating a first VM comprising the first region server node; and
      
      instantiating a second VM comprising a first data node configured to store a portion of the data table associated with the first region server node, wherein the first VM and the second VM are executing on the same host.
  - 12. The non-transitory computer-readable storage medium of claim 11, wherein the plurality of compute VMs comprises a first resource pool associated with the distributed computing application, wherein the first VM is a member of a second resource pool associated with the distributed database application, and wherein the second VM is a member of a third resource pool associated with the second distributed file system.
  - 13. The non-transitory computer-readable storage medium of claim 11, further comprising:
    - responsive to an indication to expand the distributed database application, instantiating a third VM comprising the second region server node; and
      
      responsive to an indication to shrink the distributed database application, powering off the first VM comprising the first region server node.
  - 14. The non-transitory computer-readable storage medium of claim 8, further comprising:
    - executing an HBase query on at least one of the plurality of region server nodes while concurrently executing a MapReduce job on the plurality of compute VMs.
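Claims 7 and 14 describe a database query running at the same time as a MapReduce job. The sketch below imitates that concurrency in a single process with a thread pool. It is a stand-in only: it uses plain Python data structures rather than the HBase or Hadoop APIs, and the function names `map_reduce_word_count`, `table_query`, and `run_concurrently` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_reduce_word_count(splits):
    """Count words per input split (map), then merge the partial counts (reduce)."""
    partials = [Counter(split.split()) for split in splits]
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def table_query(table, start_key, stop_key):
    """Return rows whose key lies in [start_key, stop_key), in key order."""
    return [(key, table[key]) for key in sorted(table) if start_key <= key < stop_key]


def run_concurrently(splits, table, start_key, stop_key):
    """Submit the batch job and the table query together and wait for both."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        job = pool.submit(map_reduce_word_count, splits)
        query = pool.submit(table_query, table, start_key, stop_key)
        return job.result(), query.result()
```

In the claimed arrangement the two workloads run on separate sets of VMs, so the thread pool here only stands in for that separation.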

15. A host computer system for executing a distributed computing application within a virtualized computing environment, the host computer system comprising:
- a storage device having a first virtual disk and a second virtual disk;
  
  a processor programmed to carry out the steps of:
  
  executing a first virtual machine (VM) on the host computer system, wherein the first VM includes the first virtual disk and is one of a plurality of data nodes of a first distributed file system, the plurality of data nodes of first distributed file system being concurrently accessible by a plurality of compute VMs including a first compute VM and a second compute VM, wherein each compute VM is configured to request from and obtain data blocks from one or more of the data nodes of the first distributed file system, the data blocks containing a portion of an input data set, and process the portion of the input data set stored in the first distributed file system, and wherein data nodes and compute VMs are separate VMs;
  
  executing, on the host computer system, the first compute VM and the second compute VM, wherein the first compute VM and the second compute VM are configured to concurrently process respectively a first portion of the input data set and a second portion of the input data set stored in the first virtual disk; and
  
  executing a third VM on the host computer system, wherein the third VM includes the second virtual disk and is one of a plurality of data nodes of a second distributed file system, wherein the data nodes of the second distributed file system are concurrently accessible by a plurality of region server nodes associated with a distributed database application, the plurality of region server nodes including a first region server node and a second region server node, wherein each region server node is a virtual machine configured to request from and obtain data blocks from one or more of the data nodes of the second distributed file system, the data blocks containing a portion of a data table and to perform database operations on the portion of a data table stored in the second distributed file system, wherein the first region server node and the second region server node are configured to concurrently process respectively a first portion of the data table and a second portion of the data table, and wherein the data nodes and the region server nodes are separate VMs;
  
  wherein each compute VM is coupled to at least one of the plurality of region server nodes for access to a portion of the data table which the at least one of the plurality of region server nodes is configured to serve; and
  
  wherein the access provided by the coupling allows each compute VM to process the portion of the data table which the at least one of the plurality of region server nodes is configured to serve as part of the distributed computing application.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The host computer system of claim 15, wherein the third VM comprises the first region server node and a first data node executing on the same VM, and wherein the first data node is configured to store a portion of the data table associated with the first region server node.
  - 17. The host computer system of claim 16, wherein the first and second compute VMs are members of a first resource pool associated with the distributed computing application, and the third VM is a member of a second resource pool associated with the distributed database application, and wherein the second resource pool has a higher priority for computing resources of the host computer system than the first resource pool.
  - 18. The host computer system of claim 15, wherein the processor is further programmed to carry out the steps of:
    - executing a fourth VM comprising the first region server node;
      
      wherein the third VM comprises a first data node configured to store a portion of the data table associated with the first region server node.
  - 19. The host computer system of claim 18, wherein the first and second compute VMs comprise a first resource pool associated with the distributed computing application, wherein the fourth VM is a member of a second resource pool associated with the distributed database application, and wherein the third VM is a member of a third resource pool associated with the second distributed file system.
  - 20. The host computer system of claim 18, wherein the processor is further programmed to carry out the steps of:
    - responsive to an indication to expand the distributed database application, instantiating a fifth VM comprising a third region server node configured to store a portion of the data table associated with the first region server node within the third VM; and
      
      responsive to an indication to shrink the distributed database application, powering off the fourth VM comprising the first region server node.
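Each independent claim couples a compute VM to the region server node that serves a given portion of the data table. One common way to model "which server serves which portion" is a sorted map from region start keys to servers, sketched below. The key-range partitioning, the `RegionMap` class, and the server names are assumptions made for illustration; the claims do not specify how the table is divided.

```python
import bisect
from collections import defaultdict


class RegionMap:
    """Maps a row key to the region server that serves its key range."""

    def __init__(self, regions):
        # regions: (start_key, server_name) pairs sorted by start_key; the
        # first start key should be "" so that every row key has an owner.
        self._starts = [start for start, _ in regions]
        self._servers = [server for _, server in regions]

    def server_for(self, row_key):
        # Pick the last region whose start key is <= row_key.
        index = bisect.bisect_right(self._starts, row_key) - 1
        return self._servers[index]

    def group_by_server(self, row_keys):
        """Group the rows a compute task needs by the server that holds them."""
        groups = defaultdict(list)
        for key in row_keys:
            groups[self.server_for(key)].append(key)
        return dict(groups)


# Two hypothetical region servers: rs-1 serves keys below "m", rs-2 the rest.
region_map = RegionMap([("", "rs-1"), ("m", "rs-2")])
plan = region_map.group_by_server(["apple", "mango", "zebra"])
```

A compute task could use such a plan to contact only the region servers that hold the rows it needs.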


Current Assignee: VMware LLC (Broadcom, Inc.)
Original Assignee: VMware, Inc. (Broadcom, Inc.)
Inventors: Gummaraju, Jayanth; Lu, Yunshan; Magdon-Ismail, Tariq
Primary Examiner(s): Park, Grace
Assistant Examiner(s): Cheung, Hubert
Application Number: US14/329,132
Publication Number: US 20150121371A1
Time in Patent Office: 2,125 Days

CPC Class Codes
G06F 16/182: Distributed file systems
G06F 16/188: Virtual file systems
G06F 16/27: Replication, distribution o...
G06F 2009/45562: Creating, deleting, cloning...
G06F 9/45558: Hypervisor-specific managem...
