Dynamic node group allocation

US 9,762,672 B2
Filed: 06/15/2015
Issued: 09/12/2017
Est. Priority Date: 06/15/2015
Status: Expired due to Fees

Abstract

Provided are techniques for improving data locality for parallel applications running in a big data distributed file system with a dynamic node group. In response to a consumer job starting to read one or more files in a big data distributed file system having multiple nodes, node group information for the one or more files to be read is retrieved, wherein the node group information identifies nodes from the multiple nodes on which a producer job wrote the one or more files, and the consumer job is assigned to the nodes identified by the node group information to allow for local reading of the one or more files by the consumer job.
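The flow in the abstract can be pictured as a lookup table keyed by file path: a producer job records the nodes it wrote on, and a consumer job is later placed on those same nodes. The sketch below is illustrative only. `NodeGroupTable`, `record_write`, `lookup`, and `assign_consumer` are hypothetical names, and an in-memory dictionary stands in for whatever store an actual implementation would use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeGroupTable:
    """Maps a full file path to the nodes a producer job wrote it on (hypothetical)."""
    _groups: Dict[str, List[str]] = field(default_factory=dict)

    def record_write(self, path: str, nodes: List[str]) -> None:
        # Called when a producer job finishes writing `path`.
        self._groups[path] = list(nodes)

    def lookup(self, path: str) -> List[str]:
        # Returns an empty list when no node group is known for `path`.
        return list(self._groups.get(path, []))


def assign_consumer(table: NodeGroupTable, paths: List[str]) -> List[str]:
    """Return the nodes a consumer job should run on to read `paths` locally."""
    assigned: List[str] = []
    for path in paths:
        for node in table.lookup(path):
            if node not in assigned:  # keep first-seen order, drop duplicates
                assigned.append(node)
    return assigned


table = NodeGroupTable()
table.record_write("/data/orders/part-0", ["node1", "node2"])
table.record_write("/data/orders/part-1", ["node2", "node3"])
consumer_nodes = assign_consumer(table, ["/data/orders/part-0", "/data/orders/part-1"])
print(consumer_nodes)
```

Keying the table by full path mirrors the dependent claims, which store a full path file name along with the node group information.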

9 Claims

1. A method, comprising:
- connecting a parallel application server to a data source structure, wherein the data source structure contains a big data distributed file system, wherein the big data distributed file system contains multiple nodes and data blocks;
  
  in response to the parallel application operating the data source structure within the multiple nodes of the big data distributed file system, the parallel application server and the data source structure performing read and write operations on the data blocks in a local mode setting;
  
  in response to the parallel application operating the data source structure outside of the multiple nodes of the big data distributed file system, the parallel application server and the data source structure performing read and write operations on the data blocks in a remote mode setting;
  
  in response to a consumer job starting to read one or more files in the big data distributed file system, retrieving node group information for the one or more files to be read, wherein the node group information identifies nodes from the multiple nodes on which a producer job wrote the one or more files;
  
  implementing a node grouping mechanism to read and write the data blocks within the local mode setting over the remote mode setting;
  
  assigning the consumer job to the nodes identified by the node group information to allow for reading of the one or more files by the consumer job within the local mode setting, wherein the local mode setting reads and writes the data blocks;
  
  in response to assigning the consumer job to the nodes identified by the node group information, generating a configuration file, wherein the configuration file comprises a dynamically generated configuration file and a non-dynamically generated configuration file;
  
  wherein the dynamically generated configuration file corresponds to the consumer job and the dynamically generated configuration file is dynamically assigned to the node group for the consumer job;
  
  in response to retrieving the node group information, requesting logical resources;
  
  executing the consumer job with the configuration file identifying the nodes on which the consumer job is to run; and
  
  in response to determining that logical resources cannot be allocated in the nodes identified by the node group information, attempting to allocate logical resources in nodes close to the nodes identified by the node group information.

2. The method of claim 1, further comprising:
- storing a full path file name along with the node group information in a table.

3. The method of claim 1, wherein software is provided as a service in a cloud environment.
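The last element of claim 1, falling back to nearby nodes when resources cannot be allocated on the node group, might look roughly like the sketch below. The claim text here does not define what "close" means, so the sketch assumes "same rack"; `allocate`, `free_slots`, and `rack_of` are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Optional


def allocate(
    preferred: List[str],
    free_slots: Dict[str, int],
    rack_of: Dict[str, str],
) -> Optional[str]:
    """Pick a node with a free slot, preferring the node group, then same-rack nodes."""
    # First choice: a node from the node group itself, so reads stay local.
    for node in preferred:
        if free_slots.get(node, 0) > 0:
            free_slots[node] -= 1
            return node
    # Fallback: a node "close" to the group, assumed here to mean same rack.
    preferred_racks = {rack_of[n] for n in preferred if n in rack_of}
    for node in sorted(free_slots):
        if node in preferred:
            continue
        if free_slots[node] > 0 and rack_of.get(node) in preferred_racks:
            free_slots[node] -= 1
            return node
    return None  # nothing available in or near the node group


slots = {"node1": 0, "node2": 0, "node3": 1, "node4": 1}
racks = {"node1": "rackA", "node2": "rackA", "node3": "rackB", "node4": "rackA"}
chosen = allocate(["node1", "node2"], slots, racks)
print(chosen)
```

In this example both preferred nodes are full, so the sketch selects `node4`, the only free node sharing a rack with the node group, rather than `node3` on a different rack.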

4. A computer system, comprising:
- one or more processors, one or more computer-readable memories and one or more computer-readable, tangible storage devices;
  
  a parallel application server connected to a data source structure, wherein the data source structure contains a big data distributed file system, wherein the big data distributed file system contains multiple nodes and data blocks; and
  
  program instructions, stored on at least one of the one or more computer-readable, tangible storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to perform:
  
  in response to the parallel application operating the data source structure within the multiple nodes of the big data distributed file system, the parallel application server and the data source structure performing read and write operations on the data blocks within a local mode setting;
  
  in response to the parallel application operating the data source structure outside the multiple nodes of the big data distributed file system, the parallel application server and the data source structure performing read and write operations on the data blocks within a remote mode setting;
  
  in response to a consumer job starting to read one or more files in the big data distributed file system, retrieving node group information for the one or more files to be read, wherein the node group information identifies nodes from the multiple nodes on which a producer job wrote the one or more files;
  
  implementing a node grouping mechanism to read and write the data blocks within the local mode setting over the remote mode setting;
  
  assigning the consumer job to the nodes identified by the node group information to allow for reading of the one or more files by the consumer job within the local mode setting, wherein the local mode setting reads and writes the data blocks;
  
  in response to assigning the consumer job to the nodes identified by the node group information, generating a configuration file, wherein the configuration file comprises a dynamically generated configuration file and a non-dynamically generated configuration file;
  
  wherein the dynamically generated configuration file corresponds to the consumer job and the dynamically generated configuration file is dynamically assigned to the node group for the consumer job;
  
  in response to retrieving the node group information, requesting logical resources;
  
  executing the consumer job with the configuration file identifying the nodes on which the consumer job is to run; and
  
  in response to determining that logical resources cannot be allocated in the nodes identified by the node group information, attempting to allocate logical resources in nodes close to the nodes identified by the node group information.

5. The computer system of claim 4, wherein the operations further comprise:
- storing a full path file name along with the node group information in a table.

6. The computer system of claim 4, wherein a Software as a Service (SaaS) is configured to perform the system operations.
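The local and remote mode settings in the claims depend on whether the parallel application runs inside or outside the nodes of the distributed file system. A minimal sketch of that decision follows, assuming the application's host name can be compared against a known set of cluster nodes; `access_mode` and the host names are hypothetical.

```python
from typing import Set


def access_mode(app_host: str, cluster_nodes: Set[str]) -> str:
    """Return "local" when the application runs on a cluster node, else "remote"."""
    return "local" if app_host in cluster_nodes else "remote"


cluster = {"node1", "node2", "node3"}
mode_inside = access_mode("node2", cluster)
mode_outside = access_mode("edge-gateway", cluster)
print(mode_inside, mode_outside)
```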

7. A computer program product, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable by at least one processor to perform:
- connecting a parallel application server to a data source structure, wherein the data source structure contains a big data distributed file system, wherein the big data distributed file system contains multiple nodes and data blocks;
  
  in response to the parallel application operating the data source structure within the multiple nodes of the big data distributed file system, the parallel application server and the data source structure perform read and write operations on the data blocks within a local mode setting;
  
  in response to the parallel application operating the data source structure outside the multiple nodes of the big data distributed file system, the parallel application server and the data source structure perform read and write operations on the data blocks within a remote mode setting;
  
  in response to a consumer job starting to read one or more files in the big data distributed file system, retrieving node group information for the one or more files to be read, wherein the node group information identifies nodes from the multiple nodes on which a producer job wrote the one or more files;
  
  implementing a node grouping mechanism to read and write the data blocks within the local mode setting over the remote mode setting;
  
  assigning the consumer job to the nodes identified by the node group information to allow for reading of the one or more files by the consumer job within the local mode setting, wherein the local mode setting reads and writes the data blocks;
  
  in response to assigning the consumer job to the nodes identified by the node group information, generating a configuration file, wherein the configuration file comprises a dynamically generated configuration file and a non-dynamically generated configuration file;
  
  wherein the dynamically generated configuration file corresponds to the consumer job and the dynamically generated configuration file is dynamically assigned to the node group for the consumer job;
  
  in response to retrieving the node group information, requesting logical resources;
  
  executing the consumer job with the configuration file identifying the nodes on which the consumer job is to run; and
  
  in response to determining that logical resources cannot be allocated in the nodes identified by the node group information, attempting to allocate logical resources in nodes close to the nodes identified by the node group information.

8. The computer program product of claim 7, wherein the program code is executable by the at least one processor to perform:
- storing a full path file name along with the node group information in a table.

9. The computer program product of claim 7, wherein a Software as a Service (SaaS) is configured to perform the computer program product operations.
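The claims describe a configuration file made up of a non-dynamically generated part and a dynamically generated part tied to the consumer job and its node group. One way to picture this is a static base merged with a per-job node list. The sketch below is an assumption for illustration: the JSON layout, the key names, and `build_job_config` are hypothetical and are not taken from the patent.

```python
import json
from typing import Dict, List


def build_job_config(static_config: Dict[str, object], job_id: str, nodes: List[str]) -> str:
    """Combine a static base config with a per-job node list (hypothetical layout)."""
    # The dynamic part is generated per consumer job from its node group.
    dynamic_part = {"job_id": job_id, "nodes": list(nodes)}
    combined = {"static": dict(static_config), "dynamic": dynamic_part}
    return json.dumps(combined, indent=2, sort_keys=True)


base = {"temp_dir": "/tmp/scratch", "buffer_kb": 128}
config_text = build_job_config(base, "consumer-42", ["node1", "node2"])
print(config_text)
```

Keeping the two parts under separate keys lets the static portion be reused across jobs while only the node list changes per consumer job.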

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Bonagiri, Krishna K.; Jacobson, Eric A.; Li, Yong; Liu, Ron E.; Pu, Xiaoyan
Primary Examiner(s): Hussain, Tauqir
Assistant Examiner(s): Mohammadi, Kamran
Application Number: US14/740,050
Publication Number: US 20160366224A1
Time in Patent Office: 820 Days

CPC Class Codes

G06F 16/182   Distributed file systems

H04L 47/762   triggered by the network

H04L 67/1097   for distributed storage of ...
