Systems, methods, and devices for dynamic resource monitoring and allocation in a cluster system
Abstract
In an embodiment, the systems, methods, and devices disclosed herein comprise a computer resource monitoring and allocation system. In an embodiment, the resource monitoring and allocation system can be configured to allocate computer resources that are available on various nodes of a cluster to specific jobs and/or sub-jobs and/or tasks and/or processes.
36 Claims
1. A computer cluster comprising:
a management computing device comprising a supervisor controller configured to coordinate processing of a plurality of sub-jobs for a plurality of overall jobs;
a plurality of computer system nodes configured to communicate with the management computing device, and to perform processing of received sub-jobs, the computing system nodes each comprising:
one or more processors configured to perform computing processes on received sub-jobs;
an agent controller comprising:
a monitoring interface configured to monitor utilization by sub-jobs of system resources of a first computing system node; and
a reporting controller configured to transmit the monitored system resources utilization to the supervisor controller in substantially real-time;
wherein the supervisor controller is configured to assign an additional sub-job to the first computing system node based on determining that the utilization of at least one system resource of the first computing system node is below a threshold level, the determining based on the monitored system resources utilization transmitted from the reporting controller to the supervisor controller;
wherein the at least one system resource of the first computing system node is a first electronic random access memory capacity,
wherein the supervisor controller is configured to monitor a second electronic random access memory capacity of a second computing system node,
wherein the assigning by the supervisor controller of the additional sub-job comprises assigning the additional sub-job to the first computing system node based on determining that utilization of the first electronic random access memory capacity is below the threshold level,
wherein the supervisor controller is configured to prevent assignment of additional sub-jobs to the second computing system node based on determining that utilization of the second electronic random access memory capacity is at or above a threshold value,
wherein the additional sub-job requires utilization of the first electronic random access memory capacity that is unused on the first computing system node.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
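For illustration only, the assignment rule recited in claim 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the `NodeReport` shape, the `pick_node` helper, the megabyte units, and the 0.8 threshold are all hypothetical and do not appear in the claims.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeReport:
    """Latest RAM figures an agent reported for one node (hypothetical shape)."""
    node_id: str
    ram_total_mb: int
    ram_used_mb: int

    @property
    def ram_utilization(self) -> float:
        # Fraction of the node's RAM capacity currently in use.
        return self.ram_used_mb / self.ram_total_mb

    @property
    def ram_unused_mb(self) -> int:
        return self.ram_total_mb - self.ram_used_mb


def pick_node(reports: List[NodeReport], sub_job_ram_mb: int,
              threshold: float = 0.8) -> Optional[str]:
    """Return a node for the additional sub-job, or None if no node qualifies.

    The 0.8 threshold is an assumed value chosen for the example.
    """
    for report in reports:
        if report.ram_utilization >= threshold:
            continue  # at or above the threshold: no additional sub-jobs here
        if sub_job_ram_mb <= report.ram_unused_mb:
            return report.node_id  # below threshold and the sub-job fits in unused RAM
    return None


reports = [
    NodeReport("node-2", ram_total_mb=16000, ram_used_mb=14400),  # 90% used
    NodeReport("node-1", ram_total_mb=16000, ram_used_mb=4000),   # 25% used
]
chosen = pick_node(reports, sub_job_ram_mb=2000)
```

In this sketch "node-2" plays the role of the second computing system node, which is skipped because its utilization is at or above the threshold, while "node-1" receives the additional sub-job because its utilization is below the threshold and the sub-job fits in its unused memory.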
10. A supervisor controller configured to dynamically manage assignment of job processes in a computer cluster, the supervisor controller comprising:
a management controller interface configured to communicate with a management controller to access data representing an assignment of a plurality of job processes across a plurality of computer system nodes in the computer cluster;
an agent controller interface configured to communicate with an agent controller operating on a first computing system node, the agent controller configured to transmit to the agent controller interface data representing utilization of system resources by the plurality of job processes operating on the first computing system node; and
a system resource allocation engine configured to dynamically assign an additional job process to the first computing system node based on determining that the utilization of at least one system resource of the first computing system node is below a threshold level, the determining based on the data representing utilization of system resources transmitted from the agent controller to the agent controller interface,
wherein the at least one system resource of the first computing system node is a first electronic random access memory capacity,
wherein the system resource allocation engine is configured to monitor a second electronic random access memory capacity of a second computing system node,
wherein the dynamically assigning by the system resource allocation engine of the additional job process comprises assigning the additional job process to the first computing system node based on determining that utilization of the first electronic random access memory capacity is below the threshold level,
wherein the system resource allocation engine is configured to prevent assignment of additional job processes to the second computing system node based on determining that utilization of the second electronic random access memory capacity is at or above a threshold value,
wherein the additional job process requires utilization of the first electronic random access memory capacity that is unused on the first computing system node.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
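As a rough illustration of how the three components recited in claim 10 might relate to one another, the following Python sketch wires a management controller interface, an agent controller interface, and an allocation engine together. All class names, method names, and the 0.8 threshold are assumptions made for the example. For brevity the sketch checks only the utilization threshold and omits the check that the job process fits in unused memory.

```python
from typing import Dict, List, Optional


class ManagementControllerInterface:
    """Holds data representing which job processes are assigned to which node.

    A stand-in for communication with a real management controller.
    """

    def __init__(self, assignments: Dict[str, List[str]]) -> None:
        self._assignments = assignments

    def assignments(self) -> Dict[str, List[str]]:
        return self._assignments

    def record(self, node_id: str, job_process: str) -> None:
        self._assignments.setdefault(node_id, []).append(job_process)


class AgentControllerInterface:
    """Receives RAM utilization data (0.0 to 1.0) transmitted by node agents."""

    def __init__(self) -> None:
        self._ram_utilization: Dict[str, float] = {}

    def receive(self, node_id: str, ram_utilization: float) -> None:
        self._ram_utilization[node_id] = ram_utilization

    def latest(self) -> Dict[str, float]:
        return dict(self._ram_utilization)


class ResourceAllocationEngine:
    """Assigns an additional job process to a node whose utilization is below the threshold."""

    def __init__(self, management: ManagementControllerInterface,
                 agents: AgentControllerInterface, threshold: float = 0.8) -> None:
        self.management = management
        self.agents = agents
        self.threshold = threshold  # assumed value, not taken from the claims

    def assign(self, job_process: str) -> Optional[str]:
        for node_id, utilization in self.agents.latest().items():
            if utilization < self.threshold:
                self.management.record(node_id, job_process)
                return node_id
        return None  # every node is at or above the threshold


management = ManagementControllerInterface({"node-1": ["p1"], "node-2": ["p2", "p3"]})
agents = AgentControllerInterface()
agents.receive("node-2", 0.92)
agents.receive("node-1", 0.35)
engine = ResourceAllocationEngine(management, agents)
target = engine.assign("p4")
```

Keeping the two interfaces separate from the engine mirrors the claim's division between accessing assignment data, receiving utilization data, and deciding where the next job process goes.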
19. A computer cluster comprising:
a management computing device comprising a supervisor controller configured to coordinate processing of a plurality of sub-jobs for a plurality of overall jobs;
a plurality of computer system nodes configured to communicate with the management computing device, and to perform processing of received sub-jobs, the computing system nodes each comprising:
one or more processors configured to perform computing processes on received sub-jobs;
an agent controller comprising:
a monitoring interface configured to monitor utilization by sub-jobs of system resources of a first computing system node; and
a reporting controller configured to transmit the monitored system resources utilization to the supervisor controller in substantially real-time;
wherein the supervisor controller is configured to assign an additional sub-job to the first computing system node based on determining that the utilization of at least one system resource of the first computing system node is below a threshold level, the determining based on the monitored system resources utilization transmitted from the reporting controller to the supervisor controller;
wherein the at least one system resource of the first computing system node is a first CPU processor capacity,
wherein the supervisor controller is configured to monitor a second CPU processor capacity of a second computing system node,
wherein the assigning by the supervisor controller of the additional sub-job comprises assigning the additional sub-job to the first computing system node based on determining that utilization of the first CPU processor capacity is below the threshold level,
wherein the supervisor controller is configured to prevent assignment of additional sub-jobs to the second computing system node based on determining that utilization of the second CPU processor capacity is at or above a threshold value,
wherein the additional sub-job requires utilization of the first CPU processor capacity that is unused on the first computing system node.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27
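The node-side half of claim 19, an agent controller with a monitoring interface and a reporting controller, could look roughly like the Python sketch below. It is a hedged illustration only: the `read_cpu` callable, the `send` callable, the tuple message format, and the sampling interval are hypothetical. A real agent would presumably read operating-system counters and transmit over a network at a short interval to approximate "substantially real-time" reporting.

```python
import time
from typing import Callable, List, Tuple

Message = Tuple[str, float, float]  # (node_id, timestamp, cpu_utilization)


class MonitoringInterface:
    """Samples CPU utilization through an injected reader (hypothetical)."""

    def __init__(self, read_cpu: Callable[[], float]) -> None:
        self._read_cpu = read_cpu

    def sample(self) -> float:
        # Clamp the raw reading into the range 0.0 to 1.0.
        return min(max(self._read_cpu(), 0.0), 1.0)


class ReportingController:
    """Transmits each sample to the supervisor through an injected sender."""

    def __init__(self, node_id: str, send: Callable[[Message], None]) -> None:
        self._node_id = node_id
        self._send = send

    def report(self, cpu_utilization: float) -> None:
        self._send((self._node_id, time.time(), cpu_utilization))


def run_agent(monitor: MonitoringInterface, reporter: ReportingController,
              samples: int, interval_s: float = 0.0) -> None:
    """Sample and report a fixed number of times, pausing interval_s between reports."""
    for _ in range(samples):
        reporter.report(monitor.sample())
        time.sleep(interval_s)


# Stand-ins: canned readings instead of OS counters, a list instead of a network link.
readings = iter([0.20, 0.95, 1.30])
inbox: List[Message] = []
run_agent(MonitoringInterface(lambda: next(readings)),
          ReportingController("node-1", inbox.append),
          samples=3)
```

Injecting the reader and the sender keeps the sketch independent of any particular operating system or transport, which the claims do not specify.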
28. A supervisor controller configured to dynamically manage assignment of job processes in a computer cluster, the supervisor controller comprising:
a management controller interface configured to communicate with a management controller to access data representing an assignment of a plurality of job processes across a plurality of computer system nodes in the computer cluster;
an agent controller interface configured to communicate with an agent controller operating on a first computing system node, the agent controller configured to transmit to the agent controller interface data representing utilization of system resources by the plurality of job processes operating on the first computing system node; and
a system resource allocation engine configured to dynamically assign an additional job process to the first computing system node based on determining that the utilization of at least one system resource of the first computing system node is below a threshold level, the determining based on the data representing utilization of system resources transmitted from the agent controller to the agent controller interface,
wherein the at least one system resource of the first computing system node is a first CPU processor capacity,
wherein the system resource allocation engine is configured to monitor a second CPU processor capacity of a second computing system node,
wherein the dynamically assigning by the system resource allocation engine of the additional job process comprises assigning the additional job process to the first computing system node based on determining that utilization of the first CPU processor capacity is below the threshold level,
wherein the system resource allocation engine is configured to prevent assignment of additional job processes to the second computing system node based on determining that utilization of the second CPU processor capacity is at or above a threshold value,
wherein the additional job process requires utilization of the first CPU processor capacity that is unused on the first computing system node.
Dependent claims: 29, 30, 31, 32, 33, 34, 35, 36
Specification