Event-based dynamic resource provisioning
First Claim
1. A computer-implemented method for operating a supercomputing system, comprising:
processing a first supercomputing job with a first amount of resources of the supercomputing system;
determining that an event occurred while processing a data set of the first supercomputing job, wherein the determining includes automatically determining the event occurred based on analysis of the data set by the processing;
in response to determining that the event occurred;
notifying a resource manager that the event occurred;
determining a first amount of additional resources of the supercomputing system based on a first resolution of data employed in the processing of the data set, a second resolution of data to be employed in the processing of the data set, a size of the data set, and a target completion time for the first supercomputing job;
allocating the first amount of additional resources of the supercomputing system;
distributing at least a portion of the data set to the first additional computing resources;
processing the first supercomputing job at the second resolution of the data set with the first amount of resources of the supercomputing system and the first amount of additional resources of the supercomputing system;
during said processing of the first supercomputing job at the second resolution of the data set with the first amount of resources of the supercomputing system and the first amount of additional resources of the supercomputing system, determining whether the first supercomputing job is processing anomalous data not indicative of the event; and
in response to determining that the first supercomputing job is processing anomalous data not indicative of the event, de-allocating the first amount of additional resources of the supercomputing system and resuming processing of the first supercomputing job at the first resolution with the first amount of resources of the supercomputing system.
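The claim names the inputs used to size the additional resources (the two resolutions, the data set size, and the target completion time) but gives no formula. The sketch below is one possible reading, not the patented method. It assumes that work grows with the resolution ratio raised to a hypothetical `dims` exponent and that throughput scales linearly with node count. `additional_nodes`, `Job`, `node_rate`, and `dims` are illustrative names that do not appear in the source.

```python
import math
from dataclasses import dataclass


def additional_nodes(base_nodes, first_res, second_res, data_size,
                     target_seconds, node_rate, dims=2):
    """Estimate extra nodes needed to finish at second_res by the target time.

    Assumed cost model (not from the claim): work grows with
    (second_res / first_res) ** dims, and nodes scale linearly.
    """
    scale = (second_res / first_res) ** dims
    work = data_size * scale  # work units at the second resolution
    needed = math.ceil(work / (node_rate * target_seconds))
    return max(0, needed - base_nodes)


@dataclass
class Job:
    base_nodes: int
    first_res: float
    second_res: float
    resolution: float = 0.0
    extra_nodes: int = 0

    def __post_init__(self):
        # A job starts at the first (coarser) resolution.
        self.resolution = self.first_res

    def on_event(self, data_size, target_seconds, node_rate):
        # Event detected: size and allocate extra nodes, switch resolution.
        self.extra_nodes = additional_nodes(
            self.base_nodes, self.first_res, self.second_res,
            data_size, target_seconds, node_rate)
        self.resolution = self.second_res

    def on_anomaly(self):
        # Data turned out to be anomalous: release extras, resume as before.
        self.extra_nodes = 0
        self.resolution = self.first_res


job = Job(base_nodes=4, first_res=1.0, second_res=2.0)
job.on_event(data_size=1000, target_seconds=10, node_rate=25)
print(job.extra_nodes, job.resolution)  # 12 2.0
job.on_anomaly()
print(job.extra_nodes, job.resolution)  # 0 1.0
```

With these made-up numbers, doubling the resolution quadruples the work to 4000 units. Meeting the 10-second target at 25 units per second per node takes 16 nodes, so 12 are added to the original 4.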
Abstract
Disclosed are a method, a system and a computer program product for automatically allocating and de-allocating resources for jobs executed or processed by one or more supercomputer systems. In one or more embodiments, a supercomputing system can process multiple jobs with respective supercomputing resources. A global resource manager can automatically allocate additional resources to a first job and de-allocate resources from a second job. In one or more embodiments, the global resource manager can provide the de-allocated resources to the first job as additional supercomputing resources. In one or more embodiments, the first job can use the additional supercomputing resources to perform data analysis at a higher resolution, and the additional resources can compensate for an amount of time the higher resolution analysis would take using originally allocated supercomputing resources.
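The reallocation described in the abstract, in which a global resource manager takes resources from a second job and gives them to a first job, can be pictured with a small sketch. This is only an illustration under simple assumptions: resources are counted as whole nodes, and a donor job can give up no more than it currently holds. `GlobalResourceManager`, `transfer`, and the job names are hypothetical and not taken from the source.

```python
class GlobalResourceManager:
    """Tracks how many nodes each job holds (illustrative only)."""

    def __init__(self, allocations):
        self.allocations = dict(allocations)

    def transfer(self, donor, recipient, count):
        """Move up to `count` nodes from donor to recipient.

        Returns the number of nodes actually moved, which is capped
        by what the donor currently holds.
        """
        moved = min(count, self.allocations[donor])
        self.allocations[donor] -= moved
        self.allocations[recipient] += moved
        return moved


manager = GlobalResourceManager({"first_job": 4, "second_job": 8})
moved = manager.transfer("second_job", "first_job", 6)
print(moved, manager.allocations)  # 6 {'first_job': 10, 'second_job': 2}
```

Capping the transfer at the donor's current holding keeps the total node count constant, so the manager never hands out resources it does not have.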
18 Claims
7. A supercomputing system, comprising:
a plurality of computer nodes, wherein each of the plurality of computer nodes is coupled to another of the plurality of computer nodes;
data storage coupled to at least a first computer node of the plurality of computer nodes, wherein the data storage includes instructions that when executed on the first computer node provides logic for performing the functions of;
processing a first supercomputing job with a first amount of resources of the supercomputing system;
determining that an event occurred while processing a data set of the first supercomputing job, wherein the determining includes automatically determining the event occurred based on analysis of the data set by the processing;
in response to determining that the event occurred;
providing a notification that the event occurred;
determining a first amount of additional resources of the supercomputing system based on a first resolution of data employed in the processing of the data set, a second resolution of data to be employed in the processing of the data set, a size of the data set, and a target completion time for the first supercomputing job;
allocating the first amount of additional resources of the supercomputing system;
distributing at least a portion of the data set to the first additional computing resources; and
processing the first supercomputing job at the second resolution of the data set with the first amount of resources of the supercomputing system and the first amount of additional resources of the supercomputing system;
during said processing of the first supercomputing job at the second resolution of the data set with the first amount of resources of the supercomputing system and the first amount of additional resources of the supercomputing system not indicative of the event, determining whether the first supercomputing job is processing anomalous data; and
in response to determining that the first supercomputing job is processing anomalous data not indicative of the event, de-allocating the first amount of additional resources of the supercomputing system and resuming processing of the first supercomputing job at the first resolution with the first amount of resources of the supercomputing system.
13. A computer readable memory medium comprising instructions, which when executed on a processing system of a supercomputing system, cause the supercomputing system to perform:
processing a first supercomputing job with a first amount of resources of the supercomputing system;
determining that an event occurred while processing a data set of the first supercomputing job, wherein the determining includes automatically determining the event occurred based on analysis of the data set by the processing;
in response to determining that the event occurred;
notifying a resource manager that the event occurred;
determining a first amount of additional resources of the supercomputing system based on a first resolution of data employed in the processing of the data set, a second resolution of data to be employed in the processing of the data set, a size of the data set, and a target completion time for the first supercomputing job;
allocating the first amount of additional resources of the supercomputing system;
distributing at least a portion of the data set to the first additional computing resources;
processing the first supercomputing job at the second resolution of the data set with the first amount of resources of the supercomputing system and the first amount of additional resources of the supercomputing system;
during said processing of the first supercomputing job at the second resolution of the data set with the first amount of resources of the supercomputing system and the first amount of additional resources of the supercomputing system not indicative of the event, determining whether the first supercomputing job is processing anomalous data; and
in response to determining that the first supercomputing job is processing anomalous data not indicative of the event, de-allocating the first amount of additional resources of the supercomputing system and resuming processing of the first supercomputing job at the first resolution with the first amount of resources of the supercomputing system.
Specification