Method, apparatus, and computer program product for design and selection of an I/O subsystem of a supercomputer
Abstract
For simulating a parallel supercomputing cluster, a simulation program includes a model of the system of compute nodes and a model of an I/O subsystem that stores checkpoints from the compute nodes. When executed, instructions in the simulation program perform the steps of receiving input parameters defining the compute nodes and the I/O subsystem, computing a total number of computational flops for a time between checkpoints and an amount of disk storage required to store the checkpoint data for different configurations of the parallel supercomputing cluster, and presenting a summary to a user of the computed number of computational flops for a time between checkpoints and an amount of disk storage required to store the checkpoint data for the different configurations of the parallel supercomputing cluster.
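The three steps named in the abstract (receive parameters, compute per-configuration figures, present a summary) can be illustrated with a minimal sketch. The patent text here does not give formulas, so everything below is an assumption made for illustration: the `ClusterConfig` fields, the model "FLOPs between checkpoints = nodes × sustained FLOP/s per node × checkpoint interval", and the model "disk storage = nodes × checkpointed bytes per node × checkpoints retained". All names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical input parameters for one cluster configuration."""
    name: str
    compute_nodes: int               # number of compute nodes
    flops_per_node: float            # sustained FLOP/s per node (assumed input)
    checkpoint_bytes_per_node: float # bytes each node writes per checkpoint (assumed)
    checkpoint_interval_s: float     # time between checkpoints, in seconds
    checkpoints_retained: int        # checkpoints kept on disk (assumed)


def flops_between_checkpoints(cfg: ClusterConfig) -> float:
    # Assumed model: every node computes at its sustained rate for the whole interval.
    return cfg.compute_nodes * cfg.flops_per_node * cfg.checkpoint_interval_s


def checkpoint_storage_bytes(cfg: ClusterConfig) -> float:
    # Assumed model: each retained checkpoint holds the checkpointed bytes of every node.
    return cfg.compute_nodes * cfg.checkpoint_bytes_per_node * cfg.checkpoints_retained


def summarize(configs):
    """Compute both figures for each configuration (step (b) in the claims)."""
    return [
        {
            "name": cfg.name,
            "flops_between_checkpoints": flops_between_checkpoints(cfg),
            "storage_bytes": checkpoint_storage_bytes(cfg),
        }
        for cfg in configs
    ]


def format_summary(rows) -> str:
    """Render the per-configuration summary as plain text (step (c) in the claims)."""
    lines = [f"{'configuration':<16}{'FLOPs/interval':>18}{'storage (bytes)':>20}"]
    for row in rows:
        lines.append(
            f"{row['name']:<16}"
            f"{row['flops_between_checkpoints']:>18.3e}"
            f"{row['storage_bytes']:>20.3e}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only; they do not come from the patent.
    configs = [
        ClusterConfig("small", 1_000, 1e12, 32 * 2**30, 3600, 2),
        ClusterConfig("large", 10_000, 1e12, 32 * 2**30, 3600, 2),
    ]
    print(format_summary(summarize(configs)))
```

Keeping the two computations as separate functions mirrors the way the claims list them as separate results, and makes it straightforward to swap in a different model for either quantity.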
20 Claims
1. An apparatus for simulating a parallel supercomputing cluster having a system of compute nodes and an I/O subsystem connected to the compute nodes for storing checkpoint data from the compute nodes, the apparatus comprising a digital computer including a data processor and non-transitory computer readable storage medium storing a simulation program, the simulation program including a model of the system of compute nodes and a model of the I/O subsystem and also including computer instructions that, when executed by the data processor, perform the steps of:
(a) receiving input parameters defining the system of compute nodes and input parameters defining the I/O subsystem;
(b) computing a total number of computational floating point operations for a time between checkpoints for different configurations of the parallel supercomputing cluster, and computing an amount of disk storage required to store the checkpoint data for the different configurations of the parallel supercomputing cluster; and
(c) presenting, to a user, a summary of the computed total number of computational floating point operations for the time between checkpoints for the different configurations of the parallel supercomputing cluster, and presenting, to the user, a summary of the computed amount of disk storage required to store the checkpoint data for the different configurations of the parallel supercomputing cluster.
Dependent claims: 2, 3, 4, 5, 6, 8
7. An apparatus for simulating a parallel supercomputing cluster having a system of compute nodes and an I/O subsystem connected to the compute nodes for storing checkpoint data from the compute nodes, the apparatus comprising a digital computer including a data processor and non-transitory computer readable storage medium storing a simulation program, the simulation program including a model of the system of compute nodes and a model of the I/O subsystem and also including computer instructions that, when executed by the data processor, perform the steps of:
(a) receiving input parameters defining the system of compute nodes and input parameters defining the I/O subsystem;
(b) computing a total number of computational floating point operations for a time between checkpoints for different configurations of the parallel supercomputing cluster, and computing an amount of disk storage required to store the checkpoint data for the different configurations of the parallel supercomputing cluster; and
(c) presenting, to a user, a summary of the computed total number of computational floating point operations for the time between checkpoints for the different configurations of the parallel supercomputing cluster, and presenting, to the user, a summary of the computed amount of disk storage required to store the checkpoint data for the different configurations of the parallel supercomputing cluster;
wherein step (b) includes computing an amount of time required to write checkpoint data from the compute nodes to the I/O subsystem for each of the different configurations of the parallel supercomputing cluster, and building a time series for each of the different configurations of the parallel supercomputing cluster.
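The additional limitation in claim 7 (a checkpoint write time and a time series per configuration) might be sketched as follows. The claim does not say how the write time is obtained or what the time series contains, so this sketch assumes the simplest possible model: write time equals checkpoint size divided by an aggregate I/O bandwidth, and the time series is an alternating sequence of compute and checkpoint-write phases. The function names and the tuple layout are hypothetical.

```python
def checkpoint_write_time_s(checkpoint_bytes: float,
                            io_bandwidth_bytes_per_s: float) -> float:
    """Assumed model: the whole checkpoint streams at one aggregate bandwidth."""
    if io_bandwidth_bytes_per_s <= 0:
        raise ValueError("I/O bandwidth must be positive")
    return checkpoint_bytes / io_bandwidth_bytes_per_s


def build_time_series(compute_interval_s: float,
                      write_time_s: float,
                      cycles: int):
    """Return (start_s, end_s, phase) tuples for alternating phases.

    Assumes computation pauses while a checkpoint is written, which is
    an illustrative simplification rather than something the claim states.
    """
    series = []
    t = 0.0
    for _ in range(cycles):
        series.append((t, t + compute_interval_s, "compute"))
        t += compute_interval_s
        series.append((t, t + write_time_s, "checkpoint_write"))
        t += write_time_s
    return series


def time_series_per_configuration(configs, cycles: int = 3):
    """Build one time series per configuration.

    `configs` maps a configuration name to a tuple of
    (checkpoint_bytes, io_bandwidth_bytes_per_s, compute_interval_s).
    """
    result = {}
    for name, (ckpt_bytes, bandwidth, interval_s) in configs.items():
        write_s = checkpoint_write_time_s(ckpt_bytes, bandwidth)
        result[name] = build_time_series(interval_s, write_s, cycles)
    return result


if __name__ == "__main__":
    # Illustrative values only; they do not come from the patent.
    example = {
        "slow_io": (32e12, 50e9, 3600.0),
        "fast_io": (32e12, 200e9, 3600.0),
    }
    for name, series in time_series_per_configuration(example, cycles=2).items():
        print(name, series)
```

Under these assumptions, comparing the length of the `checkpoint_write` phases across configurations shows how much wall-clock time each candidate I/O subsystem spends on checkpointing rather than computation.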
9. A method of simulating a parallel supercomputing cluster having a system of compute nodes and an I/O subsystem connected to the compute nodes for storing checkpoint data from the compute nodes, the method comprising a data processor executing computer instructions of a simulation program stored in non-transitory computer readable storage medium, the simulation program including a model of the system of compute nodes and a model of the I/O subsystem, and the execution of the computer instructions performing the steps of:
(a) receiving input parameters defining the system of compute nodes and input parameters defining the I/O subsystem;
(b) computing a total number of computational floating point operations for a time between checkpoints, and computing an amount of disk storage required to store the checkpoint data for different configurations of the parallel supercomputing cluster; and
(c) presenting, to a user, a summary of the computed total number of computational floating point operations for the time between checkpoints for the different configurations of the parallel supercomputing cluster, and presenting, to the user, a summary of the computed amount of disk storage required to store the checkpoint data for the different configurations of the parallel supercomputing cluster.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer program product comprising non-transitory computer readable storage medium storing computer instructions that, when executed by a data processor, perform the steps of:
(a) receiving input parameters defining the system of compute nodes and input parameters defining the I/O subsystem;
(b) computing a total number of computational floating point operations for a time between checkpoints, and computing an amount of disk storage required to store the checkpoint data for different configurations of the parallel supercomputing cluster; and
(c) presenting, to a user, a summary of the computed total number of computational floating point operations for the time between checkpoints for the different configurations of the parallel supercomputing cluster, and presenting, to the user, a summary of the computed amount of disk storage required to store the checkpoint data for the different configurations of the parallel supercomputing cluster.
Dependent claims: 18, 19, 20
Specification