Method and apparatus for offloading compute resources to a flash co-processing appliance
First Claim
1. A parallel supercomputing cluster comprising:
- compute nodes interconnected in a mesh of data links for executing a Message Passing Interface (MPI) job using MPI data transfer between the computer nodes over the mesh of data links; and
solid-state storage nodes each linked to a respective group of the compute nodes for receiving checkpoint data from the respective compute nodes, and magnetic disk storage linked to each of the solid-state storage nodes for asynchronous migration of the checkpoint data from the solid-state storage nodes to the magnetic disk storage;
wherein each solid-state storage node includes a data processor coupled to the respective group of compute nodes for receiving the checkpoint data from the respective group of compute nodes and coupled to the magnetic disk storage for transmitting the checkpoint data to the magnetic disk storage, solid state storage coupled to the data processor for buffering the checkpoint data, and non-transitory computer readable storage medium storing computer instructions that, when executed by the data processor, perform the steps of;
(a) presenting a file system interface to the MPI job, and multiple MPI processes of the MPI job writing the checkpoint data to a shared file in the solid-state storage in a strided fashion in a first data layout;
(b) asynchronously migrating the checkpoint data from the shared file in the solid-state storage to the magnetic disk storage and writing the checkpoint data to the magnetic disk storage in a sequential fashion in a second data layout; and
(c) performing additional tasks offloaded from the compute nodes or associated with the checkpoint data.
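Steps (a) and (b) describe two data layouts for the same checkpoint: a strided layout in the shared file on solid-state storage and a sequential layout on magnetic disk. The following is a minimal illustrative sketch, not the patented implementation. It assumes fixed-size blocks, that "strided" means the blocks of different MPI processes are interleaved round-robin by rank, and that "sequential" means each process's blocks are stored contiguously. The function names and the in-memory byte buffers standing in for storage are hypothetical.

```python
def strided_offset(rank: int, block_index: int, num_ranks: int, block_size: int) -> int:
    """Byte offset of a rank's block in the shared file (assumed round-robin striding)."""
    return (block_index * num_ranks + rank) * block_size


def write_strided(blocks_by_rank: list[list[bytes]], block_size: int) -> bytes:
    """First data layout: every rank writes its blocks into one shared buffer, interleaved."""
    num_ranks = len(blocks_by_rank)
    blocks_per_rank = len(blocks_by_rank[0])
    shared = bytearray(num_ranks * blocks_per_rank * block_size)
    for rank, blocks in enumerate(blocks_by_rank):
        for i, block in enumerate(blocks):
            off = strided_offset(rank, i, num_ranks, block_size)
            shared[off:off + block_size] = block
    return bytes(shared)


def migrate_sequential(shared: bytes, num_ranks: int, block_size: int) -> bytes:
    """Second data layout: gather each rank's blocks so they are contiguous."""
    blocks_per_rank = len(shared) // (num_ranks * block_size)
    out = bytearray()
    for rank in range(num_ranks):
        for i in range(blocks_per_rank):
            off = strided_offset(rank, i, num_ranks, block_size)
            out += shared[off:off + block_size]
    return bytes(out)
```

With two ranks writing two 2-byte blocks each, the shared buffer interleaves the ranks' blocks, and the migrated buffer groups them by rank. The point of the reordering, under these assumptions, is that the disk sees long contiguous writes rather than many small interleaved ones.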
Abstract
Solid-State Drive (SSD) burst buffer nodes are interposed into a parallel supercomputing cluster to enable fast burst checkpoint of cluster memory to or from nearby interconnected solid-state storage, with asynchronous migration between the burst buffer nodes and slower, more distant disk storage. The SSD nodes also perform tasks offloaded from the compute nodes or associated with the checkpoint data. For example, the data for the next job is preloaded in the SSD node and uploaded very quickly to the respective compute node just before the next job starts. During a job, the SSD nodes perform fast visualization and statistical analysis upon the checkpoint data. The SSD nodes can also perform data reduction and encryption of the checkpoint data.
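The burst-buffer behavior described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions rather than the patent's implementation: dictionaries stand in for the solid-state storage and the disk storage, a background thread stands in for the asynchronous migration, and zlib compression stands in for the data reduction task. The class and method names are hypothetical.

```python
import queue
import threading
import zlib


class BurstBuffer:
    """Toy burst buffer: fast checkpoint writes, background migration to slow storage."""

    def __init__(self, slow_store: dict):
        self.fast_store = {}          # stands in for solid-state storage
        self.slow_store = slow_store  # stands in for magnetic disk storage
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def checkpoint(self, name: str, data: bytes) -> None:
        """Return as soon as the data is buffered; migration happens later."""
        self.fast_store[name] = data
        self._pending.put(name)

    def _drain(self) -> None:
        """Background loop: compress each buffered checkpoint and copy it to slow storage."""
        while True:
            name = self._pending.get()
            try:
                # Data reduction (here: zlib, an assumption) applied during migration.
                self.slow_store[name] = zlib.compress(self.fast_store[name])
            finally:
                self._pending.task_done()

    def wait_for_migration(self) -> None:
        """Block until every queued checkpoint has reached slow storage."""
        self._pending.join()
```

A caller would invoke `checkpoint(...)` and resume computation immediately, calling `wait_for_migration()` only when it needs the data to be on disk. Decoupling the two is what lets the compute nodes see the speed of the solid-state tier instead of the disk tier.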
20 Claims
1. A parallel supercomputing cluster comprising:
compute nodes interconnected in a mesh of data links for executing a Message Passing Interface (MPI) job using MPI data transfer between the computer nodes over the mesh of data links; and
solid-state storage nodes each linked to a respective group of the compute nodes for receiving checkpoint data from the respective compute nodes, and magnetic disk storage linked to each of the solid-state storage nodes for asynchronous migration of the checkpoint data from the solid-state storage nodes to the magnetic disk storage;
wherein each solid-state storage node includes a data processor coupled to the respective group of compute nodes for receiving the checkpoint data from the respective group of compute nodes and coupled to the magnetic disk storage for transmitting the checkpoint data to the magnetic disk storage, solid state storage coupled to the data processor for buffering the checkpoint data, and non-transitory computer readable storage medium storing computer instructions that, when executed by the data processor, perform the steps of;
(a) presenting a file system interface to the MPI job, and multiple MPI processes of the MPI job writing the checkpoint data to a shared file in the solid-state storage in a strided fashion in a first data layout;
(b) asynchronously migrating the checkpoint data from the shared file in the solid-state storage to the magnetic disk storage and writing the checkpoint data to the magnetic disk storage in a sequential fashion in a second data layout; and
(c) performing additional tasks offloaded from the compute nodes or associated with the checkpoint data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method of operating a parallel supercomputing cluster, the parallel supercomputing cluster including compute nodes, solid-state storage nodes, and magnetic disk storage, the compute nodes being interconnected in a mesh of data links for executing a Message Passing Interface (MPI) job using MPI data transfer between the computer nodes over the mesh of data links, each of the solid-state storage nodes being linked to a respective group of the compute nodes for receiving checkpoint data from the respective compute nodes, and the magnetic disk storage being linked to each of the solid-state storage nodes for asynchronous migration of the checkpoint data from the solid-state storage nodes to the magnetic disk storage, and each of the solid-state storage nodes including a data processor coupled to the respective group of compute nodes for receiving the checkpoint data from the respective group of compute nodes and coupled to the magnetic disk storage for transmitting the checkpoint data to the magnetic disk storage, and solid-state storage coupled to the data processor for buffering the checkpoint data, and non-transitory computer readable storage medium storing computer instructions, said method comprising the data processor executing the computer instructions to perform the steps of:
(a) presenting a file system interface to the MPI job, and multiple MPI processes of the MPI job writing the checkpoint data to a shared file in the solid-state storage in a strided fashion in a first data layout;
(b) asynchronously migrating the checkpoint data from the shared file in the solid-state storage to the magnetic disk storage and writing the checkpoint data to the magnetic disk storage in a sequential fashion in a second data layout; and
(c) performing additional tasks offloaded from the compute nodes or associated with the checkpoint data.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification