Parallel log structured file system collective buffering to achieve a compact representation of scientific and/or dimensional data
First Claim
1. A method, implemented at least in part by one or more computing devices, for data storage in a collective parallel processing environment, the method comprising:
receiving data to be written for a plurality of collective processes within a collective parallel processing environment;
before the data is written, extracting a data pattern for the data to be written for the plurality of collective processes;
generating a representation describing the data pattern; and
saving the data and the representation of the data pattern;
wherein the representation is a single index entry, and wherein the representation describing the data pattern comprises information describing:
a number of the plurality of collective processes;
an offset in files being written when saving the data;
a total amount of data being written; and
an amount of data being written per collective process.
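The four fields recited in the claim can be pictured as one small record. The sketch below is only an illustration under stated assumptions: the names `PatternEntry` and `extract_pattern` are hypothetical, and the detection rule (every process writes the same number of bytes at contiguous, rank-ordered offsets) is one simple kind of pattern chosen for the example, not a rule stated in the claim.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PatternEntry:
    """Hypothetical single index entry holding the four claimed fields."""
    num_processes: int      # number of collective processes
    start_offset: int       # offset in the file where the collective write begins
    total_bytes: int        # total amount of data being written
    bytes_per_process: int  # amount of data written per collective process


def extract_pattern(writes: List[Tuple[int, int]]) -> Optional[PatternEntry]:
    """Inspect pending writes before any data is written.

    writes[rank] is the (logical_offset, length) pair for that process.
    Returns one PatternEntry when every rank writes the same length at
    contiguous, rank-ordered offsets (an assumed pattern); otherwise None.
    """
    if not writes:
        return None
    base_offset, length = writes[0]
    for rank, (offset, nbytes) in enumerate(writes):
        if nbytes != length or offset != base_offset + rank * length:
            return None
    return PatternEntry(len(writes), base_offset, len(writes) * length, length)


# Four processes, each writing 47 bytes, starting at logical offset 100.
writes = [(100 + rank * 47, 47) for rank in range(4)]
entry = extract_pattern(writes)
print(entry)
```

Under this assumption one entry replaces what would otherwise be one index record per process, which is where the compactness comes from.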
Abstract
Collective buffering and data pattern solutions are provided for storage, retrieval, and/or analysis of data in a collective parallel processing environment. For example, a method can be provided for data storage in a collective parallel processing environment. The method comprises receiving data to be written for a plurality of collective processes within a collective parallel processing environment, extracting a data pattern for the data to be written for the plurality of collective processes, generating a representation describing the data pattern, and saving the data and the representation.
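On the retrieval side, a pattern-style entry lets a reader find which process wrote a given byte with arithmetic alone. The following is a minimal sketch, assuming the same equal-size, rank-ordered layout; `PatternEntry` and `locate` are hypothetical names and do not come from the patent text.

```python
from typing import NamedTuple, Optional, Tuple


class PatternEntry(NamedTuple):
    """Hypothetical index entry: the four fields named in the claims."""
    num_processes: int
    start_offset: int
    total_bytes: int
    bytes_per_process: int


def locate(entry: PatternEntry, logical_offset: int) -> Optional[Tuple[int, int]]:
    """Map a logical file offset to (rank, offset within that rank's data).

    Returns None when the offset falls outside the region the entry describes.
    """
    relative = logical_offset - entry.start_offset
    if relative < 0 or relative >= entry.total_bytes:
        return None
    # Integer division picks the writing process; the remainder is the
    # position inside that process's block.
    return divmod(relative, entry.bytes_per_process)


entry = PatternEntry(num_processes=4, start_offset=100,
                     total_bytes=188, bytes_per_process=47)
print(locate(entry, 200))  # (2, 6)
```

No per-process index scan is needed here; the lookup cost does not grow with the number of processes.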
20 Citations
19 Claims
1. A method, implemented at least in part by one or more computing devices, for data storage in a collective parallel processing environment, the method comprising:
receiving data to be written for a plurality of collective processes within a collective parallel processing environment;
before the data is written, extracting a data pattern for the data to be written for the plurality of collective processes;
generating a representation describing the data pattern; and
saving the data and the representation of the data pattern;
wherein the representation is a single index entry, and wherein the representation describing the data pattern comprises information describing:
a number of the plurality of collective processes;
an offset in files being written when saving the data;
a total amount of data being written; and
an amount of data being written per collective process.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory computer-readable storage medium storing computer-executable instructions for causing a computing device to perform a method for data storage in a collective parallel processing environment, the method comprising:
receiving data to be written for a plurality of collective processes within a collective parallel processing environment;
before the data is written, extracting a data pattern for the data to be written for the plurality of collective processes;
generating a representation describing the data pattern; and
saving the data and the representation of the data pattern;
wherein the representation is a single index entry, and wherein the representation describing the data pattern comprises information describing:
a number of the plurality of collective processes;
an offset in files being written when saving the data;
a total amount of data being written; and
an amount of data being written per collective process.
14. A high-performance parallel processing computing environment comprising:
one or more computing devices comprising processing units and memory;
the one or more computing devices configured to perform operations comprising:
receiving data to be written for a plurality of collective processes within the parallel processing computing environment;
before the data is written, extracting a data pattern for the data to be written for the plurality of collective processes;
generating a representation describing the data pattern; and
saving the data and the representation of the data pattern, wherein the data and the representation are saved using Parallel Log-structured File System (PLFS);
wherein the representation is a single index entry, and wherein the representation describing the data pattern comprises information describing:
a number of the plurality of collective processes;
an offset in files being written when saving the data;
a total amount of data being written; and
an amount of data being written per collective process.
Dependent claims: 15, 16, 17, 18, 19
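The "saving the data and the representation" step can be sketched with ordinary files. This example does not use PLFS or reproduce its on-disk format; the container directory, the `data.<rank>` log names, the `index.json` file, and the function `save_collective` are all hypothetical and exist only to show per-process data being stored next to a single index entry.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def save_collective(container: Path, start_offset: int, blocks: List[bytes]) -> dict:
    """Write one data log per process plus a single pattern index entry.

    blocks[rank] holds the data from that process. All blocks must be the
    same size so that one entry can describe the whole collective write
    (an assumption of this sketch).
    """
    if not blocks:
        raise ValueError("no data to write")
    per_process = len(blocks[0])
    if any(len(block) != per_process for block in blocks):
        raise ValueError("blocks differ in size; no single-entry pattern")

    container.mkdir(parents=True, exist_ok=True)
    for rank, block in enumerate(blocks):
        # Each process's data goes to its own log file.
        (container / f"data.{rank}").write_bytes(block)

    entry = {
        "num_processes": len(blocks),
        "start_offset": start_offset,
        "total_bytes": per_process * len(blocks),
        "bytes_per_process": per_process,
    }
    # One index entry covers every process's write.
    (container / "index.json").write_text(json.dumps(entry))
    return entry


with tempfile.TemporaryDirectory() as tmp:
    container = Path(tmp) / "checkpoint"
    entry = save_collective(container, 0, [bytes([rank]) * 8 for rank in range(4)])
    names = sorted(p.name for p in container.iterdir())
    print(entry)
    print(names)
```

The index file stays the same size whether four processes or four thousand take part, as long as the write fits the assumed pattern.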
Specification