Parallelizing applications of script-driven tools
Abstract
A system and method for parallelizing applications of script-driven software tools. Scripts in the software tool scripting language are automatically analyzed in order to produce a specification for a parallel computation plus a set of “script fragments”, the combination of which is functionally equivalent to the original script. The computational specification plus the script fragments are then executed by a parallel runtime system, which causes multiple instances of the original software tool and/or supplemental programs to be run as parallel processes. The resulting processes will read input data and produce output data, performing the same computation as was specified by the original script. The combination of the analyzer, runtime system, original software tool, and supplemental programs will, for a given script and input data, produce the same output data as the original software tool alone, but has the capability of using multiple processors in parallel for substantial improvements in overall “throughput”. The invention includes computer program embodiments of an automatic script analyzer.
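The equivalence property described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `run_tool` stands in for one instance of the original software tool, the input is a list of integers, and the step is one that can be applied to each partition independently. It is not the patented runtime system.

```python
from concurrent.futures import ThreadPoolExecutor

def run_tool(records):
    """Hypothetical stand-in for one tool instance: keep even values, square them."""
    return [r * r for r in records if r % 2 == 0]

def partition(records, n):
    """Split records into at most n contiguous chunks so concatenation keeps order."""
    size = max(1, -(-len(records) // n))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]

def run_parallel(records, n_workers=4):
    """Run one tool instance per partition, then concatenate the partial outputs."""
    chunks = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_outputs = list(pool.map(run_tool, chunks))
    return [row for part in partial_outputs for row in part]

data = list(range(10))
serial_output = run_tool(data)        # what the original tool alone would produce
parallel_output = run_parallel(data)  # same computation spread over several workers
```

For this record-at-a-time step the two outputs match. Steps that need to see all records at once (a global sort, for example) would not be safe to split this way without additional handling.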
36 Claims
1. A method for parallelizing a computer application program based on a script of a script-driven software tool, comprising automatically analyzing the script and producing a parallel computation specification based on such analysis, where such parallel computation specification provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements comprising at least processing steps and dataset definitions; (b) constructing a serial dataflow graph from the parsed statements, the serial dataflow graph having nodes connected by directed edges, the nodes representing datasets, processing steps, and intermediate results; and (c) constructing a parallel dataflow graph from the nodes of the serial dataflow graph such that the parallel dataflow graph may be executed by a parallel runtime system.
Dependent claims: 3, 4, 5, 6, 7, 8
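Steps (a) through (c) can be sketched roughly as follows. The script syntax (`dataset NAME`, `step NAME in=X out=Y`), the function names, and the replicate-every-node strategy in `parallel_graph` are all hypothetical choices for illustration; the claim does not prescribe any of them.

```python
SCRIPT = """
dataset raw
dataset report
step clean in=raw out=tmp
step summarize in=tmp out=report
"""

def parse(script):
    """(a) Split the script into dataset definitions and processing steps."""
    datasets, steps = [], []
    for line in script.strip().splitlines():
        words = line.split()
        if words[0] == "dataset":
            datasets.append(words[1])
        elif words[0] == "step":
            args = dict(w.split("=") for w in words[2:])
            steps.append({"name": words[1], "in": args["in"], "out": args["out"]})
    return datasets, steps

def serial_graph(datasets, steps):
    """(b) Nodes are datasets, steps, and intermediate results; edges are directed."""
    nodes = {d: "dataset" for d in datasets}
    edges = []
    for s in steps:
        nodes[s["name"]] = "step"
        for d in (s["in"], s["out"]):
            nodes.setdefault(d, "intermediate")  # undeclared names are temporaries
        edges.append((s["in"], s["name"]))
        edges.append((s["name"], s["out"]))
    return nodes, edges

def parallel_graph(nodes, edges, ways=2):
    """(c) Naive assumption: every node can be replicated once per partition."""
    p_nodes = {f"{n}.{i}": kind for n, kind in nodes.items() for i in range(ways)}
    p_edges = [(f"{a}.{i}", f"{b}.{i}") for a, b in edges for i in range(ways)]
    return p_nodes, p_edges

datasets, steps = parse(SCRIPT)
nodes, edges = serial_graph(datasets, steps)
p_nodes, p_edges = parallel_graph(nodes, edges, ways=2)
```

Replicating every node is only valid when each step can work on its partition in isolation; a realistic analyzer would have to treat other steps differently.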
2. A method for parallelizing a computer application program based on a script of a script-driven software tool, comprising automatically analyzing the script and producing a parallel computation specification plus a script fragment set based on such analysis, where such parallel computation specification and script fragment set provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements comprising at least processing steps and dataset definitions; (b) constructing a serial dataflow graph from the parsed statements, the serial dataflow graph having nodes connected by directed edges, the nodes representing datasets, processing steps, and intermediate results; (c) constructing a parallel dataflow graph from the nodes of the serial dataflow graph such that the parallel dataflow graph may be executed by a parallel runtime system; and (d) analyzing the parallel dataflow graph to generate script fragments in a form that enables the script-driven software tool to execute some of the processing steps.
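Step (d) might look something like the sketch below, which emits one small script per step instance so that an unmodified copy of the tool can run it on a single partition. The fragment syntax, the `.partN` naming scheme, and the `make_fragments` helper are assumptions for illustration only.

```python
def make_fragments(steps, ways):
    """Emit one hypothetical script fragment per (step, partition) pair."""
    fragments = {}
    for step in steps:
        for i in range(ways):
            # Rebind the step's input and output to the files of partition i.
            fragments[(step["name"], i)] = (
                f"step {step['name']} in={step['in']}.part{i} out={step['out']}.part{i}"
            )
    return fragments

# Two processing steps, as they might come out of the parsed script.
steps = [
    {"name": "clean", "in": "raw", "out": "tmp"},
    {"name": "summarize", "in": "tmp", "out": "report"},
]
fragments = make_fragments(steps, ways=2)
```

Each fragment is ordinary tool script text, so the runtime system only has to launch the tool with the right fragment for each parallel process.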
9. A computer program, residing on a computer-readable medium, for parallelizing a computer application program based on a script of a script-driven software tool, the computer program comprising instructions for causing a computer to automatically analyze the script and produce a parallel computation specification based on such analysis, where such parallel computation specification provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements comprising at least processing steps and dataset definitions; (b) constructing a serial dataflow graph from the parsed statements, the serial dataflow graph having nodes connected by directed edges, the nodes representing datasets, processing steps, and intermediate results; and (c) constructing a parallel dataflow graph from the nodes of the serial dataflow graph such that the parallel dataflow graph may be executed by a parallel runtime system.
10. A computer program, residing on a computer-readable medium, for parallelizing a computer application program based on a script of a script-driven software tool, the computer program comprising instructions for causing a computer to automatically analyze the script and produce a parallel computation specification plus a script fragment set based on such analysis, where such parallel computation specification and script fragment set provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements comprising at least processing steps and dataset definitions; (b) constructing a serial dataflow graph from the parsed statements, the serial dataflow graph having nodes connected by directed edges, the nodes representing datasets, processing steps, and intermediate results; (c) constructing a parallel dataflow graph from the nodes of the serial dataflow graph such that the parallel dataflow graph may be executed by a parallel runtime system; and (d) analyzing the parallel dataflow graph to generate script fragments in a form that enables the script-driven software tool to execute some of the processing steps.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A system for parallelizing a computer application program based on a script of a script-driven software tool, and for automatically analyzing the script and producing a parallel computation specification based on such analysis, where such parallel computation specification provides functional equivalence to the script when executed by a parallel runtime system, including:
(a) means for parsing the script into statements comprising at least processing steps and dataset definitions; (b) means for constructing a serial dataflow graph from the parsed statements, the serial dataflow graph having nodes connected by directed edges, the nodes representing datasets, processing steps, and intermediate results; and (c) means for constructing a parallel dataflow graph from the nodes of the serial dataflow graph such that the parallel dataflow graph may be executed by a parallel runtime system.
Dependent claims: 19, 20, 21, 22, 23, 24
18. A system for parallelizing a computer application program based on a script of a script-driven software tool, and for automatically analyzing the script and producing a parallel computation specification plus a script fragment set based on such analysis, where such parallel computation specification and script fragment set provides functional equivalence to the script when executed by a parallel runtime system, including:
(a) means for parsing the script into statements comprising at least processing steps and dataset definitions; (b) means for constructing a serial dataflow graph from the parsed statements, the serial dataflow graph having nodes connected by directed edges, the nodes representing datasets, processing steps, and intermediate results; (c) means for constructing a parallel dataflow graph from the nodes of the serial dataflow graph such that the parallel dataflow graph may be executed by a parallel runtime system; and (d) means for analyzing the parallel dataflow graph to generate script fragments in a form that enables the script-driven software tool to execute some of the processing steps.
25. A method for parallelizing a computer application program based on a script of a script-driven software tool, comprising automatically analyzing the script and producing a parallel computation specification based on such analysis, where such parallel computation specification provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements; (b) constructing a serial dataflow graph from the parsed statements, said constructing including (i) constructing a serial dataset table of datasets used by the script, (ii) constructing a serial processing step table of statements performed by the script, and (iii) constructing a serial dataset access table indicating datasets in the dataset table used by statements in the processing step table; and (c) constructing a parallel dataflow graph from the serial dataflow graph.
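The three tables named in step (b) could be represented along the following lines. The statement format, the use of list indices as identifiers, and the read/write direction recorded in the access table are assumptions of this sketch; the claim only requires that the access table indicate which datasets each statement uses.

```python
# Hypothetical parsed statements, as a parser for some script language might emit them.
statements = [
    {"kind": "dataset", "name": "raw"},
    {"kind": "step", "name": "clean", "reads": ["raw"], "writes": ["tmp"]},
    {"kind": "step", "name": "summarize", "reads": ["tmp"], "writes": ["report"]},
]

def build_serial_tables(statements):
    dataset_table = []  # (i) one row per dataset used by the script
    step_table = []     # (ii) one row per statement performed by the script
    access_table = []   # (iii) rows of (step index, dataset index, direction)

    def dataset_index(name):
        """Return the dataset's row index, adding a row on first use."""
        if name not in dataset_table:
            dataset_table.append(name)
        return dataset_table.index(name)

    for stmt in statements:
        if stmt["kind"] == "dataset":
            dataset_index(stmt["name"])
        elif stmt["kind"] == "step":
            step_table.append(stmt["name"])
            step_idx = len(step_table) - 1
            for name in stmt["reads"]:
                access_table.append((step_idx, dataset_index(name), "read"))
            for name in stmt["writes"]:
                access_table.append((step_idx, dataset_index(name), "write"))
    return dataset_table, step_table, access_table

dataset_table, step_table, access_table = build_serial_tables(statements)
```

Read together, the access rows give the directed edges of the serial dataflow graph: a "read" row is an edge from dataset to step, and a "write" row is an edge from step to dataset.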
26. A method for parallelizing a computer application program based on a script of a script-driven software tool, comprising automatically analyzing the script and producing a parallel computation specification plus a script fragment set based on such analysis, where such parallel computation specification and script fragment set provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements; (b) constructing a serial dataflow graph from the parsed statements, said constructing including (i) constructing a serial dataset table of datasets used by the script; (ii) constructing a serial processing step table of statements performed by the script; and (iii) constructing a serial dataset access table indicating datasets in the dataset table used by statements in the processing step table; and (c) constructing a parallel dataflow graph from the serial dataflow graph.
27. A method for parallelizing a computer application program based on a script of a script-driven software tool, comprising automatically analyzing the script and producing a parallel computation specification based on such analysis, where such parallel computation specification provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements; (b) constructing a serial dataflow graph from the parsed statements, said constructing including (i) constructing a parallel dataset table of datasets based on the serial dataset table; (ii) constructing a parallel processing step table of statements based on the serial processing step table; (iii) constructing a dataset access table based on the serial dataset access table; and (iv) determining, for each processing step identified in the parallel processing step table, if a corresponding pre-defined parallelization rewrite rule exists for such processing step, and if so, then applying the corresponding pre-defined parallelization rewrite rule to redefine associated entries in the parallel dataset table, the parallel processing step table, and the dataset access table as parallel processing entries; and
if not, then defining such associated entries as serial processing entries; and (c) constructing a parallel dataflow graph from the serial dataflow graph.
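The rule lookup in item (iv) can be sketched as a dictionary of rewrite functions keyed by step type. The table layouts, the `op`, `mode`, and `partitioned` fields, and the single `partitionable` rule are hypothetical; the claim states only that a matching pre-defined rule redefines the associated entries as parallel, and that entries without a rule remain serial.

```python
import copy

def partitionable(step, datasets, access_rows):
    """Hypothetical rewrite rule: run the step once per partition of its datasets."""
    step["mode"] = "parallel"
    for row in access_rows:
        row["mode"] = "parallel"
        datasets[row["dataset"]]["partitioned"] = True

# Pre-defined rewrite rules, keyed by the kind of processing step (assumed names).
REWRITE_RULES = {"filter": partitionable}

def build_parallel_tables(serial_datasets, serial_steps, serial_access):
    # (i)-(iii): start the parallel tables as copies of the serial ones.
    datasets = copy.deepcopy(serial_datasets)
    steps = copy.deepcopy(serial_steps)
    access = copy.deepcopy(serial_access)
    for entry in datasets.values():
        entry["partitioned"] = False
    # (iv): apply a rewrite rule where one exists, otherwise keep the step serial.
    for step_id, step in steps.items():
        rows = [r for r in access if r["step"] == step_id]
        rule = REWRITE_RULES.get(step["op"])
        if rule is not None:
            rule(step, datasets, rows)
        else:
            step["mode"] = "serial"
            for row in rows:
                row["mode"] = "serial"
    return datasets, steps, access

serial_datasets = {"raw": {}, "kept": {}, "summary": {}}
serial_steps = {"s1": {"op": "filter"}, "s2": {"op": "report"}}
serial_access = [
    {"step": "s1", "dataset": "raw", "dir": "read"},
    {"step": "s1", "dataset": "kept", "dir": "write"},
    {"step": "s2", "dataset": "kept", "dir": "read"},
    {"step": "s2", "dataset": "summary", "dir": "write"},
]
par_datasets, par_steps, par_access = build_parallel_tables(
    serial_datasets, serial_steps, serial_access
)
```

In this example the serial step `s2` reads a dataset that the parallel step `s1` left partitioned. The sketch does not resolve that mismatch; a complete system would presumably need to gather the partitions before a serial step consumes them.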
28. A method for parallelizing a computer application program based on a script of a script-driven software tool, comprising automatically analyzing the script and producing a parallel computation specification plus a script fragment set based on such analysis, where such parallel computation specification and script fragment set provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements; (b) constructing a serial dataflow graph from the parsed statements, said constructing including (i) constructing a parallel dataset table of datasets based on the serial dataset table; (ii) constructing a parallel processing step table of statements based on the serial processing step table; (iii) constructing a dataset access table based on the serial dataset access table; and (iv) determining, for each processing step identified in the parallel processing step table, if a corresponding pre-defined parallelization rewrite rule exists for such processing step, and if so, then applying the corresponding pre-defined parallelization rewrite rule to redefine associated entries in the parallel dataset table, the parallel processing step table, and the dataset access table as parallel processing entries; and
if not, then defining such associated entries as serial processing entries; and (c) constructing a parallel dataflow graph from the serial dataflow graph.
29. A computer program, residing on a computer-readable medium, for parallelizing a computer application program based on a script of a script-driven software tool, the computer program comprising instructions for causing a computer to automatically analyze the script and produce a parallel computation specification based on such analysis, where such parallel computation specification provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements; (b) constructing a serial dataflow graph from the parsed statements, said constructing including (i) constructing a serial dataset table of datasets used by the script; (ii) constructing a serial processing step table of statements performed by the script; and (iii) constructing a serial dataset access table indicating datasets in the dataset table used by statements in the processing step table; and (c) constructing a parallel dataflow graph from the serial dataflow graph.
30. A computer program, residing on a computer-readable medium, for parallelizing a computer application program based on a script of a script-driven software tool, the computer program comprising instructions for causing a computer to automatically analyze the script and produce a parallel computation specification plus a script fragment set based on such analysis, where such parallel computation specification and script fragment set provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements; (b) constructing a serial dataflow graph from the parsed statements, said constructing including (i) constructing a serial dataset table of datasets used by the script; (ii) constructing a serial processing step table of statements performed by the script; and (iii) constructing a serial dataset access table indicating datasets in the dataset table used by statements in the processing step table; and (c) constructing a parallel dataflow graph from the serial dataflow graph.
31. A computer program, residing on a computer-readable medium, for parallelizing a computer application program based on a script of a script-driven software tool, the computer program comprising instructions for causing a computer to automatically analyze the script and produce a parallel computation specification based on such analysis, where such parallel computation specification provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements; (b) constructing a serial dataflow graph from the parsed statements, said constructing including (i) constructing a parallel dataset table of datasets based on the serial dataset table; (ii) constructing a parallel processing step table of statements based on the serial processing step table; (iii) constructing a dataset access table based on the serial dataset access table; and (iv) determining, for each processing step identified in the parallel processing step table, if a corresponding pre-defined parallelization rewrite rule exists for such processing step, and if so, then applying the corresponding pre-defined parallelization rewrite rule to redefine associated entries in the parallel dataset table, the parallel processing step table, and the dataset access table as parallel processing entries; and
if not, then defining such associated entries as serial processing entries; and (c) constructing a parallel dataflow graph from the serial dataflow graph.
32. A computer program, residing on a computer-readable medium, for parallelizing a computer application program based on a script of a script-driven software tool, the computer program comprising instructions for causing a computer to automatically analyze the script and produce a parallel computation specification plus a script fragment set based on such analysis, where such parallel computation specification and script fragment set provides functional equivalence to the script when executed by a parallel runtime system, by:
(a) parsing the script into statements; (b) constructing a serial dataflow graph from the parsed statements, said constructing including (i) constructing a parallel dataset table of datasets based on the serial dataset table; (ii) constructing a parallel processing step table of statements based on the serial processing step table; (iii) constructing a dataset access table based on the serial dataset access table; and (iv) determining, for each processing step identified in the parallel processing step table, if a corresponding pre-defined parallelization rewrite rule exists for such processing step, and if so, then applying the corresponding pre-defined parallelization rewrite rule to redefine associated entries in the parallel dataset table, the parallel processing step table, and the dataset access table as parallel processing entries; and
if not, then defining such associated entries as serial processing entries; and (c) constructing a parallel dataflow graph from the serial dataflow graph.
33. A system for parallelizing a computer application program based on a script of a script-driven software tool, comprising means for automatically analyzing the script and means for producing a parallel computation specification based on such analysis, where such parallel computation specification provides functional equivalence to the script when executed by a parallel runtime system, including:
(a) means for parsing the script into statements; (b) means for constructing a serial dataflow graph from the parsed statements, said means including means for (i) constructing a serial dataset table of datasets used by the script; (ii) constructing a serial processing step table of statements performed by the script; and (iii) constructing a serial dataset access table indicating datasets in the dataset table used by statements in the processing step table; and (c) means for constructing a parallel dataflow graph from the serial dataflow graph.
34. A system for parallelizing a computer application program based on a script of a script-driven software tool, comprising means for automatically analyzing the script and means for producing a parallel computation specification plus a script fragment set based on such analysis, where such parallel computation specification and script fragment set provides functional equivalence to the script when executed by a parallel runtime system, including:
(a) means for parsing the script into statements; (b) means for constructing a serial dataflow graph from the parsed statements, said means including means for (i) constructing a serial dataset table of datasets used by the script; (ii) constructing a serial processing step table of statements performed by the script; and (iii) constructing a serial dataset access table indicating datasets in the dataset table used by statements in the processing step table; and (c) means for constructing a parallel dataflow graph from the serial dataflow graph.
35. A system for parallelizing a computer application program based on a script of a script-driven software tool, comprising means for automatically analyzing the script and means for producing a parallel computation specification based on such analysis, where such parallel computation specification provides functional equivalence to the script when executed by a parallel runtime system, including:
(a) means for parsing the script into statements; (b) means for constructing a serial dataflow graph from the parsed statements, said means including means for (i) constructing a parallel dataset table of datasets based on the serial dataset table; (ii) constructing a parallel processing step table of statements based on the serial processing step table; (iii) constructing a dataset access table based on the serial dataset access table; and (iv) determining, for each processing step identified in the parallel processing step table, if a corresponding pre-defined parallelization rewrite rule exists for such processing step, and if so, then applying the corresponding pre-defined parallelization rewrite rule to redefine associated entries in the parallel dataset table, the parallel processing step table, and the dataset access table as parallel processing entries; and
if not, then defining such associated entries as serial processing entries; and (c) means for constructing a parallel dataflow graph from the serial dataflow graph.
36. A system for parallelizing a computer application program based on a script of a script-driven software tool, comprising means for automatically analyzing the script and means for producing a parallel computation specification plus a script fragment set based on such analysis, where such parallel computation specification and script fragment set provides functional equivalence to the script when executed by a parallel runtime system, including:
(a) means for parsing the script into statements; (b) means for constructing a serial dataflow graph from the parsed statements, said means including means for (i) constructing a parallel dataset table of datasets based on the serial dataset table; (ii) constructing a parallel processing step table of statements based on the serial processing step table; (iii) constructing a dataset access table based on the serial dataset access table; and (iv) determining, for each processing step identified in the parallel processing step table, if a corresponding pre-defined parallelization rewrite rule exists for such processing step, and if so, then applying the corresponding pre-defined parallelization rewrite rule to redefine associated entries in the parallel dataset table, the parallel processing step table, and the dataset access table as parallel processing entries; and
if not, then defining such associated entries as serial processing entries; and (c) means for constructing a parallel dataflow graph from the serial dataflow graph.
Specification