Data flow plan optimizer
Abstract
An optimizer for a data transformation system. The optimizer optimizes data flow plans that describe how data is to be transformed from the form it has in a data source to the form required in a data destination. A data flow plan is made up of a sequence of transforms, and the optimized data flow plan is equivalent to the original data flow plan but has fewer transforms. One kind of optimization is read/write optimization, in which the data flow plan is modified so that operations of the original data flow plan are performed in the data source or destination. Another is merge optimization, in which a single merge transform specifies the operations specified in a plurality of the transforms of the original data flow plan. The operations specified in the merge transform can further be performed in parallel. The optimizer additionally reorders the transforms in the original data flow plan to increase the amount of optimization. Operation of the optimizer is transparent to the user of the data transformation system.
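The merge optimization described in the abstract can be illustrated with a minimal sketch. It assumes that every transform is a row-wise function and that a `mergeable` flag marks which transforms may be fused; the names `Transform`, `merge_optimize`, and `run_plan` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Transform:
    name: str
    fn: Callable[[Row], Row]   # row-wise operation (assumption of this sketch)
    mergeable: bool = True     # hypothetical flag: may be fused with neighbours


def merge_optimize(plan: List[Transform]) -> List[Transform]:
    """Replace each run of adjacent mergeable transforms with one merge transform."""
    optimized: List[Transform] = []
    run: List[Transform] = []

    def flush() -> None:
        if len(run) > 1:
            steps = list(run)  # copy, because `run` is cleared below

            def fused(row: Row, steps: List[Transform] = steps) -> Row:
                # Apply the original operations in their original order.
                for step in steps:
                    row = step.fn(row)
                return row

            name = "merge(" + "+".join(step.name for step in steps) + ")"
            optimized.append(Transform(name, fused))
        else:
            optimized.extend(run)
        run.clear()

    for transform in plan:
        if transform.mergeable:
            run.append(transform)
        else:
            flush()
            optimized.append(transform)
    flush()
    return optimized


def run_plan(plan: List[Transform], rows: List[Row]) -> List[Row]:
    """Execute a plan by passing every row through each transform in sequence."""
    result = []
    for row in rows:
        for transform in plan:
            row = transform.fn(row)
        result.append(row)
    return result
```

In this sketch the optimized plan has fewer transforms than the original whenever at least two mergeable transforms are adjacent, and it produces the same rows, which mirrors the equivalence requirement stated in the abstract.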
42 Claims
-
1. An optimizer for a data flow plan comprising:
-
a data flow plan analyzer, the data flow plan analyzer operating on the data flow plan, the data flow plan being a user-specified sequence of transforms that describe a transformation of data read from a source data repository in a first form into a second form in which the data can be written to a sink data repository that is distinct from the source data repository, each transform specifying an operation on the data, and the data flow plan analyzer determining whether the sequence of transforms includes a plurality of transforms that are optimizable transforms; and
a transform optimizer that produces an optimized data flow plan in which one or more optimized transforms that specify operations equivalent to those specified in the plurality of optimizable transforms replaces the plurality of optimizable transforms, there being fewer of the optimized transforms than of the optimizable transforms.
2. The optimizer set forth in claim 1 wherein:
the data flow plan analyzer further reorders the transforms to increase the number of optimizable transforms.
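The reordering recited above can be sketched as follows. This is an illustration under stated assumptions, not the patented analyzer: it assumes row-wise transforms that declare which columns they read and write, and it treats two adjacent transforms as swappable only when those column sets do not conflict. `Step`, `independent`, and `reorder` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Step:
    name: str
    reads: FrozenSet[str]    # columns the transform reads
    writes: FrozenSet[str]   # columns the transform writes
    optimizable: bool        # hypothetical flag set by the analyzer


def independent(a: Step, b: Step) -> bool:
    """True when swapping a and b cannot change the result (no column conflict)."""
    return not (a.writes & (b.reads | b.writes)) and not (b.writes & a.reads)


def reorder(plan: List[Step]) -> List[Step]:
    """Move optimizable steps ahead of independent non-optimizable steps."""
    steps = list(plan)
    changed = True
    while changed:
        changed = False
        for i in range(len(steps) - 1):
            a, b = steps[i], steps[i + 1]
            if not a.optimizable and b.optimizable and independent(a, b):
                steps[i], steps[i + 1] = b, a
                changed = True
    return steps
```

Each swap moves an optimizable step in front of a non-optimizable one, so the loop terminates, and the optimizable steps end up adjacent wherever the column dependencies allow it.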
-
-
3. The optimizer set forth in claim 1 wherein:
-
the optimizer is part of a data transformation system that includes a user interface in which a visual representation of the sequence of transforms appears; and
the user interface continues to display the visual representation of the data flow plan after production of the optimized data flow plan.
-
-
4. The optimizer set forth in any of claims 1 through 3 wherein:
-
the transforms include a read transform that reads the data from the source and a write transform that writes the transformed data to the sink;
the source permits operations to be performed on data read therefrom or the sink permits operations to be performed on data written thereto; and
the optimized transforms include a read transform or a write transform that specifies that the source or sink perform operations on the data that are equivalent to those specified in the plurality of optimizable transforms.
-
-
5. The optimizer set forth in claim 4 wherein:
-
the source or the sink is a relational database system; and
the optimized read transform or the optimized write transform is an SQL query.
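The read-side optimization of claims 4 and 5 can be sketched as folding filter and projection transforms into a single SQL query issued by the read transform. The classes `Read`, `Filter`, and `Project` and the function `push_down` are hypothetical; the sketch assumes that table and column names come from a trusted plan and that only simple comparison filters are pushed down.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


@dataclass
class Read:
    table: str


@dataclass
class Filter:
    column: str
    op: str
    value: object


@dataclass
class Project:
    columns: List[str]


def push_down(plan):
    """Fold the Filter/Project transforms that directly follow the Read into SQL.

    Returns (sql, params, remaining_transforms).
    """
    read, rest = plan[0], plan[1:]
    columns, where, params = "*", [], []
    consumed = 0
    for transform in rest:
        if isinstance(transform, Filter) and transform.op in ALLOWED_OPS:
            # Values are passed as bound parameters, never spliced into the SQL.
            where.append(f"{transform.column} {transform.op} ?")
            params.append(transform.value)
            consumed += 1
        elif isinstance(transform, Project):
            columns = ", ".join(transform.columns)
            consumed += 1
            break  # keep the sketch simple: stop after the first projection
        else:
            break  # anything else stays in the plan as an ordinary transform
    sql = f"SELECT {columns} FROM {read.table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params, rest[consumed:]
```

The resulting query can be executed with any DB-API driver, for example `sqlite3.connect(...).execute(sql, params)`, so that the source database performs the filtering and projection instead of the transformation engine.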
-
-
6. The optimizer set forth in claim 4 wherein:
the transform optimizer further comprises a table, the transform optimizer receiving properties of optimizable transforms from the data flow plan analyzer, placing the properties in the table, and using the properties in the table to produce the optimized transforms.
-
7. The optimizer set forth in any of claims 1 through 3 wherein:
the equivalent operations specified in the one or more optimized transforms are specified such that the equivalent operations may be performed in parallel.
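One way to picture equivalent operations that "may be performed in parallel" is a merge transform built from column operations that each depend only on the input row. The sketch below assumes that independence and uses a thread pool purely for illustration; `make_parallel_merge` and `apply_column` are hypothetical names, and the patent does not prescribe this mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]
ColumnOp = Callable[[Row], object]


def apply_column(op: ColumnOp, rows: List[Row]) -> List[object]:
    """Compute one output column for every input row."""
    return [op(row) for row in rows]


def make_parallel_merge(ops: Dict[str, ColumnOp]):
    """Build one merge transform from independent column operations.

    Assumption: no operation reads a column produced by another operation,
    so the operations can run concurrently without changing the result.
    """
    def merged(rows: List[Row]) -> List[Row]:
        with ThreadPoolExecutor() as pool:
            futures = {col: pool.submit(apply_column, op, rows) for col, op in ops.items()}
            columns = {col: future.result() for col, future in futures.items()}
        # Combine the separately computed columns back into output rows.
        return [
            dict(row, **{col: values[i] for col, values in columns.items()})
            for i, row in enumerate(rows)
        ]

    return merged
```

Threads show the structure only; CPU-bound operations in Python would more plausibly use separate processes or a parallel execution engine.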
-
8. The optimizer set forth in claim 7 wherein:
the transform optimizer further comprises a table, the transform optimizer receiving properties of optimizable transforms from the data flow plan analyzer, placing the properties in the table, and using the properties in the table to produce the optimized transforms.
-
9. A method of optimizing a data flow plan comprising the steps of:
-
analyzing the data flow plan, the data flow plan being a user-specified sequence of transforms that describe a transformation of data read from a source data repository in a first form into a second form in which the data can be written to a sink data repository that is distinct from the source data repository, each transform specifying an operation on the data, and the analysis determining whether the sequence of transforms includes a plurality of transforms that are optimizable transforms; and
producing an optimized data flow plan in which one or more optimized transforms that specify operations equivalent to those specified in the plurality of optimizable transforms replaces the plurality of optimizable transforms, there being fewer of the optimized transforms than of the optimizable transforms.
10. The method set forth in claim 9 further comprising the step of:
reordering the transforms to increase the number of optimizable transforms.
-
-
11. The method set forth in claim 9 further comprising the steps of:
-
saving a copy of the data flow plan; and
using the copy to generate a visual representation of the sequence of transforms after the optimized data flow plan has been produced.
-
-
12. The method set forth in any of claims 9 through 11 wherein:
-
the transforms include a read transform that reads the data from the source and a write transform that writes the transformed data to the sink;
the source permits operations to be performed on data read therefrom or the sink permits operations to be performed on data written thereto; and
in the step of producing the optimized data flow plan, the optimized transforms include a read transform or a write transform that specifies that the source or sink perform operations on the data that are equivalent to those specified in the plurality of optimizable transforms.
-
-
13. The method set forth in claim 12 wherein:
-
the source or the sink is a relational database system; and
in the step of producing the optimized data flow plan, the optimized read transform or the optimized write transform is an SQL query.
-
-
14. The method set forth in claim 12 further comprising the steps of:
-
placing properties of the optimizable transforms in a table; and
using the table to produce the optimized transforms.
-
-
15. The method set forth in any of claims 9 through 11 wherein:
one or more of the optimized transforms specifies the equivalent operations such that the equivalent operations may be performed in parallel.
-
16. The method set forth in claim 15 further comprising the steps of:
-
placing properties of the optimizable transforms in a table; and
using the table to produce the optimized transforms.
-
-
17. A data transformation system, the data transformation system having the improvement comprising:
-
an optimizer that automatically produces an optimized data flow plan from a user-specified data flow plan that transforms data read from a source data repository in a first form into a second form in which the data can be written to a sink data repository that is distinct from the source data repository, the data flow plan being a first sequence of transforms, each of which specifies an operation on the data, and the optimized data flow plan being a second optimized sequence of transforms in which one or more optimized transforms replace transforms in the first sequence, the optimized sequence being equivalent to but having fewer transforms than the first sequence.
18. The data transformation system set forth in claim 17 further comprising:
a user interface that displays a visual representation of the data flow plan, the user interface continuing to display the visual representation of the data flow plan after production of the optimized data flow plan.
-
-
19. The data transformation system set forth in claim 17 or claim 18 wherein:
-
the transforms include a read transform that reads the data from the source and a write transform that writes the transformed data to the sink;
the source permits operations to be performed on data read therefrom or the sink permits operations to be performed on data written thereto; and
the optimized data flow plan includes a read transform or a write transform that replaces a plurality of transforms in the data flow plan, the read transform or write transform specifying that operations equivalent to the operations of the replaced transforms be performed in the source or the sink.
-
-
20. The data transformation system set forth in claim 17 or claim 18 wherein:
the optimized data flow plan includes a merge transform that replaces a plurality of transforms in the data flow plan, the merge transform specifying operations equivalent to the operations of the replaced transforms.
-
21. The data transformation system set forth in claim 20 wherein:
the merge transform further specifies the equivalent operations such that the equivalent operations may be performed in parallel.
-
22. A data storage device, the data storage device being characterized in that:
-
the data storage device contains code which when executed in a computer implements an optimizer for a data flow plan, the optimizer comprising a data flow plan analyzer, the data flow plan analyzer operating on the data flow plan, the data flow plan being a user-specified sequence of transforms that describe a transformation of data read from a source data repository in a first form into a second form in which the data can be written to a sink data repository that is distinct from the source data repository, each transform specifying an operation on the data, and the data flow plan analyzer determining whether the sequence of transforms includes a plurality of transforms that are optimizable transforms; and
a transform optimizer that produces an optimized data flow plan in which one or more optimized transforms that specify operations equivalent to those specified in the plurality of optimizable transforms replaces the plurality of optimizable transforms, there being fewer of the optimized transforms than of the optimizable transforms.
23. The data storage device set forth in claim 22 further characterized in that:
the data flow plan analyzer further reorders the transforms to increase the number of optimizable transforms.
-
-
24. The data storage device set forth in claim 22 further characterized in that:
-
the optimizer is part of a data transformation system that includes a user interface in which a visual representation of the sequence of transforms appears; and
the user interface continues to display the visual representation of the data flow plan after production of the optimized data flow plan.
-
-
25. The data storage device set forth in claim 22 further characterized in that:
-
the transforms include a read transform that reads the data from the source and a write transform that writes the transformed data to the sink;
the source permits operations to be performed on data read therefrom or the sink permits operations to be performed on data written thereto; and
the optimized transforms include a read transform or a write transform that specifies that the source or sink perform operations on the data that are equivalent to those specified in the plurality of optimizable transforms.
-
-
26. The data storage device set forth in claim 25 further characterized in that:
-
the source or the sink is a relational database system; and
the optimized read transform or the optimized write transform is an SQL query.
-
-
27. The data storage device set forth in claim 25 further characterized in that:
the transform optimizer further comprises a table, the transform optimizer receiving properties of optimizable transforms from the data flow plan analyzer, placing the properties in the table, and using the properties in the table to produce the optimized transforms.
-
28. The data storage device set forth in claim 22 further characterized in that:
the equivalent operations specified in the one or more optimized transforms are specified such that the equivalent operations may be performed in parallel.
-
29. The data storage device set forth in claim 28 further characterized in that:
the transform optimizer further comprises a table, the transform optimizer receiving properties of optimizable transforms from the data flow plan analyzer, placing the properties in the table, and using the properties in the table to produce the optimized transforms.
-
30. A data storage device, the data storage device being characterized in that:
-
the data storage device contains code which when executed in a computer implements a method of optimizing a data flow plan comprising the steps of analyzing the data flow plan, the data flow plan being a user-specified sequence of transforms that describe a transformation of data read from a source data repository in a first form into a second form in which the data can be written to a sink data repository that is distinct from the source data repository, each transform specifying an operation on the data, and the analysis determining whether the sequence of transforms includes a plurality of transforms that are optimizable transforms; and
producing an optimized data flow plan in which one or more optimized transforms that specify operations equivalent to those specified in the plurality of optimizable transforms replaces the plurality of optimizable transforms, there being fewer of the optimized transforms than of the optimizable transforms.
31. The data storage device set forth in claim 30 further characterized in that:
the method further comprises the step of reordering the transforms to increase the number of optimizable transforms.
-
-
32. The data storage device set forth in claim 30 further characterized in that:
-
the method further comprises the steps of saving a copy of the data flow plan; and
using the copy to generate a visual representation of the sequence of transforms after the optimized data flow plan has been produced.
-
-
33. The data storage device set forth in claim 30 further characterized in that:
-
the transforms include a read transform that reads the data from the source and a write transform that writes the transformed data to the sink;
the source permits operations to be performed on data read therefrom or the sink permits operations to be performed on data written thereto; and
in the step of producing the optimized data flow plan, the optimized transforms include a read transform or a write transform that specifies that the source or sink perform operations on the data that are equivalent to those specified in the plurality of optimizable transforms.
-
-
34. The data storage device set forth in claim 33 further characterized in that:
-
the source or the sink is a relational database system; and
in the step of producing the optimized data flow plan, the optimized read transform or the optimized write transform is an SQL query.
-
-
35. The data storage device set forth in claim 34 further characterized in that:
-
the method further comprises the steps of:
placing properties of the optimizable transforms in a table; and
using the table to produce the optimized transforms.
-
-
36. The data storage device set forth in claim 30 further characterized in that:
one or more of the optimized transforms specifies the equivalent operations such that the equivalent operations may be performed in parallel.
-
37. The data storage device set forth in claim 36 further characterized in that:
-
the method further comprises the steps of:
placing properties of the optimizable transforms in a table; and
using the table to produce the optimized transforms.
-
-
38. A data storage device, the data storage device being characterized in that:
-
the data storage device contains code which when executed in a computer implements a data transformation system, the data transformation system having the improvement comprising an optimizer that automatically produces an optimized data flow plan from a user-specified data flow plan that transforms data read from a source data repository in a first form into a second form in which the data can be written to a sink data repository that is distinct from the source data repository, the data flow plan being a first sequence of transforms, each of which specifies an operation on the data, and the optimized data flow plan being a second optimized sequence of transforms in which one or more optimized transforms replace transforms in the first sequence, the optimized sequence being equivalent to but having fewer transforms than the first sequence.
39. The data storage device set forth in claim 38 further characterized in that:
the data transformation system further comprises a user interface that displays a visual representation of the data flow plan, the user interface continuing to display the visual representation of the data flow plan after production of the optimized data flow plan.
-
-
40. The data storage device set forth in claim 38 further characterized in that:
-
the transforms include a read transform that reads the data from the source and a write transform that writes the transformed data to the sink;
the source permits operations to be performed on data read therefrom or the sink permits operations to be performed on data written thereto; and
the optimized data flow plan includes a read transform or a write transform that replaces a plurality of transforms in the data flow plan, the read transform or write transform specifying that operations equivalent to the operations of the replaced transforms be performed in the source or the sink.
-
-
41. The data storage device set forth in claim 38 further characterized in that:
the optimized data flow plan includes a merge transform that replaces a plurality of transforms in the data flow plan, the merge transform specifying operations equivalent to the operations of the replaced transforms.
-
42. The data storage device set forth in claim 41 further characterized in that:
the merge transform further specifies the equivalent operations such that the equivalent operations may be performed in parallel.
Specification