Apparatus and method for identifying performance bottlenecks in pipeline parallel processing environment
First Claim
1. A method for identifying performance bottleneck status in a parallel data processing environment, implemented by a computing processor, the method comprising:
examining, by the computing processor, data flow associated with the parallel data processing environment to identify:
i) at least one operator, wherein an operator type is associated with the at least one operator;
ii) at least one buffer; and
iii) a relationship that the at least one buffer has with the at least one operator, wherein the relationship is associated with the operator type; and
wherein examining the data flow associated with the parallel data processing environment comprises:
identifying a first sub-operator connected to a second sub-operator, wherein no buffer exists between the first sub-operator and the second sub-operator; and
combining the first sub-operator and the second sub-operator to create the at least one operator;
monitoring, by the computing processor, the at least one buffer to determine a buffer status associated with the at least one buffer;
applying, by the computing processor, a set of rules to identify an operator bottleneck status associated with the at least one operator, wherein the set of rules is applied to:
i) the at least one operator, based on the operator type;
ii) the at least one buffer status; and
iii) the relationship that the at least one buffer has with the at least one operator; and
determining, by the computing processor, a performance bottleneck status associated with the parallel data processing environment, based on the operator bottleneck status.
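The combining step recited above (merging two sub-operators when no buffer sits between them) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `combine_sub_operators`, the `(upstream, downstream, has_buffer)` link tuples, and the use of a union-find structure are all assumptions made for illustration only.

```python
from collections import defaultdict


def combine_sub_operators(sub_operators, links):
    """Group sub-operators joined by unbuffered links into combined operators.

    sub_operators: iterable of hashable names (hypothetical data model).
    links: iterable of (upstream, downstream, has_buffer) tuples.
    Returns a sorted list of sorted name lists, one per combined operator.
    """
    parent = {name: name for name in sub_operators}

    def find(name):
        # Walk to the group representative, halving the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for upstream, downstream, has_buffer in links:
        if not has_buffer:
            # No buffer between the two sub-operators: treat them as one operator.
            parent[find(upstream)] = find(downstream)

    groups = defaultdict(list)
    for name in parent:
        groups[find(name)].append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical four-stage flow with a single buffer between "filter" and "sort".
links = [
    ("read", "filter", False),
    ("filter", "sort", True),
    ("sort", "write", False),
]
operators = combine_sub_operators(["read", "filter", "sort", "write"], links)
print(operators)  # [['filter', 'read'], ['sort', 'write']]
```

In this sketch the buffer between `filter` and `sort` is the only boundary, so the four sub-operators collapse into two operators, each of which can then be judged against the buffers it touches.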
Abstract
A system identifies a performance bottleneck status in a parallel data processing environment by examining data flow associated with the parallel data processing environment to identify: at least one operator, where an operator type is associated with the operator; at least one buffer; and a relationship that the buffer has with the operator, where the relationship is associated with the operator type. The system monitors the buffer to determine a buffer status associated with the buffer. The system applies a set of rules to identify an operator bottleneck status associated with the operator. The set of rules is applied to the operator, based on the operator type, the buffer status, and the relationship that the buffer has with the operator. The system then determines a performance bottleneck status associated with the parallel data processing environment, based on the operator bottleneck status.
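The rule-application step summarized in the abstract could look roughly like the following Python sketch. The text here does not state the actual rules, so everything concrete below is an assumption: the buffer statuses (`full`, `empty`, `normal`), the operator type labels (`source`, `transform`, `sink`), and the heuristic that an operator with full input buffers and empty output buffers is the likely bottleneck.

```python
from dataclasses import dataclass, field

# Hypothetical buffer statuses; the source only says a "buffer status" is determined.
FULL, EMPTY, NORMAL = "full", "empty", "normal"


@dataclass
class Operator:
    name: str
    op_type: str  # hypothetical labels: "source", "transform", "sink"
    inputs: list = field(default_factory=list)   # names of buffers feeding the operator
    outputs: list = field(default_factory=list)  # names of buffers the operator fills


def operator_is_bottleneck(op, buffer_status):
    """Apply an illustrative rule chosen by operator type (not the patent's rules)."""
    inputs_full = bool(op.inputs) and all(buffer_status[b] == FULL for b in op.inputs)
    outputs_empty = bool(op.outputs) and all(buffer_status[b] == EMPTY for b in op.outputs)
    if op.op_type == "source":
        # A source has no input buffer; a starved output suggests it is slow.
        return outputs_empty
    if op.op_type == "sink":
        # A sink has no output buffer; a backed-up input suggests it is slow.
        return inputs_full
    # Otherwise: work piles up in front of the operator and nothing comes out.
    return inputs_full and outputs_empty


def find_bottlenecks(operators, buffer_status):
    """Return the names of operators flagged by the illustrative rules."""
    return [op.name for op in operators if operator_is_bottleneck(op, buffer_status)]


ops = [
    Operator("read", "source", outputs=["b1"]),
    Operator("sort", "transform", inputs=["b1"], outputs=["b2"]),
    Operator("write", "sink", inputs=["b2"]),
]
status = {"b1": FULL, "b2": EMPTY}
print(find_bottlenecks(ops, status))  # ['sort']
```

The overall performance bottleneck status would then follow from the per-operator results, for example by reporting the flagged operators for the whole job.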
17 Claims
1. A method for identifying performance bottleneck status in a parallel data processing environment, implemented by a computing processor, the method comprising:
examining, by the computing processor, data flow associated with the parallel data processing environment to identify:
i) at least one operator, wherein an operator type is associated with the at least one operator;
ii) at least one buffer; and
iii) a relationship that the at least one buffer has with the at least one operator, wherein the relationship is associated with the operator type; and
wherein examining the data flow associated with the parallel data processing environment comprises:
identifying a first sub-operator connected to a second sub-operator, wherein no buffer exists between the first sub-operator and the second sub-operator; and
combining the first sub-operator and the second sub-operator to create the at least one operator;
monitoring, by the computing processor, the at least one buffer to determine a buffer status associated with the at least one buffer;
applying, by the computing processor, a set of rules to identify an operator bottleneck status associated with the at least one operator, wherein the set of rules is applied to:
i) the at least one operator, based on the operator type;
ii) the at least one buffer status; and
iii) the relationship that the at least one buffer has with the at least one operator; and
determining, by the computing processor, a performance bottleneck status associated with the parallel data processing environment, based on the operator bottleneck status.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product for identifying performance bottleneck status in a parallel data processing environment, the computer program product comprising:
a non-transitory computer readable storage medium having computer readable program code embodied therewith, the program code executable by a processor to:
examine data flow associated with the parallel data processing environment to identify:
i) at least one operator, wherein an operator type is associated with the at least one operator;
ii) at least one buffer; and
iii) a relationship that the at least one buffer has with the at least one operator, wherein the relationship is associated with the operator type; and
wherein the computer readable program code configured to examine data flow associated with the parallel data processing environment is further configured to:
identify a first sub-operator connected to a second sub-operator, wherein no buffer exists between the first sub-operator and the second sub-operator; and
combine the first sub-operator and the second sub-operator to create the at least one operator;
monitor the at least one buffer to determine a buffer status associated with the at least one buffer;
apply a set of rules to identify an operator bottleneck status associated with the at least one operator, wherein the set of rules is applied to:
i) the at least one operator, based on the operator type;
ii) the at least one buffer status; and
iii) the relationship that the at least one buffer has with the at least one operator; and
determine a performance bottleneck status associated with the parallel data processing environment, based on the operator bottleneck status.
Dependent claims: 9, 10, 11, 12, 13
14. A system comprising:
a processor; and
a computer readable storage medium operationally coupled to the processor, the computer readable storage medium having computer readable program code embodied therewith to be executed by the processor, the computer readable program code configured to:
examine data flow associated with the parallel data processing environment to identify:
i) at least one operator, wherein an operator type is associated with the at least one operator;
ii) at least one buffer; and
iii) a relationship that the at least one buffer has with the at least one operator, wherein the relationship is associated with the operator type, and
wherein the computer readable program code configured to examine data flow associated with the parallel data processing environment is further configured to:
determine the operator type associated with the at least one operator based on a data partition configuration associated with:
i) the at least one buffer; and
ii) the relationship that the at least one buffer has with the at least one operator;
monitor the at least one buffer to determine a buffer status associated with the at least one buffer;
apply a set of rules to identify an operator bottleneck status associated with the at least one operator, wherein the set of rules is applied to:
i) the at least one operator, based on the operator type;
ii) the at least one buffer status; and
iii) the relationship that the at least one buffer has with the at least one operator; and
determine a performance bottleneck status associated with the parallel data processing environment, based on the operator bottleneck status.
Dependent claims: 15, 16, 17
Specification