Identifying task instance outliers based on metric data in a large scale parallel processing system
First Claim
1. A computer-implemented method comprising:
receiving, for each of a plurality of task instances that execute one or more computer-executable instructions to perform a task, a plurality of performance measures that each represent an execution performance of a property of the respective task instance for a particular time interval, wherein the plurality of task instances are executed in parallel on one or more computers;
for each task instance:
determining, for each performance measure of the respective task instance, whether the respective performance measure exceeds a threshold value that is based on a function of a mean and a standard deviation of the performance measures that represent the same property as the respective performance measure;
determining, for each of the performance measures that exceeds the threshold value, a score using the respective performance measure and a mean and a standard deviation of the performance measures that represent the same property as the respective performance measure; and
combining the scores for the performance measures that represent the execution performance of the same property of the respective task instance to obtain a combined score value;
ranking the combined score values associated with at least a subset of the plurality of task instances to identify an outlier; and
terminating an execution of a particular task instance on a first computer and executing the particular task instance on a second computer different from the first computer based on the ranking of the combined score values, the particular task instance being one of the plurality of task instances.
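The following Python sketch illustrates the scoring, ranking and rescheduling steps of claim 1. It is one hypothetical reading, not the patented implementation, and it rests on several assumptions:

- The mean and standard deviation are taken across all task instances within each time interval, for a single property.
- The threshold is `mean + k * std`, where `k` is a hypothetical multiplier. The claim says only that the threshold is a function of the mean and standard deviation.
- The score is the z-score `(x - mean) / std`.
- Scores are combined by summation.
- The names `combined_scores`, `rank_instances`, `reschedule` and the host names are illustrative only.

```python
import statistics
from typing import Dict, List, Tuple


def combined_scores(measures: Dict[str, List[float]], k: float = 1.5) -> Dict[str, float]:
    """Sum the z-scores of the measures that exceed mean + k * std.

    `measures` maps a task-instance id to one value per time interval, all for
    the same property. Statistics are computed across instances separately for
    each interval (an assumption; `k` is a hypothetical threshold multiplier).
    """
    ids = list(measures)
    n_intervals = len(measures[ids[0]])
    combined = {tid: 0.0 for tid in ids}
    for i in range(n_intervals):
        values = [measures[tid][i] for tid in ids]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # population standard deviation
        if std == 0:
            continue  # all instances identical in this interval: nothing to score
        threshold = mean + k * std  # hypothetical choice of threshold function
        for tid in ids:
            x = measures[tid][i]
            if x > threshold:
                combined[tid] += (x - mean) / std  # z-score of the exceeding measure
    return combined


def rank_instances(combined: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order instances by combined score, highest first; ties broken by id."""
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


def reschedule(tid: str, placement: Dict[str, str], hosts: List[str]) -> Dict[str, str]:
    """Return a placement that moves `tid` to a host other than its current one.

    A real scheduler would terminate the instance and relaunch it at this point;
    this sketch only records the new assignment.
    """
    current = placement[tid]
    for host in hosts:
        if host != current:
            return {**placement, tid: host}
    raise RuntimeError("no alternative host available")


# Usage: five instances, three intervals of CPU seconds (illustrative values).
cpu_seconds = {
    "t0": [10.0, 10.0, 10.0],
    "t1": [10.0, 10.0, 10.0],
    "t2": [10.0, 10.0, 10.0],
    "t3": [10.0, 10.0, 10.0],
    "t4": [30.0, 30.0, 10.0],
}
ranking = rank_instances(combined_scores(cpu_seconds))
outlier, score = ranking[0]
placement = {tid: "host-a" for tid in cpu_seconds}
if score > 0:
    placement = reschedule(outlier, placement, ["host-a", "host-b"])
print(ranking[0], placement[outlier])  # prints ('t4', 4.0) host-b
```

Scores accumulate only for intervals that cross the threshold. As a result, an instance that is slightly slow in every interval can rank below one that is badly slow in a few intervals. The claim leaves the combining function open, so other aggregations, such as taking the maximum, are equally plausible readings.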
2 Assignments
0 Petitions
Abstract
Among other disclosed subject matter, a method includes receiving metric data associated with an execution of each of a plurality of task instances. The plurality of task instances includes task instances associated with a task, and the metric data for each task instance relates to the execution performance of that task instance. For each task instance, the method includes determining, during each of a plurality of intervals, a deviation of the metric data associated with the task instance relative to an overall deviation of the metric data for the plurality of task instances of the task. It also includes combining the deviation measurements for the task instance that exceed a threshold deviation to obtain a combined deviation value. Each deviation measurement corresponds to the deviation of the metric data for one of the plurality of intervals. The method further includes ranking the combined deviation values associated with at least a subset of the task instances.
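A per-interval z-score is one way to make the abstract's "deviation relative to an overall deviation" concrete. The abstract does not state this formula, so it is an assumed formalization. The symbols are defined as follows:

- $x_{j,t}$ is the metric value of task instance $j$ in interval $t$.
- $\mu_t$ and $\sigma_t$ are the mean and population standard deviation over all $n$ task instances in that interval.
- $\theta$ is the threshold deviation.
- $D_j$ is the combined deviation value of instance $j$.

```latex
\begin{aligned}
\mu_t &= \frac{1}{n}\sum_{i=1}^{n} x_{i,t}, \qquad
\sigma_t = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{i,t} - \mu_t\right)^2}, \\
d_{j,t} &= \frac{x_{j,t} - \mu_t}{\sigma_t}, \\
D_j &= \sum_{t \,:\, d_{j,t} > \theta} d_{j,t}.
\end{aligned}
```

When $\sigma_t > 0$, the condition $d_{j,t} > \theta$ is equivalent to $x_{j,t} > \mu_t + \theta\,\sigma_t$. This is a threshold based on a function of a mean and a standard deviation, as recited in claim 1.

As a worked example, suppose five instances report 10, 10, 10, 10 and 30 in one interval. Then $\mu_t = 14$ and $\sigma_t = \sqrt{320/5} = 8$, so the last instance has $d = (30 - 14)/8 = 2$. Ranking sorts the instances by $D_j$ in descending order.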
34 Citations
28 Claims
1. A computer-implemented method comprising:
receiving, for each of a plurality of task instances that execute one or more computer-executable instructions to perform a task, a plurality of performance measures that each represent an execution performance of a property of the respective task instance for a particular time interval, wherein the plurality of task instances are executed in parallel on one or more computers;
for each task instance:
determining, for each performance measure of the respective task instance, whether the respective performance measure exceeds a threshold value that is based on a function of a mean and a standard deviation of the performance measures that represent the same property as the respective performance measure;
determining, for each of the performance measures that exceeds the threshold value, a score using the respective performance measure and a mean and a standard deviation of the performance measures that represent the same property as the respective performance measure; and
combining the scores for the performance measures that represent the execution performance of the same property of the respective task instance to obtain a combined score value;
ranking the combined score values associated with at least a subset of the plurality of task instances to identify an outlier; and
terminating an execution of a particular task instance on a first computer and executing the particular task instance on a second computer different from the first computer based on the ranking of the combined score values, the particular task instance being one of the plurality of task instances.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system, comprising:
memory; and
one or more processors coupled to the memory and configured to perform operations comprising:
receiving, for each of a plurality of task instances that execute one or more computer-executable instructions to perform a task, a plurality of performance measures that each represent an execution performance measure of a property of the respective task instance for a particular time interval, wherein the plurality of task instances are executed in parallel on one or more computers;
for each task instance:
determining, for each performance measure of the respective task instance, whether the respective performance measure exceeds a threshold value that is based on a function of a mean and a standard deviation of the performance measures that represent the same property as the respective performance measure;
determining, for each of the performance measures that exceeds the threshold value, a score using the respective performance measure and a mean and a standard deviation of the performance measures that represent the same property as the respective performance measure; and
combining the scores for the performance measures that represent the execution performance of the same property of the respective task instance to obtain a combined score value;
ranking the combined score values associated with at least a subset of the plurality of task instances to identify an outlier; and
terminating an execution of a particular task instance on a first computer and executing the particular task instance on a second computer different from the first computer based on the ranking of the combined score values, the particular task instance being one of the plurality of task instances.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
22. A non-transitory computer readable medium encoded with a computer program comprising instructions that, when executed, operate to cause a computer to perform operations:
receive, for each of a plurality of task instances that execute one or more computer-executable instructions to perform a task, a plurality of performance measures that each represent an execution performance measure of a property of the respective task instance for a particular time interval, wherein the plurality of task instances are executed in parallel on one or more computers;
for each task instance:
determine, for each performance measure of the respective task instance, whether the respective performance measure exceeds a threshold value that is based on a function of a mean and a standard deviation of the performance measures that represent the same property as the respective performance measure;
determine, for each of the performance measures that exceeds the threshold value, a score using the respective performance measure and a mean and a standard deviation of the performance measures that represent the same property as the respective performance measure; and
combine the scores for the performance measures that represent the execution performance of the same property of the respective task instance to obtain a combined score value;
rank the combined score values associated with at least a subset of the plurality of task instances to identify an outlier; and
terminate an execution of a particular task instance on a first computer and execute the particular task instance on a second computer different from the first computer based on the ranking of the combined score values, the particular task instance being one of the plurality of task instances.
Dependent claims: 23, 24, 25, 26, 27, 28
Specification