Analyzing large-scale data processing jobs
First Claim
1. A computer-implemented method for data analysis in a distributed computing system, the method comprising:
accessing data, stored in a storage device of a first processing zone, that is associated with a particular child job created from a particular distributed data processing job that has been executed;
detecting, from the data stored in the storage device, identifying information that identifies the particular child job created from the particular distributed data processing job;
in response to detecting the identifying information that identifies the particular child job created from the particular distributed data processing job, determining that the identifying information that identifies the particular child job and second identifying information stored in a storage device of a second processing zone share a common prefix;
in response to determining that the identifying information that identifies the particular child job and the second identifying information stored in the storage device of the second processing zone share a common prefix, identifying an additional child job as being created from the particular distributed data processing job;
correlating particular output data associated with the particular child job and additional output data associated with the additional child job created from the particular distributed data processing job;
determining performance data for the particular distributed data processing job based on the particular output data associated with the particular child job and the additional output data associated with the additional child job; and
providing for display the performance data for the particular distributed data processing job based on the particular output data associated with the particular child job and the additional output data associated with the additional child job.
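The prefix-matching steps of the claim can be illustrated with a minimal sketch. The claim does not specify how identifiers are formatted or where the shared prefix ends, so the `<parent>/<child>` identifier layout, the `JobRecord` type, and the function names below are all hypothetical assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """A child-job record found in one zone's storage (hypothetical shape)."""
    job_id: str  # assumed format: "<parent>/<child>", e.g. "wordcount-42/map-0"
    zone: str


def parent_prefix(job_id: str, sep: str = "/") -> str:
    """Return the part of the ID before the last separator, or "" if none."""
    prefix, _, _ = job_id.rpartition(sep)
    return prefix


def find_additional_children(known, other_zone_records, sep="/"):
    """Return records from another zone whose IDs share the known job's prefix."""
    prefix = parent_prefix(known.job_id, sep)
    if not prefix:
        return []  # no recognizable prefix, so nothing can be matched
    # Include the separator so "wordcount-42" does not match "wordcount-421".
    wanted = prefix + sep
    return [
        rec for rec in other_zone_records
        if rec.job_id != known.job_id and rec.job_id.startswith(wanted)
    ]


known = JobRecord("wordcount-42/map-0", zone="zone-a")
zone_b = [
    JobRecord("wordcount-42/map-1", zone="zone-b"),
    JobRecord("wordcount-421/map-0", zone="zone-b"),
    JobRecord("sort-7/map-0", zone="zone-b"),
]
siblings = find_additional_children(known, zone_b)
```

Matching on the prefix plus the separator, rather than on the bare prefix, is a design choice of this sketch: it keeps an unrelated job whose name merely begins with the same characters from being treated as a sibling.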
Abstract
Methods, systems, and apparatus for data analysis in a distributed computing system by accessing data stored at a first processing zone associated with a distributed data processing job, detecting information identifying a particular child job associated with the distributed data processing job, comparing the identifying information to data stored at a second processing zone, and identifying an additional child job as associated with the distributed data processing job based on a result of the comparison. The methods, systems and apparatus are further for correlating particular output data associated with the particular child job and additional output data associated with the additional child job for the distributed data processing job, determining performance data for the distributed data processing job based on the output data associated with each of the particular child job and the additional child job, and providing for display the performance data for the distributed data processing job.
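The correlation and performance-data steps summarized in the abstract might look roughly like the following sketch. The abstract does not say which metrics are collected, so the `ChildOutput` fields (`runtime_s`, `records_processed`), the `<parent>/<child>` identifier layout, and the choice of summary statistics are assumptions for illustration, not the patented implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class ChildOutput(NamedTuple):
    """Output data reported by one child job (hypothetical fields)."""
    job_id: str  # assumed format: "<parent>/<child>"
    zone: str
    runtime_s: float
    records_processed: int


def summarize_by_parent(outputs, sep="/"):
    """Group child-job outputs by shared ID prefix and compute per-parent stats."""
    grouped = defaultdict(list)
    for out in outputs:
        parent, _, _ = out.job_id.rpartition(sep)
        grouped[parent].append(out)

    summary = {}
    for parent, children in grouped.items():
        summary[parent] = {
            "child_jobs": len(children),
            "zones": sorted({c.zone for c in children}),
            "total_records": sum(c.records_processed for c in children),
            # The slowest child bounds the wall-clock time if children run in parallel.
            "max_runtime_s": max(c.runtime_s for c in children),
        }
    return summary


outputs = [
    ChildOutput("wordcount-42/map-0", "zone-a", 12.5, 1000),
    ChildOutput("wordcount-42/map-1", "zone-b", 30.0, 2500),
    ChildOutput("sort-7/map-0", "zone-b", 4.0, 300),
]
report = summarize_by_parent(outputs)
for parent, stats in report.items():
    print(parent, stats)  # stands in for the "providing for display" step
```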
18 Claims
1. A computer-implemented method for data analysis in a distributed computing system, the method comprising:
accessing data, stored in a storage device of a first processing zone, that is associated with a particular child job created from a particular distributed data processing job that has been executed;
detecting, from the data stored in the storage device, identifying information that identifies the particular child job created from the particular distributed data processing job;
in response to detecting the identifying information that identifies the particular child job created from the particular distributed data processing job, determining that the identifying information that identifies the particular child job and second identifying information stored in a storage device of a second processing zone share a common prefix;
in response to determining that the identifying information that identifies the particular child job and the second identifying information stored in the storage device of the second processing zone share a common prefix, identifying an additional child job as being created from the particular distributed data processing job;
correlating particular output data associated with the particular child job and additional output data associated with the additional child job created from the particular distributed data processing job;
determining performance data for the particular distributed data processing job based on the particular output data associated with the particular child job and the additional output data associated with the additional child job; and
providing for display the performance data for the particular distributed data processing job based on the particular output data associated with the particular child job and the additional output data associated with the additional child job.
Dependent claims: 2, 3, 4, 5, 6
7. A system, comprising:
one or more processors; and
a memory storing instructions that are operable, when executed, to cause the one or more processors to perform operations comprising:
accessing data, stored in a storage device of a first processing zone, that is associated with a particular child job created from a particular distributed data processing job that has been executed;
detecting, from the data stored in the storage device, identifying information that identifies the particular child job created from the particular distributed data processing job;
in response to detecting the identifying information that identifies the particular child job created from the particular distributed data processing job, determining that the identifying information that identifies the particular child job and second identifying information stored in a storage device of a second processing zone share a common prefix;
in response to determining that the identifying information that identifies the particular child job and the second identifying information stored in the storage device of the second processing zone share a common prefix, identifying an additional child job as being created from the particular distributed data processing job;
correlating particular output data associated with the particular child job and additional output data associated with the additional child job created from the particular distributed data processing job;
determining performance data for the particular distributed data processing job based on the particular output data associated with the particular child job and the additional output data associated with the additional child job; and
providing for display the performance data for the particular distributed data processing job based on the particular output data associated with the particular child job and the additional output data associated with the additional child job.
Dependent claims: 8, 9, 10, 11, 12
13. A non-transitory computer-readable storage device storing instructions executable by one or more processors which, upon such execution, cause the one or more processors to perform operations in a distributed computing system, the operations comprising:
accessing data, stored in a storage device of a first processing zone, that is associated with a particular child job created from a particular distributed data processing job that has been executed;
detecting, from the data stored in the storage device, identifying information that identifies the particular child job created from the particular distributed data processing job;
in response to detecting the identifying information that identifies the particular child job created from the particular distributed data processing job, determining that the identifying information that identifies the particular child job and second identifying information stored in a storage device of a second processing zone share a common prefix;
in response to determining that the identifying information that identifies the particular child job and the second identifying information stored in the storage device of the second processing zone share a common prefix, identifying an additional child job as being created from the particular distributed data processing job;
correlating particular output data associated with the particular child job and additional output data associated with the additional child job created from the particular distributed data processing job;
determining performance data for the particular distributed data processing job based on the particular output data associated with the particular child job and the additional output data associated with the additional child job; and
providing for display the performance data for the particular distributed data processing job based on the particular output data associated with the particular child job and the additional output data associated with the additional child job.
Dependent claims: 14, 15, 16, 17, 18
Specification