Distributed catalog service for multi-cluster data processing platform
First Claim
1. A method comprising:
implementing a first portion of a distributed catalog service for a given one of a plurality of distributed processing node clusters associated with respective data zones, each of the clusters being configured to perform processing operations utilizing local data resources locally accessible within its corresponding data zone;
receiving in the first portion of the distributed catalog service a request to identify for each of a plurality of data resources to be utilized by an application initiated in the given cluster whether the data resource is a local data resource or a remote data resource relative to the given cluster; and
providing from the first portion of the distributed catalog service a response to the request;
wherein the first portion of the distributed catalog service in combination with additional portions implemented for respective additional ones of the plurality of distributed processing node clusters collectively provide the distributed catalog service with capability to resolve local or remote status of data resources in the data zones of each of the clusters responsive to requests from any other one of the clusters;
wherein a given one of the portions of the distributed catalog service in conjunction with its initiation as a Yet Another Resource Negotiator (YARN) application is registered as a service with a service registry of a resource manager of the corresponding cluster; and
wherein the method is implemented by at least one processing device comprising a processor coupled to a memory.
Abstract
A first portion of a distributed catalog service is implemented for a given one of a plurality of distributed processing node clusters associated with respective data zones, each of the clusters being configured to perform processing operations utilizing local data resources locally accessible within its corresponding data zone. The first portion of the distributed catalog service receives a request to identify for each of a plurality of data resources to be utilized by an application initiated in the given cluster whether the data resource is a local or remote data resource relative to the given cluster, and provides a response to the request. The first portion of the distributed catalog service in combination with additional portions implemented for respective additional ones of the distributed processing node clusters collectively provide the distributed catalog service with capability to resolve local or remote status of data resources in each of the data zones.
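The resolution behavior summarized in the abstract can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class name `CatalogPortion`, the methods `add_peer`, `owns`, and `resolve`, the resource names, and the `"unknown"` status for resources found in no zone are all hypothetical, and direct method calls between portions stand in for whatever inter-cluster communication a real deployment would use. Registration with a YARN service registry is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogPortion:
    """Hypothetical per-cluster portion of a distributed catalog service."""

    cluster_id: str
    # Names of data resources locally accessible in this cluster's data zone.
    local_resources: set[str] = field(default_factory=set)
    # Portions implemented for the other clusters, keyed by cluster id.
    peers: dict[str, CatalogPortion] = field(default_factory=dict)

    def add_peer(self, other: CatalogPortion) -> None:
        """Link two portions so each can consult the other."""
        self.peers[other.cluster_id] = other
        other.peers[self.cluster_id] = self

    def owns(self, resource: str) -> bool:
        """Report whether the resource is local to this portion's data zone."""
        return resource in self.local_resources

    def resolve(self, resources: list[str]) -> dict[str, tuple[str, str | None]]:
        """Classify each requested resource as local or remote to this cluster.

        Returns a mapping from resource name to (status, owning cluster id).
        The "unknown" status is an assumption of this sketch, used when no
        portion reports the resource.
        """
        response: dict[str, tuple[str, str | None]] = {}
        for name in resources:
            if self.owns(name):
                response[name] = ("local", self.cluster_id)
                continue
            # Ask the other portions in a deterministic order.
            owner = next(
                (cid for cid, peer in sorted(self.peers.items()) if peer.owns(name)),
                None,
            )
            response[name] = ("remote", owner) if owner else ("unknown", None)
        return response


# Two clusters, each with one resource in its own data zone.
zone_a = CatalogPortion("cluster-a", {"genomes/a.parquet"})
zone_b = CatalogPortion("cluster-b", {"genomes/b.parquet"})
zone_a.add_peer(zone_b)

# An application initiated in cluster-a asks about both resources.
answer = zone_a.resolve(["genomes/a.parquet", "genomes/b.parquet"])
```

In this sketch the same request issued from `zone_b` would report the two resources with the statuses swapped, which mirrors the idea that the portions collectively resolve local or remote status for requests from any cluster.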
20 Claims
16. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device:
to implement a first portion of a distributed catalog service for a given one of a plurality of distributed processing node clusters associated with respective data zones, each of the clusters being configured to perform processing operations utilizing local data resources locally accessible within its corresponding data zone;
to receive in the first portion of the distributed catalog service a request to identify for each of a plurality of data resources to be utilized by an application initiated in the given cluster whether the data resource is a local data resource or a remote data resource relative to the given cluster; and
to provide from the first portion of the distributed catalog service a response to the request;
wherein the first portion of the distributed catalog service in combination with additional portions implemented for respective additional ones of the plurality of distributed processing node clusters collectively provide the distributed catalog service with capability to resolve local or remote status of data resources in the data zones of each of the clusters responsive to requests from any other one of the clusters; and
wherein a given one of the portions of the distributed catalog service in conjunction with its initiation as a Yet Another Resource Negotiator (YARN) application is registered as a service with a service registry of a resource manager of the corresponding cluster.
18. An apparatus comprising:
at least one processing device having a processor coupled to a memory;
wherein said at least one processor is configured:
to implement a first portion of a distributed catalog service for a given one of a plurality of distributed processing node clusters associated with respective data zones, each of the clusters being configured to perform processing operations utilizing local data resources locally accessible within its corresponding data zone;
to receive in the first portion of the distributed catalog service a request to identify for each of a plurality of data resources to be utilized by an application initiated in the given cluster whether the data resource is a local data resource or a remote data resource relative to the given cluster; and
to provide from the first portion of the distributed catalog service a response to the request;
wherein the first portion of the distributed catalog service in combination with additional portions implemented for respective additional ones of the plurality of distributed processing node clusters collectively provide the distributed catalog service with capability to resolve local or remote status of data resources in the data zones of each of the clusters responsive to requests from any other one of the clusters; and
wherein a given one of the portions of the distributed catalog service in conjunction with its initiation as a Yet Another Resource Negotiator (YARN) application is registered as a service with a service registry of a resource manager of the corresponding cluster.
Specification