Distributed catalog service for multi-cluster data processing platform

  • US 10,270,707 B1
  • Filed: 12/29/2015
  • Issued: 04/23/2019
  • Est. Priority Date: 04/06/2015
  • Status: Active Grant
First Claim

1. A method comprising:

    implementing a first portion of a distributed catalog service for a given one of a plurality of distributed processing node clusters associated with respective data zones, each of the clusters being configured to perform processing operations utilizing local data resources locally accessible within its corresponding data zone;

    receiving in the first portion of the distributed catalog service a request to identify for each of a plurality of data resources to be utilized by an application initiated in the given cluster whether the data resource is a local data resource or a remote data resource relative to the given cluster; and

    providing from the first portion of the distributed catalog service a response to the request;

    wherein the first portion of the distributed catalog service in combination with additional portions implemented for respective additional ones of the plurality of distributed processing node clusters collectively provide the distributed catalog service with capability to resolve local or remote status of data resources in the data zones of each of the clusters responsive to requests from any other one of the clusters;

    wherein a given one of the portions of the distributed catalog service in conjunction with its initiation as a Yet Another Resource Negotiator (YARN) application is registered as a service with a service registry of a resource manager of the corresponding cluster; and

    wherein the method is implemented by at least one processing device comprising a processor coupled to a memory.
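
The steps recited in claim 1 can be illustrated with a small in-memory sketch. This is not the patented implementation and does not use any real YARN API. The class names (ServiceRegistry, CatalogPortion), the method names, the service-name format, and the cluster and dataset identifiers are all hypothetical, chosen only to show how one catalog portion per cluster might answer a local-or-remote request and register itself as a service when started.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Hypothetical stand-in for a resource manager's service registry."""
    services: dict = field(default_factory=dict)

    def register(self, name, service):
        self.services[name] = service


@dataclass
class CatalogPortion:
    """One cluster's portion of the distributed catalog (illustrative only)."""
    cluster_id: str
    local_resources: set
    peers: dict = field(default_factory=dict)  # cluster_id -> CatalogPortion

    def start(self, registry):
        # Register this portion as a service in conjunction with its initiation.
        registry.register(f"catalog/{self.cluster_id}", self)

    def add_peer(self, portion):
        self.peers[portion.cluster_id] = portion

    def owns(self, resource):
        return resource in self.local_resources

    def resolve(self, resources):
        """Label each resource as local, remote (with its owner), or unknown."""
        result = {}
        for res in resources:
            if self.owns(res):
                result[res] = ("local", self.cluster_id)
                continue
            # Ask the portions of the other clusters whether they hold it.
            owner = next(
                (cid for cid, peer in self.peers.items() if peer.owns(res)),
                None,
            )
            result[res] = ("remote", owner) if owner else ("unknown", None)
        return result


# Two clusters, each with its own data zone and catalog portion.
registry_a = ServiceRegistry()
a = CatalogPortion("cluster-a", {"dataset-1"})
b = CatalogPortion("cluster-b", {"dataset-2"})
a.add_peer(b)
b.add_peer(a)
a.start(registry_a)

# An application initiated in cluster-a asks where its inputs live.
answer = a.resolve(["dataset-1", "dataset-2", "dataset-3"])
```

In this sketch each portion knows only its own data zone and consults its peers for everything else, which mirrors the claim's requirement that the portions collectively resolve local or remote status for requests from any cluster. A real deployment would replace the direct peer references with network calls.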
