
Integration of distributed data processing platform with one or more distinct supporting platforms

  • US 10,541,938 B1
  • Filed: 02/01/2018
  • Issued: 01/21/2020
  • Est. Priority Date: 04/06/2015
  • Status: Active Grant
First Claim

1. A method comprising:

    configuring a plurality of distributed processing nodes, each comprising a processor coupled to a memory, to communicate over a network;

    obtaining metadata characterizing data locally accessible in respective data zones of respective ones of the distributed processing nodes;

    populating catalog instances of a distributed catalog service for respective ones of the data zones utilizing the obtained metadata; and

    performing distributed data analytics for a given analytics job in the distributed processing nodes utilizing the populated catalog instances of the distributed catalog service and the locally accessible data of the respective data zones;

    wherein the given analytics job utilizes one or more first data items locally accessible within a first one of the data zones of a first one of the distributed processing nodes and one or more second data items locally accessible within at least a second one of the data zones of at least a second one of the distributed processing nodes;

    wherein the obtained metadata comprises one or more metadata tags identifying at least one of the one or more first data items and the one or more second data items, a first entrance key permitting access to a first internal network of the first data zone, and a second entrance key permitting access to a second internal network of the second data zone;

    wherein populating the catalog instances of the distributed catalog service further comprises provisioning a first populated catalog instance associated with the first data zone of the first distributed processing node with the first entrance key and provisioning a second populated catalog instance associated with the second data zone of the second distributed processing node with the second entrance key, and

    wherein performing distributed data analytics for the given analytics job comprises at least one of:

    the first populated catalog instance associated with the first data zone of the first distributed processing node utilizing at least one of the one or more metadata tags to map to one or more first physical storage locations of one or more of the first data items in one or more first data storage devices of the first distributed processing node and utilizing the first entrance key to access one or more of the first data items at the one or more first physical storage locations in the one or more first data storage devices of the first distributed processing node; and

    the second populated catalog instance associated with the second data zone of the second distributed processing node utilizing at least one of the one or more metadata tags to map to one or more second physical storage locations of one or more of the second data items in one or more second data storage devices of the second distributed processing node and utilizing the second entrance key to access one or more of the second data items at the one or more second physical storage locations in the one or more second data storage devices of the second distributed processing node.
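The arrangement recited in the claim can be illustrated with a minimal, single-process Python sketch. Everything in it is an assumption for illustration, not the patentee's implementation: the names `DataZone`, `CatalogInstance` and `run_analytics_job` are hypothetical, entrance keys are modeled as plain strings, a storage device is modeled as a dictionary from physical location to stored values, and the separate processing nodes are simulated in one process. It shows one catalog instance per data zone, populated with metadata tags and provisioned with that zone's entrance key; each instance maps a tag to a physical storage location and presents the key to read the item, and only per-zone partial results are combined.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Illustrative sketch only: every name below is hypothetical, not from the patent.


@dataclass
class DataZone:
    """A data zone on one processing node, reachable only with its entrance key."""
    name: str
    entrance_key: str
    # Stand-in for a storage device: physical location -> stored values.
    storage: Dict[str, List[int]]

    def read(self, key: str, location: str) -> List[int]:
        # The zone's internal network admits only callers holding its entrance key.
        if key != self.entrance_key:
            raise PermissionError(f"wrong entrance key for {self.name}")
        return self.storage[location]


@dataclass
class CatalogInstance:
    """One catalog instance per data zone, populated from that zone's metadata."""
    zone: DataZone
    tag_to_location: Dict[str, str] = field(default_factory=dict)
    entrance_key: Optional[str] = None

    def populate(self, metadata: Dict[str, str], entrance_key: str) -> None:
        # Record metadata tags and provision this instance with the zone's key.
        self.tag_to_location.update(metadata)
        self.entrance_key = entrance_key

    def fetch(self, tag: str) -> Optional[List[int]]:
        # Map a tag to a physical location, then use the key to read the item.
        location = self.tag_to_location.get(tag)
        if location is None or self.entrance_key is None:
            return None  # tag unknown here, or instance not yet provisioned
        return self.zone.read(self.entrance_key, location)


def run_analytics_job(
    catalogs: List[CatalogInstance],
    tags: List[str],
    local_fn: Callable[[List[List[int]]], int],
    combine: Callable[[List[int]], int],
) -> int:
    """Compute a partial result per zone; only the partials are combined."""
    partials = []
    for catalog in catalogs:
        # Each zone resolves only the tags its own catalog instance knows about.
        local_items = [item for item in (catalog.fetch(t) for t in tags) if item is not None]
        if local_items:
            partials.append(local_fn(local_items))
    return combine(partials)


# Usage: one analytics job spanning two data zones on two (simulated) nodes.
zone_a = DataZone("zone-a", "key-a", {"/disk0/blk7": [3, 4]})
zone_b = DataZone("zone-b", "key-b", {"/disk1/blk2": [10]})
cat_a = CatalogInstance(zone_a)
cat_a.populate({"sales-2015": "/disk0/blk7"}, "key-a")
cat_b = CatalogInstance(zone_b)
cat_b.populate({"sales-2016": "/disk1/blk2"}, "key-b")

total = run_analytics_job(
    [cat_a, cat_b],
    ["sales-2015", "sales-2016"],
    local_fn=lambda items: sum(sum(values) for values in items),
    combine=sum,
)
print(total)  # 17
```

In this sketch the raw data items never leave their zone; a zone contributes only the output of `local_fn`, which is one plausible reading of "performing distributed data analytics ... in the distributed processing nodes", chosen here for simplicity.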

  • 7 Assignments