Distributed catalog service for data processing platform
First Claim
1. A method comprising:
configuring a plurality of distributed processing nodes, each comprising a processor coupled to a memory, to communicate over a network;
abstracting content locally accessible in respective data zones of respective ones of the distributed processing nodes into respective catalogs of a distributed catalog service in accordance with a layered extensible data model;
providing in the distributed processing nodes a plurality of microservices for performing processing operations on at least one of the layered extensible data model and the catalogs of the distributed catalog service; and
executing an application distributed across at least two of the plurality of distributed processing nodes utilizing the catalogs of the distributed catalog service to determine, for each of the at least two distributed processing nodes, a subset of a plurality of data resources utilized by the application that are located within its corresponding one of the data zones;
wherein each of the catalogs of the distributed catalog service is configured to track data resources within its corresponding one of the data zones through addressing the data resources based on semantic content of the data resources expressed through metadata;
wherein the layered extensible model comprises:
a data layer configured to persist the catalogs of the distributed catalog service;
a core data model layer configured to provide a set of core classes for classifying the data resources in the respective data zones; and
at least one extensions layer configured to extend respective ones of the core classes to at least one of:
one or more additional classes; and
instances of one or more of the core classes and the additional classes;
wherein the microservices comprise at least one microservice configured to establish relationships between data resources and metadata using one or more of the core classes, the additional classes, and the instances of the core classes and additional classes; and
wherein the microservices further comprise at least one microservice configured to automate a process of metadata collection and ingestion for one or more discovered data hubs and data sources to populate the catalogs of the distributed catalog service.
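The layered model recited in the claim can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the names `CatalogClass`, `Instance`, `Catalog`, `register`, and `find`, and the sample classes and metadata, are all hypothetical. It shows core classes, an extensions layer that adds a class and instances, and a lookup that addresses resources by metadata rather than by location.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogClass:
    """A class in the data model (hypothetical). Core classes have no parent."""
    name: str
    parent: CatalogClass | None = None

    def is_a(self, other: CatalogClass) -> bool:
        # Walk up the extension chain until `other` is found or the chain ends.
        node = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


@dataclass(frozen=True)
class Instance:
    """An instance of a core or additional class, with its metadata pairs."""
    name: str
    cls: CatalogClass
    metadata: tuple = ()


@dataclass
class Catalog:
    """One catalog per data zone; it only tracks resources in that zone."""
    zone: str
    instances: list = field(default_factory=list)

    def register(self, name: str, cls: CatalogClass, **metadata: str) -> Instance:
        # Relate a data resource to its class and its descriptive metadata.
        inst = Instance(name, cls, tuple(sorted(metadata.items())))
        self.instances.append(inst)
        return inst

    def find(self, cls: CatalogClass, **metadata: str) -> list:
        # Address resources by semantic content: class membership plus metadata.
        wanted = set(metadata.items())
        return [i for i in self.instances
                if i.cls.is_a(cls) and wanted <= set(i.metadata)]


# Core data model layer (assumed example class).
DATA_SOURCE = CatalogClass("DataSource")
# Extensions layer: an additional class extending a core class.
GENOMIC_FILE = CatalogClass("GenomicFile", DATA_SOURCE)

catalog = Catalog(zone="zone-a")
catalog.register("sample-001.vcf", GENOMIC_FILE, disease="diabetes")
catalog.register("notes.txt", DATA_SOURCE, disease="asthma")

matches = catalog.find(DATA_SOURCE, disease="diabetes")
print([m.name for m in matches])
```

Because `GenomicFile` extends `DataSource`, a query against the core class also returns instances of the additional class, which is one plausible reading of how an extensions layer builds on the core classes.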
Abstract
An apparatus in one embodiment comprises at least one processing device having a processor coupled to a memory. The one or more processing devices are operative to configure a plurality of distributed processing nodes to communicate over a network, to abstract content locally accessible in respective data zones of respective ones of the distributed processing nodes into respective catalogs of a distributed catalog service in accordance with a layered extensible data model, and to provide in the distributed processing nodes a plurality of microservices for performing processing operations on at least one of the layered extensible data model and the catalogs. The layered extensible data model comprises a plurality of layers including a core data model layer and at least one extensions layer. The microservices may comprise at least one microservice to alter the layered extensible data model and at least one microservice to query one or more of the catalogs.
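As a rough illustration of the two microservice roles mentioned in the abstract, the sketch below models one handler that alters the data model and one that queries catalogs. It assumes a deliberately simple representation (plain dictionaries and sets); the function names `alter_model` and `query_catalogs`, the class names, and the zone names are hypothetical and do not come from the patent.

```python
def alter_model(model: dict, new_class: str, extends: str) -> dict:
    """Hypothetical 'alter model' handler: add a class to the extensions layer."""
    if extends not in model["core"] and extends not in model["extensions"]:
        raise ValueError(f"unknown parent class: {extends}")
    if new_class in model["core"]:
        raise ValueError("core classes cannot be redefined")
    # Return an updated copy so the caller's model is left untouched.
    updated = {"core": set(model["core"]),
               "extensions": dict(model["extensions"])}
    updated["extensions"][new_class] = extends
    return updated


def query_catalogs(catalogs: dict, class_name: str) -> dict:
    """Hypothetical 'query' handler: entries of a given class, per data zone."""
    return {
        zone: sorted(name for name, cls in entries.items() if cls == class_name)
        for zone, entries in catalogs.items()
    }


# Assumed starting point: two core classes and an empty extensions layer.
model = {"core": {"DataSource", "DataHub"}, "extensions": {}}
model = alter_model(model, "SensorFeed", extends="DataSource")

# Each zone's catalog maps a resource name to its class name.
catalogs = {
    "zone-a": {"feed-1": "SensorFeed", "hub-1": "DataHub"},
    "zone-b": {"feed-2": "SensorFeed"},
}
result = query_catalogs(catalogs, "SensorFeed")
print(result)
```

In a real deployment each handler would sit behind a network endpoint on a processing node; the functions here only show the shape of the operations.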
20 Claims
1. A method comprising:
configuring a plurality of distributed processing nodes, each comprising a processor coupled to a memory, to communicate over a network;
abstracting content locally accessible in respective data zones of respective ones of the distributed processing nodes into respective catalogs of a distributed catalog service in accordance with a layered extensible data model;
providing in the distributed processing nodes a plurality of microservices for performing processing operations on at least one of the layered extensible data model and the catalogs of the distributed catalog service; and
executing an application distributed across at least two of the plurality of distributed processing nodes utilizing the catalogs of the distributed catalog service to determine, for each of the at least two distributed processing nodes, a subset of a plurality of data resources utilized by the application that are located within its corresponding one of the data zones;
wherein each of the catalogs of the distributed catalog service is configured to track data resources within its corresponding one of the data zones through addressing the data resources based on semantic content of the data resources expressed through metadata;
wherein the layered extensible model comprises:
a data layer configured to persist the catalogs of the distributed catalog service;
a core data model layer configured to provide a set of core classes for classifying the data resources in the respective data zones; and
at least one extensions layer configured to extend respective ones of the core classes to at least one of:
one or more additional classes; and
instances of one or more of the core classes and the additional classes;
wherein the microservices comprise at least one microservice configured to establish relationships between data resources and metadata using one or more of the core classes, the additional classes, and the instances of the core classes and additional classes; and
wherein the microservices further comprise at least one microservice configured to automate a process of metadata collection and ingestion for one or more discovered data hubs and data sources to populate the catalogs of the distributed catalog service.
Dependent claims: 2-18
19. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device:
to configure a plurality of distributed processing nodes to communicate over a network;
to abstract content locally accessible in respective data zones of respective ones of the distributed processing nodes into respective catalogs of a distributed catalog service in accordance with a layered extensible data model;
to provide in the distributed processing nodes a plurality of microservices for performing processing operations on at least one of the layered extensible data model and the catalogs of the distributed catalog service; and
to execute an application distributed across at least two of the plurality of distributed processing nodes utilizing the catalogs of the distributed catalog service to determine, for each of the at least two distributed processing nodes, a subset of a plurality of data resources utilized by the application that are located within its corresponding one of the data zones;
wherein each of the catalogs of the distributed catalog service is configured to track a set of data resources within its corresponding one of the data zones through addressing the data resources based on semantic content of the data resources expressed through metadata;
wherein the layered extensible model comprises:
a data layer configured to persist the catalogs of the distributed catalog service;
a core data model layer configured to provide a set of core classes for classifying the data resources in the respective data zones; and
at least one extensions layer configured to extend respective ones of the core classes to at least one of:
one or more additional classes; and
instances of one or more of the core classes and the additional classes; and
wherein the microservices comprise at least one microservice configured to establish relationships between data resources and metadata using one or more of the core classes, the additional classes, and the instances of the core classes and additional classes; and
wherein the microservices further comprise at least one microservice configured to automate a process of metadata collection and ingestion for one or more discovered data hubs and data sources to populate the catalogs of the distributed catalog service.
20. An apparatus comprising:
at least one processing device having a processor coupled to a memory;
wherein said at least one processor is operative:
to configure a plurality of distributed processing nodes to communicate over a network;
to abstract content locally accessible in respective data zones of respective ones of the distributed processing nodes into respective catalogs of a distributed catalog service in accordance with a layered extensible data model;
to provide in the distributed processing nodes a plurality of microservices for performing processing operations on at least one of the layered extensible data model and the catalogs of the distributed catalog service; and
to execute an application distributed across at least two of the plurality of distributed processing nodes utilizing the catalogs of the distributed catalog service to determine, for each of the at least two distributed processing nodes, a subset of a plurality of data resources utilized by the application that are located within its corresponding one of the data zones;
wherein each of the catalogs of the distributed catalog service is configured to track a set of data resources within its corresponding one of the data zones through addressing the data resources based on semantic content of the data resources expressed through metadata;
wherein the layered extensible model comprises:
a data layer configured to persist the catalogs of the distributed catalog service;
a core data model layer configured to provide a set of core classes for classifying the data resources in the respective data zones; and
at least one extensions layer configured to extend respective ones of the core classes to at least one of:
one or more additional classes; and
instances of one or more of the core classes and the additional classes;
wherein the microservices comprise at least one microservice configured to establish relationships between data resources and metadata using one or more of the core classes, the additional classes, and the instances of the core classes and additional classes; and
wherein the microservices further comprise at least one microservice configured to automate a process of metadata collection and ingestion for one or more discovered data hubs and data sources to populate the catalogs of the distributed catalog service.
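The independent claims share two operational steps: automated ingestion of metadata from discovered sources into per-zone catalogs, and use of those catalogs to determine which of an application's data resources are local to each node. A minimal sketch of both follows, assuming one data zone per node and a dictionary-based catalog; `ingest`, `local_subsets`, and all node and resource names are hypothetical.

```python
def ingest(discovered_sources: dict) -> dict:
    """Populate one catalog per zone from discovered sources (assumed input shape)."""
    catalogs = {}
    for zone, sources in discovered_sources.items():
        # Each catalog maps a resource name to the metadata collected for it.
        catalogs[zone] = {src["name"]: src["metadata"] for src in sources}
    return catalogs


def local_subsets(required: set, catalogs: dict) -> dict:
    """For each zone, list the required resources that its catalog tracks."""
    return {
        zone: sorted(catalog.keys() & required)
        for zone, catalog in catalogs.items()
    }


# Hypothetical output of a discovery step, keyed by node / data zone.
discovered = {
    "node-1": [{"name": "sales-eu", "metadata": {"topic": "sales"}}],
    "node-2": [
        {"name": "sales-us", "metadata": {"topic": "sales"}},
        {"name": "logs-us", "metadata": {"topic": "logs"}},
    ],
}

catalogs = ingest(discovered)
plan = local_subsets({"sales-eu", "sales-us"}, catalogs)
print(plan)
```

Each node ends up with only the resources its own catalog tracks, so the application can process data in place instead of moving it between zones.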
Specification