Method for connecting a relational data store's meta data with Hadoop
Abstract
A system for sharing a metadata store between a relational database and an unstructured data source is disclosed. The unstructured data source may comprise a Hadoop system with a Hadoop Distributed File System.
26 Claims
1. A system for managing metadata, the system comprising:
- a network resource configured to store database metadata and to provide a shared catalog service to one or more data sources, wherein the shared catalog manages metadata for the one or more data sources, and wherein the one or more data sources include at least one data source comprising unstructured data and at least one data source comprising structured data;
- a first data source in communication with the network resource to access the shared catalog service, wherein the first data source comprises a memory storing unstructured data; and
- a second data source in communication with the network resource to access the shared catalog service, wherein the second data source comprises memory storing structured data,
wherein in the event that a query is submitted to the system, the system determines that the query requires information from one or more of the first data source and the second data source, the first data source is configured to retrieve a first database schema corresponding to the first data source from the network resource via the shared catalog service and the first database schema is applied to the unstructured data comprised in the first data source, and the second data source is configured to retrieve a second database schema corresponding to the second data source from the network resource via the shared catalog service and the second database schema is applied to the structured data comprised in the second data source, and
wherein a response to the query is generated using the unstructured data of the first data source and the structured data of the second data source.
Dependent claims: 2-23
24. A method for establishing a system for managing metadata, the method comprising:
- receiving a query at a first data source;
- determining that the query requires information from one or more of the first data source and the second data source, wherein the first data source is in communication with a shared catalog service via a network resource, wherein the shared catalog manages metadata for the one or more data sources, and wherein the one or more data sources include at least one data source comprising unstructured data and at least one data source comprising structured data, wherein the network resource is configured to store database metadata and to provide the shared catalog service, wherein the shared catalog service is in communication with a second data source, wherein the first data source comprises unstructured data, and wherein the second data source comprises structured data;
- retrieving, by one or more processors, a first database schema corresponding to the first data source from a network resource via the shared catalog service;
- applying, by one or more processors, the first database schema to unstructured data of the first data source;
- retrieving, by one or more processors, a second database schema corresponding to the second data source from the network resource via the shared catalog service;
- applying, by one or more processors, the second database schema to structured data of the second data source;
- processing, by one or more processors, the request; and
- generating a response to a query using the unstructured data of the first data source and the structured data of the second data source.
Dependent claim: 25
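The steps of the method claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not specify storage formats, how a schema is applied, or how results are combined. The class `SharedCatalog`, the source names `hadoop_logs` and `rdbms_users`, the comma delimiter, and the join on a shared key are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCatalog:
    """Hypothetical stand-in for the shared catalog service on the network resource."""
    schemas: dict = field(default_factory=dict)

    def register(self, source_name, columns):
        # Store the schema (here just an ordered list of column names).
        self.schemas[source_name] = list(columns)

    def get_schema(self, source_name):
        # Both data sources retrieve their schema through this one service.
        return self.schemas[source_name]


def apply_schema_to_unstructured(lines, columns, delimiter=","):
    # Assumed schema-on-read: split each raw line and label fields by column name.
    return [dict(zip(columns, line.split(delimiter))) for line in lines]


def apply_schema_to_structured(rows, columns):
    # Structured rows already have fields; the schema only supplies the names.
    return [dict(zip(columns, row)) for row in rows]


def answer_query(catalog, raw_lines, table_rows, join_key):
    # Retrieve one schema per data source from the shared catalog.
    first_schema = catalog.get_schema("hadoop_logs")
    second_schema = catalog.get_schema("rdbms_users")
    # Apply each schema to its own data source.
    events = apply_schema_to_unstructured(raw_lines, first_schema)
    users = {
        row[join_key]: row
        for row in apply_schema_to_structured(table_rows, second_schema)
    }
    # Build the response from both sources (an assumed inner join on join_key).
    return [{**users[e[join_key]], **e} for e in events if e[join_key] in users]


catalog = SharedCatalog()
catalog.register("hadoop_logs", ["user_id", "action"])
catalog.register("rdbms_users", ["user_id", "name"])

raw_lines = ["u1,login", "u2,purchase", "u9,logout"]  # unstructured data
table_rows = [("u1", "Ada"), ("u2", "Grace")]          # structured data

result = answer_query(catalog, raw_lines, table_rows, join_key="user_id")
for row in result:
    print(row)
```

In this sketch the line for `u9` is dropped because no matching structured row exists; that behavior follows from the assumed inner join, not from the claim language.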
26. A computer program product for establishing a system for managing metadata, comprising a non-transitory computer readable medium having program instructions embodied therein for:
- receiving a query at a first data source;
- determining that the query requires information from one or more of the first data source and the second data source, wherein the first data source is in communication with a shared catalog service, wherein the shared catalog service is provided by a network resource, wherein the shared catalog manages metadata for the one or more data sources, and wherein the one or more data sources include at least one data source comprising unstructured data and at least one data source comprising structured data, wherein the shared catalog service is in communication with a second data source, wherein the first data source comprises unstructured data, and wherein the second data source comprises structured data;
- retrieving a first database schema from the network resource via the shared catalog service;
- applying the first database schema to unstructured data of the first data source;
- retrieving a second database schema from the network resource via the shared catalog service;
- applying the second database schema to structured data of the second data source;
- processing the request; and
- generating a response to a query using the unstructured data of the first data source and the structured data of the second data source.
Specification