Hive table links
Abstract
Some embodiments include a plurality of virtual data warehouses having table link capabilities that are built on top of a data center (e.g., running Apache Hive). Each virtual data warehouse can be modeled as a database and manage data in forms of database tables. The virtual data warehouse can include links which import tables from other virtual data warehouses by reference. Each link may contain partition metadata for the table partitions by dates of the source table and retention metadata to declare the needed retention time period for the partitions of the source table. The links can be dynamic and update when the corresponding source table receives new partitions or drops partitions. When a virtual data warehouse is migrated to another data center, the system can retain necessary table partitions on the current data center based on the partition and retention metadata of the links.
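The link described in the abstract can be pictured as a small record that holds partition metadata and retention metadata for a source table. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `TableLink`, `partitions_to_retain`, and `partitions_to_keep` are hypothetical, partitions are assumed to be keyed by calendar date, and retention is assumed to be a number of days counted back from "today".

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple


@dataclass
class TableLink:
    """Hypothetical link that imports a source table by reference."""
    source_warehouse: str
    source_table: str
    retention_days: int  # retention metadata (assumed unit: days)
    partitions: Set[date] = field(default_factory=set)  # partition metadata

    def on_partition_added(self, day: date) -> None:
        # Dynamic update: the source table received a new partition.
        self.partitions.add(day)

    def on_partition_dropped(self, day: date) -> None:
        # Dynamic update: the source table dropped a partition.
        self.partitions.discard(day)

    def partitions_to_retain(self, today: date) -> List[date]:
        # Partitions that fall inside the declared retention window.
        cutoff = today - timedelta(days=self.retention_days)
        return sorted(d for d in self.partitions if d > cutoff)


def partitions_to_keep(
    links: List[TableLink], today: date
) -> Dict[Tuple[str, str], List[date]]:
    """Union of the partitions that any link still needs, per source table.

    A migration step could consult this to decide which partitions must
    stay available on the current data center.
    """
    keep: Dict[Tuple[str, str], Set[date]] = {}
    for link in links:
        key = (link.source_warehouse, link.source_table)
        keep.setdefault(key, set()).update(link.partitions_to_retain(today))
    return {key: sorted(days) for key, days in keep.items()}
```

Keeping the retention window on the link, rather than on the source table, lets each importing warehouse declare only what it needs.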
7 Claims
1. A method, comprising:
- receiving a data query from a tenant assigned to a first virtual data warehouse included in a plurality of virtual data warehouses located within a multi-tenancy data warehouse, wherein the first virtual data warehouse is a database part of a first physical data center, wherein the tenant is associated with multiple namespaces, wherein the data query identifies a first namespace of the multiple namespaces, wherein the multiple namespaces are mapped to the multi-tenancy data warehouse, wherein the multi-tenancy data warehouse corresponds to a physical data warehouse, and wherein the first virtual data warehouse is included in a first physical data center;
- generating a set of information, based on the data query, indicating data the tenant is authorized to access within the multi-tenancy data warehouse;
- receiving a declaration from the tenant to utilize a subset of the data which the tenant can access as indicated by the set of information;
- determining whether the subset of the data is in a second physical data center that is different from the first physical data center;
- in response to determining the subset of the data is in a second physical data center different from the first physical data center, caching the subset of the data from the second physical data center in the first physical data center and generating a link identifying a set of parameters for retrieving the data, wherein the set of parameters comprises a time range of data to be retrieved;
- importing data that is outside of the first virtual data warehouse using the link;
- preventing access, by the first namespace using a two-part name syntax, to a second virtual warehouse by the first virtual data warehouse, wherein the second virtual data warehouse is included in the plurality of virtual data warehouses;
- monitoring utilization of the data outside of the first virtual data warehouse;
- determining the utilization of the data only includes data from a smaller time range; and
- updating the set of parameters so that the link only retrieves the data from the smaller time range.
Dependent claims: 2, 3.
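The last three steps of claim 1 (monitoring utilization, detecting that only a smaller time range is used, and updating the link parameters) can be illustrated with a short sketch. Everything here is an assumption made for illustration: `LinkParams`, `LinkUsageMonitor`, and `shrink_to_usage` are hypothetical names, access is recorded per calendar day, and the "smaller time range" is taken to be the span between the earliest and latest day actually read.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class LinkParams:
    """Hypothetical link parameters: the time range of data to retrieve."""
    start: date
    end: date


class LinkUsageMonitor:
    """Tracks which days of linked data are read and narrows the link."""

    def __init__(self, params: LinkParams) -> None:
        self.params = params
        self._accessed: List[date] = []

    def record_access(self, day: date) -> None:
        # Only reads that fall inside the link's current range count as usage.
        if self.params.start <= day <= self.params.end:
            self._accessed.append(day)

    def shrink_to_usage(self) -> bool:
        """Update the link to the range actually used; return True if it changed."""
        if not self._accessed:
            return False  # no observations yet, so leave the link alone
        narrowed = LinkParams(min(self._accessed), max(self._accessed))
        changed = narrowed != self.params
        self.params = narrowed
        return changed


# Usage: a link that covers January but is only read for 20 to 25 January.
monitor = LinkUsageMonitor(LinkParams(date(2024, 1, 1), date(2024, 1, 31)))
monitor.record_access(date(2024, 1, 20))
monitor.record_access(date(2024, 1, 25))
monitor.shrink_to_usage()
```

A production system would likely wait for a minimum observation period before narrowing; the sketch omits that for brevity.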
4. A computer-implemented method comprising:
- generating a multi-tenancy data warehouse by creating multiple virtual data warehouses that are each assigned to a tenant, wherein a first virtual data warehouse of the multiple virtual data warehouses includes a link that identifies the tenant based on a namespace and links to a data table stored within a second virtual data warehouse of the multiple virtual data warehouses to allow the tenant to access the data table without copying the data table from the second virtual data warehouse to the first virtual data warehouse, wherein the namespace is included in multiple namespaces associated with the tenant, wherein the multiple namespaces are mapped to the multi-tenancy data warehouse, wherein the multi-tenancy data warehouse corresponds to a physical data warehouse, wherein the first virtual data warehouse is included in a first physical data center, and wherein the link to the data table includes a time frame restriction;
- monitoring for changes within the multi-tenancy data warehouse that would require the link to be updated, wherein the changes include an addition or a dropping of a partition in the data table stored within the second virtual data warehouse;
- updating the link in accordance with the changes detected within the multi-tenancy data warehouse;
- determining that the tenant requested data from another multi-tenancy data warehouse that is different from the multi-tenancy data warehouse;
- copying the requested data from the other multi-tenancy data warehouse to the multi-tenancy data warehouse;
- preventing access, by the namespace using a two-part name syntax, to a second virtual data warehouse by the first virtual data warehouse, wherein the second virtual data warehouse is included in the plurality of virtual data warehouses;
- monitoring access to the data table to determine an amount of the data table actually used; and
- updating the link to the data table to include a second time frame restriction that corresponds to an amount of the data table actually used.
Dependent claims: 5, 6, 7.
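Claim 4 combines two access rules: a link lets a tenant read a table in another virtual data warehouse without copying it, while a two-part name that points directly at another virtual data warehouse is refused. The sketch below is one possible reading of those rules, not the claimed implementation. `VirtualWarehouse` and `resolve` are hypothetical names, tables are modeled as plain Python lists, and "two-part name syntax" is assumed to mean `warehouse.table`.

```python
from typing import Dict, List, Tuple


class VirtualWarehouse:
    """Hypothetical virtual data warehouse modeled as one database."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tables: Dict[str, List[dict]] = {}       # local tables
        self.links: Dict[str, Tuple[str, str]] = {}   # link name -> (warehouse, table)


def resolve(
    current: VirtualWarehouse,
    name: str,
    warehouses: Dict[str, VirtualWarehouse],
) -> List[dict]:
    """Resolve a table name as seen from the `current` warehouse."""
    if "." in name:
        db, table = name.split(".", 1)
        if db != current.name:
            # Two-part names may not reach into another virtual warehouse.
            raise PermissionError(
                f"{current.name} may not address {db} by two-part name"
            )
        name = table
    if name in current.tables:
        return current.tables[name]
    if name in current.links:
        source_wh, source_table = current.links[name]
        # The link returns the source table by reference; nothing is copied.
        return warehouses[source_wh].tables[source_table]
    raise KeyError(name)


# Usage: vdw_a links to a table owned by vdw_b.
vdw_a, vdw_b = VirtualWarehouse("vdw_a"), VirtualWarehouse("vdw_b")
vdw_b.tables["events"] = [{"day": "2024-01-20", "clicks": 3}]
vdw_a.links["events"] = ("vdw_b", "events")
all_warehouses = {"vdw_a": vdw_a, "vdw_b": vdw_b}
rows = resolve(vdw_a, "events", all_warehouses)  # allowed, via the link
```

Routing every cross-warehouse read through a link gives the system a single place to record which partitions each tenant depends on.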
Specification