Pluggable storage system for parallel query engines
Abstract
A method, article of manufacture, and apparatus for managing data. In some embodiments, this includes: receiving a query from a client; based on the received query, analyzing a catalog for location information; based on the analysis, determining a first storage system, an associated first file system, and an associated first protocol; using the associated first protocol to communicate with the first storage system; and performing at least a portion of the query on the first storage system.
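The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the catalog contents, the storage system names, the protocol names (`hdfs_rpc`, `nfs_v3`), and the reader functions are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Location:
    """Where one file lives (all values below are hypothetical)."""
    storage_system: str
    file_system: str
    protocol: str
    path: str


# Hypothetical catalog: logical file name -> location information.
CATALOG: Dict[str, Location] = {
    "sales_2012": Location("storage_a", "hdfs", "hdfs_rpc", "/warehouse/sales_2012"),
    "clicks": Location("storage_b", "nfs", "nfs_v3", "/export/clicks"),
}


def _read_via(protocol: str) -> Callable[[str], List[str]]:
    """Build a fake reader that stands in for a real protocol client."""
    def read(path: str) -> List[str]:
        return [f"{protocol}:{path}:row{i}" for i in range(3)]
    return read


# One handler per protocol, so the caller never needs to know which
# storage system actually holds the file.
PROTOCOLS: Dict[str, Callable[[str], List[str]]] = {
    "hdfs_rpc": _read_via("hdfs_rpc"),
    "nfs_v3": _read_via("nfs_v3"),
}


def run_query(file_name: str, predicate: Callable[[str], bool]) -> List[str]:
    """Look up the file in the catalog, pick its protocol, run the filter."""
    location = CATALOG[file_name]          # analyze the catalog for location
    read = PROTOCOLS[location.protocol]    # determine the associated protocol
    rows = read(location.path)             # communicate with the storage system
    return [row for row in rows if predicate(row)]  # perform part of the query
```

A call such as `run_query("clicks", lambda row: row.endswith("row1"))` reaches the second storage system through its own protocol without the caller naming that system.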
26 Claims
1. A method for managing data, comprising:
- receiving, by a universal namenode, a query from a client;
- based at least in part on the received query, accessing a catalog service and searching a catalog provided by the catalog service for location information of one or more files responsive to the query, wherein the catalog stores a mapping of a plurality of files stored on a plurality of storage systems to a location at which the plurality of files are respectively stored on the corresponding plurality of storage systems, the plurality of storage systems comprising at least a first storage system and a second storage system;
- based at least in part on the search of the catalog, determining to move at least one of the plurality of files from the second storage system to the first storage system, and determining to communicate with the first storage system in connection with the one or more files responsive to the queries, and a first protocol for communication with the first storage system;
- communicating, by the universal namenode, with the first storage system using the associated first protocol;
- performing at least a portion of the query on the first storage system; and
- providing, to the client, results of the query such that in the event that various portions of the results correspond to query results stored on a set of the plurality of storage systems, the results of the query are presented in a unified view across the set of the plurality of storage systems and appear, from a perspective of the client, to exist from a single namespace.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 21, 22, 23, 24, 25, 26
9. A system for managing data, comprising a processor configured to:
- receive a query from a client;
- based at least in part on the received query, access a catalog service and search a catalog provided by the catalog service for location information of one or more files responsive to the query, wherein the catalog stores a mapping of a plurality of files stored on a plurality of storage systems to a location at which the plurality of files are respectively stored on the corresponding plurality of storage systems, the plurality of storage systems comprising at least a first storage system and a second storage system;
- based at least in part on the search of the catalog, determine to move at least one of the plurality of files from the second storage system to the first storage system, and determine to communicate with the first storage system in connection with the one or more files responsive to the queries, and a first protocol for communication with the first storage system;
- communicate with the first storage system using the associated first protocol;
- perform at least a portion of the query on the first storage system; and
- provide, to the client, results of the query such that in the event that various portions of the results correspond to query results stored on a set of the plurality of storage systems, the results of the query are presented in a unified view across the set of the plurality of storage systems and appear, from a perspective of the client, to exist from a single namespace.
Dependent claims: 10, 11, 12, 13, 20
14. A computer program product for processing data, comprising a non-transitory computer readable medium having program instructions embodied therein for:
- receiving a query from a client;
- based at least in part on the received query, accessing a catalog service and searching a catalog provided by the catalog service for location information of one or more files responsive to the query, wherein the catalog stores a mapping of a plurality of files stored on a plurality of storage systems to a location at which the plurality of files are respectively stored on the corresponding plurality of storage systems, the plurality of storage systems comprising at least a first storage system and a second storage system;
- based at least in part on the search of the catalog, determining to move at least one of the plurality of files from the second storage system to the first storage system, and determining to communicate with the first storage system in connection with the one or more files responsive to the queries, and a first protocol for communication with the first storage system;
- communicating with the first storage system using the associated first protocol;
- performing at least a portion of the query on the first storage system; and
- providing, to the client, results of the query such that in the event that various portions of the results correspond to query results stored on a set of the plurality of storage systems, the results of the query are presented in a unified view across the set of the plurality of storage systems and appear, from a perspective of the client, to exist from a single namespace.
Dependent claims: 15, 16, 17, 18, 19
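Two elements that the independent claims add to the abstract, moving a file from the second storage system to the first and presenting results as a single namespace, can be sketched roughly as follows. This is an illustrative assumption, not the claimed implementation: the class names, the in-memory "storage systems", and the `move_to_preferred` flag are hypothetical, and the claims as listed here do not state what criterion triggers a move.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StorageSystem:
    """In-memory stand-in for a storage system (hypothetical)."""
    name: str
    protocol: str
    files: Dict[str, List[str]] = field(default_factory=dict)  # path -> rows


class UniversalNamenode:
    """Routes queries using a catalog that spans several storage systems."""

    def __init__(self, systems: List[StorageSystem], preferred: str) -> None:
        self.systems = {s.name: s for s in systems}
        self.preferred = preferred
        # Catalog: file path -> name of the storage system holding it.
        self.catalog: Dict[str, str] = {
            path: s.name for s in systems for path in s.files
        }

    def move(self, path: str, src: str, dst: str) -> None:
        """Relocate a file and record the new location in the catalog."""
        self.systems[dst].files[path] = self.systems[src].files.pop(path)
        self.catalog[path] = dst

    def query(self, prefix: str, keyword: str,
              move_to_preferred: bool = False) -> List[Tuple[str, str]]:
        """Return matching rows keyed by path only, never by storage system."""
        results: List[Tuple[str, str]] = []
        for path in sorted(p for p in self.catalog if p.startswith(prefix)):
            system = self.catalog[path]
            # Assumed policy: optionally consolidate files on the preferred
            # (first) storage system before running the query there.
            if move_to_preferred and system != self.preferred:
                self.move(path, system, self.preferred)
                system = self.preferred
            rows = self.systems[system].files[path]
            results.extend((path, row) for row in rows if keyword in row)
        return results
```

Because each result carries only a path, a client that calls `query("/data/", "x")` sees one namespace whether the rows came from one storage system or several.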
Specification