Query-level access to external petabyte-scale distributed file systems
First Claim
1. A computer implemented method for query-level access by a database engine to an external distributed file system, the method comprising:
identifying a query against external data, the external data being queried as a table;
identifying one or more external locations for the external data, wherein the one or more external locations are on the external distributed file system;
identifying metadata that specifies code to execute operational directives to operate against the external data in the external distributed file system;
operating against the external data in external files that are in the external distributed file system, wherein one or more results files are generated from operating against the external data; and
retrieving data from the one or more results files from the external distributed file system without copying the data from the one or more results files to one or more table files on the database engine by:
streaming the data to one or more parallel query engines, wherein the one or more parallel query engines process the data, and
outputting the data from the one or more parallel query engines to a user application.
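The steps recited in the claim can be illustrated with a small, purely local sketch. Everything in it is an assumption made for illustration and does not come from the patent: a temporary directory stands in for the external distributed file system, the hypothetical `word_count_directive` stands in for the code named by the metadata, and a thread pool stands in for the parallel query engines.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterator, List, Tuple


def word_count_directive(source: Path, results_dir: Path) -> Path:
    """Hypothetical operational directive: operates on one external file
    and writes a results file beside it (stands in for a remote job)."""
    counts = {}
    for word in source.read_text().split():
        counts[word] = counts.get(word, 0) + 1
    result = results_dir / (source.stem + ".result")
    result.write_text("".join(f"{w},{n}\n" for w, n in sorted(counts.items())))
    return result


def stream_rows(results_files: List[Path]) -> Iterator[str]:
    """Read rows line by line from the results files; nothing is written
    to database-side table files."""
    for path in results_files:
        with path.open() as handle:
            for line in handle:
                yield line.rstrip("\n")


def parse_row(row: str) -> Tuple[str, int]:
    """Work performed by one simulated parallel query engine."""
    word, count = row.split(",")
    return word, int(count)


def run_query(
    locations: List[Path],
    directive: Callable[[Path, Path], Path],
    workers: int = 2,
) -> List[Tuple[str, int]]:
    """Run the directive at each external location, then stream the
    results files to the worker pool and collect the output."""
    results_dir = locations[0].parent
    results_files = [directive(loc, results_dir) for loc in locations]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_row, stream_rows(results_files)))


def demo() -> List[Tuple[str, int]]:
    # A temporary directory plays the role of the external file system.
    with tempfile.TemporaryDirectory() as root:
        data = Path(root) / "part0.txt"
        data.write_text("a b a")
        return run_query([data], word_count_directive)


if __name__ == "__main__":
    print(demo())
```

In this sketch the only state held on the "database" side is the list of locations and the directive; the rows themselves exist only in the results files and in the stream handed to the workers.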
Abstract
A system implements query-level access by a database engine to an external distributed file system by identifying the locations of one or more results files on the external distributed file system and storing those locations in external table files on the database engine for subsequent use when retrieving data from the results files. The database engine processes queries that specify the external table (which in turn references the locations of the results files). Execution of the query streams data from the external distributed file system into the database engine. The data from the external distributed file system is not stored in the external table files on the database engine; rather, the external table files specify a location of code or operational directives which, when executed, stream results from the external distributed file system to at least one parallel query engine.
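The distinction the abstract draws, an external table file that records where results live rather than the results themselves, might be modeled as below. This is a minimal sketch under stated assumptions: the `ExternalTable` class, its field names, and the `count_clicks` directive label are hypothetical and are not taken from the patent or from any database product.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class ExternalTable:
    """Hypothetical external table descriptor: it stores locations and a
    directive label, never the row data."""
    name: str
    results_locations: Tuple[str, ...]
    directive: str  # assumed label for the code that produced the results

    def scan(self) -> Iterator[List[str]]:
        """Yield rows one at a time, reading straight from the results files."""
        for location in self.results_locations:
            with open(location) as handle:
                for line in handle:
                    yield line.rstrip("\n").split(",")


def demo_scan() -> List[List[str]]:
    # A temporary directory stands in for the external file system.
    with tempfile.TemporaryDirectory() as root:
        result = Path(root) / "part-00000"
        result.write_text("alice,3\nbob,5\n")
        table = ExternalTable("clicks_ext", (str(result),), "count_clicks")
        return list(table.scan())


if __name__ == "__main__":
    print(demo_scan())
```

Because `scan` is a generator, a consumer can begin processing the first row before the rest of the results file has been read, which is the behavior the abstract contrasts with loading the data into table files first.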
11 Citations
20 Claims
1. A computer implemented method for query-level access by a database engine to an external distributed file system, the method comprising:
identifying a query against external data, the external data being queried as a table;
identifying one or more external locations for the external data, wherein the one or more external locations are on the external distributed file system;
identifying metadata that specifies code to execute operational directives to operate against the external data in the external distributed file system;
operating against the external data in external files that are in the external distributed file system, wherein one or more results files are generated from operating against the external data; and
retrieving data from the one or more results files from the external distributed file system without copying the data from the one or more results files to one or more table files on the database engine by:
streaming the data to one or more parallel query engines, wherein the one or more parallel query engines process the data, and
outputting the data from the one or more parallel query engines to a user application.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer system for query-level access by a database engine to an external distributed file system, comprising:
a computer processor to execute a set of program code instructions; and
a memory to hold the program code instructions, in which the program code instructions comprise program code to:
identify a query against external data, the external data being queried as a table;
identify one or more external locations for the external data, wherein the one or more external locations are on the external distributed file system;
identify metadata that specifies code to execute operational directives to operate against the external data in the external distributed file system;
operate against the external data in external files that are in the external distributed file system, wherein one or more results files are generated from operating against the external data; and
retrieve data from the one or more results files from the external distributed file system without copying the data from the one or more results files to one or more table files on the database engine by:
streaming the data to one or more parallel query engines, wherein the one or more parallel query engines process the data, and
outputting the data from the one or more parallel query engines to a user application.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product embodied in a non-transitory computer readable medium, the computer readable medium having stored thereon a sequence of instructions which, when executed by a processor, causes the processor to execute a process to perform query-level access by a database engine to an external distributed file system, the process comprising:
identifying a query against external data, the external data being queried as a table;
identifying one or more external locations for the external data, wherein the one or more external locations are on the external distributed file system;
identifying metadata that specifies code to execute operational directives to operate against the external data in the external distributed file system;
operating against the external data in external files that are in the external distributed file system, wherein one or more results files are generated from operating against the external data; and
retrieving data from the one or more results files from the external distributed file system without copying the data from the one or more results files to one or more table files on the database engine by:
streaming the data to one or more parallel query engines, wherein the one or more parallel query engines process the data, and
outputting the data from the one or more parallel query engines to a user application.
Dependent claims: 16, 17, 18, 19, 20
Specification