Data loading systems and methods
Abstract
System, method, and computer program product for processing data are disclosed. The system is configured to perform transfer of data from a file system to a database system. Such transfer is accomplished through receiving a request for loading data into a database system, wherein the data includes a plurality of attributes, determining at least one attribute of the data for loading into the database system, and loading the at least one attribute of the data into the database system while continuing to process remaining attributes of the data.
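The abstract's central idea is to load one attribute into the database while the remaining attributes are still being processed. The minimal Python sketch below illustrates that idea under stated assumptions, and it is not the patented implementation. It assumes comma-delimited text records and uses a record's position in the file as its object identifier. A plain dictionary named `store` stands in for the database. The function name `scan_and_load` and all of its parameters are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a column-oriented database:
# attribute index -> list of (object_id, value) pairs.
Store = Dict[int, List[Tuple[int, str]]]


def scan_and_load(
    lines: Iterable[str],
    requested: List[int],
    load_attr: int,
    store: Store,
    sep: str = ",",
) -> Iterator[Tuple[str, ...]]:
    """Project `requested` columns from raw lines while side-loading one column.

    Assumption: a record's object identifier is its position in `lines`.
    """
    column = store.setdefault(load_attr, [])
    for oid, line in enumerate(lines):
        fields = line.rstrip("\n").split(sep)
        # Load only the chosen attribute, tagged with the record's object id.
        column.append((oid, fields[load_attr]))
        # Keep processing the query over all requested attributes.
        yield tuple(fields[i] for i in requested)


if __name__ == "__main__":
    store: Store = {}
    rows = list(scan_and_load(["a,1,x", "b,2,y"], [0, 2], 1, store))
    print(rows)   # [('a', 'x'), ('b', 'y')]
    print(store)  # {1: [(0, '1'), (1, '2')]}
```

A generator lets the same pass over the raw file both answer the query and load the chosen attribute. This avoids running a separate bulk-load step before any query can run.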
58 Claims
1. A method for processing and transferring data from a file system to a database system, the method comprising the steps of:
receiving a query containing a request for accessing data from a file system, wherein the request for accessing data identifies a plurality of attributes, each attribute being associated with an object identifier;
determining, based on the query, whether at least one partition of at least one attribute of the data has been previously loaded into the database system;
incrementally loading, based on a determination that the at least one partition of at least one attribute of the data has not been previously loaded into the database system, the at least one partition of the at least one attribute of the data into the database system while continuing to process the query without loading all attributes in the plurality of attributes identified by the request at the time of receiving the query, and without loading the at least one partition of at least one attribute of the data into the database system upon determination that the at least one partition has been previously loaded into the database system, the determination is being made based on a catalog containing a mapping of a portion of the plurality of attributes that has been previously loaded into the database system, the at least one loaded partition of the at least one attribute is being stored together with the object identifier associated with the at least one attribute; and
joining the at least one loaded partition and at least another loaded partition of at least another attribute using the object identifier associated with the at least one attribute and another object identifier associated with the at least another attribute, to generate a dataset responsive to the received query;
wherein the incremental loading is performed during a map phase of a MapReduce processing task.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 22, 56)
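Claim 1 combines three mechanisms:
- a catalog that records which partitions are already loaded;
- loading, one partition at a time, of only the attributes a query needs;
- reassembly of records by joining the loaded columns on their object identifiers.

The Python sketch below illustrates that flow under simplifying assumptions:
- The file system and the database are in-memory dictionaries.
- An object identifier is a hypothetical (partition id, line offset) pair.
- Loading runs before the join, not concurrently with query processing.
- Names such as `Catalog`, `load_partition`, and `query` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory stand-ins for the two storage layers.
OID = Tuple[int, int]                                    # assumed: (partition id, line offset)
FileSystem = Dict[int, List[str]]                        # partition id -> raw delimited lines
Database = Dict[Tuple[str, int], List[Tuple[OID, str]]]  # (attribute, partition) -> column


class Catalog:
    """Tracks which (attribute, partition) pairs are already in the database."""

    def __init__(self) -> None:
        self.loaded: Set[Tuple[str, int]] = set()
        self.load_count = 0

    def is_loaded(self, attr: str, pid: int) -> bool:
        return (attr, pid) in self.loaded

    def mark_loaded(self, attr: str, pid: int) -> None:
        self.loaded.add((attr, pid))
        self.load_count += 1


def load_partition(fs: FileSystem, schema: List[str], attr: str, pid: int,
                   db: Database, catalog: Catalog) -> None:
    """Parse one attribute of one partition and store it with object ids."""
    col = schema.index(attr)
    db[(attr, pid)] = [
        ((pid, offset), line.rstrip("\n").split(",")[col])
        for offset, line in enumerate(fs[pid])
    ]
    catalog.mark_loaded(attr, pid)


def query(fs: FileSystem, schema: List[str], attrs: List[str],
          db: Database, catalog: Catalog) -> List[Tuple[str, ...]]:
    """Load only the missing partitions of the requested attributes, then join."""
    for attr in attrs:
        for pid in fs:
            if not catalog.is_loaded(attr, pid):  # skip partitions already loaded
                load_partition(fs, schema, attr, pid, db, catalog)

    # Reassemble records by hash-joining the loaded columns on the object id.
    result: List[Tuple[str, ...]] = []
    for pid in sorted(fs):
        columns = [dict(db[(attr, pid)]) for attr in attrs]
        for oid in sorted(columns[0]):
            result.append(tuple(column[oid] for column in columns))
    return result
```

Every attribute is partitioned the same way, so the join can proceed partition by partition. Consider a file with partitions 0 and 1 and attributes `name`, `age`, and `city`. A first query on `name` and `age` loads four (attribute, partition) pairs and leaves `city` untouched. A later query on `age` and `city` reuses the loaded `age` partitions and loads only the two `city` partitions.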
11. A data processing system for transferring data having a plurality of attributes from a file system to a database system, the system comprising:
a processor configured to receive a query containing a request for accessing data from a file system, wherein the request for accessing data identifies a plurality of attributes, each attribute being associated with an object identifier; determine, based on the query, whether at least one partition of at least one attribute of the data has been previously loaded into the database system;
a data loader module configured to incrementally load, based on a determination that the at least one partition of at least one attribute of the data has not been previously loaded into the database system, the at least one partition of at least one attribute of the data into the database system while the processor continues to process the query without loading all attributes in the plurality of attributes at the time of receiving the query and without loading the at least one partition of at least one attribute of the data into the database system upon determination that the at least one partition has been previously loaded into the database system, the determination is being made based on a catalog containing a mapping of a portion of the plurality of attributes that has been previously loaded into the database system, the at least one loaded partition of the at least one attribute is being stored together with the object identifier associated with the at least one attribute; and
a join module configured to join the at least one loaded partition and at least another loaded partition of at least another attribute using the object identifier associated with the at least one attribute and another object identifier associated with the at least another attribute, to generate a dataset responsive to the received query;
wherein the incremental loading is performed during a map phase of a MapReduce processing task.
(Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 23, 57)
21. A non-transitory computer program product, tangibly embodied in a non-transitory computer-readable medium, the computer program product causing a data processing system for transferring data from a file system to a database system, to perform operations comprising:
receiving a query containing a request for accessing data from a file system, wherein the request for accessing data identifies a plurality of attributes, each attribute being associated with an object identifier;
determining, based on the query, whether at least one partition of at least one attribute of the data has been previously loaded into the database system;
incrementally loading, based on a determination that the at least one partition of at least one attribute of the data has not been previously loaded into the database system, the at least one partition of the at least one attribute of the data into the database system while continuing to process the query without loading all attributes in the plurality of attributes identified by the request at the time of receiving the query, and without loading the at least one partition of at least one attribute of the data into the database system upon determination that the at least one partition has been previously loaded into the database system, the determination is being made based on a catalog containing a mapping of a portion of the plurality of attributes that has been previously loaded into the database system, the at least one loaded partition of the at least one attribute is being stored together with the object identifier associated with the at least one attribute; and
joining the at least one loaded partition and at least another loaded partition of at least another attribute using the object identifier associated with the at least one attribute and another object identifier associated with the at least another attribute, to generate a dataset responsive to the received query;
wherein the incremental loading is performed during a map phase of a MapReduce processing task.
(Dependent claims: 24, 58)
25. A computer-implemented method for processing and transferring data from a file system to a database system, the method comprising the steps of:
receiving a query containing a request for accessing data from a file system, wherein the request for accessing data identifies a plurality of attributes, each attribute being associated with an object identifier;
parsing at least one attribute in the plurality of attributes from the data;
incrementally loading at least one partition of the at least one parsed attribute;
processing, based on the query, to determine whether the at least one partition of the at least one parsed attribute of the data has been previously loaded into the database system;
loading the data containing the at least one partition of the at least one parsed attribute of the data into the database system while continuing to process the query without loading all attributes in the plurality of attributes identified by the request at the time of receiving the query, and without loading the at least one partition of at least one parsed attribute of the data into the database system upon determination that the at least one partition has been previously loaded into the database system, the determination is being made based on a catalog containing a mapping of a portion of the plurality of attributes that has been previously loaded into the database system, the at least one loaded partition of the at least one attribute is being stored together with the object identifier associated with the at least one attribute; and
joining the at least one loaded partition and at least another loaded partition of at least another attribute using the object identifier associated with the at least one attribute and another object identifier associated with the at least another attribute, to generate a dataset responsive to the received query;
wherein the loading is performed during a map phase of a MapReduce processing task.
(Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33, 34)
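Claim 25 ties loading to attribute parsing and places that loading in the map phase. The map function parses the attributes it needs from raw records, and the parsed values are written to the database as a by-product of the job. The Python code below is a minimal single-process simulation of that idea. It assumes the following:
- Records are comma-delimited.
- Attributes are addressed by integer column index.
- A `loaded` set acts as the catalog.
- The shuffle and reduce phases are collapsed into a per-key sum.

The `map_task` and `run_job` names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical stand-in for the database:
# (column index, split id) -> list of (row offset, value).
Database = Dict[Tuple[int, int], List[Tuple[int, str]]]


def map_task(split_id: int, lines: List[str], cols: Tuple[int, int],
             db: Database, loaded: Set[Tuple[int, int]]) -> Iterator[Tuple[str, int]]:
    """Emit (key, value) pairs; side-load any parsed column not yet loaded."""
    key_col, val_col = cols
    to_load = [c for c in cols if (c, split_id) not in loaded]
    if not to_load:
        # Both columns for this split are already loaded: read them from the
        # database instead of re-parsing the raw text.
        keys, vals = db[(key_col, split_id)], db[(val_col, split_id)]
        for (_, key), (_, val) in zip(keys, vals):
            yield key, int(val)
        return

    buffers: Dict[int, List[Tuple[int, str]]] = {c: [] for c in to_load}
    for offset, line in enumerate(lines):
        fields = line.rstrip("\n").split(",")
        for c in to_load:
            buffers[c].append((offset, fields[c]))  # keep parsed value for loading
        yield fields[key_col], int(fields[val_col])
    # Commit the side-loaded columns once the whole split has been mapped.
    for c, rows in buffers.items():
        db[(c, split_id)] = rows
        loaded.add((c, split_id))


def run_job(splits: Dict[int, List[str]], cols: Tuple[int, int],
            db: Database, loaded: Set[Tuple[int, int]]) -> Dict[str, int]:
    """Run every map task, then sum values per key (shuffle + reduce collapsed)."""
    totals: Dict[str, int] = defaultdict(int)
    for split_id, lines in splits.items():
        for key, value in map_task(split_id, lines, cols, db, loaded):
            totals[key] += value
    return dict(totals)
```

Once a split's columns are marked as loaded, a later job over the same split scans the stored columns instead of re-parsing the raw text. This is the saving the incremental approach aims for.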
35. A system comprising:
at least one processor;
at least one memory coupled to the at least one processor, the at least one memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
receiving a query containing a request for accessing data from a file system, wherein the request for accessing data identifies a plurality of attributes, each attribute being associated with an object identifier;
parsing at least one attribute in the plurality of attributes from the data;
incrementally loading at least one partition of the at least one parsed attribute;
processing, based on the query, to determine whether the at least one partition of the at least one parsed attribute of the data has been previously loaded into the database system;
loading the data containing the at least one partition of the at least one parsed attribute of the data into the database system while continuing to process the query without loading all attributes in the plurality of attributes identified by the request at the time of receiving the query, and without loading the at least one partition of at least one parsed attribute of the data into the database system upon determination that the at least one partition has been previously loaded into the database system, the determination is being made based on a catalog containing a mapping of a portion of the plurality of attributes that has been previously loaded into the database system, the at least one loaded partition of the at least one attribute is being stored together with the object identifier associated with the at least one attribute; and
joining the at least one loaded partition and at least another loaded partition of at least another attribute using the object identifier associated with the at least one attribute and another object identifier associated with the at least another attribute, to generate a dataset responsive to the received query;
wherein the loading is performed during a map phase of a MapReduce processing task.
(Dependent claims: 36, 37, 38, 39, 40, 41, 42, 43, 44)
45. A non-transitory computer program product comprising non-transitory machine-readable medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
receiving a query containing a request for accessing data from a file system, wherein the request for accessing data identifies a plurality of attributes, each attribute being associated with an object identifier;
parsing at least one attribute in the plurality of attributes from the data;
incrementally loading at least one partition of the at least one parsed attribute;
processing, based on the query, to determine whether the at least one partition of the at least one parsed attribute of the data has been previously loaded into the database system;
loading the data containing the at least one partition of the at least one parsed attribute of the data into the database system while continuing to process the query without loading all attributes in the plurality of attributes identified by the request at the time of receiving the query, and without loading the at least one partition of at least one parsed attribute of the data into the database system upon determination that the at least one partition has been previously loaded into the database system, the determination is being made based on a catalog containing a mapping of a portion of the plurality of attributes that has been previously loaded into the database system, the at least one loaded partition of the at least one attribute is being stored together with the object identifier associated with the at least one attribute; and
joining the at least one loaded partition and at least another loaded partition of at least another attribute using the object identifier associated with the at least one attribute and another object identifier associated with the at least another attribute, to generate a dataset responsive to the received query;
wherein the loading is performed during a map phase of a MapReduce processing task.
(Dependent claims: 46, 47, 48, 49, 50, 51, 52, 53, 54)
55. A method for processing and transferring data from a file system to a database system, the method comprising the steps of:
receiving a processing task containing a request for accessing data from a file system, wherein the data includes a plurality of attributes identified by the received processing task, each attribute being associated with an object identifier;
determining, based on the processing task, whether at least one partition of at least one attribute of the data has been previously loaded into the database system;
incrementally loading, based on a determination that the at least one partition of at least one attribute of the data has not been previously loaded into the database system, the at least one partition of the at least one attribute of the data into the database system while continuing to process the processing task without loading all attributes in the plurality of attributes identified by the received processing task at the time of receiving the processing task and without loading the at least one partition of at least one attribute of the data into the database system upon determination that the at least one partition has been previously loaded into the database system, the determination is being made based on a catalog containing a mapping of a portion of the plurality of attributes that has been previously loaded into the database system, the at least one loaded partition of the at least one attribute is being stored together with the object identifier associated with the at least one attribute; and
joining the at least one loaded partition and at least another loaded partition of at least another attribute using the object identifier associated with the at least one attribute and another object identifier associated with the at least another attribute, to generate a dataset responsive to the received query;
wherein the loading is performed during a map phase of a MapReduce processing task.
Specification