Partition-based index management in Hadoop-like data stores
First Claim
1. A method for maintaining an index of a processing a dataset in a partitioned distributed storage system after a batch update of a dataset, the partitioned distributed storage system having data stored in a base table and an index stored in an index table, comprising:
locking the base table and the index table to prevent external update operations;
receiving base and index table metadata from the partitioned distributed storage system, wherein the base table metadata and the index table metadata include respective base table partition information and index table partition information;
partitioning the dataset into a set of base-delta files according to the base table metadata;
generating a set of index-delta files corresponding with the base-delta files according to the index table metadata;
updating the partitioned distributed storage system with the set of base-delta files and the set of index-delta files, wherein a first update of the base table is synchronous with a second update of the index table; and
unlocking, subsequent to the update, the base table and the index table.
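The lock, update, unlock sequence recited in the claim can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the `Table` class, the `locked` helper, and `batch_update` are hypothetical names, an in-process `threading.Lock` stands in for whatever table-level locking the storage system provides, and "synchronous" is read loosely as "both updates are applied inside the same locked section".

```python
import threading
from contextlib import contextmanager


class Table:
    """Hypothetical stand-in for a partitioned table (not a real store API)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.partitions = {}  # partition id -> list of rows

    def apply_delta(self, deltas):
        # Append each delta's rows to the partition it was prepared for.
        for partition_id, rows in deltas.items():
            self.partitions.setdefault(partition_id, []).extend(rows)


@contextmanager
def locked(*tables):
    # Acquire locks in a fixed order (by name) so two callers cannot deadlock.
    ordered = sorted(tables, key=lambda t: t.name)
    for table in ordered:
        table.lock.acquire()
    try:
        yield
    finally:
        # Release in reverse order, even if an update raised.
        for table in reversed(ordered):
            table.lock.release()


def batch_update(base, index, base_deltas, index_deltas):
    # Both tables stay locked while both delta sets are applied, so no
    # external writer can observe the base table without its index entries.
    with locked(base, index):
        base.apply_delta(base_deltas)
        index.apply_delta(index_deltas)


base = Table("base")
index = Table("index")
batch_update(
    base,
    index,
    base_deltas={0: [{"key": "a1", "city": "Oslo"}]},
    index_deltas={0: [{"value": "Oslo", "row_key": "a1"}]},
)
```

The `try`/`finally` in `locked` mirrors the claim's final step: the tables are unlocked only after the update, and they are also unlocked if the update fails partway.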
Abstract
A method for processing a dataset in a partitioned distributed storage system having data stored in a base table and an index stored in an index table may include receiving base and index table metadata from the partitioned distributed storage system, where the base and index table metadata include respective table partition information. The method may further include partitioning the dataset into a set of base-delta files according to the base table metadata, and generating a set of index-delta files corresponding with the base-delta files according to the index table metadata. The method may additionally include updating the partitioned distributed storage system with the set of base-delta files and the set of index-delta files, where a first update of the base table is synchronous with a second update of the index table.
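As a rough illustration of the partitioning steps in the abstract, the sketch below groups incoming rows into base-delta sets using the base table's partition boundaries, then derives index entries and groups them using the index table's boundaries. It assumes range partitioning described by sorted split keys, rows that are plain dictionaries with a `key` field, and in-memory lists in place of delta files; the function names and the sample data are hypothetical and not taken from the patent.

```python
from bisect import bisect_right
from collections import defaultdict


def partition_by_ranges(items, split_keys, key_of):
    """Group items by the range partition their key falls into.

    split_keys is the sorted list of partition boundaries (assumed to come
    from table metadata); partition i holds keys below split_keys[i].
    """
    parts = defaultdict(list)
    for item in items:
        parts[bisect_right(split_keys, key_of(item))].append(item)
    return dict(parts)


def build_deltas(dataset, base_splits, index_splits, indexed_column):
    # Base-deltas: rows grouped by the base table's row-key partitions.
    base_deltas = partition_by_ranges(dataset, base_splits, lambda r: r["key"])
    # One index entry per row, mapping the indexed value back to the row key.
    index_entries = [
        {"value": row[indexed_column], "row_key": row["key"]}
        for rows in base_deltas.values()
        for row in rows
    ]
    # Index-deltas: entries grouped by the index table's own partitions.
    index_deltas = partition_by_ranges(
        index_entries, index_splits, lambda e: e["value"]
    )
    return base_deltas, index_deltas


dataset = [
    {"key": "a1", "city": "Oslo"},
    {"key": "m5", "city": "Bern"},
    {"key": "z9", "city": "Rome"},
]
base_deltas, index_deltas = build_deltas(
    dataset, base_splits=["m"], index_splits=["P"], indexed_column="city"
)
```

Because the base table and the index table are partitioned on different keys, rows that land in the same base-delta can land in different index-deltas, which is why the two sets are built from separate metadata.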
20 Claims
1. A method for maintaining an index of a processing a dataset in a partitioned distributed storage system after a batch update of a dataset, the partitioned distributed storage system having data stored in a base table and an index stored in an index table, comprising:
locking the base table and the index table to prevent external update operations;
receiving base and index table metadata from the partitioned distributed storage system, wherein the base table metadata and the index table metadata include respective base table partition information and index table partition information;
partitioning the dataset into a set of base-delta files according to the base table metadata;
generating a set of index-delta files corresponding with the base-delta files according to the index table metadata;
updating the partitioned distributed storage system with the set of base-delta files and the set of index-delta files, wherein a first update of the base table is synchronous with a second update of the index table; and
unlocking, subsequent to the update, the base table and the index table.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for maintaining an index of a partitioned distributed storage system having data stored in a base table and an index stored in an index table, comprising:
one or more computing nodes having a memory and a hardware processor; and
a non-transitory computer readable storage medium of the one or more computing nodes having program instructions embodied therewith, the program instructions executable by the processor to cause the system to:
lock the base table and the index table to prevent external update operations;
receive base and index table metadata from the partitioned distributed storage system, wherein the base table metadata and the index table metadata include respective base table partition information and index table partition information;
partition the dataset into a set of base-delta files according to the base table metadata;
generate a set of index-delta files corresponding with the base-delta files according to the index table metadata; and
update the partitioned distributed storage system with the set of base-delta files and the set of index-delta files, wherein a first update of the base table is synchronous with a second update of the index table; and
unlock, subsequent to the update, the base table and the index table.
Dependent claims: 12, 13, 14, 15
16. A computer program product for processing a dataset in a partitioned distributed storage system having data stored in a base table and an index stored in an index table, the computer program product including a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processing circuit to cause the processing circuit to perform a method comprising:
locking the base table and the index table to prevent external update operations;
receiving base and index table metadata from the partitioned distributed storage system, wherein the base table metadata and the index table metadata include respective base table partition information and index table partition information;
partitioning the dataset into a set of base-delta files according to the base table metadata;
generating a set of index-delta files corresponding with the base-delta files according to the index table metadata;
updating the partitioned distributed storage system with the set of base-delta files and the set of index-delta files, wherein a first update of the base table is synchronous with a second update of the index table; and
unlocking, subsequent to the second update, the base table and the index table.
Dependent claims: 17, 18, 19, 20
Specification