In-memory database system providing lockless read and write operations for OLAP and OLTP transactions
2 Assignments
0 Petitions
Abstract
As part of a database system comprising a combination of on-disk storage and in-memory storage, a plurality of records that comprise a table are stored in a plurality of fragments that include at least a delta fragment and a main fragment retained in the on-disk storage. Each fragment has visibility data structures to enable multi-version concurrency control. Each fragment can be compressed using dictionary compression and n-bits compression. The fragments are loaded into main system memory in the in-memory storage from the on-disk storage if they are accessed for read operations or write operations and are not already in memory. A plurality of lockless read and write operations are concurrently performed, while providing snapshot isolation, on the at least one of the plurality of fragments while the at least one of the plurality of fragments is in the main system memory.
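The abstract names dictionary compression together with n-bits compression. The Python sketch below shows one way the two fit together: each distinct value gets an integer code, and each code is then stored in just enough bits to hold the largest code. The function names (`dictionary_encode`, `bits_needed`, `nbit_pack`, `nbit_get`) are hypothetical. Using a single Python integer as the bit buffer is a simplification assumed here, not something the patent describes.

```python
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value by its position in a sorted dictionary of distinct values."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def bits_needed(max_code: int) -> int:
    """Smallest bit width n (at least 1) such that codes 0..max_code fit in n bits."""
    return max(1, max_code.bit_length())


def nbit_pack(codes: List[int], n: int) -> int:
    """Pack codes at n bits each into one integer; code i occupies bits [i*n, (i+1)*n)."""
    packed = 0
    for i, code in enumerate(codes):
        packed |= code << (i * n)
    return packed


def nbit_get(packed: int, n: int, index: int) -> int:
    """Read the code at `index` directly from the packed bits, without unpacking."""
    return (packed >> (index * n)) & ((1 << n) - 1)


# Example: a column with 3 distinct values needs only 2 bits per row.
column = ["DE", "US", "DE", "FR", "US", "DE"]
dictionary, codes = dictionary_encode(column)  # ["DE", "FR", "US"], [0, 2, 0, 1, 2, 0]
n = bits_needed(len(dictionary) - 1)           # 2
packed = nbit_pack(codes, n)
print(dictionary[nbit_get(packed, n, 3)])      # FR
```

A real column store would more likely pack codes into fixed-width machine words. The point here is only that each value occupies about ceil(log2(dictionary size)) bits, with a minimum of one, and can still be read by position.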
128 Citations
20 Claims
1. A method for implementation by a database system comprising a combination of on-disk storage and in-memory storage, the method comprising:
storing, in a plurality of fragments comprising at least a delta fragment and a main fragment retained in the on-disk storage, a plurality of data records that comprise a table, each fragment having visibility data structures to enable multi-version concurrency control (MVCC), the visibility data structures comprising at least one bit identifying a visibility of a row in the table, wherein MVCC information is maintained for each row of each fragment as such rows are inserted, updated, and deleted, the MVCC information comprising at least both of a creation timestamp and a destruction timestamp for each row;
compressing the delta fragment and the main fragment using n-bits compression to generate a compressed main fragment and a compressed delta fragment;
in response to operations comprising read and/or write operations on the row of the table, loading the compressed main fragment and the compressed delta fragment into main system memory in the in-memory storage from the on-disk storage when the compressed main fragment and the compressed delta fragment are accessed for the operations and are not already in the main system memory;
accessing a multi-version concurrency control object to obtain a current system timestamp for the row;
concurrently performing the operations on the row, while providing snapshot isolation, on the at least one of the compressed main fragment and the compressed delta fragment while the at least one of the compressed main fragment and the compressed delta fragment is in the main system memory, wherein the providing snapshot isolation comprises:
making the row visible for allowing changes during concurrent performing of the operations comprising a plurality of lockless read and/or write operations on the compressed delta fragment, the visibility of the row based on the current system timestamp corresponding to when the operations began, the making of the row visible comprising setting the at least one bit to identify the row as visible; and
generating a new system timestamp when the operations commit, such that the new system timestamp becomes a commit identifier for the row; and
wherein a transaction attempting to read rows in a fragment establishes the visibility of each row, optimally for data set reads with varying granularity levels ranging from single row to the whole table, by:
comparing a base timestamp of a consistent view of the transaction with the MVCC information for the row;
or comparing a control block of the transaction with a referenced transaction control block referred to by the creation or destruction timestamp within the MVCC information.
(Dependent claims 2-12.)
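Claim 1 ends with two ways for a reader to decide whether a row is visible. The first compares the transaction's base (snapshot) timestamp with the row's creation and destruction timestamps. The second applies when a stamp still refers to a writer's transaction control block, and compares that block with the reader's own. The Python sketch below combines both checks under stated assumptions:

* An uncommitted stamp is stored as a reference to the writer's control block, and a committed stamp is an integer.
* A commit timestamp less than or equal to the base timestamp counts as visible.
* All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(eq=False)  # eq=False: control blocks compare by identity
class TransactionControlBlock:
    """Hypothetical per-transaction state referenced from uncommitted MVCC stamps."""
    tx_id: int
    commit_ts: Optional[int] = None  # filled in when the transaction commits


# A creation/destruction stamp is a committed timestamp (int), a reference to the
# writer's control block (not yet rewritten), or None (the event never happened).
Stamp = Union[int, TransactionControlBlock, None]


@dataclass
class RowVersion:
    create: Stamp
    destroy: Stamp = None


def _applies_to(stamp: Stamp, base_ts: int, reader: TransactionControlBlock) -> bool:
    """True if the insert/delete recorded in `stamp` belongs to the reader's view."""
    if stamp is None:
        return False
    if isinstance(stamp, TransactionControlBlock):
        if stamp is reader:                # the reader's own uncommitted change
            return True
        if stamp.commit_ts is None:        # another transaction's uncommitted change
            return False
        return stamp.commit_ts <= base_ts  # committed, stamp not yet rewritten
    return stamp <= base_ts                # committed timestamp vs. snapshot


def is_visible(row: RowVersion, base_ts: int, reader: TransactionControlBlock) -> bool:
    """Visible if created within the reader's view and not destroyed within it."""
    return _applies_to(row.create, base_ts, reader) and not _applies_to(row.destroy, base_ts, reader)


t1, t2 = TransactionControlBlock(1), TransactionControlBlock(2)
row = RowVersion(create=5, destroy=t2)         # committed insert, pending delete by t2
print(is_visible(row, base_ts=10, reader=t1))  # True: t2's delete is not committed yet
print(is_visible(row, base_ts=10, reader=t2))  # False: t2 sees its own delete
```

Neither check takes a lock. Readers only compare values that writers publish, which is consistent with the claim's lockless reads.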
13. A non-transitory computer program product storing instructions for use by a database system comprising a processor and combination of on-disk storage and in-memory storage, the instructions, which when executed by the database system, result in operations comprising:
storing, in a plurality of fragments comprising at least a delta fragment and a main fragment retained in the on-disk storage, a plurality of data records that comprise a table, each fragment having visibility data structures to enable multi-version concurrency control (MVCC), the visibility data structures comprising at least one bit identifying a visibility of a row in the table, wherein MVCC information is maintained for each row of each fragment as such rows are inserted, updated, and deleted, the MVCC information comprising at least both of a creation timestamp and a destruction timestamp for each row;
compressing the delta fragment and the main fragment using n-bits compression to generate a compressed main fragment and a compressed delta fragment;
in response to operations comprising read and/or write operations on the row of the table, loading the compressed main fragment and the compressed delta fragment into main system memory in the in-memory storage from the on-disk storage when the compressed main fragment and the compressed delta fragment are accessed for the operations and are not already in the main system memory;
accessing a multi-version concurrency control object to obtain a current system timestamp for the row;
concurrently performing the operations on the row, while providing snapshot isolation, on the at least one of the compressed main fragment and the compressed delta fragment while the at least one of the compressed main fragment and the compressed delta fragment is in the main system memory, wherein the providing snapshot isolation comprises:
making the row visible for allowing changes during concurrent performing of the operations comprising a plurality of lockless read and/or write operations on the compressed delta fragment, the visibility of the row based on the current system timestamp corresponding to when the operations began, the making of the row visible comprising setting the at least one bit to identify the row as visible; and
generating a new system timestamp when the operations commit, such that the new system timestamp becomes a commit identifier for the row; and
wherein a transaction attempting to read rows in a fragment establishes the visibility of each row, optimally for data set reads with varying granularity levels ranging from single row to the whole table, by:
comparing a base timestamp of a consistent view of the transaction with the MVCC information for the row;
or comparing a control block of the transaction with a referenced transaction control block referred to by the creation or destruction timestamp within the MVCC information.
(Dependent claims 14-16.)
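The loading step brings the compressed fragments into main memory only when they are accessed and are not already resident. In effect, this is load-on-first-access. The Python sketch below illustrates the step with a hypothetical `FragmentCache` and a caller-supplied loader that stands in for the on-disk read.

The lock in this sketch serializes only the one-time load of a fragment, so concurrent first accesses trigger a single read. It is never taken for row reads or writes. Eviction is omitted, and the patent does not say how its loader is synchronized.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

Fragment = TypeVar("Fragment")


class FragmentCache(Generic[Fragment]):
    """Hypothetical resident-set manager: each fragment is read from disk on first access only."""

    def __init__(self, load_from_disk: Callable[[str], Fragment]) -> None:
        self._load_from_disk = load_from_disk
        self._resident: Dict[str, Fragment] = {}
        self._load_lock = threading.Lock()  # guards loading only, not row access

    def get(self, fragment_id: str) -> Fragment:
        # Fast path: fragment already in main memory, no lock taken.
        if fragment_id in self._resident:
            return self._resident[fragment_id]
        # Slow path: re-check under the lock so concurrent callers load it only once.
        with self._load_lock:
            if fragment_id not in self._resident:
                self._resident[fragment_id] = self._load_from_disk(fragment_id)
            return self._resident[fragment_id]


reads = []


def read_compressed(fragment_id: str) -> bytes:
    reads.append(fragment_id)  # stands in for reading a compressed fragment from disk
    return b"compressed-" + fragment_id.encode()


cache = FragmentCache(read_compressed)
cache.get("main")
cache.get("delta")
cache.get("main")
print(reads)  # ['main', 'delta']
```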
17. A system comprising:
on-disk storage;
in-memory storage; and
at least one hardware data processor configured to perform operations comprising:
storing, in a plurality of fragments comprising at least a delta fragment and a main fragment retained in the on-disk storage, a plurality of data records that comprise a table, each fragment having visibility data structures to enable multi-version concurrency control (MVCC), the visibility data structures comprising at least one bit identifying a visibility of a row in the table, wherein MVCC information is maintained for each row of each fragment as such rows are inserted, updated, and deleted, the MVCC information comprising at least both of a creation timestamp and a destruction timestamp for each row;
compressing the delta fragment and the main fragment using n-bits compression to generate a compressed main fragment and a compressed delta fragment;
in response to operations comprising read and/or write operations on the row of the table, loading the compressed main fragment and the compressed delta fragment into main system memory in the in-memory storage from the on-disk storage when the compressed main fragment and the compressed delta fragment are accessed for the operations and are not already in the main system memory;
accessing a multi-version concurrency control object to obtain a current system timestamp for the row;
concurrently performing the operations on the row, while providing snapshot isolation, on the at least one of the compressed main fragment and the compressed delta fragment while the at least one of the compressed main fragment and the compressed delta fragment is in the main system memory, wherein the providing snapshot isolation comprises:
making the row visible for allowing changes during concurrent performing of the operations comprising a plurality of lockless read and/or write operations on the compressed delta fragment, the visibility of the row based on the current system timestamp corresponding to when the operations began, the making of the row visible comprising setting the at least one bit to identify the row as visible; and
generating a new system timestamp when the operations commit, such that the new system timestamp becomes a commit identifier for the row; and
wherein a transaction attempting to read rows in a fragment establishes the visibility of each row, optimally for data set reads with varying granularity levels ranging from single row to the whole table, by:
comparing a base timestamp of a consistent view of the transaction with the MVCC information for the row;
or comparing a control block of the transaction with a referenced transaction control block referred to by the creation or destruction timestamp within the MVCC information.
(Dependent claims 18-20.)
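The claims pair two timestamps. The first is a current system timestamp obtained from an MVCC object when an operation begins. The second is a new timestamp generated at commit, which becomes the row's commit identifier. The claims also tie visibility to setting a bit. The Python sketch below illustrates this on a single, uncompressed delta fragment, under these assumptions:

* Commits are serialized by one lock for simplicity; a real engine might use an atomic counter, and the claims do not require serialized commits.
* The new timestamp is published only after every written row has been stamped, so a new snapshot never sees a half-applied commit.
* The visibility bit is set at commit.
* `MVCCManager`, `DeltaFragment`, and `commit` are hypothetical names.

```python
import threading
from typing import Any, List, Optional


class DeltaFragment:
    """Hypothetical append-only delta: values, per-row creation timestamps, visibility bits."""

    def __init__(self) -> None:
        self.values: List[Any] = []
        self.create_ts: List[Optional[int]] = []  # None until the writer commits
        self.visible_bits = 0                     # bit i set once row i is committed

    def append(self, value: Any) -> int:
        self.values.append(value)
        self.create_ts.append(None)
        return len(self.values) - 1


class MVCCManager:
    """Hypothetical stand-in for the claims' 'multi-version concurrency control object'."""

    def __init__(self) -> None:
        self._commit_lock = threading.Lock()
        self._last_commit_ts = 0

    def current_timestamp(self) -> int:
        """Snapshot base for a starting operation: the latest published commit."""
        return self._last_commit_ts

    def commit(self, fragment: DeltaFragment, rows: List[int]) -> int:
        """Generate a new timestamp and make it the commit identifier of `rows`."""
        with self._commit_lock:  # commits serialized here; readers never take this lock
            ts = self._last_commit_ts + 1
            for row in rows:
                fragment.create_ts[row] = ts
                fragment.visible_bits |= 1 << row
            self._last_commit_ts = ts  # publish only after every row is stamped
            return ts


mvcc, delta = MVCCManager(), DeltaFragment()
snapshot = mvcc.current_timestamp()              # 0: taken when the operation begins
rows = [delta.append("a"), delta.append("b")]
print(mvcc.commit(delta, rows))                  # 1
print(delta.create_ts, bin(delta.visible_bits))  # [1, 1] 0b11
```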
Specification