System and method for storing stream data in distributed relational tables with data provenance
Abstract
A system, a method and a computer readable medium for storing data elements and related data provenance information. The data elements may be represented in a hyper-table having rows and columns which may be indexed. The data-values of the corresponding data-elements in the hyper-cells may be retrieved based on the indices. Snapshots of the indices may be generated at pre-determined time periods. Checkpoints of the hyper-table may be generated at time periods that are based on transactions on the hyper-table. The hyper-table is capable of being queried as the hyper-table existed at certain time-periods, and data-values of the data-elements may be retrieved as the data-elements existed at such time-periods.
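The point-in-time querying described in the abstract can be illustrated with a minimal sketch. Everything here is hypothetical: the class name `TimeTravelTable`, its methods, and the use of in-memory deep copies are assumptions made for illustration and are not taken from the patent. For simplicity the sketch takes the index snapshot and the table checkpoint on the same trigger, whereas the abstract allows checkpoints to follow transactions instead.

```python
import copy


class TimeTravelTable:
    """Toy, single-process stand-in for a hyper-table with as-of queries."""

    def __init__(self):
        self.rows = {}         # row_id -> {column: value}, i.e. the hyper-cells
        self.index = {}        # (column, value) -> set of row_ids
        self.snapshots = []    # one frozen copy of the index per elapsed period
        self.checkpoints = []  # one frozen copy of the table per elapsed period

    def put(self, row_id, column, value):
        """Write one cell and keep the live index consistent with it."""
        cells = self.rows.setdefault(row_id, {})
        if column in cells:
            # Drop the stale index entry for the value being overwritten.
            self.index[(column, cells[column])].discard(row_id)
        cells[column] = value
        self.index.setdefault((column, value), set()).add(row_id)

    def on_period_elapsed(self):
        """Freeze index and table; return the occurrence number just recorded."""
        self.snapshots.append(copy.deepcopy(self.index))
        self.checkpoints.append(copy.deepcopy(self.rows))
        return len(self.snapshots) - 1

    def query_as_of(self, occurrence, column, value):
        """Answer an equality query against the state frozen at `occurrence`."""
        index = self.snapshots[occurrence]
        table = self.checkpoints[occurrence]
        return {rid: table[rid] for rid in sorted(index.get((column, value), set()))}


table = TimeTravelTable()
table.put("r1", "status", "open")
first = table.on_period_elapsed()
table.put("r1", "status", "closed")
second = table.on_period_elapsed()
print(table.query_as_of(first, "status", "open"))
print(table.query_as_of(second, "status", "open"))
```

The first query still sees `r1` as open because it reads only the frozen copies; the second returns nothing because the row had been closed before the second occurrence. A real system would presumably use incremental or copy-on-write structures rather than full deep copies.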
32 Claims
1. A method for storing and querying data, comprising the steps of:
representing data-elements in a hyper-table stored in a table-store, wherein the data-elements are allocated to data-blocks stored in the table-store, wherein the table-store is located on a distributed device, wherein the hyper-table comprises;
(i) hyper-rows representing the data-elements allocated to the data-blocks, (ii) at least one hyper-column associated with an attribute of the corresponding data-elements, and (iii) hyper-cells having data-values of the corresponding data-elements;
generating indices of the hyper-rows and the at least one hyper-column for the corresponding data-elements, wherein the indices are located on the distributed device and the data-values of the corresponding data-elements in the hyper-cells are capable of being retrieved based on the indices;
storing a pre-determined time period on the distributed device;
generating snapshots of the indices at each occurrence of the pre-determined time-period, wherein the snapshots comprise the indices as the indices existed at the occurrence of the pre-determined time-period;
generating checkpoints of the hyper-table at each occurrence of the pre-determined time-period; and
selecting one of a plurality of occurrences of the pre-determined time period and querying the snapshots of the indices and the checkpoints of the hyper-table to return query results based on the state of the indices and the hyper-table as of the selected occurrence of the pre-determined time-period, wherein said distributed device comprises a processor, a random-access memory, and a network interface connected to a network and is connected to a plurality of remote distributed devices, each of which comprises a processor, a random-access memory and a network interface connected to the network and is specially configured to store at least a portion of the data elements and at least a portion of the indices.
Dependent claims: 2-30.
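The structural terms of claim 1 (data-elements allocated to data-blocks, hyper-rows, hyper-columns, hyper-cells, and indices used for retrieval) can be pictured with a small sketch. This is only one possible reading: the names `DataBlock`, `BlockedHyperTable`, `block_capacity`, `row_index`, and `column_index` are hypothetical, and the fixed-capacity append-only block layout is an assumption, not something the claim specifies.

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    """A fixed-capacity container of data-elements (assumed layout)."""
    block_id: int
    elements: list = field(default_factory=list)  # each element: {attribute: value}


class BlockedHyperTable:
    def __init__(self, block_capacity=2):
        self.block_capacity = block_capacity
        self.blocks = []        # stands in for the table-store
        self.row_index = {}     # hyper-row number -> (block_id, offset in block)
        self.column_index = {}  # attribute (hyper-column) -> set of hyper-row numbers

    def append(self, element):
        """Allocate a data-element to a block and index its row and columns."""
        if not self.blocks or len(self.blocks[-1].elements) >= self.block_capacity:
            self.blocks.append(DataBlock(block_id=len(self.blocks)))
        block = self.blocks[-1]
        block.elements.append(dict(element))
        row = len(self.row_index)
        self.row_index[row] = (block.block_id, len(block.elements) - 1)
        for attribute in element:
            self.column_index.setdefault(attribute, set()).add(row)
        return row

    def cell(self, row, attribute):
        """Retrieve a hyper-cell value through the indices; None if absent."""
        if row not in self.column_index.get(attribute, ()):
            return None
        block_id, offset = self.row_index[row]
        return self.blocks[block_id].elements[offset][attribute]


table = BlockedHyperTable(block_capacity=2)
table.append({"sensor": "a", "temp": 20})
table.append({"sensor": "b"})
row = table.append({"sensor": "c", "temp": 22})
print(table.row_index[row], table.cell(row, "temp"), table.cell(1, "temp"))
```

Here the third element spills into a second block, and the lookup never scans block contents: the column index says whether the cell exists and the row index says where it lives.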
31. A system for storing data, comprising:
a distributed device, a table-store on said distributed device adapted to store data-blocks, the table-store having data-elements allocated to the data-blocks, the data-elements represented in a hyper-table, the hyper-table comprises;
(i) hyper-rows representing the data-elements allocated to the data-blocks, (ii) at least one hyper-column associated with an attribute of the corresponding data-elements, and (iii) hyper-cells having data-values of the corresponding data-elements;
indices generated by said system of the hyper-rows and the at least one hyper-column for the corresponding data-elements;
a pre-determined time period stored on the distributed device;
snapshots generated by said system of the indices at each occurrence of the pre-determined time-period, the snapshots comprising the indices as the indices existed at the occurrence of the pre-determined time-period, the snapshots generated at each occurrence of the pre-determined time-period;
checkpoints generated by said system of the hyper-table as stored on the table-store at each occurrence of the pre-determined time-period; and
selections of one of a plurality of occurrences of the pre-determined time period, whereby a query of the snapshots of the indices and the checkpoints of the hyper-table may return query results based on the state of the indices and the hyper-table as of the selected occurrence of the pre-determined time-period, wherein said distributed device comprises a processor, a random-access memory, and a network interface connected to a network and is connected to a plurality of remote distributed devices, each of which comprises a processor, a random-access memory and a network interface connected to the network and is specially configured to store at least a portion of the data elements and at least a portion of the indices.
32. A non-transitory computer readable medium having computer readable instructions stored thereon for execution by a processor, wherein the instructions on the non-transitory computer readable medium are adapted to:
represent data-elements in a hyper-table stored in a table-store, wherein the data-elements are allocated to data-blocks stored in the table-store, wherein the table-store is located on a distributed device, wherein the hyper-table comprises;
(i) hyper-rows representing the data-elements allocated to the data-blocks, (ii) at least one hyper-column associated with an attribute of the corresponding data-elements, and (iii) hyper-cells having data-values of the corresponding data-elements;
store a pre-determined time period on the distributed device;
generate indices of the hyper-rows and the at least one hyper-column for the corresponding data-elements, wherein the indices are located on the distributed device;
generate snapshots of the indices at each occurrence of the pre-determined time-period, wherein the snapshots comprise the indices as the indices existed at the occurrence of the pre-determined time-period;
generate checkpoints of the hyper-table at each occurrence of the pre-determined time-period; and
select one of a plurality of occurrences of the pre-determined time period and query the snapshots of the indices and the checkpoints of the hyper-table to return query results based on the state of the indices and the hyper-table as of the selected occurrence of the pre-determined time-period, wherein said distributed device comprises a processor, a random-access memory, and a network interface connected to a network and is connected to a plurality of remote distributed devices, each of which comprises a processor, a random-access memory and a network interface connected to the network and is specially configured to store at least a portion of the data elements and at least a portion of indices.
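The closing wherein clause shared by claims 1, 31 and 32 has each remote device store a portion of the data elements and a portion of the indices. The claims do not say how those portions are assigned, so the sketch below simply assumes hash partitioning by row key, with each device indexing only the rows it holds. `Device`, `Cluster`, and all method names are hypothetical, and network transport is replaced by direct method calls.

```python
import zlib


class Device:
    """One node holding its share of the data elements and of the indices."""

    def __init__(self, name):
        self.name = name
        self.rows = {}   # row_key -> {attribute: value}
        self.index = {}  # (attribute, value) -> set of row_keys held locally

    def store(self, row_key, element):
        self.rows[row_key] = dict(element)
        for attribute, value in element.items():
            self.index.setdefault((attribute, value), set()).add(row_key)

    def lookup(self, attribute, value):
        return sorted(self.index.get((attribute, value), set()))


class Cluster:
    def __init__(self, device_names):
        self.devices = [Device(name) for name in device_names]

    def device_for(self, row_key):
        # Assumed placement rule: stable CRC32 hash of the key, modulo device count.
        return self.devices[zlib.crc32(row_key.encode("utf-8")) % len(self.devices)]

    def store(self, row_key, element):
        self.device_for(row_key).store(row_key, element)

    def lookup(self, attribute, value):
        # Scatter the query to every device, then gather and merge the hits.
        hits = []
        for device in self.devices:
            hits.extend(device.lookup(attribute, value))
        return sorted(hits)


cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(6):
    cluster.store(f"row-{i}", {"parity": "even" if i % 2 == 0 else "odd"})
print(cluster.lookup("parity", "even"))
```

Because each device indexes only its own rows, an attribute query has to visit every device, while a lookup by row key can go straight to `device_for(row_key)`.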
Specification