Systems and methods for data storage and retrieval using algebraic optimization
First Claim
1. A computer system implemented method for providing a requested data set, the computer system comprising at least one processor, memory and a data store, the method comprising:
- providing a data set information store in the memory;
storing identifiers for a plurality of data sets in the data set information store;
providing a relation store in the memory;
storing a plurality of algebraic relations defining relationships between the plurality of data sets, each of the algebraic relations in the plurality of algebraic relations comprising a respective first expression including a symbolic representation of at least a first respective data set, a respective second expression including a symbolic representation of at least a second respective data set; and
a relational operator symbolically defining a mathematical relationship between the respective first expression and the respective second expression;
receiving a plurality of statements that each reference at least one respective data set identified in the data set information store;
using the statements to define additional data sets;
storing identifiers for the additional data sets in the data set information store;
using the statements to compose a plurality of additional algebraic relations that each define a relationship between at least one of the additional data sets and at least one other data set identified in the data set information store;
receiving a request for the requested data set;
using at least some of the plurality of additional algebraic relations to compose a plurality of collections of algebraic relations that define a result equal to the requested data set, including generating new algebraic relations that were not previously available at the time the requested data set is first requested;
determining a cost for each of the plurality of collections of algebraic relations, wherein the cost is based, at least in part, on an estimate of the transfer time required to retrieve the data sets from the data store required to calculate the requested data set from the collection of algebraic relations;
selecting the collection of algebraic relations with the lowest cost; and
using the selected collection of algebraic relations to provide the requested data set.
2 Assignments
0 Petitions
Accused Products
Abstract
Systems and methods for storing and accessing data. A relation store may be used to store algebraic relations between data sets. Alternative collections of algebraic relations may be generated and evaluated to determine an optimized collection of algebraic relations to use in calculating and providing a requested data set. Optimization criteria may be based on an estimate of the amount of data required to be transferred and/or the amount of time required to transfer data sets from storage in order to calculate the collection of algebraic relations. The optimization criteria may distinguish among equivalent data sets containing the same logical data in different physical formats or in different locations. The optimization may be performed using the algebraic relations rather than retrieving underlying data sets from storage. As a result, optimization may be performed at processor speeds to minimize the amount of time required for data to be retrieved.
58 Citations
18 Claims
-
1. A computer system implemented method for providing a requested data set, the computer system comprising at least one processor, memory and a data store, the method comprising:
-
providing a data set information store in the memory; storing identifiers for a plurality of data sets in the data set information store; providing a relation store in the memory; storing a plurality of algebraic relations defining relationships between the plurality of data sets, each of the algebraic relations in the plurality of algebraic relations comprising a respective first expression including a symbolic representation of at least a first respective data set, a respective second expression including a symbolic representation of at least a second respective data set; and
a relational operator symbolically defining a mathematical relationship between the respective first expression and the respective second expression;receiving a plurality of statements that each reference at least one respective data set identified in the data set information store; using the statements to define additional data sets; storing identifiers for the additional data sets in the data set information store; using the statements to compose a plurality of additional algebraic relations that each define a relationship between at least one of the additional data sets and at least one other data set identified in the data set information store; receiving a request for the requested data set; using at least some of the plurality of additional algebraic relations to compose a plurality of collections of algebraic relations that define a result equal to the requested data set, including generating new algebraic relations that were not previously available at the time the requested data set is first requested; determining a cost for each of the plurality of collections of algebraic relations, wherein the cost is based, at least in part, on an estimate of the transfer time required to retrieve the data sets from the data store required to calculate the requested data set from the collection of algebraic relations; selecting the collection of algebraic relations with the lowest cost; and using the selected collection of algebraic relations to provide the requested data set. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-
-
17. A computer system implemented method for storing a specified data set, the computer system comprising at least one processor, memory and a data store, the method comprising:
-
providing a data set information store in the memory; storing identifiers for a plurality of data sets in the data set information store; providing a relation store in the memory; storing a plurality of algebraic relations defining relationships between a plurality of data sets, each of the algebraic relations in the first plurality of algebraic relations comprising a respective first expression including a symbolic representation of at least a first respective data set, a respective second expression including a symbolic representation of at least a second respective data set; and
a relational operator symbolically defining a mathematical relationship between the respective first expression and the respective second expression;receiving a plurality of statements that each reference at least one respective data set identified in the data set information store; using the statements to define additional data sets; storing identifiers for the additional data sets in the data set information store; using the statements to compose a plurality of additional algebraic relations that each define a relationship between at least one of the additional data sets and at least one other data set identified in the data set information store; receiving a request for the specified data set; using at least some of the plurality of additional algebraic relations from the relation store to compose a plurality of collections of algebraic relations that define a result equal to the specified data set, including generating new algebraic relations that were not previously available at the time the specified data set is first requested; determining a cost for each of the plurality of collections of algebraic relations, wherein the cost is based, at least in part, on an estimate of the transfer time required to retrieve the data sets from the data store required to calculate the requested data set from the collection of algebraic relations; selecting the collection of algebraic relations with the lowest cost; and using the selected collection of algebraic relations to calculate the specified data set; and storing the specified data set that has been calculated from the selected collection of algebraic relations in the data store. - View Dependent Claims (18)
-
Specification