Distributed data store for hierarchical data
Abstract
A computing resource service provider may store user data in a distributed data storage system. The distributed data storage system may contain one or more storage nodes configured to store hierarchical data in one or more data stores, such as a column data store. Data in the data stores may be compressed or otherwise encoded by a storage optimizer in order to reduce redundancy in the hierarchical data stored in the one or more data stores. Responses to user queries may be fulfilled based at least in part on data stored in the one or more data stores. A query processor may scan multiple different data stores across various storage nodes in order to obtain items responsive to the user query.
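As a non-limiting illustration of how encoding may reduce redundancy in hierarchical data held in a column data store, the following minimal sketch flattens nested records into one column per path and run-length encodes each column. The function names, the dotted-path layout, and the choice of run-length encoding are assumptions made for this example only; the abstract does not specify a particular encoding.

```python
from itertools import groupby


def flatten(record, prefix=""):
    """Flatten a nested dict into {dotted.path: value} pairs (hypothetical layout)."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_columns(records):
    """Pivot flattened records into one list of values per path (a column)."""
    rows = [flatten(r) for r in records]
    paths = sorted({p for row in rows for p in row})
    # Paths missing from a record are stored as None in that row.
    return {p: [row.get(p) for row in rows] for p in paths}


def run_length_encode(column):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


records = [
    {"user": {"region": "eu", "tier": "free"}, "event": "login"},
    {"user": {"region": "eu", "tier": "free"}, "event": "click"},
    {"user": {"region": "eu", "tier": "paid"}, "event": "click"},
]
columns = to_columns(records)
encoded = {path: run_length_encode(col) for path, col in columns.items()}
```

In this example the `user.region` column, which repeats the same value in every record, collapses to a single `(value, count)` pair, which is the kind of redundancy a column layout tends to expose.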
23 Citations
20 Claims
1. A computer-implemented method for query processing, comprising:
under the control of one or more computer systems configured with executable instructions, receiving user input data;
storing the input data in one or more storage nodes by at least:
storing a first portion of the user input data in an append data store as a result of having insufficient computing capacity to perform one or more optimization operations;
extracting a second portion of the user input data from the append data store and performing the one or more optimization operations on the second portion of the input data to create optimized data as a result of regaining sufficient computing capacity to perform the one or more optimization operations; and
storing the optimized data in an optimized data store;
receiving a user query and a completion threshold based at least in part on information generated by the user;
performing a search for one or more records responsive to the user query, using a filter, on the optimized data store;
obtaining sufficient records to satisfy the completion threshold by at least:
if the search on the optimized data store obtained sufficient records to satisfy the completion threshold, returning a result of the search in response to the query; and
if the search on the optimized data store did not obtain sufficient records to satisfy the completion threshold, performing a second search for the one or more records responsive to the user query, using the filter, on the append data store; and
providing the one or more records.
View Dependent Claims (2, 3, 4)
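As a non-limiting illustration, the storage and query steps recited in claim 1 might be sketched as follows. The class `StorageNode`, its method names, the `limit` parameter, and the reading of the completion threshold as a minimum record count are all assumptions made for this example; the placeholder `optimize` function merely stands in for the one or more optimization operations.

```python
class StorageNode:
    """Hypothetical storage node with an append store and an optimized store."""

    def __init__(self, optimize=dict):
        self.append_store = []      # raw records awaiting optimization
        self.optimized_store = []   # records after optimization
        self._optimize = optimize   # stand-in for the optimization operations

    def ingest(self, records):
        """Low-capacity path: store input data as-is in the append store."""
        self.append_store.extend(records)

    def optimize_pending(self, limit):
        """Once capacity returns, move up to `limit` records to the optimized store."""
        batch = self.append_store[:limit]
        self.append_store = self.append_store[limit:]
        self.optimized_store.extend(self._optimize(r) for r in batch)

    def query(self, matches, threshold):
        """Search the optimized store first; fall back to the append store
        with the same filter only if the threshold is not met."""
        results = [r for r in self.optimized_store if matches(r)]
        if len(results) < threshold:
            results += [r for r in self.append_store if matches(r)]
        return results


node = StorageNode()
node.ingest([
    {"id": 1, "tag": "a"},
    {"id": 2, "tag": "a"},
    {"id": 3, "tag": "b"},
])
node.optimize_pending(limit=1)  # only record 1 is optimized so far


def is_a(record):
    return record["tag"] == "a"


fast = node.query(is_a, threshold=1)  # satisfied by the optimized store alone
full = node.query(is_a, threshold=2)  # also scans the append store
```

With a threshold of 1 the query is answered from the optimized store only, while a threshold of 2 triggers the second search over the append store.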
5. A system, comprising:
one or more processors;
one or more storage nodes comprising a first storage node, the first storage node comprising a first data store and a second data store, the first data store comprising data items retrieved from and removed from the second data store and processed for storage in the first data store, the second data store comprising preprocessed data items stored in the second data store as a result of the system having insufficient computing capacity at a time of receiving the preprocessed data items to process the data items; and
memory with instructions that, when executed by the one or more processors, cause the system to:
generate a filter based at least in part on one or more query terms included in a received user query;
perform a first search for one or more data items responsive to the user query, using the filter, on the first data store;
obtain sufficient data items to satisfy a threshold by at least:
if the first search on the first data store obtained sufficient processed data items to satisfy the threshold, returning a result of the search in response to the query; and
if the search on the first data store did not obtain sufficient processed data items to satisfy the threshold, performing a second search for the one or more data items in response to the user query, using the filter, on preprocessed data items contained in the second data store; and
provide the one or more data items.
View Dependent Claims (6, 7, 8, 9, 10, 11, 12)
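As a non-limiting illustration of generating a filter from query terms and applying that same filter to both data stores, consider the following minimal sketch. It assumes, for this example only, that a query is a whitespace-separated string of `field:value` terms and that the filter is a predicate over dictionary items; the claim does not fix either choice, and `build_filter` and `search` are hypothetical names.

```python
def build_filter(query):
    """Build a predicate from 'field:value' terms (assumed query syntax)."""
    terms = []
    for term in query.split():
        field, _, value = term.partition(":")
        terms.append((field, value))

    def matches(item):
        # An item matches only if every term's field holds the given value.
        return all(str(item.get(field)) == value for field, value in terms)

    return matches


def search(first_store, second_store, query, threshold):
    """Apply one filter to the first store, then to the second if needed."""
    matches = build_filter(query)
    found = [item for item in first_store if matches(item)]
    if len(found) < threshold:
        found.extend(item for item in second_store if matches(item))
    return found


first_store = [{"kind": "error", "host": "a"}]
second_store = [
    {"kind": "error", "host": "b"},
    {"kind": "info", "host": "a"},
]
one = search(first_store, second_store, "kind:error", threshold=1)
two = search(first_store, second_store, "kind:error", threshold=2)
```

Building the predicate once and passing it to both searches mirrors the claim language, in which the second search uses the filter generated for the first.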
13. A non-transitory computer-readable storage medium having collectively stored thereon executable instructions that, when executed by one or more processors of a computer system, cause the computer system to at least:
receive a query and a value;
obtain responses to the query satisfying the value by at least:
performing a first search on a first data store, of a storage node, for one or more records responsive to the query, the first data store containing encoded data such that at least a portion of redundancy in data is removed to create the encoded data, the encoded data created as a result of the storage node having sufficient capacity to encode data obtained from a second data store of the storage node; and
if the search does not return sufficient responses to satisfy the value, performing a second search on the second data store, of the storage node, for the one or more records responsive to the query, the second data store containing data; and
return the one or more records.
View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
Specification