NORMALIZING DATA FOR FAST SUPERSCALAR PROCESSING
Abstract
A data normalization system is described herein that represents multiple data types that are common within database systems in a normalized form that can be processed uniformly to achieve faster processing of data on superscalar CPU architectures. The data normalization system includes changes to internal data representations of a database system as well as functional processing changes that leverage normalized internal data representations for a high density of independently executable CPU instructions. Because most data in a database is small, a majority of data can be represented by the normalized format. Thus, the data normalization system allows for fast superscalar processing in a database system in a variety of common cases, while maintaining compatibility with existing data sets.
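As a rough illustration of the uniform form the abstract describes, the sketch below maps several common column types onto one signed 64-bit integer encoding, so that a single routine can serve all of them. The encoding rules (days since 1970-01-01 for dates, a fixed scale of 4 fractional digits for decimals) and every name in the code are assumptions made for illustration; the abstract does not specify them.

```python
from datetime import date
from decimal import Decimal

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
EPOCH = date(1970, 1, 1)   # assumed reference date for encoding dates
DECIMAL_SCALE = 4          # assumed fixed number of fractional digits


def normalize_value(value):
    """Return a signed 64-bit integer encoding of value, or None if it does not fit."""
    if type(value) is bool:
        encoded = int(value)
    elif type(value) is int:
        encoded = value
    elif type(value) is date:
        encoded = (value - EPOCH).days
    elif type(value) is Decimal and value.is_finite():
        scaled = value * 10**DECIMAL_SCALE
        if scaled != scaled.to_integral_value():
            return None  # more fractional digits than the assumed scale allows
        encoded = int(scaled)
    else:
        return None  # e.g. text: no normalized form in this sketch
    return encoded if INT64_MIN <= encoded <= INT64_MAX else None


def count_less_than(encoded_column, encoded_bound):
    """One routine for every normalized type: plain integer comparison."""
    return sum(1 for v in encoded_column if v < encoded_bound)


dates = [normalize_value(d) for d in (date(2020, 1, 1), date(2021, 6, 1))]
prices = [normalize_value(p) for p in (Decimal("9.99"), Decimal("12.5"))]
dates_before_2021 = count_less_than(dates, normalize_value(date(2021, 1, 1)))
prices_below_10 = count_less_than(prices, normalize_value(Decimal("10")))
```

Values that return None here (text, out-of-range integers) stand in for the minority of data that would keep its original representation.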
20 Claims
1. A computer-implemented method for normalizing an in-memory representation of stored data for faster superscalar processing, the method comprising:
accessing stored data that includes multiple columns, each column having a data type;
selecting a column in the accessed data to determine an appropriate in-memory representation;
determining a data type of the selected column;
determining whether row data associated with the selected column can be normalized based at least in part on the determined data type of the selected column; and
upon determining that the row data can be normalized, converting the row data associated with the selected column into a normalized data representation,
wherein the preceding steps are performed by at least one processor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
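The steps of claim 1 can be sketched as a loop over columns. This is a minimal sketch under stated assumptions, not the claimed implementation: the set of type names treated as normalizable, the choice of a packed signed 64-bit array as the normalized representation, and the function name normalize_table are all hypothetical.

```python
from array import array

# Assumed type names that qualify for normalization; the claim does not list them.
NORMALIZABLE_TYPES = {"bool", "smallint", "int", "bigint"}
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def normalize_table(columns):
    """columns maps a column name to (type_name, list of row values).

    Returns a dict mapping each name to either a packed array of signed
    64-bit integers (normalized) or the original list (left as-is).
    """
    in_memory = {}
    for name, (type_name, rows) in columns.items():  # select a column
        # Decide from the declared type, then confirm every row value fits.
        can_normalize = type_name in NORMALIZABLE_TYPES and all(
            isinstance(v, int) and INT64_MIN <= v <= INT64_MAX for v in rows
        )
        if can_normalize:
            in_memory[name] = array("q", (int(v) for v in rows))
        else:
            in_memory[name] = list(rows)
    return in_memory


table = normalize_table({
    "id": ("int", [1, 2, 3]),
    "name": ("varchar", ["a", "b", "c"]),
    "big": ("bigint", [1, 2**70, 3]),
})
```

In this sketch a column with even one value that does not fit stays in its original form; handling such values row by row is a separate design choice.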
11. A computer system for storing and processing data in a manner that encourages parallel processing by one or more superscalar processors, the system comprising:
a processor and memory configured to execute software instructions;
a data storage component configured to store database data persistently between sessions of use of the system;
a data normalization component configured to retrieve data stored by the data storage component and to load the retrieved data into memory in a normalized data representation that allows fast superscalar processing;
an operation manager configured to manage requests to perform database operations on stored database data;
a batch assembly component configured to identify batches of data that have control flow and data independence such that the batch includes parallelizable operations;
an outlier identification component configured to identify data values in a batch of data that cannot be performed by a fast processing path that performs efficient superscalar processing;
a fast operation component configured to provide instructions to a superscalar processor in a manner that allows parallel execution of the instructions by multiple functional units of the superscalar processor;
a slow operation component configured to perform database operations on data within a batch that is not stored in the normalized data representation; and
a result processing component configured to gather results from the fast operation component and slow operation component and return the results to an operation requestor.
Dependent claims: 12, 13, 14, 15
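The interplay of the outlier identification, fast operation, slow operation, and result processing components might look like the following sketch for a single batch and a single operation (a less-than predicate). It assumes that "normalized" means a plain integer that fits in 64 bits and that outliers are NULLs (None) or out-of-range values; all function names are hypothetical. Python cannot show instruction-level parallelism, so the fast path here only models the property that matters: one uniform loop with no per-row type dispatch.

```python
from array import array

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def is_normalized(value):
    """Assumed test: a plain integer that fits in a signed 64-bit slot."""
    return type(value) is int and INT64_MIN <= value <= INT64_MAX


def identify_outliers(batch):
    """Positions whose values the fast path cannot handle."""
    return [i for i, v in enumerate(batch) if not is_normalized(v)]


def fast_less_than(values, bound):
    """Fast path: fixed-width integers only, same work for every row."""
    return [v < bound for v in values]


def slow_less_than(value, bound):
    """Slow path: general handling, here NULLs and out-of-range integers."""
    return None if value is None else value < bound


def run_less_than(batch, bound):
    outliers = set(identify_outliers(batch))
    packed = array("q", (v for i, v in enumerate(batch) if i not in outliers))
    fast_results = iter(fast_less_than(packed, bound))
    # Result processing: merge both paths back into row order.
    return [
        slow_less_than(v, bound) if i in outliers else next(fast_results)
        for i, v in enumerate(batch)
    ]


merged = run_less_than([1, None, 5, 2**70, 3], 4)
```

Only the second and fourth rows take the slow path here; the other three are packed into a fixed-width array and handled together.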
16. A computer-readable storage medium comprising instructions for controlling a computer system to perform an operation on a tabular set of data having rows and columns, wherein the instructions, when executed, cause a processor to perform actions comprising:
identifying a batch of operations that can be executed in parallel by a superscalar processor;
identifying zero or more non-normalized rows of data associated with the batch of operations;
submitting the identified batch of operations that involve normalized rows of data to the superscalar processor for parallel processing;
submitting the identified batch of operations that involve identified non-normalized rows of data for processing; and
reporting results of performing the batch of operations to a requestor of the operations.
Dependent claims: 17, 18, 19, 20
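At the level of a whole table, the actions of claim 16 can be pictured as a dispatcher that walks the rows in batches and routes each row to one of two handlers. The sketch below assumes a per-row flag recording whether the row is stored in normalized form, a batch size of 4, and caller-supplied handlers; none of these details come from the claim, and run_in_batches is a hypothetical name.

```python
def run_in_batches(rows, normalized_flags, fast_handler, slow_handler, batch_size=4):
    """Apply an operation to every row and report results in row order.

    rows             -- list of row tuples
    normalized_flags -- parallel list of bools (assumed to be recorded at load time)
    fast_handler     -- takes a list of normalized rows, returns a list of results
    slow_handler     -- takes one non-normalized row, returns one result
    """
    results = [None] * len(rows)
    for start in range(0, len(rows), batch_size):
        stop = min(start + batch_size, len(rows))
        fast_ids = [i for i in range(start, stop) if normalized_flags[i]]
        slow_ids = [i for i in range(start, stop) if not normalized_flags[i]]  # may be empty
        # Normalized rows of the batch are submitted together as one group.
        fast_rows = [rows[j] for j in fast_ids]
        for i, out in zip(fast_ids, fast_handler(fast_rows)):
            results[i] = out
        # Non-normalized rows are handled one at a time.
        for i in slow_ids:
            results[i] = slow_handler(rows[i])
    return results


rows = [(1, 2), (3, None), (5, 6), (7, 8), (9, 10)]
flags = [True, False, True, True, True]
totals = run_in_batches(
    rows,
    flags,
    fast_handler=lambda batch: [a + b for a, b in batch],
    slow_handler=lambda row: None if None in row else sum(row),
)
```

When a batch contains no non-normalized rows, as in the second batch here, the slow handler is never called for it.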
Specification