DETERMINATION OF SAMPLING CHARACTERISTICS BASED ON AVAILABLE MEMORY
First Claim
1. A method of importing a portion of data records of a full input data set into memory of a computer system for processing by an executing application, wherein the full input data set includes data records of a dimensionally-modeled fact collection, the method comprising:
determining an amount of the data of the full input set to import based on an amount of available memory of the computer system;
based on the determined amount of the data to import and on characteristics of the full input data set at least other than the total size of the full input data set, determining sampling characteristics for sampling the full input data set; and
causing a portion of the records of the full input data set to be imported into the memory of the computer system, including sampling the full input data set, to determine the portion of the records to import, in accordance with the determined sampling characteristics, wherein the sampling characteristics are determined such that analysis as a result of processing by the executing application of the sampled portion of the records imported is representative of the analysis that could otherwise be carried out on the full input data set, with a calculable statistical relevance.
Abstract
A portion of data records of a full input data set are imported into memory of a computer system for processing by an executing application. The full input data set includes data records of a dimensionally-modeled fact collection. An amount of the data of the full input set to import is determined based on an amount of available memory of the computer system. The sampling characteristics for sampling the full input data set are determined based on the amount of the data that can be imported and on characteristics of the full input data set and application involved. The full input data set is then sampled and a portion of the records are imported into the memory of the computer system for processing. The sampling characteristics are determined such that analysis as a result of processing by the executing application of the sampled portion of the records imported is representative of the analysis that could otherwise be carried out on the full input data set, with a calculable statistical relevance.
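The flow in the abstract (memory budget, then sampling characteristics, then sampling) can be sketched as follows. This is a minimal illustration, not the patented method: the function names, the `memory_share` parameter, the use of average record size as the data-set characteristic, and the choice of uniform random sampling are all assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class SamplingPlan:
    max_records: int   # number of records that fit in the memory budget
    fraction: float    # share of the full data set that will be sampled


def plan_sampling(available_bytes, avg_record_bytes, total_records,
                  memory_share=0.5):
    """Derive a sampling plan from available memory.

    All parameters are hypothetical: memory_share is the part of the
    available memory the import may use, and avg_record_bytes stands in
    for a characteristic of the data set other than its total size.
    """
    budget = int(available_bytes * memory_share)
    max_records = min(total_records, budget // avg_record_bytes)
    fraction = max_records / total_records if total_records else 0.0
    return SamplingPlan(max_records=max_records, fraction=fraction)


def sample_records(records, plan, seed=0):
    """Uniform random sample without replacement, capped at the plan size."""
    rng = random.Random(seed)
    return rng.sample(records, min(plan.max_records, len(records)))
```

Uniform sampling is used here only because it is the simplest scheme for which the statistical error of an estimate can be calculated; the source does not say which sampling scheme is intended.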
24 Claims
1. A method of importing a portion of data records of a full input data set into memory of a computer system for processing by an executing application, wherein the full input data set includes data records of a dimensionally-modeled fact collection, the method comprising:
determining an amount of the data of the full input set to import based on an amount of available memory of the computer system;
based on the determined amount of the data to import and on characteristics of the full input data set at least other than the total size of the full input data set, determining sampling characteristics for sampling the full input data set; and
causing a portion of the records of the full input data set to be imported into the memory of the computer system, including sampling the full input data set, to determine the portion of the records to import, in accordance with the determined sampling characteristics, wherein the sampling characteristics are determined such that analysis as a result of processing by the executing application of the sampled portion of the records imported is representative of the analysis that could otherwise be carried out on the full input data set, with a calculable statistical relevance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A method of importing a portion of data records of a full input data set into memory of a computer system for processing by an executing application, wherein the full input data set includes data records of a dimensionally-modeled fact collection, the method comprising:
determining a nominal amount of the data of the full input set to import based on an amount of available memory of the computer system;
based on the determined amount of the data to import and on characteristics of the full input data set at least other than the total size of the full input data set, determining sampling characteristics for sampling the full input data set;
determining a characteristic associated with the determined amount of data;
adjusting the determined sampling characteristics based on a user-provided indication of a desired characteristic different from the determined characteristic associated with the determined amount of data;
causing a portion of the records of the full input data set to be imported into the memory of the computer system, including sampling the full input data set, to determine the portion of the records to import, in accordance with the adjusted sampling characteristics.
Dependent claims: 11, 12, 13.
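As an illustration of the adjustment step in claim 10, the sketch below treats the "characteristic associated with the determined amount of data" as a margin of error at roughly 95% confidence. That reading is an assumption made for this example; the claim does not name the characteristic. The function names are hypothetical, and the formulas are the standard normal-approximation ones for an estimated proportion, ignoring any finite-population correction.

```python
import math

Z_95 = 1.96  # normal quantile for roughly 95% confidence


def margin_of_error(sample_size, p=0.5, z=Z_95):
    """Margin of error of an estimated proportion for a simple random sample."""
    return z * math.sqrt(p * (1 - p) / sample_size)


def required_sample_size(target_margin, p=0.5, z=Z_95):
    """Smallest sample size whose margin of error meets the target."""
    return math.ceil((z ** 2) * p * (1 - p) / target_margin ** 2)


def adjust_sample_size(nominal_size, target_margin, total_records):
    """Return (adjusted_size, exceeds_nominal).

    nominal_size is the record count derived from available memory;
    target_margin is the user-provided desired margin of error.
    exceeds_nominal flags a request that needs more records than the
    memory-derived nominal amount.
    """
    needed = required_sample_size(target_margin)
    adjusted = min(needed, total_records)
    return adjusted, adjusted > nominal_size
```

A caller could compare `margin_of_error(nominal_size)` with the user's target and, when they differ, sample `adjusted` records instead; when `exceeds_nominal` is true, the request cannot be met within the memory-derived amount.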
14. A computing device comprising processing circuitry and memory circuitry, wherein the computing device is configured to import an amount of data of a full input data set that is a portion of the full input data set, wherein the full input data set includes data records of a dimensionally-modeled fact collection, wherein the portion of the full input data set to import has been determined by:
determining an amount of the data of the full input set to import based on an amount of available memory of the computer system;
based on the determined amount of the data to import and on characteristics of the full input data set at least other than the total size of the full input data set, determining sampling characteristics for sampling the full input data set; and
sampling the full input data set, to determine the portion of the records to import, in accordance with the determined sampling characteristics, wherein the sampling characteristics are determined such that analysis as a result of processing by the executing application of the sampled portion of the records imported is representative of the analysis that could otherwise be carried out on the full input data set, with a calculable statistical relevance.
Dependent claims: 15, 16.
17. A computer program product for importing a portion of data records of a full input data set into memory of a computer system for processing by an executing application, wherein the full input data set includes data records of a dimensionally-modeled fact collection, the computer program product comprising at least one computer-readable medium having computer program instructions stored therein which are operable to cause at least one computing device to:
determine an amount of the data of the full input set to import based on an amount of available memory of the computer system;
based on the determined amount of the data to import and on characteristics of the full input data set at least other than the total size of the full input data set, determine sampling characteristics for sampling the full input data set; and
cause a portion of the records of the full input data set to be imported into the memory of the computer system, including sampling the full input data set, to determine the portion of the records to import, in accordance with the determined sampling characteristics, wherein the sampling characteristics are determined such that analysis as a result of processing by the executing application of the sampled portion of the records imported is representative of the analysis that could otherwise be carried out on the full input data set, with a calculable statistical relevance.
Dependent claims: 18, 19, 20, 21, 22, 23, 24.
Specification