Computer method, apparatus and programmed medium for approximating large databases and improving search efficiency
First Claim
1. A computer based method for producing a representation of a distribution of data stored as a set of data points in a computer database, said representation comprising fewer data points than said data distribution in said database to allow a user to more efficiently analyze said database, said method comprising the steps of:
- a) partitioning said data distribution within said database into a plurality of regions, wherein a maximum number of said plurality of regions is specified by said user;
b) producing for each of said regions a new set of data which approximates the data contents of each of said regions, said new set of data having less data points than that of the data points present in each of said regions, said approximation resulting in an error of approximation for each of said plurality of regions;
c) combining said error of approximation for each of said plurality of regions using a norm to produce an approximation with a total error of said data distribution;
d) repeating steps (a) through (c) until a minimum value of said total error is determined for said maximum number of said plurality of regions as specified by said user; and
e) storing said produced approximation which is associated with said minimum value of said total error of said data distribution.
3 Assignments
0 Petitions
Accused Products
Abstract
A novel and unique method, apparatus and programmed storage medium for approximating large data distributions of a database in order to allow a user to accurately analyze the entire data distribution using a limited amount of memory space in a reasonable amount of time. The approximation is based on partitioning the data domain into a number of regions, approximating each region using any well-known technique, and composing the errors of approximation to optimize suitable criteria for approximating the entire data distribution.
38 Citations
54 Claims
-
1. A computer based method for producing a representation of a distribution of data stored as a set of data points in a computer database, said representation comprising fewer data points than said data distribution in said database to allow a user to more efficiently analyze said database, said method comprising the steps of:
-
a) partitioning said data distribution within said database into a plurality of regions, wherein a maximum number of said plurality of regions is specified by said user; b) producing for each of said regions a new set of data which approximates the data contents of each of said regions, said new set of data having less data points than that of the data points present in each of said regions, said approximation resulting in an error of approximation for each of said plurality of regions; c) combining said error of approximation for each of said plurality of regions using a norm to produce an approximation with a total error of said data distribution; d) repeating steps (a) through (c) until a minimum value of said total error is determined for said maximum number of said plurality of regions as specified by said user; and e) storing said produced approximation which is associated with said minimum value of said total error of said data distribution. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
-
-
16. A computer based method for producing a representation of a distribution of data stored as a set of data points in a computer database, said representation comprising fewer data points than said distribution in said database to allow a user to more efficiently analyze said database, said method comprising the steps of:
-
a) partitioning said data distribution within said database into a plurality of regions, wherein a maximum number of said plurality of regions is specified by said user; b) producing for each of said regions a new set of data which approximates the data contents of each of said regions, said new set of data having less data points than that of the data points present in each of said regions, said approximation resulting in an error of approximation for each of said plurality of regions; c) combining said error of approximation for each of said plurality of regions using a norm to produce an approximation with a total error of said data distribution; d) repeating steps (a) through (c) until a minimum value of said total error is determined for said maximum number of said plurality of regions as specified by said user; e) storing said produced approximation which is associated with said minimum value of said total error of said data distribution; and f) using said stored produced approximation to construct a histogram of said database. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
-
-
27. A computer readable storage medium containing a computer readable code for operating a computer to perform a method for producing a representation of a distribution of data stored as data points in a computer database, said representation comprising fewer data points than said data distribution in said database to allow a user to more efficiently analyze said database, said method comprising the steps of:
-
a) partitioning said data distribution within said database into a plurality of regions, wherein a maximum number of said plurality of regions is specified by said user; b) producing for each of said regions a new set of data which approximates the data contents of each of said regions, said new set of data having less data points than that of the data points present in each of said regions, said approximation resulting in an error of approximation for each of said plurality of regions; c) combining said error of approximation for each of said plurality of regions using a norm to produce an approximation with a total error of said data distribution; d) repeating steps (a) through (c) until a minimum value of said total error is determined for said maximum number of said plurality of regions as specified by said user; and e) storing said produced approximation which is associated with said minimum value of said total error of said data distribution. - View Dependent Claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41)
-
-
42. A programmed computer database approximating system, said system comprising:
-
a computer operating a program, said program comprising; means for partitioning a data distribution within a computer database stored on a computer readable medium into a plurality of regions wherein a maximum number of said regions is specified by a user, said computer database including a set of data points; means for producing for each of said regions a new set of data which approximates the data in each of said regions, said new set of data having less data points than that of the data points in each of said regions, said approximation resulting in an error of approximation for each of said regions; means for combining said error of approximation for each of said plurality of regions using a norm to produce an approximation with a total error of said data distribution; means for determining when a minimum value for said total error is produced for said maximum number of regions as specified by said user; and means for storing said produced approximation of said data distribution which is associated with said minimum total error. - View Dependent Claims (43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54)
-
Specification