Method of generating attribute cardinality maps
First Claim
1. A computer-implemented method for estimating the computational efficiency of a search within a database comprising the steps of:
- a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements from the database and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. determining at least one range, each of the at least one range having at least an element, an arithmetic mean of each range equal to the arithmetic mean of the values associated with the at least an element within said range, a specific range from the at least one range comprising a plurality of elements from the data set adjacent each other within the defined order, wherein the arithmetic mean of the specific range is within a predetermined maximum distance from a value associated with an element within the specific range, the predetermined maximum distance being independent of the number of elements within the specific range and their associated values;
iii. determining at least a value related to an estimate of a value associated with an element within the range; and
iv. for each range storing said value related to an estimate of a value associated with an element within the range and data relating to the size and location of the range, to provide a histogram of said value associated with an element within the range and data relating to the size and location of the range; and
b. using the histogram of data for estimating the computational efficiency of a search within a database.
0 Assignments
0 Petitions
Accused Products
Abstract
This invention provides a novel means for creating a histogram for use in minimizing response time and resource consumption when optimizing a query in a database, and other like structures, the histogram being created by placing ordered elements into specific range until the next element to be considered for inclusion in the range is a predetermined distance from the (generalized) mean value associated with the elements within the range, whereupon that next element is placed in the following range. Similarly, the following ranges are closed when the next element to be considered for inclusion in the range is greater than a predetermined distance from the (generalized) mean value associated with the elements in that range, whereupon that next element is placed in the following range. For each range, the location and size of the range is recorded with, for example, the mean value, the slope or other attribute characterizing one or more elements in the range. The invention has also applications in pattern recognition, message routing, and in actuarial sciences.
297 Citations
61 Claims
-
1. A computer-implemented method for estimating the computational efficiency of a search within a database comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements from the database and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. determining at least one range, each of the at least one range having at least an element, an arithmetic mean of each range equal to the arithmetic mean of the values associated with the at least an element within said range, a specific range from the at least one range comprising a plurality of elements from the data set adjacent each other within the defined order, wherein the arithmetic mean of the specific range is within a predetermined maximum distance from a value associated with an element within the specific range, the predetermined maximum distance being independent of the number of elements within the specific range and their associated values;
iii. determining at least a value related to an estimate of a value associated with an element within the range; and
iv. for each range storing said value related to an estimate of a value associated with an element within the range and data relating to the size and location of the range, to provide a histogram of said value associated with an element within the range and data relating to the size and location of the range; and
b. using the histogram of data for estimating the computational efficiency of a search within a database. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
-
-
26. A computer-implemented method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements from the database and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. determining at least one range, each of the at least one range having at least an element, an arithmetic mean of each range equal to the arithmetic mean of the values associated with the at least an element within said range, a specific range from the at least one range comprising a plurality of elements from the data set adjacent each other within the defined order, wherein the arithmetic mean of the specific range is within a predetermined maximum distance from a value associated with an element within the specific range, the predetermined maximum distance independent of the number of elements within the specific range and their associated values;
iii. determining at least a value related to an estimate of a value associated with an element within the range; and
iv. for each range storing at least a value related to an estimate of a value associated with an element within the range and at least data relating to the size and location of the range;
b. determining a routing table in dependence upon the histogram; and
,c. estimating a value for use in network routing in dependence upon the routing table.
-
-
27. A computer-implemented method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements from the database and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. determining at least one range, each of the at least one range having at least an element, an arithmetic mean of each range equal to the arithmetic mean of the values associated with the at least an element within said range, a specific range from the at least one range comprising a plurality of elements from the data set adjacent each other within the defined order, wherein the arithmetic mean of the specific range is within a predetermined maximum distance from a value associated with an element within the specific range, the predetermined maximum distance independent of the number of elements within the specific range and their associated values;
iii. determining at least a value related to an estimate of a value associated with an element within the range; and
iv. for each range storing at least a value related to an estimate of a value associated with an element within the range and at least data relating to the size and location of the range;
b. providing the mean associated with the range as an estimate of a value associated with an element within said range; and
c. using the estimate for determining an approach to searching within a plurality of different databases given a predetermined limited time for conducting the search, wherein the approach is selected to approximately maximise the probability of successfully completing the search.
-
-
28. A computer-implemented method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. determining a plurality of ranges, each of the plurality of ranges having at least an element, a known statistical correlation existing between values associated with elements in a same range, some ranges from the plurality of ranges comprising a plurality of elements from the data set adjacent each other within the defined order, the statistical correlation for those ranges indicative of a maximum error between an estimated value associated with an element within the range and the value associated with the element, the maximum error other than the difference between the total area of the range and the estimated value, wherein an area of each range is equal to the product of the arithmetic mean of said range and the number of elements within said range;
iii. determining at least a value related to an estimate of a value associated with an element within the range; and
iv. for each range storing at least a value related to an estimate of a value associated with an element within the range and at least data relating to the size and location of the range; and
b. using the histogram of data for estimating the computational efficiency of a search within a database. - View Dependent Claims (29, 30)
-
-
31. A computer-implemented method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. determining at least one range having a length such that the value associated with at least one element within the range is within a predetermined maximum distance of at least one other element within the range;
iii. determining at least a value related to an estimate of a value associated with an element within the range; and
iv. for each range storing at least a value related to an estimate of a value associated with an element within the range and at least data relating to the size and location of the range; and
b. using the histogram of data for estimating the computational efficiency of a search within a database.
-
-
32. A computer-implemented method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. determining at least one range having a length such that the value associated with every element within the range is within a predetermined maximum distance of every other element within the range;
iii. determining at least a value related to an estimate of a value associated with an element within the range; and
iv. for each range storing at least a value related to an estimate of a value associated with an element within the range and at least data relating to the size and location of the range; and
b. using the histogram of data for estimating the computational efficiency of a search within a database.
-
-
33. A computer readable program storage medium, the medium embodying one or more instructions executable by the computer to perform method steps, the method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. determining at least one range, each of the at least one range having at least an element, an arithmetic mean of each range equal to the arithmetic mean of the values associated with the at least an element within said range, a specific range from the at least a range comprising a plurality of elements from the data set adjacent each other within the defined order, wherein the arithmetic mean of the specific range is within a predetermined maximum distance from a value associated with an element within the specific range, the predetermined maximum distance independent of the number of elements within the specific range and their associated values;
iii. determining at least a value related to an estimate of a value associated with an element within the range; and
iv. storing for each range at least a value related to the mean and at least data relating to the size and location of the range; and
b. using the histogram of data for estimating the computational efficiency of a search within a database.
-
-
34. A computer-implemented method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. determining a range within the data set, the range comprising a plurality of elements from the data set and adjacent within the order; and
,iii. storing a plurality of values indicative of a straight line defining an approximate upper boundary of the values associated with each element within the range, the straight line indicating different values for different elements within the range; and
b. using the histogram of data for estimating the computational efficiency of a search within a database. - View Dependent Claims (35, 36, 37, 38, 39, 40, 41, 42, 43, 52, 59)
-
-
44. A computer-implemented method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. providing a range within the data set, the range comprising a plurality of elements from the data set and adjacent within the order;
iii. determining a straight line indicating different values for different elements within the range; and
,iv. storing a plurality of values indicative of a straight line defining an approximate upper boundary of the values associated with each element within the range, the straight line indicating different values for different elements within the range; and
b. using the histogram of data for estimating the computational efficiency of a search within a database. - View Dependent Claims (45, 46, 47, 48, 49, 50, 51, 54, 55, 56, 60)
-
-
57. A computer-implemented method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. providing a range within the data set, the range comprising a plurality of elements from the data set and adjacent within the order;
iii. determining a straight line indicating different values for different elements within the range; and
,iv. storing a plurality of values indicative of a straight line defining an approximate upper boundary of the values associated with each element within the range, the straight line indicating different values for different elements within the range;
b. determining a routing table in dependence upon the histogram; and
,c. determining an estimate of a value within the routing table for determining a network routing. - View Dependent Claims (53)
-
-
58. A computer-implemented method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. providing a range within the data set, the range comprising a plurality of elements from the data set and adjacent within the order;
iii. determining a straight line indicating different values for different elements within the range; and
,iv. storing a plurality of values indicative of a straight line defining an approximate upper boundary of the values associated with each element within the range, the straight line indicating different values for different elements within the range;
b. estimating a value associated with an element based on the location of the straight line at the element; and
c. using the estimate for determining an approach to searching within a plurality of different databases given a predetermined limited time for conducting the search, wherein the approach is determined to approximately maximise the probability of successfully completing the search.
-
-
61. A computer readable program storage medium, the medium embodying one or more instructions executable by the computer to perform method steps, the method comprising the steps of:
-
a. generating a histogram of data, comprising the steps of;
i. providing a data set representing a plurality of elements and a value associated with each element, the data set having a property defining an order of the elements therein;
ii. providing a range within the data set, the range comprising a plurality of elements from the data set and adjacent within the order;
iii. determining a straight line indicating different values for different elements within the range; and
,iv. storing a plurality of values indicative of a straight line defining an approximate upper boundary of the values associated with each element within the range, the straight line indicating different values for different elements within the range and b. using the histogram of data for estimating the computational efficiency of a search within a database.
-
Specification