SQI-based automated, adaptive, histogram bin data description assist
Abstract
A method, apparatus, and article of manufacture for performing data mining applications in a massively parallel relational database management system. A scalable data mining function comprising an automated, adaptive, histogram bin data description assist function is instantiated and parameterized via an analytic application programming interface (API). The automated, adaptive, histogram bin data description assist function comprises a query that is performed directly within the relational database management system, wherein the automated, adaptive, histogram bin data description assist function counts a number of occurrences of values in value ranges for a numeric data element in a column of a table stored in the relational database.
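As a rough illustration of counting occurrences of values in value ranges with a query that runs inside the database, the sketch below builds one GROUP BY statement and executes it through Python's standard `sqlite3` module. SQLite stands in for the massively parallel RDBMS the abstract describes, and the function name `equal_width_histogram` and its arguments are hypothetical, not taken from the patent.

```python
import sqlite3


def equal_width_histogram(conn, table, column, num_bins):
    """Count rows per equal-width bin; the counting itself runs as SQL.

    Assumes a non-empty numeric column and trusted identifier strings
    (table and column names are interpolated directly into the SQL).
    Returns a list of (low, high, count) tuples for non-empty bins.
    """
    lo, hi = conn.execute(
        f"SELECT MIN({column}), MAX({column}) FROM {table}"
    ).fetchone()
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / num_bins or 1.0
    # The scalar MIN(..., ?) clamps the maximum value into the last bin.
    sql = (
        f"SELECT MIN(CAST(({column} - ?) / ? AS INTEGER), ?) AS bin_idx, "
        f"COUNT(*) FROM {table} WHERE {column} IS NOT NULL "
        f"GROUP BY bin_idx ORDER BY bin_idx"
    )
    rows = conn.execute(sql, (lo, width, num_bins - 1)).fetchall()
    return [(lo + b * width, lo + (b + 1) * width, n) for b, n in rows]
```

Only the per-bin counts leave the database; the rows themselves are never fetched by the client, which is the point of performing the function within the relational database management system.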
72 Claims
1. A computer-implemented system for performing data mining applications, comprising:
(a) a computer having one or more data storage devices connected thereto, wherein a relational database is stored on one or more of the data storage devices;
(b) a relational database management system, executed by the computer, for accessing the relational database stored on the data storage devices; and
(c) an analytic application programming interface (API), executed by the computer, that generates an automated, adaptive, histogram bin data description assist function performed directly within the relational database management system,
(d) wherein the automated, adaptive, histogram bin data description assist function counts a number of occurrences of values in value ranges for a numeric data element in a column of a table stored in the relational database,
(e) wherein the automated, adaptive, histogram bin data description assist function creates a plurality of bins, and each of the bins stores a selected range of values for the numeric data element; and
(f) wherein the automated, adaptive, histogram bin data description assist function accepts one or more parameters selected from a group comprising:
a table name, a name of a numeric column in the table, a desired number of equal sized bins, a frequency percentage above which a value of the numeric data element should be treated as a spike, a percentage above which a bin should be further subdivided into sub-bins, a WHERE clause that reduces a beginning and ending range of the bins, and a WHERE clause that filters rows.
Dependent claims: 2-16, 18-24
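The parameter group recited in claim 1 could be carried in a small structure like the one below. This is a sketch under stated assumptions: the class name, field names, and default values are hypothetical, and the reading of the two WHERE clauses (one narrows the rows used to find the overall minimum and maximum, the other filters every row that is counted) is one interpretation of the claim language, not something the claim spells out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistogramParams:
    """Hypothetical container for the parameters listed in claim 1."""
    table: str                          # table name
    column: str                         # numeric column in the table
    num_bins: int = 10                  # desired number of equal sized bins
    spike_pct: float = 10.0             # value frequency (%) treated as a spike
    subdivide_pct: float = 25.0         # bin frequency (%) that triggers sub-bins
    range_where: Optional[str] = None   # assumed: narrows the bin start/end range
    filter_where: Optional[str] = None  # assumed: filters the rows counted

    def __post_init__(self):
        if self.num_bins < 1:
            raise ValueError("num_bins must be at least 1")
        for name in ("spike_pct", "subdivide_pct"):
            if not 0 < getattr(self, name) <= 100:
                raise ValueError(f"{name} must be in (0, 100]")

    def where_clause(self, for_range=False):
        """Build the WHERE text; the range condition is added only on request."""
        conds = [f"{self.column} IS NOT NULL"]
        if self.filter_where:
            conds.append(f"({self.filter_where})")
        if for_range and self.range_where:
            conds.append(f"({self.range_where})")
        return " WHERE " + " AND ".join(conds)

    def range_sql(self):
        """SQL that finds the overall range from which bin edges are derived."""
        return (f"SELECT MIN({self.column}), MAX({self.column}) "
                f"FROM {self.table}" + self.where_clause(for_range=True))
```

Keeping the two WHERE fragments separate lets a caller trim outliers from the bin boundaries without dropping those rows from any other condition it applies.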
17. A computer-implemented system for performing data mining applications, comprising:
(a) a computer having one or more data storage devices connected thereto, wherein a relational database is stored on one or more of the data storage devices;
(b) a relational database management system, executed by the computer, for accessing the relational database stored on the data storage devices; and
(c) an analytic application programming interface (API), executed by the computer, that generates an automated, adaptive, histogram bin data description assist function performed directly within the relational database management system,
(d) wherein the automated, adaptive, histogram bin data description assist function counts a number of occurrences of values in value ranges for a numeric data element in a column of a table stored in the relational database,
(e) wherein the function makes three logical passes of the data in the relational database, and each pass uses information gathered in a previous pass, and the three logical passes comprise:
(1) a first pass that determines a count of specific values occurring above a threshold frequency by percentage,
(2) a second pass that counts values in a plurality of ranges of values based on dividing an overall range of values for the numeric data element into a specified number of bins, and then combining counts for these bins with the count of frequently occurring values found in the first pass,
(3) a third pass that sub-divides the bins from the second pass that contain greater than a threshold frequency by percentage of the counts and adds these counts to the counts obtained in the first and second passes, wherein the result is an ordered list of counts and ranges for each bin with an indication of a type for each bin.
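The three logical passes recited in claim 17 can be sketched as three SQL queries, each driven by the result of the one before. This is a minimal illustration, not the patented implementation: it uses SQLite through Python's `sqlite3` module, the function name and parameters (including `num_sub_bins`) are hypothetical, and it assumes that spike values are reported separately and excluded from the bin counts, which the claim leaves open.

```python
import sqlite3


def adaptive_histogram(conn, table, column, num_bins=4, spike_pct=20.0,
                       subdivide_pct=40.0, num_sub_bins=2):
    """Return [(kind, low, high, count)] ordered by range.

    kind is 'spike', 'bin' or 'sub-bin'. Identifiers are assumed trusted.
    """
    total, lo, hi = conn.execute(
        f"SELECT COUNT({column}), MIN({column}), MAX({column}) FROM {table}"
    ).fetchone()
    if not total:
        return []

    # Pass 1: values whose frequency exceeds spike_pct percent of all rows.
    spikes = conn.execute(
        f"SELECT {column}, COUNT(*) FROM {table} WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) * 100.0 / ? > ?",
        (total, spike_pct),
    ).fetchall()
    spike_values = [v for v, _ in spikes]
    not_spike = f"{column} IS NOT NULL"
    if spike_values:
        marks = ", ".join("?" * len(spike_values))
        not_spike += f" AND {column} NOT IN ({marks})"

    # Pass 2: equal-width bins over the overall range, spikes excluded.
    width = (hi - lo) / num_bins or 1.0
    bin_expr = f"MIN(CAST(({column} - ?) / ? AS INTEGER), ?)"
    bin_args = (lo, width, num_bins - 1)
    bins = conn.execute(
        f"SELECT {bin_expr} AS bin_idx, COUNT(*) FROM {table} "
        f"WHERE {not_spike} GROUP BY bin_idx ORDER BY bin_idx",
        (*bin_args, *spike_values),
    ).fetchall()

    # Pass 3: re-count only the bins that hold more than subdivide_pct percent.
    result = [("spike", v, v, c) for v, c in spikes]
    sub_width = width / num_sub_bins
    for b, c in bins:
        b_lo = lo + b * width
        if c * 100.0 / total <= subdivide_pct:
            result.append(("bin", b_lo, b_lo + width, c))
            continue
        subs = conn.execute(
            f"SELECT MIN(CAST(({column} - ?) / ? AS INTEGER), ?) AS sub_idx, "
            f"COUNT(*) FROM {table} WHERE {not_spike} AND {bin_expr} = ? "
            f"GROUP BY sub_idx ORDER BY sub_idx",
            (b_lo, sub_width, num_sub_bins - 1, *spike_values, *bin_args, b),
        ).fetchall()
        result += [("sub-bin", b_lo + s * sub_width,
                    b_lo + (s + 1) * sub_width, n) for s, n in subs]
    return sorted(result, key=lambda r: (r[1], r[2]))
```

The third pass selects rows by re-evaluating the pass-two bin expression rather than by comparing against computed bin edges, so a value can never land in a sub-bin of a bin it was not counted in during pass two.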
25. A method for performing data mining applications, comprising:
(a) storing a relational database on one or more data storage devices connected to a computer;
(b) accessing the relational database stored on the data storage devices using a relational database management system; and
(c) executing an analytic application programming interface (API) in the computer to generate an automated, adaptive, histogram bin data description assist function performed directly within the relational database management system,
(d) wherein the automated, adaptive, histogram bin data description assist function counts a number of occurrences of values in value ranges for a numeric data element in a column of a table stored in the relational database;
(e) wherein the automated, adaptive, histogram bin data description assist function creates a plurality of bins, and each of the bins stores a selected range of values for the numeric data element; and
(f) wherein the automated, adaptive, histogram bin data description assist function accepts one or more parameters selected from a group comprising:
a table name, a name of a numeric column in the table, a desired number of equal sized bins, a frequency percentage above which a value of the numeric data element should be treated as a spike, a percentage above which a bin should be further subdivided into sub-bins, a WHERE clause that reduces a beginning and ending range of the bins, and a WHERE clause that filters rows.
Dependent claims: 26-47
48. An article of manufacture comprising logic embodying a method for performing data mining applications, comprising:
(a) storing a relational database on one or more data storage devices connected to a computer;
(b) accessing the relational database stored on the data storage devices using a relational database management system; and
(c) executing an analytic application programming interface (API) in the computer to generate an automated, adaptive, histogram bin data description assist function performed directly within the relational database management system,
(d) wherein the automated histogram bin data description assist function counts a number of occurrences of values in value ranges for a numeric data element in a column of a table stored in the relational database;
(e) wherein the automated, adaptive, histogram bin data description assist function creates a plurality of bins, and each of the bins stores a selected range of values for the numeric data element; and
(f) wherein the automated, adaptive, histogram bin data description assist function accepts one or more parameters selected from a group comprising:
a table name, a name of a numeric column in the table, a desired number of equal sized bins, a frequency percentage above which a value of the numeric data element should be treated as a spike, a percentage above which a bin should be further subdivided into sub-bins, a WHERE clause that reduces a beginning and ending range of the bins, and a WHERE clause that filters rows.
Dependent claims: 49-70
71. A method for performing data mining applications, comprising:
(a) storing a relational database on one or more data storage devices connected to a computer;
(b) accessing the relational database stored on the data storage devices using a relational database management system; and
(c) executing an analytic application programming interface (API) in the computer to generate an automated, adaptive, histogram bin data description assist function performed directly within the relational database management system,
(d) wherein the automated, adaptive, histogram bin data description assist function counts a number of occurrences of values in value ranges for a numeric data element in a column of a table stored in the relational database;
(e) wherein the function makes three logical passes of the data in the relational database, and each pass uses information gathered in a previous pass, and the three logical passes comprise:
(1) a first pass that determines a count of specific values occurring above a threshold frequency by percentage,
(2) a second pass that counts values in a plurality of ranges of values based on dividing an overall range of values for the numeric data element into a specified number of bins, and then combining counts for these bins with the count of frequently occurring values found in the first pass,
(3) a third pass that sub-divides the bins from the second pass that contain greater than a threshold frequency by percentage of the counts and adds these counts to the counts obtained in the first and second passes, wherein the result is an ordered list of counts and ranges for each bin with an indication of a type for each bin.
72. An article of manufacture comprising logic embodying a method for performing data mining applications, comprising:
(a) storing a relational database on one or more data storage devices connected to a computer;
(b) accessing the relational database stored on the data storage devices using a relational database management system; and
(c) executing an analytic application programming interface (API) in the computer to generate an automated, adaptive, histogram bin data description assist function performed directly within the relational database management system,
(d) wherein the automated, adaptive, histogram bin data description assist function counts a number of occurrences of values in value ranges for a numeric data element in a column of a table stored in the relational database;
(e) wherein the function makes three logical passes of the data in the relational database, and each pass uses information gathered in a previous pass, and the three logical passes comprise:
(1) a first pass that determines a count of specific values occurring above a threshold frequency by percentage,
(2) a second pass that counts values in a plurality of ranges of values based on dividing an overall range of values for the numeric data element into a specified number of bins, and then combining counts for these bins with the count of frequently occurring values found in the first pass,
(3) a third pass that sub-divides the bins from the second pass that contain greater than a threshold frequency by percentage of the counts and adds these counts to the counts obtained in the first and second passes, wherein the result is an ordered list of counts and ranges for each bin with an indication of a type for each bin.
Specification