Analytic logical data model
Abstract
A method, apparatus (1), and article of manufacture for performing data mining applications (110) in a relational database management system (114). An analytic logical data model (LDM) provides logical entity and attribute definitions for advanced analytic processing (112) performed by the relational database management system directly against the relational database.
21 Claims
1. A computer-implemented system for performing data mining applications, comprising:
(a) a computer having one or more data storage devices connected thereto, wherein a relational database is stored on one or more of the data storage devices;
(b) a relational database management system, executed by the computer, for accessing the relational database stored on the data storage devices;
(c) an analytic logical data model (LDM) that provides logical entity and attribute definitions for advanced analytic processing performed by the relational database management system directly against the relational database, wherein the advanced analytic processing comprise one or more scalable data mining functions and the scalable data mining functions are selected from a group of functions comprising Data Description functions, Data Derivation functions, Data Reduction functions, Data Reorganization functions, Data Sampling functions, and Data Partitioning functions.
(1) descriptive statistics for one or more numeric columns, wherein the statistics are selected from a group comprising count, minimum, maximum, mean, standard deviation, standard mean error, variance, coefficient of variance, skewness, kurtosis, uncorrected sum of squares, corrected sum of squares, and quantiles,
(2) a count of values for a column,
(3) a calculated modality for a column,
(4) one or more bin numeric columns of counts with overlay and statistics options,
(5) one or more automatically sub-binned numeric columns giving additional counts and isolated frequently occurring individual values,
(6) a computed frequency of one or more column values,
(7) a computed frequency of values for pairs of columns in a column list,
(8) a Pearson Product-Moment Correlation matrix,
(9) a Covariance matrix,
(10) a sum of squares and cross-products matrix, and
(11) a count of overlapping column values in one or more combinations of tables.
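By way of illustration only, and not as part of the claims, the descriptive statistics in item (1) above can be computed with aggregate SQL issued directly against the database. The sketch below uses Python's built-in sqlite3 module as a stand-in for the relational database management system; the table name `accounts`, the column name `balance`, the helper `describe_column`, and the choice of the sample (n − 1) variance are all assumptions made for the example.

```python
import math
import sqlite3

def describe_column(conn, table, column):
    """Compute basic descriptive statistics for one numeric column.

    The aggregates run inside the database; only one summary row is fetched.
    Identifiers are interpolated directly, which is acceptable only for
    trusted names in a sketch like this one.
    """
    n, lo, hi, mean, uss = conn.execute(
        f"SELECT COUNT({column}), MIN({column}), MAX({column}), "
        f"AVG({column}), SUM({column} * {column}) FROM {table}"
    ).fetchone()
    css = uss - n * mean * mean      # corrected sum of squares
    variance = css / (n - 1)         # sample variance (assumed convention)
    std_dev = math.sqrt(variance)
    return {
        "count": n, "min": lo, "max": hi, "mean": mean,
        "uncorrected_ss": uss, "corrected_ss": css,
        "variance": variance, "std_dev": std_dev,
        "std_mean_error": std_dev / math.sqrt(n),
    }

# Hypothetical data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?)",
    [(2.0,), (4.0,), (4.0,), (4.0,), (5.0,), (5.0,), (7.0,), (9.0,)],
)
stats = describe_column(conn, "accounts", "balance")
```

Because only sums and counts leave the database, the same query shape applies regardless of table size.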
8. The system of claim 1, wherein the analytical logical data model stores results from the Data Derivation functions comprising column derivations or transformations.
9. The system of claim 1, wherein the analytical logical data model stores results from the Data Derivation functions that are selected from a group comprising:
(1) a derived binned numeric column wherein a new column is bin number,
(2) a n-valued categorical column dummy-coded into “n” 0/1 values,
(3) a n-valued categorical column recoded into n or less new values,
(4) one or more numeric columns scaled via range transformation,
(5) one or more columns scaled to a z-score that is a number of standard deviations from a mean,
(6) one or more numeric columns scaled via a sigmoidal transformation function,
(7) one or more numeric columns scaled via a base 10 logarithm function,
(8) one or more numeric columns scaled via a natural logarithm function,
(9) one or more numeric columns scaled via an exponential function,
(10) one or more numeric columns raised to a specified power,
(11) one or more numeric columns derived via user defined transformation function,
(12) one or more new columns derived by ranking one or more columns or expressions based on order,
(13) one or more new columns derived with quantile 0 to n−1 based on order and n,
(14) a cumulative sum of a value expression based on a sort expression,
(15) a moving average of a value expression based on a width and order,
(16) a moving sum of a value expression based on a width and order,
(17) a moving difference of a value expression based on a width and order,
(18) a moving linear regression value derived from an expression, width, and order,
(19) a multiple account/product ownership bitmap,
(20) a product ownership bitmap over multiple time periods,
(21) one or more counts, amount, percentage means and intensities derived from a transaction summary,
(22) one or more variabilities derived from transaction summary data,
(23) one or more derived trigonometric values and their inverses, including sin, arcsin, cos, arccos, csc, arccsc, sec, arcsec, tan, arctan, cot, and arccot, and
(24) one or more derived hyperbolic values and their inverses, including sinh, arcsinh, cosh, arccosh, csch, arccsch, sech, arcsech, tanh, arctanh, coth, and arccoth.
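As an informal illustration, not drawn from the patent itself, three of the derivations listed above can be sketched in plain Python: the z-score of item (5), the dummy coding of item (2), and the moving average of item (15). The function names are hypothetical, the z-score uses the sample standard deviation as an assumption, and the moving-average window is assumed to cover the current row plus the preceding rows up to the given width.

```python
import math

def z_scores(values):
    """Scale values to the number of standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1); an assumed convention.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]

def dummy_code(values):
    """Recode an n-valued categorical column into n columns of 0/1 values."""
    categories = sorted(set(values))
    coded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, coded

def moving_average(values, width):
    """Average each value with up to width - 1 preceding values, in order."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        result.append(sum(window) / len(window))
    return result
```

For example, `moving_average([1, 2, 3, 4], 2)` averages each value with its predecessor, and the first row, having no predecessor, is averaged alone.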
10. The system of claim 1, wherein the analytical logical data model stores results from the Data Reduction functions comprising one or more matrices.
11. The system of claim 1, wherein the analytical logical data model stores results from the Data Reduction functions that are selected from a group comprising:
(1) build one or more data reduction matrices selected from a group comprising:
(i) a Pearson-Product Moment Correlations (COR) matrix;
(ii) a Covariances (COV) matrix; and
(iii) a Sum of Squares and Cross Products (SSCP) matrix,
(2) export a resultant matrix, and
(3) restart a matrix operation.
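The three matrix types named above are closely related: given the row count and column means, the covariance matrix follows from the SSCP matrix, and the correlation matrix follows from the covariance matrix. The sketch below shows that relationship in plain Python under the assumption of sample (n − 1) covariance; the function names are hypothetical and the code is not taken from the patent.

```python
import math

def sscp(rows):
    """Sum of squares and cross-products matrix over the columns of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)]
            for i in range(k)]

def covariance(rows):
    """Sample covariance matrix derived from the SSCP matrix and the means."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(k)]
    s = sscp(rows)
    return [[(s[i][j] - n * means[i] * means[j]) / (n - 1) for j in range(k)]
            for i in range(k)]

def correlation(rows):
    """Pearson product-moment correlation matrix from the covariance matrix."""
    cov = covariance(rows)
    k = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(k)]
            for i in range(k)]
```

Each matrix is k by k for k columns regardless of the number of rows, which is what makes these matrices a compact reduction of the underlying data.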
12. The system of claim 1, wherein the analytical logical data model stores metadata for the Data Reduction functions.
13. The system of claim 1, wherein the analytical logical data model stores metadata for the Data Reduction functions selected from a group comprising:
(1) metadata to track the matrix type and its associated descriptions,
(2) metadata to track internal table and column indexes, and their associated names and aliases,
(3) metadata to track what columns are used to join multiple tables,
(4) metadata to persist matrix calculations, using the internal table, column and select identifiers.
14. The system of claim 1, wherein the analytical logical data model stores results from the Data Reorganization comprising a wide analytic data set resulting from data reorganized by joining or de-normalizing pre-processed results.
15. The system of claim 1, wherein the analytical logical data model stores results from the Data Reorganization functions that are selected from a group comprising:
(1) a de-normalized new table created by removing one or more key columns from another table, and
(2) a combined result table created by joining a plurality of tables or views.
16. The system of claim 1, wherein the analytical logical data model stores results from the Data Sampling function comprising a new table constructed from a randomly selected subset of the rows in an existing table or view.
17. The system of claim 1, wherein the analytical logical data model stores results from the Data Sample function comprising one or more data samples of specified sizes selected from a table.
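As a rough illustration of the sampling described in claims 16 and 17, the sketch below builds a new table from a randomly selected subset of rows of a specified size. It uses Python's built-in sqlite3 module as a stand-in for the relational database management system; the table names `customers` and `customers_sample`, the helper `sample_into_table`, and the use of `ORDER BY RANDOM()` are assumptions for the example, not the technique disclosed in the patent.

```python
import sqlite3

def sample_into_table(conn, source, target, size):
    """Create table `target` holding `size` randomly chosen rows of `source`.

    The selection runs inside the database, so no rows are moved to the
    client. Identifiers are interpolated directly, which is acceptable only
    for trusted names in a sketch.
    """
    conn.execute(
        f"CREATE TABLE {target} AS "
        f"SELECT * FROM {source} ORDER BY RANDOM() LIMIT {int(size)}"
    )

# Hypothetical data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, "r%d" % (i % 4)) for i in range(1, 51)],
)
sample_into_table(conn, "customers", "customers_sample", 10)
```

Sorting every row by a random key is simple but costly on large tables; it is used here only because it keeps the example short.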
18. The system of claim 1, wherein the analytical logical data model stores results from the Data Partitioning function comprising a new table constructed from at least one randomly selected subset of rows in an existing table or view, wherein the subsets are mutually distinct but all-inclusive subsets of data.
19. The system of claim 1, wherein the analytical logical data model stores results from the Data Partitioning function comprising one or more data partitions selected from a table using a database internal hashing technique.
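To illustrate how hashing can yield partitions that are mutually distinct but all-inclusive, the sketch below assigns each row to exactly one partition by hashing a key column. The claim refers to a database internal hashing technique; Python's `hashlib.sha256` is substituted here purely as an assumption, and the function name `partition_by_hash` and the `id` key are hypothetical.

```python
import hashlib

def partition_by_hash(rows, key, num_partitions):
    """Assign every row to exactly one partition based on a hash of row[key].

    Each row lands in one and only one partition, so the partitions are
    disjoint and together cover all rows. The assignment is deterministic
    for a given key value.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        digest = hashlib.sha256(str(row[key]).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % num_partitions
        partitions[index].append(row)
    return partitions

# Hypothetical data for the example.
rows = [{"id": i} for i in range(100)]
parts = partition_by_hash(rows, "id", 3)
```

Because the partition index depends only on the key value, rerunning the function over the same rows reproduces the same partitions, which is convenient for repeatable train/test splits.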
20. A method for performing data mining applications, comprising:
(a) storing a relational database on one or more data storage devices connected to a computer;
(b) accessing the relational database stored on the data storage devices using a relational database management system executed by the computer; and
(c) providing logical entity and attribute definitions in an analytic logical data model (LDM) to support advanced analytic processing performed by the relational database management system directly against the relational database, wherein the advanced analytic processing comprise one or more scalable data mining functions and the scalable data mining functions are selected from a group of functions comprising Data Description functions, Data Derivation functions, Data Reduction functions, Data Reorganization functions, Data Sampling functions, and Data Partitioning functions.
21. An article of manufacture comprising logic embodying a method for performing data mining applications, comprising:
(a) storing a relational database on one or more data storage devices connected to a computer;
(b) accessing the relational database stored on the data storage devices using a relational database management system executed by the computer; and
(c) providing logical entity and attribute definitions in an analytic logical data model (LDM) to support advanced analytic processing performed by the relational database management system directly against the relational database, wherein the advanced analytic processing comprise one or more scalable data mining functions and the scalable data mining functions are selected from a group of functions comprising Data Description functions, Data Derivation functions, Data Reduction functions, Data Reorganization functions, Data Sampling functions, and Data Partitioning functions.
Specification