Automatic data explorer that determines relationships among original and derived fields
Abstract
An automatic data mining tool that characterizes the relationships between different database fields from both structured and unstructured data. It extracts a data model, identifies and categorizes all the data fields, pre-processes unstructured data so that it can be handled effectively, and processes the data without human intervention to explore automatically how the fields are related to one another. Before user-controlled data mining commences, the present invention goes through all the fields in a database table space to establish meaningful relationships between fields, using whatever computer resources are available (i.e., by “cycle stealing”). This allows the present invention to run in the background, to establish relationships between fields even before data mining (DM) begins, and to determine redundant, useless, and/or trivial fields without any external guidance. Because these relationships are available before a user begins data mining, the mining itself is faster and more accurate.
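As an illustration of how redundant or trivial fields might be flagged without external guidance, the following is a minimal sketch and not the patented procedure. It assumes a table held in memory as a mapping from column name to values; the function name `flag_uninformative_fields` and the sample columns are hypothetical.

```python
from typing import Dict, Sequence


def flag_uninformative_fields(table: Dict[str, Sequence]) -> Dict[str, str]:
    """Label columns that are constant or exact copies of an earlier column.

    Assumption: "trivial" means a single distinct value and "redundant"
    means a value-for-value duplicate; the source does not define either.
    """
    flags: Dict[str, str] = {}
    seen: Dict[tuple, str] = {}  # column contents -> first column holding them
    for name, values in table.items():
        key = tuple(values)
        if len(set(key)) <= 1:
            flags[name] = "trivial: constant value"
        elif key in seen:
            flags[name] = f"redundant: duplicates {seen[key]}"
        else:
            seen[key] = name
    return flags


# Hypothetical sample data.
table = {
    "customer_id": [1, 2, 3, 4],
    "country": ["US", "US", "US", "US"],
    "cust_no": [1, 2, 3, 4],
    "spend": [10.0, 25.5, 7.25, 40.0],
}
flags = flag_uninformative_fields(table)
```

Here `country` would be flagged as trivial and `cust_no` as a duplicate of `customer_id`, so a later mining pass could skip both.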
17 Claims
1. A method for improving the efficiency of data mining software tools that operate on a database, the method comprising:
determining relationships between tables in the database;
identifying and categorizing all data fields in the tables;
pre-processing any unstructured data fields to represent the unstructured fields with vectors compatible with a format of structured fields;
converting certain fields into modified fields;
determining a level of relationship between all the data fields; and
storing the relationship data in a database;
wherein the method is performed automatically by a computer system when system resources are available, and without human intervention. - View Dependent Claims (2, 3)
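The pre-processing step of claim 1 represents unstructured fields with vectors compatible with structured fields. One way this could look for free-text fields is sketched below. The claim does not name a vectorization technique; feature hashing over word tokens is swapped in here as an assumption, and `text_to_vector` and its `dims` parameter are hypothetical names.

```python
import hashlib
import re
from typing import List


def text_to_vector(text: str, dims: int = 8) -> List[float]:
    """Map free text to a fixed-length numeric vector by feature hashing.

    Each lowercase alphanumeric token is hashed into one of `dims` buckets;
    bucket counts are normalized so the vector sums to 1 (or stays all
    zeros for empty text). The fixed length lets the result sit next to
    ordinary numeric columns.
    """
    counts = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # sha256 is used instead of built-in hash(), which is salted per run.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dims
        counts[bucket] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


vector = text_to_vector("Late delivery, late refund")
```

Because the output length is fixed by `dims`, every row of a text field yields the same number of numeric components regardless of how long the text is.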
4. A method for determining relationships among data fields in a database, the method comprising:
extracting a data model for each set of related tables in the database;
determining whether each field in each table is structured or unstructured data;
for each unstructured data field, determining a data type for each field;
extracting feature data from the unstructured data based upon the determined data type of the data fields;
analyzing the structured fields and feature data to determine a level of relationship between the fields or data; and
storing information related to the level of relationship between the fields or data. - View Dependent Claims (5, 6, 7, 8, 9, 10)
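The final two steps of claim 4, determining a level of relationship and storing it, can be sketched as follows. The claim does not specify a measure; absolute Pearson correlation between numeric fields is used here purely as an assumed stand-in, and the names `pearson`, `relationship_levels`, and the sample fields are hypothetical. Feature data extracted from unstructured fields is assumed to be numeric already.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant field carries no linear relationship
    return cov / sqrt(var_x * var_y)


def relationship_levels(
    fields: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Store a relationship level in [0, 1] for every pair of fields."""
    return {
        (a, b): abs(pearson(fields[a], fields[b]))
        for a, b in combinations(fields, 2)
    }


# Hypothetical sample fields.
fields = {
    "units": [1, 2, 3, 4],
    "revenue": [10, 20, 30, 40],
    "flag": [1, 0, 0, 1],
}
levels = relationship_levels(fields)
```

The returned dictionary plays the role of the stored relationship information; a real system would presumably persist it in a database table rather than in memory.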
11. A computer readable medium including computer code for an automatic data explorer that determines relationships among original and derived fields, the computer readable medium comprising:
computer code for extracting a data model for each set of tables in the database;
computer code for determining whether each field is structured or unstructured data;
computer code for determining a data type for each unstructured field;
computer code for extracting feature data from the unstructured data based upon the determined data type of the data fields;
computer code for analyzing the structured fields and feature data to determine a level of relationship between the fields or data; and
computer code for storing information related to the level of relationship between the fields or data. - View Dependent Claims (12, 13, 15, 16, 17)
14. A computer system for improving the efficiency of data mining software tools that operate on a database, the computer system comprising:
a processor; and
computer program code that executes on the processor, the computer program code comprising:
computer code for determining relationships between tables in the database;
computer code for identifying and categorizing all data fields in the tables;
computer code for pre-processing any unstructured data fields to represent the unstructured fields with vectors compatible with a format of structured fields;
computer code for determining a level of relationship between all the data fields; and
computer code for storing the relationship data in a database;
wherein the computer code is executed automatically by the computer system when system resources are available, and without human intervention.
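The "when system resources are available" condition can be illustrated with a small gating loop. This is a sketch under stated assumptions rather than the claimed mechanism: the idleness probe is passed in as a callable (on Unix it might wrap `os.getloadavg()`), and `drain_while_idle` and the queued task labels are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


def drain_while_idle(
    queue: Deque[Callable[[], object]], is_idle: Callable[[], bool]
) -> List[object]:
    """Run queued analysis tasks one at a time while the system stays idle.

    The probe is checked before every task, so work stops as soon as the
    system becomes busy; unfinished tasks remain queued for a later call.
    """
    done: List[object] = []
    while queue and is_idle():
        task = queue.popleft()
        done.append(task())
    return done


# Simulated probe: idle for two checks, then busy.
readings = iter([True, True, False])
queue = deque([lambda: "tables", lambda: "fields", lambda: "levels"])
done = drain_while_idle(queue, lambda: next(readings))
```

With the simulated probe, the first two tasks complete and the third stays in the queue until resources free up again.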
Specification