Method and apparatus for knowledge discovery in databases
Abstract
A computer-based method and apparatus for knowledge discovery from databases. In the disclosed method, a user creates a project plan comprising a plurality of operational components that cooperate to extract desired information from a database. In one embodiment, the project plan is created within a graphical user interface and consists of objects representing the various functional components of the overall plan, interconnected by links representing the flow of data from the data source to a data sink. Data visualization components may be inserted essentially anywhere in the project plan. One or more data links in the project plan may be designated as caching links, which maintain copies of the data flowing across them so that the cached data is available to other components in the project plan. In one embodiment, compression technology is applied to reduce the overall size of the database.
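The plan described above (components joined by links, some of them caching links) can be modelled as a directed acyclic graph. The following is a minimal Python sketch under stated assumptions: records are plain dicts, each component is a function from a list of input records to a list of output records, and a caching link simply keeps a copy of the records it last carried. The names `Component`, `Link`, `ProcessPlan`, `connect`, and `run` are hypothetical, not taken from the patent, and Kahn's topological sort is used to order execution as one reasonable choice among several.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical record representation: a mapping from field name to value.
Record = dict


@dataclass
class Component:
    """One node of the plan (hypothetical structure, not from the patent)."""
    name: str
    kind: str  # e.g. "source", "reduction", "transformation", "algorithm", "sink"
    fn: Callable[[List[Record]], List[Record]]


@dataclass
class Link:
    """Directed edge carrying records from src's output to dst's input."""
    src: str
    dst: str
    caching: bool = False
    cache: Optional[List[Record]] = None  # filled only on caching links


class ProcessPlan:
    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}
        self.links: List[Link] = []

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def connect(self, src: str, dst: str, caching: bool = False) -> Link:
        link = Link(src, dst, caching)
        self.links.append(link)
        return link

    def run(self) -> Dict[str, List[Record]]:
        # Kahn's algorithm: a component runs once all of its inputs are ready.
        indegree = {name: 0 for name in self.components}
        for link in self.links:
            indegree[link.dst] += 1
        ready = deque(name for name, deg in indegree.items() if deg == 0)
        outputs: Dict[str, List[Record]] = {}
        while ready:
            name = ready.popleft()
            # Concatenate the records arriving on every incoming link.
            inputs: List[Record] = []
            for link in self.links:
                if link.dst == name:
                    inputs.extend(outputs[link.src])
            outputs[name] = self.components[name].fn(inputs)
            for link in self.links:
                if link.src == name:
                    if link.caching:
                        # Keep a (shallow) copy so other components can reuse it.
                        link.cache = list(outputs[name])
                    indegree[link.dst] -= 1
                    if indegree[link.dst] == 0:
                        ready.append(link.dst)
        return outputs


plan = ProcessPlan()
plan.add(Component("source", "source", lambda _: [{"age": 30}, {"age": 70}, {"age": 45}]))
plan.add(Component("filter", "reduction", lambda rs: [r for r in rs if r["age"] < 60]))
plan.add(Component("sink", "sink", lambda rs: rs))
cached = plan.connect("source", "filter", caching=True)
plan.connect("filter", "sink")
result = plan.run()
print(result["sink"])     # [{'age': 30}, {'age': 45}]
print(len(cached.cache))  # 3
```

Because the cache lives on the link rather than inside a component, any other component could read `cached.cache` without re-running the source, which is the reuse the abstract attributes to caching links.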
17 Claims
1. A method of extracting knowledge from a database containing records of information, comprising:
(a) defining a process plan comprising a plurality of components each adapted to perform a designated function upon said records, said plurality of components being interconnected by a plurality of links representing a transfer of records from an output of a first component to an input of a second component;
(b) performing a data compression function to yield a compressed database;
(c) supplying said compressed database to said components in said process plan such that each component performs its designated function upon said records to yield desired knowledge from said database; and
(d) maintaining said records transferred across one of said plurality of links in at least one cache, said cache being accessible by at least one of said components.
2. A method in accordance with claim 1, further comprising:
(e) performing said data compression function on said at least one cache.
3. A method in accordance with claim 1, wherein each component is of a type selected from the group of:
data source components, data reduction components, data transformation components, algorithm components, data sink components, and data visualization components.
4. A method in accordance with claim 2, further comprising:
(f) maintaining a data directory containing information describing the organization of data in said database; and
(g) consolidating said desired knowledge from said database into a form suitable for reporting.
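Claim 1 compresses the database (step (b)), and step (e) applies the same data compression function to a link cache, but neither names a compression method. As an assumption, the sketch below serializes the cached records to JSON and compresses them with Python's standard `zlib` module; `CompressedCache`, `store`, `load`, and `size` are hypothetical names, and the records must be JSON-serializable for this approach to work.

```python
import json
import zlib
from typing import List


class CompressedCache:
    """Stores the records that crossed one link as a zlib-compressed JSON blob."""

    def __init__(self, level: int = 6) -> None:
        self.level = level  # zlib compression level, 0 (none) to 9 (smallest)
        self._blob = b""

    def store(self, records: List[dict]) -> None:
        raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
        self._blob = zlib.compress(raw, self.level)

    def load(self) -> List[dict]:
        # An empty cache has nothing to decompress.
        if not self._blob:
            return []
        return json.loads(zlib.decompress(self._blob).decode("utf-8"))

    @property
    def size(self) -> int:
        """Number of bytes the cache currently occupies."""
        return len(self._blob)


records = [{"region": "north", "sales": i % 10} for i in range(1000)]
cache = CompressedCache()
cache.store(records)
raw_size = len(json.dumps(records, separators=(",", ":")).encode("utf-8"))
print(cache.size < raw_size)    # True
print(cache.load() == records)  # True
```

Highly repetitive tabular records like these compress well; how much a real database shrinks depends entirely on its contents.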
5. A system for extracting knowledge from a database containing records of information, comprising:
one or more processing units, said processing units coupled to each other by a network, wherein said processing units execute software to create a framework within which one or more process plans are designed, managed, modified, tested, evaluated, run, and stored for future use, wherein said network includes one or more network interface cards, each network interface card coupled to one processing unit and connecting said processing units through physical means;
one or more data storage units coupled to each of said processing units, said data storage units containing said records of information, wherein said records of information can be used throughout the framework in compressed form; and
wherein a user interacting with the framework implements said process plan to extract knowledge from the database, said process plan including components that perform functions on the records of information, each of said components representing a different stage of knowledge extraction, wherein components in the process plan execute on different processing units as decided by the framework, said components connected by data links over said physical means, said data links operating to permit the output of one component to be applied to the input of another.
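Claim 5 leaves the choice of which processing unit runs each component to the framework without saying how that choice is made. One possible policy, offered purely as an illustration, is the greedy longest-processing-time heuristic: sort components by an estimated cost and place each on the currently least-loaded unit. The cost estimates, unit names, and `assign_components` are hypothetical, and the sketch ignores the cost of moving records across data links between units.

```python
import heapq
from typing import Dict, List, Tuple


def assign_components(costs: Dict[str, float], units: List[str]) -> Dict[str, str]:
    """Place each component on the processing unit with the least assigned cost."""
    # Min-heap of (accumulated cost, unit name); ties break on the unit name.
    heap: List[Tuple[float, str]] = [(0.0, unit) for unit in units]
    heapq.heapify(heap)
    placement: Dict[str, str] = {}
    # Longest-processing-time heuristic: place the most expensive components first.
    for name in sorted(costs, key=costs.get, reverse=True):
        load, unit = heapq.heappop(heap)
        placement[name] = unit
        heapq.heappush(heap, (load + costs[name], unit))
    return placement


costs = {"source": 1.0, "clean": 2.0, "tree": 5.0, "report": 1.0}
placement = assign_components(costs, ["node-a", "node-b"])
print(placement)
# {'tree': 'node-a', 'clean': 'node-b', 'source': 'node-b', 'report': 'node-b'}
```

Here the two units end up with loads of 5.0 and 4.0; a real framework would also weigh where each component's input data already resides.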
9. A method of discovering knowledge from a database by employing a process plan including a number of components to perform functions on records of the database, comprising the steps of:
(a) specifying the knowledge to be discovered from the database;
(a1) performing a data compression function to yield a compressed database;
(b) selecting records of information from the database that suit the knowledge specification; and
(b1) supplying said compressed database to said components in said process plan such that each component performs its designated function upon said records.
10. A method in accordance with claim 9, further comprising:
(c) preprocessing the selected records to remove noise or extrapolate for missing records;
(d) transforming the records to be usable with an analysis algorithm;
(g) consolidating results into a reportable format for knowledge discovery; and
(h) reporting the consolidated results.
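The selection, preprocessing, consolidation, and reporting steps listed above ((b), (c), (g), (h)) can be chained as plain functions. This is a minimal sketch under explicit assumptions: "remove noise" is taken to mean dropping values outside a known valid range, and "extrapolate for missing records" is approximated by filling missing field values with the mean of the observed ones. The function names, the `[lo, hi]` range, and the sample data are hypothetical; the transformation step (d) is left out for brevity.

```python
from statistics import mean
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]  # hypothetical: field name -> value, None when missing


def select(records: List[Record], matches: Callable[[Record], bool]) -> List[Record]:
    """Step (b): keep the records that fit the knowledge specification."""
    return [r for r in records if matches(r)]


def preprocess(records: List[Record], field: str, lo: float, hi: float) -> List[Record]:
    """Step (c): drop out-of-range (noisy) values, then fill missing ones."""
    kept = [r for r in records if r[field] is None or lo <= r[field] <= hi]
    observed = [r[field] for r in kept if r[field] is not None]
    fill = mean(observed) if observed else 0.0  # mean imputation, one simple choice
    return [{**r, field: fill if r[field] is None else r[field]} for r in kept]


def consolidate(records: List[Record], field: str) -> Dict[str, float]:
    """Step (g): reduce the records to a small summary."""
    values = [r[field] for r in records]
    return {"count": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


def report(summary: Dict[str, float]) -> str:
    """Step (h): render the summary as one line of text."""
    return ", ".join(f"{key}={value:g}" for key, value in summary.items())


data = [
    {"region": "north", "sales": 10.0},
    {"region": "north", "sales": None},   # missing value
    {"region": "north", "sales": 999.0},  # treated as noise
    {"region": "north", "sales": 20.0},
    {"region": "south", "sales": 15.0},
]
selected = select(data, lambda r: r["region"] == "north")
cleaned = preprocess(selected, "sales", 0.0, 100.0)
summary = consolidate(cleaned, "sales")
print(report(summary))  # count=3, mean=15, min=10, max=20
```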
11. A method in accordance with claim 10 wherein the step of transforming the records scales the records between two numbers.
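Claim 11 says only that the transformation scales the records between two numbers. A common way to do that, assumed here rather than stated in the claim, is min-max scaling of a numeric field into a target interval [a, b]: x' = a + (x - min) * (b - a) / (max - min). The function `scale` and its defaults are hypothetical.

```python
from typing import List, Sequence


def scale(values: Sequence[float], new_min: float = 0.0, new_max: float = 1.0) -> List[float]:
    """Min-max scale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


print(scale([10, 15, 20]))         # [0.0, 0.5, 1.0]
print(scale([10, 15, 20], -1, 1))  # [-1.0, 0.0, 1.0]
```

Scaling into [0, 1] or [-1, 1] is a typical preparation for the neural network algorithm mentioned in claim 13, since it keeps all inputs on a comparable range.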
12. A method in accordance with claim 10, further comprising:
(e) mining the transformed records to isolate characteristics between the records, said mining performed by the analysis algorithm.
13. A method in accordance with claim 12 wherein the analysis algorithm is a neural network algorithm.
14. A method in accordance with claim 12 wherein the step of mining the records includes travelling a decision tree to isolate characteristics.
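Claim 14 mines the records by travelling a decision tree. A minimal sketch, assuming a binary tree of numeric threshold tests: the `Node` layout, the `travel` function, and the example features are hypothetical, and building or training the tree is out of scope. The sequence of tests taken from root to leaf is one way to expose the characteristics that isolate a record.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Node:
    """Internal nodes test one feature; leaves carry only a label."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # followed when value <= threshold
    right: Optional["Node"] = None  # followed when value > threshold
    label: Optional[str] = None     # set on leaves only


def travel(node: Node, record: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Walk from the root to a leaf, recording each test the record satisfies."""
    path: List[str] = []
    while node.label is None:
        value = record[node.feature]
        if value <= node.threshold:
            path.append(f"{node.feature} <= {node.threshold}")
            node = node.left
        else:
            path.append(f"{node.feature} > {node.threshold}")
            node = node.right
    return node.label, path


tree = Node(
    feature="age",
    threshold=30,
    left=Node(label="renter"),
    right=Node(feature="income", threshold=50000,
               left=Node(label="renter"), right=Node(label="owner")),
)
print(travel(tree, {"age": 45, "income": 80000}))
# ('owner', ['age > 30', 'income > 50000'])
```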
15. A method in accordance with claim 12, further comprising:
(f) interpreting the results generated by mining.
16. A method in accordance with claim 15 wherein the step of interpreting the results includes assessing the quality of the results using visualization tools.
17. A method in accordance with claim 15, wherein consolidating the interpreted results includes documenting and acting on the results.
Specification