Parallel object-oriented data mining system
Abstract
A data mining system uncovers patterns, associations, anomalies, and other statistically significant structures in data. Data files are read and displayed. Objects in the data files are identified. Relevant features for the objects are extracted. Patterns among the objects are recognized based upon the features. As a test, data from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) sky survey was used to search for bent doubles. The test, conducted on data from the Very Large Array in New Mexico, seeks to locate a special type of quasar (radio-emitting stellar object) called a bent double. The FIRST survey has generated more than 32,000 images of the sky to date. Each image is 7.1 megabytes, yielding more than 100 gigabytes of image data in the entire data set.
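The data-volume figure can be checked with a short calculation. This is only a sketch of the arithmetic: the image count and per-image size are the abstract's lower bounds, and the use of decimal gigabytes (1 GB = 1000 MB) is an assumption, since the abstract does not state a convention.

```python
# Figures quoted in the abstract; the image count is stated as "more than".
NUM_IMAGES = 32_000
MB_PER_IMAGE = 7.1

total_mb = NUM_IMAGES * MB_PER_IMAGE
total_gb = total_mb / 1000  # assumption: decimal gigabytes (1 GB = 1000 MB)

print(f"{total_gb:.1f} GB")  # prints "227.2 GB"
```

The result is consistent with the abstract's "more than 100 gigabytes".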
33 Claims
1. A data mining system, comprising:
a parallel reading and displaying module for reading and displaying data in different formats, said data containing data items with features;
a parallel object identifying module for identifying said data items;
a parallel feature extracting module for extracting at least one of said features for each of said data items;
a parallel pattern recognition algorithms module for pattern recognition;
a storage module to store at least one of said features for each of said data items as it is extracted, and a parallel linking module for linking said parallel object identifying module, said parallel feature extracting module, said parallel pattern recognition algorithms module, and said storage module.
8. A parallel object-oriented data mining system, comprising:
a parallel object-oriented reading and displaying module for reading and displaying data in different formats, said data containing data items with features;
a parallel object-oriented identifying module for identifying said data items;
a parallel object-oriented feature extracting module for extracting at least one of said features for each of said data items;
a parallel object-oriented pattern recognition algorithms module for pattern recognition;
a storage module to store at least one of said features for each of said data items as it is extracted, and a parallel object-oriented linking module for linking said parallel object-oriented identifying module, said parallel object-oriented extracting module, said parallel object-oriented pattern recognition algorithms module, and said storage module.
16. A parallel object-oriented data mining system, comprising:
parallel object-oriented reading and displaying means for reading and displaying data in different formats, said data containing data items with features;
parallel object-oriented sampling means for sampling said data to reduce the number of data items;
parallel object-oriented multiresolution analysis means for performing a reversible transformation of said data into a coarser resolution;
parallel object-oriented noise removing means for removing noise from said data;
parallel object-oriented data fusion means for data fusion;
parallel object-oriented object identifying means for identifying data items;
parallel object-oriented feature extracting means for extracting at least one of said features for each of the said data items;
parallel object-oriented dimension reduction means for dimension reduction which reduces the number of features for a data item;
parallel object-oriented pattern recognition algorithms means for pattern recognition; and
database means for storing features for each data item as it is extracted, wherein, the appropriate means are linked as necessary using a scripting language.
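The "reversible transformation of said data into a coarser resolution" recited in claim 16 can be illustrated with one level of a Haar-style transform. This is a minimal sketch, not the patented implementation: the claim does not name a specific transform, so the choice of Haar, the function names `haar_forward` and `haar_inverse`, and the sample signal are all assumptions made for illustration.

```python
def haar_forward(signal):
    """One level of an unnormalized Haar transform on an even-length list.

    Returns (coarse, detail): pairwise averages (the coarser resolution)
    and pairwise half-differences (what is needed to undo the step).
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    coarse = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return coarse, detail


def haar_inverse(coarse, detail):
    """Rebuild the original signal from the coarse and detail coefficients."""
    signal = []
    for c, d in zip(coarse, detail):
        signal.extend([c + d, c - d])
    return signal


# Hypothetical 1-D data, e.g. one row of pixel intensities.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
coarse, detail = haar_forward(signal)
restored = haar_inverse(coarse, detail)
```

Here `coarse` holds half as many values as `signal`, and `restored` equals `signal`, which is the sense in which the step is reversible.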
17. A data mining system for science, engineering, business and other applications, comprising:
a parallel object-oriented reading, writing, and displaying module for reading, writing, and displaying engineering, business and other data in different formats, said data containing data items from different sensors at different times under different conditions;
a parallel object-oriented sampling module for sampling said data and reducing the number of said data items;
a parallel object-oriented multiresolution analysis module for multiresolution analysis to perform a reversible transformation of said data into a coarser resolution using multi-resolution techniques;
a parallel object-oriented noise removal module for removing noise from said data;
a parallel object-oriented data fusion module for data fusion if said data is obtained from different sensors at different times under different conditions at different resolutions;
a parallel object-oriented object identifying module for identifying said data items in the fused, denoised, sampled, multi-resolution data;
a parallel object-oriented feature extracting module for extracting at least one of said features for each of said items from the said fused, denoised, sampled, multi-resolution data;
a parallel object-oriented dimension reduction module for dimension reduction which reduces the number of features for each of said data items;
a parallel object-oriented pattern recognition module using pattern recognition algorithms selected from the group consisting of decision trees, neural networks, k-nearest neighbor, k-means, or evolutionary algorithms; and
a database to store said at least one of said features for each of said data items as it is extracted, after the number of features have been reduced, and as the data set grows in size, enabling easy access to subsets of data;
wherein, all the appropriate modules are linked as necessary using a scripting language such as Python to provide a solution for data mining.
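Claim 17 lists k-nearest neighbor among the pattern recognition algorithms. The sketch below shows that technique in plain Python under stated assumptions: the function name `knn_predict`, the two-feature vectors, and the "bent" / "non-bent" labels are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training items.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, hand-made feature vectors for previously labeled objects.
train = [
    ((1.0, 1.0), "bent"), ((1.2, 0.8), "bent"), ((0.9, 1.1), "bent"),
    ((5.0, 5.0), "non-bent"), ((5.2, 4.9), "non-bent"), ((4.8, 5.1), "non-bent"),
]

label_a = knn_predict(train, (1.1, 1.0))
label_b = knn_predict(train, (5.0, 5.1))
```

A parallel version would split the distance computations across workers; that part is omitted to keep the sketch short.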
24. A method of data mining, comprising the steps of:
reading and displaying in parallel data files, said data files containing objects having at least one feature;
identifying in parallel said objects in said data files;
extracting in parallel said at least one feature for each of said objects; and
recognizing in parallel patterns among said objects based upon said features.
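The four steps of claim 24 can be sketched as a small pipeline in which each step is mapped over its inputs by a worker pool. Everything here is illustrative: the function names, the toy pixel grids, the thresholds, and the use of `ThreadPoolExecutor` (a thread pool standing in for whatever parallel runtime a real system would use) are assumptions, and the displaying part of the first step is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def read_file(grid):
    """Reading step: the toy data is already in memory, so just copy it."""
    return [row[:] for row in grid]


def identify_objects(grid, threshold=5):
    """Identifying step: treat every pixel above the threshold as one object."""
    return [(r, c, v) for r, row in enumerate(grid)
            for c, v in enumerate(row) if v > threshold]


def extract_features(obj):
    """Extracting step: one feature per object (its intensity)."""
    _, _, value = obj
    return {"intensity": value}


def recognize(features, cutoff=8):
    """Recognizing step: a trivial rule standing in for a real classifier."""
    return "bright" if features["intensity"] >= cutoff else "faint"


def mine(files, workers=4):
    """Run the four steps in order, mapping each one over a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        grids = list(pool.map(read_file, files))
        per_file = list(pool.map(identify_objects, grids))
        objects = [obj for objs in per_file for obj in objs]
        features = list(pool.map(extract_features, objects))
        labels = list(pool.map(recognize, features))
    return list(zip(objects, labels))


# Two hypothetical 2x2 "data files".
files = [[[0, 9], [6, 1]], [[7, 0], [0, 10]]]
results = mine(files)
```

`pool.map` preserves input order, so each label in `results` lines up with the object it was computed from.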
31. A method of data mining, comprising the steps of:
reading and displaying data files using a parallel object-oriented reading and displaying module, said data files containing objects having at least one feature;
identifying said objects in said data files using a parallel object-oriented object identifying module;
extracting at least one feature for each of said objects using a parallel object-oriented feature extracting module; and
recognizing patterns among said objects based on at least one feature using a parallel object-oriented pattern recognizing module.
32. A method of data mining, comprising the steps of:
reading, writing, and displaying a number of data files;
sampling said data files and reducing the number of said data files;
conducting multi-resolution analysis to perform a reversible transformation into a coarser resolution of said data files;
removing noise from said data files;
implementing data fusion of said data files;
identifying objects in said data files;
extracting at least one feature for each of said objects;
normalizing at least one feature of said objects;
reducing the dimension or number of at least one of said feature of said objects;
recognizing patterns among said objects using at least one of said feature;
displaying said data files and said objects and capturing feedback from scientists for validation;
storing at least one feature for each of said objects, after they have been extracted in said extracting step, reduced in number in said reducing step, used for pattern recognition in said recognizing patterns step, and displayed in said displaying step; and
linking said foregoing steps.
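The normalizing and storing steps of claim 32 can be sketched with the standard library alone. The claim does not say how features are normalized or which database is used, so z-score normalization, the in-memory SQLite database, the table layout, and the feature name `area_z` are all assumptions chosen for illustration.

```python
import sqlite3
from statistics import mean, pstdev


def zscore(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw feature (e.g. an area measurement) for three objects.
raw = {"obj1": 2.0, "obj2": 4.0, "obj3": 6.0}
normalized = dict(zip(raw, zscore(list(raw.values()))))

# Store one row per (object, feature) so subsets can be queried later.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (object_id TEXT, name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [(oid, "area_z", val) for oid, val in normalized.items()],
)
conn.commit()

# Example subset query: objects whose normalized feature is above the mean.
above = [row[0] for row in conn.execute(
    "SELECT object_id FROM features WHERE value > 0 ORDER BY object_id")]
```

Keeping features in a table rather than in flat files is one way to get the easy access to subsets that the independent claims describe.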
33. A method of data mining, comprising the steps of:
reading, writing, and displaying scientific, engineering, business and other data in different formats using a parallel object-oriented reading, writing, and displaying module, said data containing data items;
sampling said data and reducing the number of said data items using a parallel object-oriented sampling module;
conducting multiresolution analysis to perform a reversible transformation of said data into a coarser resolution using a parallel object-oriented multiresolution module;
removing noise from said data using a parallel object-oriented removing noise module;
conducting data fusion using a parallel object-oriented data fusion module when said data is obtained from different sensors at different times under different conditions at different resolutions;
identifying objects or data items in said data and extracting at least one feature for each of said data items using a parallel object-oriented identifying objects module;
conducting dimension reduction which reduces the number of said features for one or more of said data items using a parallel object-oriented conducting dimension reduction module;
implementing pattern recognition algorithms using a parallel object-oriented implementing pattern recognition algorithms module;
using a database to store at least one feature for each of said data items extracted after the number said features have been reduced, and as said data items grows in size, enabling easy access to subsets of said data; and
linking appropriate foregoing parallel object-oriented modules as necessary using a scripting language.
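The final step, linking only the modules a given problem needs by means of a scripting language, might look like the following in Python. This is a sketch under assumptions: the `Module` and `Pipeline` classes, the `Sampler` and `NoiseRemover` modules, and the 3-point median filter used for denoising are hypothetical stand-ins, not the modules of the patented system.

```python
from statistics import median


class Module:
    """Common interface: every module takes data in and returns data out."""

    def run(self, data):
        raise NotImplementedError


class Sampler(Module):
    """Keeps every `step`-th item to reduce the number of data items."""

    def __init__(self, step):
        self.step = step

    def run(self, data):
        return data[::self.step]


class NoiseRemover(Module):
    """Hypothetical denoiser: 3-point median filter with repeated edges."""

    def run(self, data):
        padded = [data[0]] + list(data) + [data[-1]]
        return [median(padded[i:i + 3]) for i in range(len(data))]


class Pipeline:
    """Links the chosen modules; output of one is the input of the next."""

    def __init__(self, *modules):
        self.modules = modules

    def run(self, data):
        for module in self.modules:
            data = module.run(data)
        return data


data = [1, 1, 50, 1, 2, 2, 2, 2]  # hypothetical signal with one noise spike

sampled_then_denoised = Pipeline(Sampler(2), NoiseRemover()).run(data)
denoised_only = Pipeline(NoiseRemover()).run(data)
```

Because the linking is done in the script, a module that a particular data set does not need (here, sampling in the second pipeline) is simply left out.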
Specification