Scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems
Abstract
A method and apparatus for retrieving and organizing data using a self-organizing map in a parallel transaction database system. With this invention, input records of data are represented as n dimensional vectors. Each of these vectors is then compressed by eliminating the zeros in its components. The self-organizing map algorithm is then applied to the compressed input records to group the records into a number of clusters, where each cluster comprises a number of records having common input parameters.
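The three steps in the abstract can be illustrated with a minimal, single-process sketch. It is not the patented "modified" algorithm: it assumes a one-dimensional map, a batch update with a Gaussian neighborhood, and a deterministic initialization from the data, and every function name (`compress`, `sq_distance`, `best_matching_unit`, `train_batch_som`, `assign_clusters`) is hypothetical. What it does show is how both the winner search and the accumulation step can touch only the non-zero components of each record.

```python
import math

def compress(record):
    """Drop zeros: keep only (index, value) pairs for non-zero components."""
    return [(i, x) for i, x in enumerate(record) if x != 0]

def sq_distance(sparse, weight, weight_sq_norm):
    """Squared distance ||x - w||^2 = ||w||^2 + sum over non-zero x_i of
    (x_i^2 - 2 x_i w_i), so only the non-zero components are visited."""
    return weight_sq_norm + sum(x * x - 2.0 * x * weight[i] for i, x in sparse)

def best_matching_unit(sparse, weights, sq_norms):
    """Index of the map unit whose weight vector is closest to the record."""
    return min(range(len(weights)),
               key=lambda k: sq_distance(sparse, weights[k], sq_norms[k]))

def train_batch_som(records, n_units, epochs=10):
    """Batch SOM on a 1-D map (an assumption made for this sketch)."""
    n_dims = len(records[0])
    data = [compress(r) for r in records]
    # Assumed initialization: copy evenly spaced records (deterministic).
    weights = [[float(x) for x in records[k * len(records) // n_units]]
               for k in range(n_units)]
    for epoch in range(epochs):
        # The neighborhood width shrinks over time, with a floor above zero.
        sigma = max(0.1, (n_units / 2.0) * (1.0 - epoch / epochs))
        sq_norms = [sum(w * w for w in unit) for unit in weights]
        sums = [[0.0] * n_dims for _ in range(n_units)]
        counts = [0] * n_units
        for sparse in data:
            k = best_matching_unit(sparse, weights, sq_norms)
            counts[k] += 1
            for i, x in sparse:  # accumulate non-zero components only
                sums[k][i] += x
        new_weights = []
        for k in range(n_units):
            h = [math.exp(-((j - k) ** 2) / (2.0 * sigma ** 2))
                 for j in range(n_units)]
            denom = sum(h[j] * counts[j] for j in range(n_units))
            if denom > 0:
                new_weights.append(
                    [sum(h[j] * sums[j][i] for j in range(n_units)) / denom
                     for i in range(n_dims)])
            else:
                new_weights.append(weights[k])  # no nearby data: keep weights
        weights = new_weights
    return weights

def assign_clusters(records, weights):
    """Cluster label of each record = index of its best matching unit."""
    sq_norms = [sum(w * w for w in unit) for unit in weights]
    return [best_matching_unit(compress(r), weights, sq_norms) for r in records]

records = [[1, 1, 0, 0, 0, 0], [2, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 2], [0, 0, 0, 0, 1, 1]]
weights = train_batch_som(records, n_units=2)
print(assign_clusters(records, weights))  # [0, 0, 1, 1]
```

In this sketch the per-unit `sums` and `counts` are plain additive totals, so in a partitioned setting each processor could in principle accumulate them over its own records and the totals could be combined before the weight update. That is an observation about the sketch, not a statement of how the invention distributes the work.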
8 Claims
1. A method of organizing data in a parallel database which is partitioned across computational processors of a parallel computer, said data being organized into a plurality of records, said method comprising:
a. representing each record as an n dimensional vector;
b. compressing each n dimensional vector by eliminating zeros in components of each said n dimensional vector for each of said records; and
c. applying a modified self-organized map algorithm to operate on the non-zero components of each vector for said compressed input records to group said records into a plurality of clusters, wherein each cluster comprises a plurality of said records having a set of common input parameters.
(Dependent claims: 2, 3)
4. A method of retrieving data from a parallel database which is partitioned across computational nodes of a parallel computer, said data being organized into a plurality of records, said method comprising:
a. representing each record as an n dimensional vector;
b. compressing each n dimensional vector by eliminating zeros in components of each said n-dimensional vector for each of said records;
c. applying a modified self-organizing map algorithm to operate on the non-zero components of each vector for said compressed input records to group said records into a plurality of clusters, wherein each cluster comprises a plurality of said records having a set of common input parameters; and
d. determining statistical measures of each record that is retrieved by using said input parameters associated with one of said clusters, said one cluster being the cluster from which said record was retrieved.
(Dependent claims: 5, 6)
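Step d of claim 4 does not say which statistical measures are meant. As one possible reading, the sketch below summarizes each cluster by the per-component mean and population standard deviation of its records, then scores a retrieved record against the cluster it came from. The function names `cluster_profiles` and `record_deviation`, the choice of z-scores, and the sample data are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def cluster_profiles(records, labels):
    """Per-cluster size, mean and population std dev of each component."""
    groups = {}
    for record, label in zip(records, labels):
        groups.setdefault(label, []).append(record)
    return {
        label: {
            "size": len(members),
            "mean": [mean(col) for col in zip(*members)],
            "std": [pstdev(col) for col in zip(*members)],
        }
        for label, members in groups.items()
    }

def record_deviation(record, profile):
    """Component-wise z-scores of a record against its own cluster;
    components with no spread inside the cluster score 0.0."""
    return [(x - m) / s if s > 0 else 0.0
            for x, m, s in zip(record, profile["mean"], profile["std"])]

# Hypothetical records and cluster labels (e.g. produced by a SOM).
records = [[1, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 2], [0, 0, 1, 1]]
labels = [0, 0, 1, 1]
profiles = cluster_profiles(records, labels)
print(record_deviation(records[1], profiles[labels[1]]))  # [1.0, 0.0, 0.0, 0.0]
```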
7. A program storage device readable by a machine, tangibly embodying a program of instructions executable by said machine to perform method steps in a parallel transaction database which is partitioned across computational processors of a parallel computer, said data being organized into a plurality of records, said method comprising the steps of:
a. representing each record as an n dimensional vector;
b. compressing each n dimensional vector by eliminating zeros in components of each said n dimensional vector for each of said records; and
c. applying a modified self-organized map algorithm to operate on the non-zero components of each vector for said compressed input records to group said records into a plurality of said nodes, wherein each group comprises a plurality of said records having a corresponding set of common input parameters.
(Dependent claim: 8)
Specification