Scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems

US 6,260,036 B1
Filed: 05/07/1998
Issued: 07/10/2001
Est. Priority Date: 05/07/1998
Status: Expired due to Fees

Abstract

A method and apparatus for retrieving and organizing data using a self-organizing map in a parallel transaction database system. With this invention, input records of data are represented as n dimensional vectors. Each of these vectors is then compressed by eliminating zeros in its components. Then, the self-organizing map algorithm is applied to the compressed input records to group the records into a number of clusters, where each cluster comprises a number of records having common input parameters.
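
The compression step described above can be illustrated with a minimal sketch. The storage layout (a mapping from component index to value) and the names `compress` and `decompress` are assumptions made for this example; the text does not specify how the non-zero components are stored.

```python
from typing import Dict, List, Sequence


def compress(record: Sequence[float]) -> Dict[int, float]:
    """Keep only the non-zero components, keyed by their position.

    Hypothetical layout: the source only says zeros are eliminated.
    """
    return {i: x for i, x in enumerate(record) if x != 0}


def decompress(sparse: Dict[int, float], n: int) -> List[float]:
    """Rebuild the full n dimensional vector from its non-zero components."""
    dense = [0.0] * n
    for i, x in sparse.items():
        dense[i] = x
    return dense


record = [0, 3.0, 0, 0, 1.5, 0]
sparse = compress(record)            # only two of six components are stored
restored = decompress(sparse, len(record))
```

For sparse records, where most components are zero, the compressed form holds far fewer entries than n, which is what lets later steps touch only the non-zero components.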

8 Claims

1. A method of organizing data in a parallel database which is partitioned across computational processors of a parallel computer, said data being organized into a plurality of records, said method comprising:
- a. representing each record as an n dimensional vector;
- b. compressing each n dimensional vector by eliminating zeros in components of each said n dimensional vector for each of said records; and
- c. applying a modified self-organized map algorithm to operate on the non-zero components of each vector for said compressed input records to group said records into a plurality of clusters, wherein each cluster comprises a plurality of said records having a set of common input parameters.
  - 2. A method as recited in claim 1, wherein said zeros are deleted each time each record is accessed.
  - 3. The method of claim 1 wherein said applying of a modified self-organizing map algorithm comprises invoking the equation $\overline{d}_{k}(t) = \sum_{x_i \neq 0} x_i(t)\,[x_i(t) - 2\,\omega_{ki}(t_0)] + \sum_{i=1}^{n} \omega_{ki}^{2}(t_0)$.
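
The equation recited in claim 3 appears to be the squared Euclidean distance between a record and the weights of node k, rearranged so that the first sum runs only over non-zero components. The short derivation below is a sketch under that reading; interpreting $x_i(t)$ as the i-th component of the input record and $\omega_{ki}(t_0)$ as the i-th weight of node k is an assumption, since the claims do not define the symbols.

$$
\begin{aligned}
\sum_{i=1}^{n} \bigl[x_i(t) - \omega_{ki}(t_0)\bigr]^2
&= \sum_{i=1}^{n} x_i(t)^2 - 2\sum_{i=1}^{n} x_i(t)\,\omega_{ki}(t_0) + \sum_{i=1}^{n} \omega_{ki}^{2}(t_0) \\
&= \sum_{x_i \neq 0} x_i(t)\,\bigl[x_i(t) - 2\,\omega_{ki}(t_0)\bigr] + \sum_{i=1}^{n} \omega_{ki}^{2}(t_0)
= \overline{d}_{k}(t).
\end{aligned}
$$

The second line follows because every term with $x_i(t) = 0$ drops out of the first two sums. The remaining sum over all n weights does not depend on the record, so under this reading it could be computed once per node and reused for every record.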

4. A method of retrieving data from a parallel database which is partitioned across computational nodes of a parallel computer, said data being organized into a plurality of records, said method comprising:
- a. representing each record as an n dimensional vector;
- b. compressing each n dimensional vector by eliminating zeros in components of each said n dimensional vector for each of said records;
- c. applying a modified self-organizing map algorithm to operate on the non-zero components of each vector for said compressed input records to group said records into a plurality of clusters, wherein each cluster comprises a plurality of said records having a set of common input parameters; and
- d. determining statistical measures of each record that is retrieved by using said input parameters associated with one of said clusters, said one cluster being the cluster from which said record was retrieved.
  - 5. A method as recited in claim 4, wherein said statistical measures are determined by comparison of input parameters of corresponding clusters.
  - 6. The method of claim 4 wherein said applying of a modified self-organizing map algorithm comprises invoking the equation $\overline{d}_{k}(t) = \sum_{x_i \neq 0} x_i(t)\,[x_i(t) - 2\,\omega_{ki}(t_0)] + \sum_{i=1}^{n} \omega_{ki}^{2}(t_0)$.
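
As a rough illustration of the equation recited in claims 3 and 6, the sketch below evaluates it over the non-zero components only and compares the result with a plain squared Euclidean distance over all n components. The function names, the dictionary layout for the compressed record, and the sample numbers are assumptions made for this example, not taken from the patent.

```python
from typing import Dict, Sequence


def sparse_distance(sparse_x: Dict[int, float],
                    w_k: Sequence[float],
                    w_k_sq_norm: float) -> float:
    """Sum x_i * (x_i - 2 * w_ki) over non-zero x_i, plus the sum of w_ki^2."""
    partial = sum(x * (x - 2.0 * w_k[i]) for i, x in sparse_x.items())
    return partial + w_k_sq_norm


def dense_distance(x: Sequence[float], w_k: Sequence[float]) -> float:
    """Reference value: squared Euclidean distance over all n components."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w_k))


x = [0.0, 3.0, 0.0, 0.0, 1.5, 0.0]      # hypothetical record
w = [0.5, 2.0, 0.0, 1.0, 1.0, 0.25]     # hypothetical weights of one node

sparse_x = {i: v for i, v in enumerate(x) if v != 0}
w_sq = sum(wi * wi for wi in w)          # record-independent, so reusable

d_sparse = sparse_distance(sparse_x, w, w_sq)
d_dense = dense_distance(x, w)
```

In this example the sparse form touches two components of the record instead of six, and both functions return the same value.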

7. A program storage device readable by a machine, tangibly embodying a program of instructions executable by said machine to perform method steps in a parallel transaction database which is partitioned across computational processors of a parallel computer, said data being organized into a plurality of records, said method comprising the steps of:
- a. representing each record as an n dimensional vector;
- b. compressing each n dimensional vector by eliminating zeros in components of each said n dimensional vector for each of said records; and
- c. applying a modified self-organized map algorithm to operate on the non-zero components of each vector for said compressed input records to group said records into a plurality of said nodes, wherein each group comprises a plurality of said records having a corresponding set of common input parameters.
  - 8. The device of claim 7 wherein said step of applying a modified self-organizing map algorithm comprises invoking the equation $\overline{d}_{k}(t) = \sum_{x_i \neq 0} x_i(t)\,[x_i(t) - 2\,\omega_{ki}(t_0)] + \sum_{i=1}^{n} \omega_{ki}^{2}(t_0)$.
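
The claims describe records partitioned across processors and grouped into clusters, but not how per-partition results are combined. The sketch below is one possible reading, offered only as an illustration: each partition assigns its compressed records to the closest node using the recited distance, and the per-partition counts are then merged. The sequential loop stands in for separate processors, the weight update of a self-organizing map is deliberately left out, and all names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

SparseRecord = Dict[int, float]   # assumed layout: component index -> non-zero value


def best_node(record: SparseRecord,
              weights: Sequence[Sequence[float]],
              sq_norms: Sequence[float]) -> int:
    """Index of the node whose weights minimize the recited distance."""
    def dist(k: int) -> float:
        w = weights[k]
        return sum(x * (x - 2.0 * w[i]) for i, x in record.items()) + sq_norms[k]
    return min(range(len(weights)), key=dist)


def assign_partition(partition: List[SparseRecord],
                     weights: Sequence[Sequence[float]],
                     sq_norms: Sequence[float]) -> Counter:
    """Count how many records of one partition fall into each node."""
    return Counter(best_node(r, weights, sq_norms) for r in partition)


weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]           # two hypothetical nodes
sq_norms = [sum(wi * wi for wi in w) for w in weights]  # computed once per node

partitions = [                                          # two hypothetical partitions
    [{0: 1.0}, {1: 1.0, 2: 1.0}],
    [{0: 2.0}, {2: 1.0}],
]

totals: Counter = Counter()
for part in partitions:       # each iteration could run on its own processor
    totals += assign_partition(part, weights, sq_norms)
```

Because each partition only needs the current weights and returns a small per-node summary, the amount of data exchanged in this sketch does not grow with the number of records in a partition.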

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Rushmeier, Holly Edith; Lawrence, Richard Douglas; Almasi, George S.
Primary Examiner(s): Amsbury, Wayne
Assistant Examiner(s): Pardo, Thuy N
Application Number: US09/074,619
Time in Patent Office: 1,160 Days
Field of Search: 704/10, 707/2, 707/5, 707/10, 707/103, 707/6, 707/7, 706/50
US Class Current: 707/688
CPC Class Codes:
G06F 16/285   Clustering or classification
G06F 18/2137   based on criteria of topolo...
Y10S 707/957   Multidimensional
Y10S 707/99932   Access augmentation or opti...
Y10S 707/99936   Pattern matching access
Y10S 707/99937   Sorting
