Method and system for data mining in high dimensional data spaces

US 7,567,972 B2
Filed: 02/26/2004
Issued: 07/28/2009
Est. Priority Date: 05/08/2003
Status: Expired due to Fees

Abstract

A computerized method and system for analyzing a multitude of items in a high dimensional (n-dimensional) data space D_n, each described by n item features. The method uses a mining function f with at least one control parameter P_i controlling the target of the data mining function. The method selects a transformation function T for reducing dimensions of the n-dimensional space by space-filling curves mapping said n-dimensional space to an m-dimensional space (m<n). The method determines a transformed control parameter P^T_i controlling the target of the data mining function in the m-dimensional space. The method applies the selected transformation function T on the multitude D_n of items to create a transformed multitude D_m of items, executes the mining function f controlled by the transformed control parameter P^T_i on the transformed multitude of items D_m, and stores the result.
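
To make the idea of a space-filling-curve transformation concrete, the following is a minimal sketch of the simplest case (n=2, m=1) using the textbook iterative construction of a Hilbert curve on a square grid. The record does not disclose a specific construction, so the function names `hilbert_index`, `hilbert_point`, and `_rotate`, the integer grid, and the power-of-two side length are all assumptions made for illustration.

```python
from typing import Tuple


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate or flip a quadrant so the sub-curve is oriented correctly."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its 1-D curve position."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_point(n: int, d: int) -> Tuple[int, int]:
    """Inverse mapping: recover the cell (x, y) from the curve position d."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Hypothetical 8x8 grid: each 2-D cell receives one 1-D key.
side = 8
keys = {(x, y): hilbert_index(side, x, y) for x in range(side) for y in range(side)}
```

Because `hilbert_point` inverts `hilbert_index`, no cell is lost in the reduction to one dimension, which is one way to read the requirement that the information in the original space be maintained. Consecutive keys also correspond to neighboring cells, which is why such curves are attractive for distance-based mining functions.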

11 Claims

1. A computerized data mining method performed by a processor that analyzes a multitude of items in an n-dimensional space D_n, each described by n item features, said method using a mining function f with at least one control parameter P_i controlling a target feature of the data mining function, said method comprising:
- a first step of selecting a transformation function T to reduce dimensions of said n-dimensional space by space-filling curves mapping said n-dimensional space to an m-dimensional space;
- a second step of determining a transformed control parameter P^T_i controlling the target feature of the data mining function in said m-dimensional space, wherein the m-dimensional space comprises fewer dimensions than the n-dimensional space and wherein the transformation function T ensures that all information within the n-dimensional space is mapped onto and maintained in the m-dimensional data space;
- a third step of applying said selected transformation function T on said multitude D_n of items to create a transformed multitude D_m of items and executing said mining function f controlled by said transformed control parameter P^T_i on said transformed multitude of items D_m; and
- a fourth step of storing a result of the third step in memory.
- - 2. The computerized data mining method according to claim 1, wherein said transformed control parameter P^T_i is determined additionally based on one or more of the following:
    - the transformation function T;
    - the multitude of items D_n;
    - the number of dimensions n;
    - the number of dimensions m;
    - other control parameters P_j.
  - 3. The computerized data mining method according to claim 1, wherein said second step of determining said transformed control parameter P^T_i comprises the following sub-steps:
    - choosing a set of sample items of said n-dimensional space;
    - applying said selected transformation function T on said set of sample items to create a transformed set of sample items; and
    - determining said transformed control parameter P^T_i such that an acceptable target of the data mining function in said m-dimensional space is achieved with respect to said transformed set of sample items.
  - 4. The computerized data mining method according to claim 3, wherein said set of sample items:
    - is created by selecting items from said multitude of items D_n; or
    - is created by creating artificial items within said n-dimensional space.
  - 5. The computerized data mining method according to claim 1, wherein responsive to a failure to determine a satisfactory transformed control parameter P^T_i in said second step, said method is iterated with said first step by selecting an alternative transformation function T_alt instead of said transformation function T.
  - 6. The computerized data mining method according to claim 5, wherein said alternative transformation function T_alt is being based on the same class of space-filling curves as said transformation function T, or wherein said alternative transformation function T_alt is being based on another class of space-filling curves than said transformation function T.
  - 7. The computerized data mining method according to claim 6, wherein said same class of space-filling curves is the class of Hilbert space-filling curves.
  - 8. The computerized data mining method according to claim 1, wherein in said second step said transformed control parameter P^T_i is determined for at least one comparable data mining problem using empirical data from previously generated data models.
  - 9. The computerized data mining method according to claim 1, wherein said mining function is solving a clustering problem and said control parameter P_i and transformed control parameter P^T_i is a minimal distance between clusters, or wherein said mining function is solving a classification problem and said control parameter P_i and transformed control parameter P^T_i is a maximum depth of a classification decision tree.
  - 10. The computerized data mining method according to claim 1, wherein said method comprises a fifth step of presenting data mining results determined within said m-dimensional space in terms of said n-dimensional space.
  - 11. The computerized data mining method according to claim 1, wherein said m-dimensional space is reduced to one dimension (m=1) only.
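
The steps recited in claims 1, 3, 9, 10 and 11 can be sketched end to end as follows. This is an illustrative reading only, not the patented implementation: the transformation T is stood in for by a Z-order (Morton) bit interleave, which is a different and simpler space-filling curve than the Hilbert class named in claim 7, the mining function f is a one-dimensional gap-based clustering, and the names `morton_key`, `cluster_1d`, `fit_min_gap`, the sample data, and the choice of "number of clusters found in the sample" as the acceptable target are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, ...]


def morton_key(point: Sequence[int], bits: int) -> int:
    """Assumed T: interleave the bits of n integer features into one integer (m=1)."""
    key = 0
    n = len(point)
    for b in range(bits):
        for dim, value in enumerate(point):
            key |= ((value >> b) & 1) << (b * n + dim)
    return key


def cluster_1d(keys: Sequence[int], min_gap: float) -> List[List[int]]:
    """Assumed f: start a new cluster when the gap to the previous sorted key exceeds min_gap."""
    clusters: List[List[int]] = []
    for k in sorted(keys):
        if clusters and k - clusters[-1][-1] <= min_gap:
            clusters[-1].append(k)
        else:
            clusters.append([k])
    return clusters


def fit_min_gap(sample_keys: Sequence[int], target_clusters: int) -> Optional[float]:
    """Pick a transformed parameter that yields target_clusters on the sample, or None."""
    s = sorted(sample_keys)
    gaps = sorted((b - a for a, b in zip(s, s[1:])), reverse=True)
    if target_clusters < 1 or target_clusters - 1 > len(gaps):
        return None
    if target_clusters == 1:
        return float(gaps[0]) if gaps else 0.0
    upper = gaps[target_clusters - 2]  # smallest gap that must split clusters
    lower = gaps[target_clusters - 1] if target_clusters - 1 < len(gaps) else 0
    if upper <= lower:
        return None  # no threshold separates the gaps; a caller could try another T
    return (upper + lower) / 2


BITS = 4  # assumed feature resolution: each feature is an integer in [0, 15]
items: List[Point] = [(1, 1), (1, 2), (2, 1), (12, 13), (13, 12), (13, 13)]
sample: List[Point] = [(1, 1), (2, 1), (12, 13), (13, 13)]

# Second step: determine the transformed parameter from transformed sample items.
min_gap_t = fit_min_gap([morton_key(p, BITS) for p in sample], target_clusters=2)
if min_gap_t is None:
    raise ValueError("no satisfactory parameter; an alternative transformation is needed")

# Third step: transform all items and run the mining function on the 1-D keys.
by_key: Dict[int, Point] = {morton_key(p, BITS): p for p in items}
key_clusters = cluster_1d(list(by_key), min_gap_t)

# Fourth and fifth steps: keep the result and express it in the original space.
clusters: List[List[Point]] = [[by_key[k] for k in c] for c in key_clusters]
```

A `None` return from `fit_min_gap` corresponds to the failure case of claim 5, where the method would loop back and select an alternative transformation. Z-order keys preserve locality less well than Hilbert keys, so a threshold fitted this way may be more sensitive to where cluster boundaries fall on the grid.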

Current Assignee: International Business Machines Corporation; Kaon Interactive, Inc.
Original Assignee: International Business Machines Corporation
Inventors: Lingenfelder, Christoph; Geiselhart, Reinhold; Orechkina, Janna
Primary Examiner(s): Wassum, Luke S.
Assistant Examiner(s): Hicks, Michael J
Application Number: US10/787,660
Publication Number: US 20040225638A1
Time in Patent Office: 1,979 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
G06F 16/30   of unstructured textual dat...
Y10S 707/99932   Access augmentation or opti...
Y10S 707/99942   Manipulating data structure...
