Partition-based high dimensional similarity join method

US 20040093320A1
Filed: 08/13/2003
Published: 05/13/2004
Est. Priority Date: 09/11/2002
Status: Active Grant

Abstract

A partition-based high dimensional similarity join method that allows similarity to be measured efficiently by dynamically selecting, in advance, the space partitioning dimensions and the number of partitioning dimensions using a dimension selection algorithm. The method performs similarity joins on high dimensional data in a relatively short time without requiring massive storage space. The method comprises the steps of partitioning a high dimensional data space and performing joins between predetermined data sets. The dimensions for use in partitioning the high dimensional data space and the number of partitioning dimensions are determined before the space partitioning, and the joins are performed only when respective cells of the data sets overlap or neighbor each other.
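The join step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes points are tuples of floats, that the partitioning dimensions `dims` have already been chosen, and that each chosen dimension is cut into cells of width `eps` (the similarity threshold), so that any pair within distance `eps` must fall in the same or adjacent cells. The names `cell_of`, `partition`, and `similarity_join` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(point, dims, eps):
    """Map a point to its grid cell along the chosen partitioning dimensions."""
    return tuple(int(point[d] // eps) for d in dims)


def partition(points, dims, eps):
    """Group points by cell; only the dimensions in `dims` are partitioned."""
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, dims, eps)].append(p)
    return cells


def similarity_join(R, S, dims, eps):
    """Return all pairs (r, s) with Euclidean distance <= eps.

    Distance computations are performed only between a cell of R and the
    cells of S that coincide with it or neighbor it (assumed cell width: eps).
    """
    r_cells = partition(R, dims, eps)
    s_cells = partition(S, dims, eps)
    offsets = list(product((-1, 0, 1), repeat=len(dims)))
    result = []
    for r_key, r_points in r_cells.items():
        for offset in offsets:
            s_key = tuple(k + o for k, o in zip(r_key, offset))
            s_points = s_cells.get(s_key)
            if not s_points:
                continue  # no overlapping or neighboring cell here, skip the join
            for r in r_points:
                for s in s_points:
                    if math.dist(r, s) <= eps:  # distance uses all dimensions
                        result.append((r, s))
    return result
```

Note that only the cell lookup is restricted to the partitioning dimensions; the final distance check still uses every dimension of the points, so the result matches a brute-force join while skipping cell pairs that cannot contain a match.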

8 Citations

9 Claims

1. A partition-based high dimensional similarity join method, comprising:
- determining dimensions for use in partitioning a high dimensional data space and the number of partitioning dimensions;
  
  partitioning the high dimensional data space in accordance with the determined dimensions and the number of partitioning dimensions; and
  
  performing joins between data sets according to the partitioned dimensions, wherein the joins are performed only when respective cells in the data sets are overlapping with each other or are neighboring each other.
- Dependent claims (2, 4, 5, 6)
  - 2. The method as claimed in claim 1, further comprising the operation of counting the number of join computations which can occur in the joins between the respective cells of the data sets.
  - 4. The method as claimed in claim 2, wherein the number of dimensions d_p used in partitioning the high dimensional data space is obtained by comparing the size of the data sets and the size of disk blocks in which the data sets are stored, according to the following equation:
  - 5. The method as claimed in claim 4, wherein the number of join computations is obtained by computing the number of entries of the data sets R and S included in the respective cells for respective dimensions and then counting the number of distance computations of joins between the cells for the respective dimensions.
  - 6. The method as claimed in claim 4, wherein the number of join computations is obtained by computing the number of entries of the data sets R and S included in sampled cells among the cells for the respective dimensions and then counting the number of distance computations of joins between the cells for the respective dimensions.

3. The method as claimed in claim 2, wherein the dimensions for use in partitioning the high dimensional data space are determined based on the number of join computations.
- Dependent claims (7, 8, 9)
  - 7. The method as claimed in claim 3, wherein the number of dimensions d_p used in partitioning the high dimensional data space is obtained by comparing the size of the data sets and the size of disk blocks in which the data sets are stored, according to the following equation:
  - 8. The method as claimed in claim 7, wherein the number of join computations is obtained by computing the number of entries of the data sets R and S included in the respective cells for respective dimensions and then counting the number of distance computations of joins between the cells for the respective dimensions.
  - 9. The method as claimed in claim 7, wherein the number of join computations is obtained by computing the number of entries of the data sets R and S included in sampled cells among the cells for the respective dimensions and then counting the number of distance computations of joins between the cells for the respective dimensions.
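The counting of join computations in claims 2, 5, 6, 8 and 9, and the dimension choice of claim 3, might be sketched as follows. This is an illustration under stated assumptions, not the claimed procedure: cells are assumed to have width `eps` along each dimension, the dimensions with the lowest estimated cost are assumed to be preferred, and the number of partitioning dimensions `num_dims` is passed in by the caller because the equation for d_p is not reproduced in this listing. The names `cell_counts`, `join_cost`, `select_dimensions`, `sample_rate`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def cell_counts(points, dim, eps):
    """Count how many points fall into each cell along one dimension."""
    return Counter(int(p[dim] // eps) for p in points)


def join_cost(R, S, dim, eps, sample_rate=1.0, seed=0):
    """Estimate the number of distance computations if `dim` were partitioned.

    Each cell of R is joined with the same cell of S and its two neighbors.
    With sample_rate < 1.0 only a random subset of R's cells is counted
    (an assumed reading of the sampled variant in claims 6 and 9).
    """
    r_counts = cell_counts(R, dim, eps)
    s_counts = cell_counts(S, dim, eps)
    cells = sorted(r_counts)
    if cells and sample_rate < 1.0:
        k = max(1, round(len(cells) * sample_rate))
        cells = random.Random(seed).sample(cells, k)
    cost = 0
    for c in cells:
        # Counter returns 0 for cells that hold no points of S.
        cost += r_counts[c] * (s_counts[c - 1] + s_counts[c] + s_counts[c + 1])
    return cost


def select_dimensions(R, S, eps, num_dims, sample_rate=1.0):
    """Pick the `num_dims` dimensions with the lowest estimated join cost."""
    costs = {d: join_cost(R, S, d, eps, sample_rate) for d in range(len(R[0]))}
    return sorted(costs, key=costs.get)[:num_dims]
```

In this sketch a dimension along which the data is spread evenly receives a low cost, because each cell meets few entries in its neighbors, while a dimension along which all points cluster in one cell receives a cost close to |R| × |S| and is avoided.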

Current Assignee
Samsung Electronics Co. Ltd. (Samsung Group), Paceco Corporation (Tsuneishi Holdings Company Limited)
Original Assignee
Samsung Electronics Co. Ltd. (Samsung Group)
Inventors
Shin, Hyoseop

Granted Patent

US 7,167,868 B2
US Class Current

1/1
CPC Class Codes

G06F 16/283   Multi-dimensional databases...

G06F 16/40   of multimedia data, e.g. sl...

Y10S 707/99942   Manipulating data structure...

Y10S 707/99943   Generating database or data...

Y10S 707/99945   Object-oriented database st...

Y10S 707/99953   Recoverability
