Partition-based high dimensional similarity join method
First Claim
1. A partition-based high dimensional similarity join method, comprising:
- determining dimensions for use in partitioning a high dimensional data space and the number of partitioning dimensions;
- partitioning the high dimensional data space in accordance with the determined dimensions and the number of partitioning dimensions; and
- performing joins between data sets according to the partitioned dimensions, wherein the joins are performed only when respective cells in the data sets are overlapping with each other or are neighboring each other.
Abstract
A partition-based high dimensional similarity join method allows similarity to be measured efficiently by dynamically selecting, in advance, the space partitioning dimensions and the number of partitioning dimensions using a dimension selection algorithm. The method performs similarity joins on high dimensional data in a relatively short time without requiring massive storage space. The method according to the present invention comprises the steps of partitioning a high dimensional data space and performing joins between predetermined data sets. The dimensions used to partition the high dimensional data space and the number of partitioning dimensions are determined before the space is partitioned, and the joins are performed only when the respective cells of the data sets overlap or neighbor each other.
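The partition-then-join idea described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a uniform grid whose cell width equals the similarity threshold `eps`, Euclidean distance as the similarity measure, and partitioning dimensions that have already been chosen. The names `cell_of`, `build_grid`, and `similarity_join` are hypothetical and do not come from the patent text.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(point, dims, eps):
    """Map a point to a grid cell using only the chosen partitioning dimensions."""
    return tuple(math.floor(point[d] / eps) for d in dims)


def build_grid(points, dims, eps):
    """Group points by cell; each cell has width eps along every partitioning dimension."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p, dims, eps)].append(p)
    return grid


def similarity_join(r, s, dims, eps):
    """Return all pairs (p, q), p in r and q in s, with Euclidean distance <= eps.

    Distances are computed only between a cell of r and the cells of s that
    are the same cell or a direct neighbor of it (assumed grid scheme).
    """
    grid_r = build_grid(r, dims, eps)
    grid_s = build_grid(s, dims, eps)
    # Offsets of -1, 0, +1 per partitioning dimension cover the cell itself
    # and all of its neighbors.
    offsets = list(product((-1, 0, 1), repeat=len(dims)))
    result = []
    for cell, pts_r in grid_r.items():
        for off in offsets:
            neighbor = tuple(c + o for c, o in zip(cell, off))
            pts_s = grid_s.get(neighbor)
            if not pts_s:
                continue  # no data in that cell, so no join is performed
            for p in pts_r:
                for q in pts_s:
                    # The full-dimensional distance is still checked here.
                    if math.dist(p, q) <= eps:
                        result.append((p, q))
    return result
```

Because the cell width equals `eps`, two points within `eps` of each other differ by at most one cell index along each partitioning dimension, so skipping non-neighboring cells does not drop any qualifying pair. Partitioning on only a few dimensions keeps the number of neighbor cells (3 to the power of the number of partitioning dimensions) manageable.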
9 Claims
1. A partition-based high dimensional similarity join method, comprising:
- determining dimensions for use in partitioning a high dimensional data space and the number of partitioning dimensions;
- partitioning the high dimensional data space in accordance with the determined dimensions and the number of partitioning dimensions; and
- performing joins between data sets according to the partitioned dimensions, wherein the joins are performed only when respective cells in the data sets are overlapping with each other or are neighboring each other.
3. The method as claimed in claim 2, wherein the dimensions for use in partitioning the high dimensional data space are determined based on the number of join computations.
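One way to read "determined based on the number of join computations" is sketched below. It is an illustration under assumptions, not the selection algorithm of the patent. For each candidate dimension it counts how many distance computations a one-dimensional partition with cell width `eps` would trigger, and it keeps the dimensions with the lowest counts. The names `estimated_join_cost` and `select_dimensions`, and the parameter `k` (the number of partitioning dimensions to keep), are hypothetical.

```python
import math
from collections import Counter


def estimated_join_cost(r, s, dim, eps):
    """Count the point pairs that would be compared if only `dim` were partitioned.

    Each cell of r is joined with the same cell of s and its two neighbors.
    """
    count_r = Counter(math.floor(p[dim] / eps) for p in r)
    count_s = Counter(math.floor(q[dim] / eps) for q in s)
    # Counter returns 0 for cells that hold no points of s.
    return sum(
        n_r * (count_s[c - 1] + count_s[c] + count_s[c + 1])
        for c, n_r in count_r.items()
    )


def select_dimensions(r, s, eps, k):
    """Return the k dimensions with the lowest estimated join cost."""
    n_dims = len(r[0])
    costs = {d: estimated_join_cost(r, s, d, eps) for d in range(n_dims)}
    # Ties are broken by dimension index because sorted() is stable.
    return sorted(costs, key=costs.get)[:k]
```

A dimension along which the data is spread evenly yields small cells and few comparisons, so it scores well. A dimension along which most points fall into the same cell prunes almost nothing and scores poorly.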
Specification