Method and system for performing proximity joins on high-dimensional data points in parallel
First Claim
1. A method for performing proximity join operations on high-dimensional data points in parallel in a multiprocessor system, the join operations being based on a similarity distance between any two data points, the method comprising the steps of:
- partitioning the data points among the processors;
creating an index structure for the data points of each processor, the index structure having a plurality of leaf nodes each corresponding to a subset of the data points;
assigning the join operations to the processors using the index structures; and
simultaneously redistributing and joining the data points in the processors in parallel based on a predetermined joining condition.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and system for performing spatial proximity joins on high-dimensional points representing data objects of a database in parallel in a multiprocessor system. The method comprises the steps of: partitioning the data points among the processors; creating index structures for the data points of the processors in parallel; assigning the join operations to the processors using the index structures; and simultaneously redistributing and joining the data points in the processors in parallel based on a predetermined joining condition. An efficient data structure, ε-K-D-B tree, is used to provide fast access to the high-dimensional points and to minimize system storage requirements. The invention achieves fast response time and requires minimum storage space by having structurally identical indices among the processors, assigning workload based on the join costs, and redistributing the data points among the processors while joining the data whenever possible.
-
Citations
49 Claims
-
1. A method for performing proximity join operations on high-dimensional data points in parallel in a multiprocessor system, the join operations being based on a similarity distance between any two data points, the method comprising the steps of:
-
partitioning the data points among the processors; creating an index structure for the data points of each processor, the index structure having a plurality of leaf nodes each corresponding to a subset of the data points; assigning the join operations to the processors using the index structures; and simultaneously redistributing and joining the data points in the processors in parallel based on a predetermined joining condition. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
-
-
18. A computer program product for use with a multiprocessor system for directing the system to perform proximity joins on high-dimensional data points in parallel in the processors, the computer program product comprising:
-
a computer readable medium; means, provided on the computer-readable medium, for directing the system to partition the data points among the processors; means, provided on the computer-readable medium, for directing the system to create an index structure for the data points in each processor, the index structure having a plurality of leaf nodes each corresponding to a subset of the data points; means, provided on the computer-readable medium, for directing the system to assign the join operations to the processors using the index structures; and means, provided on the computer-readable medium, for directing the system to simultaneously redistribute and join the data points in the processors in parallel, based on a predetermined joining condition. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34)
-
-
35. A database system for performing proximity joins on high-dimensional points in parallel in a multiprocessor computer, comprising:
-
means for partitioning the data points among the processors; means, in each processor, for creating an index structure for the data points of the processor, the index structure having a plurality of leaf nodes each corresponding to a subset of the data points; means for assigning the join operations to the processors using the index structures; and means, in each processor, for simultaneously redistributing and joining the data points in the processors in parallel based on a predetermined joining condition. - View Dependent Claims (36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49)
-
Specification