Apparatus and method for similarity searches using hyper-rectangle based multidimensional data segmentation
First Claim
1. An apparatus for hyper-rectangle based multidimensional data similarity searches, the multidimensional data being representable by a multidimensional data sequence, comprising:
- MBR generation means for segmenting a multidimensional data sequence to be partitioned into subsequences, and representing each subsequence by each Minimum Bounding Rectangle (MBR), such that sets of MBRs are generated from the multidimensional data sequence, and the MBR sets are stored in a database;
first sequence pruning means for pruning irrelevant data sequences using a distance Dmbr between MBRs extracted from an inputted query sequence and the MBR sets stored in the database in a multidimensional Euclidean space;
second sequence pruning means for pruning irrelevant data sequences using a normalized distance Dnorm between MBRs extracted from the query sequence and the MBR sets of data sequences remaining after the data sequences are pruned by the first sequence pruning means in a multidimensional Euclidean space; and
subsequence finding means for finding subsequences similar to the given query sequence by obtaining sets of points in MBRs involved in a calculation of the distance Dnorm from each sequence obtained using the distance Dnorm.
1 Assignment
0 Petitions
Accused Products
Abstract
Disclosed herein is an apparatus and method for similarity searches using hyper-rectangle based multidimensional data segmentation. The similarity search apparatus has MBR generation means, first sequence pruning means, second sequence pruning means, and subsequence finding means. The MBR generation means segments a multidimensional data sequence to be partitioned into subsequences, and represents each subsequence by each Minimum Bounding Rectangle (MBR), such that sets of MBRs are generated from the multidimensional data sequence, and the MBR sets are stored in a database. The first sequence pruning means prunes irrelevant data sequences using a distance Dmbr between MBRs extracted from an inputted query sequence and the MBR sets stored in the database in a multidimensional Euclidean space. The second sequence pruning means prunes irrelevant data sequences using a normalized distance Dnorm between MBRs extracted from the query sequence and the MBR sets of data sequences remaining after the data sequences are pruned in a multidimensional Euclidean space. The subsequence finding means detects subsequences similar to the given query sequence by obtaining sets of points contained in MBRs involved in a calculation of the distance Dnorm from each sequence obtained using the distance Dnorm.
-
Citations
18 Claims
-
1. An apparatus for hyper-rectangle based multidimensional data similarity searches, the multidimensional data being representable by a multidimensional data sequence, comprising:
-
MBR generation means for segmenting a multidimensional data sequence to be partitioned into subsequences, and representing each subsequence by each Minimum Bounding Rectangle (MBR), such that sets of MBRs are generated from the multidimensional data sequence, and the MBR sets are stored in a database;
first sequence pruning means for pruning irrelevant data sequences using a distance Dmbr between MBRs extracted from an inputted query sequence and the MBR sets stored in the database in a multidimensional Euclidean space;
second sequence pruning means for pruning irrelevant data sequences using a normalized distance Dnorm between MBRs extracted from the query sequence and the MBR sets of data sequences remaining after the data sequences are pruned by the first sequence pruning means in a multidimensional Euclidean space; and
subsequence finding means for finding subsequences similar to the given query sequence by obtaining sets of points in MBRs involved in a calculation of the distance Dnorm from each sequence obtained using the distance Dnorm. - View Dependent Claims (2, 3)
(a) the number of points contained in each MBR;
(b) distance Dmbr between the query MBR and a target MBR in the data sequence; and
(c) distance Dmbr between the query MBR and an MBR adjacent to the target MBR in the data sequence.
-
-
4. An apparatus for hyper-rectangle based multidimensional data similarity searches, comprising:
-
MBR generation means for segmenting a multidimensional data sequence to be partitioned into subsequences, and representing each subsequence by each Minimum Bounding Rectangle (MBR), such that sets of MBRs are generated from the multidimensional data sequence, and the MBR sets are stored in a database;
first sequence pruning means for pruning irrelevant data sequences using a distance Dmbr between MBRs extracted from an inputted query sequence and the MBR sets stored in the database in a multidimensional Euclidean space;
second sequence pruning means for pruning irrelevant data sequences using a normalized distance Dnorm between MBRs extracted from the query sequence and the MBR sets of data sequences remaining after the data sequences are pruned by the first sequence pruning means in a multidimensional Euclidean space; and
subsequence finding means for finding subsequences similar to the given query sequence by obtaining sets of points in MBRs involved in a calculation of the distance Dnorm from each sequence obtained using the distance Dnorm;
wherein the MBR generation means includes;
threshold calculation means for inputting a multidimensional sequence Si and the minimum number of points per segment minPts, and calculating bounding threshold values for a volume and an edge using a unit hyper-cube occupied by a single point in n-dimensional unit space, if points are uniformly distributed in a hyper-rectangle which is a minimum bounding rectangle containing all points in the sequence Si, segment generation means for initializing a segment set and an outlier set to empty sets and generating a current segment using a first point of the sequence Si, geometric condition determination means for determining whether a next point of the sequence Si satisfies a geometric condition using the bounding threshold values for the volume and the edge, segment merging means for merging the next point of the sequence Si into the current segment if geometric condition is satisfied, and segment updating means for including the current segment in the segment set and re-generating a new current segment using the next point of the sequence Si, if the geometric condition is not satisfied and the number of points contained in the current segment exceeds the minimum number of points per segment minPts. - View Dependent Claims (5)
-
-
6. An apparatus for hyper-rectangle based multidimensional data similarity searches, comprising:
-
MBR generation means for segmenting a multidimensional data sequence to be partitioned into subsequences, and representing each subsequence by each Minimum Bounding Rectangle (MBR), such that sets of MBRs are generated from the multidimensional data sequence, and the MBR sets are stored in a database;
first sequence pruning means for pruning irrelevant data sequences using a distance Dmbr between MBRs extracted from an inputted query sequence and the MBR sets stored in the database in a multidimensional Euclidean space;
second sequence pruning means for pruning irrelevant data sequences using a normalized distance Dnorm between MBRs extracted from the query sequence and the MBR sets of data sequences remaining after the data sequences are pruned by the first sequence pruning means in a multidimensional Euclidean space; and
subsequence finding means for finding subsequences similar to the given query sequence by obtaining sets of points in MBRs involved in a calculation of the distance Dnorm from each sequence obtained using the distance Dnorm;
wherein the MBR generation means includes;
threshold calculation means for inputting a multidimensional sequence Si and the minimum number of points per segment minPts, and calculating bounding threshold values for a volume and a semantic factor using a unit hyper-cube occupied by a single point in n-dimensional unit space, if points are uniformly distributed in a hyper-rectangle which is a minimum bounding rectangle containing all points in the sequence Si, segment generation means for initializing a segment set and an outlier set to empty sets and generating a current segment using a first point of the sequence Si, geometric and semantic condition determination means for determining whether a next point of the sequence Si satisfies geometric and semantic conditions using the bounding threshold values for the volume and the semantic factor, segment merging means for merging the next point of the sequence Si into the current segment if the geometric and semantic conditions are satisfied, and segment updating means for including the current segment in the segment set and re-generating a new current segment using the next point of the sequence Si, if the geometric and semantic conditions are not satisfied and the number of points contained in the current segment exceeds the minimum number of points per segment minPts. - View Dependent Claims (7)
-
-
8. An apparatus for hyper-rectangle based multidimensional data similarity searches, comprising:
-
MBR generation means for segmenting a multidimensional data sequence to be partitioned into subsequences, and representing each subsequence by each Minimum Bounding Rectangle (MBR), such that sets of MBRs are generated from the multidimensional data sequence, and the MBR sets are stored in a database;
first sequence pruning means for pruning irrelevant data sequences using a distance Dmbr between MBRs extracted from an inputted query sequence and the MBR sets stored in the database in a multidimensional Euclidean space;
second sequence pruning means for pruning irrelevant data sequences using a normalized distance Dnorm between MBRs extracted from the query sequence and the MBR sets of data sequences remaining after the data sequences are pruned by the first sequence pruning means in a multidimensional Euclidean space; and
subsequence finding means for finding subsequences similar to the given query sequence by obtaining sets of points in MBRs involved in a calculation of the distance Dnorm from each sequence obtained using the distance Dnorm;
wherein the MBR generation means includes;
threshold calculation means for inputting a multidimensional sequence Si and the minimum number of points per segment minPts, and calculating bounding threshold values for a volume, an edge and a semantic factor using a unit hyper-cube occupied by a single point in n-dimensional unit space, if points are uniformly distributed in a hyper-rectangle which is a minimum bounding rectangle containing all points in the sequence Si, segment generation means for initializing a segment set and an outlier set to empty sets and generating a current segment using a first point of the sequence Si, geometric and semantic condition determination means for determining whether a next point of the sequence Si satisfies geometric and semantic conditions using the bounding threshold values for the volume, the edge and the semantic factor, segment merging means for merging the next point of the sequence Si into the current segment if the geometric and semantic conditions are satisfied, and segment updating means for including the current segment in the segment set and re-generating a new current segment using the next point of the sequence Si, if the geometric and semantic conditions are not satisfied and the number of points contained in the current segment exceeds the minimum number of points per segment minPts. - View Dependent Claims (9)
-
-
10. A method for a hyper-rectangle based multidimensional data similarity searches, the multidimensional data being representable by a multidimensional data sequence, comprising the steps of:
-
segmenting a multidimensional data sequence to be partitioned into subsequences, and representing each subsequence by each Minimum Bounding Rectangle (MBR), such that sets of MBRs are generated from the multidimensional data sequence, and the MBR sets are stored in a database;
pruning irrelevant data sequences using a distance Dmbr between MBRs extracted from an inputted query sequence and the MBR sets stored in the database in a multidimensional Euclidean space;
pruning irrelevant data sequences using a normalized distance Dnorm between MBRs extracted from the query sequence and the MBR sets of data sequences remaining after the data sequences are pruned in a multidimensional Euclidean space; and
finding subsequences similar to the given query sequence by obtaining sets of points contained in MBRs involved in a calculation of the distance Dnorm from each sequence obtained using the distance Dnorm. - View Dependent Claims (11, 12)
(a) the number of points contained in each MBR;
(b) distance Dmbr between the query MBR and a target MBR in the data sequence; and
(c) distance Dmbr between the query MBR and an MBR adjacent to the target MBR in the data sequence.
-
-
13. A method for a hyper-rectangle based multidimensional data similarity searches, comprising the steps of:
-
segmenting a multidimensional data sequence to be partitioned into subsequences, and representing each subsequence by each Minimum Bounding Rectangle (MBR), such that sets of MBRs are generated from the multidimensional data sequence, and the MBR sets are stored in a database;
pruning irrelevant data sequences using a distance Dmbr between MBRs extracted from an inputted query sequence and the MBR sets stored in the database in a multidimensional Euclidean space;
pruning irrelevant data sequences using a normalized distance Dnorm between MBRs extracted from the query sequence and the MBR sets of data sequences remaining after the data sequences are pruned in a multidimensional Euclidean space; and
finding subsequences similar to the given query sequence by obtaining sets of points contained in MBRs involved in a calculation of the distance Dnorm from each sequence obtained using the distance Dnorm;
wherein the MBR generation step includes the steps of;
inputting a multidimensional sequence Si and the minimum number of points per segment minPts, and calculating bounding threshold values for a volume and an edge using a unit hyper-cube occupied by a single point in n-dimensional unit space, if points are uniformly distributed in a hyper-rectangle which is a minimum bounding rectangle containing all points in the sequence Si, initializing a segment set and an outlier set to empty sets and generating a current segment using a first point of the sequence Si, determining whether a next point of the sequence Si satisfies a geometric condition using the bounding threshold values for the volume and the edge, merging the next point of the sequence Si into the current segment if the geometric condition is satisfied, and including the current segment in the segment set and updating the segment set by re-generating a new current segment using the next point of the sequence Si, if the geometric condition is not satisfied and the number of points contained in the current segment exceeds the minimum number of points per segment minPts. - View Dependent Claims (14)
-
-
15. A method for a hyper-rectangle based multidimensional data similarity searches, comprising the steps of:
-
segmenting a multidimensional data sequence to be partitioned into subsequences, and representing each subsequence by each Minimum Bounding Rectangle (MBR), such that sets of MBRs are generated from the multidimensional data sequence, and the MBR sets are stored in a database;
pruning irrelevant data sequences using a distance Dmbr between MBRs extracted from an inputted query sequence and the MBR sets stored in the database in a multidimensional Euclidean space;
pruning irrelevant data sequences using a normalized distance Dnorm between MBRs extracted from the query sequence and the MBR sets of data sequences remaining after the data sequences are pruned in a multidimensional Euclidean space; and
finding subsequences similar to the given query sequence by obtaining sets of points contained in MBRs involved in a calculation of the distance Dnorm from each sequence obtained using the distance Dnorm;
wherein the MBR generation step includes the steps of;
inputting a multidimensional sequence Si and the minimum number of points per segment minPts and calculating bounding threshold values for a volume and a semantic factor using a unit hyper-cube occupied by a single point in n-dimensional unit space, if points are uniformly distributed in a hyper-rectangle which is a minimum bounding rectangle containing all points in the sequence Si, initializing a segment set and an outlier set to empty sets and generating a current segment using a first point of the sequence Si, determining whether a next point of the sequence Si satisfies geometric and semantic conditions using the bounding threshold values for the volume and the semantic factor, merging the next point of the sequence Si into the current segment if the geometric and semantic conditions are satisfied, and including the current segment in the segment set and updating the segment set by re-generating a new current segment using the next point of the sequence Si, if the geometric and semantic conditions are not satisfied and the number of points contained in the current segment exceeds the minimum number of points per segment minPts. - View Dependent Claims (16)
-
-
17. A method for a hyper-rectangle based multidimensional data similarity searches, comprising the steps of:
-
segmenting a multidimensional data sequence to be partitioned into subsequences, and representing each subsequence by each Minimum Bounding Rectangle (MBR), such that sets of MBRs are generated from the multidimensional data sequence, and the MBR sets are stored in a database;
pruning irrelevant data sequences using a distance Dmbr between MBRs extracted from an inputted query sequence and the MBR sets stored in the database in a multidimensional Euclidean space;
pruning irrelevant data sequences using a normalized distance Dnorm between MBRs extracted from the query sequence and the MBR sets of data sequences remaining after the data sequences are pruned in a multidimensional Euclidean space; and
finding subsequences similar to the given query sequence by obtaining sets of points contained in MBRs involved in a calculation of the distance Dnorm from each sequence obtained using the distance Dnorm;
wherein the MBR generation step includes the steps of;
inputting a multidimensional sequence Si and the minimum number of points per segment minPts, and calculating bounding threshold values for a volume, an edge and a semantic factor using a unit hyper-cube occupied by a single point in n-dimensional unit space, if points are uniformly distributed in a hyper-rectangle which is a minimum bounding rectangle containing all points in the sequence Si, initializing a segment set and an outlier set to empty sets and generating a current segment using a first point of the sequence Si, determining whether a next point of the sequence Si satisfies geometric and semantic conditions using the bounding threshold values for the volume, the edge and the semantic factor, merging the next point of the sequence Si into the current segment if the geometric and semantic conditions are satisfied, and including the current segment in the segment set and updating the segment set by re-generating a new current segment using the next point of the sequence Si, if the geometric and semantic conditions are not satisfied and the number of points contained in the current segment exceeds the minimum number of points per segment minPts. - View Dependent Claims (18)
-
Specification