Processing spatial joins using a mapreduce framework
First Claim
Patent Images
1. A method comprising:
- partitioning a spatial data domain into multiple portions of partitioned spatial data via a MapReduce framework based on a distribution of spatial data objects across multiple nodes of a cluster of machines;
defining at least one operation to be performed on each of the multiple portions of the partitioned spatial data domain based on one or more spatial predicates of a query, wherein;
said at least one operation is selected from a group consisting of (i) a project operation that determines a partition in which the start point of a given spatial data object resides, (ii) a split operation that determines all partitions that share at least one point of a given spatial data object, and (iii) a replication operation that determines all partitions that satisfy a given condition; and
said one or more spatial predicates are selected from a group consisting of (i) an overlap parameter that indicates that two or more portions of the spatial data each possess at least one identical value, (ii) a range parameter that indicates that any point in a first portion of the spatial data is within a given distance of any point in a second portion of the spatial data, and (iii) a nearest neighbor parameter that indicates that a first portion of the spatial data is nearer to a second portion of the spatial data than any other portion of the spatial data; and
executing the at least one defined operation on each of the multiple portions of the partitioned spatial data domain to determine a response to the query, wherein each of the multiple portions of the partitioned spatial data is processed exclusively by a distinct map task within the MapReduce framework;
wherein said partitioning, said defining, and said executing are carried out by a computer device.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques, systems, and articles of manufacture for processing spatial joins using a MapReduce framework. A method includes partitioning a spatial data domain based on a distribution of spatial data objects across multiple nodes of a cluster of machines, defining at least one operation to be performed on the partitioned spatial data domain based on one or more predicates of a query, and executing the at least one defined operation on the partitioned spatial data domain to determine a response to the query.
-
Citations
12 Claims
-
1. A method comprising:
-
partitioning a spatial data domain into multiple portions of partitioned spatial data via a MapReduce framework based on a distribution of spatial data objects across multiple nodes of a cluster of machines; defining at least one operation to be performed on each of the multiple portions of the partitioned spatial data domain based on one or more spatial predicates of a query, wherein; said at least one operation is selected from a group consisting of (i) a project operation that determines a partition in which the start point of a given spatial data object resides, (ii) a split operation that determines all partitions that share at least one point of a given spatial data object, and (iii) a replication operation that determines all partitions that satisfy a given condition; and said one or more spatial predicates are selected from a group consisting of (i) an overlap parameter that indicates that two or more portions of the spatial data each possess at least one identical value, (ii) a range parameter that indicates that any point in a first portion of the spatial data is within a given distance of any point in a second portion of the spatial data, and (iii) a nearest neighbor parameter that indicates that a first portion of the spatial data is nearer to a second portion of the spatial data than any other portion of the spatial data; and executing the at least one defined operation on each of the multiple portions of the partitioned spatial data domain to determine a response to the query, wherein each of the multiple portions of the partitioned spatial data is processed exclusively by a distinct map task within the MapReduce framework; wherein said partitioning, said defining, and said executing are carried out by a computer device. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. An article of manufacture comprising a computer readable storage medium having computer readable instructions tangibly embodied thereon which, when implemented, cause a computer to carry out a plurality of method steps comprising:
-
partitioning a spatial data domain into multiple portions of partitioned spatial data via a MapReduce framework based on a distribution of spatial data objects across multiple nodes of a cluster of machines; defining at least one operation to be performed on each of the multiple portions of the partitioned spatial data domain based on one or more spatial predicates of a query, wherein; said at least one operation is selected from a group consisting of (i) a project operation that determines a partition in which the start point of a given spatial data object resides, (ii) a split operation that determines all partitions that share at least one point of a given spatial data object, and (iii) a replication operation that determines all partitions that satisfy a given condition; and said one or more spatial predicates are selected from a group consisting of (i) an overlap parameter that indicates that two or more portions of the spatial data each possess at least one identical value, (ii) a range parameter that indicates that any point in a first portion of the spatial data is within a given distance of any point in a second portion of the spatial data, and (iii) a nearest neighbor parameter that indicates that a first portion of the spatial data is nearer to a second portion of the spatial data than any other portion of the spatial data; and executing the at least one defined operation on each of the multiple portions of the partitioned spatial data domain to determine a response to the query, wherein each of the multiple portions of the partitioned spatial data is processed exclusively by a distinct map task within the MapReduce framework. - View Dependent Claims (10)
-
-
11. A system comprising:
-
a memory; and at least one processor coupled to the memory and operative for; partitioning a spatial data domain into multiple portions of partitioned spatial data via a MapReduce framework based on a distribution of spatial data objects across multiple nodes of a cluster of machines; defining at least one operation to be performed on each of the multiple portions of the partitioned spatial data domain based on one or more spatial predicates of a query, wherein; said at least one operation is selected from a group consisting of (i) a project operation that determines a partition in which the start point of a given spatial data object resides, (ii) a split operation that determines all partitions that share at least one point of a given spatial data object, and (iii) a replication operation that determines all partitions that satisfy a given condition; and said one or more spatial predicates are selected from a group consisting of (i) an overlap parameter that indicates that two or more portions of the spatial data each possess at least one identical value, (ii) a range parameter that indicates that any point in a first portion of the spatial data is within a given distance of any point in a second portion of the spatial data, and (iii) a nearest neighbor parameter that indicates that a first portion of the spatial data is nearer to a second portion of the spatial data than any other portion of the spatial data; and executing the at least one defined operation on each of the multiple portions of the partitioned spatial data domain to determine a response to the query, wherein each of the multiple portions of the partitioned spatial data is processed exclusively by a distinct map task within the MapReduce framework.
-
-
12. A method comprising:
-
partitioning a spatial data domain into multiple portions of partitioned spatial data via a MapReduce framework based on a distribution of spatial data objects across multiple nodes of a cluster of machines; implementing a bounding around each spatial data object in the partitioned spatial data domain; processing one or more spatial predicates of a query against each bounding in the partitioned spatial data domain to one or more of the one or more spatial predicates that are applicable to the spatial data objects; defining at least one operation to be performed on each of the multiple portions of the partitioned spatial data domain based on one or more spatial predicates of a query, wherein; said at least one operation is selected from a group consisting of (i) a project operation that determines a partition in which the start point of a given spatial data object resides, (ii) a split operation that determines all partitions that share at least one point of a given spatial data object, and (iii) a replication operation that determines all partitions that satisfy a given condition; and said one or more spatial predicates are selected from a group consisting of (i) an overlap parameter that indicates that two or more portions of the spatial data each possess at least one identical value, (ii) a range parameter that indicates that any point in a first portion of the spatial data is within a given distance of any point in a second portion of the spatial data, and (iii) a nearest neighbor parameter that indicates that a first portion of the spatial data is nearer to a second portion of the spatial data than any other portion of the spatial data executing the at least one defined operation on each of the multiple portions of the partitioned spatial data domain to determine a response to the query, wherein each of the multiple portions of the partitioned spatial data is processed exclusively by a distinct map task within the MapReduce framework; wherein said partitioning, said implementing, said processing, said defining, and said executing are carried out by a computer device.
-
Specification