Efficient distributed query execution
Abstract
An embodiment of the invention provides a method wherein a database query including a first constraint and one or more additional constraints is received in a first node. Data in the first node that satisfies the first constraint is identified, encoded, and sent to a second node. Encoded data that is in a mapping table in the second node is identified, and one or more missing identifiers are identified, each including encoded data that is not in the mapping table. The missing identifier is sent to the first node, decoded to retrieve the value of the missing identifier, and mapped to the retrieved value. The mapping of the missing identifier to the retrieved value is sent to the second node. A dictionary in the second node is queried with the retrieved value to identify an identification number for the retrieved value. The missing identifier is then mapped to the identification number for the retrieved value.
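The exchange described in the abstract can be illustrated with a minimal sketch. It assumes that each node keeps a local dictionary (value to identification number) and that the second node keeps a mapping table from the first node's identifiers to its own. The `Node` class, its method names, and the sample values are hypothetical and are not taken from the patent.

```python
class Node:
    """Hypothetical node with a local dictionary and a mapping table."""

    def __init__(self, values):
        # Local dictionary: value -> identification number.
        self.dictionary = {v: i for i, v in enumerate(values)}
        # Reverse lookup used for decoding: identification number -> value.
        self.reverse = {i: v for v, i in self.dictionary.items()}
        # Mapping table: remote identifier -> local identification number.
        self.mapping_table = {}

    def encode(self, values):
        """Encode values as this node's identifiers."""
        return [self.dictionary[v] for v in values]

    def decode(self, ids):
        """Map each of this node's identifiers back to its value."""
        return {i: self.reverse[i] for i in ids}

    def find_missing(self, remote_ids):
        """Return the remote identifiers that are not in the mapping table."""
        return [i for i in remote_ids if i not in self.mapping_table]

    def learn(self, id_to_value):
        """Query the local dictionary with each value and record the mapping."""
        for remote_id, value in id_to_value.items():
            # Assumption: unknown values get the next free local number.
            local_id = self.dictionary.setdefault(value, len(self.dictionary))
            self.reverse[local_id] = value
            self.mapping_table[remote_id] = local_id


first = Node(["alice", "bob", "carol"])
second = Node(["carol", "alice"])
second.mapping_table = {0: 1}  # second already maps first's 0 ("alice") to its own 1

encoded = first.encode(["alice", "carol"])  # data satisfying the first constraint
missing = second.find_missing(encoded)      # identifiers second cannot resolve
second.learn(first.decode(missing))         # first decodes, second records the mapping
```

Only the identifiers that the second node cannot resolve make the round trip back to the first node; identifiers already in the mapping table are never decoded or resent.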
10 Claims
1. A method for query execution in multiple nodes of a distributed database system, said method comprising:
receiving a database query in a first node of the distributed database system, the database query including a first constraint and at least one additional constraint;
identifying data in the first node that satisfies the first constraint with a first processor;
encoding the data with an encoder to generate encoded data;
sending the encoded data to a second node of the distributed database system with a first communications device, wherein each node of the nodes includes a data field, a predicate field, and an object field;
encoding a predicate in the second node to generate an encoded predicate; and
encoding an object in the second node to generate an encoded object;
identifying at least one encoded data of the encoded data that is in a mapping table in the second node with a second processor;
identifying at least one missing identifier with the second processor, the at least one missing identifier including at least one encoded data of the encoded data that is not in the mapping table in the second node;
sending the missing identifier to the first node with a second communications device;
decoding the missing identifier to retrieve the value of the missing identifier;
mapping the missing identifier to the retrieved value;
sending the mapping of the missing identifier and the retrieved value to the second node with the first communications device;
querying a dictionary in the second node with the retrieved value to identify an identification number for the retrieved value;
rewriting the database query to include:
  the identification number for the retrieved value; and
  at least one identification number for the at least one encoded data of the encoded data that is in the mapping table in the second node; and
mapping the missing identifier to the identification number for the retrieved value.
(Dependent claims: 2, 3, 4)

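The query-rewriting step of claim 1 can be sketched as follows. This is only an illustration under stated assumptions: the query is modelled as a plain dictionary, the `rewrite_query` helper and the `subject_ids` and `predicate` keys are hypothetical, and every identifier is assumed to be present in the mapping table already (that is, any missing identifiers have been resolved first).

```python
def rewrite_query(query, encoded_data, mapping_table):
    """Return a copy of `query` that uses the second node's own identification numbers.

    `mapping_table` maps the first node's identifiers to the second node's
    identification numbers. A KeyError here would mean an identifier is still missing.
    """
    local_ids = sorted(mapping_table[remote_id] for remote_id in encoded_data)
    return {**query, "subject_ids": local_ids}


# Hypothetical state after the missing identifier 2 has been resolved to local number 0.
mapping_table = {0: 1, 2: 0}
query = {"predicate": "worksFor", "subject_ids": [0, 2]}  # still in the first node's encoding

rewritten = rewrite_query(query, [0, 2], mapping_table)
```

After the rewrite, the second node can evaluate the remaining constraints against its own dictionary without decoding any values.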
5. A method for improving the efficiency of distributed query execution in multiple nodes of a federated database system, said method comprising:
receiving a database query in a first node of the federated database system, the database query including a set of constraints;
identifying a set of bindings with a first processor in the first node that satisfies a subset of the set of constraints, wherein each binding in the set of bindings includes a variable and a value;
encoding the values in the set of bindings with an encoder to generate encoded bindings;
sending bindings to a second node of the federated database system with a first communications device, said sending of the bindings to the second node including only sending the encoded bindings;
identifying a second set of bindings in the second node that satisfies a second subset of the set of constraints;
identifying at least one missing encoded binding of the encoded bindings that is not in the dictionary in the second node with a second processor;
sending a request for the missing encoded binding with a second communications device, the request being sent to the first node;
repeating previous process for every other node in the federated database system until all of the set of constraints are satisfied;
retrieving a value corresponding to the missing encoded binding from a mapping table in the first node;
retrieving encoded values from the mapping table in the first node;
sending the remaining constraints to satisfy the second subset of the set of constraints to the second node, wherein the second subset of the set of constraints is the difference between the set of constraints and the subset of the set of constraints;
sending the retrieved value to the second node with the first communications device; and
updating the dictionary in the second node with the retrieved value and mapping the retrieved value to the missing encoded binding with the second processor.
(Dependent claims: 6, 7, 8, 9)

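The binding and constraint handling in claim 5 can be sketched in a few lines. The sketch assumes that constraints are represented as strings, that the first node's mapping table assigns consecutive integer codes to values, and that the second node's dictionary maps codes to values. The `Binding` class, the `encode_bindings` helper, and all sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    variable: str
    value: str


def encode_bindings(bindings, mapping_table):
    """Replace each binding's value with a code, extending the table as needed."""
    encoded = []
    for b in bindings:
        code = mapping_table.setdefault(b.value, len(mapping_table))
        encoded.append((b.variable, code))
    return encoded


# First node: satisfy a subset of the constraints and encode the resulting bindings.
constraints = {"?x type Person", "?x worksFor ?y"}
satisfied_in_first = {"?x type Person"}
remaining = constraints - satisfied_in_first  # the second subset sent to the second node

first_mapping_table = {}  # value -> code
bindings = [Binding("?x", "alice"), Binding("?x", "bob")]
encoded = encode_bindings(bindings, first_mapping_table)  # only these are sent

# Second node: find encoded bindings that its dictionary (code -> value) cannot resolve.
second_dictionary = {0: "alice"}
missing = [code for _, code in encoded if code not in second_dictionary]

# First node answers the request from its mapping table; second node updates its dictionary.
code_to_value = {code: value for value, code in first_mapping_table.items()}
for code in missing:
    second_dictionary[code] = code_to_value[code]
```

Sending only the codes keeps the transferred bindings compact, and values are shipped just once, on the first request for a code the receiving node has not seen.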
10. A computer program product for improving the efficiency of distributed query execution in multiple nodes of a federated database system, said computer program product comprising:
a non-transitory computer readable storage medium having stored thereon:
first program instructions executable by a device to cause the device to receive a database query in a first node of the federated database system, the database query including a set of constraints;
second program instructions executable by the device to cause the device to identify a set of bindings in the first node that satisfies a subset of the set of constraints, wherein each binding in the set of bindings includes a variable and a value;
third program instructions executable by the device to cause the device to encode the values in the set of bindings to generate encoded bindings;
fourth program instructions executable by the device to cause the device to send bindings to a second node of the federated database system, wherein said fourth program instructions only sends the encoded bindings;
fifth program instructions executable by the device to cause the device to identify at least one missing encoded binding of the encoded bindings that is not in the dictionary in the second node;
sixth program instructions executable by the device to cause the device to send a request for the missing encoded binding to the first node;
seventh program instructions executable by the device to cause the device to retrieve a value corresponding to the missing encoded binding from a mapping table in the first node;
eighth program instructions executable by the device to cause the device to send the retrieved value to the second node;
ninth program instructions executable by the device to cause the device to update the dictionary in the second node with the retrieved value and mapping the retrieved value to the missing encoded binding;
tenth program instructions executable by the device to cause the device to identify a second set of bindings in the second node that satisfies a second subset of the set of constraints;
eleventh program instructions executable by the device to cause the device to repeat previous process for every other node in the federated database system until all of the set of constraints are satisfied;
twelfth program instructions executable by the device to cause the device to retrieve encoded values from the mapping table in the first node; and
thirteenth program instructions executable by the device to cause the device to send the remaining constraints to satisfy the second subset of the set of constraints to the second node, wherein the second subset of the set of constraints is the difference between the set of constraints and the subset of the set of constraints.

Specification