ADAPTIVE RESOLUTION HISTOGRAM
First Claim
1. A method comprising:
- receiving a plurality of keys of a first data type having N elements of a second data type;
creating a trie and adding a root node, wherein nodes added to the trie are initialized to a value of zero;
defining a traverse procedure to process an input key at an input node of the trie, wherein the traverse procedure increments a value of one or more nodes, traversed according to the input key, until reaching a leaf node comprising at least one of a bottom node at a maximum depth corresponding to N and a halt node with a value not exceeding a predetermined threshold value, and wherein a new child node is added to the trie only when the value of a particular node is incremented beyond the predetermined threshold value; and
executing the traverse procedure to process each of the plurality of keys at the root node of the trie;
wherein the method is performed by one or more computing devices.
1 Assignment
0 Petitions
Accused Products
Abstract
A method, apparatus, and system for determining a data distribution is provided by using an adaptive resolution histogram. In an embodiment, the adaptive resolution histogram is created using a trie, wherein node values in the trie represent frequency distributions and node positions define associated keys or key prefixes. Keys are derived from input data such as database records that are streamed from a record source. These keys may be processed as received to build the trie in parallel with the production of the input data. To provide adaptive resolution, new child nodes may only be created in the trie when a node value is incremented beyond a predetermined threshold. In this manner, the histogram adjusts the allocation of nodes according to the actual distribution of the data. The completed adaptive resolution histogram may be used for various tasks such as partitioning for balanced parallel processing of the input data.
-
Citations
20 Claims
-
1. A method comprising:
-
receiving a plurality of keys of a first data type having N elements of a second data type; creating a trie and adding a root node, wherein nodes added to the trie are initialized to a value of zero; defining a traverse procedure to process an input key at an input node of the trie, wherein the traverse procedure increments a value of one or more nodes, traversed according to the input key, until reaching a leaf node comprising at least one of a bottom node at a maximum depth corresponding to N and a halt node with a value not exceeding a predetermined threshold value, and wherein a new child node is added to the trie only when the value of a particular node is incremented beyond the predetermined threshold value; and executing the traverse procedure to process each of the plurality of keys at the root node of the trie; wherein the method is performed by one or more computing devices. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A non-transitory computer-readable medium storing one or more sequences of instructions which, when executed by one or more processors, cause performing of:
-
receiving a plurality of keys of a first data type having N elements of a second data type; creating a trie and adding a root node, wherein nodes added to the trie are initialized to a value of zero; defining a traverse procedure to process an input key at an input node of the trie, wherein the traverse procedure increments a value of one or more nodes, traversed according to the input key, until reaching a leaf node comprising at least one of a bottom node at a maximum depth corresponding to N and a halt node with a value not exceeding a predetermined threshold value, and wherein a new child node is added to the trie only when the value of a particular node is incremented beyond the predetermined threshold value; and executing the traverse procedure to process each of the plurality of keys at the root node of the trie. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A system comprising one or more computing devices configured to:
-
receive a plurality of keys of a first data type having N elements of a second data type; create a trie and add a root node, wherein nodes added to the trie are initialized to a value of zero; define a traverse procedure to process an input key at an input node of the trie, wherein the traverse procedure increments a value of one or more nodes, traversed according to the input key, until reaching a leaf node comprising at least one of a bottom node at a maximum depth corresponding to N and a halt node with a value not exceeding a predetermined threshold value, and wherein a new child node is added to the trie only when the value of a particular node is incremented beyond the predetermined threshold value; and execute the traverse procedure to process each of the plurality of keys at the root node of the trie. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification