Fast vector quantization with topology learning
First Claim
Patent Images
1. A method for analyzing data comprising:
- receiving data at a data processing system;
partitioning the data and generating a tree based on the partitions;
learning a topology of a distribution of the data using the generated tree to build a graph of the topology by linking a subtree based on a baseline graph and creating long-range links between nodes of the subtree wherein the long-range links are created by;
for each pair of nodes (u1, u2) connected by a link in the baseline graph and for each leaf s1 in the subtree rooted in u1, finding a closest leaf node s2 in the subtree rooted in u2;
creating a link between s1 and s2, if 1/dist(s1, s2) is greater than a smallest weight amongst links containing either s1 or s2; and
keeping the link with the smallest weight, if s2 was already linked to a node in the subtree rooted at u1; and
finding a best matching unit in the data using the learned topology.
1 Assignment
0 Petitions
Accused Products
Abstract
A new process called a vector approximation graph (VA-graph) leverages a tree based vector quantizer to quickly learn the topological structure of the data. It then uses the learned topology to enhance the performance of the vector quantizer. A method for analyzing data comprises receiving data, partitioning the data and generating a tree based on the partitions, learning a topology of a distribution of the data, and finding a best matching unit in the data using the learned topology.
13 Citations
36 Claims
-
1. A method for analyzing data comprising:
-
receiving data at a data processing system; partitioning the data and generating a tree based on the partitions; learning a topology of a distribution of the data using the generated tree to build a graph of the topology by linking a subtree based on a baseline graph and creating long-range links between nodes of the subtree wherein the long-range links are created by; for each pair of nodes (u1, u2) connected by a link in the baseline graph and for each leaf s1 in the subtree rooted in u1, finding a closest leaf node s2 in the subtree rooted in u2; creating a link between s1 and s2, if 1/dist(s1, s2) is greater than a smallest weight amongst links containing either s1 or s2; and keeping the link with the smallest weight, if s2 was already linked to a node in the subtree rooted at u1; and finding a best matching unit in the data using the learned topology. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A system for analyzing data comprising:
-
a processor operable to execute computer program instructions; a memory operable to store computer program instructions executable by the processor; and computer program instructions stored in the memory and executable to perform the steps of; receiving data; partitioning the data and generating a tree based on the partitions; learning a topology of a distribution of the data using the generated tree to build a graph of the topology by linking a subtree based on a baseline graph and creating long-range links between nodes of the subtree wherein the long-range links are created by; for each pair of nodes (u1, u2) connected by a link in the baseline graph and for each leaf s1 in the subtree rooted in u1, finding a closest leaf node s2 in the subtree rooted in u2; creating a link between s1 and s2, if 1/dist(s1, s2) is greater than a smallest weight amongst links containing either s1 or s2; and keeping the link with the smallest weight, if s2 was already linked to a node in the subtree rooted at u1; and finding a best matching unit in the data using the learned topology. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
applying Optimally Topology Preserving Maps to nodes quantized by the tree structure quantizer to construct a baseline graph.
-
-
20. The system of claim 19, wherein the level of quantization in a tree structure quantizer is identified by:
selecting all nodes in the tree for which Cj<
n and Cd(j)≧
n, wherein Cj is a number of inputs assigned to node j, d(j) is one index of two children of node j, and n is a user defined parameter.
-
21. The system of claim 13, wherein the subtree is linked based on the baseline graph by:
-
generating at least one random vector for each node in the baseline graph; combining the generated random vectors for each node with centroid values of leaf nodes in the subtree to form a combined set; finding and linking leaf nodes in the subtree for each row in the combined set; and assigning a weight to each link.
-
-
22. The system of claim 21, wherein components of each random vector are between a minimum and a maximum of values of components of the leaf nodes in the subtree rooted at a respective baseline graph node.
-
23. The system of claim 21, wherein the weight assigned to each link is 1/dist(s1, s2), wherein dist(s1, s2) is a distance function.
-
24. The system of claim 23, wherein the distance function is a Euclidean metric.
-
25. A computer program product for analyzing data comprising:
-
a non-transitory computer readable storage medium; computer program instructions, recorded on the computer readable medium, executable by a processor, for performing the steps of receiving data; partitioning the data and generating a tree based on the partitions; learning a topology of a distribution of the data using the generated tree to build a graph of the topology by linking a subtree based on a baseline graph and creating long-range links between nodes of the subtree wherein the long-range links are created by; for each pair of nodes (u1, u2) connected by a link in the baseline graph and for each leaf s1 in the subtree rooted in u1, finding a closest leaf node s2 in the subtree rooted in u2; creating a link between s1 and s2, if 1/dist(s1, s2) is greater than a smallest weight amongst links containing either s1 or s2; and keeping the link with the smallest weight, if s2 was already linked to a node in the subtree rooted at u1; and finding a best matching unit in the data using the learned topology. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
-
Specification