Efficient near neighbor search (ENN-search) method for high dimensional data sets with noise
First Claim
1. A method of processing data vectors by locating near neighbors, said method comprising the steps of:
- inputting data vectors;
generating a set of exemplar vectors;
constructing a sorted set of exemplar vectors organized according to a search criterion;
comparing one of the inputted data vectors to at least one exemplar vector of the sorted set of exemplar vectors using a matching criterion to find a first match;
when a first match is found, determining a probability value based on the probability that a better match exists in the sorted set of exemplar vectors; and
comparing the data vector to an additional exemplar vector if the probability value determined is greater than a predetermined probability value.
1 Assignment
0 Petitions
Accused Products
Abstract
A nearer neighbor matching and compression method and apparatus provide matching of data vectors to exemplar vectors. A data vector is compared to exemplar vectors contained within a subset of exemplar vectors, i.e., a set of possible exemplar vectors, to find a match. After a match is found, a probability function assigns a probability value based on the probability that a better matching exemplar vector exists. If the probability that a better match exists is greater than a predetermined probability value, the data vector is compared to an additional exemplar vector. If a match is not found, the data vector is added to the set of exemplar vectors. Data compression may be achieved in a hyperspectral image data vector set by replacing each observed data vector representing a respective spatial pixel by reference to a member of the exemplar set that “matches” the data vector. As such, each spatial pixel will be assigned to one of the exemplar vectors
-
Citations
26 Claims
-
1. A method of processing data vectors by locating near neighbors, said method comprising the steps of:
-
inputting data vectors;
generating a set of exemplar vectors;
constructing a sorted set of exemplar vectors organized according to a search criterion;
comparing one of the inputted data vectors to at least one exemplar vector of the sorted set of exemplar vectors using a matching criterion to find a first match;
when a first match is found, determining a probability value based on the probability that a better match exists in the sorted set of exemplar vectors; and
comparing the data vector to an additional exemplar vector if the probability value determined is greater than a predetermined probability value. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A method of finding exemplar vectors which match input data vectors, by locating near neighbors within a set of sorted exemplar vectors organized according to a sort criterion, said method comprising:
-
selecting a decision boundary which provides a region for locating a better match based on a probability density function;
defining a possibility zone made up of exemplar vectors having reference vector projection magnitudes differing from a magnitude of a reference vector projection of the data vector by a predetermined amount;
selecting a starting exemplar vector from the set of sorted exemplar vector as an exemplar vector having a corresponding reference vector projection magnitude most similar to the magnitude of a reference vector projection corresponding to the selected data vector;
searching the sorted set of exemplar vectors for a first match using a predetermined matching criterion;
determining a probability value as to whether a better match exists in the possibility zone; and
searching the sorted set of exemplar vectors for a better match if the probability value so determined is greater than a predetermined probability value. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. A method of processing data vectors associated with respective spatial pixels by locating near neighbors within a set of sorted exemplar vectors organized according to a sort criterion, said method comprising:
-
selecting a decision boundary, nΦ
, such that a desired probability of finding a best match is related to an area defined by the corresponding probability density function f(Φ
) in the interval from μ
Φ
−
nΦ
σ
Φ
to μ
Φ
+nΦ
σ
Φ
;
determining parameters μ
Φ
and σ
Φ
by sampling within a possibility zone, among the set of sorted exemplar vectors, the possibility zone being defined by exemplars vectors whose corresponding reference vector projection magnitude is within a predetermined difference of a magnitude of a reference vector projection of the data vector;
selecting a starting exemplar vector from the set of sorted exemplars as an exemplar vector whose corresponding reference vector projection magnitude is most similar to a magnitude of a reference vector projections of the data vector;
searching the sorted set of exemplars alternately and sequentially up and down the search structure of exemplar vectors, in a zigzag search pattern, beginning from the starting exemplar vector, until a first match is found or boundaries of the possibility zone are reached;
searching the probability zone for a better match by computing a value Φ
(δ
j, {circumflex over (ε
)}) aswhere δ
j is a projection difference associated with a current value of an exemplar vector about to be tested, and {circumflex over (ε
)} is a similarity parameter for the best match found so far; and
comparing the data vector to an additional exemplar vector if the value Φ
(δ
j, {circumflex over (ε
)}) is inside the decision boundary, nΦ
. - View Dependent Claims (18, 19)
-
-
20. An apparatus for processing data vectors by locating near neighbors, said apparatus comprising:
-
a sensor for receiving data vectors;
a memory device for storing a set of exemplar vectors; and
a processor for;
i. sorting the set of exemplars using a sort criterion, ii. comparing a respective data vector to at least one exemplar vector of the sorted set of exemplar vectors using a matching criterion to find a first match, iii. determining a probability value based on the probability that a better match exists in the sorted set of exemplar vectors when a first match is found, and iv. comparing the selected data vector to an additional exemplar vector if the probability value determined is greater than a predetermined probability value. - View Dependent Claims (21, 22, 23, 24, 25, 26)
-
Specification