Efficient near neighbor search (ENN-search) method for high dimensional data sets with noise

US 20030187616A1
Filed: 03/29/2002
Published: 10/02/2003
Est. Priority Date: 03/29/2002
Status: Active Grant

Abstract

A nearer neighbor matching and compression method and apparatus provide matching of data vectors to exemplar vectors. A data vector is compared to exemplar vectors contained within a subset of exemplar vectors, i.e., a set of possible exemplar vectors, to find a match. After a match is found, a probability function assigns a probability value based on the probability that a better matching exemplar vector exists. If the probability that a better match exists is greater than a predetermined probability value, the data vector is compared to an additional exemplar vector. If a match is not found, the data vector is added to the set of exemplar vectors. Data compression may be achieved in a hyperspectral image data vector set by replacing each observed data vector representing a respective spatial pixel by reference to a member of the exemplar set that “matches” the data vector. As such, each spatial pixel will be assigned to one of the exemplar vectors.
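
The match-or-add loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the squared Euclidean distance test, the threshold `max_sq_dist`, and all function names are assumptions introduced here, and the search is plain brute force rather than the sorted, probability-bounded search recited in the claims.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed matching measure, not from the source)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compress(data: List[Vector], max_sq_dist: float) -> Tuple[List[Vector], List[int]]:
    """Replace each data vector by the index of a matching exemplar.

    A vector that matches no existing exemplar becomes a new exemplar.
    Returns (exemplars, assignment), where assignment[k] is the exemplar
    index assigned to data[k].
    """
    exemplars: List[Vector] = []
    assignment: List[int] = []
    for vec in data:
        best_idx, best_d = -1, max_sq_dist
        # Brute-force scan: keep the closest exemplar within the threshold.
        for i, ex in enumerate(exemplars):
            d = squared_distance(vec, ex)
            if d <= best_d:
                best_idx, best_d = i, d
        if best_idx < 0:
            # No match found: the data vector joins the exemplar set.
            exemplars.append(vec)
            best_idx = len(exemplars) - 1
        assignment.append(best_idx)
    return exemplars, assignment
```

With a hypothetical five-pixel input, two well-separated clusters collapse to two exemplars, and each pixel is stored as a single index.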

26 Claims

1. A method of processing data vectors by locating near neighbors, said method comprising the steps of:
- inputting data vectors;
  
  generating a set of exemplar vectors;
  
  constructing a sorted set of exemplar vectors organized according to a search criterion;
  
  comparing one of the inputted data vectors to at least one exemplar vector of the sorted set of exemplar vectors using a matching criterion to find a first match;
  
  when a first match is found, determining a probability value based on the probability that a better match exists in the sorted set of exemplar vectors; and
  
  comparing the data vector to an additional exemplar vector if the probability value determined is greater than a predetermined probability value.
- - 2. The method of claim 1, wherein the step of constructing a sorted set of exemplar vectors comprises:
    - assigning a respective index to each exemplar vector;
      
      computing a set of reduced dimension exemplar vectors from the set of exemplar vectors by projecting each exemplar vector onto at least a first reference vector to thereby create a respective reference vector projection for each exemplar vector, each reference vector projection comprising a magnitude value; and
      
      organizing the indices in order based on the magnitude of the respective reference vector projection of the corresponding exemplar vector.
  - 3. The method of claim 2, wherein the step of comparing the data vector to at least one exemplar comprises:
    - generating a reduced dimension data vector by projecting the data vector onto the first reference vector; and
      
      locating a starting exemplar vector corresponding to the reduced dimension exemplar vector whose magnitude most closely matches the magnitude of the reduced dimension data vector.
  - 4. The method of claim 3, wherein the step of comparing the data vector to at least one exemplar vector further comprises:
    - comparing the reduced dimension data vector to the starting reduced dimension exemplar vector to determine whether the reduced dimension data vector is within a range such that it is possible that the exemplar vector matches the data vector; and
      
      said comparing the data vector to the exemplar vector, in full space, only if the reduced dimension data vector is within a range such that it is possible that the exemplar vector matches the data vector.
  - 5. The method of claim 2, wherein the step of comparing the data vector to an additional exemplar vector comprises:
    - comparing the data vector to exemplar vectors of the sorted set of exemplar vectors in an alternate, zigzag search pattern, beginning with an exemplar vector having a next higher magnitude reference vector projection or a next lower magnitude reference vector projection as compared with a reference vector projection magnitude corresponding to the starting exemplar vector if the probability that a better match exists is greater than a predetermined probability value in both a still higher magnitude reference vector projection and still lower magnitude reference vector projection;
      
      or comparing the data vector to exemplar vectors of the sorted set of exemplar vectors in a sequential search pattern of either ever higher magnitude reference vector projection or ever lower magnitude reference vector projection if the probability that a better match exists is greater than the predetermined probability value in only a next higher magnitude reference vector projection or a next lower magnitude reference vector projection, respectively.
  - 6. The method of claim 1, further comprising adding the data vector to the set of exemplar vectors if the data vector does not match any exemplar vector of the set of exemplar vectors.
  - 7. The method of claim 1, wherein the step of determining a probability value comprises:
    - selecting a decision boundary, $n_\Phi$, such that a desired probability of finding a best match is the area under a probability density function, $f(\Phi)$, in the interval from $\mu_\Phi - n_\Phi\sigma_\Phi$ to $\mu_\Phi + n_\Phi\sigma_\Phi$;
      
      determining parameters $\mu_\Phi$ and $\sigma_\Phi$ by sampling within a possibility zone of exemplar vectors, the possibility zone defined by exemplar vectors whose corresponding reference vector projection magnitude is within a predetermined difference of a magnitude of the data vector reference projection;
      
      searching the sorted set of exemplars alternate, sequentially, in a zigzag search pattern, from a starting exemplar vector whose corresponding reference vector projection magnitude is most similar to the magnitude of a reference vector projection of the data vector, until a first match is found or defining boundaries of the possibility zone are reached;
      
      searching the probability zone for a better match by computing a value $\Phi(\delta_j, \hat{\varepsilon})$ as $\frac{\delta_j}{\sqrt{2\hat{\varepsilon}}}$, where $\delta_j$ is a projection difference associated with a current value of an exemplar vector about to be tested, and $\hat{\varepsilon}$ is a similarity parameter for the best match found so far.
  - 8. The method of claim 7, further comprising comparing the data vector to an additional exemplar vector if the value $\Phi(\delta_j, \hat{\varepsilon})$ is inside the decision boundary, $n_\Phi$.
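
The sorted index and zigzag search of claims 2 through 5 might look roughly like the sketch below. Several details are assumptions made for illustration only: the matching criterion is taken to be Euclidean distance at most `max_dist`, the reference vector `ref` is taken to have unit length, and every function name is hypothetical. Under those assumptions the projection difference can never exceed the full-space distance (Cauchy-Schwarz), which is what allows the full-space comparison to be skipped.

```python
import bisect
import math
from typing import Iterator, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_sorted_index(exemplars: List[Vector], ref: Vector) -> List[Tuple[float, int]]:
    """Pair each exemplar index with its projection on ref, sorted by projection."""
    return sorted((dot(e, ref), i) for i, e in enumerate(exemplars))


def starting_position(projections: List[float], p: float) -> int:
    """Position in the sorted projections whose value is closest to p."""
    pos = bisect.bisect_left(projections, p)
    if pos == 0:
        return 0
    if pos == len(projections):
        return len(projections) - 1
    return pos if projections[pos] - p < p - projections[pos - 1] else pos - 1


def zigzag(start: int, n: int) -> Iterator[int]:
    """Yield start, start+1, start-1, start+2, ... staying inside [0, n)."""
    yield start
    for step in range(1, n):
        if start + step < n:
            yield start + step
        if start - step >= 0:
            yield start - step


def first_match(x: Vector, exemplars: List[Vector], index: List[Tuple[float, int]],
                ref: Vector, max_dist: float) -> Optional[int]:
    """Return the index of the first exemplar within max_dist of x, or None."""
    if not index:
        return None
    projections = [p for p, _ in index]
    px = dot(x, ref)
    for pos in zigzag(starting_position(projections, px), len(index)):
        p, i = index[pos]
        if abs(p - px) > max_dist:
            continue  # a match is impossible, so skip the full-space comparison
        if math.dist(x, exemplars[i]) <= max_dist:
            return i
    return None
```

This sketch scans the whole index in zigzag order for simplicity; the claims stop earlier, at the boundaries of the possibility zone.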

9. A method of finding exemplar vectors which match input data vectors, by locating near neighbors within a set of sorted exemplar vectors organized according to a sort criterion, said method comprising:
- selecting a decision boundary which provides a region for locating a better match based on a probability density function;
  
  defining a possibility zone made up of exemplar vectors having reference vector projection magnitudes differing from a magnitude of a reference vector projection of the data vector by a predetermined amount;
  
  selecting a starting exemplar vector from the set of sorted exemplar vectors as an exemplar vector having a corresponding reference vector projection magnitude most similar to the magnitude of a reference vector projection corresponding to the selected data vector;
  
  searching the sorted set of exemplar vectors for a first match using a predetermined matching criterion;
  
  determining a probability value as to whether a better match exists in the possibility zone; and
  
  searching the sorted set of exemplar vectors for a better match if the probability value so determined is greater than a predetermined probability value.
- - 10. The method of claim 9, wherein said selecting the decision boundary comprises selecting a decision boundary such that a desired probability of finding a best match is the area under the probability density function $f(\Phi)$ in the interval from $\mu_\Phi - n_\Phi\sigma_\Phi$ to $\mu_\Phi + n_\Phi\sigma_\Phi$.
  - 11. The method of claim 9, further comprising determining probability density function parameters $\mu_\Phi$ and $\sigma_\Phi$ by sampling within a possibility zone within the set of sorted exemplar vectors.
  - 12. The method of claim 9, wherein said searching the sorted set of exemplars for a first match comprises searching, in an alternate, zigzag search pattern, up and down the search structure of exemplar vectors, from the starting exemplar vector, until either a first match is found or defining boundaries of the possibility zone are reached.
  - 13. The method of claim 9, wherein the step of determining a probability value comprises:
    - computing a value $\Phi(\delta_j, \hat{\varepsilon})$ as $\frac{\delta_j}{\sqrt{2\hat{\varepsilon}}}$, where $\delta_j$ is a projection difference associated with a current value of an exemplar vector about to be tested, and $\hat{\varepsilon}$ is a similarity parameter for the best match found so far.
  - 14. The method of claim 13, wherein the step of searching the sorted set of exemplar vectors for a better match comprises comparing the data vector to an additional exemplar vector if the value $\Phi(\delta_j, \hat{\varepsilon})$ is inside the decision boundary, $n_\Phi$.
  - 15. The method of claim 9, wherein the step of searching the sorted set of exemplar vectors for a better match comprises searching the probability zone in a zigzag search pattern until either an upper or a lower probability boundary limit is exceeded, and depending on which boundary limit is exceeded, then searching the probability zone in a sequential pattern moving up the search structure or down the search structure toward the boundary limit not exceeded.
  - 16. The method of claim 15, further comprising:
    - assigning the data vector to the best matching exemplar vector.
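
The probability test of claims 10, 11, 13 and 14 can be illustrated with a short sketch. The formula for Φ is the one recited in claim 13. Everything else is an assumption made here: the function names, the use of a sample mean and population standard deviation for μ_Φ and σ_Φ, and the literal reading of "inside the decision boundary" as membership in the interval from μ_Φ − n_Φσ_Φ to μ_Φ + n_Φσ_Φ. How the sampled Φ values are obtained from the possibility zone is not specified in the claims and is left to the caller.

```python
import math
import statistics
from typing import Sequence, Tuple


def phi(delta_j: float, eps_hat: float) -> float:
    """Phi(delta_j, eps_hat) = delta_j / sqrt(2 * eps_hat), as recited in claim 13."""
    if eps_hat <= 0.0:
        raise ValueError("eps_hat must be positive")
    return delta_j / math.sqrt(2.0 * eps_hat)


def estimate_parameters(phi_samples: Sequence[float]) -> Tuple[float, float]:
    """Estimate (mu_phi, sigma_phi) from sampled Phi values (assumed estimator)."""
    return statistics.fmean(phi_samples), statistics.pstdev(phi_samples)


def inside_decision_boundary(value: float, mu: float, sigma: float, n_phi: float) -> bool:
    """Literal interval reading: mu - n*sigma <= value <= mu + n*sigma."""
    return mu - n_phi * sigma <= value <= mu + n_phi * sigma


def should_test_next(delta_j: float, eps_hat: float,
                     mu: float, sigma: float, n_phi: float) -> bool:
    """Compare the data vector to the next exemplar only if Phi is inside the boundary."""
    return inside_decision_boundary(phi(delta_j, eps_hat), mu, sigma, n_phi)
```

Because Φ scales inversely with the square root of the similarity parameter, the same projection difference produces a different Φ each time the best match found so far changes, so the test has to be re-evaluated for each candidate exemplar.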

17. A method of processing data vectors associated with respective spatial pixels by locating near neighbors within a set of sorted exemplar vectors organized according to a sort criterion, said method comprising:
- selecting a decision boundary, $n_\Phi$, such that a desired probability of finding a best match is related to an area defined by the corresponding probability density function $f(\Phi)$ in the interval from $\mu_\Phi - n_\Phi\sigma_\Phi$ to $\mu_\Phi + n_\Phi\sigma_\Phi$;
  
  determining parameters $\mu_\Phi$ and $\sigma_\Phi$ by sampling within a possibility zone, among the set of sorted exemplar vectors, the possibility zone being defined by exemplar vectors whose corresponding reference vector projection magnitude is within a predetermined difference of a magnitude of a reference vector projection of the data vector;
  
  selecting a starting exemplar vector from the set of sorted exemplars as an exemplar vector whose corresponding reference vector projection magnitude is most similar to a magnitude of a reference vector projection of the data vector;
  
  searching the sorted set of exemplars alternately and sequentially up and down the search structure of exemplar vectors, in a zigzag search pattern, beginning from the starting exemplar vector, until a first match is found or boundaries of the possibility zone are reached;
  
  searching the probability zone for a better match by computing a value $\Phi(\delta_j, \hat{\varepsilon})$ as $\frac{\delta_j}{\sqrt{2\hat{\varepsilon}}}$, where $\delta_j$ is a projection difference associated with a current value of an exemplar vector about to be tested, and $\hat{\varepsilon}$ is a similarity parameter for the best match found so far; and
  
  comparing the data vector to an additional exemplar vector if the value $\Phi(\delta_j, \hat{\varepsilon})$ is inside the decision boundary, $n_\Phi$.
- - 18. The method of claim 17, wherein said searching the probability zone further comprises searching the probability zone in said zigzag search pattern until an upper or lower probability boundary limit is exceeded, and then searching the probability zone in a sequential pattern moving up the search structure or down the search structure depending on the boundary that has been exceeded.
  - 19. The method of claim 18, further comprising:
    - assigning the respective spatial pixel associated with the data vector to the best matching exemplar vector.
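
Claim 17 relates the desired probability of finding a best match to the area under f(Φ) between μ_Φ − n_Φσ_Φ and μ_Φ + n_Φσ_Φ, but the claims do not state the form of f(Φ). Purely as an illustration, the sketch below assumes a normal density, for which that area depends only on n_Φ. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def coverage(n_phi: float) -> float:
    """Area of a normal density within mu +/- n_phi * sigma (normality is assumed)."""
    return math.erf(n_phi / math.sqrt(2.0))


def boundary_for(probability: float) -> float:
    """Smallest n_phi whose +/- n_phi * sigma interval covers the given probability."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    # The interval is symmetric, so each tail holds (1 - probability) / 2.
    return NormalDist().inv_cdf((1.0 + probability) / 2.0)
```

Under this normality assumption, asking for a larger probability of finding the best match widens the boundary, which in turn means more exemplars are compared in full space before the search stops.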

20. An apparatus for processing data vectors by locating near neighbors, said apparatus comprising:
- a sensor for receiving data vectors;
  
  a memory device for storing a set of exemplar vectors; and
  
  a processor for:
  
  i. sorting the set of exemplars using a sort criterion,
  
  ii. comparing a respective data vector to at least one exemplar vector of the sorted set of exemplar vectors using a matching criterion to find a first match,
  
  iii. determining a probability value based on the probability that a better match exists in the sorted set of exemplar vectors when a first match is found, and
  
  iv. comparing the selected data vector to an additional exemplar vector if the probability value determined is greater than a predetermined probability value.
- - 21. The apparatus of claim 20, wherein said processor constructs a sorted set of exemplar vectors by computing a set of reduced dimension exemplar vectors from the set of exemplar vectors by projecting each exemplar vector onto a reference vector, each reduced dimension exemplar vector comprising a magnitude value.
  - 22. The apparatus of claim 21, wherein said processor compares the data vector to at least one exemplar by generating a reduced dimension data vector by projecting the data vector onto the reference vector, calculating a magnitude for the reduced dimension data vector, and locating a starting exemplar vector corresponding to the exemplar vector whose magnitude of projection onto the reference vector most closely matches the magnitude of the projection of the data vector onto the reference vector.
  - 23. The apparatus of claim 22, wherein said processor, in comparing the data vector to at least one exemplar vector, further compares the reduced dimension data vector to the starting reduced dimension exemplar vector to determine whether the reduced dimension data vector is within a predetermined range measured from the starting reduced dimension exemplar vector;
    - and compares the data vector to the exemplar vector only if the reduced dimension data vector is within the predetermined range measured from the starting reduced dimension exemplar vector.
  - 24. The apparatus of claim 22, wherein said processor compares the data vector to an additional exemplar vector, after the first match is found, if a reference vector projection of the data vector and a reference vector projection of the exemplar vector differ in magnitude by an amount less than a predetermined value.
  - 25. The apparatus of claim 24, wherein said processor compares the data vector to an additional exemplar vector by:
    - i. comparing the data vector to exemplar vectors of the sorted set of exemplar vectors in an alternate, zigzag search pattern, beginning with an exemplar vector having a next higher reference vector projection magnitude and next lower reference vector projection magnitude as compared with a reference vector projection magnitude of the starting exemplar vector if the probability that a better match exists is greater than a predetermined probability value in both a still higher magnitude reference vector projection exemplar vector and still lower magnitude reference vector projection exemplar vector;
      
      or ii. comparing the data vector to exemplar vectors of the sorted set of exemplar vectors in a sequential search pattern of either ever higher magnitude reference vector projections of the exemplar vectors or ever lower magnitude reference vector projections of the exemplar vectors if the probability that a better match exists is greater than the predetermined probability value in only a next higher magnitude reference vector projection exemplar vector or a next lower magnitude reference vector projection exemplar vector, respectively.
  - 26. The apparatus of claim 25, wherein said processor adds the data vector to the set of exemplar vectors stored in said memory device if the data vector does not match any exemplar vector of the set of exemplar vectors.
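
The switch from a zigzag pattern to a sequential pattern in claim 25 can be sketched as a generator over positions in the sorted projection list. This is a simplification under stated assumptions: a single fixed `delta_limit` on the projection difference stands in for the probability test on each side (in the claimed method the test depends on the best match found so far), the starting exemplar is assumed to have been tested already, and the function name is hypothetical.

```python
from typing import Iterator, List


def zigzag_then_sequential(projections: List[float], start: int,
                           px: float, delta_limit: float) -> Iterator[int]:
    """Yield positions to test after the starting exemplar.

    Alternates up and down while both directions remain within delta_limit
    of the data vector projection px, then continues in the one direction
    that is still open. projections must be sorted in ascending order.
    """
    n = len(projections)

    def within_limit(pos: int) -> bool:
        return 0 <= pos < n and abs(projections[pos] - px) <= delta_limit

    up, down = start + 1, start - 1
    up_open, down_open = within_limit(up), within_limit(down)
    while up_open or down_open:
        if up_open:
            yield up
            up += 1
            up_open = within_limit(up)
        if down_open:
            yield down
            down -= 1
            down_open = within_limit(down)
```

Because the projections are sorted, a direction that exceeds the limit once can never come back within it, so closing that direction permanently is safe in this sketch.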

Current Assignee
The United States of America As Represented By The Secretary of Agriculture
Original Assignee
The United States of America As Represented By The Secretary of Agriculture
Inventors
Palmadesso, Peter J., Bowles, Jeffrey H., Gillis, David B.

Granted Patent

US 6,947,869 B2
Field of Search
US Class Current

702/181
CPC Class Codes

G06F 17/16 Matrix or vector computatio...
