Multi-view cognitive swarm for object recognition and 3D tracking

US 7,558,762 B2
Filed: 03/20/2006
Issued: 07/07/2009
Est. Priority Date: 08/14/2004
Status: Expired due to Fees


Abstract

An object recognition system is described that incorporates swarming classifiers. The swarming classifiers comprise a plurality of software agents configured to operate as a cooperative swarm to classify an object in a domain as seen from multiple view points. Each agent is a complete classifier and is assigned an initial velocity vector to explore a solution space for object solutions. Each agent is configured to perform an iteration, the iteration being a search in the solution space for a potential solution optima where each agent keeps track of its coordinates in multi-dimensional space that are associated with an observed best solution (pbest) that the agent has identified, and a global best solution (gbest) where the gbest is used to store the best location among all agents. Each velocity vector changes towards pbest and gbest, allowing the cooperative swarm to concentrate on the vicinity of the object and classify the object.
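The pbest/gbest update described in the abstract follows the general pattern of particle swarm optimization. The following is a minimal sketch, not the patented implementation: the function `toy_confidence` is a hypothetical stand-in for a classifier's confidence output, and the coefficients `w`, `c1`, `c2`, the swarm size, and the threshold value are illustrative assumptions that do not come from the patent text.

```python
import math
import random

def pso_maximize(score, bounds, n_agents=30, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize score(position) inside a box; returns (gbest, gbest_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Each agent gets a random start position and an initial velocity vector.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    vel = [[0.1 * rng.uniform(-(hi - lo), hi - lo) for lo, hi in bounds]
           for _ in range(n_agents)]
    pbest = [p[:] for p in pos]                  # best position seen by each agent
    pbest_val = [score(p) for p in pos]
    g = max(range(n_agents), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position among all agents
    for _ in range(n_iters):
        for i in range(n_agents):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity is pulled towards pbest and gbest.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = score(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_confidence(p):
    """Hypothetical classifier confidence peaking at (2, -1)."""
    return math.exp(-((p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2))

SEARCH_BOUNDS = [(-5.0, 5.0), (-5.0, 5.0)]
THRESHOLD = 0.5  # assumed preset classification threshold
best, conf = pso_maximize(toy_confidence, SEARCH_BOUNDS)
detected = conf > THRESHOLD
```

In this sketch the swarm declares a detection only when the best confidence found exceeds `THRESHOLD`, mirroring the "classification level exceeds a preset threshold" condition in the claims.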

30 Claims

1. A multi-view object recognition system incorporating swarming domain classifiers, comprising:
- a processor having a plurality of software agents configured to operate as a cooperative swarm to classify an object in a domain as seen from multiple view points, where each agent is a complete classifier and is assigned an initial velocity vector to explore a solution space for object solutions, where each agent is configured to perform at least one iteration, the iteration being a search in the solution space for a potential solution optima where each agent keeps track of its coordinates in multi-dimensional space that are associated with an observed best solution (pbest) that the agent has identified, and a global best solution (gbest) where the gbest is used to store the best location among all agents, with each velocity vector thereafter changing towards pbest and gbest, allowing the cooperative swarm to concentrate on the vicinity of the object and classify the object when a classification level exceeds a preset threshold;
  
  wherein the agents are configured to search for the object in three-dimensional (3D) spatial coordinates, such that the object is a 3D object and the 3D object has distinct appearances from each view point in the multiple view points; and
  
  wherein the distinct appearances of the 3D object from the multiple view points are linked by agents searching for the 3D object in the spatial coordinates, such that each agent has an associated 3D location X and an object height h, and wherein each of the multiple view points is provided as a 2D image from a calibrated camera having a given geometry, such that given the known geometry of the calibrated cameras, a 2D location, [x,y]^T=π(X), of an agent's projection in each view (2D image) is calculated and used to select an image window that is sent to a classifier having a classifier output that corresponds to the classifier's confidence that the image window contains the object, where superscript T denotes transpose.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. A multi-view object recognition system as set forth in claim 1, wherein each agent has a search trajectory that is guided by a cost function, such that the cost function is formed by combining the classifier outputs evaluated at the agent's projection points in each of the views, wherein the projection points are points in an image corresponding to the 3D object.
  - 3. A multi-view object recognition system as set forth in claim 2, wherein for an agent at 3D location X=[x,y,z]^T, a value of the cost function is calculated according to the following:
    - $f(X,h) = w_{1} * \mathrm{classifier}(\mathrm{image}_{1}, \pi_{1}(X), \Pi_{1}(X,h)) + w_{2} * \mathrm{classifier}(\mathrm{image}_{2}, \pi_{2}(X), \Pi_{2}(X,h)),$ where π: ℝ³→ℝ² is a projection operator such that π_n maps the object's 3D location X into a 2D location [x,y] in image n, and w₁+w₂=1, where w₁ and w₂ are positive weighting factors and normally w₁=w₂=0.5, and where Π is a projection operator for object height, h, such that the projection operator Π_n maps the 3D object height to its corresponding projection size in image n, and where classifier denotes a confidence output of the object classifier operating on the window in image n with location π_n(X) and window size Π_n(X,h), and where * denotes multiplication.
  - 4. A multi-view object recognition system as set forth in claim 1, wherein each of the multiple view points is provided as a 2D image from calibrated stereo cameras, and wherein the multiple view points include at least two 2D views, view 1 and view 2, and wherein the agents are configured to move within each view independently to localize the object in each view independently, and wherein each view has 2D spatial coordinates.
  - 5. A multi-view object recognition system as set forth in claim 4, wherein the object is a 3D object having 3D spatial coordinates, and wherein the 3D object has a 2D projection in each view in the multiple view points such that the object has a distinct appearance in each view, and wherein the agents are configured to search for the object in the 2D spatial coordinates.
  - 6. A multi-view object recognition system as set forth in claim 5, wherein each of the 2D views is connected through geometric constraints of the calibrated stereo cameras, and wherein the agents are further configured to operate as two distinct sets of agents such that each set searches for the object in a view independently to locate a 2D location [x,y] and a 2D image window height ĥ of the object in each view, and wherein using triangulation, the 2D locations from each view are combined to estimate the object's 3D spatial coordinates from the 2D projections.
  - 7. A multi-view object recognition system as set forth in claim 6, wherein the system is further configured to recognize multiple objects in the domain, such that when there is more than one object in the domain, the system is further configured to establish a correspondence between the 2D locations found in each 2D view to identify inter-view pairs.
  - 8. A multi-view object recognition system as set forth in claim 7, wherein when establishing a correspondence between the 2D locations found in each 2D view, the system is further configured to form a cost/distance matrix for all possible inter-view pairs of identified object locations, the cost/distance matrix is a pair-wise cost (Cost_ij) function, calculated as follows:
    - $\mathrm{Cost}(i, j) = \lambda_{1} \| x_{2}^{T} F x_{1} \| + \lambda_{2} \left( \frac{\| \hat{h}_{1} - h_{1} \|}{h_{1}} + \frac{\| \hat{h}_{2} - h_{2} \|}{h_{2}} \right) + \lambda_{3} \sum_{w} \left( I_{1}(x, y) - I_{2}(x, y) \right)^{2},$ where Cost denotes a cost function, minimization of which ensures a consistent localization of an object in the 3D spatial coordinates;
      
      i and j denote point i in view 1 and point j in view 2 that correspond to detected objects in the two views;
      
      λ₁ denotes weighting factor for an epipolar constraint portion of the cost function;
      
      x₂ denotes a coordinate vector for an object in view 2;
      
      F denotes a fundamental matrix that determines epipolar lines in one view corresponding to points in the other view;
      
      x₁ denotes a coordinate vector for an object in view 1;
      
      superscript T denotes a transpose of the vector x₂;
      
      wherein x₁ is a column vector and F is a matrix, so Fx₁ is also a column vector;
      
      λ₂ denotes a weighting factor for a window size consistency portion of the cost function;
      
      h₁ denotes a size of the object in view 1 determined from the 2D projection of the object in 3D spatial coordinates to 2D view 1;
      
      ĥ denotes the size of the object in view 1 or 2 as determined from the object classifier outputs;
      
      h₂ denotes the size of an object in view 2 determined from the 2D projection of the object in 3D spatial coordinates to the 2D view 2;
      
      λ₃ denotes a weighting factor for a window appearance similarity portion of the cost function;
      
      w denotes window index;
      
      ∥ denotes a magnitude operator;
      
      I₁ denotes an intensity distribution of the window in view 1;
      
      x denotes an x coordinate in either view;
      
      y denotes a y coordinate in either view;
      
      I₂ denotes an intensity distribution of the window in view 2; and
      
      Σ denotes summation.
  - 9. A multi-view object recognition system as set forth in claim 8, wherein the system is further configured to optimize pairing between the inter-view points (point i in view 1 and point j in view 2) using a bipartite weighted matching problem, and further comprising a smoothing filter for optimal 3D trajectory estimation.
  - 10. A multi-view object recognition system as set forth in claim 9, wherein the object is further configured to track multiple objects.
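
The multi-view cost function of claims 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions: the cameras are modeled as 3x4 pinhole projection matrices, the world y axis is taken as the vertical direction for the height projection, and `toy_classifier` is a hypothetical stand-in that returns a confidence from a stored target window instead of running a real classifier on pixels. None of these names appear in the patent.

```python
import math

def project_point(P, X):
    """pi_n(X): project 3D point X through a 3x4 camera matrix P to pixel [x, y]."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, s = (sum(row[k] * Xh[k] for k in range(4)) for row in P)
    return (u / s, v / s)

def projected_height(P, X, h):
    """Pi_n(X, h): image size of a vertical segment of height h based at X.

    Assumes the world y axis is 'up' (an assumption of this sketch).
    """
    x0, y0 = project_point(P, X)
    x1, y1 = project_point(P, (X[0], X[1] + h, X[2]))
    return math.hypot(x1 - x0, y1 - y0)

def toy_classifier(image, center, size):
    """Hypothetical confidence in (0, 1]; 1.0 when the window matches the target."""
    tu, tv = image["target"]
    d2 = (center[0] - tu) ** 2 + (center[1] - tv) ** 2 + (size - image["size"]) ** 2
    return math.exp(-d2 / 200.0)

def multi_view_cost(X, h, images, cameras, classifier, weights=(0.5, 0.5)):
    """f(X, h) = sum_n w_n * classifier(image_n, pi_n(X), Pi_n(X, h))."""
    return sum(w * classifier(img, project_point(P, X), projected_height(P, X, h))
               for w, img, P in zip(weights, images, cameras))

# Two illustrative cameras: focal length 100 px, second camera shifted 1 unit along x.
P1 = [[100, 0, 0, 0], [0, 100, 0, 0], [0, 0, 1, 0]]
P2 = [[100, 0, 0, -100], [0, 100, 0, 0], [0, 0, 1, 0]]
images = [{"target": (0.0, 0.0), "size": 20.0},
          {"target": (-10.0, 0.0), "size": 20.0}]

at_object = multi_view_cost((0, 0, 10), 2.0, images, [P1, P2], toy_classifier)
off_object = multi_view_cost((1, 0, 10), 2.0, images, [P1, P2], toy_classifier)
```

Because a single 3D hypothesis is scored in both views at once, an agent position that projects onto the object in only one image receives a lower combined confidence than one that is consistent across views.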

11. A computer implemented method for multi-view object recognition using swarming domain classifiers, the method comprising acts of:
- configuring a plurality of software agents (i.e., particles) to operate as a cooperative swarm to classify an object in a domain as seen from multiple view points, where each agent is a complete classifier and is assigned an initial velocity vector to explore a solution space for object solutions;
  
  configuring each agent to perform at least one iteration, the iteration being a search in the solution space for a potential solution optimum where each agent keeps track of its coordinates in multi-dimensional space that are associated with an observed best solution (pbest) that the agent has identified, and a global best solution (gbest) where the gbest is used to store the best location among all agents, with each velocity vector thereafter changing towards pbest and gbest, allowing the cooperative swarm to concentrate on the vicinity of the object and classify the object when a classification level exceeds a preset threshold;
  
  further comprising an act of configuring the agents to search for the object in three-dimensional (3D) spatial coordinates, such that the object is a 3D object and the 3D object has distinct appearances from each view point in the multiple view points; and
  
  further comprising an act of linking the agents such that the distinct appearances of the 3D object from the multiple view points are linked by agents searching for the 3D object in the spatial coordinates, such that each agent has an associated 3D location X and an object height h, and wherein each of the multiple view points is provided as a 2D image from a calibrated camera having a given geometry, such that given the known geometry of the calibrated cameras, a 2D location, [x,y]^T=π(X), of an agent's projection in each view (2D image) is calculated and used to select an image window that is sent to a classifier having a classifier output that corresponds to the classifier's confidence that the image window contains the object, where superscript T denotes transpose.
- View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. A method as set forth in claim 11, further comprising an act of configuring each agent to have a search trajectory that is guided by a cost function, such that the cost function is formed by combining the classifier outputs evaluated at the agent's projection points in each of the views, wherein the projection points are points in an image corresponding to the 3D object.
  - 13. A method as set forth in claim 12, further comprising an act of calculating a cost function for an agent at 3D location X=[x,y,z]^T, according to the following:
    - $f(X,h) = w_{1} * \mathrm{classifier}(\mathrm{image}_{1}, \pi_{1}(X), \Pi_{1}(X,h)) + w_{2} * \mathrm{classifier}(\mathrm{image}_{2}, \pi_{2}(X), \Pi_{2}(X,h)),$ where π: ℝ³→ℝ² is a projection operator such that π_n maps the object's 3D location X into a 2D location [x,y] in image n, and w₁+w₂=1, where w₁ and w₂ are positive weighting factors and normally w₁=w₂=0.5, and where Π is a projection operator for object height, h, such that the projection operator Π_n maps the 3D object height to its corresponding projection size in image n, and where classifier denotes a confidence output of the object classifier operating on the window in image n with location π_n(X) and window size Π_n(X,h), and where * denotes multiplication.
  - 14. A method as set forth in claim 11, further comprising acts of:
    - receiving each of the multiple view points as a 2D image from calibrated stereo cameras, and wherein the multiple view points include at least two 2D views, view 1 and view 2, wherein each view has 2D spatial coordinates; and
      
      configuring the agents to move within each view independently to localize the object in each view independently.
  - 15. A method as set forth in claim 14, further comprising acts of:
    - forming a 2D projection of a 3D object having 3D spatial coordinates, such that the 3D object has a 2D projection in each view in the multiple view points such that the object has a distinct appearance in each view; and
      
      configuring the agents to search for the object in the 2D spatial coordinates.
  - 16. A method as set forth in claim 15, further comprising acts of:
    - configuring the agents to operate as two distinct sets of agents such that each set searches for the object in a view independently to locate a 2D location [x,y] and a 2D image window height ĥ of the object in each view; and
      
      using triangulation to combine the 2D locations from each view to estimate the object's 3D spatial coordinates from the 2D projections.
  - 17. A method as set forth in claim 16, further comprising an act of establishing a correspondence between the 2D locations found in each 2D view to identify inter-view pairs.
  - 18. A method as set forth in claim 17, wherein the act of establishing a correspondence, further comprises an act of forming a cost/distance matrix for all possible inter-view pairs of identified object locations, the cost/distance matrix being a pair-wise cost (Cost_ij) function, calculated as follows:
    - $\mathrm{Cost}(i, j) = \lambda_{1} \| x_{2}^{T} F x_{1} \| + \lambda_{2} \left( \frac{\| \hat{h}_{1} - h_{1} \|}{h_{1}} + \frac{\| \hat{h}_{2} - h_{2} \|}{h_{2}} \right) + \lambda_{3} \sum_{w} \left( I_{1}(x, y) - I_{2}(x, y) \right)^{2},$ where Cost denotes a cost function, minimization of which ensures a consistent localization of an object in the 3D spatial coordinates;
      
      i and j denote point i in view 1 and point j in view 2 that correspond to detected objects in the two views;
      
      λ₁ denotes weighting factor for an epipolar constraint portion of the cost function;
      
      x₂ denotes a coordinate vector for an object in view 2;
      
      F denotes a fundamental matrix that determines epipolar lines in one view corresponding to points in the other view;
      
      x₁ denotes a coordinate vector for an object in view 1;
      
      superscript T denotes a transpose of the vector x₂;
      
      wherein x₁ is a column vector and F is a matrix, so Fx₁ is also a column vector;
      
      λ₂ denotes a weighting factor for a window size consistency portion of the cost function;
      
      h₁ denotes a size of the object in view 1 determined from the 2D projection of the object in 3D spatial coordinates to 2D view 1;
      
      ĥ denotes the size of the object in view 1 or 2 as determined from the object classifier outputs;
      
      h₂ denotes the size of an object in view 2 determined from the 2D projection of the object in 3D spatial coordinates to the 2D view 2;
      
      λ₃ denotes a weighting factor for a window appearance similarity portion of the cost function;
      
      w denotes window index;
      
      ∥ denotes a magnitude operator;
      
      I₁ denotes an intensity distribution of the window in view 1;
      
      x denotes an x coordinate in either view;
      
      y denotes a y coordinate in either view;
      
      I₂ denotes an intensity distribution of the window in view 2; and
      
      Σ denotes summation.
  - 19. A method as set forth in claim 18, further comprising acts of:
    - optimizing pairing between the inter-view points (point i in view 1 and point j in view 2) using a bipartite weighted matching problem; and
      
      estimating a 3D trajectory of at least one 3D object.
  - 20. A method as set forth in claim 19, further comprising an act of tracking multiple objects.
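
The pair-wise cost of claim 18 combines an epipolar term, a window-size consistency term, and an appearance (sum of squared differences) term. A minimal sketch follows, with several assumptions that are not in the patent: detections are plain dictionaries with hypothetical keys (`x`, `h_hat`, `h`, `window`), the projected sizes `h` are supplied per detection rather than recomputed per pair, windows are equal-sized nested lists of intensities, and the fundamental matrix shown is the one for an idealized rectified stereo pair.

```python
def epipolar_residual(x1, x2, F):
    """|x2^T F x1| for homogeneous pixel coordinates x1, x2 (length-3 sequences)."""
    Fx1 = [sum(F[r][c] * x1[c] for c in range(3)) for r in range(3)]
    return abs(sum(x2[r] * Fx1[r] for r in range(3)))

def ssd(win1, win2):
    """Sum over the window of squared intensity differences."""
    return sum((a - b) ** 2
               for row1, row2 in zip(win1, win2)
               for a, b in zip(row1, row2))

def pair_cost(d1, d2, F, lambdas=(1.0, 1.0, 1.0)):
    """Cost(i, j) for detection d1 in view 1 and detection d2 in view 2."""
    l1, l2, l3 = lambdas
    size_term = (abs(d1["h_hat"] - d1["h"]) / d1["h"]
                 + abs(d2["h_hat"] - d2["h"]) / d2["h"])
    return (l1 * epipolar_residual(d1["x"], d2["x"], F)
            + l2 * size_term
            + l3 * ssd(d1["window"], d2["window"]))

def cost_matrix(dets1, dets2, F, lambdas=(1.0, 1.0, 1.0)):
    """Cost for every inter-view pair; rows index view 1, columns index view 2."""
    return [[pair_cost(a, b, F, lambdas) for b in dets2] for a in dets1]

# Rectified stereo (pure horizontal shift): x2^T F x1 reduces to y1 - y2.
F_RECTIFIED = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]

WIN_A = [[1, 2], [3, 4]]
WIN_B = [[9, 9], [9, 9]]
view1 = [{"x": (10, 5, 1), "h_hat": 20.0, "h": 20.0, "window": WIN_A},
         {"x": (40, 20, 1), "h_hat": 30.0, "h": 30.0, "window": WIN_B}]
view2 = [{"x": (35, 20, 1), "h_hat": 30.0, "h": 30.0, "window": WIN_B},
         {"x": (4, 5, 1), "h_hat": 20.0, "h": 20.0, "window": WIN_A}]

costs = cost_matrix(view1, view2, F_RECTIFIED)
```

In this toy setup the true pairs (view 1 index 0 with view 2 index 1, and view 1 index 1 with view 2 index 0) lie on the same scanline and share a window, so their cost is zero, while the mismatched pairs are penalized by both the epipolar and the appearance terms.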

21. A computer program product for object recognition, the computer program product comprising computer-readable instruction means encoded on a computer-readable medium and executable by a computer for causing a computer to:
- configure a plurality of software agents (i.e., particles) to operate as a cooperative swarm to classify an object in a domain as seen from multiple view points, where each agent is a complete classifier and is assigned an initial velocity vector to explore a solution space for object solutions, where each agent is configured to perform at least one iteration, the iteration being a search in the solution space for a potential solution optima where each agent keeps track of its coordinates in multi-dimensional space that are associated with an observed best solution (pbest) that the agent has identified, and a global best solution (gbest) where the gbest is used to store the best location among all agents, with each velocity vector thereafter changing towards pbest and gbest, allowing the cooperative swarm to concentrate on the vicinity of the object and classify the object when a classification level exceeds a preset threshold;
  
  further comprising instruction means to cause a computer to perform an operation of utilizing the agents to search for the object in three-dimensional (3D) spatial coordinates, such that the object is a 3D object and the 3D object has distinct appearances from each view point in the multiple view points; and
  
  further comprising instruction means to cause a computer to perform an operation of linking the agents such that the distinct appearances of the 3D object from the multiple view points are linked by agents searching for the 3D object in the spatial coordinates, such that each agent has an associated 3D location X and an object height h, and wherein each of the multiple view points is provided as a 2D image from a calibrated camera having a given geometry, such that given the known geometry of the calibrated cameras, a 2D location, [x,y]^T=π(X), of an agent's projection in each view (2D image) is calculated and used to select an image window that is sent to a classifier having a classifier output that corresponds to the classifier's confidence that the image window contains the object, where superscript T denotes transpose.
- View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30)
  - 22. A computer program product as set forth in claim 21, further comprising instruction means to cause a computer to perform an operation of configuring each agent to have a search trajectory that is guided by a cost function, such that the cost function is formed by combining the classifier outputs evaluated at the agent's projection points in each of the views, wherein the projection points are points in an image corresponding to the 3D object.
  - 23. A computer program product as set forth in claim 22, further comprising instruction means to cause a computer to perform an operation of calculating a cost function for an agent at 3D location X=[x,y,z]^T, according to the following:
    - $f(X,h) = w_{1} * \mathrm{classifier}(\mathrm{image}_{1}, \pi_{1}(X), \Pi_{1}(X,h)) + w_{2} * \mathrm{classifier}(\mathrm{image}_{2}, \pi_{2}(X), \Pi_{2}(X,h)),$ where π: ℝ³→ℝ² is a projection operator such that π_n maps the object's 3D location X into a 2D location [x,y] in image n, and w₁+w₂=1, where w₁ and w₂ are positive weighting factors and normally w₁=w₂=0.5, and where Π is a projection operator for object height, h, such that the projection operator Π_n maps the 3D object height to its corresponding projection size in image n, and where classifier denotes a confidence output of the object classifier operating on the window in image n with location π_n(X) and window size Π_n(X,h), and where * denotes multiplication.
  - 24. A computer program product as set forth in claim 21, further comprising instruction means for causing a computer to perform operations of:
    - receive each of the multiple view points as a 2D image from calibrated stereo cameras, and wherein the multiple view points include at least two 2D views, view 1 and view 2, wherein each view has 2D spatial coordinates; and
      
      configure the agents to move within each view independently to localize the object in each view independently.
  - 25. A computer program product as set forth in claim 24, further comprising instruction means to cause a computer to perform operations of:
    - forming a 2D projection of a 3D object having 3D spatial coordinates, such that the 3D object has a 2D projection in each view in the multiple view points such that the object has a distinct appearance in each view; and
      
      configuring the agents to search for the object in the 2D spatial coordinates.
  - 26. A computer program product as set forth in claim 25, further comprising instruction means to cause a computer to perform operations of:
    - configuring the agents to operate as two distinct sets of agents such that each set searches for the object in a view independently to locate a 2D location [x,y] and a 2D image window height ĥ of the object in each view; and
      
      using triangulation to combine the 2D locations from each view to estimate the object's 3D spatial coordinates from the 2D projections.
  - 27. A computer program product as set forth in claim 26, further comprising instruction means to cause a computer to perform an operation of establishing a correspondence between the 2D locations found in each 2D view to identify inter-view pairs.
  - 28. A computer program product as set forth in claim 27, further comprising instruction means to cause a computer to perform an operation of forming a cost/distance matrix for all possible inter-view pairs of identified object locations, the cost/distance matrix being a pair-wise cost (Cost_ij) function, calculated as follows:
    - $\mathrm{Cost}(i, j) = \lambda_{1} \| x_{2}^{T} F x_{1} \| + \lambda_{2} \left( \frac{\| \hat{h}_{1} - h_{1} \|}{h_{1}} + \frac{\| \hat{h}_{2} - h_{2} \|}{h_{2}} \right) + \lambda_{3} \sum_{w} \left( I_{1}(x, y) - I_{2}(x, y) \right)^{2},$ where Cost denotes a cost function, minimization of which ensures a consistent localization of an object in the 3D spatial coordinates;
      
      i and j denote point i in view 1 and point j in view 2 that correspond to detected objects in the two views;
      
      λ₁ denotes weighting factor for an epipolar constraint portion of the cost function;
      
      x₂ denotes a coordinate vector for an object in view 2;
      
      F denotes a fundamental matrix that determines epipolar lines in one view corresponding to points in the other view;
      
      x₁ denotes a coordinate vector for an object in view 1;
      
      superscript T denotes a transpose of the vector x₂;
      
      wherein x₁ is a column vector and F is a matrix, so Fx₁ is also a column vector;
      
      λ₂ denotes a weighting factor for a window size consistency portion of the cost function;
      
      h₁ denotes a size of the object in view 1 determined from the 2D projection of the object in 3D spatial coordinates to 2D view 1;
      
      ĥ denotes the size of the object in view 1 or 2 as determined from the object classifier outputs;
      
      h₂ denotes the size of an object in view 2 determined from the 2D projection of the object in 3D spatial coordinates to the 2D view 2;
      
      λ₃ denotes a weighting factor for a window appearance similarity portion of the cost function;
      
      w denotes window index;
      
      ∥ denotes a magnitude operator;
      
      I₁ denotes an intensity distribution of the window in view 1;
      
      x denotes an x coordinate in either view;
      
      y denotes a y coordinate in either view;
      
      I₂ denotes an intensity distribution of the window in view 2; and
      
      Σ denotes summation.
  - 29. A computer program product as set forth in claim 28, further comprising instruction means to cause a computer to perform operations of:
    - optimizing pairing between the inter-view points (point i in view 1 and point j in view 2) using a bipartite weighted matching problem; and
      
      estimating a 3D trajectory of at least one 3D object.
  - 30. A computer program product as set forth in claim 29, further comprising instruction means to cause a computer to perform an operation of tracking multiple objects.
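
The bipartite weighted matching step of claims 9, 19 and 29 amounts to choosing a one-to-one pairing of view 1 and view 2 detections with minimum total cost. The sketch below solves it by exhaustive search over permutations, which is only practical for a handful of objects; it is an illustrative substitute, not the method the patent prescribes. For larger problems a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` is a common choice. The function name `min_cost_matching` and the example matrix are hypothetical.

```python
from itertools import permutations

def min_cost_matching(cost):
    """Exhaustive minimum-cost one-to-one assignment for a square cost matrix.

    Returns (pairs, total), where pairs is a list of (i, j) tuples matching
    row i (a detection in view 1) to column j (a detection in view 2).
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(enumerate(best_perm)), best_total

# Illustrative 3x3 cost matrix for three detections per view.
example_cost = [[4, 1, 3],
                [2, 0, 5],
                [3, 2, 2]]
pairs, total = min_cost_matching(example_cost)
```

Note that the greedy choice of the single cheapest entry (row 1, column 1, cost 0) is not part of the optimal pairing here, which is why a global assignment is solved instead of picking minima row by row.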

Current Assignee: HRL Laboratories LLC (The Boeing Co.)
Original Assignee: HRL Laboratories LLC (The Boeing Co.)
Inventors: Saisan, Payam; Owechko, Yuri; Medasani, Swarup
Primary Examiner(s): Vincent; David R
Assistant Examiner(s): Rifkin; Ben M
Application Number: US11/385,983
Publication Number: US 20070183669A1
Time in Patent Office: 1,205 Days
Field of Search: 706/14
US Class Current: 706/14
CPC Class Codes:
G06F 18/2111   by using evolutionary compu...
G06F 18/254   of classification results, ...
G06V 10/771   Feature selection, e.g. sel...
G06V 10/809   of classification results, ...
G06V 40/103   Static body considered as a...
