Object recognition using binary image quantization and hough kernels

US 6,807,286 B1
Filed: 04/13/2000
Issued: 10/19/2004
Est. Priority Date: 04/13/2000
Status: Expired due to Fees

First Claim

Patent Images

1. A computer-implemented process for recognizing objects in an input image of a scene, comprising using a computer to perform the following process actions:

generating training images depicting a surface of interest of an object it is desired to recognized in said input image of the scene;

creating a set of prototype edge features which collectively represent the edge pixel patterns encountered within a sub-window centered on each pixel depicting an edge of the object in the training images;

defining a Hough kernel for each prototype edge feature, wherein a Hough kernel for a particular prototype edge feature is defined by a set of offset vectors representing the distance and direction, from each edge pixel having a sub-window associated therewith that has an edge pixel pattern best represented by the prototype edge feature, to a prescribed reference point on the surface of interest of the object, wherein said offset vectors are represented in the Hough kernel as originating at a central point thereof;

identifying for each pixel in the input image depicting an edge, which prototype edge feature best represents the edge pixel pattern exhibited within said sub-window centered on the input image edge pixel under consideration;

for each pixel location in the input image, identifying how many offset vectors terminate at that location from Hough kernels centered on each edge pixel location of the input image, wherein the Hough kernel centered on each pixel location is the Hough kernel associated with the prototype edge feature best representing the edge pixel pattern exhibited within said sub-window centered on that input image edge pixel location; and

declaring the object to be present in the input image if any of the pixel locations in the input image have a quantity of offset vectors terminating thereat that exceed a detection threshold which is indicative of the presence of the surface of interest of the object.

View all claims

3 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A system and process for recognizing an object in an input image involving first generating training images depicting the object. A set of prototype edge features is created that collectively represent the edge pixel patterns encountered within a sub-window centered on each pixel depicting an edge of the object in the training images. Next, a Hough kernel is defined for each prototype edge feature in the form of a set of offset vectors representing the distance and direction, from each edge pixel having an associated sub-window exhibiting an edge pixel pattern best represented by the prototype edge feature, to a prescribed reference point on a surface of the object. The offset vectors are represented as originating at a central point of the kernel. For each edge pixel in the input image, the prototype edge feature which best represents the edge pixel pattern exhibited within the sub-window centered on the edge pixel is identified. Then, for each input image pixel location, the number of offset vectors terminating at that location from Hough kernels centered on each edge pixel location of the input image is identified. The Hough kernel centered on each pixel location is the Hough kernel associated with the prototype edge feature best representing the edge pixel pattern exhibited within a sub-window centered on that input image edge pixel location. The object is declared to be present in the input image if any of the input image pixel locations have a quantity of offset vectors terminating thereat that equals or exceeds a detection threshold.

75 Citations

View as Search Results

42 Claims

1. A computer-implemented process for recognizing objects in an input image of a scene, comprising using a computer to perform the following process actions:
- generating training images depicting a surface of interest of an object it is desired to recognized in said input image of the scene;
  
  creating a set of prototype edge features which collectively represent the edge pixel patterns encountered within a sub-window centered on each pixel depicting an edge of the object in the training images;
  
  defining a Hough kernel for each prototype edge feature, wherein a Hough kernel for a particular prototype edge feature is defined by a set of offset vectors representing the distance and direction, from each edge pixel having a sub-window associated therewith that has an edge pixel pattern best represented by the prototype edge feature, to a prescribed reference point on the surface of interest of the object, wherein said offset vectors are represented in the Hough kernel as originating at a central point thereof;
  
  identifying for each pixel in the input image depicting an edge, which prototype edge feature best represents the edge pixel pattern exhibited within said sub-window centered on the input image edge pixel under consideration;
  
  for each pixel location in the input image, identifying how many offset vectors terminate at that location from Hough kernels centered on each edge pixel location of the input image, wherein the Hough kernel centered on each pixel location is the Hough kernel associated with the prototype edge feature best representing the edge pixel pattern exhibited within said sub-window centered on that input image edge pixel location; and
  
  declaring the object to be present in the input image if any of the pixel locations in the input image have a quantity of offset vectors terminating thereat that exceed a detection threshold which is indicative of the presence of the surface of interest of the object.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
- - 2. The process of claim 1, wherein the process action of creating a set of prototype edge features, comprises the actions of:
3. The process of claim 2, wherein the process action of producing an edge pixel training image representing each training image, comprises the process actions of:
- assigning a first binary value to each pixel location of each training image corresponding to an edge pixel; and
  
  assigning a second binary value to each pixel location of each training image corresponding to a non-edge pixel.
4. The process of claim 2, wherein the prescribed size of the sub-window is 7 by 7 pixels, and the prescribed number of prototype edge features is 64.
5. The process of claim 2, wherein the process action of creating a prescribed number of prototype edge features, comprises the actions of:
- creating a number of arbitrary, initial prototype edge features equal to said prescribed number of prototype edge features, wherein each initial prototype edge feature is created by randomly assigning each pixel location thereof as either corresponding to an edge pixel or a non-edge pixel;
  
  computing the similarity between a prescribed number of raw edge features and each initial prototype edge feature;
  
  assigning each of said prescribed number of raw edge features to the initial prototype edge feature to which it is most similar based on the computed similarities associated therewith; and
  
  replacing each initial prototype edge feature with a composite prototype edge feature produced by combining the raw edge features assigned to the initial prototype edge feature.
6. The process of claim 5, wherein the process action of computing the similarity between the prescribed number of raw edge features and each initial prototype edge feature, comprises the actions of:
- computing a grassfire transform from each of the prescribed number of raw edge features and from each of the initial prototype edge features, such that each edge pixel in the edge features is assigned a grassfire value of “
  
  0” and
  
  each non-edge pixel is assigned a grassfire value which is an integer number that increases the farther away the non-edge pixel is from the nearest edge pixel;
  
  computing the similarity between each grassfire-transformed raw edge feature and each grassfire-transformed initial prototype edge feature by, for each raw edge feature-initial prototype edge feature pair considered, identifying all the pixel locations in the raw edge feature that correspond to edge pixel locations in the initial prototype edge feature, summing the grassfire values assigned to the identified pixel locations in the raw edge feature to create a raw edge feature grassfire sum, identifying all the pixel locations in the initial prototype edge feature that correspond to edge pixel locations in the raw edge feature, summing the grassfire values assigned to the identified pixel locations in the initial prototype edge feature to create an initial prototype edge feature grassfire sum, and adding the raw edge feature grassfire sum to the initial prototype edge feature grassfire sum to create a combined grassfire sum, wherein the smaller the combined grassfire sum, the more similar the raw edge feature is to the initial prototype edge feature.
7. The process of claim 6, wherein the process action of replacing each initial prototype edge feature with a composite prototype edge feature, comprises the actions of:
- (a) computing an edge pixel mean by dividing the sum of the number of edge pixels in each of the raw edge features by the total number of raw edge features assigned to the initial prototype feature;
  
  (b) respectively summing the grassfire values in each corresponding pixel location of the raw edge features assigned to the initial prototype edge feature to create a summed grassfire-transformed edge feature;
  
  (c) identifying locations in the summed grassfire-transformed edge feature having the lowest grassfire value and recording the quantity of these locations;
  
  (d) identifying locations in the summed grassfire-transformed edge feature having the next higher grassfire value and recording the quantity of these locations;
  
  (e) adding the number of pixel locations having the lowest grass fire value to the number of pixel locations having the next higher grassfire value to produce a combined pixel location sum;
  
  (f) comparing the number of pixel locations having the lowest grassfire value, and the combined pixel location sum, to the edge pixel mean;
  
  (g) whenever the number of pixel locations having the lowest grassfire value is closer to the edge pixel mean than the combined pixel location sum, designate the pixel locations having the lowest grassfire value as edge pixels in the composite prototype edge feature;
  
  (h) whenever the combined pixel location sum is closer to the edge pixel mean than the number of pixel locations having the lowest grassfire value, and the mean is less than the combined pixel location sum, designate the pixel locations associated with the combined pixel location sum as edge pixels in the composite prototype edge feature;
  
  (i) whenever the combined pixel location sum is closer to the edge pixel mean than the number of pixel locations having the lowest grassfire value, and the mean is greater than the combined pixel location sum, identifying locations in the summed grassfire-transformed edge feature having the next higher, previously unconsidered grassfire value, and recording the quantity of these locations;
  
  (j) adding the number of pixel locations associated with said next higher, previously unconsidered grassfire value to the last previously computed combined pixel location sum to produce a current combined pixel location sum;
  
  (k) comparing the number of pixel locations associated with the last previously-computed combined pixel location sum, and the current combined pixel location sum, to the edge pixel mean;
  
  (l) whenever the last previously-computed combined pixel location sum is closer to the edge pixel mean than the current combined pixel location sum, designate the pixel locations associated with the last previously-computed combined pixel location sum as edge pixels in the composite prototype edge feature;
  
  (m) whenever the current combined pixel location sum is closer to the edge pixel mean than the last previously-computed combined pixel location sum, and the mean is less than the current combined pixel location sum, designate the pixel locations associated with the current combined pixel location sum as edge pixels in the composite prototype edge feature;
  
  (n) whenever the current combined pixel location sum is closer to the edge pixel mean than the last previously-computed combined pixel location sum, and the mean is greater than the current combined pixel location sum, identifying locations in the summed grassfire-transformed edge feature having the next higher, previously unconsidered grassfire value, recording the quantity of these locations, and repeat process actions (j) through (m).
8. The process of claim 5, further comprising the process actions of:
- (a) computing the similarity between the prescribed number of raw edge features and each composite prototype edge feature;
  
  (b) assigning each of said prescribed number of raw edge features to the composite prototype edge feature to which it is most similar based on the computed similarities associated therewith;
  
  (c) replacing the last-produced composite prototype edge feature with a current composite prototype edge feature produced by combining the raw edge features assigned to the last-produced composite prototype edge feature;
  
  (d) repeating process actions (a) through (c), until the edge pixel locations in the current composite prototype edge feature have not changed in comparison to the last-produced composite prototype edge feature; and
  
  (e) designating the current composite prototype edge features as the final prototype edge features.
9. The process of claim 8, wherein the process action of computing the similarity between the prescribed number of raw edge features and each composite prototype edge feature, comprises the actions of:
- computing a grassfire transform from each of the prescribed number of raw edge features and from each of the composite prototype edge features, such that each edge pixel in the edge features is assigned a grassfire value of “
  
  0” and
  
  each non-edge pixel is assigned a grassfire value which is an integer number that increases the farther away the non-edge pixel is from the nearest edge pixel;
  
  computing the similarity between each grassfire-transformed raw edge feature and each grassfire-transformed composite prototype edge feature by, for each raw edge feature-composite prototype edge feature pair considered, identifying all the pixel locations in the raw edge feature that correspond to edge pixel locations in the composite prototype edge feature, summing the grassfire values assigned to the identified pixel locations in the raw edge feature to create a raw edge feature grassfire sum, identifying all the pixel locations in the composite prototype edge feature that correspond to edge pixel locations in the raw edge feature, summing the grassfire values assigned to the identified pixel locations in the composite prototype edge feature to create a composite prototype edge feature grassfire sum, and adding the raw edge feature grassfire sum to the composite prototype edge feature grassfire sum to create a combined grassfire sum, wherein the smaller the combined grassfire sum, the more similar the raw edge feature is to the composite prototype edge feature.
10. The process of claim 9, wherein the process action of replacing the last-produced composite prototype edge feature with a current composite prototype edge feature produced, comprises the actions of:
- (i) computing an edge pixel mean by dividing the sum of the number of edge pixels in each of the raw edge features by the total number of raw edge features assigned to the last-produced composite prototype feature;
  
  (ii) respectively summing the grassfire values in each corresponding pixel location of the raw edge features assigned to the last-produced composite prototype edge feature to create a summed grassfire-transformed edge feature;
  
  (iii) identifying locations in the summed grassfire-transformed edge feature having the lowest grassfire value and recording the quantity of these locations;
  
  (iv) identifying locations in the summed grassfire-transformed edge feature having the next higher grassfire value and recording the quantity of these locations;
  
  (v) adding the number of pixel locations having the lowest grass fire value to the number of pixel locations having the next higher grassfire value to produce a combined pixel location sum;
  
  (vi) comparing the number of pixel locations having the lowest grassfire value, and the combined pixel location sum, to the edge pixel mean;
  
  (vii) whenever the number of pixel locations having the lowest grassfire value is closer to the edge pixel mean than the combined pixel location sum, designate the pixel locations having the lowest grassfire value as edge pixels in the current composite prototype edge feature;
  
  (viii) whenever the combined pixel location sum is closer to the edge pixel mean than the number of pixel locations having the lowest grassfire value, and the mean is less than the combined pixel location sum, designate the pixel locations associated with the combined pixel location sum as edge pixels in the current composite prototype edge feature;
  
  (ix) whenever the combined pixel location sum is closer to the edge pixel mean than the number of pixel locations having the lowest grassfire value, and the mean is greater than the combined pixel location sum, identifying locations in the summed grassfire-transformed edge feature having the next higher, previously unconsidered grassfire value, and recording the quantity of these locations;
  
  (x) adding the number of pixel locations associated with said next higher, previously unconsidered grassfire value to the last previously computed combined pixel location sum to produce a current combined pixel location sum;
  
  (xi) comparing the number of pixel locations associated with the last previously-computed combined pixel location sum, and the current combined pixel location sum, to the edge pixel mean;
  
  (xii) whenever the last previously-computed combined pixel location sum is closer to the edge pixel mean than the current combined pixel location sum, designate the pixel locations associated with the last previously-computed combined pixel location sum as edge pixels in the current composite prototype edge feature;
  
  (xiii) whenever the current combined pixel location sum is closer to the edge pixel mean than the last previously-computed combined pixel location sum, and the mean is less than the current combined pixel location sum, designate the pixel locations associated with the current combined pixel location sum as edge pixels in the current composite prototype edge feature;
  
  (xiv) whenever the current combined pixel location sum is closer to the edge pixel mean than the last previously-computed combined pixel location sum, and the mean is greater than the current combined pixel location sum, identifying locations in the summed grassfire-transformed edge feature having the next higher, previously unconsidered grassfire value, recording the quantity of these locations, and repeat process actions (x) through (xiii).
11. The process of claim 5, wherein the prescribed number of raw edge features is equal to all the raw edge features defined.
12. The process of claim 5, wherein the prescribed number of raw edge features is less than all the raw edge features defined, and wherein the prescribed number of raw edge features is selected at random from the raw edge features available.
13. The process of claim 12, wherein the prescribed number of raw edge features is greater than about 10,000.
14. The process of claim 2, wherein the process action of defining a Hough kernel for each prototype edge feature, comprises the actions of:
- assigning a unique number to the prototype edge feature, said number hereinafter being referred to as a prototype index number;
  
  producing an indexed training image from each edge pixel training image by assigning an appropriate prototype index number to each edge pixel location in the edge pixel training image, wherein the appropriate index number is the index number assigned to the prototype edge feature which is most similar to the raw edge feature associated with the edge pixel at the edge pixel location;
  
  assigning an offset vector to each raw edge feature in each edge pixel training image, said offset vector defining the distance and direction from the center edge pixel of a raw edge feature to a pixel location in the edge pixel training image corresponding to a prescribed reference point on the surface of interest of the object depicted in the training image;
  
  identifying each offset vector assigned to the edge pixels of the edge pixel training images whose location corresponds to a pixel location in the associated indexed training image to which the prototype edge feature under consideration has been assigned, and designating these offset vectors as elements of a Hough kernel for the prototype edge feature;
  
  characterizing the Hough kernel for the prototype edge feature as an image having a central pixel location from which all the offset vectors identified as being an element of the Hough kernel originate, and whose pixels are assigned a vote number indicative of how many the offset vectors terminate at that location.
15. The process of claim 14, wherein the process action of producing an indexed training image from each edge pixel training image, comprises an action of determining the prototype edge feature which is most similar to each raw edge feature, said determining action comprising, for each raw edge feature, the actions of:
- computing a grassfire transform from the raw edge feature and from each prototype edge feature, such that each edge pixel in the edge features is assigned a “
  
  0” and
  
  each non-edge pixel is assigned an integer number that increases the farther away the non-edge pixel is from the nearest edge pixel;
  
  identifying all the pixel locations in the raw edge feature that correspond to edge pixel locations in the prototype edge feature;
  
  summing the grassfire values assigned to the identified pixel locations in the raw edge feature to create a raw edge feature grassfire sum;
  
  identifying all the pixel locations in each of the prototype edge feature that correspond to edge pixel locations in the raw edge feature;
  
  for each of the prototype edge features, summing the grassfire values assigned to the identified pixel locations in the prototype edge feature to create a prototype edge feature grassfire sum;
  
  respectively adding the raw edge feature grassfire sum to each of the prototype edge feature grassfire sums to create a plurality of combined grassfire sums;
  
  identifying which of the combined grassfire sums is the smallest; and
  
  designating the prototype edge feature associated with the smallest combined grassfire sum as the most similar to the raw edge feature under consideration.
16. The process of claim 14, wherein the prescribed reference point on the surface of interest of the object depicted in the training image is the centroid of the surface.
17. The process of claim 1, wherein the process action of identifying for each pixel in the input image depicting an edge, which prototype edge feature best represents the edge pixel pattern exhibited within said sub-window centered on the input image edge pixel under consideration, comprises the actions of:
- detecting edge pixels in the input image and producing an edge pixel input image representing the input image;
  
  defining a raw edge feature associated with each edge pixel in the edge pixel input image, wherein the raw edge feature comprises a sub-window in the edge pixel input image of a prescribed size which is centered on an edge pixel;
  
  assigning a unique number to each prototype edge feature, said number hereinafter being referred to as a prototype index number; and
  
  assigning an appropriate prototype index number to each edge pixel location in the edge pixel input image to create an indexed input image, wherein the appropriate index number is the index number assigned to the prototype edge feature which is most similar to the raw edge feature associated with the edge pixel at the edge pixel location under consideration.
18. The process of claim 17, wherein the prescribed size of the sub-window in the edge pixel input image is the same as the prescribed size of the sub-window employed in the training images.
19. The process of claim 18, wherein the prescribed size of both of the sub-windows is 7 by 7 pixels.
20. The process of claim 17, wherein the process action of assigning an appropriate prototype index number to each edge pixel location in the edge pixel input image, comprises an action of determining the prototype edge feature which is most similar to each raw edge feature in the edge pixel input image, said determining action comprising, for each raw edge feature, the actions of:
- computing a grassfire transform from the raw edge feature and from each prototype edge feature, such that each edge pixel in the edge features is assigned a “
  
  0” and
  
  each non-edge pixel is assigned an integer number that increases the farther away the non-edge pixel is from the nearest edge pixel;
  
  identifying all the pixel locations in the raw edge feature that correspond to edge pixel locations in the prototype edge feature;
  
  summing the grassfire values assigned to the identified pixel locations in the raw edge feature to create a raw edge feature grassfire sum;
  
  identifying all the pixel locations in each of the prototype edge feature that correspond to edge pixel locations in the raw edge feature;
  
  for each of the prototype edge features, summing the grassfire values assigned to the identified pixel locations in the prototype edge feature to create a prototype edge feature grassfire sum;
  
  respectively adding the raw edge feature grassfire sum to each of the prototype edge feature grassfire sums to create a plurality of combined grassfire sums;
  
  identifying which of the combined grassfire sums is the smallest; and
  
  designating the prototype edge feature associated with the smallest combined grassfire sum as the most similar to the raw edge feature under consideration.
21. The process of claim 17, wherein the process action of identifying how many offset vectors terminate at each pixel location in the input image from Hough kernels centered on each edge pixel location of the input image, comprises an action of generating a final voting image from the indexed input image, said generating action comprising the actions of:
- for each prototype index number, identifying each pixel location in the indexed input image that has been assigned the prototype index number under consideration, creating an equal-value index image for the prototype index number under consideration, wherein the equal-value index image is the same size as the indexed input image and is created by assigning a first binary pixel value to every pixel location with the exception of those pixel locations that correspond to the identified pixel locations of the indexed input image having a pixel value matching the prototype index number under consideration in which case the second binary pixel value is assigned to these pixel locations of the equal-value index image, characterizing the Hough kernel associated with the prototype edge feature assigned the prototype index number under consideration as an image having a central pixel location from which all the offset vectors associated therewith originate, and whose pixels are assigned a vote count indicative of how many the offset vectors terminate at that location, superimposing the Hough kernel associated with the prototype edge feature assigned the prototype index number under consideration onto each of the pixel locations of the equal-value index image exhibiting the second binary pixel value such that the Hough kernel'"'"'s central pixel location corresponds to each of these pixel locations, for each pixel location in the equal-value index image, assigning the sum of the vote counts associated with Hough kernel pixels superimposed on that pixel location to a corresponding pixel location of an initial voting image associated with the equal-value index image; and
  
  combining each of the initial voting images to create a final voting image by respectively summing the summed vote counts assigned to each corresponding pixel location across all the initial voting images and assigning the resulting sum to that location in the final voting image.
22. The process of claim 17, wherein the process action of generating training images, comprises the actions of:
- inputting a model image that depicts the surface of interest of the object;
  
  creating a base training image depicting the portion of the model image containing said surface of interest; and
  
  synthesizing training images from the base image which depict the object'"'"'s surface of interest in different orientations; and
  
  wherein the model image, base training image, and synthetic training images are color images.
23. The process of claim 22, wherein the process action of detecting edge pixels in the input image and producing an edge pixel input image representing the input image, comprises the process actions of:
- identifying which color levels among all the possible color levels are acceptably related to the color levels exhibited by the pixels of the object'"'"'s surface of interest as depicted in the base training image; and
  
  eliminating from consideration as pixels depicting an edge, every pixel location in the edge pixel input image corresponding to a pixel location in the input image whose pixel exhibits a color level not among those identified as being acceptably related to the color levels exhibited by the pixels of the object'"'"'s surface of interest in the base training image.
24. The process of claim 23, wherein the process action of identifying which color levels among all the possible color levels are acceptably related to the color levels exhibited by the pixels of the object'"'"'s surface of interest, comprises the actions of:
- establishing a lookup table having a separate cell for each possible pixel color level;
  
  identifying the color level of each pixel associated with the object'"'"'s surface of interest;
  
  normalizing each of the identified color levels associated with the object'"'"'s surface of interest to produce a plurality of object-related color levels;
  
  normalizing each color level of the lookup table;
  
  establishing a final acceptance region of a prescribed size for each of the normalized object-related color levels, wherein the acceptance region indicates a range of color levels that is considered equivalent to a particular normalized object-related color level; and
  
  setting the cells of the lookup table to a first binary value if the cell'"'"'s normalized color level is not found within the final acceptance region of at least one of the normalized object-related color levels, and setting the cells of the lookup table to a second binary value if the cell'"'"'s normalized color level is found within the final acceptance region of at least one of the normalized object-related color levels, to create a final binary lookup table.
25. The process of claim 24, wherein the process action of eliminating from consideration as pixels depicting an edge, every pixel location in the edge pixel input image corresponding to a pixel location in the input image whose pixel exhibits a color level not among those identified as being acceptably related to the color levels exhibited by the pixels of the object'"'"'s surface of interest in the base training image, comprises the actions of:
- identifying the color level of each pixel of the input image;
  
  identifying those pixels of the input image that exhibit a color level which is set to the first binary level in the final binary lookup table, thereby indicating that the pixels are not object-related, setting each pixel location in the edge pixel input image corresponding to a pixel location in the input image identified as having a pixel that is not object-related to a pixel value reserved to indicate a non-edge pixel.
26. The process of claim 24, wherein the process action of establishing a final acceptance region of a prescribed size for each of the normalized object-related color levels, comprises the actions of:
- (a) employing as the prescribed size of an initial acceptance region one of (i) a default size, or (ii) a user-specified size;
  
  (b) setting the cells of the lookup table to a first binary value if the cell'"'"'s normalized color level is not found within the initial acceptance region of at least one of the normalized object-related color levels, and setting the cells of the lookup table to a second binary value if the cell'"'"'s normalized color level is found within the final acceptance region of at least one of the normalized object-related color levels, to create an initial binary lookup table;
  
  (c) identifying the color levels of each pixel of the model image;
  
  (d) setting a pixel of the model image to the first binary value whenever the color level identified for that pixel corresponds to a cell of the initial binary lookup table that has been set to said first binary level, and setting a pixel of the model image to the second binary value whenever the color level identified for that pixel corresponds to a cell of the initial binary lookup table that has been set to said second binary level, to create a binary model image;
  
  (e) displaying the binary model image to the user in a manner that all the pixels of the image set to the first binary value are displayed using a first color level, and all the pixels of the image set to the second binary value are displayed using a second, contrasting color level;
  
  (f) requesting the user to provide a revised size for the acceptance region, whenever the user determines that a substantial number of the pixels depicting the object'"'"'s surface of interest are not displayed using said second color level;
  
  (g) repeating process actions (b) through (f) employing the revised size for the acceptance region in place of the prescribed size of the initial acceptance region, whenever a revised size is provided by the user; and
  
  (h) establishing as the prescribed size of the final acceptance region, the last-employed prescribed size of the acceptance region, whenever the user does not provide a revised size for the acceptance region.
27. The process of claim 26, wherein the color levels are measured in terms of the red, green and blue (RGB) color components exhibited by a pixel, and wherein the default size of the initial acceptance region is 5 RGB units in any direction.
28. The process of claim 21, wherein the process action of declaring the object to be present in the input image, comprises the actions of:
- establishing a final detection threshold defining the minimum vote count necessary to indicate that a pixel location in the final voting image could correspond to the pixel location of the prescribed reference point of the object'"'"'s surface of interest in the input image;
  
  for each pixel of the final voting image, ascertaining if the pixel'"'"'s vote count equals or exceeds the final detection threshold;
  
  declaring the object'"'"'s surface of interest, and so the object, to be present in the input image only if at least one of the pixels of the final voting image equals or exceeds the final detection threshold.
29. The process of claim 28, further comprising the process actions of:
- determining which pixel of the final voting image, among the pixels thereof equaling or exceeding the final detection threshold, exhibits the highest vote count; and
  
  designating the pixel location of input image corresponding to the pixel location of the pixel in the final voting image determined to exhibit the highest vote count equaling or exceeding the final detection threshold to be the location of the prescribed reference point on the object'"'"'s surface of interest as it appears in the input image.
30. The process of claim 28, wherein the process action of establishing a final detection threshold, comprises the actions of:
- (a) employing as an initial detection threshold one of (i) a default value, or (ii) a user-specified value;
  
  (b) identifying for each pixel depicting an edge in one of (i) the model image, or (ii) a new image of the scene depicting the object'"'"'s surface of interest at a known location therein, which prototype edge feature best represents the edge pixel pattern exhibited within the sub-window centered on the edge pixel under consideration;
  
  (c) for each pixel location in the model image or new image, identifying how many offset vectors terminate at that location from Hough kernels centered on each edge pixel location, wherein the Hough kernel centered on each pixel location is the Hough kernel associated with the prototype edge feature best representing the edge pixel pattern exhibited within said sub-window centered on that edge pixel location, and wherein the number of offset vectors identified as terminating at a particular pixel constitutes a vote count for that pixel;
  
  (d) displaying the model image or new image to the user with the pixel location having vote counts equaling or exceeding the initial detection threshold highlighted in a manner apparent to the user;
  
  (e) requesting the user to provide a revised detection threshold having a higher value whenever the user determines that a significant number of pixels not depicting the object'"'"'s surface of interest are highlighted, and requesting the user to provide a revised detection threshold having a lower value whenever the user determines that too few of the pixels depicting the object'"'"'s surface of interest are highlighted;
  
  (f) repeating process actions (b) through (e) employing the revised detection threshold in place of the initial detection threshold, whenever a revised detection threshold is provided by the user; and
  
  (g) establishing as the final detection threshold, the last-employed detection threshold, whenever the user does not provide a revised detection threshold.

31. A system for training at least one general purpose computing device to recognize objects in an input image of a scene, comprising:
- at least one general purpose computing device; and
  
  a computer program comprising program modules executable by the at least one computing device, wherein the at least one computing device is directed by the program modules of the computer program to, generate training images depicting a surface of interest of an object it is desired to recognized in said input image of the scene, wherein said generating comprises, inputting a model image that depicts the surface of interest of the object, creating a base training image depicting the Portion of the model image containing said surface of interest, and synthesizing training images from the base image which depict the object'"'"'s surface of interest in different orientations, wherein said synthesizing comprises, (a) synthetically pointing a normal vector of the object'"'"'s surface of interest at one of a prescribed number of nodes of a tessellated hemisphere defined as overlying the surface of interest, (b) simulating a paraperspective projection representation of the surface of interest, and (c) repeating sub-module (a) and (b) for each of the remaining nodes of the tessellated hemisphere;
  
  create a set of Prototype edge features which collectively represent the edge pixel patterns encountered within a sub-window centered on each pixel depicting an edge of the object in the training images, and define a Hough kernel for each prototype edge feature, wherein a Hough kernel for a particular prototype edge feature is defined by a set of offset vectors representing the distance and direction, from each edge pixel having a sub-window associated therewith that has an edge pixel pattern best represented by the Prototype edge feature, to a prescribed reference point on the surface of interest of the object, wherein said offset vectors are represented in the Hough kernel as originating at a central point thereof, wherein said set of prototype edge features and said Hough kernels associated therewith are used to recognize said object in the input image and to identify the location of the object in the input image.
- View Dependent Claims (32, 33, 34, 35, 36, 37, 38)
- - 32. The system of claim 31, wherein the tessellated hemisphere has 31 nodes each of which is spaced apart from each adjacent node by approximately 20 degrees.
  - 33. The system of claim 31, wherein there are no nodes of the tessellated hemisphere within a prescribed distance from the equator of the hemisphere.
  - 34. The system action of claim 33, wherein there are no nodes of the tessellated hemisphere within approximately 20 degrees of the equator of the hemisphere.
  - 35. The system of claim 31, wherein the sub-module for synthesizing training images, further comprises a sub-module for:
36. The system of claim 35, wherein the sub-module for simulating a training image at each of a prescribed number of rotation increments, comprises simulating a training image after about each 20 degrees of rotation.
37. The system of claim 35, wherein the sub-module for synthesizing training images, further comprises sub-modules for:
- incrementally scaling each of the synthesized training images either up, down, or both; and
  
  simulating a training image at each of a prescribed number of scale increments.
38. The system of claim 37, wherein the sub-module for simulating a training image at each of a prescribed number of scale increments, comprises simulating a training image at 10 percent scaling increments.

39. A system for recognizing objects in an input image of a scene, said system having a set of prototype edge features and a set of Hough kernels associated therewith each of which is comprised of a plurality of offset vectors originating from a central point of the associated Hough kernel, said system comprising:
- at least one general purpose computing device; and
  
  a computer program comprising program modules executable by the at least one computing device, wherein the at least one computing device is directed by the program modules of the computer program to, identify for each pixel in the input image depicting an edge, which prototype edge feature best represents the edge pixel pattern exhibited within a sub-window centered on the input image edge pixel under consideration, for each pixel location in the input image, identify how many offset vectors terminate at that location from Hough kernels centered on each edge pixel location of the input image, wherein the Hough kernel centered on each pixel location is the Hough kernel associated with the prototype edge feature best representing the edge pixel pattern exhibited within said sub-window centered on that input image edge pixel location, and declare the object to be present in the input image if any of the pixel locations in the input image have a quantity of offset vectors terminating thereat that exceed a detection threshold which is indicative of the presence of the surface of interest of the object.

40. A computer-readable memory for recognizing objects in an input image of a scene, comprising:
- a computer-readable storage medium; and
  
  a computer program comprising program modules stored in the storage medium, wherein the storage medium is so configured by the computer program that it causes a computer to, generate training images depicting an object it is desired to recognized in said input image of the scene, for each training image, detect edge pixels and producing an edge pixel training image representing the training image, define a raw edge feature associated with each edge pixel, said raw edge feature comprising a sub-window in the edge pixel training image of a prescribed size which is centered on an edge pixel, and assign an offset vector to each raw edge feature, said offset vector defining the distance and direction from the center edge pixel of a raw edge feature to a pixel location in the edge pixel training image corresponding to a prescribed reference point on the object in the training image, create a prescribed number of prototype edge features each of which represents a group of the raw edge features and assigning a unique index number to each prototype edge feature, wherein each raw edge feature in a group of raw edge features represented by a prototype edge feature is more similar to that prototype edge feature than any of the other prototype edge features, produce an indexed training image from each edge pixel training image by assigning an appropriate prototype index number to each edge pixel location in the edge pixel training image, wherein the appropriate index number is the index number assigned to the prototype edge feature which is most similar to the raw edge feature associated with the edge pixel at the edge pixel location under consideration, create a Hough kernel for each prototype edge feature, wherein a Hough kernel for a particular prototype edge feature comprises every offset vector associated with a raw edge feature in the group of raw edge features represented by the prototype edge feature, and wherein each of said offset vectors originates at a central point of the Hough kernel, produce an indexed input image by, detecting edge pixels and producing an edge pixel input image representing the input image, defining a raw edge feature associated with each edge pixel, and assigning an appropriate prototype index number to each edge pixel location in the edge pixel input image, wherein the appropriate index number is the index number assigned to the prototype edge feature which is most similar to the raw edge feature associated with the edge pixel at the edge pixel location under consideration, generate a voting image from the indexed input image by, for each pixel location in the indexed input image, identifying how many offset vectors terminate at that pixel location from Hough kernels centered on each pixel of the indexed input image having a prototype index number assigned thereto, wherein the Hough kernel centered on each pixel is the Hough kernel associated with the prototype edge feature to which the index number of that pixel has been assigned, and assigning a vote count to each pixel of the indexed input image indicative of how many of said offset vectors terminated at that pixel location, ascertain if any of the vote counts in the voting image exceed a prescribed detection threshold which is indicative of the presence of the object being recognized in the input image, and declare the object to be present in the input image if the detection threshold is exceeded by any of the vote counts of the voting image pixels and designate the pixel location associated with the pixel having the largest vote count exceeding the threshold as corresponding to the pixel location of said prescribed reference point of the object being recognized in the input image.

41. A computer-implemented process for computing the similarity between a pair of edge features, wherein an edge feature represents the edge pixel patterns encountered within a sub-window centered on a pixel of an image, comprising using a computer to perform the following process actions:
- computing a grassfire transform from each edge feature, such that each edge pixel in the edge features is assigned a “
  
  0” and
  
  each non-edge pixel is assigned an integer number that increases the farther away the non-edge pixel is from the nearest edge pixel;
  
  identifying all the pixel locations in the first edge feature that correspond to edge pixel locations in the second edge feature;
  
  summing the grassfire values assigned to the identified pixel locations in the first edge feature to create a first edge feature grassfire sum;
  
  identifying all the pixel locations in the second edge feature that correspond to edge pixel locations in the first edge feature;
  
  summing the grassfire values assigned to the identified pixel locations in the second edge feature to create a second edge feature grassfire sum; and
  
  adding the first edge feature grassfire sum to the second edge feature grassfire sum to create a combined grassfire sum, wherein the smaller the combined grassfire sum, the more similar the first edge feature is to the second edge feature.

42. A computer-implemented process for computing a composite edge feature from a plurality of edge features, wherein an edge feature represents the edge pixel patterns encountered within a sub-window centered on a pixel of an image, comprising using a computer to perform the following process actions:
- (a) computing an edge pixel mean by dividing the sum of the number of edge pixels in each of the edge features by the total number of edge features;
  
  (b) computing a grassfire transform from each edge feature, such that each edge pixel in the edge features is assigned a “
  
  0” and
  
  each non-edge pixel is assigned an integer number that increases the farther away the non-edge pixel is from the nearest edge pixel;
  
  (c) respectively summing the grassfire values in each corresponding pixel location of the edge features to create a summed grassfire-transformed edge feature;
  
  (d) identifying locations in the summed grassfire-transformed edge feature having the lowest grassfire value and recording the quantity of these locations;
  
  (e) identifying locations in the summed grassfire-transformed edge feature having the next higher grassfire value and recording the quantity of these locations;
  
  (f) adding the number of pixel locations having the lowest grass fire value to the number of pixel locations having the next higher grassfire value to produce a combined pixel location sum;
  
  (g) comparing the number of pixel locations having the lowest grassfire value, and the combined pixel location sum, to the edge pixel mean;
  
  (h) whenever the number of pixel locations having the lowest grassfire value is closer to the edge pixel mean than the combined pixel location sum, designate the pixel locations having the lowest grassfire value as edge pixels in the composite edge feature;
  
  (i) whenever the combined pixel location sum is closer to the edge pixel mean than the number of pixel locations having the lowest grassfire value, and the mean is less than the combined pixel location sum, designate the pixel locations associated with the combined pixel location sum as edge pixels in the composite edge feature;
  
  (j) whenever the combined pixel location sum is closer to the edge pixel mean than the number of pixel locations having the lowest grassfire value, and the mean is greater than the combined pixel location sum, identifying locations in the summed grassfire-transformed edge feature having the next higher, previously unconsidered grassfire value, and recording the quantity of these locations;
  
  (k) adding the number of pixel locations associated with said next higher, previously unconsidered grassfire value to the last previously computed combined pixel location sum to produce a current combined pixel location sum;
  
  (l) comparing the number of pixel locations associated with the last previously-computed combined pixel location sum, and the current combined pixel location sum, to the edge pixel mean;
  
  (m) whenever the last previously-computed combined pixel location sum is closer to the edge pixel mean than the current combined pixel location sum, designating the pixel locations associated with the last previously-computed combined pixel location sum as edge pixels in the composite edge feature;
  
  (n) whenever the current combined pixel location sum is closer to the edge pixel mean than the last previously-computed combined pixel location sum, and the mean is less than the current combined pixel location sum, designate the pixel locations associated with the current combined pixel location sum as edge pixels in the composite edge feature;
  
  (o) whenever the current combined pixel location sum is closer to the edge pixel mean than the last previously-computed combined pixel location sum, and the mean is greater than the current combined pixel location sum, identifying locations in the summed grassfire-transformed edge feature having the next higher, previously unconsidered grassfire value, recording the quantity of these locations, and repeat process actions (k) through (n).

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Campbell, Richard J., Krumm, John
Primary Examiner(s)
Wu, Jingge
Assistant Examiner(s)
Hesseltine, Ryan J.

Application Number

US09/548,182
Time in Patent Office

1,650 Days
Field of Search

382/197, 382/199, 382/200, 382/205, 382/281, 382/291, 382/103, 382/141, 382/142, 382/143, 382/144, 382/145, 382/146, 382/147, 382/148, 382/149, 382/150, 382/151, 382/152, 382/153, 382/154, 382/159
US Class Current

382/103
CPC Class Codes

G06F 18/28 Determining representative ...

G06V 10/757 Matching configurations of ...

Object recognition using binary image quantization and hough kernels

First Claim

3 Assignments

0 Petitions

Accused Products

Abstract

75 Citations

42 Claims

Specification

Solutions

Use Cases

Quick Links

Object recognition using binary image quantization and hough kernels

First Claim

3 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

75 Citations

42 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links