Three-dimensional visual phrases for object recognition

US 8,983,201 B2
Filed: 07/30/2012
Issued: 03/17/2015
Est. Priority Date: 07/30/2012
Status: Active Grant

Abstract

The techniques discussed herein discover three-dimensional (3-D) visual phrases for an object based on a 3-D model of the object. The techniques then describe the 3-D visual phrases. Once described, the techniques use the 3-D visual phrases to detect the object in an image (e.g., object recognition).

37 Citations

20 Claims

1. A method comprising:
- under control of a processor configured with computer-executable instructions, receiving a collection of images each containing an object;
  
  constructing a three-dimensional (3-D) model of the object, the 3-D model including a plurality of points;
  
  determining a popularity of individual points in the plurality of points, wherein the popularity of an individual point is based at least in part on a number of images of the collection of images in which the individual point is observed;
  
  selecting, based at least in part on a first sampling rate, a first popular point subset of the plurality of points based at least in part on the popularities of the respective individual points in the plurality of points;
  
  selecting, based at least in part on a second sampling rate different than the first sampling rate, a second popular point subset of the plurality of points based at least in part on the popularities of the respective individual points in the plurality of points;
  
  generating one or more 3-D visual phrases based on the first popular point subset of the plurality of points and the second popular point subset of the plurality of points; and
  
  using the one or more 3-D visual phrases to detect the object in an unclassified image.
- 2. The method recited in claim 1, wherein a 3-D visual phrase is a triangular facet on a surface of the 3-D model.
- 3. The method recited in claim 1, further comprising describing each of the one or more 3-D visual phrases by characterizing a visual appearance of each point in the 3-D visual phrase.
- 4. The method recited in claim 1, further comprising describing each of the one or more 3-D visual phrases by characterizing a geometric structure of points in the 3-D visual phrase.
- 5. The method recited in claim 1, wherein using the one or more 3-D visual phrases to detect the object in the unclassified image comprises matching visual appearances of one or more points in the one or more 3-D visual phrases with scale-invariant feature transform (SIFT) features extracted from the unclassified image.
- 6. The method recited in claim 1, wherein using the one or more 3-D visual phrases to detect the object in the unclassified image comprises comparing a geometric structure of each 3-D visual phrase to a cyclic order of features extracted from the unclassified image.
- 7. The method recited in claim 1, wherein the collection of images is a set of training images known to include the object.
- 8. The method recited in claim 1, determining the first sampling rate such that the one or more 3-D visual phrases can detect the object in a photograph where the object and a camera are separated by a first distance; and determining the second sampling rate so that the one or more 3-D visual phrases can detect the object in another photograph where the object and the camera are separated by a second distance that is greater than the first distance.
- 9. The method recited in claim 1, wherein the object is a landmark and the collection of images is received from multiple different locations on a network.
- 10. The method recited in claim 1, further comprising storing the one or more 3-D visual phrases for object recognition.
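
The popularity and sampling steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claims do not say how a sampling rate maps to a subset, so the sketch assumes the rate is the fraction of points to keep, ranked by popularity. The names `point_popularity`, `select_popular_subset`, and the toy `observations` data are hypothetical.

```python
import math
from collections import Counter


def point_popularity(observations):
    """Count, for each 3-D point id, the number of images that observe it.

    observations: dict mapping an image id to the point ids seen in that image.
    """
    counts = Counter()
    for point_ids in observations.values():
        # A point counts at most once per image, even if listed twice.
        counts.update(set(point_ids))
    return counts


def select_popular_subset(popularity, sampling_rate):
    """Keep the most popular fraction of points.

    Assumption: 'sampling rate' is read as the fraction of points retained.
    Ties in popularity are broken by point id so the result is deterministic.
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    ranked = sorted(popularity, key=lambda p: (-popularity[p], p))
    keep = math.ceil(sampling_rate * len(ranked))
    return ranked[:keep]


# Toy data: four images and the model points each one observes.
observations = {
    "img_a": [0, 1, 2, 3],
    "img_b": [0, 1, 2],
    "img_c": [0, 1, 4],
    "img_d": [0, 5],
}

popularity = point_popularity(observations)
first_subset = select_popular_subset(popularity, 0.5)    # first sampling rate
second_subset = select_popular_subset(popularity, 0.25)  # second, different rate
```

Under this reading, a lower rate keeps fewer, more widely observed points. Claim 8 ties the two rates to different camera-to-object distances; which rate suits which distance is not stated in the claims, so the sketch leaves that choice to the caller.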

11. One or more computer storage media storing computer executable instructions that, when executed, perform operations comprising:
- sampling a plurality of points of a 3-D model of an object to determine a popular point subset, wherein each point of the popular point subset is sampled based on a number of training images in which the point is observed;
  
  generating one or more sets of three-dimensional (3-D) visual phrases based on the popular point subset;
  
  storing the one or more sets of 3-D visual phrases, each set of 3-D visual phrases being associated with the object;
  
  receiving an indication to perform object recognition for one or more images;
  
  performing object recognition on the one or more images using the one or more sets of 3-D visual phrases; and
  
  categorizing the one or more images based on whether the object is recognized in an image using the one or more sets of 3-D visual phrases.
- 12. The one or more computer storage media recited in claim 11, wherein the indication is based on an image-based search query received from a client device.
- 13. The one or more computer storage media recited in claim 12, further performing an operation comprising providing image-based search results that include at least one of the one or more categorized images associated with the image-based search query.

14. A system comprising:
- one or more processors;
  
  one or more computer memories, coupled to the one or more processors and storing:
  
  an image access module, operable by the one or more processors, to access a plurality of images that each comprise an object;
  
  a three-dimensional (3-D) visual phrase discoverer module, operable by the one or more processors, to:
  
  sample a plurality of points of a 3-D point cloud for the object to determine a popular point subset, wherein an individual point of the popular point subset is sampled based on a number of the plurality of images in which the individual point is observed; and
  
  discover one or more 3-D visual phrases from the 3-D point cloud for the object based at least in part on the popular point subset; and
  
  an object detection module, operable by the one or more processors, to receive an indication to perform object recognition for one or more additional images and to use the one or more 3-D visual phrases to recognize the object in individual ones of the one or more additional images.
- 15. The system as recited in claim 14, wherein the three-dimensional (3-D) visual phrase discoverer module discovers the one or more 3-D visual phrases by constructing the 3-D point cloud for the object using the plurality of the images, and each 3-D visual phrase is a triangular facet on a surface of the 3-D point cloud.
- 16. The system as recited in claim 14, further comprising a 3-D visual phrase description module, operable by the one or more processors, to characterize the one or more 3-D visual phrases by describing a visual appearance for each point in an individual 3-D visual phrase and describing a geometric structure of all the points in the individual 3-D visual phrase.
- 17. The system as recited in claim 16, wherein the object detection module uses the one or more characterized 3-D visual phrases to recognize the object in the image by matching visual appearances of one or more points in the one or more characterized 3-D visual phrases with scale-invariant feature transform (SIFT) features extracted from the image.
- 18. The system as recited in claim 16, wherein the object detection module uses the one or more characterized 3-D visual phrases to recognize the object in the image by comparing a geometric structure of an individual 3-D visual phrase to a cyclic order of features extracted from the image.
- 19. The system as recited in claim 14, wherein the plurality of images are known to contain the object and the one or more additional images are not known to contain the object prior to operation of the object detection module.
- 20. The system as recited in claim 14, wherein the plurality of points are sampled based at least in part on a sampling rate selected based at least in part on a distance between the object and a camera for one or more of the plurality of images.
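
Claims 6 and 18 compare the geometric structure of a 3-D visual phrase with a cyclic order of image features, and claims 2 and 15 describe a phrase as a triangular facet. The claims do not say how that comparison is computed. The sketch below is one plausible reading, assuming a three-point phrase whose expected winding direction is checked with a signed-area test on the matched 2-D feature positions. The function names and the encoding of the order as `1`, `-1`, or `0` are hypothetical.

```python
def signed_area(p, q, r):
    """Signed area of the 2-D triangle (p, q, r).

    Positive when p -> q -> r turns counterclockwise in a y-up frame.
    In pixel coordinates (y pointing down) the visual sense is flipped,
    which does not matter as long as model and image use the same frame.
    """
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def cyclic_order(p, q, r):
    """Return 1, -1, or 0 for positive, negative, or degenerate winding."""
    area = signed_area(p, q, r)
    if area > 0:
        return 1
    if area < 0:
        return -1
    return 0


def order_is_consistent(expected_order, matched_features):
    """Check that three matched image features wind in the expected direction.

    expected_order: winding recorded for the phrase (assumed to be 1 or -1).
    matched_features: three (x, y) positions, in the phrase's point order.
    """
    p, q, r = matched_features
    return cyclic_order(p, q, r) == expected_order


# A phrase whose points are expected to wind in the positive direction.
consistent = order_is_consistent(1, [(0, 0), (1, 0), (0, 1)])
mirrored = order_is_consistent(1, [(0, 0), (0, 1), (1, 0)])
collinear = cyclic_order((0, 0), (1, 1), (2, 2))
```

A check of this kind rejects candidate matches that are mirrored relative to the phrase, and a degenerate (collinear) triple never matches a nonzero expected order.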

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Cai, Rui; Li, Zhiwei; Zhang, Lei; Hao, Qiang
Primary Examiner(s): Le, Vu
Assistant Examiner(s): RIVERA-MARTINEZ, GUILLERMO M
Application Number: US13/561,718
Publication Number: US 20140029856A1
Time in Patent Office: 960 Days
Field of Search
US Class Current: 382/195
CPC Class Codes:
G06V 10/426 Graphical representations
G06V 10/464 using a plurality of salien...
