In-video product annotation with web information mining

US 9,355,330 B2
Filed: 04/11/2012
Issued: 05/31/2016
Est. Priority Date: 04/12/2011
Status: Active Grant

Abstract

A system provides product annotation in a video to one or more users. The system receives a video from a user, where the video includes multiple video frames. The system extracts multiple key frames from the video and generates a visual representation of each key frame. The system compares the visual representation of the key frame with a plurality of product visual signatures, where each visual signature identifies a product. Based on the comparison of the visual representation of the key frame and a product visual signature, the system determines whether the key frame contains the product identified by the visual signature of the product. To generate the plurality of product visual signatures, the system collects multiple training images comprising multiple expert product images obtained from an expert product repository, each of which is associated with multiple product images obtained from multiple web resources.
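
The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes that the key-frame representation and every product visual signature are equal-length histograms, and it uses cosine similarity with a fixed threshold. Neither choice is stated in the abstract, and the names `annotate_key_frame`, `threshold`, and the toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def annotate_key_frame(frame_hist, signatures, threshold=0.8):
    """Return (product, score) pairs whose signature is similar enough to the frame.

    `threshold` is an illustrative assumption, not a value from the patent.
    """
    matches = []
    for product, signature in signatures.items():
        score = cosine_similarity(frame_hist, signature)
        if score >= threshold:
            matches.append((product, score))
    # Best match first.
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Toy 4-bin histograms standing in for real visual signatures.
signatures = {"camera": [4, 0, 1, 0], "phone": [0, 3, 0, 2]}
frame = [5, 0, 1, 0]
result = annotate_key_frame(frame, signatures)
print(result)
```

With this toy data only the "camera" signature clears the threshold, because the "phone" signature shares no non-zero bins with the frame.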

17 Claims

1. A computer method for providing product annotation in a video to one or more users, the method comprising:
- generating a product visual signature for a product by at least:
  - collecting an unannotated expert product image of the product from an expert product repository,
  - searching for a plurality of unannotated product images from a plurality of web resources different from the expert product repository, the plurality of unannotated product images related to the unannotated expert product image,
  - selecting a subset of the plurality of unannotated product images by filtering the plurality of unannotated product images based on a similarity measure to the unannotated expert product image, and
  - generating the product visual signature from the unannotated expert product image and the subset of the plurality of unannotated product images;
  
  receiving a video for product annotation, the video comprising a plurality of video frames;
  
  extracting a plurality of key frames from the video frames; and
  
  for each key frame:
  
  generating a visual representation of the key frame;
  
  comparing the visual representation with a plurality of product visual signatures including the product visual signature; and
  
  determining, based on the comparison, that the key frame contains the product identified by the product visual signature.
- - 2. The method of claim 1, wherein extracting a plurality of key frames from the video comprises:
    - extracting each of the plurality of key frames at a fixed point of the video.
  - 3. The method of claim 1, wherein generating the visual signature of a key frame comprises:
    - extracting a plurality of visual features from the key frame;
      
      grouping the plurality of visual features into a plurality of clusters; and
      
      generating multi-dimensional bag visual words histogram as the visual signature of the key frame.
  - 4. The method of claim 3, wherein the plurality of visual features of a key frame are scale invariance feature transform (SIFT) descriptors of the key frame.
  - 5. The method of claim 1, wherein generating the subset of the plurality of unannotated product images represent a set of training images for generating the product visual signature.
  - 6. The method of claim 1, wherein generating the product visual signature further comprises:
    - applying a collective sparsification scheme to the subset of the plurality of unannotated product images, wherein information unrelated to the product contained in a related product image is reduced in generating the product visual signature.
  - 7. The method of claim 1, wherein generating the product visual signature further comprises:
    - iteratively updating the product visual signature through a pre-determined number of iterations, wherein each of the iterations computes a respective similarity measure.
  - 8. The method of claim 1, further comprising:
    - collecting a plurality of unannotated expert product images of the product at different views of the product, wherein the subset of the subset of the plurality of unannotated product images comprise unannotated product images corresponding to the plurality of unannotated expert product images.
  - 9. The method of claim 1, wherein determining that the key frame contains the product identified comprises:
    - estimating product relevance between the visual representation of the key frame with each product visual signature of the plurality of the product visual signatures; and
      
      determining that the key frame contains the product identified by the product visual signature based on the estimated product relevance.
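
The signature-generation steps of claim 1, together with the fixed number of update iterations in claim 7, can be sketched as follows. This is only an illustration under stated assumptions: images are already reduced to equal-length histograms, similarity is cosine similarity, and the signature is a plain average of the expert histogram and the retained web histograms. It does not implement the collective sparsification scheme of claim 6, and `build_signature`, `min_similarity`, and `iterations` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_signature(expert_hist, web_hists, min_similarity=0.7, iterations=3):
    """Filter web images by similarity, then average them with the expert image.

    Each iteration recomputes the similarity of every web image to the current
    signature, so the retained subset can change as the signature is refined.
    """
    signature = list(expert_hist)
    kept = []
    for _ in range(iterations):
        kept = [h for h in web_hists if cosine(signature, h) >= min_similarity]
        signature = mean_vector([expert_hist] + kept)
    return signature, kept


# Toy data: two web images resemble the expert image, one does not.
expert = [4.0, 0.0, 1.0]
web = [[5.0, 0.0, 1.0], [3.0, 1.0, 1.0], [0.0, 6.0, 0.0]]
signature, kept = build_signature(expert, web)
print(signature, len(kept))
```

In this toy run the unrelated web image is filtered out in every iteration, so the signature is the average of the expert histogram and the two similar web histograms.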

10. A non-transitory computer-readable storage medium storing executable computer program instructions for providing on-demand digital assets hosting services to one or more users, the computer program instructions when executed by a processor cause a system to perform operations comprising:
- generating a product visual signature for a product by at least:
  - collecting an unannotated expert product image of the product from an expert product repository,
  - searching for a plurality of unannotated product images from a plurality of web resources different from the expert product repository, the plurality of unannotated product images related to the unannotated expert product image,
  - selecting a subset of the plurality of unannotated product images by filtering the plurality of unannotated product images based on a similarity measure to the unannotated expert product image, and
  - generating the product visual signature from the unannotated expert product image and the subset of the plurality of unannotated product images;
  
  receiving a video from a user for product annotation, the video comprising a plurality of video frames;
  
  extracting a plurality of key frames from the video; and
  
  for each key frame:
  
  extracting a plurality of visual features from the key frame;
  
  grouping the plurality of visual features into a plurality of clusters; and
  
  generating a multi-dimensional bag visual words histogram as a visual representation of the key frame;
  
  comparing the visual representation with a plurality of product visual signatures comprising the product visual signature;
  
  determining, based on the comparison, whether the key frame contains the product identified by the product visual signature.
- - 11. The computer-readable storage medium of claim 10, wherein the operations further comprise:
    - extracting each of the plurality of key frames at a fixed point of the video.
  - 12. The computer-readable storage medium of claim 10, wherein the plurality of visual features of a key frame are scale invariance feature transform (SIFT) descriptors of the key frame.
  - 13. The computer-readable storage medium of claim 10, wherein generating the subset of the plurality of unannotated product images represent a set of training images for generating the product visual signature.
  - 14. The computer-readable storage medium of claim 10, wherein generating the product visual signature further comprises:
    - applying a collective sparsification scheme to the subset of the plurality of unannotated product images, wherein information unrelated to the product contained in a related product image is reduced in generating the product visual signature.
  - 15. The computer-readable storage medium of claim 10, wherein generating the product visual signature further comprises:
    - iteratively updating the product visual signature through a pre-determined number of iterations, wherein each of the iterations computes a respective similarity measure.
  - 16. The computer-readable storage medium of claim 10, wherein the operations further comprise:
    - collecting a plurality of unannotated expert product images of the product at different views of the product, wherein the subset of the subset of the plurality of unannotated product images comprise unannotated product images corresponding to the plurality of unannotated expert product images.
  - 17. The computer-readable storage medium of claim 10, wherein determining whether the key frame contains the product comprises:
    - estimating product relevance between the visual representation of the key frame with each product visual signature of the plurality of the product visual signatures; and
      
      determining that the key frame contains the product identified by the product visual signature based on the estimated product relevance.
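
The histogram step recited in claim 10 (extract visual features, group them into clusters, build a bag of visual words histogram) can be sketched in a few lines. The sketch assumes the descriptors have already been extracted and that a codebook of cluster centres was learned beforehand, for example with k-means. Toy 2-D vectors stand in for real SIFT descriptors, and `nearest_word` and `bovw_histogram` are hypothetical helper names.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook centre closest to the descriptor (squared Euclidean)."""
    def sq_dist(center):
        return sum((d - c) ** 2 for d, c in zip(descriptor, center))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def bovw_histogram(descriptors, codebook):
    """Count descriptors per visual word and normalise the counts to sum to 1."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy codebook with three visual words and four descriptors from one key frame.
codebook = [[0, 0], [10, 10], [0, 10]]
descriptors = [[1, 1], [9, 9], [0, 2], [1, 9]]
histogram = bovw_histogram(descriptors, codebook)
print(histogram)
```

The histogram has one bin per visual word, so its dimensionality equals the codebook size. Normalising by the descriptor count is a design choice made here so that frames with different numbers of features remain comparable.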

Current Assignee: ViSenze Pte., Ltd.
Original Assignee: National University of Singapore
Inventors: Chua, Tat Seng; Li, Guangda; Lu, Zheng; Wang, Meng
Primary Examiner(s): Ansari, Tahmina
Application Number: US14/111,149
Publication Number: US 20140029801A1
Time in Patent Office: 1,511 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
G06F 16/7847 using low-level visual feat...
G06V 20/46 Extracting features or char...
