Learning multimedia semantics from large-scale unstructured data

US 9,875,301 B2
Filed: 04/30/2014
Issued: 01/23/2018
Est. Priority Date: 04/30/2014
Status: Active Grant

Abstract

Systems and methods for learning topic models from unstructured data and applying the learned topic models to recognize semantics for new data items are described herein. In at least one embodiment, a corpus of multimedia data items associated with a set of labels may be processed to generate a refined corpus of multimedia data items associated with the set of labels. Such processing may include arranging the multimedia data items in clusters based on similarities of extracted multimedia features and generating intra-cluster and inter-cluster features. The intra-cluster and the inter-cluster features may be used for removing multimedia data items from the corpus to generate the refined corpus. The refined corpus may be used for training topic models for identifying labels. The resulting models may be stored and subsequently used for identifying semantics of a multimedia data item input by a user.

20 Claims

1. A method comprising:
- extracting, by at least one or more computing devices, visual features from images of a corpus of images;
- arranging, by the at least one or more computing devices, the images in clusters based at least in part on similarities of the visual features;
- calculating, by the at least one or more computing devices, at least two relevance features, including:
  - first relevance features representing distribution characteristics of distances between pairs of images in a same cluster; and
  - second relevance features representing distribution characteristics of distances between different clusters of images; and
- refining, by the at least one or more computing devices, the corpus by removing one or more images from the corpus based in part on the at least two relevance features to create a refined corpus.
- Dependent claims (2, 3, 4, 5):
  - 2. A method as claim 1 recites wherein a first cluster of the different clusters is associated with a first label and a second cluster of the different clusters is associated with a second label.
  - 3. A method as claim 1 recites, further comprising:
    - processing the refined corpus by applying one or more learning algorithms to the refined corpus; and
    - creating one or more models associated with a topic for identifying an image.
  - 4. A method as claim 1 recites, wherein the visual features include at least one of edges, corners, or objects.
  - 5. A method as claim 1 recites, further comprising:
    - extracting textual features from textual data associated with the images; and
    - arranging the images in the clusters based at least in part on similarities of the visual features and the textual features.
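
The calculating and refining steps of claim 1 can be illustrated with a minimal sketch. The claims do not say which distance, which "distribution characteristics", or which removal rule is used, so everything concrete here is an assumption: Euclidean distance, mean and standard deviation as the distribution characteristics, centroid-to-centroid distance as the distance between clusters, and a hypothetical `max_ratio` threshold that drops a whole cluster. Cluster assignments are taken as given input, and all function names are illustrative only.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(v[i] for v in vectors) / len(vectors)
                 for i in range(len(vectors[0])))


def summarize(distances):
    """Assumed 'distribution characteristics': (mean, standard deviation)."""
    if not distances:
        return (0.0, 0.0)
    return (mean(distances), pstdev(distances))


def intra_cluster_features(vectors):
    """First relevance features: distances between pairs in the same cluster."""
    return summarize([math.dist(a, b) for a, b in combinations(vectors, 2)])


def inter_cluster_features(cluster_id, centroids):
    """Second relevance features: distances from one cluster to the others."""
    return summarize([math.dist(centroids[cluster_id], c)
                      for other, c in centroids.items() if other != cluster_id])


def refine_corpus(clusters, max_ratio=1.0):
    """Hypothetical rule: drop every cluster whose mean intra-cluster distance
    exceeds max_ratio times its mean distance to the other clusters."""
    centroids = {cid: centroid(vs) for cid, vs in clusters.items()}
    refined = {}
    for cid, vectors in clusters.items():
        intra_mean, _ = intra_cluster_features(vectors)
        inter_mean, _ = inter_cluster_features(cid, centroids)
        if inter_mean == 0.0 or intra_mean <= max_ratio * inter_mean:
            refined[cid] = vectors
    return refined


# Toy 2-D "visual features"; real features would be high-dimensional.
clusters = {
    "tight": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "far": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
    "noisy": [(0.0, 8.0), (9.0, 0.0), (5.0, 14.0)],
}
print(sorted(refine_corpus(clusters)))  # ['far', 'tight']
```

In this toy data the "noisy" cluster is more spread out internally than it is separated from its neighbours, so the sketch treats its images as unreliable and removes them. A per-image rule would fit the claim language equally well.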

6. A method comprising:
- receiving, by at least one or more computing devices, a corpus of images associated with a set of labels;
- extracting, by the at least one or more computing devices, visual features from the images;
- arranging, by the at least one or more computing devices, the images into a plurality of clusters based at least in part on similarities of the visual features;
- determining, by the at least one or more computing devices, at least two relevance features associated with individual clusters of the plurality of clusters, wherein:
  - first relevance features of the at least two relevance features are based on pairs of images in a first cluster of the plurality of clusters;
  - the first cluster is associated with a first label of the set of labels; and
  - second relevance features of the at least two relevance features are based on the first cluster and at least one second cluster associated with a second label of the set of labels;
- processing, by the at least one or more computing devices, the corpus of images to generate a refined corpus of images associated with the set of labels based in part on the at least two relevance features; and
- training, by the at least one or more computing devices, a set of models for identifying individual labels of the set of labels based at least in part on the extracted visual features.
- Dependent claims (7, 8, 9, 10, 11, 12, 13):
  - 7. A method as claim 6 recites wherein the processing further comprises removing images from the corpus based in part on the at least two relevance features.
  - 8. A method as claim 7 recites wherein the first relevance features represent distribution characteristics of distances between the pairs of images in the first cluster.
  - 9. A method as claim 7 recites wherein the second relevance features represent distribution characteristics of distances between the first cluster and a plurality of second clusters associated with the second label.
  - 10. A method as claim 6 recites wherein the receiving the corpus of images comprises receiving individual images from at least one of one or more search engines, sharing sites, or websites.
  - 11. A method as claim 6 recites, further comprising receiving textual queries corresponding to individual labels of the set of labels, wherein the individual labels represent a semantic meaning associated with individual images of the corpus of images.
  - 12. A method as claim 11 recites, further comprising, prior to receiving the textual queries:
    - receiving a topic query identifying a topic;
    - sending the topic query to at least one of one or more search engines, sharing sites, knowledge databases, or websites;
    - responsive to sending the topic query, receiving a set of labels associated with the topic; and
    - identifying the textual queries from the set of labels.
  - 13. A method as claim 6 recites, further comprising:
    - receiving a new textual query identifying a new label;
    - receiving a new corpus of images associated with the new label;
    - extracting new visual features from the new corpus of images;
    - training a new model for identifying the new label based at least in part on the new visual features; and
    - storing the new model for identifying the new label with a set of previously stored models.
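
The training step of claim 6 and the incremental flow of claim 13 can be sketched as follows. The claims do not name a learning algorithm, so this sketch assumes the simplest possible per-label "model", the centroid of that label's feature vectors. The function names, the dictionary used as a model store, and the example labels are hypothetical.

```python
def train_label_model(feature_vectors):
    """Assumed stand-in for a learned model: the centroid of one label's
    (non-empty) list of equal-length feature vectors."""
    count = len(feature_vectors)
    dims = len(feature_vectors[0])
    return tuple(sum(v[i] for v in feature_vectors) / count
                 for i in range(dims))


def train_models(refined_corpus):
    """Train one model per label from a mapping label -> feature vectors."""
    return {label: train_label_model(vectors)
            for label, vectors in refined_corpus.items()}


def add_label(models, new_label, new_vectors):
    """Claim 13 flow: train a model for a new label and store it alongside
    the previously stored models, without retraining the existing ones."""
    updated = dict(models)
    updated[new_label] = train_label_model(new_vectors)
    return updated


# Toy refined corpus with illustrative labels and 2-D features.
models = train_models({
    "beagle": [(0.0, 0.0), (2.0, 0.0)],
    "poodle": [(10.0, 10.0)],
})
models = add_label(models, "husky", [(4.0, 4.0), (6.0, 8.0)])
print(sorted(models))  # ['beagle', 'husky', 'poodle']
```

Keeping one independent model per label is what makes the claim 13 flow cheap in this sketch: adding a label touches only the new entry in the store.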

14. A system comprising:
- memory;
- one or more processors; and
- one or more modules stored in the memory and executable by the one or more processors, the one or more modules including:
  - a labeling module configured to learn a topic model associated with one or more labels based at least in part on:
    - extracting visual features from a corpus of images associated with the one or more labels; and
    - processing the corpus of images based in part on at least two relevance features:
      - first relevance features of the at least two relevance features representing distribution characteristics of distances between pairs of images in a same cluster; and
      - second relevance features of the at least two relevance features representing distribution characteristics of distances between different clusters of images.
- Dependent claims (15, 16, 17, 18, 19, 20):
  - 15. A system as claim 14 recites, wherein the one or more modules further include:
    - an input module configured to receive an input including an image; and
    - an output module configured to output one or more results based on applying the topic model to the image, the one or more results including at least one label of the one or more labels identifying the image.
  - 16. A system as claim 15 recites wherein the input further includes a topic associated with the image.
  - 17. A system as claim 15 recites wherein the output module is further configured to rank the one or more labels identifying the image based at least in part on a confidence score.
  - 18. A system as claim 15 recites, further comprising an annotation module configured to:
    - query one or more search engines, sharing sites, or websites, wherein individual queries include individual labels of the one or more labels identifying the image;
    - receive annotation information associated with the one or more labels identifying the image; and
    - present the annotation information associated with the one or more labels identifying the image to the output module.
  - 19. A system as claim 18 recites, wherein the output module is further configured to output the annotation information with the one or more labels identifying the image.
  - 20. A system as claim 15 recites, wherein:
    - the image is associated with two or more regions of interest;
    - the labeling module is further configured to apply the topic model to determine two or more labels associated with the two or more regions of interest; and
    - the output module is further configured to output the two or more labels identifying the two or more regions of interest.
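
The output behaviour in claims 15 and 17 (apply the topic model to an input image, then rank labels by a confidence score) can be sketched as below. The claims do not define how the confidence score is computed. This sketch assumes each label's model is a centroid in feature space and derives a confidence from a softmax over negative Euclidean distances. The `rank_labels` function, its `top_k` parameter, and the example labels are hypothetical.

```python
import math


def rank_labels(models, image_features, top_k=3):
    """Return up to top_k (label, confidence) pairs, highest confidence first.

    models maps each label to an assumed centroid of its training features.
    Confidence is a softmax over negative distances, so scores sum to 1.
    """
    distances = {label: math.dist(image_features, center)
                 for label, center in models.items()}
    nearest = min(distances.values())
    # Subtract the smallest distance so the largest weight is exactly 1.0,
    # which keeps the normalising sum away from zero.
    weights = {label: math.exp(-(d - nearest))
               for label, d in distances.items()}
    total = sum(weights.values())
    scored = [(label, w / total) for label, w in weights.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy models and a toy input image described by 2-D features.
models = {"cat": (0.0, 0.0), "dog": (3.0, 4.0)}
for label, confidence in rank_labels(models, (0.0, 0.0)):
    print(f"{label}: {confidence:.3f}")
```

For the two-or-more regions of interest in claim 20, the same function could be called once per region's feature vector, yielding one ranked label list per region.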

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Hua, Xian-Sheng; Li, Jin; Ushiku, Yoshitaka
Primary Examiner(s): Gonzales, Vincent
Application Number: US14/266,228
Publication Number: US 20150317389A1
Time in Patent Office: 1,364 Days

CPC Class Codes
- G06F 16/334   Query execution G06F16/335 ...
- G06F 16/35   Clustering; Classification
- G06F 16/951   Indexing; Web crawling tech...
- G06N 20/00   Machine learning
