
Associating media with metadata of near-duplicates

  • US 9,703,782 B2
  • Filed: 05/28/2010
  • Issued: 07/11/2017
  • Est. Priority Date: 05/28/2010
  • Status: Active Grant
First Claim

1. A method comprising:

    retrieving a plurality of media objects responsive to a query media object presented to a search engine;

    extracting first visual words from the query media object, at least one of the first visual words being a vector quantization of a visual feature extracted from a media object;

    generating an inverted index mapping a plurality of visual words corresponding to individual media objects of the plurality of media objects;

    identifying near-duplicate media objects from the plurality of media objects based at least on analyzing the first visual words with respect to the inverted index and retrieving the individual media objects having at least one of the plurality of visual words with similarities to the first visual words greater than a predetermined threshold;

    extracting metadata from the near-duplicate media objects to form extracted metadata;

    storing the extracted metadata in a datastore as a set of metadata;

    increasing the set of metadata in the datastore based, at least in part, on a synonym dictionary;

    mining the set of metadata in the datastore to produce consolidated extracted metadata, wherein the mining the set of metadata includes utilizing a globalization data store, which maps terms from a first language to analogous terms in a second language;

    evaluating the consolidated extracted metadata to determine one or more metadata items that are common among the near-duplicate media objects; and

    associating the one or more metadata items that are common among the near-duplicate media objects with the query media object as one or more descriptors of the query media object to enable discovery of the query media object based on the one or more descriptors.
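
The steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not prescribe a codebook, a similarity measure, or a data layout, so everything below is an assumption made for illustration: nearest-centroid quantization against a hypothetical codebook, Jaccard similarity over sets of visual words, plain dictionaries standing in for the datastore, the synonym dictionary, and the globalization data store, and "common" read as "present on at least a given share of the near-duplicates". All function and parameter names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def visual_words(features, codebook):
    """Vector-quantize each feature to the index of its nearest codebook centroid."""
    return {
        min(range(len(codebook)), key=lambda i: math.dist(f, codebook[i]))
        for f in features
    }


def build_inverted_index(words_by_object):
    """Map each visual word to the set of media objects that contain it."""
    index = defaultdict(set)
    for obj_id, words in words_by_object.items():
        for word in words:
            index[word].add(obj_id)
    return index


def near_duplicates(query_words, index, words_by_object, threshold):
    """Return objects whose visual words are more similar to the query than threshold.

    Assumption: similarity is the Jaccard coefficient of the two word sets.
    Only objects sharing at least one word with the query are scored.
    """
    candidates = set()
    for word in query_words:
        candidates |= index.get(word, set())
    matches = []
    for obj_id in sorted(candidates):
        words = words_by_object[obj_id]
        similarity = len(query_words & words) / len(query_words | words)
        if similarity > threshold:
            matches.append(obj_id)
    return matches


def common_descriptors(metadata_by_object, duplicates, synonyms, translations,
                       min_share=1.0):
    """Consolidate metadata of the near-duplicates and keep the shared terms.

    translations: hypothetical stand-in for the globalization data store
                  (first-language term -> second-language term).
    synonyms:     hypothetical stand-in for the synonym dictionary
                  (term -> list of extra terms added to the metadata set).
    min_share:    fraction of near-duplicates that must carry a term.
    """
    counts = Counter()
    for obj_id in duplicates:
        terms = set()
        for raw in metadata_by_object[obj_id]:
            term = raw.lower()
            term = translations.get(term, term)   # map to the target language
            terms.add(term)
            terms.update(synonyms.get(term, []))  # grow the set with synonyms
        counts.update(terms)
    needed = min_share * len(duplicates)
    return sorted(term for term, count in counts.items() if count >= needed)


# Toy data: 2-D "features" and a four-centroid codebook (both invented).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
retrieved = {
    "a": [(0.0, 0.2), (0.9, 1.0), (4.8, 5.1)],
    "b": [(0.2, 0.1), (1.0, 1.2), (8.8, 9.1)],
    "d": [(5.1, 5.0), (9.0, 9.0)],
}
metadata = {
    "a": ["Eiffel Tower", "Paris", "landmark"],
    "b": ["tour eiffel", "paris", "night"],
    "d": ["bridge"],
}

words_by_object = {k: visual_words(v, codebook) for k, v in retrieved.items()}
index = build_inverted_index(words_by_object)
query_words = visual_words([(0.1, 0.0), (1.1, 0.9), (5.2, 4.9)], codebook)

duplicates = near_duplicates(query_words, index, words_by_object, threshold=0.4)
descriptors = common_descriptors(
    metadata, duplicates,
    synonyms={"landmark": ["monument"]},
    translations={"tour eiffel": "eiffel tower"},
)
print(duplicates, descriptors)
```

In this toy run, objects "a" and "b" exceed the 0.4 threshold while "d" does not, and the terms shared by both near-duplicates after translation are the ones that would be associated with the query media object as descriptors.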
