
Joint acoustic and visual processing

  • US 10,515,292 B2
  • Filed: 06/15/2017
  • Issued: 12/24/2019
  • Est. Priority Date: 06/15/2016
  • Status: Active Grant
First Claim

1. A method for cross-modal media processing comprising:

  • configuring a cross-modal similarity processor, including processing a first reference set of media that includes a set of corresponding pairs of media items, each pair of the media items includes one audio item and one image item, the items of each pair having related content elements;

    wherein the configuring of the similarity processor includes setting parameter values for an image processor and for an audio processor, the image processor and the audio processor each being configured to produce a fixed-length numerical representation of an input image and input audio signal, respectively, wherein the image processor is configured to produce a first numerical vector, and the audio processor is configured to produce a second numerical vector,

    wherein the image processor and the audio processor each comprises an artificial neural network, and setting parameter values for the image processor and for the audio processor includes applying a neural network weight determination approach to determine the parameter values, and

    wherein the similarity processor is configured to output a quantity representing a similarity between the input image and the input audio signal based on the numerical representations, the quantity representing the similarity comprising a similarity between the first numerical vector and the second numerical vector.
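
The structure recited in the claim can be illustrated with a minimal sketch, which is not the patented implementation. It assumes toy one-layer networks, mean pooling to obtain a fixed-length vector, and a dot product as the similarity measure; none of these choices is specified by the claim. The names `EMBED_DIM`, `make_weights`, `embed`, and `similarity` are hypothetical, and the weights are random rather than determined by training on paired audio and image items.

```python
import math
import random

EMBED_DIM = 4  # hypothetical embedding length; the claim only requires it be fixed


def make_weights(in_dim, out_dim, rng):
    """Random weight matrix (out_dim x in_dim) for a one-layer toy network."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def embed(frames, weights):
    """Map a variable-length list of feature frames to one fixed-length vector.

    Each frame passes through a linear layer with tanh, then the per-frame
    outputs are mean-pooled so the result length does not depend on len(frames).
    """
    pooled = [0.0] * len(weights)
    for frame in frames:
        for i, row in enumerate(weights):
            pooled[i] += math.tanh(sum(w * x for w, x in zip(row, frame)))
    return [v / len(frames) for v in pooled]


def similarity(image_vec, audio_vec):
    """Dot product of the two vectors (one possible similarity measure)."""
    return sum(a * b for a, b in zip(image_vec, audio_vec))


rng = random.Random(0)
image_weights = make_weights(3, EMBED_DIM, rng)  # assumed: 3 features per image region
audio_weights = make_weights(5, EMBED_DIM, rng)  # assumed: 5 features per audio frame

image = [[rng.random() for _ in range(3)] for _ in range(6)]   # 6 image regions
audio = [[rng.random() for _ in range(5)] for _ in range(20)]  # 20 audio frames

first_vector = embed(image, image_weights)   # output of the image processor
second_vector = embed(audio, audio_weights)  # output of the audio processor
score = similarity(first_vector, second_vector)
```

In a trained system, the two weight sets would be adjusted jointly so that `score` tends to be higher for corresponding audio and image pairs than for mismatched ones; that step is omitted here.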

  • 1 Assignment