
Search with joint image-audio queries

  • US 8,788,434 B2
  • Filed: 10/28/2010
  • Issued: 07/22/2014
  • Est. Priority Date: 10/28/2010
  • Status: Active Grant
First Claim

1. A computer-implemented method performed by a data processing apparatus, the method comprising:

  • receiving, by a data processing apparatus, a joint image-audio query sent to the data processing apparatus from a client device separate from the data processing apparatus, the joint image-audio query including query image data defining a query image and query audio data defining query audio, wherein:

    the query image data is an image file;

    the query audio data is an audio recording file of speech; and

    the query image data and the query audio data are paired as the joint image-audio query at the client device and then sent to the data processing apparatus;

  • determining, by the data processing apparatus, query image feature data from the query image data included in the received joint image-audio query, the query image feature data describing image features of the query image;

  • determining, by the data processing apparatus, query audio feature data from the audio data included in the received joint image-audio query, the query audio feature data including text derived from the audio recording of speech;

  • providing, by the data processing apparatus, the query image feature data and the query audio feature data to a joint image-audio relevance model that i) receives, as input, image feature data and audio feature data, and ii) is trained to generate relevance scores for a plurality of resources based on a combined relevance of the query image feature data to image feature data of the resource and the text derived from the audio recording of speech to text of the resource;

  • identifying, by the data processing apparatus, resources responsive to the joint image-audio query based, in part, on a corresponding relevance score that was determined by the joint image-audio relevance model, wherein each identified resource includes i) resource image data defining a resource image for the identified resource, and ii) text data defining resource text for the identified resource, and wherein each relevance score for each identified resource is a measure of the relevance of the corresponding resource image data and text data defining the resource text to the query image feature data and the text derived from the audio recording of speech;

  • ordering, by the data processing apparatus, the identified resources according to the corresponding relevance scores; and

  • providing, by the data processing apparatus, data defining search results indicating the order of the identified resources to the client device.
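
The claim describes a pipeline: the client pairs an image file with a speech recording, the server extracts image features from the query image, derives text from the speech, scores indexed resources with a joint image-audio relevance model, orders them by score, and returns search results. The sketch below illustrates that flow in Python; it is a minimal illustration, not the patented implementation. All names (JointImageAudioQuery, extract_image_features, transcribe_speech, joint_relevance_score, search) and the simple weighted-sum scoring are hypothetical stand-ins for the claimed feature extraction, speech-to-text, and trained relevance model.

```python
from dataclasses import dataclass

# Hypothetical data structures; the patent does not prescribe a concrete schema.

@dataclass
class JointImageAudioQuery:
    """Image file bytes paired with a speech recording, as packaged on the client."""
    image_data: bytes
    audio_data: bytes

@dataclass
class Resource:
    """An indexed resource with image feature data and accompanying text."""
    uri: str
    image_features: list[float]
    text: str

def extract_image_features(image_data: bytes) -> list[float]:
    """Placeholder image-feature extractor (a real system might use a learned
    embedding); here, a trivial byte histogram so the sketch runs end to end."""
    hist = [0.0] * 8
    for b in image_data:
        hist[b % 8] += 1.0
    total = sum(hist) or 1.0
    return [v / total for v in hist]

def transcribe_speech(audio_data: bytes) -> str:
    """Placeholder speech-to-text; a real system would call an ASR component."""
    return audio_data.decode("utf-8", errors="ignore")  # stand-in only

def joint_relevance_score(query_image_features: list[float],
                          query_text: str,
                          resource: Resource) -> float:
    """Stand-in for the trained joint image-audio relevance model: combines an
    image-similarity term with a text-overlap term into one score."""
    image_sim = sum(a * b for a, b in zip(query_image_features,
                                          resource.image_features))
    query_terms = set(query_text.lower().split())
    resource_terms = set(resource.text.lower().split())
    text_sim = len(query_terms & resource_terms) / (len(query_terms) or 1)
    return 0.5 * image_sim + 0.5 * text_sim

def search(query: JointImageAudioQuery,
           index: list[Resource],
           top_k: int = 10) -> list[tuple[str, float]]:
    """End-to-end flow mirroring the claim: feature extraction, transcription,
    joint scoring, ordering, and returning ranked search results."""
    image_features = extract_image_features(query.image_data)
    query_text = transcribe_speech(query.audio_data)
    scored = [(joint_relevance_score(image_features, query_text, r), r)
              for r in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(r.uri, score) for score, r in scored[:top_k]]

if __name__ == "__main__":
    index = [
        Resource("res/red-shoe", [0.2] * 8, "red running shoe product page"),
        Resource("res/blue-boot", [0.1] * 8, "blue hiking boot product page"),
    ]
    query = JointImageAudioQuery(image_data=b"\x01\x02\x03\x04",
                                 audio_data=b"red running shoe")
    print(search(query, index))
```

In the claim, the fixed weighted sum above would be replaced by a model trained to produce the combined relevance score from image feature data and the text derived from the speech recording.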

  • 2 Assignments