Search with joint image-audio queries

US 8,788,434 B2
Filed: 10/28/2010
Issued: 07/22/2014
Est. Priority Date: 10/28/2010
Status: Active Grant

First Claim

Patent Images

1. A computer-implemented method performed by a data processing apparatus, the method comprising:

receiving, by a data processing apparatus, a joint image-audio query sent to the data processing apparatus from a client device separate from the data processing apparatus, the joint image-audio query including query image data defining a query image and query audio data defining query audio, wherein;

the query image data is an image file;

the query audio data is an audio recording file of speech; and

the query image data and the query audio data are paired as the joint-image audio query at the client device and then sent to the data processing apparatus;

determining, by the data processing apparatus, query image feature data from the query image data included in the received joint image-audio query, the query image feature data describing image features of the query image;

determining, by the data processing apparatus, query audio feature data from the audio data included in the received joint image-audio query, the query audio feature data including text derived from the audio recording of speech;

providing, by the data processing apparatus, the query image feature data and the query audio feature data to a joint image-audio relevance model that i) receives, as input, image feature data and audio feature data, and ii) is trained to generate relevance scores for a plurality of resources based on a combined relevance of the query image feature data to image feature data of the resource and the text derived from the audio recording of speech to text of the resource;

identifying, by the data processing apparatus, resources responsive to the joint image-audio query based, in part, on a corresponding relevance score that was determined by the joint image-audio relevance model, wherein each identified resource includes i) resource image data defining a resource image for the identified resource, and ii) text data defining resource text for the identified resource, and wherein each relevance score for each identified resource is a measure of the relevance of the corresponding resource image data and text data defining the resource text to the query image feature data and the text derived from the audio recording of speech;

ordering, by the data processing apparatus, the identified resources according to the corresponding relevance scores; and

providing, by the data processing apparatus, data defining search results indicating the order of the identified resources to the client device.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing joint image-audio queries. In one aspect, a method includes receiving, from a client device, a joint image-audio query including query image data and query audio data. Query image feature data is determined from the query image data. Query audio feature data is determined from the audio data. The query image feature data and the query audio feature data are provided to a joint image-audio relevance model trained to generate relevance scores for a plurality of resources, each resource including resource image data defining a resource image for the resource and text data defining resource text for the resource. Each relevance score is a measure of the relevance of corresponding resource to the joint image-audio query. Data defining search results indicating the order of the resources is provided to the client device.

8 Citations

View as Search Results

13 Claims

1. A computer-implemented method performed by a data processing apparatus, the method comprising:
- receiving, by a data processing apparatus, a joint image-audio query sent to the data processing apparatus from a client device separate from the data processing apparatus, the joint image-audio query including query image data defining a query image and query audio data defining query audio, wherein;
  
  the query image data is an image file;
  
  the query audio data is an audio recording file of speech; and
  
  the query image data and the query audio data are paired as the joint-image audio query at the client device and then sent to the data processing apparatus;
  
  determining, by the data processing apparatus, query image feature data from the query image data included in the received joint image-audio query, the query image feature data describing image features of the query image;
  
  determining, by the data processing apparatus, query audio feature data from the audio data included in the received joint image-audio query, the query audio feature data including text derived from the audio recording of speech;
  
  providing, by the data processing apparatus, the query image feature data and the query audio feature data to a joint image-audio relevance model that i) receives, as input, image feature data and audio feature data, and ii) is trained to generate relevance scores for a plurality of resources based on a combined relevance of the query image feature data to image feature data of the resource and the text derived from the audio recording of speech to text of the resource;
  
  identifying, by the data processing apparatus, resources responsive to the joint image-audio query based, in part, on a corresponding relevance score that was determined by the joint image-audio relevance model, wherein each identified resource includes i) resource image data defining a resource image for the identified resource, and ii) text data defining resource text for the identified resource, and wherein each relevance score for each identified resource is a measure of the relevance of the corresponding resource image data and text data defining the resource text to the query image feature data and the text derived from the audio recording of speech;
  
  ordering, by the data processing apparatus, the identified resources according to the corresponding relevance scores; and
  
  providing, by the data processing apparatus, data defining search results indicating the order of the identified resources to the client device.
- View Dependent Claims (2, 3, 4, 5, 6)
- - 2. The computer-implemented method of claim 1 wherein the query audio feature data includes data that relates to a query object in the query image data by further describing the query object, the query object being a subset of the query image that includes image data that depicts an object of interest.
  - 3. The computer-implemented method of claim 2 wherein the data that relates to the query object in the query image further describes information indicating a position of the query object.
  - 4. The computer-implemented method of claim 1 wherein the query audio feature data includes data that defines one or more restrictions on the search results.
  - 5. The computer-implemented method of claim 1 wherein one or more resource images are used to refine the search results.
  - 6. The computer-implemented method of claim 1 wherein one or more resource text data are used to refine the search results.

7. A system, comprising:
- a data processing apparatus; and
  
  a computer storage medium encoded with a computer program, the program comprising instructions that when executed by the data processing apparatus cause the data processing apparatus to perform operations comprising;
  
  receiving a joint image-audio query sent to the data processing apparatus from a client device separate from the data processing apparatus, the joint image-audio query including query image data defining a query image and query audio data defining query audio, wherein;
  
  the query image data is an image file;
  
  the query audio data is an audio recording file of speech; and
  
  the query image data and the query audio data are paired as the join-image audio query at the client device and then sent to the data processing apparatus;
  
  determining query image feature data from the query image data included in the received joint image-audio query, the query image feature data describing image features of the query image;
  
  determining query audio feature data from the audio data included in the received joint image-audio query, the query audio feature data including text derived from the audio recording of speech;
  
  providing the query image feature data and the query audio feature data to a joint image-audio relevance model that i) receives, as input, image feature data and audio feature data, and ii) is trained to generate relevance scores for a plurality of resources based on a combined relevance of the query image feature data to image feature data of the resource and the text derived from the audio recording of speech to text of the resource;
  
  identifying resources responsive to the joint image-audio query based, in part, on a corresponding relevance score that was determined by the joint image audio relevance model, wherein each identified resource includes resource image data defining a resource image for the identified resource and text data defining resource text for the identified resource, and wherein each relevance score for each identified resource is a measure of the relevance of the corresponding resource image data and text data defining the resource text to the query image feature data and the text derived from the audio recording of speech;
  
  ordering the identified resources according to the corresponding relevance scores; and
  
  providing data defining search results indicating the order of the identified resources to the client device.
- View Dependent Claims (8, 9, 10, 11, 12)
- - 8. The system of claim 7 wherein the query audio feature data includes data that relates to a query object in the query image data by further describing the query object, the query object being a subset of the query image that includes image data that depicts an object of interest.
  - 9. The system of claim 7 wherein the data that relates to the query object in the query image further describes information indicating a position of the query object.
  - 10. The system of claim 7 wherein the query audio feature data includes data that defines one or more restrictions on the search results.
  - 11. The system of claim 7 wherein one or more resource images are used to refine the search results.
  - 12. The system of claim 7 wherein one or more resource text data are used to refine the search results.

13. A computer storage device encoded with a computer program, the program comprising instructions that when executed by a client device cause the client device to perform operations comprising:
- receiving a joint image-audio query sent to the data processing apparatus from a client device separate from the data processing apparatus, the joint image-audio query including query image data defining a query image and query audio data defining query audio, wherein;
  
  the query image data is an image file;
  
  the query audio data is an audio recording file of speech; and
  
  the query image data and the query audio data are paired as the join-image audio query at the client device and then sent to the data processing apparatus;
  
  determining query image feature data from the query image data included in the received joint image-audio query, the query image feature data describing image features of the query image;
  
  determining query audio feature data from the audio data included in the received joint image-audio query, the query audio feature data including text derived from the query audio;
  
  providing the query image feature data and the query audio feature data to a joint image-audio relevance model that i) receives, as input, image feature data and audio feature data, and ii) is trained to generate relevance scores for a plurality of resources based on a combined relevance of the query image feature data to image feature data of the resource and the text derived from the audio recording of speech to text of the resource;
  
  identifying resources responsive to the joint image-audio query based, in part, on a corresponding relevance score that was determined by the joint image audio relevance model, wherein each identified resource includes resource image data defining a resource image for the identified resource and text data defining resource text for the identified resource, and wherein each relevance score for each identified resource is a measure of the relevance of the corresponding resource image data and text data defining the resource text to the query image feature data and the text derived from the audio recording of speech;
  
  ordering the identified resources according to the corresponding relevance scores; and
  
  providing data defining search results indicating the order of the identified resources to the client device.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Makadia, Ameesh, Weston, Jason E.
Primary Examiner(s)
VINCENT, DAVID ROBERT

Application Number

US12/914,653
Publication Number

US 20120109858A1
Time in Patent Office

1,363 Days
Field of Search

706/12, 706/45, 706/62
US Class Current

706/12
CPC Class Codes

G06F 16/433   using audio data

G06F 16/434   using image data, e.g. imag...

G06F 16/435   Filtering based on addition...

G06F 16/438   Presentation of query results

Search with joint image-audio queries

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

8 Citations

13 Claims

Specification

Solutions

Use Cases

Quick Links

Search with joint image-audio queries

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

8 Citations

13 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links