Extracting audiovisual features from content elements on online documents

US 10,586,127 B1
Filed: 06/23/2016
Issued: 03/10/2020
Est. Priority Date: 11/14/2011
Status: Active Grant

Abstract

Systems and methods for extracting audiovisual features from online document elements are described herein. A computing device can identify a first audiovisual content element on an online document and can retrieve a second audiovisual content element from a content provider database. The computing device can extract an image, video, or audio feature from the first and the second audiovisual content elements by applying image feature, video frame feature, or audio fingerprint detection. The computing device can determine a match between the features extracted from the first and the second audiovisual content elements. The computing device can select the second audiovisual content element for display on the online document based on the match. The computing device can transmit the second audiovisual content element for insertion in a content slot of the online document.
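
The flow summarized in the abstract (extract a feature from each element, compare, select the best match) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the patent does not specify a feature detector, so a coarse grayscale histogram stands in for "image feature detection", histogram intersection stands in for the "match", and the names ContentElement, intensity_histogram, histogram_intersection, and select_candidate are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentElement:
    """Hypothetical stand-in for an audiovisual content element."""
    element_id: str
    pixels: tuple  # flattened grayscale values in the range 0-255


def intensity_histogram(pixels, bins=4):
    """Toy image feature (an assumption): normalized grayscale histogram."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]


def histogram_intersection(a, b):
    """Similarity in [0, 1]; 1.0 means the two histograms are identical."""
    return sum(min(x, y) for x, y in zip(a, b))


def select_candidate(first, candidates):
    """Return the candidate whose feature best matches the first element's."""
    target = intensity_histogram(first.pixels)
    return max(
        candidates,
        key=lambda c: histogram_intersection(target, intensity_histogram(c.pixels)),
    )


# First element already on the page, plus two candidates from a database.
page_image = ContentElement("page-image", (10, 20, 200, 220, 30, 240))
candidates = [
    ContentElement("dark", (5, 10, 15, 20, 25, 30)),
    ContentElement("mixed", (12, 25, 210, 230, 40, 250)),
]
chosen = select_candidate(page_image, candidates)
print(chosen.element_id)  # prints "mixed"
```

A production system would more plausibly use keypoint descriptors or learned embeddings; the histogram is used here only because it keeps the sketch self-contained.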

16 Claims

1. A system to extract audiovisual features from online document elements, comprising:
- a recognition engine that executes on a data processing system having one or more processors that:
  
  receives, from a client device, a request for content to insert into an online document, the online document including a first audiovisual content element loaded into a first content slot and a second content slot, the second content slot separate from the first content slot on the online document, the first audiovisual content element originating from a source different from the content to be inserted into the second content slot and including image data, the request for content related to a search query including the first audiovisual content element and a characteristic of the second content slot;
  
  retrieves, responsive to receipt of the request for content, a plurality of candidate audiovisual content elements from a content provider database based on the characteristic of the second content slot, the second audiovisual content element including image data;
  
  extracts an image feature from the first audiovisual content element by applying an image feature detection to the image data of the first audiovisual content element;
  
  identifies a text label corresponding to the first audiovisual content element from a metadata field of the online document;
  
  extracts an image feature from each candidate audiovisual content element by applying the image feature detection to the image data of the candidate audiovisual content element;
  
  identifies a keyword of each candidate audiovisual content element, the keyword associated with the candidate audiovisual content element based on a previous search query and a corresponding interaction event;
  
  determines an image feature match between the image feature of the first audiovisual content element and the image feature of each candidate audiovisual content element;
  
  determines a keyword match between the text label of the first audiovisual content element from the metadata field of the online document and the keyword of the second audiovisual content based on the previous search query and the corresponding interaction event;
  
  selects, from the plurality of candidate audiovisual content elements, a second audiovisual content element for display by the client device on the online document based on the image feature match and the keyword match; and
  
  the data processing system that transmits, responsive to the selection of the second audiovisual content element, via a network interface, the second audiovisual content element to the client device for insertion by the client device into the second content slot of the online document to be presented on the online document with the first audiovisual content element loaded into the first content slot.
  - 2. The system of claim 1, wherein the recognition engine:
    - extracts a plurality of image features from the first audiovisual content element by applying an image feature detection to the image data of the first audiovisual content element;
      
      extracts a plurality of image features from the second audiovisual content element by applying the image feature detection to the image data of the second audiovisual content element;
      
      identifies a number of image feature matches between the plurality of image features of the first audiovisual content element and the plurality of image features of the second audiovisual content element;
      
      determines that the number of image feature matches exceeds a threshold number; and
      
      selects the second audiovisual content element responsive to the determination that the number of image feature matches exceeds the threshold number.
  - 3. The system of claim 1, wherein the recognition engine:
    - extracts a first image feature from the first audiovisual content element by applying a first image feature detection to the image data of the first audiovisual content element;
      
      extracts a first image feature from the second audiovisual content element by applying the first image feature detection to the image data of the second audiovisual content element;
      
      extracts a second image feature from the first audiovisual content element by applying a second image feature detection to the image data of the first audiovisual content element, the second image feature detection different from the first image feature detection;
      
      extracts a second image feature from the second audiovisual content element by applying the second image feature detection to the image data of the second audiovisual content element;
      
      determines a first image feature match between the first image feature of the first audiovisual content element and the first image feature of the second audiovisual content element;
      
      determines a second image feature match between the second image feature of the first audiovisual content element and the second image feature of the second audiovisual content element; and
      
      selects the second audiovisual content element based on the first image feature match and the second image feature match.
  - 4. The system of claim 1, wherein the recognition engine:
    - retrieves the second audiovisual content element including video data, the video data defining a frame image of the second audiovisual content element;
      
      extracts a video feature from the second audiovisual content element by applying a video frame feature detection to the frame image of the video data of the second audiovisual content element;
      
      determines a video feature match between the image feature of the first audiovisual content element and the video feature of the second audiovisual content element; and
      
      selects the second audiovisual content element based on the image feature match and the video feature match.
  - 5. The system of claim 1, wherein the recognition engine:
    - identifies the first audiovisual content element including video data, the video data defining a frame image of the first audiovisual content element;
      
      retrieves the second audiovisual content element including video data, the video data defining a frame image of the second audiovisual content element;
      
      extracts a plurality of video features from the first audiovisual content element by applying a video frame feature detection to the frame image of the video data of the first audiovisual content element;
      
      extracts a plurality of video features from the second audiovisual content element by applying the video frame feature detection to the frame image of the video data of the second audiovisual content element;
      
      identifies a number of video feature matches between the plurality of video features of the first audiovisual content element and the plurality of video features of the second audiovisual content element;
      
      determines that the number of video feature matches exceeds a threshold number; and
      
      selects the second audiovisual content element responsive to the determination that the number of video feature matches exceeds the threshold number.
  - 6. The system of claim 1, wherein the recognition engine:
    - identifies the first audiovisual content element including audio data;
      
      retrieves the second audiovisual content element including audio data;
      
      extracts an audio feature from the first audiovisual content element by applying an audio fingerprint detection to the audio data of the first audiovisual content element;
      
      extracts an audio feature from the second audiovisual content element by applying the audio fingerprint detection to the audio data of the second audiovisual content element;
      
      determines an audio feature match between the audio feature of the first audiovisual content element and the audio feature of the second audiovisual content element; and
      
      selects the second audiovisual content element based on the image feature match and the audio feature match.
  - 7. The system of claim 1, wherein the recognition engine:
    - identifies a text label included in the first audiovisual content element;
      
      identifies a keyword of the second audiovisual content, the keyword associated with the second audiovisual content based on a previous search query and a corresponding interaction event;
      
      determines a keyword match between the text label of the first audiovisual content element and the keyword of the second audiovisual content; and
      
      selects the second audiovisual content based on the keyword match.
  - 8. The system of claim 1, wherein the recognition engine generates the second audiovisual content based on one or more specified parameters.
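
The threshold test recited in claims 2 and 5 (count the feature matches, then select only if the count exceeds a threshold number) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: features are modeled as small binary descriptors stored as integers, two descriptors are treated as matching when their Hamming distance is at most max_distance, and the function names and all numeric values are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def count_feature_matches(first_features, second_features, max_distance=2):
    """Count first-element features that have a close feature in the second.

    The closeness rule (Hamming distance <= max_distance) is an assumption.
    """
    return sum(
        1
        for f in first_features
        if any(hamming(f, s) <= max_distance for s in second_features)
    )


def exceeds_threshold(first_features, second_features, threshold, max_distance=2):
    """True when the number of feature matches exceeds the threshold number."""
    matches = count_feature_matches(first_features, second_features, max_distance)
    return matches > threshold


# Hypothetical 8-bit descriptors for the first element and two candidates.
first = [0b10110010, 0b01101100, 0b11110000]
similar = [0b10110011, 0b01101000, 0b00001111]
unrelated = [0b00000001, 0b11111111]

print(count_feature_matches(first, similar))    # prints 2
print(count_feature_matches(first, unrelated))  # prints 0
print(exceeds_threshold(first, similar, threshold=1))    # prints True
print(exceeds_threshold(first, unrelated, threshold=1))  # prints False
```

The same counting logic would apply to video frame features; only the source of the descriptors changes.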

9. A method of extracting audiovisual features from online document elements, comprising:
- receiving, by a recognition engine executing on a data processing system having one or more processors, from a client device, a request for content to insert into an online document, the online document including a first audiovisual content element loaded into a first content slot and a second content slot, the second content slot separate from the first content slot on the online document, the first audiovisual content element originating from a source different from the content to be inserted into the second content slot and including image data, the request for content related to a search query including the first audiovisual content element and a characteristic of the second content slot;
  
  retrieving, by the recognition engine, responsive to receipt of the request for content, a plurality of candidate audiovisual content elements from a content provider database based on the characteristic of the second content slot, the second audiovisual content element including image data;
  
  extracting, by the recognition engine, an image feature from the first audiovisual content element by applying an image feature detection to the image data of the first audiovisual content element;
  
  identifying, by the recognition engine, a text label corresponding to the first audiovisual content element from a metadata field of the online document;
  
  extracting, by the recognition engine, an image feature from each candidate audiovisual content element by applying the image feature detection to the image data of the candidate audiovisual content element;
  
  identifying, by the recognition engine, a keyword of each candidate audiovisual content element, the keyword associated with the candidate audiovisual content element based on a previous search query and a corresponding interaction event;
  
  determining, by the recognition engine, an image feature match between the image feature of the first audiovisual content element and the image feature of each candidate audiovisual content element;
  
  determining, by the recognition engine, a keyword match between the text label of the first audiovisual content element from the metadata field of the online document and the keyword of the second audiovisual content based on the previous search query and the corresponding interaction event;
  
  selecting, by the recognition engine, from the plurality of candidate audiovisual content elements, a second audiovisual content element for display by the client device on the online document based on the image feature match and the keyword match; and
  
  transmitting, by the data processing system, responsive to the selection of the second audiovisual content element, via a network interface, the second audiovisual content element to the client device for insertion by the client device into the second content slot of the online document to be presented on the online document with the first audiovisual content element loaded into the first content slot.
  - 10. The method of claim 9, comprising:
    - extracting, by the recognition engine, a plurality of image features from the first audiovisual content element by applying an image feature detection to the image data of the first audiovisual content element;
      
      extracting, by the recognition engine, a plurality of image features from the second audiovisual content element by applying the image feature detection to the image data of the second audiovisual content element;
      
      identifying, by the recognition engine, a number of image feature matches between the plurality of image features of the first audiovisual content element and the plurality of image features of the second audiovisual content element;
      
      determining, by the recognition engine, that the number of image feature matches exceeds a threshold number; and
      
      selecting, by the recognition engine, the second audiovisual content element responsive to determining that the number of image feature matches exceeds the threshold number.
  - 11. The method of claim 9, comprising:
    - extracting, by the recognition engine, a first image feature from the first audiovisual content element by applying a first image feature detection to the image data of the first audiovisual content element;
      
      extracting, by the recognition engine, a first image feature from the second audiovisual content element by applying the first image feature detection to the image data of the second audiovisual content element;
      
      extracting, by the recognition engine, a second image feature from the first audiovisual content element by applying a second image feature detection to the image data of the first audiovisual content element, the second image feature detection different from the first image feature detection;
      
      extracting, by the recognition engine, a second image feature from the second audiovisual content element by applying the second image feature detection to the image data of the second audiovisual content element;
      
      determining, by the recognition engine, a first image feature match between the first image feature of the first audiovisual content element and the first image feature of the second audiovisual content element;
      
      determining, by the recognition engine, a second image feature match between the second image feature of the first audiovisual content element and the second image feature of the second audiovisual content element; and
      
      selecting, by the recognition engine, the second audiovisual content element based on the first image feature match and the second image feature match.
  - 12. The method of claim 9, comprising:
    - retrieving, by the recognition engine, the second audiovisual content element including video data, the video data defining a frame image of the second audiovisual content element;
      
      extracting, by the recognition engine, a video feature from the second audiovisual content element by applying a video frame feature detection to the frame image of the video data of the second audiovisual content element;
      
      determining, by the recognition engine, a video feature match between the image feature of the first audiovisual content element and the video feature of the second audiovisual content element; and
      
      selecting, by the recognition engine, the second audiovisual content element based on the image feature match and the video feature match.
  - 13. The method of claim 9, comprising:
    - identifying, by the recognition engine, the first audiovisual content element including video data, the video data defining a frame image of the first audiovisual content element;
      
      retrieving, by the recognition engine, the second audiovisual content element including video data, the video data defining a frame image of the second audiovisual content element;
      
      extracting, by the recognition engine, a plurality of video features from the first audiovisual content element by applying a video frame feature detection to the frame image of the video data of the first audiovisual content element;
      
      extracting, by the recognition engine, a plurality of video features from the second audiovisual content element by applying the video frame feature detection to the frame image of the video data of the second audiovisual content element;
      
      identifying, by the recognition engine, a number of video feature matches between the plurality of video features of the first audiovisual content element and the plurality of video features of the second audiovisual content element;
      
      determining, by the recognition engine, that the number of video feature matches exceeds a threshold number; and
      
      selecting, by the recognition engine, the second audiovisual content element responsive to determining that the number of video feature matches exceeds the threshold number.
  - 14. The method of claim 9, comprising:
    - identifying, by the recognition engine, the first audiovisual content element including audio data;
      
      retrieving, by the recognition engine, the second audiovisual content element including audio data;
      
      extracting, by the recognition engine, an audio feature from the first audiovisual content element by applying an audio fingerprint detection to the audio data of the first audiovisual content element;
      
      extracting, by the recognition engine, an audio feature from the second audiovisual content element by applying the audio fingerprint detection to the audio data of the second audiovisual content element;
      
      determining, by the recognition engine, an audio feature match between the audio feature of the first audiovisual content element and the audio feature of the second audiovisual content element; and
      
      selecting, by the recognition engine, the second audiovisual content element based on the image feature match and the audio feature match.
  - 15. The method of claim 9, comprising:
    - identifying, by the recognition engine, a text label included in the first audiovisual content element;
      
      identifying, by the recognition engine, a keyword of the second audiovisual content, the keyword associated with the second audiovisual content based on a previous search query and a corresponding interaction event;
      
      determining, by the recognition engine, a keyword match between the text label of the first audiovisual content element and the keyword of the second audiovisual content; and
      
      selecting, by the recognition engine, the second audiovisual content based on the keyword match.
  - 16. The method of claim 9, comprising:
    - generating, by the recognition engine, the second audiovisual content based on one or more specified parameters.
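
The keyword steps of claims 9 and 15 associate a keyword with a candidate based on a previous search query and a corresponding interaction event, then compare it with the text label of the first element. One way this could look is sketched below. The log format (query, candidate id, interacted flag), whitespace tokenization, and the names keywords_from_history and keyword_match are all assumptions made for illustration; the claims do not prescribe any of them.

```python
from collections import defaultdict


def keywords_from_history(events):
    """Map each candidate id to words from queries that led to an interaction.

    Each event is a hypothetical (query, candidate_id, interacted) tuple.
    Queries without an interaction event contribute no keywords.
    """
    keywords = defaultdict(set)
    for query, candidate_id, interacted in events:
        if interacted:
            keywords[candidate_id].update(query.lower().split())
    return keywords


def keyword_match(text_label, candidate_keywords):
    """True when the text label shares at least one word with the keywords."""
    return bool(set(text_label.lower().split()) & candidate_keywords)


# Hypothetical log of previous search queries and interaction events.
history = [
    ("red running shoes", "ad-1", True),
    ("garden hose", "ad-2", True),
    ("running watch", "ad-2", False),  # shown but no interaction
]
keywords = keywords_from_history(history)

# Text label as it might appear in a metadata field of the online document.
label = "Trail running guide"
matches = sorted(cid for cid, kws in keywords.items() if keyword_match(label, kws))
print(matches)  # prints ['ad-1']
```

In the independent claims the keyword match is combined with the image feature match before a candidate is selected; this sketch covers the keyword side only.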

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Yeo, Boon-Lock; Gu, Xuemei; Li, Gangjiang
Primary Examiner(s): Entezari, Michelle M
Application Number: US15/190,897
Time in Patent Office: 1,356 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 16/41   Indexing; Data structures t...

G06V 30/40   Document-oriented image-bas...

G10L 19/018   Audio watermarking, i.e. em...

G10L 25/54   for retrieval
