Systems and methods for manipulating electronic content based on speech recognition

US 9,311,395 B2
Filed: 06/09/2011
Issued: 04/12/2016
Est. Priority Date: 06/10/2010
Status: Active Grant

Abstract

Systems and methods are disclosed for displaying electronic multimedia content to a user. One computer-implemented method for manipulating electronic multimedia content includes generating, using a processor, a speech model and at least one speaker model of an individual speaker. The method further includes receiving electronic media content over a network; extracting an audio track from the electronic media content; and detecting speech segments within the electronic media content based on the speech model. The method further includes detecting a speaker segment within the electronic media content and calculating a probability of the detected speaker segment involving the individual speaker based on the at least one speaker model.
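The detection and probability steps summarized in the abstract can be illustrated with a minimal sketch. It assumes that each model is a one-dimensional Gaussian over a single per-frame audio feature, that segments are runs of frames where one model out-scores the other, and that the probability is a posterior derived from a log-likelihood ratio. None of these choices is stated in the text above, and the names `gaussian_logpdf`, `detect_segments`, and `speaker_probability` are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian; stands in for a real acoustic model (assumption)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def detect_segments(frames, target_model, background_model):
    """Return (start, end) index ranges of consecutive frames where the
    target model scores higher than the background model."""
    segments, start = [], None
    for i, x in enumerate(frames):
        is_target = gaussian_logpdf(x, *target_model) > gaussian_logpdf(x, *background_model)
        if is_target and start is None:
            start = i
        elif not is_target and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(frames)))
    return segments

def speaker_probability(features, speaker_model, non_speaker_model, prior=0.5):
    """Posterior probability that the features come from the individual speaker,
    computed from the summed log-likelihood ratio of the two models."""
    llr = sum(
        gaussian_logpdf(x, *speaker_model) - gaussian_logpdf(x, *non_speaker_model)
        for x in features
    )
    logit = llr + math.log(prior / (1 - prior))
    # Numerically stable logistic function.
    if logit >= 0:
        return 1 / (1 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1 + e)

# Illustrative data only: (mean, variance) pairs are made-up model parameters.
frames = [0.0, 0.1, 5.0, 5.2, 4.9, 0.05]
speech_model, non_speech_model = (5.0, 1.0), (0.0, 1.0)
speaker_model, non_speaker_model = (5.0, 0.25), (3.0, 1.0)

speech_segments = detect_segments(frames, speech_model, non_speech_model)
start, end = speech_segments[0]
p_speaker = speaker_probability(frames[start:end], speaker_model, non_speaker_model)
```

The same `detect_segments` helper serves both stages here (speech versus non-speech, then speaker versus non-speaker speech) only because the sketch treats both as two-model comparisons; a real system would likely use richer features and models.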

33 Citations

20 Claims

1. A computer-implemented method for manipulating electronic multimedia content, the method comprising:
- generating, using a processor, a speech model, a non-speech model, at least one speaker model of an individual speaker, and a non-speaker speech model;
  
  receiving electronic media content over a network;
  
  extracting an audio track from the electronic media content;
  
  detecting speech segments within the extracted audio track based on the speech model and the non-speech model, the speech segments containing speech from at least one of a plurality of speakers;
  
  detecting a speaker segment within the detected speech segments based on the speaker model and the non-speaker speech model, the speaker segment containing speech from the individual speaker;
  
  calculating a first probability of the detected speaker segment involving the individual speaker based on the at least one speaker speech model and the non-speaker speech model;
  
  determining a ranking or filtration of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment;
  
  detecting a face within a part of the electronic media content corresponding to the detected speaker segment and calculating a second probability of the detected face being a face of the individual speaker; and
  
  adjusting the ranking or filtration of the electronic media content based on the second probability.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 11)
- - 2. The computer-implemented method of claim 1, further comprising:
    - displaying electronic media content to users based on the ranking or filtration.
  - 3. The computer-implemented method of claim 1, further comprising:
    - analyzing a query to generate a list of associated speakers; and
      
      adjusting the ranking of electronic media content based on detected speech segments from speakers in the list.
  - 4. The computer-implemented method of claim 1, further comprising:
    - analyzing a query to generate a list of associated speakers; and
      
      selecting electronic media content that have speech from speakers in the list.
  - 5. The computer-implemented method of claim 1, further comprising:
    - generating a plurality of speaker models for a subset of people, each speaker model corresponding to one person in the subset of people; and
      
      calculating a probability of the speaker segment involving one of the people in the subset of people, based on the plurality of speaker models.
  - 6. The computer-implemented method of claim 1, further comprising:
    - applying speaker segments and their probabilities to detect duplicated videos among electronic media content.
  - 7. The computer-implemented method of claim 1, further comprising:
    - applying speaker segments and their probabilities to detect words spoken by a particular individual speaker.
  - 8. The computer-implemented method of claim 7, further comprising:
    - applying detected words from the particular individual speaker to the ranking or filtration of electronic media content; and
      
      displaying electronic media content to users based on the ranking or filtration.
  - 9. The computer-implemented method of claim 1, further comprising:
    - applying speaker segments and their probabilities to detect individual speakers represented in electronic media content.
  - 11. The computer-implemented method of claim 1, further comprising:
    - applying speaker segments and their probabilities to extract preview clips from electronic media content; and
      
      displaying the extracted preview clips associated with electronic media content to users.
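As a rough illustration of the preview-clip step in claim 11, the sketch below picks the speaker segment with the highest probability and trims it to a maximum length. The selection rule, the `max_len` parameter, and the function name `best_preview_clip` are assumptions made for this example; the claim itself does not say how clips are chosen.

```python
def best_preview_clip(segments, max_len=10.0):
    """Pick a preview clip from speaker segments.

    segments: list of (start_sec, end_sec, probability) tuples.
    Returns (start_sec, end_sec) for the most probable segment, trimmed to
    at most max_len seconds, or None when there are no segments.
    """
    if not segments:
        return None
    start, end, _ = max(segments, key=lambda s: s[2])
    return (start, min(end, start + max_len))

# Illustrative values only.
clip = best_preview_clip([(0, 5, 0.4), (30, 60, 0.95), (70, 72, 0.9)])
```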

10. The computer-implemented method of claim further comprising:
- applying detected individual speakers to the ranking or filtration of electronic media content; and
  
  displaying electronic media content to users based on the ranking or filtration.
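The ranking steps recited in the method claims (rank or filter on the first, voice-based probability, then adjust using the second, face-based probability) might be sketched as follows. The weighted average, the `face_weight` and `min_score` parameters, and the `MediaItem` type are all hypothetical; the claims do not specify how the two probabilities are combined.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MediaItem:
    title: str
    voice_prob: float            # first probability (speaker segment)
    face_prob: Optional[float]   # second probability; None if no face was detected

def rank_by_voice(items):
    """Initial ranking using only the first probability."""
    return sorted(items, key=lambda m: m.voice_prob, reverse=True)

def adjusted_score(item, face_weight=0.5):
    """Blend the two probabilities; fall back to voice alone when no face is found.
    The linear blend is an assumption of this sketch."""
    if item.face_prob is None:
        return item.voice_prob
    return (1 - face_weight) * item.voice_prob + face_weight * item.face_prob

def rank_adjusted(items, face_weight=0.5, min_score=0.0):
    """Adjusted ranking, with optional filtration of items below min_score."""
    kept = [m for m in items if adjusted_score(m, face_weight) >= min_score]
    return sorted(kept, key=lambda m: adjusted_score(m, face_weight), reverse=True)

# Illustrative values only.
items = [
    MediaItem("A", voice_prob=0.9, face_prob=0.1),
    MediaItem("B", voice_prob=0.8, face_prob=0.9),
    MediaItem("C", voice_prob=0.3, face_prob=None),
]
initial = [m.title for m in rank_by_voice(items)]
adjusted = [m.title for m in rank_adjusted(items)]
filtered = [m.title for m in rank_adjusted(items, min_score=0.4)]
```

In this example item A ranks first on voice alone but drops below B once the low face probability is taken into account, which is the kind of adjustment the final step of claim 1 describes.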

12. A system for manipulating electronic multimedia content, the system comprising:
- a data storage device storing instructions for manipulating electronic multimedia content; and
  
  a processor configured to execute the instructions stored in the data storage device for:
  
  generating a speech model, a non-speech model, at least one speaker model of an individual speaker, and a non-speaker speech model;
  
  receiving electronic media content over a network;
  
  extracting an audio track from the electronic media content;
  
  detecting speech segments within the extracted audio track based on the speech model and the non-speech model, the speech segments containing speech from at least one of a plurality of speakers;
  
  detecting a speaker segment within the detected speech segments based on the speaker model and the non-speaker speech model, the speaker segment containing speech from the individual speaker;
  
  calculating a first probability of the detected speaker segment involving the individual speaker based on the at least one speaker speech model and the non-speaker speech model;
  
  determining a ranking or filtration of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment;
  
  detecting a face within a part of the electronic media content corresponding to the detected speaker segment and calculating a second probability of the detected face being a face of the individual speaker; and
  
  adjusting the ranking or filtration of the electronic media content based on the second probability.
- View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20)
- - 13. The system of claim 12, wherein the processor is further configured to execute instructions for:
    - displaying electronic media content to users based on the ranking or filtration.
  - 14. The system of claim 12, wherein the processor is further configured to execute instructions for:
    - analyzing a query to generate a list of associated speakers; and
      
      adjusting the ranking of electronic media content based on detected speech segments from speakers in the list.
  - 15. The system of claim 12, wherein the processor is further configured to execute instructions for:
    - analyzing a query to generate a list of associated speakers; and
      
      selecting electronic media content that have speech from speakers in the list.
  - 16. The system of claim 12, wherein the processor is further configured to execute instructions for:
    - generating a plurality of speaker models for a subset of people, each speaker model corresponding to one person in the subset of people; and
      
      calculating a probability of the speaker segment involving one of the people in the subset of people, based on the plurality of speaker models.
  - 17. The system of claim 12, wherein the processor is further configured for:
    - applying speaker segments and their probabilities to detect duplicated videos among electronic media content.
  - 18. The system of claim 12, wherein the processor is further configured to execute instructions for:
    - applying speaker segments and their probabilities to detect words spoken by a particular individual speaker.
  - 19. The system of claim 18, wherein the processor is further configured for:
    - applying detected words from the particular individual speaker to the ranking or filtration of electronic media content; and
      
      displaying electronic media content to users based on the ranking or filtration.
  - 20. The system of claim 12, wherein the processor is further configured to execute instructions for:
    - applying speaker segments and their probabilities to extract preview clips from electronic media content; and
      
      displaying the extracted preview clips associated with electronic media content to users.
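The query-driven claims (3, 4, 14, and 15) describe deriving a list of associated speakers from a query and then selecting or re-ranking content with speech from those speakers. A minimal sketch, assuming a simple alias lookup for the query analysis and a precomputed table of per-speaker probabilities for each media item, could look like this. The alias table, the catalog layout, and both function names are hypothetical.

```python
def speakers_for_query(query, speaker_aliases):
    """Return speakers with at least one alias appearing in the query
    (case-insensitive substring match; an assumption of this sketch)."""
    q = query.lower()
    return [
        speaker
        for speaker, aliases in speaker_aliases.items()
        if any(alias.lower() in q for alias in aliases)
    ]

def select_and_rank(catalog, speakers):
    """Select media containing speech from any listed speaker and rank it.

    catalog: {media_id: {speaker: probability}}.
    Each item is scored by its best probability among the listed speakers.
    """
    scored = []
    for media_id, probs in catalog.items():
        score = max((probs.get(s, 0.0) for s in speakers), default=0.0)
        if score > 0.0:
            scored.append((media_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Illustrative values only.
aliases = {"alice": ["alice", "senator alice"], "bob": ["bob"]}
catalog = {
    "v1": {"alice": 0.9},
    "v2": {"bob": 0.8},
    "v3": {"alice": 0.4, "bob": 0.7},
}
speakers = speakers_for_query("Alice interview", aliases)
results = select_and_rank(catalog, speakers)
```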

Current Assignee: Verizon Patent and Licensing Incorporated (Verizon Communications Inc.)
Original Assignee: AOL Inc. (Apollo Global Management, Inc.)
Inventors: Kocks, Peter F.; Hu, Guoning; Wu, Ping-Hao
Primary Examiner(s): BAKER, MATTHEW H
Application Number: US13/156,780
Publication Number: US 20120010884A1
Time in Patent Office: 1,769 Days
Field of Search: 704/246, 704/248, 704/250
US Class Current: 1/1

CPC Class Codes
G06F 16/433   using audio data
G06F 16/7834   using audio features
G06F 16/784   the detected or recognised ...
G10L 15/06   Creation of reference templ...
G10L 15/08   Speech classification or se...
G10L 17/00   Speaker identification or v...
G10L 25/57   for processing of video sig...
H04N 21/4394   involving operations for an...
H04N 21/4668   for recommending content, e...
