Systems and methods for manipulating electronic content based on speech recognition
First Claim
1. A computer-implemented method comprising the following operations performed by at least one processor:
extracting an audio track from an electronic media content;
detecting, based on a speech model, a speaker segment within the extracted audio track;
determining, by the processor, a first probability of the detected speaker segment being associated with an individual speaker by using both a speaker speech model and a non-speaker speech model, wherein the speaker speech model represents an individual speaker and the non-speaker speech model represents common characteristics from one or more speakers;
determining a first ranking value of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment and probabilities for detected speaker segments within the other electronic media content;
receiving a search query from a user;
determining a second ranking value of the electronic media content based on relevancy between the query and the individual speaker; and
determining a final ranking value of the electronic media content based on the first ranking value and the second ranking value.
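One way to read the probability step above, in which a segment is scored against both a speaker speech model and a non-speaker speech model, is as a likelihood-ratio test. The sketch below is illustrative only and rests on assumptions the claim does not state: each model is a single diagonal Gaussian over feature frames (the hypothetical `DiagGaussian` class), the two hypotheses have equal priors, and the log-likelihood ratio is averaged per frame before being mapped to a probability.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiagGaussian:
    """Hypothetical stand-in for a speech model: one diagonal Gaussian."""
    mean: Sequence[float]
    var: Sequence[float]

    def log_likelihood(self, frame: Sequence[float]) -> float:
        # Sum of per-dimension Gaussian log densities.
        return sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
            for x, m, v in zip(frame, self.mean, self.var)
        )


def speaker_probability(
    frames: Sequence[Sequence[float]],
    speaker_model: DiagGaussian,
    non_speaker_model: DiagGaussian,
) -> float:
    """Map the average per-frame log-likelihood ratio to a value in (0, 1).

    Assumption: equal priors, so the posterior is a logistic function of
    the log-likelihood ratio. Averaging over frames is a normalization
    chosen for this sketch, not something the claim specifies.
    """
    if not frames:
        raise ValueError("a speaker segment needs at least one frame")
    llr = sum(
        speaker_model.log_likelihood(f) - non_speaker_model.log_likelihood(f)
        for f in frames
    ) / len(frames)
    # Numerically stable logistic function.
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)
```

Frames that fit the speaker model better than the non-speaker model yield a value above 0.5, and frames that fit both equally yield exactly 0.5.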
Abstract
Systems and methods are disclosed for displaying electronic multimedia content to a user. One computer-implemented method for manipulating electronic multimedia content includes generating, using a processor, a speech model and at least one speaker model of an individual speaker. The method further includes receiving electronic media content over a network; extracting an audio track from the electronic media content; and detecting speech segments within the electronic media content based on the speech model. The method further includes detecting a speaker segment within the electronic media content and calculating a probability of the detected speaker segment involving the individual speaker based on the at least one speaker model.
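The abstract does not say how speech segments are delimited once the speech model has been applied. As a hedged illustration, the sketch below assumes the speech model yields one speech score per audio frame and groups consecutive frames above a threshold into segments. The function name `detect_speech_segments`, the threshold, and the minimum segment length are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def detect_speech_segments(
    frame_scores: Sequence[float],
    threshold: float = 0.5,
    min_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, end exclusive.

    frame_scores is assumed to be the per-frame output of a speech model,
    with higher values meaning "more likely speech". Runs shorter than
    min_frames are discarded as spurious.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current speech run began, if any
    for i, score in enumerate(frame_scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the track.
    if start is not None and len(frame_scores) - start >= min_frames:
        segments.append((start, len(frame_scores)))
    return segments
```

Frame indices can be converted to timestamps by multiplying by the frame step of whatever feature extraction is used.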
20 Claims
1. A computer-implemented method comprising the following operations performed by at least one processor:
extracting an audio track from an electronic media content;
detecting, based on a speech model, a speaker segment within the extracted audio track;
determining, by the processor, a first probability of the detected speaker segment being associated with an individual speaker by using both a speaker speech model and a non-speaker speech model, wherein the speaker speech model represents an individual speaker and the non-speaker speech model represents common characteristics from one or more speakers;
determining a first ranking value of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment and probabilities for detected speaker segments within the other electronic media content;
receiving a search query from a user;
determining a second ranking value of the electronic media content based on relevancy between the query and the individual speaker; and
determining a final ranking value of the electronic media content based on the first ranking value and the second ranking value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system, comprising:
at least one processor; and
a memory storing executable instructions that, when executed by the at least one processor, cause the at least one processor to perform the following operations:
extracting an audio track from an electronic media content;
detecting, based on a speech model, a speaker segment within the extracted audio track;
determining, by the processor, a first probability of the detected speaker segment being associated with an individual speaker by using both a speaker speech model and a non-speaker speech model, wherein the speaker speech model represents an individual speaker and the non-speaker speech model represents common characteristics from one or more speakers;
determining a first ranking value of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment and probabilities for detected speaker segments within the other electronic media content;
receiving a search query from a user;
determining a second ranking value of the electronic media content based on relevancy between the query and the individual speaker; and
determining a final ranking value of the electronic media content based on the first ranking value and the second ranking value.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. A tangible, non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising:
extracting an audio track from an electronic media content;
detecting, based on a speech model, a speaker segment within the extracted audio track;
determining, by the processor, a first probability of the detected speaker segment being associated with an individual speaker by using both a speaker speech model and a non-speaker speech model, wherein the speaker speech model represents an individual speaker and the non-speaker speech model represents common characteristics from one or more speakers;
determining a first ranking value of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment and probabilities for detected speaker segments within the other electronic media content;
receiving a search query from a user;
determining a second ranking value of the electronic media content based on relevancy between the query and the individual speaker; and
determining a final ranking value of the electronic media content based on the first ranking value and the second ranking value.
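The three ranking steps common to the independent claims can be sketched as follows. The claims do not fix any formula, so every choice here is an assumption made for illustration: the first ranking value divides an item's speaker probability by the largest probability in the collection, the second ranking value is the fraction of query words found in the speaker's name, and the final ranking value is a weighted sum of the two. `MediaItem`, `query_relevance`, `rank_media`, and the `weight` parameter are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MediaItem:
    media_id: str
    speaker: str        # individual speaker detected in the item
    probability: float  # first probability for the detected segment


def query_relevance(query: str, speaker: str) -> float:
    """Assumed relevancy measure: share of query words in the speaker name."""
    q = set(query.lower().split())
    s = set(speaker.lower().split())
    return len(q & s) / len(q) if q else 0.0


def rank_media(
    query: str, items: Sequence[MediaItem], weight: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (media_id, final ranking value) pairs, best first."""
    top = max((item.probability for item in items), default=0.0)
    ranked = []
    for item in items:
        # First ranking value: probability relative to the other items.
        first = item.probability / top if top > 0 else 0.0
        # Second ranking value: relevancy between query and speaker.
        second = query_relevance(query, item.speaker)
        # Final ranking value: assumed weighted combination of the two.
        ranked.append((item.media_id, weight * first + (1.0 - weight) * second))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

With `weight` set to 1.0 the order depends only on the speaker probabilities, and with 0.0 only on the query match, so the parameter shows how the two ranking values trade off in this sketch.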
Specification