Systems and methods for manipulating electronic content based on speech recognition
First Claim
1. A computer-implemented method for manipulating electronic multimedia content, the method comprising:
generating, using a processor, a speech model, a non-speech model, at least one speaker model of an individual speaker, and a non-speaker speech model;
receiving electronic media content over a network;
extracting an audio track from the electronic media content;
detecting speech segments within the extracted audio track based on the speech model and the non-speech model, the speech segments containing speech from at least one of a plurality of speakers;
detecting a speaker segment within the detected speech segments based on the speaker model and the non-speaker speech model, the speaker segment containing speech from the individual speaker;
calculating a first probability of the detected speaker segment involving the individual speaker based on the at least one speaker speech model and the non-speaker speech model;
determining a ranking or filtration of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment;
detecting a face within a part of the electronic media content corresponding to the detected speaker segment and calculating a second probability of the detected face being a face of the individual speaker; and
adjusting the ranking or filtration of the electronic media content based on the second probability.
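The final steps of the claim (ranking or filtering on the first probability, then adjusting on the second) can be illustrated with a minimal Python sketch. The claim does not state how the two probabilities are combined, so the weighted average below, the `face_weight` and `threshold` parameters, and the `MediaItem` type are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaItem:
    """Hypothetical record for one piece of electronic media content."""
    media_id: str
    speaker_prob: float                 # first probability (from the audio track)
    face_prob: Optional[float] = None   # second probability; None if no face was detected


def initial_ranking(items: List[MediaItem], threshold: float = 0.5) -> List[str]:
    """Filter and rank using only the first (speaker) probability."""
    kept = [i for i in items if i.speaker_prob >= threshold]
    kept.sort(key=lambda i: i.speaker_prob, reverse=True)
    return [i.media_id for i in kept]


def adjusted_score(item: MediaItem, face_weight: float = 0.5) -> float:
    """Assumed combination rule: weighted average of the two probabilities."""
    if item.face_prob is None:
        return item.speaker_prob
    return (1.0 - face_weight) * item.speaker_prob + face_weight * item.face_prob


def adjusted_ranking(items: List[MediaItem], threshold: float = 0.5,
                     face_weight: float = 0.5) -> List[str]:
    """Re-rank and re-filter after taking the second (face) probability into account."""
    scored = [(adjusted_score(i, face_weight), i.media_id) for i in items]
    kept = [p for p in scored if p[0] >= threshold]
    kept.sort(key=lambda p: p[0], reverse=True)
    return [media_id for _, media_id in kept]


items = [
    MediaItem("a", speaker_prob=0.9, face_prob=0.2),
    MediaItem("b", speaker_prob=0.7, face_prob=0.9),
    MediaItem("c", speaker_prob=0.6),
    MediaItem("d", speaker_prob=0.3, face_prob=0.4),
]
print(initial_ranking(items))
print(adjusted_ranking(items))
```

In this example item "a" ranks first on audio evidence alone but falls behind "b" and "c" once its low face probability is taken into account, while "d" is filtered out in both cases.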
Abstract
Systems and methods are disclosed for displaying electronic multimedia content to a user. One computer-implemented method for manipulating electronic multimedia content includes generating, using a processor, a speech model and at least one speaker model of an individual speaker. The method further includes receiving electronic media content over a network; extracting an audio track from the electronic media content; and detecting speech segments within the electronic media content based on the speech model. The method further includes detecting a speaker segment within the electronic media content and calculating a probability of the detected speaker segment involving the individual speaker based on the at least one speaker model.
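As a rough illustration of the detection steps summarized above, the following Python sketch labels frames by comparing a target model against a background model and then turns a segment's average log-likelihood ratio into a probability. It is a sketch under stated assumptions, not the disclosed implementation: each model is reduced to a single one-dimensional Gaussian `(mean, variance)` over a hypothetical per-frame feature, the priors are assumed equal, and the function names are invented for this example.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[float, float]  # (mean, variance) of an assumed 1-D Gaussian


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def detect_segments(frames: Sequence[float], target: Model,
                    background: Model) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where the target
    model is more likely than the background model."""
    segments, start = [], None
    for i, x in enumerate(frames):
        is_target = log_gauss(x, *target) > log_gauss(x, *background)
        if is_target and start is None:
            start = i
        elif not is_target and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(frames)))
    return segments


def segment_probability(frames: Sequence[float], target: Model,
                        background: Model) -> float:
    """Map the average log-likelihood ratio of a segment to (0, 1) with a
    logistic function (assumes equal priors for the two models)."""
    llr = sum(log_gauss(x, *target) - log_gauss(x, *background)
              for x in frames) / len(frames)
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)  # numerically stable branch for negative ratios
    return e / (1.0 + e)


# Hypothetical models and feature values, for illustration only.
speech, non_speech = (5.0, 1.0), (0.0, 1.0)
speaker, non_speaker = (5.0, 1.0), (3.0, 1.0)
frames = [0.1, 0.2, 4.8, 5.1, 5.3, 0.0, 4.9, 5.0]

speech_segments = detect_segments(frames, speech, non_speech)
first = frames[speech_segments[0][0]:speech_segments[0][1]]
print(speech_segments, segment_probability(first, speaker, non_speaker))
```

The same `detect_segments` helper is applied here with the speech and non-speech models; applying it to the frames inside a speech segment with the speaker and non-speaker models would give speaker segments in the same way.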
20 Claims
1. A computer-implemented method for manipulating electronic multimedia content, the method comprising:
generating, using a processor, a speech model, a non-speech model, at least one speaker model of an individual speaker, and a non-speaker speech model;
receiving electronic media content over a network;
extracting an audio track from the electronic media content;
detecting speech segments within the extracted audio track based on the speech model and the non-speech model, the speech segments containing speech from at least one of a plurality of speakers;
detecting a speaker segment within the detected speech segments based on the speaker model and the non-speaker speech model, the speaker segment containing speech from the individual speaker;
calculating a first probability of the detected speaker segment involving the individual speaker based on the at least one speaker speech model and the non-speaker speech model;
determining a ranking or filtration of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment;
detecting a face within a part of the electronic media content corresponding to the detected speaker segment and calculating a second probability of the detected face being a face of the individual speaker; and
adjusting the ranking or filtration of the electronic media content based on the second probability.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 11
10. The computer-implemented method of claim further comprising:
applying detected individual speakers to the ranking or filtration of electronic media content; and
displaying electronic media content to users based on the ranking or filtration.
12. A system for manipulating electronic multimedia content, the system comprising:
a data storage device storing instructions for manipulating electronic multimedia content; and
a processor configured to execute the instructions stored in the data storage device for:
generating a speech model, a non-speech model, at least one speaker model of an individual speaker, and a non-speaker speech model;
receiving electronic media content over a network;
extracting an audio track from the electronic media content;
detecting speech segments within the extracted audio track based on the speech model and the non-speech model, the speech segments containing speech from at least one of a plurality of speakers;
detecting a speaker segment within the detected speech segments based on the speaker model and the non-speaker speech model, the speaker segment containing speech from the individual speaker;
calculating a first probability of the detected speaker segment involving the individual speaker based on the at least one speaker speech model and the non-speaker speech model;
determining a ranking or filtration of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment;
detecting a face within a part of the electronic media content corresponding to the detected speaker segment and calculating a second probability of the detected face being a face of the individual speaker; and
adjusting the ranking or filtration of the electronic media content based on the second probability.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification