Text transcript generation from a communication session
First Claim
1. A method for transcribing speech in a communication session comprising:
transmitting a virtual communication session in substantially real-time to a plurality of end user devices;
receiving, by one or more processors, a combined media stream comprising a plurality of media sub-streams each associated with one of the plurality of end user devices, wherein each of the plurality of media sub-streams in the combined media stream comprises a respective video component and a respective audio component;
for each of the plurality of media sub-streams, separating, by the one or more processors, the respective audio component from the respective video component;
for each separate audio component, transcribing, by the one or more processors, at least a portion of speech from the audio component to text;
providing a transcription in substantially real-time; and
annotating the text for the audio component of each respective media sub-stream to include additional content, wherein annotating the text comprises:
determining one or more keywords of the text;
selecting, based on the one or more keywords, one or more advertisements or a link; and
updating the transcription with the one or more advertisements or the link in association with at least a portion of the text.
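The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `SubStream` container, the `fake_transcribe` stand-in for a speech recognizer (it simply decodes the "audio" bytes as UTF-8 text), the frequency-based `keywords` function, and the `AD_CATALOG` lookup table are all hypothetical. They only show how per-sub-stream audio separation, transcription, keyword extraction, and ad or link selection could fit together.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SubStream:
    """Hypothetical container for one end user device's media sub-stream."""
    device_id: str
    video: bytes
    audio: bytes  # stand-in for the decoded audio component


@dataclass
class TranscriptEntry:
    device_id: str
    text: str
    annotations: list = field(default_factory=list)


# Hypothetical stopword list and ad/link inventory, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "for", "in", "on", "we", "our", "is"}
AD_CATALOG = {
    "budget": "Ad: spreadsheet software",
    "travel": "Link: https://example.com/travel",
}


def separate_audio(sub: SubStream) -> bytes:
    # In this sketch the components are already demultiplexed, so separating
    # the audio simply means discarding the video component.
    return sub.audio


def fake_transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for a speech recognizer: the "audio" is UTF-8 text.
    return audio.decode("utf-8")


def keywords(text: str, k: int = 2) -> list:
    # Rank non-stopword tokens by frequency; ties keep first-seen order.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]


def select_content(kws: list) -> list:
    # Pick every ad or link whose trigger keyword was extracted from the text.
    return [AD_CATALOG[k] for k in kws if k in AD_CATALOG]


def transcribe_session(sub_streams: list) -> list:
    # One transcript entry per sub-stream, annotated with selected content.
    entries = []
    for sub in sub_streams:
        text = fake_transcribe(separate_audio(sub))
        entry = TranscriptEntry(sub.device_id, text)
        entry.annotations = select_content(keywords(text))
        entries.append(entry)
    return entries
```

Frequency counting keeps the sketch dependency-free. A real system would substitute an actual speech recognition engine and a stronger keyword extractor.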
2 Assignments
0 Petitions
Abstract
Techniques, systems, and devices for managing streaming media among end user devices in a video conferencing system are described. For example, a transcript may be automatically generated for a video conference. In one example, a method may include receiving a combined media stream comprising a plurality of media sub-streams each associated with one of a plurality of end user devices, wherein each of the plurality of media sub-streams comprises a respective video component and a respective audio component. The method may also include, for each of the media sub-streams, separating the audio component from the respective video component; for each audio component of the respective media sub-streams, transcribing speech from the audio component to text for the respective media sub-stream; and combining the text for each of the respective media sub-streams into a combined transcription. In some examples, the combined transcription may also be translated into a user-selected language.
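The abstract's final steps, combining the per-sub-stream text into one transcription and optionally translating it, can be sketched as follows. The abstract does not say how text from different sub-streams is ordered, so ordering by segment start time is an assumption. The segment layout, `combine_transcripts`, and the glossary-based `toy_translate_es` are hypothetical, and the glossary is only a placeholder for a real translation service.

```python
import heapq

# Hypothetical segment layout: (start_seconds, device_id, text).
# Each device's list is assumed to be sorted by start time already.


def combine_transcripts(per_stream, translate=lambda s: s):
    """Merge per-sub-stream segments into one time-ordered transcript.

    `translate` is a placeholder hook for a user-selected language; the
    default leaves the text unchanged.
    """
    merged = heapq.merge(*per_stream, key=lambda seg: seg[0])
    return [f"[{start:.1f}s] {device}: {translate(text)}" for start, device, text in merged]


# Toy word-by-word glossary standing in for a real translation service.
GLOSSARY_ES = {"hello": "hola", "good": "buenos", "morning": "días", "agreed": "de acuerdo"}


def toy_translate_es(text):
    return " ".join(GLOSSARY_ES.get(word, word) for word in text.split())
```

`heapq.merge` consumes the already-sorted per-device lists lazily, which fits a transcript that grows incrementally during a live session.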
110 Citations
19 Claims
1. A method for transcribing speech in a communication session, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A server device operable to transcribe speech in a communication session comprising:
a memory; and one or more processors coupled to the memory and operable to execute instructions stored in the memory, the one or more processors configured to:
transmit a virtual communication session in substantially real-time to the plurality of end user devices;
receive a media stream associated with a plurality of end user devices, wherein the media stream comprises a video component and an audio component;
separate the audio component from the video component;
transcribe at least a portion of speech from the audio component to text;
provide a transcription in substantially real-time; and
annotate the text for the audio component to include additional content by:
determining one or more keywords of the text;
searching for one or more of an image, a video, music, and an article that correspond to the one or more keywords of the text;
selecting, based on the one or more keywords, one or more advertisements or a link that correspond to the one or more of the image, the video, the music, and the article; and
updating the transcription with the one or more advertisements or the link in association with at least a portion of the text to a user.
Dependent claims: 9, 10, 11, 12, 13.
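Claim 8 adds an intermediate step: keywords drive a search over images, videos, music, and articles, and the advertisement or link is chosen to correspond to what the search found. A minimal Python sketch under stated assumptions follows. The in-memory `MEDIA_INDEX`, the tag-overlap matching rule, and the `SPONSORED` mapping from a media item to an ad or link are all hypothetical stand-ins for whatever search backend and ad inventory a server would actually use.

```python
from dataclasses import dataclass

MEDIA_KINDS = ("image", "video", "music", "article")


@dataclass(frozen=True)
class MediaItem:
    kind: str
    title: str
    tags: frozenset


# Hypothetical in-memory index standing in for a real search backend.
MEDIA_INDEX = [
    MediaItem("article", "Planning a Team Offsite", frozenset({"offsite", "planning"})),
    MediaItem("music", "Focus Playlist", frozenset({"focus"})),
    MediaItem("image", "Mountain Lodge", frozenset({"offsite", "lodge"})),
]

# Hypothetical mapping from a matched media item to an ad or link.
SPONSORED = {
    "Planning a Team Offsite": "Link: https://example.com/offsite-guide",
    "Mountain Lodge": "Ad: Mountain Lodge group rates",
}


def search_media(keywords):
    # An item matches when it is one of the four kinds and shares a tag with the keywords.
    kws = set(keywords)
    return [m for m in MEDIA_INDEX if m.kind in MEDIA_KINDS and m.tags & kws]


def select_for_media(items):
    # Keep only matched items that have an associated ad or link.
    return [SPONSORED[m.title] for m in items if m.title in SPONSORED]


def annotate(text, keywords):
    # Update the transcription entry with content tied to the search results.
    return {"text": text, "annotations": select_for_media(search_media(keywords))}
```

Matching items that have no sponsored counterpart (the playlist here) are simply skipped, so a search hit does not guarantee an annotation.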
14. A non-transitory computer storage medium encoded with a computer program, the computer program comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
transmitting a virtual communication session in substantially real-time to the plurality of end user devices;
receiving, by one or more processors, a combined media stream comprising a plurality of media sub-streams each associated with one of the plurality of end user devices, wherein each of the plurality of media sub-streams in the combined media stream comprises a respective video component and a respective audio component;
for each of the plurality of media sub-streams, separating, by the one or more processors, the respective audio component from the respective video component;
for each separate audio component, transcribing, by the one or more processors, at least a portion of speech from the audio component to text;
providing a transcription in substantially real-time; and
annotating the text for the audio component of each respective media sub-stream to include additional content, wherein annotating the text comprises:
determining one or more keywords of the text;
selecting, based on the one or more keywords, one or more advertisements or a link; and
updating the transcription with the one or more advertisements or the link in association with at least a portion of the text.
Dependent claims: 15, 16, 17, 18, 19.
Specification