Speech recognition and summarization
Abstract
The subject matter of this specification can be embodied in, among other things, a method that includes receiving two or more data sets each representing speech of a corresponding individual attending an internet-based social networking video conference session, decoding the received data sets to produce corresponding text for each individual attending the internet-based social networking video conference, and detecting characteristics of the session from a coalesced transcript produced from the decoded text of the attending individuals for providing context to the internet-based social networking video conference session.
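The abstract describes decoding each attendee's speech into text, coalescing the per-speaker text into a single transcript, and detecting session characteristics from that transcript. A minimal sketch of that flow is below; the abstract does not specify an implementation, so `Segment`, the keyword-based characteristic detector, and the sample data are all illustrative stand-ins (a real system would use a speech decoder in place of pre-decoded text).

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One decoded utterance from one attendee (speaker, start time, text)."""
    speaker: str
    start: float  # seconds into the session
    text: str

def coalesce_transcript(segments):
    """Merge per-speaker decoded text into one time-ordered transcript."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(f"[{s.start:6.1f}s] {s.speaker}: {s.text}" for s in ordered)

def detect_characteristics(transcript, topic_keywords):
    """Toy 'characteristic' detector: report which topics the session mentions."""
    lowered = transcript.lower()
    return {topic for topic, words in topic_keywords.items()
            if any(w in lowered for w in words)}

segments = [
    Segment("Alice", 0.0, "Let's review the budget for next quarter."),
    Segment("Bob", 4.2, "The hiring plan depends on that budget."),
]
transcript = coalesce_transcript(segments)
topics = detect_characteristics(transcript, {
    "finance": ["budget", "cost"],
    "hiring": ["hiring", "recruit"],
})
```

Here the detected topic set (`finance`, `hiring`) is the kind of "context" the abstract says can be provided back to the session.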
22 Claims
1. A method comprising:

receiving, at data processing hardware of a video conference system, speech data from a first computing device, the speech data representing an utterance spoken by a first participant of a video conference and captured by a microphone of the first computing device;

receiving, at the data processing hardware, video data from a second computing device, the video data associated with a second participant of the video conference and captured by a camera of the second computing device while the first participant was speaking the utterance;

transcribing, by the data processing hardware, the speech data representing the utterance spoken by the first participant of the video conference into text in real-time;

detecting, by the data processing hardware, an emotional state of the second participant based on the video data received from the second computing device; and

transmitting, by the data processing hardware, the text of the speech data to the second computing device based on the detected emotional state of the second participant, the text of the speech data when received by the second computing device causing the second computing device to display the text of the speech data on a videoconference graphical user interface executing on the second computing device.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 18, 19, 20, 21, 22.
12. A video conference system comprising:

data processing hardware; and

memory hardware in communication with the data processing hardware and storing instructions that, when executed by the data processing hardware, cause the data processing hardware to perform operations comprising:

receiving speech data from a first computing device, the speech data representing an utterance spoken by a first participant of a video conference and captured by a microphone of the first computing device;

receiving video data from a second computing device, the video data associated with a second participant of the video conference and captured by a camera of the second computing device while the first participant was speaking the utterance;

transcribing the speech data representing the utterance spoken by the first participant of the video conference into text in real-time;

detecting an emotional state of the second participant based on the video data received from the second computing device; and

transmitting the text of the speech data to the second computing device based on the detected emotional state of the second participant, the text of the speech data when received by the second computing device causing the second computing device to display the text of the speech data on a videoconference graphical user interface executing on the second computing device.

Dependent claims: 13, 14, 15.
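The independent claims recite a pipeline: transcribe the speaker's utterance in real time, classify the listener's emotional state from video, and transmit the transcript for display only based on that state. A hedged sketch of the control flow is below; the patent does not disclose an implementation, so `transcribe`, `detect_emotional_state`, the `CONFUSED_STATES` policy, and the `send_to_device` callback are all hypothetical stand-ins.

```python
# Emotional states that (in this sketch) trigger sending captions.
CONFUSED_STATES = {"confused", "frustrated"}

def transcribe(speech_data):
    """Stand-in for a real-time speech recognizer (claims: 'transcribing ... into text')."""
    return speech_data.decode("utf-8")  # pretend the audio bytes are their transcript

def detect_emotional_state(video_frame):
    """Stand-in for a video-based emotion classifier (claims: 'detecting an emotional state')."""
    return video_frame.get("label", "neutral")

def handle_utterance(speech_data, video_frame, send_to_device):
    """Transmit the transcript to the second device only when the viewer's state calls for it."""
    text = transcribe(speech_data)
    state = detect_emotional_state(video_frame)
    if state in CONFUSED_STATES:
        send_to_device(text)  # the receiving device would display this on its conference GUI
        return text
    return None
```

The gating step is the distinctive limitation: transmission is conditioned on the detected emotional state, rather than captions being sent to every participant unconditionally.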