Speech recognition and summarization
Abstract
The subject matter of this specification can be embodied in, among other things, a method that includes receiving two or more data sets each representing speech of a corresponding individual attending an internet-based social networking video conference session, decoding the received data sets to produce corresponding text for each individual attending the internet-based social networking video conference, and detecting characteristics of the session from a coalesced transcript produced from the decoded text of the attending individuals for providing context to the internet-based social networking video conference session.
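The abstract describes decoding each attendee's speech into text, coalescing the per-speaker text into a single transcript, and detecting session characteristics from that transcript. A minimal sketch of that flow is below; the abstract does not specify an implementation, so `Segment`, the keyword-based characteristic detector, and the sample data are all illustrative stand-ins (a real system would use a speech decoder in place of pre-decoded text).

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One decoded utterance from one attendee (speaker, start time, text)."""
    speaker: str
    start: float  # seconds into the session
    text: str

def coalesce_transcript(segments):
    """Merge per-speaker decoded text into one time-ordered transcript."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(f"[{s.start:6.1f}s] {s.speaker}: {s.text}" for s in ordered)

def detect_characteristics(transcript, topic_keywords):
    """Toy 'characteristic' detector: report which topics the session mentions."""
    lowered = transcript.lower()
    return {topic for topic, words in topic_keywords.items()
            if any(w in lowered for w in words)}

segments = [
    Segment("Alice", 0.0, "Let's review the budget for next quarter."),
    Segment("Bob", 4.2, "The hiring plan depends on that budget."),
]
transcript = coalesce_transcript(segments)
topics = detect_characteristics(transcript, {
    "finance": ["budget", "cost"],
    "hiring": ["hiring", "recruit"],
})
```

Here the detected topic set (`finance`, `hiring`) is the kind of "context" the abstract says can be provided back to the session.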
22 Claims
1. A method comprising:

receiving, at data processing hardware of a video conference system, speech data from a first computing device, the speech data representing an utterance spoken by a first participant of a video conference and captured by a microphone of the first computing device;

receiving, at the data processing hardware, video data from a second computing device, the video data associated with a second participant of the video conference and captured by a camera of the second computing device while the first participant was speaking the utterance;

transcribing, by the data processing hardware, the speech data representing the utterance spoken by the first participant of the video conference into text in real-time;

detecting, by the data processing hardware, an emotional state of the second participant based on the video data received from the second computing device; and

transmitting, by the data processing hardware, the text of the speech data to the second computing device based on the detected emotional state of the second participant, the text of the speech data when received by the second computing device causing the second computing device to display the text of the speech data on a videoconference graphical user interface executing on the second computing device.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 18, 19, 20, 21, 22.
12. A video conference system comprising:

data processing hardware; and

memory hardware in communication with the data processing hardware and storing instructions that, when executed by the data processing hardware, cause the data processing hardware to perform operations comprising:

receiving speech data from a first computing device, the speech data representing an utterance spoken by a first participant of a video conference and captured by a microphone of the first computing device;

receiving video data from a second computing device, the video data associated with a second participant of the video conference and captured by a camera of the second computing device while the first participant was speaking the utterance;

transcribing the speech data representing the utterance spoken by the first participant of the video conference into text in real-time;

detecting an emotional state of the second participant based on the video data received from the second computing device; and

transmitting the text of the speech data to the second computing device based on the detected emotional state of the second participant, the text of the speech data when received by the second computing device causing the second computing device to display the text of the speech data on a videoconference graphical user interface executing on the second computing device.

Dependent claims: 13, 14, 15.
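The independent claims recite a pipeline: transcribe the speaker's utterance in real time, classify the listener's emotional state from video, and transmit the transcript for display only based on that state. A hedged sketch of the control flow is below; the patent does not disclose an implementation, so `transcribe`, `detect_emotional_state`, the `CONFUSED_STATES` policy, and the `send_to_device` callback are all hypothetical stand-ins.

```python
# Emotional states that (in this sketch) trigger sending captions.
CONFUSED_STATES = {"confused", "frustrated"}

def transcribe(speech_data):
    """Stand-in for a real-time speech recognizer (claims: 'transcribing ... into text')."""
    return speech_data.decode("utf-8")  # pretend the audio bytes are their transcript

def detect_emotional_state(video_frame):
    """Stand-in for a video-based emotion classifier (claims: 'detecting an emotional state')."""
    return video_frame.get("label", "neutral")

def handle_utterance(speech_data, video_frame, send_to_device):
    """Transmit the transcript to the second device only when the viewer's state calls for it."""
    text = transcribe(speech_data)
    state = detect_emotional_state(video_frame)
    if state in CONFUSED_STATES:
        send_to_device(text)  # the receiving device would display this on its conference GUI
        return text
    return None
```

The gating step is the distinctive limitation: transmission is conditioned on the detected emotional state, rather than captions being sent to every participant unconditionally.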