Text transcript generation from a communication session

US 10,019,989 B2
Filed: 09/12/2016
Issued: 07/10/2018
Est. Priority Date: 08/31/2011
Status: Active Grant


Abstract

Techniques, systems, and devices for managing streaming media among end user devices in a video conferencing system are described. For example, a transcript may be automatically generated for a video conference. In one example, a method may include receiving a combined media stream comprising a plurality of media sub-streams each associated with one of a plurality of end user devices, wherein each of the plurality of media sub-streams comprises a respective video component and a respective audio component. The method may also include, for each of the media sub-streams, separating the audio component from the respective video component, for each audio component of the respective media sub-streams, transcribing speech from the audio component to text for the respective media sub-stream, and combining the text for each of the respective media sub-streams into a combined transcription. In some examples, the combined transcription may also be translated into a user selected language.
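
The flow the abstract describes (separate each sub-stream's audio, transcribe it, then merge the per-device text) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Segment` and `Line` containers, the `separate_audio` stand-in (real demultiplexing would parse a media container), and the `fake_transcribe` recognizer, which simply decodes bytes so the example runs without a speech engine. Ordering the merged lines by start time is also an assumption, since the abstract does not say how the text is combined.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Segment:
    """One chunk of a participant's media sub-stream (hypothetical format)."""
    device_id: str
    start: float   # seconds since the session began
    audio: bytes
    video: bytes


@dataclass(frozen=True)
class Line:
    """One transcribed line attributed to an end user device."""
    device_id: str
    start: float
    text: str


def separate_audio(segment: Segment) -> bytes:
    # Stand-in for demultiplexing: the components are already separate fields.
    return segment.audio


def combined_transcription(
    segments: Iterable[Segment],
    transcribe: Callable[[bytes], str],
) -> List[Line]:
    """Transcribe each sub-stream's audio and merge the text by start time."""
    lines = []
    for seg in segments:
        text = transcribe(separate_audio(seg))
        if text:  # skip segments that produced no text
            lines.append(Line(seg.device_id, seg.start, text))
    return sorted(lines, key=lambda line: line.start)


def fake_transcribe(audio: bytes) -> str:
    # Assumption: a toy "recognizer" that treats the audio bytes as UTF-8 text.
    return audio.decode("utf-8")


if __name__ == "__main__":
    demo = [
        Segment("device-b", 2.0, b"sounds good", b"\x00"),
        Segment("device-a", 1.0, b"shall we start", b"\x00"),
    ]
    for line in combined_transcription(demo, fake_transcribe):
        print(f"[{line.start:.1f}s] {line.device_id}: {line.text}")
```

Passing the recognizer in as a callable keeps the merge logic independent of any particular speech-to-text engine, which the abstract leaves unspecified.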

110 Citations

19 Claims

1. A method for transcribing speech in a communication session comprising:
- transmitting a virtual communication session in substantially real-time to a plurality of end user devices;
- receiving, by one or more processors, a combined media stream comprising a plurality of media sub-streams each associated with one of the plurality of end user devices, wherein each of the plurality of media sub-streams in the combined media stream comprises a respective video component and a respective audio component;
- for each of the plurality of media sub-streams, separating, by the one or more processors, the respective audio component from the respective video component;
- for each separate audio component, transcribing, by the one or more processors, at least a portion of speech from the audio component to text;
- providing a transcription in substantially real-time; and
- annotating the text for the audio component of each respective media sub-stream to include additional content, wherein annotating the text comprises:
  - determining one or more keywords of the text;
  - selecting, based on the one or more keywords, one or more advertisements or a link; and
  - updating the transcription with the one or more advertisements or the link in association with at least a portion of the text.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the one or more advertisements are provided within the text.
  - 3. The method of claim 1, wherein the one or more advertisements are provided at least one of in a border and next to a field containing the text.
  - 4. The method of claim 1, wherein:
    - annotating the text for the audio component of each respective media sub-stream to include additional content further comprises determining that the one or more keywords include a street address of a property; and
    - the link is a map to the street address of the property included in the one or more keywords.
  - 5. The method of claim 1, wherein:
    - determining the one or more keywords of the text includes determining that the one or more keywords include a phone number; and
    - the link is for dialing the phone number.
  - 6. The method of claim 1, wherein determining the one or more keywords of the text is based on at least one of a context of the text and a frequency with which at least one of a word and a phrase is used in the text.
  - 7. The method of claim 1, wherein:
    - the communication session is a real-time communication session; and
    - the text and one or more advertisements in association with the text is provided during the real-time communication session.
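
As a rough illustration of claims 5 and 6, the sketch below picks keywords by word frequency and turns phone numbers found in the transcript into dial links. It is one possible reading under stated assumptions, not the patented implementation: the stopword list, the North American `NNN-NNN-NNNN` phone pattern, the `tel:`-style link format, and the function names `frequent_keywords` and `dial_links` are all hypothetical choices made for the example. Claim 6 also allows keywords to be chosen by context, which this sketch does not attempt.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumption: a tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({"the", "and", "for", "with", "that", "this", "more", "are", "was"})

# Assumption: only numbers shaped like 555-867-5309 (or with dots/spaces) are recognized.
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def frequent_keywords(text: str, top_n: int = 3) -> List[str]:
    """Return the most frequently used non-stopword words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def dial_links(text: str) -> List[Tuple[str, str]]:
    """Pair each phone number found in the text with a tel:-style dial link."""
    links = []
    for match in PHONE_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())  # keep digits only
        links.append((match.group(), f"tel:{digits}"))
    return links


if __name__ == "__main__":
    transcript = (
        "The budget review covers the budget for marketing. "
        "Marketing wants more budget. Call me at 555-867-5309."
    )
    print(frequent_keywords(transcript, top_n=2))
    print(dial_links(transcript))
```

Frequency counting is the simplest of the signals claim 6 names; it tends to surface the topic of a conversation but can miss a keyword that is mentioned only once, such as a single phone number, which is why the phone pattern is handled separately here.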

8. A server device operable to transcribe speech in a communication session comprising:
- a memory; and
- one or more processors coupled to the memory and operable to execute instructions stored in the memory, the one or more processors configured to:
  - transmit a virtual communication session in substantially real-time to the plurality of end user devices;
  - receive a media stream associated with a plurality of end user devices, wherein the media stream comprises a video component and an audio component;
  - separate the audio component from the video component;
  - transcribe at least a portion of speech from the audio component to text;
  - provide a transcription in substantially real-time; and
  - annotate the text for the audio component to include additional content by:
    - determining one or more keywords of the text;
    - searching for one or more of an image, a video, music, and an article that correspond to the one or more keywords of the text;
    - selecting, based on the one or more keywords, one or more advertisements or a link that correspond to the one or more of the image, the video, the music, and the article; and
    - updating the transcription with the one or more advertisements or the link in association with at least a portion of the text to a user.
- Dependent claims (9, 10, 11, 12, 13):
  - 9. The server device of claim 8, wherein the one or more advertisements are provided at least one of in a border and next to a field containing the text.
  - 10. The server device of claim 8, wherein annotating the text for the audio component to include additional content further comprises:
    - selecting, based on the one or more keywords, one or more hyperlinks; and
    - inserting at least one of the one or more hyperlinks into the text.
  - 11. The server device of claim 10, wherein the one or more hyperlinks include at least one of a map of an address based on the one or more keywords including the address, an option to dial a phone number based on the one or more keywords including the phone number, an image, a video, music, and an article.
  - 12. The server device of claim 8, wherein determining the one or more keywords of the text is based on at least one of a context of the text and a frequency with which at least one of a word and a phrase is used in the text.
  - 13. The server device of claim 8, wherein:
    - the communication session is a real-time communication session; and
    - the text and one or more advertisements in association with the text is provided during the real-time communication session.
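
Claim 8 adds a search step (finding an image, video, music, or article that corresponds to the keywords) before a link is selected, and claim 11 mentions a map of an address as one kind of hyperlink. The Python sketch below shows one way those two steps might look. It is only an illustration under assumptions: the in-memory `MediaItem` catalog, the tag-overlap ranking, and the `maps.example.com` base URL are hypothetical placeholders and do not name a real service or the patent's actual selection logic.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional
from urllib.parse import urlencode


@dataclass(frozen=True)
class MediaItem:
    """A searchable piece of content (hypothetical catalog entry)."""
    kind: str               # "image", "video", "music", or "article"
    tags: FrozenSet[str]    # lowercase keywords describing the item
    link: str


def search_media(keywords: Iterable[str], catalog: List[MediaItem]) -> List[MediaItem]:
    """Return catalog items that share at least one tag with the keywords."""
    wanted = {k.lower() for k in keywords}
    return [item for item in catalog if item.tags & wanted]


def select_link(keywords: Iterable[str], catalog: List[MediaItem]) -> Optional[str]:
    """Pick the link of the item whose tags overlap the keywords the most."""
    wanted = {k.lower() for k in keywords}
    matches = search_media(wanted, catalog)
    if not matches:
        return None
    best = max(matches, key=lambda item: len(item.tags & wanted))
    return best.link


def map_link(address: str, base: str = "https://maps.example.com/search") -> str:
    # Assumption: a placeholder map service that accepts the address as ?q=...
    return f"{base}?{urlencode({'q': address})}"


if __name__ == "__main__":
    catalog = [
        MediaItem("video", frozenset({"guitar"}), "https://example.com/v/guitar-basics"),
        MediaItem("music", frozenset({"guitar", "jazz"}), "https://example.com/m/jazz-guitar"),
    ]
    print(select_link(["Guitar", "Jazz"], catalog))
    print(map_link("1600 Main St, Springfield"))
```

Ranking by tag overlap is the simplest choice that makes the selected link depend on both the keywords and the matched content; a production system would more likely use a search index and a relevance score.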

14. A non-transitory computer storage medium encoded with a computer program, the computer program comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
- transmitting a virtual communication session in substantially real-time to the plurality of end user devices;
- receiving, by one or more processors, a combined media stream comprising a plurality of media sub-streams each associated with one of the plurality of end user devices, wherein each of the plurality of media sub-streams in the combined media stream comprises a respective video component and a respective audio component;
- for each of the plurality of media sub-streams, separating, by the one or more processors, the respective audio component from the respective video component;
- for each separate audio component, transcribing, by the one or more processors, at least a portion of speech from the audio component to text;
- providing a transcription in substantially real-time; and
- annotating the text for the audio component of each respective media sub-stream to include additional content, wherein annotating the text comprises:
  - determining one or more keywords of the text;
  - selecting, based on the one or more keywords, one or more advertisements or a link; and
  - updating the transcription with the one or more advertisements or the link in association with at least a portion of the text.
- Dependent claims (15, 16, 17, 18, 19):
  - 15. The computer storage medium of claim 14, wherein the one or more advertisements are provided within the text.
  - 16. The computer storage medium of claim 14, wherein the one or more advertisements are provided at least one of in a border and next to a field containing the text.
  - 17. The computer storage medium of claim 14, wherein annotating the text for the audio component of each respective media sub-stream to include additional content further comprises:
    - selecting, based on the one or more keywords, one or more hyperlinks; and
    - inserting at least one of the one or more hyperlinks into the text.
  - 18. The computer storage medium of claim 17, wherein the one or more hyperlinks include at least one of a map of an address based on the one or more keywords including the address, an option to dial a phone number based on the one or more keywords including the phone number, an image, a video, music, and an article.
  - 19. The computer storage medium of claim 14, wherein determining the one or more keywords of the text is based on at least one of a context of the text and a frequency with which at least one of a word and a phrase is used in the text.
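
The "inserting at least one of the one or more hyperlinks into the text" step of claim 17 could be sketched as follows. This is a hedged example only: rendering the transcript as HTML, the `insert_hyperlinks` name, and the keyword-to-URL mapping passed in are assumptions for illustration, since the claims do not specify an output format.

```python
import html
import re
from typing import Dict


def insert_hyperlinks(text: str, links: Dict[str, str]) -> str:
    """Wrap each occurrence of a keyword in an HTML anchor, escaping everything else."""
    if not links:
        return html.escape(text)

    # Longest keywords first so that "jazz guitar" wins over "jazz".
    ordered = sorted(links, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): url for k, url in links.items()}

    parts = []
    position = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[position:match.start()]))
        url = html.escape(lookup[match.group().lower()], quote=True)
        parts.append(f'<a href="{url}">{html.escape(match.group())}</a>')
        position = match.end()
    parts.append(html.escape(text[position:]))
    return "".join(parts)


if __name__ == "__main__":
    line = "Tom & I like jazz"
    print(insert_hyperlinks(line, {"jazz": "https://example.com/jazz"}))
```

Escaping the surrounding text matters because transcribed speech is untrusted input; without it, a spoken phrase that happens to transcribe into markup could alter the rendered page.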

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Gauci, Jason John
Primary Examiner(s): SHIN, SEONG-AH A
Application Number: US15/262,284
Publication Number: US 20170011740A1
Time in Patent Office: 666 Days
Field of Search: 704/2, 704/3, 704/235, 704/260, 704/276
CPC Class Codes:
G06F 40/134   Hyperlinking
G06F 40/169   Annotation, e.g. comment da...
G06F 40/40   Processing or translation o...
G06Q 30/0277   Online advertisement
G10L 15/08   Speech classification or se...
G10L 15/26   Speech to text systems G10L...
G10L 2015/088   Word spotting
G10L 25/78   Detection of presence or ab...
H04L 65/403   Arrangements for multi-part...
H04M 2203/2061   Language aspects
H04M 3/56   Arrangements for connecting...
H04M 7/0012   Details of application prog...
H04N 7/15   Conference systems
