Real-time transcription of conference calls
First Claim
1. A system for transcribing a conference call among a plurality of participants using a plurality of audio connections, the system comprising:
(a) a plurality of capture mechanisms, each one of the plurality of capture mechanisms capturing a portion of audio associated with one of the plurality of audio connections, wherein each one of the plurality of capture mechanisms comprises a voice activity detector for detecting a voice snippet included in the portion of audio, the length of the voice snippet being determined by detecting a break in the portion of audio, and means for capturing the voice snippet;
(b) a plurality of speech recognition instances for converting audio to text, each one of the plurality of speech recognition instances having substantially the same capability;
(c) a dispatcher for forwarding a first captured portion of audio from a selected one of the plurality of capture mechanisms to a first one of the plurality of speech recognition instances, and for forwarding a second captured portion of audio from the selected one of the plurality of capture mechanisms to a second one of the plurality of speech recognition instances when the first one of the plurality of speech recognition instances is processing the first captured portion of audio, wherein the second captured portion of audio is subsequent to the first captured portion of audio; and
(d) a combiner for re-assembling the text converted by the first one of the plurality of speech recognition instances and the text converted by the second one of the plurality of speech recognition instances from captured portions of audio from the plurality of capture mechanisms.
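Element (a) of the claim determines the length of a voice snippet by detecting a break in the audio. The claim does not say how a break is detected, so the following is only a minimal sketch under stated assumptions: audio is represented as a list of per-frame energy values, and a break is taken to be a run of consecutive low-energy frames. The function name `split_into_snippets` and the parameters `threshold` and `min_break_frames` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple


def split_into_snippets(
    frame_energies: List[float],
    threshold: float = 0.1,      # assumed: frames below this count as silence
    min_break_frames: int = 3,   # assumed: this many silent frames form a break
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, one per voice snippet.

    A snippet is closed once `min_break_frames` consecutive frames fall
    below `threshold`; the silent frames are not included in the snippet.
    """
    snippets: List[Tuple[int, int]] = []
    start = None     # index of the first voiced frame of the open snippet
    silent_run = 0   # consecutive silent frames seen inside the open snippet

    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i          # a new snippet begins
            silent_run = 0         # a short pause did not become a break
        elif start is not None:
            silent_run += 1
            if silent_run >= min_break_frames:
                # Break detected: close the snippet at its last voiced frame.
                snippets.append((start, i - silent_run + 1))
                start = None
                silent_run = 0

    if start is not None:
        # Audio ended while a snippet was still open.
        snippets.append((start, len(frame_energies) - silent_run))
    return snippets
```

Short pauses inside a sentence stay within one snippet, so each snippet handed onward tends to be a complete utterance rather than a fixed-size chunk.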
1 Assignment
0 Petitions
Abstract
Described herein are embodiments of systems, methods and computer program products for real-time transcription of conference calls that employ voice activity detection, audio snippet capture, and multiple transcription instances to deliver practical real-time or near real-time conference call transcription.
66 Citations
21 Claims
1. A system for transcribing a conference call among a plurality of participants using a plurality of audio connections, the system comprising:
(a) a plurality of capture mechanisms, each one of the plurality of capture mechanisms capturing a portion of audio associated with one of the plurality of audio connections, wherein each one of the plurality of capture mechanisms comprises a voice activity detector for detecting a voice snippet included in the portion of audio, the length of the voice snippet being determined by detecting a break in the portion of audio, and means for capturing the voice snippet;
(b) a plurality of speech recognition instances for converting audio to text, each one of the plurality of speech recognition instances having substantially the same capability;
(c) a dispatcher for forwarding a first captured portion of audio from a selected one of the plurality of capture mechanisms to a first one of the plurality of speech recognition instances, and for forwarding a second captured portion of audio from the selected one of the plurality of capture mechanisms to a second one of the plurality of speech recognition instances when the first one of the plurality of speech recognition instances is processing the first captured portion of audio, wherein the second captured portion of audio is subsequent to the first captured portion of audio; and
(d) a combiner for re-assembling the text converted by the first one of the plurality of speech recognition instances and the text converted by the second one of the plurality of speech recognition instances from captured portions of audio from the plurality of capture mechanisms.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 21)
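The dispatcher of element (c) sends a later snippet from the same connection to a second recognition instance while the first instance is still busy. A minimal sketch of that routing rule follows. It assumes instances are tracked by a simple busy flag and that the first idle instance is chosen; the class `Dispatcher` and its methods `forward` and `release` are hypothetical names for illustration, not taken from the patent.

```python
from typing import Dict, List, Optional


class Dispatcher:
    """Routes each captured snippet to a speech recognition instance that is idle."""

    def __init__(self, instance_ids: List[str]) -> None:
        self.instance_ids = list(instance_ids)
        # Assumed bookkeeping: True while an instance is processing a snippet.
        self.busy: Dict[str, bool] = {i: False for i in self.instance_ids}

    def forward(self) -> Optional[str]:
        """Pick the first idle instance, mark it busy, and return its id.

        Returns None when every instance is busy; a caller could queue
        the snippet and retry after the next release().
        """
        for instance_id in self.instance_ids:
            if not self.busy[instance_id]:
                self.busy[instance_id] = True
                return instance_id
        return None

    def release(self, instance_id: str) -> None:
        """Mark an instance idle again once it has returned its text."""
        self.busy[instance_id] = False


dispatcher = Dispatcher(["asr-1", "asr-2"])
first = dispatcher.forward()    # first snippet from a connection
second = dispatcher.forward()   # next snippet, while asr-1 is still busy
```

Because the instances have substantially the same capability, any idle one can take the next snippet, which is what lets consecutive snippets from one speaker be recognized in parallel.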
11. A method for transcribing a conference call among a plurality of participants using a plurality of audio connections, the method comprising:
(a) capturing a plurality of portions of audio, each of the plurality of portions of audio being associated with at least one of the plurality of audio connections;
(b) forwarding a first portion of audio of the captured plurality of portions of audio to a first one of a plurality of speech recognition instances, and for forwarding a second portion of audio of the captured plurality of portions of audio to a second one of the plurality of speech recognition instances, the first portion of audio and the second portion of audio being associated with a selected one of the plurality of audio connections, whereby each of the plurality of speech recognition instances converts the audio to text, wherein the second portion of audio is subsequent to the first portion of audio, and wherein each one of the plurality of speech recognition instances has substantially the same capability;
(c) re-assembling the text converted by the first one of the plurality of speech recognition instances and the text converted by the second one of the plurality of speech recognition instances.
- View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
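The forwarding and re-assembling steps of the method can be sketched end to end. This is a hedged illustration only: it assumes each captured portion carries a sequence number, models the recognition instances as workers in a thread pool, and re-assembles text by sorting on the sequence number. The function `fake_recognize` is a stand-in that merely decodes bytes, not a real speech recognizer, and all names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Assumed shape of a captured portion: (sequence number, connection id, audio).
Snippet = Tuple[int, str, bytes]


def fake_recognize(audio: bytes) -> str:
    """Stand-in for a speech recognition instance (illustration only)."""
    return audio.decode("utf-8")


def transcribe_call(
    snippets: List[Snippet],
    recognize: Callable[[bytes], str] = fake_recognize,
    num_instances: int = 2,
) -> List[Tuple[str, str]]:
    """Recognize snippets concurrently, then re-assemble text in capture order."""
    with ThreadPoolExecutor(max_workers=num_instances) as pool:
        # Forwarding: each snippet goes to whichever worker is free, so a
        # later snippet can start while an earlier one is still processing.
        pending = [
            (seq, conn, pool.submit(recognize, audio))
            for seq, conn, audio in snippets
        ]
        results = [(seq, conn, fut.result()) for seq, conn, fut in pending]

    # Re-assembling: order by sequence number, regardless of finish order.
    results.sort(key=lambda r: r[0])
    return [(conn, text) for _, conn, text in results]


transcript = transcribe_call(
    [(1, "conn-A", b"world"), (0, "conn-A", b"hello"), (2, "conn-B", b"hi")]
)
```

Sorting on a capture-time sequence number is one simple way to keep the transcript in spoken order even when a short snippet finishes recognition before a longer, earlier one.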
20. A non-transitory computer-readable medium having a computer program recorded thereon, the computer program comprising computer code instructions for implementing a method for transcribing a conference call among a plurality of participants using a plurality of audio connections, the non-transitory computer-readable medium comprising:
(a) a first computer code instruction portion for capturing a plurality of portions of audio, each of the plurality of portions of audio being associated with at least one of the plurality of audio connections;
(b) a second computer code instruction portion for forwarding a first portion of audio of the captured plurality of portions of audio to a first one of a plurality of speech recognition instances, and for forwarding a second portion of audio of the captured plurality of portions of audio to a second one of the plurality of speech recognition instances, the first portion of audio and the second portion of audio being associated with a selected one of the plurality of audio connections, whereby each of the plurality of speech recognition instances converts the audio to text, wherein the second portion of audio is subsequent to the first portion of audio, and wherein each one of the plurality of speech recognition instances has substantially the same capability; and
(c) a third computer code instruction portion for re-assembling the text converted by the first one of the plurality of speech recognition instances and the text converted by the second one of the plurality of speech recognition instances.
Specification