Real-time transcription of conference calls

US 8,370,142 B2
Filed: 10/28/2010
Issued: 02/05/2013
Est. Priority Date: 10/30/2009
Status: Active Grant

Abstract

Described herein are embodiments of systems, methods and computer program products for real-time transcription of conference calls that employ voice activity detection, audio snippet capture, and multiple transcription instances to deliver practical real-time or near real-time conference call transcription.
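
As a rough illustration of the voice activity detection and snippet capture mentioned in the abstract, the sketch below splits a stream of audio frames into snippets wherever a run of quiet frames (a break) occurs. The energy threshold, the frame representation (lists of float samples), and the names `frame_energy`, `split_into_snippets`, and `min_break_frames` are assumptions made for illustration; the patent does not prescribe a particular detector.

```python
from typing import Iterable, List

Frame = List[float]  # one short frame of audio samples (assumed representation)


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of a frame; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def split_into_snippets(
    frames: Iterable[Frame],
    energy_threshold: float = 0.01,  # hypothetical tuning value
    min_break_frames: int = 3,       # hypothetical: quiet frames that count as a break
) -> List[List[Frame]]:
    """Group voiced frames into snippets, ending a snippet at each break."""
    snippets: List[List[Frame]] = []
    current: List[Frame] = []
    quiet_run = 0
    for frame in frames:
        if frame_energy(frame) >= energy_threshold:
            # Voiced frame: extend the current snippet and reset the quiet counter.
            current.append(frame)
            quiet_run = 0
        elif current:
            # Quiet frame inside a snippet: close the snippet once the pause is long enough.
            quiet_run += 1
            if quiet_run >= min_break_frames:
                snippets.append(current)
                current = []
                quiet_run = 0
    if current:
        snippets.append(current)
    return snippets


loud, quiet = [0.5, -0.5], [0.0, 0.0]
stream = [loud, loud, quiet, quiet, quiet, loud, quiet, loud]
print([len(s) for s in split_into_snippets(stream)])  # prints [2, 2]
```

The single quiet frame near the end is shorter than the break length, so the last two voiced frames stay in one snippet.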

66 Citations

21 Claims

1. A system for transcribing a conference call among a plurality of participants using a plurality of audio connections;
- the system comprising:
  
  (a) a plurality of capture mechanisms, each one of the plurality of capture mechanisms capturing a portion of audio associated with one of the plurality of audio connections, wherein each one of the plurality of capture mechanisms comprises a voice activity detector for detecting a voice snippet included in the portion of audio, the length of the voice snippet being determined by detecting a break in the portion of audio, and means for capturing the voice snippet;
  
  (b) a plurality of speech recognition instances for converting audio to text, each one of the plurality of speech recognition instances having substantially the same capability;
  
  (c) a dispatcher for forwarding a first captured portion of audio from a selected one of the plurality of capture mechanisms to a first one of the plurality of speech recognition instances, and for forwarding a second captured portion of audio from the selected one of the plurality of capture mechanisms to a second one of the plurality of speech recognition instances when the first one of the plurality of speech recognition instances is processing the first captured portion of audio, wherein the second captured portion of audio is subsequent to the first captured portion of audio; and
  
  (d) a combiner for re-assembling the text converted by the first one of the plurality of speech recognition instances and the text converted by the second one of the plurality of speech recognition instances from captured portions of audio from the plurality of capture mechanisms.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 21)
  - 2. The system of claim 1, wherein the voice activity detector automatically adapts to maintain a target snippet length.
  - 3. The system of claim 1, wherein each of the plurality of capture mechanisms generates an audio connection ID and sequence number associated with the captured portion of audio, and wherein the combiner re-assembles the text based on the audio connection IDs and sequence numbers generated by each of the plurality of capture mechanisms.
  - 4. The system of claim 3, wherein the sequence number associated with each captured portion of audio comprises a sequence in time relative to other captured portions of audio.
  - 5. The system of claim 1, wherein each of the plurality of speech recognition instances performs an automated speech to text algorithm.
  - 6. The system of claim 1, wherein each of the plurality of speech recognition instances is a human transcriptionist.
  - 7. The system of claim 1, wherein the combiner function is accomplished by storing the converted text and any associated meta-data to a database, and further comprising an output mechanism which retrieves and displays the converted text and meta-data.
  - 8. The system of claim 7, wherein the output mechanism displays the re-assembled text in near real-time.
  - 9. The system of claim 7, wherein the output mechanism allows a user to hear one or more audio snippets associated with selected re-assembled text.
  - 10. The system of claim 7, wherein the output mechanism provides access to the re-assembled text after the conference call has ended.
  - 21. The system of claim 1, wherein the dispatcher allocates a third speech recognition instance to the plurality of speech recognition instances for converting a third captured portion of audio to text, wherein the third captured portion of text is subsequent to the second portion of audio.
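
Claims 1 and 3 together describe a dispatcher that hands consecutive snippets from one audio connection to different recognition instances, and a combiner that restores order using audio connection IDs and sequence numbers. A minimal sketch of that flow follows, assuming a thread pool stands in for the pool of equivalent recognition instances. `Snippet`, `fake_recognize`, and `transcribe_call` are hypothetical names, the recognizer is a stub rather than a real speech-to-text engine, and grouping the output per connection is only one possible reading of the re-assembly step.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Dict, List, Tuple
import time


@dataclass(frozen=True)
class Snippet:
    connection_id: str  # audio connection the snippet was captured from
    sequence: int       # order of the snippet within that connection
    audio: str          # placeholder for captured audio (assumption)
    cost: float = 0.0   # simulated recognition time in seconds


def fake_recognize(snippet: Snippet) -> Tuple[str, int, str]:
    """Stub recognizer: waits, then 'transcribes' by upper-casing the payload."""
    time.sleep(snippet.cost)
    return snippet.connection_id, snippet.sequence, snippet.audio.upper()


def transcribe_call(snippets: List[Snippet], instances: int = 2) -> Dict[str, List[str]]:
    """Dispatch snippets to any free instance, then re-assemble per connection."""
    with ThreadPoolExecutor(max_workers=instances) as pool:
        # Each queued snippet goes to whichever worker is idle, so a later
        # snippet need not wait for an earlier one from the same connection.
        futures = [pool.submit(fake_recognize, s) for s in snippets]
        # Results arrive in completion order, which may differ from spoken order.
        results = [f.result() for f in as_completed(futures)]
    transcript: Dict[str, List[str]] = {}
    # Combiner: restore order using (connection ID, sequence number).
    for connection_id, _sequence, text in sorted(results, key=lambda r: (r[0], r[1])):
        transcript.setdefault(connection_id, []).append(text)
    return transcript


call = [
    Snippet("alice", 0, "hello everyone", cost=0.05),
    Snippet("alice", 1, "can you hear me", cost=0.0),
    Snippet("bob", 0, "yes we can", cost=0.01),
]
print(transcribe_call(call))
```

In this example the second snippet from `alice` finishes before the first, yet the sequence numbers place the text back in spoken order.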

11. A method for transcribing a conference call among a plurality of participants using a plurality of audio connections;
- the method comprising:
  
  (a) capturing a plurality of portions of audio, each of the plurality of portions of audio being associated with at least one of the plurality of audio connections;
  
  (b) forwarding a first portion of audio of the captured plurality of portions of audio to a first one of a plurality of speech recognition instances, and for forwarding a second portion of audio of the captured plurality of portions of audio to a second one of the plurality of speech recognition instances, the first portion of audio and the second portion of audio being associated with a selected one of the plurality of audio connections, whereby each of the plurality of speech recognition instances converts the audio to text, wherein the second portion of audio is subsequent to the first portion of audio, and wherein each one of the plurality of speech recognition instances has substantially the same capability;
  
  (c) re-assembling the text converted by the first one of the plurality of speech recognition instances and the text converted by the second one of the plurality of speech recognition instances.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19)
  - 12. The method of claim 11, wherein the capturing step comprises detecting a voice snippet, wherein the length of the voice snippet is determined by detecting a break in audio of the associated audio connection.
  - 13. The method of claim 12, wherein the capturing step automatically adapts to maintain a target snippet length.
  - 14. The method of claim 11, wherein the capturing step comprises:
    - generating an audio connection ID and sequence number associated with each of the plurality of portions of audio, and re-assembling the text based on the audio connection IDs and sequence numbers generated by the capturing step.
  - 15. The method of claim 14, wherein the sequence number associated with each captured portion of audio comprises a sequence in time relative to other captured portions of audio.
  - 16. The method of claim 11, wherein the re-assembling step is accomplished by storing the converted text and any associated meta-data to a database, and further comprising the steps of retrieving and displaying the converted text and meta-data.
  - 17. The method of claim 11, further comprising displaying the re-assembled text in near real-time.
  - 18. The method of claim 11, further comprising allowing a user to hear one or more audio snippets associated with selected re-assembled text.
  - 19. The method of claim 11, further comprising providing access to the re-assembled text after the conference call has ended.
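
Claim 16 describes re-assembly by storing converted text and meta-data in a database and then retrieving it for display. One way to picture this is a table keyed by audio connection ID and sequence number: results are inserted in whatever order recognition finishes, and an ordered query performs the re-assembly. The schema, the `started_at` meta-data column, the in-memory SQLite database, and the function names below are all assumptions for illustration, not details taken from the patent.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory store for converted text and its meta-data."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE snippet_text (
            connection_id TEXT    NOT NULL,
            sequence      INTEGER NOT NULL,
            started_at    REAL    NOT NULL,  -- seconds from call start (assumed meta-data)
            text          TEXT    NOT NULL,
            PRIMARY KEY (connection_id, sequence)
        )
        """
    )
    return db


def store_result(db: sqlite3.Connection, connection_id: str, sequence: int,
                 started_at: float, text: str) -> None:
    """Save one recognition result; arrival order does not matter."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO snippet_text VALUES (?, ?, ?, ?)",
            (connection_id, sequence, started_at, text),
        )


def read_transcript(db: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Return (connection ID, text) pairs in time order for display."""
    rows = db.execute(
        "SELECT connection_id, text FROM snippet_text "
        "ORDER BY started_at, connection_id, sequence"
    )
    return [(connection_id, text) for connection_id, text in rows]


db = open_store()
# Results are stored out of order, as they might arrive from parallel recognizers.
store_result(db, "bob", 0, 2.5, "yes we can")
store_result(db, "alice", 1, 1.2, "can you hear me")
store_result(db, "alice", 0, 0.0, "hello everyone")
for speaker, text in read_transcript(db):
    print(f"{speaker}: {text}")
```

Because the ordering lives in the query, the same stored rows can serve a near real-time display during the call and later access after the call has ended.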

20. A non-transitory computer-readable medium having a computer program recorded thereon, the computer program comprising computer code instructions for implementing a method for transcribing a conference call among a plurality of participants using a plurality of audio connections;
- the non-transitory computer-readable medium comprising:
  
  (a) a first computer code instruction portion for capturing a plurality of portions of audio, each of the plurality of portions of audio being associated with at least one of the plurality of audio connections;
  
  (b) a second computer code instruction portion for forwarding a first portion of audio of the captured plurality of portions of audio to a first one of a plurality of speech recognition instances, and for forwarding a second portion of audio of the captured plurality of portions of audio to a second one of the plurality of speech recognition instances, the first portion of audio and the second portion of audio being associated with a selected one of the plurality of audio connections, whereby each of the plurality of speech recognition instances converts the audio to text, wherein the second portion of audio is subsequent to the first portion of audio, and wherein each one of the plurality of speech recognition instances has substantially the same capability; and
  
  (c) a third computer code instruction portion for re-assembling the text converted by the first one of the plurality of speech recognition instances and the text converted by the second one of the plurality of speech recognition instances.

Current Assignee: ZipDX LLC
Original Assignee: ZipDX LLC
Inventors: Frankel, David P.; Tarnoff, Noel
Primary Examiner(s): Neway, Samuel G
Application Number: US12/914,617
Publication Number: US 20110112833A1
Time in Patent Office: 831 Days
Field of Search: 379/202.01, 704/231-257
US Class Current: 704/235
CPC Class Codes:
G06F 16/685   using automatically derived...
G10L 15/26   Speech to text systems G10L...
G10L 15/32   Multiple recognisers used i...
