Real-time transcription system utilizing divided audio chunks

US 9,710,819 B2
Filed: 11/15/2009
Issued: 07/18/2017
Est. Priority Date: 05/05/2003
Status: Expired due to Fees

Abstract

A computing system accepts audio from one or more sources, parses the audio into chunks, and transcribes the chunks in substantially real time. Some transcription is performed automatically, while other transcription is performed by humans who listen to the audio and enter the words spoken and/or the intent of the caller (such as directions given to the system). The system provides for participants a user interface that is updated in substantially real time with the transcribed text from the audio stream(s). A single audio line can be used for simple transcription, and multiple audio lines are used to provide a real-time transcript of a conference call, deposition, or the like. A pool of analysts creates, checks, and/or corrects transcription, and callers/observers can even assist in the correction process through their respective user interfaces. Ads derived from the transcript are displayed together with the text in substantially real time.
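
The chunking step described in the abstract can be illustrated with a minimal sketch that cuts a sample sequence into chunks at its quietest nearby points, which is the criterion claim 6 recites ("low volume points"). The function names `split_at_quiet_points` and `_loudness`, and the parameters `target_len`, `search`, and `window`, are hypothetical and do not come from the patent; a real system would work on encoded audio frames rather than a plain list of floats.

```python
from typing import List, Sequence


def _loudness(samples: Sequence[float], center: int, window: int) -> float:
    """Mean absolute amplitude of the samples in a small window around center."""
    lo = max(0, center - window // 2)
    hi = min(len(samples), center + window // 2 + 1)
    return sum(abs(s) for s in samples[lo:hi]) / (hi - lo)


def split_at_quiet_points(
    samples: Sequence[float],
    target_len: int,
    search: int,
    window: int = 4,
) -> List[List[float]]:
    """Split samples into chunks of roughly target_len samples each.

    For every nominal cut position, candidates within `search` samples on
    either side are compared and the quietest one is used as the boundary.
    All parameter names are illustrative assumptions.
    """
    if target_len <= search:
        raise ValueError("target_len must be greater than search")
    chunks: List[List[float]] = []
    start = 0
    n = len(samples)
    # Keep cutting while a full search range still fits before the end.
    while n - start > target_len + search:
        nominal = start + target_len
        candidates = range(nominal - search, nominal + search + 1)
        cut = min(candidates, key=lambda i: _loudness(samples, i, window))
        chunks.append(list(samples[start:cut]))
        start = cut
    chunks.append(list(samples[start:]))  # whatever remains is the last chunk
    return chunks
```

Because every boundary is chosen inside a bounded search range, chunk lengths stay close to `target_len` while cuts tend to fall in pauses rather than mid-word.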

14 Claims

1. A system, comprising a processor and a memory in communication with the processor, the memory storing programming instructions executable by the processor to:
- cause a first chunk of audio data to be played for a first analyst, where the first chunk of audio data represents a segment of a first audio stream, the segment associated with a participant in a conference call;
- accept input from the first analyst sufficient to indicate a transcription of the segment of the first audio stream, the input being dependent upon a selected fidelity mode chosen from a plurality of fidelity modes, the plurality of fidelity modes having corresponding levels of fidelity and including all of:
  - a first verbatim interpreting mode wherein the first analyst provides a substantially verbatim transcription of the first audio stream;
  - a second text interpreting mode wherein the first analyst listens to the first chunk of audio data and provides input of text that has a substantially identical meaning to the words spoken in the first chunk of audio data; and
  - a third automatic transcription mode wherein the first analyst repeats the first chunk of audio as input for an automatic transcription subsystem responsive to the automatic transcription subsystem having a level of confidence below a threshold when provided with the first audio stream as input;
- cause the transcription and an identity of the participant in the conference call to be displayed on a web-based user interface in substantially real time relative to the capture of the segment of the first audio stream;
- determining that the transcription displayed on the user interface is not accurate; and
- automatically, by a computer system and without user input, selecting, based on the levels of fidelity corresponding to the plurality of fidelity modes, a different fidelity mode in the plurality of fidelity modes having a corresponding level of fidelity higher than the selected fidelity mode.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The system of claim 1, wherein the programming instructions are further executable to:
    - cause additional chunks of audio data to be played for a plurality of analysts that comprises the first analyst, where the first chunk and the additional chunks together represent all of the audio in the first audio stream;
    - accept input from the plurality of analysts sufficient to indicate a transcription of the first audio stream; and
    - cause the transcription to be displayed on the web-based user interface in substantially real time relative to the parsing of each of the chunk and additional chunks.
  - 3. The system of claim 2, wherein the first chunk and at least one of the additional chunks represent overlapping segments of the first audio stream.
  - 4. The system of claim 2, wherein the programming instructions are further executable to:
    - cause further audio chunks of audio data to be played for the plurality of analysts, where the further chunks represent a second audio stream associated with the first audio stream;
    - accept input from the plurality of analysts sufficient to indicate a transcription of the second audio stream; and
    - cause the transcription of the second audio stream to be displayed on the web-based user interface in substantially real time relative to when each of the further chunks was captured.
  - 5. The system of claim 4, wherein the first audio stream and the second audio stream are produced by different participants in the conference call.
  - 6. The system of claim 1, wherein the parsing chooses low volume points in the stream for the beginning and end of the chunk.
  - 7. The system of claim 1, wherein the audio stream is parsed from a video stream.
  - 8. The system of claim 1, wherein:
    - the selected fidelity mode is the text interpreting mode, and the input from the first analyst also indicates the analyst's interpretation of an intent of a person whose voice is represented by the first chunk of audio data; and
    - the programming instructions are further executable to respond automatically to the interpreted intent.
  - 9. The system of claim 8, wherein:
    - the interpreted intent is to suspend transcription of audio from all audio streams; and
    - the automatic response is to suspend transcription of audio from all audio streams until the system receives instruction from a user to resume.
  - 10. The system of claim 9, wherein:
    - while transcription is suspended, the system still causes the audio streams to be played for analysts, but no transcription is accepted other than an interpretation of the intent of a person to resume transcription.
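
The automatic mode escalation recited at the end of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `FidelityMode` enum, the numeric values in `FIDELITY_LEVEL`, and the `escalate` function are hypothetical, and the ranking of the three modes is assumed, since the claims only require that each mode have a corresponding level of fidelity.

```python
from enum import Enum
from typing import Dict, Optional


class FidelityMode(Enum):
    """The three modes named in claim 1 (identifiers are illustrative)."""
    VERBATIM = "verbatim interpreting"
    TEXT_INTERPRETING = "text interpreting"
    AUTOMATIC = "automatic transcription"


# Assumed ranking, higher number = higher fidelity. The claims do not
# state which mode ranks above which; this ordering is hypothetical.
FIDELITY_LEVEL: Dict[FidelityMode, int] = {
    FidelityMode.AUTOMATIC: 1,
    FidelityMode.TEXT_INTERPRETING: 2,
    FidelityMode.VERBATIM: 3,
}


def escalate(
    current: FidelityMode,
    levels: Dict[FidelityMode, int] = FIDELITY_LEVEL,
) -> Optional[FidelityMode]:
    """Pick a mode whose fidelity level exceeds that of the current mode.

    Returns the closest higher-level mode, or None when the current mode
    already has the highest level. No user input is involved.
    """
    higher = [mode for mode in levels if levels[mode] > levels[current]]
    if not higher:
        return None
    return min(higher, key=lambda mode: levels[mode])


def on_transcription_checked(
    current: FidelityMode, accurate: bool
) -> FidelityMode:
    """Keep the current mode if the transcript is accurate, else escalate."""
    if accurate:
        return current
    return escalate(current) or current
```

Here `escalate` takes the smallest available step up. The claim language only requires that the newly selected mode have a higher level than the previously selected one, so jumping straight to the highest level would fit the wording equally well.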

11. A system comprising a processor and a memory in communication with the processor, the memory storing programming instructions executable by the processor to:
- play each of a plurality of audio chunks, each for at least one of a plurality of analysts, where the audio chunks were each captured from a single line associated with a participant of a conference call, and together represent speech arriving on all lines of the conference call;
- accept, responsive to a selected fidelity mode chosen from a plurality of fidelity modes, input from the plurality of analysts, the input indicating a transcript of each of the plurality of chunks, and collectively indicating a transcript of speech on all lines of the conference call, the plurality of fidelity modes having corresponding levels of fidelity and including all of:
  - a first verbatim interpreting mode wherein a first analyst provides a substantially verbatim transcription of a first audio stream;
  - a second text interpreting mode wherein the first analyst listens to a first chunk of audio data and provides input of text that has a substantially identical meaning to the words spoken in the first chunk of audio data; and
  - a third automatic transcription mode wherein the first analyst repeats the first chunk of audio as input for an automatic transcription subsystem responsive to the automatic transcription subsystem having a level of confidence below a threshold when provided with the first audio stream as input;
- cause the transcript to be displayed on a web-based user interface to at least one participant in the conference call, where the display of the transcript of each chunk occurs in substantially real time relative to the capture of that chunk and includes an identity of the participant of the conference call associated with the line from which the chunk was captured;
- determine that the transcript displayed on the user interface is not accurate; and
- automatically, by a computer system and without user input, selecting, based on the levels of fidelity corresponding to the plurality of fidelity modes, a different fidelity mode in the plurality of fidelity modes having a corresponding level of fidelity higher than the selected fidelity mode.
- Dependent claims (12, 13, 14):
  - 12. The system of claim 11, wherein the programming instructions are further executable by the processor to:
    - accept input indicating that the transcript of a chunk is incorrect;
    - correct the transcript of that chunk; and
    - display the corrected transcript of that chunk.
  - 13. The system of claim 12, wherein the programming instructions executable by the processor to correct the transcript comprise instructions to accept input from an analyst in the plurality of analysts indicating a corrected transcript of the chunk.
  - 14. The system of claim 11, wherein the programming instructions are further executable by the processor to:
    - accept intent input from at least one of the plurality of analysts, where the intent input indicates the analyst's interpretation of an intent of a conference call participant; and
    - automatically respond to the intent of the conference call participant.
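
The per-chunk display and correction flow of claims 11 to 13 can be sketched as a small in-memory transcript store. This is only an illustration under stated assumptions: the `ChunkEntry` and `Transcript` classes, their fields, and the method names are hypothetical, and a real system would push updates to the web-based user interface rather than return a list of strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChunkEntry:
    """Transcript of one audio chunk (all field names are illustrative)."""
    chunk_id: int
    speaker: str        # identity of the participant on the source line
    captured_at: float  # capture time of the chunk, in seconds
    text: str
    flagged: bool = False


@dataclass
class Transcript:
    entries: Dict[int, ChunkEntry] = field(default_factory=dict)

    def add(self, entry: ChunkEntry) -> None:
        """Store the analyst-provided transcript of a chunk."""
        self.entries[entry.chunk_id] = entry

    def flag_incorrect(self, chunk_id: int) -> None:
        """Record input indicating that a chunk's transcript is incorrect."""
        self.entries[chunk_id].flagged = True

    def correct(self, chunk_id: int, text: str) -> None:
        """Replace the chunk's text with an analyst's corrected transcript."""
        entry = self.entries[chunk_id]
        entry.text = text
        entry.flagged = False

    def render(self) -> List[str]:
        """Return display lines ordered by capture time, with speaker names."""
        ordered = sorted(self.entries.values(), key=lambda e: e.captured_at)
        return [f"{e.speaker}: {e.text}" for e in ordered]
```

Keying entries by chunk identifier lets a correction replace one chunk's text without disturbing the rest of the displayed transcript, and sorting by capture time keeps interleaved lines from different participants in spoken order.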

Current Assignee: Interactions, LLC
Original Assignee: Interactions, LLC
Inventors: Cloran, Michael Eric; Heitzman, David Paul; Shields, Mitchell Gregory; Goetz, Jeromey Russell
Primary Examiner(s): WOZNIAK, JAMES S
Application Number: US12/618,742
Publication Number: US 20100063815A1
Time in Patent Office: 2,802 Days
Field of Search: 704235, 704270, 7042701, 348 1408, 715753, 3792021, 379908, 705 11, 705301
CPC Class Codes:
- G06Q 10/10: Office automation; Time man...
- G06Q 10/103: Workflow collaboration or p...
- G06Q 30/02: Marketing; Price estimation...
- G10L 15/26: Speech to text systems G10L...
- H04M 3/568: audio processing specific t...
