Real-time transcription system utilizing divided audio chunks
17 Assignments
0 Petitions
Abstract
A computing system accepts audio from one or more sources, parses the audio into chunks, and transcribes the chunks in substantially real time. Some transcription is performed automatically, while other transcription is performed by humans who listen to the audio and enter the words spoken and/or the intent of the caller (such as directions given to the system). The system provides for participants a user interface that is updated in substantially real time with the transcribed text from the audio stream(s). A single audio line can be used for simple transcription, and multiple audio lines are used to provide a real-time transcript of a conference call, deposition, or the like. A pool of analysts creates, checks, and/or corrects transcription, and callers/observers can even assist in the correction process through their respective user interfaces. Ads derived from the transcript are displayed together with the text in substantially real time.
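The abstract says the system parses incoming audio into chunks and hands them to a pool of analysts. The sketch below illustrates that step under stated assumptions. Fixed-duration chunks and round-robin assignment are hypothetical choices, not the patent's method (a real system might split on pauses or route by analyst load). The names `AudioChunk`, `split_into_chunks`, and `assign_round_robin` are likewise illustrative.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List, Sequence, Tuple


@dataclass
class AudioChunk:
    line_id: str        # hypothetical identifier of the audio line the chunk came from
    start_s: float      # capture offset of the chunk's first sample, in seconds
    samples: List[int]  # raw PCM samples for this chunk


def split_into_chunks(samples: Sequence[int], sample_rate: int,
                      chunk_s: float, line_id: str) -> List[AudioChunk]:
    """Cut one audio stream into consecutive fixed-duration chunks (assumed policy)."""
    if sample_rate <= 0 or chunk_s <= 0:
        raise ValueError("sample_rate and chunk_s must be positive")
    size = max(1, round(sample_rate * chunk_s))  # samples per chunk
    return [
        AudioChunk(line_id, offset / sample_rate, list(samples[offset:offset + size]))
        for offset in range(0, len(samples), size)
    ]


def assign_round_robin(chunks: List[AudioChunk],
                       analysts: List[str]) -> List[Tuple[str, AudioChunk]]:
    """Hand chunks to a pool of analysts in turn (hypothetical assignment policy)."""
    if not analysts:
        raise ValueError("analyst pool is empty")
    return list(zip(cycle(analysts), chunks))
```

Each chunk carries its capture offset, so transcripts produced by different analysts can later be put back in order and shown in substantially real time.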
14 Claims
1. A system, comprising a processor and a memory in communication with the processor, the memory storing programming instructions executable by the processor to:
cause a first chunk of audio data to be played for a first analyst, where the first chunk of audio data represents a segment of a first audio stream, the segment associated with a participant in a conference call;
accept input from the first analyst sufficient to indicate a transcription of the segment of the first audio stream, the input being dependent upon a selected fidelity mode chosen from a plurality of fidelity modes, the plurality of fidelity modes having corresponding levels of fidelity and including all of;
a first verbatim interpreting mode wherein the first analyst provides a substantially verbatim transcription of the first audio stream;
a second text interpreting mode wherein the first analyst listens to the first chunk of audio data and provides input of text that has a substantially identical meaning to the words spoken in the first chunk of audio data; and
a third automatic transcription mode wherein the first analyst repeats the first chunk of audio as input for an automatic transcription subsystem responsive to the automatic transcription subsystem having a level of confidence below a threshold when provided with the first audio stream as input;
cause the transcription and an identity of the participant in the conference call to be displayed on a web-based user interface in substantially real time relative to the capture of the segment of the first audio stream;
determining that the transcription displayed on the user interface is not accurate; and
automatically, by a computer system and without user input, selecting, based on the levels of fidelity corresponding to the plurality of fidelity modes, a different fidelity mode in the plurality of fidelity modes having a corresponding level of fidelity higher than the selected fidelity mode.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
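Claim 1 turns on two decisions: use the re-speaking mode when the automatic transcription subsystem's confidence falls below a threshold, and, once a displayed transcription is judged inaccurate, switch without user input to a mode with a higher fidelity level. The sketch below models both under explicit assumptions. The claim does not give numeric fidelity levels or a threshold value, so `DEFAULT_LEVELS` (including its ordering) and the `0.8` threshold are hypothetical, as are all function names.

```python
from enum import Enum
from typing import Dict, Optional


class FidelityMode(Enum):
    VERBATIM = "verbatim interpreting"
    TEXT_INTERPRETING = "text interpreting"
    AUTOMATIC = "automatic transcription (analyst re-speaks the chunk)"


# Hypothetical levels: the claim says each mode has a fidelity level
# but does not state the values or their order.
DEFAULT_LEVELS: Dict[FidelityMode, int] = {
    FidelityMode.TEXT_INTERPRETING: 1,
    FidelityMode.AUTOMATIC: 2,
    FidelityMode.VERBATIM: 3,
}


def needs_respeaking(asr_confidence: float, threshold: float = 0.8) -> bool:
    """True when the recognizer's confidence on the raw stream is below the (assumed) threshold."""
    return asr_confidence < threshold


def next_higher_mode(current: FidelityMode,
                     levels: Dict[FidelityMode, int] = DEFAULT_LEVELS) -> Optional[FidelityMode]:
    """Return the lowest-level mode that is still above `current`, or None if already highest."""
    current_level = levels[current]
    higher = [mode for mode, level in levels.items() if level > current_level]
    if not higher:
        return None
    return min(higher, key=lambda mode: levels[mode])


def on_transcript_checked(current: FidelityMode, is_accurate: bool,
                          levels: Dict[FidelityMode, int] = DEFAULT_LEVELS) -> FidelityMode:
    """Keep the mode if the transcript was accurate; otherwise escalate automatically."""
    if is_accurate:
        return current
    escalated = next_higher_mode(current, levels)
    return escalated if escalated is not None else current
```

Stepping to the nearest higher level, rather than jumping straight to the top, is one reasonable reading of "a different fidelity mode ... having a corresponding level of fidelity higher than the selected fidelity mode"; passing a different `levels` mapping changes the ordering without touching the logic.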
11. A system comprising a processor and a memory in communication with the processor, the memory storing programming instructions executable by the processor to:
play each of a plurality of audio chunks, each for at least one of a plurality of analysts, where the audio chunks were each captured from a single line associated with a participant of a conference call, and together represent speech arriving on all lines of the conference call;
accept, responsive to a selected fidelity mode chosen from a plurality of fidelity modes, input from the plurality of analysts, the input indicating a transcript of each of the plurality of chunks, and collectively indicating a transcript of speech on all lines of the conference call, the plurality of fidelity modes having corresponding levels of fidelity and including all of;
a first verbatim interpreting mode wherein a first analyst provides a substantially verbatim transcription of a first audio stream;
a second text interpreting mode wherein the first analyst listens to a first chunk of audio data and provides input of text that has a substantially identical meaning to the words spoken in the first chunk of audio data; and
a third automatic transcription mode wherein the first analyst repeats the first chunk of audio as input for an automatic transcription subsystem responsive to the automatic transcription subsystem having a level of confidence below a threshold when provided with the first audio stream as input;
cause the transcript to be displayed on a web-based user interface to at least one participant in the conference call, where the display of the transcript of each chunk occurs in substantially real time relative to the capture of that chunk and includes an identity of the participant of the conference call associated with the line from which the chunk was captured;
determine that the transcript displayed on the user interface is not accurate; and
automatically, by a computer system and without user input, selecting, based on the levels of fidelity corresponding to the plurality of fidelity modes, a different fidelity mode in the plurality of fidelity modes having a corresponding level of fidelity higher than the selected fidelity mode.
Dependent claims: 12, 13, 14
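Claim 11 has each chunk captured from a single participant's line, transcribed separately, and then displayed as one conference transcript labeled with the speaker for that line. The sketch below shows one way to interleave per-line results by capture time. It assumes each chunk transcript records its line and start offset. `ChunkTranscript`, `merge_conference_transcript`, and the output format are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChunkTranscript:
    line_id: str    # hypothetical identifier of the conference line the chunk came from
    start_s: float  # capture offset of the chunk, in seconds
    text: str       # analyst- or machine-produced text for the chunk


def merge_conference_transcript(per_line: Dict[str, List[ChunkTranscript]],
                                participants: Dict[str, str]) -> List[str]:
    """Interleave per-line chunk transcripts by capture time, labeling each with its speaker."""
    # Sort each line's chunks by capture time, then do a k-way merge across lines.
    streams = [sorted(chunks, key=lambda c: c.start_s) for chunks in per_line.values()]
    merged = heapq.merge(*streams, key=lambda c: c.start_s)
    # Fall back to the raw line id when no participant name is known for a line.
    return [f"[{c.start_s:.1f}s] {participants.get(c.line_id, c.line_id)}: {c.text}"
            for c in merged]
```

`heapq.merge` consumes the per-line sequences lazily, which suits a transcript that grows chunk by chunk; for a one-off render, sorting all chunks together gives the same order.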
Specification