Tool and method for enhanced human machine collaboration for rapid and accurate transcriptions
Abstract
A system and methods for transcribing text from audio and video files, including a set of transcription hosts and an automatic speech recognition (ASR) system. ASR word-lattices are dynamically selected from either a text box or a word-lattice graph, wherein the most probable text sequences are presented to the transcriptionist. Secure transcriptions may be accomplished by segmenting a digital audio file into a set of audio slices for transcription by a plurality of transcriptionists. No one transcriptionist is aware of the final transcribed text, only small portions of it. Secure and high-quality transcriptions may be accomplished by segmenting a digital audio file into a set of audio slices, sending them serially to a set of transcriptionists, and updating the acoustic and language models at each step to improve the word-lattice accuracy.
31 Claims
1. A transcription system for transcribing a set of audio data into transcribed text comprising:
an audio processor configured to convert the set of audio data to segment the audio data into a first set of audio segments;
the audio processor configured to store the set of audio segments in an audio repository;
a set of transcription hosts connected to a network, each transcription host of the set of transcription hosts in communication with an acoustic speech recognition system, the audio processor and the audio repository, wherein each transcription host of the set of transcription hosts comprises;
a processor, a display, a set of human interface devices, an audio playback controller, and a transcription controller;
wherein the acoustic speech recognition system is configured to operate on the audio data to produce a first set of word lattices;
wherein the audio playback controller of each transcription host is configurable to audibly playback the set of audio segments;
wherein the transcription controller of each transcription host in the set of transcription hosts is configured to;
retrieve a second set of audio segments from the first set of audio segments and a second set of word lattices from the first set of word lattices;
associate a first word lattice from the second set of word lattices with a first audio segment from the second set of audio segments;
associate a second word lattice from the second set of word lattices with a second audio segment from the second set of audio segments;
display a graphical representation of the first word lattice and second word lattice; and
accept an operator input via the set of human interface devices to confirm at least one word of the first word lattice as transcribed text.
Dependent claims: 2-13.
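The segment-to-lattice association recited in claim 1 can be illustrated with a small data-structure sketch. This is only an illustration under stated assumptions: the claim does not say how audio is segmented or how a lattice is stored, so the fixed-length segmentation, the edge-tuple lattice layout, and the names `WordLattice`, `AudioSegment`, `segment_audio`, `associate` and `confirm_word` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WordLattice:
    # Assumed layout: each edge is (start_node, end_node, word, probability).
    edges: List[Tuple[int, int, str, float]]

    def words(self) -> set:
        return {word for _, _, word, _ in self.edges}


@dataclass
class AudioSegment:
    index: int
    samples: list
    lattice: Optional[WordLattice] = None


def segment_audio(samples: list, segment_len: int) -> List[AudioSegment]:
    """Cut the samples into fixed-length segments (an assumed strategy)."""
    return [
        AudioSegment(i, samples[start:start + segment_len])
        for i, start in enumerate(range(0, len(samples), segment_len))
    ]


def associate(segments: List[AudioSegment], lattices: List[WordLattice]) -> None:
    """Pair the k-th lattice with the k-th segment."""
    if len(segments) != len(lattices):
        raise ValueError("need exactly one lattice per segment")
    for segment, lattice in zip(segments, lattices):
        segment.lattice = lattice


def confirm_word(segment: AudioSegment, word: str, transcript: List[str]) -> None:
    """Accept an operator-confirmed word only if the segment's lattice offers it."""
    if segment.lattice is None or word not in segment.lattice.words():
        raise ValueError(f"{word!r} is not in the lattice for segment {segment.index}")
    transcript.append(word)


segments = segment_audio(list(range(10)), segment_len=4)
lattices = [WordLattice([(0, 1, "hello", 0.9), (0, 1, "yellow", 0.1)]) for _ in segments]
associate(segments, lattices)
transcript: List[str] = []
confirm_word(segments[0], "hello", transcript)
```

A real system would carry timestamps on both segments and lattice edges; the index-based pairing here is the simplest stand-in for that correlation.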
14. A method for transcription of audio data into transcribed text by a transcription host including an audio playback controller and a transcription controller, a display and a set of human interface devices, the method including the steps of:
providing audio controls in the audio playback controller to play the audio data at an audio playback rate;
converting the audio data into a visual audio format;
segmenting the audio data into a set of audio segments;
operating on the audio data with an automatic speech recognition system to arrive at a set of word lattices;
correlating a first word lattice in the set of word lattices to a first audio segment in the set of audio segments;
correlating a second word lattice in the set of word lattices to a second audio segment in the set of audio segments;
displaying a portion of converted audio data associated to the first and second audio segment in the visual audio format;
displaying a graphic of the first word lattice on the display as a graphical word lattice;
configuring a textual input box to show the first word lattice and to capture a textual input from a human interface device;
playing the first audio segment using the audio playback controller;
performing a transcription input;
controlling the audio playback rate;
repeating the transcription input step for the first word lattice until a text sequence is accepted as transcribed text;
displaying a graphic of the second word lattice on the display as the graphical word lattice;
configuring the textual input box to show the second word lattice and to capture a textual input from a human interface device;
playing the second audio segment using the audio playback controller;
repeating the transcription input step for the second word lattice until a text sequence is accepted as and appended to the transcribed text.
Dependent claims: 15-17.
18. A method for performing transcriptions of audio data into transcribed text utilizing a transcription host device having a display, and wherein the audio data is segmented into a set of audio slices, the method including the steps of:
a. determining a universe of ASR word-lattices for the audio data;
b. associating an available ASR word-lattice in the universe of ASR word-lattices with an audio slice in the set of audio slices;
c. playing an audio slice from the set of audio slices;
d. upon a textual input of at least one character, identifying a set of viable text sequences from the available ASR word-lattice;
e. displaying the set of viable text sequences as an N-best list;
f. displaying the available ASR word lattice as a graph;
g. waiting for at least one of the group of a word selection from the N-best list, a text sequence selection within the graph, and a typed character;
h. if a typed character occurs, repeating the preceding steps beginning with the step of identifying a set of viable text sequences;
i. if a word selection occurs or a text sequence selection occurs, narrow the set of viable text sequences based on the word or text sequence selection;
j. if the audio slice has not been fully transcribed then repeating steps g-h; and
k. if the audio slice is fully transcribed, obtaining a next audio slice in the set of audio slices and repeating steps b-j with the next audio slice.
Dependent claims: 19, 20.
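Steps d and e of claim 18 (narrowing the viable text sequences as characters are typed, then ranking them as an N-best list) can be sketched as follows. This is a minimal illustration, not the patented implementation: the lattice is assumed to be a small acyclic graph stored as a dictionary, path scores are assumed to be products of edge probabilities, and the function names `all_sequences` and `n_best` are hypothetical. Exhaustive path enumeration is used for clarity and would not scale to large lattices.

```python
import heapq

# Assumed layout: node -> list of (next_node, word, probability).
lattice = {
    0: [(1, "recognize", 0.6), (2, "wreck", 0.4)],
    1: [(4, "speech", 0.7), (4, "beach", 0.3)],
    2: [(3, "a", 1.0)],
    3: [(5, "nice", 1.0)],
    5: [(4, "beach", 0.8), (4, "speech", 0.2)],
}


def all_sequences(lattice, start, end):
    """Yield (text, probability) for every path from start to end."""
    stack = [(start, [], 1.0)]
    while stack:
        node, words, prob = stack.pop()
        if node == end:
            yield " ".join(words), prob
            continue
        for next_node, word, edge_prob in lattice.get(node, []):
            stack.append((next_node, words + [word], prob * edge_prob))


def n_best(lattice, start, end, typed_prefix, n=3):
    """Keep sequences consistent with the typed text, most probable first."""
    viable = [
        (prob, text)
        for text, prob in all_sequences(lattice, start, end)
        if text.startswith(typed_prefix)
    ]
    return [text for _, text in heapq.nlargest(n, viable)]


print(n_best(lattice, 0, 4, ""))   # ['recognize speech', 'wreck a nice beach', 'recognize beach']
print(n_best(lattice, 0, 4, "w"))  # ['wreck a nice beach', 'wreck a nice speech']
```

Each additional typed character simply re-runs `n_best` with a longer prefix, which mirrors the repeat in step h; a word or sequence selection (step i) could be handled by the same filter with the selected text appended to the prefix.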
21. A method for secure transcription of a digital audio file into a transcribed text document comprising the steps of:
providing a first transcription host to a first transcriptionist, wherein the first transcription host is equipped with a first automatic speech recognition system;
providing a second transcription host to a second transcriptionist, wherein the second transcription host is equipped with a second automatic speech recognition system;
providing a master transcription controller in communication with the first and second transcription hosts;
segmenting the digital audio file into a first set of audio slices and a second set of audio slices;
sending the first set of audio slices from the master transcription controller to the first transcriptionist;
sending the second set of audio slices from the master transcription controller to the second transcriptionist;
the first transcriptionist transcribing the first set of audio slices using the first transcription host into a first transcribed text;
the second transcriptionist transcribing the second set of audio slices using the second transcription host into a second transcribed text;
the first and second transcriptionist sending the first and second transcribed texts to the master transcription controller; and
the master transcription controller combining the first transcribed text and the second transcribed text into a final transcribed text as the digital audio file.
Dependent claims: 22-26.
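The split-and-recombine flow of claim 21 can be sketched from the master transcription controller's point of view. The claim does not specify how slices are divided between the two sets, so the interleaved (round-robin) assignment below is an assumption, as are the function names `split_slices` and `combine` and the stand-in `fake_transcribe` lookup that replaces the human transcription step.

```python
def split_slices(slices, n_hosts):
    """Assign slice i to host i % n_hosts, keeping its index for reassembly."""
    assignments = [[] for _ in range(n_hosts)]
    for index, audio_slice in enumerate(slices):
        assignments[index % n_hosts].append((index, audio_slice))
    return assignments


def combine(transcribed_sets):
    """Merge the per-host (index, text) results back into original slice order."""
    indexed = [item for host_result in transcribed_sets for item in host_result]
    indexed.sort(key=lambda item: item[0])
    return " ".join(text for _, text in indexed)


# Hypothetical stand-in for a transcriptionist working on a transcription host.
fake_transcribe = {
    "slice0": "the quick",
    "slice1": "brown fox",
    "slice2": "jumps over",
    "slice3": "the lazy dog",
}

slices = ["slice0", "slice1", "slice2", "slice3"]
first_set, second_set = split_slices(slices, n_hosts=2)

first_text = [(i, fake_transcribe[s]) for i, s in first_set]
second_text = [(i, fake_transcribe[s]) for i, s in second_set]

final_text = combine([first_text, second_text])
print(final_text)
```

With interleaving, each host sees only alternating fragments, which is one way to keep any single transcriptionist from reading a contiguous passage. Passing `n_hosts=3` or more gives the arrangement for a larger set of transcriptionists.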
27. A method for secure and accurate transcription of a digital audio file into a transcribed text document comprising the steps of:
providing a set of transcription hosts to a set of transcriptionists comprising at least three transcriptionists, wherein each transcription host in the set of transcription hosts is equipped with an automatic speech recognition system;
providing a master transcription controller in communication with the set of transcription hosts;
segmenting the digital audio file into at least three sets of audio slices,
distributing each set of audio slices from the master transcription controller to each transcriptionist in the set of transcriptionists;
the set of transcriptionist transcribing the at least three sets of audio slices into at least three transcribed texts;
the set of transcriptionists sending the at least three transcribed texts to the master transcription controller; and
the master transcription controller combining the at least three transcribed texts into a final transcribed text for the digital audio file.
Dependent claims: 28-31.
Specification