Tool and method for enhanced human machine collaboration for rapid and accurate transcriptions

US 20120016671A1
Filed: 07/15/2010
Published: 01/19/2012
Est. Priority Date: 07/15/2010
Status: Abandoned Application

Abstract

A system and methods for transcribing text from audio and video files including a set of transcription hosts and an automatic speech recognition system. ASR word-lattices are dynamically selected from either a text box or word-lattice graph wherein the most probable text sequences are presented to the transcriptionist. Secure transcriptions may be accomplished by segmenting a digital audio file into a set of audio slices for transcription by a plurality of transcriptionists. No one transcriptionist is aware of the final transcribed text, only small portions of transcribed text. Secure and high-quality transcriptions may be accomplished by segmenting a digital audio file into a set of audio slices, sending them serially to a set of transcriptionists and updating the acoustic and language models at each step to improve the word-lattice accuracy.

31 Claims

1. A transcription system for transcribing a set of audio data into transcribed text comprising:
- an audio processor configured to convert the set of audio data to segment the audio data into a first set of audio segments;
  
  the audio processor configured to store the set of audio segments in an audio repository;
  
  a set of transcription hosts connected to a network, each transcription host of the set of transcription hosts in communication with an acoustic speech recognition system, the audio processor and the audio repository, wherein each transcription host of the set of transcription hosts comprises:
  
  a processor, a display, a set of human interface devices, an audio playback controller, and a transcription controller;
  
  wherein the acoustic speech recognition system is configured to operate on the audio data to produce a first set of word lattices;
  
  wherein the audio playback controller of each transcription host is configurable to audibly playback the set of audio segments;
  
  wherein the transcription controller of each transcription host in the set of transcription hosts is configured to:
  
  retrieve a second set of audio segments from the first set of audio segments and a second set of word lattices from the first set of word lattices;
  
  associate a first word lattice from the second set of word lattices with a first audio segment from the second set of audio segments;
  
  associate a second word lattice from the second set of word lattices with a second audio segment from the second set of audio segments;
  
  display a graphical representation of the first word lattice and second word lattice; and
  
  accept an operator input via the set of human interface devices to confirm at least one word of the first word lattice as transcribed text.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 2. The transcription system of claim 1 wherein the set of transcription hosts are selected from the group of a desktop computer, a laptop computer, a personal digital assistant (PDA), a cellular telephone, a web-enabled communications device, a transcription server serving a transcription host application over the internet to a web-enabled client, and a dedicated transcription device.
  - 3. The transcription system of claim 1 wherein each transcription controller in the set of transcription hosts is further configured:
    - to display the first word lattice and the second word lattice in a textual form in a text input area; and
      
      to allow for selection of at least one word from the first word lattice and the second word lattice.
  - 4. The transcription system of claim 1 wherein the audio playback controller is connected to at least one human interface device of the set of human interface devices.
  - 5. The transcription system of claim 1 wherein the transcription host is configured so that the audio playback controller and the transcription controller are synchronized to establish an audio playback rate in response to a transcription input rate.
  - 6. The transcription system of claim 1 wherein the transcription controller, in displaying the graphical representation of the first word lattice and second word lattice, is further configured to display a set of connecting lines between words in a pre-defined number of most probable text sequences.
  - 7. The transcription system of claim 1 wherein the transcription controller, in displaying the graphic representation of the first word lattice and second word lattice, is further configured to:
    - a. establish a set of probabilities of occurrence for a predefined number of most probable text sequences contained in a word lattice; and
      
      b. display a probability indicator of a set of likely text sequences.
  - 8. The transcription system of claim 7 where the most probable text sequences are comprised of an ordered set of words;
    - and where, the probability indicator is selected from a group including a number, a graphic indicator beside each word in the ordered set of words, an object containing each word in the ordered set of words, a line connecting each word in the ordered set of words.
  - 9. The probability indicator of claim 8 wherein the graphic indicator is assigned a color based on a probability of occurrence.
  - 10. The probability indicator of claim 8 wherein the graphic indicator is assigned a shape based on a probability of occurrence.
  - 11. The transcription system of claim 1 wherein at least one transcription host in the set of transcription hosts is a master transcription controller serving a set of transcription applications over a network to the other transcription hosts in the set of transcription hosts.
  - 12. The transcription system of claim 11 wherein the master transcription controller is enabled to control distribution of audio segments and word-lattices to the other transcription hosts in the set of transcription hosts.
  - 13. The transcription system of claim 1 wherein each transcription host in the set of transcription hosts further comprises an acoustic speech recognition system.
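
The word lattice of claim 1 and the "most probable text sequences" of claims 6 and 7 can be illustrated with a minimal Python sketch. It is not taken from the specification: the `Edge` type, the `n_best` function, the integer node numbering, and the rule that a sequence's probability is the product of its edge probabilities are all assumptions made for illustration, and the lattice is assumed to be acyclic.

```python
from dataclasses import dataclass
from heapq import nlargest


@dataclass(frozen=True)
class Edge:
    """One lattice arc (hypothetical layout): a word hypothesis between two nodes."""
    start: int
    end: int
    word: str
    prob: float


def n_best(edges, source, sink, n):
    """Return the n most probable (probability, words) paths from source to sink.

    Assumes an acyclic lattice and independent edge probabilities,
    so a path probability is the product of its edge probabilities.
    """
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge.start, []).append(edge)

    paths = []
    stack = [(source, 1.0, ())]  # (current node, probability so far, words so far)
    while stack:
        node, prob, words = stack.pop()
        if node == sink:
            paths.append((prob, words))
            continue
        for edge in outgoing.get(node, []):
            stack.append((edge.end, prob * edge.prob, words + (edge.word,)))
    return nlargest(n, paths)


if __name__ == "__main__":
    lattice = [
        Edge(0, 1, "recognize", 0.6),
        Edge(0, 1, "wreck a nice", 0.4),
        Edge(1, 2, "speech", 0.7),
        Edge(1, 2, "beach", 0.3),
    ]
    for prob, words in n_best(lattice, source=0, sink=2, n=2):
        print(f"{prob:.2f}", " ".join(words))
```

Under these assumptions, a display layer could draw connecting lines only along the returned paths and derive a probability indicator (a number, color, or shape, as in claims 8 to 10) from each path's probability.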

14. A method for transcription of audio data into transcribed text by a transcription host including an audio playback controller and a transcription controller, a display and a set of human interface devices, the method including the steps of:
- providing audio controls in the audio playback controller to play the audio data at an audio playback rate;
  
  converting the audio data into a visual audio format;
  
  segmenting the audio data into a set of audio segments;
  
  operating on the audio data with an automatic speech recognition system to arrive at a set of word lattices;
  
  correlating a first word lattice in the set of word lattices to a first audio segment in the set of audio segments;
  
  correlating a second word lattice in the set of word lattices to a second audio segment in the set of audio segments;
  
  displaying a portion of converted audio data associated to the first and second audio segment in the visual audio format;
  
  displaying a graphic of the first word lattice on the display as a graphical word lattice;
  
  configuring a textual input box to show the first word lattice and to capture a textual input from a human interface device;
  
  playing the first audio segment using the audio playback controller;
  
  performing a transcription input;
  
  controlling the audio playback rate;
  
  repeating the transcription input step for the first word lattice until a text sequence is accepted as transcribed text;
  
  displaying a graphic of the second word lattice on the display as the graphical word lattice;
  
  configuring the textual input box to show the second word lattice and to capture a textual input from a human interface device;
  
  playing the second audio segment using the audio playback controller;
  
  repeating the transcription input step for the second word lattice until a text sequence is accepted as and appended to the transcribed text.
- Dependent claims (15, 16, 17):
  - 15. The method of claim 14 wherein the step of performing a transcription input comprises selecting a word or a phrase from the graphical word lattice using a human interface device connected to the transcription controller.
  - 16. The method of claim 14 wherein the step of performing a transcription input comprises typing a character and selecting a word or phrase in the textual input box.
  - 17. The method of claim 14 including the steps of:
    - analyzing an average transcription input rate from the repeated transcription input steps;
      
      controlling the audio playback rate automatically based on the average transcription input rate.
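
Claim 17 ties the audio playback rate to the average transcription input rate without stating a formula. One possible reading, sketched below in Python, scales playback so that speech arrives roughly as fast as the operator enters it. The function names, the words-per-second units, the `speech_rate` parameter, and the clamping bounds `lo` and `hi` are all assumptions, not details from the claims.

```python
def average_input_rate(timestamps):
    """Average transcription inputs per second, given input times in seconds."""
    if len(timestamps) < 2:
        return 0.0
    span = timestamps[-1] - timestamps[0]
    return (len(timestamps) - 1) / span if span > 0 else 0.0


def playback_rate(input_rate, speech_rate, lo=0.5, hi=2.0):
    """Playback speed multiplier, clamped to [lo, hi] (assumed bounds).

    input_rate:  operator inputs per second (e.g. accepted words per second)
    speech_rate: spoken words per second in the recording at normal speed
    """
    if speech_rate <= 0:
        raise ValueError("speech_rate must be positive")
    return max(lo, min(hi, input_rate / speech_rate))


if __name__ == "__main__":
    accepted_at = [0.0, 1.0, 2.0, 3.0, 4.0]  # times at which words were accepted
    rate = average_input_rate(accepted_at)
    print(playback_rate(rate, speech_rate=2.0))
```

The clamp keeps playback intelligible when the operator pauses or bursts ahead; a real controller would likely also smooth the rate over a sliding window.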

18. A method for performing transcriptions of audio data into transcribed text utilizing a transcription host device having a display, and wherein the audio data is segmented into a set of audio slices, the method including the steps of:
- a. determining a universe of ASR word-lattices for the audio data;
  
  b. associating an available ASR word-lattice in the universe of ASR word-lattices with an audio slice in the set of audio slices;
  
  c. playing an audio slice from the set of audio slices;
  
  d. upon a textual input of at least one character, identifying a set of viable text sequences from the available ASR word-lattice;
  
  e. displaying the set of viable text sequences as an N-best list;
  
  f. displaying the available ASR word lattice as a graph;
  
  g. waiting for at least one of the group of a word selection from the N-best list, a text sequence selection within the graph, and a typed character;
  
  h. if a typed character occurs, repeating the preceding steps beginning with the step of identifying a set of viable text sequences;
  
  i. if a word selection occurs or a text sequence selection occurs, narrowing the set of viable text sequences based on the word or text sequence selection;
  
  j. if the audio slice has not been fully transcribed then repeating steps g-h; and
  
  k. if the audio slice is fully transcribed, obtaining a next audio slice in the set of audio slices and repeating steps b-j with the next audio slice.
- Dependent claims (19, 20):
  - 19. The method of claim 18 including the steps of:
    - establishing a set of probabilities of occurrence for a predefined number of most probable text sequences contained in the available ASR word lattice; and
      
      displaying a probability indicator of the most probable text sequences.
  - 20. The method of claim 18 wherein the step of displaying a probability indicator includes the step of:
    - identifying a text sequence path with a number.
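
Steps d and e of claim 18 (identifying viable text sequences after a typed character and displaying them as an N-best list) can be sketched as a prefix filter. In this hypothetical Python example the candidate sequences are assumed to have already been extracted from the ASR word-lattice as `(text, probability)` pairs; the function name, the case-sensitive prefix match, and the default list length are illustrative assumptions.

```python
def viable_sequences(candidates, typed, n=3):
    """Return up to n candidate texts that start with the typed characters,
    most probable first.

    candidates: iterable of (text, probability) pairs (assumed to come
                from the available ASR word-lattice)
    typed:      characters entered so far in the textual input box
    """
    matches = [(prob, text) for text, prob in candidates if text.startswith(typed)]
    matches.sort(key=lambda m: -m[0])  # stable sort keeps input order on ties
    return [text for _, text in matches[:n]]


if __name__ == "__main__":
    candidates = [
        ("recognize speech", 0.42),
        ("wreck a nice speech", 0.28),
        ("recognize beach", 0.18),
        ("wreck a nice beach", 0.12),
    ]
    print(viable_sequences(candidates, "r"))
```

Each further typed character (step h) would simply call the function again with the longer prefix, which narrows the list.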

21. A method for secure transcription of a digital audio file into a transcribed text document comprising the steps of:
- providing a first transcription host to a first transcriptionist, wherein the first transcription host is equipped with a first automatic speech recognition system;
  
  providing a second transcription host to a second transcriptionist, wherein the second transcription host is equipped with a second automatic speech recognition system;
  
  providing a master transcription controller in communication with the first and second transcription hosts;
  
  segmenting the digital audio file into a first set of audio slices and a second set of audio slices;
  
  sending the first set of audio slices from the master transcription controller to the first transcriptionist;
  
  sending the second set of audio slices from the master transcription controller to the second transcriptionist;
  
  the first transcriptionist transcribing the first set of audio slices using the first transcription host into a first transcribed text;
  
  the second transcriptionist transcribing the second set of audio slices using the second transcription host into a second transcribed text;
  
  the first and second transcriptionists sending the first and second transcribed texts to the master transcription controller; and
  
  the master transcription controller combining the first transcribed text and the second transcribed text into a final transcribed text as the digital audio file.
- Dependent claims (22, 23, 24, 25, 26):
  - 22. The method of claim 21 wherein the step of segmenting the digital audio file further comprises the steps of:
    - segmenting the digital audio file according to a series of time intervals wherein each time interval is subsequent to the previous time interval;
      
      assigning the first time interval in the series of time intervals as a current time interval;
      
      creating a first audio slice recorded during the current time interval;
      
      creating a second audio slice recorded during the next time interval immediately subsequent to the first time interval;
      
      including the first audio slice in the first set of audio slices;
      
      including the second audio slice in the second set of audio slices; and
      
      repeating the preceding steps starting with the step of creating a first audio slice, for the entire series of time intervals.
  - 23. The method of claim 22 wherein the step of segmenting the digital audio file further comprises the steps of:
    - segmenting the digital audio file according to a series of time intervals wherein each time interval partially overlaps with the previous time interval;
      
      assigning the first time interval in the series of time intervals as a current time interval;
      
      creating a first audio slice recorded during a current time interval;
      
      creating a second audio slice recorded during the next time interval in the series of time intervals following, but overlapping with the current time interval;
      
      including the first audio slice in the first set of audio slices;
      
      including the second audio slice in the second set of audio slices; and
      
      repeating the preceding steps starting with the step of creating a first audio slice, for the entire series of time intervals.
  - 24. The method of claim 23 wherein the step of segmenting the digital audio file further comprises the steps of:
    - segmenting the digital audio file according to a series of time intervals wherein each time interval is subsequent to the previous time interval;
      
      assigning the first time interval in the series of time intervals as a current time interval;
      
      creating a current audio slice recorded during the current time interval;
      
      including the current audio slice in the first set of audio slices;
      
      including the current audio slice in the second set of audio slices; and
      
      repeating the preceding steps starting with the step of creating a first audio slice, for the entire series of time intervals.
  - 25. The method of claim 24 including the further step of the master controller comparing the first transcribed text to the second transcribed text to assess the quality of at least one of the group of the first transcribed text, the second transcribed text, and the final transcribed text.
  - 26. The method of claim 24 including the further steps of:
    - associating an accurate text to the digital audio file; and
      
      comparing the first transcribed text and the second transcribed text to the accurate text to assess the quality of transcription by at least one of the first transcriptionist and the second transcriptionist.
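
The interval-based segmentation of claims 22 and 23 can be sketched as follows: cut the recording into consecutive (optionally overlapping) time intervals, then assign them alternately to the two sets of audio slices. This Python sketch works on time intervals only and does not touch audio samples; the function names and the fixed interval length are assumptions for illustration.

```python
def slice_intervals(duration, length, overlap=0.0):
    """Split [0, duration] into (start, end) intervals of the given length.

    overlap=0 gives back-to-back intervals (as in claim 22);
    overlap>0 makes each interval partially overlap the previous one (claim 23).
    """
    if duration <= 0 or length <= 0 or not 0 <= overlap < length:
        raise ValueError("need duration > 0, length > 0 and 0 <= overlap < length")
    step = length - overlap
    intervals = []
    start = 0.0
    while True:
        end = min(start + length, duration)
        intervals.append((start, end))
        if end >= duration:
            break
        start += step
    return intervals


def alternate(intervals):
    """Assign intervals alternately to a first and a second set of audio slices."""
    return intervals[0::2], intervals[1::2]


if __name__ == "__main__":
    first, second = alternate(slice_intervals(10.0, 4.0, overlap=1.0))
    print(first)
    print(second)
```

With this assignment neither transcriptionist receives two consecutive intervals, so neither hears a continuous stretch of the recording.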

27. A method for secure and accurate transcription of a digital audio file into a transcribed text document comprising the steps of:
- providing a set of transcription hosts to a set of transcriptionists comprising at least three transcriptionists, wherein each transcription host in the set of transcription hosts is equipped with an automatic speech recognition system;
  
  providing a master transcription controller in communication with the set of transcription hosts;
  
  segmenting the digital audio file into at least three sets of audio slices;
  
  distributing each set of audio slices from the master transcription controller to each transcriptionist in the set of transcriptionists;
  
  the set of transcriptionists transcribing the at least three sets of audio slices into at least three transcribed texts;
  
  the set of transcriptionists sending the at least three transcribed texts to the master transcription controller; and
  
  the master transcription controller combining the at least three transcribed texts into a final transcribed text for the digital audio file.
- Dependent claims (28, 29, 30, 31):
  - 28. The method of claim 27 wherein the step of segmenting the digital audio file includes the additional step of ensuring that audio slices comprising each set of audio slices are not associated to consecutive recorded time intervals in the digital audio file.
  - 29. The method of claim 27 wherein the step of segmenting the digital audio file includes the additional step of constructing each set of audio slices from audio slices associated to random recorded time intervals in the digital audio file.
  - 30. The method of claim 27 including the additional step of assessing the accuracy of the transcribed text by counting the number of matching words in the at least three transcribed texts.
  - 31. The method of claim 27 including the additional step of assessing the accuracy of the transcribed text further comprising the steps of:
    - computing a correlation coefficient for each word in the at least three transcribed texts;
      
      assigning a weight to each word in the at least three transcribed texts;
      
      deriving a set of scores containing one score for each word in the at least three transcribed texts, by multiplying the weight by the correlation coefficient; and
      
      selecting a set of words for inclusion in the final transcribed text based on the set of scores.
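
Claims 30 and 31 describe assessing accuracy by counting matching words and by scoring each word as a weight multiplied by a correlation coefficient. The claims do not define the coefficient or the weights, so the Python sketch below makes strong assumptions: the transcribed texts are already aligned word by word and have equal length, the "correlation coefficient" of a word is taken to be the fraction of transcripts that agree with it at that position, and each weight is a per-transcriptionist trust value. All names are hypothetical.

```python
def matching_word_count(transcripts):
    """Count word positions at which every transcript has the same word."""
    return sum(1 for column in zip(*transcripts) if len(set(column)) == 1)


def select_words(transcripts, weights):
    """Pick one word per position using score = weight * agreement.

    transcripts: list of equal-length word lists (assumed pre-aligned)
    weights:     one weight per transcript (assumed trust values)
    Ties are broken by comparing the words themselves.
    """
    if len({len(t) for t in transcripts}) != 1:
        raise ValueError("transcripts must be aligned to equal length")
    if len(weights) != len(transcripts):
        raise ValueError("need exactly one weight per transcript")
    final = []
    for column in zip(*transcripts):
        scores = []
        for word, weight in zip(column, weights):
            agreement = column.count(word) / len(column)  # assumed coefficient
            scores.append((weight * agreement, word))
        final.append(max(scores)[1])
    return final


if __name__ == "__main__":
    texts = [
        ["the", "cat", "sat"],
        ["the", "cat", "sad"],
        ["the", "hat", "sat"],
    ]
    print(matching_word_count(texts))
    print(select_words(texts, [1.0, 1.0, 1.0]))
```

With equal weights this reduces to a per-position majority vote; unequal weights let a more trusted transcriptionist override the majority.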

Current Assignee
Speetra, Inc. (EverCommerce, Inc.)
Original Assignee
Speetra, Inc. (EverCommerce, Inc.)
Inventors
Jaggi, Pawan; Sangwan, Abhijeet

Application Number

US12/804,159
Publication Number

US 20120016671A1
Field of Search
US Class Current

704/235
CPC Class Codes

G10L 15/083   Recognition networks G10L15...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 2015/221   Announcement of recognition...

G10L 21/10   Transforming into visible i...
