Systems and methods for evaluating speaker suitability for automatic speech recognition aided transcription

US 20040049385A1
Filed: 04/28/2003
Published: 03/11/2004
Est. Priority Date: 05/01/2002
Status: Active Grant



Abstract

The invention is a method for determining the most efficient mode of transcription in a transcription system utilizing both a human transcriptionist and automated speech recognition, and systems employing this method. The invention allows determination of speaker suitability for automated speech recognition based on voice files that have already been transcribed by a human transcriptionist, and thus does not generally require a speaker to read a transcript and does not generally require a transcriptionist to transcribe a voice file specifically for the purposes of the determination. The invention allows one of several different modes of transcription to be associated with the speaker, and provides a method for determining which of these several different modes would maximize the efficiency of the transcription system for transcribing voice files generated by the speaker.
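
As a rough illustration of the decision the abstract describes, the sketch below maps a speaker's accumulated scores to one of the three transcription modes named in the claims. The function name, the required count of 20, the averaging rule, and both thresholds are assumptions made for this example; the source states only that the mode is determined from a predetermined number of scores.

```python
from enum import Enum
from statistics import mean


class TranscriptionMode(Enum):
    # The three modes named in the claims.
    EDIT_RECOGNIZED_TEXT = "edit recognized text"
    TRANSCRIBE_FILTERED_AUDIO = "transcribe audio with non-speech noises removed"
    TRANSCRIBE_AUDIO = "transcribe audio"


def preferred_mode(scores, required_count=20, edit_threshold=0.85, filter_threshold=0.60):
    """Pick a mode from per-file scores in [0, 1], higher meaning better recognition.

    required_count and both thresholds are hypothetical values, not from the source.
    Returns None until enough scores have been accumulated.
    """
    if len(scores) < required_count:
        return None
    # Assumption: the decision uses the plain average of the most recent scores.
    average = mean(scores[-required_count:])
    if average >= edit_threshold:
        return TranscriptionMode.EDIT_RECOGNIZED_TEXT
    if average >= filter_threshold:
        return TranscriptionMode.TRANSCRIBE_FILTERED_AUDIO
    return TranscriptionMode.TRANSCRIBE_AUDIO
```

Returning `None` while fewer than `required_count` scores exist keeps the speaker on whatever mode is already in use until the evaluation can be made.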

74 Claims

1. A method for determining a speaker's suitability for automated speech recognition in a transcription system comprising the steps of:
- accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
  
  providing a voice file to a speech recognition engine to generate a recognized text file;
  
  determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file;
  
  determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
  
  setting a transcription mode for the speaker, wherein the transcription mode indicates the method of transcription thereafter used in the transcription system.
  - 2. The method of claim 1, wherein the score is based on an edit distance between the associated text file and the recognized text file.
  - 3. The method of claim 2, wherein the score is based on characteristics of the voice file selected from the group consisting of the number of pauses in the voice file, the amount of silence in the voice file, and the audio quality of the voice file.
  - 4. The method of claim 2, wherein the score is based on characteristics of the recognized text file and associated text file selected from the group consisting of the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, and the amount of formatting in the associated text file.
  - 5. The method of claim 3, wherein the score is based on characteristics of the recognized text file and associated text file selected from the group consisting of the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, and the amount of formatting in the associated text file.
  - 6. The method of claim 2, wherein the score is based on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition.
  - 7. The method of claim 3, wherein the score is based on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition.
  - 8. The method of claim 4, wherein the score is based on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition.
  - 9. The method of claim 5, wherein the score is based on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition.
  - 10. The method of claim 2, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 11. The method of claim 3, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 12. The method of claim 4, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 13. The method of claim 5, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 14. The method of claim 6, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 15. The method of claim 7, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 16. The method of claim 8, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 17. The method of claim 9, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 18. The method of claim 1, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 19. The method of claim 2, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 20. The method of claim 3, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 21. The method of claim 4, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 22. The method of claim 6, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 23. The method of claim 10, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 24. The method of claim 1, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 25. The method of claim 2, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 26. The method of claim 3, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 27. The method of claim 4, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 28. The method of claim 6, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 29. The method of claim 10, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 30. The method of claim 1, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
  - 31. The method of claim 2, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
  - 32. The method of claim 3, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
  - 33. The method of claim 4, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
  - 34. The method of claim 6, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
  - 35. The method of claim 10, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
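
Claim 2 bases the score on an edit distance between the associated text file and the recognized text file. A minimal sketch of one way to compute such a score follows, assuming a word-level Levenshtein distance normalized by the length of the transcriptionist's text. The tokenization by whitespace, the normalization, and the function names are assumptions; the claims do not specify them.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def suitability_score(associated_text, recognized_text):
    """Score in [0, 1]; 1.0 means the recognized text matches the transcript word for word.

    Assumption: words are split on whitespace and the distance is divided by the
    number of words in the associated (human-transcribed) text.
    """
    reference = associated_text.split()
    hypothesis = recognized_text.split()
    if not reference:
        return 0.0 if hypothesis else 1.0
    error_rate = edit_distance(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - error_rate)
```

For example, `suitability_score("the patient has a fever", "the patient has fever")` counts one deleted word out of five reference words.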

36. A transcription system comprising:
- a server for accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
  
  a speech recognition engine, wherein a voice file is provided to said speech recognition engine and a recognized text file is generated therefrom;
  
  a means for determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file;
  
  a means for determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
  
  a means for setting a transcription mode for the speaker, wherein the transcription mode indicates the method of transcription thereafter used in the transcription system.
  - 37. The system of claim 36, wherein the score is based on an edit distance between the associated text file and the recognized text file.
  - 38. The system of claim 37, wherein the score is based on characteristics of the voice file selected from the group consisting of the number of pauses in the voice file, the amount of silence in the voice file, and the audio quality of the voice file.
  - 39. The system of claim 37, wherein the score is based on characteristics of the recognized text file and associated text file selected from the group consisting of the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, and the amount of formatting in the associated text file.
  - 40. The system of claim 38, wherein the score is based on characteristics of the recognized text file and associated text file selected from the group consisting of the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, and the amount of formatting in the associated text file.
  - 41. The system of claim 37, wherein the score is based on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition.
  - 42. The system of claim 38, wherein the score is based on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition.
  - 43. The system of claim 39, wherein the score is based on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition.
  - 44. The system of claim 40, wherein the score is based on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition.
  - 45. The system of claim 37, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 46. The system of claim 38, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 47. The system of claim 39, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 48. The system of claim 40, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 49. The system of claim 41, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 50. The system of claim 42, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 51. The system of claim 43, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 52. The system of claim 44, wherein the score is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file.
  - 53. The system of claim 36, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 54. The system of claim 37, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 55. The system of claim 38, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 56. The system of claim 39, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 57. The system of claim 41, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 58. The system of claim 45, wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing.
  - 59. The system of claim 36, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 60. The system of claim 37, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 61. The system of claim 38, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 62. The system of claim 39, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 63. The system of claim 41, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 64. The system of claim 45, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition.
  - 65. The system of claim 36, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
  - 66. The system of claim 37, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
  - 67. The system of claim 38, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
  - 68. The system of claim 39, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
  - 69. The system of claim 40, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.
  - 70. The system of claim 45, wherein the method of transcription is presenting a transcriptionist with a voice file for transcribing.

71. A method for determining a speaker's suitability for automated speech recognition in a transcription system comprising the steps of:
- accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
  
  providing a voice file to a speech recognition engine to generate a recognized text file;
  
  determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on an edit distance between the associated text file and the recognized text file, the number of pauses in the voice file, the amount of silence in the voice file, the audio quality of the voice file, the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, the amount of formatting in the associated text file, the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition, the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file, the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file, or any combination thereof;
  
  determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
  
  setting a transcription mode for the speaker, wherein the transcription mode indicates the method of transcription thereafter used in the transcription system, and wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing, presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition, or presenting the transcriptionist with a voice file for transcribing.
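
Claim 71 allows the score to draw on several measurements "or any combination thereof". One possible combination is a weighted average, sketched below for four of the listed measurements. The field names, the weights, the keystroke cap, and the direction in which each measurement affects the score are all assumptions for illustration; the claim does not say how the measurements are combined.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    word_error_rate: float      # edit distance divided by transcript length
    silence_fraction: float     # share of the voice file that is silence, 0..1
    mean_confidence: float      # average recognizer confidence per token, 0..1
    keystrokes_per_word: float  # editing effort measured during transcription


# Hypothetical weights; a real system would have to tune these.
WEIGHTS = {"accuracy": 0.5, "speech_density": 0.1, "confidence": 0.2, "low_effort": 0.2}


def combined_score(m, weights=WEIGHTS, max_keystrokes_per_word=10.0):
    """Weighted average of components, each scaled so that 1.0 is best."""
    effort = min(m.keystrokes_per_word / max_keystrokes_per_word, 1.0)
    components = {
        "accuracy": 1.0 - min(m.word_error_rate, 1.0),
        # Assumption: more silence makes a file less suitable.
        "speech_density": 1.0 - m.silence_fraction,
        "confidence": m.mean_confidence,
        "low_effort": 1.0 - effort,
    }
    total_weight = sum(weights.values())
    return sum(weights[name] * value for name, value in components.items()) / total_weight
```

Dividing by the total weight keeps the result in the range 0 to 1 even if the weights are changed and no longer sum to one.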

72. A transcription system comprising:
- a voice server for accumulating a predetermined number of the speaker's voice files, a text server for accumulating a predetermined number of associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
  
  a speech recognition engine, wherein a voice file is provided to said speech recognition engine and a recognized text file is generated therefrom;
  
  a means for determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on an edit distance between the associated text file and the recognized text file, the number of pauses in the voice file, the amount of silence in the voice file, the audio quality of the voice file, the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, the amount of formatting in the associated text file, the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition, the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file, the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file, or any combination thereof;
  
  a means for determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
  
  a means for setting a transcription mode for the speaker, wherein the transcription mode indicates the method of transcription thereafter used in the transcription system, and wherein the method of transcription is presenting a transcriptionist with a recognized text file for editing, presenting a transcriptionist with a voice file for transcribing, wherein non-speech noises have been removed from the voice file using automated speech recognition, or presenting the transcriptionist with a voice file for transcribing.
  - 73. The transcription system of claim 72, further comprising a user database, wherein the transcription mode for the speaker is stored in the user database.
  - 74. The transcription system of claim 73, further comprising a speech recognition server, wherein the speech recognition server receives voice files from the voice server, sends voice files to the speech recognition engine, and sends recognized text files to the voice server or the text server.
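
Claim 73 stores the speaker's transcription mode in a user database. A minimal sketch of such a store is shown below using Python's built-in `sqlite3` module. The class name, table layout, mode strings, and the default of plain transcription for unevaluated speakers are assumptions for this example, not details from the source.

```python
import sqlite3


class UserDatabase:
    """Hypothetical per-speaker store for the transcription mode."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS speakers ("
            "speaker_id TEXT PRIMARY KEY, mode TEXT NOT NULL)"
        )

    def set_mode(self, speaker_id, mode):
        # The connection context manager commits on success and rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO speakers (speaker_id, mode) VALUES (?, ?)",
                (speaker_id, mode),
            )

    def get_mode(self, speaker_id, default="transcribe_audio"):
        # Assumption: speakers not yet evaluated fall back to plain transcription.
        row = self._conn.execute(
            "SELECT mode FROM speakers WHERE speaker_id = ?", (speaker_id,)
        ).fetchone()
        return row[0] if row else default
```

A speech recognition server like the one in claim 74 could consult `get_mode` before deciding whether a new voice file needs to be sent to the recognition engine at all.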

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Dictaphone Corporation (Microsoft Corporation)
Inventors: MacGinitie, Andrew; Lovance, Elizabeth M.
Granted Patent: US 7,292,975 B2
US Class Current: 704/235
CPC Class Codes:
G10L 15/063   Training
G10L 15/183   using context dependencies,...
G10L 15/26   Speech to text systems G10L...
