Systems and methods for evaluating speaker suitability for automatic speech recognition aided transcription
Abstract
The invention is a method for determining the most efficient mode of transcription in a transcription system utilizing both a human transcriptionist and automated speech recognition, and systems employing this method. The invention allows determination of speaker suitability for automated speech recognition based on voice files that have already been transcribed by a human transcriptionist, and thus does not generally require a speaker to read a transcript and does not generally require a transcriptionist to transcribe a voice file specifically for the purposes of the determination. The invention allows one of several different modes of transcription to be associated with the speaker, and provides a method for determining which of these several different modes would maximize the efficiency of the transcription system for transcribing voice files generated by the speaker.
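
The determination described above can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the patented implementation: the word-level Levenshtein distance, the normalization by reference length, the averaging of scores, the threshold of 0.25, the required count of 3, and the mode names `asr_then_edit` and `human_only` are all hypothetical choices that do not come from the source.

```python
from statistics import mean


def word_edit_distance(reference, hypothesis):
    """Levenshtein distance over whitespace-separated tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference token
                            curr[j - 1] + 1,      # add a hypothesis token
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def file_score(associated_text, recognized_text):
    """Edit distance normalized by transcript length (lower is better)."""
    n = max(len(associated_text.split()), 1)
    return word_edit_distance(associated_text, recognized_text) / n


def preferred_mode(scores, required_count=3, threshold=0.25):
    """Return a mode once enough scores exist for the speaker, else None.

    Mode names, threshold, and averaging are hypothetical assumptions.
    """
    if len(scores) < required_count:
        return None
    average = mean(scores[-required_count:])
    return "asr_then_edit" if average <= threshold else "human_only"
```

Here `associated_text` stands for a transcript a transcriptionist has already produced and `recognized_text` for the engine output on the same voice file, so no extra dictation or transcription is needed to compute the scores.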
13 Claims
1. A method for determining a speaker's suitability for automated speech recognition in a transcription system comprising the steps of:
accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
generating a recognized text file from said voice file using a speech recognition engine;
determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file, and on the results of the transcription automatically measured during transcription by a transcriptionist selected from one of (a) the time it takes the transcriptionist to edit a recognized text file into conformity with a finished document or (b) the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
setting a transcription mode for the speaker, wherein the transcription mode indicates the method of transcription thereafter used in the transcription system.
2. A method for determining a speaker's suitability for automated speech recognition in a transcription system comprising the steps of:
accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
generating a recognized text file from said voice file using a speech recognition engine;
determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file, and on characteristics of the voice file selected from the group consisting of the number of pauses in the voice file, the amount of silence in the voice file, and the audio quality of the voice file, and on the results of transcription automatically measured during transcription by a transcriptionist selected from one of (a) the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished document or (b) the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
setting a transcription mode for the speaker, wherein the transcription mode indicates the method of transcription thereafter used in the transcription system.
3. A method for determining a speaker's suitability for automated speech recognition in a transcription system comprising the steps of:
accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
generating a recognized text file from said voice file using a speech recognition engine;
determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on characteristics of the recognized text file and associated text file selected from the group consisting of the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, and the amount of formatting in the associated text file, and on results of the transcription automatically measured during transcription by a transcriptionist selected from one of (a) the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished document or (b) the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
setting a transcription mode for the speaker, wherein the transcription mode indicates the method of transcription thereafter used in the transcription system.
4. A method for determining a speaker's suitability for automated speech recognition in a transcription system comprising the steps of:
accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
generating a recognized text file from said voice file using a speech recognition engine;
determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on characteristics of the voice file selected from the group consisting of the number of pauses in the voice file, the amount of silence in the voice file, and the audio quality of the voice file, and based on characteristics of the recognized text file and associated text file selected from the group consisting of the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, and the amount of formatting in the associated text file, and on the results of transcription automatically measured during transcription by a transcriptionist selected from one of (a) the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished document or (b) the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
setting a transcription mode for the speaker, wherein the transcription mode indicates the method of transcription thereafter used in the transcription system.
5. A method for determining a speaker's suitability for automated speech recognition in a transcription system comprising the steps of:
accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
generating a recognized text file from said voice file using a speech recognition engine;
determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition, and on the results of the transcription automatically measured during transcription by a transcriptionist selected from one of (a) the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished document or (b) the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
setting a transcription mode for the speaker, wherein the transcription mode indicates the method of transcription thereafter used in the transcription system.
6. A transcription system comprising:
a server for accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
a speech recognition engine, wherein a voice file is provided to said speech recognition engine and a recognized text file is generated therefrom;
a means for determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
a means for determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
a means for setting a transcription mode for the speaker, wherein the transcription mode indicates the transcription thereafter used in the transcription system.
7. A transcription system comprising:
a server for accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
a speech recognition engine, wherein a voice file is provided to said speech recognition engine and a recognized text file is generated therefrom;
a means for determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on characteristics of the voice file selected from the group consisting of the number of pauses in the voice file, the amount of silence in the voice file, and the audio quality of the voice file, and is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
a means for determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
a means for setting a transcription mode for the speaker, wherein the transcription mode indicates the transcription thereafter used in the transcription system.
8. A transcription system comprising:
a server for accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
a speech recognition engine, wherein a voice file is provided to said speech recognition engine and a recognized text file is generated therefrom;
a means for determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on characteristics of the recognized text file and associated text file selected from the group consisting of the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, and the amount of formatting in the associated text file, and further is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
a means for determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
a means for setting a transcription mode for the speaker, wherein the transcription mode indicates the transcription thereafter used in the transcription system.
9. A transcription system comprising:
a server for accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
a speech recognition engine, wherein a voice file is provided to said speech recognition engine and a recognized text file is generated therefrom;
a means for determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on characteristics of the voice file selected from the group consisting of the number of pauses in the voice file, the amount of silence in the voice file, and the audio quality of the voice file and on characteristics of the recognized text file and associated text file selected from the group consisting of the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, and the amount of formatting in the associated text file, and further is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
a means for determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
a means for setting a transcription mode for the speaker, wherein the transcription mode indicates the transcription thereafter used in the transcription system.
10. A transcription system comprising:
a server for accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
a speech recognition engine, wherein a voice file is provided to said speech recognition engine and a recognized text file is generated therefrom;
a means for determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition, and further is based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
a means for determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
a means for setting a transcription mode for the speaker, wherein the transcription mode indicates the transcription thereafter used in the transcription system.
11. A transcription system comprising:
a server for accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
a speech recognition engine, wherein a voice file is provided to said speech recognition engine and a recognized text file is generated therefrom;
a means for determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on characteristics of the voice file selected from the group consisting of the number of pauses in the voice file, the amount of silence in the voice file, and the audio quality of the voice file and on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition and further based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
a means for determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
a means for setting a transcription mode for the speaker, wherein the transcription mode indicates the transcription thereafter used in the transcription system.
12. A transcription system comprising:
a server for accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
a speech recognition engine, wherein a voice file is provided to said speech recognition engine and a recognized text file is generated therefrom;
a means for determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on characteristics of the recognized text file and associated text file selected from the group consisting of the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, and the amount of formatting in the associated text file, and on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition, and further based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
a means for determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
a means for setting a transcription mode for the speaker, wherein the transcription mode indicates the transcription thereafter used in the transcription system.
13. A transcription system comprising:
a server for accumulating a predetermined number of the speaker's voice files and associated text files, wherein the associated text files are transcriptions of the voice files transcribed by a transcriptionist;
a speech recognition engine, wherein a voice file is provided to said speech recognition engine and a recognized text file is generated therefrom;
a means for determining a score based on the recognized text file generated from the voice file and the associated text file of the voice file, wherein the score is based on the edit distance between the associated text file and the recognized text file and on characteristics of the voice file selected from the group consisting of the number of pauses in the voice file, the amount of silence in the voice file, and the audio quality of the voice file and on characteristics of the recognized text file and associated text file selected from the group consisting of the amount of correction required to bring the recognized text file into conformity with the finished text file, the amount of out of order text in the recognized text file, and the amount of formatting in the associated text file, and on results of the transcription provided by the speech recognition engine selected from the group consisting of the confidence of the recognition engine in each of the tokens recognized and the ability of the recognition engine to generate a recognized text file using guided recognition, and further based on results of transcription automatically measured during transcription by a transcriptionist selected from the group consisting of the time it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file and the number of keystrokes it takes the transcriptionist to edit a recognized text file to bring it into conformity with a finished text file;
a means for determining a preferred mode of transcription based on a predetermined number of scores, wherein the predetermined number of scores are scores generated from recognized text files and associated text files corresponding to voice files generated by the speaker; and
a means for setting a transcription mode for the speaker, wherein the transcription mode indicates the transcription thereafter used in the transcription system.
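
The claims recite that a score is based on edit distance together with measured editing effort and, in some claims, voice-file characteristics and recognition-engine confidence, but they do not state how these inputs are combined. The sketch below shows one possible combination as a weighted penalty. Every name, weight, and scaling constant in it is a hypothetical assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FileMeasurements:
    """Per-file inputs; all field names are hypothetical."""
    normalized_edit_distance: float            # 0.0 means identical texts
    keystrokes_per_word: float                 # measured while the transcriptionist edits
    silence_fraction: Optional[float] = None   # share of the voice file that is silence
    token_confidences: Optional[Sequence[float]] = None  # engine confidence per token, 0..1


def composite_score(m, w_edit=0.5, w_keys=0.3, w_silence=0.1, w_conf=0.1):
    """Weighted penalty (lower suggests better suitability); weights are assumed."""
    score = w_edit * m.normalized_edit_distance
    # Scale keystrokes to 0..1, capping at 10 keystrokes per word (assumption).
    score += w_keys * min(m.keystrokes_per_word / 10.0, 1.0)
    if m.silence_fraction is not None:
        score += w_silence * m.silence_fraction
    if m.token_confidences:
        mean_conf = sum(m.token_confidences) / len(m.token_confidences)
        score += w_conf * (1.0 - mean_conf)    # low confidence raises the penalty
    return score
```

Optional fields are skipped when absent, which loosely mirrors how different claims draw on different subsets of these measurements.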
Specification