Quality estimation of hybrid transcription of audio
First Claim
1. A system configured to estimate quality of hybrid transcription of audio, comprising:
- a computer configured to:
receive a segment of an audio recording comprising speech of a person;
generate a transcription of the segment utilizing an automatic speech recognition (ASR) system;
receive properties of a review of the transcription, by a transcriber, which produced a reviewed transcription;
wherein the properties are indicative of at least one of the following:
an extent of corrections made by the transcriber to the transcription during the review, and a duration of the review;
generate feature values based on data comprising the properties; and
utilize a model to calculate, based on the feature values, a value indicative of an expected accuracy of the reviewed transcription.
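The "generate feature values" step of the claim can be illustrated with a minimal sketch. The claim does not say how the extent of corrections or the duration of the review is measured, so everything here is an assumption made for illustration: the `ReviewProperties` container, the word-level edit distance between the ASR output and the reviewed text, and the normalization of review time by audio length are all hypothetical choices.

```python
from dataclasses import dataclass


def word_edit_distance(before: list[str], after: list[str]) -> int:
    """Levenshtein distance computed over word tokens."""
    prev = list(range(len(after) + 1))
    for i, b in enumerate(before, start=1):
        curr = [i]
        for j, a in enumerate(after, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


@dataclass
class ReviewProperties:
    """Hypothetical container for the properties of one review."""
    asr_text: str          # transcription produced by the ASR system
    reviewed_text: str     # transcription after the transcriber's review
    review_seconds: float  # time the transcriber spent on the review
    audio_seconds: float   # length of the audio segment


def feature_values(p: ReviewProperties) -> list[float]:
    """Turn review properties into two illustrative feature values."""
    asr_words = p.asr_text.split()
    reviewed_words = p.reviewed_text.split()
    edits = word_edit_distance(asr_words, reviewed_words)
    # Extent of corrections: word edits per word of ASR output (assumed measure).
    correction_rate = edits / max(len(asr_words), 1)
    # Duration of the review, relative to the audio length (assumed measure).
    relative_duration = (
        p.review_seconds / p.audio_seconds if p.audio_seconds > 0 else 0.0
    )
    return [correction_rate, relative_duration]
```

A feature vector built this way could then be passed to whatever model calculates the value indicative of expected accuracy.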
Abstract
Hybrid transcription of audio relies on having one or more layers of transcribers who review transcriptions generated by automatic speech recognition (ASR) systems in order to correct errors that are found in the transcriptions. When it comes to determining how much human reviewing is needed, such as determining how many layers of review to use, there is a cost/benefit tradeoff to consider. Some embodiments described herein utilize a machine learning-based approach for estimating quality of hybrid transcription of audio. In one embodiment, a computer generates a transcription of a segment of audio using an ASR system, which is subsequently reviewed by a transcriber. The computer then calculates, based on properties of the review by the transcriber, a value indicative of an expected accuracy of the reviewed transcription. The computer may suggest a second transcriber review the reviewed transcription if the value indicative of the expected accuracy is below a threshold.
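The decision described in the abstract, suggesting a second review when the expected accuracy falls below a threshold, can be sketched as follows. The abstract does not name a model type, so a logistic model stands in here purely for illustration. The two input features (fraction of words corrected, and review time relative to audio length), the weights, the bias, and the 0.9 threshold are all hypothetical values, not figures from the patent; in practice the parameters would be learned from reviews whose accuracy is known.

```python
import math

# Hypothetical parameters, chosen only for illustration.
# Feature order: [correction_rate, relative_duration].
WEIGHTS = [-4.0, 0.5]
BIAS = 2.0
ACCURACY_THRESHOLD = 0.9  # assumed cutoff for requesting another review


def expected_accuracy(features: list[float]) -> float:
    """Map feature values to a score in (0, 1) with a logistic model."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def needs_second_review(features: list[float],
                        threshold: float = ACCURACY_THRESHOLD) -> bool:
    """Suggest another transcriber when the expected accuracy is below the threshold."""
    return expected_accuracy(features) < threshold


# Few corrections and a careful review: no second review suggested.
print(needs_second_review([0.05, 3.0]))
# Many corrections and a hurried review: a second review is suggested.
print(needs_second_review([0.6, 0.5]))
```

Raising the threshold trades more human review time for higher expected accuracy, which is the cost/benefit tradeoff the abstract refers to.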
20 Claims
1. A system configured to estimate quality of hybrid transcription of audio, comprising:
- a computer configured to:
receive a segment of an audio recording comprising speech of a person;
generate a transcription of the segment utilizing an automatic speech recognition (ASR) system;
receive properties of a review of the transcription, by a transcriber, which produced a reviewed transcription;
wherein the properties are indicative of at least one of the following:
an extent of corrections made by the transcriber to the transcription during the review, and a duration of the review;
generate feature values based on data comprising the properties; and
utilize a model to calculate, based on the feature values, a value indicative of an expected accuracy of the reviewed transcription.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method for estimating quality of hybrid transcription of audio, comprising:
receiving a segment of an audio recording comprising speech of a person;
generating a transcription of the segment utilizing an automatic speech recognition (ASR) system;
receiving properties of a review of the transcription, by a transcriber, which produced a reviewed transcription;
wherein the properties are indicative of at least one of the following:
an extent of corrections made by the transcriber to the transcription during the review, and a duration of the review;
generating feature values based on data comprising the properties; and
utilizing a model to calculate, based on the feature values, a value indicative of an expected accuracy of the reviewed transcription.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
20. A non-transitory computer-readable medium having instructions stored thereon that, in response to execution by a system including a processor and memory, causes the system to perform operations comprising:
receiving a segment of an audio recording comprising speech of a person;
generating a transcription of the segment utilizing an automatic speech recognition (ASR) system;
receiving properties of a review of the transcription, by a transcriber, which produced a reviewed transcription;
wherein the properties are indicative of at least one of the following:
an extent of corrections made by the transcriber to the transcription during the review, and a duration of the review;
generating feature values based on data comprising the properties; and
utilizing a model to calculate, based on the feature values, a value indicative of an expected accuracy of the reviewed transcription.
Specification