Quality estimation of hybrid transcription of audio

US 10,614,809 B1
Filed: 10/07/2019
Issued: 04/07/2020
Est. Priority Date: 09/06/2019
Status: Active Grant

Abstract

Hybrid transcription of audio relies on having one or more layers of transcribers who review transcriptions generated by automatic speech recognition (ASR) systems in order to correct errors that are found in the transcriptions. When it comes to determining how much human reviewing is needed, such as determining how many layers of review to use, there is a cost/benefit tradeoff to consider. Some embodiments described herein utilize a machine learning-based approach for estimating quality of hybrid transcription of audio. In one embodiment, a computer generates a transcription of a segment of audio using an ASR system, which is subsequently reviewed by a transcriber. The computer then calculates, based on properties of the review by the transcriber, a value indicative of an expected accuracy of the reviewed transcription. The computer may suggest a second transcriber review the reviewed transcription if the value indicative of the expected accuracy is below a threshold.
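
The flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `ReviewProperties` container, the use of `difflib.SequenceMatcher` to measure the extent of corrections, the hand-picked weights in `expected_accuracy` (a stand-in for a trained model), and the 0.9 threshold.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ReviewProperties:
    """Hypothetical container for the properties of one transcriber review."""
    asr_text: str          # transcription generated by the ASR system
    reviewed_text: str     # transcription after the transcriber's review
    review_seconds: float  # duration of the review
    audio_seconds: float   # length of the audio segment


def feature_values(props: ReviewProperties) -> list:
    """Generate numeric feature values from the review properties."""
    asr_words = props.asr_text.split()
    reviewed_words = props.reviewed_text.split()
    # Extent of corrections: 1 minus the word-level similarity ratio.
    similarity = SequenceMatcher(
        None, asr_words, reviewed_words, autojunk=False
    ).ratio()
    extent_of_corrections = 1.0 - similarity
    # Duration of the review, relative to the length of the audio.
    relative_duration = props.review_seconds / props.audio_seconds
    return [extent_of_corrections, relative_duration]


def expected_accuracy(features: list) -> float:
    """Placeholder model with illustrative, untrained weights (assumption)."""
    extent, relative_duration = features
    score = 0.80 + 0.05 * min(relative_duration, 3.0) - 0.30 * extent
    return max(0.0, min(1.0, score))


def needs_second_review(props: ReviewProperties, threshold: float = 0.9) -> bool:
    """Suggest a second transcriber when expected accuracy is below threshold."""
    return expected_accuracy(feature_values(props)) < threshold


careful = ReviewProperties("the quick brown fox", "the quick brown fox", 90.0, 30.0)
rushed = ReviewProperties("a b c d", "w x y z", 10.0, 30.0)
print(needs_second_review(careful), needs_second_review(rushed))
```

In practice the placeholder weights would be replaced by a model trained on reviews whose resulting accuracy is known; the sketch only shows how the pieces connect.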

70 Citations

20 Claims

1. A system configured to estimate quality of hybrid transcription of audio, comprising:
- a computer configured to:
  
  receive a segment of an audio recording comprising speech of a person;
  
  generate a transcription of the segment utilizing an automatic speech recognition (ASR) system;
  
  receive properties of a review of the transcription, by a transcriber, which produced a reviewed transcription;
  
  wherein the properties are indicative of at least one of the following:
  
  an extent of corrections made by the transcriber to the transcription during the review, and a duration of the review;
  
  generate feature values based on data comprising the properties; and
  
  utilize a model to calculate, based on the feature values, a value indicative of an expected accuracy of the reviewed transcription.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The system of claim 1, wherein the value indicative of the expected accuracy of the reviewed transcription is indicative of an expected word error rate (WER) in the reviewed transcription.
  - 3. The system of claim 1, wherein the computer is further configured to suggest a second transcriber review the reviewed transcription responsive to the value indicative of the expected accuracy being below a threshold.
  - 4. The system of claim 1, wherein the model is generated based on data corresponding to multiple transcribers, which comprises:
    - properties of reviews of transcriptions of segments of audio by the multiple transcribers, and values indicative of accuracies of reviewed transcriptions resulting from the reviews.
  - 5. The system of claim 4, wherein the computer is further configured to receive an indication of an experience level of the transcriber and to generate a feature value, from among the feature values, based on the indication;
    - and wherein the model is generated based on indications of the experience levels of the transcribers.
  - 6. The system of claim 1, wherein the model is generated based on:
    - properties of reviews of the transcriber of transcriptions of previously recorded segments of audio, and values indicative of accuracies of reviewed transcriptions resulting from the reviews.
  - 7. The system of claim 1, wherein the computer is further configured to:
    - receive additional properties of the review comprising at least one of:
      
      an indication of a speed at which the audio was listened to by the transcriber during the review, and an attention level of the transcriber during the review; and
      
      generate at least one of the feature values based on the additional properties.
  - 8. The system of claim 1, wherein the computer is further configured to receive an indication of an accent spoken by the person in the segment, and to generate at least one of the feature values based on the indication.
  - 9. The system of claim 1, wherein the computer is further configured to receive an indication of a topic of speech in the segment, and to generate at least one of the feature values based on the indication.
  - 10. The system of claim 1, wherein the computer is further configured to receive an indication of audio quality of segment, and to generate at least one of the feature values based on the indication.
  - 11. The system of claim 1, wherein the computer is further configured to calculate a value indicative of intelligibility of speech in the segment based on a lattice constructed by the ASR system, and to generate at least one of the feature values based on the value indicative of intelligibility.
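
Claim 2 refers to word error rate (WER). As a rough illustration, WER is commonly defined as the number of word substitutions, deletions, and insertions needed to turn a reference transcript into the hypothesis, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance; the function name and the assumption that a trusted reference transcript is available (for example, when labeling training data) are illustrative and not taken from the claims.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A value computed this way on held-out reviewed transcriptions could serve as the training target for a model that predicts expected WER from review properties.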

12. A method for estimating quality of hybrid transcription of audio, comprising:
- receiving a segment of an audio recording comprising speech of a person;
  
  generating a transcription of the segment utilizing an automatic speech recognition (ASR) system;
  
  receiving properties of a review of the transcription, by a transcriber, which produced a reviewed transcription;
  
  wherein the properties are indicative of at least one of the following:
  
  an extent of corrections made by the transcriber to the transcription during the review, and a duration of the review;
  
  generating feature values based on data comprising the properties; and
  
  utilizing a model to calculate, based on the feature values, a value indicative of an expected accuracy of the reviewed transcription.
- Dependent claims (13, 14, 15, 16, 17, 18, 19):
  - 13. The method of claim 12, wherein the value indicative of the expected accuracy of the reviewed transcription is indicative of an expected word error rate (WER) in the reviewed transcription.
  - 14. The method of claim 12, further comprising suggesting a second transcriber review the reviewed transcription responsive to the value indicative of the expected accuracy being below a threshold.
  - 15. The method of claim 12, further comprising generating the model based on data corresponding to multiple transcribers, which comprises:
    - properties of reviews of transcriptions of segments of audio by the multiple transcribers, and values indicative of reviewed transcriptions resulting from the reviews.
  - 16. The method of claim 12, further comprising generating the model based on:
    - properties of reviews of the transcriber of transcriptions of previously recorded segments of audio, and values indicative of accuracies of reviewed transcriptions resulting from the reviews.
  - 17. The method of claim 12, further comprising:
    - receiving additional properties of the review comprising at least one of an indication of a speed at which the audio was listened to by the transcriber during the review and an attention level of the transcriber during the review, and generating at least one of the feature values based on the additional properties.
  - 18. The method of claim 12, further comprising:
    - receiving an indication of an accent spoken by the person in the segment, and generating at least one of the feature values based on the indication.
  - 19. The method of claim 12, further comprising:
    - receiving an indication of audio quality of segment, and generating at least one of the feature values based on the indication.
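
Claims 15 and 16 describe generating the model from properties of past reviews paired with the accuracies those reviews achieved. The claims do not name a learning algorithm, so the sketch below uses k-nearest-neighbors regression purely as a simple stand-in. The feature layout (`[extent_of_corrections, relative_duration]`), the function names, and the sample numbers are all hypothetical.

```python
import math


def fit_knn(training_rows):
    """'Fit' a k-NN model by storing (feature vector, accuracy) pairs."""
    return [(tuple(features), float(accuracy)) for features, accuracy in training_rows]


def predict_accuracy(model, features, k=3):
    """Average the accuracies of the k stored reviews closest to `features`."""
    nearest = sorted(model, key=lambda row: math.dist(row[0], features))[:k]
    return sum(accuracy for _, accuracy in nearest) / len(nearest)


# Made-up training data pooled from multiple transcribers:
# ([extent_of_corrections, relative_duration], measured accuracy)
history = [
    ([0.05, 2.5], 0.98),
    ([0.10, 2.0], 0.96),
    ([0.08, 2.2], 0.97),
    ([0.60, 0.5], 0.80),
    ([0.70, 0.4], 0.75),
    ([0.65, 0.6], 0.78),
]
model = fit_knn(history)
print(predict_accuracy(model, [0.07, 2.3]))  # resembles the careful reviews
print(predict_accuracy(model, [0.66, 0.5]))  # resembles the rushed reviews
```

Restricting `history` to reviews by a single transcriber would correspond to the per-transcriber variant in claim 16; adding further feature columns (experience level, accent, audio quality) would follow the other dependent claims.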

20. A non-transitory computer-readable medium having instructions stored thereon that, in response to execution by a system including a processor and memory, causes the system to perform operations comprising:
- receiving a segment of an audio recording comprising speech of a person;
  
  generating a transcription of the segment utilizing an automatic speech recognition (ASR) system;
  
  receiving properties of a review of the transcription, by a transcriber, which produced a reviewed transcription;
  
  wherein the properties are indicative of at least one of the following:
  
  an extent of corrections made by the transcriber to the transcription during the review, and a duration of the review;
  
  generating feature values based on data comprising the properties; and
  
  utilizing a model to calculate, based on the feature values, a value indicative of an expected accuracy of the reviewed transcription.

Current Assignee
Verbit Software Ltd.
Original Assignee
Verbit Software Ltd.
Inventors
Shellef, Eric Ariel; Ben Tsvi, Yaakov Kobi; Getz, Iris; Livne, Tom; Rosensweig, Elisha Yehuda
Primary Examiner(s)
Leland, III, Edwin S

Application Number

US16/595,298
Time in Patent Office

183 Days
Field of Search

704/235
US Class Current
CPC Class Codes

G06F 3/0484   for the control of specific...

G06F 40/20   Natural language analysis s...

G06F 40/30   Semantic analysis

G10L 15/01   Assessment or evaluation of...

G10L 15/02   Feature extraction for spee...

G10L 15/04   Segmentation; Word boundary...

G10L 15/063   Training

G10L 15/08   Speech classification or se...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/183   using context dependencies,...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/20   Speech recognition techniqu...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 2015/0631   Creating reference template...

G10L 2015/0635   updating or merging of old ...

G10L 2015/0638   Interactive procedures

G10L 2015/223   Execution procedure of a sp...

G10L 25/60   for measuring the quality o...

H04R 1/406   microphones

H04R 3/005   for combining the signals o...

H04R 5/027   Spatial or constructional a...
