Machine learning-based prediction of transcriber performance on a segment of audio

US 10,607,611 B1
Filed: 10/07/2019
Issued: 03/31/2020
Est. Priority Date: 09/06/2019
Status: Active Grant

3 Assignments

0 Petitions

Abstract

When transcribing large audio files, such as in the case of legal depositions, there are often many transcribers to choose from. Embodiments described herein enable calculation of expected accuracy of transcriptions by transcribers, which can be used to guide the selection of transcribers for specific tasks. In one embodiment, a computer receives a segment of an audio recording that includes speech of a person, and identifies an accent of the person and a topic of the segment. The computer generates feature values based on data that includes the accent and the topic, and utilizes a model to calculate, based on the feature values, an expected accuracy of a transcription of the segment by a certain transcriber. The model is generated based on training data that includes segments of previous audio recordings and values of accuracies of transcriptions, by the certain transcriber, of the segments.
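
As a rough illustration of the flow the abstract describes, the sketch below builds one model per transcriber from past segments and uses it to estimate accuracy on a new segment. The `BackoffAccuracyModel` class, its backoff rule, and the sample accents, topics, and accuracy values are all assumptions made for illustration; the patent does not specify the learner, and a real system would likely use a trained regressor over a richer feature set.

```python
from collections import defaultdict
from statistics import mean


class BackoffAccuracyModel:
    """Hypothetical per-transcriber model (not the patent's method).

    Predicts the mean past accuracy for an (accent, topic) pair, backing off
    to accent-only, then topic-only, then the transcriber's overall mean.
    """

    def __init__(self):
        self._by_key = defaultdict(list)
        self._all = []

    def fit(self, examples):
        # examples: iterable of (accent, topic, accuracy) from previous segments
        # transcribed by this one transcriber.
        for accent, topic, accuracy in examples:
            self._by_key[(accent, topic)].append(accuracy)
            self._by_key[(accent, None)].append(accuracy)
            self._by_key[(None, topic)].append(accuracy)
            self._all.append(accuracy)
        return self

    def predict(self, accent, topic):
        # Try the most specific key first, then progressively coarser ones.
        for key in ((accent, topic), (accent, None), (None, topic)):
            if key in self._by_key:
                return mean(self._by_key[key])
        if not self._all:
            raise ValueError("model has no training data")
        return mean(self._all)


# Made-up history for a single transcriber.
history = [
    ("scottish", "legal", 0.90),
    ("scottish", "legal", 0.94),
    ("scottish", "medical", 0.80),
    ("indian", "legal", 0.96),
]
model = BackoffAccuracyModel().fit(history)
expected = model.predict("scottish", "legal")  # averages the two matching rows
```

Keeping a separate model instance per transcriber mirrors the abstract's statement that the training data consists of that transcriber's own past accuracies.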

65 Citations

20 Claims

1. A system configured to calculate an expected accuracy of a transcription by a certain transcriber, comprising:
- a computer configured to:
  
  receive a segment of an audio recording, which comprises speech of a person;
  
  identify, based on the segment, an accent of the person;
  
  identify, based on a transcription of the segment generated using an automatic speech recognition (ASR) system, a topic of the segment;
  
  generate feature values based on data comprising an indication of the accent and an indication of the topic; and
  
  utilize a model to calculate, based on the feature values, a value indicative of an expected accuracy of a transcription of the segment by the certain transcriber;
  
  wherein the model is generated based on training data comprising feature values generated based on segments of previous audio recordings, and values of accuracies of transcriptions, by the certain transcriber, of the segments.
  - 2. The system of claim 1, wherein the value indicative of the expected accuracy is indicative of an expected word error rate (WER) for the transcription of the segment were it transcribed by the certain transcriber.
  - 3. The system of claim 1, wherein the computer is further configured to utilize additional models to calculate additional values indicative of expected accuracies of transcriptions of the segment by respective additional transcribers, and to select the certain transcriber to transcribe the segment based on the value being greater than most of the additional values.
  - 4. The system of claim 3, wherein each of the additional models corresponds to a specific transcriber from among the additional transcribers, and is generated based on segments of additional previous audio recordings, and additional values of accuracies of transcriptions by the specific transcriber of the segments of the additional previous audio recordings.
  - 5. The system of claim 1, wherein the feature values comprise a feature value indicative of one or more of the following:
    - a duration of the segment, and a number of speakers in the segment.
  - 6. The system of claim 1, wherein the segment belongs to a set of segments comprising speech of the person, the data utilized to generate the feature values further comprises information related to other segments in the set, and the feature values comprise a feature value indicative of one or more of the following:
    - a number of segments that preceded the segment, a duration of the segments that preceded the segment, a number of the segments already transcribed by the certain transcriber, and a duration of the segments already transcribed by the certain transcriber.
  - 7. The system of claim 1, wherein the data utilized to generate the feature values further comprises data related to recent transcription activity of the certain transcriber during that day, and the feature values comprise a feature value indicative of one or more of the following:
    - a number of hours the certain transcriber has been working that day, a number of different speakers the certain transcriber has been transcribing.
  - 8. The system of claim 1, wherein one or more of the feature values are indicative of a signal-to-noise ratio of the audio in the segment.
  - 9. The system of claim 1, wherein at least one of the feature values is generated by utilizing natural language understanding (NLU) to calculate a value indicative of intelligibility of a transcription of the segment generated utilizing the ASR system.
  - 10. The system of claim 1, wherein the computer is further configured to utilize a classifier to identify the accent, and responsive to confidence in an identification of the accent using the classifier being below a threshold, the computer provides the segment to another transcriber to listen to, and the computer receives an identification of the accent from the other transcriber.
  - 11. The system of claim 1, wherein the segments of the previous audio recordings comprise recordings of a plurality of speakers speaking in different accents.
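
Claim 2 ties the predicted value to word error rate (WER). As a reminder of how WER is commonly computed (word-level edit distance divided by the number of reference words), a minimal sketch follows. The function name and the whitespace tokenization are assumptions; the claims do not prescribe this computation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("was" -> "is") and one deletion ("in") over 5 words.
wer = word_error_rate("the witness was sworn in", "the witness is sworn")
```

A model trained on such values would predict lower numbers for better transcriptions, so any selection step would need to rank by lowest expected WER rather than highest expected accuracy.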

12. A method for calculating an expected accuracy of a transcription by a certain transcriber, comprising:
- receiving a segment of an audio recording, which comprises speech of a person;
  
  identifying, based on the segment, an accent of the person;
  
  identifying, based on a transcription of the segment generated using an automatic speech recognition (ASR) system, a topic of the segment;
  
  generating feature values based on data comprising an indication of the accent and an indication of the topic; and
  
  utilizing a model to calculate, based on the feature values, a value indicative of an expected accuracy of a transcription of the segment by the certain transcriber;
  
  wherein the model is generated based on training data comprising feature values generated based on segments of previous audio recordings, and values of accuracies of transcriptions, by the certain transcriber, of the segments.
  - 13. The method of claim 12, wherein the value indicative of the expected accuracy is indicative of an expected word error rate (WER) for the transcription of the segment were it transcribed by the certain transcriber.
  - 14. The method of claim 12, further comprising:
    - calculating additional values indicative of expected accuracies of transcriptions of the segment by respective additional transcribers, and selecting the certain transcriber to transcribe the segment based on the value being greater than most of the additional values.
  - 15. The method of claim 12, further comprising generating a feature value, from among the feature values, which is indicative of one or more of the following:
    - a duration of the segment, and a number of speakers in the segment.
  - 16. The method of claim 12, wherein the segment belongs to a set of segments comprising speech of the person, the data utilized to generate the feature values further comprises information related to other segments in the set, and further comprising generating a feature value, from among the feature values, which is indicative of one or more of the following:
    - a number of segments that preceded the segment, a duration of the segments that preceded the segment, a number of the segments already transcribed by the certain transcriber, and a duration of the segments already transcribed by the certain transcriber.
  - 17. The method of claim 12, wherein the data utilized to generate the feature values further comprises data related to recent transcription activity of the certain transcriber during that day, and further comprising generating a feature value, from among the feature values, which is indicative of one or more of the following:
    - a number of hours the certain transcriber has been working that day, a number of different speakers the certain transcriber has been transcribing.
  - 18. The method of claim 12, further comprising generating a feature value, from among the feature values, by utilizing natural language understanding (NLU) to calculate a value indicative of intelligibility of a transcription of the segment generated utilizing the ASR system.
  - 19. The method of claim 12, further comprising:
    - utilizing a classifier to identify the accent, and responsive to confidence in an identification of the accent using the classifier being below a threshold, providing the segment to another transcriber to listen to and provide an identification of the accent.
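
Claims 14 and 19 describe two decision steps: choosing a transcriber from per-transcriber predictions, and falling back to a human listener when the accent classifier is unsure. The sketch below assumes predictions arrive as a plain dictionary and that the classifier returns a (label, confidence) pair. The names `select_transcriber`, `resolve_accent`, and `ask_human`, and the 0.7 threshold, are hypothetical. Taking the maximum is one simple way to satisfy "greater than most of the additional values"; the claim itself does not require the strict maximum.

```python
from typing import Callable, Dict, Tuple


def select_transcriber(expected_accuracy: Dict[str, float]) -> str:
    """Return the transcriber whose predicted accuracy is highest."""
    if not expected_accuracy:
        raise ValueError("no candidate transcribers")
    return max(expected_accuracy, key=expected_accuracy.get)


def resolve_accent(
    classifier_output: Tuple[str, float],
    ask_human: Callable[[], str],
    threshold: float = 0.7,  # assumed value; the claims only say "a threshold"
) -> str:
    """Use the classifier's label unless its confidence is below threshold."""
    label, confidence = classifier_output
    if confidence < threshold:
        # Low confidence: hand the segment to another transcriber to label.
        return ask_human()
    return label


chosen = select_transcriber({"alice": 0.91, "bob": 0.95, "carol": 0.88})
accent = resolve_accent(("irish", 0.4), ask_human=lambda: "scottish")
```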

20. A non-transitory computer-readable medium having instructions stored thereon that, in response to execution by a system including a processor and memory, causes the system to perform operations comprising:
- receiving a segment of an audio recording, which comprises speech of a person;
  
  identifying, based on the segment, an accent of the person;
  
  identifying, based on a transcription of the segment generated using an automatic speech recognition (ASR) system, a topic of the segment;
  
  generating feature values based on data comprising an indication of the accent and an indication of the topic; and
  
  utilizing a model to calculate, based on the feature values, a value indicative of an expected accuracy of a transcription of the segment by a certain transcriber;
  
  wherein the model is generated based on training data comprising feature values generated based on segments of previous audio recordings, and values of accuracies of transcriptions, by the certain transcriber, of the segments.
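
The "generating feature values" step recited in the independent claims can be pictured as turning categorical and numeric descriptors of a segment into a fixed-length vector. The sketch below one-hot encodes the accent and topic and appends duration, speaker count, and signal-to-noise ratio (features named in claims 5 and 8). The vocabularies, field names, and the computation of SNR from raw sample lists are illustrative assumptions, not details taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

ACCENTS = ["american", "british", "indian", "scottish"]  # hypothetical vocabulary
TOPICS = ["legal", "medical", "finance"]                 # hypothetical vocabulary


@dataclass
class Segment:
    accent: str
    topic: str
    duration_sec: float
    num_speakers: int
    snr_db: float


def compute_snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in decibels from the mean power of signal and noise samples."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    # Values outside the vocabulary map to the all-zero vector.
    return [1.0 if value == v else 0.0 for v in vocabulary]


def feature_values(seg: Segment) -> List[float]:
    """Concatenate categorical indicators with numeric segment descriptors."""
    return (
        one_hot(seg.accent, ACCENTS)
        + one_hot(seg.topic, TOPICS)
        + [seg.duration_sec, float(seg.num_speakers), seg.snr_db]
    )


segment = Segment(
    accent="indian",
    topic="legal",
    duration_sec=42.5,
    num_speakers=2,
    snr_db=compute_snr_db(signal=[2.0, -2.0], noise=[0.2, -0.2]),
)
vector = feature_values(segment)
```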

Current Assignee: Verbit Software Ltd.
Original Assignee: Verbit Software Ltd.
Inventors: Shellef, Eric Ariel; Ben Tsvi, Yaakov Kobi; Getz, Iris; Livne, Tom; Rosensweig, Elisha Yehuda
Primary Examiner(s): Han, Qi
Application Number: US16/595,279
Time in Patent Office: 176 Days
Field of Search: 704235, 704231, 704236, 704246, 704250, 704255, 704257, 704270, 704276

US Class Current
CPC Class Codes

G06F 3/0484   for the control of specific...

G06F 40/20   Natural language analysis s...

G06F 40/30   Semantic analysis

G10L 15/01   Assessment or evaluation of...

G10L 15/02   Feature extraction for spee...

G10L 15/04   Segmentation; Word boundary...

G10L 15/063   Training

G10L 15/08   Speech classification or se...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/183   using context dependencies,...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/20   Speech recognition techniqu...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 2015/0631   Creating reference template...

G10L 2015/0635   updating or merging of old ...

G10L 2015/0638   Interactive procedures

G10L 2015/223   Execution procedure of a sp...

G10L 25/60   for measuring the quality o...

H04R 1/406   microphones

H04R 3/005   for combining the signals o...

H04R 5/027   Spatial or constructional a...
