System and method of automated evaluation of transcription quality

US 9,368,106 B2
Filed: 06/30/2014
Issued: 06/14/2016
Est. Priority Date: 07/30/2013
Status: Active Grant

Abstract

Systems and methods automatically evaluate a transcription quality. Audio data is obtained. The audio data is segmented into a plurality of utterances with a voice activity detector operating on a computer processor. The plurality of utterances are transcribed into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor. A minimum Bayes risk decoder is applied to the at least one word lattice to create at least one confusion network. At least one conformity ratio is calculated from the at least one confusion network.
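
The segmentation step can be illustrated with a minimal sketch. The abstract does not say how the voice activity detector works, so the short-time energy threshold used here is an assumption chosen for simplicity, and the function name `segment_utterances` and its parameters `frame_len`, `threshold`, and `min_silence_frames` are hypothetical.

```python
from typing import List, Sequence, Tuple


def segment_utterances(
    samples: Sequence[float],
    frame_len: int = 160,
    threshold: float = 0.01,
    min_silence_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Split audio into utterances with an energy-threshold detector.

    Assumption: a frame counts as speech when its mean squared amplitude
    is at least `threshold`. An utterance ends after `min_silence_frames`
    consecutive non-speech frames. Returns (start, end) sample indices,
    with `end` exclusive.
    """
    segments: List[Tuple[int, int]] = []
    start = None   # sample index where the current utterance began
    silence = 0    # consecutive non-speech frames seen inside an utterance
    n_frames = len(samples) // frame_len

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= threshold:
            if start is None:
                start = i * frame_len
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                # Close the utterance right after its last speech frame.
                segments.append((start, (i - silence + 1) * frame_len))
                start = None
                silence = 0

    if start is not None:
        # Audio ended mid-utterance; drop any trailing silent frames.
        segments.append((start, (n_frames - silence) * frame_len))
    return segments
```

Each returned pair marks one utterance that would then be passed to the speech recognition system. A production detector would likely use a statistical or model-based classifier rather than a fixed threshold.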

14 Claims

1. A method of automated evaluation of a transcription quality, the method comprising:
- obtaining audio data;
  
  segmenting the audio data into a plurality of utterances with a voice activity detector operating on a computer processor;
  
  transcribing the plurality of utterances into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor;
  
  transcribing the plurality of utterances into a word lattice;
  
  creating a confusion network from each word lattice;
  
  applying, with the processor, a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins;
  
  calculating at least one conformity ratio from the at least one confusion network;
  
  calculating a conformity ratio for each confusion network by identifying a probability value of a most probable word arc in each word bin; and
  
  calculating a joint probability for each ε-bin and a preceding word bin, wherein the conformity ratio is an average of the calculated joint probabilities for the confusion network.
  - 2. The method of claim 1, wherein the audio data is streaming audio data.
  - 3. The method of claim 1, further comprising calculating a transcription quality score from the at least one conformity ratio.
  - 4. The method of claim 1, wherein the transcription quality score is a normalized value of the conformity ratio.
  - 5. The method of claim 1, further comprising calculating an overall conformity ratio for a transcription of the audio data from the conformity ratios calculated from the confusion network of each of the utterances in the plurality of utterances.
  - 6. The method of claim 1, further comprising:
    - filtering the plurality of confusion networks based upon the calculated transcription quality score for each confusion network;
      
      selecting those confusion networks from the plurality of confusion networks having a transcription quality score greater than a predetermined value;
      
      storing the selected confusion networks as a plurality of high quality transcriptions.
  - 7. The method of claim 6, further comprising creating a transcription model by analyzing the plurality of high quality transcriptions.
  - 8. The method of claim 6, further comprising:
    - obtaining the utterances associated with each of the confusion networks in the plurality of high quality transcriptions; and
      
      creating a transcription model based upon the obtained utterances.
  - 9. The method of claim 1, further comprising producing an indication of the transcription quality score.
  - 10. The method of claim 1, wherein transcribing the plurality of utterances comprises applying at least one transcription model to each of the plurality of utterances and wherein the at least one conformity ratio is indicative of a conformity between the audio data and the at least one transcription model.
  - 11. The method of claim 10, further comprising:
    - selecting a new at least one transcription model based upon the at least one conformity ratio; and
      
      transcribing the plurality of utterances by applying the new at least one transcription model to each of the plurality of utterances.
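
The conformity ratio recited in claim 1 can be sketched as follows. The claims do not state how the joint probability of an ε-bin and its preceding word bin is formed, so this sketch assumes it is the product of the most probable word arc in the word bin and the ε arc in the following ε-bin. The `Bin` class, the `EPS` label, and the function name are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

EPS = "<eps>"  # hypothetical label for the epsilon (no-word) arc


@dataclass
class Bin:
    """One bin of a confusion network: arc label -> posterior probability."""
    arcs: Dict[str, float]
    is_epsilon: bool


def conformity_ratio(network: List[Bin]) -> float:
    """Average the joint probabilities over (word bin, following ε-bin) pairs.

    Assumption: joint probability = P(best word arc) * P(ε arc).
    """
    joints = []
    word_bin = None
    for current in network:
        if not current.is_epsilon:
            word_bin = current
        elif word_bin is not None:
            # Most probable word arc in the preceding word bin.
            p_word = max(p for label, p in word_bin.arcs.items() if label != EPS)
            # Probability that nothing was spoken between the two words.
            p_eps = current.arcs.get(EPS, 0.0)
            joints.append(p_word * p_eps)
            word_bin = None
    if not joints:
        raise ValueError("network has no (word bin, ε-bin) pair")
    return mean(joints)


network = [
    Bin({"hello": 0.9, "yellow": 0.1}, is_epsilon=False),
    Bin({EPS: 1.0}, is_epsilon=True),
    Bin({"world": 0.6, "word": 0.4}, is_epsilon=False),
    Bin({EPS: 0.5, "s": 0.5}, is_epsilon=True),
]
ratio = conformity_ratio(network)  # averages 0.9 * 1.0 and 0.6 * 0.5
```

Under this assumption, a network whose bins are each dominated by a single arc yields a ratio close to 1, while a network with many competing arcs yields a lower ratio.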

12. A system for automated evaluation of transcription quality, the system comprising:
- an audio data source upon which a plurality of audio data files are stored;
  
  a processor that receives the plurality of audio data files, segments the audio data files into a plurality of utterances and applies at least one transcription model to the plurality of utterances to transcribe the plurality of utterances into a word lattice;
  
  transcribes the plurality of utterances into a word lattice;
  
  creates a confusion network from each word lattice; and
  
  calculates a conformity ratio for each confusion network;
  
  a non-transient computer readable medium communicatively connected to the processor and programmed with computer readable code that when executed by the processor causes the processor to:
  
  apply a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins;
  
  calculate at least one conformity ratio from the at least one confusion network;
  
  calculate a transcription quality score from the at least one conformity ratio;
  
  identify a probability value of a most probable word arc in each word bin; and
  
  calculate a joint probability for each ε-bin and a preceding word bin;
  
  wherein the conformity ratio is an average of the calculated joint probabilities for the confusion network.
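
A transcription quality score derived from the conformity ratio, together with the threshold filtering described in claim 6, might look like the sketch below. The claims say only that the score can be a normalized value of the conformity ratio, so the linear scaling to a 0 to 100 range, the bounds `low` and `high`, and both function names are assumptions.

```python
from typing import Dict


def quality_score(conformity_ratio: float, low: float = 0.0, high: float = 1.0) -> float:
    """Map a conformity ratio onto a 0-100 scale (assumed linear scaling)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = min(max(conformity_ratio, low), high)
    return 100.0 * (clipped - low) / (high - low)


def select_high_quality(ratios: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only utterances whose score is greater than the predetermined value.

    `ratios` maps a hypothetical utterance identifier to the conformity
    ratio of that utterance's confusion network.
    """
    scores = {utt_id: quality_score(r) for utt_id, r in ratios.items()}
    return {utt_id: s for utt_id, s in scores.items() if s > threshold}


ratios = {"utt-1": 0.92, "utt-2": 0.41, "utt-3": 0.78}
kept = select_high_quality(ratios, threshold=75.0)  # retains utt-1 and utt-3
```

The retained entries correspond to the high quality transcriptions that claims 7 and 8 use as material for creating a transcription model.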

13. A non-transient computer readable medium programmed with computer readable code that upon execution by a processor causes the processor to:
- obtain audio data;
  
  segment the audio data into a plurality of utterances with a voice activity detector;
  
  transcribe the plurality of utterances into at least one word lattice with a large vocabulary continuous speech recognition system, each of the plurality of utterances being transcribed into a word lattice;
  
  create a confusion network from each word lattice;
  
  calculate a conformity ratio for each confusion network;
  
  apply a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins;
  
  calculate at least one conformity ratio from the at least one confusion network;
  
  calculate a transcription quality score from the at least one conformity ratio;
  
  provide an indication of the transcription quality score;
  
  identify a probability value of a most probable word arc in each word bin; and
  
  calculate a joint probability for each ε-bin and a preceding word bin;
  
  wherein the conformity ratio is an average of the calculated joint probabilities for the confusion network.
  - 14. The non-transient computer readable medium of claim 13, wherein at least one transcription model is applied to the plurality of utterances to transcribe the plurality of utterances and wherein the at least one conformity ratio is indicative of a conformity between the audio data and the at least one transcription model.
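
Because the conformity ratio indicates how well a transcription model fits the audio, it can drive the model selection described in claims 10 and 11. The sketch below is one possible reading: each candidate model is represented by a hypothetical callable that returns the conformity ratio for an utterance, the overall ratio is assumed to be the arithmetic mean over utterances, and the model with the highest overall ratio is chosen. None of these choices is specified in the claims.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

# Hypothetical stand-in for "transcribe with one model, then compute the
# conformity ratio": maps an utterance identifier to a ratio in [0, 1].
Scorer = Callable[[str], float]


def overall_conformity(scorer: Scorer, utterances: Sequence[str]) -> float:
    """Aggregate per-utterance ratios (assumed: arithmetic mean)."""
    return mean([scorer(u) for u in utterances])


def select_model(models: Dict[str, Scorer], utterances: Sequence[str]) -> str:
    """Return the name of the model with the highest overall conformity ratio."""
    if not models or not utterances:
        raise ValueError("need at least one model and one utterance")
    return max(models, key=lambda name: overall_conformity(models[name], utterances))


# Toy scorers with fixed outputs, for illustration only.
models = {
    "general": lambda utt: 0.40,
    "call_center": lambda utt: 0.70,
}
best = select_model(models, ["utt-1", "utt-2"])  # "call_center" scores higher
```

The utterances would then be transcribed again with the selected model, as recited in claim 11.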

Current Assignee: Verint Systems Incorporated
Original Assignee: Verint Systems Limited (Verint Systems Incorporated)
Inventors: Sidi, Oana; Wein, Ron
Primary Examiner(s): Singh, Satwant
Application Number: US14/319,853
Publication Number: US 20150039306A1
Time in Patent Office: 715 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
G10L 15/01   Assessment or evaluation of...
G10L 15/04   Segmentation; Word boundary...
G10L 15/12   using dynamic programming t...
G10L 15/26   Speech to text systems G10L...
