System and method of automated evaluation of transcription quality
First Claim
1. A method of automated evaluation of a transcription quality, the method comprising:
obtaining audio data;
segmenting the audio data into a plurality of utterances with a voice activity detector operating on a computer processor;
transcribing the plurality of utterances into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor;
transcribing the plurality of utterances into a word lattice;
creating a confusion network from each word lattice;
applying, with the processor, a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins;
calculating at least one conformity ratio from the at least one confusion network;
calculating a conformity ratio for each confusion network by identifying a probability value of a most probable word arc in each word bin; and
calculating a joint probability for each ε-bin and a preceding word bin, wherein the conformity ratio is an average of the calculated joint probabilities for the confusion network.
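The conformity ratio recited in the final two steps can be illustrated with a minimal sketch. It is an interpretation under stated assumptions, not the claimed implementation: the `Bin` structure, the `<eps>` arc label, and the reading of "joint probability" as the product of the top word-arc probability and the ε-arc probability of the ε-bin that follows it are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

EPS = "<eps>"  # hypothetical label for the epsilon (no-word) arc


@dataclass
class Bin:
    """One bin of a confusion network (illustrative structure, not from the source)."""
    arcs: Dict[str, float]   # arc label -> posterior probability
    is_epsilon_bin: bool     # True for an epsilon-bin, False for a word bin


def conformity_ratio(network: List[Bin]) -> float:
    """Average the joint probabilities of each epsilon-bin and its preceding word bin.

    Assumption: the joint probability of a pair is the product of
    (a) the probability of the most probable word arc in the word bin and
    (b) the probability of the epsilon arc in the epsilon-bin that follows it.
    """
    joint_probabilities = []
    for previous, current in zip(network, network[1:]):
        if current.is_epsilon_bin and not previous.is_epsilon_bin:
            # Most probable word arc in the preceding word bin.
            p_word = max(
                (p for label, p in previous.arcs.items() if label != EPS),
                default=0.0,
            )
            # Probability that nothing was spoken between the two words.
            p_eps = current.arcs.get(EPS, 0.0)
            joint_probabilities.append(p_word * p_eps)
    if not joint_probabilities:
        raise ValueError("network contains no word bin followed by an epsilon-bin")
    return sum(joint_probabilities) / len(joint_probabilities)


example_network = [
    Bin({"hello": 0.9, "yellow": 0.1}, is_epsilon_bin=False),
    Bin({EPS: 0.8, "a": 0.2}, is_epsilon_bin=True),
    Bin({"world": 0.6, "word": 0.4}, is_epsilon_bin=False),
    Bin({EPS: 1.0}, is_epsilon_bin=True),
]
example_ratio = conformity_ratio(example_network)  # (0.9*0.8 + 0.6*1.0) / 2
```

Under this reading, a ratio close to 1 means the decoder saw little competition among word hypotheses and few likely insertions between words, while a low ratio indicates a confusable utterance.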
Abstract
Systems and methods automatically evaluate transcription quality. Audio data is obtained. The audio data is segmented into a plurality of utterances with a voice activity detector operating on a computer processor. The plurality of utterances are transcribed into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor. A minimum Bayes risk decoder is applied to the at least one word lattice to create at least one confusion network. At least one conformity ratio is calculated from the at least one confusion network.
14 Claims
12. A system for automated evaluation of transcription quality, the system comprising:
an audio data source upon which a plurality of audio data files are stored;
a processor that receives the plurality of audio data files, segments the audio data files into a plurality of utterances and applies at least one transcription model to the plurality of utterances to transcribe the plurality of utterances into a word lattice;
transcribes the plurality of utterances into a word lattice;
creates a confusion network from each word lattice; and
calculates a conformity ratio for each confusion network;
a non-transient computer readable medium communicatively connected to the processor and programmed with computer readable code that when executed by the processor causes the processor to:
apply a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins;
calculate at least one conformity ratio from the at least one confusion network;
calculate a transcription quality score from the at least one conformity ratio;
identify a probability value of a most probable word arc in each word bin; and
calculate a joint probability for each ε-bin and a preceding word bin;
wherein the conformity ratio is an average of the calculated joint probabilities for the confusion network.
13. A non-transient computer readable medium programmed with computer readable code that upon execution by a processor causes the processor to:
obtain audio data;
segment the audio data into a plurality of utterances with a voice activity detector;
transcribe the plurality of utterances into at least one word lattice with a large vocabulary continuous speech recognition system, each of the plurality of utterances being transcribed into a word lattice;
create a confusion network from each word lattice;
calculate a conformity ratio for each confusion network;
apply a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins;
calculate at least one conformity ratio from the at least one confusion network;
calculate a transcription quality score from the at least one conformity ratio;
provide an indication of the transcription quality score;
identify a probability value of a most probable word arc in each word bin; and
calculate a joint probability for each ε-bin and a preceding word bin;
wherein the conformity ratio is an average of the calculated joint probabilities for the confusion network.
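The claim recites calculating a transcription quality score from the conformity ratios and providing an indication of that score, but the text here does not say how either is done. The sketch below is therefore purely illustrative: it assumes the score is the arithmetic mean of the per-utterance conformity ratios and that the indication is a simple threshold label. The function names, the `0.5` threshold, and the label strings are all hypothetical.

```python
from statistics import fmean
from typing import Sequence


def transcription_quality_score(conformity_ratios: Sequence[float]) -> float:
    """Combine per-utterance conformity ratios into one score.

    Assumption: a plain arithmetic mean; the source does not specify the formula.
    """
    if not conformity_ratios:
        raise ValueError("at least one conformity ratio is required")
    return fmean(conformity_ratios)


def quality_indication(score: float, threshold: float = 0.5) -> str:
    """Turn a score into a coarse label (the threshold is a hypothetical value)."""
    return "acceptable" if score >= threshold else "needs review"


ratios = [0.66, 0.40, 0.80]          # one ratio per confusion network
score = transcription_quality_score(ratios)
indication = quality_indication(score)
```

A weighted mean (for example, weighting each utterance by its word count) would be an equally plausible aggregation; the unweighted mean is used here only because it is the simplest choice.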
Specification