System and method of automated evaluation of transcription quality
First Claim
1. A method of automated evaluation of a transcription quality, the method comprising:
obtaining audio data;
segmenting the audio data into a plurality of utterances with a voice activity detector operating on a computer processor;
transcribing the plurality of utterances into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor, wherein each of the plurality of utterances is transcribed by the processor into a respective word lattice;
applying, by the processor, a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins, wherein the processor creates a confusion network for each word lattice;
calculating, with the processor, at least one conformity ratio from the at least one confusion network, wherein the processor calculates a conformity ratio for each confusion network;
calculating, with the processor, a transcription quality score from the at least one conformity ratio;
filtering, by the processor, the plurality of confusion networks based upon the calculated transcription quality score of each confusion network;
selecting, by the processor, those confusion networks from the plurality of confusion networks having a transcription quality score greater than a predetermined value; and
storing, by the processor, the selected confusion networks as a plurality of high quality transcriptions.
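The confusion-network step can be illustrated with a small data structure. The following is a minimal Python sketch, not the patented implementation: it assumes each bin is a dict mapping candidate words to posterior probabilities, uses a hypothetical `<eps>` token for the ε (no-word) entry, and reads off the consensus hypothesis by taking the highest-posterior entry in each bin, which is the standard way a word sequence is extracted from a confusion network. Whether a bin won by ε corresponds exactly to the claim's "ε-bins" is an assumption, and building the network from a lattice by MBR decoding is not shown. The class name `ConfusionNetwork` is illustrative.

```python
from dataclasses import dataclass

# Hypothetical token standing in for the epsilon (no-word / deletion) entry.
EPS = "<eps>"


@dataclass
class ConfusionNetwork:
    # Sequential bins; each maps competing words (or EPS) to posterior probabilities.
    bins: list[dict[str, float]]

    def consensus(self) -> list[str]:
        """Take the highest-posterior entry in each bin; an EPS winner emits no word."""
        words = []
        for b in self.bins:
            best = max(b, key=b.get)
            if best != EPS:
                words.append(best)
        return words


cn = ConfusionNetwork(bins=[
    {"the": 0.9, EPS: 0.1},
    {"cat": 0.6, "cap": 0.3, EPS: 0.1},
    {EPS: 0.7, "a": 0.3},  # epsilon dominates: this bin contributes nothing
    {"sat": 0.8, "set": 0.2},
])
print(cn.consensus())  # ['the', 'cat', 'sat']
```

Keeping per-bin posteriors, rather than only the consensus words, is what allows a later step to measure how confidently each bin was decided.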
2 Assignments
0 Petitions
Abstract
Systems and methods automatically evaluate transcription quality. Audio data is obtained. The audio data is segmented into a plurality of utterances with a voice activity detector operating on a computer processor. The plurality of utterances are transcribed into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor. A minimum Bayes risk decoder is applied to the at least one word lattice to create at least one confusion network. At least one conformity ratio is calculated from the at least one confusion network.
26 Citations
14 Claims
1. A method of automated evaluation of a transcription quality, the method comprising:
obtaining audio data;
segmenting the audio data into a plurality of utterances with a voice activity detector operating on a computer processor;
transcribing the plurality of utterances into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor, wherein each of the plurality of utterances is transcribed by the processor into a respective word lattice;
applying, by the processor, a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins, wherein the processor creates a confusion network for each word lattice;
calculating, with the processor, at least one conformity ratio from the at least one confusion network, wherein the processor calculates a conformity ratio for each confusion network;
calculating, with the processor, a transcription quality score from the at least one conformity ratio;
filtering, by the processor, the plurality of confusion networks based upon the calculated transcription quality score of each confusion network;
selecting, by the processor, those confusion networks from the plurality of confusion networks having a transcription quality score greater than a predetermined value; and
storing, by the processor, the selected confusion networks as a plurality of high quality transcriptions.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
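The claims do not define the conformity ratio or how the quality score is derived from it, so the sketch below substitutes a HYPOTHETICAL definition purely to show the score, filter, and select flow: the ratio is the share of word-winning bins whose top posterior reaches an assumed floor of 0.5, the score is that ratio scaled to 0 to 100, and a network is kept only when its score is strictly greater than the predetermined value. The names `conformity_ratio`, `quality_score`, and `select_high_quality`, the `<eps>` token, and the floor are all illustrative assumptions, not taken from the patent.

```python
# Hypothetical token standing in for the epsilon (no-word) entry.
EPS = "<eps>"


def conformity_ratio(bins, floor=0.5):
    """HYPOTHETICAL: fraction of word-winning bins whose top posterior >= floor."""
    word_bins = [b for b in bins if max(b, key=b.get) != EPS]
    if not word_bins:
        return 0.0
    confident = sum(1 for b in word_bins if max(b.values()) >= floor)
    return confident / len(word_bins)


def quality_score(ratio):
    """HYPOTHETICAL mapping of a conformity ratio onto a 0-100 score."""
    return 100.0 * ratio


def select_high_quality(networks, threshold):
    """Keep networks whose score is strictly greater than the predetermined value."""
    return [cn for cn in networks if quality_score(conformity_ratio(cn)) > threshold]


# Each network is a list of bins; each bin maps words (or EPS) to posteriors.
good = [{"hello": 0.9, EPS: 0.1}, {"world": 0.8, "word": 0.2}]
poor = [{"hello": 0.4, "yellow": 0.35, EPS: 0.25}, {"world": 0.8, "word": 0.2}]

selected = select_high_quality([good, poor], threshold=75.0)
print(len(selected))  # 1
```

With a threshold of 75, the first network (score 100) is kept and the second (score 50) is dropped; persisting the selected networks as high quality transcriptions is left out of the sketch.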
11. A system for automated evaluation of transcription quality, the system comprising:
an audio data source upon which a plurality of audio data files are stored;
a processor that receives the plurality of audio data files, segments the audio data files into a plurality of utterances, and applies at least one transcription model to the plurality of utterances to transcribe the plurality of utterances into at least one word lattice, wherein each of the plurality of utterances is transcribed into a respective word lattice;
a non-transient computer readable medium communicatively connected to the processor and programmed with computer readable code that, when executed by the processor, causes the processor to:
apply a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins, wherein a confusion network is created for each word lattice;
calculate at least one conformity ratio from the at least one confusion network, wherein a conformity ratio is calculated for each confusion network;
calculate a transcription quality score from the at least one conformity ratio;
filter the plurality of confusion networks based upon the calculated transcription quality score of each confusion network;
select those confusion networks from the plurality of confusion networks having a transcription quality score greater than a predetermined value; and
store the selected confusion networks as a plurality of high quality transcriptions.
12. A non-transient computer readable medium programmed with computer readable code that upon execution by a processor causes the processor to:
obtain audio data;
segment the audio data into a plurality of utterances with a voice activity detector;
transcribe the plurality of utterances into at least one word lattice with a large vocabulary continuous speech recognition system, each of the plurality of utterances being transcribed into a respective word lattice;
apply a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins, wherein a confusion network is created for each word lattice;
calculate at least one conformity ratio from the at least one confusion network, wherein a conformity ratio is calculated for each confusion network;
calculate a transcription quality score from the at least one conformity ratio;
filter the plurality of confusion networks based upon the calculated transcription quality score of each confusion network;
select those confusion networks from the plurality of confusion networks having a transcription quality score greater than a predetermined value; and
store the selected confusion networks as a plurality of high quality transcriptions.
Dependent claims: 13, 14.
Specification