Continuous speech voice transcription
Abstract
A method and apparatus that provide for automatic speech transcription that may be used to transcribe structured reports. The method and apparatus provide for speaker-dependent, continuous speech recognition using a limited vocabulary. Transcription is based on recognition of a vocabulary of sounds followed by a translation to text. The translation to text matches spoken sounds to sounds contained in similar recorded text sequences. Training involves speaking a set of words with the desired sounds imbedded therein. The method and apparatus recognize phrases, not words, and work well in radiology or similar applications, where reports are generated using a very limited vocabulary.
6 Claims
1. A continuous speech voice transcription method comprising the steps of:
recording historical text spoken by an individual relating to a subject;
generating a sound dictionary that comprises sounds contained in the recorded historical text;
creating a sound alphabet by causing an individual to speak predetermined words containing a predefined set of sounds;
speaking text relating to the subject that is to be transcribed;
comparing sequences of sounds found in the sound alphabet that correspond to sequences of sounds found in the spoken text to sequences of sounds contained in the sound dictionary;
when a match between the compared sequences of sounds occurs, locating text that is associated with the sequence of sounds contained in the sound dictionary; and
outputting the associated text from the recorded historical text as a transcription.
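The lookup at the heart of claim 1 can be illustrated with a minimal sketch. It assumes the sound dictionary is a mapping from sound sequences to the historical text they came from, and that a match means exact equality of sequences; the claim does not fix either choice. The function names and the sound labels are hypothetical.

```python
def build_sound_dictionary(historical):
    """Map each sound sequence (as a tuple) to its historical text.

    `historical` is an iterable of (text, sound_sequence) pairs; this
    layout is an assumption made for the illustration.
    """
    return {tuple(sounds): text for text, sounds in historical}


def transcribe(spoken_sequences, dictionary):
    """Return the historical text for every spoken sequence that matches."""
    output = []
    for sounds in spoken_sequences:
        key = tuple(sounds)
        if key in dictionary:  # a match between the compared sequences
            output.append(dictionary[key])
    return output


# Hypothetical sound labels for two previously recorded phrases.
historical = [
    ("lungs clear", ["l", "ah", "ng", "z", "k", "l", "ih", "r"]),
    ("no fracture", ["n", "ow", "f", "r", "ae", "k", "ch", "er"]),
]
dictionary = build_sound_dictionary(historical)

spoken = [["n", "ow", "f", "r", "ae", "k", "ch", "er"], ["z", "z"]]
transcription = transcribe(spoken, dictionary)
```

Only the first spoken sequence has an entry in the dictionary, so only its associated text is output; the unmatched sequence produces nothing.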
collecting sounds contained in the sound alphabet spoken by the individual;
generating a Cepstral template for each sound;
collecting Cepstral coefficients of the template;
averaging the Cepstral coefficients; and
normalizing the Cepstral coefficients by subtracting the average value from the collected Cepstral coefficients.
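The averaging and normalizing steps above can be sketched as follows. The sketch assumes each template is a fixed-length list of Cepstral coefficients and that the average is taken per coefficient position across all templates; the claim language leaves the exact averaging axis open, so this is one plausible reading. The function name and the sample values are hypothetical.

```python
from statistics import fmean


def normalize_templates(templates):
    """Subtract the per-position average from every template.

    `templates` maps a sound label to its list of Cepstral coefficients.
    Averaging per coefficient position is an assumption of this sketch.
    """
    labels = list(templates)
    n_coeffs = len(templates[labels[0]])
    # Average value of each coefficient position over all collected templates.
    means = [fmean(templates[label][i] for label in labels) for i in range(n_coeffs)]
    # Normalize by subtracting the average from the collected coefficients.
    return {
        label: [c - m for c, m in zip(templates[label], means)]
        for label in labels
    }


# Made-up coefficient values for two sounds.
templates = {"ah": [1.0, 2.0, 3.0], "ee": [3.0, 2.0, 1.0]}
normalized = normalize_templates(templates)
```

After normalization, each coefficient position averages to zero across the templates, which removes any offset common to all of them.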
6. The method of claim 1 wherein the step of comparing sequences of sounds comprises the steps of:
comparing a first sound contained in a spoken sound sequence with a first sound contained in a text generated sound sequence;
when the comparison generates a match, stepping to the next sound in each sequence and comparing the next sounds of each sequence;
when the comparison does not generate a match, comparing the nonmatching sound of one sequence with the next sound of the other sequence and comparing the nonmatching sound of the other sequence with the next sound of the one sequence until a match is found;
when a match is found, stepping to the next sound in each sequence; and
repeating the last three steps until the end of either sequence is reached to determine if sounds of one sequence are imbedded in the other sound sequence.
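The comparison loop of claim 6 can be sketched as below. The sketch assumes sounds are compared by simple equality of labels, that the look-ahead alternates between the two sequences one step at a time, and that the comparison stops if no later match exists; none of these details is fixed by the claim. The function names and sound labels are hypothetical.

```python
def count_matches(spoken, reference):
    """Walk both sequences in step, resynchronizing after a mismatch."""
    i = j = matched = 0
    while i < len(spoken) and j < len(reference):
        if spoken[i] == reference[j]:
            # Match: step to the next sound in each sequence.
            matched += 1
            i += 1
            j += 1
            continue
        # Mismatch: compare the nonmatching sound of each sequence with
        # successively later sounds of the other until a match is found.
        step = 1
        resynced = False
        while i + step < len(spoken) or j + step < len(reference):
            if j + step < len(reference) and spoken[i] == reference[j + step]:
                j += step
                resynced = True
                break
            if i + step < len(spoken) and spoken[i + step] == reference[j]:
                i += step
                resynced = True
                break
            step += 1
        if not resynced:
            break  # assumption: give up when no later sound matches
    return matched


def is_embedded(spoken, reference):
    """True when every sound of the shorter sequence was matched in order."""
    return count_matches(spoken, reference) == min(len(spoken), len(reference))


found = is_embedded(["k", "ae", "t"], ["dh", "ax", "k", "ae", "t", "s"])
missing = is_embedded(["k", "o", "t"], ["k", "ae", "t"])
```

In the first call the spoken sounds appear in order inside the longer reference sequence, so they count as imbedded; in the second call the middle sound never resynchronizes, so they do not.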
3. Continuous speech voice transcription apparatus comprising:
a microphone;
a sound processor comprising:
a memory for storing historical text spoken by an individual relating to a subject;
a sound dictionary that comprises sounds contained in the recorded historical text;
a sound alphabet translator formed by causing an individual to speak predetermined words containing a predefined set of sounds; and
a comparator for comparing sequences of sounds found in the sound alphabet that correspond to sequences of sounds found in the spoken text to sequences of sounds contained in the sound dictionary, and for locating text that is associated with the sequence of sounds contained in the sound dictionary when a match between the compared sequences of sounds occurs; and
apparatus for outputting the associated text from the recorded historical text as a transcription.
4. A method of translating sounds spoken into a microphone by an individual into a sound alphabet comprising the steps of:
converting sound spoken into the microphone into electrical signals;
processing electrical signals corresponding to short segments of sound using a fast Fourier transform to generate energy as a function of frequency for each of the short segments of sound;
processing the energy produced by the fast Fourier transform to generate Cepstral coefficients representative of each of the short segments of sound; and
finding sounds in a training set of sounds spoken by the individual that have a set of Cepstral coefficients most like those of the Cepstral coefficients contained in the spoken sounds to produce the sound alphabet.
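The signal-processing chain of claim 4 can be sketched in a few functions. The sketch assumes a frame length that is a power of two, Cepstral coefficients computed as a cosine transform of the log energy spectrum, and Euclidean distance as the measure of "most like"; the claim names none of these specifics. All function names, the `tone` helper, and the synthetic test signals are hypothetical.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 fast Fourier transform (length must be a power of two)."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cepstral_coefficients(frame, n_coeffs=8):
    """Energy as a function of frequency, then a cosine transform of its log."""
    half = fft(frame)[: len(frame) // 2 + 1]
    # Small constant avoids log(0) for empty frequency bins.
    log_energy = [math.log(abs(v) ** 2 + 1e-12) for v in half]
    size = len(log_energy)
    return [
        sum(e * math.cos(math.pi * m * (k + 0.5) / size) for k, e in enumerate(log_energy))
        for m in range(n_coeffs)
    ]


def nearest_sound(frame, training):
    """Label of the training sound whose coefficients are closest to the frame's."""
    coeffs = cepstral_coefficients(frame)
    return min(training, key=lambda label: math.dist(coeffs, training[label]))


def tone(bin_index, n=64, amplitude=1.0):
    """Synthetic stand-in for a short segment of digitized microphone signal."""
    return [amplitude * math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]


training = {
    "low": cepstral_coefficients(tone(4)),
    "high": cepstral_coefficients(tone(12)),
}
label = nearest_sound(tone(4, amplitude=0.9), training)
```

Running `nearest_sound` on each short segment in turn yields a stream of labels, which is one way to read "produce the sound alphabet" for the spoken input.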
5. A method of creating a sound alphabet comprising the steps of:
Fourier transform processing sounds spoken by an individual to produce energy as a function of frequency for the sounds;
processing the energy produced by Fourier transform processing to generate Cepstral coefficients representative of the sounds;
comparing Cepstral coefficients of sounds from a training set of sounds spoken by the individual to Cepstral coefficients contained in each of the spoken sounds to produce the sound alphabet;
deleting short sounds from the sound alphabet except for plosives; and
deleting repetitions of the sounds in the sound alphabet after a predetermined number of repetitions.
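The two deleting steps of claim 5 can be sketched on a stream of per-segment sound labels. The sketch assumes that a "short" sound is a run of identical labels below a minimum length and that plosives are identified by membership in a fixed label set; the thresholds, the label set, and the function name are hypothetical.

```python
from itertools import groupby

# Assumed set of plosive labels; the claim does not enumerate them.
PLOSIVES = {"p", "b", "t", "d", "k", "g"}


def prune_sound_stream(labels, min_run=2, max_run=3):
    """Drop short non-plosive runs and cap long runs of a repeated sound."""
    pruned = []
    for label, group in groupby(labels):
        run_length = len(list(group))
        # Delete short sounds, except for plosives, which are naturally brief.
        if run_length < min_run and label not in PLOSIVES:
            continue
        # Delete repetitions beyond the predetermined number.
        pruned.extend([label] * min(run_length, max_run))
    return pruned


stream = ["s", "aa", "aa", "aa", "aa", "aa", "t", "m", "m"]
pruned = prune_sound_stream(stream)
```

Here the single "s" is dropped as a short sound, the five repetitions of "aa" are cut to three, the single plosive "t" survives, and the two-segment "m" is kept unchanged.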
Specification