Symbiotic automatic speech recognition and vocoder
First Claim
1. A method for automatic speech recognition (ASR) and vocoding (VC), comprising the steps of:
converting a first signal representing speech to a second signal having raw mel cepstrum vector (MCV) and a third signal having raw pitch;
subtracting a calibration vector from said MCV to form a difference vector;
multiplying a calibration matrix with said difference vector to produce a recalibrated MCV;
recalibrating said raw pitch with a logarithmic function;
concatenating said recalibrated MCV with said recalibrated pitch to form a recalibrated vector;
compressing and quantizing said recalibrated vector to form a vector quantized signal; and
forwarding said vector quantized signal to a remote receiver for decoding said vector quantized signal received by the remote receiver to recover said speech.
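The recalibration and quantization steps of this claim can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the natural logarithm for pitch, the assumption of a strictly positive (voiced) pitch value, and the nearest-neighbor codebook search standing in for "compressing and quantizing" are all assumptions made for illustration; the claim does not specify them.

```python
import math

def recalibrate_mcv(raw_mcv, calib_vector, calib_matrix):
    """Subtract the calibration vector, then multiply by the calibration matrix."""
    diff = [x - c for x, c in zip(raw_mcv, calib_vector)]
    return [sum(m * d for m, d in zip(row, diff)) for row in calib_matrix]

def recalibrate_pitch(raw_pitch_hz):
    """Map raw pitch through a logarithm (natural log is an assumption).

    Assumes raw_pitch_hz > 0; handling of unvoiced frames is not shown.
    """
    return math.log(raw_pitch_hz)

def quantize(vector, codebook):
    """Return the index of the codebook entry closest to `vector`
    (squared Euclidean distance; a hypothetical stand-in for the
    claim's compression and quantization step)."""
    def dist(entry):
        return sum((v - e) ** 2 for v, e in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def encode_frame(raw_mcv, raw_pitch_hz, calib_vector, calib_matrix, codebook):
    """Recalibrate MCV and pitch, concatenate them, and quantize the result."""
    recalibrated = recalibrate_mcv(raw_mcv, calib_vector, calib_matrix)
    recalibrated.append(recalibrate_pitch(raw_pitch_hz))
    return quantize(recalibrated, codebook)
```

In this sketch the only value forwarded to the remote receiver is the codebook index, which is what keeps the channel bandwidth low. The codebook is assumed to be shared by sender and receiver.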
Abstract
The device and method of the invention receive a digital speech signal, which an Acoustic Processor converts into a Mel-Cepstrum Vector and Pitch. These are recalibrated and encoded. The encoded signal is transmitted over a narrow-band Channel, then decoded, split, and recalibrated. Of the split signals, one feeds a Statistical Processor, which produces Recognized Text, and another feeds a Regenerator, which produces Regenerated Speech. The device and method according to the invention achieve: simultaneous very perceptive Automatic Speech Recognition and high-quality VoCoding, using Speech communicated or stored via a narrow-bandwidth Channel; very perceptive Automatic Speech Recognition on a Client & Server system without the need to store or communicate wide-bandwidth Speech signals; very perceptive Automatic Speech Recognition with Deferred Review and Editing without storage of wide-bandwidth Speech signals; better feedback in a system for Automatic Speech Recognition, particularly for Deferred Automatic Speech Recognition; and good usability for unified Automatic Speech Recognition and VoCoding.
13 Claims
1. A method for automatic speech recognition (ASR) and vocoding (VC), comprising the steps of:
converting a first signal representing speech to a second signal having raw mel cepstrum vector (MCV) and a third signal having raw pitch;
subtracting a calibration vector from said MCV to form a difference vector;
multiplying a calibration matrix with said difference vector to produce a recalibrated MCV;
recalibrating said raw pitch with a logarithmic function;
concatenating said recalibrated MCV with said recalibrated pitch to form a recalibrated vector;
compressing and quantizing said recalibrated vector to form a vector quantized signal; and
forwarding said vector quantized signal to a remote receiver for decoding said vector quantized signal received by the remote receiver to recover said speech.
Dependent claims: 2, 3, 4, 5, 6
7. A method of decoding vector quantized data representing speech, comprising the steps of:
dequantizing and decompressing said vector quantized data including acoustic data substantially independent of phonemic information into a mel-cepstrum vector (MCV), a recalibrated MCV, and pitch;
adding said MCV with a calibration vector;
statistically processing said sum vector into text; and
regenerating said calibration MCV by frequency domain transformation into speech.
Dependent claims: 8, 9, 10, 11
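The first two decoding steps of claim 7 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented decoder: it assumes the vector quantized data is a single index into a codebook shared with the encoder, that the last element of each codebook entry is a log-pitch value, and that pitch is recovered with an exponential (the inverse of a natural-log recalibration). The statistical text recognition and the frequency domain regeneration of speech are not shown, and all function names are hypothetical.

```python
import math

def dequantize(index, codebook):
    """Look up the codebook entry for a received index (assumed format)."""
    return list(codebook[index])

def decode_frame(index, codebook, calib_vector):
    """Split a dequantized vector into MCV and pitch, then add the
    calibration vector to the MCV to form the sum vector."""
    vector = dequantize(index, codebook)
    mcv, log_pitch = vector[:-1], vector[-1]
    sum_vector = [m + c for m, c in zip(mcv, calib_vector)]
    # Assumption: pitch was encoded with a natural log, so exp() inverts it.
    pitch_hz = math.exp(log_pitch)
    return sum_vector, pitch_hz
```

In terms of the claim, `sum_vector` is what would be passed to the statistical processing step, while the MCV and pitch would drive speech regeneration.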
12. A program storage device having stored program instructions executable by a computer to perform method steps for automatic speech recognition (ASR) and vocoding (VC), the method steps comprising:
converting a first signal representing speech to a second signal having raw mel cepstrum vector (MCV) and a third signal having raw pitch;
subtracting a calibration vector from said MCV to form a difference vector;
multiplying a calibration matrix with said difference vector to produce a recalibrated MCV;
recalibrating said raw pitch with a logarithmic function;
concatenating said recalibrated MCV with said recalibrated pitch to form a recalibrated vector;
compressing and quantizing said recalibrated vector to form a vector quantized signal; and
forwarding said vector quantized signal to a remote receiver for decoding said vector quantized signal received by the remote receiver to recover said speech.
13. A program storage device having stored program instructions executable by a computer to perform method steps for decoding vector quantized data representing speech, the method comprising the steps of:
dequantizing and decompressing said vector quantized data into a mel-cepstrum vector (MCV), a recalibrated MCV, and pitch;
adding said MCV with a calibration vector;
statistically processing said sum vector into text; and
regenerating said calibration MCV by frequency domain transformation into speech.
Specification