Low bit rate speech coding system and compression
First Claim
1. A speech coder apparatus for encoding input speech signals for transmission over a communication channel at bit rates of 100 bits per second or less, comprising:
- transmitting means responsive to an input speech signal for providing a first and a second output signal for transmission, said transmitting means including;
continuous speech recognition means having a first output and a second output, said continuous speech recognition means having a memory for storing templates and means responsive to said stored templates to provide at an output, digital signals indicative of recognized words in said input speech signal as those matching said stored templates with said digital signals providing said first output signal and providing at a second output a word end point signal wherein each of said recognized works in said input speech signal has a value of pitch, duration and amplitude; and
front end processing means having an input and an output, said front end processing means responsive to said input speech signal for providing at said output of said front end processing means, digitized speech samples during a given frame interval including side information encoding means responsive to said digitized speech samples and capable of determining value of pitch, duration and amplitude, said side information encoding means having an input coupled to said second output of said continuous speech recognition means and operably responsive thereto, to provide at an output of said side information encoding means a signal indicative of at least the value of the pitch and duration for each word recognized by said continuous speech recognition means with said output of said side information encoding means providing said second output signal for transmission and wherein said side information encoding means includes means for comparing and determining differences of values of said pitch and duration of each recognized word with values of pitch and duration as stored in a memory associated therewith to provide an output parameter signal indicative of said differences.
1 Assignment
0 Petitions
Accused Products
Abstract
A speech coder apparatus operates to compress speech signals to a low bit rate. The apparatus includes a continuous speech recognizer (CSR) which has a memory for storing templates. Input speech is processed by the CSR where information in the speech is compared against the templates to provide an output digital signal indicative of recognized words, which signal is transmitted along a first path. There is further included a front end processor which is also responsive to the input speech signal for providing output digitized speech samples during a given frame interval. A side information encoder circuit responds to the output from the front end processor to provide at the output of the encoder a parameter signal indicative of the value of the pitch and word duration for each word as recognized by the CSR unit. The output of the encoder is transmitted as a second signal. There is a receiver which includes a synthesizer responsive to the first and second transmitted signals for providing an output synthesized signal for each recognized word where the pitch, duration and amplitude of the synthesized signal is changed according to the parameter signal to preserve the quality of the synthesized speech.
119 Citations
34 Claims
-
1. A speech coder apparatus for encoding input speech signals for transmission over a communication channel at bit rates of 100 bits per second or less, comprising:
-
transmitting means responsive to an input speech signal for providing a first and a second output signal for transmission, said transmitting means including; continuous speech recognition means having a first output and a second output, said continuous speech recognition means having a memory for storing templates and means responsive to said stored templates to provide at an output, digital signals indicative of recognized words in said input speech signal as those matching said stored templates with said digital signals providing said first output signal and providing at a second output a word end point signal wherein each of said recognized works in said input speech signal has a value of pitch, duration and amplitude; and front end processing means having an input and an output, said front end processing means responsive to said input speech signal for providing at said output of said front end processing means, digitized speech samples during a given frame interval including side information encoding means responsive to said digitized speech samples and capable of determining value of pitch, duration and amplitude, said side information encoding means having an input coupled to said second output of said continuous speech recognition means and operably responsive thereto, to provide at an output of said side information encoding means a signal indicative of at least the value of the pitch and duration for each word recognized by said continuous speech recognition means with said output of said side information encoding means providing said second output signal for transmission and wherein said side information encoding means includes means for comparing and determining differences of values of said pitch and duration of each recognized word with values of pitch and duration as stored in a memory associated therewith to provide an output parameter signal indicative of said differences. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
-
-
18. A method for coding speech signals for providing compression of such speech signals to permit transmission of speech over a communication channel at bit rates of 100 per second or less, comprising the steps of:
-
comparing input speech with word templates stored in a memory to provide a coding indicative of recognized word data samples upon a favorable comparison; transmitting said coding indicative of recognized word data samples over a first path; simultaneously processing said input speech in a processor for each recognized word to provide an output parameter indicative of differences of values of pitch and duration data for each transmitted word with values of pitch and duration as stored in a memory associated therewith; transmitting said output parameters indicative of said differences of values of pitch and duration data over a second path; receiving said transmitted coding indicative of said recognized word data samples and said output parameters indicative of said differences of values of pitch and duration data; synthesizing said received coding indicative of said recognized word data according to words stored in a library memory to provide a replication of said recognized word data; and using said transmitted output parameters indicative of said differences of values of pitch and duration data to change the pitch and duration data of said words as stored in said library memory to provide a synthesized pitch and duration for each word. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27)
-
-
28. A speech coder apparatus for encoding input speech signals for transmission over a communication channel at low bit rates, comprising:
-
transmitting means responsive to an input speech signal for providing a first and a second output signal for transmission, said transmitting means including; continuous speech recognition means having a first output and a second output, said continuous speech recognition means having a memory for storing templates and means responsive to said stored templates to provide, at said first output, digital signals indicative of recognized words in said input speech signal as those matching said stored templates, with said digital signals providing said first output signal, and providing, at said second output, a word end point signal wherein each of said recognized words in said input speech signal has a value of pitch, duration and amplitude; front end processing means having an input and an output, said front end processing means responsive to said input speech signal for providing at said output of said front end processing means, digitized speech samples during a given frame interval including side information encoding means responsive to said digitized speech samples and capable of determining values of pitch, duration and amplitude, said side information encoding means having an input coupled to said second output of said continuous speech recognition means and operably responsive thereto, to provide at an output of said side information encoding means a signal indicative of at least the value of the pitch and duration for each word recognized by said continuous speech recognition means with said output of said side information encoding means includes means for comparing and determining differences of values of said pitch and duration of each recognized word with values of pitch and duration as stored in a memory associated therewith to provide an output parameter signal indicative of said differences; and receiving means responsive to said first and second output signals as transmitted to provide at an output a synthesized speech signal, said receiving means including; a synthesizer means responsive to said first and second output signals and having a pre-recorded word memory coupled to said synthesizer and having stored therein values of the pitch, duration and amplitude of a library of words as those words that can be recognized by said continuous speech recognition means, said synthesizer having means for processing said first and second output signals in conjunction with said values from said pre-recorded word memory to change the pitch, duration and amplitude of received words in said first output signal according to said second output signal, wherein said synthesizer means includes means for converting received speech signals via said first output signal into N sets of M signals with each signal including said parameter signal, wherein N and M are positive integers greater than one, and wherein there are 240 (M) samples for each set of four sets (N) of coded excitation constituting one frame. - View Dependent Claims (29, 30, 31)
-
-
32. A speech coder apparatus for encoding input speech signals for transmission over a communication channel at low bit rates, comprising:
-
transmitting means responsive to an input speech signal for providing a first and a second output signal for transmission, said transmitting means including; continuous speech recognition means having a first output and a second output, said continuous speech recognition means having a memory of storing templates and means responsive to said stored templates to provide, at said first output, digital signals indicative of recognized words in said input speech signal as those matching said stored templates, with said digital signals providing said first output signal, and providing, at said second output, a word end point signal wherein each of said recognized words in said input speech signal has a value of pitch, duration and amplitude; front end processing means having an input and an output, said front end processing means responsive to said input speech signal for providing at said output of said front end processing means, digitized speech samples during a given frame interval including side information encoding means responsive to said digitized speech samples and capable of determining values of pitch, duration and amplitude, said side information encoding means having an input coupled to said second output of said continuous speech recognition means and operably responsive thereto, to provide at an output of said side information encoding means a signal indicative of at least the value of the pitch and duration for each word recognized by said continuous speech recognition means with said output of said side information encoding means providing said second output signal for transmission and wherein said side information encoding means includes means for comparing and determining differences of values of said pitch and duration of each recognized word with values of pitch and duration as stored in a memory associated therewith to provide an output parameter signal indicative of said differences; receiving means responsive to said first and second output signals as transmitted to provide at an output a synthesized speech signal, said receiving means including; a synthesizer means responsive to said first and second output signals and having a pre-recorded word memory coupled to said synthesizer and having stored therein values of the pitch, duration and amplitude of a library of words as those words that can be recognized by said continuous speech recognition means, said synthesizer having means for processing said first and second output signals in conjunction with said values from said pre-recorded word memory to change the pitch, duration and amplitude of received words in said first output signal according to said second output signal, wherein said synthesizer means includes means for converting received speech signals via said first output signal into N sets of M signals with each signal including said parameter signal, wherein N and M are positive integers greater than one; and means for determining a long-term delay for a frame, , and duration changing means, said duration changing means responsive to said second output signal and responsive to at least one set of N sets of M signals to add or delete to said M signals, multiple sets of samples, each set of samples containing a number of samples which is the same as the number of the long term delay, for the frame to increase or decrease the duration of a word. - View Dependent Claims (33, 34)
-
Specification