System and method for synthesizing multiplexed speech and text at a receiving terminal
Abstract
The reception terminal receives a code series from the communication path. The separator separates the code series into a speech code series and text information. The synthesizer decodes the speech code series into a pitch period, an LSP coefficient, and code numerals, and reproduces the speech sound using the CELP system. The language analyzer converts the text information into pronunciation and accent information, and the prosody generator adds prosody information such as phoneme time length and pitch pattern. The LSP coefficient and code numerals suitable for each phoneme are read from the segment database and, together with the pitch frequency from the prosody information, are input to the synthesizer and synthesized into speech sound.
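As an illustration of the idea in the abstract, the following minimal sketch shows how both paths could deliver the same kind of parameter frame (pitch period, LSP coefficients, excitation code) to a single synthesizer. The names `SynthFrame`, `SEGMENT_DB`, `frames_from_speech_code`, and `frames_from_text`, the 8000 Hz sample rate, and all numeric values are assumptions made for this example; they do not come from the patent.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SynthFrame:
    """One frame of synthesizer input shared by both paths (hypothetical layout)."""
    pitch_period: int          # samples per pitch cycle
    lsp: Tuple[float, ...]     # line spectral pair coefficients
    excitation_code: int       # index into an excitation codebook


# Toy segment database (invented values): phoneme -> (LSP coefficients, code).
SEGMENT_DB = {
    "a": ((0.10, 0.25, 0.40), 3),
    "k": ((0.15, 0.30, 0.55), 7),
}


def frames_from_speech_code(code_series):
    """Speech path: each received code already carries all three parameters."""
    return [SynthFrame(p, tuple(lsp), c) for (p, lsp, c) in code_series]


def frames_from_text(phonemes_with_pitch, sample_rate=8000):
    """Text path: spectrum comes from the segment database, pitch from prosody."""
    frames = []
    for phoneme, pitch_hz in phonemes_with_pitch:
        lsp, code = SEGMENT_DB[phoneme]
        frames.append(SynthFrame(round(sample_rate / pitch_hz), lsp, code))
    return frames
```

Because both functions return the same frame type, one synthesizer routine can consume either list, which is the property the abstract relies on.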
16 Claims
1. A speech sound communication system comprising:
a transmission terminal having text input means, speech sound input means, speech coding means, and multiplexing means;
a remote reception terminal having reception means, separation means, language analysis means, prosody generation means, and synthesizing means, wherein, said text input means inputs uncoded text information;
said speech sound input means inputs speech sound signals;
said speech coding means converts said inputted speech sound signals into a speech code series;
said multiplexing means multiplexes said uncoded text information and said speech code series into a multiplexed signal for transmission to the remote reception terminal;
said reception means receives said multiplexed signal;
said separation means separates said multiplexed signal into uncoded text information and said speech code series;
said language analysis means analyses said uncoded text information so that said text information is converted to phonetic transcription information;
said prosody generation means converts said phonetic transcription information into phonetic transcription with prosody information;
said synthesizing means synthesizes a speech sound by utilizing said phonetic transcription information with prosody information and converts said speech code series into a speech sound using a format that is the same as a format for converting the text information into speech sound.
Dependent claims: 13, 14, 15
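The multiplexing and separation means of claim 1 can be pictured with a small framing scheme. This is only a sketch under stated assumptions: the claim does not specify any wire format, so the one-byte tags `TEXT` and `SPEECH`, the two-byte length field, and the function names `multiplex` and `separate` are hypothetical.

```python
import struct

TEXT, SPEECH = 0x01, 0x02  # hypothetical channel tags, not from the patent


def multiplex(chunks):
    """Pack (tag, payload) chunks as: 1-byte tag, 2-byte big-endian length, payload."""
    out = bytearray()
    for tag, payload in chunks:
        out += struct.pack(">BH", tag, len(payload)) + payload
    return bytes(out)


def separate(stream):
    """Split a multiplexed stream back into text bytes and speech-code bytes."""
    text, speech = bytearray(), bytearray()
    pos = 0
    while pos < len(stream):
        tag, length = struct.unpack_from(">BH", stream, pos)
        pos += 3  # size of the tag and length header
        payload = stream[pos:pos + length]
        pos += length
        (text if tag == TEXT else speech).extend(payload)
    return bytes(text), bytes(speech)
```

Interleaving tagged chunks lets text and speech codes share one signal while keeping each stream recoverable in its original order at the reception terminal.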
2. A speech sound communication system comprising a transmission terminal having text input means, language analysis means, speech sound input means, speech coding means, multiplexing means, and transmission means;
a remote reception terminal having reception means, separation means, prosody generation means, and synthesizing means, wherein, said text input means inputs text information;
said language analysis means converts said text information into phonetic transcription information;
said speech sound input means inputs speech sound signals;
said speech coding means converts said inputted speech sound signals into a speech code series;
said multiplexing means multiplexes said phonetic transcription information and said speech code series to generate one code series;
said transmission means transmits said generated one code series;
said reception means receives said generated one code series;
said separation means separates said one code series into said phonetic transcription information and said speech code series;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information; and
said synthesizing means converts said speech code series into speech sound using a format that is the same as a format for converting said phonetic transcription information into speech sound.
Dependent claims: 10, 11, 12
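To make the role of the prosody generation means in claim 2 concrete, here is a toy sketch that attaches a duration and a pitch value to each phoneme. The rules used (fixed vowel and consonant durations, a linearly declining pitch) and the name `add_prosody` with its parameters are assumptions for illustration only; the claim does not say how prosody is computed.

```python
def add_prosody(phonemes, base_pitch_hz=120.0, declination_hz=5.0):
    """Attach a duration (ms) and pitch (Hz) to each phoneme using toy rules."""
    vowels = set("aeiou")
    result = []
    for i, ph in enumerate(phonemes):
        # Assumed rule: vowels last longer than consonants.
        duration_ms = 100 if ph in vowels else 60
        # Assumed rule: pitch falls by a fixed step per phoneme.
        pitch_hz = base_pitch_hz - declination_hz * i
        result.append((ph, duration_ms, pitch_hz))
    return result
```

The output pairs each symbol of the phonetic transcription with phoneme time length and pitch, which is the kind of "phonetic transcription information with prosody information" the synthesizing means would then consume.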
3. A speech sound communication system comprising a transmission terminal having text input means, language analysis means, prosody generation means, speech input means, speech coding means, multiplexing means, and transmission means;
a remote reception terminal having reception means, separation means, segment data memory means, segment read-out means and synthesizing means, wherein, said text input means inputs text information;
said language analysis means converts said text information into phonetic transcription information;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
said speech input means inputs speech sound signals;
said speech coding means converts said speech sound signals into a speech code series by analyzing pitch, voicing source characteristics and vocal tract transmission characteristics of the signal to be coded;
said multiplexing means multiplexes said phonetic transcription information with prosody information and said speech code series to generate one code series;
said transmission means transmits said generated one code series;
said reception means receives said generated one code series;
said separation means separates said one code series into said phonetic transcription information with prosody information and said speech code series;
said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
said synthesizing means synthesizes a speech sound by utilizing said phonetic transcription information with prosody information and said segment data;
said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
said synthesizing means converts said speech code series into speech sound using a format that is the same as a format for converting said text information into speech sound.
4. A speech sound communication system comprising:
a transmission terminal having text input means and first transmission means;
a repeater having first reception means, language analysis means and second transmission means; and
a reception terminal having second reception means, prosody generation means, segment data memory means, segment read-out means and synthesizing means;
wherein, said text input means inputs text information, the text information being uncoded;
said first transmission means transmits said uncoded text information to a first communication path;
said first reception means receives said uncoded text information from said first communication path;
said language analysis means converts said uncoded text information into phonetic transcription information;
said second transmission means transmits said phonetic transcription information into a second communication path;
said second reception means receives said phonetic transcription information from said second communication path;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
said synthesizing means synthesizes speech sounds by utilizing said phonetic transcription information with prosody information and said segment data;
said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
said synthesizing means synthesizes speech sounds by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said sound characteristics and by filter processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
Dependent claim 5:
said transmission terminal has speech sound input means, speech coding means and first multiplexing means;
said repeater has first separation means and second multiplexing means; and
said reception terminal has second separation means;
said speech sound input means inputs speech sound signals;
said speech coding means converts said speech sound signals into a speech code series by analyzing pitch, voicing source characteristics and vocal tract transmission characteristics of the signals to be coded;
said first multiplexing means multiplexes said uncoded text information and said speech code series to generate a combined signal;
said first separation means separates said combined signal into said uncoded text information and said speech code series;
said second multiplexing means multiplexes said phonetic transcription information and said speech code series to generate one code series;
said second separation means separates the one code series multiplexed by said second multiplexing means into said phonetic transcription information and said speech code series; and
said synthesizing means converts said speech code series into speech sound using a format that is the same as a format for converting said uncoded text information into speech sound.
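The repeater described above separates the combined signal, runs language analysis on the text portion only, and re-multiplexes the result with the untouched speech codes. A minimal sketch of that flow follows; the list-of-tuples signal representation, the tiny `LEXICON`, and the function names `analyze_language` and `repeater` are hypothetical stand-ins, not the patent's implementation.

```python
# Toy pronunciation lexicon (invented); a real analyzer would also handle accent.
LEXICON = {"hello": ["h", "e", "l", "o"], "world": ["w", "er", "l", "d"]}


def analyze_language(text):
    """Convert text into a flat phoneme list, spelling out unknown words."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON.get(word, list(word)))
    return phonemes


def repeater(combined):
    """Process ("text", str) and ("speech", bytes) items in arrival order."""
    out = []
    for kind, payload in combined:
        if kind == "text":
            # Text is replaced by its phonetic transcription.
            out.append(("phonetic", analyze_language(payload)))
        else:
            # Speech codes pass through the repeater unchanged.
            out.append((kind, payload))
    return out
```

Placing the language analysis in the repeater means the reception terminal needs only prosody generation and synthesis, at the cost of the repeater handling every text message.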
6. A speech sound communication system comprising:
a transmission terminal having text input means and first transmission means;
a repeater having first reception means, language analysis means, prosody generation means and second transmission means; and
a reception terminal having second reception means, segment data memory means, segment read-out means and synthesizing means;
wherein, said text input means inputs text information, the text information being uncoded;
said first transmission means transmits said uncoded text information to a first communication path;
said first reception means receives said uncoded text information from said first communication path;
said language analysis means converts said uncoded text information into phonetic transcription information;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
said second transmission means transmits said phonetic transcription information with prosody information into a second communication path;
said second reception means receives said phonetic transcription information with prosody information from said second communication path;
said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
said synthesizing means synthesizes speech sounds by utilizing said phonetic transcription information with prosody information and said segment data;
said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
said synthesizing means synthesizes speech sounds by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said voicing source characteristics and by filter processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
Dependent claim 7:
said transmission terminal has speech sound input means, speech coding means and first multiplexing means, said repeater has first separation means and second multiplexing means, and said reception terminal has second separation means;
said speech sound input means inputs speech sound signals;
said speech coding means converts said speech sound signals into a speech code series by analyzing pitch, voicing source characteristics and vocal tract transmission characteristics of the signal to be coded;
said first multiplexing means multiplexes said uncoded text information and said speech code series to generate a combined signal;
said first separation means separates said combined signal into said uncoded text information and said speech code series;
said second multiplexing means multiplexes said phonetic transcription information with prosody information and said speech code series to generate one code series;
said second separation means separates said one code series multiplexed by said second multiplexing means into said phonetic transcription information with prosody information and said speech code series; and
said synthesizing means converts said speech code series into speech sound using a format that is the same as a format for converting said uncoded text information into speech sound.
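Claims 4, 6, and 8 recite synthesis by generating a periodic voicing source and filtering it according to vocal tract characteristics. The sketch below shows that source-filter idea in its simplest form: an impulse train whose period would come from the prosody information, passed through an all-pole filter. Using direct-form predictor coefficients `a` in place of stored LSP or other segment data is an assumption of this example, as are the function names.

```python
def voicing_source(period, n_samples, amplitude=1.0):
    """Impulse train: one pulse every `period` samples."""
    return [amplitude if n % period == 0 else 0.0 for n in range(n_samples)]


def vocal_tract_filter(source, a):
    """All-pole filter: y[n] = x[n] - sum over k of a[k] * y[n - 1 - k]."""
    y = []
    for n, x in enumerate(source):
        acc = x
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:
                acc -= coeff * y[n - 1 - k]
        y.append(acc)
    return y
```

For example, `vocal_tract_filter(voicing_source(80, 160), [-0.5])` filters two pitch cycles of an 80-sample period through a one-pole filter. Changing `period` alters the pitch without touching the filter, which is why the prosody information and the segment data can be supplied independently.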
8. A speech sound communication system comprising a transmission terminal having text input means, language analysis means and first transmission means,
a repeater having first reception means, prosody generation means and second transmission means, and a reception terminal having second reception means, segment data memory means, segment read-out means and synthesizing means, wherein, said text input means inputs text information;
said language analysis means converts said text information into phonetic transcription information;
said first transmission means transmits said phonetic transcription information into a first communication path;
said first reception means receives phonetic transcription information from said first communication path;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
said second transmission means transmits said phonetic transcription information with prosody information to a second communication path;
said second reception means receives said phonetic transcription information with prosody information from said second communication path;
said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
said synthesizing means synthesizes speech sounds by using said phonetic transcription information with prosody information and said segment data;
said segment data memory means stores the voicing source characteristics and the vocal tract transmission characteristics information; and
said synthesizing means synthesizes speech sounds by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said voicing source characteristics and by filter-processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
Dependent claim 9:
said transmission terminal has speech sound input means, speech coding means and first multiplexing means, said repeater has first separation means and second multiplexing means, and said reception terminal has second separation means;
said speech sound input means inputs speech sound signals;
said speech coding means converts said speech sound signals into a speech code series by analyzing pitch, voicing source characteristics and vocal tract transmission characteristics of the signal to be coded;
said first multiplexing means multiplexes said phonetic transcription information and said speech code series to generate a combined signal;
said first separation means separates said combined signal into said phonetic transcription information and said sound code series;
said second multiplexing means multiplexes said phonetic transcription information with prosody information and said speech code series to generate one code series;
said second separation means separates said one code series multiplexed by said second multiplexing means into said phonetic transcription information with prosody information and said speech code series; and
said synthesizing means converts said speech code series into speech sound using a format that is the same as a format for converting said uncoded text information into speech sound.
16. A method of communicating speech from a transmitter to a remote receiver comprising the steps of:
(a) converting speech to a speech input signal at a transmission terminal;
(b) converting text to a text input signal that is uncoded at the transmission terminal;
(c) coding the speech input signal according to a coding format;
(d) multiplexing the coded speech input signal with the uncoded text input signal;
(e) transmitting the multiplexed signal to a remote receiver;
(f) receiving at the remote receiver and separating the multiplexed signal into a coded first received signal related to the speech input signal and a second received signal related to the uncoded text input signal;
(g) converting at the remote receiver the second received signal into phonetic transcription;
(h) coding at the remote receiver the phonetic transcription of step (g) according to the same coding format as in step (c); and
(i) decoding at the remote receiver, respectively, (1) the coded first received signal to produce a first speech output signal and (2) the coded phonetic transcription to produce a second speech output signal, wherein the decoding includes a decoding format which is the same for decoding the coded first received signal and for decoding the coded phonetic transcription.
Specification