System and method for synthesizing multiplexed speech and text at a receiving terminal

US 6,516,298 B1
Filed: 04/17/2000
Issued: 02/04/2003
Est. Priority Date: 04/16/1999
Status: Expired due to Fees

Abstract

The reception terminal receives a code series from the communication path, and the separator splits it into a speech code series and text information. The synthesizer decodes the speech code series into a pitch period, an LSP coefficient, and code numerals, and reproduces the speech sound using the CELP system. The language analyzer converts the text information into pronunciation and accent information, and the prosody generator adds prosody information such as phoneme duration and pitch pattern. The LSP coefficient and code numerals suited to each phoneme are read from the segment database and, together with the pitch frequency taken from the prosody information, are input to the synthesizer, which synthesizes them into speech sound.
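
The text path described above can be sketched as a lookup followed by frame assembly. This is a minimal illustration under stated assumptions, not the patented implementation: the `Frame` record, the contents of `SEGMENT_DB`, the 8 kHz sample rate, and the 10 ms frame length are all hypothetical, and every phoneme is treated as voiced.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    pitch_period: int   # samples per pitch cycle, derived from the prosody
    lsp: tuple          # LSP coefficients (vocal tract shape)
    code_index: int     # excitation codebook index ("code numeral")


# Hypothetical segment database: phoneme -> (LSP coefficients, code index).
SEGMENT_DB = {
    "a": ((0.12, 0.35, 0.61), 7),
    "k": ((0.20, 0.44, 0.70), 3),
}


def frames_for_text(phonemes, prosody, sample_rate=8000, frame_ms=10):
    """Build synthesizer input frames for a phoneme sequence.

    prosody holds one (duration_ms, pitch_hz) pair per phoneme.
    """
    frames = []
    for phoneme, (duration_ms, pitch_hz) in zip(phonemes, prosody):
        lsp, code_index = SEGMENT_DB[phoneme]      # read from the segment database
        period = round(sample_rate / pitch_hz)     # pitch frequency -> pitch period
        for _ in range(duration_ms // frame_ms):   # phoneme duration -> frame count
            frames.append(Frame(period, lsp, code_index))
    return frames


frames = frames_for_text(["k", "a"], [(20, 100.0), (30, 200.0)])
```

Because each frame carries the same three fields that the abstract says are decoded from the speech code series, one synthesizer can consume frames from either path.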

16 Claims

1. A speech sound communication system comprising:
- a transmission terminal having text input means, speech sound input means, speech coding means, and multiplexing means;
  
  a remote reception terminal having reception means, separation means, language analysis means, prosody generation means, and synthesizing means, wherein, said text input means inputs uncoded text information;
  
  said speech sound input means inputs speech sound signals;
  
  said speech coding means converts said inputted speech sound signals into a speech code series;
  
  said multiplexing means multiplexes said uncoded text information and said speech code series into a multiplexed signal for transmission to the remote reception terminal;
  
  said reception means receives said multiplexed signal;
  
  said separation means separates said multiplexed signal into uncoded text information and said speech code series;
  
  said language analysis means analyses said uncoded text information so that said text information is converted to phonetic transcription information;
  
  said prosody generation means converts said phonetic transcription information into phonetic transcription with prosody information;
  
  said synthesizing means synthesizes a speech sound by utilizing said phonetic transcription information with prosody information and converts said speech code series into a speech sound using a format that is the same as a format for converting the text information into speech sound.
- Dependent claims (13, 14, 15)
  - 13. A speech sound communication system according to claims 1, 2, 3, 5, 7 or 9 wherein the user can input an arbitrary text into said text input means.
  - 14. A speech sound communication system according to claims 1, 2, 3, 5, 7 or 9 wherein said text input means carries out input by reading out a text from a memory medium, network like Internet, LAN or a data base.
  - 15. A speech sound communication system according to claims 1, 2, 3, 5, 7 or 9 further comprising said parameter input means and in that the user can input parameter values of speech sounds as desired by said parameter input means and said prosody generation means and said segment readout means output values modified in accordance with said parameter values.
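
The multiplexing and separation means of claim 1 can be illustrated with a simple tag-length-value framing. The claims do not specify a wire format, so the tags, the 3-byte header, and the function names below are assumptions made only for this sketch.

```python
import struct

# Hypothetical one-byte tags marking what each chunk carries.
TAG_TEXT, TAG_SPEECH = 0x01, 0x02
HEADER = ">BH"  # tag (1 byte) + payload length (2 bytes, big-endian)


def multiplex(chunks):
    """Join (tag, payload) chunks into one byte series for transmission."""
    out = bytearray()
    for tag, payload in chunks:
        out += struct.pack(HEADER, tag, len(payload)) + payload
    return bytes(out)


def separate(stream):
    """Split a multiplexed byte series back into text and speech code series."""
    text, speech = bytearray(), bytearray()
    pos = 0
    while pos < len(stream):
        tag, length = struct.unpack_from(HEADER, stream, pos)
        pos += struct.calcsize(HEADER)
        payload = stream[pos:pos + length]
        pos += length
        if tag == TAG_TEXT:
            text += payload
        elif tag == TAG_SPEECH:
            speech += payload
        else:
            raise ValueError(f"unknown tag {tag:#x}")
    # Decode only after all text chunks are joined, so a multi-byte
    # character split across chunks is still handled correctly.
    return text.decode("utf-8"), bytes(speech)


signal = multiplex([
    (TAG_SPEECH, b"\x10\x20"),
    (TAG_TEXT, "hello".encode("utf-8")),
    (TAG_SPEECH, b"\x30"),
])
text, speech_codes = separate(signal)
```

Interleaving small chunks this way lets text arrive between speech frames without either stream waiting for the other to finish.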

2. A speech sound communication system comprising a transmission terminal having text input means, language analysis means, speech sound input means, speech coding means, multiplexing means, and transmission means;
- a remote reception terminal having reception means, separation means, prosody generation means, and synthesizing means, wherein, said text input means inputs text information;
  
  said language analysis means converts said text information into phonetic transcription information;
  
  said speech sound input means inputs speech sound signals;
  
  said speech coding means converts said inputted speech sound signals into a speech code series;
  
  said multiplexing means multiplexes said phonetic transcription information and said speech code series to generate one code series;
  
  said transmission means transmits said generated one code series;
  
  said reception means receives said generated one code series;
  
  said separation means separates said one code series into said phonetic transcription information and said speech code series;
  
  said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information; and
  
  said synthesizing means converts said speech code series into speech sound using a format that is the same as a format for converting said phonetic transcription information into speech sound.
- Dependent claims (10, 11, 12)
  - 10. A speech sound communication system according to claims 2, 3, 4, 6 or 8 wherein the user can input an arbitrary text into said text input means.
  - 11. A speech sound communication system according to claims 2, 3, 4, 6 or 8 wherein said text input means carries out input by reading out a text from a memory medium, network like Internet, LAN or a data base.
  - 12. A speech sound communication system according to claims 2, 3, 4, 6 or 8, further comprising a parameter input means and in that the user can input parameter values of speech sounds as desired by said parameter input means and said prosody generation means and said segment read-out means output values modified in accordance with said parameter values.

3. A speech sound communication system comprising a transmission terminal having text input means, language analysis means, prosody generation means, speech input means, speech coding means, multiplexing means, and transmission means;
- a remote reception terminal having reception means, separation means, segment data memory means, segment read-out means and synthesizing means, wherein, said text input means inputs text information;
  
  said language analysis means converts said text information into phonetic transcription information;
  
  said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
  
  said speech input means inputs speech sound signals;
  
  said speech coding means converts said speech sound signals into a speech code series by analyzing pitch, voicing source characteristics and vocal tract transmission characteristics of the signal to be coded;
  
  said multiplexing means multiplexes said phonetic transcription information with prosody information and said speech code series to generate one code series;
  
  said transmission means transmits said generated one code series;
  
  said reception means receives said generated one code series;
  
  said separation means separates said one code series into said phonetic transcription information with prosody information and said speech code series;
  
  said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
  
  said synthesizing means synthesizes a speech sound by utilizing said phonetic transcription information with prosody information and said segment data;
  
  said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
  
  said synthesizing means converts said speech code series into speech sound using a format that is the same as a format for converting said text information into speech sound.

4. A speech sound communication system comprising:
- a transmission terminal having text input means and first transmission means;
  
  a repeater having first reception means, language analysis means and second transmission means; and
  
  a reception terminal having second reception means, prosody generation means, segment data memory means, segment read-out means and synthesizing means;
  
  wherein, said text input means inputs text information, the text information being uncoded;
  
  said first transmission means transmits said uncoded text information to a first communication path;
  
  said first reception means receives said uncoded text information from said first communication path;
  
  said language analysis means converts said uncoded text information into phonetic transcription information;
  
  said second transmission means transmits said phonetic transcription information into a second communication path;
  
  said second reception means receives said phonetic transcription information from said second communication path;
  
  said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
  
  said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
  
  said synthesizing means synthesizes speech sounds by utilizing said phonetic transcription information with prosody information and said segment data;
  
  said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
  
  said synthesizing means synthesizes speech sounds by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said sound characteristics and by filter processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
- Dependent claims (5)
  - 5. A speech sound communication system according to claim 4 wherein:
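
The final element of claim 4 describes source-filter synthesis: a voicing source waveform whose period follows the prosody, shaped by a filter that represents the vocal tract. A minimal sketch follows, assuming an impulse train as the source and direct-form all-pole predictor coefficients as the vocal tract description. Both are simplifications chosen for illustration; the abstract refers to LSP coefficients, and their conversion to filter coefficients is omitted here.

```python
def voicing_source(n_samples, period, amplitude=1.0):
    """Impulse train: one pulse every `period` samples, zeros elsewhere."""
    return [amplitude if n % period == 0 else 0.0 for n in range(n_samples)]


def all_pole_filter(source, coeffs):
    """Apply y[n] = x[n] + sum_k coeffs[k] * y[n-1-k] to the source."""
    out = []
    for n, x in enumerate(source):
        y = x
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                y += a * out[n - 1 - k]
        out.append(y)
    return out


# Pitch period of 4 samples, single-pole "vocal tract" with coefficient 0.5.
source = voicing_source(8, period=4)
speech = all_pole_filter(source, [0.5])
```

Changing `period` alters the perceived pitch without touching the filter, which is why prosody and segment data can be supplied independently.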

6. A speech sound communication system comprising:
- a transmission terminal having text input means and first transmission means;
  
  a repeater having first reception means, language analysis means, prosody generation means and second transmission means; and
  
  a reception terminal having second reception means, segment data memory means, segment read-out means and synthesizing means;
  
  wherein, said text input means inputs text information, the text information being uncoded;
  
  said first transmission means transmits said uncoded text information to a first communication path;
  
  said first reception means receives said uncoded text information from said first communication path;
  
  said language analysis means converts said uncoded text information into phonetic transcription information;
  
  said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
  
  said second transmission means transmits said phonetic transcription information with prosody information into a second communication path;
  
  said second reception means receives said phonetic transcription information with prosody information from said second communication path;
  
  said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
  
  said synthesizing means synthesizes speech sounds by utilizing said phonetic transcription information with prosody information and said segment data;
  
  said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
  
  said synthesizing means synthesizes speech sounds by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said voicing source characteristics and by filter processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
- Dependent claims (7)
  - 7. A speech sound communication system according to claim 6 wherein:

8. A speech sound communication system comprising a transmission terminal having text input means, language analysis means and first transmission means, a repeater having first reception means, prosody generation means and second transmission means, and a reception terminal having second reception means, segment data memory means, segment read-out means and synthesizing means, wherein, said text input means inputs text information;
- said language analysis means converts said text information into phonetic transcription information;
  
  said first transmission means transmits said phonetic transcription information into a first communication path;
  
  said first reception means receives phonetic transcription information from said first communication path;
  
  said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
  
  said second transmission means transmits said phonetic transcription information with prosody information to a second communication path;
  
  said second reception means receives said phonetic transcription information with prosody information from said second communication path;
  
  said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
  
  said synthesizing means synthesizes speech sounds by using said phonetic transcription information with prosody information and said segment data;
  
  said segment data memory means stores the voicing source characteristics and the vocal tract transmission characteristics information; and
  
  said synthesizing means synthesizes speech sounds by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said voicing source characteristics and by filter-processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
- Dependent claims (9)
  - 9. A speech sound communication system according to claim 8 characterized in that:
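
Claims 4, 6 and 8 differ mainly in which node performs language analysis and prosody generation. The sketch below makes that placement explicit. The two stage functions are trivial stand-ins (letters as phonemes, fixed duration and pitch) and the `PLACEMENT` table is a hypothetical summary of the claims, not language taken from them.

```python
def language_analysis(text):
    """Stand-in: treat each letter as one phoneme."""
    return [ch for ch in text.lower() if ch.isalpha()]


def prosody_generation(phonemes):
    """Stand-in: attach a fixed duration (ms) and pitch (Hz) to each phoneme."""
    return [(p, 80, 120.0) for p in phonemes]


# Which stages run at each node, per independent claim.
PLACEMENT = {
    "claim 4": {"terminal": [], "repeater": [language_analysis],
                "receiver": [prosody_generation]},
    "claim 6": {"terminal": [], "repeater": [language_analysis, prosody_generation],
                "receiver": []},
    "claim 8": {"terminal": [language_analysis], "repeater": [prosody_generation],
                "receiver": []},
}


def run(placement, text):
    """Return the data leaving each node (for the receiver: the synthesizer input)."""
    data, trace = text, {}
    for node in ("terminal", "repeater", "receiver"):
        for stage in placement[node]:
            data = stage(data)
        trace[node] = data
    return trace


traces = {claim: run(nodes, "Hi") for claim, nodes in PLACEMENT.items()}
```

In all three arrangements the synthesizer receives the same input; what changes is the payload carried on the first and second communication paths, and therefore how much processing the reception terminal needs.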

16. A method of communicating speech from a transmitter to a remote receiver comprising the steps of:
- (a) converting speech to a speech input signal at a transmission terminal;
  
  (b) converting text to a text input signal that is uncoded at the transmission terminal;
  
  (c) coding the speech input signal according to a coding format;
  
  (d) multiplexing the coded speech input signal with the uncoded text input signal;
  
  (e) transmitting the multiplexed signal to a remote receiver;
  
  (f) receiving at the remote receiver and separating the multiplexed signal into a coded first received signal related to the speech input signal and a second received signal related to the uncoded text input signal;
  
  (g) converting at the remote receiver the second received signal into phonetic transcription;
  
  (h) coding at the remote receiver the phonetic transcription of step (g) according to the same coding format as in step (c); and
  
  (i) decoding at the remote receiver, respectively, (1) the coded first received signal to produce a first speech output signal and (2) the coded phonetic transcription to produce a second speech output signal, wherein the decoding includes a decoding format which is the same for decoding the coded first received signal and for decoding the coded phonetic transcription.
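
Steps (c), (g), (h) and (i) of claim 16 hinge on one idea: speech and text are both brought into the same coded format, so a single decoder serves both. The sketch below illustrates only that idea. The frame format (a `period` and a `gain`), the toy coders, and the vowel rule are assumptions for illustration and are not CELP; multiplexing and transmission, steps (d) to (f), are omitted.

```python
FRAME_LEN = 8  # samples per frame (hypothetical)


def code_speech(samples):
    """Step (c), stand-in coder: one {"period", "gain"} record per full frame."""
    frames = []
    for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        block = samples[i:i + FRAME_LEN]
        # No pitch analysis here: the period is fixed at 4 samples.
        frames.append({"period": 4, "gain": max(abs(s) for s in block)})
    return frames


def to_phonetic(text):
    """Step (g), stand-in: treat each letter as one phoneme."""
    return [ch for ch in text.lower() if ch.isalpha()]


def code_phonetic(phonemes):
    """Step (h): code the phonetic transcription in the same frame format."""
    return [{"period": 2 if p in "aeiou" else 4, "gain": 0.5} for p in phonemes]


def decode(frames):
    """Step (i): one decoder for both coded signals."""
    out = []
    for f in frames:
        out += [f["gain"] if n % f["period"] == 0 else 0.0 for n in range(FRAME_LEN)]
    return out


speech_out = decode(code_speech([0.0, 0.2, -0.9, 0.1] * 2))
text_out = decode(code_phonetic(to_phonetic("Hi")))
```

Since `decode` never needs to know where a frame came from, the receiver can switch between the two outputs without changing its synthesis path.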

Current Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Weizhong, Zhu; Kamai, Takahiro; Matsui, Kenji
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US09/550,891
Time in Patent Office: 1,023 Days
Field of Search: 704/260, 704/219, 704/262, 704/270
US Class Current: 704/260
CPC Class Codes: G10L 13/08 Text analysis or generation...
