Simultaneous plural-voice text-to-speech synthesizer

US 7,249,021 B2
Filed: 12/27/2001
Issued: 07/24/2007
Est. Priority Date: 12/28/2000
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A multiple-voice instructing unit (17) supplies a pitch deformation ratio and a mixing ratio to a multiple-voice synthesis unit (16). The multiple-voice synthesis unit (16) generates a standard voice signal by waveform superimposition based on voice element data read from a voice element database (15) and prosodic information from a voice element selecting unit (14). It then expands or contracts the time base of the standard voice signal, based on the prosodic information and the instruction information from the multiple-voice instructing unit (17), to change the voice pitch, and mixes the standard voice signal with the expanded/contracted voice signal for output via an output terminal (18). Accordingly, multiple speakers can vocalize the same text concurrently without performing text analysis and prosody generation in time division or in parallel, and without adding pitch conversion as post-processing.
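
The pitch change described in the abstract (expanding or contracting the time base of the standard voice signal, then mixing it with the original) can be illustrated with a short Python sketch. It rests on assumptions the patent does not state: the standard voice signal is already divided into pitch periods, each period is resampled by linear interpolation, and periods are repeated or skipped so the output keeps the input's duration. `stretch_period`, `shift_pitch`, and `mix` are hypothetical names, not the patent's implementation.

```python
import math


def stretch_period(period, new_len):
    """Expand or contract the time base of one pitch period to new_len samples
    using linear interpolation (an assumed resampling method)."""
    n = len(period)
    out = []
    for i in range(new_len):
        pos = i * n / new_len              # position in the source period
        j = int(pos)
        frac = pos - j
        nxt = period[j + 1] if j + 1 < n else period[j]  # hold the last sample at the edge
        out.append((1.0 - frac) * period[j] + frac * nxt)
    return out


def shift_pitch(periods, ratio):
    """Raise (ratio > 1) or lower (ratio < 1) the pitch of a signal given as a
    list of pitch periods, keeping the total duration unchanged."""
    total = sum(len(p) for p in periods)
    starts, t = [], 0
    for p in periods:                      # start time of each source period
        starts.append(t)
        t += len(p)
    out = []
    while len(out) < total:
        # pick the source period whose time span covers the current output time
        idx = max(i for i, s in enumerate(starts) if s <= len(out))
        p = periods[idx]
        out.extend(stretch_period(p, max(1, round(len(p) / ratio))))
    return out[:total]


def mix(standard, shifted, mix_ratio):
    """Mix the standard and pitch-shifted signals; mix_ratio is the weight of
    the shifted voice (0.0 = standard voice only)."""
    return [(1.0 - mix_ratio) * a + mix_ratio * b for a, b in zip(standard, shifted)]


# Toy "standard voice": 10 identical sine-shaped pitch periods of 80 samples
# (100 Hz at an assumed 8 kHz sampling rate).
period = [math.sin(2 * math.pi * i / 80) for i in range(80)]
standard = [x for _ in range(10) for x in period]
higher = shift_pitch([period] * 10, 1.5)   # periods shrink to round(80 / 1.5) = 53 samples
voices = mix(standard, higher, 0.5)
print(len(standard), len(higher), len(voices))
```

Resampling a period shortens or lengthens it, which raises or lowers the pitch. In this simple scheme the spectral envelope inside each period is scaled by the same ratio, so the second voice also sounds like a different speaker rather than the same speaker at a new pitch.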

18 Claims

1. A text-to-speech synthesizer for selecting necessary speech segment information from speech segment database based on reading and word class information on input text information and generating a speech signal based on the selected speech segment information, comprising:
- text analyzing means for analyzing the input text information and obtaining reading and word class information;
- prosody generating means for generating prosody information based on the reading and the word class information;
- plural speech instructing means for instructing simultaneous speaking of an identical input text by a plurality of voices; and
- plural speech synthesizing means for generating a plurality of synthesized speech signals based on prosody information from the prosody generating means and speech segment information selected from the speech segment database upon reception of an instruction from the plural speech instructing means.

2. The text-to-speech synthesizer as defined in claim 1, wherein the plural speech synthesizing means comprises:
- waveform overlap-add means for generating a speech signal by waveform overlap-add technique based on the speech segment information and the prosody information;
- waveform expanding/contracting means for expanding or contracting a time base of a waveform of the speech signal generated by the waveform overlap-add means based on the prosody information and the instruction information from the plural speech instructing means and generating a speech signal different in pitch of speech; and
- mixing means for mixing the speech signal from the waveform overlap-add means and the speech signal from the waveform expanding/contracting means.

3. The text-to-speech synthesizer as defined in claim 1, wherein the plural speech synthesizing means comprises:
- a first waveform overlap-add means for generating a speech signal by waveform overlap-add technique based on the speech segment information and the prosody information;
- a second waveform overlap-add means for generating a speech signal by waveform overlap-add technique based on the speech segment information, the prosody information, and the instruction information from the plural speech instructing means at a basic cycle different from that of the first waveform overlap-add means; and
- mixing means for mixing the speech signal from the first waveform overlap-add means and the speech signal from the second waveform overlap-add means.

4. The text-to-speech synthesizer as defined in claim 1, wherein the plural speech synthesizing means comprises:
- a first waveform overlap-add means for generating a speech signal by waveform overlap-add technique based on the speech segment information and the prosody information;
- a second speech segment database for storing speech segment information different from that stored in a first speech segment database as the speech segment database;
- a second waveform overlap-add means for generating a speech signal by waveform overlap-add technique based on speech segment information selected from the second speech segment database, the prosody information, and instruction information from the plural speech instructing means; and
- mixing means for mixing the speech signal from the first waveform overlap-add means and the speech signal from the second waveform overlap-add means.

5. The text-to-speech synthesizer as defined in claim 1, wherein the plural speech synthesizing means comprises:
- waveform overlap-add means for generating a speech signal by waveform overlap-add technique based on the speech segment information and the prosody information;
- waveform expanding/contracting overlap-add means for expanding or contracting a time base of a waveform of the speech signal based on the prosody information and the instruction information from the plural speech instructing means and generating a speech signal by the waveform overlap-add technique; and
- mixing means for mixing the speech signal from the waveform overlap-add means and the speech signal from the waveform expanding/contracting overlap-add means.

6. The text-to-speech synthesizer as defined in claim 1, wherein the plural speech synthesizing means comprises:
- first excitation waveform generating means for generating a first excitation waveform based on the prosody information;
- second excitation waveform generating means for generating a second excitation waveform different in frequency from the first excitation waveform based on the prosody information and the instruction information from the plural speech instructing means;
- mixing means for mixing the first excitation waveform and the second excitation waveform; and
- a synthetic filter for obtaining vocal tract articulatory feature parameters contained in the speech segment information and generating a synthetic speech signal based on the mixed excitation waveform with use of the vocal tract articulatory feature parameters.

7. The text-to-speech synthesizer as defined in claim 2, further comprising a plurality of the waveform expanding/contracting means.

8. The text-to-speech synthesizer as defined in claim 3, further comprising a plurality of the second waveform overlap-add means.

9. The text-to-speech synthesizer as defined in claim 4, further comprising a plurality of the second waveform overlap-add means.

10. The text-to-speech synthesizer as defined in claim 5, further comprising a plurality of the waveform expanding/contracting overlap-add means.

11. The text-to-speech synthesizer as defined in claim 6, further comprising a plurality of the second excitation waveform generating means.

12. The text-to-speech synthesizer as defined in claim 2, wherein the mixing means performs the mixing operation with a mixing ratio based on the instruction information from the plural speech instructing means.

13. The text-to-speech synthesizer as defined in claim 3, wherein the mixing means performs the mixing operation with a mixing ratio based on the instruction information from the plural speech instructing means.

14. The text-to-speech synthesizer as defined in claim 4, wherein the mixing means performs the mixing operation with a mixing ratio based on the instruction information from the plural speech instructing means.

15. The text-to-speech synthesizer as defined in claim 5, wherein the mixing means performs the mixing operation with a mixing ratio based on the instruction information from the plural speech instructing means.

16. The text-to-speech synthesizer as defined in claim 6, wherein the mixing means performs the mixing operation with a mixing ratio based on the instruction information from the plural speech instructing means.

17. A computer readable program storage medium, storing a text-to-speech synthesis processing program for causing the computer, having the text analyzing means, the prosody generating means, the plural speech instructing means, and the plural speech synthesizing means, to perform the functions as defined in claim 1.
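
Claims 3, 8, and 13 place the same speech segment at a different basic cycle in a second overlap-add unit (optionally several such units) and mix the outputs with instructed ratios. A minimal Python sketch, assuming a constant pitch, a single two-period segment shaped by a periodic Hann window (the patent does not name a window), and mixing weights supplied by the caller; `hann`, `overlap_add`, and `plural_voices` are hypothetical names.

```python
import math


def hann(n):
    """Periodic Hann window; copies spaced n/2 apart sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def overlap_add(segment, basic_cycle, length):
    """Place copies of a windowed segment every basic_cycle samples and sum them."""
    out = [0.0] * length
    for start in range(0, length, basic_cycle):
        for i, v in enumerate(segment):
            if start + i < length:
                out[start + i] += v
    return out


def plural_voices(segment, basic_cycle, length, pitch_ratios, mix_ratios):
    """Mix a standard voice with one extra voice per pitch ratio.

    mix_ratios[0] weights the standard voice; mix_ratios[k] weights the voice
    overlap-added at basic_cycle / pitch_ratios[k - 1] (assumed interface)."""
    voices = [overlap_add(segment, basic_cycle, length)]
    for r in pitch_ratios:
        voices.append(overlap_add(segment, max(1, round(basic_cycle / r)), length))
    return [sum(w * v[n] for w, v in zip(mix_ratios, voices)) for n in range(length)]


P = 80                          # assumed basic cycle in samples
segment = hann(2 * P)           # stand-in for a windowed two-period speech segment
flat = overlap_add(segment, P, 800)
print(round(flat[400], 6))      # 1.0: overlapping windows sum to one in steady state
chorus = plural_voices(segment, P, 800, [1.25, 0.8], [0.5, 0.25, 0.25])
```

Listing several pitch ratios corresponds to the plurality of second overlap-add means in claim 8; the weights play the role of the mixing ratio in claim 13.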

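Claims 6, 11, and 16 move the pitch difference into the excitation: two source waveforms at different frequencies are mixed and then passed through one synthesis filter driven by vocal tract parameters. The sketch below assumes impulse-train excitation and an all-pole (LPC-style) filter; the patent only says "synthetic filter", so both choices, and the names `pulse_train`, `all_pole_filter`, and `plural_source_filter`, are illustrative assumptions.

```python
def pulse_train(period, length):
    """Excitation: unit impulses every `period` samples (an assumed voiced source)."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def all_pole_filter(x, a):
    """Synthesis filter y[n] = x[n] - sum_k a_k * y[n-k], with a = [a_1, ..., a_p]
    (an assumed LPC-style vocal tract model)."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def plural_source_filter(period, pitch_ratio, mix_ratio, lpc, length):
    """Mix two excitations of different frequency, then filter them once."""
    first = pulse_train(period, length)
    second = pulse_train(max(1, round(period / pitch_ratio)), length)
    mixed = [(1.0 - mix_ratio) * p + mix_ratio * q for p, q in zip(first, second)]
    return all_pole_filter(mixed, lpc)


print(all_pole_filter([1.0, 0.0, 0.0], [-0.5]))  # [1.0, 0.5, 0.25]
# Hypothetical coefficients: a stable resonance (pole radius sqrt(0.8) < 1).
speech = plural_source_filter(80, 1.5, 0.5, [-1.3, 0.8], 800)
```

Sharing one filter means both voices in this sketch keep the same spectral envelope and differ only in pitch.
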
18. A computer readable program storage medium, storing a text-to-speech synthesis processing program for causing a computer to perform the steps of:
- analyzing input text information and obtaining reading and word class information;
- generating prosody information based on the reading and the word class information;
- instructing simultaneous speaking of an identical input text by a plurality of voices;
- generating a plurality of synthesized speech signals based on prosody information and speech segment information selected from a speech segment database upon reception of an instruction.
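
The steps of claim 18 can be outlined in Python to show the point made in the abstract: text analysis and prosody generation run once, and every voice is derived from that single result. Everything here is a placeholder: `VoiceInstruction`, `analyze_text`, `generate_prosody`, and `plan_voices` are hypothetical, the analysis and prosody stubs return fixed values, the standard voice's weight is assumed to be one minus the instructed ratios, and the waveform synthesis step itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class VoiceInstruction:
    """Hypothetical instruction for one additional voice."""
    pitch_ratio: float   # F0 multiplier relative to the standard voice
    mix_ratio: float     # weight of this voice in the final mix


def analyze_text(text):
    """Placeholder text analysis returning (reading, word class) pairs.
    A real analyzer would use a pronunciation dictionary and morphology."""
    return [(word.lower(), "unknown") for word in text.split()]


def generate_prosody(analysis):
    """Placeholder prosody: a fixed 200 ms duration and 120 Hz F0 per word."""
    return [(reading, 200, 120.0) for reading, _word_class in analysis]


def plan_voices(text, instructions):
    """Analyze the text and generate prosody once, then derive every voice
    from that shared result instead of re-analyzing per voice."""
    prosody = generate_prosody(analyze_text(text))
    standard_weight = 1.0 - sum(ins.mix_ratio for ins in instructions)  # assumed rule
    voices = [("standard", standard_weight, prosody)]
    for ins in instructions:
        scaled = [(r, dur, f0 * ins.pitch_ratio) for r, dur, f0 in prosody]
        voices.append((f"pitch x{ins.pitch_ratio}", ins.mix_ratio, scaled))
    return voices


for label, weight, units in plan_voices("Hello world", [VoiceInstruction(1.5, 0.5)]):
    print(label, weight, units)
```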

Current Assignee
Sharp Kabushiki Kaisha (Hon Hai Precision Industry Co., Ltd.)
Original Assignee
Sharp Kabushiki Kaisha (Hon Hai Precision Industry Co., Ltd.)
Inventors
Morio, Tomokazu; Kimura, Osamu
Primary Examiner(s)
Šmits; Talivaldis Ivars
Assistant Examiner(s)
Serrou; Abdelali

Application Number

US10/451,825
Publication Number

US 20040054537A1
Time in Patent Office

2,035 Days
Field of Search

704/258, 704/259, 704/260
US Class Current

704/258
CPC Class Codes

G10L 13/033   Voice editing, e.g. manipul...

G10L 13/0335   Pitch control

G10L 13/08   Text analysis or generation...