Text voice synthesis device and program recording medium

US 20040054537A1
Filed: 06/26/2003
Published: 03/18/2004
Est. Priority Date: 12/28/2000
Status: Active Grant

Abstract

A multiple-voice instructing unit (17) specifies a pitch deforming ratio and a mixing ratio to a multiple-voice synthesis unit (16). The multiple-voice synthesis unit (16) generates a standard voice signal by waveform superimposition, based on voice element data read from a voice element database (15) and prosodic information from a voice element selecting unit (14). It then expands or contracts the time base of the standard voice signal, based on the prosodic information and the instruction information from the multiple-voice instructing unit (17), to change the voice pitch, and mixes the standard voice signal with the expanded/contracted voice signal for output via an output terminal (18). Concurrent vocalization of the same text by multiple speakers can therefore be implemented without time-division or parallel processing of text analysis and prosody generation, and without adding pitch conversion as post-processing.
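
The expand/contract-and-mix step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function names `resample_time_base` and `mix`, the use of linear interpolation, and the zero-padding of the shorter signal are all assumptions made for illustration. The sketch also resamples the whole signal at once, so the shifted voice changes duration, whereas the abstract describes the adjustment as being guided by prosodic information.

```python
import math


def resample_time_base(signal, pitch_ratio):
    """Expand or contract the time base by linear interpolation (assumed method).

    pitch_ratio > 1 contracts the waveform (higher pitch, fewer samples);
    pitch_ratio < 1 expands it (lower pitch, more samples).
    """
    if pitch_ratio <= 0:
        raise ValueError("pitch_ratio must be positive")
    if not signal:
        return []
    out_len = max(1, int(round(len(signal) / pitch_ratio)))
    out = []
    for n in range(out_len):
        pos = n * pitch_ratio          # read position in the source signal
        i = int(pos)
        if i >= len(signal) - 1:
            out.append(signal[-1])     # hold the last sample past the end
        else:
            frac = pos - i
            out.append((1.0 - frac) * signal[i] + frac * signal[i + 1])
    return out


def mix(standard, shifted, mixing_ratio):
    """Weighted sum of two voices; mixing_ratio is the weight of `shifted`."""
    length = max(len(standard), len(shifted))
    a = list(standard) + [0.0] * (length - len(standard))  # zero-pad (assumption)
    b = list(shifted) + [0.0] * (length - len(shifted))
    return [(1.0 - mixing_ratio) * x + mixing_ratio * y for x, y in zip(a, b)]


if __name__ == "__main__":
    # Hypothetical stand-in for the standard voice signal: a 100 Hz tone at 8 kHz.
    standard = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
    second_voice = resample_time_base(standard, pitch_ratio=1.2)
    two_voices = mix(standard, second_voice, mixing_ratio=0.5)
    print(len(standard), len(second_voice), len(two_voices))
```

Because both voices derive from one standard signal, text analysis and prosody generation run only once in this sketch, which is the saving the abstract points to.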

17 Claims

1. A text-to-speech synthesizer for selecting necessary speech segment information from speech segment database based on reading and word class information on input text information and generating a speech signal based on the selected speech segment information, comprising:
- text analyzing means (12) for analyzing the input text information and obtaining reading and word class information;
  
  prosody generating means (13) for generating prosody information based on the reading and the word class information;
  
  plural speech instructing means (17) for instructing simultaneous speaking of an identical input text by a plurality of voices; and
  
  plural speech synthesizing means (16) for generating a plurality of synthesized speech signals based on prosody information from the prosody generating means (13) and speech segment information selected from the speech segment database (15) upon reception of an instruction from the plural speech instructing means (17).
- - 2. The text-to-speech synthesizer as defined in claim 1, wherein the plural speech synthesizing means (16) comprises:
    - waveform overlap-add means (21) for generating a speech signal by waveform overlap-add technique based on the speech segment information and the prosody information;
      
      waveform expanding/contracting means (22) for expanding or contracting a time base of a waveform of the speech signal generated by the waveform overlap-add means (21) based on the prosody information and the instruction information from the plural speech instructing means (17) and generating a speech signal different in pitch of speech; and
      
      mixing means (23) for mixing the speech signal from the waveform overlap-add means (21) and the speech signal from the waveform expanding/contracting means (22).
  - 3. The text-to-speech synthesizer as defined in claim 1, wherein the plural speech synthesizing means (16) comprises:
    - a first waveform overlap-add means (25) for generating a speech signal by waveform overlap-add technique based on the speech segment information and the prosody information;
      
      a second waveform overlap-add means (26) for generating a speech signal by waveform overlap-add technique based on the speech segment information, the prosody information, and the instruction information from the plural speech instructing means (17) at a basic cycle different from that of the first waveform overlap-add means (25); and
      
      mixing means (27) for mixing the speech signal from the first waveform overlap-add means and the speech signal from the second waveform overlap-add means.
  - 4. The text-to-speech synthesizer as defined in claim 1, wherein the plural speech synthesizing means (16) comprises:
    - a first waveform overlap-add means (35) for generating a speech signal by waveform overlap-add technique based on the speech segment information and the prosody information;
      
      a second speech segment database (38) for storing speech segment information different from that stored in a first speech segment database as the speech segment database (15);
      
      a second waveform overlap-add means (36) for generating a speech signal by waveform overlap-add technique based on speech segment information selected from the second speech segment database (38), the prosody information, and instruction information from the plural speech instructing means (17); and
      
      mixing means (37) for mixing the speech signal from the first waveform overlap-add means (35) and the speech signal from the second waveform overlap-add means (36).
  - 5. The text-to-speech synthesizer as defined in claim 1, wherein the plural speech synthesizing means (16) comprises:
    - waveform overlap-add means (31) for generating a speech signal by waveform overlap-add technique based on the speech segment information and the prosody information;
      
      waveform expanding/contracting overlap-add means (32) for expanding or contracting a time base of a waveform of the speech signal based on the prosody information and the instruction information from the plural speech instructing means (17) and generating a speech signal by the waveform overlap-add technique; and
      
      mixing means (33) for mixing the speech signal from the waveform overlap-add means (31) and the speech signal from the waveform expanding/contracting overlap-add means (32).
  - 6. The text-to-speech synthesizer as defined in claim 1, wherein the plural speech synthesizing means (16) comprises:
    - first excitation waveform generating means (41) for generating a first excitation waveform based on the prosody information;
      
      second excitation waveform generating means (42) for generating a second excitation waveform different in frequency from the first excitation waveform based on the prosody information and the instruction information from the plural speech instructing means (17);
      
      mixing means (43) for mixing the first excitation waveform and the second excitation waveform; and
      
      a synthetic filter (44) for obtaining vocal tract articulatory feature parameters contained in the speech segment information and generating a synthetic speech signal based on the mixed excitation waveform with use of the vocal tract articulatory feature parameters.
  - 7. The text-to-speech synthesizer as defined in claim 2, wherein a plurality of the waveform expanding/contracting means (22) are present.
  - 8. The text-to-speech synthesizer as defined in claim 3, wherein a plurality of the second waveform overlap-add means (26) are present.
  - 9. The text-to-speech synthesizer as defined in claim 4, wherein a plurality of the second waveform overlap-add means (36) are present.
  - 10. The text-to-speech synthesizer as defined in claim 5, wherein a plurality of the waveform expanding/contracting overlap-add means (32) are present.
  - 11. The text-to-speech synthesizer as defined in claim 6, wherein a plurality of the second excitation waveform generating means (42) are present.
  - 12. The text-to-speech synthesizer as defined in claim 2, wherein the mixing means (23) performs the mixing operation with a mixing ratio based on the instruction information from the plural speech instructing means (17).
  - 13. The text-to-speech synthesizer as defined in claim 3, wherein the mixing means (27) performs the mixing operation with a mixing ratio based on the instruction information from the plural speech instructing means (17).
  - 14. The text-to-speech synthesizer as defined in claim 4, wherein the mixing means (37) performs the mixing operation with a mixing ratio based on the instruction information from the plural speech instructing means (17).
  - 15. The text-to-speech synthesizer as defined in claim 5, wherein the mixing means (33) performs the mixing operation with a mixing ratio based on the instruction information from the plural speech instructing means (17).
  - 16. The text-to-speech synthesizer as defined in claim 6, wherein the mixing means (43) performs the mixing operation with a mixing ratio based on the instruction information from the plural speech instructing means (17).
  - 17. A program storage medium allowing read by a computer, characterized by storing a text-to-speech synthesis processing program for letting the computer function as:
    - the text analyzing means (12), the prosody generating means (13), the plural speech instructing means (17), and the plural speech synthesizing means (16) as defined in claim 1.
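
The excitation-mixing arrangement of claim 6 can be sketched as follows. This is only an illustration under stated assumptions: the claim does not specify the excitation shape or the filter form, so the impulse-train excitation, the all-pole recursion, and every name here (`impulse_train`, `mix`, `all_pole_filter`, `synthesize_two_voices`, `feedback`) are hypothetical choices, with the feedback coefficients standing in for the vocal tract articulatory feature parameters.

```python
def impulse_train(num_samples, period):
    """Excitation waveform: a unit pulse every `period` samples (assumed shape)."""
    if period < 1:
        raise ValueError("period must be at least 1 sample")
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


def mix(first, second, mixing_ratio):
    """Weighted sum of two excitations; mixing_ratio weights `second`."""
    return [(1.0 - mixing_ratio) * a + mixing_ratio * b
            for a, b in zip(first, second)]


def all_pole_filter(excitation, feedback):
    """Synthesis filter: y[n] = x[n] + sum_k feedback[k-1] * y[n-k]."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(feedback, start=1):
            if n - k >= 0:
                y += a * out[n - k]
        out.append(y)
    return out


def synthesize_two_voices(num_samples, base_period, pitch_ratio,
                          mixing_ratio, feedback):
    """Mix two excitations of different frequency, then filter once."""
    first = impulse_train(num_samples, base_period)
    # The second excitation has a shorter period when pitch_ratio > 1.
    second_period = max(1, round(base_period / pitch_ratio))
    second = impulse_train(num_samples, second_period)
    return all_pole_filter(mix(first, second, mixing_ratio), feedback)


if __name__ == "__main__":
    # Hypothetical one-coefficient filter; real parameters would come from
    # the speech segment information.
    print(synthesize_two_voices(8, base_period=4, pitch_ratio=2.0,
                                mixing_ratio=0.5, feedback=[0.5]))
```

In this sketch the mixing happens before the filter, so a single filter pass produces both voices, matching the order of the mixing means (43) and the synthetic filter (44) in the claim.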

Current Assignee: Sharp Kabushiki Kaisha (Hon Hai Precision Industry Co., Ltd.)
Original Assignee: Sharp Kabushiki Kaisha (Hon Hai Precision Industry Co., Ltd.)
Inventors: Kimura, Osamu; Morio, Tomokazu
Granted Patent: US 7,249,021 B2
US Class Current: 704/260
CPC Class Codes:
- G10L 13/033: Voice editing, e.g. manipul...
- G10L 13/0335: Pitch control
- G10L 13/08: Text analysis or generation...
