Speech synthesis method

US 20030088418A1
Filed: 10/07/2002
Published: 05/08/2003
Est. Priority Date: 12/04/1995
Status: Active Grant

Abstract

In a synthesis unit generator, a plurality of synthesis speech segments are generated by synthesizing training speech segments labeled with phonetic contexts and input speech segments while altering the pitch/duration of the input speech segments in accordance with the pitch/duration of the training speech segments. Typical speech segments are selected from the input speech segments on the basis of a distance between the synthesis speech segments and the training speech segments, and are stored in a storage. In addition, a plurality of phonetic context clusters corresponding to the synthesis units are generated on the basis of the distance, and are stored in a storage. A synthesis speech signal is generated by reading out, from the storage, those of the synthesis units, which correspond to the phonetic context clusters including phonetic contexts of input phonemes, and connecting the selected synthesis units in a speech synthesizer.

36 Claims

1. A speech synthesis method comprising the steps of:
- generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of a pitch and a duration of each of a plurality of first speech segments;
  
  selecting a plurality of synthesis units from the second speech segments on the basis of a distance between the synthesis speech segments and the first speech segments; and
  
  generating a synthesis speech by selecting predetermined synthesis units from the synthesis units and connecting the predetermined synthesis units to one another to generate a synthesis speech.
- Dependent claims (2, 3, 4, 5):
- - 2. The speech synthesis method according to claim 1, wherein said synthesis unit selection step includes a step of spectrum-shaping the synthesis speech segments and a step of selecting a plurality of synthesis units from said second speech segments on the basis of the distance between said spectrum-shaped synthesis speech segments and said first speech segments, and said synthesis speech generation step includes a step of spectrum-shaping the synthesis speech to generate a final synthesis speech.
  - 3. The speech synthesis method according to claim 1, wherein said synthesis unit selection step includes a step of storing, as said synthesis units, speech source signals and information on combinations of coefficients of a synthesis filter for receiving said speech source signals and generating a synthesis speech signal.
  - 4. The speech synthesis method according to claim 3, wherein the synthesis unit selection step includes a step of quantizing the speech source signals and the coefficients of the synthesis filter, and storing, as the synthesis units, the quantized speech source signals and information on combinations of the coefficients of the synthesis filter.
  - 5. The speech synthesis method according to claim 1, wherein the synthesis unit selection step includes a step of storing, as the synthesis units, speech source signals and information on combinations of coefficients of a synthesis filter for receiving the speech source signals and generating a synthesis speech signal, at least one of the number of the speech source signals stored as the synthesis units and the number of the coefficients of the synthesis filter stored as the synthesis units being less than the total number of speech synthesis units.
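
The selection step of claim 1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: segments are plain lists of numbers, only the duration is changed (by linear resampling), the distance is a squared error, and the units are picked greedily so that the summed distance to the first (training) segments is small. The claims do not prescribe any of these choices, and the names `stretch`, `distance_table` and `select_units` are hypothetical.

```python
def stretch(segment, length):
    """Linearly resample `segment` to `length` samples (duration change only)."""
    if length == 1:
        return [segment[0]]
    out = []
    for k in range(length):
        pos = k * (len(segment) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append((1 - frac) * segment[lo] + frac * segment[hi])
    return out


def distance(a, b):
    """Squared error between two equal-length segments (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distance_table(first, second):
    """table[i][j]: distance between first segment i and second segment j
    after segment j has been stretched to the duration of segment i."""
    return [[distance(stretch(s, len(f)), f) for s in second] for f in first]


def select_units(table, n_units):
    """Greedy choice of n_units columns (assumes n_units <= number of columns).
    Each step adds the column that most reduces the summed nearest distance."""
    chosen = []
    for _ in range(n_units):
        best_j, best_cost = None, float("inf")
        for j in range(len(table[0])):
            if j in chosen:
                continue
            trial = chosen + [j]
            cost = sum(min(row[k] for k in trial) for row in table)
            if cost < best_cost:
                best_j, best_cost = j, cost
        chosen.append(best_j)
    return chosen


# Toy data: four first (training) segments, three second (candidate) segments.
first = [[0, 1, 2, 3], [0, 2, 4], [5, 5, 5, 5, 5], [5, 5]]
second = [[0, 3], [5, 5, 5], [1, 0]]
table = distance_table(first, second)
units = select_units(table, 2)
print(units)
```

A greedy search is used here only because it is short; an exhaustive search over all combinations of candidates would fit the same claim language.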

6. A speech synthesis method comprising the steps of:
- generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of a pitch and a duration of each of a plurality of first speech segments;
  
  selecting a plurality of synthesis speech segments using information regarding a distance between the synthesis speech segments;
  
  forming a plurality of synthesis context clusters using the information regarding the distance and the synthesis units; and
  
  generating a synthesis speech by selecting those of the synthesis units, which correspond to at least one of the phonetic context clusters which includes phonetic contexts of input phonemes, and connecting the selected synthesis units.
- Dependent claims (7, 8, 9, 10, 11):
- - 7. The speech synthesis method according to claim 6, wherein the synthesis speech generation step includes a step of spectrum-shaping the synthesis speech to generate a final synthesis speech.
  - 8. The speech synthesis method according to claim 6, wherein the synthesis unit selection step includes a step of storing, as the synthesis units, speech source signals and information on combinations of coefficients of a synthesis filter for receiving the speech source signals and generating a synthesis speech signal.
  - 9. The speech synthesis method according to claim 8, wherein the synthesis unit selection step includes a step of quantizing the speech source signals and the coefficients of the synthesis filter, and storing, as the synthesis units, the quantized speech source signals and information on combinations of the coefficients of the synthesis filter.
  - 10. The speech synthesis method according to claim 6, wherein the synthesis unit selection step includes a step of storing, as the synthesis units, speech source signals and information on combinations of coefficients of a synthesis filter for receiving the speech source signals and generating a synthesis speech signal, at least one of the number of the speech source signals stored as the synthesis units and the number of the coefficients of the synthesis filter stored as the synthesis units being less than the total number of speech synthesis units.
  - 11. The speech synthesis method according to claim 6, wherein the synthesis unit selection step includes a step of storing, as the synthesis units, speech source signals and information on combinations of coefficients of a synthesis filter for receiving the speech source signals and generating a synthesis speech signal, at least one of the number of the speech source signals stored as the synthesis units and the number of the coefficients of the synthesis filter stored as the synthesis units being less than the total number of the phonetic context clusters.

12. A speech synthesis method comprising the steps of:
- generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of the pitch and duration of each of a plurality of first speech segments labeled with phonetic contexts;
  
  forming a plurality of synthesis context clusters using information regarding a distance between the synthesis speech segments and the first speech segments and information regarding the synthesis units;
  
  selecting the synthesis units using the information regarding the distance and the synthesis context cluster; and
  
  generating a synthesis speech by selecting predetermined synthesis units from the synthesis units and connecting the selected synthesis units.

13. A speech synthesis method comprising the steps of:
- generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of the pitch and duration of each of a plurality of first speech segments labeled with phonetic contexts;
  
  generating a plurality of phonetic context clusters on the basis of a distance between the synthesis speech segments and the first speech segments;
  
  selecting a plurality of synthesis units corresponding to the phonetic context clusters from the second speech segments on the basis of the distance; and
  
  generating a synthesis speech by selecting those of the synthesis units, which correspond to the phonetic context clusters including phonetic contexts of input phonemes, and connecting the selected synthesis units.
- Dependent claims (14, 15, 16, 17, 18, 19, 20):
- - 14. The speech synthesis method according to claim 13, wherein the synthesis speech generation step includes a step of spectrum-shaping the synthesis speech to generate a final synthesis speech.
  - 15. The speech synthesis method according to claim 13, wherein the phonetic context cluster generation step includes a step of spectrum-shaping the synthesis speech segments and a step of generating a plurality of phonetic context clusters on the basis of the distance between the spectrum-shaped synthesis speech segments and the first speech segments.
  - 16. The speech synthesis method according to claim 15, wherein the synthesis speech generation step includes a step of spectrum-shaping the synthesis speech to generate a final synthesis speech.
  - 17. The speech synthesis method according to claim 13, wherein the synthesis unit selection step includes a step of storing, as the synthesis units, speech source signals and information on combinations of coefficients of a synthesis filter for receiving the speech source signals and generating a synthesis speech signal.
  - 18. The speech synthesis method according to claim 17, wherein the synthesis unit selection step includes a step of quantizing the speech source signals and the coefficients of the synthesis filter, and storing, as the synthesis units, the quantized speech source signals and information on combinations of the coefficients of the synthesis filter.
  - 19. The speech synthesis method according to claim 13, wherein the synthesis unit selection step includes a step of storing, as the synthesis units, speech source signals and information on combinations of coefficients of a synthesis filter for receiving the speech source signals and generating a synthesis speech signal, at least one of the number of the speech source signals stored as the synthesis units and the number of the coefficients of the synthesis filter stored as the synthesis units being less than the total number of speech synthesis units.
  - 20. The speech synthesis method according to claim 13, wherein the synthesis unit selection step includes a step of storing, as the synthesis units, speech source signals and information on combinations of coefficients of a synthesis filter for receiving the speech source signals and generating a synthesis speech signal, at least one of the number of the speech source signals stored as the synthesis units and the number of the coefficients of the synthesis filter stored as the synthesis units being less than the total number of the phonetic context clusters.
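
One way to picture the phonetic context clusters of claim 13 is sketched below. It assumes a distance table has already been computed (rows are first segments, columns are second segments) and that a set of synthesis units has been chosen. Each first segment's phonetic context is then assigned to the nearest chosen unit. The context labels, the table values and the function names are hypothetical and are not taken from the patent.

```python
def build_context_clusters(contexts, table, units):
    """Group phonetic contexts by the chosen unit with the smallest distance."""
    clusters = {u: set() for u in units}
    for ctx, row in zip(contexts, table):
        nearest = min(units, key=lambda u: row[u])
        clusters[nearest].add(ctx)
    return clusters


def unit_for_context(clusters, ctx):
    """Return the unit whose cluster contains `ctx`, or None if it is unseen."""
    for unit, members in clusters.items():
        if ctx in members:
            return unit
    return None


# Assumed example: (preceding, following) phonemes for four training segments.
contexts = [("k", "t"), ("g", "t"), ("s", "n"), ("z", "n")]
# Assumed distances between four first segments and three second segments.
table = [
    [0.0, 54.0, 12.9],
    [1.25, 35.0, 19.25],
    [66.9, 0.0, 101.9],
    [29.0, 0.0, 41.0],
]
units = [1, 0]  # indices of the second segments kept as synthesis units

clusters = build_context_clusters(contexts, table, units)
picked = unit_for_context(clusters, ("g", "t"))
print(clusters, picked)
```

At synthesis time, the context of each input phoneme would be looked up in this way and the matching units concatenated. How unseen contexts are handled is left open in this sketch.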

21. A speech synthesis method comprising the steps of:
- prestoring information on a plurality of speech synthesis units including at least speech spectrum parameters;
  
  selecting predetermined information from the stored information on the speech synthesis units;
  
  generating a synthesis speech signal by connecting the selected predetermined information; and
  
  emphasizing a formant of the synthesis speech signal by a formant emphasis filter whose filtering coefficient is determined in accordance with the spectrum parameters of the selected information.
- Dependent claims (22, 23, 24, 25, 26):
- - 22. The speech synthesis method according to claim 21, wherein the information on the speech synthesis units includes not only the speech spectrum parameters but also a vocal tract filter drive signal of a 1-pitch cycle.
  - 23. The speech synthesis method according to claim 21, wherein the information on the speech synthesis units includes at least a speech wave with an emphasized formant of a 1-pitch cycle.
  - 24. The speech synthesis method according to claim 21, further including a step of emphasizing the pitch of the synthesis speech signal by a pitch emphasis filter whose filtering coefficient is determined in accordance with a speech pitch parameter.
  - 25. The speech synthesis method according to claim 24, wherein the information on the speech synthesis units includes not only the speech spectrum parameters but also a vocal tract filter drive signal of a 1-pitch cycle.
  - 26. The speech synthesis method according to claim 24, wherein the information on the speech synthesis units includes at least a speech wave with an emphasized formant of a 1-pitch cycle.
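
The claims leave the form of the formant emphasis filter open. As an illustration only, the sketch below uses a pole-zero filter of the kind common in speech-coding postfilters, H(z) = A(z/beta) / A(z/alpha) with 0 < beta < alpha < 1, where A(z) = 1 + sum of a_k z^-k is built from the linear prediction coefficients of the selected unit. The filter form, the values of `alpha` and `beta`, and the function names are assumptions, not the patent's stated method.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_k * gamma**k for k = 1..p."""
    return [c * gamma ** (k + 1) for k, c in enumerate(a)]


def formant_emphasis(signal, a, beta=0.5, alpha=0.8):
    """Apply H(z) = A(z/beta) / A(z/alpha), with A(z) = 1 + sum a_k z^-k.

    Because alpha > beta, the poles sit closer to the unit circle than the
    zeros, which raises the spectral peaks relative to the valleys.
    """
    num = bandwidth_expand(a, beta)
    den = bandwidth_expand(a, alpha)
    out = []
    for n, x in enumerate(signal):
        y = x
        for k in range(len(a)):
            if n - k - 1 >= 0:
                y += num[k] * signal[n - k - 1]  # zeros: feed-forward part
                y -= den[k] * out[n - k - 1]     # poles: feedback part
        out.append(y)
    return out


# Impulse response for a one-pole model with a_1 = -0.9 (toy spectrum parameter).
impulse = [1.0, 0.0, 0.0, 0.0]
response = formant_emphasis(impulse, [-0.9])
print(response)
```

Because the filter coefficients come from the spectrum parameters of the selected unit, the filter changes from unit to unit, which is the property claim 21 recites.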

27. A speech synthesis method comprising the steps of:
- generating linear prediction coefficients by subjecting a reference speech signal to a linear prediction analysis;
  
  producing a residual pitch wave from a typical speech pitch wave extracted from the reference speech signal, using the linear prediction coefficients;
  
  storing information regarding the residual pitch wave as information of a speech synthesis unit in a voiced period; and
  
  synthesizing a speech, using the information of the speech synthesis unit.
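
The first two steps of claim 27 can be sketched as follows, assuming the autocorrelation method with the Levinson-Durbin recursion for the linear prediction analysis and an inverse filter A(z) for producing the residual (the inverse-filter variant is the one named in claim 29). The toy signal stands in for both the reference speech and the extracted pitch wave, which is a simplification, and all function names are hypothetical.

```python
def autocorr(x, order):
    """Autocorrelation values r[0..order] of the signal x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for a_1..a_p such that A(z) = 1 + sum a_k z^-k."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(i):
            new_a[j] = a[j] + k * a[i - 1 - j]
        new_a[i] = k
        a = new_a
        err *= 1 - k * k                    # remaining prediction error
    return a


def inverse_filter(x, a):
    """Residual e[n] = x[n] + sum a_k x[n-k], i.e. x filtered by A(z)."""
    out = []
    for n in range(len(x)):
        e = x[n]
        for k, c in enumerate(a):
            if n - k - 1 >= 0:
                e += c * x[n - k - 1]
        out.append(e)
    return out


# Toy "pitch wave": a decaying exponential, which a one-pole model fits well.
pitch_wave = [0.9 ** n for n in range(200)]
lpc = levinson_durbin(autocorr(pitch_wave, 1), 1)
residual = inverse_filter(pitch_wave, lpc)
print(lpc, residual[:3])
```

For this toy input the residual collapses to roughly a single impulse, which shows why storing the residual pitch wave together with the filter coefficients can be a compact representation of a voiced unit.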

28. A speech synthesis method comprising the steps of:
- storing information on a residual pitch wave generated from a reference speech signal and a spectrum parameter extracted from the reference speech signal;
  
  driving a vocal tract filter having the spectrum parameter as a filtering coefficient, by a voiced speech source signal generated by using the information on the residual pitch wave in a voiced period, and by an unvoiced speech source signal in an unvoiced period, thereby generating a synthesis speech; and
  
  generating the residual pitch wave from a typical speech pitch wave extracted from the reference speech signal, by using a linear prediction coefficient obtained by subjecting the reference speech signal to linear prediction analysis.
- Dependent claims (29, 30, 31, 32, 33):
- - 29. The speech synthesis method according to claim 28, wherein the residual pitch wave generation step includes a step of generating the residual pitch wave by filtering the speech pitch wave through a linear prediction inverse filter having characteristics determined in accordance with the linear prediction coefficient.
  - 30. The speech synthesis method according to claim 28, wherein the residual pitch wave generation step includes a step of performing, as the linear prediction analysis, pitch synchronous linear prediction analysis synchronized with the pitch of the reference speech signal.
  - 31. The speech synthesis method according to claim 28, wherein the storing step includes a step of storing, as information on the residual pitch wave, a code obtained by compression-encoding the residual pitch wave, the code being decoded for use in speech synthesis.
  - 32. The speech synthesis method according to claim 28, wherein the storing step includes a step of storing, as information on the residual pitch wave, a code obtained by subjecting the residual pitch wave to inter-frame prediction encoding, the code being decoded for use in speech synthesis.
  - 33. The speech synthesis method according to claim 28, wherein in the residual pitch wave generation step, the linear prediction coefficient is used as the spectrum parameter.
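
The synthesis step of claim 28 can be sketched as below, under several assumptions: the voiced source is built by placing copies of the stored residual pitch wave at a fixed pitch period (overlap-add), the unvoiced source is Gaussian noise, and the filter is an all-pole filter 1/A(z) whose coefficients are the linear prediction coefficients (as in claim 33). The function names, the fixed pitch period and the noise model are hypothetical.

```python
import random


def voiced_source(residual, pitch_period, n_samples):
    """Overlap-add copies of the residual pitch wave every pitch_period samples."""
    src = [0.0] * n_samples
    for start in range(0, n_samples, pitch_period):
        for i, v in enumerate(residual):
            if start + i < n_samples:
                src[start + i] += v
    return src


def unvoiced_source(n_samples, seed=0):
    """White Gaussian noise as an assumed unvoiced excitation."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(source, a):
    """All-pole filter 1/A(z): y[n] = x[n] - sum a_k y[n-k]."""
    out = []
    for n, x in enumerate(source):
        y = x
        for k, c in enumerate(a):
            if n - k - 1 >= 0:
                y -= c * out[n - k - 1]
        out.append(y)
    return out


# Voiced period: a one-sample residual repeated every 4 samples, a_1 = -0.5.
source = voiced_source([1.0], pitch_period=4, n_samples=8)
voiced = all_pole_filter(source, [-0.5])
# Unvoiced period: the same filter driven by noise instead.
unvoiced = all_pole_filter(unvoiced_source(8), [-0.5])
print(voiced)
```

Changing `pitch_period` changes the pitch of the output without touching the stored residual or the filter coefficients, which is one reason a source-filter representation is convenient for concatenative synthesis.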

34. A speech synthesis apparatus comprising:
- a speech segment generator for generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of a pitch and a duration of each of a plurality of first speech segments;
  
  a synthesis unit selector for selecting a plurality of synthesis units from the second speech segments on the basis of a distance between the synthesis speech segments and the first speech segments; and
  
  a speech synthesis section for generating a synthesis speech by selecting predetermined synthesis units from the synthesis units and connecting the predetermined synthesis units to one another to generate a synthesis speech.

35. A speech synthesis apparatus comprising:
- a speech segment generator for generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of the pitch and duration of each of a plurality of first speech segments labeled with phonetic contexts;
  
  a phonetic context cluster generator for generating a plurality of phonetic context clusters on the basis of a distance between the synthesis speech segments and the first speech segments;
  
  a synthesis unit selector for selecting a plurality of synthesis units corresponding to the phonetic context clusters from the second speech segments on the basis of the distance; and
  
  a speech synthesis unit for generating a synthesis speech by selecting those of the synthesis units, which correspond to the phonetic context clusters including phonetic contexts of input phonemes, and connecting the selected synthesis units.

36. A speech synthesis apparatus comprising:
- a storage for prestoring information on a plurality of speech synthesis units including at least speech spectrum parameters;
  
  a selector for selecting predetermined information from the stored information on the speech synthesis units;
  
  a speech synthesis section for generating a synthesis speech signal by connecting the selected predetermined information; and
  
  an emphasis section including a formant emphasis filter whose filtering coefficient is determined in accordance with the spectrum parameters of the selected information for emphasizing a formant of the synthesis speech signal.

Current Assignee
Masami Akamine, Takehiko Kagoshima
Original Assignee
Masami Akamine, Takehiko Kagoshima
Inventors
Masami Akamine, Takehiko Kagoshima

Granted Patent

US 6,760,703 B2
US Class Current

704/258
CPC Class Codes

G10L 13/07 Concatenation rules

G10L 25/90 Pitch determination of spee...
