Speech signal processing apparatus and method, and storage medium

US 20010032079A1
Filed: 03/28/2001
Published: 10/18/2001
Est. Priority Date: 03/31/2000
Status: Abandoned Application


1 Assignment


0 Petitions


Abstract

An object of the present invention is to suppress degradation of speech-synthesis quality by selecting synthesis units so as to minimize the total distortion arising from concatenation and from modification. To that end, speech synthesis is performed by extracting a plurality of synthesis units that correspond to a phoneme environment from a synthesis-unit holding unit, which holds a plurality of synthesis units in correspondence with a predetermined prosody environment; calculating the distortion of each extracted synthesis unit; obtaining the minimum distortion within a predetermined interval determined based on the prosody environment; selecting the series of synthesis units that provides the minimum-distortion path; and modifying and concatenating the selected synthesis units.
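
The abstract's "minimum-distortion path" selection can be read as a dynamic-programming (Viterbi-style) search over the candidate units available for each phoneme. The following is a minimal sketch under stated assumptions, not the application's own procedure. It assumes that each phoneme position has a non-empty list of candidate units, that a per-unit modification distortion and a pairwise concatenation distortion are supplied as callables, and that the total distortion is their plain sum. The function name `select_units` and both cost callables are hypothetical, and the abstract's "predetermined interval" is not modeled.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    modification_cost: Callable[[int, str], float],
    concatenation_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Pick one candidate per position so the summed distortion is minimal (hypothetical helper).

    candidates[i] lists the units available for phoneme position i.
    modification_cost(i, u): distortion of modifying unit u to position i's prosody.
    concatenation_cost(p, u): distortion of joining unit p to the following unit u.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[j]: lowest accumulated distortion of any path ending at candidates[i][j]
    best = [modification_cost(0, u) for u in candidates[0]]
    back: List[List[int]] = []  # back[i - 1][j]: best predecessor index for candidates[i][j]

    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[i]:
            # Accumulated distortion of reaching u from each predecessor.
            scores = [best[j] + concatenation_cost(prev[j], u) for j in range(len(prev))]
            k = min(range(len(scores)), key=scores.__getitem__)
            new_best.append(scores[k] + modification_cost(i, u))
            pointers.append(k)
        best = new_best
        back.append(pointers)

    # Trace the minimum-distortion path backwards from the cheapest final unit.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [candidates[-1][j]]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i - 1][j]
        path.append(candidates[i - 1][j])
    path.reverse()
    return path, total


if __name__ == "__main__":
    mod = {"a1": 1.0, "a2": 0.5, "b1": 0.2, "b2": 0.1}
    concat = {("a1", "b1"): 0.0, ("a1", "b2"): 2.0, ("a2", "b1"): 1.0, ("a2", "b2"): 1.5}
    print(select_units([["a1", "a2"], ["b1", "b2"]],
                       lambda i, u: mod[u],
                       lambda p, u: concat[(p, u)]))
    # prints (['a1', 'b1'], 1.2)
```

In the example, "a2" has the cheaper modification distortion on its own. However, "a1" joins "b1" with no concatenation distortion, so the joint path a1 to b1 has the lowest total. Choosing each unit independently would miss this.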

44 Citations


25 Claims

1. A speech signal processing apparatus for performing speech synthesis by concatenating a plurality of selected synthesis units and modifying the synthesis units based on predetermined prosody parameters, said apparatus comprising:
- distortion obtaining means for obtaining a distortion which may be generated from selection to synthesis of the synthesis units;
  
  selection means for selecting synthesis units to be used for speech synthesis, based on the distortion obtained by said distortion obtaining means; and
  
  speech synthesis means for performing speech synthesis based on the synthesis units selected by said selection means.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. An apparatus according to claim 1, wherein said selection means selects a plurality of synthesis units based on a phoneme series including a plurality of phonemes.
  - 3. An apparatus according to claim 1, wherein said distortion obtaining means obtains a distortion which may be generated in each of a plurality of synthesis units corresponding to one phoneme, and wherein said selection means selects one synthesis unit from among the plurality of synthesis units corresponding to the one phoneme.
  - 4. An apparatus according to claim 1, wherein said selection means selects the synthesis units to be used in speech synthesis so as to minimize the distortion.
  - 5. An apparatus according to claim 1, wherein said distortion obtaining means obtains the distortion based on a concatenation distortion generated by concatenating a synthesis unit to another synthesis unit and a modification distortion generated by modifying the synthesis unit.
  - 6. An apparatus according to claim 1, wherein said distortion obtaining means uses a value obtained by adding a concatenation distortion generated by concatenating a synthesis unit to another synthesis unit and a modification distortion generated by modifying the synthesis unit as the distortion.
  - 7. An apparatus according to claim 5, wherein said distortion obtaining means calculates the distortion as a weighted sum of the concatenation distortion and the modification distortion.
  - 8. An apparatus according to claim 5, wherein said distortion obtaining means calculates the concatenation distortion using a cepstrum distance.
  - 9. An apparatus according to claim 5, wherein said distortion obtaining means calculates the modification distortion using a cepstrum distance.
  - 10. An apparatus according to claim 5, wherein said distortion obtaining means includes a table storing modification distortions, and determines the modification distortion by referring to the table.
  - 11. An apparatus according to claim 5, wherein said distortion obtaining means includes a table storing concatenation distortions, and determines the concatenation distortion by referring to the table.
  - 12. An apparatus according to claim 1, further comprising:
    - input means for inputting text data;
      
      language analysis means for performing language analysis of the text data; and
      
      prosody-parameter generation means for generating the predetermined prosody parameters based on a result of analysis of said language analysis means.
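
Claims 5 through 9 combine a concatenation distortion and a modification distortion, optionally as a weighted sum, and allow either one to be measured with a cepstrum distance. The following is a minimal sketch under several assumptions. It uses the plain Euclidean distance between cepstral coefficient vectors, although dB-scaled variants are also common. It measures concatenation distortion between the last frame of the left unit and the first frame of the right unit, and modification distortion between a frame taken before and after prosody modification. These frame choices, the function names and the default weights are illustrative assumptions, not the application's stated method.

```python
import math
from typing import Sequence


def cepstral_distance(c1: Sequence[float], c2: Sequence[float]) -> float:
    """Euclidean distance between two cepstral coefficient vectors of equal order."""
    if len(c1) != len(c2):
        raise ValueError("cepstral vectors must have the same order")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def concatenation_distortion(left_last: Sequence[float], right_first: Sequence[float]) -> float:
    # Assumed: spectral mismatch at the join, last frame of the left unit vs. first frame of the right unit.
    return cepstral_distance(left_last, right_first)


def modification_distortion(before: Sequence[float], after: Sequence[float]) -> float:
    # Assumed: spectral change introduced when the unit is modified toward the target prosody.
    return cepstral_distance(before, after)


def total_distortion(d_concat: float, d_mod: float,
                     w_concat: float = 1.0, w_mod: float = 1.0) -> float:
    # Unit weights (the defaults) give a plain sum; other weights give a weighted sum.
    return w_concat * d_concat + w_mod * d_mod


if __name__ == "__main__":
    d_c = concatenation_distortion([1.0, 2.0, 0.0], [1.0, -1.0, 4.0])  # 5.0
    d_m = modification_distortion([1.0, 2.0], [1.0, 2.5])              # 0.5
    print(total_distortion(d_c, d_m))                                  # 5.5
    print(total_distortion(d_c, d_m, w_concat=0.2))                    # 1.5
```

The weights let a system trade smooth joins against faithful prosody. Lowering `w_concat` tolerates audible joins when doing so avoids heavy pitch or duration modification.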

13. A speech signal processing method comprising:
- a distortion obtaining step of obtaining a distortion generated by concatenating a plurality of selected synthesis units and modifying the synthesis units based on predetermined prosody parameters;
  
  a selection step of selecting synthesis units to be used for speech synthesis, based on the distortion obtained in said distortion obtaining step; and
  
  a speech synthesis step of performing speech synthesis based on the synthesis units selected in said selection step.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
  - 14. A method according to claim 13, wherein in said selection step, a plurality of synthesis units are selected based on a phoneme series including a plurality of phonemes.
  - 15. A method according to claim 13, wherein in said distortion obtaining step, a distortion which may be generated in each of a plurality of synthesis units corresponding to one phoneme is obtained, and wherein in said selection step, one synthesis unit is selected from among the plurality of synthesis units corresponding to the one phoneme.
  - 16. A method according to claim 13, wherein in said selection step, the synthesis units to be used in speech synthesis are selected so as to minimize the distortion.
  - 17. A method according to claim 13, wherein in said distortion obtaining step, the distortion is obtained based on a concatenation distortion generated by concatenating a synthesis unit to another synthesis unit and a modification distortion generated by modifying the synthesis unit.
  - 18. A method according to claim 13, wherein in said distortion obtaining step, a value obtained by adding a concatenation distortion generated by concatenating a synthesis unit to another synthesis unit and a modification distortion generated by modifying the synthesis unit is used as the distortion.
  - 19. A method according to claim 17, wherein in said distortion obtaining step, the distortion is calculated as a weighted sum of the concatenation distortion and the modification distortion.
  - 20. A method according to claim 17, wherein in said distortion obtaining step, the concatenation distortion is calculated using a cepstrum distance.
  - 21. A method according to claim 17, wherein in said distortion obtaining step, the modification distortion is calculated using a cepstrum distance.
  - 22. A method according to claim 17, wherein in said distortion obtaining step, a table storing modification distortions is provided, and the modification distortion is determined by referring to the table.
  - 23. A method according to claim 17, wherein in said distortion obtaining step, a table storing concatenation distortions is provided, and the concatenation distortion is determined by referring to the table.
  - 24. A method according to claim 13, further comprising:
    - an input step of inputting text data;
      
      a language analysis step of performing language analysis of the text data; and
      
      a prosody-parameter generation step of generating the predetermined prosody parameters based on a result of analysis in said language analysis step.
  - 25. A storage medium, capable of being read by a computer, storing a program for executing a method according to any one of claims 13 through 24.
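
Claims 10, 11, 22 and 23 let the distortions be read from stored tables rather than computed at synthesis time. The following is a minimal sketch of such lookups. It assumes a concatenation table keyed by (left unit, right unit) and a modification table keyed by (unit, quantized pitch-shift bin). The class name `DistortionTables`, the key layout and the 10 Hz bin width are hypothetical.

```python
from typing import Dict, Tuple


class DistortionTables:
    """Hypothetical precomputed distortion tables for table-driven unit selection."""

    def __init__(
        self,
        concat: Dict[Tuple[str, str], float],
        mod: Dict[Tuple[str, int], float],
        pitch_step_hz: float = 10.0,
    ) -> None:
        self.concat = concat                # (left unit, right unit) -> concatenation distortion
        self.mod = mod                      # (unit, pitch-shift bin) -> modification distortion
        self.pitch_step_hz = pitch_step_hz  # width of one pitch-shift bin (assumed)

    def concatenation(self, left: str, right: str) -> float:
        # Raises KeyError if the pair was never tabulated.
        return self.concat[(left, right)]

    def modification(self, unit: str, pitch_shift_hz: float) -> float:
        # Quantize the requested pitch shift to the nearest bin; Python's round()
        # sends exact .5 ties to the even bin.
        bin_index = round(pitch_shift_hz / self.pitch_step_hz)
        return self.mod[(unit, bin_index)]


if __name__ == "__main__":
    tables = DistortionTables(
        concat={("a1", "b1"): 0.3, ("a1", "b2"): 1.1},
        mod={("a1", 0): 0.0, ("a1", 2): 0.7},
    )
    print(tables.concatenation("a1", "b1"))  # 0.3
    print(tables.modification("a1", 19.0))   # bin 2 -> 0.7
```

Precomputing trades memory for speed. A full concatenation table grows with the square of the number of units, so in practice it may need to be restricted to unit pairs that can actually occur next to each other.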


Current Assignee
Canon Ayutthaya Limited (Canon Inc.)
Original Assignee
Canon Ayutthaya Limited (Canon Inc.)
Inventors
Komori, Yasuhiro; Okutani, Yasuo

Application Number
US09/818,607
Publication Number
US 20010032079A1
Field of Search
US Class Current
704/258
CPC Class Codes
G10L 13/07 Concatenation rules
