Speech synthesis apparatus and speech synthesis method

US 20050119890A1
Filed: 11/29/2004
Published: 06/02/2005
Est. Priority Date: 11/28/2003
Status: Abandoned Application

Abstract

The present invention includes: a characteristic parameter DB 106 that holds, with respect to each speech-unit, speech-unit data indicating a loan word attribute and acoustic characteristics; a language analysis unit 104 and a prosody prediction unit 109 that obtain text data and respectively predict a loan word attribute and acoustic characteristics of each of a plurality of speech-units that form text indicated by the text data; a speech-unit selection unit 108 that selects, from the characteristic parameter DB 106, speech-unit data that represents the loan word attribute and the acoustic characteristics similar to the predicted loan word attribute and acoustic characteristics of each speech-unit; and a speech synthesis unit 110 that generates synthesized speech using a plurality of the selected speech-units and outputs the synthesized speech.

18 Claims

1. A speech synthesis apparatus that obtains text data and converts text indicated by the text data into speech, comprising:
- a storage unit operable to previously store, with respect to each speech-unit, speech-unit data that represents (i) a loan word attribute indicating whether or not a speech-unit belongs to a class of loan words and (ii) an acoustic characteristic of the speech-unit;
  
  a characteristic prediction unit operable to obtain text data and predict, with respect to each of a plurality of speech-units that form text indicated by the text data, a loan word attribute and an acoustic characteristic;
  
  a selection unit operable to select speech-unit data that represents a loan word attribute and an acoustic characteristic similar to the loan word attribute and the acoustic characteristic of each speech-unit predicted by the characteristic prediction unit, from among the speech-unit data stored in the storage unit; and
  
  a speech output unit operable to generate synthesized speech using a plurality of the speech-unit data selected by the selection unit and output the synthesized speech.
- - 2. The speech synthesis apparatus according to claim 1, wherein when the characteristic prediction unit predicts the loan word attribute indicating that a speech-unit belongs to the class of loan words, the selection unit preferentially selects speech-unit data that represents the loan word attribute indicating that a speech-unit belongs to the class of loan words.
  - 3. The speech synthesis apparatus according to claim 1, wherein each speech-unit data further represents a final particle attribute indicating whether or not the speech-unit belongs to a class of final particles, the characteristic prediction unit predicts, with respect to each of a plurality of speech-units that form the text indicated by the text data, the loan word attribute, the acoustic characteristic and a final particle attribute, and the selection unit selects speech-unit data that represents a loan word attribute, an acoustic characteristic and a final particle attribute similar to the loan word attribute, the acoustic characteristic and the final particle attribute of the speech-unit predicted by the characteristic prediction unit, from among the speech-unit data stored in the storage unit.
  - 4. The speech synthesis apparatus according to claim 3, wherein when the characteristic prediction unit predicts the final particle attribute indicating that the speech-unit belongs to the class of final particles, the selection unit preferentially selects speech-unit data that represents the final particle attribute indicating that a speech-unit belongs to the class of final particles.
  - 5. The speech synthesis apparatus according to claim 3, wherein the acoustic characteristic indicates at least one of a duration, a fundamental frequency and a power of a speech-unit.
  - 6. The speech synthesis apparatus according to claim 5, wherein each speech-unit data further represents a phonetic environment to which the speech-unit belongs, syntax information relating to a syntax of the speech-unit and accent phrase information relating to an accent phrase of the speech-unit, the characteristic prediction unit predicts, with respect to each of a plurality of speech-units that form the text indicated by the text data, the loan word attribute, the acoustic characteristic, the final particle attribute, the phonetic environment, the syntax information and the accent phrase information, and the selection unit selects speech-unit data that represents a loan word attribute, an acoustic characteristic, a final particle attribute, a phonetic environment, syntax information and accent phrase information similar to the loan word attribute, the acoustic characteristic, the final particle attribute, the phonetic environment, the syntax information and the accent phrase information of the speech-unit predicted by the characteristic prediction unit, from among the speech-unit data stored in the storage unit.
  - 7. The speech synthesis apparatus according to claim 1, wherein the selection unit includes:
    - a first calculation unit operable to calculate a first sub-cost by quantitatively evaluating a similarity level between the loan word attribute of the speech-unit predicted by the characteristic prediction unit and the loan word attribute of the speech-unit data stored in the storage unit;
      
      a second calculation unit operable to calculate a second sub-cost by quantitatively evaluating a similarity level between the acoustic characteristic of the speech-unit predicted by the characteristic prediction unit and the acoustic characteristic of the speech-unit data stored in the storage unit;
      
      a cost calculation unit operable to calculate a cost using the first and second sub-costs calculated by the first and second calculation units; and
      
      a data selection unit operable to select speech-unit data from among the speech-unit data stored in the storage unit, based on the cost calculated by the cost calculation unit.
  - 8. The speech synthesis apparatus according to claim 7, wherein the cost calculation unit calculates the cost by assigning weights to the first and second sub-costs calculated by the first and second calculation units and adding up the weighted first and second sub-costs.
  - 9. The speech synthesis apparatus according to claim 8, further comprising a weight determination unit operable to specify a confidence level of the acoustic characteristic predicted by the characteristic prediction unit and determine the weights to be assigned to the first and second sub-costs depending on the confidence level, and the cost calculation unit assigns the weights determined by the weight determination unit to the first and second sub-costs.
  - 10. The speech synthesis apparatus according to claim 9, wherein when the confidence level of the acoustic characteristic is low, the weight determination unit determines the weights to be assigned to the first and second sub-costs so that the similarity level between the loan word attributes is more influential in the selection of the speech-unit data by the data selection unit than the similarity level between the acoustic characteristics.
  - 11. The speech synthesis apparatus according to claim 10, wherein the selection unit further includes a third calculation unit operable to calculate a concatenation cost by quantitatively evaluating an acoustic distortion that occurs when a plurality of speech-unit data stored in the storage unit are concatenated, and the cost calculation unit calculates the cost using the first and second sub-costs calculated by the first and second calculation units and the concatenation cost calculated by the third calculation unit.
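The cost-based selection recited in claims 7 to 11 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the specification: the `Unit` record, the normalization constants in the sub-costs, the linear mapping from confidence level to weights, and the use of a fundamental-frequency jump as the concatenation cost. The sketch only shows how two weighted sub-costs and a concatenation cost could be combined, and how a low confidence level shifts influence toward the loan word attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical speech-unit record: loan word attribute plus acoustics."""
    phoneme: str
    is_loan_word: bool
    duration_ms: float
    f0_hz: float
    power_db: float


def loan_word_sub_cost(target, cand):
    # First sub-cost: 0 when the loan word attributes agree, 1 otherwise.
    return 0.0 if target.is_loan_word == cand.is_loan_word else 1.0


def acoustic_sub_cost(target, cand):
    # Second sub-cost: scaled absolute differences (scales are assumptions).
    return (abs(target.duration_ms - cand.duration_ms) / 100.0
            + abs(target.f0_hz - cand.f0_hz) / 100.0
            + abs(target.power_db - cand.power_db) / 10.0)


def concatenation_cost(prev, cur):
    # Assumed stand-in for acoustic distortion at a join: the F0 jump.
    return abs(prev.f0_hz - cur.f0_hz) / 100.0


def determine_weights(confidence):
    # Assumed linear rule: low confidence in the predicted acoustics gives
    # the loan word sub-cost the larger weight.
    c = min(max(confidence, 0.0), 1.0)
    return 1.0 - c, c


def target_cost(target, cand, w_loan, w_acoustic):
    # Weighted sum of the two sub-costs.
    return (w_loan * loan_word_sub_cost(target, cand)
            + w_acoustic * acoustic_sub_cost(target, cand))


def select_units(targets, database, confidence, w_concat=1.0):
    """Pick one stored unit per target, minimizing total cost by dynamic programming."""
    if not targets:
        return []
    w_loan, w_acoustic = determine_weights(confidence)
    layers = []
    for t in targets:
        cands = [u for u in database if u.phoneme == t.phoneme]
        if not cands:
            raise ValueError(f"no stored unit for {t.phoneme!r}")
        layers.append(cands)
    # best[i][j] = (cumulative cost, index of best predecessor in layer i-1)
    best = [[(target_cost(targets[0], c, w_loan, w_acoustic), None)
             for c in layers[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in layers[i]:
            join, back = min(
                (best[i - 1][k][0] + w_concat * concatenation_cost(p, c), k)
                for k, p in enumerate(layers[i - 1])
            )
            row.append((join + target_cost(targets[i], c, w_loan, w_acoustic), back))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(layers[i][j])
        j = best[i][j][1]
    path.reverse()
    return path
```

With a single target predicted to be a loan word, a low confidence value makes this sketch prefer a stored loan word unit even when a non-loan-word unit is acoustically closer, while a high confidence value reverses the choice.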

12. A speech synthesis method for obtaining text data and converting text indicated by the text data into speech using data stored in a storage unit, wherein the storage unit previously stores, with respect to each speech-unit, speech-unit data that represents (i) a loan word attribute indicating whether or not a speech-unit belongs to a class of loan words and (ii) an acoustic characteristic of the speech-unit, and the method comprises:
- obtaining text data and predicting, with respect to each of a plurality of speech-units that form text indicated by the text data, a loan word attribute and an acoustic characteristic of the speech-unit;
  
  selecting speech-unit data that represents a loan word attribute and an acoustic characteristic similar to the predicted loan word attribute and acoustic characteristic of each speech-unit, from among the speech-unit data stored in the storage unit; and
  
  generating synthesized speech using a plurality of the selected speech-unit data and outputting the synthesized speech.
- - 13. The speech synthesis method according to claim 12, wherein each speech-unit data further represents a final particle attribute indicating whether or not the speech-unit belongs to a class of final particles, in the predicting, the loan word attribute, the acoustic characteristic and a final particle attribute are predicted with respect to each of a plurality of speech-units that form the text indicated by the text data, and in the selecting, speech-unit data that represents a loan word attribute, an acoustic characteristic and a final particle attribute similar to the predicted loan word attribute, acoustic characteristic and final particle attribute is selected from among the speech-unit data stored in the storage unit.
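The predicting step of claims 12 and 13 can be sketched as a lexicon lookup. This is a simplified illustration under stated assumptions, not the method of the specification: the two lexicons, the romanized example words, the `PredictedUnit` record, and the treatment of each word as one speech-unit are all hypothetical, and acoustic prediction is left out.

```python
from dataclasses import dataclass

# Hypothetical lexicons, romanized for readability (assumptions for illustration).
LOAN_WORDS = {"terebi", "rajio"}
FINAL_PARTICLES = {"ne", "yo"}


@dataclass(frozen=True)
class PredictedUnit:
    label: str
    is_loan_word: bool       # loan word attribute
    is_final_particle: bool  # final particle attribute


def predict_attributes(words):
    """Tag each word (treated here as one speech-unit) with both attributes."""
    units = []
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        units.append(PredictedUnit(
            label=word,
            is_loan_word=word in LOAN_WORDS,
            # A particle counts as "final" only at the end of the utterance.
            is_final_particle=is_last and word in FINAL_PARTICLES,
        ))
    return units
```

A selection step could then compare these predicted attributes with the attributes stored for each candidate speech-unit.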

14. A program for obtaining text data and converting text indicated by the text data into speech using data stored in a storage unit, wherein the storage unit previously stores, with respect to each speech-unit, speech-unit data that represents (i) a loan word attribute indicating whether or not a speech-unit belongs to a class of loan words and (ii) an acoustic characteristic of the speech-unit, and the program causes a computer to execute:
- obtaining text data and predicting, with respect to each of a plurality of speech-units that form text indicated by the text data, a loan word attribute and an acoustic characteristic of the speech-unit;
  
  selecting speech-unit data that represents a loan word attribute and an acoustic characteristic similar to the predicted loan word attribute and acoustic characteristic of each speech-unit, from among the speech-unit data stored in the storage unit; and
  
  generating synthesized speech using a plurality of the selected speech-unit data and outputting the synthesized speech.

15. A data creation apparatus that creates speech-unit data to be used for speech synthesis, comprising:
- a speech storage unit operable to store a speech waveform signal that represents speech in a waveform;
  
  a text storage unit operable to store text data indicating text that corresponds to the speech represented by the speech waveform signal;
  
  a language analysis unit operable to obtain text data from the text storage unit, divide text indicated by the text data into speech-units, and analyze a loan word attribute of each speech-unit indicating whether or not the speech-unit belongs to a class of loan words;
  
  an acoustic analysis unit operable to obtain a speech waveform signal from the speech storage unit, divide the speech represented by the speech waveform signal into speech-units, and analyze an acoustic characteristic of each speech-unit; and
  
  a creation unit operable to create speech-unit data of each speech-unit so that said speech-unit data indicates the loan word attribute as analyzed by the language analysis unit and the acoustic characteristic as analyzed by the acoustic analysis unit, and store the created speech-unit data into a memory.
- - 16. The data creation apparatus according to claim 15, wherein the language analysis unit further analyzes a final particle attribute indicating whether or not each speech-unit belongs to a class of final particles, and the creation unit creates the speech-unit data of each speech-unit so that said speech-unit data indicates the loan word attribute and the final particle attribute as analyzed by the language analysis unit and the acoustic characteristic as analyzed by the acoustic analysis unit.
  - 17. The data creation apparatus according to claim 16, wherein the acoustic characteristic indicates at least one of a duration, a fundamental frequency and a power of a speech-unit.

18. A data creation method for creating speech-unit data to be used for speech synthesis using data stored in a storage unit, wherein the storage unit previously stores a speech waveform signal that represents speech in a waveform and text data indicating text that corresponds to the speech represented by the speech waveform signal, and the method comprises:
- obtaining text data from the text storage unit, dividing text indicated by the text data into speech-units, and analyzing a loan word attribute of each speech-unit indicating whether or not the speech-unit belongs to a class of loan words;
  
  obtaining a speech waveform signal from the speech storage unit, dividing the speech represented by the speech waveform signal into speech-units, and analyzing an acoustic characteristic of each speech-unit; and
  
  creating speech-unit data of each speech-unit so that said speech-unit data indicates the analyzed loan word attribute and acoustic characteristic, and storing the created speech-unit data into a memory.
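The data creation steps of claims 15 to 18 can be sketched as follows. The sketch assumes that the text has already been divided into labeled speech-units with known loan word flags and that the waveform has already been segmented into matching sample ranges; the `SpeechUnitData` record and both function names are hypothetical. Only duration and power are computed, since the claims require at least one acoustic characteristic and fundamental-frequency estimation would need a separate pitch tracker.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechUnitData:
    """Hypothetical stored record for one speech-unit."""
    label: str
    is_loan_word: bool
    duration_ms: float
    power_db: float


def analyze_segment(samples, sample_rate):
    """Return (duration in ms, mean power in dB) for one waveform segment."""
    if not samples:
        raise ValueError("empty segment")
    duration_ms = 1000.0 * len(samples) / sample_rate
    mean_square = sum(s * s for s in samples) / len(samples)
    power_db = 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")
    return duration_ms, power_db


def create_unit_data(labels, loan_flags, waveform, boundaries, sample_rate):
    """Combine language-analysis and acoustic-analysis results per speech-unit.

    boundaries holds one (start, end) sample-index pair per label and is
    assumed to come from an earlier segmentation step.
    """
    if not (len(labels) == len(loan_flags) == len(boundaries)):
        raise ValueError("labels, loan_flags and boundaries must align")
    records = []
    for label, flag, (start, end) in zip(labels, loan_flags, boundaries):
        duration_ms, power_db = analyze_segment(waveform[start:end], sample_rate)
        records.append(SpeechUnitData(label, flag, duration_ms, power_db))
    return records
```

The returned list stands in for the memory into which the created speech-unit data is stored.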

Current Assignee
Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee
Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors
Hirose, Yoshifumi

Application Number

US10/998,035
Publication Number

US 20050119890A1
Field of Search
US Class Current

704/260
CPC Class Codes

G10L 13/08 Text analysis or generation...
