Speech synthesizer

US 20080243511A1
Filed: 10/22/2007
Published: 10/02/2008
Est. Priority Date: 10/24/2006
Status: Active Grant

2 Assignments

0 Petitions

Abstract

The present invention is a speech synthesizer that generates speech data for text including a fixed part and a variable part by combining recorded speech with rule-based synthetic speech. It achieves high quality by concatenating recorded speech and synthetic speech so that discontinuities in timbre and prosody are not perceived. The speech synthesizer includes: a recorded speech database that stores, in advance, recorded speech data including a recorded fixed part; a rule-based synthesizer that generates, from received text, rule-based synthetic speech data including the variable part and at least part of the fixed part; a concatenation boundary calculator that calculates a concatenation boundary position in the region where the recorded speech data and the rule-based synthetic speech data overlap, based on the acoustic characteristics of the recorded speech data and the rule-based synthetic speech data that correspond to the text; and a concatenative synthesizer that generates synthetic speech data corresponding to the text by concatenating the recorded speech data and the rule-based synthetic speech data, each segmented at the concatenation boundary position.
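The boundary selection described in the abstract can be illustrated with a small sketch. It rests on assumptions that are not in the patent: each speech stream is a list of phoneme segments carrying a single F0 value (Hz) and power value (dB) at the segment's right edge, the caller already knows how many phonemes the two streams share, and the mismatch score is a hypothetical sum of the log-F0 jump and a scaled power jump. `Segment`, `boundary_cost` and `splice` are illustrative names only.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    phoneme: str
    f0_end: float  # F0 in Hz at the segment's right edge (assumed representation)
    db_end: float  # power in dB at the segment's right edge (assumed representation)


def boundary_cost(rec: Segment, syn: Segment) -> float:
    """Hypothetical mismatch between the two streams at the same phoneme boundary."""
    f0_jump = abs(math.log(rec.f0_end / syn.f0_end))  # pitch jump in natural-log units
    db_jump = abs(rec.db_end - syn.db_end) / 10.0  # power jump, scaled to a similar range
    return f0_jump + db_jump


def splice(rec: list[Segment], syn: list[Segment], overlap: int) -> tuple[int, list[Segment]]:
    """Join recorded speech (fixed part) to rule-based speech (overlap + variable part).

    The last `overlap` segments of `rec` and the first `overlap` segments of `syn`
    must be the same phonemes. Candidate boundary k (1..overlap) keeps `rec` through
    overlap phoneme k-1 and continues with `syn` from overlap phoneme k onward.
    """
    start = len(rec) - overlap
    for i in range(overlap):
        if rec[start + i].phoneme != syn[i].phoneme:
            raise ValueError("overlap phonemes do not match")
    best_k = min(
        range(1, overlap + 1),
        key=lambda k: boundary_cost(rec[start + k - 1], syn[k - 1]),
    )
    return best_k, rec[: start + best_k] + syn[best_k:]


# Toy example: recorded "a k i t" ends the fixed part; rule-based "k i t o"
# re-synthesizes "k i t" and adds the variable phoneme "o".
rec = [Segment("a", 120, 60), Segment("k", 118, 50), Segment("i", 110, 62), Segment("t", 105, 48)]
syn = [Segment("k", 140, 55), Segment("i", 112, 61), Segment("t", 130, 40), Segment("o", 125, 58)]
k, joined = splice(rec, syn, overlap=3)
print(k, "".join(s.phoneme for s in joined))  # 2 akito
```

Scoring only phoneme boundaries inside the overlap, rather than every frame, keeps the search small; a practical system would compare richer features, such as those listed in the dependent claims.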

21 Claims

1. A speech synthesizer that synthesizes text including a fixed part and a variable part, comprising:
- a recorded speech database that previously stores first speech data being speech data including the fixed part, generated based on recorded speech;
- a rule-based synthesizer that generates second speech data including the variable part and at least part of the fixed part from the received text;
- a concatenation boundary calculator that selects the position of a concatenation boundary between the recorded speech data and speech data generated by rule-based synthesis, based on acoustic characteristics of a region in which the first speech data and the second speech data that correspond to the text overlap; and
- a concatenative synthesizer that synthesizes speech data of the text by concatenating third speech data produced by separating the first speech data in the concatenation boundary, and fourth speech data segmented by separating the second speech data in the concatenation boundary.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9):
2. The speech synthesizer according to claim 1, wherein the rule-based synthesizer uses acoustic characteristics of the first speech data in a region in which the first speech data and the second speech data that correspond to the text overlap, to generate the second speech data matching the first speech data.
3. The speech synthesizer according to claim 1, wherein the rule-based synthesizer processes the second speech data, based on the acoustic characteristics of the first speech data and the second speech data in the position of a concatenation boundary obtained from the concatenation boundary calculator.
4. The speech synthesizer according to any one of claims 1 to 3, wherein as the acoustic characteristics, at least one of phoneme class, fundamental frequency, phonemic duration, power, and spectrum is used.
5. The speech synthesizer according to claim 1, wherein the rule-based synthesizer generates the second speech data in any unit of the whole of the fixed part, one breath group, and one sentence, of the variable part and the fixed part preceding or following the variable part.
6. The speech synthesizer according to claim 1, wherein the concatenation boundary calculator makes selection from among plural phoneme boundaries contained in an overlap region between the first speech data and the second speech data.
7. The speech synthesizer according to claim 1 or 2, wherein the recorded speech database stores speech data previously recorded in the unit of one breath group or one sentence that includes the fixed part and at least part of other than the fixed part, as the first speech data.
8. The speech synthesizer according to claim 1 or 2, wherein the concatenation boundary position is calculated as time in the first speech data and time in the second speech data, and the speech data is cut off and concatenated using the calculated times.
9. The speech synthesizer according to claim 1 or 2, wherein a means that outputs the speech data synthesized by the concatenative synthesizer is provided.
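Claims 4, 6 and 8 together suggest a concrete procedure: score each phoneme boundary in the overlap on one or more of phoneme class, fundamental frequency, phonemic duration, power and spectrum; pick the best; express it as a time in each stream; and cut and join there. The sketch below follows that outline, but the `Boundary` record, the weights in `WEIGHTS`, the unvoiced-consonant preference and the plain Euclidean spectral distance are all assumptions, and lists of floats stand in for real waveforms.

```python
import math
from dataclasses import dataclass

# Assumed set of unvoiced consonants; joining just before one of these tends to
# hide small discontinuities (a common heuristic, not taken from the claims).
UNVOICED = {"p", "t", "k", "s", "sh", "h", "ch", "ts"}

# Hypothetical weights for the claim 4 features.
WEIGHTS = {"cls": 1.0, "f0": 2.0, "dur": 5.0, "db": 0.1, "spec": 0.5}


@dataclass
class Boundary:
    next_phoneme: str  # phoneme that starts at this boundary
    time_rec: float  # boundary time in the recorded speech, seconds (claim 8)
    time_syn: float  # boundary time in the rule-based speech, seconds (claim 8)
    f0: tuple[float, float]  # (recorded, synthetic) F0 in Hz at the boundary
    db: tuple[float, float]  # (recorded, synthetic) power in dB at the boundary
    dur: tuple[float, float]  # (recorded, synthetic) duration of the preceding phoneme, s
    spec: tuple[list[float], list[float]]  # (recorded, synthetic) spectral vectors


def cost(b: Boundary) -> float:
    """Weighted mismatch over phoneme class, F0, duration, power and spectrum."""
    cls = 0.0 if b.next_phoneme in UNVOICED else 1.0
    f0 = abs(math.log(b.f0[0] / b.f0[1]))
    dur = abs(b.dur[0] - b.dur[1])
    db = abs(b.db[0] - b.db[1])
    spec = math.dist(b.spec[0], b.spec[1])  # Euclidean spectral distance
    return (WEIGHTS["cls"] * cls + WEIGHTS["f0"] * f0 + WEIGHTS["dur"] * dur
            + WEIGHTS["db"] * db + WEIGHTS["spec"] * spec)


def cut_and_join(rec: list[float], syn: list[float], candidates: list[Boundary], sr: int):
    """Pick the cheapest phoneme boundary (claim 6), then cut both sample lists at
    that boundary's time in each stream and concatenate them (claim 8)."""
    best = min(candidates, key=cost)
    i = round(best.time_rec * sr)  # recorded samples [0, i) are kept
    j = round(best.time_syn * sr)  # synthetic samples [j, end) are kept
    return best, rec[:i] + syn[j:]


sr = 100  # unrealistically low sample rate, to keep the example small
rec = [0.0] * 100  # 1 s of placeholder "recorded" samples
syn = [1.0] * 80  # 0.8 s of placeholder "synthetic" samples
candidates = [
    Boundary("a", 0.4, 0.15, (120, 120), (60, 60), (0.1, 0.1), ([0.0, 0.0], [0.0, 0.0])),
    Boundary("t", 0.5, 0.25, (118, 120), (55, 57), (0.08, 0.09), ([1.0, 0.5], [1.1, 0.5])),
]
best, joined = cut_and_join(rec, syn, candidates, sr)
print(best.next_phoneme, len(joined))  # t 105
```

Expressing the boundary as a separate time in each stream matters because recorded and rule-based speech rarely have identical timing, so the same phoneme boundary falls at different offsets in the two signals.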

10. A speech synthesizer that synthesizes text including a fixed part and a variable part, comprising:
- a recorded speech database that previously stores recorded speech data including the recorded fixed part;
- a rule-based synthesizer that generates rule-based synthetic speech data including the variable part and at least part of the fixed part from the received text;
- a concatenation boundary calculator that calculates a concatenation boundary position in a region in which the recorded speech data and the rule-based synthetic speech data overlap, based on acoustic characteristics of the recorded speech data and the rule-based synthetic speech data that correspond to the text; and
- a concatenative synthesizer that concatenates the recorded speech data and the rule-based synthetic speech data that are segmented in the concatenation boundary position, to generate synthetic speech data corresponding to the text.
Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18):
11. The speech synthesizer according to claim 10, wherein the concatenative synthesizer uses acoustic characteristics of the recorded speech data in a region in which the recorded speech data and the rule-based synthetic speech data that correspond to the text overlap, to generate the rule-based synthetic speech data matching the recorded speech data.
12. The speech synthesizer according to claim 10, wherein the rule-based synthesizer processes the rule-based synthetic speech data, based on the acoustic characteristics of the recorded speech data and the rule-based synthetic speech data in the position of a concatenation boundary obtained from the concatenation boundary calculator.
13. The speech synthesizer according to any one of claims 10 to 12, wherein as the acoustic characteristics, at least one of phoneme class, fundamental frequency, phonemic duration, power, and spectrum is used.
14. The speech synthesizer according to claim 10, wherein the rule-based synthesizer generates the second speech data in any unit of the whole of the fixed part, one breath group, and one sentence, of the variable part and the fixed part preceding or following the variable part.
15. The speech synthesizer according to claim 10, wherein the concatenation boundary calculator makes selection from among plural phoneme boundaries contained in an overlap region between the recorded speech data and the rule-based synthetic speech data.
16. The speech synthesizer according to claim 10 or 11, wherein the recorded speech database stores speech data previously recorded in the unit of one breath group or one sentence that includes the fixed part and at least part of other than the fixed part, as the recorded speech data.
17. The speech synthesizer according to claim 10 or 11, wherein the concatenation boundary position is calculated as time in the recorded speech data and time in the rule-based synthetic speech data, and the speech data is cut off and concatenated using the calculated times.
18. The speech synthesizer according to claim 10 or 11, wherein a means that outputs the synthetic speech data generated by the concatenative synthesizer is provided.
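Claims 3 and 12 have the rule-based synthesizer process its output using the acoustic characteristics of both streams at the chosen boundary. One possible reading, sketched here with an assumed function name (`match_at_boundary`) and an assumed correction scheme (a linear fade over `span` frames), is to move the synthetic F0 and power onto the recorded values at the boundary frame and let the correction fade out inside the variable part.

```python
import math


def match_at_boundary(f0_syn, db_syn, f0_rec_b, db_rec_b, b, span):
    """Adjust rule-based F0 (Hz) and power (dB) contours so they meet the recording at frame b.

    The full correction is applied at frame b and fades linearly to zero over `span`
    frames, so the variable part keeps its rule-based prosody further from the join
    (an assumed scheme). Frames before b are untouched; they are discarded at
    concatenation. Unvoiced frames (F0 == 0) stay at 0.
    """
    if f0_syn[b] <= 0 or f0_rec_b <= 0:
        raise ValueError("boundary frame must be voiced in both streams")
    log_f0_offset = math.log(f0_rec_b / f0_syn[b])
    db_offset = db_rec_b - db_syn[b]
    f0_out, db_out = list(f0_syn), list(db_syn)
    for n in range(b, len(f0_syn)):
        w = max(0.0, 1.0 - (n - b) / span)  # 1.0 at the boundary, 0.0 from b + span on
        f0_out[n] = f0_syn[n] * math.exp(w * log_f0_offset)
        db_out[n] = db_syn[n] + w * db_offset
    return f0_out, db_out


f0, db = match_at_boundary([100.0] * 6, [50.0] * 6, f0_rec_b=120.0, db_rec_b=54.0, b=1, span=4)
print([round(x, 1) for x in f0])  # [100.0, 120.0, 114.7, 109.5, 104.7, 100.0]
print(db)  # [50.0, 54.0, 53.0, 52.0, 51.0, 50.0]
```

Applying the F0 correction as a ratio (an offset in the log domain) keeps the shape of the rule-based intonation while removing the pitch step at the join.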

19. A speech synthesizer that synthesizes text including a fixed part and a variable part, comprising:
- a recorded speech database that previously stores recorded speech data including the recorded fixed part;
- a rule-based synthetic parameter calculator that calculates rule-based synthetic parameters including the variable part and at least part of the fixed part from the received text to generate acoustic characteristics of rule-based synthetic speech;
- a concatenation boundary calculator that calculates a concatenation boundary position in a region in which the recorded speech data and the rule-based synthetic speech data overlap, using acoustic characteristics of the recorded speech data and acoustic characteristics of the rule-based synthetic speech data;
- a rule-based synthetic speech data part that generates rule-based synthetic speech data by using acoustic characteristics of the recorded speech, acoustic characteristics of the rule-based synthetic speech, and the concatenation boundary position;
- a concatenative synthesizer that concatenates the recorded speech data and the rule-based synthetic speech data that are segmented in the concatenation boundary position, to generate synthetic speech data corresponding to the text; and
- a means that outputs the synthetic speech data.
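Claim 19 orders the work differently from claim 10: the boundary is computed from rule-based synthetic parameters before any waveform exists, and the waveform is then generated using the recorded speech's characteristics. The sketch below shows that ordering only. It assumes frame-aligned F0 contours over the overlap, uses a hypothetical constant pitch ratio taken at the boundary, and replaces a real synthesis back end with a toy phase-continuous sinusoid (`render_from`), so it should not be read as the patented parameter calculator.

```python
import math


def pick_boundary_frame(rec_f0, syn_f0, candidates):
    """Choose the candidate frame where recorded and rule-based F0 differ least.

    Assumes both contours are already time-aligned frame by frame over the overlap
    region and that candidate frames are phoneme boundaries.
    """
    return min(candidates, key=lambda n: abs(math.log(rec_f0[n] / syn_f0[n])))


def render_from(f0, amp, start, hop, sr):
    """Toy phase-continuous sinusoid renderer, standing in for a real vocoder.

    Only frames from `start` onward are rendered, because everything before the
    boundary is taken from the recording instead.
    """
    out, phase = [], 0.0
    for n in range(start, len(f0)):
        step = 2.0 * math.pi * f0[n] / sr  # phase advance per sample at this frame's F0
        for _ in range(hop):
            out.append(amp[n] * math.sin(phase))
            phase += step
    return out


def synthesize_tail(rec_f0, syn_f0, syn_amp, candidates, hop=80, sr=8000):
    b = pick_boundary_frame(rec_f0, syn_f0, candidates)  # boundary from parameters only
    ratio = rec_f0[b] / syn_f0[b]  # use the recording's F0 at the join
    f0 = [x * ratio for x in syn_f0]
    return b, render_from(f0, syn_amp, b, hop, sr)


b, wave = synthesize_tail([120, 118, 115, 110], [130, 119, 125, 112], [0.5] * 4, [1, 2, 3])
print(b, len(wave))  # 1 240
```

Because the boundary is known first, frames before it never need to be rendered.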

20. A speech synthesizer that creates synthetic speech by concatenating a speech block including a variable part and a speech block including a fixed part, previously recorded, comprising:
- a recorded speech database that stores speech data including the speech blocks previously recorded;
- an input parser that generates intermediate code of a speech block of the variable part, and intermediate code of a speech block of the fixed part, from received input text;
- a recorded speech selector that selects appropriate recorded speech data from among plural recorded speech data having the same fixed part according to the input of the variable part;
- a rule-based synthesizer that uses intermediate code of a speech block of the variable part obtained by the input parser, and intermediate code of a speech block of the fixed part that are obtained in the input parser to determine the range of generating rule-based synthetic speech data;
- a concatenation boundary calculator that calculates a concatenation boundary position in a region in which the recorded speech data and the rule-based synthetic speech data overlap, using acoustic characteristics of the recorded speech data and acoustic characteristics of the rule-based synthetic speech data;
- a concatenative synthesizer that uses the concatenation boundary position obtained from the concatenation boundary calculator to cut off the recorded speech data and the rule-based synthetic speech data, and generates synthetic speech data corresponding to a speech block including the variable part by concatenating the recorded speech data and the rule-based synthetic speech data that are cut off; and
- a speech block concatenator that concatenates speech blocks, based on the order of speech blocks obtained from the input text, and creates output speech.
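Claim 20's recorded speech selector chooses among several recordings of the same fixed part according to the variable part. A plausible criterion, assumed here rather than taken from the claim, is coarticulation: prefer the recording whose phoneme at the edge of the variable slot matches the new variable part's edge phoneme, falling back to the same phoneme class. The block concatenator is shown as a simple ordered join. `Recording`, `select_recording` and `concatenate_blocks` are hypothetical names, and the romanized example sentence is illustrative.

```python
from dataclasses import dataclass
from itertools import chain

VOWELS = set("aiueo")  # assumed five-vowel inventory


@dataclass
class Recording:
    name: str
    fixed_text: str  # fixed part, with "*" marking the variable slot (assumed notation)
    slot_edge: str  # phoneme spoken at the edge of the variable slot when recorded


def phoneme_class(p: str) -> str:
    return "vowel" if p in VOWELS else "consonant"


def select_recording(recordings, fixed_text, variable_edge):
    """Among recordings sharing this fixed part, prefer one whose slot-edge phoneme
    equals the new variable part's edge phoneme, then one of the same class."""
    pool = [r for r in recordings if r.fixed_text == fixed_text]
    if not pool:
        raise LookupError(f"no recording for {fixed_text!r}")

    def mismatch(r):
        if r.slot_edge == variable_edge:
            return 0
        return 1 if phoneme_class(r.slot_edge) == phoneme_class(variable_edge) else 2

    return min(pool, key=mismatch)  # ties keep database order


def concatenate_blocks(blocks):
    """Speech block concatenator: join (position, samples) pairs in text order."""
    ordered = sorted(blocks, key=lambda blk: blk[0])
    return list(chain.from_iterable(samples for _, samples in ordered))


recordings = [
    Recording("r1", "tsugi wa * desu", "o"),  # recorded as "tsugi wa tokyo desu"
    Recording("r2", "tsugi wa * desu", "a"),  # recorded as "tsugi wa shibuya desu"
]
print(select_recording(recordings, "tsugi wa * desu", "a").name)  # r2 ("shinagawa" ends in a)
```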

21. A speech synthesizing method comprising:
- a first step of previously storing recorded speech data and first intermediate code corresponding to the recorded speech data to prepare for input text;
- a second step of converting the input text into second intermediate code;
- a third step of referring to the first intermediate code to distinguish the second intermediate code into a fixed part corresponding to the first intermediate code and a variable part not corresponding to it;
- a fourth step of acquiring a part of the first intermediate code that corresponds to the fixed part, from the recorded speech data;
- a fifth step of using the second intermediate code to generate rule-based synthetic speech data of the whole of a part corresponding to the variable part and at least part of a part corresponding to the fixed part; and
- a sixth step of concatenating the acquired part of the recorded speech data and part of the generated rule-based synthetic speech data.
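The third through fifth steps of claim 21 amount to aligning the input's intermediate code against the code stored with a recording. The sketch below uses Python's standard `difflib.SequenceMatcher` for that alignment, which is a stand-in the patent does not name. It treats intermediate code as a list of romanized word tokens (an assumption; real intermediate code would typically carry phonemes and prosody marks) and widens the rule-based synthesis range by a hypothetical `margin` of one fixed-part token on each side of the variable part.

```python
from difflib import SequenceMatcher


def split_fixed_variable(recorded_code, input_code):
    """Third and fourth steps: label runs of the input intermediate code as fixed
    (also present in the recorded utterance) or variable (absent from it).

    Returns (kind, input_tokens, recorded_span) tuples; recorded_span is the
    (start, end) token range to fetch from the recording, or None for variable runs.
    """
    matcher = SequenceMatcher(None, recorded_code, input_code, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(("fixed", input_code[j1:j2], (i1, i2)))
        elif tag in ("replace", "insert"):
            parts.append(("variable", input_code[j1:j2], None))
        # "delete": recorded tokens the input does not use contribute nothing
    return parts


def synthesis_span(parts, margin=1):
    """Fifth step: input-token range for rule-based synthesis, i.e. the variable
    tokens widened by `margin` tokens of the neighboring fixed part (margin assumed)."""
    pos, lo, hi = 0, None, None
    for kind, tokens, _ in parts:
        if kind == "variable":
            lo = pos if lo is None else lo
            hi = pos + len(tokens)
        pos += len(tokens)
    if lo is None:
        return None  # the whole input is covered by the recording
    return max(0, lo - margin), min(pos, hi + margin)


recorded = ["tsugi", "wa", "tokyo", "desu"]  # intermediate code stored with the recording
text = ["tsugi", "wa", "shinagawa", "desu"]  # intermediate code of the input text
parts = split_fixed_variable(recorded, text)
print(parts)
print(synthesis_span(parts))  # (1, 4): "wa shinagawa desu"
```

The overlap produced by the margin is exactly the region in which the concatenation boundary calculator of the earlier claims would search for a join point.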

Current Assignee: Hitachi, Ltd.
Original Assignee: Hitachi, Ltd.
Inventors: Kamoshida, Ryota; Nagamatsu, Kenji; Fujita, Yusuke

Granted Patent: US 7,991,616 B2
US Class Current: 704/260
CPC Class Codes: G10L 13/047 Architecture of speech synt...
