Speech synthesis apparatus and method

US 6,212,501 B1
Filed: 07/13/1998
Issued: 04/03/2001
Est. Priority Date: 07/14/1997
Status: Expired due to Fees

Abstract

A text segment selection unit extracts parameters of an exemplary text segment of a user's choice and a fixed form portion in the exemplary text segment from an exemplary text segment database. A text segment input unit inputs a text segment of a user's choice to be embedded in an unfixed form portion in the exemplary text segment. A text segment generation unit concatenates the input text segment to the text segment of the fixed form portion. A parameter generation unit generates a parameter from the concatenated text segment. A parameter extraction unit extracts the parameter of the unfixed form portion from the generated parameter. A parameter embedding unit concatenates the parameter of the unfixed form portion to the parameter of the fixed form portion to generate a parameter for speech synthesis. A synthesis unit generates synthesized speech from this parameter. With this arrangement, more natural synthesis can be realized without any sense of incongruous prosody between the synthesis-by-rule portion and the analysis portion.
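The unit-by-unit flow in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Param` and `Template` types, the one-parameter-per-character toy rule in `generate_by_rule`, and the assumption that a template has fixed text on both sides of the slot. The synthesis unit itself is left out; the sketch only shows how parameters for the unfixed portion are generated in the context of the whole sentence, extracted, and embedded between the stored fixed-portion parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    phone: str       # one symbol of the phonetic string (toy: one character)
    pitch_hz: float  # prosodic information (toy: pitch only)
    source: str      # "analysis" (from recorded speech) or "rule"


@dataclass(frozen=True)
class Template:
    prefix: str           # fixed text before the unfixed slot (assumed layout)
    suffix: str           # fixed text after the unfixed slot
    prefix_params: tuple  # stored parameters analysed from recorded speech
    suffix_params: tuple


def generate_by_rule(text):
    """Toy rule: one Param per character, pitch declining over the utterance."""
    n = max(len(text) - 1, 1)
    return [Param(ch, 140.0 - 40.0 * i / n, "rule") for i, ch in enumerate(text)]


def build_parameters(template, slot_text):
    # Text segment generation: embed the user's text into the fixed text.
    full_text = template.prefix + slot_text + template.suffix
    # Parameter generation: run the rule over the whole sentence so the slot
    # receives prosody that depends on its surrounding context.
    full_params = generate_by_rule(full_text)
    # Parameter extraction: keep only the parameters of the unfixed portion.
    start = len(template.prefix)
    slot_params = full_params[start:start + len(slot_text)]
    # Parameter embedding: join them with the stored fixed-portion parameters.
    return list(template.prefix_params) + slot_params + list(template.suffix_params)
```

Calling `build_parameters(template, "Rome")` on a template whose fixed text is `"to "` and `" now"` returns the stored analysis parameters for the fixed text with rule-generated parameters for `"Rome"` in between.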

20 Claims

1. A speech synthesis apparatus comprising:
- means for storing, for each exemplary text segment containing a fixed form portion having a fixed text segment and an unfixed form portion on which an arbitrary text segment can be specified by a user, exemplary text segment data including context information relating to the fixed form portion to be connected with the unfixed form portion and parameter data obtained by analyzing a speech corresponding to the fixed form portion;
  
  means, responsive to an instruction by a user, for selecting data from among the exemplary text segment data and inputting a text segment corresponding to the unfixed form portion of the selected exemplary text segment data;
  
  means for generating parameter data of at least the unfixed form portion on the basis of the inputted text segment of the unfixed form portion and corresponding context information; and
  
  means for concatenating the generated parameter data of the unfixed form portion to the stored parameter data of the fixed form portion, and generating synthesized speech from the concatenated parameter data.
- Dependent claims (2, 3, 4, 5)
  - 2. An apparatus according to claim 1, wherein the parameter data obtained by analysis is constituted by a phonetic string and prosodic information.
  - 3. An apparatus according to claim 1, wherein the exemplary text segment data further includes positional information of the unfixed form portion in the exemplary text segment.
  - 4. An apparatus according to claim 1, wherein a pitch of the unfixed form portion is shifted to substantially equal to that of the fixed form portion on their concatenated point in generating the parameter data of the unfixed form portion or generating the synthesized speech.
  - 5. An apparatus according to claim 1, wherein in a case where a pause period is provided between the unfixed form portion and the fixed form portion, the pause period is adjusted in generating the synthesized speech.
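The pitch matching described in claim 4 can be pictured with a small Python sketch. This is an illustration under assumptions, not the claimed implementation: pitch contours are modelled as plain lists of values in Hz, the shift is a constant additive offset applied to the whole unfixed portion, and the function name `shift_to_match` and its `slot_follows_fixed` flag are hypothetical.

```python
def shift_to_match(fixed_pitch, slot_pitch, slot_follows_fixed=True):
    """Shift the slot's pitch contour so it meets the fixed contour at the join.

    fixed_pitch / slot_pitch: pitch values in Hz, in time order.
    slot_follows_fixed: True if the unfixed portion comes after the fixed one.
    """
    if not fixed_pitch or not slot_pitch:
        return list(slot_pitch)
    if slot_follows_fixed:
        # Join is at the end of the fixed portion and the start of the slot.
        offset = fixed_pitch[-1] - slot_pitch[0]
    else:
        # Join is at the end of the slot and the start of the fixed portion.
        offset = fixed_pitch[0] - slot_pitch[-1]
    # Assumed design choice: a constant shift that preserves the contour shape.
    return [p + offset for p in slot_pitch]
```

A constant offset keeps the shape of the rule-generated contour intact; a real system might instead work in the log-frequency domain or fade the correction out over the segment.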

6. A speech synthesis apparatus comprising:
- means for storing, for each exemplary text segment containing a fixed form portion having a fixed text segment and an unfixed form portion on which an arbitrary text segment can be specified by a user, exemplary text segment data including context information relating to the fixed form portion to be connected with the unfixed form portion and speech waveform data of the fixed form portion;
  
  means, responsive to an instruction by a user, for selecting data from among the exemplary text segment data and inputting a text segment corresponding to the unfixed form portion of the selected exemplary text segment data;
  
  means for generating parameter data of at least the unfixed form portion on the basis of the inputted text segment of the unfixed form portion and corresponding context information, and generating synthesized speech from the generated parameter data; and
  
  means for concatenating speech waveform data of the generated synthesized speech of the unfixed form portion to the stored speech waveform data of the fixed form portion, and generating synthesized speech from the concatenated speech waveform data.
- Dependent claims (7, 8, 9)
  - 7. An apparatus according to claim 6, wherein the exemplary text segment data further includes positional information of the unfixed form portion in the exemplary text segment.
  - 8. An apparatus according to claim 6, wherein a pitch of the unfixed form portion is shifted to substantially equal to that of the fixed form portion on their concatenated point in generating the parameter data of the unfixed form portion or generating the synthesized speech.
  - 9. An apparatus according to claim 6, wherein a waveform phase of the unfixed form portion is adjusted to substantially equal to that of the fixed form portion on their concatenated point in generating the synthesized speech.
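One simple way to picture the phase adjustment of claim 9 in the waveform-concatenation variant is to cut both waveforms at rising zero-crossings before joining them. The Python sketch below is only an assumption about how such alignment could be done; the patent text here does not specify the method, and the function names are hypothetical. Waveforms are modelled as plain lists of samples, with the fixed portion placed first.

```python
def rising_zero_crossings(samples):
    """Indices i where the signal goes from negative to non-negative."""
    return [i for i in range(1, len(samples)) if samples[i - 1] < 0.0 <= samples[i]]


def join_in_phase(fixed_wave, slot_wave):
    """Concatenate two waveforms so the join falls on a rising zero-crossing.

    The fixed waveform is cut just before its last rising zero-crossing and
    the synthesized slot waveform starts at its first one, so the joined
    signal passes from a negative sample to a non-negative one at the seam.
    """
    ends = rising_zero_crossings(fixed_wave)
    starts = rising_zero_crossings(slot_wave)
    if not ends or not starts:
        # No usable crossing found: fall back to a plain concatenation.
        return list(fixed_wave) + list(slot_wave)
    return list(fixed_wave[:ends[-1]]) + list(slot_wave[starts[0]:])
```

Trimming discards a fraction of a period from each side of the join, which is usually negligible for speech but would need compensating if exact durations matter.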

10. A speech synthesis method comprising the steps of:
- providing a database for storing, for each exemplary text segment containing a fixed form portion having a fixed text segment and an unfixed form portion on which an arbitrary text segment can be specified by a user, exemplary text segment data including context information relating to the fixed form portion to be connected with the unfixed form portion and parameter data obtained by analyzing a speech corresponding to the fixed form portion;
  
  in response to an instruction by a user, selecting data from among the exemplary text segment data and inputting a text segment corresponding to the unfixed form portion of the selected exemplary text segment data;
  
  generating parameter data of at least the unfixed form portion on the basis of the inputted text segment of the unfixed form portion and corresponding context information; and
  
  concatenating the generated parameter data of the unfixed form portion to the stored parameter data of the fixed form portion, and generating synthesized speech from the concatenated parameter data.
- Dependent claims (11, 12, 13, 14)
  - 11. A method according to claim 10, wherein the parameter data obtained by analysis is constituted by a phonetic string and prosodic information.
  - 12. A method according to claim 10, wherein the exemplary text segment data further includes positional information of the unfixed form portion in the exemplary text segment.
  - 13. A method according to claim 10, wherein a pitch of the unfixed form portion is shifted to substantially equal to that of the fixed form portion on their concatenated point in generating the parameter data of the unfixed form portion or generating the synthesized speech.
  - 14. A method according to claim 10, wherein in a case where a pause period is provided between the unfixed form portion and the fixed form portion, the pause period is adjusted in generating the synthesized speech.

15. A speech synthesis method comprising the steps of:
- providing a database for storing, for each exemplary text segment containing a fixed form portion having a fixed text segment and an unfixed form portion on which an arbitrary text segment can be specified by a user, exemplary text segment data including context information relating to the fixed form portion to be connected with the unfixed form portion and speech waveform data of the fixed form portion;
  
  in response to an instruction by a user, selecting data from among the exemplary text segment data and inputting a text segment corresponding to the unfixed form portion of the selected exemplary text segment data;
  
  generating parameter data of at least the unfixed form portion on the basis of the inputted text segment of the unfixed form portion and corresponding context information, and generating synthesized speech from the generated parameter data; and
  
  concatenating speech waveform data of the generated synthesized speech of the unfixed form portion to the stored speech waveform data of the fixed form portion, and generating synthesized speech from the concatenated speech waveform data.
- Dependent claims (16, 17, 18)
  - 16. A method according to claim 15, wherein the exemplary text segment data further includes positional information of the unfixed form portion in the exemplary text segment.
  - 17. A method according to claim 15, wherein a pitch of the unfixed form portion is shifted to substantially equal to that of the fixed form portion on their concatenated point in generating the parameter data of the unfixed form portion or generating the synthesized speech.
  - 18. A method according to claim 15, wherein a waveform phase of the unfixed form portion is adjusted to substantially equal to that of the fixed form portion on their concatenated point in generating the synthesized speech.

19. A storage medium storing computer-executable program code for performing speech synthesis, the program code comprising:
- means for causing a computer to store on a database, for each exemplary text segment containing a fixed form portion having a fixed text segment and an unfixed form portion on which an arbitrary text segment can be specified by a user, exemplary text segment data including context information relating to the fixed form portion to be connected with the unfixed form portion and parameter data obtained by analyzing a speech corresponding to the fixed form portion;
  
  means for causing a computer to select data from among the exemplary text segment data and inputting a text segment corresponding to the unfixed form portion of the selected exemplary text segment data in response to an instruction by a user;
  
  means for causing a computer to generate parameter data of at least the unfixed form portion on the basis of the inputted text segment of the unfixed form portion and corresponding context information; and
  
  means for causing a computer to concatenate the generated parameter data of the unfixed form portion to the stored parameter data of the fixed form portion, and generate synthesized speech from the concatenated parameter data.

20. A storage medium storing computer-executable program code for performing speech synthesis, the program code comprising:
- means for causing a computer to store on a database, for each exemplary text segment containing a fixed form portion having a fixed text segment and an unfixed form portion on which an arbitrary text segment can be specified by a user, exemplary text segment data including context information relating to the fixed form portion to be connected with the unfixed form portion and speech waveform data of the fixed form portion;
  
  means for causing a computer to select data from among the exemplary text segment data and inputting a text segment corresponding to the unfixed form portion of the selected exemplary text segment data in response to an instruction by a user;
  
  means for causing a computer to generate parameter data of at least the unfixed form portion on the basis of the inputted text segment of the unfixed form portion and corresponding context information, and generate synthesized speech from the generated parameter data; and
  
  means for causing a computer to concatenate speech waveform data of the generated synthesized speech of the unfixed form portion to the stored speech waveform data of the fixed form portion, and generate synthesized speech from the concatenated speech waveform data.

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Kaseno, Osamu
Primary Examiner(s): Hudspeth, David
Assistant Examiner(s): McFadden, Susan Iris
Application Number: US09/114,150
Time in Patent Office: 995 Days
Field of Search: 704/260, 704/256, 704/258, 704/270
US Class Current: 704/258
CPC Class Codes: G10L 13/08 Text analysis or generation...
