Speech synthesis with prosodic model data and accent type

US 6,778,962 B1
Filed: 07/21/2000
Issued: 08/17/2004
Est. Priority Date: 07/23/1999
Status: Expired due to Fees

Abstract

A speech synthesizing method includes determining the accent type of the input character string, selecting the prosodic model data from a prosody dictionary for storing typical ones of the prosodic models representing the prosodic information for the character strings in a word dictionary, based on the input character string and the accent type, transforming the prosodic information of the prosodic model when the character string of the selected prosodic model is not coincident with the input character string, selecting the waveform data corresponding to each character of the input character string from a waveform dictionary, based on the prosodic model data after transformation, and connecting the selected waveform data with each other. Because the difference between an input character string and a character string stored in a dictionary is absorbed in this way, a natural voice can be synthesized.
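The order of operations in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function `synthesize`, the dictionary-shaped model record, and all `toy_*` stand-ins are hypothetical names introduced here, and each stage is passed in as a callable so that only the control flow (in particular, transforming the model only when its character string differs from the input) is shown.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical model record: {"chars": [...], "lengths_ms": [...]}
Model = Dict[str, list]


def synthesize(
    input_chars: Sequence[str],
    accent_of: Callable[[Sequence[str]], int],
    select_model: Callable[[Sequence[str], int], Model],
    transform: Callable[[Model, Sequence[str]], Model],
    select_waveforms: Callable[[Sequence[str], Model], List[List[float]]],
) -> List[float]:
    """Run the stages in the order given in the abstract."""
    accent_type = accent_of(input_chars)            # determine accent type
    model = select_model(input_chars, accent_type)  # pick a prosodic model
    # Transform only when the model's string is not coincident with the input.
    if list(model["chars"]) != list(input_chars):
        model = transform(model, input_chars)
    pieces = select_waveforms(input_chars, model)   # one waveform per character
    # Connect the selected waveforms by simple concatenation (an assumption).
    return [sample for piece in pieces for sample in piece]


# Toy stand-ins, purely for illustration.
calls: List[str] = []


def toy_accent(chars: Sequence[str]) -> int:
    return 0


def toy_select(chars: Sequence[str], accent: int) -> Model:
    return {"chars": ["ka", "na", "ta"], "lengths_ms": [100.0, 120.0, 90.0]}


def toy_transform(model: Model, chars: Sequence[str]) -> Model:
    calls.append("transform")
    return {"chars": list(chars), "lengths_ms": model["lengths_ms"]}


def toy_waveforms(chars: Sequence[str], model: Model) -> List[List[float]]:
    return [[float(i)] * 2 for i, _ in enumerate(chars)]


audio = synthesize(["ka", "ma", "ta"], toy_accent, toy_select,
                   toy_transform, toy_waveforms)
```

Here the selected model ("ka", "na", "ta") differs from the input ("ka", "ma", "ta"), so the transform stage runs once before waveform selection.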

19 Claims

1. A speech synthesis method of creating voice message data corresponding to an input character string, comprising the steps of:
- using (a) a word dictionary that stores a large number of character strings having at least one character with its accent type, (b) a prosody dictionary that stores typical prosodic model data among prosodic model data representing the prosodic information for the character strings stored in said word dictionary, and (c) a waveform dictionary that stores voice waveform data of a composition unit with a recorded voice;
  
  determining the accent type of the input character string;
  
  selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type;
  
  transforming the prosodic information of said prosodic model data in accordance with the input character string in response to the character string of the selected prosodic model data not being coincident with the input character string;
  
  selecting the waveform data corresponding to each character of the input character string from the waveform dictionary, based on the prosodic model data;
  
  connecting the selected waveform data with each other;
  
  storing the prosodic model data including the character string, a mora number, the accent type, and syllabic information in said prosody dictionary;
  
  creating the syllabic information of an input character string;
  
  providing a prosodic model candidate by extracting the prosodic model data having the mora number and accent type coincident to that of the input character string from said prosody dictionary;
  
  creating prosodic reconstructed information by comparing the syllabic information of each prosodic model data candidate and the syllabic information of the input character string; and
  
  selecting an optimal prosodic model data based on the character string of each prosodic model data candidate and the prosodic reconstructed information thereof.
2. The speech synthesis method according to claim 1, wherein:
3. Apparatus for performing the method of claim 2.
4. The speech synthesis method according to claim 1, further including obtaining the syllable length after transformation from the average syllable length calculated ahead for all the characters used in the speech synthesis and the syllable length in said prosodic model data for every character not coincident among the prosodic model data in response to the character string of said selected prosodic model data not being coincident with the input character string.
5. Apparatus for performing the method of claim 4.
6. Apparatus for performing the method of claim 1.
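The candidate extraction and model selection steps of claim 1 can be sketched as follows. The claims do not define the encoding of the syllabic information, the form of the prosodic reconstructed information, or the criterion for "optimal", so everything here is an assumption made for illustration: syllabic information is taken to be one class label per mora, the reconstructed information is a per-position label (`same_char`, `same_class`, `different`), and the best candidate is the one with the most matching characters, with class matches as a tie-breaker. The names `ProsodicModel`, `reconstructed_info`, and `select_model` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ProsodicModel:
    chars: Tuple[str, ...]          # one entry per mora
    accent_type: int
    syllabic_info: Tuple[str, ...]  # assumed: one class label per mora

    @property
    def mora_number(self) -> int:
        return len(self.chars)


def reconstructed_info(model: ProsodicModel,
                       input_chars: Sequence[str],
                       input_syllabic: Sequence[str]) -> List[str]:
    """Compare model and input position by position (assumed labelling)."""
    info = []
    for m_ch, m_cls, i_ch, i_cls in zip(model.chars, model.syllabic_info,
                                        input_chars, input_syllabic):
        if m_ch == i_ch:
            info.append("same_char")
        elif m_cls == i_cls:
            info.append("same_class")
        else:
            info.append("different")
    return info


def select_model(models: Sequence[ProsodicModel],
                 input_chars: Sequence[str],
                 accent_type: int,
                 input_syllabic: Sequence[str]) -> Optional[ProsodicModel]:
    # Candidates share the input's mora number and accent type.
    candidates = [m for m in models
                  if m.mora_number == len(input_chars)
                  and m.accent_type == accent_type]
    if not candidates:
        return None

    def score(m: ProsodicModel) -> Tuple[int, int]:
        info = reconstructed_info(m, input_chars, input_syllabic)
        return info.count("same_char"), info.count("same_class")

    return max(candidates, key=score)


models = [
    ProsodicModel(("sa", "ka", "ta"), 0, ("unvoiced", "unvoiced", "unvoiced")),
    ProsodicModel(("ka", "na", "ta"), 0, ("unvoiced", "nasal", "unvoiced")),
    ProsodicModel(("ka", "ma", "ta"), 1, ("unvoiced", "nasal", "unvoiced")),
    ProsodicModel(("ka", "ma"), 0, ("unvoiced", "nasal")),
]
target_chars = ("ka", "ma", "ta")
target_classes = ("unvoiced", "nasal", "unvoiced")
best = select_model(models, target_chars, 0, target_classes)
```

In this toy data the exact string exists only with a different accent type, so it is filtered out, and the model ("ka", "na", "ta") wins because two of its morae match exactly and the third shares a class with the input.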

7. A speech synthesis method of creating voice message data corresponding to an input character string, comprising the steps of:
- using (a) a word dictionary that stores a large number of character strings having at least one character with its accent type, (b) a prosody dictionary that stores typical prosodic model data among prosodic model data representing the prosodic information for the character strings stored in said word dictionary, and (c) a waveform dictionary that stores voice waveform data of a composition unit with a recorded voice;
  
  determining the accent type of the input character string;
  
  selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type;
  
  transforming the prosodic information of said prosodic model data in accordance with the input character string in response to the character string of the selected prosodic model data not being coincident with the input character string;
  
  selecting the waveform data corresponding to each character of the input character string from the waveform dictionary, based on the prosodic model data;
  
  selecting the waveform data of a pertinent phoneme in the prosodic model data from the waveform dictionary, the pertinent phoneme having a position and phoneme coincident with those of the prosodic model data for each phoneme making up an input character string; and
  
  selecting the waveform data of a corresponding phoneme having the frequency closest to that of the prosodic model data from said waveform dictionary for other phonemes.
8. The speech synthesis method according to claim 7, further including obtaining the syllable length after transformation from the average syllable length calculated ahead for all the characters for use in the voice synthesis and the syllable length in said prosodic model data for every character not coincident among the prosodic model data in response to the character string of said selected prosodic model data not being coincident with the input character string.
9. Apparatus for performing the method of claim 7.
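The two-tier waveform selection recited in claim 7 can be sketched as follows. This is a hedged illustration, not the patent's implementation: the `WaveEntry` record, its `source_model` and `position` fields, and the function `select_waveforms` are hypothetical, and the sketch assumes that each waveform dictionary entry records which recorded model and phoneme position it was cut from. Where the input phoneme coincides with the model's phoneme at the same position, the entry cut from that model is used; for the other phonemes, the entry whose frequency is closest to the model's frequency at that position is used.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class WaveEntry:
    phoneme: str
    frequency_hz: float
    source_model: str  # assumed: id of the recording the unit came from
    position: int      # assumed: phoneme index within that recording
    # The actual sample data is omitted to keep the sketch short.


def select_waveforms(input_phonemes: Sequence[str],
                     model_id: str,
                     model_phonemes: Sequence[str],
                     model_freqs_hz: Sequence[float],
                     wave_dict: Dict[str, List[WaveEntry]]) -> List[WaveEntry]:
    selected = []
    for pos, phoneme in enumerate(input_phonemes):
        entries = wave_dict[phoneme]
        if model_phonemes[pos] == phoneme:
            # Same phoneme at the same position: prefer the model's own unit.
            own = [e for e in entries
                   if e.source_model == model_id and e.position == pos]
            if own:
                selected.append(own[0])
                continue
        # Otherwise: unit whose frequency is closest to the model's target.
        target = model_freqs_hz[pos]
        selected.append(min(entries,
                            key=lambda e: abs(e.frequency_hz - target)))
    return selected


wave_dict = {
    "k": [WaveEntry("k", 100.0, "m2", 0), WaveEntry("k", 120.0, "m1", 0)],
    "a": [WaveEntry("a", 130.0, "m1", 1), WaveEntry("a", 111.0, "m1", 3),
          WaveEntry("a", 140.0, "m3", 1)],
    "t": [WaveEntry("t", 90.0, "m2", 2), WaveEntry("t", 128.0, "m3", 2)],
}
chosen = select_waveforms(
    input_phonemes=["k", "a", "t", "a"],
    model_id="m1",
    model_phonemes=["k", "a", "s", "a"],
    model_freqs_hz=[120.0, 130.0, 125.0, 110.0],
    wave_dict=wave_dict,
)
```

In this toy data only the third phoneme differs from the model, so it is the only one chosen by frequency proximity; the other three come from the model's own recording.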

10. A speech synthesis apparatus for creating voice message data corresponding to an input character string, comprising:
- a word dictionary storing a large number of character strings including at least one character with its accent type;
  
  a prosody dictionary storing typical prosodic model data among prosodic model data representing prosodic information for the character strings stored in said word dictionary, said prosody dictionary including the character string, mora number, accent type, and syllabic information;
  
  a waveform dictionary storing voice waveform data of a composition unit with a recorded voice;
  
  accent type determining means for determining the accent type of the input character string;
  
  prosodic model selecting means for selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type;
  
  prosodic transforming means for transforming the prosodic information of the prosodic model data in accordance with the input character string in response to the character string of said selected prosodic model data not being coincident with the input character string;
  
  waveform selecting means for selecting the waveform data corresponding to each character of the input character string from said waveform dictionary, based on the prosodic model data;
  
  waveform connecting means for connecting the selected waveform data with each other; and
  
  prosodic model selecting means for;
  
  creating the syllabic information of an input character string, extracting the prosodic model data having the mora number and accent type coincident to those of the input character string from said prosody dictionary to provide a prosodic model candidate, creating prosodic reconstructed information by comparing the syllabic information of each prosodic model data candidate and the syllabic information of the input character string, and selecting an optimal prosodic model data based on the character string of each prosodic model data candidate and the prosodic reconstructed information thereof.
11. The speech synthesis apparatus according to claim 10, wherein the prosodic model selecting means is arranged so that:
12. The speech synthesis apparatus according to claim 10, further comprising prosody transforming means arranged to be responsive to the character string of said selected prosodic model data not being coincident with the input character string, for obtaining the syllable length after transformation from the average syllable length calculated ahead for all the characters for use in the speech synthesis and the syllable length in said prosodic model data for each character not coincident among the prosodic model data.
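Claims 4, 8, and 12 state only that the transformed syllable length is obtained from the precomputed average syllable length and the syllable length in the prosodic model data; they do not give a formula. The sketch below therefore assumes one plausible rule, labelled as such: for each non-coincident character, the model's length is scaled by the ratio of the input character's average length to the model character's average length. The function `transform_lengths` and the table `avg_length_ms` are hypothetical.

```python
from typing import Dict, List, Sequence


def transform_lengths(model_chars: Sequence[str],
                      model_lengths_ms: Sequence[float],
                      input_chars: Sequence[str],
                      avg_length_ms: Dict[str, float]) -> List[float]:
    """Adjust syllable lengths only where model and input characters differ."""
    if not (len(model_chars) == len(model_lengths_ms) == len(input_chars)):
        raise ValueError("model and input must have the same number of morae")
    out = []
    for m_ch, m_len, i_ch in zip(model_chars, model_lengths_ms, input_chars):
        if m_ch == i_ch:
            out.append(m_len)  # coincident character: keep the model length
        else:
            # Assumed rule: scale by the ratio of precomputed averages.
            out.append(m_len * avg_length_ms[i_ch] / avg_length_ms[m_ch])
    return out


# Hypothetical averages, computed ahead of time over all characters.
avg_length_ms = {"ka": 95.0, "na": 110.0, "ma": 99.0, "ta": 92.0}
new_lengths = transform_lengths(
    model_chars=["ka", "na", "ta"],
    model_lengths_ms=[100.0, 120.0, 90.0],
    input_chars=["ka", "ma", "ta"],
    avg_length_ms=avg_length_ms,
)
```

Only the second mora differs here, so the first and third lengths are carried over unchanged and the second is rescaled from the model's value.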

13. A speech synthesis apparatus for creating voice message data corresponding to an input character string, comprising:
- a word dictionary storing a large number of character strings including at least one character having an accent type;
  
  a prosody dictionary storing typical prosodic model data among prosodic model data representing prosodic information for the character strings stored in said word dictionary;
  
  a waveform dictionary storing voice waveform data of a composition unit with a recorded voice;
  
  accent type determining means for determining the accent type of the input character string;
  
  prosodic model selecting means for selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type;
  
  prosodic transforming means for transforming the prosodic information of the prosodic model data in accordance with the input character string in response to the character string of said selected prosodic model data not being coincident with the input character string;
  
  waveform selecting means for;
  
  selecting the waveform data corresponding to each character of the input character string from said waveform dictionary, based on the prosodic model data, selecting the waveform data of a pertinent phoneme in the prosodic model data from said waveform dictionary, the pertinent phoneme having a position and phoneme coincident with those of the prosodic model data for each phoneme making up an input character string, and selecting the waveform data of a phoneme having the frequency closest to that of the prosodic model data from said waveform dictionary for other phonemes; and
  
  waveform connecting means for connecting the selected waveform data with each other.
14. The speech synthesis apparatus according to claim 13, further comprising prosody transforming means for obtaining the syllable length after transformation is obtained from the average syllable length calculated ahead for all the characters for use in the voice synthesis and the syllable length in said prosodic model data for each character not coincident among the prosodic model data in response to the character string of said selected prosodic model data not being coincident with the input character string.

15. A computer-readable medium having stored thereon a speech synthesis program, wherein said program, when read by a computer, enables the computer to operate as:
- a word dictionary for storing a large number of character strings including at least one character with its accent type;
  
  a prosody dictionary for storing typical prosodic model data among prosodic model data representing prosodic information for the character strings stored in said word dictionary, said prosody dictionary including the character string, a mora number, accent type, and syllabic information; and
  
  a waveform dictionary for storing the voice waveform data of a composition unit with a recorded voice;
  
  accent type determining means for determining the accent type of an input character string;
  
  prosodic model selecting means for;
  
  selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type, and creating the syllabic information of the input character string, extracting the prosodic model data having the mora number and accent type coincident to those of the input character string from said prosody dictionary to provide a prosodic model candidate, creating prosodic reconstructed information by comparing the syllabic information of each prosodic model data candidate and the syllabic information of the input character string, and selecting optimal prosodic model data based on the character string of each prosodic model data and the prosodic reconstructed information thereof;
  
  prosodic transforming means for transforming the prosodic information of said prosodic model data in accordance with the input character string in response to the character string of said selected prosodic model data not being coincident with the input character string;
  
  waveform selecting means for selecting the waveform data corresponding to each character of the input character string from said waveform dictionary, based on the prosodic model data; and
  
  waveform connecting means for connecting said selected waveform data with each other.
16. The computer-readable medium according to claim 15, wherein the program enables the computer to perform the following steps:
17. The computer-readable medium according to claim 15, wherein said speech synthesis program further enables the computer to operate as prosody transforming means for obtaining the syllable length after transformation from the average syllable length calculated ahead for all the characters for use in the voice synthesis and the syllable length in said prosodic model data for each character not coincident among the prosodic model data in response to the character string of said selected prosodic model data not being coincident with the input character string.

18. A computer-readable medium having recorded thereon a speech synthesis program, wherein said program, when read by a computer, enables the computer to operate as:
- a word dictionary for storing a large number of character strings including at least one character with its accent type, a prosody dictionary for storing typical prosodic model data among prosodic model data representing the prosodic information for the character strings stored in said word dictionary, and a waveform dictionary for storing the voice waveform data of a composition unit with the recorded voice;
  
  accent type determining means for determining the accent type of an input character string;
  
  prosodic model selecting means for selecting the prosodic model data from said prosody dictionary, based on the input character string and the accent type;
  
  prosodic transforming means for transforming the prosodic information of said prosodic model data in accordance with the input character string in response to the character string of said selected prosodic model data not being coincident with the input character string;
  
  waveform selecting means for selecting the waveform data corresponding to each character of the input character string from said waveform dictionary, based on the prosodic model data, and for selecting the waveform data of pertinent phoneme in the prosodic model data from said waveform dictionary, the pertinent phoneme having the position and phoneme coincident with those of the prosodic model data for every phoneme making up an input character string, and selecting the waveform data of phoneme having the frequency closest to that of the prosodic model data from said waveform dictionary for other phonemes; and
  
  waveform connecting means for connecting said selected waveform data with each other.
19. The computer-readable medium according to claim 18, wherein said speech synthesis program further enables the computer to operate as prosody transforming means for obtaining the syllable length after transformation is obtained from the average syllable length calculated ahead for all the characters for use in the voice synthesis and the syllable length in said prosodic model data for each character not coincident among the prosodic model data in response to the character string of said selected prosodic model data not being coincident with the input character string.

Current Assignee: Konami Co., Ltd.; Konami Computer Entertainment Tokyo Co., Ltd.
Original Assignee: Konami Computer Entertainment Tokyo, Inc. (Konami Holdings Corp.); Konami Corporation (Konami Holdings Corp.)
Inventors: Mizoguchi, Toshiyuki; Kasai, Osamu
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Lerner, Martin
Application Number: US09/621,545
Time in Patent Office: 1,488 Days
Field of Search: 704/258, 704/260, 704/266, 704/267, 704/268, 704/269
US Class Current: 704/266
CPC Class Codes:
A63F 2300/6063 for sound processing
G10L 13/10 Prosody rules derived from ...
