Method of speech synthesis
First Claim
1. A method of computerized text-based speech synthesis, wherein at least one portion of a text is specified;
the intonation of each portion is determined;
target allophones are associated with each portion;
physical parameters of the target allophones are determined, by a computing device, for each of the target allophones;
allophones most similar to the target allophones in terms of said physical parameters are found in a speech database;
speech is synthesized as a sequence of the found allophones, wherein the physical parameters of the target allophones are determined according to the determined intonation.
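The steps of claim 1 can be illustrated with a minimal sketch. It assumes that the physical parameters are duration, amplitude and fundamental pitch (F0), that intonation is reduced to a "rising" or "falling" label chosen by a toy punctuation rule, that the intonation sets the direction of a linear F0 ramp across the portion, and that "most similar" means the smallest scaled Euclidean distance among same-name allophones. All names (`Allophone`, `determine_intonation`, `target_parameters`, `distance`, `select_units`) and numeric scales are hypothetical; this is not the patented implementation, and the final waveform concatenation is omitted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Allophone:
    name: str          # allophone label (hypothetical labelling scheme)
    duration: float    # seconds
    amplitude: float   # relative amplitude
    f0: float          # fundamental pitch frequency, Hz
    unit_id: int = -1  # position in the speech database; -1 for target allophones


def determine_intonation(portion: str) -> str:
    # Toy rule (assumption): a portion ending in "?" rises, anything else falls.
    return "rising" if portion.rstrip().endswith("?") else "falling"


def target_parameters(names: list[str], intonation: str,
                      base_f0: float = 120.0, span: float = 40.0) -> list[Allophone]:
    # The physical parameters are derived from the intonation: here, a linear F0
    # ramp across the portion whose direction depends on the intonation label.
    slope = 1.0 if intonation == "rising" else -1.0
    n = len(names)
    targets = []
    for i, name in enumerate(names):
        pos = i / (n - 1) if n > 1 else 0.5  # 0.0 at the first allophone, 1.0 at the last
        f0 = base_f0 + slope * span * (pos - 0.5)
        targets.append(Allophone(name, duration=0.08, amplitude=1.0, f0=f0))
    return targets


def distance(a: Allophone, b: Allophone) -> float:
    # Scaled Euclidean distance over the physical parameters (scales are assumptions).
    return math.sqrt(((a.duration - b.duration) / 0.02) ** 2
                     + ((a.amplitude - b.amplitude) / 0.2) ** 2
                     + ((a.f0 - b.f0) / 10.0) ** 2)


def select_units(names: list[str], portion: str, database: list[Allophone]) -> list[int]:
    intonation = determine_intonation(portion)
    targets = target_parameters(names, intonation)
    found = []
    for target in targets:
        # Only same-name database allophones are candidates; raises ValueError if none exist.
        candidates = [u for u in database if u.name == target.name]
        found.append(min(candidates, key=lambda u: distance(target, u)))
    # The waveforms of these units would then be concatenated to produce speech.
    return [u.unit_id for u in found]
```

With a database holding a low-pitched (100 Hz) and a high-pitched (140 Hz) copy of each of two allophones "a" and "b", the portion "ok?" selects the low "a" and the high "b", while "ok." selects the opposite pair. The same allophone names thus yield different units depending on the determined intonation.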
Abstract
The present invention relates to a method of text-based speech synthesis, wherein at least one portion of a text is specified; the intonation of each portion is determined; target speech sounds are associated with each portion; physical parameters of the target speech sounds are determined; speech sounds most similar to the target speech sounds in terms of the physical parameters are found in a speech database; and speech is synthesized as a sequence of the found speech sounds. The physical parameters of said target speech sounds are determined in accordance with the determined intonation. When used in a speech synthesizer, the present method improves the quality of the synthesized speech by reproducing intonation precisely.
14 Claims
1. A method of computerized text-based speech synthesis, wherein
at least one portion of a text is specified;
the intonation of each portion is determined;
target allophones are associated with each portion;
physical parameters of the target allophones are determined, by a computing device, for each of the target allophones;
allophones most similar to the target allophones in terms of said physical parameters are found in a speech database;
speech is synthesized as a sequence of the found allophones, wherein the physical parameters of the target allophones are determined according to the determined intonation.
and/or wherein the value of at least one of the following functions, which characterize the attributes of an allophone, is calculated for each allophone from the speech database that can be used in synthesis:
a duration function defining the deviation of the duration of the corresponding allophone from the average duration of same-name allophones in the database, with regard to the phrasal stress;
an amplitude function defining the deviation of the amplitude of the corresponding allophone from the average amplitude of same-name allophones in the database, with regard to the phrasal stress;
a fundamental pitch maximum frequency function defining the maximum frequency of the fundamental pitch of the corresponding allophone;
a fundamental pitch frequency jump function defining the frequency jump of the fundamental pitch on the corresponding allophone;
and/or wherein the value of at least one of the following functions, which define the quality of the connection between allophones from the speech database, is calculated for each pair of allophones from the speech database that can be used in synthesizing each consecutive pair of the target allophones:
a fundamental pitch frequency connection function of the corresponding pair of allophones, defining the relation between the fundamental pitch frequencies at the ends of the allophones of said pair;
a fundamental pitch frequency derivative connection function of the corresponding pair of allophones, defining the relation between the derivatives of the fundamental pitch frequency at the ends of the allophones of said pair;
an MFCC connection function defining the relation between the normalized MFCC (mel-frequency cepstral coefficients) at the ends of the allophones of said pair;
a continuity function defining whether the allophones of the corresponding pair form a single fragment of a speech block.
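The functions listed above can be sketched as follows. The claim names the functions but not their formulas, so every formula here is an assumption: "deviation" is taken as a plain difference from the mean of same-name units sharing the same (binary) phrasal-stress status, the F0 jump as the change from the first to the last frame, the F0 connection "relation" as a ratio, the derivative connection as a difference of per-frame slopes, the MFCC connection as a Euclidean distance, and continuity as adjacency within the same recorded fragment. The `Unit` record and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Unit:
    name: str                # allophone label
    stressed: bool           # whether the unit carries phrasal stress (assumed binary)
    duration: float          # seconds
    amplitude: float         # relative amplitude
    f0: list[float]          # fundamental pitch contour, Hz, one value per frame
    mfcc_start: list[float]  # normalized MFCC vector at the start of the unit
    mfcc_end: list[float]    # normalized MFCC vector at the end of the unit
    fragment: int            # id of the recorded fragment the unit was cut from
    position: int            # index of the unit within that fragment


def _same_name_mean(db: list[Unit], unit: Unit, attr: str) -> float:
    # Mean over same-name units with the same stress status ("with regard to
    # the phrasal stress" is interpreted this way as an assumption).
    return mean(getattr(u, attr) for u in db
                if u.name == unit.name and u.stressed == unit.stressed)


# Per-allophone functions
def duration_function(db: list[Unit], unit: Unit) -> float:
    return unit.duration - _same_name_mean(db, unit, "duration")


def amplitude_function(db: list[Unit], unit: Unit) -> float:
    return unit.amplitude - _same_name_mean(db, unit, "amplitude")


def f0_max_function(unit: Unit) -> float:
    return max(unit.f0)


def f0_jump_function(unit: Unit) -> float:
    # Assumption: the jump is the F0 change from the first to the last frame.
    return unit.f0[-1] - unit.f0[0]


# Connection functions for a candidate pair (left unit followed by right unit)
def f0_connection(left: Unit, right: Unit) -> float:
    # Ratio of F0 on either side of the junction; 1.0 means matching pitch.
    return right.f0[0] / left.f0[-1]


def f0_derivative_connection(left: Unit, right: Unit) -> float:
    # Difference of per-frame F0 slopes at the junction (needs >= 2 frames each).
    left_slope = left.f0[-1] - left.f0[-2]
    right_slope = right.f0[1] - right.f0[0]
    return right_slope - left_slope


def mfcc_connection(left: Unit, right: Unit) -> float:
    return math.dist(left.mfcc_end, right.mfcc_start)


def continuity(left: Unit, right: Unit) -> int:
    # 1 if the two units were adjacent in the same recording, else 0.
    return int(left.fragment == right.fragment and right.position == left.position + 1)
```

Because the continuity function is 1 only for units that were already adjacent in a recording, such pairs need no smoothing at the junction, which is why unit-selection systems commonly reward them.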
11. A method according to claim 8, wherein, when the sum of the function values is calculated, said values are taken with different weights.
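Claim 11 weights the function values inside a sum defined in claim 8, which is not reproduced in this section. Assuming that sum is the total cost of a candidate allophone sequence (weighted per-allophone function values plus weighted connection function values for each adjacent pair), the sequence with the lowest total can be found with a Viterbi-style dynamic-programming search. This search is a common technique in unit-selection synthesis and is used here as an assumption, not as the patented procedure; `weighted_sum` and `select_sequence` are hypothetical names.

```python
from typing import Callable, Sequence


def weighted_sum(values: Sequence[float], weights: Sequence[float]) -> float:
    if len(values) != len(weights):
        raise ValueError("one weight per function value is required")
    return sum(w * v for w, v in zip(weights, values))


def select_sequence(candidates: Sequence[Sequence[int]],
                    target_values: Callable[[int, int], Sequence[float]],
                    join_values: Callable[[int, int], Sequence[float]],
                    target_weights: Sequence[float],
                    join_weights: Sequence[float]) -> tuple[list[int], float]:
    """Pick one candidate unit id per target position, minimizing the weighted total.

    candidates[i]       -- candidate unit ids for target position i
    target_values(i, u) -- per-allophone function values of unit u at position i
    join_values(u, v)   -- connection function values of the pair (u, v)
    """
    # cost[u]: cheapest total of any path ending in candidate u at the current position.
    cost = {u: weighted_sum(target_values(0, u), target_weights) for u in candidates[0]}
    back: list[dict[int, int]] = [{}]
    for i in range(1, len(candidates)):
        new_cost, pointers = {}, {}
        for v in candidates[i]:
            def via(u: int) -> float:
                return cost[u] + weighted_sum(join_values(u, v), join_weights)
            best_u = min(cost, key=via)
            new_cost[v] = via(best_u) + weighted_sum(target_values(i, v), target_weights)
            pointers[v] = best_u
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    last = min(cost, key=cost.get)
    path = [last]
    for i in range(len(candidates) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1], cost[last]
```

For example, with two positions whose candidates are {0, 1} and {2, 3}, per-allophone values 1, 0, 0 and 1 for units 0 to 3, and a connection value of 0 for the pair (0, 3) and 5 for every other pair, unit weights select [0, 3] at a total of 2, whereas a connection weight of 0.1 selects [1, 2] at a total of 0.5. Changing the weights alone changes which allophones are used.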
12. A method according to claim 8, wherein, if the most similar allophone found does not conform to a certain criterion, that allophone is replaced, when speech is synthesized, by an allophone from the database that conforms to said criterion.
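Claim 12 leaves the criterion unspecified. A minimal sketch, assuming a hypothetical criterion (the unit's mean F0 must lie in a plausible range, for instance to reject units with pitch-tracking errors) and a hypothetical replacement rule (the conforming same-name unit closest in F0, keeping the original when none conforms). `Unit`, `conforms` and `replace_nonconforming` are illustrative names only.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    unit_id: int
    name: str
    f0: float  # mean fundamental pitch frequency of the unit, Hz


def conforms(unit: Unit, f0_min: float = 60.0, f0_max: float = 400.0) -> bool:
    # Hypothetical criterion: the unit's F0 must lie within a plausible range.
    return f0_min <= unit.f0 <= f0_max


def replace_nonconforming(found: list[Unit], database: list[Unit]) -> list[Unit]:
    result = []
    for unit in found:
        if conforms(unit):
            result.append(unit)
            continue
        # Assumed replacement rule: closest-F0 conforming unit with the same name.
        pool = [u for u in database if u.name == unit.name and conforms(u)]
        if pool:
            result.append(min(pool, key=lambda u: abs(u.f0 - unit.f0)))
        else:
            result.append(unit)  # nothing conforms; keep the original (assumption)
    return result
```

For a selected "a" unit at 450 Hz and conforming "a" units at 200 Hz and 300 Hz in the database, the 300 Hz unit is substituted, while units that already conform pass through unchanged.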
13. A text-based speech synthesizer comprising:
a speech database containing allophones;
a specifying module configured to specify at least one portion of a text;
an intonation determining module configured to determine the intonation of each of the at least one portion;
a target allophone associating module configured to associate target allophones with each of the at least one portion;
a physical parameter determining module configured to determine physical parameters of the target allophones for each of the target allophones;
an allophone forming module configured to search the speech database for allophones most similar to the target allophones in terms of said physical parameters and to form a sequence of allophones for an output speech signal on the basis of the allophones found in the database; and
a speech signal generating module configured to generate the output speech signal on the basis of the formed sequence of allophones,
wherein the physical parameter determining module is configured to determine said physical parameters of the target allophones on the basis of the intonation determined by the intonation determining module.
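The module structure of claim 13 can be sketched as a pipeline of pluggable components. This is a minimal sketch under stated assumptions: each module is modelled as a plain callable, the speech database is reduced to a mapping from unit id to waveform samples, and the signal generating module simply concatenates samples without any smoothing. The class `TextToSpeech` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Params = dict[str, float]  # physical parameters of one target allophone


@dataclass
class TextToSpeech:
    specify: Callable[[str], list[str]]                    # specifying module: text -> portions
    intonation: Callable[[str], str]                       # intonation module: portion -> label
    associate: Callable[[str], list[str]]                  # portion -> target allophone names
    parameters: Callable[[list[str], str], list[Params]]   # (targets, intonation) -> parameters
    form: Callable[[list[str], list[Params]], list[int]]   # allophone forming: -> unit ids
    waveforms: dict[int, list[float]]                      # speech database: unit id -> samples

    def synthesize(self, text: str) -> list[float]:
        signal: list[float] = []
        for portion in self.specify(text):
            tone = self.intonation(portion)
            targets = self.associate(portion)
            # The parameter module receives the intonation, as claim 13 requires.
            params = self.parameters(targets, tone)
            for unit_id in self.form(targets, params):
                # Signal generating module: naive concatenation of stored samples.
                signal.extend(self.waveforms[unit_id])
        return signal
```

Keeping each module as a separate callable mirrors the claim's decomposition and lets, for example, the allophone forming module be swapped between a greedy nearest-unit search and a sequence-level search without touching the rest of the pipeline.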
Specification