Method of speech synthesis

US 8,942,983 B2
Filed: 11/23/2011
Issued: 01/27/2015
Est. Priority Date: 08/07/2009
Status: Active Grant

Abstract

The present invention relates to a method of text-based speech synthesis, wherein at least one portion of a text is specified; the intonation of each portion is determined; target speech sounds are associated with each portion; physical parameters of the target speech sounds are determined; speech sounds most similar in terms of the physical parameters to the target speech sounds are found in a speech database; and speech is synthesized as a sequence of the found speech sounds. The physical parameters of said target speech sounds are determined in accordance with the determined intonation. The present method, when used in a speech synthesizer, allows improved quality of synthesized speech due to precise reproduction of intonation.
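As a rough illustration of the flow described in the abstract, the sketch below derives target physical parameters (duration, fundamental pitch, energy) from a toy intonation model and then picks the closest database allophone for each target. Every name in it (`Allophone`, `INTONATION_MODELS`, `target_parameters`, `closest`, `synthesize`), the numeric values, and the normalized squared-difference distance are assumptions made for illustration only; the patent text does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allophone:
    label: str       # transcription symbol
    duration: float  # seconds
    pitch: float     # fundamental pitch, Hz
    energy: float    # relative energy, 0..1


# Hypothetical intonation models (values are invented for this sketch).
INTONATION_MODELS = {
    "statement": {"start_pitch": 120.0, "slope": -5.0, "duration_scale": 1.0, "energy": 0.6},
    "question": {"start_pitch": 120.0, "slope": 8.0, "duration_scale": 1.1, "energy": 0.6},
}


def target_parameters(labels, intonation):
    """Build target allophones whose physical parameters follow the intonation model."""
    model = INTONATION_MODELS[intonation]
    return [
        Allophone(
            label=label,
            duration=0.08 * model["duration_scale"],
            pitch=model["start_pitch"] + i * model["slope"],
            energy=model["energy"],
        )
        for i, label in enumerate(labels)
    ]


def distance(a, b):
    """Assumed similarity measure: sum of normalized squared parameter differences."""
    return (
        ((a.duration - b.duration) / 0.1) ** 2
        + ((a.pitch - b.pitch) / 100.0) ** 2
        + (a.energy - b.energy) ** 2
    )


def closest(target, database):
    """Return the database allophone with the same label that is nearest to the target."""
    candidates = [unit for unit in database if unit.label == target.label]
    if not candidates:
        raise LookupError(f"no database allophone for {target.label!r}")
    return min(candidates, key=lambda unit: distance(target, unit))


def synthesize(labels, intonation, database):
    """Return the sequence of found allophones for one portion of text."""
    return [closest(t, database) for t in target_parameters(labels, intonation)]
```

With a database holding two recordings of the same allophone at different pitches, the falling "statement" model selects the lower-pitched recording and the rising "question" model selects the higher-pitched one, which is the sense in which the chosen units depend on the determined intonation.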

14 Claims

1. A method of computerized text-based speech synthesis, wherein
- at least one portion of a text is specified;
- the intonation of each portion is determined;
- target allophones are associated with each portion;
- physical parameters of the target allophones are determined, by a computing device, for each of the target allophones;
- allophones most similar to the target allophones in terms of said physical parameters are found in a speech database;
- speech is synthesized as a sequence of the found allophones, wherein the physical parameters of the target allophones are determined according to the determined intonation.
- Dependent claims (2 through 12):
  - 2. A method according to claim 1, wherein linguistic parameters of the target allophones are further determined and, when the allophones are searched for in the speech database, allophones most similar to the target allophones also in terms of said linguistic parameters are found in the speech database.
  - 3. A method according to claim 2, wherein the linguistic parameters of an allophone include at least one of the following parameters:
    - transcription, allophones preceding and following said allophone;
    - the position of said allophone with respect to the stressed vowel.
  - 4. A method according to claim 1, wherein the at least one portion of a text is specified based on grammatical characteristics of words in the text and punctuation in the text.
  - 5. A method according to claim 1, wherein at least one preconstructed intonation model is selected according to the determined intonation, said model being defined by at least one of the following parameters:
    - inclination of the trajectory of the fundamental pitch;
    - shaping of the fundamental pitch on stressed vowels;
    - energy of allophones;
    - law of duration variation of allophones;

    and the physical parameters of the target allophones are determined based on at least one of said parameters of the corresponding model.
  - 6. A method according to claim 5, wherein shaping of the fundamental pitch on stressed vowels includes shaping on the first stressed vowel and/or middle stressed vowel and/or last stressed vowel.
  - 7. A method according to claim 5, wherein said physical parameters of allophones include at least duration of allophones, frequency of the fundamental pitch of allophones and energy of allophones.
  - 8. A method according to claim 1, wherein the most similar allophones are determined
    - by calculating the value of at least one function defining the difference in physical and/or linguistic parameters of the target allophone and an allophone from the speech database, and/or
    - by calculating the value of at least one function for each allophone from the speech database which can be used in synthesis, said function characterizing the attributes of this allophone, and/or
    - by calculating the value of at least one function for each pair of allophones from the allophones database which can be used in synthesis of each subsequent pair of the target allophones, said function defining the quality of connection between said pair of allophones from the speech database,

    wherein said most similar allophones are determined as allophones forming a sequence to synthesize a predetermined fragment of said text, for which sequence the sum of calculated values of said functions is minimal.
  - 9. A method according to claim 8, wherein the predetermined fragment of the text is a sentence or a paragraph.
  - 10. A method according to claim 8, wherein the value of at least one of the following functions is calculated, said functions defining the difference in a physical and/or linguistic parameter of speech allophones:
    - a context function defining the degree of similarity of allophones preceding and following compared allophones;
    - an intonation function defining the correspondence of said intonation models of compared allophones and their position with respect to the phrasal stress;
    - a fundamental pitch frequency function defining the difference of frequency of the fundamental pitch of compared allophones;
    - a positional function defining the difference in position within the word of compared allophones;
    - a positional function defining the difference in position within the syllable of compared allophones;
    - a positional function defining the difference in position within the specified portion of a text of compared allophones, the position being defined by the number of syllables from the beginning of said portion of a text;
    - a positional function defining the difference in position within the specified portion of a text of compared allophones, the position being defined by the number of syllables to the end of said portion of a text;
    - a positional function defining the difference in position within the specified portion of a text of compared allophones, the position being defined by the number of stressed syllables from the beginning of said portion of a text;
    - a positional function defining the difference in position within the specified portion of a text of compared allophones, the position being defined by the number of stressed syllables to the end of said portion of a text;
    - a pronunciation function defining the degree of the correspondence between the pronunciation of an allophone from the speech database and the ideal pronunciation of this allophone according to the language rules;
    - an orthographical function defining the orthographic difference of the words comprising compared allophones;
    - a stress function defining correspondence of stress type of compared allophones.
  - 11. A method according to claim 8, wherein, when calculating the sum of the values of said functions, said values are taken with different weights.
  - 12. A method according to claim 8, wherein, if the found most similar allophone does not conform to a certain criterion, the allophone is replaced during speech synthesis by an allophone from the database that conforms to said criterion.
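The minimal-sum selection of claims 8 and 11 can be sketched as a dynamic-programming search over candidate allophones. This is only one possible reading: the function name `select_units`, the two weights `w_target` and `w_join`, and the cost callables are assumptions for illustration, and the claims do not state which search procedure is used. The sketch also assumes every target position has at least one candidate.

```python
def select_units(targets, candidates, target_cost, join_cost, w_target=1.0, w_join=1.0):
    """Pick one candidate per target so the weighted sum of costs is minimal.

    targets:     list of target descriptions, one per position
    candidates:  list of candidate lists, candidates[i] for targets[i]
    target_cost: f(target, candidate) -> difference between target and candidate
    join_cost:   f(previous_candidate, candidate) -> quality of the connection
    Returns (chosen_sequence, total_cost).
    """
    if not targets:
        return [], 0.0

    # best[i][j] = (minimal cost of any path ending at candidates[i][j], back pointer)
    best = [[(w_target * target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            # Cheapest way to arrive at c from any candidate of the previous position.
            prev_cost, prev_j = min(
                (best[i - 1][j][0] + w_join * join_cost(p, c), j)
                for j, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + w_target * target_cost(targets[i], c), prev_j))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

Changing the weights shifts the balance between matching each target closely and joining neighbouring units smoothly, which is how differently weighted functions, as in claim 11, would change the selected sequence.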

13. A text-based speech synthesizer comprising
- a speech database containing allophones;
- a specifying module configured to specify at least one portion of a text;
- an intonation determining module configured to determine the intonation of each of the at least one portion;
- a target allophone associating module configured to associate target allophones with each of the at least one portion;
- a physical parameter determining module configured to determine physical parameters of the target allophones for each of the target allophones;
- an allophone forming module configured to search for allophones most similar to the target allophones in terms of said physical parameters in the speech database and form a sequence of allophones for an output speech signal on the basis of the allophones found in the database; and
- a speech signal generating module configured to generate the output speech signal on the basis of the formed sequence of allophones, wherein the physical parameter determining module is configured to determine said physical parameters of the target allophones on the basis of the intonation determined by the intonation determining module.
- Dependent claim (14):
  - 14. The text-based speech synthesizer according to claim 13, further comprising a linguistic parameters determining module configured to determine linguistic parameters of the target allophones, wherein the allophone forming module is further configured to search for allophones in the speech database most similar to the target allophones also in terms of said linguistic parameters.

Current Assignee: Speech Technology Center LLC
Original Assignee: Speech Technology Center LLC
Inventors: Khitrov, Mikhail Vasilievich
Primary Examiner(s): Vo, Huyen X.
Application Number: US13/303,174
Publication Number: US 20120072224A1
Time in Patent Office: 1,161 Days
Field of Search: 704/258, 704/260, 704/261, 704/266, 704/9, 704/257, 704/267, 704/270
US Class Current: 704/260
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/08 Text analysis or generation...
