SPEECH SYNTHESIS DICTIONARY GENERATION APPARATUS, SPEECH SYNTHESIS DICTIONARY GENERATION METHOD AND COMPUTER PROGRAM PRODUCT

US 20150228271A1
Filed: 01/27/2015
Published: 08/13/2015
Est. Priority Date: 02/10/2014
Status: Active Grant



Abstract

According to an embodiment, a speech synthesis dictionary generation apparatus includes an analyzer, a speaker adapter, a level designation unit, and a determination unit. The analyzer is configured to analyze speech data and generate a speech database containing characteristics of utterance by an object speaker. The speaker adapter is configured to generate the model of the object speaker by speaker adaptation of converting a base model to be closer to characteristics of the object speaker based on the database. The level designation unit is configured to accept designation of a target speaker level representing a speaker's utterance skill and/or a speaker's native level in a language of the speech synthesis dictionary. The determination unit is configured to determine a parameter related to fidelity of reproduction of speaker properties in the speaker adaptation, in accordance with a relationship between the target speaker level and a speaker level of the object speaker.
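The relationship between the two speaker levels and the fidelity parameter can be illustrated with a minimal Python sketch. It is not taken from the specification: the integer level scale, the function name fidelity_parameter, and the linear step and floor values are all assumptions chosen for illustration. The only property drawn from the claims is that fidelity is lower when the designated target speaker level is higher than the object speaker level than when it is not.

```python
def fidelity_parameter(target_level: int, object_level: int,
                       default: float = 1.0, step: float = 0.2,
                       floor: float = 0.2) -> float:
    """Return a fidelity value between `floor` and `default`.

    Assumptions (not from the patent text): speaker levels are integers
    where a larger number means a more skilled or more native speaker,
    and fidelity drops linearly with the gap between the two levels.
    """
    gap = target_level - object_level
    if gap <= 0:
        # Target level is not higher than the object speaker level:
        # keep the default (highest) fidelity.
        return default
    # Target level is higher: lower the fidelity, but never below `floor`.
    return max(floor, default - step * gap)


if __name__ == "__main__":
    # Same level: default fidelity is kept.
    print(fidelity_parameter(target_level=3, object_level=3))
    # Target two levels above the object speaker: fidelity is reduced.
    print(fidelity_parameter(target_level=5, object_level=3))
```

A speaker adapter would then consume the returned value, for example by mapping it to the number of conversion matrices or to an interpolation ratio, as the dependent claims describe.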


11 Claims

1. A speech synthesis dictionary generation apparatus for generating a speech synthesis dictionary containing a model of an object speaker based on speech data of the object speaker, the apparatus comprising:
- a speech analyzer configured to analyze the speech data and generate a speech database containing data representing characteristics of utterance by the object speaker;
  
  a speaker adapter configured to generate the model of the object speaker by performing speaker adaptation of converting a predetermined base model to be closer to characteristics of the object speaker based on the speech database;
  
  a target speaker level designation unit configured to accept designation of a target speaker level that is a speaker level to be targeted, the speaker level representing at least one of a speaker's utterance skill and a speaker's native level in a language of the speech synthesis dictionary; and
  
  a determination unit configured to determine a value of a parameter related to fidelity of reproduction of speaker properties in the speaker adaptation, in accordance with a relationship between the designated target speaker level and an object speaker level that is the speaker level of the object speaker, wherein
  
  the determination unit is configured to determine the value of the parameter so that the fidelity is lower when the designated target speaker level is higher than the object speaker level, compared to when the designated target speaker level is not higher than the object speaker level, and
  
  the speaker adapter is configured to perform the speaker adaptation in accordance with the value of a parameter determined by the determination unit.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The apparatus according to claim 1, further comprising an object speaker level designation unit configured to accept designation of the object speaker level, wherein the determination unit is configured to determine the value of the parameter depending on a relationship between the designated target speaker level and the designated object speaker level.
  - 3. The apparatus according to claim 1, further comprising an object speaker level estimator configured to automatically estimate the object speaker level based on at least a portion of the speech database, wherein the determination unit is configured to determine the value of the parameter depending on a relationship between the designated target speaker level and the estimated object speaker level.
  - 4. The apparatus according to claim 1, wherein the target speaker level designation unit is configured to display, based on the object speaker level, a relationship between the target speaker level and similarity of speaker properties assumed in the model of the object speaker to be generated, and a range in which the target speaker level is allowed to be designated, and accept an operation of designating the target speaker level within the displayed range.
  - 5. The apparatus according to claim 1, wherein the speaker adapter uses as the base model an average voice model obtained by modeling a speaker having a high speaker level.
  - 6. The apparatus according to claim 1, wherein the parameter is a parameter that defines the number of conversion matrices used for conversion of the base model in the speaker adaptation such that as the number of conversion matrices is smaller, the fidelity becomes lower.
  - 7. The apparatus according to claim 1, wherein the speaker adapter is configured to perform the speaker adaptation by using, as the base model, a model represented by a weighted sum of a plurality of clusters, and adjusting the weight vector to the object speaker, the model being trained by cluster adaptive training from data of a plurality of speakers each having a different speaker level, the weight vector being a set of weights of the plurality of clusters, the weight vector is calculated by interpolating an optimal weight vector for the object speaker and an optimal weight vector of one speaker having a high speaker level among the plurality of speakers, and the parameter is an interpolation ratio to calculate the weight vector.
  - 8. The apparatus according to claim 1, wherein the model of the object speaker includes a prosodic model and an acoustic model, the parameter includes a first parameter used in generation of the prosodic model and a second parameter used in generation of the acoustic model, and the determination unit is configured to set a larger changing degree of the first parameter from its default value causing a higher fidelity, than a changing degree of the second parameter from its default value, when determining the value of the parameter so that the fidelity is lower.
  - 9. The apparatus according to claim 1, further comprising a recording unit configured to record the speech data while presenting to the object speaker at least information on pronunciation of an utterance text for each utterance unit, wherein the information on the pronunciation is not represented in a phonetic description of the target language, but in a converted phonetic description of a language usually used by the object speaker, and the information does not contain signs related to intonation such as accents and tones at least when a native level of the object speaker is lower than a predetermined level.
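
The weight-vector interpolation recited in claim 7 can be sketched in Python as follows. This is a hedged illustration, not the patented implementation: the function name interpolate_weights, the convention that a ratio of 1.0 reproduces the object speaker's own weights, and all numeric values are assumptions. The demo also applies different ratios to a prosodic and an acoustic model in the spirit of claim 8; combining claims 7 and 8 this way is likewise an assumption.

```python
from typing import List, Sequence


def interpolate_weights(w_object: Sequence[float],
                        w_reference: Sequence[float],
                        ratio: float) -> List[float]:
    """Linearly interpolate two cluster weight vectors.

    Assumed convention: ratio = 1.0 keeps the object speaker's optimal
    weights (highest fidelity); ratio = 0.0 yields the weights of the
    reference speaker with a high speaker level (lowest fidelity).
    """
    if len(w_object) != len(w_reference):
        raise ValueError("weight vectors must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return [ratio * o + (1.0 - ratio) * r
            for o, r in zip(w_object, w_reference)]


if __name__ == "__main__":
    # Hypothetical optimal cluster weights (illustrative values only).
    w_object = [0.7, 0.2, 0.1]
    w_reference = [0.1, 0.3, 0.6]
    # Assumed ratios: the prosodic ratio moves further from the default
    # of 1.0 than the acoustic ratio does when fidelity is lowered.
    prosodic = interpolate_weights(w_object, w_reference, ratio=0.4)
    acoustic = interpolate_weights(w_object, w_reference, ratio=0.8)
    print(prosodic)
    print(acoustic)
```

Under these assumptions, lowering the ratio pulls the generated model toward the high-level reference speaker, which trades speaker similarity for a higher apparent speaker level.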

10. A speech synthesis dictionary generation method executed in a speech synthesis dictionary generation apparatus for generating a speech synthesis dictionary containing a model of an object speaker based on speech data of the object speaker, the method comprising:
- analyzing the speech data to generate a speech database containing data representing characteristics of utterance by the object speaker;
  
  generating the model of the object speaker by performing speaker adaptation of converting a predetermined base model to be closer to characteristics of the object speaker based on the speech database;
  
  accepting designation of a target speaker level that is a speaker level to be targeted, the speaker level representing at least one of a speaker's utterance skill and a speaker's native level in a language of the speech synthesis dictionary; and
  
  determining a value of a parameter related to fidelity of reproduction of speaker properties in the speaker adaptation, in accordance with a relationship between the designated target speaker level and an object speaker level that is the speaker level of the object speaker, wherein
  
  the determining includes determining the value of the parameter so that the fidelity is lower when the designated target speaker level is higher than the object speaker level, compared to when the designated target speaker level is not higher than the object speaker level, and
  
  the generating includes performing the speaker adaptation in accordance with the value of a parameter determined at the determining.

11. A computer program product comprising a computer-readable medium containing a program for generating a speech synthesis dictionary containing a model of an object speaker based on speech data of the object speaker, the program causing a computer to execute:
- analyzing the speech data to generate a speech database containing data representing characteristics of utterance by the object speaker;
  
  generating the model of the object speaker by performing speaker adaptation of converting a predetermined base model to be closer to characteristics of the object speaker based on the speech database;
  
  accepting designation of a target speaker level that is a speaker level to be targeted, the speaker level representing at least one of a speaker's utterance skill and a speaker's native level in a language of the speech synthesis dictionary; and
  
  determining a value of a parameter related to fidelity of reproduction of speaker properties in the speaker adaptation, in accordance with a relationship between the designated target speaker level and an object speaker level that is the speaker level of the object speaker, wherein
  
  the determining includes determining the value of the parameter so that the fidelity is lower when the designated target speaker level is higher than the object speaker level, compared to when the designated target speaker level is not higher than the object speaker level, and
  
  the generating includes performing the speaker adaptation in accordance with the value of a parameter determined at the determining.


Current Assignee
Kabushiki Kaisha Toshiba (Toshiba Corporation), Toshiba Digital Solutions Corporation (Toshiba Corporation)
Original Assignee
Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors
Morita, Masahiro

Granted Patent

US 9,484,012 B2
US Class Current

1/1
CPC Class Codes

G10L 13/033 Voice editing, e.g. manipul...
