TEXT-TO-SPEECH DEVICE, TEXT-TO-SPEECH METHOD, AND COMPUTER PROGRAM PRODUCT

US 20160300564A1
Filed: 06/17/2016
Published: 10/13/2016
Est. Priority Date: 12/20/2013
Status: Active Grant

Abstract

According to an embodiment, a text-to-speech device includes a context acquirer, an acoustic model parameter acquirer, a conversion parameter acquirer, a converter, and a waveform generator. The context acquirer is configured to acquire a context sequence affecting fluctuations in voice. The acoustic model parameter acquirer is configured to acquire an acoustic model parameter sequence that corresponds to the context sequence and represents an acoustic model in a standard speaking style of a target speaker. The conversion parameter acquirer is configured to acquire a conversion parameter sequence corresponding to the context sequence to convert an acoustic model parameter in the standard speaking style into one in a different speaking style. The converter is configured to convert the acoustic model parameter sequence using the conversion parameter sequence. The waveform generator is configured to generate a voice signal based on the acoustic model parameter sequence acquired after conversion.
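
The data flow described in the abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the class name `TextToSpeechSketch`, the dictionary-based parameter lookup, the helper `add_vectors` (an additive conversion in the spirit of claim 7), and the placeholder `flatten_frames` (which stands in for a real waveform generator and does not produce audio) are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def add_vectors(param: Vector, conversion: Vector) -> Vector:
    """Hypothetical converter: element-wise addition of two equal-length vectors."""
    if len(param) != len(conversion):
        raise ValueError("parameter and conversion vectors must have equal length")
    return [p + c for p, c in zip(param, conversion)]


def flatten_frames(frames: Sequence[Vector]) -> List[float]:
    """Placeholder for a waveform generator: concatenates the converted frames."""
    return [value for frame in frames for value in frame]


@dataclass
class TextToSpeechSketch:
    # Hypothetical storages: one parameter vector per context label.
    acoustic_params: Dict[str, Vector]    # standard speaking style of the target speaker
    conversion_params: Dict[str, Vector]  # conversion toward a different speaking style
    convert: Callable[[Vector, Vector], Vector] = add_vectors
    generate_waveform: Callable[[Sequence[Vector]], List[float]] = flatten_frames

    def synthesize(self, context_sequence: Sequence[str]) -> List[float]:
        # Acquire the acoustic model parameter sequence for the context sequence.
        acoustic_seq = [self.acoustic_params[c] for c in context_sequence]
        # Acquire the conversion parameter sequence for the same context sequence.
        conversion_seq = [self.conversion_params[c] for c in context_sequence]
        # Convert each acoustic model parameter with its conversion parameter.
        converted = [self.convert(a, d) for a, d in zip(acoustic_seq, conversion_seq)]
        # Generate the output signal from the converted sequence.
        return self.generate_waveform(converted)
```

Keeping `convert` and `generate_waveform` as replaceable callables mirrors the separation between the converter and the waveform generator in the abstract; a real system would substitute a statistical parameter generation step and a vocoder for the placeholder.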

14 Claims

1. A text-to-speech device comprising:
- a context acquirer configured to acquire a context sequence that is an information sequence affecting fluctuations in voice;
  
  an acoustic model parameter acquirer configured to acquire an acoustic model parameter sequence corresponding to the context sequence, the acoustic model parameter sequence representing a standard speaking style of a target speaker;
  
  a conversion parameter acquirer configured to acquire a conversion parameter sequence corresponding to the context sequence, the conversion parameter sequence being used in converting an acoustic model parameter in the standard speaking style into one in a speaking style different from the standard speaking style;
  
  a converter configured to convert the acoustic model parameter sequence using the conversion parameter sequence; and
  
  a waveform generator configured to generate a voice signal based on the acoustic model parameter sequence acquired after conversion.
- Dependent claims (2 to 12):
  - 2. The device according to claim 1, wherein the context sequence includes at least a phoneme sequence.
  - 3. The device according to claim 1, further comprising:
    - an acoustic model parameter storage configured to store a plurality of acoustic model parameters classified according to contexts and store first classification information used in determining one of the acoustic model parameters corresponding to a given context; and
      
      a conversion parameter storage configured to store a plurality of conversion parameters classified according to contexts and store second classification information used in determining one of the conversion parameters corresponding to a given context, wherein the acoustic model parameter acquirer determines, based on the first classification information stored in the acoustic model parameter storage, the acoustic model parameter sequence corresponding to the context sequence acquired by the context acquirer, and the conversion parameter acquirer determines, based on the second classification information stored in the conversion parameter storage, the conversion parameter sequence corresponding to the context sequence acquired by the context acquirer.
  - 4. The device according to claim 3, wherein the conversion parameter is created using voice samples uttered by a certain speaker in a standard speaking style and voice samples uttered by the same speaker in a different speaking style from the standard speaking style.
  - 5. The device according to claim 3, wherein the acoustic model parameter is created using voice samples uttered by the target speaker, and the conversion parameter is created using voice samples uttered by a speaker different from the target speaker.
  - 6. The device according to claim 3, wherein the acoustic model parameter is created using voice samples uttered by the target speaker in a speaking style expressing neutral feeling, and the conversion parameter represents information used in converting an acoustic model parameter of the speaking style expressing neutral feeling into one expressing a feeling other than neutral.
  - 7. The device according to claim 1, wherein the acoustic model is a probabilistic model in which output probabilities of respective phonetic parameters that represent characteristics of a voice are expressed using Gaussian distribution, the acoustic model parameter includes a mean vector representing a mean of an output probability distribution of each phonetic parameter, the conversion parameter represents a vector having the same dimensionality as the mean vector included in the acoustic model parameter, and the converter adds a conversion parameter included in the conversion parameter sequence to a mean vector included in the acoustic model parameter sequence to generate a post-conversion acoustic model parameter sequence.
  - 8. The device according to claim 1, further comprising:
    - a plurality of conversion parameter storages configured to store conversion parameters corresponding to mutually different speaking styles; and
      
      a speaking style selector configured to select one of the plurality of conversion parameter storages, wherein the conversion parameter acquirer acquires the conversion parameter sequence from the conversion parameter storage selected by the speaking style selector.
  - 9. The device according to claim 1, further comprising:
    - a plurality of conversion parameter storages configured to store conversion parameters corresponding to mutually different speaking styles; and
      
      a speaking style selector configured to select two or more of the plurality of conversion parameter storages, wherein the conversion parameter acquirer acquires the conversion parameter sequence from each of the two or more conversion parameter storages selected by the speaking style selector, and the converter converts the acoustic model parameter sequence using the two or more conversion parameter sequences.
  - 10. The device according to claim 9, further comprising a degree controller configured to control ratios at which the respective conversion parameters acquired from two or more of the conversion parameter storages selected by the speaking style selector are to be reflected in the acoustic model parameters.
  - 11. The device according to claim 1, further comprising:
    - a plurality of acoustic model parameter storages configured to store the acoustic model parameters corresponding to mutually different speakers; and
      
      a speaker selector configured to select one of the plurality of acoustic model parameter storages, wherein the acoustic model parameter acquirer acquires the acoustic model parameter sequence from the acoustic model parameter storage selected by the speaker selector.
  - 12. The device according to claim 11, further comprising a speaker adapter configured to convert the acoustic model parameter stored in one of the acoustic model parameter storages into the acoustic model parameter corresponding to a specific speaker using speaker adaptation, and write the acoustic model parameter acquired by conversion in the acoustic model parameter storage corresponding to the specific speaker.
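
The additive conversion recited in claim 7, together with the use of two or more conversion parameters in claims 9 and 10, can be illustrated with a short sketch. The function name `convert_mean` and the interpretation of the "ratios" of claim 10 as scalar weights in a weighted sum are assumptions made here for illustration; the claims do not prescribe this exact formula.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def convert_mean(
    mean: Vector,
    conversions: Sequence[Vector],
    ratios: Optional[Sequence[float]] = None,
) -> Vector:
    """Shift a Gaussian mean vector by one or more conversion vectors.

    With a single conversion vector and no ratios, this is the plain addition
    of claim 7. With several vectors and ratios, each conversion vector is
    scaled by its ratio before being added (an assumed reading of claim 10).
    """
    if ratios is None:
        ratios = [1.0] * len(conversions)
    if len(ratios) != len(conversions):
        raise ValueError("need exactly one ratio per conversion vector")

    converted = list(mean)  # copy so the stored standard-style mean is untouched
    for conversion, ratio in zip(conversions, ratios):
        # Claim 7 requires the same dimensionality as the mean vector.
        if len(conversion) != len(mean):
            raise ValueError("conversion vector must match the mean's dimensionality")
        for i, value in enumerate(conversion):
            converted[i] += ratio * value
    return converted


# Single style, plain addition.
single = convert_mean([1.0, 2.0], [[0.5, -0.5]])

# Two styles blended at assumed ratios of 0.5 and 0.25.
blended = convert_mean([1.0, 2.0], [[2.0, 0.0], [0.0, 4.0]], ratios=[0.5, 0.25])
```

Because only the mean vector is shifted, the same conversion vectors can in principle be applied to acoustic model parameters from different speakers, which is consistent with claim 5, where the conversion parameter is created from a speaker other than the target speaker.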

13. A text-to-speech method comprising:
- acquiring a context sequence that is an information sequence affecting fluctuations in voice;
  
  acquiring an acoustic model parameter sequence corresponding to the context sequence, the acoustic model parameter sequence representing an acoustic model in a standard speaking style of a target speaker;
  
  acquiring a conversion parameter sequence corresponding to the context sequence, the conversion parameter sequence being used in converting an acoustic model parameter in the standard speaking style into one in a speaking style different from the standard speaking style;
  
  converting the acoustic model parameter sequence using the conversion parameter sequence; and
  
  generating a voice signal based on the acoustic model parameter sequence acquired after conversion.

14. A computer program product comprising a computer-readable medium containing a program executed by a computer, the program causing the computer to execute:
- acquiring a context sequence that is an information sequence affecting fluctuations in voice;
  
  acquiring an acoustic model parameter sequence corresponding to the context sequence, the acoustic model parameter sequence representing an acoustic model in a standard speaking style of a target speaker;
  
  acquiring a conversion parameter sequence corresponding to the context sequence, the conversion parameter sequence being used in converting an acoustic model parameter in the standard speaking style into one in a speaking style different from the standard speaking style;
  
  converting the acoustic model parameter sequence using the conversion parameter sequence; and
  
  generating a voice signal based on the acoustic model parameter sequence acquired after conversion.

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation); Toshiba Digital Solutions Corporation (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Nasu, Yu; Tamura, Masatsune; Morinaka, Ryo; Morita, Masahiro
Granted Patent: US 9,830,904 B2
US Class Current: 1/1
CPC Class Codes

G10L 13/033   Voice editing, e.g. manipul...

G10L 13/06   Elementary speech units use...

G10L 13/10   Prosody rules derived from ...
