Voice synthesizer, voice synthesizing method, and computer program

US 20070112570A1
Filed: 11/09/2006
Published: 05/17/2007
Est. Priority Date: 11/17/2005
Status: Active Grant

Abstract

A voice synthesizer includes a recorded voice storage portion (124) that stores recorded voices that are pre-recorded; a voice input portion (110) that is input with a reading voice reading out a text that is to be generated by the synthesized voice; an attribute information input portion (112) that is input with a label string, which is a string of labels assigned to each phoneme included in the reading voice, and label information, which indicates the border position of each phoneme corresponding to each label; a parameter extraction portion (116) that extracts characteristic parameters of the reading voice based on the label string, the label information, and the reading voice; and a voice synthesis portion (122) that selects the recorded voices from the recorded voice storage portion in accordance with the characteristic parameters, synthesizes the recorded voices, and generates the synthesized voice that reads out the text.
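The flow described in the abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the names `Label`, `RecordedUnit`, `extract_parameters`, `select_units`, and `synthesize` are hypothetical, the characteristic parameters are reduced to per-phoneme duration and mean pitch, and selection uses a simple weighted distance that the source does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    phoneme: str   # label assigned to one phoneme of the reading voice
    start: float   # border position where the phoneme begins (seconds)
    end: float     # border position where the phoneme ends (seconds)


@dataclass
class RecordedUnit:
    phoneme: str
    duration: float                  # seconds
    pitch: float                     # mean pitch in Hz (0.0 if unvoiced)
    samples: list = field(default_factory=list)


def extract_parameters(labels, pitch_track, frame_period=0.005):
    """Derive (phoneme, duration, mean pitch) for each labelled phoneme.

    pitch_track holds one pitch value per frame; 0 marks an unvoiced frame.
    """
    params = []
    for lab in labels:
        first = int(round(lab.start / frame_period))
        last = int(round(lab.end / frame_period))
        voiced = [f for f in pitch_track[first:last] if f > 0]
        mean_pitch = sum(voiced) / len(voiced) if voiced else 0.0
        params.append((lab.phoneme, lab.end - lab.start, mean_pitch))
    return params


def select_units(params, storage, pitch_weight=0.01):
    """Pick, per phoneme, the stored unit closest to the extracted parameters.

    The distance (and pitch_weight) is an assumption made for illustration.
    """
    chosen = []
    for phoneme, duration, pitch in params:
        candidates = [u for u in storage if u.phoneme == phoneme]
        if not candidates:
            raise LookupError(f"no recorded voice for phoneme {phoneme!r}")
        chosen.append(min(
            candidates,
            key=lambda u: abs(u.duration - duration)
            + pitch_weight * abs(u.pitch - pitch),
        ))
    return chosen


def synthesize(units):
    """Concatenate the selected recorded voices into one waveform."""
    return [s for u in units for s in u.samples]
```

A real system would smooth the joins between units and use richer acoustic parameters; the sketch only shows how the label string and border positions drive parameter extraction and selection.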

42 Citations

10 Claims

1. A voice synthesizer that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, comprising:
- a recorded voice storage portion that stores the recorded voices that are pre-recorded;
  
  a voice input portion that is input with a reading voice that is a natural voice reading out a text that is to be generated by the synthesized voice;
  
  an attribute information input portion that is input with a label string and label information, the label string being a string of labels that are respectively assigned to each phoneme included in the reading voice and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
  
  a parameter extraction portion that extracts a characteristic parameter that indicates a characteristic of the reading voice based on the label string, the label information, and the reading voice; and
  
  a voice synthesis portion that selects at least one of the recorded voices from the recorded voice storage portion in accordance with the characteristic parameter, synthesizes the selected at least one recorded voice, and generates the synthesized voice that reads out the text.

2. The voice synthesizer according to claim 1, wherein the characteristic parameter extracted by the parameter extraction portion includes an acoustic parameter that indicates an acoustic characteristic of the reading voice, and a prosody parameter that indicates a prosody characteristic of the reading voice.

3. The voice synthesizer according to claim 1, wherein the characteristic parameter extracted by the parameter extraction portion includes a prosody parameter that indicates a prosody characteristic of the reading voice, and the voice synthesizer further comprises:
- a phoneme model storage portion that stores an acoustic model and a prosody model that are generated in advance based on the recorded voices stored in the recorded voice storage portion, the acoustic model modeling an acoustic characteristic of each phoneme included in the recorded voices, and the prosody model modeling a prosody characteristic of each phoneme included in the recorded voices;
  
  a text input portion that is input with a text that is to be generated by the synthesized voice;
  
  a text analysis portion that analyses the text and obtains language prosody information; and
  
  a characteristic estimation portion that estimates an acoustic characteristic of the natural voice reading out the text based on the label string, the label information, the prosody parameter, the language prosody information, and the acoustic model and the prosody model stored in the phoneme model storage portion, and derives an acoustic parameter that indicates the acoustic characteristic.

4. The voice synthesizer according to claim 1, further comprising:
- an individual label acoustic model storage portion that stores respective individual label acoustic models for each label that model the acoustic characteristic of each phoneme corresponding to each label; and
  
  a label information derivation portion that derives the label information based on the reading voice, the label string, and the individual label acoustic models.
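
Claim 4 derives the label information (phoneme border positions) from the reading voice, the label string, and per-label acoustic models. One common way to do this is forced alignment by dynamic programming, sketched below. The sketch is an assumption for illustration only: the claim does not name an algorithm, and here each "individual label acoustic model" is simplified to a single one-dimensional Gaussian given as `(mean, variance)`, with the reading voice reduced to one feature value per frame. The function names `log_gauss` and `align` are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log-likelihood of x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def align(frames, label_string, models):
    """Return the first frame index of each label in label_string.

    frames: one feature value per frame of the reading voice.
    models: dict mapping each label to (mean, variance); a stand-in for
            the individual label acoustic models (assumption).
    Labels are consumed in order and each covers at least one frame.
    """
    T, K = len(frames), len(label_string)
    if K == 0 or T < K:
        raise ValueError("need at least one frame per label")
    neg_inf = float("-inf")
    score = [[neg_inf] * T for _ in range(K)]
    entered = [[False] * T for _ in range(K)]  # True if label k starts at frame t
    score[0][0] = log_gauss(frames[0], *models[label_string[0]])
    for t in range(1, T):
        for k in range(min(K, t + 1)):
            stay = score[k][t - 1]
            move = score[k - 1][t - 1] if k > 0 else neg_inf
            best = max(stay, move)
            if best == neg_inf:
                continue
            entered[k][t] = move > stay
            score[k][t] = best + log_gauss(frames[t], *models[label_string[k]])
    # Trace back from the last label at the last frame to recover the borders.
    starts = [0] * K
    k = K - 1
    for t in range(T - 1, 0, -1):
        if entered[k][t]:
            starts[k] = t
            k -= 1
    return starts
```

Multiplying each returned frame index by the frame period gives the border position in seconds.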

5. A computer program wherein a computer is directed to function as a voice synthesizer that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, comprising:
- a voice input process in which a reading voice is input that is a natural voice reading out a text that is to be generated by the synthesized voice;
  
  an attribute information input process in which a label string and label information are input, the label string being a string of labels that are respectively assigned to each phoneme included in the reading voice and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
  
  a parameter extraction process that extracts a characteristic parameter that indicates a characteristic of the reading voice based on the label string, the label information, and the reading voice;
  
  a selection process that selects at least one of the recorded voices in accordance with the characteristic parameter from a recorded voice storage portion that stores the recorded voices that are pre-recorded; and
  
  a voice synthesis process that synthesizes the at least one recorded voice selected by the selection process, and generates the synthesized voice that reads out the text.

6. A voice synthesizing method that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, comprising the steps of:
- inputting a reading voice that is a natural voice reading out a text that is to be generated by the synthesized voice;
  
  inputting attribute information that includes a label string and label information, the label string being a string of labels that are respectively assigned to each phoneme included in the reading voice and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
  
  extracting a characteristic parameter that indicates a characteristic of the reading voice based on the label string, the label information, and the reading voice;
  
  selecting at least one of the recorded voices in accordance with the characteristic parameter from a recorded voice storage portion that stores the recorded voices that are pre-recorded; and
  
  generating the synthesized voice that reads out the text by synthesizing the at least one recorded voice selected in the selection step.

7. A voice synthesizer that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, comprising:
- a recorded voice storage portion that stores the recorded voices that are pre-recorded;
  
  a phoneme model storage portion that stores an acoustic model and a prosody model that are generated in advance based on the recorded voices stored in the recorded voice storage portion, the acoustic model modeling an acoustic characteristic of each phoneme included in the recorded voices, and the prosody model modeling a prosody characteristic of each phoneme included in the recorded voices;
  
  a text input portion that is input with a text that is to be generated by the synthesized voice;
  
  an attribute information input portion that is input with a label string and label information, the label string being a string of labels that are respectively assigned to each phoneme included in the reading voice and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
  
  a label information adjustment portion that sets, in accordance with a plurality of metrically and/or acoustically different states of each phoneme, the border position of each state;
  
  a text analysis portion that analyses the text and obtains language prosody information;
  
  a characteristic estimation portion that estimates a characteristic of the natural voice reading out the text based on the label string, the label information adjusted by the label information adjustment portion, the language prosody information, and the acoustic model and the prosody model stored in the phoneme model storage portion, and derives a characteristic parameter that indicates the characteristic; and
  
  a voice synthesis portion that selects at least one of the recorded voices from the recorded voice storage portion in accordance with the characteristic parameter, synthesizes the selected at least one recorded voice, and generates the synthesized voice that reads out the text.

8. The voice synthesizer according to claim 7, wherein the label information indicates a duration of each phoneme corresponding to each label, and the label information adjustment portion assigns the durations to each state in correspondence with the plurality of states.
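
The label information adjustment in claims 7 and 8 divides each phoneme into several states and sets a border position for each state. A minimal sketch, assuming the phoneme's duration is split in proportion to per-state weights (the claims do not say how the durations are apportioned, and `state_borders` and `state_weights` are hypothetical names):

```python
def state_borders(start, end, state_weights):
    """Split the phoneme interval [start, end) into consecutive states.

    state_weights gives the relative length of each state, for example
    mean state durations taken from a prosody model (assumption).
    Returns a list of (state_start, state_end) pairs in seconds.
    """
    total = sum(state_weights)
    if end <= start or total <= 0:
        raise ValueError("phoneme must have positive duration and weights")
    borders = []
    consumed = 0.0
    for weight in state_weights:
        state_start = start + (end - start) * consumed / total
        consumed += weight
        state_end = start + (end - start) * consumed / total
        borders.append((state_start, state_end))
    return borders
```

For instance, a phoneme spanning 1.0 s to 2.0 s with weights `[1, 2, 1]` would give the middle state half of the duration.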

9. A computer program wherein a computer is directed to function as a voice synthesizer that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, the computer program using:
- a recorded voice storage portion that stores the recorded voices that are pre-recorded; and
  
  a phoneme model storage portion that stores an acoustic model and a prosody model that are generated in advance based on the recorded voices stored in the recorded voice storage portion, the acoustic model modeling an acoustic characteristic of each phoneme included in the recorded voices, and the prosody model modeling a prosody characteristic of each phoneme included in the recorded voices, and comprising:
  
  a text input process in which a text is input that is to be generated by the synthesized voice;
  
  an attribute information input process in which a label string and label information are input, the label string being a string of labels that are respectively assigned to each phoneme included in the text and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
  
  a label information adjustment process that sets, in accordance with a plurality of metrically and/or acoustically different states of each phoneme, the border position of each state;
  
  a text analysis process that analyses the text and obtains language prosody information;
  
  a characteristic estimation process that estimates a characteristic of the natural voice reading out the text based on the label string, the label information adjusted by the label information adjustment process, the language prosody information, and the acoustic model and the prosody model stored in the phoneme model storage portion, and derives a characteristic parameter that indicates the characteristic; and
  
  a voice synthesis process that selects at least one of the recorded voices from the recorded voice storage portion in accordance with the characteristic parameter, synthesizes the at least one selected recorded voice, and generates the synthesized voice that reads out the text.

10. A voice synthesizing method that uses recorded voices that are pre-recorded to generate a synthesized voice that reads out a text, the method using:
- a recorded voice storage portion that stores the recorded voices that are pre-recorded; and
  
  a phoneme model storage portion that stores an acoustic model and a prosody model that are generated in advance based on the recorded voices stored in the recorded voice storage portion, the acoustic model modeling an acoustic characteristic of each phoneme included in the recorded voices, and the prosody model modeling a prosody characteristic of each phoneme included in the recorded voices, and comprising the steps of:
  
  inputting a text that is to be generated by the synthesized voice;
  
  inputting attribute information that includes a label string and label information, the label string being a string of labels that are respectively assigned to each phoneme included in the text and that are placed in a time series, and the label information indicating the border position of each phoneme corresponding to each label;
  
  adjusting the label information by setting, in accordance with a plurality of metrically and/or acoustically different states of each phoneme, the border position of each state;
  
  analyzing the text and obtaining language prosody information;
  
  estimating a characteristic of the natural voice reading out the text based on the label string, the label information adjusted by the label information adjustment step, the language prosody information, and the acoustic model and the prosody model stored in the phoneme model storage portion, and deriving a characteristic parameter that indicates the characteristic; and
  
  generating the synthesized voice that reads out the text by selecting at least one of the recorded voices from the recorded voice storage portion in accordance with the characteristic parameter and synthesizing the selected at least one recorded voice.

Current Assignee: OKI Electric Industry Company Limited
Original Assignee: OKI Electric Industry Company Limited
Inventors: Kaneyasu, Tsutomu
Granted Patent: US 7,739,113 B2
US Class Current: 704/260
CPC Class Codes:
G10L 13/06 Elementary speech units use...
G10L 13/10 Prosody rules derived from ...