Apparatus and method for voice conversion using attribute information
First Claim
1. A speech processing apparatus comprising:
- a speech storage configured to store a plurality of speech units of a conversion-source speaker and source-speaker attribute information corresponding to the speech units;
a speech-unit extractor configured to divide the speech of a conversion-target speaker into a predetermined type of a speech unit to form target-speaker speech units;
an attribute-information generator configured to generate target-speaker attribute information corresponding to the target-speaker speech units from the speech of the conversion-target speaker or linguistic information of the speech;
a speech-unit selector configured to calculate costs on the target-speaker attribute information and the source-speaker attribute information using cost functions, and selects one or a plurality of speech units with the same phoneme from the speech storage according to the costs to form a source-speaker speech unit; and
a voice-conversion-rule generator configured to generate speech conversion functions for converting the one or the plurality of source-speaker speech units to the target-speaker speech units based on the target-speaker speech units and the one or the plurality of source-speakerspeech units.
4 Assignments
0 Petitions
Accused Products
Abstract
A speech processing apparatus according to an embodiment of the invention includes a conversion-source-speaker speech-unit database; a voice-conversion-rule-learning-data generating means; and a voice-conversion-rule learning means, with which it makes voice conversion rules. The voice-conversion-rule-learning-data generating means includes a conversion-target-speaker speech-unit extracting means; an attribute-information generating means; a conversion-source-speaker speech-unit database; and a conversion-source-speaker speech-unit selection means. The conversion-source-speaker speech-unit selection means selects conversion-source-speaker speech units corresponding to conversion-target-speaker speech units based on the mismatch between the attribute information of the conversion-target-speaker speech units and that of the conversion-source-speaker speech units, whereby the voice conversion rules are made from the selected pair of the conversion-target-speaker speech units and the conversion-source-speaker speech units.
287 Citations
13 Claims
-
1. A speech processing apparatus comprising:
-
a speech storage configured to store a plurality of speech units of a conversion-source speaker and source-speaker attribute information corresponding to the speech units; a speech-unit extractor configured to divide the speech of a conversion-target speaker into a predetermined type of a speech unit to form target-speaker speech units; an attribute-information generator configured to generate target-speaker attribute information corresponding to the target-speaker speech units from the speech of the conversion-target speaker or linguistic information of the speech; a speech-unit selector configured to calculate costs on the target-speaker attribute information and the source-speaker attribute information using cost functions, and selects one or a plurality of speech units with the same phoneme from the speech storage according to the costs to form a source-speaker speech unit; and a voice-conversion-rule generator configured to generate speech conversion functions for converting the one or the plurality of source-speaker speech units to the target-speaker speech units based on the target-speaker speech units and the one or the plurality of source-speakerspeech units. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A method of processing speech, the method comprising:
-
storing in a storing means a plurality of speech units of a conversion-source speaker and source-speaker attribute information corresponding to the speech units; dividing the speech of a conversion-target speaker into a predetermined type of a speech unit to form target-speaker speech units; generating target-speaker attribute information corresponding to the target-speaker speech units from information on the speech of the conversion-target speaker or linguistic information of the speech; calculating costs on the target-speaker attribute information and the source-speaker attribute information using cost functions, and selecting one or a plurality of speech units with the same phoneme from the storing means according to the costs to form a source-speaker speech unit; and generating voice conversion functions for converting the one or the plurality of source-speaker speech units to the target-speaker speech units based on the target-speaker speech units and the one or a plurality of source-speaker speech units.
-
-
13. A computer-readable storage medium having stored therein a program for processing speech, the program causing a computer to implement a process comprising:
-
storing a plurality of speech units of a conversion-source speaker and source-speaker attribute information corresponding to the speech units; dividing the speech of a conversion-target speaker into a predetermined type of a speech unit to form target-speaker speech units; generating target-speaker attribute information corresponding to the target-speaker speech units from information on the speech of the conversion-target speaker or linguistic information of the speech; calculating costs on the target-speaker attribute information and the source-speaker attribute information using cost functions, and selecting one or a plurality of speech units with the same phoneme from the conversion-source-speaker speech units according to the costs to form a source-speaker speech unit; and generating voice conversion functions for converting the one or a plurality of source-speaker speech units to the target-speaker speech units based on the target-speaker speech units and the one or the plurality of source-speaker speech units.
-
Specification