System and method for unit selection text-to-speech using a modified Viterbi approach
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units; for each respective speech unit in each ordered list, constructs a sublist of speech units from the next ordered list that are suitable for concatenation; performs a cost analysis of paths through the set of ordered lists based on the sublist for each respective speech unit; and synthesizes speech using the lowest cost path of speech units through the set of ordered lists, as determined by the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units that do not have an assigned pitch can be assigned one.
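The search outlined in the abstract can be sketched as a Viterbi-style pass in which each unit only considers a pitch-compatible sublist of the next list. This is a minimal illustration under stated assumptions, not the patented implementation: the `Unit` fields, the `target_cost` value, the use of absolute pitch difference as the join cost, and the `max_diff` threshold as the suitability test are all hypothetical choices made for the example.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    pitch: float        # assumed: one representative pitch per unit, in Hz
    target_cost: float  # assumed: cost of using this unit at its position


def sublist_bounds(pitches, pitch, max_diff):
    """Index range [lo, hi) of sorted pitches strictly within max_diff of pitch."""
    lo = bisect_right(pitches, pitch - max_diff)
    hi = bisect_left(pitches, pitch + max_diff)
    return lo, hi


def lowest_cost_path(lists, max_diff):
    """Return (cost, path) for the cheapest path, or None if no path exists.

    Assumes `lists` is non-empty. Each list is sorted by pitch so that the
    compatible sublist for a unit is a contiguous slice of the next list.
    """
    lists = [sorted(units, key=lambda u: u.pitch) for units in lists]
    # best[i] holds (cost, path) of the cheapest path ending at unit i.
    best = [(u.target_cost, [u]) for u in lists[0]]
    for prev, cur in zip(lists, lists[1:]):
        cur_pitches = [u.pitch for u in cur]
        nxt = [None] * len(cur)
        for i, p in enumerate(prev):
            if best[i] is None:
                continue  # unit i was never reached
            cost, path = best[i]
            lo, hi = sublist_bounds(cur_pitches, p.pitch, max_diff)
            for j in range(lo, hi):  # only the compatible sublist is scored
                c = cur[j]
                # Assumed join cost: absolute pitch difference.
                total = cost + abs(c.pitch - p.pitch) + c.target_cost
                if nxt[j] is None or total < nxt[j][0]:
                    nxt[j] = (total, path + [c])
        best = nxt
    reachable = [b for b in best if b is not None]
    return min(reachable, key=lambda b: b[0]) if reachable else None
```

Because each list is sorted by pitch, the sublist for a unit is found with two binary searches rather than a scan of the whole next list, which is the kind of saving the ordering is meant to allow.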
20 Claims
1. A method comprising:
in a text-to-speech synthesis system that uses unit selection, imposing ordering constraints on speech units stored in the text-to-speech synthesis system, the ordering constraints indicating speech unit pairs, each respective speech unit pair of the speech unit pairs having a respective first speech unit with a respective first pitch and a respective second speech unit having a respective second pitch, the speech unit pairs being suitable for concatenation based on the respective first pitch and the respective second pitch;
selecting, from the speech units and based at least in part on a difference in pitch between the respective first pitch and the respective second pitch being below a threshold value according to the ordering constraints, units for speech synthesis to yield selected speech units; and
synthesizing speech using the selected speech units.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A text-to-speech system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
imposing ordering constraints on speech units stored in the text-to-speech system, the ordering constraints indicating speech unit pairs, each respective speech unit pair of the speech unit pairs having a respective first speech unit with a respective first pitch and a respective second speech unit having a respective second pitch, the speech unit pairs being suitable for concatenation based on the respective first pitch and the respective second pitch;
selecting, from the speech units and based at least in part on a difference in pitch between the respective first pitch and the respective second pitch being below a threshold value according to the ordering constraints, units for speech synthesis to yield selected speech units; and
synthesizing speech using the selected speech units.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A computer-readable storage device having instructions stored which, when executed by a text-to-speech synthesis system, cause the text-to-speech synthesis system to perform operations comprising:
imposing ordering constraints on speech units, the ordering constraints indicating speech unit pairs, each respective speech unit pair of the speech unit pairs having a respective first speech unit with a respective first pitch and a respective second speech unit having a respective second pitch, the speech unit pairs being suitable for concatenation based on the respective first pitch and the respective second pitch;
selecting, from the speech units and based at least in part on a difference in pitch between the respective first pitch and the respective second pitch being below a threshold value according to the ordering constraints, units for speech synthesis to yield selected speech units; and
synthesizing speech using the selected speech units.
Dependent claims: 16, 17, 18, 19, 20.
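As a rough illustration of the pair constraint recited in the independent claims, the sketch below lists the first/second unit pairs whose pitch difference falls below a threshold. The function name `concatenable_pairs`, the dictionary representation of units, and the example pitch values are assumptions made for this example and do not come from the patent.

```python
def concatenable_pairs(first_units, second_units, threshold):
    """Return (first, second) name pairs whose pitch difference is below threshold.

    `first_units` and `second_units` map a hypothetical unit name to its
    pitch in Hz. The comparison is strict, matching "below a threshold value".
    """
    return [
        (first, second)
        for first, first_pitch in first_units.items()
        for second, second_pitch in second_units.items()
        if abs(first_pitch - second_pitch) < threshold
    ]


# Hypothetical candidate units for two adjacent positions.
first = {"a1": 100.0, "a2": 140.0}
second = {"b1": 105.0, "b2": 159.0}
pairs = concatenable_pairs(first, second, threshold=20.0)
```

This brute-force version compares every first unit with every second unit. Sorting the candidates by pitch would let the same pairs be found with binary search instead.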
Specification