System and method for unit selection text-to-speech using a modified Viterbi approach
First Claim
1. A method comprising:
- selecting candidate speech units for converting text to speech;
ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units;
constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units;
concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and
synthesizing the speech using the concatenated speech unit.
8 Assignments
0 Petitions
Accused Products
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
-
Citations
20 Claims
-
1. A method comprising:
-
selecting candidate speech units for converting text to speech; ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units; constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units; concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and synthesizing the speech using the concatenated speech unit. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A system comprising:
-
a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising; selecting candidate speech units for converting text to speech; ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units; constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units; concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and synthesizing the speech using the concatenated speech unit. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
-
selecting candidate speech units for converting text to speech; ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units; constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units; concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and synthesizing the speech using the concatenated speech unit. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification