Pre-saved data compression for TTS concatenation cost
Abstract
Pre-saved concatenation cost data is compressed through speech segment grouping. Speech segments are assigned to a predefined number of groups based on their concatenation cost values with other speech segments. A representative segment is selected for each group. The concatenation cost between two segments in different groups may then be approximated by that between the representative segments of their respective groups, thereby reducing an amount of concatenation cost data to be pre-saved.
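As a rough illustration of the saving described in the abstract, the sketch below counts stored entries before and after grouping. The storage layout is an assumption made for this example (one cost per pair of groups, plus one group index per segment for each of the preceding and following roles); the abstract does not specify a layout, and the function name `stored_entries` is hypothetical.

```python
from typing import Tuple


def stored_entries(num_segments: int, num_groups: int) -> Tuple[int, int]:
    """Return (full, compressed) entry counts under the assumed layout."""
    # Full table: one cost for every ordered pair of segments.
    full = num_segments * num_segments
    # Assumed compressed form: one cost per ordered pair of groups, plus a
    # group index for each segment in each of its two roles.
    compressed = num_groups * num_groups + 2 * num_segments
    return full, compressed


full, compressed = stored_entries(num_segments=10_000, num_groups=100)
print(full, compressed)  # 100000000 30000
```

Under these assumptions, 10,000 segments grouped into 100 groups need 30,000 stored values instead of 100,000,000, at the price of approximating each cost by the cost between group representatives.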
18 Claims
1. A computing device for performing concatenative speech synthesis by a processing unit of the computing device, the computing device comprising:
a memory;
a processor coupled to the memory, the processor executing a text to speech (TTS) application in conjunction with instructions stored in the memory, wherein the TTS application is configured to:
determine, based on a matrix of concatenation costs, feature vectors for speech segments, wherein some of the speech segments occur at asynchronous time intervals;
apply distance weighting to one of: the speech segments and at least two consecutive speech segments, wherein the distance weighting is based on feature vectors associated with the speech segments or is based on feature vectors associated with the at least two consecutive speech segments;
cluster the speech segments into a predefined number of groups such that an average distance between speech segments within each group is minimized;
select a representative speech segment for each group; and
generate a compressed concatenation cost matrix based on the representative speech segments.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
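The weighting and clustering steps of claim 1 can be sketched as follows. This is only one possible reading: each row of the cost matrix is treated as the feature vector of a segment, the weighting is a per-dimension weight inside a squared Euclidean distance, and the grouping uses a plain k-means loop. The claim names none of these choices, so the function names, the weights, the seeding rule, and the use of k-means are all assumptions.

```python
from typing import List, Optional, Sequence


def weighted_sq_distance(u: Sequence[float], v: Sequence[float],
                         w: Sequence[float]) -> float:
    """Squared Euclidean distance with one (hypothetical) weight per dimension."""
    return sum(wi * (a - b) ** 2 for a, b, wi in zip(u, v, w))


def cluster_segments(vectors: Sequence[Sequence[float]],
                     weights: Sequence[float],
                     k: int, iterations: int = 20) -> List[int]:
    """Assign each feature vector to one of k groups (k-means sketch)."""
    n = len(vectors)
    # Assumption: seed the groups with evenly spaced segments for determinism.
    centroids = [list(vectors[i * n // k]) for i in range(k)]
    labels: Optional[List[int]] = None
    for _ in range(iterations):
        # Assignment step: nearest centroid under the weighted distance.
        new_labels = [
            min(range(k), key=lambda g: weighted_sq_distance(x, centroids[g], weights))
            for x in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for g in range(k):
            members = [vectors[i] for i in range(n) if new_labels[i] == g]
            if members:  # keep the old centroid if a group empties out
                centroids[g] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels if labels is not None else []


# Toy cost matrix: row i is taken as the feature vector of segment i.
costs = [
    [0.0, 1.0, 5.0, 5.2],
    [0.1, 1.1, 5.1, 5.0],
    [4.0, 4.2, 0.5, 0.4],
    [4.1, 4.0, 0.6, 0.5],
]
labels = cluster_segments(costs, weights=[1.0, 1.0, 1.0, 1.0], k=2)
print(labels)  # [0, 0, 1, 1]
```

Segments 0 and 1 have nearly identical cost rows, as do segments 2 and 3, so they end up in the same groups. Note that k-means minimizes squared distances, which is a stand-in here for the claim's "average distance" criterion.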
10. A computing device for generating speech employing compressed concatenation cost data, the computing device comprising:
a memory;
a processor coupled to the memory, the processor executing a text to speech (TTS) application in conjunction with instructions stored in the memory, wherein the TTS application is configured to:
determine feature vectors for speech segments, wherein the feature vectors comprise concatenation cost values, and wherein the concatenation cost values are costs of concatenating the speech segments with at least two consecutive speech segments;
apply distance weighting to one of: the speech segments and the at least two consecutive speech segments, wherein the distance weighting is based on feature vectors associated with the speech segments or is based on feature vectors associated with the at least two consecutive speech segments;
cluster the speech segments into a predefined number of groups such that an average distance between speech segments within each group is minimized;
select a representative speech segment for each group such that an average distance between the representative speech segment and other speech segments within a similar group are minimized;
generate a compressed concatenation cost matrix based on the representative speech segments; and
pre-save the compressed concatenation cost matrix for real time computations in synthesizing speech.
Dependent claims: 11, 12, 13, 14
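Claim 10 adds a selection rule for the representative: it should minimize the average distance to the other segments in its group. One way to read this is as a medoid choice, sketched below. The helper name `pick_representatives` and the use of unweighted Euclidean distance (`math.dist`) are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def pick_representatives(vectors: Sequence[Sequence[float]],
                         labels: Sequence[int]) -> Dict[int, int]:
    """Map each group id to the index of its medoid segment."""
    groups: Dict[int, List[int]] = {}
    for idx, g in enumerate(labels):
        groups.setdefault(g, []).append(idx)
    reps: Dict[int, int] = {}
    for g, members in groups.items():
        # Group size is fixed, so minimizing the summed distance to the other
        # members also minimizes the average distance.
        reps[g] = min(
            members,
            key=lambda i: sum(math.dist(vectors[i], vectors[j]) for j in members),
        )
    return reps


# One-dimensional toy feature vectors, already split into two groups.
vectors = [[0.0], [1.0], [5.0], [10.0], [11.0], [13.0]]
labels = [0, 0, 0, 1, 1, 1]
print(pick_representatives(vectors, labels))  # {0: 1, 1: 4}
```

In group 0 the segment at 1.0 has the smallest total distance to its neighbors (1 + 4 = 5), and in group 1 the segment at 11.0 does (1 + 2 = 3), so those two become the representatives.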
15. A computer-readable memory device with instructions stored thereon for generating speech employing compressed concatenation cost data, the instructions comprising:
determining, based on a matrix of concatenation costs, feature vectors for speech segments, wherein the matrix of concatenation costs is constructed along a preceding speech segment and a following speech segment for each segment;
applying distance weighting to one of: the speech segments and at least two consecutive speech segments, wherein the distance weighting is based on feature vectors associated with the speech segments or is based on feature vectors associated with the at least two consecutive speech segments;
clustering the speech segments into M preceding segment and N following segment groups such that an average distance between speech segments within each group is minimized;
selecting a representative speech segment for each group;
generating a compressed concatenation cost matrix such that a concatenation cost between the speech segments and the at least two consecutive speech segments is approximated by a concatenation cost between a representative segment associated with the speech segments and another representative speech segment associated with the at least two consecutive speech segments; and
pre-saving the compressed concatenation cost matrix for real time computations in synthesizing speech.
Dependent claims: 16, 17, 18
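The M-by-N compression in claim 15 can be sketched as below, assuming the group labels and representatives have already been chosen. The convention that `costs[i][j]` is the cost of placing segment `i` before segment `j`, and the names `compress` and `approx_cost`, are assumptions made for this example and not taken from the claim.

```python
from typing import List, Sequence


def compress(costs: Sequence[Sequence[float]],
             prev_reps: Sequence[int],
             next_reps: Sequence[int]) -> List[List[float]]:
    """Build the M x N table of costs between group representatives."""
    # prev_reps[m] is the representative of preceding group m,
    # next_reps[n] is the representative of following group n.
    return [[costs[p][f] for f in next_reps] for p in prev_reps]


def approx_cost(compressed: Sequence[Sequence[float]],
                prev_labels: Sequence[int],
                next_labels: Sequence[int],
                i: int, j: int) -> float:
    """Approximate cost(i, j) by the cost between the two representatives."""
    return compressed[prev_labels[i]][next_labels[j]]


# Toy data: 4 segments, M = 2 preceding groups, N = 3 following groups.
costs = [
    [0.0, 1.0, 5.0, 5.2],
    [0.1, 1.1, 5.1, 5.0],
    [4.0, 4.2, 0.5, 0.4],
    [4.1, 4.0, 0.6, 0.5],
]
prev_labels, prev_reps = [0, 0, 1, 1], [0, 2]
next_labels, next_reps = [0, 1, 2, 2], [0, 1, 3]

table = compress(costs, prev_reps, next_reps)
print(table)                                          # [[0.0, 1.0, 5.2], [4.0, 4.2, 0.4]]
print(approx_cost(table, prev_labels, next_labels, 1, 2))  # 5.2 (exact value: 5.1)
```

At synthesis time only `table` and the two label arrays need to be loaded; the lookup for segments 1 and 2 returns 5.2, close to the exact cost of 5.1 in the full matrix.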
Specification