System and method for compressing concatenative acoustic inventories for speech synthesis
Abstract
A system and method are used to compress concatenative acoustic inventories for speech synthesis. Instead of using general-purpose signal compression methods such as vector quantization, the method of the invention exploits multiple properties of acoustic inventories to reduce their size. One is the close acoustic match property: acoustic units that share the same phone are acoustically similar at the points where these units may be concatenated. Another is that acoustic units are labeled with sufficiently fine distinctions that, between any two phones, no events occur that are substantially distinct from these two phones. By utilizing these properties, the number of parameters per unit that are stored as LPC parameters is minimized. As a result, storage requirements are reduced and smaller storage devices may be used.
18 Claims
1. A method for compressing concatenative acoustic inventories for speech synthesis, comprising:
creating an acoustic inventory comprising a plurality of natural speech intervals;
determining a set of peak components for each basis vector in the plurality of natural speech intervals;
determining start and end vectors for the plurality of natural speech intervals;
defining a mapping between a first peak index set associated with the start vector and a second peak index set associated with the end vector such that respective peak points in the first peak index set and the second peak index set are associated with each other;
creating an extended mapping based on the mapping between the first peak index set and the second peak index set;
performing a comparison between a complete morph mapping and a peak morph mapping to determine whether an index is located within the first peak index set;
creating a sequence of approximation vectors based on the complete morph mapping;
determining a time warp function and a corresponding vector in a sequence vector which is proximal to the sequence of approximation vectors;
parameterizing the time warp function by way of a first straight line and a second straight line to approximate a curve which extends through a predetermined space; and
storing the parameters index function and names of the acoustic units.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 15, 16, 17
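The parameterizing step of claim 1 can be illustrated with a small sketch. The claim does not state how the two straight lines are chosen, so everything below is an assumption made for illustration rather than the patented procedure: the axes are normalized to [0, 1], the two lines meet at a single breakpoint (`t_break`, `w_break`), and the breakpoint is found by a plain grid search. The function names are hypothetical.

```python
def two_line_warp(t, t_break, w_break):
    """Warp made of two straight lines: (0, 0) -> (t_break, w_break) -> (1, 1).

    Normalizing both axes to [0, 1] is an assumption of this sketch.
    """
    if not 0.0 < t_break < 1.0:
        raise ValueError("t_break must lie strictly between 0 and 1")
    if t <= t_break:
        # First straight line, through the origin.
        return w_break * t / t_break
    # Second straight line, ending at (1, 1).
    return w_break + (1.0 - w_break) * (t - t_break) / (1.0 - t_break)


def fit_two_line_warp(ts, ws, grid=20):
    """Grid-search the breakpoint minimizing squared error (hypothetical fit)."""
    best = None
    for i in range(1, grid):
        for j in range(grid + 1):
            t_break, w_break = i / grid, j / grid
            err = sum((two_line_warp(t, t_break, w_break) - w) ** 2
                      for t, w in zip(ts, ws))
            if best is None or err < best[0]:
                best = (err, t_break, w_break)
    return best[1], best[2]


# Sample a known warp, then recover the two numbers that would be stored.
ts = [k / 10 for k in range(11)]
ws = [two_line_warp(t, 0.25, 0.5) for t in ts]
params = fit_two_line_warp(ts, ws)
```

Under these assumptions, a whole warp curve reduces to two stored numbers per acoustic unit, which is the kind of saving the storing step relies on.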
12. A system for compressing concatenative acoustic inventories for speech synthesis, comprising:
an acoustic element retrieval processor, said processor creating an acoustic inventory comprising a plurality of natural speech intervals received from an acoustic element database;
an element processing and concatenation processor;
said element processor performing the steps of:
determining a set of peak components for each basis vector in the plurality of natural speech intervals;
determining start and end vectors for each basis vector in the natural speech intervals;
defining a mapping between a first peak index set associated with the start vector and a second peak index set associated with the end vector such that respective peak points in the first peak index set and the second peak index set are associated with each other;
creating an extended mapping based on the mapping between the first peak index set and the second peak index set;
performing a comparison between a complete morph mapping and a peak morph mapping to determine whether an index is located within the first peak index set;
creating a sequence of approximation vectors based on the complete morph mapping;
determining a time warp function and a corresponding vector in a sequence vector which is proximal to the sequence of approximation vectors;
parameterizing the time warp function by way of a first straight line and a second straight line to approximate a curve which extends through a predetermined space; and
an acoustic storage device for storing the parameters index function and names of the acoustic units.
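The peak-mapping steps recited for the element processor can be sketched as follows. The claim does not define how peaks are detected or how the peak mapping is extended to the remaining indices, so this sketch assumes strict local maxima as peaks, pairs peaks in order, anchors the first and last indices to each other, and interpolates linearly between paired peaks. `find_peaks` and `extend_peak_mapping` are hypothetical names.

```python
def find_peaks(vec):
    """Indices of strict local maxima, interior points only (assumed definition)."""
    return [i for i in range(1, len(vec) - 1)
            if vec[i - 1] < vec[i] > vec[i + 1]]


def extend_peak_mapping(start_peaks, end_peaks, n):
    """Extend paired peak indices to a mapping over every index 0..n-1.

    Peaks are paired in order; indices between peaks are mapped by
    linear interpolation (an assumption, not taken from the claim).
    """
    if len(start_peaks) != len(end_peaks):
        raise ValueError("peak sets must pair one-to-one")
    # Anchor both ends so each index lies between two known pairs.
    xs = [0] + list(start_peaks) + [n - 1]
    ys = [0] + list(end_peaks) + [n - 1]
    mapping = []
    seg = 0
    for i in range(n):
        # Advance to the segment whose right edge is at or beyond i.
        while seg < len(xs) - 2 and i > xs[seg + 1]:
            seg += 1
        x0, x1, y0, y1 = xs[seg], xs[seg + 1], ys[seg], ys[seg + 1]
        if x1 == x0:
            mapping.append(float(y0))
        else:
            mapping.append(y0 + (y1 - y0) * (i - x0) / (x1 - x0))
    return mapping


start_vec = [0, 1, 5, 1, 0, 0, 2, 0]   # toy start vector, peaks at 2 and 6
end_vec = [0, 0, 0, 5, 0, 2, 0, 0]     # toy end vector, peaks at 3 and 5
mapping = extend_peak_mapping(find_peaks(start_vec), find_peaks(end_vec),
                              len(start_vec))
```

In this toy case each start-vector peak maps exactly onto its paired end-vector peak, and every non-peak index receives an interpolated position, which corresponds loosely to the distinction the claim draws between indices inside and outside the first peak index set.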
13. A method for compressing concatenative acoustic inventories for speech synthesis, comprising:
determining a set of phonemes;
determining for each phoneme a set of at least one phones, said set of at least one phones comprising at least one of phonemes which may occur as neighbors of said phoneme in a speech synthesis output and contextual descriptors;
determining an inventory specification comprising a plurality of specifications of a phone sequence which is required by a synthesis input domain;
obtaining a set of human speech recordings containing speech intervals which correspond to sequences of phones which include all phone sequences in the inventory specification;
obtaining a parametric representation of the speech intervals which are obtained such that each speech interval is represented as a trajectory through an acoustic parameter space;
for each phone, obtaining at least one basis vector in the acoustic parameter space from stored trajectories such that one of an initial and final vector of a trajectory of each speech interval is approximated by a corresponding basis vector;
said speech interval having corresponding phone sequences that include a phone in one of an initial and final position;
approximating each stored trajectory by a time varying mathematical combination of basis vectors for a phone which is associated with a stored trajectory to generate approximate trajectories; and
constraining the approximate trajectories such that all approximate trajectories that correspond to acoustic units which start or terminate with a given phone possess substantially identical initial or final frames.
Dependent claims: 14, 18
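The last two steps of claim 13 can be illustrated with a minimal sketch. The claim only requires "a time varying mathematical combination of basis vectors"; the linear blend, the weight sequence, the toy basis vectors, and the name `approximate_trajectory` below are all assumptions chosen to show how the endpoint constraint can be met.

```python
def approximate_trajectory(start_basis, end_basis, weights):
    """Blend two phone basis vectors frame by frame.

    Frame k is (1 - w_k) * start_basis + w_k * end_basis. Requiring
    w_0 = 0 and w_last = 1 forces the first and last frames to equal
    the basis vectors (the linear blend itself is an assumed form).
    """
    if weights[0] != 0.0 or weights[-1] != 1.0:
        raise ValueError("weights must start at 0.0 and end at 1.0")
    return [[(1.0 - w) * a + w * b for a, b in zip(start_basis, end_basis)]
            for w in weights]


# Hypothetical two-dimensional basis vectors for three phones.
basis = {"a": [1.0, 4.0], "b": [3.0, 0.0], "c": [-2.0, 2.0]}

# Two units that start with phone "a" but use different weight curves.
unit_ab = approximate_trajectory(basis["a"], basis["b"], [0.0, 0.5, 1.0])
unit_ac = approximate_trajectory(basis["a"], basis["c"], [0.0, 0.25, 0.75, 1.0])
```

Because both units start from the same basis vector with weight 0.0, their initial frames are identical, so they can be concatenated after any unit that ends in phone "a" without an acoustic mismatch at the join. Only the weight curve needs to be stored per unit, while the basis vectors are shared per phone.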
Specification