Methods and Apparatus for Rapid Acoustic Unit Selection From a Large Speech Corpus
Abstract
A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. Concatenation costs, which measure the mismatch between sequential pairs of acoustic units, are expensive to compute, so processing can be greatly reduced by pre-computing and caching them. Unfortunately, the number of possible sequential pairs of acoustic units makes caching all of them prohibitive. Statistical experiments reveal, however, that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.
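The database construction described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the integer unit identifiers, the `build_cost_cache` function, and the toy cost function are all hypothetical, and the sketch assumes that each synthesized utterance is available as the ordered list of acoustic units that was selected for it.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical representation: an acoustic unit is an integer id, and a
# sequential pair is (left unit id, right unit id).
Pair = Tuple[int, int]


def build_cost_cache(
    synthesized_sequences: Iterable[List[int]],
    concat_cost: Callable[[int, int], float],
) -> Dict[Pair, float]:
    """Store concatenation costs only for pairs that occurred in synthesis."""
    cache: Dict[Pair, float] = {}
    for units in synthesized_sequences:
        # zip(units, units[1:]) yields each adjacent (left, right) pair.
        for left, right in zip(units, units[1:]):
            if (left, right) not in cache:
                # The expensive cost is computed once, offline.
                cache[(left, right)] = concat_cost(left, right)
    return cache


def toy_cost(left: int, right: int) -> float:
    """Stand-in for a real spectral-mismatch measure (assumption)."""
    return float(abs(left - right))


# Two short "utterances" over a 4-unit inventory (16 possible ordered pairs).
corpus = [[1, 2, 3], [2, 3, 4]]
cache = build_cost_cache(corpus, toy_cost)
```

In this toy run only the three pairs that actually occur are stored, which mirrors the abstract's observation that a small fraction of the possible pairs accounts for what is seen in practice.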
20 Claims
1. A method comprising:
determining whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in a concatenation cost database; and
if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, then assigning a default value to the concatenation cost.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A system comprising:
a processor;
a first module configured to control the processor to determine whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in a concatenation cost database; and
a second module configured to control the processor, if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, to then assign a default value to the concatenation cost.
(Dependent claims: 9, 10, 11, 12, 13, 14)
15. A method comprising:
determining whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in a concatenation cost database; and
if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, then deriving an actual concatenation cost for the acoustic unit sequential pair.
(Dependent claims: 16, 17, 18, 19, 20)
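The independent claims differ only in what happens when a pair is missing from the concatenation cost database: claims 1 and 8 assign a default value, while claim 15 derives the actual cost. The following is a minimal sketch of that run-time lookup, not the claimed implementation. The `lookup_cost` function, the `DEFAULT_COST` value, and the integer unit identifiers are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

Pair = Tuple[int, int]  # hypothetical (left unit id, right unit id)

DEFAULT_COST = 10.0  # hypothetical default; the claims do not fix a value


def lookup_cost(
    cache: Dict[Pair, float],
    pair: Pair,
    compute: Optional[Callable[[int, int], float]] = None,
    default: float = DEFAULT_COST,
) -> float:
    """Return the cost for a sequential pair, with a fallback on a miss."""
    cost = cache.get(pair)
    if cost is not None:
        return cost  # the database contains the pair
    if compute is None:
        return default  # fallback in the style of claims 1 and 8
    return compute(*pair)  # fallback in the style of claim 15


cache = {(1, 2): 0.5}
hit = lookup_cost(cache, (1, 2))
miss_default = lookup_cost(cache, (3, 4))
miss_derived = lookup_cost(cache, (3, 4), compute=lambda a, b: float(abs(a - b)))
```

The default-value path keeps run-time work constant per lookup, whereas the derive-on-miss path trades extra computation for an exact cost on rare pairs.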
Specification