METHODS AND APPARATUS FOR RAPID ACOUSTIC UNIT SELECTION FROM A LARGE SPEECH CORPUS
First Claim
1. A method comprising:
generating a concatenation cost database by synthesizing, via a processor, a body of speech and identifying acoustic unit sequential pairs in the body of speech and respective concatenation costs;
determining whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in the concatenation cost database; and
if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, calculating an actual concatenation cost for the acoustic unit sequential pair.
Abstract
A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, because concatenation costs, which measure the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching them. The number of possible sequential pairs of acoustic units makes caching all of them prohibitive. Statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. The system therefore synthesizes a large body of speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.
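The caching scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The integer unit ids, the function names `build_cost_cache` and `concatenation_cost`, and the caller-supplied `cost_fn` are all hypothetical, introduced only for illustration. The sketch assumes the cache is a plain in-memory dictionary and that a miss is computed on demand without being stored, since the source does not say whether missed costs are added to the database.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical representation: an acoustic unit is identified by an integer id,
# and a sequential pair is (left unit id, right unit id).
Pair = Tuple[int, int]
CostFn = Callable[[int, int], float]


def build_cost_cache(synthesized: Iterable[List[int]], cost_fn: CostFn) -> Dict[Pair, float]:
    """Record the concatenation cost of every adjacent unit pair that
    appears in a body of previously synthesized unit sequences."""
    cache: Dict[Pair, float] = {}
    for units in synthesized:
        for left, right in zip(units, units[1:]):
            if (left, right) not in cache:
                cache[(left, right)] = cost_fn(left, right)
    return cache


def concatenation_cost(pair: Pair, cache: Dict[Pair, float], cost_fn: CostFn) -> float:
    """Return the cached cost if the pair is in the cache; otherwise
    compute the actual cost on demand (assumption: the miss is not stored)."""
    cost = cache.get(pair)
    if cost is None:
        cost = cost_fn(*pair)
    return cost
```

In this sketch only the pairs observed during the offline synthesis pass occupy memory, which reflects the observation that a small fraction of possible pairs occurs in practice; any rare pair still gets a correct cost through the fallback computation.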
20 Claims
1. A method comprising:
generating a concatenation cost database by synthesizing, via a processor, a body of speech and identifying acoustic unit sequential pairs in the body of speech and respective concatenation costs;
determining whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in the concatenation cost database; and
if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, calculating an actual concatenation cost for the acoustic unit sequential pair.
9. A system comprising:
a processor; and
a computer readable storage medium storing instructions for controlling the processor to perform steps comprising:
generating a concatenation cost database by synthesizing, via a processor, a body of speech and identifying acoustic unit sequential pairs in the body of speech and respective concatenation costs;
determining whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in the concatenation cost database; and
if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, calculating an actual concatenation cost for the acoustic unit sequential pair.
16. A non-transitory computer-readable storage media storing instructions which, when executed by a computing device, cause the computing device to perform steps comprising:
generating a concatenation cost database by synthesizing, via a processor, a body of speech and identifying acoustic unit sequential pairs in the body of speech and respective concatenation costs;
determining whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in the concatenation cost database; and
if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, calculating an actual concatenation cost for the acoustic unit sequential pair.
Specification