Methods and apparatus for rapid acoustic unit selection from a large speech corpus
Abstract
A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database to produce artificial speech. The units are chosen to minimize a combination of target and concatenation costs for a given sentence. Concatenation costs, which measure the mismatch between sequential pairs of acoustic units, are expensive to compute, so processing can be greatly reduced by pre-computing and caching them. However, the number of possible sequential pairs makes caching every pair prohibitive. An efficient concatenation cost database is instead constructed by synthesizing a large body of speech and identifying the acoustic unit sequential pairs generated, together with their respective concatenation costs. Constructing the database in this fashion greatly reduces the processing power required at run-time, with negligible effect on speech quality.
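The caching idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `ConcatCostCache`, the integer unit ids, the `DEFAULT_COST` value, and the `toy_cost` function are all hypothetical stand-ins, and the sketch assumes the database is populated from the unit sequences that appear in the synthesized speech.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Sequence, Tuple

UnitPair = Tuple[int, int]  # (left unit id, right unit id); ids are hypothetical

DEFAULT_COST = 10.0  # assumed placeholder; the source gives no default value


@dataclass
class ConcatCostCache:
    """Sparse store of concatenation costs for pairs seen during synthesis."""

    default_cost: float = DEFAULT_COST
    costs: Dict[UnitPair, float] = field(default_factory=dict)

    def lookup(self, left: int, right: int) -> float:
        # Return the cached cost, or the default when the pair is absent.
        return self.costs.get((left, right), self.default_cost)

    def update_from_corpus(
        self,
        synthesized: Iterable[Sequence[int]],
        cost_fn: Callable[[int, int], float],
    ) -> None:
        # Record a cost for every sequential pair in the synthesized sequences.
        for units in synthesized:
            for left, right in zip(units, units[1:]):
                if (left, right) not in self.costs:
                    self.costs[(left, right)] = cost_fn(left, right)


def toy_cost(left: int, right: int) -> float:
    # Stand-in for an expensive mismatch measure: zero for consecutive ids.
    return float(abs(right - left - 1))


cache = ConcatCostCache()
cache.update_from_corpus([[1, 2, 7], [3, 4, 5]], toy_cost)
print(cache.lookup(2, 7))  # pair seen during synthesis: cached cost
print(cache.lookup(9, 1))  # pair never seen: default cost
```

Only pairs that actually occur are stored, so the table stays far smaller than the full set of possible pairs, and the expensive cost function is never called at lookup time.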
17 Claims
1. A method comprising:
determining, via a processor, whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in a concatenation cost database;
if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, then assigning a default value as the concatenation cost; and
updating the concatenation cost database by synthesizing a body of speech and identifying acoustic unit sequential pairs generated in the body of speech and respective concatenation costs.
(Dependent claims: 2, 3, 4, 5, 6)
7. A system comprising:
a processor;
a first module configured to control the processor to determine whether an acoustic sequential pair to be used for synthesizing speech has a concatenation cost and a concatenation database;
a second module configured to control the processor, if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, to assign a default value as the concatenation cost; and
a third module configured to control the processor to update the concatenation cost database by synthesizing a body of speech and identifying acoustic unit sequential pairs generated in the body of speech and respective concatenation costs.
(Dependent claims: 8, 9, 10, 11, 12)
13. A method comprising:
determining, via a processor, whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost and a concatenation cost database;
if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, then deriving an actual concatenation cost for the acoustic unit sequential pair; and
updating the concatenation cost database by synthesizing a body of speech and identifying acoustic unit sequential pairs generated in the body of speech and respective concatenation costs.
(Dependent claims: 14, 15, 16, 17)
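Claim 13 differs from claim 1 in what happens on a database miss: the actual cost is derived rather than replaced by a default. A minimal Python sketch of that fallback follows. The names `cost_or_derive` and `slow_cost` and the integer unit ids are hypothetical. The sketch deliberately does not write the derived value back into the database, since the claim describes updating the database through synthesis of a body of speech, not at lookup time.

```python
from typing import Callable, Dict, List, Tuple

UnitPair = Tuple[int, int]  # (left unit id, right unit id); ids are hypothetical


def cost_or_derive(
    costs: Dict[UnitPair, float],
    left: int,
    right: int,
    derive: Callable[[int, int], float],
) -> float:
    """Return the stored cost, or derive the actual cost when the pair is missing."""
    pair = (left, right)
    if pair in costs:
        return costs[pair]
    return derive(left, right)


calls: List[UnitPair] = []  # records which pairs needed the expensive path


def slow_cost(left: int, right: int) -> float:
    # Stand-in for an expensive mismatch measure between two units.
    calls.append((left, right))
    return float(abs(right - left - 1))


known = {(1, 2): 0.0}
hit = cost_or_derive(known, 1, 2, slow_cost)   # served from the database
miss = cost_or_derive(known, 2, 7, slow_cost)  # derived on demand
```

Compared with the default-value fallback, this variant trades extra run-time computation on rare pairs for an exact cost.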
Specification