Methods and apparatus for rapid acoustic unit selection from a large speech corpus
Abstract
A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. Because concatenation costs, which measure the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching them. The number of possible sequential pairs, however, makes caching every cost prohibitive. Statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, fewer than 1% of the possible sequential pairs of acoustic units occur in practice. An efficient concatenation cost database can therefore be constructed by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. Constructing the database in this fashion greatly reduces the processing power required at run-time, with negligible effect on speech quality.
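The cost minimization described in the abstract can be illustrated with a small dynamic-programming search. This is only a sketch under stated assumptions: the abstract does not name a search algorithm, so the Viterbi-style pass below is a common choice rather than the patented method, and `select_units`, `target_cost`, and `concat_cost` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    concat_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Pick one unit per position so that the summed target and
    concatenation costs are minimal (assumed Viterbi-style search)."""
    # best[u] = (cost of the cheapest path ending in unit u, that path)
    best: Dict[str, Tuple[float, List[str]]] = {
        u: (target_cost(0, u), [u]) for u in candidates[0]
    }
    for pos in range(1, len(candidates)):
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for u in candidates[pos]:
            # Cheapest predecessor once the join cost into u is included.
            prev, (cost, path) = min(
                best.items(), key=lambda kv: kv[1][0] + concat_cost(kv[0], u)
            )
            total = cost + concat_cost(prev, u) + target_cost(pos, u)
            nxt[u] = (total, path + [u])
        best = nxt
    cost, path = min(best.values(), key=lambda v: v[0])
    return path, cost
```

The inner loop calls `concat_cost` once or twice for every candidate pair at every position, which is why the abstract singles out concatenation costs as the expensive part worth caching.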
20 Claims
1. A method of selecting acoustic units from an acoustic unit database for synthesizing speech, a concatenation cost being a measure of the mismatch between an acoustic unit sequential pair, the method comprising:
selecting one or more acoustic units from the acoustic unit database;
determining whether a concatenation cost of an acoustic unit sequential pair resides in a concatenation cost database;
extracting the concatenation cost of the acoustic unit sequential pair from the concatenation cost database if the concatenation cost database contains the concatenation cost of the acoustic unit sequential pair; and
determining a value to the concatenation cost of the acoustic unit sequential pair if the concatenation cost database does not contain the concatenation cost of the acoustic unit sequential pair.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
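The determine, extract, and fall-back steps of claim 1 amount to a cache lookup with a fallback. A minimal sketch follows, assuming the concatenation cost database can be modeled as an in-memory dictionary keyed by unit pairs. The names `get_concat_cost`, `cost_db`, and `fallback` are hypothetical. Claim 1 does not say how the missing value is determined, so the fallback is left as a caller-supplied function.

```python
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]  # (left unit id, right unit id); assumed representation


def get_concat_cost(
    pair: Pair,
    cost_db: Dict[Pair, float],
    fallback: Callable[[Pair], float],
) -> float:
    """Return the cached cost for `pair`, or a fallback value if it is absent."""
    # Step 1: determine whether the cost resides in the database.
    if pair in cost_db:
        # Step 2: extract the stored cost.
        return cost_db[pair]
    # Step 3: determine a value some other way (assumption: the caller
    # decides, e.g. a fixed default or a full computation).
    return fallback(pair)
```

For example, `get_concat_cost(("u1", "u2"), {("u1", "u2"): 0.25}, lambda p: 9.0)` returns the stored `0.25`, while any pair missing from the dictionary returns the fallback value `9.0`.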
11. An apparatus for selecting acoustic units, comprising:
an acoustic unit database containing at least two acoustic units;
a concatenation cost database containing concatenation costs of acoustic unit sequential pairs, a concatenation cost being a measure of the mismatch between an acoustic unit sequential pair, wherein the concatenation cost database comprises a selected subset of concatenation costs of all possible acoustic unit sequential pairs of the acoustic unit database; and
a selecting device that selects acoustic units using the concatenation cost database, wherein the selecting device includes a first determining portion that determines whether a concatenation cost of an acoustic unit sequential pair resides in the concatenation cost database;
an extracting portion that extracts the concatenation cost of the acoustic unit sequential pair from the concatenation cost database if the concatenation cost database contains the concatenation cost of the acoustic unit sequential pair; and
a second determining portion that determines a value to the concatenation cost of the acoustic unit sequential pair if the concatenation cost database does not contain the concatenation cost of the acoustic unit sequential pair.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. A method of forming a computer readable medium containing a concatenation cost database, a concatenation cost being a measure of the mismatch between an acoustic unit sequential pair, the method comprising:
synthesizing a body of speech using a training data set and an acoustic unit database to produce a plurality of synthesized acoustic unit sequential pairs;
calculating a concatenation cost for at least one synthesized acoustic unit sequential pair of the plurality of synthesized acoustic unit sequential pairs;
storing at least one concatenation cost of the calculated concatenation cost in the concatenation cost database; and
determining the concatenation cost for at least one synthesized acoustic unit sequential pair if the calculated concatenation cost is not found in the concatenation cost database.
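The database-forming steps of claim 20 can be sketched as a pass over already-synthesized unit sequences. This is an illustration under assumptions, not the patented procedure: the synthesizer itself is out of scope, so `synthesized` stands in for its output, `compute_cost` is a hypothetical cost function, and keeping only the `max_entries` most frequent pairs is one possible reading of "storing those concatenation costs likely to occur".

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (left unit id, right unit id); assumed representation


def build_concat_cost_db(
    synthesized: Iterable[Sequence[str]],
    compute_cost: Callable[[str, str], float],
    max_entries: Optional[int] = None,
) -> Dict[Pair, float]:
    """Build a pair -> cost table from unit sequences produced by synthesis."""
    counts: Counter = Counter()
    for units in synthesized:
        # Every adjacent pair in a synthesized sequence is a sequential pair.
        for pair in zip(units, units[1:]):
            counts[pair] += 1
    # Assumption: the most frequent pairs are the ones worth storing.
    # most_common(None) returns every pair.
    return {
        pair: compute_cost(*pair)
        for pair, _ in counts.most_common(max_entries)
    }
```

Pairs left out of the table are handled at run time by the final step of the claim, which determines the cost when it is not found in the database.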
Specification