Methods and apparatus for rapid acoustic unit selection from a large speech corpus

US 20030115049A1
Filed: 02/06/2003
Published: 06/19/2003
Est. Priority Date: 04/30/1999
Status: Active Grant

Abstract

A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. Because concatenation costs, which measure the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching them. Unfortunately, the number of possible sequential pairs of acoustic units makes caching all of them prohibitive. Statistical experiments reveal, however, that while about 85% of the acoustic units are typically used in common speech, fewer than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.
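
The abstract describes choosing units so that the combined target and concatenation cost over a sentence is minimal. One common way to do such a minimization is a dynamic-programming (Viterbi-style) search; the sketch below illustrates that idea only and is not taken from the patent. The function name `select_units`, the callback signatures, and the assumption that unit identifiers are unique within each position are all hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[Hashable]],
    target_cost: Callable[[int, Hashable], float],
    concat_cost: Callable[[Hashable, Hashable], float],
) -> Tuple[List[Hashable], float]:
    """Return the unit sequence with the lowest total target + concatenation cost.

    candidates[i] lists the candidate units for position i (hypothetical layout).
    """
    # best[u] = (cost of the cheapest path ending in unit u, that path)
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for i in range(1, len(candidates)):
        step = {}
        for u in candidates[i]:
            # Cheapest predecessor once the join cost into u is included.
            prev, (cost, path) = min(
                best.items(),
                key=lambda item: item[1][0] + concat_cost(item[0], u),
            )
            step[u] = (cost + concat_cost(prev, u) + target_cost(i, u), path + [u])
        best = step
    cost, path = min(best.values(), key=lambda entry: entry[0])
    return path, cost
```

The inner loop calls `concat_cost` once per candidate pair at every position, which is why the abstract singles out concatenation costs as the expensive part worth caching.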

20 Claims

1. A method of selecting acoustic units from an acoustic unit database for synthesizing speech, a concatenation cost being a measure of the mismatch between an acoustic unit sequential pair, the method comprising:
- selecting one or more acoustic units from the acoustic unit database;
- determining whether a concatenation cost of an acoustic unit sequential pair resides in a concatenation cost database;
- extracting the concatenation cost of the acoustic unit sequential pair from the concatenation cost database if the concatenation cost database contains the concatenation cost of the acoustic unit sequential pair; and
- determining a value to the concatenation cost of the acoustic unit sequential pair if the concatenation cost database does not contain the concatenation cost of the acoustic unit sequential pair.
- Dependent claims 2 to 10:
  - 2. The method according to claim 1, further comprising synthesizing the one or more acoustic units to produce synthetic speech.
  - 3. The method according to claim 1, wherein forming the concatenation cost database uses a training set of data.
  - 4. The method according to claim 1, wherein forming the concatenation cost database is based on a value of at least one concatenation cost.
  - 5. The method according to claim 1, wherein selecting at least one acoustic unit from the acoustic unit database further uses at least one target cost of an acoustic unit, the target cost being a measure of the mismatch between the acoustic unit and a phoneme.
  - 6. The method according to claim 1, wherein determining a value for the concatenation cost of the acoustic unit sequential pair includes assigning a default value.
  - 7. The method according to claim 1, wherein determining a value of the concatenation cost of the acoustic unit sequential pair includes computing the concatenation cost of the acoustic unit sequential pair.
  - 8. The method according to claim 1, wherein the default concatenation cost value is large enough to eliminate selection of an acoustic unit sequential pair under any reasonable pruning, but does not disallow the acoustic unit sequential pair selection entirely.
  - 9. The method according to claim 1, wherein selecting at least one acoustic unit from the acoustic unit database further uses a hash table.
  - 10. The method according to claim 1, further comprising:
    - forming a concatenation cost database, wherein the concatenation cost database comprises a selected subset of concatenation costs of possible acoustic unit sequential pairs of the acoustic unit database.
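
The lookup described in claim 1, with the fallbacks of claims 6 and 7 and a hash table as in claim 9, can be sketched as follows. This is a minimal illustration and not the patented implementation: the class name `ConcatCostLookup`, the string unit identifiers, and the default value of `1e4` are assumptions, and a Python `dict` stands in for the hash table.

```python
from typing import Callable, Dict, Optional, Tuple

Pair = Tuple[str, str]  # (left unit id, right unit id); hypothetical id format


class ConcatCostLookup:
    """Cached concatenation costs with a fallback for pairs that are missing."""

    def __init__(
        self,
        cache: Dict[Pair, float],
        default_cost: float = 1e4,  # assumed value, not from the patent
        compute: Optional[Callable[[str, str], float]] = None,
    ) -> None:
        self.cache = cache            # dict acts as the hash table
        self.default_cost = default_cost
        self.compute = compute

    def cost(self, left: str, right: str) -> float:
        pair = (left, right)
        if pair in self.cache:        # does the cost reside in the database?
            return self.cache[pair]   # yes: extract it
        if self.compute is not None:  # no: compute it on demand (claim 7 variant)
            return self.compute(left, right)
        return self.default_cost      # no: assign a default value (claim 6 variant)
```

For example, `ConcatCostLookup({("u1", "u2"): 0.3}).cost("u2", "u1")` falls through to the default because only the ordered pair `("u1", "u2")` is cached.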

11. An apparatus for selecting acoustic units, comprising:
- an acoustic unit database containing at least two acoustic units;
- a concatenation cost database containing concatenation costs of acoustic unit sequential pairs, a concatenation cost being a measure of the mismatch between an acoustic unit sequential pair, wherein the concatenation cost database comprises a selected subset of concatenation costs of all possible acoustic unit sequential pairs of the acoustic unit database; and
- a selecting device that selects acoustic units using the concatenation cost database, wherein the selecting device includes
  - a first determining portion that determines whether a concatenation cost of an acoustic unit sequential pair resides in the concatenation cost database;
  - an extracting portion that extracts the concatenation cost of the acoustic unit sequential pair from the concatenation cost database if the concatenation cost database contains the concatenation cost of the acoustic unit sequential pair; and
  - a second determining portion that determines a value to the concatenation cost of the acoustic unit sequential pair if the concatenation cost database does not contain the concatenation cost of the acoustic unit sequential pair.
- Dependent claims 12 to 19:
  - 12. The apparatus of claim 11, further comprising a synthesizer that synthesizes acoustic units to form synthetic speech.
  - 13. The apparatus of claim 11, wherein the concatenation cost database is formed using a training set of data.
  - 14. The apparatus of claim 11, wherein the concatenation cost database is formed based on a value of at least one concatenation cost.
  - 15. The apparatus of claim 11, wherein the selecting device further uses a target cost of an acoustic unit, the target cost being a measure of the mismatch between the acoustic unit and a phoneme specification.
  - 16. The apparatus of claim 11, wherein the second determining portion is an assignment portion that assigns a default value to the concatenation cost of the acoustic unit sequential pair.
  - 17. The apparatus of claim 16, wherein the default value is large enough to eliminate selection of an acoustic unit sequential pair under any reasonable pruning, but does not disallow the acoustic unit sequential pair selection entirely.
  - 18. The apparatus of claim 11, wherein the second determining portion is a computing portion that computes the concatenation cost of the acoustic unit sequential pair.
  - 19. The apparatus of claim 11, wherein the selecting device further uses a hash table.
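
Claim 17 asks for a default value that is large enough to be pruned away in normal operation yet does not rule the pair out entirely. One possible reading, sketched here with a simple beam-pruning step, is that a large finite default serves this purpose. The pruning scheme, the names `prune`, `DEFAULT_COST` and `BEAM`, and both numeric values are assumptions made for illustration; the claims do not specify them.

```python
from typing import Dict

DEFAULT_COST = 1e4  # assumed default for pairs missing from the cost database
BEAM = 50.0         # assumed pruning width


def prune(hypotheses: Dict[str, float], beam: float = BEAM) -> Dict[str, float]:
    """Keep only hypotheses whose cost is within `beam` of the cheapest one."""
    best = min(hypotheses.values())
    return {name: cost for name, cost in hypotheses.items() if cost <= best + beam}


# A cached join competes with an uncached one: the default cost pushes the
# uncached hypothesis outside the beam, so it is pruned.
mixed = prune({"cached_join": 3.0, "uncached_join": 3.0 + DEFAULT_COST})

# Every hypothesis relies on an uncached join: the finite default keeps them
# alive and still ordered by their remaining costs.
all_uncached = prune({"x": 1.0 + DEFAULT_COST, "y": 2.0 + DEFAULT_COST})
```

Had the default been infinite, the two hypotheses in the second case would both cost infinity and could no longer be ranked against each other, which is one reason a finite value may be preferable.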

20. A method of forming a computer readable medium containing a concatenation cost database, a concatenation cost being a measure of the mismatch between an acoustic unit sequential pair, the method comprising:
- synthesizing a body of speech using a training data set and an acoustic unit database to produce a plurality of synthesized acoustic unit sequential pairs;
- calculating a concatenation cost for at least one synthesized acoustic unit sequential pair of the plurality of synthesized acoustic unit sequential pairs;
- storing at least one concatenation cost of the calculated concatenation cost in the concatenation cost database; and
- determining the concatenation cost for at least one synthesized acoustic unit sequential pair if the calculated concatenation cost is not found in the concatenation cost database.
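
The database-building steps of claim 20 can be sketched as a loop over training text. This is a simplified illustration under stated assumptions: `synthesize` and `concat_cost` are hypothetical callbacks standing in for a real synthesizer and cost function, and only the adjacent pairs in each synthesized unit sequence are recorded.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (left unit id, right unit id); hypothetical id format


def build_concat_cost_db(
    training_texts: Iterable[str],
    synthesize: Callable[[str], List[str]],     # hypothetical: text -> unit ids
    concat_cost: Callable[[str, str], float],   # hypothetical: full cost computation
) -> Dict[Pair, float]:
    """Collect concatenation costs for the unit pairs seen while synthesizing."""
    db: Dict[Pair, float] = {}
    for text in training_texts:
        units = synthesize(text)
        # Each adjacent pair in the synthesized sequence is a sequential pair.
        for pair in zip(units, units[1:]):
            if pair not in db:                  # compute each pair's cost once
                db[pair] = concat_cost(*pair)
    return db
```

Because the abstract reports that fewer than 1% of possible pairs occur in practice, a table built this way would be expected to stay far smaller than one covering every possible pair.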

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Beutnagel, Mark Charles; Riley, Michael Dennis; Mohri, Mehryar
Granted Patent: US 6,701,295 B2
US Class Current: 704/220
CPC Class Codes: G10L 13/07 Concatenation rules
